[ejabberd] ejabberd and mnesia crashes

Brian Cully bcully at gmail.com
Tue Feb 9 18:01:16 MSK 2010

On 9-Feb-2010, at 07:58, Fabio Ricci wrote:

> At each crash we get a Mnesia core dump.
> The reason for the crash is in the ejabberd.log file:
> =ERROR REPORT==== 2010-02-04 21:44:23 ===
> Mnesia('ejabberd@jabbr001'): ** ERROR ** (core dumped to file: "/data/ejabberd/bin/MnesiaCore.ejabberd@jabbr001_1265_316263_657059")
>  ** FATAL ** Cannot open log file "/data/ejabberd/database/ejabberd@jabbr001/PREVIOUS.LOG": {file_error,
>      "/data/ejabberd/database/ejabberd@jabbr001/PREVIOUS.LOG",
>      system_limit}

	I have seen these many times when using BOSH or other http modules. I submitted a fix for this in early December, which was applied to the 2.1.x branch (see: https://support.process-one.net/browse/EJAB-1119).

> We have noticed a lot of these messages since the last maintenance:
> =ERROR REPORT==== 2010-02-09 13:11:20 ===
> Mnesia('ejabberd@jabbr001'): ** WARNING ** Mnesia is overloaded: {dump_log,
>                                                                   write_threshold}

	I'm not sure how dangerous this is. I see the issue fairly regularly, but it doesn't seem to cause me any harm.

> My idea is that we are reaching some system_limit internal to ejabberd (there are no limits on the host itself; memory, CPU and I/O are OK)

	You're reaching the max port limit. If you're interested, I have a Perl script which I used to track down the BOSH issue. All it does is watch the ejabberd log file, and whenever it sees "system_limit" it connects to the node and halts it in such a way that I get an Erlang runtime dump. Using that dump you can see exactly which process is consuming too many ports.
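
	The Perl script itself is not included in this message. As a rough illustration of the approach described above, here is a minimal Python sketch, not the original script. It follows a log file, and on the first line containing "system_limit" it calls a hook. The function names (line_hits_limit, follow, watch, run_dump_command) are hypothetical, and the command that actually halts the node and produces the dump is left to the caller because the message does not say how that was done.

```python
import subprocess
import time

MARKER = "system_limit"  # the string the message says to watch for


def line_hits_limit(line):
    """Return True if a log line mentions the system_limit error."""
    return MARKER in line


def follow(path, poll_seconds=1.0):
    """Yield complete lines appended to `path`, similar to `tail -f`."""
    with open(path, "r", errors="replace") as log:
        log.seek(0, 2)  # start at the current end of the file
        pending = ""
        while True:
            chunk = log.readline()
            if not chunk:
                time.sleep(poll_seconds)  # nothing new yet
                continue
            pending += chunk
            if pending.endswith("\n"):  # only hand out finished lines
                yield pending
                pending = ""


def watch(lines, on_hit):
    """Call `on_hit(line)` for the first matching line and return that line.

    Returns None if the iterable ends without a match.
    """
    for line in lines:
        if line_hits_limit(line):
            on_hit(line)
            return line
    return None


def run_dump_command(argv):
    """Run a caller-supplied command (assumption: it halts the node
    in a way that leaves an Erlang runtime dump behind)."""
    return subprocess.run(argv, check=False).returncode
```

	It could be wired up as watch(follow(LOG_PATH), lambda _line: run_dump_command(DUMP_COMMAND)), where LOG_PATH and DUMP_COMMAND are placeholders for the local ejabberd.log location and whatever command is used on that system to halt the node with a dump.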

> Talking about limits, we also see errors like this just before the crash happens:
> =ERROR REPORT==== 2010-02-08 20:38:12 ===
> ** Generic server <0.24022.18> terminating 
> ** Last message in was {become_controller,<0.24023.18>}
> ** When Server state == {state,
>                             {tlssock,#Port<0.4195053>,#Port<0.4196250>},
>                             tls,none,undefined,65536,
>                             {xml_stream_state,undefined,#Port<0.4208404>,[],
>                                 0,65536},
>                             infinity}
> ** Reason for termination == 
> ** {system_limit,[{erlang,open_port,[{spawn,expat_erl},[binary]]},
>                   {xml_stream,new,2},
>                   {ejabberd_receiver,handle_call,3},
>                   {gen_server,handle_msg,5},
>                   {proc_lib,init_p,5}]}
> This suggests that the limit is being hit in Erlang's open_port.
> If we look at our configuration we have:

	See above.

> # ERL_MAX_PORTS: Maximum number of simultaneously open Erlang ports
> #
> # ejabberd consumes two or three ports for every connection, either 
> # from a client or from another Jabber server. So take this into
> # account when setting this limit.
> #
> # Default: 32000
> # Maximum: 268435456
> #
> This is set to the minimum. I think that raising this parameter can be worth a try.

	Unless you have around 32k connected users, you're probably running into a leak. If you do have that many users (or more), then raising the number of open ports will help. In general, you need ~1 port per TLS connection, in addition to a few dozen as a baseline.
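
	To make the arithmetic concrete, here is a small Python sketch of the port budget. The per-connection figure is a parameter because the thread gives two numbers: the quoted configuration comment says two or three ports per connection, while the estimate above is about one per TLS connection. The baseline of 50 is an assumed stand-in for "a few dozen", and the function names are hypothetical.

```python
def estimated_ports(connections, ports_per_connection=3, baseline=50):
    """Rough number of Erlang ports in use for a given connection count."""
    return baseline + connections * ports_per_connection


def max_connections(max_ports, ports_per_connection=3, baseline=50):
    """Rough number of connections that fit under a port limit."""
    return max(0, (max_ports - baseline) // ports_per_connection)


if __name__ == "__main__":
    limit = 32000  # the ERL_MAX_PORTS default quoted in the thread
    for per_conn in (1, 2, 3):
        print(per_conn, "port(s) per connection:",
              max_connections(limit, per_conn), "connections")
```

	If the observed connection count is far below what these estimates allow and system_limit errors still appear, that points toward a port leak rather than a limit that is simply too low.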
