[ejabberd] ejabberd and amnesia crashes

Pablo Polvorin pablo.polvorin at process-one.net
Tue Feb 9 19:18:43 MSK 2010


Hi,

> In general, you need ~1 port per TLS connection, in addition to a few dozen as a baseline.
Keep in mind that you need to count not only sockets but also the Erlang
ports used for drivers such as the expat parser, TLS, and zlib (one of each
per connection). The situation with BOSH is worse, as you said, because each
client is probably using more than one HTTP connection.
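
As a rough back-of-the-envelope check, the counting above can be sketched in
Python. The per-connection counts and the baseline value are assumptions for
illustration (one socket plus one port each for expat, TLS and zlib when
enabled), not measured ejabberd figures, and the function name is made up:

```python
def estimate_ports(connections, tls=True, zlib=False, baseline=50):
    """Rough estimate of Erlang ports in use (all figures are assumptions).

    Each connection is assumed to hold one socket and one expat parser
    port, plus one port each for TLS and zlib when those are enabled.
    `baseline` stands in for the "few dozen" ports used regardless of load.
    """
    per_connection = 2  # socket + expat parser
    if tls:
        per_connection += 1
    if zlib:
        per_connection += 1
    return baseline + connections * per_connection


# 10,000 TLS connections without compression
print(estimate_ports(10000))  # 30050
```

Under these assumptions, 10,000 TLS clients already come close to
ERL_MAX_PORTS=32000, and BOSH clients holding several HTTP connections each
would get there sooner.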


On 9 February 2010 12:01, Brian Cully <bcully at gmail.com> wrote:
> On 9-Feb-2010, at 07:58, Fabio Ricci wrote:
>
> At each crash we get a Mnesia core dump.
> The reason it is crashing is in the ejabberd.log file:
> =ERROR REPORT==== 2010-02-04 21:44:23 ===
> Mnesia('ejabberd@jabbr001'): ** ERROR ** (core dumped to file:
> "/data/ejabberd/bin/MnesiaCore.ejabberd@jabbr001_1265_316263_657059")
>  ** FATAL ** Cannot open log file
> "/data/ejabberd/database/ejabberd@jabbr001/PREVIOUS.LOG": {file_error,
>     "/data/ejabberd/database/ejabberd@jabbr001/PREVIOUS.LOG",
>     system_limit}
>
> I have seen these many times when using BOSH or other http modules. I
> submitted a fix for this in early December, which was applied to the 2.1.x
> branch (see: https://support.process-one.net/browse/EJAB-1119).
>
> We have noticed a lot of these messages since the last maintenance:
>
> =ERROR REPORT==== 2010-02-09 13:11:20 ===
> Mnesia('ejabberd@jabbr001'): ** WARNING ** Mnesia is overloaded: {dump_log,
>     write_threshold}
>
> I'm not sure how dangerous this is. I see the issue fairly regularly, but it
> doesn't seem to cause me any harm.
>
> My idea is that we are reaching some system_limit internal to ejabberd (there
> are no limits on the host itself; memory, CPU and I/O are OK).
>
> You're reaching the max port limit. If you're interested, I have a Perl
> script which I used to track down the BOSH issue. All it does is watch the
> ejabberd log file, and whenever it sees "system_limit" it connects to the
> node and halts it in such a way that I get an Erlang runtime dump.
> Using that dump you can see exactly which process is consuming too many
> ports.
>
> Speaking of limits, we also see errors like this just before the crash
> happens:
> =ERROR REPORT==== 2010-02-08 20:38:12 ===
> ** Generic server <0.24022.18> terminating
> ** Last message in was {become_controller,<0.24023.18>}
> ** When Server state == {state,
>                             {tlssock,#Port<0.4195053>,#Port<0.4196250>},
>                             tls,none,undefined,65536,
>                             {xml_stream_state,undefined,#Port<0.4208404>,[],
>                                 0,65536},
>                             infinity}
> ** Reason for termination ==
> ** {system_limit,[{erlang,open_port,[{spawn,expat_erl},[binary]]},
>                   {xml_stream,new,2},
>                   {ejabberd_receiver,handle_call,3},
>                   {gen_server,handle_msg,5},
>                   {proc_lib,init_p,5}]}
>
> This suggests that the limit is in Erlang's open_port.
> If we look at our configuration, we have:
>
> See above.
>
> # ERL_MAX_PORTS: Maximum number of simultaneously open Erlang ports
> #
> # ejabberd consumes two or three ports for every connection, either
> # from a client or from another Jabber server. So take this into
> # account when setting this limit.
> #
> # Default: 32000
> # Maximum: 268435456
> #
> ERL_MAX_PORTS=32000
>
> This is set to the default. I think raising this parameter is worth
> a try.
>
> Unless you have around 32k connected users then you're probably running into
> a leak. If you have that many users (or more) then raising the number of
> open ports will help. In general, you need ~1 port per TLS connection, in
> addition to a few dozen as a baseline.
> -bjc
>
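
For reference, the log-watching half of the script Brian describes could look
something like the Python sketch below. This is not his Perl script: it only
finds reports that mention system_limit, the function name is hypothetical,
and it leaves out the step of connecting to the node and forcing a dump.

```python
import re

# Header line that starts each report in ejabberd.log, e.g.
# "=ERROR REPORT==== 2010-02-08 20:38:12 ==="
REPORT_HEADER = re.compile(r"^=\w+ REPORT==== (\S+ \S+) ===")


def find_system_limit_reports(lines):
    """Return the timestamps of reports whose body mentions system_limit."""
    hits = []
    current = None   # timestamp of the report currently being read
    flagged = False  # True once the current report has been recorded
    for line in lines:
        match = REPORT_HEADER.match(line)
        if match:
            current = match.group(1)
            flagged = False
        elif "system_limit" in line and current is not None and not flagged:
            hits.append(current)
            flagged = True
    return hits


sample = [
    "=ERROR REPORT==== 2010-02-08 20:38:12 ===",
    "** Reason for termination ==",
    "** {system_limit,[{erlang,open_port,[{spawn,expat_erl},[binary]]},",
]
print(find_system_limit_reports(sample))  # ['2010-02-08 20:38:12']
```

Each report is recorded at most once, even if system_limit appears on several
of its lines.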



-- 
Pablo Polvorin
ProcessOne

