[ejabberd] BOSH infinite loop followed by an ejabberd crash.

Max Kalika max.kalika+ejabberd at gmail.com
Thu Nov 18 03:55:55 MSK 2010


Hello List.

We're seeing some funky stuff with our BOSH testing.  We haven't been able to fully track down the cause yet, but at some point, ejabberd (2.1.5) starts spewing tons and tons of debugging logs (~500MB in a few minutes) with mostly this:

=INFO REPORT==== 2010-11-02 09:45:23 ===
D(<0.550.0>:ejabberd_http_bind:918) : OutPacket: [{xmlstreamstart,
                                                  "stream:stream",
                                                  [{"xml:lang","en"},
                                                   {"xmlns","jabber:client"},
                                                   {"xmlns:stream",
                                                    "http://etherx.jabber.org/streams"},
                                                   {"id","4224552436"},
                                                   {"from",
                                                    "ci-jabber.wopr.connectsolutions.com"}]}]


This is only for two or three users connected.  Eventually the server crashes with this reason:

Slogan: eheap_alloc: Cannot allocate 912262800 bytes of memory (of type "heap").

I analyzed the crash dump in the Erlang crashdump_viewer, and it looks like the error_logger process had its mailbox filled up. This makes sense: the disk on this machine is rather slow, so error_logger lagged behind in writing out the debug traces, its mailbox kept growing, and the memory balloon burst.

What we suspect happens is that a client sends a BOSH stream start request but closes the socket without waiting for the reply. Looking through the code, I see that in ejabberd_http_bind, prepare_response/4 calls prepare_outpacket_response/4, which matches the second clause of the function:

%% Handle a new session along with its output payload
prepare_outpacket_response(#http_bind{id=Sid, wait=Wait, 
				      hold=Hold, to=To}=Sess,
			   Rid, OutPacket, true) ->

Here OutEls is set to [], as the log entry above shows: OutPacket holds only one element (the xmlstreamstart tuple), so Els is empty:

   case OutPacket of
	[{xmlstreamstart, _, OutAttrs} | Els] ->
...
	    OutEls =
		case Els of
		    [] ->
			[];

And eventually prepare_response/4 is called again and the whole cycle repeats:

	    case OutEls of 
		[] ->
		    prepare_response(Sess, Rid, OutPacket, true);
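To make the suspected cycle concrete, here is a hypothetical, heavily simplified Erlang model of it. It is not the real ejabberd_http_bind code: the module name `loop_model`, the `MaxDepth` bound, and the `{send, ...}` / `{gave_up_after, ...}` return values are made up for illustration, and the filtering of Els that the excerpt above elides is left out (OutEls is simply Els here). What it shows is that when the queued packet holds nothing but the stream header, every pass yields an empty OutEls and re-enters prepare_response with the very same packet, so nothing changes between iterations.

```erlang
%% Hypothetical, heavily simplified model of the suspected cycle in
%% ejabberd_http_bind. NOT the real ejabberd code: the module name,
%% the MaxDepth bound and the return values are illustrative only.
-module(loop_model).
-export([prepare_response/2]).

%% Entry point: follow the response path at most MaxDepth times.
prepare_response(OutPacket, MaxDepth) ->
    prepare_response(OutPacket, MaxDepth, 0).

%% Artificial bound so the model terminates; the real code has none.
prepare_response(_OutPacket, MaxDepth, Depth) when Depth >= MaxDepth ->
    {gave_up_after, Depth};
prepare_response(OutPacket, MaxDepth, Depth) ->
    prepare_outpacket_response(OutPacket, MaxDepth, Depth).

%% New-session case: a stream header, possibly followed by stanzas.
prepare_outpacket_response([{xmlstreamstart, _Name, _Attrs} | Els] = OutPacket,
                           MaxDepth, Depth) ->
    %% The real code builds OutEls from Els (elided in the excerpt);
    %% in this model OutEls is simply Els.
    OutEls = Els,
    case OutEls of
        [] ->
            %% Only the header was queued: retry with the same
            %% OutPacket, so no progress is made between passes.
            prepare_response(OutPacket, MaxDepth, Depth + 1);
        _ ->
            {send, OutEls}
    end;
%% Anything else is just sent as-is in this model.
prepare_outpacket_response(OutPacket, _MaxDepth, _Depth) ->
    {send, OutPacket}.
```

Calling `loop_model:prepare_response([{xmlstreamstart, "stream:stream", []}], 1000)` returns `{gave_up_after, 1000}`: the model only stops because of the artificial bound. If the real loop is unbounded in the same way, each pass would plausibly log the same OutPacket line again, which would fit the log flood described above.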


I haven't been able to trace http_get/2 in prepare_response/4 yet, so I'm bringing this up on the list for someone more familiar with this code.

Would anyone here have any ideas on how to track this down further?

Thanks!