[ejabberd] PEP being memory hungry?

Yao Ko koyao at raptr.com
Tue Sep 21 22:28:39 MSD 2010


On Mon, Sep 20, 2010 at 11:07 PM, Evgeniy Khramtsov <xramtsov at gmail.com> wrote:
> 21.09.2010 09:02, Yao Ko wrote:
>>
>> These are two crashes from two different nodes on 9/13 and 9/16
>> respectively.
>> http://clientupdater.raptr.com/koyao/erl_crash_20100913-210315.dump.gz
>>
>
> It is hard to pinpoint the exact reason for the crash, because there were
> several processes consuming memory:
> 1) one c2s process: you should restore the max_fsm_queue option. Set it to
> 10000 globally and 1000 for the ejabberd_c2s listener.
> 2) one process holding iq stanzas; it is hard to say which module it was, so
> try to set {iqdisc, {queues, 50}} for every module which supports that
> option.
> 3) the most problematic part: ejabberd_mod_pubsub_loop_raptr.com with 6329
> messages in its queue. I have little knowledge of the pubsub code, so
> maybe Christophe or Karim have some ideas.

Thanks for the tips!  I'll enable those settings on the servers.

Based on your analysis of the crash dumps, I wrote a little Perl script [1]
to summarize them:

$ /tmp/parse_erl_crash.pl < erl_crash_20100913-210315.dump
 mnesia_tm has 381 messages
 ejabberd_receiver_sup has 34872 items in the linked list
 ejabberd_c2s_sup has 34911 items in the linked list
 'ejabberd_mod_pubsub_loop_raptr.com' has 6329 messages
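
For anyone who wants to run a similar check, here is a rough sketch of the
idea in Python. It is not the Perl script from [1]. It assumes each process
section of the dump starts with a "=proc:" line and contains "Name:",
"Message queue length:" and "Link list:" fields on single lines, which is
how erl_crash.dump files usually look. The function names and both
thresholds are hypothetical and only there for illustration.

```python
import re
import sys

# Hypothetical thresholds, chosen only for illustration.
MSG_THRESHOLD = 100
LINK_THRESHOLD = 10000


def parse_crash_dump(lines):
    """Return one dict per '=proc:' section: name, queue length, link count."""
    procs = []
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("="):
            # Every '=tag' line opens a new section; keep only processes.
            current = None
            if line.startswith("=proc:"):
                # Fall back to the pid when the process has no registered name.
                current = {"name": line[len("=proc:"):], "messages": 0, "links": 0}
                procs.append(current)
        elif current is not None:
            if line.startswith("Name: "):
                current["name"] = line[len("Name: "):]
            elif line.startswith("Message queue length: "):
                current["messages"] = int(line.split(": ", 1)[1])
            elif line.startswith("Link list: "):
                # Count every pid (<0.1.2>) or port (<0.3>) identifier on the line.
                current["links"] = len(re.findall(r"<\d+\.\d+(?:\.\d+)?>", line))
    return procs


def report(procs, msg_threshold=MSG_THRESHOLD, link_threshold=LINK_THRESHOLD):
    """Return report lines for processes at or above either threshold."""
    out = []
    for p in procs:
        if p["messages"] >= msg_threshold:
            out.append(f"{p['name']} has {p['messages']} messages")
        if p["links"] >= link_threshold:
            out.append(f"{p['name']} has {p['links']} items in the link list")
    return out


def main():
    """Read a crash dump from standard input and print the report."""
    for line in report(parse_crash_dump(sys.stdin)):
        print(line)
```

Calling main() with the dump on standard input gives a report in the same
spirit as the one above. Lowering the thresholds lists more processes.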

Is it normal for ejabberd_c2s_sup to have that many items in the
linked list?  Hopefully, the new max_fsm_queue settings will
mitigate this problem.

>> http://clientupdater.raptr.com/koyao/erl_crash_20100916-070055.dump.gz
>>
>
> On this node, the cause of the crash was actually error_logger. You should
> have an enormous ejabberd.log/sasl.log on this node. Some analysis would be
> great: what were those messages?

Argh... The log rotation was set to purge after five days, and it has
already purged the logs for 9/16 :-(  As far as I recall, this node
died *after* the first node collapsed from running out of memory.  It
probably couldn't handle such a rapid increase in the number of c2s
connections in so short a time.  I've adjusted log rotation to keep
30 days' worth of logs.

I'll re-enable PEP on one of the nodes with the new max_fsm_queue and
iqdisc settings and see if it happens again.

Thanks,
Yao

[1] http://clientupdater.raptr.com/koyao/parse_erl_crash.pl

