[ejabberd] Server allocating all available ram, then stops at 100% CPU

Peter Schwindt ejabberd at schwindt-net.de
Mon Nov 11 17:41:12 MSK 2013


Evgeniy,

On 11.11.2013 10:58, Evgeniy Khramtsov wrote:

>> =ERROR REPORT==== 2013-11-11 07:35:27 ===
>> E(<0.487.0>:gen_iq_handler:118) : {badarg,
>>     [{erlang,phash,[{1384,151727,368591},0],[]},
>>      {ejabberd_odbc_sup,get_random_pid,1,
>>       [{file,"ejabberd_odbc_sup.erl"},{line,108}]},
>>      {ejabberd_odbc,sql_call,2,
>>       [{file,"ejabberd_odbc.erl"},{line,124}]},
>>      {mod_roster,read_roster_version,3,
>>       [{file,"mod_roster.erl"},{line,201}]},
>>      {mod_roster,roster_version,2,
>>       [{file,"mod_roster.erl"},{line,180}]},
>>      {mod_roster,push_item,4,
>>       [{file,"mod_roster.erl"},{line,555}]},
>>      {mod_roster,process_item_set,3,
>>       [{file,"mod_roster.erl"},{line,481}]},
>>      {lists,foreach,2,
>>       [{file,"lists.erl"},{line,1262}]}]}
>>
>> Any ideas? Is there another way to track down the cause of these
>> problems?
>
> The log message you provided might be a consequence, not a cause. Are
> there any other error reports in ejabberd.log/sasl.log/erlang.log?

I can offer something like this from erlang.log:


=CRASH REPORT==== 11-Nov-2013::14:34:12 ===
   crasher:
     initial call: gen:init_it/6
     pid: <0.25081.30>
     registered_name: []
     exception exit: {process_limit,{max_queue,38992}}
       in function  p1_fsm:terminate/7 (p1_fsm.erl, line 694)
       in call from p1_fsm:loop/10 (p1_fsm.erl, line 398)
     ancestors: [ejabberd_s2s_out_sup,ejabberd_sup,<0.37.0>]
     messages: [terminate_if_waiting_before_retry]
     links: [<0.303.0>]
     dictionary: [{'$internal_queue_len',1}]
     trap_exit: true
     status: running
     heap_size: 2629425
     stack_size: 24
     reductions: 12235488
   neighbours:

=SUPERVISOR REPORT==== 11-Nov-2013::14:34:12 ===
      Supervisor: {local,ejabberd_s2s_out_sup}
      Context:    child_terminated
      Reason:     {process_limit,{max_queue,38992}}
      Offender:   [{pid,<0.25081.30>},
                   {name,undefined},
                   {mfargs,{ejabberd_s2s_out,start_link,undefined}},
                   {restart_type,temporary},
                   {shutdown,brutal_kill},
                   {child_type,worker}]

These two reports always appear as a pair, and there are quite a lot of
them in the last minutes before the server stops responding.
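
To put a number on "quite a lot", the reports can be counted per minute.
What follows is only a rough sketch, assuming every report header looks
like the two shown above ("=CRASH REPORT==== 11-Nov-2013::14:34:12 ===").
The function name and the command-line usage are made up for illustration,
and the differently formatted headers in ejabberd.log are not matched.

```python
import re
import sys
from collections import Counter

# Header layout assumed from the reports above, e.g.
# "=CRASH REPORT==== 11-Nov-2013::14:34:12 ==="
HEADER = re.compile(
    r"^=(?P<kind>[A-Z ]+) REPORT==== "
    r"(?P<day>\d{1,2}-[A-Za-z]{3}-\d{4})::(?P<minute>\d{2}:\d{2}):\d{2} ==="
)


def reports_per_minute(lines):
    """Count report headers, keyed by (day, hour:minute, report kind)."""
    counts = Counter()
    for line in lines:
        match = HEADER.match(line)
        if match:
            key = (match.group("day"), match.group("minute"), match.group("kind"))
            counts[key] += 1
    return counts


# Hypothetical usage: python count_reports.py /path/to/erlang.log
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log:
        for (day, minute, kind), n in sorted(reports_per_minute(log).items()):
            print(day, minute, kind, n)
```

A sharp rise in the CRASH and SUPERVISOR counts shortly before the hang
would at least show when the problem starts.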

Peter

