[ejabberd] ejabberd crashed with emfile; how to diagnose

Daniel Dormont dan at greywallsoftware.com
Tue Apr 17 02:07:25 MSK 2012


Hello all,

Today my ejabberd crashed quite suddenly with the following error:

=ERROR REPORT==== 2012-04-16 15:59:31 ===
Mnesia('ejabberd@10.0.0.100'): ** ERROR ** (could not write core file: emfile)
 ** FATAL ** Cannot open log file "/var/lib/ejabberd/roster.DCL":
{file_error,"/var/lib/ejabberd/roster.DCL",emfile}

=ERROR REPORT==== 2012-04-16 15:59:32 ===
** Generic server ejabberd_mod_muc_sostest terminating
** Last message in was {mnesia_system_event,
                           {mnesia_down,'ejabberd@10.0.0.100'}}
** When Server state == {state,"conference.sostest","sostest",
                               {muc,muc_admin,muc_admin,muc},
                               20,[],none}
** Reason for termination ==
** {badarg,[{ets,lookup,
                 [local_config,
                  {domain_balancing_component_number,"conference.sostest"}]},
            {ejabberd_config,get_local_option,1},
            {ejabberd_router,get_component_number,1},
            {ejabberd_router,unregister_route,1},
            {mod_muc,terminate,2},
            {gen_server,terminate,6},
            {proc_lib,init_p_do_apply,3}]}

=INFO REPORT==== 2012-04-16 15:59:32 ===
    application: mnesia
    exited: shutdown
    type: permanent

=ERROR REPORT==== 2012-04-16 15:59:32 ===
    application_master: shutdown_error
    ejabberd_app: {prep_stop,[[]]}
    error_info: {badarg,[{ets,lookup,[config,hosts]},
                         {ejabberd_config,get_global_option,1},
                         {ejabberd_app,stop_modules,0},
                         {ejabberd_app,prep_stop,1},
                         {application_master,prep_stop,2},
                         {application_master,loop_it,4}]}


This was followed by a long series of knock-on errors in various sessions,
connections and other processes, all caused by being unable to read the
mnesia tables.

It's hard for me to guess what could have caused this. The node was running
in a two-node cluster on Linux and should have had at most 20-30 actual
user sessions, a couple of hundred mostly idle MUC processes, and two
ejabberd_service connections. Each MUC should have been running an
instance of mod_muc_log, but as far as I can see that doesn't keep any
file handles open.

So I have three questions:

1) Has anyone else encountered this crash type with ejabberd, and what was
the root cause?
2) Is there any other place I should look for more information about the
crash? I don't see anything that looks like a dump file in
/var/log/ejabberd, but I can look further.
3) The node restarted just fine, but is there any way I can monitor the
number of open files on the live node, so that I can catch this before it
happens again?
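
For question 3, one possible approach on Linux is to poll the /proc
filesystem for the Erlang VM process: the entries under /proc/<pid>/fd are
its open descriptors, and /proc/<pid>/limits reports the "Max open files"
limit that produces emfile when it is reached. The sketch below is only an
illustration under those assumptions. The function names and the 80%
threshold are made up for the example and are not part of ejabberd, and the
pid of the VM process is assumed to be known already.

```python
import os


def parse_soft_limit(limits_text):
    """Extract the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns an int, or None if the limit is 'unlimited' or not listed.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: "Max open files", soft limit, hard limit, units
            value = line.split()[3]
            return None if value == "unlimited" else int(value)
    return None


def fd_usage(pid):
    """Return (open_fd_count, soft_limit) for a process (Linux /proc only)."""
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as limits:
        soft_limit = parse_soft_limit(limits.read())
    return open_fds, soft_limit


def near_limit(open_fds, soft_limit, threshold=0.8):
    """True if usage is at or above the given fraction of the soft limit.

    The 0.8 default is an arbitrary value chosen for this example.
    """
    if soft_limit is None:
        return False
    return open_fds >= threshold * soft_limit
```

Calling fd_usage(pid) from a cron job or a small loop and logging the
result whenever near_limit(...) is true would give some warning before the
limit is hit. Reading /proc/<pid>/fd needs to be done as the user that owns
the process, or as root. For a one-off look at which files are open,
"lsof -p <pid>" lists them by name.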

thanks,
Dan