[ejabberd] Semi-frequent lockup / "crash" in random nodes in ejabberd cluster

Armando Di Cianno armando.dicianno at gmail.com
Tue May 31 21:35:47 MSD 2011


I should probably add that ejabberdctl.cfg looks like this:

<<<<<
POLL=true
SMP=auto
ERL_MAX_PORTS=338000
ERL_PROCESSES=2720000
ERL_MAX_ETS_TABLES=6144
>>>>>
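For reference, here is a rough sketch of how these variables reach the Erlang emulator. It assumes the ejabberd 2.1-era `ejabberdctl` script, so check the copy shipped with your install. POLL, SMP and ERL_PROCESSES become `erl` flags (`+K`, `-smp`, `+P`). ERL_MAX_PORTS and ERL_MAX_ETS_TABLES are exported as environment variables, which the emulator reads at startup. The exact option string the real script builds may include more than shown here.

```shell
#!/bin/sh
# Sketch (assumption): mirrors how a 2.1-era ejabberdctl script turns
# ejabberdctl.cfg variables into emulator flags and environment.
# Verify against the ejabberdctl script shipped with your install.

# Values from the ejabberdctl.cfg above
POLL=true
SMP=auto
ERL_MAX_PORTS=338000
ERL_PROCESSES=2720000
ERL_MAX_ETS_TABLES=6144

# +K: kernel poll, -smp: SMP scheduler mode, +P: max Erlang processes
ERLANG_OPTS="+K $POLL -smp $SMP +P $ERL_PROCESSES"

# Port and ETS table limits are taken from the environment, not flags
export ERL_MAX_PORTS ERL_MAX_ETS_TABLES

echo "erl $ERLANG_OPTS"
echo "ERL_MAX_PORTS=$ERL_MAX_PORTS ERL_MAX_ETS_TABLES=$ERL_MAX_ETS_TABLES"
```

With the values above, the first line printed is `erl +K true -smp auto +P 2720000`.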

__armando


On Tue, May 31, 2011 at 11:18 AM, Armando Di Cianno
<armando.dicianno at gmail.com> wrote:
> I'm having an odd case of freezing / "crashing" on seemingly random
> nodes in my 10-machine ejabberd cluster.
>
> Symptoms:
>
>  * At seemingly random intervals, one of the nodes in the cluster will
> freeze (the Erlang node inside beam.smp, not the VM)
>  * It doesn't actually crash (hence the scare quotes around "crash"
> earlier), so there's no erl_crash.dump file to look at
>  * The OS beam.smp process is still running, so when our monitoring
> agent tries to restart ejabberd, *that* new process crashes because a
> node is already using that name, which leaves some crash logs behind
>  * The few times I've been at my workstation and able to log in and
> check manually, `ejabberdctl status` has failed to connect to the
> ejabberd node
>
> Notes
>  * 10 machines?! Yeah ... this is running on a managed VM service,
> where we control everything about the guest VMs but nothing about the
> host machines. Suffice it to say, our web services don't seem to
> exhibit related issues, and I believe I have nearly exhausted all
> routes to blaming the fact that we're using VMs (although, frankly,
> I'm still suspicious).
>  * The machines seem to be over-provisioned for RAM at ~4GiB each:
> our stats aggregator shows that ejabberd rarely uses more than 1.8GiB
> of RAM per node
>  * Average user count: ~4k
>  * Average burst user count / peak periods: ~10k
>  * Earlier tests showed we could handle ~40-50k users with that many VMs/nodes
>  * We had async threads on at one point (+A 32), but have turned them off
>  * SMP support is on
>  * Kernel polling is on
>  * We do utilize `ejabberdctl reopen_log` as part of our log rotation
>  * I wrote our own ejabberd modules for authentication, but I'm fairly
> confident in them: because our use case is *extremely* specialized,
> most of the required auth functions return a happy default, and only
> the main "is the password valid" function does any work.
>  * The monitoring agent uses both the pid file and `ejabberdctl
> status`; `status` runs once a minute
>  * `ejabberdctl connected_users_number` also runs periodically - about
> once every 5 minutes
>  * We do not store users in Mnesia, MySQL or any other database, since
> we have a specialized method for authorizing users
>  * We only use Mnesia for whatever it needs to do internally
>  * Very few modules are turned on globally:
> {modules,
>  [
>  {mod_adhoc,    []},
>  {mod_caps,     []},
>  {mod_configure,[]},
>  {mod_disco,    []},
>  {mod_ping,     []},
>  {mod_privacy,  []},
>  {mod_filter,   []}
>  ]}.
>  * A few more are turned on or specialized per-vhost:
>  {{add, modules},
>   [{mod_ping,[{send_pings, true},
>               {ping_interval, 10},
>               {timeout_action, kill} ]},
>    {mod_muc,[{host, "lobbies.@HOST@"},
>              {access, 'fakename_muc'},
>              {access_create, 'fakename_muc'},
>              {access_admin, 'fakename_muc_admin'},
>              {access_persistent, 'fakename_muc_admin'},
>              {history_size, 0},
>              {max_users, 100},
>              {max_users_admin_threshold, 2},
>              {max_user_conferences, 1},
>              {max_room_id, 128},
>              {max_room_name, 256},
>              {max_room_desc, 1024},
>              {max_rooms_number, 99},
>              {default_room_options,[{allow_change_subj, false},
>                                     {allow_private_messages, false},
>                                     {allow_visitor_nickchange, false},
>                                     {public, true},
>                                     {public_list, true},
>                                     {allow_query_users, true},
>                                     {anonymous, false},
>                                     {logging, false},
>                                     {members_by_default, true}
>                                     ]}]}]
>  }
>
>
> Any pointers, advice, avenues to research, or points of obvious
> stupidity would be greatly appreciated.
>
> Thanks,
> __armando
>
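The symptoms above suggest that a watchdog should tell "beam.smp is gone" apart from "beam.smp is alive but the node is not answering" before it attempts a restart. In the second case, starting a new node fails on the node-name clash. Below is a minimal sketch, assuming GNU coreutils `timeout`, which exits with status 124 when the time limit is hit. The pid-file path, the 30-second limit, and the `STATUS_CMD`/`STATUS_TIMEOUT` override variables are hypothetical and not part of ejabberd.

```shell
#!/bin/sh
# Hypothetical watchdog sketch: classify node state before restarting.
# Assumptions: GNU coreutils `timeout`; PIDFILE, STATUS_CMD and
# STATUS_TIMEOUT are illustrative names/defaults, not ejabberd settings.
# Run as the ejabberd user (or root) so that `kill -0` is permitted.

PIDFILE=${PIDFILE:-/var/run/ejabberd/ejabberd.pid}  # hypothetical path
STATUS_CMD=${STATUS_CMD:-"ejabberdctl status"}
STATUS_TIMEOUT=${STATUS_TIMEOUT:-30}                 # seconds, hypothetical

check_node() {
    # Is the OS process named in the pid file still alive?
    if [ -r "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
        os_alive=yes
    else
        os_alive=no
    fi

    # Does the node answer within the time limit?
    # STATUS_CMD is left unquoted on purpose so it splits into words.
    timeout "$STATUS_TIMEOUT" $STATUS_CMD >/dev/null 2>&1
    rc=$?

    if [ "$rc" -eq 0 ]; then
        echo ok
    elif [ "$os_alive" = no ]; then
        echo down          # no live process: starting a fresh node is safe
    elif [ "$rc" -eq 124 ]; then
        echo hung          # status timed out while beam.smp is alive
    else
        echo unreachable   # beam.smp alive, but status failed outright
    fi
}

check_node
```

A `hung` or `unreachable` result corresponds to the situation described above, where starting a second node under the same name will fail.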

