[ejabberd] ejabberd leaking and crashing

Jan Koum jan.koum at gmail.com
Fri Dec 11 08:55:51 MSK 2009


(hi, just wanted to follow up on my email)

so we figured today was as good a day as any to learn mnesia and did the
following in an attempt to fix this:

1. moved roster and roster_version to disc_only_copies

2. moved last_activity to disc_only_copies but quickly moved it back because
IO activity went through the roof (our users disconnect and connect 10-20
times a day)

at this point we restarted the ejabberd server and noticed a HUGE amount of
these errors in our log files:

=ERROR REPORT==== 2009-12-10 16:41:58 ===
E(<0.22144.0>:ejabberd_hooks:335) : {aborted,
                                     {badarg,
                                      [roster,
                                       {"JID","s.xxx.net"},
                                       3]}}
running hook: {resend_subscription_requests_hook, ["JID","s.xxx.net"]}


since this was a production server and we had to get it back up and
running and couldn't think of any better solution, we simply stopped
ejabberd, moved roster.DAT, roster_3.DAT, and roster_version.DAT out of the
way and restarted.  we also changed our cfg from:

{mod_roster,   [{versioning, true}, {store_current_id, true}]},

to

{mod_roster, []},

so after everything is said and done, our questions are:

1. is 600K the max number of registered users ejabberd can handle on a
machine with 8GB of RAM when roster is enabled?

2. ejabberd is now running fine and taking under 2G of memory, but as roster
regenerates itself, will we have this problem again and how do we avoid it?

3. why did we get those 'resend_subscription_requests_hook' errors after we
did "mnesia:change_table_copy_type(roster, node(), disc_only_copies)."?
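
For a rough sense of scale on question 1, the "words of mem" figures in the
mnesia info quoted below can be converted to bytes. This is only a sketch:
the 8-byte word size is an assumption (a 64-bit Erlang VM), and the helper
names are made up for illustration. The record and word counts are copied
from the quoted output.

```python
# Rough memory estimate from the "words of mem" figures in the quoted
# mnesia info. WORD_BYTES = 8 is an assumption (64-bit Erlang VM); on a
# 32-bit VM a word would be 4 bytes and every figure would halve.
WORD_BYTES = 8

# table name -> (records, words of mem), copied from the mnesia info output
TABLES = {
    "passwd": (692806, 89096024),
    "roster": (1479655, 349901545),
    "last_activity": (688311, 45172333),
    "roster_version": (687260, 99385782),
}


def table_bytes(words, word_bytes=WORD_BYTES):
    """Convert a 'words of mem' figure to bytes."""
    return words * word_bytes


def summarize(tables):
    """Return (name, size in GiB, bytes per record) for each table."""
    rows = []
    for name, (records, words) in tables.items():
        size = table_bytes(words)
        rows.append((name, size / 2**30, size / records))
    return rows


# Combined in-memory size of the four large disc_copies tables, in GiB.
total_gib = sum(table_bytes(words) for _, words in TABLES.values()) / 2**30

if __name__ == "__main__":
    for name, gib, per_record in summarize(TABLES):
        print(f"{name:15s} {gib:6.2f} GiB  {per_record:8.1f} bytes/record")
    print(f"{'total':15s} {total_gib:6.2f} GiB")
```

Under that assumption the four tables alone come to a bit over 4 GiB, with
roster the largest by far, which would be consistent with the beam process
size shown in the top output.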


we are probably running one of the most heavily (ab)used production ejabberd
servers at this point, so if you think there are any special tweaks we should
be doing to handle a huge number of registered users with rosters, please let
us know.  average roster size is 20-30.

thanks,

-- yan


On Thu, Dec 10, 2009 at 2:21 PM, Jan Koum <jan.koum at gmail.com> wrote:

> hey guys,
>
> i am not sure if we are reaching the limits of what ejabberd can do, but
> hopefully not..
>
> we have about 5,000 connected users at a time and about 500,000 total
> registered users.
>
> ejabberd has slowly been growing its memory usage until twice in the past
> 12 hours it crashed with:
>
> swap_pager_getswapspace(2): failed
> swap_pager_getswapspace(4): failed
> Dec 10 08:13:14 im101 last message repeated 37 times
> pid 96140 (beam), uid 1000, was killed: out of swap space
> Dec 10 08:13:15 im101 kernel: pid 96140 (beam), uid 1000, was killed: out
> of swap space
>
> machine is FreeBSD 7.2 with 8GB of RAM, ejabberd is 2.1.0-RC2
>
> top says:
>
> Mem: 6318M Active, 529M Inact, 710M Wired, 238M Cache, 399M Buf, 126M Free
> Swap: 2048M Total, 1889M Used, 159M Free, 92% Inuse
>
>   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
> 17352 jkb    2  44    0  7363M  6370M ucond  0   0:06 14.26% [beam]
>
> our mnesia info is:
>
> ---> Active tables <---
> local_config   : with 4        records occupying 760      words of mem
> config         : with 16       records occupying 617      words of mem
> reg_users_counter: with 1        records occupying 314      words of mem
> user_caps_resources: with 0        records occupying 275      words of mem
> user_caps      : with 1142     records occupying 41025    words of mem
> privacy        : with 0        records occupying 275      words of mem
> passwd         : with 692806   records occupying 89096024 words of mem
> roster         : with 1479655  records occupying 349901545 words of mem
> last_activity  : with 688311   records occupying 45172333 words of mem
> roster_version : with 687260   records occupying 99385782 words of mem
> offline_msg    : with 195294   records occupying 174434167 bytes on disc
> route          : with 3        records occupying 431      words of mem
> acl            : with 2        records occupying 348      words of mem
> s2s            : with 0        records occupying 275      words of mem
> vcard          : with 5        records occupying 297858   bytes on disc
> captcha        : with 0        records occupying 275      words of mem
> caps_features  : with 22       records occupying 2195     words of mem
> session_counter: with 1        records occupying 314      words of mem
> vcard_search   : with 5        records occupying 1085     words of mem
> schema         : with 26       records occupying 3287     words of mem
> session        : with 4810     records occupying 870414   words of mem
> private_storage: with 0        records occupying 5752     bytes on disc
> muc_room       : with 0        records occupying 275      words of mem
> iq_response    : with 133      records occupying 28454    words of mem
> muc_registered : with 0        records occupying 275      words of mem
> muc_online_room: with 0        records occupying 275      words of mem
> ===> System info in version "4.4.7", debug level = none <===
> opt_disc. Directory "/home/jkb/ejabberd/var/lib/ejabberd" is used.
> use fallback at restart = false
> running db nodes   = [ejabberd@localhost]
> stopped db nodes   = []
> master node tables = []
> remote             = []
> ram_copies         = [captcha,iq_response,muc_online_room,reg_users_counter,
>                       route,s2s,session,session_counter,user_caps,
>                       user_caps_resources]
> disc_copies        = [acl,caps_features,config,last_activity,local_config,
>                       muc_registered,muc_room,passwd,privacy,roster,
>                       roster_version,schema,vcard_search]
> disc_only_copies   = [offline_msg,private_storage,vcard]
> [{ejabberd@localhost,disc_copies}] = [muc_registered,muc_room,schema,
>                                       vcard_search,caps_features,acl,
>                                       roster_version,last_activity,roster,
>                                       passwd,local_config,privacy,config]
> [{ejabberd@localhost,disc_only_copies}] = [private_storage,vcard,offline_msg]
> [{ejabberd@localhost,ram_copies}] = [muc_online_room,iq_response,session,
>                                      session_counter,captcha,s2s,route,
>                                      user_caps,user_caps_resources,
>                                      reg_users_counter]
> 1498735 transactions committed, 99 aborted, 60 restarted, 896420 logged to
> disc
> 0 held locks, 0 in queue; 0 local transactions, 0 remote
> 0 transactions waits for other nodes: []
>
>
> any help on this would be appreciated as this is our production server...
>
> -- yan
>