[ejabberd] ejabberd leaking and crashing

Peter Viskup skupko.sk at gmail.com
Fri Dec 11 12:27:29 MSK 2009


Hi Jan,
your server crashed because ejabberd's memory pages were allocated outside
real memory, in the swap area.
I did some tests before we moved our server to production, and it does not
make sense to have a swap area available for an ejabberd server. Any time
memory was allocated outside real memory, the server crashed.
The root cause was that the Erlang (ejabberd+Mnesia) process had to wait for
pages stored on disk (in the swap area) and was not able to deliver new
messages. Those messages were queued in memory, and of course also in
swapped-out pages... and after that the server was bound to crash...
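To make the queueing argument concrete, here is a toy sketch in Python. The rates are hypothetical, not measured on any ejabberd server: once the VM is stalled on swap and delivers fewer messages per second than arrive, the undelivered backlog (which itself needs memory) grows every second.

```python
# Toy queue model of the failure described above. All rates are
# hypothetical; nothing here was measured on a real ejabberd node.
from typing import List


def simulate_backlog(arrival_per_s: int, service_per_s: int, seconds: int) -> List[int]:
    """Return the number of undelivered messages at the end of each second.

    arrival_per_s: messages arriving per second (hypothetical)
    service_per_s: messages the VM can deliver per second (hypothetical)
    """
    backlog = 0
    history = []
    for _ in range(seconds):
        backlog += arrival_per_s                # new messages queue up
        backlog -= min(backlog, service_per_s)  # deliver as many as we can
        history.append(backlog)
    return history


if __name__ == "__main__":
    # Healthy node: delivery keeps up, nothing accumulates.
    print(simulate_backlog(arrival_per_s=100, service_per_s=120, seconds=5))
    # -> [0, 0, 0, 0, 0]
    # Node waiting on swap: delivery falls behind, backlog grows linearly.
    print(simulate_backlog(arrival_per_s=100, service_per_s=40, seconds=5))
    # -> [60, 120, 180, 240, 300]
```

The model leaves out any feedback between backlog size and swapping; a larger backlog needing more memory would likely make the real situation worse than linear growth.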

My advice is to run ejabberd on a separate server (virtual or physical
machine) without swap. That is how our jabber.sk server is configured
;-).
Swap does not make sense for real-time applications like ejabberd.

Feel free to use the ejabberd Munin plugin to monitor how much memory your
Erlang/ejabberd process needs.
Do not forget:
 - ejabberd is a real-time application
 - it does a lot of memory allocation/deallocation every second
If you look at the ejabberd_memory statistics on the stats.jabber.sk page,
you will see big jumps in ejabberd's memory usage.
And this server has 'only' 200+ concurrent users.
This is because the Mnesia database works with tables in memory (not in a
temporary filesystem like traditional SQL servers, e.g. MySQL,
PostgreSQL...), so selects and joins consume a lot of memory for temporary
tables (I hope I am correct with this 'temporary tables' conclusion).
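The mnesia:info() output in the original report quoted further down gives most table sizes in Erlang words, not bytes. A minimal Python sketch to turn those numbers into MiB, assuming a 64-bit Erlang VM with 8-byte words (the 7363M beam process in the quoted top output suggests a 64-bit build, but erlang:system_info(wordsize) on the node would confirm it). Only the large RAM-resident tables are included; the disc_only_copies tables (offline_msg, private_storage, vcard) report bytes on disc, not RAM.

```python
# Sketch: estimate RAM held by the largest Mnesia tables from the quoted
# mnesia:info() output. ASSUMPTION: 64-bit Erlang VM, so 1 word = 8 bytes.
from typing import Dict

WORD_SIZE_BYTES = 8  # assumed; check with erlang:system_info(wordsize)

# Word counts copied from the quoted report (in-memory tables only).
TABLE_WORDS: Dict[str, int] = {
    "roster": 349_901_545,
    "roster_version": 99_385_782,
    "passwd": 89_096_024,
    "last_activity": 45_172_333,
    "session": 870_414,
}


def words_to_mib(words: int, word_size: int = WORD_SIZE_BYTES) -> float:
    """Convert an Erlang word count to mebibytes."""
    return words * word_size / (1024 * 1024)


def report(tables: Dict[str, int]) -> float:
    """Print per-table sizes, largest first, and return the total in MiB."""
    total = 0.0
    for name, words in sorted(tables.items(), key=lambda kv: kv[1], reverse=True):
        mib = words_to_mib(words)
        total += mib
        print(f"{name:15} {mib:10.1f} MiB")
    print(f"{'total':15} {total:10.1f} MiB")
    return total


if __name__ == "__main__":
    report(TABLE_WORDS)
```

Under the 8-byte-word assumption these five tables come to roughly 4.4 GiB, with roster alone at about 2.6 GiB.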

Best regards,
Peter Viskup

On Thu, Dec 10, 2009 at 11:21 PM, Jan Koum <jan.koum at gmail.com> wrote:

> hey guys,
>
> I am not sure if we are reaching the limits of what ejabberd can do, but
> hopefully not...
>
> We have about 5,000 connected users at a time and about 500,000 total
> registered users.
>
> ejabberd's memory usage has been growing slowly, and twice in the past
> 12 hours it crashed with:
>
> swap_pager_getswapspace(2): failed
> swap_pager_getswapspace(4): failed
> Dec 10 08:13:14 im101 last message repeated 37 times
> pid 96140 (beam), uid 1000, was killed: out of swap space
> Dec 10 08:13:15 im101 kernel: pid 96140 (beam), uid 1000, was killed: out of swap space
>
> The machine is FreeBSD 7.2 with 8GB of RAM; ejabberd is 2.1.0-RC2.
>
> top says:
>
> Mem: 6318M Active, 529M Inact, 710M Wired, 238M Cache, 399M Buf, 126M Free
> Swap: 2048M Total, 1889M Used, 159M Free, 92% Inuse
>
>   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
> 17352 jkb    2  44    0  7363M  6370M ucond  0   0:06 14.26% [beam]
>
> Our Mnesia info is:
>
> ---> Active tables <---
> local_config   : with 4        records occupying 760      words of mem
> config         : with 16       records occupying 617      words of mem
> reg_users_counter: with 1        records occupying 314      words of mem
> user_caps_resources: with 0        records occupying 275      words of mem
> user_caps      : with 1142     records occupying 41025    words of mem
> privacy        : with 0        records occupying 275      words of mem
> passwd         : with 692806   records occupying 89096024 words of mem
> roster         : with 1479655  records occupying 349901545 words of mem
> last_activity  : with 688311   records occupying 45172333 words of mem
> roster_version : with 687260   records occupying 99385782 words of mem
> offline_msg    : with 195294   records occupying 174434167 bytes on disc
> route          : with 3        records occupying 431      words of mem
> acl            : with 2        records occupying 348      words of mem
> s2s            : with 0        records occupying 275      words of mem
> vcard          : with 5        records occupying 297858   bytes on disc
> captcha        : with 0        records occupying 275      words of mem
> caps_features  : with 22       records occupying 2195     words of mem
> session_counter: with 1        records occupying 314      words of mem
> vcard_search   : with 5        records occupying 1085     words of mem
> schema         : with 26       records occupying 3287     words of mem
> session        : with 4810     records occupying 870414   words of mem
> private_storage: with 0        records occupying 5752     bytes on disc
> muc_room       : with 0        records occupying 275      words of mem
> iq_response    : with 133      records occupying 28454    words of mem
> muc_registered : with 0        records occupying 275      words of mem
> muc_online_room: with 0        records occupying 275      words of mem
> ===> System info in version "4.4.7", debug level = none <===
> opt_disc. Directory "/home/jkb/ejabberd/var/lib/ejabberd" is used.
> use fallback at restart = false
> running db nodes   = [ejabberd@localhost]
> stopped db nodes   = []
> master node tables = []
> remote             = []
> ram_copies         =
> [captcha,iq_response,muc_online_room,reg_users_counter,
>                       route,s2s,session,session_counter,user_caps,
>                       user_caps_resources]
> disc_copies        = [acl,caps_features,config,last_activity,local_config,
>                       muc_registered,muc_room,passwd,privacy,roster,
>                       roster_version,schema,vcard_search]
> disc_only_copies   = [offline_msg,private_storage,vcard]
> [{ejabberd@localhost,disc_copies}] = [muc_registered,muc_room,schema,
>                                      vcard_search,caps_features,acl,
>                                      roster_version,last_activity,roster,
>                                      passwd,local_config,privacy,config]
> [{ejabberd@localhost,disc_only_copies}] =
> [private_storage,vcard,offline_msg]
> [{ejabberd@localhost,ram_copies}] = [muc_online_room,iq_response,session,
>                                     session_counter,captcha,s2s,route,
>                                     user_caps,user_caps_resources,
>                                     reg_users_counter]
> 1498735 transactions committed, 99 aborted, 60 restarted, 896420 logged to disc
> 0 held locks, 0 in queue; 0 local transactions, 0 remote
> 0 transactions waits for other nodes: []
>
>
> Any help on this would be appreciated, as this is our production server...
>
> -- yan
>
> _______________________________________________
> ejabberd mailing list
> ejabberd at jabber.ru
> http://lists.jabber.ru/mailman/listinfo/ejabberd
>
>

