[ejabberd] Jab_simul crashing ejabberd service

Bryan Barnes Bryan.Barnes at SEILLC.com
Mon Feb 6 22:20:50 MSK 2006


As far as I can tell, it crashes even before it uses swap; I had one
test where it didn't use swap space at all.  Were you able to find any
reason for the memory usage to shoot up like that?

 

Bryan

 

________________________________

From: ejabberd-bounces at jabber.ru [mailto:ejabberd-bounces at jabber.ru] On
Behalf Of Staudinger, Ulrich
Sent: Monday, February 06, 2006 12:41 PM
To: ejabberd at jabber.ru
Subject: AW: [ejabberd] Jab_simul crashing ejabberd service

 

That basically matches my experience. 

 

Everything goes fine until the server starts to swap. So, ;-), try not
to let it start swapping.

 

We have had 20k users with 1 message per 1 user in 1 minute -> 20k
messages/minute on a 1GB machine for days. No problems so far. 

 

The restriction that I found is memory consumption: it grows linearly
until it hits the ceiling. With 1GB of RAM, 22k users is the maximum for
our Linux machine.
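
As a rough sanity check on the figures above, the sketch below converts
them into a message rate and a per-user memory budget. It assumes that
the whole 1GB (taken as 2**30 bytes) is available to ejabberd, which
overstates the real budget, so the per-user figure is an upper bound
rather than a measurement. The function names are illustrative only.
Under these assumptions the load is about 333 messages per second and
the ceiling works out to a little under 48 KiB per connected user.

```python
# Back-of-envelope figures from the numbers quoted in this message.
# Assumption: all of the 1GB (2**30 bytes) is available to ejabberd.


def messages_per_second(users, messages_per_user_per_minute):
    """Aggregate message rate for the whole server."""
    return users * messages_per_user_per_minute / 60.0


def bytes_per_user(total_bytes, users):
    """Upper bound on memory available per connected user."""
    return total_bytes / users


if __name__ == "__main__":
    rate = messages_per_second(20_000, 1)
    budget = bytes_per_user(2**30, 22_000)
    print(f"{rate:.0f} messages/s")
    print(f"{budget / 1024:.1f} KiB per user")
```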

 

 

Cheers,

Ulrich

 

 

________________________________

From: ejabberd-bounces at jabber.ru [mailto:ejabberd-bounces at jabber.ru] On
Behalf Of Bryan Barnes
Sent: Monday, February 6, 2006 19:36
To: ejabberd at jabber.ru
Subject: [ejabberd] Jab_simul crashing ejabberd service

Hello,

 

I am running Ejabberd 1.0.0 with Erlang 10B-8. I am using Jab_simul to
benchmark the system, and am using
http://tkabber.jabber.ru/files/badlop/jab_simul.xml.chat60 as a baseline
for my testing. I have modified it for my server, and turned off
rostering. I can run the test successfully, and as I lower the message
frequency my performance degrades as I expected it to. However, I notice
that during these tests my server occasionally spikes in memory usage
and begins using swap space, then the jabber service refuses all further
connections.

 

The test will run with no errors for about an hour with message
frequency of 500 ms, then in the space of 5 minutes the memory usage
will spike from 400MB to 2GB and the swap space usage will jump from 0MB
to 1GB. Then all current connections are dumped and all further
connections are refused. Even if I restart the ejabberd service I am
unable to log in, and have to restart the server before I can connect
again.
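
One way to pin down when the spike starts would be to log memory and
swap usage on the ejabberd host at a fixed interval during the test.
The following is a minimal sketch, assuming a Linux host that exposes
the usual MemTotal, MemFree, Buffers, Cached, SwapTotal and SwapFree
fields in /proc/meminfo. The function names are hypothetical and not
part of ejabberd or jab_simul.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def usage_kb(fields):
    """Return (memory used by processes, swap used) in kB.

    Buffers and page cache are subtracted because the kernel can
    reclaim them under memory pressure.
    """
    mem_used = (fields["MemTotal"] - fields["MemFree"]
                - fields.get("Buffers", 0) - fields.get("Cached", 0))
    swap_used = fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)
    return mem_used, swap_used


def log_samples(count, interval_seconds, path="/proc/meminfo"):
    """Print `count` timestamped samples, one every `interval_seconds`."""
    for i in range(count):
        with open(path) as handle:
            mem_used, swap_used = usage_kb(parse_meminfo(handle.read()))
        print(f"{time.strftime('%H:%M:%S')} mem_used={mem_used // 1024}MB "
              f"swap_used={swap_used // 1024}MB", flush=True)
        if i + 1 < count:
            time.sleep(interval_seconds)
```

For example, log_samples(360, 10) would cover roughly an hour at
ten-second resolution, which should be fine enough to see a rise that
takes about five minutes.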

 

I am running Gentoo (kernel 2.6.14-r5) with 2GB of memory and a 1Gbit
NIC. I have ejabberd starting as a service with the following command
line:

 

ulimit -n 15000;/usr/local/bin/erl -pa /var/lib/ejabberd/ebin -sname
ejabberd -s ejabberd -env ERL_MAX_PORTS 5000 -env ERL_MAX_ETS_TABLES
20000 -ejabberd config \"/etc/ejabberd/ejabberd.cfg\" log-path
\"/var/log/ejabberd.log\" -sasl sasl_error_logger
\{file,\"var/log/ejabberd/sasl.log\"\} -mnesia dir
\"var/lib/ejabberd/spool\" +P 250000 +K true -detached

 

After troubleshooting:

 

No error messages appear on the ejabberd server, I checked the
ejabberd.log, sasl.log, and the server logs. All of the errors appeared
on the Jab_simul server.

 

I figured out that every time the ejabberd server crashed, the
jab_simul server had run out of disk space. I assumed this was
unrelated, but set up a cron job to delete the tmp log files. This did
make the error go away, and I was able to run a simulation for 70 hours
this weekend with no errors. I had 500 users with a message frequency of
100 ms using 160MB of memory.
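
The cleanup step described above could look something like the sketch
below, meant to be run periodically (for instance from cron) on the
jab_simul host. The directory to clean and the age threshold are not
given in this message, so both are left as parameters; the function
name is hypothetical.

```python
import os
import time


def delete_old_files(directory, max_age_seconds, now=None):
    """Delete regular files in `directory` older than `max_age_seconds`.

    Subdirectories and symlinks are left alone. Returns the sorted
    list of paths that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue
            age = now - entry.stat(follow_symlinks=False).st_mtime
            if age > max_age_seconds:
                os.remove(entry.path)
                removed.append(entry.path)
    return sorted(removed)
```

Deleting by age rather than removing everything keeps the most recent
log output available if the simulation fails again.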

 

I was able to recreate the error even with these settings by adding an
additional 300 users, bringing me to 800 total. The job runs for 5
minutes, then I start getting "Kolejka za dluga, pakiet anulowany!"
(Polish for "Queue too long, packet cancelled!") errors. Shortly after
that I get "POLLERR: Connection terminated" and "POLLERR: Connection
refused" errors.  All of these errors occur on the Jab_simul server.

 

When I check my ejabberd server, I find that the beam process has
become a zombie: it is still running, but refuses all connections. The
memory usage on the server spiked and then came back down after the
beam process crashed.

 

If I dial the message frequency back to 500 ms with 800 users, it runs.
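
To compare the three configurations mentioned so far, it may help to
express each as an aggregate message rate. The sketch below assumes
that "message frequency" means every simulated user sends one message
per interval; that reading of the jab_simul setting is an assumption,
not something confirmed in this thread. On that reading, 500 users at
100 ms is 5000 messages per second, 800 users at 100 ms is 8000, and
800 users at 500 ms is only 1600, which would fit the pattern of which
runs survive.

```python
# Assumption: each simulated user sends one message per interval.


def aggregate_rate(users, interval_ms):
    """Messages per second offered to the server by all users combined."""
    return users * 1000.0 / interval_ms


SCENARIOS = [
    (500, 100, "ran for 70 hours"),
    (800, 100, "failed after about 5 minutes"),
    (800, 500, "runs"),
]

if __name__ == "__main__":
    for users, interval_ms, outcome in SCENARIOS:
        rate = aggregate_rate(users, interval_ms)
        print(f"{users} users @ {interval_ms} ms: {rate:.0f} msg/s ({outcome})")
```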

 

I don't understand the interaction between ejabberd and jab_simul well
enough to know why this is happening, but I am concerned that the
POLLERR errors are causing my ejabberd server to crash. I was unable to
find any mention of this problem. Has anything like it occurred before?

 

After even more troubleshooting:

 

I have discovered that if I restart the ejabberd service, then I am
still unable to connect using a Jabber client, but that if I restart the
server I am able to connect again.

 

If I leave the Jab_simul simulation running, it will reconnect to Jabber
after a server restart, but not a service restart.  For all of the test
data above I was using the default message in the .xml file.

 

Any assistance with this would be greatly appreciated.  Thank you.

 

 

Bryan Barnes 

 
