[ejabberd] ejabberd mod_muc bottleneck
eric at ohmforce.com
Sun Jun 20 20:21:36 MSD 2010
Several notes:
- Having a single process receive events and delegate them to other processes is a common Erlang design pattern. It's a bit different here, because the process is routing XMPP stanzas and not just Erlang terms.
- However, to achieve good throughput this single process should do nothing but route messages. In the case of mod_muc it also does synchronous work such as creating the rooms, so I can understand that it might fall behind in processing its message queue.
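To make the two points above concrete, here is a minimal sketch of a router that only looks up a per-room worker and forwards to it. It is not mod_muc's actual code: the module name router_sketch, the message shapes and the Handler fun (standing in for the per-room work) are all made up for illustration.

```erlang
-module(router_sketch).
-export([start/1, route/3]).

%% Hypothetical sketch, not mod_muc's actual code.
%% Handler is a fun(Room, Packet) standing in for the per-room work.
start(Handler) ->
    spawn(fun() -> router_loop(Handler, dict:new()) end).

%% Fire-and-forget send: callers never block on the router.
route(Router, Room, Packet) ->
    Router ! {route, Room, Packet},
    ok.

%% The router only finds (or spawns) the room worker and forwards.
router_loop(Handler, Rooms) ->
    receive
        {route, Room, Packet} ->
            {Pid, Rooms1} = find_or_spawn(Room, Handler, Rooms),
            Pid ! {packet, Packet},
            router_loop(Handler, Rooms1);
        stop ->
            ok
    end.

find_or_spawn(Room, Handler, Rooms) ->
    case dict:find(Room, Rooms) of
        {ok, Pid} ->
            {Pid, Rooms};
        error ->
            %% spawn/1 returns immediately; any slow room setup would run
            %% inside the new process, not in the router.
            Pid = spawn(fun() -> room_loop(Room, Handler) end),
            {Pid, dict:store(Room, Pid, Rooms)}
    end.

%% Each room worker handles its own packets, one at a time.
room_loop(Room, Handler) ->
    receive
        {packet, Packet} ->
            Handler(Room, Packet),
            room_loop(Room, Handler)
    end.
```

The sketch leaves out everything a real module needs (monitoring and cleaning up dead room workers, supervision, persistence). It only shows that the router's per-message cost can be kept to a lookup and a send.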
I've looked at the profile output file and I don't see anything unusual.
I note that ACL matching is in the critical path and could be optimized by using Bob Ippolito's mochiglobal (http://code.google.com/p/mochiweb/source/browse/trunk/src/mochiglobal.erl?r=15). It's a pretty neat piece of code that could replace mnesia in ejabberd_config (and in ejabberd_router if necessary). Writes are pretty expensive, but they are rare and usually happen at startup.
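For readers who have not seen the trick, the idea is to compile the rarely-changing term into a module so that a read is just a function call. The following is a simplified sketch in that spirit, not mochiglobal's actual code; the module name const_sketch, the functions store/2 and fetch/1, and the generated function term/0 are assumptions made for illustration.

```erlang
-module(const_sketch).
-export([store/2, fetch/1]).

%% Hypothetical sketch of the "compile a term into a module" idea.
%% Mod is an atom naming the module to generate, Value the term to keep.
%% Expensive: every store recompiles and reloads the generated module.
store(Mod, Value) ->
    Forms = [{attribute, 1, module, Mod},
             {attribute, 2, export, [{term, 0}]},
             %% term() -> Value.  erl_parse:abstract/1 turns the term
             %% into abstract syntax (no pids, ports, refs or funs).
             {function, 3, term, 0,
              [{clause, 3, [], [], [erl_parse:abstract(Value)]}]}],
    {ok, Mod, Bin} = compile:forms(Forms),
    %% Drop any old version before loading the new one.
    code:purge(Mod),
    {module, Mod} = code:load_binary(Mod, atom_to_list(Mod) ++ ".erl", Bin),
    ok.

%% Cheap: a plain function call on the generated module.
fetch(Mod) ->
    Mod:term().
```

For example, after const_sketch:store(acl_snapshot, [{admin, "example.org"}]), any process can read the value with const_sketch:fetch(acl_snapshot). The name acl_snapshot and the stored term are illustrative only.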
I might write a patch.
Can you post your Tsung scenario somewhere?
On 18 June 2010 at 23:41, Karthik Kailash wrote:
> I am load testing ejabberd MUC. I have a Tsung load test that generates 2500 users, puts them in 250 rooms, and has each one send 1 msg/s (2500 msg/s overall being sent to ejabberd MUC).
> I’m running ejabberd 2.1.4 on Erlang R13B03 on a 4-core Ubuntu Server 10.04 LTS VM w/ 16GB of RAM. This is my ejabberdctl.cfg: http://pastebin.com/hs6Fijri and ejabberd.cfg: http://pastebin.com/deQ495gN
> As the test is running, I notice that the message queue for the mod_muc process grows and grows. From a crash dump I observed that the message queue for mod_muc is bottlenecking the whole system; it’s the only process with a sizable message queue (it’s enormous, well over 100,000 queued messages). Using process_info in a remote shell, it is possible to see the queue length growing very rapidly in real time (~1000 msg/s).
> I used fprof to do some profiling on the mod_muc process, see the output file here: http://pastebin.com/QXchAKtM. From the log, mod_muc:do_route was called around 26,000 times over the course of the 38s of data collection. This is ~1.4ms per packet, which translates to millions of clock cycles, highly inefficient! It seems that nearly all the time is being used in “suspend” calls (if you add up the ACC time for suspend, it becomes 100% of the overall measured time). Indeed, when I call process_info on the mod_muc process inside a remote shell, its status seems to always be “runnable”. However, my fprof can only use wallclock time, not the high resolution CPU time, so I’m not sure how accurate all of this is.
> It seems odd that a 4-core box can’t even handle 2,500 messages/s of incoming traffic. It seems like the bottleneck is a single process, which if I understand correctly can only run on 1 core. Maybe if the process could run at higher priority? Or if the routing in mod_muc can be distributed to multiple processes?
> Can anyone shed any light on what is going on?
> Karthik Kailash | Product