[ejabberd] ejabberd mod_muc bottleneck

Karthik Kailash karthik at socialvisioninc.com
Thu Jul 1 06:40:25 MSD 2010


I've been doing more load testing, this is what I've learned.

The bottlenecking in mod_muc is not actually a flaw in the mod_muc process
itself, I was wrong to assume that, and the fprof analysis was misleading
because fprof itself slowed the system down so much.  Using timer:tc I
profiled mod_muc:do_route and found that its execution time is always under
50 us, so it can handle up to 20,000 messages per second.  I verified this
by setting the priority of the mod_muc process to high, and saw that its
queue stayed empty.  It's safe to say that mod_muc.erl isn't the bottleneck.

So, I turned to profiling mod_muc_room execution using timer:tc.  Each
message is handled in the normal_state function with the "message" guard,
which on average takes around *30,000-50,000 us (30-50 ms).  *With four
cores, even if every core spends 100% of its time in mod_muc_room to handle
messages, that's still only 80-120 messages per second.  It's easy to see
that this is where the bottleneck is!

I did some digging to find out why mod_muc_room:normal_state takes so long
to handle a groupchat message.  Nearly 100% of its time goes in
mod_muc_room:process_groupchat_message.  In process_groupchat_message,
nearly 100% of the time goes in using lists:foreach to call
ejabberd_router:route to route the message to the JID of each user in the
room.  I profiled the ejabberd_router:route calls using timer:tc as well,
and found something interesting.  *Around 80-90% of the calls to
ejabberd_router:route from mod_muc_room:process_groupchat_message are fast
(< 100 microseconds).  The remaining 10-20% are excruciatingly slow (>
30,000 us).*  Thus, the slowness of mod_muc_room:process_groupchat_message
is usually due to one of its calls to ejabberd_router:route taking a long
time.

The execution flow of ejabberd:router:route to handle a message being sent
out from a muc room to a user's JID is the following:  ejabberd_router:route
-> ejabberd_router:do_route -> ejabberd_local:route ->
ejabberd_local:do_route -> ejabberd_sm:route -> ejabberd_sm:do_route.

*Finally I found the source of ejabberd_router:route's occasional extreme
slowness is a single mnesia call in ejabberd_sm:do_route.*  The line is:
mnesia:dirty_index_read(session, USR,  #session.usr).  Sampling using
timer:tc every few thousand messages, I found that 80-90% of the time, this
call is fast (< 100 us), but the remaining 10-20% it takes tens of ms to
complete.  This is exactly the same as the ejabberd_router:route behavior.

So I'm pretty certain I've narrowed the bottleneck problem down to this
line.  However, the session table is in RAM and there is an index on the usr
field, so I can't figure out why this problem is happening!

Any thoughts or ideas?

Cheers,
Karthik
-- 
Karthik Kailash
SocialVision Inc.
"Online Television Becomes a Social Experience"
www.socialvisioninc.com
Phone: (408) 768-7704
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.jabber.ru/pipermail/ejabberd/attachments/20100630/c5ea72b4/attachment.html>


More information about the ejabberd mailing list