[ejabberd] Need help with highly available clustered ejabberd setup for 400-500k connections...

Jan Koum jan.koum at gmail.com
Mon Dec 21 21:45:13 MSK 2009


On Mon, Dec 14, 2009 at 5:42 PM, Matt Wise <wise at wiredgeek.net> wrote:

> I'm setting up a rather large-scale XMPP installation at my office... we
> expect to handle a few hundred thousand connections at any given time. We do
> not expect very much traffic, but a very large number of simultaneous (but 99%
> idle) connections.
>

we are doing something very similar, so let me share some of my thoughts.


> Our current plan is to deploy in multiple world-wide datacenters "XMPP
> Racks"... each rack would have its own domain name (i.e. us.xmpp.domain.com,
> eu.xmpp.domain.com, etc). Each rack would then talk to the others
> world-wide using S2S. At each rack we're planning a mirrored pair of
> Microsoft AD LDS nodes to handle user accounts (which all sync back to a
> master somewhere), and a clustered pair of ejabberd nodes. I'd expect each
> ejabberd node to handle up to 100k connections or so.
>

our goal is to have 100k connections per physical machine.


> We've done our preliminary testing with ejabberd + mnesia + internal
> authentication, but ran into some scaling issues when you have 400k
> registered XMPP accounts in the database locally. I've got a few questions I
> could really use some help with...
>

what scaling issues have you run into?  we have over 700k registered xmpp
accounts in our mnesia database right now, and so far the only two issues we
have seen are:

1. slow startup times
2. frequent, almost hourly, "mnesia is overloaded" warnings which, from
the research i have done, appear to be harmless:

=ERROR REPORT==== 2009-12-21 10:18:55 ===
Mnesia(ejabberd@localhost): ** WARNING ** Mnesia is overloaded: {dump_log, write_threshold}
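to check whether the warnings really follow an hourly pattern, they can be bucketed by the hour of the report header that precedes them.  this is a minimal python sketch that assumes the log format shown in the excerpt above (a "=... REPORT==== date time ===" header line followed by the warning line); the function name is hypothetical and the pattern may need adjusting for other erlang/ejabberd versions:

```python
import re
from collections import Counter

# Report headers look like "=ERROR REPORT==== 2009-12-21 10:18:55 ===".
# This pattern is based only on the excerpt above; other versions may
# format their logs differently.
HEADER = re.compile(r"^=\w+ REPORT==== (\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2} ===")


def overload_warnings_per_hour(lines):
    """Count "Mnesia is overloaded" warnings, bucketed by the hour of the
    report header that precedes each warning."""
    counts = Counter()
    current_hour = None
    for line in lines:
        header = HEADER.match(line)
        if header:
            current_hour = header.group(1)  # e.g. "2009-12-21 10"
        elif "Mnesia is overloaded" in line and current_hour is not None:
            counts[current_hour] += 1
    return counts
```

pass it an open log file, e.g. overload_warnings_per_hour(open(path)), where path is wherever your ejabberd log lives.  a roughly constant count per hour fits the pattern described above; a sudden spike would be worth a closer look.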



> 1) In an ideal world, each "xmpp node" would be 'dumb'. At any point I'd
> like to be able to completely wipe the node and rebuild it from scratch. I'm
> having a hard time seeing how to do this when Mnesia is used in the back-end
> for clustering, since it seems to require some by-hand steps for the cluster
> setup.
>   1a) one step further... what if we wanted to completely rebuild the
> entire cluster -- all nodes??
>   1b) actually just asking a conceptual question here... in a 'simple'
> setup you use mnesia as the database and authentication system on all nodes.
> when you pull authentication out (say, LDAP), what is mnesia used for?
> Anything beyond current-state? Does it need to be backed up/etc?
>

while you could move authentication out of mnesia into LDAP, mnesia will
still be used for other data.  here are the mnesia tables in our deployment,
grouped by storage type:

ram_copies         = [captcha,iq_response,muc_online_room,reg_users_counter,
                      route,s2s,session,session_counter,user_caps,
                      user_caps_resources]
disc_copies        = [acl,caps_features,config,last_activity,local_config,
                      muc_registered,muc_room,passwd,privacy,schema,
                      vcard_search]
disc_only_copies   = [offline_msg,private_storage,roster,roster_version,
                      vcard]

for example, the offline_msg table stores offline messages; if you don't
want an mnesia table for it, you can simply disable offline messages in
ejabberd.cfg or use mod_offline_odbc.
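as for the backup question: going by the storage types above, only the disc_copies and disc_only_copies tables hold persistent data.  a minimal python sketch of that reasoning, assuming the ram_copies tables (sessions, routes, counters) hold only transient state that is rebuilt as clients reconnect; the function name and the moved_out parameter are hypothetical:

```python
# Storage types copied from the table listing above.
RAM_COPIES = [
    "captcha", "iq_response", "muc_online_room", "reg_users_counter",
    "route", "s2s", "session", "session_counter", "user_caps",
    "user_caps_resources",
]
DISC_COPIES = [
    "acl", "caps_features", "config", "last_activity", "local_config",
    "muc_registered", "muc_room", "passwd", "privacy", "schema",
    "vcard_search",
]
DISC_ONLY_COPIES = [
    "offline_msg", "private_storage", "roster", "roster_version", "vcard",
]


def tables_to_back_up(disc_copies, disc_only_copies, moved_out=()):
    """Return the persistent mnesia tables that still need backing up.

    Assumption: ram_copies tables hold only transient runtime state, so they
    are skipped. `moved_out` lists tables whose data now lives elsewhere
    (LDAP, an ODBC database, ...).
    """
    persistent = set(disc_copies) | set(disc_only_copies)
    return sorted(persistent - set(moved_out))
```

for example, tables_to_back_up(DISC_COPIES, DISC_ONLY_COPIES, moved_out={"passwd", "offline_msg"}) leaves 14 tables, assuming passwd goes unused once authentication moves to LDAP and offline_msg once offline messages move to mod_offline_odbc.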


> 2) When you use a clustered setup of XMPP nodes, how do you balance the
> connections? Do you use simple DNS round robin or do you need a real load
> balancer in front?
>

we plan to use DNS round robin.  load balancing HTTP is easy because HTTP
connections are short-lived: a web server farm can handle millions of
requests during a day while only thousands are in progress at any given
second.  xmpp connections are persistent, so a load balancer in front of
the cluster would have to hold all 500K long-lived tcp connections open at
the same time, and it will be difficult to find one that can.  instead, we
plan to have a monitoring system add/remove A records in the round robin
based on the health of each xmpp node.
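here is a minimal python sketch of the health check such a monitoring system could run.  it assumes a node counts as healthy if it accepts a tcp connection on port 5222 (the standard xmpp client port); the function names are hypothetical, and actually pushing the resulting A records to your DNS servers is left out because it depends on your DNS setup:

```python
import socket


def is_healthy(ip, port=5222, timeout=2.0):
    """Treat a node as healthy if it accepts a TCP connection on the c2s port."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def records_to_publish(node_ips, check=is_healthy, min_records=1):
    """Return the node IPs that should appear as A records in the round robin."""
    healthy = [ip for ip in node_ips if check(ip)]
    # Safety valve: if (almost) every check fails, the monitor itself is the
    # likelier culprit, so keep the full set instead of emptying the rotation.
    if len(healthy) < min_records:
        return list(node_ips)
    return healthy
```

a plain tcp connect is a weak check, since a wedged node can still accept connections; opening a real xmpp stream would be stricter.  also keep in mind that resolvers and clients cache A records, so a removed node keeps getting some new connections until the TTL expires, which argues for a short TTL.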



>   2a) does anything prevent you from doing 4, or 5, or 10 nodes -- given
> enough local bandwidth on the network to manage the cluster?
>

nothing should.  in fact, we are just starting to work on a cluster design
and build with two masters, each keeping a complete copy of all mnesia data
on local drives, and 4 slaves that would only handle tcp connections, store
nothing on local disk and just cache mnesia tables in RAM from the masters.
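the capacity of that layout is simple arithmetic.  a minimal python sketch, assuming only the slaves accept client connections and using the 100k-connections-per-machine goal from earlier; the function names are hypothetical:

```python
PER_NODE = 100_000  # target connections per physical machine, from above


def connection_capacity(slaves, per_node=PER_NODE, failed=0):
    """Client connections the cluster can hold with `failed` slaves down.

    Assumes only the slaves accept client connections; the masters hold
    the disc copies and are not counted.
    """
    if not 0 <= failed <= slaves:
        raise ValueError("failed must be between 0 and the number of slaves")
    return (slaves - failed) * per_node


def slaves_needed(target, per_node=PER_NODE, spare=1):
    """Smallest slave count that still holds `target` connections with
    `spare` slaves down."""
    return -(-target // per_node) + spare  # ceiling division without floats
```

with 4 slaves that gives 400k connections, or 300k with one slave down; holding 500k with one spare would take slaves_needed(500_000), i.e. 6 slaves.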


> 3) We ran into some Mnesia resource limits when we hit about 75k
> connections split across two servers... when we pull the authentication out
> of the database and put it into an LDAP cluster, should that improve?
>

what resource limits?  what warning/error did you get?  was it from the OS
or from mnesia itself?
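one common OS-level limit at that scale is open file descriptors: every client tcp connection uses one, and 75k connections split across two servers means roughly 37.5k per server, far above the 1024 soft limit many systems ship with by default.  a minimal python sketch (unix only; the function name and the reserve allowance are hypothetical) that compares the limits against an expected connection count:

```python
import resource  # Unix-only standard library module


def fd_headroom(expected_connections, reserve=1024):
    """Compare this process's open-file limits with an expected connection count.

    Each client TCP connection uses one file descriptor. `reserve` is a rough,
    hypothetical allowance for log files, mnesia table files and s2s sockets.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    needed = expected_connections + reserve
    unlimited = soft == resource.RLIM_INFINITY
    return {
        "soft": soft,
        "hard": hard,
        "needed": needed,
        "ok": unlimited or soft >= needed,
    }
```

it reports the limits of the process that runs it, so run it as the same user and from the same environment that starts ejabberd, e.g. fd_headroom(37_500).  the erlang vm of that era could also impose its own cap on ports (ERL_MAX_PORTS, settable in ejabberdctl.cfg), which may be worth checking too.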

-- yan