[ejabberd] Need help with highly available clustered ejabberd setup for 400-500k connections...
wise at wiredgeek.net
Tue Dec 15 04:42:58 MSK 2009
I'm setting up a rather large-scale XMPP installation at my office... we expect to handle a few hundred thousand connections at any given time. We do not expect much traffic, but we do expect a very large number of simultaneous (but 99% idle) connections.
Our current plan is to deploy "XMPP Racks" in multiple datacenters world-wide... each rack would have its own domain name (i.e. us.xmpp.domain.com, eu.xmpp.domain.com, etc). Each rack would then talk to the others world-wide using S2S. At each rack we're planning a mirrored pair of Microsoft AD LDS nodes to handle user accounts (which all sync back to a master somewhere), and a clustered pair of ejabberd nodes. I'd expect each ejabberd node to handle up to 100k connections or so.
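As a rough sanity check on those numbers, here is a minimal sketch of the rack-count arithmetic. It assumes the 400-500k total from the subject line and the ~100k-per-node estimate above; the function name and the spare_nodes_per_rack parameter are hypothetical, just for illustration, and the per-node figure is an expectation, not a measured limit.

```python
def racks_needed(total_connections, per_node_capacity,
                 nodes_per_rack=2, spare_nodes_per_rack=0):
    """Estimate how many racks are needed to hold total_connections.

    spare_nodes_per_rack is the number of nodes per rack held in reserve
    so the rack can lose that many nodes without dropping connections.
    """
    usable_nodes = nodes_per_rack - spare_nodes_per_rack
    if usable_nodes <= 0 or per_node_capacity <= 0:
        raise ValueError("a rack needs at least one usable node with capacity")
    usable_capacity = usable_nodes * per_node_capacity
    # Integer ceiling division: round up to a whole number of racks.
    return -(-total_connections // usable_capacity)


if __name__ == "__main__":
    for total in (400_000, 500_000):
        full = racks_needed(total, 100_000)
        ha = racks_needed(total, 100_000, spare_nodes_per_rack=1)
        print(f"{total} connections: {full} racks at full load, "
              f"{ha} racks if each pair must survive losing one node")
```

If each clustered pair has to survive the loss of one node, only one node's worth of capacity per rack can be counted on, which roughly doubles the number of racks.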
We've done our preliminary testing with ejabberd + Mnesia + internal authentication, but ran into some scaling issues once there were 400k registered XMPP accounts in the local database. I've got a few questions I could really use some help with...
1) In an ideal world, each "xmpp node" would be 'dumb'. At any point I'd like to be able to completely wipe the node and rebuild it from scratch. I'm having a hard time seeing how to do this when Mnesia is used in the back-end for clustering, since it seems to require some by-hand steps for the cluster setup.
1a) One step further... what if we wanted to completely rebuild the entire cluster -- all nodes?
1b) Actually just asking a conceptual question here... in a 'simple' setup you use Mnesia as the database and authentication system on all nodes. When you pull authentication out (say, to LDAP), what is Mnesia used for? Anything beyond current state? Does it need to be backed up, etc?
2) When you use a clustered setup of XMPP nodes, how do you balance the connections? Do you use simple DNS round robin or do you need a real load balancer in front?
2a) Does anything prevent you from running 4, or 5, or 10 nodes -- given enough local network bandwidth to manage the cluster?
3) We ran into some Mnesia resource limits when we hit about 75k connections split across two servers... when we pull the authentication out of the database and put it into an LDAP cluster, should that improve things?