[ejabberd] Best practices on ejabberd clusterization

norðurljósahviða nz at os.vu
Sat Sep 1 21:16:03 MSK 2018

I have a cluster of 4 nodes that I managed to configure after a lot of trial and error, given the complete lack of clustering documentation on the KB. I previously offered to document the necessary steps [which are already written as code in our aenigma ejabberd server automation project], and I'd still be very interested in doing so; please let me know if anyone feels it would help. Of course, this depends on first finding answers to a few lingering questions, and on a thorough peer review by an actual ProcessOne dev.

The fact is that I have no idea how ejabberd clusterization works behind the scenes. The layout below is the logical conclusion I've reached for now. Please note that there are no load-balancers involved: I first need to understand how to configure the nodes themselves, and anything sitting in front of them is an issue to tackle later.

Here it is:
ae00.domain.xyz [MUC configured as xc.domain.xyz in this node's /etc/ejabberd/ejabberd.yml]
ae01.domain.xyz [MUC configured as xc.domain.xyz in this node's /etc/ejabberd/ejabberd.yml]
ae02.domain.xyz [MUC configured as xc.domain.xyz in this node's /etc/ejabberd/ejabberd.yml]
ae03.domain.xyz [MUC configured as xc.domain.xyz in this node's /etc/ejabberd/ejabberd.yml]
ae.domain.xyz -> DNS round-robin to all 4 nodes via 4 A and 4 AAAA records
xc.domain.xyz -> DNS round-robin to all 4 nodes via 4 A and 4 AAAA records
This is different from all other services (mod_echo, mod_irc, mod_mix, mod_pubsub, and mod_http_upload), which are all configured node-specifically:
ae00.domain.xyz [e.g.: mod_echo configured as xe00.domain.xyz in this node's /etc/ejabberd/ejabberd.yml]
ae01.domain.xyz [e.g.: mod_echo configured as xe01.domain.xyz in this node's /etc/ejabberd/ejabberd.yml]
ae02.domain.xyz [e.g.: mod_echo configured as xe02.domain.xyz in this node's /etc/ejabberd/ejabberd.yml]
ae03.domain.xyz [e.g.: mod_echo configured as xe03.domain.xyz in this node's /etc/ejabberd/ejabberd.yml]
xe00.domain.xyz -> CNAME to ae00.domain.xyz
xe01.domain.xyz -> CNAME to ae01.domain.xyz
xe02.domain.xyz -> CNAME to ae02.domain.xyz
xe03.domain.xyz -> CNAME to ae03.domain.xyz
This is because a chatroom must be the same globally [e.g.: groupchat at xc.domain.xyz] and cannot be fragmented into node-specific MUC IDs [like: groupchat at xc01.domain.xyz].
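
To make the layout concrete, here is a minimal sketch of how the relevant part of /etc/ejabberd/ejabberd.yml might look on node ae03. It is not a complete or verified configuration. It assumes that ae.domain.xyz is the XMPP domain served by the whole cluster, that each module takes its service name through a host option, and that the upload service on node 03 is named xu03.domain.xyz; the option names should be checked against the installed ejabberd version.

```yaml
# Sketch for node ae03 only; not a complete ejabberd.yml.
hosts:
  - "ae.domain.xyz"            # assumption: the cluster-wide XMPP domain

modules:
  mod_muc:
    host: "xc.domain.xyz"      # identical on all 4 nodes
  mod_echo:
    host: "xe03.domain.xyz"    # node-specific: xe00, xe01, xe02 on the other nodes
  mod_http_upload:
    host: "xu03.domain.xyz"    # node-specific; assumed to follow the same numbering
```

On the other three nodes only the numeric suffix of the node-specific hosts would change, while the mod_muc host stays the same everywhere.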

There are some issues with this in terms of MUC, but other than that everything seems to work great for now. However, I have noticed that a client will sometimes not use the HTTP_UPLOADS server specified by the node it's connected to. With the config above, a client connected to ae03 is told by that node's ejabberd.yml that the HTTP_UPLOADS server to use is xu03, which is the same machine from both a DNS and a configuration standpoint. Yet the client sometimes uses xu02 even though it's connected to ae03, but only if xu02 is online; otherwise it "magically" doesn't use it. It looks as if ejabberd were somehow load-balancing HTTP_UPLOADS behind the scenes.

I'd really love some insight from someone who actually knows how ejabberd works under the hood, so that I can understand how the configuration must be done for optimal clusterization with load-balancing in mind. Of course we cannot rely on DNS round-robin forever to make our service highly available; we need to ensure that all clients reach online nodes every single time.

Thank you very much in advance for your help and thank you as always for the great work.
