[ejabberd] Clustering question...

Felix GV felix at mate1inc.com
Mon May 7 21:03:41 MSK 2012


Interesting, thanks for your comments :) ! Here are my thoughts:

It does seem to work when I take off the extra_db_nodes parameter from the
start up script. So that parameter is just for setting up replication, you
say? Interesting...!

However, the official guide and the one I linked to both say to add this
parameter to the second (or extra) node only. In my case, the clustering
did not seem to work properly until I also added the parameter to the
first/original node. But you are right that once the clustering is set up,
it seems possible to remove that parameter and things still work
(with the expected fault tolerance and all)...!
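
To make the per-node difference concrete, here is a rough sketch of how the
parameter could be generated so that each node lists only its peers. The
function name extra_db_nodes_arg is made up for illustration and is not part
of ejabberdctl, and the node names ejabberd@jabber1 and ejabberd@jabber2 are
assumed from the status output quoted further down.

```shell
#!/bin/sh
# Hypothetical helper (not part of ejabberdctl): print the
# "-mnesia extra_db_nodes" argument for one node of the cluster.
# Usage: extra_db_nodes_arg SELF NODE...
#   SELF  name of the node the script runs on
#   NODE  names of all cluster nodes (may include SELF)
extra_db_nodes_arg() {
    self=$1
    shift
    list=""
    for node in "$@"; do
        # Skip our own name so a node only lists its peers.
        [ "$node" = "$self" ] && continue
        if [ -n "$list" ]; then
            list="$list,"
        fi
        list="$list'$node'"
    done
    printf '%s\n' "-mnesia extra_db_nodes \"[$list]\""
}

# Example with the assumed node names, run as if on jabber1:
extra_db_nodes_arg ejabberd@jabber1 ejabberd@jabber1 ejabberd@jabber2
```

The printed string would still have to be pasted into (or appended by) the
start-up script by hand; this only shows how one node list could yield a
different argument on each node.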

In your experiments, did you (initially) add that parameter to all nodes,
or were you able to get things working correctly by adding it to the second
node only? I'm wondering if I'm coming to the right conclusions, or if
something else is going on in my environment that would have caused the
behavior I observed...!

Regarding your other comment: I am surprised that you say the configuration
of Mnesia should be the same on both/all nodes, because that would imply
that Remote copies are never ok. Is that what you mean?

Thanks for your help!

I'm definitely looking forward to reading your blog posts on the subject!
Do post them here once you write them, please :) !


On Sat, May 5, 2012 at 3:45 AM, Michael Weibel <michael.weibel at gmail.com> wrote:

> Hi Felix,
> Good that you solved it. I struggled with clustering as well, but now I
> think I know how to do it correctly.
> I followed the guide once as well. While it seems to work, it doesn't
> mention a few things:
> - extra_db_nodes is only used when you initially set up the Mnesia
> replication. After that, the parameter doesn't need to be in the startup
> script at all.
> - Replication of Mnesia: it's important that the configuration for each
> table is the same on node2 as on node1.
> As soon as my company has set up the planned labs blog, I'll publish a
> step-by-step guide, and I hope I can write an automated join-cluster script
> and open-source it.
> -Michael
> On 04.05.2012 at 22:50, "Felix GV" <felix at mate1inc.com> wrote:
>> Hi Michael,
>> Thanks for your reply :)
>> I am actually launching ejabberd as root, which is probably not a good
>> idea, but that's the way the binary installer set it up (it did not create
>> an ejabberd user, like apt-get does, but I wanted 2.1.10 and apt-get was
>> giving me an older version...). I should definitely set this up more
>> appropriately at some point, but I found out what's causing the issue and
>> it's not related to this.
>> I checked the cookie and they're identical on my two nodes. I was also
>> seeing the two nodes in the web admin interface, so it SEEMED like the
>> clustering was done properly, but there is something that was not correct.
>> I followed this guide to configure the nodes:
>> tdewolf.blogspot.ca/2009/07/clustering-ejabberd-nodes-using-mnesia.html
>> But there is something that this guide does not mention, which seems
>> pretty evident in retrospect, but since it was not explicitly mentioned, I
>> didn't think of doing it.
>> The guide instructed me to modify the ejabberdctl script on node #2 so
>> that it has the following extra parameter:
>> -mnesia extra_db_nodes "['ejabberd@ejabberd1']"
>> After doing this, I saw node #2 coming up in the web admin interface, so
>> I thought everything was ok, but it was not ok, because I also needed to
>> add this parameter to node #1's startup script (while mentioning that the
>> extra_db_node is node #2, and not itself, obviously).
>> Doing this made the clustering actually happen for real. After this, the
>> status command outputted correct values on both nodes, and I could bring
>> down any one node, and the other would keep serving chat messages. (The
>> connected clients would be disconnected and reconnect automatically to the
>> remaining alive node, but that is all right.)
>> Now, having different configurations embedded in the ejabberdctl script
>> of each node is not really convenient, so I tried having a uniform script
>> instead, where all nodes mention all nodes (including themselves) in the
>> extra_db_nodes parameter, and that seems to work correctly, so I'm planning
>> to leave it this way.
>> I really wish there was a complete and accurate guide to clustering
>> ejabberd, because the official ejabberd administration guide does not go in
>> depth and is missing some important pieces of information. Making this
>> clustering work required a lot of guesswork.
>> I think the guide I provided a link to above, along with the
>> clarification I just mentioned, should be enough for someone else to get it
>> up and running, but it'd be nice to have this in the official doc!
>> The mnesia replication also requires a fair amount of guesswork, and
>> while I think my choices are ok, I'm afraid I might only discover that
>> something is problematic once we're already in production. A default,
>> official, up to date, recommended clustered mnesia configuration would be
>> nice to have as well.
>> Anyway, thanks again for your help! I didn't know about the debug command
>> and poking around with it indirectly led me to figure out what was going
>> on :) ...
>> Hopefully, this post might save someone else some headaches :)
>> Cheers ;) !
>> --
>> Felix
>> On Thu, May 3, 2012 at 2:31 AM, Michael Weibel <michael.weibel at gmail.com> wrote:
>>> Hi,
>>> >  sudo ./ejabberdctl status I get:
>>> > The node ejabberd@jabber2 is started with status: started
>>> > ejabberd is not running in that node
>>> > Check for error messages: /opt/ejabberd-2.1.10/logs/ejabberd.log
>>> > or other files in that directory.
>>> >
>>> > Commands to start an ejabberd node:
>>> >   start  Start an ejabberd node in server mode
>>> >   debug  Attach an interactive Erlang shell to a running ejabberd node
>>> >   live   Start an ejabberd node in live (interactive) mode
>>> I assume you're running ejabberd as the user ejabberd. If you run sudo
>>> ejabberdctl status, it may pick up a different .erlang.cookie than the
>>> one ejabberd uses. In that case, ejabberdctl can't get the status.
>>> Usually the Erlang cookie is stored in $HOME/.erlang.cookie, but it may
>>> also be stored in other places.
>>> If you run ejabberdctl debug, you get a shell attached to the
>>> ejabberd process. There you can get the cookie value with
>>> "erlang:get_cookie().". (By the way, to close the shell, press Ctrl+C twice.)
>>> Maybe that helps you.
>>> >
>>> > Whereas on jabber1.dev, the same command gives me:
>>> >
>>> > The node ejabberd@jabber1 is started with status: started
>>> > ejabberd 2.1.10 is running in that node
>>> >
>>> > Is there someone that can help me debug this problem...?
>>> >
>>> > I can provide more information if the above is not sufficient...!
>>> >
>>> > Thanks a lot :) !!
>>> >
>>> > --
>>> > Felix
>>> >
>>> > _______________________________________________
>>> > ejabberd mailing list
>>> > ejabberd at jabber.ru
>>> > http://lists.jabber.ru/mailman/listinfo/ejabberd