[ejabberd] recover from inter-node network problems in a cluster

Daniel Dormont dan at greywallsoftware.com
Wed Mar 9 02:18:37 MSK 2011

I have two nodes in a cluster. Each has users connected to it. Suppose some network problem causes the servers to lose contact with each other for a short time, even though the clients are still able to reach both. I have three questions:

1) What is the timeout before the error "** Node ejabberd@othernode not responding **" appears? Is this configurable?

2) What is the best way to detect this state of affairs from outside ejabberd itself? I see one option here (http://www.ejabberd.im/node/3379#comment-55607). Is that recommended?

3) Is there any way at all to restore the Mnesia connectivity without completely restarting ejabberd? I tried stopping and restarting Mnesia itself from the ejabberd debug console. That gave the appearance of working (both nodes show up in the Nodes list in the web admin), but it doesn't really work: for example, users get "not authorized" errors when trying to log in.

I know I could set up a process to issue ejabberdctl commands to restart the nodes, but I'd rather avoid that because the users would be disconnected.
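
To make question 2 concrete, here is a minimal sketch of the kind of external check I have in mind. It is only an illustration, not a recommended method. It assumes the status text looks like the output of Erlang's mnesia:info(), with "running db nodes" and "stopped db nodes" lines, and that this text can be obtained with "ejabberdctl mnesia info" (that command is an assumption and may not exist in every version). The function names and the node names ejabberd@node1 and ejabberd@node2 are hypothetical.

```python
import re
import subprocess


def fetch_mnesia_info(command=("ejabberdctl", "mnesia", "info")):
    """Run the status command and return its text output.

    ASSUMPTION: the default command is a guess at how to obtain
    mnesia:info()-style text; substitute whatever works on your install.
    """
    result = subprocess.run(list(command), capture_output=True, text=True)
    return result.stdout


def parse_db_nodes(info_text):
    """Extract running and stopped db nodes from mnesia:info()-style text."""
    nodes = {}
    for kind in ("running", "stopped"):
        # Matches e.g. "running db nodes   = [a@host1,b@host2]"
        pattern = r"^%s db nodes\s*=\s*\[(.*?)\]" % kind
        match = re.search(pattern, info_text, re.MULTILINE | re.DOTALL)
        raw = match.group(1) if match else ""
        # Erlang quotes atoms such as 'ejabberd@node-2'; drop the quotes.
        nodes[kind] = sorted(
            name.strip().strip("'") for name in raw.split(",") if name.strip()
        )
    return nodes


def cluster_is_partitioned(info_text, expected_nodes):
    """Return True if any expected node is missing from the running list."""
    running = set(parse_db_nodes(info_text)["running"])
    return not set(expected_nodes) <= running
```

A monitoring job could call cluster_is_partitioned(fetch_mnesia_info(), ["ejabberd@node1", "ejabberd@node2"]) on each node and raise an alert when it returns True. Note that this only looks at the Mnesia node list, which, as described in question 3, can look healthy even when logins fail.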


