[ejabberd] Clustered setup, problems after update (ejabberd 2.0.5 > 2.1.5)

Sven 'Darkman' Michels sven at darkman.de
Mon Mar 7 00:38:04 MSK 2011


Short facts:
- 2 nodes
- MySQL clustered on both
- Debian 5.0.8 on both nodes

Today we upgraded both of our nodes from 2.0.5 to 2.1.5. Before the upgrade, both
nodes had been running fine for about a year without problems. The update was
done via aptitude. All services on both servers were stopped beforehand, and we
removed the network connection between them to avoid ending up in some "nasty" state
when the server is automatically started after the upgrade. We've been doing it this
way for a couple of years without problems, so far ;)

We upgraded both servers and, after a reboot, started the MySQL cluster again. After
a couple of seconds the cluster was back in sync and everything was fine so far.
Then we started ejabberd on node1, which worked without problems (it's still running
right now). After testing the first node, we started the second one. But that one
didn't come up. It doesn't even log any errors; we just found some core files like
MnesiaCore.ejabberd@node2.domain.tld_1299_440860_127056
The core files said something like "failed to merge schema". So we decided to
remove the node from the cluster and rejoin it to get a clean state. But that
also failed. We synced the Erlang cookie and ran (as the ejabberd user):
    erl -name ejabberd@node2.domain.tld -mnesia dir '"/var/lib/ejabberd/"' \
        -mnesia extra_db_nodes "['ejabberd@node1.domain.tld']" -s mnesia

After that, we verified the connection with mnesia:info(). and checked
the web interface on node1, which showed node2 just fine.

Then we issued the mnesia:change_table_copy_type(schema, node(), disc_copies).
command, which succeeded, and then q(). to leave the Erlang shell. This is exactly
how we did it when node2 was joined the first time, and it worked fine back then.
But this time, ejabberd didn't come up when we tried to start it. Instead, it keeps
filling the logs with the following:
=ERROR REPORT==== 2011-03-06 22:10:13 ===
E(<0.37.0>:ejabberd_rdbms:67) : Start of supervisor 'ejabberd_odbc_sup_domain.tld'

(I get this message several times per second, running ejabberd at
loglevel 5...). So ejabberd "tries" to start but hangs on this problem. Google
didn't help much, and I'm not sure what's going wrong there.
In fact, the same software, same version, same config etc. is working on the other node.

Is anyone aware of this issue? Anything I forgot? Anything I'm missing?

Thanks for your time and help!


