[ejabberd] Clustered setup, problems after update (ejabberd 2.0.5 > 2.1.5)

Sven 'Darkman' Michels sven at darkman.de
Tue Mar 15 00:05:23 MSK 2011


Hi Daniel!

Thanks a lot for your response. It was indeed a problem with the sql_pool table.
To solve my problem, I did the following:
- shut down node2
- remove node2 from the cluster on node1 with ejabberdctl remove_node node2
- remove all database files from /var/lib/ejabberd
- connect from node2 to node1 again with:
erl -name ejabberd@node2.domain.tld -mnesia dir '"/var/lib/ejabberd/"' \
    -mnesia extra_db_nodes "['ejabberd@node1.domain.tld']" -s mnesia
- import the schema AND the sql_pool:
mnesia:change_table_copy_type(schema, node(), disc_copies).
mnesia:add_table_copy(sql_pool, node(), disc_copies).
q().
- start ejabberd

This solved the problem. I chose disc_copies because I want node2 to keep working
even if node1 is down, and both nodes can connect to a local, also clustered, MySQL
daemon.
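
In case it is useful: a quick sanity check (plain mnesia calls, untested in exactly
this form; the node names are just the ones from my setup) is to ask mnesia where
the copies ended up, before leaving the erl shell with q().:

mnesia:system_info(running_db_nodes).
%% should list both 'ejabberd@node1.domain.tld' and 'ejabberd@node2.domain.tld'
mnesia:table_info(schema, disc_copies).
%% node2 should show up here after change_table_copy_type/3
mnesia:table_info(sql_pool, disc_copies).
%% node2 should show up here after add_table_copy/3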

Hope this helps others, too.

Kind regards,
Sven

On 10.03.2011 18:17, Daniel Dormont wrote:
> Hi,
> 
> I just ran into that situation myself. I had a two-node cluster which was working fine, but when I tried to activate ODBC on the second node (previously I'd been running it on only one node), I got exactly the situation you talked about. Not only that, but the errors seem to have caused an infinite loop - I was getting about 500 of them a second.
> 
> The issue has to do with the Mnesia table sql_pool. At least in my testing so far, if you create the second node by doing only a disc copy of schema, all other tables are treated as "remote" in the second node, and in the specific case of sql_pool, it caused the behavior you saw. The solution is to create a RAM copy of sql_pool on the node *before* running ejabberd with ODBC enabled.
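
[My reading of Daniel's suggestion, as an untested sketch (the node names are just
the placeholders from my setup): attach the new node the same way as above, keep the
schema as a disc copy, but make sql_pool a RAM copy, all before starting ejabberd
with ODBC enabled:

erl -name ejabberd@node2.domain.tld -mnesia dir '"/var/lib/ejabberd/"' \
    -mnesia extra_db_nodes "['ejabberd@node1.domain.tld']" -s mnesia
%% then, in the erl shell:
mnesia:change_table_copy_type(schema, node(), disc_copies).
mnesia:add_table_copy(sql_pool, node(), ram_copies).
q().
]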
> 
> This does get to a larger question I would pose to the community: is there a guide on how to set up the storage type based on the cluster requirements? I found this: http://lists.jabber.ru/pipermail/ejabberd/2009-December/005535.html which is a good start but it seems a bit out of date and also starts with the configuration of the "master" node as a baseline without really getting into how *that* should be decided.
> 
> -Dan
> 
> 
> On Mar 6, 2011, at 4:38 PM, Sven 'Darkman' Michels wrote:
> 
>> Hi,
>>
>> short facts:
>> - 2 nodes
>> - mysql clustered on both
>> - debian 5.0.8 on both nodes
>>
>> Today we upgraded both of our nodes from 2.0.5 to 2.1.5. Before the upgrade, both
>> nodes were running fine for about one year without problems. The update was
>> done via aptitude; all services on both servers were stopped upfront and we
>> removed the network connection between them to avoid ending up in some "nasty"
>> state when the servers are automatically restarted after the upgrade. We've been
>> doing it this way for a couple of years without problems, so far ;)
>>
>> We upgraded both servers; after a reboot, we started the MySQL cluster again. After
>> a couple of seconds the cluster was back in sync and everything was fine so far.
>> Then we started ejabberd on node1, which worked without problems (it's running right
>> now). After testing the first node without any problems, we started the second
>> one. But that one didn't come up. It doesn't even log any problems; we just found
>> some "core" files like: MnesiaCore.ejabberd@node2.domain.tld_1299_440860_127056
>> The core files stated something like "failed to merge schema". So we decided to
>> remove the node from the cluster and rejoin it to get a clean state. But that
>> also failed. We synced the cookie and ran (as the ejabberd user):
>> erl -name ejabberd@node2.domain.tld -mnesia dir '"/var/lib/ejabberd/"' \
>>     -mnesia extra_db_nodes "['ejabberd@node1.domain.tld']" -s mnesia
>>
>> After that, we verified the working connection with mnesia:info(). and checked
>> the web interface on node1, which showed node2 just fine.
>>
>> Then we issued the mnesia:change_table_copy_type(schema, node(), disc_copies).
>> command, which succeeded, and then q(). to leave the Erlang shell. This is just
>> how we did it when node2 joined the first time, and it worked fine back then. But
>> this time ejabberd didn't come up when we tried to start it. Instead it's filling
>> the logs with the following:
>> =ERROR REPORT==== 2011-03-06 22:10:13 ===
>> E(<0.37.0>:ejabberd_rdbms:67) : Start of supervisor 'ejabberd_odbc_sup_domain.tld'
>> failed:
>> {error,{{'EXIT',{badarg,[{ets,delete,[sql_pool,"domain.tld"]},
>>                         {mnesia,delete,5},
>>                         {mnesia_tm,non_transaction,5},
>>                         {ejabberd_odbc_sup,start_link,1},
>>                         {supervisor,do_start_child,2},
>>                         {supervisor,handle_start_child,2},
>>                         {supervisor,handle_call,3},
>>                         {gen_server,handle_msg,5}]}},
>>        {child,undefined,'ejabberd_odbc_sup_domain.tld',
>>               {ejabberd_odbc_sup,start_link,["domain.tld"]},
>>               transient,infinity,supervisor,
>>               [ejabberd_odbc_sup]}}}
>> Retrying...
>>
>> (I get this message a couple of times per second, running ejabberd at
>> loglevel 5...). So ejabberd "tries" to start, but gets stuck on this problem. Google
>> didn't help much with this problem and I'm not sure what's going wrong there.
>> In fact, the same software, same version, same config etc. is working on the other
>> node.
>>
>> Is anyone aware of this issue? Anything I forgot? Anything I'm missing?
>>
>> Thanks for your time and help!
>>
>> Regards,
>> Sven
>> _______________________________________________
>> ejabberd mailing list
>> ejabberd at jabber.ru
>> http://lists.jabber.ru/mailman/listinfo/ejabberd
> 
> _______________________________________________
> ejabberd mailing list
> ejabberd at jabber.ru
> http://lists.jabber.ru/mailman/listinfo/ejabberd

