[ejabberd] The cluster: I have done it!

Felipe Brito Vasconcellos fbv at trf4.gov.br
Sat May 28 01:35:46 MSD 2005

Well, folks, I have succeeded! As far as the migration goes, it is 
done. The new machine (lnxaplic) now has all the tables and is running 
alone. Everything has migrated, including everybody's contacts. That is 
great. :)

After this process, I have some questions that may be useful to everyone.

1) How do I set up a cluster correctly? For a couple of hours I had two 
Erlang nodes serving the same Jabber host (icq2, thus user at icq2). 
They were, of course, on two different machines (lnxim and lnxaplic). 
The first one was the live one: everybody connected to lnxim through 
icq2. (My DNS server pointed icq2 to lnxim, so icq2 and lnxim had the 
same IP address; the machine name is lnxim and the Jabber instance is 
icq2.)
Once both nodes were running and visible in the web admin, I changed 
the icq2 DNS entry on the fly (before: icq2 -> lnxim, after: icq2 -> 
lnxaplic). The two nodes were:

first node: ejabberd at lnxim (-sname ejabberd)
second node: lnxaplic at lnxaplic (I changed to -sname lnxaplic)

Next, I stopped node ejabberd at lnxim and the service kept running 
without interruption!
I then stopped the second node, lnxaplic at lnxaplic, started only that 
one again, and the Mnesia database was rock solid!

Now I am running just the second node (the lnxaplic machine) in 
production. Thanks, everybody. :)
The service went through a test period with about 50 users connected 
simultaneously, and it will soon face a hard test: hundreds of users 
at the same time! I hope it keeps running solidly. :)

So, back to the question. Because of this hard test, I want to set up a 
real cluster that is fault tolerant.

But I discovered that if I stop the node that the icq2 DNS entry points 
to, users cannot connect anymore, because the machine behind icq2 is no 
longer accepting connections. Note that the other node (lnxim) is still 
running. So how can I make any Erlang node in the cluster answer a 
connection request? That is, lnxaplic (pointed to by icq2) is down and 
lnxim is up, but users cannot connect to the icq2 Jabber instance.
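
To make the desired behaviour concrete, here is a minimal sketch in 
Python. It is not ejabberd code and not something ejabberd provides; it 
only shows a client trying a list of cluster machines in order and 
using the first one that accepts a TCP connection. The function name 
connect_first_available and port 5222 (the usual Jabber client port) 
are assumptions made for illustration; only the host names come from 
the setup above.

```python
import socket


def connect_first_available(candidates, timeout=3.0):
    """Return (sock, (host, port)) for the first candidate that accepts.

    `candidates` is an ordered list of (host, port) pairs. Each one is
    tried in turn; a refused connection, a timeout, or a failed name
    lookup moves on to the next candidate.
    """
    last_error = None
    for host, port in candidates:
        try:
            sock = socket.create_connection((host, port), timeout=timeout)
            return sock, (host, port)
        except OSError as exc:  # refused, timed out, or name not resolved
            last_error = exc
    raise ConnectionError(
        f"no candidate accepted a connection: {last_error}"
    )


# Hypothetical usage with the two machines from this post
# (port 5222 is an assumption):
#
#   sock, chosen = connect_first_available(
#       [("lnxaplic", 5222), ("lnxim", 5222)]
#   )
```

Real Jabber clients only know the single name icq2, so in practice the 
list of candidates would have to come from DNS (for example several 
address or SRV records for icq2) or from something in front of the 
nodes that redirects connections. Whether that is the right approach 
for ejabberd is exactly what I am asking.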

Does anyone know something about this situation? Maybe someone is trying 
to do the same thing, and we could combine our efforts. :)


Felipe Brito Vasconcellos
Linux Support - phone (51) 3213-3624
Secretaria de Recursos Tecnológicos - Diretoria de Informática
Tribunal Regional Federal da 4ª Região 
