[ejabberd] Cluster Servers stop communicating

Jesse Thompson jesse.thompson at doit.wisc.edu
Mon Nov 30 22:48:42 MSK 2009


It's too dependent on our environment to share directly.

Basically, here is a command you can use to determine if the current 
node believes the other node is stopped.

   echo "rpc:call('$ERLANG_NODE', mnesia, info, [])." | exec erl -name dbinfo_$THIS_HOSTNAME
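To collect that output from Perl, something like the following could work. It is only a sketch: `mnesia_info_command` and `fetch_mnesia_info` are hypothetical helper names. It also assumes that the account running it shares the Erlang cookie with the ejabberd node; otherwise the `rpc:call` cannot connect and returns `{badrpc,nodedown}` instead of the mnesia summary.

```perl
use strict;
use warnings;

# Hypothetical helper: build the shell command shown above for a given
# ejabberd node name and local host name. The node name is single-quoted
# so that fully qualified names (with dots) are valid Erlang atoms.
sub mnesia_info_command {
    my ( $erlang_node, $this_hostname ) = @_;
    return qq{echo "rpc:call('${erlang_node}', mnesia, info, [])." }
         . qq{| erl -name dbinfo_${this_hostname}};
}

# Hypothetical helper: run the command and return its output lines
# (without trailing newlines), ready to be parsed.
sub fetch_mnesia_info {
    my ( $erlang_node, $this_hostname ) = @_;
    my $cmd   = mnesia_info_command( $erlang_node, $this_hostname );
    my @lines = `$cmd 2>&1`;
    chomp @lines;
    return @lines;
}
```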

You can then parse that output with this Perl code:

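   # @mnesia_info holds the output lines of the command above and
   # $this_node is the local node name.
   sub stopped_db_nodes {
       my ( $this_node, @mnesia_info ) = @_;
       for my $line ( @mnesia_info ) {
           if ( $line =~ m/stopped db nodes\s+=\s+\[(.*?)\]/ ) {
               my $list = $1;    # copy, so @mnesia_info is not modified
               my @stopped_nodes = ();
               # node names may be printed with or without single quotes
               while ( $list =~ /'?([^',\s]+)'?/g ) {
                   push @stopped_nodes, $1;
               }
               return @stopped_nodes;
           }
       }
       return $this_node;  # this node must be down
   }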
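Erlang's `~p` formatting can wrap a long list over several lines, which is likely with two fully qualified node names, and a line-by-line regex would then miss part of the list. A hedged alternative, assuming the summary lines look like `running db nodes   = [...]` and `stopped db nodes   = [...]`, joins the output first and collects both lists at once. `parse_db_nodes` is a hypothetical name.

```perl
use strict;
use warnings;

# Parse the "running db nodes" and "stopped db nodes" summaries from
# mnesia:info() output into a hash of array refs keyed by "running" and
# "stopped". All lines are joined first so that a list wrapped across
# several lines is still matched as a whole.
sub parse_db_nodes {
    my @mnesia_info = @_;
    my $text = join ' ', @mnesia_info;
    my %nodes;
    while ( $text =~ /(running|stopped) db nodes\s*=\s*\[(.*?)\]/gs ) {
        my ( $state, $list ) = ( $1, $2 );
        # accept node names with or without surrounding single quotes
        $nodes{$state} = [ $list =~ /'?([^',\s]+)'?/g ];
    }
    return \%nodes;
}
```

If neither key is present, mnesia probably did not answer at all, which is the same case the `return $this_node` fallback covers.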

Here is some Perl code that verifies that the other node is listening on
port 5222, which is a good indication that it is still serving client
traffic.

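   use IO::Socket::INET;
   use Sys::Syslog qw(:standard :macros);

   sub master_available {
       my ( $master_server, $commandtimeout ) = @_;
       # without a handler, SIGALRM would terminate the whole script
       local $SIG{ALRM} = sub { die "timed out\n" };
       my $available = 0;
       eval {
           alarm $commandtimeout;
           my $sock = IO::Socket::INET->new(
               PeerAddr => $master_server,
               PeerPort => 5222,
               Proto    => 'tcp',
           ) or die "could not connect: $!\n";
           $available = 1;
           close $sock;
           alarm 0;
       };
       alarm 0;    # also clear the alarm if the eval died
       if ( $@ ) {
           # pass $@ as an argument, not as the format string
           syslog( LOG_ERR, "unable to connect to master node: %s", $@ );
       }
       return $available;
   }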
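A variant that avoids `alarm()` by passing `Timeout` to `IO::Socket::INET` is sketched below. This is a swapped-in technique, not the one used above: `Timeout` bounds the TCP connect but likely not a slow DNS lookup of the host name, so the alarm-based version covers more failure modes. `port_is_listening` is a hypothetical name.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Return 1 if a TCP connection to $host:$port succeeds within $timeout
# seconds, 0 otherwise. The Timeout option limits the connect attempt,
# so no SIGALRM handler is needed.
sub port_is_listening {
    my ( $host, $port, $timeout ) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => $timeout,
    ) or return 0;
    close $sock;
    return 1;
}
```

Called as `port_is_listening( $master_server, 5222, $commandtimeout )`, it answers the same question as the code above.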

At that point, you can stop or restart the node as you see fit.
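The decision step of such a watchdog might look like the sketch below. The policy is an assumption modeled on the master/slave arrangement described further down the thread: when this node believes the peer is stopped but the peer still answers on port 5222, only the designated slave restarts, so both nodes do not bounce at once. `decide_action` and its return values are hypothetical; connecting them to an actual restart is left to your environment.

```perl
use strict;
use warnings;

# Hypothetical decision step for a periodic check (e.g. run from cron).
# Inputs: this node's name, the peer's name, whether this node is the
# designated "slave", whether the peer answered on port 5222, and the
# list from the stopped-nodes check (it contains this node's own name
# when mnesia here did not answer).
sub decide_action {
    my ( $this_node, $peer_node, $is_slave, $peer_reachable, @stopped ) = @_;
    my %stopped = map { $_ => 1 } @stopped;

    # Local mnesia is not answering: restart this node.
    return 'restart' if $stopped{$this_node};

    if ( $stopped{$peer_node} ) {
        # The peer still serves clients, so the cluster is split.
        # Only the slave restarts; the master keeps running.
        if ($peer_reachable) {
            return $is_slave ? 'restart' : 'ok';
        }
        # The peer really looks down: there is nothing to rejoin yet.
        return 'peer_down';
    }
    return 'ok';
}
```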

Jesse

Mark wrote:
> Care to share your script?
> 
> -- 
> Mark
> "Blessed is he who finds happiness in his own foolishness, for he will 
> always be happy."
> 
> 
> On Mon, Nov 30, 2009 at 9:26 AM, Jesse Thompson
> <jesse.thompson at doit.wisc.edu> wrote:
> 
>     We see this problem as well.  We have our nodes in different data
>     centers, so presumably the problem is being caused by the network.
>     However, it is very hard to prove what the cause is.
> 
>     We created a script that periodically checks to see if the nodes
>     have lost sight of each other.  The script will then automatically
>     restart the "slave" node.
> 
>     It would be nice if this functionality were built into ejabberd.
> 
>     Jesse
> 
>     Steve Bond wrote:
> 
>         We have an ejabberd cluster with 2 nodes. Occasionally, when I
>         check the admin interface on each node, they no longer see each
>         other as running, which prevents users connected to one node from
>         communicating with users connected to the other.
>
>         Does anyone know of a command or something I can run to get the
>         2 nodes to check whether the other is available again after they
>         lose communication?
>
>         Steve Bond
> 
>         Operations Engineer
> 
>         NewVoiceMedia
> 
>         +44 (0)7786 653039
> 
>         +44 (0)800 280 2888 Ext. 007
> 
>         www.newvoicemedia.com
> 
> 
>         Registered Office: NewVoiceMedia Ltd, Belvedere, Basing View,
>         Basingstoke, Hampshire. RG21 4HG.
> 
>         NewVoiceMedia Registered in England No: 3602868
> 
>         This email and any attachments to it may be confidential and are
>         intended solely for the use of the individual to whom it is
>         addressed. Any views or opinions expressed are solely those of
>         the author and do not necessarily represent those of
>         NewVoiceMedia. If you are not the intended recipient of this
>         email, you must neither take any action based upon its contents,
>         nor copy or show it to anyone. Please contact the sender if you
>         believe you have received this email in error.
> 
> 
>         ------------------------------------------------------------------------
> 
>         _______________________________________________
>         ejabberd mailing list
>         ejabberd at jabber.ru
>         http://lists.jabber.ru/mailman/listinfo/ejabberd
> 
> 
>     -- 
>      Jesse Thompson
>      Division of Information Technology, University of Wisconsin-Madison
>      Email/IM: jesse.thompson at doit.wisc.edu
> 
> 

-- 
   Jesse Thompson
   Division of Information Technology, University of Wisconsin-Madison
   Email/IM: jesse.thompson at doit.wisc.edu

