[ejabberd] Cluster check problem

Zbyszek Żółkiewski zbyszek at toliman.pl
Mon May 15 03:31:34 MSD 2006


There is a problem with clustered setups. I did not file a bug report 
for it yet - I think we need to discuss how to resolve it first.
Problem: If you are running a cluster of 2 or more ejabberd nodes and 
there is a physical (or other) failure of the network that connects the 
nodes (like an internal network that interconnects them), each node acts 
as if the other node had failed. But that node has not failed - this 
leads to a very strange situation. For example: a cluster setup with 2 
nodes, A and B. A and B are connected via a private network, and both 
have independent internet access. Now when the private network fails, 
node A "thinks" that node B has failed, and A removes B from the 
cluster. The same thing happens on B. Now users can still connect to 
both nodes, but the nodes are not in "sync". That can result in strange 
situations like s2s issues, or 2 users connected to different nodes not 
seeing each other... Now when the private network starts working again, 
the nodes will not "see" each other (you need to restart one of the 
nodes) - I think that must be fixed (maybe by checking at some interval 
whether a failed node has come back to life?).
So there are 2 problems:
1) interconnection checks
2) nodes that keep working, but do not exchange information with each other (?)
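
To make the "check intervals" idea a bit more concrete, here is a rough 
sketch in Python (not Erlang, and not how ejabberd actually works). The 
names PeerMonitor, probe and check_once are made up for illustration. 
The probe callback stands in for whatever reachability test a node would 
really use; in Erlang that might be something like net_adm:ping/1, but 
that is only an assumption. The point is that a peer dropped after a 
failure stays on the list of configured peers and is probed again on 
every round, so it is re-added once the link is back.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


@dataclass
class PeerMonitor:
    """Hypothetical helper: tracks reachable peers and re-probes lost ones."""

    peers: Set[str]                      # all configured cluster peers
    probe: Callable[[str], bool]         # assumed reachability test
    connected: Set[str] = field(default_factory=set)

    def check_once(self) -> Tuple[List[str], List[str]]:
        """Probe every configured peer once; return (rejoined, lost)."""
        rejoined: List[str] = []
        lost: List[str] = []
        for peer in sorted(self.peers):
            alive = self.probe(peer)
            if alive and peer not in self.connected:
                # Peer is (again) reachable: put it back into the cluster view.
                self.connected.add(peer)
                rejoined.append(peer)
            elif not alive and peer in self.connected:
                # Peer stopped answering: drop it, but keep probing it later.
                self.connected.discard(peer)
                lost.append(peer)
        return rejoined, lost


if __name__ == "__main__":
    # Node A's view of node B; `link` simulates the private network.
    link = {"up": True}
    monitor = PeerMonitor(peers={"B"}, probe=lambda peer: link["up"])

    print(monitor.check_once())   # B is reachable and joins
    link["up"] = False
    print(monitor.check_once())   # private network fails, B is dropped
    link["up"] = True
    print(monitor.check_once())   # network is back, B rejoins without restart
```

In a real node, check_once would be called from a timer at some fixed 
interval. This only covers problem 1; getting the two nodes back in 
"sync" after they rejoin (problem 2) is a separate question.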


Zbyszek Żółkiewski

