[ejabberd] strange registration error

Konstantin Khomoutov flatworm at users.sourceforge.net
Thu Nov 12 17:02:39 MSK 2009


On Thu, 12 Nov 2009 11:14:33 +0100
Peter Viskup <skupko.sk at gmail.com> wrote:

> >> A solution is to use a pseudo-random value restricted to a small
> >> number of values that repeat periodically.
> >> Quick example in Bash:
> >>  MINUTE=$(date +%M)
> >>  SECOND=$(date +%S)
> >>  # 10# forces base 10; otherwise 08 and 09 are rejected as invalid octal
> >>  PNUM=$(( 10#$MINUTE + 10#$SECOND ))
> >>  erl -sname ctl-$PNUM-ejabberd@localhost ...
> >>
> >> I suspect this solution is not yet good enough for inclusion in mainline
> >> ejabberd, so suggestions are welcome.
> >>
> > Just copy the solution from Debian -- it uses nanoseconds to greatly lower
> > the chances of a race condition.
> You are right: I am running the Debian version of ejabberd and I have never
> seen such an error message.
> The Debian way to build the suffix:
> $(date +%s%N)
> 
> Anyway, I am using the ejabberdctl script for Munin-based statistics, and after
> 71 days of ejabberd uptime I have 601648 ejabberdctl records.
> 
> jabber:/var/log/ejabberd# echo "ejabberdctl1257569451228580106@localhost" | wc -c
> 41
> jabber:/var/log/ejabberd# zgrep -c ejabberdctl erl_crash.dump.gz
> 601648
> 
> If I calculated that correctly, it is about 23 MB (the result below is in KiB):
> jabber:/var/log/ejabberd# echo $((41*601648/1024))
> 24089
[...]

Thanks for pointing this out.
We discussed this with Badlop (possibly about precisely your case)
and came up with two possible solutions:

* Add a certain degree of "node identity management", that is:
  1) ejabberdctl queries epmd for node names matching "ejabberdctl*".
  2) if this instance of ejabberdctl does not detect the presence
     of concurrent instances, it first checks whether a special
     file with "used ejabberdctl node names" exists.
     a) if it does not exist, ejabberdctl generates a pseudo-random
        node name, appends it to that file and proceeds using that
        node name.
     b) if the file does exist, ejabberdctl picks any node name
        from it and proceeds.
  3) if this instance of ejabberdctl detects the presence of
     concurrent instances, it moves to step (2.a).
  This way, at most N node names will be generated
  and used, where N is the maximum number of ejabberdctl
  instances that happened to run at the same time.
  I should note that checking for concurrent instances by querying
  epmd is not the best way to go, as it is still open to race conditions,
  so it requires further thinking.

* Move away completely from using Erlang node-level IPC and instead
  make ejabberd maintain some OS-level IPC channel, for instance
  a TCP or Unix-domain socket, or a named pipe.
  ejabberdctl could then just connect to this command interface and
  send commands to it. With this approach, ejabberdctl can be freed
  from Erlang node-name issues entirely by not implementing it
  in Erlang at all.
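The first option above (steps 1 to 3) could be sketched in Bash roughly as follows. This is a sketch under assumptions, not ejabberd code: the pool file path, the `ejabberdctl` name prefix and the function names are hypothetical, and the nanosecond suffix relies on GNU date. It deviates slightly from step 3: before minting a new name, it reuses any pooled name that no running instance currently holds. Without that, every concurrent run would add a fresh name and the pool would not stay bounded by N. The epmd check itself stays racy, as noted above.

```bash
#!/bin/bash
# Hypothetical pool of previously used node names (not a real ejabberd path).
POOL_FILE=${POOL_FILE:-/var/lib/ejabberd/ctl-node-names}

# Step 1: node names registered with epmd that look like ejabberdctl ones.
# `epmd -names` prints lines such as: name ejabberdctl42 at port 40123
running_ctl_names() {
    epmd -names 2>/dev/null |
        awk '$1 == "name" && $2 ~ /^ejabberdctl/ { print $2 }'
}

# Steps 2 and 3: choose a node name, given the running names in $1.
pick_ctl_name() {
    local running="$1" name
    if [ -f "$POOL_FILE" ]; then
        # 2b (and refined 3): reuse the first pooled name nobody holds now.
        while IFS= read -r name; do
            if ! grep -qxF -- "$name" <<< "$running"; then
                echo "$name"
                return 0
            fi
        done < "$POOL_FILE"
    fi
    # 2a: no pool yet, or every pooled name is busy: mint and record one.
    name="ejabberdctl$(date +%s%N)"
    echo "$name" >> "$POOL_FILE"
    echo "$name"
}
```

A wrapper would call `NODE=$(pick_ctl_name "$(running_ctl_names)")` and start its control node under that name; between runs the pool only grows when instances actually overlap.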
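The second option can be illustrated with a toy Bash sketch using a named pipe, one of the channels listed above. Nothing here reflects an actual ejabberd interface: the request format `<reply-fifo> <command>`, the commands and the function names are invented for illustration, and opening the FIFO read-write relies on Linux behaviour. The point is that each client identifies itself only through a private reply FIFO that it deletes afterwards, so repeated invocations leave nothing behind, unlike the hundreds of thousands of distinct node names counted in the quoted message.

```bash
#!/bin/bash
# Server side: hypothetical stand-in for a daemon-side command listener.
# Each request line is "<reply-fifo> <command>".
serve_commands() {
    local ctl_fifo="$1" reply cmd
    # Opened read-write so the loop never sees EOF between clients
    # (works on Linux; POSIX leaves O_RDWR on a FIFO unspecified).
    while read -r reply cmd; do
        case "$cmd" in
            status) echo "running" > "$reply" ;;
            stop)   echo "stopping" > "$reply"; return 0 ;;
            *)      echo "unknown command: $cmd" > "$reply" ;;
        esac
    done <> "$ctl_fifo"
}

# Client side: a private reply FIFO per invocation, removed afterwards.
ctl() {
    local ctl_fifo="$1" cmd="$2" dir
    dir=$(mktemp -d)
    mkfifo "$dir/reply"
    # A short single write to a FIFO (<= PIPE_BUF bytes) is atomic,
    # so requests from concurrent clients should not interleave.
    echo "$dir/reply $cmd" > "$ctl_fifo"
    cat "$dir/reply"
    rm -rf "$dir"
}
```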

Anyway, your solution seems like a good palliative for this problem.
I'm now pondering the idea of including a special option for
ejabberdctl in Debian, say "--concurrent", which would explicitly
enable the generation of pseudo-random node names.
Then, in normal "manual" usage (or any other usage where it's
guaranteed that only one ejabberdctl instance is active at any
given time), a fixed node name will be used, and with that
option provided, a pseudo-random node name will be used.
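A minimal sketch of how the proposed switch might select the node name, assuming bash and GNU date. The option name follows the proposal above; the fixed name `ejabberdctl@localhost` and the function `choose_node` are hypothetical, not taken from any shipped ejabberdctl.

```bash
#!/bin/bash
# Hypothetical fixed node name for manual, one-at-a-time use.
FIXED_NODE="ejabberdctl@localhost"

# Print the node name to use, given the script's arguments.
choose_node() {
    if [ "${1-}" = "--concurrent" ]; then
        # Debian-style unique suffix: epoch seconds + nanoseconds (GNU date).
        echo "ejabberdctl$(date +%s%N)@localhost"
    else
        # Only one instance is expected: always reuse the same name.
        echo "$FIXED_NODE"
    fi
}
```

A wrapper would then run `NODE=$(choose_node "$@")`, drop the switch with `shift` when it was given, and start its control node with `erl -sname "$NODE" ...`. Keeping the fixed name as the default means the unique-name growth only happens for callers such as Munin that explicitly ask for it.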

