[ejabberd] Losing messages to dead connections

Raoul Duke rduke496 at gmail.com
Sat Mar 22 22:32:44 MSK 2014


Hi list,

I am having the same problem as outlined here:

http://stackoverflow.com/questions/17424254/ejabberd-online-status-when-user-loses-connection

I will quote the scenario outlined in the above post for convenience:

<quote>

1] User A is messaging User B via their mobiles.
2] User B loses all connectivity, so client can't disconnect from server.
3] ejabberd still lists User B as online.
4] Since ejabberd assumes User B is still online, any message from
User A gets passed on to the dead connection.
5] So user B won't get the message, nor does it get saved as an
offline message, as ejabberd assumes the user is online.
6] Message lost.

Until ejabberd realises that the connection is stale, it treats it as
an online user.

</quote>

Before I move on I have a question: why does ejabberd not notice that
the send to the "dead connection" failed?  That is, if the other end is
gone/dead and therefore not ACK-ing the TCP send from ejabberd, then
why doesn't ejabberd notice this and deem the message undelivered?
I guess this is some well-known issue with TCP, but it would help me
visualize it if someone could explain it a bit.  Is this perhaps
related to terminating proxies/firewalls between ejabberd and the
user?
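
To make the question concrete, here is a minimal Python sketch (not
ejabberd code; the function name is made up for illustration).  It
assumes loopback TCP is available and only shows one piece of the
puzzle: a successful send() means the bytes were handed to the local
kernel, not that the application at the other end ever read them.  On
a genuinely dead link the kernel would keep retransmitting, and it can
take a long time (possibly minutes) before an error is reported.

```python
import socket


def send_to_silent_peer(payload: bytes) -> int:
    """Send to a peer whose application never accepts or reads the data."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # any free loopback port
    listener.listen(1)
    # The kernel completes the handshake even though accept() is never called.
    sender = socket.create_connection(listener.getsockname())
    try:
        # Returns once the bytes are in the local send buffer; nothing here
        # proves that the remote application has seen them.
        return sender.send(payload)
    finally:
        sender.close()
        listener.close()


stanza = b"<message/>"
sent = send_to_silent_peer(stanza)
print("send() reported", sent, "bytes sent, yet nobody read them")
```

If this picture is right, the sending side has no per-message signal
to act on until the kernel gives up on the connection as a whole.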

BTW, I realize that XEP-0198 would probably be a much better solution
to this and I plan to test the new patch, but in the meantime I am
trying to mitigate this with mod_ping.

The issue with mod_ping however, as the above post also points out, is
that the 32 second timeout for receiving pongs is quite long in the
context of this message black-holing problem.  I would like to narrow
the window in which messages can be black-holed.  Am I correct in
assuming that this is the best I can do (in lieu of something like
XEP-0198)?
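
A rough way to put numbers on that window, as a Python sketch.  The
function name is hypothetical, and two things are assumptions rather
than facts from this thread: that mod_ping's ping_interval is 60
seconds (I believe that is the default) and that the session is closed
as soon as the pong times out.

```python
def worst_case_blackhole_window(ping_interval_s: float,
                                pong_timeout_s: float) -> float:
    """Rough upper bound (seconds) on how long a dead session looks online."""
    # Worst case: the link dies right after the client's last sign of life,
    # so we wait a full interval for the next ping and then the full timeout.
    return ping_interval_s + pong_timeout_s


default_window = worst_case_blackhole_window(60, 32)  # assumed interval, stock timeout
patched_window = worst_case_blackhole_window(60, 5)   # same interval, 5 s timeout
print(default_window, patched_window)
```

Under these assumptions the ping interval contributes more to the
window than the pong timeout does, so both values matter.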

I had a look at the ejabberd sources and it looked to me as if the main
difficulty in changing this timeout for mod_ping pongs is that the
same value seems to be relied on implicitly by many other things in
ejabberd (things unrelated to mod_ping).

Therefore I have patched mod_ping so that I can override the 32 second
timeout for mod_ping pongs only (and set this via configuration). My
patch revolves around changing the call to ejabberd_local:route_iq to
pass the optional extra timeout argument, allowing me to set a lower
timeout for mod_ping only (without affecting other things that call
route_iq).

Do you anticipate any problems with this approach?  Can someone please
outline the thinking behind the 32 seconds and why it is not
configurable?  Am I setting myself up for some sort of fall by
lowering it in mod_ping?

What would you recommend as a minimum value for this timeout?  Would a
value of (say) 5 seconds be reasonable?

BTW - the above Stack Overflow post also has a reply which outlines a
rather interesting idea for an ejabberd add-on as follows:

"I created a mod and hooked up to the send_packet and receive_packet
events. Save the message ID to a table. Start a 10 sec waiting thread.
If the receive_packet hook gets the message ID back under 10 sec I
kill the thread, else I manually store the message in the offline
table. Worst case now is, I might have the msg twice in the offline
table. But it will have the same ID, our clients know not to duplicate
messages. - Johan Vorster Dec 12 '13 at 16:49"

This sounds like a bit of a kludge but it also sounds like it may be
an effective one.  Does this sound viable?  Does anyone know of, or
have, a mod already implemented like this?  It isn't clear to me how I
could interrogate the message ID in the user_receive_packet hook; can
someone send me a pointer/example of this?
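
To check my understanding of the idea, here is a language-neutral
sketch in Python rather than a real ejabberd module.  All names
(AckTracker, on_send, on_receive, expire, message_id) are made up for
illustration, the 10 second figure is taken from the quoted reply, and
the clock is passed in explicitly instead of starting a waiting thread
per message.  It also assumes the message ID is simply the "id"
attribute on the <message/> stanza.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


def message_id(stanza_xml: str):
    """Return the 'id' attribute of a stanza, or None if it has none."""
    return ET.fromstring(stanza_xml).get("id")


@dataclass
class AckTracker:
    ack_timeout_s: float = 10.0
    pending: dict = field(default_factory=dict)  # id -> (deadline, stanza)
    offline: list = field(default_factory=list)  # stand-in for the offline table

    def on_send(self, stanza_xml: str, now: float) -> None:
        """Called where the send hook would fire: remember the message."""
        self.pending[message_id(stanza_xml)] = (now + self.ack_timeout_s, stanza_xml)

    def on_receive(self, msg_id: str) -> None:
        """Called when the client echoes the ID back: nothing more to do."""
        self.pending.pop(msg_id, None)

    def expire(self, now: float) -> list:
        """Move every message whose deadline has passed to offline storage."""
        overdue = [m for m, (deadline, _) in self.pending.items() if deadline <= now]
        for m in overdue:
            _, stanza_xml = self.pending.pop(m)
            self.offline.append((m, stanza_xml))
        return overdue


tracker = AckTracker()
first = "<message id='m1' to='b@example.com'><body>hi</body></message>"
second = "<message id='m2' to='b@example.com'><body>still there?</body></message>"
tracker.on_send(first, now=0.0)
tracker.on_send(second, now=0.0)
tracker.on_receive("m1")            # m1 was acknowledged in time
expired = tracker.expire(now=10.0)  # m2 never was, so it goes offline
print(expired, len(tracker.offline))
```

As the quoted reply notes, a message can end up both delivered and
stored offline if the acknowledgement is merely late, so the clients
would have to drop duplicates by ID.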

Any pointers/suggestions appreciated.

Thanks.

