[ejabberd] add_ and delete_rosteritem throughput

Steven Lehrburger lehrburger at gmail.com
Thu May 16 20:50:15 MSK 2013


Hi David, thanks for the algorithm/questions. Answers below!

a.1) What are these roster pushes for?

I'm working on a chat product that feels more like real-life group social
situations (office, conference, party, coffee shop, etc). Users can see the
conversations their friends are having, but not what they're saying unless
they join that conversation. I've built it on top of XMPP so that it works
with existing clients, and this GIF shows the basics of what I've built so
far: https://dashdash.com/static/img/screencast.gif.

The roster shows each user the active conversations that s/he can see. I do
a roster push when a conversation starts, when a conversation ends, or when
the participants in a conversation change. That push needs to go to the
participants, as well as to every user who is a "friend" on the service of
any of the participants, so the number of recipients is potentially large.
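
To make the fan-out concrete, here is a minimal sketch of how the set of
push recipients could be computed. The function name and the shape of the
`friends` mapping are assumptions for illustration only, not the actual
code in my component.

```python
from typing import Dict, Iterable, Set


def push_recipients(participants: Iterable[str],
                    friends: Dict[str, Set[str]]) -> Set[str]:
    """Return every user who should receive a roster push.

    `friends` is a hypothetical mapping from a user to the set of that
    user's friends on the service.
    """
    participants = list(participants)
    # The participants themselves always get the push.
    recipients = set(participants)
    # So does anyone who is a friend of at least one participant.
    for user in participants:
        recipients |= friends.get(user, set())
    return recipients
```

With two participants who each have a few dozen friends, the result can
easily reach the 150 changes per batch mentioned in my first message.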

b.1) How are they grouped?

Each user has one roster group of contacts, and one roster group of
conversations. Contacts move from the former to the latter when a
conversation becomes active (via a roster push), and vice versa when the
conversation ends.

  b.3) Would something like http://www.ejabberd.im/shared-roster-all work?

I've considered shared roster groups, but not recently. As I understand it,
I'd need a shared roster group for every edge in the social graph + every
active conversation, and that might not even work, since I need to set and
change contact nicknames whenever the conversation state changes.

c.1) Is there any redundancy?

Sometimes there is redundancy: If a user joins a conversation, it will fire
off one set of changes, but if a second user joins immediately afterward,
then it will fire off a second set of changes that will overwrite the
first. Because of this, my previous idea of using a standard queue might
not be the best one - it would be much better to use a data structure that
let me modify messages that had been queued but not yet processed (perhaps
just a MySQL table).
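
As a rough in-memory sketch of that idea (a MySQL table keyed the same way
would behave similarly), pending changes can be keyed by (user, contact) so
that a newer change replaces an older one that has not been processed yet.
The class and its method names are hypothetical.

```python
from collections import OrderedDict
from typing import Any, Optional, Tuple


class CoalescingBuffer:
    """Pending roster changes, at most one per (user, contact) pair."""

    def __init__(self) -> None:
        self._pending: "OrderedDict[Tuple[str, str], Any]" = OrderedDict()

    def put(self, user: str, contact: str, change: Any) -> None:
        # Assigning to an existing key overwrites the stale change but
        # keeps its original position in the queue.
        self._pending[(user, contact)] = change

    def pop(self) -> Optional[Tuple[str, str, Any]]:
        """Remove and return the oldest pending change, or None if empty."""
        if not self._pending:
            return None
        (user, contact), change = self._pending.popitem(last=False)
        return user, contact, change

    def __len__(self) -> int:
        return len(self._pending)
```

If two users join a conversation in quick succession, the second `put` for
a given (user, contact) pair replaces the first, so only the final state is
ever sent to ejabberd.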

e.1) Do online clients *need* to be notified about them immediately?

The sooner the better, especially for the participants of a conversation.
If a user joins a conversation and the roster takes 30 seconds to reflect
the change, then the service feels broken from a UX perspective. There's
slightly less urgency for users who can see, but are not in, the
conversation, but it should be prompt.
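
A minimal sketch of that ordering, assuming just two priority levels and
first-in-first-out order within each level. The names are made up for
illustration, and the queue readers that would actually call ejabberd are
left out.

```python
import heapq
import itertools
from typing import Any, List, Optional, Tuple

PARTICIPANT = 0  # most urgent: users who are in the conversation
OBSERVER = 1     # less urgent: friends who can only see it


class PushQueue:
    """Priority queue of roster changes, FIFO within each priority."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Any]] = []
        # A running counter breaks ties so equal priorities stay in order.
        self._counter = itertools.count()

    def put(self, priority: int, change: Any) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), change))

    def pop(self) -> Optional[Any]:
        """Return the most urgent pending change, or None if empty."""
        if not self._heap:
            return None
        _, _, change = heapq.heappop(self._heap)
        return change
```

Changes queued with `PARTICIPANT` always come out before any `OBSERVER`
change, no matter which was queued first.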

  e.3) Could you inject the changes directly into the sql db and then
       inject a roster push in sideways (or "if not online: inject into sql")

This is a good idea, but I haven't tried to use mod_roster_odbc yet. As of
four years ago, however, mod_admin_extra didn't support mod_roster_odbc:
http://www.ejabberd.im/node/3356#comment-53931 :-/

f.1) Do you have control of the client software?

No - it works with Adium/Pidgin/etc. One day I'll write my own clients, but
not yet :)

f.2) Do you need to support legacy versions of client software?

No.

f.3) Can legacy versions be allowed to have known regressions (like e.1)?

It depends on how much longer the roster pushes take, I guess. Worst case
is the user has to upgrade to a better client? But this hasn't been an
issue yet.

3) Where Could we be?

I could be modifying ejabberd itself instead of relying on a Python
component, but have only a superficial knowledge of Erlang.

4) Where Should we be?

I should probably write an XMPP server from scratch that has all of this
weird custom group logic built in, but I don't have enough users to justify
that amount of work yet.

5) How do we get there?

So, to get there, I need to make the current architecture work for as long
as possible :)

Thanks again,
Steven



On Thu, May 16, 2013 at 10:06 AM, David Laban <alsuren at gmail.com> wrote:

> Steven
>
> Thinking laterally, let's go back a couple of steps.
>
> My Dad's "Standard British Telecom Problem Solving Algorithm" has the
> following steps:
>
> 1) Where are we now?
> 2) Where did we come from?
> 3) Where Could we be?
> 4) Where Should we be?
> 5) How do we get there?
>
> Can you give me a few more details about steps 1 and 2? Specifically:
>
> a.1) What are these roster pushes for?
> b.1) How are they grouped?
>   b.3) Would something like http://www.ejabberd.im/shared-roster-all work?
> c.1) Is there any redundancy?
> e.1) Do online clients *need* to be notified about them immediately?
>   e.3) Could you inject the changes directly into the sql db and then
>        inject a roster push in sideways (or "if not online: inject into sql")
> f.1) Do you have control of the client software?
> f.2) Do you need to support legacy versions of client software?
> f.3) Can legacy versions be allowed to have known regressions (like e.1)?
>
> David.
>
> On Thursday 16 May 2013 08:48:09 Steven Lehrburger wrote:
> > To answer the last of my questions: I can just call
> > ejabberdctl's connected_users_info instead of user_sessions_info to
> > efficiently get the information I need to further-prioritize roster
> > changes.
> >
> > That said, I'm still curious about how I might increase the overall
> > add_rosteritem and delete_rosteritem throughput, and, if I can't increase
> > throughput, whether my priority queue workaround sounds reasonable, and,
> > if it is reasonable, how to figure out how many requests the queue readers
> > can make simultaneously.
> >
> > Thanks!
> >
> > /~s
> >
> > On Wed, May 15, 2013 at 1:37 PM, Steven Lehrburger <lehrburger at gmail.com> wrote:
> > > Hi,
> > >
> > > I've been using ejabberd along with an XMPP component in an application
> > > that requires regular changes to user rosters.
> > >
> > > I often need to make as many as 150 of these changes at a time, and this
> > > number will increase as the service grows. I've been doing this from my
> > > component via XML-RPC, but get "[Errno 104] Connection reset by peer" on
> > > some requests, while others take 20 seconds or more to return, even
> > > though the timeout is set to 5 seconds and maxsessions is set to
> > > infinity.
> > >
> > > I've just been retrying the requests that fail, but the slow requests
> > > cause user experience problems. Furthermore, some of the requests are
> > > more urgent than others, and if ejabberd is still busy churning through
> > > one batch of 150 when a second comes in, then things get really clogged
> > > up. I'm running ejabberd on an m1.small Amazon EC2 instance.
> > >
> > > Does anyone have any suggestions on how I might increase throughput?
> > >
> > > A few other thoughts/ideas/questions:
> > >
> > >
> > > 1) I considered switching from Mnesia to MySQL so that I could modify the
> > > rosters directly, and to generally simplify my application by using only
> > > one datastore. It doesn't sound like this will work for me, however,
> > > based on various forum posts and a conversation with Badlop last week:
> > > http://chatlogs.jabber.ru/ejabberd@conference.jabber.ru/2013/05/08.html.
> > >
> > > 2) The best workaround I've been able to come up with is to buffer the
> > > XML-RPC requests in a priority queue (or, if I use Amazon SQS, in
> > > separate low- and high-priority queues). The queue readers could make
> > > sure they performed the high-priority requests first, but it would still
> > > be preferable to somehow increase overall throughput. Also, how should I
> > > select a number of queue readers/concurrent requests so as to saturate,
> > > but not overwhelm, ejabberd?
> > >
> > > (I had hoped by putting the queue readers on the same machine as ejabberd
> > > I could further improve performance by using ejabberdctl, but
> > > http://lists.jabber.ru/pipermail/ejabberd/2012-August/007674.html says
> > > that XML-RPC is faster, and I confirmed this with my own tests.)
> > >
> > > 3) At any given time most of the users who require roster changes are not
> > > logged in. I could further prioritize my XML-RPC requests by first
> > > calling user_sessions_info, but I doubt that would be an overall
> > > performance gain. Is there a way for my component to directly
> > > read ejabberd's "Last Activity" information for a user? If this is stored
> > > in Mnesia, is there a Python library I could use to query this
> > > information from my component?
> > >
> > > Thanks!
> > >
> > > Best,
> > > Steven Lehrburger