hi there,<br><br>just stumbled into a problem with 'ejabberdctl restore' that we are hoping somebody can give us a hand with. what we want to do is rename our ejabberd@localhost node to <a href="mailto:ejabberd@master.xmpp.example.net" target="_blank">ejabberd@master.xmpp.example.net</a><br>
<br>so following the instructions in the guide, we did:<br><br>$ ejabberdctl --node ejabberd@localhost start<br>$ ejabberdctl --node ejabberd@localhost status<br>The node ejabberd@localhost is started with status: started<br>
ejabberd 2.1.0 is running in that node<br><br>doing backup takes about 5 minutes and creates a 522MB file:<br><br>$ time ejabberdctl --node ejabberd@localhost backup /tmp/node.localhost<br><br>real 4m40.485s<br>user 0m0.187s<br>
sys 0m0.091s<br><br>$ ls -l /tmp/node.localhost <br>-rw-r--r-- 1 jkb wheel 522707410 Dec 22 00:59 /tmp/node.localhost<br><br>$ ejabberdctl --node ejabberd@localhost stop<br><br>so far so good. following the guide, we now move the old DCD/DAT/DCL files out of the way and start the cluster with the new node name:<br>
<br>$ ejabberdctl start<br>$ ejabberdctl status<br>The node '<a href="mailto:ejabberd@master.xmpp.example.net" target="_blank">ejabberd@master.xmpp.example.net</a>' is started with status: started<br>ejabberd 2.1.0 is running in that node<br>
<br>everything still looks good. time to do mnesia_change_nodename:<br><br>$ time ejabberdctl mnesia_change_nodename ejabberd@localhost <a href="mailto:ejabberd@master.xmpp.example.net" target="_blank">ejabberd@master.xmpp.example.net</a> /tmp/node.localhost /tmp/<a href="http://node.master.xmpp.example.net" target="_blank">node.master.xmpp.example.net</a><br>
<br>mnesia_change_nodename goes through successfully:<br><br>[...]<br> * Checking table: 'last_activity'<br> + Checking key: 'ram_copies'<br> + Checking key: 'disc_copies'<br> - Replacing nodename: 'ejabberd@localhost' with: ''<a href="mailto:ejabberd@master.xmpp.example.net" target="_blank">ejabberd@master.xmpp.example.net</a>''<br>
+ Checking key: 'disc_only_copies'<br>[...]<br>switched<br><br>real 0m31.713s<br><br>so next thing we do is 'ejabberdctl restore' and this is where everything breaks:<br><br>$ time ejabberdctl restore /tmp/<a href="http://node.master.xmpp.example.net" target="_blank">node.master.xmpp.example.net</a><br>
Failed RPC connection to the node '<a href="mailto:ejabberd@master.xmpp.example.net" target="_blank">ejabberd@master.xmpp.example.net</a>': nodedown<br><br>=ERROR REPORT==== 22-Dec-2009::01:07:52 ===<br>** Node '<a href="mailto:ejabberd@master.xmpp.example.net" target="_blank">ejabberd@master.xmpp.example.net</a>' not responding **<br>
** Removing (timedout) connection **<br><br>real 1m58.774s<br><br>what actually happens is that beam eats up all available RAM (7GB), then all available swap (2GB), and gets killed by the OS.<br><br>my guess is that this happens because ejabberd/erlang/mnesia tries to load everything into memory before writing it into the DCD/DAT/DCL files, correct? is there any way to modify this behavior or work around it somehow?<br>
<br>[... 5 minutes of trying various things like mnesia:restore(...), google searches, etc...]<br><br>AHA! there is an install_fallback command, and the guide says:<br><br><dl><dt><b><tt>install_fallback ejabberd.backup</tt></b></dt>
<dd>
The binary backup file is installed as fallback:
it will be used to restore the database at the next ejabberd start.
Similar to <tt>restore</tt>, but requires less memory.
</dd><dt><br></dt></dl>perfect -- just tried it and seems to have worked, except for this scary core:<br><br>=ERROR REPORT==== 2009-12-22 01:25:44 ===<br>Mnesia('<a href="mailto:ejabberd@master.xmpp.example.net" target="_blank">ejabberd@master.xmpp.example.net</a>'): ** ERROR ** (ignoring core) ** FATAL ** A fallback is installed and Mnesia must be restarted. Forcing shutdown after mnesia_down from '<a href="mailto:ejabberd@master.xmpp.example.net" target="_blank">ejabberd@master.xmpp.example.net</a>'...<br>
<br>[this fatal error comes with either the 'ejabberdctl restart' or the 'ejabberdctl stop' command after the install_fallback command -- is this scary fatal error expected?]<br>
<br>i guess the one real question i have is: why does 'restore' not act like 'install_fallback' when it comes to memory consumption? and more importantly: maybe it makes sense to modify the guide to recommend that people use install_fallback when doing cluster renames in production. <br>
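<br>for reference, the sequence that ended up working for us can be sketched as a small shell script. this is only a sketch: the node names and backup paths are the ones from this mail, the run() helper and the DRY_RUN variable are made up for illustration, and it assumes the default node name used by ejabberdctl has already been switched to the new one. by default it only prints the commands instead of running them.<br>

```shell
#!/bin/sh
# sketch of the node rename using install_fallback instead of restore.
# node names and paths are taken from this mail; run() and DRY_RUN are
# hypothetical helpers, not part of ejabberd.
OLD_NODE="ejabberd@localhost"
NEW_NODE="ejabberd@master.xmpp.example.net"
OLD_BACKUP="/tmp/node.localhost"
NEW_BACKUP="/tmp/node.master.xmpp.example.net"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@" || exit 1    # stop at the first failing step
    fi
}

rename_steps() {
    # 1. back up the database while the old node is still running, then stop it
    run ejabberdctl --node "$OLD_NODE" backup "$OLD_BACKUP"
    run ejabberdctl --node "$OLD_NODE" stop
    # 2. move the old DCD/DAT/DCL files out of the way here
    #    (their location depends on the install, so it is not scripted)
    # 3. start the node under its new name (assumed to be the default now)
    run ejabberdctl start
    # 4. rewrite the node name inside the backup file
    run ejabberdctl mnesia_change_nodename "$OLD_NODE" "$NEW_NODE" "$OLD_BACKUP" "$NEW_BACKUP"
    # 5. install the converted backup as fallback, then restart to load it
    run ejabberdctl install_fallback "$NEW_BACKUP"
    run ejabberdctl restart
}

rename_steps
```

running it as is just lists the six ejabberdctl commands in order; setting DRY_RUN=0 would execute them, stopping at the first failure.<br>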
<br>-- yan<br><br>