There are two reasons to reboot a mud: a code upgrade and a crash.
A code upgrade is easier to handle... you know that all data structures are (or should be) consistent, so it's a simple matter of stowing your current data structures somewhere safe, making sure every descriptor you want to keep stays open across exec (i.e. its close-on-exec flag, FD_CLOEXEC, is cleared), and exec'ing your mud binary with a flag telling it that this is a reboot.
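A minimal sketch of the exec side, assuming the listening socket is the only descriptor you carry across and that the new binary accepts a hypothetical `--reboot <fd>` argument pair (the function names are made up for illustration). Player sockets would be handled the same way; how you stow the rest of your state is up to you.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Clear FD_CLOEXEC so fd stays open across execv().
   Returns 0 on success, -1 on error. */
int keep_across_exec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) == -1 ? -1 : 0;
}

/* Re-exec the (possibly upgraded) binary, passing the listening
   socket's number after the hypothetical --reboot flag.
   Only returns if something went wrong. */
int reboot_for_upgrade(const char *exe_path, int listen_fd)
{
    char fdbuf[16];
    char *argv[4];

    if (keep_across_exec(listen_fd) == -1)
        return -1;
    snprintf(fdbuf, sizeof fdbuf, "%d", listen_fd);
    argv[0] = (char *)exe_path;
    argv[1] = "--reboot";
    argv[2] = fdbuf;
    argv[3] = NULL;
    execv(exe_path, argv);
    return -1; /* execv() only returns on failure */
}

/* On startup: the inherited listening fd if this is a reboot,
   or -1 for a normal cold boot. */
int reboot_fd_from_args(int argc, char **argv)
{
    if (argc >= 3 && strcmp(argv[1], "--reboot") == 0)
        return atoi(argv[2]);
    return -1;
}
```

On a cold boot `reboot_fd_from_args()` returns -1 and the mud opens its port as usual; on a reboot it picks up the already-listening socket, so players never see the port close.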
A crash is more troublesome... some things are going to get lost. A crash is almost always caused by a segmentation fault... something scribbling on memory it doesn't own. There are two levels of severity: NULL dereference and wild pointer/overflow.
A NULL dereference is relatively safe, since it usually just means you forgot to check a pointer before you dereferenced it. It's not usually a sign of a catastrophic failure.
A wild pointer or buffer overflow is more troublesome... you have no idea which structures it scribbled on, so everything in memory is suspect. Toss it all.
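Telling the two apart in the signal handler comes down to the faulting address the kernel reports in `si_addr`. A NULL dereference through a struct (`p->field` with `p == NULL`) faults at the field's offset rather than at exactly 0, so one possible refinement is to treat any address in the first page as a NULL dereference. A hedged sketch: the first-page cutoff is an assumption (a large struct or array index through NULL can land beyond it, and a wild pointer can land inside it), and the enum values are made up. The page size is passed in because it should be looked up once at startup; `sysconf()` isn't on POSIX's list of async-signal-safe functions.

```c
#include <stddef.h>
#include <stdint.h>

enum failure_level { NOFAILURE = 0, SEVERE, CRITICAL };

/* Guess how bad a SIGSEGV is from the faulting address:
   inside the first page -> probably a NULL dereference (SEVERE),
   anywhere else        -> assume a wild pointer/overflow (CRITICAL). */
enum failure_level classify_fault(const void *addr, uintptr_t page_size)
{
    if ((uintptr_t)addr < page_size)
        return SEVERE;
    return CRITICAL;
}
```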
Here's the basic gameplan:
Global variable 'failure', normally set to 0 (NOFAILURE). As the last
*_init() call, call recovery_init(), which sets failure to 0 and
installs the signal handlers. Here's a sketch:
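#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* The mud's own routines (signatures assumed). */
void db_init(void);
void db_commit(int mode);        /* SYNC: doesn't return until it's finished */
void recovery_restart(void);     /* restarts the mud; never returns */
void game_loop(void);
#define SYNC 1

enum { NOFAILURE = 0, SEVERE, CRITICAL };

struct recovery_info {
    volatile sig_atomic_t failure;
    sigjmp_buf longjump_buf;
};
static struct recovery_info *recovery;  /* REQUIRED! IT MUST BE ON ITS OWN PAGE! */
static long page_size;

static void recovery_sighandler(int sig, siginfo_t *si, void *ctx)
{
    if (recovery->failure != NOFAILURE)
        _exit(1);  /* crashed again while recovering: don't try again */
    if (!si)
        _exit(1);  /* returning would just re-run the faulting instruction */
    /* The page is read-only so a wild pointer can't wipe it out; make it
       writable again to record the failure type. mprotect() returns 0
       on success. */
    if (mprotect(recovery, page_size, PROT_READ | PROT_WRITE) != 0)
        _exit(1);  /* unable to continue, lost our recovery information */
    recovery->failure = CRITICAL;
    if (si->si_signo == SIGSEGV && si->si_addr == NULL)
        recovery->failure = SEVERE;
    siglongjmp(recovery->longjump_buf, 1);
}

/* Last of the *_init() calls: set up the recovery page and handlers. */
static void recovery_init(void)
{
    struct sigaction sa;

    page_size = sysconf(_SC_PAGESIZE);
    /* malloc() can't guarantee a page of our own; mmap() can. */
    recovery = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (recovery == MAP_FAILED)
        exit(1);
    recovery->failure = NOFAILURE;

    sa.sa_sigaction = recovery_sighandler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;    /* we want the siginfo_t argument */
    sigaction(SIGSEGV, &sa, NULL);
    /* ... same for the other signals you care about (SIGBUS, ...) ... */
}

int main(void)
{
    db_init();
    /* ... */
    recovery_init();
    /* sigsetjmp(..., 1) also saves the signal mask, so SIGSEGV is
       unblocked again once we jump back out of the handler. */
    if (sigsetjmp(recovery->longjump_buf, 1)) {
        /* We came back, something went wrong */
        if (recovery->failure == CRITICAL)  /* give up and restart with what's been committed already */
            recovery_restart();
        if (recovery->failure == SEVERE) {  /* save some changes in presumably safe code, then restart */
            db_commit(SYNC);
            recovery_restart();
        }
        exit(1);  /* recovery type unknown... total reboot */
    }
    mprotect(recovery, page_size, PROT_READ);  /* wild pointers can't touch it now */
    game_loop();
    return 0;
}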
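If you want to convince yourself that the read-only page really protects the recovery data, here's a small standalone test (Linux-specific; the names are made up for the example). It puts a value on its own mmap()'d page, marks the page PROT_READ, makes a deliberate stray write, and checks that SIGSEGV fires with si_addr pointing at the page while the value survives.

```c
#define _DEFAULT_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf probe_jump;
static void *volatile fault_addr;

static void probe_handler(int sig, siginfo_t *si, void *ctx)
{
    (void)sig;
    (void)ctx;
    fault_addr = si->si_addr;     /* remember where the write hit */
    siglongjmp(probe_jump, 1);
}

/* Returns 1 if a stray write to a PROT_READ page was caught and the
   page kept its contents, 0 if not, -1 if setup failed. */
int stray_write_is_caught(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    struct sigaction sa, old;
    volatile int *page;
    int caught = 0;

    page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;
    page[0] = 42;                             /* stand-in for the recovery data */
    if (mprotect((void *)page, page_size, PROT_READ) != 0)
        return -1;

    sa.sa_sigaction = probe_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, &old);

    if (sigsetjmp(probe_jump, 1) == 0)
        page[0] = 0;                          /* simulated wild pointer */
    else
        caught = (fault_addr == (void *)page) && page[0] == 42;

    sigaction(SIGSEGV, &old, NULL);           /* restore the previous handler */
    munmap((void *)page, page_size);
    return caught;
}
```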
Now, assuming you were smart and saved things like socket descriptor structs
outside the game, on bootup with the --reboot flag you can do the following: