Thu Oct 22 01:33: busy-beaver

b-b spontaneously rebooted. The best explanation I can come up with, and it's a bad one, is that I mistyped while issuing sysrqs and hit sysrq-b, which is reboot. But I don't buy that, because I didn't see any console output from it, and it happened while I was not typing. There's nothing useful in any of the logs (they all show a boot, but no cause for the shutdown/reboot).

#

Thu Oct 22 19:14: busy-beaver

AFS was wedged in a really interesting way. While load was over 2000 we were able to get a shell. nslcd was kill -9'ed and then restarted; kill -9'ing the shell unwedged the console. There were 1540 processes from failed Nagios probes. We removed the machine from the pool by shutting down postfix. /proc/meminfo reports 1.5/4GB physical and 5.5/8GB swap free.

Processes (ns-slapd, httpd.worker, fs, check_disk) were mostly stuck in traces like:

    ns-slapd      D 0000000000000002     0  1534      1
     ffff8800b4c53de8 0000000000000286 0000000082dcc877 0000000082dcc877
     ffff8800b4c53da0 ffffffff8100ee82 0000000000000206 0000000082dcc877
     ffff8800b4d03248 000000000000e2e8 ffff8800b4d03248 0000000000012d00
    Call Trace:
     [] ? check_events+0x12/0x20
     [] ? xen_mc_issue.clone.0+0x34/0x4d
     [] ? xen_write_cr0+0x3f/0x46
     [] schedule+0x21/0x49
     [] __down_read+0xa9/0xd5
     [] ? finish_task_switch+0x6c/0xfb
     [] down_read+0x3e/0x59
     [] sys_madvise+0x88/0x510
     [] ? trace_hardirqs_off_thunk+0x3a/0x6c
     [] ? audit_syscall_entry+0x12d/0x16d
     [] ? trace_hardirqs_on_thunk+0x3a/0x3c
     [] system_call_fastpath+0x16/0x1b

No BUGs or OOPSes were in dmesg output.

/proc/1534/syscall was:

    28 0x7f589cc16000 0x289000 0x4 0x2a9450 0x1000 0x8 0x7f58a1e1ad38 0x39330dae77

So this is madvise((void *)0x7f589cc16000, (size_t)0x289000, MADV_DONTNEED): syscall 28 is madvise on x86_64, the length is 0x289 pages (649 decimal, at 4 KiB per page), and advice 0x4 is MADV_DONTNEED. From madvise(2):

MADV_DONTNEED
Do not expect access in the near future. (For the time being, the application is finished with the given range, so the kernel can free resources associated with it.)
Subsequent accesses of pages in this range will succeed, but will result either in reloading of the memory contents from the underlying mapped file (see mmap(2)) or zero-fill-on-demand pages for mappings without an underlying file.

We should set up remote syslog.

Can we decide that the syscall itself is irrelevant and the presence of schedule() etc. means that something else strange happened? That doesn't seem unreasonable to me, but you're far more knowledgeable about the kernel than I am.
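
For next time, here is a rough sketch of decoding a /proc/PID/syscall line mechanically instead of by hand, as was done for pid 1534 above. It assumes x86_64 Linux with 4 KiB pages and the field layout described in proc(5) (syscall number, six argument registers, stack pointer, program counter). The function name decode_syscall_line and the two lookup tables are made up for this note and only cover what we hit here; it is not a general decoder.

```python
# Sketch: decode one line of /proc/<pid>/syscall (x86_64 Linux assumed).
# Layout per proc(5): syscall number, six argument registers,
# then stack pointer and program counter.

PAGE_SIZE = 4096  # assumption: 4 KiB pages

# Hypothetical lookup tables holding only the entries needed for this incident.
SYSCALL_NAMES = {28: "madvise"}
MADVISE_ADVICE = {
    0: "MADV_NORMAL",
    1: "MADV_RANDOM",
    2: "MADV_SEQUENTIAL",
    3: "MADV_WILLNEED",
    4: "MADV_DONTNEED",
}


def decode_syscall_line(line):
    """Return a dict describing the line, or None if the task is running."""
    fields = line.split()
    if not fields or fields[0] == "running":
        return None

    nr = int(fields[0])  # the syscall number is printed in decimal
    if nr == -1:
        # Blocked, but not inside a syscall: only SP and PC follow.
        return {"nr": -1, "name": None,
                "sp": int(fields[1], 16), "pc": int(fields[2], 16)}

    args = [int(f, 16) for f in fields[1:7]]
    info = {
        "nr": nr,
        "name": SYSCALL_NAMES.get(nr, "unknown"),
        "args": args,
        "sp": int(fields[7], 16),
        "pc": int(fields[8], 16),
    }

    if info["name"] == "madvise":
        # madvise(addr, length, advice) uses only the first three registers;
        # the remaining three hold stale values.
        addr, length, advice = args[:3]
        info["addr"] = addr
        info["length_pages"] = length // PAGE_SIZE
        info["advice"] = MADVISE_ADVICE.get(advice, hex(advice))
    return info


if __name__ == "__main__":
    line = ("28 0x7f589cc16000 0x289000 0x4 0x2a9450 0x1000 0x8 "
            "0x7f58a1e1ad38 0x39330dae77")
    d = decode_syscall_line(line)
    print(d["name"], hex(d["addr"]), d["length_pages"], "pages", d["advice"])
```

Fed the line from pid 1534, this reports a 649-page (0x289) madvise with MADV_DONTNEED, which matches the manual reading. Run across every D-state pid, it would show whether they are all blocked in the same syscall or only share the down_read() wait.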