When does the problem happen? - Xen guests on AMD hosts - Multiple virtual CPUs It does NOT happen when running a guest as uniprocessor - A few times a week What are the symptoms? - Hard wedge - Unresponsive console - sysrqs /can/ be issued - Emergency syncs fail Diagnostics: - We've seen several types of backtraces - getnstimeofday() : the "original" backtrace, seen in 2.6.30 and 2.6.29 - schedule also xen_clocksource/clocksource_read [kjournald] : the "new" backtrace, seen in 2.6.31 - schedule also elv_next_request/clocksource)read [kjournald] : the "jiffies" backtrace, seen in 2.6.31 Events: Tue Jul 14 00:31: pancake-bunny - no useful output from console - sysrqs working - no useful from active cpu backtraces - log level to 9 indicates getnstimeofday - echoing jiffies into /sys/devices/system/clocksource/clocksource0/current_clocksource causes freeze - emergency sync completed (before reboot) Tue Jul 14 12:15: busy-beaver - ssh opens connection then immediately hands - getnstimeofday on top of sysrq - we don't have /proc/sys/xen/independent_wallclock possibly related to hard hang from jiffies Tue Jul 14 12:26: real-mccoy - missing /real/ function due to optimization? @ config change: give Xen one vcpu to real-mccoy - 32-bit operation causing hangs on AMD because of glibc.i586 instead of i686? higher level, due to responsive kernel? - lock being held too long? Wed Jul 15 22:20: pancake-bunny - nstimeofday'd Thu Jul 16 00:21: real-mccoy - it was running with two processors (inconsistent?) - testing two theories? (mitchb) Thu Jul 16 02:28: real-mccoy - left alone for hours, so it got Nagios to get load and process reports but postfix flapped - no LDAP masters down during that time - lockup more likely to happen for SMP paravm on j-l - locked up to point where ssh doesn't accept connections, login prompt hung w/o password field Thu Jul 16 13:35: real-mccoy @ config change: changed to one processor - c-o exhibits this more than j-l. v-p is identical, j-c is similar. - suspect bug in F11 kernel - i586 glibc can't run 32-bit apps on AMD? Can't test, since users need 32-bit apps, and needs to be in pool Fri Jul 17 07:25: busy-beaver - n/c Fri Jul 24 13:46: pancake-bunny - n/c Sun Jul 26 06:52: busy-beaver? - not rebooted Sun Jul 26 09:44: cats-whiskers? - Intel host - not all in gettimensofday() (no followup) Wed Jul 29 09:02: pancake-bunny - n/c Wed Jul 29 10:22: busy-beaver - n/c Thu Jul 30 00:29: pancake-bunny @ config change: p-b and b-b to one cpu permanently - does f11 exhibit problems on bare metal? can't test because it means losing one/two machines while testing could need over a week before tickling bug Sun Aug 2 04:31: pancake-bunny @ config change: p-b and b-b to one cpu # Thu Oct 22 01:22: real-mccoy @ config change: r-m to two cpus Thu Oct 22 15:31: real-mccoy - Several processes are blocked in xen_clocksource_get_cycles Assertion in question, since these are not real frame pointers - Google seems to find reports of people observing this with Xen + 389-ds - https://bugzilla.redhat.com/show_bug.cgi?id=526627 (has patch) - As a workaround, the bug suggests booting with "clocksource=jiffies" on the kernel command line. Mitch tried this? Maybe it only works at boot. @ config change: clocksource-jiffies on real-mccoy at boot-time @ config change: turned off quiet boots - kjournald trace: kjournald D ffff8800a0ac9380 0 55 2 ffff88013d09fad0 0000000000000246 ffff88013d09fa30 ffffffff8100ed0d 0000000001080001 000000003324697e ffff88013d09fa50 ffffffff8107957a ffff88013da84998 000000000000e2e8 ffff88013da84998 0000000000012d00 Call Trace: [] ? xen_clocksource_get_cycles+0x1c/0x32 [] ? clocksource_read+0x22/0x38 [] ? check_events+0x12/0x20 [] ? ktime_get_ts+0x61/0x7d [] schedule+0x21/0x49 [] io_schedule+0x44/0x6c [] get_request_wait+0xfe/0x1a0 [] ? autoremove_wake_function+0x0/0x5f [] __make_request+0x35a/0x49e [] generic_make_request+0x254/0x2b2 [] submit_bio+0xe3/0x100 [] submit_bh+0xf1/0x12a [] journal_do_submit_data+0x4f/0x73 [] journal_commit_transaction+0x3b7/0xe42 [] ? xen_restore_fl_direct_end+0x0/0x1 [] ? _spin_unlock_irqrestore+0x4e/0x64 [] ? try_to_del_timer_sync+0x69/0x87 [] kjournald+0xfd/0x253 [] ? autoremove_wake_function+0x0/0x5f [] ? kjournald+0x0/0x253 [] kthread+0x6d/0xae [] child_rip+0xa/0x20 [] ? restore_args+0x0/0x30 [] ? child_rip+0x0/0x20 Fri Oct 23 00:05: real-mccoy - We got a different class of backtraces; things are still blocking in schedule() but with different fake frame pointers - Jiffies results in undesirable clock drift (since Xen can't deliver them consistently enough) @ Remove jiffies from kernel options, kept 2 vcpus. Plan is to build https://bugzilla.redhat.com/attachment.cgi?id=364778 patch into our kernel, and run again to see if we get this style backtrace, or the earlier style backtrace, or nstimeofday backtrace. - Backtrace: kjournald D ffffc20000000000 0 55 2 ffff88013d017ad0 0000000000000246 ffff88013d017a50 ffffffff8120eabd 00000000010c0001 00000000473dac89 ffff88013d017a50 ffffffff8107957a ffff88013d8ec998 000000000000e2e8 ffff88013d8ec998 0000000000012d00 Call Trace: [] ? elv_next_request+0x1a4/0x1ce [] ? clocksource_read+0x22/0x38 [] ? check_events+0x12/0x20 [] ? ktime_get_ts+0x61/0x7d [] schedule+0x21/0x49 [] io_schedule+0x44/0x6c [] get_request_wait+0xfe/0x1a0 [] ? autoremove_wake_function+0x0/0x5f [] __make_request+0x35a/0x49e [] generic_make_request+0x254/0x2b2 [] ? check_events+0x12/0x20 [] submit_bio+0xe3/0x100 [] submit_bh+0xf1/0x12a [] journal_do_submit_data+0x4f/0x73 [] journal_commit_transaction+0x3b7/0xe42 [] ? xen_restore_fl_direct_end+0x0/0x1 [] ? _spin_unlock_irqrestore+0x4e/0x64 [] ? try_to_del_timer_sync+0x69/0x87 [] kjournald+0xfd/0x253 [] ? autoremove_wake_function+0x0/0x5f [] ? kjournald+0x0/0x253 [] kthread+0x6d/0xae [] child_rip+0xa/0x20 [] ? restore_args+0x0/0x30 [] ? child_rip+0x0/0x20 - Diffed with the previous backtrace - [] ? xen_clocksource_get_cycles+0x1c/0x32 + [] ? elv_next_request+0x1a4/0x1ce [] ? clocksource_read+0x22/0x38 [] ? check_events+0x12/0x20 [] ? ktime_get_ts+0x61/0x7d @@ -13,6 +13,7 @@ [] ? autoremove_wake_function+0x0/0x5f [] __make_request+0x35a/0x49e [] generic_make_request+0x254/0x2b2 + [] ? check_events+0x12/0x20 [] submit_bio+0xe3/0x100 [] submit_bh+0xf1/0x12a [] journal_do_submit_data+0x4f/0x73 Fri Oct 23 14:43: real-mccoy - Two vcpus, no jiffies - Bunch of calls hung in schedule() @ Reboot with LDAP set with n-b style config - Backtrace of kjournald kjournald D ffffc20000000000 0 55 2 ffff88013d0b3bc0 0000000000000246 ffff88013d0b3b20 ffffffff8100ed0d 00000000e857b088 00000000e857b088 ffff88013d0b3b40 ffffffff8107957a ffff88013da84998 000000000000e2e8 ffff88013da84998 0000000000012d00 Call Trace: [] ? xen_clocksource_get_cycles+0x1c/0x32 [] ? clocksource_read+0x22/0x38 [] ? xen_force_evtchn_callback+0x20/0x36 [] ? ktime_get_ts+0x61/0x7d [] ? sync_buffer+0x0/0x6b [] schedule+0x21/0x49 [] io_schedule+0x44/0x6c [] sync_buffer+0x53/0x6b [] ? sync_buffer+0x0/0x6b [] __wait_on_bit+0x57/0x9e [] out_of_line_wait_on_bit+0x7d/0x9c [] ? sync_buffer+0x0/0x6b [] ? wake_bit_function+0x0/0x5a [] __wait_on_buffer+0x37/0x4d [] wait_on_buffer+0x4a/0x65 [] journal_commit_transaction+0x497/0xe42 [] ? xen_force_evtchn_callback+0x20/0x36 [] ? xen_force_evtchn_callback+0x20/0x36 [] ? xen_restore_fl_direct_end+0x0/0x1 [] ? _spin_unlock_irqrestore+0x4e/0x64 [] ? try_to_del_timer_sync+0x69/0x87 [] kjournald+0xfd/0x253 [] ? autoremove_wake_function+0x0/0x5f [] ? kjournald+0x0/0x253 [] kthread+0x6d/0xae [] child_rip+0xa/0x20 [] ? restore_args+0x0/0x30 [] ? child_rip+0x0/0x20 - Backtrace of nslapd ns-slapd D ffffc20000000000 0 1304 1 ffff88013883fc68 0000000000000282 ffff88013883fbc8 ffffffff8100ed0d ffffffff8113cdf2 000000003e1fdee9 ffff88013883fbe8 ffffffff8107957a ffff88013da83248 000000000000e2e8 ffff88013da83248 0000000000012d00 Call Trace: [] ? xen_clocksource_get_cycles+0x1c/0x32 [] ? end_buffer_async_write+0x0/0x16c [] ? clocksource_read+0x22/0x38 [] ? ktime_get_ts+0x61/0x7d [] ? sync_page+0x0/0x71 [] schedule+0x21/0x49 [] io_schedule+0x44/0x6c [] sync_page+0x59/0x71 [] ? sync_page+0x0/0x71 [] __wait_on_bit+0x57/0x9e [] wait_on_page_bit [] search_binary_handler+0xe5/0x2b5 [] do_execve+0x204/0x349 [] sys_execve+0x52/0x7d [] stub_execve+0x6a/0xc0 Fri Oct 23 16:48: real-mccoy - LDAP was set n-b style - We didn't check the console until 7 hours later - Console was responsive, but getty wouldn't say hello. @ Touch /etc/nolvs, see if bug can be reproduced with ~no load - Some new, interesting backtraces suexec D ffff8801145d941c 0 21929 21644 ffff8800a5497df8 0000000000000286 ffff8800a5497dc8 ffffffff810f240a ffff8800a5497da8 00000000c65dd5f8 00007f9efaa29000 0000000000000000 ffff8800a5559af8 000000000000e2e8 ffff8800a5559af8 0000000000012d00 Call Trace: [] ? vma_merge+0x2b6/0x340 [] __mutex_lock_common+0x12c/0x1a8 [] __mutex_lock_slowpath+0x2c/0x42 [] mutex_lock+0x43/0x71 [] generic_file_llseek+0x3b/0x81 [] vfs_llseek+0x4d/0x63 [] sys_lseek+0x52/0x82 [] system_call_fastpath+0x16/0x1b kjournald D ffffc20000016000 0 55 2 ffff88013d023bc0 0000000000000246 ffff88013d023b20 ffffffff8100ed0d 0000000052128bb4 0000000052128bb4 ffff88013d023b40 ffffffff8107957a ffff88013da61af8 000000000000e2e8 ffff88013da61af8 0000000000012d00 Call Trace: [] ? xen_clocksource_get_cycles+0x1c/0x32 [] ? clocksource_read+0x22/0x38 [] ? xen_force_evtchn_callback+0x20/0x36 [] ? ktime_get_ts+0x61/0x7d [] ? sync_buffer+0x0/0x6b [] schedule+0x21/0x49 [] io_schedule+0x44/0x6c [] sync_buffer+0x53/0x6b [] ? sync_buffer+0x0/0x6b [] __wait_on_bit+0x57/0x9e [] out_of_line_wait_on_bit+0x7d/0x9c [] ? sync_buffer+0x0/0x6b [] ? wake_bit_function+0x0/0x5a [] __wait_on_buffer+0x37/0x4d [] wait_on_buffer+0x4a/0x65 [] journal_commit_transaction+0x497/0xe42 [] ? xen_restore_fl_direct_end+0x0/0x1 [] ? _spin_unlock_irqrestore+0x4e/0x64 [] ? try_to_del_timer_sync+0x69/0x87 [] kjournald+0xfd/0x253 [] ? autoremove_wake_function+0x0/0x5f [] ? kjournald+0x0/0x253 [] kthread+0x6d/0xae [] child_rip+0xa/0x20 [] ? restore_args+0x0/0x30 [] ? child_rip+0x0/0x20 __ratelimit: 2 callbacks suppressed audit: audit_backlog=321 > audit_backlog_limit=320 audit: audit_lost=1355 audit_rate_limit=0 audit_backlog_limit=320 audit: backlog limit exceeded audit: audit_backlog=321 > audit_backlog_limit=320 audit: audit_lost=1356 audit_rate_limit=0 audit_backlog_limit=320 audit: backlog limit exceeded audit: audit_backlog=321 > audit_backlog_limit=320 audit: audit_lost=1357 audit_rate_limit=0 audit_backlog_limit=320 audit: backlog limit exceeded audit: audit_backlog=321 > audit_backlog_limit=320 Sat Oct 24 14:30: real-mccoy - Same type of wedge: conclusion, patch didn't fix bug @ Run real-mccoy with /etc/nolvs, to see if bug is reproducable with ~no load Mon Oct 26 20:39: real-mccoy @ Planned outage to abort no load experiment (didn't fall down) and reboot the host in order to check what the non-xen kernel does as a clocksource. Found that clocksource=acpi_pm. Then rebooted guest with elevator=noop Mon Oct 26 21:47: real-mccoy - Fell over, same type of backtrace. @ Rebooted guest and host, set clocksource=tsc, kept guest with elevator=noop Tue Oct 27 02:39: real-mccoy - Experiment failed @ Rebooted guest twice, removing elevator Wed Oct 28 12:05 - Experiment failed @ Rebooted guest twice, set elevator=noop, set one vcpu, host is still on tsc. No pending experiments.