(Message afs:64) Return-Path: Received: from pacific-carrier-annex.mit.edu by po12.mit.edu (8.9.2/4.7) id FAA29376; Thu, 27 Sep 2001 05:48:53 -0400 (EDT) Received: from mercury.Sun.COM (mercury.Sun.COM [192.9.25.1]) by pacific-carrier-annex.mit.edu (8.9.2/8.9.2) with ESMTP id FAA13008 for ; Thu, 27 Sep 2001 05:48:53 -0400 (EDT) Received: from hs-eber02-01.Germany.Sun.COM ([129.157.155.3]) by mercury.Sun.COM (8.9.3+Sun/8.9.3) with ESMTP id CAA26867 for ; Thu, 27 Sep 2001 02:48:50 -0700 (PDT) Received: from gargoyle (gargoyle [129.157.155.120]) by hs-eber02-01.Germany.Sun.COM (8.9.3+Sun/8.9.3/ENSMAIL,v2.1p1) with SMTP id LAA19627; Thu, 27 Sep 2001 11:48:46 +0200 (MEST) Message-Id: <200109270948.LAA19627@hs-eber02-01.Germany.Sun.COM> Date: Thu, 27 Sep 2001 11:47:05 +0200 (MEST) From: Frank Batschulat Reply-To: Frank Batschulat Subject: Re: [OpenAFS-devel] AFS and ufs/sds logging on solaris To: kolya@MIT.EDU Cc: Frank.Batschulat@Sun.COM MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii Content-MD5: LVzgKC8j4bxlQbKqKbEHTA== X-Mailer: dtmail 1.3.0 @(#)CDE Version 1.4 SunOS 5.8 sun4u sparc >To: Frank Batschulat >Subject: Re: [OpenAFS-devel] AFS and ufs/sds logging on solaris >Date: Wed, 26 Sep 2001 15:52:57 -0400 >From: Nickolai Zeldovich > >> Well, can you make them available for me ? > >Sure; the crash dump files are available from >ftp://ftp.zoomph.net/pub/kolya/hidden/frank.742984691/afs-ufslog-hang.tar.gz. >The hanging process in this case is 13730 (cp), waiting for the >truncate daemon 292 to clear out a dcache slot, which is waiting >for 13730's UFS transaction to finish, as you can see in the >output below. > >Let me know when you've gotten the crash dump so I can clean it up. > >Hope this helps, > >-- kolya > >0t13730$?(30001869a18,78534d20,30001869a18,62b96,10,10) >afs_osi_Sleep(78538101,0,20,ffff0000,30000,20003) + b8 >Afs_Lock_Obtain(78538100,2,20,0,0,1) + 158 >afs_FindDCache(30000f846a8,0,30000e14000,2a10051ca68,1,107eb430) + 180 >afs_UFSRead(30000f846a8,2a10051cae8,300004a7bc8,0,0,1) + 300 >afs_ustrategy(3000192b7b8,300004a7bc8,10423768,0,2000,ffffffffffffffff) + 120 >afs_GetOnePage(30000f846a8,0,2000,2a10051cf94,2a10051cfa8,2000) + 8b0 >afs_getpage(30000f846a8,0,2000,2a10051cf94,2a10051cfa8,2000) + 124 >segvn_fault(0,ff382000,0,0,1,ff380000) + 784 >as_fault(1,ff380000,30001a1a748,1,0,0) + 3a4 >pagefault(1,0,300019c6000,30000724e90,ff380000,70008) + c4 >trap(ff380000,1,0,101411f4,2a10051d3a0,0) + 828 >prom_rtt(0,2a10051d620,10459800,2a10051d628,1,0) >hat_memload(ff380040,2a752ac0040,40,ffffffffffffffc0,1,30000051f88) + 10c >uiomove(2a10051da30,40,1,2a10051da00,40,2a752ac0000) + b0 >wrip(3000140d330,0,30000d28000,ffffffffff,8000,1) + 504 >ufs_write(30000b825c8,0,0,3000140d2b0,3000140d3e0,0) + 4ac >write(0,40,2002,300017929b0,4,40) + 204 >write32(4,ff380000,40,2,21e14,12504) + 30 > >78538100$afs_xdcache: >afs_xdcache: wait_states excl_locked readers > 3 2 0 >afs_xdcache+4: waiting wait_sec wait_ms > 6 4 350309 >afs_xdcache+0x10: last_reader writer source > 0 292 527 > >0t292$?(30000cad40c,30000cad400,30000cad40c,30,0,14) >top_end_sync(30000cad40c,2a10032494c,30000cad6e0,0,30000cad400,30000cad320) + 78 >top_issue_sync(3000005e468,30000e85ac0,20,7,1016932c,2f01) + 44 >top_end_async(30000cad400,1,3000005e468,30000cad6e0,30000cad320,10) + 9c >ufs_write(0,0,0,30000d5cb58,30000d5cc88,0) + 53c >vn_rdwr(1,2a100324ca0,78534064,fffffffffffffffd,0,1) + e4 >afs_osi_Write(fffffffffffffffd,78534064,300010ad490,30,8,8) + 124 >afs_WriteDCache(300010ad450,1,78534064,0,0,1) + e0 >afs_GetDownDSlot(4,9bad0,300010a5ab0,30,30000e85ac0,0) + 264 >afs_UFSGetDSlot(33ac,0,0,0,0,0) + 1e4 >afs_GetDownD(ffffffffffffdc2b,2a1003252f4,1041b130,2a100045d40,30000e85ac0,0) +76c >afs_CacheTruncateDaemon(78534d20,0,f,30000b86db0,30000028f00,30000accf38) + 2f8 >afs_syscall_call(3,1,ff23f9fc,0,22604,0) + 52c >Afs_syscall(2a100325978,2a100325978,104374b0,1,48,48) + e0 >syscall_ap(10433af0,2a100325ba0,30000e83838,ff23f9fc,0,30000088880) + 6c >loadable_syscall(30000088880,10433af0,1400,5,0,22604) + 90 >indir(10433af0,1c,3,1,ff23f9fc,0) + a4 > Thanks, obvious problem, ufs_write() in 0t13730 has opened an async transaction got a pagefault during the uiomove() (nothing really surprising if it's the cp command, cp mmap's the source file) thus calling back into the FS getpage routine, afs_getpage() and friends get stuck on 0t292, which issued a sync transaction to flush the log, logmap or deltamap. Unfortunately 0t292 obviously holds afs locks needed by 0t13730. hence we deadlock. this sync op in 0t292, can not proceed until the async op in 0t13730 has finished. The async op doesn't get cleared as the thread is waiting on an afs lock owned by the thread which issued the sync op. I tend to say, this is not a ufs problem. It's kind of the same problem you get when you use local NFS mounts instead of loopback mounts via lofs(7FS). frankB