Derrick Brashear provides an update on the state of OpenAFS.

New features:
  AFSDB RRs are now supported in tools and the cache manager.
  Dynroot support: automounter-like ability to dynamically generate mountpoints in /afs.
  @sys name list functionality:
    - Can set a list of values; @sys will be expanded to the first matching one.
    - Some problems: disagreement about reuse of pioctl and parsing output of 'fs'. Should be resolved before OpenAFS 1.2.3 is released.
  New build system, with autoconf.
    - Building kernel modules for multiple Linux kernels is harder under the new build system than the old.
  Alternate install directory support.
    - /usr/local/libexec/bosserver vs. /usr/afs/bin/bosserver
  New ports:
    - MacOS X
    - Irix 6.5 on IP35 hardware
    - Solaris 9
  memcache now works on Irix.
  Significant Rx tuning:
    - Proper MTU calculation, etc.
  Misc. fileserver bugs fixed:
    - Proper handling of deleted hosts/clients.
    - Abort delays.
    - Working bulkstat.

Features Coming Soon:
  Disconnected operation
    - Will be contributed by UMich once OpenBSD work is done.
  Kerberos 5 integration
  Support for better encryption algorithms than fcrypt.
  New InitCallBackState to be more extensible
    - Take advantage of new client functionality without adding new RPCs.

Upcoming releases:
  - 1.2 branch will continue with 1.2.3 as stable with only bug fixes.
  - 1.3 branch will be development release with new features.

Ongoing efforts:
  Build system conversion to automake.
  BSD support.
    - Progress on FreeBSD.
    - OpenBSD support coming from UMich.
  Linux cache manager modernization
    - VLRU cycle problem; should now be fixed.
    - dentry cache problem; problems may be fixed for 1.2.3.
  HP-UX support
    - No recent progress. Still need 1 kernel header to have working HP-UX 11 port.

Misc:
  Bugs, requests, and patches can be submitted to openafs-bugs@openafs.org
  openafs-elders@openafs.org can help find resources for desired new features.
  Still no full-time developers.
  No included test suite yet; at least one is being developed.
  openafs-testers@openafs.org - not announced, but you can subscribe, volunteer to test OpenAFS on some particular platform you care about. N.B.: This has now been announced.

Questions:
  Q: AFSDB support in Windows client?
  A: It's there. Installer still prompts you to pick a CellServDB, though. Things in CellServDB trump AFSDB RRs.
  Q: Ongoing efforts to cooperate with IBM?
  A: We've been bringing important issues we discover to their attention.
  Q: Characterize IBM interaction with OpenAFS?
  A: More useful than it has been in the past. Not taking patches directly, but if they want something from OpenAFS, will contact contributor directly and get them to sign off on allowing IBM to use the code.
  Q: Design issues surrounding GSSAPI work?
  A: Need IBM's input/consent to do useful work in the area. IBM was resistant to having a common number space for pioctls and com_err error tables. Probably need to come up with something clear and tell them.
  Q: Is anyone at IBM working on AFS?
  A: They have a development lab in India. They're doing something; have produced AIX 5, Linux 2.4 ports.
  Q: Status of cell alias patch?
  A: Not in 1.2.3, in 1.3.0. Not known to be stable.

Love Hornquist-Astrand provides an update on Arla.

Goals stated at LISA2000:
  - Support files larger than the cache
  - Clean up code in disconnected mode
  - Keep up with Linux changes

Recent Improvements to arla (0.36)
  - Improved OSF/1, MacOS X, FreeBSD support.
  - Support for incremental open of files in AFS. Allows you to open files larger than your cache, can read up to size of your cache.
  - use arla-send-pr to submit bug reports.

Arla Hackathon
  - Held in Stockholm: September, 2001
  - Derrick Brashear, Jeff Hutzelman attended.
  - Cooperation issues between arla, OpenAFS, and IBM discussed.
  - Common development issues:
    * GSSAPI/SPNEGO support in Rx
    * Split pioctl space to avoid collisions
    * Unused and undocumented Rx features were "found".

Future work
  - Remove unwanted features from Rx
  - Clean up disconnected operation code.

Misc.
  - Magnus and Love will work halftime on Arla at KTH until 12/31/2001. Thereafter, Magnus will continue to work on Arla at KTH, although less than before. Love will only be working on Arla in his spare time.
  - KTH still sees AFS meltdowns regularly, running the latest IBM/Transarc release on their servers. Love and Derrick are working on this.

Questions
  Q: Can you use OpenAFS userspace utilities with an Arla cache manager?
  A: They work if you use them. Arla isn't going to import the code or distribute them.
  Q: Why is halftime Arla work stopping at the end of the year?
  A: Funding issue. "It's complicated."
  Q: More info on server meltdowns?
  A: Server gets too many connections; starts responding busy to all clients. Seen when there are clients that can talk to the fileserver, but which the server has no route to.

MR-AFS availability
  MR-AFS is available from CMU. Licensing is such that you need to sign a form saying you won't use it in such a way that it is later sold for profit. There may be some sort of licensing ambiguity, but it doesn't matter: it can't be rolled into OpenAFS with the license in its current form.
  Current rumor is that one of the original developers is considering turning MR-AFS into a product.
  As things stand now, it should be possible to contact the CMU Tech Transfer office and sign a license to get MR-AFS, if you want it.

Coupling PTS with directories?
  Possible future development: API to allow the ptserver to answer queries by making calls to another database/directory service?
  Sites manage their data in a lot of different ways. Consensus seems to be that it would be better to focus on an API that allowed for better synchronization between the ptserver and other db/directory services.
  Tangentially related: create LDAP frontend on top of ptserver? Some interest, but no current work being done.

Database servers?
  Common wisdom is to not have more than 3.
  andrew.cmu.edu has 5. athena.mit.edu used to have 4, now has 3.
  Only 1 db server? Seems like a bad idea. Might want to have 2, even if that doesn't help you for quorum.
  You probably aren't doing a lot of writes to your database. budb is the biggest performance problem. Expiring tapes can result in lots of writes to budb, and if you have a lot of database servers, this can be very slow. If you have > 3 db servers and use the buserver, you might want to consider only running the buserver on a subset of them.

Cache Manager issues
  Coping with overfull partitions?
    Server partition or cache partition? Server partition.
    Would like failure mode to be nicer than it is currently. Stop writing to all volumes on that partition, or tell client to stop writing.
    - Overfull partitions cause volume corruption? (AIX and Linux) Doesn't seem to be a common experience.
    - Action Item: Derrick will modify the fileserver to optionally return EDQUOT when partition is above a certain percentage full.
    - Known bug: signed/unsigned problem in fileserver makes it possible to ignore quotas on volume. This has since been fixed in OpenAFS CVS, but hasn't yet been released.
    - Be proactive: do balancing to avoid full partitions.
  What are the largest machines you know about running OpenAFS clients?
    - IP35 (Irix) machine running OpenAFS (NRL).
    - Sun E10k running IBM AFS, now replaced with Sun Fire 6800s.
    - Everyone else has puny machines.
  memcache on OpenAFS Linux client?
    - OpenAFS Linux 2.4 memcache doesn't work at all. IBM 2.4 memcache might not either.
    - Cache on ramdisk does work. (ext2, not tmpfs)
    - Action item: Make i386_linux24 OpenAFS -memcache on afsd return an error message and fail gracefully. Rudy volunteers to write a patch.
    - Related: Solaris 8 tmpfs, 4GB in size, for cache? Worked with IBM. Should work with OpenAFS.
    - Experience with 10/15 GB AFS caches. Works, but sucks up lots of kernel memory. Slow at boot time.
    - Build cache in background? Probably not.
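
The EDQUOT action item above (refuse writes once a server partition is more than a configurable percentage full) comes down to a simple threshold check. The following Python sketch only illustrates the idea; it is not OpenAFS fileserver code (the fileserver is written in C), and the function name, the block-count arguments, and the 95% default are all assumptions made for the example.

```python
import errno


def check_partition_write(total_blocks, free_blocks, max_used_percent=95.0):
    """Return 0 if a write may proceed, or errno.EDQUOT if the partition is too full.

    Hypothetical helper: the name, arguments, and the 95% default are
    illustrative assumptions, not part of any OpenAFS interface.
    """
    if total_blocks <= 0:
        raise ValueError("total_blocks must be positive")
    used_percent = 100.0 * (total_blocks - free_blocks) / total_blocks
    # Refuse the write only when usage is strictly above the threshold.
    if used_percent > max_used_percent:
        return errno.EDQUOT
    return 0
```

Returning a quota error rather than letting the write fail partway gives the client a clean signal to stop writing, which is the "nicer failure mode" asked for above.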

Disconnected operation
  OpenAFS doesn't have it. Arla does. UMich plans to contribute disconnected AFS to OpenAFS once OpenBSD port is done.
  Clean client shutdown needed for machines that aren't always on the network. OpenAFS releases < 1.2.1 leaked inodes on Linux, which would cause the filesystem to be busy at unmount time.
  Coda? Has disconnected operation (and read-write replication) but is intended for research, not production deployment.

vos dump performance
  Seems to be very slow, and maxes out long before network is saturated or client and server run out of CPU.
  Suggestion: disable jumbograms; this might help performance.

Backups, or "You're so screwed."
  Ongoing work (by Cornell?) to modify Amanda to be used for backing up AFS. Isn't quite done yet, but almost there.
  Q: Is this work going to be going through fileserver, or using the namei interface?
  A: Will be going through fileserver.
  Q: What are you using for backup?
  A: * buserver w/TSM (CERT)
     * buserver (MIT)
     * vos dumps to Cray w/DMF (PSC)
     * Stage (CMU)
     * Veritas (volume level backups!); some issues, though.
     * Seagate BackupExec on Windows w/vos dump to local disk.
  Q: We're switching from IBM to OpenAFS. Is the buserver in OpenAFS tested?
  A: MIT uses the buserver; will be switching to it in 6 months or so.
  Q: Switching to Veritas?
  A: One site using it; the version they're running has a 2GB volume limit, but this is fixed by the most recent version. Being used with IBM AFS, not OpenAFS.
  Q: Using TSM for backup? We see speed issues. How do you make it faster?
  A: Isn't a problem for small sites (~200GB). Weekly incremental of 500 GB (size of incremental == 250GB) takes ~7 hours.

Kerberos
  State of krb5 support?
    No progress.
  Site running krb5 kdc and kaserver in parallel.
    Ow, don't do that. Use Heimdal to support krb5, krb4, kaserver all in one kdc.
    If you're running kaserver and krb5 KDC in parallel, you may need to turn off autochange (nocpw) on some of the kaserver principals. Derrick has more details.
  krb5 migration?
    Derrick has a Heimdal doc, located at /afs/andrew.cmu.edu/usr/shadow/ka2heim.txt. Ken Hornstein also has migration documentation for MIT krb5, but it's harder than it needs to be, and hasn't been updated very recently. MIT krb5 krb524d will let you work with MS kdc.
  krb5 and MS kdc compatibility?
    ms2mit program to convert tickets from MS stuff to MIT krb5 ccache, then can run aklog. Otherwise, run Win2k KDC in parallel with krb5 and do interrealm trust. aklog exists for Win2k. Can be found here:

Issues for administering AFS
  AFS infrastructure tools?
    * ANL - log in to special server; use special kerberos principal that's in the .klogin for all machines.
      - use this to apply patches, run tripwire, etc.
    * Cornell - AFSfree: program for graphically monitoring free space on vice partitions.
    * package/depot/other similar things.
    * cfengine? Need to make it more AFS-aware. Author is interested in doing this.
    * Athena update model. Check file in AFS every 10 minutes when no one is logged in, schedule update of workstation software if necessary.
  Monitoring and proactive AFS administration?
    CMU has useful AFS monitoring tools. Derrick will tell people about them (sentinel, balancer, etc.). Some of them can be found at ftp://ftp.andrew.cmu.edu/pub/AFS-Tools.
    MIT has scripts for wrapping the balancer to dynamically generate its config file nightly. (Garry will make this available)
    CMU ECE has an AFS server monitor for "Big Sister". Work on client monitor in progress.
  Merging two CellServDB files?
    Tool (made available soon?) to concatenate multiple CellServDBs and produce a single file.
    No hard and fast rule for which CellServDB to trust. Do sanity checking with udebug to test validity of servers?
    IBM came up with the policy of not including OpenAFS cells in their CellServDB. Why? Support relationship with various sites? If you don't have a support contract, they can't/won't talk to you. OpenAFS promised not to steal IBM CellServDB.
    AFSDB RRs - lots of people don't have them.
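
The CellServDB merging tool discussed above could work roughly as follows. This is a Python sketch, not the tool mentioned in the notes. It assumes the usual CellServDB layout, where a line starting with ">" names a cell and the lines after it list that cell's database servers. Since there is no hard and fast rule for which file to trust, the policy used here (the first file that lists a cell wins) is an arbitrary assumption, and both function names are hypothetical.

```python
def parse_cellservdb(text):
    """Parse CellServDB text into {cell_name: (header_line, [server_lines])}."""
    cells = {}
    servers = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line.strip():
            continue
        if line.startswith(">"):
            # ">cell.name   #comment": the cell name is the text before '#'.
            name = line[1:].split("#", 1)[0].strip()
            # A cell repeated within one file keeps its first header line.
            _header, servers = cells.setdefault(name, (line, []))
        elif servers is not None:
            servers.append(line)
    return cells


def merge_cellservdbs(texts):
    """Merge several CellServDB texts; the first file listing a cell wins."""
    merged = {}
    for text in texts:
        for name, entry in parse_cellservdb(text).items():
            merged.setdefault(name, entry)
    out = []
    for header, servers in merged.values():
        out.append(header)
        out.extend(servers)
    return "\n".join(out) + "\n"
```

A real tool would probably also want the udebug sanity check suggested above before trusting any server entry; this sketch does no validation at all.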
  AFS client script: what changes do people make?
    - check if networking is up
    - change some parameters to afsd
  Create a service that does what vos does, but does it only by issuing RPCs to the server. Allows privilege delegation. Discussed at Arla hackathon; notes from the meeting have since been posted to openafs-info@openafs.org.
  Delegated volume administration.
    CMU uses ADM. Requires you to do your scripting in Scheme. Extended to also maintain Heimdal KDCs and Cyrus IMAP mailboxes.
  Hierarchical cells: cells with a parent/child relationship, where a child cell will dynamically pull content from a parent on demand, populating a local copy of the volume.
    Do people want this? Yes. Work may be done.
  Installs and shared libraries?
    Use rpaths. Keep pathname constant; can install later versions with new sonames. Some people don't want to be able to keep the path the same.
  Recovery from db problems?
    Back up your ubik databases. ptserver has support for recovering damaged database. kadb has a dumper client as well that doesn't seem to work.

Performance tuning
  AFS is *slow*. Can we make it faster? Is Rx really the problem?
  Old Andrew benchmark. Use it to compare AFS and NFS.
  Fine-grained dcache locks and readahead should help a lot.

Ports and hardware issues
  Running AFS on Win2k terminal servers?
    No one else is this insane. They're running OpenAFS 1.1.1a; claim it's very nice. *Must* run Win2k SP2 to have anything work. Going to try to get Arla to work as well.
  OpenAFS 1.2.2a has some problems on XP. Works, but won't survive going to sleep on a laptop.
  OpenAFS 1.1.1a insecure on Windows NT/2000. Migrate to 1.2.2a.
  MacOS X. It works; experience is limited. Tree won't build on 10.1.1 due to version not being recognized. Work in progress to correct this.

Other people's problems and experiences?
  OpenAFS needs more publicity. Tutorial at LISA2002?
  AFS is considered to be expensive and difficult. What can we do to change this perception?
  Corporations started buying into Linux; same thing needs to happen with OpenAFS.
  User testimonials on openafs.org web site?
  Need better Windows support, and support for MS apps.
  Byte-range locking is highly desired.