[Wed Dec 4 16:21:58 2013] Hi Jason
[Wed Dec 4 16:22:09 2013] Hi
[Wed Dec 4 16:22:47 2013] Should I summarize the problem here?
[Wed Dec 4 16:23:24 2013] Sure. I guess it would be "cross-realm tokens were not working from a windows client", roughly, but it's your experience.
[Wed Dec 4 16:26:35 2013] We're hoping secureendpoints would have some insight.
[Wed Dec 4 16:27:49 2013] I have Network Identity Manager 2.0.102.907, OpenAFS Version 1.7.2700, on Windows 8.1. When I have tokens in both network identity manager and the graphical kerberos client for jgross@CSAIL.MIT.EDU, I can access my user directory and write to it, but I can't access \\AFS\csail.mit.edu\group\plv, even though I'm in the plv group, which has access to that folder. If I run kinit (/cygdrive/d/Program Files/Heimdal/bin/kinit) and aklog
[Wed Dec 4 16:27:49 2013] (/cygdrive/d/Program Files (x86)/OpenAFS/Client/Program/aklog) from command line, then it works fine. But I'd prefer not to have to auth in so many places every time I want to do this.
[Wed Dec 4 16:30:03 2013] Both your athena and csail pts ids were on the acl for that directory, are you sure you had kerberos credentials for csail and not athena?
[Wed Dec 4 16:32:48 2013] I'm pretty sure I had kerberos credentials for both. Let me see if I can recreate the issue. (Because kdestroy isn't enough...)
[Wed Dec 4 16:33:11 2013] You'd need to drop tokens, too, of course.
[Wed Dec 4 16:34:19 2013] What's the command for that?
[Wed Dec 4 16:34:42 2013] unlog?
[Wed Dec 4 16:35:01 2013] Yeah, or maybe something in the gui.
[Wed Dec 4 16:38:14 2013] Hmm... I don't seem to be able to write anything to my csail directory now...
[Wed Dec 4 16:38:28 2013] Were your tests from a unix machine using the csail kerberos principal?
[Wed Dec 4 16:38:49 2013] Which tests?
[Wed Dec 4 16:40:36 2013] you mentioned cagnode?
[Wed Dec 4 16:40:48 2013] Ah, found the guide I was following: http://wiki.openafs.org/WindowsEndUserQuickStartGuide/ tells me to install Heimdal Kerberos.
[Wed Dec 4 16:40:49 2013] (I don't know what that is)
[Wed Dec 4 16:41:46 2013] Oh, yes, cagnode is a csail machine. The test I ran to ensure that my csail principal had access to the plv folder was from a unix machine with only a csail kerberos principal.
[Wed Dec 4 16:48:14 2013] Ok, I killed afscreds.exe and afsd_service.exe, restarted network identity manager and afscreds.exe, and I can now access \\AFS\csail.mit.edu\u\j\jgross\only-csail (and the file in it), but not \\AFS\csail.mit.edu\group\plv.
[Wed Dec 4 16:50:46 2013] And now I can't figure out how to access \\AFS\csail.mit.edu\group\plv at all (even if I kinit from command line)
[Wed Dec 4 16:52:33 2013] latest build is 1.7.2800 and I suggest you try with that one. If it still doesn't work, use Sysinternals' Process Monitor to isolate which file system request is failing.
[Wed Dec 4 16:53:05 2013] klist is telling me I have no cached tickets and gives me nothing else other than my logonid, even though tokens tells me I have tokens, and I've kinited and akloged. Is this wrong?
[Wed Dec 4 16:53:09 2013] I'll try updating.
[Wed Dec 4 16:57:05 2013] it also sounds like you are running as two different user sessions
[Wed Dec 4 16:58:00 2013] Same problem with 1.7.2800.
[Wed Dec 4 16:58:04 2013] one is a user account that is a member of the administrators group and is therefore a restricted session and the other is running as the "elevated" (aka unrestricted) administrator
[Wed Dec 4 16:58:44 2013] Do you mean on csail, or do you mean locally on windows (or something else)?
[Wed Dec 4 16:59:02 2013] locally on windows
[Wed Dec 4 16:59:07 2013] The Kerberos API: and MSLSA: caches are not shared across sessions. AFS tokens are shared if the Authentication Group is the same
[Wed Dec 4 17:02:43 2013] The user that I log in as is an administrator. Is that what you mean?
[Wed Dec 4 17:03:06 2013] (And I have UAC on, and click through the dialogs when it gives them to me.)
[Wed Dec 4 17:03:35 2013] I think the question is (if I am not misusing terminology) whether you have clicked through any UAC boxes for any of the processes involved.
[Wed Dec 4 17:04:47 2013] IIRC, I clicked through the UAC boxes for the installers, but not when I run the programs (they run automatically at startup and I don't get boxes then).
[Wed Dec 4 17:05:22 2013] It sort of sounds like the Sysinternals Process Monitor is the best way to understand what's happening.
[Wed Dec 4 17:07:25 2013] What process should I be looking at?
[Wed Dec 4 17:07:33 2013] I don't know.
[Wed Dec 4 17:07:55 2013] Neither afscreds.exe nor afsd_service.exe gives me any output when windows denies me access to plv.
[Wed Dec 4 17:12:22 2013] You want to monitor the process that is making the request of the file system
[Wed Dec 4 17:12:35 2013] explorer.exe?
[Wed Dec 4 17:12:50 2013] if that is the process you are using, then yet
[Wed Dec 4 17:12:51 2013] yes
[Wed Dec 4 17:15:10 2013] although really you should filter on the path and not the process because if you have an anti-malware solution on the box that could be blocking your request
[Wed Dec 4 17:15:35 2013] since the anti-malware process probably does not have access to your tokens
[Wed Dec 4 17:17:51 2013] How do I export a log? (I decided to use cmd.exe, to get less noise.)
[Wed Dec 4 17:17:59 2013] Or what am I looking for?
[Wed Dec 4 17:23:44 2013] Log at http://people.csail.mit.edu/jgross/tmp/Logfile.CSV
[Wed Dec 4 17:31:22 2013] Why is 'ls plv' trying to CreateFile (and failing)?
[Wed Dec 4 17:31:36 2013] I'm walking out the door. However, since the problem is an inability to read the root directory of the volume, you may be hitting a file server bug, since the file server is 1.6.1+security fixes. It doesn't permit volume information to be queried by anonymous users. You can use Wireshark to track the traffic with the file server and see whether the file server tang.csail.mit.edu (in this case) is rejecting a request.
[Wed Dec 4 17:32:09 2013] CreateFile is used to create a handle to a file or directory (new or existing)
[Wed Dec 4 17:33:08 2013] How is /afs/csail.mit.edu/group/plv the root directory of the volume?
[Wed Dec 4 17:33:41 2013] fs lsmount claims it is, at least.
[Wed Dec 4 17:36:43 2013] I'll be afk for the next 30 minutes or so. I'll be back after that, and trying to solve the problem again. (I'll look at wireshark/tang.csail.mit.edu)
[Wed Dec 4 18:51:13 2013] How do I figure out what machine connections in wireshark are using? (I see a bunch of UDP things, but nothing that mentions csail.mit.edu obviously)
[Wed Dec 4 19:16:30 2013] Is wireshark supposed to be crashing on me a lot? (Requesting that the runtime terminate it in an unusual way) Anyway, what should I be looking for in the transmissions to/from tang.csail.mit.edu?
[Wed Dec 4 19:19:46 2013] Also, how do I get afs to look up the acls from scratch? Should I be doing "flush volume" and then try to navigate?
[Wed Dec 4 19:21:39 2013] Well, there are the acl checks on the fileserver and the local checks. flushv will do the latter; you need to aklog -force (i.e., get a token based on a new service ticket) to do the former.
[Wed Dec 4 19:25:54 2013] Packet logs at people.csail.mit.edu/jgross/tmp/packets.csv (First I did "flush file/dir" from the context menu on a csail folder I am allowed to access, then accessed it, and then I did the same with the plv folder)
[Wed Dec 4 19:32:17 2013] packets after a flush volume and aklog -force csail (only captured after I did that, from my attempted access) at people.csail.mit.edu/jgross/tmp/packets-aklog-force.csv
[Wed Dec 4 19:38:36 2013] I'm very confused. If I (on cagnode) fs sa ~/only-csail plv none (and also remove all other acls), then I can't open it on windows. If I fs sa ~/only-csail plv rlidwka, then I can open it on windows (immediately, without needing to aklog -force or anything).
[Wed Dec 4 19:39:15 2013] But I still can't open /afs/csail.mit.edu/groups/plv.
[Thu Dec 5 10:57:51 2013] is there any good reason why the 'bosserver' and others run as 'root'? Couldn't they all run as 'afs' or whatever as long as '/vicep*/' content were owned by the same user?
[Thu Dec 5 10:58:54 2013] kaserver would have needed access to bind to port 88 (below 1024)
[Thu Dec 5 10:59:22 2013] presumably that is no longer an issue now though
[Thu Dec 5 10:59:30 2013] cclausen: let's not talk about 'kaserver' indeed :-)
[Thu Dec 5 10:59:48 2013] cclausen: and 'kadmind' can be run on any port ...
[Thu Dec 5 11:00:00 2013] but that is probably the reason why it runs as root currently
[Thu Dec 5 11:00:06 2013] YFS just finished the work to make all of the servers run as non-root.
[Thu Dec 5 11:00:17 2013] There are a fair number of changes required.
[Thu Dec 5 11:01:04 2013] sxw: I am moderately surprised. At least a fileserver should be able to run as non-root in 1.6.x, but perhaps I am an optimist...
[Thu Dec 5 11:05:06 2013] Does anyone monitor dcache/vcache miss ratio?
[Thu Dec 5 11:06:05 2013] * only asks because he created a stupid perl script to monitor it using the output of xstat_cm_test
[Thu Dec 5 11:08:52 2013] I'm using the numbers from here to set the alert thresholds in Nagios: http://docs.openafs.org/AdminGuide/HDRWQ402.html
[Thu Dec 5 11:11:00 2013] (I'm almost positive that some of the web servers that serve content out of AFS are going to create alerts)
[Thu Dec 5 11:12:24 2013] not yet here...
[Thu Dec 5 11:12:27 2013] The fileserver is actually one of the hardest bits to make non-root, because the namei backend to the fileserver plays fast and loose with the file metadata to encode vnode data. And it can only do that if it's root.
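jgross's experiment with `fs sa ~/only-csail plv none` versus `fs sa ~/only-csail plv rlidwka` turns on what each rights string actually grants. A minimal decoder, offered as a sketch: the helper names `expand_rights` and `can_list_directory` are hypothetical, the letter meanings are the standard seven AFS rights, and the `read`/`write`/`all`/`none` shorthands are the ones `fs setacl` accepts (with `none` removing the entry). The auxiliary rights A through H are deliberately not handled.

```python
# Standard AFS access rights, in the order fs listacl prints them.
RIGHTS = {
    "r": "read",
    "l": "lookup",
    "i": "insert",
    "d": "delete",
    "w": "write",
    "k": "lock",
    "a": "administer",
}

# Shorthand words accepted by fs setacl; "none" removes the ACL entry.
SHORTHANDS = {
    "all": "rlidwka",
    "write": "rlidwk",
    "read": "rl",
    "none": "",
}


def expand_rights(spec: str) -> list[str]:
    """Return the named rights granted by an fs setacl rights argument."""
    letters = SHORTHANDS.get(spec, spec)
    unknown = [c for c in letters if c not in RIGHTS]
    if unknown:
        # Includes the auxiliary A-H rights, which this sketch ignores.
        raise ValueError(f"unsupported right(s): {''.join(unknown)}")
    return [RIGHTS[c] for c in letters]


def can_list_directory(spec: str) -> bool:
    """Listing a directory's contents needs the lookup ("l") right."""
    return "lookup" in expand_rights(spec)


if __name__ == "__main__":
    print(expand_rights("rlidwka"))
    print(can_list_directory("none"))
```

Under this model, the `plv none` case leaves the group with no lookup right at all, which matches jgross being unable to open the directory until `rlidwka` was granted.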
[Thu Dec 5 11:13:43 2013] sxw: I was thinking about the totally boring default one for GNU/Linux. I think it does not do queasy things to the underlying filesystem... I HOPE :-)
[Thu Dec 5 11:16:28 2013] namei is the default one
[Thu Dec 5 11:18:36 2013] I wonder how many people are still using the inode-based fileservers?
[Thu Dec 5 11:18:42 2013] well
[Thu Dec 5 11:18:47 2013] yeah, we aren't anymore.
[Thu Dec 5 11:19:26 2013] I managed to bring everything up to the latest OpenAFS 1.4.x code before the entire cell goes dark at the end of 2013.
[Thu Dec 5 11:21:04 2013] non-root fileservers would be nice if you wanted to run them on NFS volumes, I suppose.
[Thu Dec 5 11:22:05 2013] they would just be nice full stop, surely :)
[Thu Dec 5 11:27:29 2013] Yeah, they are nice. You can suddenly do all sorts of stuff like having "make test" spin up a whole cell as part of the test suite, all just using the developer's account
[Thu Dec 5 11:30:21 2013] Yeah, we can do that for MIT krb5's test suite, which is very handy.
[Thu Dec 5 11:32:12 2013] nice
[Thu Dec 5 11:40:34 2013] oooh
[Thu Dec 5 11:48:10 2013] sxw: isn't the 'namei' fileserver the one that under GNU/Linux uses the VFS API?
[Thu Dec 5 11:49:27 2013] Yes.
[Thu Dec 5 11:49:31 2013] namei is the default for linux fileservers, yes.
[Thu Dec 5 11:50:10 2013] But as Simon says, it stores some metadata in the file metadata (flags/permission bits/owners, that sort of thing), some of which are only modifiable by root.
[Thu Dec 5 11:53:27 2013] jgross: there are going to be two contributing factors to your inability to access the group.plv volume.
[Thu Dec 5 11:53:44 2013] jgross: First, the file server does not include http://git.openafs.org/?p=openafs.git;a=commit;h=dc8952ff29584a8bbc7be66a53f6c4fffd3178f3
[Thu Dec 5 11:54:35 2013] jgross: That is not entirely fatal but it prevents volume information state and statistics from being accessed by anonymous calls.
[Thu Dec 5 11:56:13 2013] secureendpoints: I managed to fix the problem by restarting my computer. I'm about to run away for lunch, but will be back later.
[Thu Dec 5 11:57:14 2013] jgross: the fatal contributor is that the request to read the directory by your explorer.exe process, which has access to your tokens, is being intercepted, and another process (or driver) on the system is performing the directory read without tokens. That request is the unencrypted fetch status which was in your Wireshark output.
[Thu Dec 5 11:58:21 2013] jgross: If you trigger a fetch status using your tokens first, then the volume object will get created in the windows kernel. When the unauthenticated request gets issued first, you lose.
[Thu Dec 5 11:59:02 2013] jgross: stupid question. did you restart your computer after installing 1.7.28?
[Thu Dec 5 11:59:10 2013] kaduk_: but but but... I'll have a look into that, because the "file metadata" accessed via the VFS ought to be extended attributes, I hope...
[Thu Dec 5 11:59:44 2013] secureendpoints: Yes, I'm pretty sure I did.
[Thu Dec 5 11:59:44 2013] No, the file metadata is owner, group, access permissions, that kind of thing.
[Thu Dec 5 11:59:47 2013] 1980s, remember.
[Thu Dec 5 12:00:06 2013] (yeah, yeah, namei is more recent than that, but it's much older than extended attributes)
[Thu Dec 5 12:00:57 2013] jgross: so restarting your computer didn't work yesterday but restarting it today does.
[Thu Dec 5 12:03:03 2013] It's possible that I did something differently after restart (like whether or not I filled out all the startup prompts for my password vs. closing them and filling them in later, or whether or not I ran kinit from command line in addition to filling in the prompts).
[Thu Dec 5 12:03:37 2013] anyhow, asking again about an interrupted 'vos move'. It broke because of token expiry, which means that most likely the replication completed, but the VLDB updates could not be authorized. suggested using '-localauth', good idea. Now however, in part to get my understanding better, I wonder...
[Thu Dec 5 12:05:08 2013] jgross: as long as your command prompt and NetIDMgr or afscreds processes are in the same logon session it should not matter where you get the tokens.
[Thu Dec 5 12:06:45 2013] What I wonder is that I have a VLDB-locked source volume with the right name, and an unattachable, but probably complete, target volume, with no name in VLDB. I guess I could just 'vos unlock' the source, 'vos zap' the unattachable target volume. But I wonder whether I could instead use 'vos salvage' of the target and then 'vos changeloc'. Secondarily, it would be interesting to know what the relationship of the target to the source is: copy of re
[Thu Dec 5 12:06:58 2013] "copy *or* replica?"
[Thu Dec 5 12:32:28 2013] Walex2: If you are not sure what you are doing, I would recommend just cleaning things up to the state before the move failed, and doing the move again.
[Thu Dec 5 12:33:05 2013] Whilst it may be possible to take the failed state and continue to completion from that point, it would have to be done with a reasonable amount of knowledge and care,
[Thu Dec 5 12:40:07 2013] I suspected as much, I was just wondering what the use case for 'vos changeloc' is...
[Thu Dec 5 12:41:01 2013] vos is an odd collection of general user tools, and other things that perform much more specialised tasks. Sadly the difference isn't particularly well marked (see the frequent discussions about vos clone and vos shadow)
[Thu Dec 5 12:43:21 2013] these days the only valid use case for vos changeloc is with -remove
[Thu Dec 5 12:57:02 2013] geekosaur: the 'man' page specifically mentions 'vos move' as in:
[Thu Dec 5 12:57:04 2013] "In essence, vos changeloc performs the same operations on the VLDB as vos move, but it does NOT move the data from one server's file system to another."
[Thu Dec 5 12:57:35 2013] I'm sure it does. Go ahead and mention that on openafs-info and watch the smackdowns descend...
[Thu Dec 5 12:57:38 2013] But yes, I am aware that some 'vos' commands etc. are high level and some are low level, that's why I am curious
[Thu Dec 5 12:57:51 2013] the official word on vos changelog is don't.
[Thu Dec 5 12:57:57 2013] geekosaur: well, that's why I am asking on IRC :-)
[Thu Dec 5 12:58:10 2013] er, vos changeloc
[Thu Dec 5 12:58:43 2013] there was a most interesting 'vos shadow' discussion recently :-)
[Thu Dec 5 13:00:27 2013] geekosaur: I think you are confusing changeloc and changeaddr. changeaddr has a -remove option and is used for file server addresses. http://docs.openafs.org/Reference/1/vos_changeaddr.html changeloc is used to change the vldb entry for the volume. http://docs.openafs.org/Reference/1/vos_changeloc.html
[Thu Dec 5 13:01:06 2013] BTW, completely different topic: currently we keep the RW "top level" volumes on one fileserver, and replicas on the DB servers. Is there any small or big reason to prefer things one way or another?
[Thu Dec 5 13:01:10 2013] hrm, thought both got bitched about. loudly
[Thu Dec 5 13:01:37 2013] however, I will also recommend that, unless you really know what you are doing and know exactly where the "vos move" failed, you start over.
[Thu Dec 5 13:01:55 2013] geekosaur: let's say when I read that "same as "vos move" without the copy" it looked moderately edgy to me...
[Thu Dec 5 13:01:56 2013] Maybe we should update the documentation to reflect the general consensus....
[Thu Dec 5 13:03:06 2013] there have been many passes on the documentation. there is a need for many more passes.
[Thu Dec 5 13:03:10 2013] I'll start over with 'vos unlock[vldb]', 'vos zap' and 'vos sync(serv|vldb)' for good measure.
[Thu Dec 5 13:03:23 2013] the man pages are in better shape than the admin and quick start guides.
[Thu Dec 5 13:03:45 2013] does that sound right? I had a look at a few relevant mailing list entries and that seems the standard.
[Thu Dec 5 13:04:22 2013] the destination volume isn't in the vldb so you can simply vos zap it
[Thu Dec 5 13:04:36 2013] then unlock the source volume and restart with tokens that will not expire
[Thu Dec 5 13:05:09 2013] secureendpoints: yes, that was the intent. As to the tokens, suggested '-localauth' rather than 'krenew -t -- ....' and it seems a pretty good idea to me.
[Thu Dec 5 13:05:21 2013] doesn't it :)
[Thu Dec 5 13:05:50 2013] :-) Amazingly it never occurred to me; I have almost always done admin from a suitable client.
[Thu Dec 5 13:06:10 2013] or use http://www.eyrie.org/~eagle/software/kstart/
[Thu Dec 5 13:06:33 2013] and before this current site I had small workgroup servers with small volumes.
[Thu Dec 5 13:08:29 2013] we use 'k5start' a fair bit in various scripts. Hadn't thought of that either, but then I was running the commands interactively under 'screen'
[Thu Dec 5 13:09:56 2013] so any opinions on where top level RW volumes should be? DB/file server or doesn't matter?
[Thu Dec 5 13:13:15 2013] to be honest, I don't understand the question. why do you think there is a difference between a DB/file server and a file server without a DB?
[Thu Dec 5 13:13:52 2013] I think I have heard lore recommending to, say, put an RO copy of root.afs and root.cell on each dbserver.
[Thu Dec 5 13:14:34 2013] secureendpoints: I think it does not matter as long as there is an RO replica available somewhere. But there are sometimes corner cases or obscure second order reasons to do things in a specific way.
[Thu Dec 5 13:14:45 2013] (presumably so that if even one dbserver was up, then all volumes in the cell whose servers are up could be navigated to, or thereabouts). But that doesn't affect the placement of RW copies.
[Thu Dec 5 13:15:29 2013] what does a dbserver have to do with the fileserver?
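The recovery secureendpoints recommends (zap the half-made destination volume, which is not in the VLDB; unlock the source; redo the move with credentials that cannot expire) can be sketched as a dry-run command builder. This is a sketch only: the helper names, server names, partitions, volume name and volume ID are all hypothetical placeholders, nothing here inspects the cell, and `-localauth` only works on a machine holding the cell's server key (KeyFile or keytab, as discussed later in the log).

```python
import shlex
import subprocess


def recovery_commands(volume, old_server, old_part, new_server, new_part, stale_id):
    """Build argv lists for the clean-up-and-retry sequence (all inputs are placeholders)."""
    # Authenticate with the server key instead of a user token that can expire mid-move.
    auth = ["-localauth"]
    return [
        # 1. Remove the half-finished copy from the destination partition;
        #    vos zap does not touch the VLDB, where this copy has no entry anyway.
        ["vos", "zap", "-server", new_server, "-partition", new_part,
         "-id", stale_id] + auth,
        # 2. Clear the VLDB lock the failed move left on the source volume.
        ["vos", "unlock", "-id", volume] + auth,
        # 3. Retry the move from scratch.
        ["vos", "move", "-id", volume,
         "-fromserver", old_server, "-frompartition", old_part,
         "-toserver", new_server, "-topartition", new_part] + auth,
    ]


def run(cmds, dry_run=True):
    """Print each command; only execute it when dry_run is False."""
    for argv in cmds:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical names: adjust to the real volume, servers and partitions.
    run(recovery_commands("proj.data", "fs-old", "a", "fs-new", "b", "536870999"))
```

Keeping the default as a dry run means the exact commands can be reviewed before anything touches the cell, which fits the "unless you really know exactly where the move failed, start over" advice.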
[Thu Dec 5 13:15:43 2013] kaduk_: yes, what we do is put replicas on each DB+fileserver, and the fileserver part of the DB hosts only has the top level volumes.
[Thu Dec 5 13:15:47 2013] secureendpoints: was that to me or Walex2?
[Thu Dec 5 13:15:53 2013] both
[Thu Dec 5 13:16:03 2013] Well, my answer is "not much".
[Thu Dec 5 13:16:33 2013] if you have no dbservers up the clients are not going to find volumes anywhere
[Thu Dec 5 13:16:41 2013] secureendpoints: the question is one of opportunity; I understand that it is not "wrong" either way, but whether it is slightly more resilient or flexible.
[Thu Dec 5 13:17:26 2013] if you only have one site and that site goes down, then your clients will fail
[Thu Dec 5 13:17:46 2013] if you have two sites, then you can lose one
[Thu Dec 5 13:17:53 2013] if you have three sites, you can lose two
[Thu Dec 5 13:18:50 2013] secureendpoints: for example, a second order question is whether to have dedicated DB servers *at all*, and instead put a DB server on every fileserver... (for N not too large)
[Thu Dec 5 13:20:46 2013] secureendpoints: I get that -- but as you may remember I also keep a page of openAFS "gotchas", where I list things that are seemingly fine but in practice don't work that well.
[Thu Dec 5 13:21:30 2013] Walex2: and I am in the business of building a better file system that addresses the gotchas
[Thu Dec 5 13:23:04 2013] examples: even number of DB servers, multihoming, having DES keys in both keytab and KeyFile with rxkad, too large UDP buffers, ...
[Thu Dec 5 13:23:30 2013] * notices that I haven't added them to the page yet, and that I should copy the contents of the page over to openafs.org
[Thu Dec 5 13:23:45 2013] you should maintain the page at wiki.openafs.org
[Thu Dec 5 13:23:58 2013] or contribute documentation updates to the administrator guide.
[Thu Dec 5 13:24:07 2013] secureendpoints: yes, yes, and I am behind even maintaining the page on my own server.
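secureendpoints' "two sites, you can lose one; three sites, you can lose two" is about read-only replica sites, where any single reachable site is enough. The "even number of DB servers" gotcha in the same stretch concerns database quorum, which follows different arithmetic. A minimal sketch contrasting the two; the function names are hypothetical, and the database side assumes a simplified strict-majority quorum. Real Ubik elections also give the lowest-addressed server a tie-breaking advantage, which this sketch ignores.

```python
def ro_sites_can_lose(sites: int) -> int:
    """RO replicas: a client needs just one reachable site."""
    return max(sites - 1, 0)


def db_servers_can_lose(servers: int) -> int:
    """Database servers under a simplified strict-majority quorum.

    Quorum here means more than half of all servers are up. Ubik's
    tie-breaking preference for the lowest-addressed server is
    deliberately left out of this model.
    """
    majority = servers // 2 + 1
    return servers - majority


if __name__ == "__main__":
    print("n  RO sites  DB servers")
    for n in range(1, 7):
        print(f"{n}  {ro_sites_can_lose(n):8}  {db_servers_can_lose(n):10}")
```

Under this model, going from three to four database servers adds no failure tolerance (both can lose one), which is one way to read why an even count appears on the gotchas list.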
[Thu Dec 5 13:26:10 2013] and I would love to contribute doc updates too. It is somewhat wasteful that at this site we also keep our own internal wiki with admin hints; we are very keen on free sw, and we should merge it into shared resources for everyone else's benefit. But wait, getting there. "Something better than nothing" is a good principle.
[Thu Dec 5 13:27:00 2013] so start by adding a link to your page from wiki.openafs.org
[Thu Dec 5 13:27:32 2013] Totally different question again: does '-localauth' just mean "look at KeyFile"? Can I just put KeyFile on a well secured client and use '-localauth'?
[Thu Dec 5 13:27:45 2013] yes
[Thu Dec 5 13:28:02 2013] KeyFile or Keytab
[Thu Dec 5 13:28:09 2013] Ah! useful detail to add to the page/wiki...
[Thu Dec 5 13:28:13 2013] hopefully you are using keytabs
[Thu Dec 5 13:29:04 2013] secureendpoints: not yet :-(. This is all part of an upgrade from Debian 4/Etch+1.4.7 servers to Debian 7/Wheezy+1.6.x ones.
[Thu Dec 5 13:30:57 2013] fortunately the cell is internal and well firewalled currently, but we still worry. We have 3 DB servers + 2 medium fileservers in this cell, and the idea is to add 1-3 new DB servers, and 2 fileservers, copy everything over and bye-bye old ones.
[Thu Dec 5 13:32:18 2013] A very big deal for us has been 's report that 1.4 and 1.6 DB servers are compatible with each other, so we don't have to do it the way he did it with backporting 1.4 to 7/Wheezy.
[Thu Dec 5 16:42:12 2013] I was just reading 'man vos_clone' and I jumped when I read "temporary clone that is created when executing a vos move, vos dump, or other vos commands" and then "the maximum size of a clone volume is 2 terabytes".
[Thu Dec 5 16:42:38 2013] I thought that was indeed the supported limit for a volume?
[Thu Dec 5 16:42:53 2013] The jump was because in recent OpenAFS versions volumes can be larger than 2TB; but if clones cannot be larger than 2TB then effectively one cannot move or dump such a volume.
[Thu Dec 5 16:43:11 2013] and I thought that was indeed the case
[Thu Dec 5 16:43:26 2013] cclausen: IIRC the limit in recent OpenAFS releases is larger than 2TB, the _quota_ though is limited to 2TB.
[Thu Dec 5 16:43:52 2013] are you talking documented limit? Or what people get away with doing?
[Thu Dec 5 16:44:21 2013] I/we have no intention to create volumes larger than 2TB, but I guess that > 2TB for a volume but not a clone volume should go, if it exists, on some doc.
[Thu Dec 5 16:44:33 2013] cclausen: I am talking about documented limit IIRC. Let me check.
[Thu Dec 5 16:44:36 2013] the doc should be fixed
[Thu Dec 5 16:44:56 2013] feel free to submit a patch to gerrit.openafs.org
[Thu Dec 5 16:45:11 2013] a clone volume is no different than any other volume
[Thu Dec 5 16:46:39 2013] thanks!
[Thu Dec 5 16:49:07 2013] ah the same 2TB limit appears in 'man vos_create' too, so it just needs updating.
[Thu Dec 5 17:00:04 2013] Please submit the patch :)
[Thu Dec 5 17:03:15 2013] http://wiki.openafs.org/GitDevelopers/
[Thu Dec 5 17:16:20 2013] yes, yes, I'll do and submit the patch, but tomorrow or during the weekend, going to bed now, but before sleeping...
[Thu Dec 5 17:16:31 2013] No rush, thanks.
[Thu Dec 5 17:17:37 2013] I want to ask a question, a terrible :-) question. What happens to 'fsync' and OpenAFS? If I 'fsync' on the client, does that end up being an 'fsync' on the file server *very soon thereafter*?
[Thu Dec 5 17:18:14 2013] * wonders how often he should run the GNU/Linux flusher on the client or fileserver?
[Thu Dec 5 17:20:18 2013] on unix, the afs client is write on close. nothing goes to the file server until the fd is closed or the cache runs out of space or a lock is obtained and then released or a flush is performed on the file
[Thu Dec 5 17:20:40 2013] on the file server, there are configurable sync options for 1.6.5
[Thu Dec 5 17:21:06 2013] I suggest reading the man pages that ship with the software version you have installed as they have changed over time
[Thu Dec 5 17:21:12 2013] see the fileserver page
[Thu Dec 5 17:22:18 2013] secureendpoints: but does 'fsync' on client perform a "flush"? Or is that 'fs flush'?
[Thu Dec 5 17:22:41 2013] secureendpoints: also, does 'fsync' on the client result in a disk write on the cache?
[Thu Dec 5 17:23:58 2013] because, if lock/unlock sends changes to the fileserver, 'fsync' on the client is at least _possible_.
[Thu Dec 5 17:24:33 2013] I don't know
[Thu Dec 5 17:25:06 2013] uhm, I'll do some testing. It is an interesting question for some of our applications that *might* end up on AFS.
[Thu Dec 5 17:25:08 2013] flushing data on a lock release is a cache coherency requirement.
[Thu Dec 5 17:25:19 2013] flushing data on fsync is not
[Thu Dec 5 17:25:53 2013] the flush I'm referring to is fflush() on the stream
[Thu Dec 5 17:26:09 2013] that might need changing eventually -- it means that for example putting certain types of mailboxes on AFS is a bit risky.
[Thu Dec 5 17:26:22 2013] mail in afs is risky anyway
[Thu Dec 5 17:26:29 2013] it *hammers* servers
[Thu Dec 5 17:26:55 2013] ahhhh but 'fflush' is a much higher level operation than 'fsync', and IIRC as a rule 'fflush' does not do 'fsync'.
[Thu Dec 5 17:27:02 2013] mail is a problem because of the openafs implementation. if you want to put mail in an afs namespace I strongly recommend that you look at "yfs".
[Thu Dec 5 17:27:04 2013] geekosaur: direct delivery into AFS, or something like nmh?
[Thu Dec 5 17:27:33 2013] geekosaur: small file writes are bad for any network (and distributed even worse) file-system.
[Thu Dec 5 17:27:55 2013] callback storms due to multiple changes to the same vnode from a large number of clients or when there are a large number of listeners
[Thu Dec 5 17:28:02 2013] ^
[Thu Dec 5 17:28:35 2013] kaduk_: I personally dislike MH/Maildir style mailboxes, also because they work really badly over high-latency systems.
[Thu Dec 5 17:28:42 2013] typically any number of clients greater than 10 for 1.4.x servers and greater than 1024 for 1.6.x servers
[Thu Dec 5 17:29:35 2013] secureendpoints: I think those numbers for 1.6.x are optimistic. Can't imagine the latency with a few hundred clients refreshing.
[Thu Dec 5 17:30:51 2013] an openafs file server will probably fall over much closer to 256 clients since that is the maximum number of RPCs that can be processed in parallel.
[Thu Dec 5 17:34:59 2013] The yfs fileserver's limit is 16,000 RPCs and it is capable of processing better than 8gbits/second of incoming data packets on the rx listener thread. There is minimal rx thread contention and no global locks for client or callback mgmt and callbacks. Plus RPCs can be processed in parallel to callback breaks. All of which increases scalability and reduces latency.
[Thu Dec 5 17:42:43 2013] unfortunately here we are a committed Debian/OpenAFS site. Fortunately the cell is not that huge for now.
[Thu Dec 5 17:44:53 2013] yfs supports debian packaging
[Thu Dec 5 17:45:32 2013] Debian, RHEL, and Fedora are our supported Linux platforms at the moment
[Thu Dec 5 17:45:49 2013] secureendpoints: but we just do everything by the Debian book :-)
[Thu Dec 5 17:45:57 2013] you get what you pay for
[Thu Dec 5 17:46:12 2013] if I was at a more commercial organisation...
[Thu Dec 5 17:47:48 2013] secureendpoints: I understand the advantages of YFS, but we have passed over other stuff, irrespective of price, because it was not Debianly correct :-), and from a "strategic" point of view there is some rationale for that.
[Thu Dec 5 17:50:38 2013] BTW interesting page of AFS tips from NC Compsci: http://www.cs.unc.edu/cms/help/help-articles/afs-tips and it says that 'fsync' causes a network write. Also just noticed that the fileserver does quite a bit of 'fsync' by default (e.g. 1.4.2 patch to *disable* 'fsync' in the fileserver).
[Thu Dec 5 17:52:53 2013] 1.4.2 is _old_
[Thu Dec 5 17:52:57 2013] that is for 'fsync' on metadata. I still wonder about data 'fsync'.
[Thu Dec 5 17:53:49 2013] sxw: I know, just doing random searches, and the patch obviously did not make it into mainline. In my Debian 7/Wheezy 1.6.1* the fileserver seems to have no 'fsync' or similar options yet.
[Thu Dec 5 17:58:13 2013] sxw: found another patch reviewed by you that seems to have made it into some recent release and implies that the *client* does 'fsync' fairly sensibly... http://scripts.mit.edu/trac/browser/trunk/server/common/patches/openafs-linux-3.1-fsync.patch?rev=2066
[Thu Dec 5 18:02:07 2013] All that patch tells you is that we implement the fsync VFS operation. It doesn't tell you what we do with it…
[Thu Dec 5 18:02:22 2013] sxw: ahhhh.
[Thu Dec 5 18:05:46 2013] going to sleep now, will continue to investigate, but cautiously optimistic...
[Thu Dec 5 18:06:05 2013] So, if the client fsync()s, all dirty pages will be written to the fileserver.
[Thu Dec 5 18:06:14 2013] What the fileserver does with them then is configurable.
[Thu Dec 5 18:07:14 2013] ahhh interesting. "configurable" in some recent version. Plus I seem to understand that the fileserver and volserver protect their *metadata* with fsync regardless.
[Thu Dec 5 18:08:00 2013] That's also a little complicated. But, in general, yes.
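The fflush/fsync/close distinction debated above maps directly onto three calls. A minimal sketch on an ordinary local file: the helper name `write_and_sync` and the file name are hypothetical, and running it proves nothing about AFS itself. The AFS-specific comments only restate what the log says: the unix client is write-on-close, and per sxw an `fsync()` on the client writes all dirty pages to the fileserver, which then does whatever its sync configuration says.

```python
import os
import tempfile


def write_and_sync(path: str, data: bytes) -> None:
    """Write data, then push it as far down the stack as the OS allows."""
    with open(path, "wb") as f:
        f.write(data)         # sits in Python's user-space buffer
        f.flush()             # user space -> kernel, the analogue of fflush()
        os.fsync(f.fileno())  # kernel -> backing store; per the log, on an AFS
                              # client this sends dirty pages to the fileserver
    # Leaving the block closes the file: the AFS client's write-on-close point.


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "mbox")  # hypothetical file name
        write_and_sync(target, b"From someone@example.org\n")
        with open(target, "rb") as f:
            print(f.read())
```

Calling `f.flush()` alone would stop at the kernel, which is Walex2's point that `fflush` as a rule does not imply `fsync`.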
[Thu Dec 5 18:08:04 2013] I seem to remember asking you to look at the fileserver man page for 1.6.5 (or the version that you are running) because they have been changed
[Thu Dec 5 18:09:54 2013] secureendpoints: yes, that's what I meant by "configurable in some recent version" as a summary. I also wrote above that I looked at my 1.6.1 man page (and 'dafileserver -h') and there is nothing I recognize as 'fsync' related. So not there yet.
[Thu Dec 5 18:10:07 2013] nope
[Thu Dec 5 18:10:26 2013] so I think between the various responses I got I have a much better idea.
[Thu Dec 5 18:11:11 2013] if we were really desperate we could always mount the filetree under '/vicep*' with '-o sync' or use very short flusher timeouts.
[Thu Dec 5 18:11:39 2013] but for now we just do media files, mirrors, and mostly other stuff that does not need 'fsync' guarantees.
[Thu Dec 5 18:11:59 2013] Bear in mind that your media probably won't respect fsync
[Thu Dec 5 18:13:10 2013] sxw: I know the infinite number of peculiar ways in which sw/fw/vendors/... betray 'fsync'. for decades... :-)
[Thu Dec 5 18:13:43 2013] * is known as the 'fsync' obsessed man in the group :-)
[Thu Dec 5 18:13:57 2013] anyway the most recent 1.6.x patch related to fileserver fsync is http://git.openafs.org/?p=openafs.git;a=commit;h=be2cc944febb262ae0168bf5fe9411b6ef611469
[Thu Dec 5 18:14:11 2013] the behavior is already different on 'master'.
[Thu Dec 5 18:14:40 2013] sleep well
[Thu Dec 5 18:15:11 2013] ahhh thanks again :-). I have not been going to sleep only because I am obsessed with 'fsync' :-)
[Thu Dec 5 18:16:13 2013] but now that I know it is mostly there I'll sleep soundly, and dream of many fast volume moves tomorrow and getting rid of 1.4.7 servers next week if we are lucky, or else as a christmas present :-)
[Thu Dec 5 18:16:36 2013] bye!
[Thu Dec 5 23:35:58 2013] what exactly needs to change in the man pages?
[Thu Dec 5 23:36:15 2013] it was a little hard to follow from the discussion above
[Thu Dec 5 23:37:50 2013] the manpage change is just removing the 2TB volume size limit
[Thu Dec 5 23:40:39 2013] I'm having trouble finding that in vos_create
[Thu Dec 5 23:41:09 2013] even on openafs-stable-1_6_x
[Thu Dec 5 23:42:35 2013] ah, it's in pod1/fragments/volsize-caution.pod
[Thu Dec 5 23:43:13 2013] it just says: "Currently, the maximum quota for a volume is 2 terabytes (2^41 bytes). Note that this only affects the volume's quota; a volume may grow much larger if the volume quota is disabled. However, volumes over 2 terabytes in size may be impractical to move, and may have their size incorrectly reported by some tools, such as L."
[Thu Dec 5 23:43:39 2013] I still can't find references to a hard 2TB limit in the man pages
[Thu Dec 5 23:46:52 2013] I don't see it either, but it did come out later that they were running 1.4.
[Thu Dec 5 23:52:04 2013] ah, I'll check that
[Thu Dec 5 23:53:21 2013] yep, for 1.4.x it says the max size is 2TB in fileserver(8). I think that was correct for 1.4 though?
[Thu Dec 5 23:54:16 2013] (also present in vos_create(1) and friends on 1.4)
[Fri Dec 6 07:43:24 2013] sorry, it is the man pages in 1.6.1 on Debian 7/Wheezy
[Fri Dec 6 09:54:29 2013] Walex2: which man page?
[Fri Dec 6 11:07:58 2013] hi there
[Fri Dec 6 11:12:17 2013] hi
[Fri Dec 6 11:20:38 2013] I'm just setting up my first cell with openafs under rhel6
[Fri Dec 6 11:21:31 2013] excellent!
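Editor's note: the 1.6 caution text ties the 2^41-byte figure (2 TiB in binary units) to the volume quota rather than to the volume's actual size. Below is a small Python sketch of the arithmetic. It assumes quotas are expressed in 1-kilobyte blocks, as with `fs setquota -max`. The helper name is illustrative only.

```python
KIB = 1024
TIB = 2 ** 40

# "the maximum quota for a volume is 2 terabytes (2^41 bytes)" (volsize-caution.pod)
MAX_QUOTA_BYTES = 2 ** 41
# Quotas are given in kilobyte blocks, so the documented cap is 2**31 blocks.
MAX_QUOTA_KIB = MAX_QUOTA_BYTES // KIB


def quota_kib(tib: int) -> int:
    """Convert whole TiB to kilobyte blocks, refusing values above the documented cap."""
    kib = tib * TIB // KIB
    if kib > MAX_QUOTA_KIB:
        raise ValueError(f"{tib} TiB exceeds the documented 2 TiB maximum quota")
    return kib


print(MAX_QUOTA_KIB)  # 2147483648
print(quota_kib(1))   # 1073741824
```

Per the same text, a volume whose quota is disabled can grow past this size. The limit is on what a quota can express, not on the data itself.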
[Fri Dec 6 11:46:26 2013] congrats :)
[Fri Dec 6 11:48:22 2013] I'm having a little trouble with the kerberos afs principal key version numbers
[Fri Dec 6 11:48:39 2013] I mean the kvno
[Fri Dec 6 11:58:55 2013] I just started the server and client, I did kinit, aklog, tokens, and everything was fine
[Fri Dec 6 11:59:04 2013] then when I try to check the status I get this:
[Fri Dec 6 11:59:12 2013] [root@openafs-srv1 ~]# bos status openafs-srv1
[Fri Dec 6 11:59:14 2013] bos: failed to contact host's bosserver (ticket contained unknown key version number).
[Fri Dec 6 11:59:56 2013] I was searching the mailing list archive; they mention that it is a mismatch between the kvno at the KDC and the kvno in the keytab file
[Fri Dec 6 12:00:08 2013] Yup.
[Fri Dec 6 12:00:48 2013] This is from the KDC:
[Fri Dec 6 12:00:50 2013] [root@openafs-krbsrv ~]# kvno afs/qindel.com
[Fri Dec 6 12:00:52 2013] afs/qindel.com@INT.QINDEL.COM: kvno = 2
[Fri Dec 6 12:01:19 2013] and this from the FS:
[Fri Dec 6 12:01:21 2013] [root@openafs-srv1 ~]# kvno -k /etc/afs.keytab afs/qindel.com
[Fri Dec 6 12:01:23 2013] afs/qindel.com@INT.QINDEL.COM: kvno = 2, keytab entry valid
[Fri Dec 6 12:01:53 2013] kvno(1) will use a cached service ticket if one is present.
[Fri Dec 6 12:02:00 2013] Do you still see kvno = 2 after a fresh kinit?
[Fri Dec 6 12:02:36 2013] If you have admin rights on the KDC, kadmin's 'getprinc afs/qindel.com' will be most authoritative for what kvno is on the KDC.
[Fri Dec 6 12:03:14 2013] kaduk_: I did kdestroy in the kdc
[Fri Dec 6 12:03:54 2013] jmedina: I'm not sure what "kdestroy in the kdc" means.
[Fri Dec 6 12:04:45 2013] I mean that I ran kdestroy to delete the tickets in the cache, and then ran kinit again
[Fri Dec 6 12:05:11 2013] Okay. The "in the kdc" is probably not relevant, then, if I understand correctly.
[Fri Dec 6 12:05:23 2013] ok
[Fri Dec 6 12:05:54 2013] Can you 'kinit -k -t /path/to/rxkad.keytab afs/qindel.com@INT.QINDEL.COM'?
[Fri Dec 6 12:05:55 2013] I changed the password for the "admin" principal, could this affect it?
[Fri Dec 6 12:06:05 2013] No.
[Fri Dec 6 12:07:19 2013] kaduk_: # kinit -k -t /etc/afs.keytab afs/qindel.com@INT.QINDEL.COM, it worked
[Fri Dec 6 12:07:33 2013] I got a valid ticket
[Fri Dec 6 12:07:59 2013] This is on rhel, right? What enctypes does 'klist -kt /etc/afs.keytab' list?
[Fri Dec 6 12:08:16 2013] Er, klist -ket, sorry.
[Fri Dec 6 12:08:21 2013] kaduk_: yes, rhel 6.4
[Fri Dec 6 12:08:39 2013] [root@openafs-srv1 ~]# klist -kte /etc/afs.keytab
[Fri Dec 6 12:08:41 2013] Keytab name: FILE:/etc/afs.keytab
[Fri Dec 6 12:08:42 2013] KVNO Timestamp Principal
[Fri Dec 6 12:08:44 2013] ---- ----------------- --------------------------------------------------------
[Fri Dec 6 12:08:45 2013] 2 12/05/13 11:27:50 afs/qindel.com@INT.QINDEL.COM (aes256-cts-hmac-sha1-96)
[Fri Dec 6 12:08:47 2013] 2 12/05/13 11:27:50 afs/qindel.com@INT.QINDEL.COM (aes128-cts-hmac-sha1-96)
[Fri Dec 6 12:08:49 2013] 2 12/05/13 11:27:50 afs/qindel.com@INT.QINDEL.COM (des3-cbc-sha1)
[Fri Dec 6 12:08:50 2013] 2 12/05/13 11:27:50 afs/qindel.com@INT.QINDEL.COM (arcfour-hmac)
[Fri Dec 6 12:08:52 2013] 2 12/05/13 11:27:50 afs/qindel.com@INT.QINDEL.COM (des-hmac-sha1)
[Fri Dec 6 12:08:54 2013] 2 12/05/13 11:27:50 afs/qindel.com@INT.QINDEL.COM (des-cbc-md5)
[Fri Dec 6 12:09:20 2013] Having the des-hmac-sha1 and des-cbc-md5 keys in the keytab may be causing problems.
[Fri Dec 6 12:10:00 2013] I would recommend rekeying that principal to only have (say) the aes enctypes.
[Fri Dec 6 12:10:20 2013] kaduk_: I used the defaults, I didn't use the enctype in the examples
[Fri Dec 6 12:10:37 2013] kaduk_: should I delete the principal?
[Fri Dec 6 12:10:49 2013] and also to follow up with the KDC administrator and remove the des enctypes from the supported_enctypes stanza in the KDC's configuration. http://mailman.mit.edu/pipermail/kerberos-announce/2013q3/000149.html
[Fri Dec 6 12:10:57 2013] No need to delete the principal.
[Fri Dec 6 12:11:22 2013] Assuming that the KDC is running MIT krb5 (not heimdal or AD), that would be:
[Fri Dec 6 12:12:06 2013] -e des-cbc-crc:v4 is the one recommended in the quick start guide
[Fri Dec 6 12:12:06 2013] kadmin -p afs/qindel.com -r INT.QINDEL.COM -k -t /etc/afs.keytab -q 'ktadd -e aes256-cts-hmac-sha1-96:normal,aes128-cts-hmac-sha1-96:normal -k /etc/afs.keytab afs/qindel.com'
[Fri Dec 6 12:12:32 2013] The quick start guide has not yet been updated for openafs SA-2013-003.
[Fri Dec 6 12:13:12 2013] kaduk_: and yes it is MIT
[Fri Dec 6 12:13:33 2013] Since you're just setting up the realm now, you don't need to wait before deleting the old keys from the keytab: "k5srvutil delold -f /etc/afs.keytab"
[Fri Dec 6 12:13:41 2013] (after the kadmin call)
[Fri Dec 6 12:14:57 2013] ok the first one:
[Fri Dec 6 12:15:07 2013] [root@openafs-srv1 etc]# kadmin -p afs/qindel.com -r INT.QINDEL.COM -k -t /etc/afs.keytab -q 'ktadd -e aes256-cts-hmac-sha1-96:normal,aes128-cts-hmac-sha1-96:normal -k /etc/afs.keytab afs/qindel.com'
[Fri Dec 6 12:15:09 2013] Authenticating as principal afs/qindel.com with keytab /etc/afs.keytab.
[Fri Dec 6 12:15:10 2013] Entry for principal afs/qindel.com with kvno 3, encryption type aes256-cts-hmac-sha1-96 added to keytab WRFILE:/etc/afs.keytab.
[Fri Dec 6 12:15:12 2013] Entry for principal afs/qindel.com with kvno 3, encryption type aes128-cts-hmac-sha1-96 added to keytab WRFILE:/etc/afs.keytab.
[Fri Dec 6 12:15:24 2013] Looks good so far.
[Fri Dec 6 12:15:41 2013] and:
[Fri Dec 6 12:15:44 2013] [root@openafs-srv1 etc]# k5srvutil delold -f /etc/afs.keytab
[Fri Dec 6 12:15:46 2013] Authenticating as principal afs/qindel.com@INT.QINDEL.COM with keytab /etc/afs.keytab.
[Fri Dec 6 12:15:47 2013] Entry for principal afs/qindel.com@INT.QINDEL.COM with kvno 2 removed from keytab WRFILE:/etc/afs.keytab.
[Fri Dec 6 12:15:49 2013] Entry for principal afs/qindel.com@INT.QINDEL.COM with kvno 2 removed from keytab WRFILE:/etc/afs.keytab.
[Fri Dec 6 12:15:50 2013] Entry for principal afs/qindel.com@INT.QINDEL.COM with kvno 2 removed from keytab WRFILE:/etc/afs.keytab.
[Fri Dec 6 12:15:52 2013] Entry for principal afs/qindel.com@INT.QINDEL.COM with kvno 2 removed from keytab WRFILE:/etc/afs.keytab.
[Fri Dec 6 12:15:54 2013] Entry for principal afs/qindel.com@INT.QINDEL.COM with kvno 2 removed from keytab WRFILE:/etc/afs.keytab.
[Fri Dec 6 12:15:55 2013] Entry for principal afs/qindel.com@INT.QINDEL.COM with kvno 2 removed from keytab WRFILE:/etc/afs.keytab.
[Fri Dec 6 12:16:05 2013] Also as expected.
[Fri Dec 6 12:16:35 2013] I think now you should touch the (server) CellServDB to get it to notice the new keys, and re-run kinit, then try again.
[Fri Dec 6 12:19:40 2013] let's see
[Fri Dec 6 12:23:22 2013] the same; I ran kdestroy, kinit admin, aklog, tokens and then:
[Fri Dec 6 12:23:31 2013] [root@openafs-srv1 ~]# bos status openafs-srv1
[Fri Dec 6 12:23:32 2013] bos: failed to contact host's bosserver (ticket contained unknown key version number).
[Fri Dec 6 12:24:03 2013] The kvno output says 3 now?
[Fri Dec 6 12:24:34 2013] yes
[Fri Dec 6 12:24:45 2013] should I run asetkey add 3 keytab princ?
[Fri Dec 6 12:25:14 2013] I would try stopping and restarting the bosserver, then.
[Fri Dec 6 12:25:24 2013] I don't think asetkey is needed.
[Fri Dec 6 12:25:29 2013] I already did
[Fri Dec 6 12:25:30 2013] Well, what version of openafs are you using?
[Fri Dec 6 12:25:52 2013] openafs-server-1.6.5.1-1.el6.x86_64
[Fri Dec 6 12:26:14 2013] Yeah, you don't need asetkey with that version.
[Fri Dec 6 12:27:00 2013] ok
[Fri Dec 6 12:27:14 2013] What version of krb5 do you have in rhel6.4?
[Fri Dec 6 12:28:17 2013] krb5-server-1.10.3-10.el6_4.6.x86_64
[Fri Dec 6 12:29:28 2013] Great, so you've got KRB5_TRACE.
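Editor's note: the two checks kaduk_ walks through are whether the KDC's kvno is present in the keytab, and whether the keytab still holds non-AES keys. Both can be read off `klist -kte` output. Below is a hypothetical Python helper that parses that listing, assuming the column layout pasted above. In this conversation the kvnos did match. The eventual fix (Dec 9) was placing the key as rxkad.keytab in /usr/afs/etc.

```python
import re
from collections import defaultdict

# Enctypes kept by the ktadd command suggested above.
RECOMMENDED = {"aes256-cts-hmac-sha1-96", "aes128-cts-hmac-sha1-96"}

# "   2 12/05/13 11:27:50 afs/qindel.com@INT.QINDEL.COM (aes256-cts-hmac-sha1-96)"
ENTRY = re.compile(r"^\s*(\d+)\s+\S+\s+\S+\s+(\S+@\S+)\s+\(([^)]+)\)\s*$")


def audit_keytab(listing: str):
    """Parse 'klist -kte' output into {principal: {kvno}} plus non-AES entries."""
    kvnos = defaultdict(set)
    flagged = []
    for line in listing.splitlines():
        m = ENTRY.match(line)
        if m is None:
            continue  # header, separator, or "Keytab name:" line
        kvno, principal, enctype = int(m.group(1)), m.group(2), m.group(3)
        kvnos[principal].add(kvno)
        if enctype not in RECOMMENDED:
            flagged.append((kvno, principal, enctype))
    return dict(kvnos), flagged


def kvno_in_keytab(kdc_kvno: int, principal: str, kvnos) -> bool:
    """False means the server has no key for the ticket's kvno, the situation
    behind 'ticket contained unknown key version number'."""
    return kdc_kvno in kvnos.get(principal, set())
```

The KDC-side kvno would come from `kvno` after a fresh kinit, or more authoritatively from kadmin's `getprinc`, as suggested above.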
[Fri Dec 6 12:29:48 2013] o_O
[Fri Dec 6 12:29:57 2013] Stop the bosserver again and run it as 'KRB5_TRACE=/dev/stdout bosserver -nofork'; that should print some logging to the terminal which might be helpful.
[Fri Dec 6 12:35:47 2013] did it
[Fri Dec 6 12:36:11 2013] (It should only print interesting things once you try running the bos client)
[Fri Dec 6 12:36:52 2013] I ran bos status openafs-srv1 and nothing:
[Fri Dec 6 12:36:54 2013] [root@openafs-srv1 ~]# KRB5_TRACE=/dev/stdout bosserver -nofork
[Fri Dec 6 12:37:04 2013] Hmm.
[Fri Dec 6 12:38:47 2013] nothing in /usr/afs/logs
[Fri Dec 6 12:40:18 2013] just as a sanity check, what does "bos status openafs-srv1 -noauth" show?
[Fri Dec 6 12:42:04 2013] # bos status openafs-srv1 -noauth
[Fri Dec 6 12:42:06 2013] bos: running unauthenticated
[Fri Dec 6 12:42:44 2013] "bos status openafs-srv1 -localauth" might be interesting, too.
[Fri Dec 6 12:42:49 2013] wait:
[Fri Dec 6 12:42:54 2013] [root@openafs-srv1 ~]# bos status openafs-srv1 -noauth
[Fri Dec 6 12:42:55 2013] bos: running unauthenticated
[Fri Dec 6 12:42:57 2013] Instance buserver, currently running normally.
[Fri Dec 6 12:42:59 2013] Instance ptserver, currently running normally.
[Fri Dec 6 12:43:00 2013] Instance lvserver, temporarily disabled, stopped for too many errors, currently shutdown.
[Fri Dec 6 12:43:02 2013] Instance fs, currently running normally.
[Fri Dec 6 12:43:04 2013] Auxiliary status is: file server running.
[Fri Dec 6 12:43:05 2013] Instance vlserver, currently running normally.
[Fri Dec 6 12:43:07 2013] Instance upserver, currently running normally.
[Fri Dec 6 12:43:25 2013] pastebin.com :)
[Fri Dec 6 12:43:31 2013] ok
[Fri Dec 6 12:43:32 2013] upserver, oh boy.
[Fri Dec 6 12:43:34 2013] sorry
[Fri Dec 6 12:44:03 2013] o_O upserver
[Fri Dec 6 12:44:32 2013] http://pastebin.com/AUVkTyV0
[Fri Dec 6 12:47:57 2013] what is wrong with upserver?
[Fri Dec 6 13:02:14 2013] ?
[Fri Dec 6 13:17:36 2013] kaduk_: any suggestion?
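Editor's note: the -noauth listing already shows one oddity. An instance named lvserver, apparently a mistyped vlserver since a correctly named vlserver instance is also present, has been disabled after repeated errors. Below is a hypothetical Python helper that picks such lines out of `bos status` output. It is matched only against the text pasted above, and the wording bos uses may differ between versions.

```python
import re

# Matches lines such as "Instance fs, currently running normally."
INSTANCE = re.compile(r"^Instance (\S+), (.*)\.$")


def unhealthy_instances(status_text: str) -> dict:
    """Return {instance: status} for every bos instance not running normally."""
    bad = {}
    for line in status_text.splitlines():
        m = INSTANCE.match(line.strip())
        if m and m.group(2) != "currently running normally":
            bad[m.group(1)] = m.group(2)
    return bad
```

Lines that do not start with "Instance", such as "Auxiliary status is: ..." or "bos: running unauthenticated", are ignored.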
[Fri Dec 6 13:18:20 2013] jmedina: I suppose you could try the KRB5_TRACE=/dev/stdout with the bos command, but I don't really expect that to produce any useful information.
[Fri Dec 6 13:28:04 2013] nothing :(
[Fri Dec 6 13:28:07 2013] [root@openafs-srv1 ~]# B5_TRACE=/dev/stdout bos status openafs-srv1
[Fri Dec 6 13:28:09 2013] bos: failed to contact host's bosserver (ticket contained unknown key version number).
[Fri Dec 6 13:28:16 2013] I recreated the afs principal
[Fri Dec 6 13:28:29 2013] Maybe use env to try and force KRB5_TRACE into the environment?
[Fri Dec 6 13:33:42 2013] kaduk_: should I install the -debuginfo package?
[Fri Dec 6 13:35:51 2013] What would you do with it?
[Fri Dec 6 13:35:54 2013] kaduk_: I think I'm going to start over :S
[Fri Dec 6 13:36:09 2013] I suppose there's no harm in that.
[Fri Dec 6 13:38:17 2013] kaduk_: thanks for your help, I'm going to continue later tonight
[Fri Dec 6 13:40:18 2013] Okay. Sorry we couldn't work things out right away.
[Sun Dec 8 15:00:55 2013] Blah! Attempting to host a Fedora mirror backed by AFS has slammed me into the 64Ki slot entry limits for directories -- their drpms dirs are not split by initial letter. Suggestions? ("Don't do that?")
[Sun Dec 8 15:01:29 2013] pretty much, if they like huge directories
[Sun Dec 8 15:04:19 2013] * filed a bug in bugzilla, expects to be told to go home.
[Sun Dec 8 15:18:11 2013] nwf: same here. Just don't mirror the DRPMS.
[Sun Dec 8 15:34:55 2013] Walex: --exclude=drpms is enough?
[Sun Dec 8 15:36:58 2013] nwf: I think so IIRC.
[Sun Dec 8 15:37:19 2013] Will try that, then. :)
[Sun Dec 8 15:37:27 2013] nwf: let me check...
[Sun Dec 8 15:39:42 2013] nwf: for now I am just explicitly excluding /updates/18/$ARCH/drpms
[Mon Dec 9 21:31:50 2013] Hi there
[Mon Dec 9 21:32:03 2013] is anyone using openafs 1.6.x on rhel6.4?
[Mon Dec 9 21:49:58 2013] one question: when I generate the /etc/afs.keytab file, which enctype should I use?
[Mon Dec 9 21:50:39 2013] I thought it was supposed to be named rxkad.keytab...
[Mon Dec 9 21:50:57 2013] Anyway, "the strongest enctype supported by all servers in the cell", which is probably aes256-cts-hmac-sha1-96
[Mon Dec 9 21:51:11 2013] After getting a token and checking the status I get this:
[Mon Dec 9 21:51:17 2013] hi kaduk_ :)
[Mon Dec 9 21:51:26 2013] howdy :)
[Mon Dec 9 21:51:42 2013] I set up a new environment :)
[Mon Dec 9 21:52:06 2013] so rxkad.keytab is for 1.6?
[Mon Dec 9 21:52:24 2013] because I'm following the instructions from the quickstart guide
[Mon Dec 9 21:52:32 2013] Yes.
[Mon Dec 9 21:54:07 2013] kaduk_, several of the old tutorials have you create afs.keytab and then asetkey from that, for des keys
[Mon Dec 9 21:54:40 2013] geekosaur: yes, I sort of remember that.
[Mon Dec 9 21:55:03 2013] anyway for a modern installation you want to do rxkad.keytab, but I'm not sure we have an updated quickstart yet. (I should probably prioritize that)
[Mon Dec 9 22:01:02 2013] kaduk_, geekosaur: I renamed afs.keytab to rxkad.keytab, restarted the server, and it works :)
[Mon Dec 9 22:01:19 2013] That's good :)
[Mon Dec 9 22:06:09 2013] mm it didn't work :(
[Mon Dec 9 22:18:56 2013] in 1.6 do I need to run: asetkey add 2 /etc/rxkad.keytab afs/example.com ?
[Mon Dec 9 22:19:32 2013] no
[Mon Dec 9 22:19:40 2013] asetkey is only for DES keys
[Mon Dec 9 22:19:49 2013] you should avoid DES keys with modern openafs
[Mon Dec 9 22:20:49 2013] geekosaur: where can I read updated documentation?
[Mon Dec 9 22:21:28 2013] I don't think we have a from-scratch or quickstart doc for rxkad-k5 yet
[Mon Dec 9 22:22:22 2013] geekosaur: well, unless you're running git master ;)
[Mon Dec 9 22:22:37 2013] http://www.openafs.org/pages/security/install-rxkad-k5-1.6.txt will give you some idea
[Mon Dec 9 22:22:54 2013] although it assumes an existing DES-keyed cell
[Mon Dec 9 22:23:53 2013] it also refers to http://www.openafs.org/pages/security/how-to-rekey.txt
[Mon Dec 9 22:24:13 2013] * will see about updating the quickstart tomorrow; it's late here
[Mon Dec 9 22:26:01 2013] I can help test :)
[Mon Dec 9 22:31:20 2013] well, I'll be using it as an excuse to set up the home cell I keep not doing for various reasons (mostly related to not having anywhere to put it...)
[Mon Dec 9 22:38:55 2013] I'm reading the docs, slowly
[Mon Dec 9 22:39:01 2013] where should I put rxkad.keytab?
[Mon Dec 9 22:39:18 2013] The same directory that would have the KeyFile.
[Mon Dec 9 22:39:51 2013] depending on how your openafs was built, usually it's either /etc/openafs/server or /usr/afs/etc
[Mon Dec 9 22:40:03 2013] As far as I have read, KeyFile is generated with asetkey add
[Mon Dec 9 22:40:14 2013] I'm using the rpms for rhel6
[Mon Dec 9 22:40:20 2013] the latter then
[Mon Dec 9 22:40:24 2013] I have /usr/afs/etc
[Mon Dec 9 22:41:59 2013] it worked
[Mon Dec 9 22:42:38 2013] I moved /etc/rxkad.keytab to /usr/afs/etc and restarted the server
[Mon Dec 9 22:43:53 2013] now I can continue reading and testing
[Mon Dec 9 22:44:01 2013] thanks for your help
[Mon Dec 9 22:47:15 2013] geekosaur: I will read those documents
[Tue Dec 10 13:31:02 2013] ... document last updated sometime around 1.4.10. lovely. (this after way too much yak shaving this morning, and I'm expecting to be preempted). somehow I do not think I will be updating this thing today
[Tue Dec 10 13:31:13 2013] but I think we also have some quickstarts on the wiki
[Tue Dec 10 13:34:09 2013] that's almost as bad. fedora 7? although I think I already updated that one at least partially.
[Tue Dec 10 13:34:49 2013] apparently not
[Tue Dec 10 13:34:52 2013] hrm
[Tue Dec 10 20:11:59 2013] Hey channel, how might it happen that one of my file servers appears under two different UUIDs in the VLDB? :)
[Tue Dec 10 20:12:31 2013] (The VLDB and PTS servers do not try to register themselves there, just the file/volume servers?)
[Tue Dec 10 22:10:27 2013] Grrr. What would it take to get DNS entries in the VLDB?
[Tue Dec 10 22:36:04 2013] DNS entries are not stored in the VLDB. Only raw addresses. The "vos" command displays reverse DNS values unless you provide the -noresolve option.
[Tue Dec 10 22:47:57 2013] Understood in full; I wish it were not so!
[Tue Dec 10 22:48:50 2013] I'd like to do split-DNS for my servers, rather than publishing both RFC1918 and public addresses for them.
[Tue Dec 10 22:50:23 2013] (I know, I know, don't run them on VMs behind NAT. We're not suffused with addresses that have publicly reachable AFS ports, and I don't want our gateway nodes running more than they have to.)
[Tue Dec 10 22:55:25 2013] (Failing that, a split-horizon VLDB might be OK, but I suspect that's only slightly less work.)
[Tue Dec 10 23:08:51 2013] there is no split horizon vldb but you are welcome to implement the changes
[Tue Dec 10 23:09:46 2013] your-file-system.com has file servers behind a NAT. They have public addresses and NAT addresses. The clients see both and use the one(s) they can reach.
[Tue Dec 10 23:10:33 2013] The windows clients are a bit smarter about it than the unix clients. The unix clients do not track servers by their uuid and consolidate server flags and reachability by the uuid.
[Tue Dec 10 23:10:58 2013] that is another small project that could be implemented by someone with motivation.
[Tue Dec 10 23:43:15 2013] has anyone seen this error: afsd: Error -1 in basic initialization?
[Tue Dec 10 23:43:23 2013] I'm trying to start the client on rhel6
[Tue Dec 10 23:52:26 2013] I have followed the steps in the quickstart guide
[Tue Dec 10 23:52:41 2013] and when trying to set the acls on /afs I get:
[Tue Dec 10 23:52:45 2013] [root@openafs-srv1 ~]# fs setacl /afs system:anyuser rl
[Tue Dec 10 23:52:46 2013] fs: You don't have the required access rights on '/afs'
[Tue Dec 10 23:52:54 2013] I kinit with admin
[Tue Dec 10 23:53:00 2013] I ran aklog -d
[Tue Dec 10 23:53:04 2013] and I got a token
[Tue Dec 10 23:53:36 2013] by the way I ran openafs-client without dynroot and afsdb
[Wed Dec 11 06:11:47 2013] nwf: there was an interesting discussion on servers with multiple IP addresses and AFS in the mailing list some time ago...
[Wed Dec 11 07:31:08 2013] not so far away from now IMHO
[Wed Dec 11 10:02:47 2013] jmedina: for non-dynroot, /afs is still going to pick the RO copy preferentially, if one exists.
[Wed Dec 11 12:28:26 2013] secureendpoints: Is there documentation about the vlserver protocol somewhere convenient? I tried spelunking the source and am a bit worse for the wear. ;)
[Wed Dec 11 12:32:37 2013] nwf: you mean something like "AFS-3 Programmer's Reference: Volume Server/Volume Location Server Interface", or something else?
[Wed Dec 11 12:32:58 2013] docs/pdf/vvl-spec.pdf is the Transarc docs on the protocol.
[Wed Dec 11 12:33:19 2013] and of course the source code
[Wed Dec 11 12:33:53 2013] Is it the case that Zayas just never got around to writing the pts-spec document analogous to the others there?
[Wed Dec 11 12:34:28 2013] That's probably the right place to start. Thanks. :)
[Wed Dec 11 12:44:02 2013] nwf: if you are looking at split horizon VL responses, you want to modify the GetAddrsU response based upon the endpoint address of the rx peer object associated with the incoming call. See src/vlserver/vlprocs.c SVL_GetAddrsU(). However, the same RPC is used by clients that might want filtering and by those that require the entire address list for administration.
[Wed Dec 11 12:44:28 2013] What's the U for, anyway?
[Wed Dec 11 12:44:33 2013] UUID
[Wed Dec 11 12:44:40 2013] Ah, of course.
[Wed Dec 11 12:44:59 2013] I don't suppose there's any way to distinguish "client" from "vos command"?
[Wed Dec 11 12:50:02 2013] not with the existing rpcs
[Wed Dec 11 14:12:46 2013] Someone on the openafs-info list said that a cache size of 2.5G was the maximum
[Wed Dec 11 14:13:07 2013] I have a hard time believing that -- is there anything I might be missing?
[Wed Dec 11 14:13:30 2013] (seeing as I regularly see the 20G caches fill up on our systems)
[Wed Dec 11 14:13:57 2013] I think that was in reference to very old versions of OpenAFS
[Wed Dec 11 14:14:13 2013] we run 9GB caches on our web servers with no problems
[Wed Dec 11 14:14:16 2013] or IBM AFS 3.4
[Wed Dec 11 14:14:25 2013] crazy
[Wed Dec 11 14:15:15 2013] https://lists.openafs.org/pipermail/openafs-info/2013-December/040306.html apparently "not too long ago" is over a decade?
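Editor's note: secureendpoints suggests filtering the SVL_GetAddrsU reply (src/vlserver/vlprocs.c) by the address of the calling rx peer. He also warns that vos uses the same RPC and needs the full list. The real change would be C inside the vlserver. Below is a Python sketch that models only a possible selection policy. The policy and the function name are hypothetical. `ipaddress`'s `is_private` covers RFC 1918 plus a few other reserved ranges.

```python
import ipaddress


def addresses_for_peer(server_addrs, peer_addr):
    """Hypothetical split-horizon policy for a GetAddrsU-style reply.

    Peers on private networks get every address, private ones first;
    other peers get only the public addresses. If a server has no public
    address, the full list is returned rather than an empty one.
    """
    peer = ipaddress.ip_address(peer_addr)
    addrs = [ipaddress.ip_address(a) for a in server_addrs]
    if peer.is_private:
        # sorted() is stable: private addresses first, original order kept.
        ordered = sorted(addrs, key=lambda a: not a.is_private)
        return [str(a) for a in ordered]
    public = [str(a) for a in addrs if not a.is_private]
    return public or [str(a) for a in addrs]
```

As noted in the discussion, a policy like this would also hide addresses from `vos` when it runs outside the private network, because the existing RPCs cannot tell the two kinds of caller apart.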
[Wed Dec 11 14:15:56 2013] for someone that has supported AFS since the creation of Transarc, perhaps
[Wed Dec 11 14:16:31 2013] Kim was the lead trainer for Transarc
[Wed Dec 11 14:16:41 2013] * nods
[Wed Dec 11 14:17:31 2013] our cache size probably could be bigger than 20G -- I need larger disks in my login servers
[Wed Dec 11 14:18:12 2013] since all our engineering software is in AFS *AND* everyone's 10G homedirs are as well, we regularly get spikes in dcache misses
[Wed Dec 11 14:20:17 2013] yeah I wish we had better metrics on our cache hits/misses
[Wed Dec 11 14:21:22 2013] I created a nagios check that runs xstat_cm_test
[Wed Dec 11 14:22:45 2013] I'm not terribly fond of the results
[Wed Dec 11 14:23:04 2013] I pointed the checks at the central IT login servers and all of them are critical according to my checks
[Wed Dec 11 14:23:26 2013] ours only go critical once in a while.
[Wed Dec 11 14:24:20 2013] yeah, interesting
[Wed Dec 11 14:24:54 2013] wisc has some really great graphs at http://cricket.cs.wisc.edu/cgi-bin/cricket/grapher.cgi
[Wed Dec 11 14:25:23 2013] unfortunately at my workplace the monitoring system we have is pretty clunky
[Wed Dec 11 14:26:24 2013] hmm, cricket
[Wed Dec 11 14:26:29 2013] we ran that at CMU
[Wed Dec 11 14:26:42 2013] I have Icinga (nagios clone) with pnp4nagios for graphs
[Wed Dec 11 14:27:07 2013] ah, cool
[Wed Dec 11 14:27:15 2013] I was wondering what you used for graphs
[Wed Dec 11 14:27:18 2013] I'm not a huge fan of it
[Wed Dec 11 14:27:27 2013] legacy inertia
[Wed Dec 11 14:27:42 2013] I know Russ has some good AFS nagios plugins too. We could be doing so much more here
[Wed Dec 11 14:27:46 2013] right
[Wed Dec 11 14:27:50 2013] I'd probably prefer zabbix, because then I could just have the check dump *every* entry in xstat_cm_test into the database
[Wed Dec 11 14:28:14 2013] I think I'm using one of Russ's plugins
[Wed Dec 11 14:28:40 2013] I had a custom one that monitored release time for volumes, so I'd get an alert if a volume that's supposed to be automatically released every night was more than a day old
[Wed Dec 11 14:29:03 2013] but I don't use it anymore since I don't run my own cell
[Wed Dec 11 14:29:07 2013] anymore
[Wed Dec 11 14:29:21 2013] all I'm concerned about is the client
[Wed Dec 11 16:17:10 2013] Are there implementations of rx in anything other than C?
[Wed Dec 11 16:19:41 2013] I don't think so
[Wed Dec 11 21:18:23 2013] folk are quite pleased with the afs (iyfs) client for iOS... any news on whether there will be an android client, too?
[Wed Dec 11 21:18:51 2013] eventually. we have a large number of things on our plate.
[Wed Dec 11 21:19:26 2013] * nods
[Thu Dec 12 04:45:17 2013] g
[Thu Dec 12 11:33:00 2013] Aaaargh, why is NetInfo in /var/lib/openafs/local, not /etc/openafs/server?
[Thu Dec 12 11:34:26 2013] (It's fine now that I know, but damn if that didn't trip me up)
[Thu Dec 12 17:02:35 2013] is there a reliable shortened ritual for adding a new DB-server?
[Thu Dec 12 17:03:12 2013] or does it involve adding the new DB-server to the 'server/CellServDB' of all other DB-servers and restarting them all?
[Thu Dec 12 17:03:36 2013] I think the latter.
[Thu Dec 12 17:05:10 2013] no way around it? Because I'd like to try-before-you-buy the good news that 1.4 and 1.6 DB-servers are really compatible, as you reported... :-)
[Thu Dec 12 17:05:47 2013] Touching the CellServDB file tells everybody that there are configuration changes, but (at least on 1.6) I don't believe there's logic to produce new intra-ubik connections.
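Editor's note: the Nagios check mentioned above wraps xstat_cm_test. Its exact counters are not given in the log. Below is a Python sketch of the final step only, turning hit and miss counts into a Nagios result. It assumes the counts were already extracted. The thresholds and function name are hypothetical. Exit codes 0 to 3 are the standard Nagios plugin convention for OK, WARNING, CRITICAL, and UNKNOWN.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def dcache_check(hits: int, misses: int, warn: float = 0.90, crit: float = 0.80):
    """Map cache-manager hit/miss counts to a Nagios (exit code, message) pair.

    The counts are assumed to come from xstat_cm_test output parsed elsewhere;
    the choice of counters and the thresholds are placeholders.
    """
    total = hits + misses
    if total <= 0:
        return UNKNOWN, "UNKNOWN - no lookups recorded"
    ratio = hits / total
    # Text after "|" is Nagios performance data, which graphing add-ons can read.
    msg = f"hit ratio {ratio:.1%} ({hits} hits, {misses} misses)|ratio={ratio:.4f}"
    if ratio < crit:
        return CRITICAL, "CRITICAL - " + msg
    if ratio < warn:
        return WARNING, "WARNING - " + msg
    return OK, "OK - " + msg
```

A plugin would print the message and exit with the code, for example `print(msg); sys.exit(code)`. Tuning the thresholds per host class (login servers versus web servers) would probably matter more than the arithmetic.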
[Thu Dec 12 17:06:09 2013] But maybe I'm just thinking of client security objects, since that's what I have actually looked at recently.
[Thu Dec 12 17:06:18 2013] Master is slightly better in the security object regard.
[Thu Dec 12 17:06:45 2013] kaduk_: I was wondering whether updating 'server/CellServDB' on only *one* of the existing DB-servers would work...
[Thu Dec 12 17:07:03 2013] I'm pretty sure that answer is 'no'.
[Thu Dec 12 17:07:57 2013] kaduk_: sorry for persisting with my weird questions. Note that my goal is not to add a DB-server to just one existing DB-server and have the others pick it up, but the opposite...
[Thu Dec 12 17:08:27 2013] I guess there is the 'bos addhost' command, but I don't think there's automation to propagate it.
[Thu Dec 12 17:09:17 2013] kaduk_: I have 3x 1.4 DB-servers; first I'd like to take one out of the cell, so to speak, and see if it syncs safely with the 1.6 one, and then do the full 'server/CellServDB' update.
[Thu Dec 12 17:09:22 2013] If you just want to test a new machine, you could make a non-voting clone.
[Thu Dec 12 17:09:33 2013] kaduk_: ahhhhh that's useful.
[Thu Dec 12 17:09:59 2013] kaduk_: my worry is whether there is a risk of corrupting the DB of the existing 1.4 DB-servers.
[Thu Dec 12 17:10:16 2013] But the general ubik story is that you only get the voting/consistency guarantees if all the db servers know what the set of possible db servers is and agree on it (that is, have the same CellServDB entries).
[Thu Dec 12 17:11:02 2013] kaduk_: I remember you went to some lengths to ensure that would not happen, and then found you need not have bothered, but still, if I can be easily/cheaply paranoid, I would.
[Thu Dec 12 17:11:30 2013] Right.
[Thu Dec 12 17:11:57 2013] otherwise my fallback plan is to save the DB-files on all 3x 1.4 DB-servers, do a full join of the new one, and then if something bad happens, restore the DB-files and restart.
[Thu Dec 12 17:13:45 2013] That seems reasonable.
[Thu Dec 12 17:14:12 2013] ah thanks for the support.
[Thu Dec 12 17:14:50 2013] BTW does 'bos addhost' do anything other than editing the 'server/CellServDB'? Of course we don't run 'upserver' :-).
[Thu Dec 12 17:17:04 2013] The 'man' page says it does just that, and restarting the dæmons should be done by hand.
[Thu Dec 12 17:17:30 2013] I expect the man page is accurate in those regards, though I haven't actually read the code.
[Thu Dec 12 17:18:12 2013] but then I wonder why the stark warning to use the command instead of editing the file directly
[Thu Dec 12 17:18:48 2013] I wonder if there is a "legacy" situation like for "bos addkey"
[Thu Dec 12 17:20:25 2013] I believe that the BOZO RPCs in question are quite old, yes.
[Thu Dec 12 17:21:04 2013] I suspect that the AFS world was a bit different when the update server was standard.
[Thu Dec 12 17:21:32 2013] Indeed.
[Thu Dec 12 17:23:16 2013] different but related question: how can I check (if possible) which current DB-server is the coordinator?
[Thu Dec 12 17:23:22 2013] udebug.
[Thu Dec 12 17:23:48 2013] The coordinator is determined per-service, so you'll have to check each port to find out what the coordinator is for each service.
[Thu Dec 12 17:24:06 2013] I am pretty sure it is the lowest numbered, but just-in-case...
[Thu Dec 12 17:24:21 2013] kaduk_: interesting news about the per-service part.
[Thu Dec 12 17:24:36 2013] the lowest numbered host gets an extra bonus in the voting, but the voting can choose other servers in some cases.
[Thu Dec 12 17:24:59 2013] news to me of course. Uhm, I am behind with updates to my "devil in the details" list.
[Thu Dec 12 17:25:07 2013] If the lowest numbered host goes offline for some amount of time, the other hosts will elect a new coordinator, and that coordinator is likely to remain in its role even when the original host regains connectivity.
[Thu Dec 12 17:25:30 2013] kaduk_: the legendary extra half vote IIRC, and connectivity issues.
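Editor's note: the "extra half vote" for the lowest-numbered host can be made concrete. Below is a simplified Python model of the election arithmetic as described above. Each DB server has one vote, the numerically lowest address gets an extra half vote, and a candidate needs more than half of the total. It deliberately ignores timing, vote expiry, and the incumbent "stickiness" kaduk_ describes, so it is a sketch, not ubik's actual code. The addresses are hypothetical.

```python
import ipaddress


def has_quorum(voters, servers):
    """Simplified model of ubik's election arithmetic.

    Every DB server has one vote and the server with the numerically lowest
    address gets an extra half vote; a candidate needs more than half of the
    total. Timing, epochs and vote expiry are deliberately ignored.
    """
    # Compare addresses numerically, not as strings ("10.0.0.10" > "10.0.0.9").
    members = sorted(set(servers), key=ipaddress.ip_address)
    lowest = members[0]
    total = len(members) + 0.5
    got = sum(1.5 if s == lowest else 1.0 for s in set(voters) if s in members)
    return got > total / 2
```

With two servers, the lowest one alone has 1.5 of 2.5 votes and can still win, while the other alone cannot. With three, any two servers suffice, with or without the lowest.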
[Thu Dec 12 17:25:51 2013] http://www.central.org/pages/numbers/rxservice.html has a list of port numbers for services.
[Thu Dec 12 17:26:23 2013] PR_ and VL_ are the most common ubik ones, but we also run BUDB_ here.
[Thu Dec 12 17:29:02 2013] lowest is sync/coordinator according to 'udebug'
[Thu Dec 12 17:29:09 2013] lowest is sync/coordinator according to 'udebug' (here I mean)
[Thu Dec 12 17:29:36 2013] Sure.
[Thu Dec 12 17:30:34 2013] I was wondering, for a full rollover of DB servers (replacing all 3x 1.4 ones with 3x 1.6 ones in 3 stages), how to avoid the potential election issue when the lowest is replaced.
[Thu Dec 12 17:30:55 2013] all three new addresses are going to be different from the old ones and higher numerically...
[Thu Dec 12 17:32:26 2013] 'udebug' seems to imply elections every 57 seconds though.
[Thu Dec 12 17:33:10 2013] and anyhow during the period where a new coordinator is being elected IIRC the only issue is that the DB is going to be RO.
[Thu Dec 12 17:37:06 2013] The DB is going to be RO, and also clients will need to know which machines to talk to.
[Thu Dec 12 17:37:51 2013] So, I think that replacing all three IP addresses at once is going to make clients sad. Maybe DNS records help, but I don't know how often the client is going to check those.
[Thu Dec 12 17:38:21 2013] kaduk_: 'fs newcell' was part of the plan.
[Thu Dec 12 17:38:30 2013] Okay, that should work fine.
[Thu Dec 12 17:39:15 2013] BTW I am not going to replace all three at once; the idea here is to start with 3x 1.4, add *one* 1.6, retire one of the 1.4, install a new 1.6 on it with a different address, and so on.
[Thu Dec 12 17:39:28 2013] Also sounds good.
[Thu Dec 12 17:40:13 2013] kaduk_: that also means 6 DB server restarts (3 adds, 3 removes), but hopefully they should be rather painless?
[Thu Dec 12 17:40:29 2013] I believe so.
[Thu Dec 12 17:40:44 2013] my calculation too :-). Let's hope :-).
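Editor's note: because the coordinator is chosen per service, checking it means running udebug once per ubik port. Below is a Python sketch that builds and runs those commands. It uses the well-known ports for the PR_ (ptserver, 7002), VL_ (vlserver, 7003) and BUDB_ (buserver, 7021) services and udebug's `-servers`/`-port` options. The function names and example host are illustrative only. The sync site still has to be read from each command's output.

```python
import subprocess

# Well-known ubik service ports (see the central.org rxservice list above).
UBIK_PORTS = {"ptserver": 7002, "vlserver": 7003, "buserver": 7021}


def udebug_argv(host, services=("ptserver", "vlserver")):
    """Build one 'udebug -servers HOST -port PORT' command per ubik service."""
    return [["udebug", "-servers", host, "-port", str(UBIK_PORTS[s])] for s in services]


def show_ubik_state(host, services=("ptserver", "vlserver")):
    """Run udebug for each service and print its output, so the coordinator
    (sync site) can be checked separately for every service."""
    for argv in udebug_argv(host, services):
        result = subprocess.run(argv, capture_output=True, text=True, check=False)
        print("$", " ".join(argv))
        print(result.stdout or result.stderr)
```

A cell that also runs a backup database, as mentioned above, would call `show_ubik_state("afsdb1.example.org", ("ptserver", "vlserver", "buserver"))`. Running it against each DB server before and after every add/remove step of the rollover would show where the sync site moved.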
[Thu Dec 12 17:46:53 2013] BTW I have been reading the Debian 'README.servers.gz' which makes it sound so easy, but it seems a bit dated, and I worry a lot about potential glitches, so double checking everything. [Thu Dec 12 17:48:38 2013] Hmm, it does not seem to know about rxkad-k5, yes. [Thu Dec 12 17:50:15 2013] kaduk_: that is something that will be updated soon IIRC. Someone on 'openafs-info' also mentioned that next master OpenAFS release will have docs mentioning 'rxkad-k*' and a guide to install straight into it. [Thu Dec 12 17:50:32 2013] I hope so. [Thu Dec 12 17:50:37 2013] Anyway, AFK for now. [Thu Dec 12 17:50:48 2013] thanks! [Thu Dec 12 18:48:52 2013] Walex: up late again? [Fri Dec 13 06:52:23 2013] so there, if there someone up... I have a moderately critical AFS volume that I'd like to move as quickly and fussy-less as possible to another server. Is there any advantage to do an explicit 'vos addsite' first and then a 'vos move' to the same target partition after that? Is the result going to be the same as the viceversa, 'vos move' followed by a 'vos addsite' in the same partition? [Fri Dec 13 08:00:54 2013] <[gorgo]> if it's just a RW volume, you don't need vos addsite, you just need vos move [Fri Dec 13 08:01:11 2013] <[gorgo]> vos addsite is used to add RO sites for the volume which you then refresh from the RW using vos release [Fri Dec 13 08:02:23 2013] [gorgo]: there is a difference between "need" and "it has some small advantages"... [Fri Dec 13 08:02:25 2013] <[gorgo]> if you have a RW and a RO together, and you want to move the two at the same time, then you do a vos move, vos addiste to the new server, vos release, and then vos remove for the readonly from the old server [Fri Dec 13 08:03:36 2013] <[gorgo]> walex: vos addsite is for handling RO sites, if you don't have them, then vos addsite will not give you any advantages [Fri Dec 13 08:04:41 2013] [gorgo]: the difference I can see between a 'vos addsite ...; vos move ...' 
instead of vice versa is that during the 'addsite' the RW volume is basically untouched, and I get a "spare" RO volume at the target. This may be slightly more resilient if either the source or target servers have bad moments. [Fri Dec 13 08:06:14 2013] [gorgo]: but it seems a small difference. The minor issue with 'vos move' is that it deletes the source volume at the end... [Fri Dec 13 08:06:16 2013] <[gorgo]> walex: so you do have a single RO clone along with the RW volume? [Fri Dec 13 08:06:36 2013] <[gorgo]> walex: vos move handles the RW volume. vos addsite/release handles the RO volumes [Fri Dec 13 08:07:01 2013] <[gorgo]> if you do a vos move for the RW volume and you have a RO volume on the old server, the RO volume will remain on the old server [Fri Dec 13 08:07:24 2013] [gorgo]: I understand what 'vos move' and 'vos addsite' do, I am wondering about doing it in the most convenient way. [Fri Dec 13 08:08:03 2013] <[gorgo]> we do critical volume migrations in an automated way all the time [Fri Dec 13 08:09:21 2013] <[gorgo]> depends on your circumstances [Fri Dec 13 08:10:04 2013] [gorgo]: uhmmm, you are braver than me :-) I have done about 100 volume moves recently and had problems with 2-4 of them... In this specific case the old (old old old) server has developed a slightly dodgy MD RAID issue, so I was looking for a little edge. [Fri Dec 13 08:10:44 2013] <[gorgo]> how many ro clones do you have? [Fri Dec 13 08:10:58 2013] <[gorgo]> and do the clients use mostly the ro's ? [Fri Dec 13 08:11:51 2013] <[gorgo]> if you are concerned about the health of the old server, and you don't have any additional RO copies, I'd certainly add a new RO site to the new server first [Fri Dec 13 08:13:29 2013] [gorgo]: yes, the latter is more or less what I was asking, whether there was a (little) advantage to doing an RO copy first on another server and then moving, or to just rely on move doing a copy itself first. 
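The RW-plus-RO sequence gorgo describes above (vos move, vos addsite on the new server, vos release, then vos remove of the old RO) can be sketched in Python as a dry-run plan. The volume, server and partition names are hypothetical, and the script only prints the commands unless told otherwise; running them for real needs the usual administrative AFS credentials.

```python
import shlex
import subprocess


def move_rw_and_ro_plan(volume, from_server, from_part, to_server, to_part):
    """Build the vos commands for moving a volume's RW and RO sites together.

    Order follows the chat: move the RW, add an RO site on the new server,
    release to populate it, then remove the RO left behind on the old server.
    """
    return [
        ["vos", "move", "-id", volume,
         "-fromserver", from_server, "-frompartition", from_part,
         "-toserver", to_server, "-topartition", to_part],
        ["vos", "addsite", "-server", to_server, "-partition", to_part, "-id", volume],
        ["vos", "release", "-id", volume],
        ["vos", "remove", "-server", from_server, "-partition", from_part,
         "-id", volume + ".readonly"],
    ]


def run_plan(plan, dry_run=True):
    """Print each command; unless dry_run, execute it and stop at the first failure."""
    for argv in plan:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # raises CalledProcessError on a non-zero exit


if __name__ == "__main__":
    # Hypothetical volume, servers and partitions.
    run_plan(move_rw_and_ro_plan("proj.critical", "oldfs.example.org", "a",
                                 "newfs.example.org", "b"))
```

Stopping at the first failing step keeps the state easy to reason about: for example, if the release fails, the old RO has not yet been removed.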
[Fri Dec 13 08:14:09 2013] <[gorgo]> the move will be doing a copy anyway [Fri Dec 13 08:14:20 2013] <[gorgo]> even if you already have a ro site on the target [Fri Dec 13 08:14:46 2013] part of the subtlety of my question is whether when I 'vos move' to the same AFS-partition where there is already a replica, the replica gets converted to a "lightweight" (COW) replica or not. [Fri Dec 13 08:14:54 2013] <[gorgo]> no, it doesn't [Fri Dec 13 08:15:29 2013] [gorgo]: ahh then it is best to do a 'vos addsite' on the source and then a 'vos move' to the target. [Fri Dec 13 08:15:38 2013] [gorgo]: as you said... [Fri Dec 13 08:16:18 2013] <[gorgo]> I didn't say vos addsite for the source... [Fri Dec 13 08:16:41 2013] <[gorgo]> you also didn't say if the volume has any replicas now [Fri Dec 13 08:16:44 2013] [gorgo]: well, or at least *not* to the target... [Fri Dec 13 08:17:21 2013] <[gorgo]> when we have only RW volumes, we just vos move them, without creating any RO replicas [Fri Dec 13 08:18:00 2013] [gorgo]: that's what I usually do, just a 'vos move', because if the move fails it is easy enough to undo the side effects. [Fri Dec 13 08:18:28 2013] [gorgo]: I was just wondering whether and which 'addsite' might give some little extra safety. [Fri Dec 13 08:18:55 2013] <[gorgo]> do a vos dump first [Fri Dec 13 08:19:04 2013] <[gorgo]> and don't bother with addsite [Fri Dec 13 08:19:13 2013] <[gorgo]> if you have a volume dump, you can easily restore it with vos restore [Fri Dec 13 08:19:19 2013] <[gorgo]> if you have a ro site [Fri Dec 13 08:19:42 2013] <[gorgo]> you could use vos convertROtoRW, however I found that has issues, I don't know if all have been fixed in latest 1.6 [Fri Dec 13 08:19:55 2013] <[gorgo]> or you could use vos dump | vos restore [Fri Dec 13 08:20:09 2013] <[gorgo]> I'd go for a vos dump [Fri Dec 13 08:20:11 2013] [gorgo]: we have backups, but some of our services we try to maximize continuity too. 
And yes, 'vos convertROtoRW' is what I was thinking of, while also knowing it is a bit strange sometimes. [Fri Dec 13 12:57:22 2013] Does "vos release" not know enough to try other IP addresses for multi-homed servers? [Fri Dec 13 12:57:46 2013] I'm watching it bang on the inside-NAT address from outside the NAT, which is not going well. [Fri Dec 13 13:00:14 2013] (Incidentally, would a heuristic of the form "this is an RFC1918 address and I am not on its subnet, give it a base rank of 50000" be an OK thing to add to the cache manager? Calling fs setserverprefs has made my out-of-NAT experience much more enjoyable, but I don't think our users will enjoy having to run that by hand every time.) [Fri Dec 13 13:13:56 2013] nwf: you may want to read a recent mailing list discussion about multihoming and NAT. [Fri Dec 13 13:14:54 2013] nwf: and it depends a bit on which version of the client package you have too. [Fri Dec 13 13:15:21 2013] nwf: but the basic summary is that multihoming is not really supported, IIRC. [Fri Dec 13 13:40:00 2013] Yeah, I got that; I gave up and turned on hairpin NAT on our gateway so that the few other machines on the same segment as the AFS server can still reach it at its public address. [Fri Dec 13 13:40:18 2013] If Linux supported generating ICMP redirects, I'd do that instead, but. [Fri Dec 13 13:40:46 2013] (The NetInfo file now solely contains a force directive for the public address) [Fri Dec 13 15:43:36 2013] Well I just installed my first cell and I can connect from a linux client :) [Fri Dec 13 15:43:57 2013] Now I need to set up additional database and file servers [Fri Dec 13 23:09:37 2013] Woah, why is one of my machines aklogging as 'host.bigbrother.trinidad.acm.jhu.edu' rather than 'rcmd.bigbrother'? Its Kerberos princ is host/bigbrother.trinidad.acm.jhu.edu@ACM.JHU.EDU as you might expect... [Fri Dec 13 23:09:58 2013] (OpenAFS 1.6.1-3+deb7u1 from Debian) [Fri Dec 13 23:11:39 2013] Everybody else does the rcmd.HOST thing I expect. 
I think I like the other behavior better, tho', as it means we could have foo.bar and foo.baz.... is it a bug in 1.6.1 or is there a config option I can kick or ...? [Fri Dec 13 23:13:17 2013] aklog is kerberos v5. it obtains a kerberos v5 ticket using the kerberos v5 client principal and wraps it as an afs token. the cache manager presents the token to the afs services. The services translate the kerberos v5 principal name to a kerberos v4 name for lookup in the protection db. [Fri Dec 13 23:14:32 2013] perhaps you can rephrase the question [Fri Dec 13 23:15:57 2013] secureendpoints: I... I don't believe that answers my question, sorry. Please compare and contrast http://pastebin.ca/2498067 [Fri Dec 13 23:16:16 2013] Those are two machines in our cluster, talking to the same KDC and the same PTS. [Fri Dec 13 23:16:22 2013] (I sincerely hope) [Fri Dec 13 23:17:07 2013] The latter is "what I expect". The former is... rather pleasant, but unexpected. [Fri Dec 13 23:17:24 2013] (I know that host.blabla doesn't exist in the PTS; I had created rcmd. as I was expecting.) [Fri Dec 13 23:17:30 2013] how many components can a kerberos v4 principal have? [Fri Dec 13 23:17:48 2013] I have no idea. [Fri Dec 13 23:17:58 2013] two. name.instance@REALM [Fri Dec 13 23:18:42 2013] OK. I don't see... what that has to do with what I'm asking? [Fri Dec 13 23:18:48 2013] I understand that rcmd.hostname is the V4 name. [Fri Dec 13 23:19:02 2013] in this case name = rcmd. so instance is the hostname. "." is the component separator. What do you expect bigbrother.trinidad.acm.jhu.edu to be converted to? [Fri Dec 13 23:19:04 2013] But why is one of my computers trying to use something that's decidedly... not that? [Fri Dec 13 23:19:20 2013] I would expect it to be rcmd.bigbrother? [Fri Dec 13 23:19:37 2013] Just like host/magellan.... is rcmd.magellan [Fri Dec 13 23:19:38 2013] what is different about bigbrother.trinidad.acm.jhu.edu vs magellan.acm.jhu.edu? 
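nwf's RFC1918 idea from earlier can be sketched as a small Python function. This is a hypothetical heuristic, not cache manager code: it assumes the AFS convention that a lower server rank is preferred, takes the 50000 value from the chat, and treats the default rank and the client's list of local subnets as inputs supplied by the caller.

```python
import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_NETWORKS = [ipaddress.ip_network(n)
                    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

# Value floated in the chat; AFS prefers servers with lower ranks.
UNREACHABLE_PRIVATE_RANK = 50000


def is_rfc1918(addr):
    """True if addr falls inside one of the RFC 1918 private ranges."""
    a = ipaddress.ip_address(addr)
    return any(a in net for net in RFC1918_NETWORKS)


def base_rank(server_addr, local_networks, default_rank):
    """Hypothetical heuristic: demote a private server address unless the client
    sits on the same subnet; otherwise keep the rank the client would have used."""
    a = ipaddress.ip_address(server_addr)
    on_local_subnet = any(a in ipaddress.ip_network(n, strict=False) for n in local_networks)
    if is_rfc1918(server_addr) and not on_local_subnet:
        return UNREACHABLE_PRIVATE_RANK
    return default_rank


if __name__ == "__main__":
    # A client outside the NAT (its own subnet is 192.168.1.0/24) looking at the
    # server's inside address and its public address; 20000 is a made-up default.
    for server in ("10.0.0.5", "128.220.1.1"):
        print(server, base_rank(server, ["192.168.1.0/24"], 20000))
```

The manual equivalent today is setting ranks by hand with fs setserverprefs, which is exactly the per-user step the heuristic would try to avoid.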
[Fri Dec 13 23:19:59 2013] Well, there is that disable dotcheck option, but that hardly seems likely to be what's going on here. [Fri Dec 13 23:20:08 2013] Well, the ".trinidad." part, but even "kdc.vm.acm.jhu.edu" gets mapped to "rcmd.kdc". [Fri Dec 13 23:20:36 2013] bigbrother is a Debian wheezy box. [Fri Dec 13 23:21:06 2013] magellan is some Ubuntu saucy. kdc is Ubuntu raring. [Fri Dec 13 23:21:36 2013] (We were not careful about rolling out the same environment; we're fixing that as we go.) [Fri Dec 13 23:21:52 2013] the answer to your question is going to be related to the fact that rcmd.bigbrother.trinidad is not a valid kerberos v4 name. [Fri Dec 13 23:22:23 2013] But neither is "rcmd.kdc.vm", and that one doesn't behave in this other odd way. [Fri Dec 13 23:23:56 2013] kaduk_: Where is the "disable dotcheck" set, just so I can make sure it's not different between our systems? [Fri Dec 13 23:24:29 2013] dotcheck is for first components with dots in them [Fri Dec 13 23:24:43 2013] Oh, like foo.bar/baz@REALM? [Fri Dec 13 23:25:03 2013] john.doe@REALM [Fri Dec 13 23:26:58 2013] that translates from krb5 to krb4 as john.doe@REALM, but in the krb5 case it is a single component and in the krb4 case it is two components. The problem is that john/admin@REALM and john.admin@REALM both translate to the same thing. [Fri Dec 13 23:27:10 2013] Ah, I see. [Fri Dec 13 23:29:27 2013] nwf: looks like it's an -allow-dotted-principals argument to viced and such. [Fri Dec 13 23:30:04 2013] Is that per... client? [Fri Dec 13 23:30:49 2013] server [Fri Dec 13 23:30:50 2013] that isn't going to address the problem nwf is having. The translation from krb5 to krb4 name is being performed in aklog for the prdb lookup (name to id). of course that is completely meaningless for a host name. [Fri Dec 13 23:31:48 2013] I did say that it was unlikely to be relevant here. [Fri Dec 13 23:31:58 2013] It's good to know all the same, thanks. 
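The name mapping under discussion can be illustrated with a simplified Python sketch. It assumes only the host-to-rcmd rename and first-label truncation seen in the chat (real Kerberos libraries carry larger rename tables), and it also reproduces the plain slash-to-dot behaviour nwf observed on bigbrother, plus the john/admin versus john.admin collision that dotcheck guards against.

```python
def split_principal(principal):
    """Split 'a/b@REALM' into (['a', 'b'], 'REALM')."""
    name, sep, realm = principal.rpartition("@")
    if not sep or not name:
        raise ValueError("expected name@REALM: " + principal)
    return name.split("/"), realm


def krb524_style_name(principal):
    """Sketch of the krb5 -> krb4 mapping that yields names like rcmd.bigbrother.

    Assumptions: only the 'host' -> 'rcmd' rename is modeled, and a host instance
    is cut down to its first DNS label.
    """
    components, realm = split_principal(principal)
    if len(components) == 1:
        v4 = components[0]  # john.doe stays john.doe, now read as name.instance
    elif len(components) == 2:
        primary, instance = components
        if primary == "host":
            v4 = "rcmd." + instance.split(".", 1)[0]
        else:
            v4 = primary + "." + instance
    else:
        raise ValueError("krb4 names have at most two components: " + principal)
    return v4 + "@" + realm


def naive_dotted_name(principal):
    """Mimic the other behaviour seen in the chat: just replace '/' with '.'."""
    components, realm = split_principal(principal)
    return ".".join(components) + "@" + realm


def first_component_has_dot(principal):
    """True for names like john.doe@REALM, the case dotcheck is concerned with."""
    components, _ = split_principal(principal)
    return "." in components[0]


if __name__ == "__main__":
    p = "host/bigbrother.trinidad.acm.jhu.edu@ACM.JHU.EDU"
    print(krb524_style_name(p))
    print(naive_dotted_name(p))
```

Because john/admin@REALM and john.admin@REALM map to the same krb4-style string, a server cannot tell them apart after translation, which is why dotted first components are refused unless -allow-dotted-principals is given.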
[Fri Dec 13 23:32:43 2013] the name translation that nwf is seeing is being performed either by the Kerberos library's krb5_524_conv_principal function or by code that extracts the substring before the first "." in the second krb5 component. The difference in behavior is probably a result of krb524 being present in the Kerberos library or not. [Fri Dec 13 23:34:02 2013] Can the PTS database store the longer dotted form? I'd actually be happier with those pervasively, if I could get them. :) [Fri Dec 13 23:34:16 2013] in any case, self registration doesn't apply to local identities and the name to id inclusion in the token is for display purposes only. I would just run aklog with -noprdb and forget about it [Fri Dec 13 23:34:58 2013] here is the thing. you don't care what the client guesses the name mapping will be. the only thing that actually matters is how the servers are built. [Fri Dec 13 23:35:16 2013] and how the servers translate the name [Fri Dec 13 23:35:24 2013] for that you will need to look at audit log output [Fri Dec 13 23:35:33 2013] or use a debugger [Fri Dec 13 23:36:46 2013] That's all a little alarming. But yes, -noprdb does work (and aklog -d no longer prints out my user ID; that too, I suppose, is just for display purposes?) [Fri Dec 13 23:36:59 2013] yes [Fri Dec 13 23:37:25 2013] it isn't accurate. it's just a guess [Fri Dec 13 23:37:58 2013] if you were using kerberos v4 for authentication it wouldn't be a guess [Fri Dec 13 23:38:08 2013] but don't do that [Fri Dec 13 23:38:13 2013] Right, no, not happening. [Fri Dec 13 23:38:56 2013] Is this all changing in the grand future of rxgk anyway? [Fri Dec 13 23:40:28 2013] the rxgk wire format uses pts extended names, which have separate display and data forms. [Fri Dec 13 23:42:23 2013] which doesn't address this particular issue which is really the fact that aklog isn't asking the server "what is the name and id associated with this token?" 
It is trying to guess the name from the kerberos v5 client principal and then look up the ID using that guess. [Fri Dec 13 23:50:44 2013] Thanks for your help; I think it's my bedtime. Have to be up early for more of the same. :) [Sun Dec 15 01:29:32 2013] There's probably nobody here, but I think I need help. We had a server crash (@#*&@*#&^ Xen) and subsequently a good number of salvage runs on our dafs server, and now at least one volume (mirror.debian, mounted at /afs/acm.jhu.edu/mirror/{.,}debian) seems rather stuck; that mountpoint says "no such file or directory" despite that the file server claims to have that volume. :( [Sun Dec 15 01:30:44 2013] The salvager reported that it was deleting a great many files while working on this volume. [Sun Dec 15 09:14:22 2013] nwf: I'm here but I'm thinking you want to talk to someone who isn't currently here and probably won't be until tomorrow (mvitale) [Sun Dec 15 09:16:13 2013] being that he's spent the past couple weeks working on salvager bugs [Sun Dec 15 09:16:32 2013] but, first things first: does `vos listvol` on that server/partition show the volume as on-line? [Sun Dec 15 10:08:39 2013] geekosaur: Yes, it does: mirror.debian 536871037 RW 24541057 K On-line [Sun Dec 15 20:50:49 2013] Is there a reason that the davolserver is not built on FreeBSD (but the other components of DAFS are)? [Sun Dec 15 20:51:07 2013] (It builds when I patch the makefile to not exclude it based on system name) [Sun Dec 15 22:50:49 2013] Is "vos release -stayonline" expected not to work? :\ [Sun Dec 15 22:51:54 2013] there are some known bugs, yes [Sun Dec 15 22:53:30 2013] OK, then I'll skip it. [Sun Dec 15 22:56:31 2013] Among other things, it left temporary volumes hanging around after failure. I can push my notes (however brief) to RT, if that'd be useful; if not, I'll skip it. 
[Sun Dec 15 22:57:03 2013] probably not worth it [Sun Dec 15 22:57:36 2013] OK [Sun Dec 15 22:57:45 2013] we're already poking at it for a customer, and will upstream patches when we have them [Sun Dec 15 22:58:03 2013] (we = SNA) [Sun Dec 15 22:58:48 2013] Rock. [Sun Dec 15 23:25:56 2013] bug reports should always be filed to openafs-bugs@openafs.org. It permits the broader community to know there is an issue. A separate concern is that the author of a code contribution should always be given the opportunity to fix it. [Sun Dec 15 23:46:25 2013] davolserver should get built on FreeBSD on master IIRC; that patch has not been pulled up to 1.6, though. [Mon Dec 16 03:17:23 2013] is it possible to use posixAccount and posixGroup objects from ldap for afs? [Mon Dec 16 03:17:39 2013] so i do not have to maintain 2 different databases with the same content [Mon Dec 16 03:17:58 2013] there are some tools syncing them [Mon Dec 16 03:18:09 2013] but you cannot directly use them [Mon Dec 16 03:19:19 2013] where can i find these tools? [Mon Dec 16 03:22:19 2013] I know the Italian guys did work on something like that, but I do not have a URL or anything more. [Mon Dec 16 03:23:06 2013] do you know some kind of name? something i can google :) [Mon Dec 16 03:26:31 2013] sorry, no [Mon Dec 16 03:29:10 2013] thank you anyway gotta run, bye [Mon Dec 16 14:19:08 2013] mvitale: Are you around and have a while to help me with a salvaged volume gone wrong? (geekosaur recommended that I ask you for help) [Mon Dec 16 14:20:23 2013] (if not, it's OK; I will just destroy the volumes and move on with life; they're just mirrors) [Mon Dec 16 14:22:47 2013] I'm here, I'll do what I can to help [Mon Dec 16 14:30:56 2013] Whoo. The problematic volumes are mounted at /afs/acm.jhu.edu/mirror/{,.}{debian,ubuntu}. The volume names are mirror.{debian,ubuntu}, if that helps. [Mon Dec 16 14:32:10 2013] We had a server crash and the salvager ran and deleted a lot of things. 
I have the salsrv log and volser logs both if you want them, but the former is hundreds of megabytes and the latter is about a megabyte. [Mon Dec 16 14:32:29 2013] There's also a salvageserver core. o.O [Mon Dec 16 14:34:54 2013] wow, that sounds more involved than what I can spare at the moment [Mon Dec 16 14:35:04 2013] Right, I was afraid of that. [Mon Dec 16 14:35:21 2013] although I'm always tempted by a good core file [Mon Dec 16 14:35:25 2013] Do you think that "vos rename" would damage any evidence? [Mon Dec 16 14:35:29 2013] I just can't do it right now [Mon Dec 16 14:35:42 2013] Oh, sure, I'll grab everything and tar it up for you at your leisure. [Mon Dec 16 14:36:00 2013] vos rename shouldn't hurt any evidence [Mon Dec 16 14:36:57 2013] OK; I'll rename these away and begin rebuilding our mirrors. [Mon Dec 16 14:37:22 2013] may I pm you? [Mon Dec 16 14:37:27 2013] Of course. [Mon Dec 16 16:54:08 2013] Any idea why "vos release" would be crawling (writing ~100K/sec to the remote replication sites)? [Mon Dec 16 17:14:12 2013] hrm... I'm thinking we might be hitting http://scripts.mit.edu/trac/ticket/387 [Mon Dec 16 17:15:16 2013] It has been latent for a long time, though we have some suspicion that something happened "recently" to make it more common. [Mon Dec 16 17:15:42 2013] we're running 1.6.5 [Mon Dec 16 17:15:52 2013] I've had two users hit that issue on the same server today [Mon Dec 16 17:16:00 2013] "login server, not necessarily oafs fileserver" [Mon Dec 16 18:41:07 2013] re: secureendpoints's comments on the upstream bug, what features are missing from Linux's kafs? [Mon Dec 16 18:42:06 2013] (I know there have been some GSoC efforts to improve interop between it and the OpenAFS cache manager, but I don't know of definitive documentation. Forgive the noob, please.) [Mon Dec 16 18:42:26 2013] I don't believe there is definitive documentation [Mon Dec 16 18:43:50 2013] dhowells is the maintainer. 
if you are volunteering to help write code, speaking with David is the best place to start. [Mon Dec 16 19:12:16 2013] If there's low-hanging fruit to get started, that'd be grand. I might also try to sell some AFS hacking as a CS internship to the department for some of the bright-eyed undergrads around. ;) [Mon Dec 16 19:13:10 2013] A few weeks ago the chair of the department was swept up in the enthusiasm for our new world order and made mention that there might be money in it for us if we hold on to some of the department's "cloud storage". [Mon Dec 16 19:13:55 2013] Well, "funny-money". Which can go towards equipment or other funny-money sinks. No actual money, I assume. [Mon Dec 16 19:14:18 2013] (And still not enough to purchase YFS's services. One of these days!) [Mon Dec 16 19:14:59 2013] The last I heard, the linux in-kernel kAFS did not have a way to use krb5 to get a token, which makes it kind of useless in the modern security environment. [Mon Dec 16 19:15:09 2013] Well there's... [Mon Dec 16 19:15:12 2013] * rummages [Mon Dec 16 19:15:38 2013] (But I don't generally pay much attention to linux development, being more of a freebsd guy.) [Mon Dec 16 19:16:34 2013] http://people.redhat.com/dhowells/rxrpc/klog.c ... though looking at it, that seems to use kaserver. :( [Mon Dec 16 19:19:04 2013] Yeah, that's what I'm remembering. [Mon Dec 16 19:21:12 2013] Hm; does kafs understand the new kdf? [Mon Dec 16 19:22:36 2013] (Actually, on that note, is the kdf implemented in aklog or in the cache manager itself?) [Mon Dec 16 19:23:25 2013] It's implemented in aklog. [Mon Dec 16 19:24:16 2013] https://github.com/openafs/openafs/blob/master/src/aklog/aklog.c#L721 [Mon Dec 16 19:24:45 2013] Thanks! [Mon Dec 16 19:32:38 2013] There's also http://www.mail-archive.com/afs3-standardization@openafs.org/msg01708.html but I believe the test vectors in it are wrong (They correspond to a 0-indexed KDF instead of a 1-indexed one.) 
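For readers curious about the counter-indexing question, here is a Python sketch of one step of the key derivation. The layout is an assumption based on our reading of aklog.c, not a checked restatement of the specification: HMAC-MD5 keyed with the krb5 session key over a one-octet counter, the label "rxkad" with its terminating NUL, and the output length 64 as a 32-bit big-endian integer, truncated to 8 bytes with DES odd parity applied. The real code also retries with the next counter value when the result is a weak DES key; that loop is omitted here.

```python
import hashlib
import hmac


def set_odd_parity(key8):
    """Set the low bit of each byte so every byte has an odd number of 1 bits (DES parity)."""
    out = bytearray()
    for b in key8:
        high7 = b & 0xFE
        out.append(high7 | (0 if bin(high7).count("1") % 2 else 1))
    return bytes(out)


def kdf_block(session_key, counter):
    """One counter-mode step: HMAC-MD5(key, counter || "rxkad" || NUL || 64 as 32-bit BE).

    Assumed layout (see lead-in); the digest is truncated to 8 bytes and parity-adjusted.
    """
    if not 0 <= counter <= 255:
        raise ValueError("counter is assumed to be a single octet")
    msg = bytes([counter]) + b"rxkad\x00" + (64).to_bytes(4, "big")
    return set_odd_parity(hmac.new(session_key, msg, hashlib.md5).digest()[:8])


if __name__ == "__main__":
    key = bytes(range(16))  # made-up 16-byte session key, for illustration only
    print("counter starting at 1:", kdf_block(key, 1).hex())
    print("counter starting at 0:", kdf_block(key, 0).hex())
```

The two printed keys differ, which is why test vectors computed with a 0-indexed counter cannot match an implementation whose counter starts at 1.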
[Mon Dec 16 21:09:05 2013] Is there an easy way to ask "how long has it been since 'vos release' finished on a given volume"? It looks like 'vos examine' can produce a lot of numbers but I'm not sure exactly what they all mean. [Mon Dec 16 21:10:12 2013] the creation time on a r/o is when the release finished [Mon Dec 16 21:10:32 2013] Do any of the numbers capture the time the release started? [Mon Dec 16 21:10:50 2013] (I've had a glacially slow release running for more than 24 hours...) [Mon Dec 16 21:10:58 2013] I don't think so [Mon Dec 16 21:11:29 2013] What is "Copy" supposed to capture? [Mon Dec 16 21:12:09 2013] although if the release is in progress you should be able to look at the transaction (see: vos status) [Mon Dec 16 21:12:25 2013] * nod [Mon Dec 16 21:13:53 2013] I am more looking for some way to alarm if volumes and their RO replicas diverge for more than N hours, even if there has not been a transaction running the whole time. [Mon Dec 16 21:14:09 2013] I know we're supposed to use .backup for this, but .backup can't be replicated off-server without 'vos shadow'. [Mon Dec 16 21:14:26 2013] copy is when the *first* release happened, for a r/o. it will be the same as creation for an r/w [Mon Dec 16 21:17:37 2013] (There are a lot of moving parts in the ACM's new cluster -- we seem to have suddenly staggered into "big data", complete with cluster filesystems. So I am rather keen on having user homedirs replicated to a much more traditional stack. Thus replication questions of late.) [Mon Dec 16 21:18:10 2013] (If you're curious, the proposed design we're testing is described in https://docs.google.com/document/d/1q2bo9p-zGU--aeU22fLoLes_IFZ0KR8zIfMLr4fQonA/edit ; if you're not curious, that's fine too. :) ) [Mon Dec 16 21:18:59 2013] I don't think you can get that time directly. 
what you *might* be able to do is cheat: if every volume in question has a cheap clone, you can compare the time on that to the time of an off-server backup [Mon Dec 16 21:19:12 2013] er, off-server replica [Mon Dec 16 21:19:44 2013] Ah, good call. Yes, we do have both on- and off-server replicas. [Mon Dec 16 21:20:35 2013] Am I correct in understanding that the volume is locked during the on-server clone, and then the on-server clone is dumped and restored (dump | restore) over rx to the off-server clone? [Mon Dec 16 21:20:50 2013] pretty much, yes [Mon Dec 16 21:21:12 2013] The intent behind -stayonline was that the off-server destination would be a temporary volume which would then be used to clobber the off-server replica? [Mon Dec 16 21:22:01 2013] Without -stayonline the off-server replicas will be deleted first? [Mon Dec 16 21:22:19 2013] Or is there an incremental dump | restore procedure? [Mon Dec 16 21:23:27 2013] incrementals will be used when it makes sense (for some value of "when it makes sense"). without -stayonline the off-server replica is taken offline and then updated [Mon Dec 16 21:24:09 2013] Oh, just taken offline, not deleted. So if it all goes up in smoke during a replication pass we can just 'vos endtrans' and 'vos online'? [Mon Dec 16 21:25:03 2013] you can endtrans but probably not online, since its state may be indeterminate. it will be marked in the VLDB as out of date, and the next release will update only those out of date volumes [Mon Dec 16 21:25:36 2013] (if you look at "vos ex" in such a situation, you will see those as "Old release" and the ones that were updated as "New release") [Mon Dec 16 21:25:47 2013] Darn; there's no way to make a 'two phase commit' version out of this? [Mon Dec 16 21:26:11 2013] I guess that's what -stayonline does, when it works? 
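The cheat suggested above (compare the cheap on-server clone with the off-server replica) could feed a simple monitoring check. Here is a minimal Python sketch. It assumes you already have the creation times of both RO copies for each volume; scraping them from 'vos examine' is left out, and the volume names are hypothetical. It only flags a release that refreshed the on-server clone but not the off-server copy. RW changes that were never released at all are not caught.

```python
from datetime import datetime, timedelta


def replica_lag(onserver_ro_created, offserver_ro_created):
    """How far the off-server RO trails the on-server (cheap) clone; never negative."""
    return max(timedelta(0), onserver_ro_created - offserver_ro_created)


def lagging_volumes(volumes, max_lag):
    """Names of volumes whose off-server replica trails the on-server clone by more than max_lag.

    volumes maps a volume name to (onserver_ro_created, offserver_ro_created).
    """
    return sorted(name for name, (on, off) in volumes.items()
                  if replica_lag(on, off) > max_lag)


if __name__ == "__main__":
    t0 = datetime(2013, 12, 16, 12, 0)
    volumes = {  # hypothetical volumes and timestamps
        "user.alice": (t0, t0),
        "user.bob": (t0 + timedelta(hours=30), t0),  # release stuck for over a day
    }
    print(lagging_volumes(volumes, timedelta(hours=24)))
```

Keeping the threshold as a parameter lets the same check run with a tight limit for home directories and a looser one for rarely released volumes.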
[Mon Dec 16 21:26:24 2013] also the RClone will still be around, and the release will be done from that instead of from the live volume [Mon Dec 16 21:27:12 2013] roughly, yes [Mon Dec 16 21:27:13 2013] Well, sorry, by "it all goes up in smoke" I mean something like "our cluster filesystem decides to delete everything". So I'm assuming the RW volume and everything colocated has gone up in smoke. [Mon Dec 16 21:27:27 2013] I realize I'm abusing replication. [Mon Dec 16 21:28:03 2013] if that happens, you pick a valid online r/o and use vos convertROtoRW [Mon Dec 16 21:28:09 2013] (Thus the paranoid discussion in the design document about traversing the RO copies and dumping them to not-AFS.) [Mon Dec 16 21:29:00 2013] Well, there are only two ROs ever in this story, the colocated one and the one on the server I affectionately named "chicago" (for rebuilding after it all burns down), so if the damage happens during a replication run, there won't be any valid ROs. [Mon Dec 16 21:31:30 2013] Incidentally, the channel has been remarkably helpful. Is the OpenAFS Foundation sufficiently extant to receive donations? ("Too little for corporations to be interested" might still be put to good use?) [Mon Dec 16 21:34:28 2013] you may want to think about using vos shadow and your own database of shadows, or a second replica server. [Mon Dec 16 21:36:05 2013] my understanding of the Foundation's status is it does not yet have the tax paperwork with the IRS (this tends to take a while) [Mon Dec 16 21:36:05 2013] OK [Mon Dec 16 21:36:27 2013] I am secretary of a hackerspace, so I empathize. [Mon Dec 16 21:39:50 2013] I can ask Roman or Margarete about the status of donations [Mon Dec 16 21:41:25 2013] Thankee. [Mon Dec 16 21:47:49 2013] oh, hm, Margarete is out of the country. asked Roman instead [Mon Dec 16 22:57:23 2013] catching up. the idea behind the gsoc work is that openafs userland tools could be used with either openafs or kafs kernel modules. 
When the patches to permit that to happen are in place then there is no need for kafs to have its own aklog. it uses the one that openafs distributes. [Mon Dec 16 22:59:01 2013] as far as donations to the Foundation go, you can always give money to a legal entity with a bank account. what you cannot do is claim it as a tax deductible donation unless the receiver is designated a charity by the IRS. my suggestion would be to give any money to an individual developer that has done something you care about. [Mon Dec 16 23:00:38 2013] the gatekeepers have provided support to faculty members overseeing Capstone Projects and Masters Thesis Projects in the past. [Mon Dec 16 23:01:07 2013] We would do so for JHU CS [Mon Dec 16 23:05:26 2013] Are the patches around? Do they work, more or less? [Mon Dec 16 23:05:59 2013] Thanks for the advice on donations. I'll try to steer some students towards the gatekeepers. :) [Mon Dec 16 23:09:31 2013] And I hope the channel will permit me one more ignorant question: Chicago's AFS fileserver is sitting on ZFS; is there some relatively snappy operation to ensure that the fileserver's state is entirely on disk so that it can be cheaply snapshotted? Doing that before a release operation seems like a viable way to get something close enough to two-phase commit for disaster recovery? [Mon Dec 16 23:09:52 2013] (If the answer is "please don't do that", I'll go back to "DIY with vos shadow") [Mon Dec 16 23:10:28 2013] if anything, it currently syncs to disk a bit too often :) [Mon Dec 16 23:35:20 2013] as I said earlier, speak with dhowells regarding the state of kafs [Mon Dec 16 23:36:43 2013] There is no mechanism to sync a volume to a zfs snapshot. You could certainly write such functionality. 
[Mon Dec 16 23:37:46 2013] in particular what you want to do is rewrite the file server backend to use a separate zfs dataset for each volume and then use the zfs snapshot as the source of the .backup or local .readonly volume [Mon Dec 16 23:38:24 2013] That's not a crazy plan, though it's more than an afternoon of scripting. ;) [Mon Dec 16 23:39:46 2013] correct, it is not a scripting project. it is a one semester project for a student to work on for credit [Mon Dec 16 23:40:45 2013] * will keep it up his sleeve [Mon Dec 16 23:42:36 2013] I wonder if there's a way to worm OpenAFS into our "Object-Oriented Software Engineering" course. [Mon Dec 16 23:43:10 2013] (Students do the whole pipeline of design / spec / implementation / review on a project of their choice, with some guidance.) [Mon Dec 16 23:43:53 2013] (Typically juniors, I think.) [Mon Dec 16 23:45:47 2013] I would think that Professor Amir would be the right person to talk to [Mon Dec 16 23:46:02 2013] I will, surely. [Mon Dec 16 23:47:01 2013] In theory I could stop by to meet with him at the end of January. [Mon Dec 16 23:47:13 2013] (I think his courses tend to be more rigidly structured, OOSE is nice for being open-ended.) [Mon Dec 16 23:47:30 2013] It'd be nice to see you again. :) [Mon Dec 16 23:48:46 2013] what is the policy for dogs on campus? [Mon Dec 16 23:51:13 2013] As far as I know they're welcome? [Mon Dec 16 23:52:33 2013] Other professors to talk to are Randal Burns, our storage expert, and possibly Yanif Ahmad, who works on (distributed) databases. [Tue Dec 17 05:21:03 2013] nwf: IIRC CompSci/Informatics at Edinburgh also use ZFS with AFS so they may have done something about that. [Tue Dec 17 05:21:36 2013] They weren't doing that when I was there, although that could have changed. 
At that point all of the fileservers with Linux [Tue Dec 17 05:22:24 2013] * had been tempted to add "and may know" :-) [Tue Dec 17 05:22:38 2013] There are lots of folk who use ZFS as a filesystem for a conventional AFS fileserver. However, that's different from what nwf/secureendpoints were discussing [Tue Dec 17 05:23:08 2013] The really cool thing you could do with ZFS is replace the fileserver's namei backend with a ZFS data store, and then do volume clones using ZFS snapshots. [Tue Dec 17 05:24:06 2013] uhmmmmmmmm that would make more sense than what happens now, that a filtree-level volume manager runs on top of a "block"-level volume manager. [Tue Dec 17 05:24:22 2013] I was wondering if a similar thing would be possible with btrfs [Tue Dec 17 05:25:12 2013] IIRC ironically in the origin AFS/Coda were designed as something vaguely similar to ZFS/BTRFS... [Tue Dec 17 05:26:09 2013] dhowells: As things stand now I am not seeing a lot of advantage at running a "logical" volume manager like AFS on top of a "physical" volume manager like ZFS. [Tue Dec 17 05:26:23 2013] dhowells: it would a bit like runnin BTRFS itself on top of DM/LVM2. [Tue Dec 17 05:27:04 2013] Does btrfs expose its object store directly? [Tue Dec 17 05:27:21 2013] * is pretty sure that somehone somewhere is running BTRFS on top of DM/LVM2 :-) [Tue Dec 17 05:28:07 2013] BTRFS does not really do "object" store, it manages storage in 256MiB "extents" IIRC. [Tue Dec 17 05:28:24 2013] The really interesting thing is getting access to snapshots. If your filesystem can do the snapshotting for you, then you can get rid of a lot of the complexity of the AFS volume package. 
[Tue Dec 17 05:28:33 2013] BTW at home just for fun I have BTRFS for '/vicep*' [Tue Dec 17 05:28:53 2013] sxw: that thought occurred to me [Tue Dec 17 05:28:58 2013] sxw: I'm not entirely certain what it provides beyond the typical fs interface [Tue Dec 17 05:29:29 2013] Yeah, that's the million dollar question - you need more than just open(), write(), close() [Tue Dec 17 05:29:41 2013] sxw: but I've talked to Chris Mason about it, he'd be willing to take changes suitable to running an AFS fileserver on it [Tue Dec 17 05:30:00 2013] but this is really the time to do it - whilst it's still in development [Tue Dec 17 05:30:06 2013] Cool. That sounds like it's definitely an avenue worth pursuing. [Tue Dec 17 05:30:24 2013] sxw: but AFS still needs replicas/snapshotting across machines. It would only help with same-partition clones, but then they are the basis for pretty much everything else. [Tue Dec 17 05:30:25 2013] I suspect it wouldn't just be AFS that would make use of it. [Tue Dec 17 05:30:27 2013] write out a list of what such a beast would require and give it to the BTRFS team to see what they can come up with [Tue Dec 17 05:30:30 2013] indeed [Tue Dec 17 05:31:30 2013] Walex: The hardest thing about snapshotting across machines is getting the data into a stable state in the first place. Once you have a local snapshot, transferring that between systems is relatively straightforward (it's _really_ straightforward if the filesystem has something similar to data version numbers) [Tue Dec 17 05:34:13 2013] Walex: what's your email address? [Tue Dec 17 05:38:32 2013] dhowells: can I /msg you? [Tue Dec 17 05:38:49 2013] Walex: sure, or email dhowells@redhat.com [Tue Dec 17 05:39:09 2013] or, if you're on openafs-devel, watch for the email I'm about to send [Tue Dec 17 05:49:33 2013] dhowells: coming... [Tue Dec 17 05:59:24 2013] Walex: sent by email? [Tue Dec 17 06:01:54 2013] dhowells: yes... [Tue Dec 17 06:02:13 2013] hmmm... 
Unless your name is Jane, I don't have it yet
[Tue Dec 17 06:02:26 2013] may be slow...
[Tue Dec 17 14:27:20 2013] looks like we're going to need to update the openafs-kmodtool script for Fedora 20.
[Tue Dec 17 14:27:27 2013] since it only has globs for fc1?*
[Tue Dec 17 14:31:50 2013] * got 1.6.5.1 running on Fedora 20, but couldn't build the kmod-openafs because of packaging issues, but dkms-openafs works well enough that I can use it
[Tue Dec 17 20:50:33 2013] Hey channel; we had another server catastrophe (AFS server suddenly lost connectivity to its backend) and while there are no irreparably damaged volumes (AFAICT), the salvager undid a lot of insertions that I'd done well before it crashed and hadn't touched in the interim (I think). What's the best way to demand of the server that everything's stable on disk? 'sync'?
[Tue Dec 17 20:52:20 2013] A separate question: our primary vlserver and prserver are hosted behind a NAT. There are forwarding rules in place so that nobody, even inside the cluster, needs to know about their internal addresses, but adding a NetRestrict file causes the vlserver to fail to start, complaining about 'primary interface 127.0.1.1' even if there's a NetInfo file with "f public.addr". Thoughts?
[Tue Dec 17 20:53:15 2013] (The internal addresses are visible to ubik, which makes my attempt at standing up replicas die with 'host with two primary addresses')
[Wed Dec 18 07:57:27 2013] nwf: your network setup is begging, loudly and determinedly, for trouble.
[Wed Dec 18 07:58:27 2013] nwf: as to the stable writes, IIRC OpenAFS tends to do too many 'fsync' rather than too few.
[Wed Dec 18 07:59:34 2013] nwf: I would first have a look at the storage subsystem to double check whether it honors 'fsync'.
I note with concern that you write "server suddenly lost connectivity to its backend"
[Wed Dec 18 08:00:35 2013] which might imply that your storage is at some remove on some kind of network-accessible device via several layers of firmware and software. That is begging for trouble, fairly loudly and with some determination.
[Wed Dec 18 08:01:53 2013] nwf: no info on privatenet (NAT is bad, always), but storage: usual bos salvage keeps track of it. And as you wrote, your latest changes were not on disk, do check your disk setup
[Wed Dec 18 08:02:01 2013] looks like the partitions were not reachable...
[Wed Dec 18 08:02:21 2013] We do keep track of partitions mount features with icinga
[Wed Dec 18 08:28:11 2013] Is it possible that getting a DB server migration not quite right could cause fileservers to spontaneously decide to salvage volumes? We have a slightly weird situation here.
[Wed Dec 18 08:28:35 2013] usually not IMHO
[Wed Dec 18 08:34:10 2013] but if I remember well, I did have the situation once...
[Wed Dec 18 08:34:41 2013] solution was to kill the database for fileservers and recreate it with "vos syncserv XYZ"
[Wed Dec 18 08:54:38 2013] Amiga4000: what do you mean by kill the database?
[Wed Dec 18 09:11:19 2013] jmdh_: he probably means removing the VL database under '/var/lib/openafs/' and recreating it by re-registering all volumes into it. But I think that's not the issue.
[Wed Dec 18 09:11:49 2013] BTW I am looking at 's issue too and looked at logs.
[Wed Dec 18 09:11:54 2013] It would be very weird for a vlserver issue to cause you to keep salvaging
[Wed Dec 18 09:12:35 2013] it transpires that the fileservers were restarted at the same time.
[Wed Dec 18 09:12:46 2013] yeah, I had that situation once on upgrading 1.4 to 1.6 ....
[Wed Dec 18 09:12:59 2013] a server was caught in a salvage loop
[Wed Dec 18 09:13:04 2013] sxw: the suspicion here by probably is that this being a 1.4+1.6 mixed DB server situation the DBs may be corrupted.
[Wed Dec 18 09:13:18 2013] that's what I had.
[Wed Dec 18 09:14:25 2013] (years ago)
[Wed Dec 18 09:14:29 2013] I can't see how a corrupt DB (if that's what you've got) would cause you to keep salvaging.
[Wed Dec 18 09:14:49 2013] but what I see is that after the change in '/etc/openafs/server/CellServDB' the fileservers were restarted and they core-dumped
[Wed Dec 18 09:14:53 2013] Is the fileserver bouncing up and down (restarting)
[Wed Dec 18 09:15:14 2013] Ah, so your fileserver is crashing, and salvaging each time it restarts. Now that's expected behaviour.
[Wed Dec 18 09:15:27 2013] it's not continually crashing.
[Wed Dec 18 09:15:29 2013] (the salvaging, not the crashing)
[Wed Dec 18 09:15:38 2013] it seems to be this case: https://lists.openafs.org/pipermail/openafs-info/2012-February/037649.html
[Wed Dec 18 09:15:41 2013] But each time it crashes, it salvages?
[Wed Dec 18 09:15:48 2013] so maybe "all" we need to do is wait for it to finish salvaging.
[Wed Dec 18 09:16:03 2013] sxw: yes it is salvaging and there is quite a bit of damage.
[Wed Dec 18 09:16:51 2013] sxw: my suspicion is that the fileserver process crashed with a fair bit of stuff in progress and this left the AFS-partitions in an inconsistent state.
[Wed Dec 18 09:17:38 2013] Walex: That's possible. Or, if this is the first time you've salvaged in a while, you might be seeing older issues.
[Wed Dec 18 09:18:56 2013] or this thread too https://lists.openafs.org/pipermail/openafs-info/2011-March/035755.html
[Wed Dec 18 09:19:55 2013] sxw: actually the fileservers involved are all latest-and-greatest Debian 7/wheezy and all volumes were 'vos moved' freshly over the last two weeks, so I hope they were fine (the source AFS-partitions had some issues though).
[Wed Dec 18 09:23:14 2013] interestingly the *dafileserver* on the new 1.6 DB server crashed too on restart, but the old 1.4 'fileserver' did not, on the DB-servers (where we keep only replicas of the top level volumes).
[Wed Dec 18 09:23:57 2013] I don't know what OpenAFS version is implied by "latest-and-greatest Debian 7/Wheezy"
[Wed Dec 18 09:24:25 2013] secureendpoints: 1.6.1+rxkad patches and some other updates.
[Wed Dec 18 09:24:57 2013] '1.6.1-3+deb7u1' is the Debian tag
[Wed Dec 18 09:25:07 2013] which doesn't tell me a whole lot about what is missing from openafs 1.6.5.1
[Wed Dec 18 09:25:44 2013] secureendpoints: indeed, but we stick to Debian 'stable' stuff.
[Wed Dec 18 09:26:02 2013] Sadly that means you get to stick with all of the bugs that we've already fixed
[Wed Dec 18 09:26:36 2013] or that were introduced by back porting partial patch sets
[Wed Dec 18 09:27:07 2013] we tend to trust Debian packagers...
[Wed Dec 18 09:27:23 2013] if you have core files for the crashed processes and can obtain stack traces for the crashes, we might be able to tell you if the problem is something that was known and has been fixed
[Wed Dec 18 09:30:22 2013] I am looking...
[Wed Dec 18 10:58:45 2013] in the meantime we have a curious situation. One of 2x 1.6.1++ finished salvaging and seems fine. On the other most volumes were salvaged, but for the last 4 the salvager spawned 4 processes that are not doing anything, waiting on a UNIX socket...
[Wed Dec 18 11:00:02 2013] So wondering whether to just kill them and expect the top level salvager process to restart them, or better restart the whole 'dafs' bnode
[Wed Dec 18 12:01:42 2013] BTW they are *salvageserver* processes, not *salvager* processes.
[Wed Dec 18 12:03:12 2013] Walex: Yes, indeed, our server is backed by other storage nodes, namely Ceph.
[Wed Dec 18 12:03:41 2013] Amiga4000: I had thought that "bos salvage" was a bad plan for dafs?
[Wed Dec 18 12:05:31 2013] Walex: It is not clear to me how to do a better job than we have, short of having money and staff.
[Wed Dec 18 12:07:10 2013] so I suspect it is some kind of synchronization issue. Looking at 'lsof' it is not one of the "named" sockets.
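As secureendpoints notes, a Debian tag such as '1.6.1-3+deb7u1' says little about which upstream fixes are present. A minimal Python sketch of the one thing it does tell you, the upstream base release, assuming the usual Debian `[epoch:]upstream[-revision]` shape; the helper names are hypothetical, and backported patches are invisible to this check.

```python
import re
from typing import Tuple


def upstream_version(debian_version: str) -> Tuple[int, ...]:
    """Numeric upstream part of a Debian '[epoch:]upstream[-revision]' string."""
    v = debian_version.split(":", 1)[-1]   # drop an optional epoch
    if "-" in v:
        v = v.rsplit("-", 1)[0]            # drop the Debian revision, e.g. '3+deb7u1'
    m = re.match(r"\d+(?:\.\d+)*", v)      # leading dotted numbers; stops at '~' or '+'
    if m is None:
        raise ValueError(f"unrecognised version: {debian_version!r}")
    return tuple(int(part) for part in m.group(0).split("."))


def at_least(debian_version: str, minimum: Tuple[int, ...]) -> bool:
    """True if the packaged upstream release is >= `minimum`. Fixes the packager
    backported are not visible here, so treat False as 'go and check'."""
    return upstream_version(debian_version) >= minimum
```

For example, `at_least("1.6.1-3+deb7u1", (1, 6, 5, 1))` is False, which only says the package is based on 1.6.1, not which later fixes it may carry.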
[Wed Dec 18 12:08:50 2013] nwf: ahhhh. But then if you have a Ceph delay/timeout/issue you will also have an impact.
[Wed Dec 18 12:10:02 2013] Understood.
[Wed Dec 18 12:10:12 2013] nwf: from what you say it may be the case that Ceph does not fully honor 'fsync', which is something that indeed many distributed filesystems don't because it is so expensive
[Wed Dec 18 12:10:18 2013] * nod
[Wed Dec 18 12:10:32 2013] I was slightly afraid of that and will be taking it up in #ceph
[Wed Dec 18 12:10:37 2013] nwf: you will be familiar with the 'async' vs. 'sync' NFS thingie.
[Wed Dec 18 12:10:41 2013] (Maybe there's a flag we can kick on)
[Wed Dec 18 12:10:57 2013] * nod again
[Wed Dec 18 12:10:58 2013] nwf: exactly, perhaps your Ceph config has such a flag.
[Wed Dec 18 12:12:05 2013] as a curious aside, the HEPiX storage interest group have done extensive performance testing of distributed filesystems, and the best results were given by AFS backed by Lustre; curiously it was even faster than Lustre native.
[Wed Dec 18 12:13:15 2013] nwf: how are you interfacing AFS with Ceph?
[Wed Dec 18 12:15:35 2013] nwf: are you using the block storage device interface or the POSIX interface?
[Wed Dec 18 12:16:22 2013] nwf: in theory you could use the Object Storage interface but you would need to modify https://github.com/hwr/openafs-osd to support Ceph
[Wed Dec 18 12:17:09 2013] BTW in my "stuck salvageserver instances" above, 'salvsync-debug query' hangs too, not unexpectedly.
[Wed Dec 18 12:20:54 2013] secureendpoints: The posix filesystem layer.
[Wed Dec 18 12:21:17 2013] openafs-osd+ceph would be another student-semester project, I think, and might be worthwhile, but we haven't yet.
[Wed Dec 18 12:21:33 2013] nwf: so you mount a Ceph "volume" as an AFS "partition" I guess.
[Wed Dec 18 12:22:39 2013] the question is: why use Ceph when AFS is a distributed file system? AFS is designed to work with lots of small file servers with direct attached disk.
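For context on the fsync discussion (nwf's earlier "'sync'?" question and Walex's suspicion that Ceph may not honor fsync), here is a minimal Python sketch of what a single durable file update takes on POSIX. It is a generic pattern, not what the AFS fileserver does internally, and it is only as durable as every layer below (Ceph, controllers, disks) actually honors fsync.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Atomically replace `path` with `data` and ask the kernel to persist it:
    write a temp file, fsync it, rename it over the target, then fsync the
    directory so the rename itself reaches stable storage."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()              # move Python's buffer into the kernel
            os.fsync(f.fileno())   # ask the kernel to push the data to storage
        os.replace(tmp, path)      # atomic rename within one filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)           # persist the directory entry for the rename
    finally:
        os.close(dir_fd)
```

The directory fsync is the step most often forgotten; without it a crash can lose the rename even though the file data itself was flushed.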
[Wed Dec 18 12:22:54 2013] Walex: Yes.
[Wed Dec 18 12:24:09 2013] secureendpoints: We do not have lots of publicly reachable IP addresses, OpenStack apparently wants Ceph anyway (we're using OS for VM hosting), and Ceph offers seamless (in theory) failover between replicas while AFS does not.
[Wed Dec 18 12:24:46 2013] That doesn't argue for storing AFS on top of Ceph. That argues for running Ceph for OpenStack
[Wed Dec 18 12:25:41 2013] * expects the answer to be "same reason why AFS can export NFS": legacy...
[Wed Dec 18 12:26:04 2013] Well, that our storage nodes are not the most stable beasts means we've had ample opportunity to test Ceph's handling of failover. :)
[Wed Dec 18 12:26:09 2013] It's actually done a pretty good job.
[Wed Dec 18 12:26:17 2013] but not good enough
[Wed Dec 18 12:28:01 2013] #ceph claims that cephfs honors fsync, so I am back to being confused as to what happened with our two server crashes. In the former case I don't think I can blame ceph at all -- the AFS server just up and died because Xen nuked its Dom0 -- and so we would have had to salvage anyway. In the "AFS server lost its network" case, it's not really clear what happened.
[Wed Dec 18 12:37:41 2013] It would be quite rad if there were a truly distributed AFS server -- in the sense of replicated state machines across two computers -- so that we could back them both by the same cephfs contents and take either master down
[Wed Dec 18 12:39:01 2013] (I don't really believe in "vos move" as an answer to this -- some of our volumes are huge; I've had two "vos move"s running for more than 24 hours at this point, and a "vos release" to the not-ceph-backed fileserver is up to 12 hours for a tiny volume with a lot of files.)
[Wed Dec 18 12:39:46 2013] nwf: part of the reason for that long time with lots of files is that AFS does a lot (as it is right) of metadata 'fsync'.
[Wed Dec 18 12:39:54 2013] that isn't how replication in the AFS model works.
the idea is that clients handle the failover between volume instances. OpenAFS only has readonly replication but in YFS with read/write replication the model is the same.
[Wed Dec 18 12:40:19 2013] If we could afford YFS, I'd be handing you suitcases of money, believe me.
[Wed Dec 18 12:40:43 2013] nwf: perhaps Ceph does 'fsync', and then perhaps either it was disabled to gain performance, or some bits of the underlying storage transports did not,
[Wed Dec 18 12:41:15 2013] nwf: it is unfortunately rather difficult to get a whole storage chain to do 'fsync' reliably, because at every stage someone has the temptation to cheat to reduce latency,
[Wed Dec 18 12:41:30 2013] Walex: Yeah.
[Wed Dec 18 12:41:37 2013] I think there are too many layers of indirection. If fsync is enforced from end to end across all of the layers the latency would be horrible
[Wed Dec 18 12:41:40 2013] nwf: e.g. disk, host adapter caches, whatever...
[Wed Dec 18 12:42:28 2013] my colleagues know that 'fsync' is one of my obsessions :-)
[Wed Dec 18 12:43:41 2013] If we could get live read/write replication on AFS volumes, I could basically be doing the load balancing that Ceph is doing by just controlling where the primary and secondary replicas were located on our four storage nodes. This would be a dramatic improvement, yes.
[Wed Dec 18 12:45:48 2013] Walex: I don't believe that fsync is the reason for the glacial nature of my vos operations in all cases; without looking, it looks as if the network protocol is an RPC call per file or something like that, because it seems to move data quite quickly for the duration of the few large files in this volume and quite slowly otherwise.
[Wed Dec 18 12:47:30 2013] (Ah, to amend my earlier statement: "load balancing and replication that Ceph is doing")
[Wed Dec 18 12:50:42 2013] ... AFS does not depend on atime on its partitions, does it?
(Speaking of cheating, I kicked on the noatime mount flag when I created the cephfs mounts and only now wonder if this might be a problem.... if so, uh, oops? :) )
[Wed Dec 18 12:51:24 2013] (Relatedly, what does it store in the permission bits?)
[Wed Dec 18 12:51:48 2013] <[gorgo]> walex: re salvageserver waiting on processes... how many open file descriptors are allowed for the fileserver? and is it using select?
[Wed Dec 18 12:52:40 2013] <[gorgo]> walex: if you have more than 1024 and fssync socket is handled using select, you can get corruption in the fdset for the select ending in a deadlock - or a crash
[Wed Dec 18 12:58:43 2013] <[gorgo]> this has apparently been fixed in upstream 1.6.2
[Wed Dec 18 12:58:55 2013] <[gorgo]> to use poll instead of select
[Wed Dec 18 13:00:05 2013] <[gorgo]> nfw: vos restore operations (which are on the receiving side of vos move) tend to be very slow if you have tons of small files, due to lots of fsyncs on the special files in volume metadata
[Wed Dec 18 13:01:46 2013] <[gorgo]> when it does a large file, it can stream it without fsync
[Wed Dec 18 13:10:28 2013] fsync is only part of the story. each time the receiver stops reading the rx call data the call's window closes and data stops flowing. the performance is much better when the receiver reads all of the incoming data at full speed and uses a secondary thread to perform the restore.
[Wed Dec 18 13:12:16 2013] Would it be faster to do the 'vos release' by hand with a 'vos dump | ssh ... vos restore'? :(
[Wed Dec 18 13:13:47 2013] you would cut out some of the network latency but you still have the rx issues between the "vos" client and the volserver
[Wed Dec 18 13:14:53 2013] <[gorgo]> vos release also updates the vldb entry regarding the freshness of the ro volume between the different steps of the release. you may not care about it though
[Wed Dec 18 13:16:00 2013] unix clients ignore the freshness information.
only the windows clients select the .readonly source based upon the "new" flag
[Wed Dec 18 13:42:01 2013] <[gorgo]> nfw: also vos is doing incremental releases, you'd need to manually calculate timestamps for vos dump if you want to emulate that
[Wed Dec 18 13:42:13 2013] <[gorgo]> sorry, nwf, I keep mistyping :)
[Wed Dec 18 13:43:06 2013] [gorgo]: No problem.
[Wed Dec 18 13:44:06 2013] secureendpoints: That... that seems like a bug.
[Wed Dec 18 17:01:54 2013] [gorgo]: I have been away, but thanks for the very interesting note.
[Wed Dec 18 17:06:49 2013] [gorgo]: thanks for the file descriptor issue. 'lsof' indeed tells me it has *lots* of open fds
[Wed Dec 18 17:09:48 2013] it has around 42k files in '/vicep*' open...
[Wed Dec 18 17:11:55 2013] that sounds really excessive.
[Wed Dec 18 17:12:25 2013] * looking at 'man dafileserver' to see if there is an option to limit that.
[Wed Dec 18 17:13:11 2013] there seems to be '-vhandle-initial-cachesize '
[Wed Dec 18 17:13:22 2013] oops sorry mispaste.
[Wed Dec 18 17:13:43 2013] '-vhandle-max-cachesize ' seems apposite, probably with a limit of 1000
[Wed Dec 18 17:15:38 2013] <[gorgo]> walex: you should just ulimit it in the startup script
[Wed Dec 18 17:16:09 2013] [gorgo]: but what if '-vhandle-max-cachesize' is bigger than that? Wouldn't that cause issues?
[Wed Dec 18 17:16:30 2013] <[gorgo]> walex: the fileserver looks at the ulimit settings
[Wed Dec 18 17:16:33 2013] [gorgo]: but no problem with 'ulimit'ing it.
[Wed Dec 18 17:16:55 2013] [gorgo]: ahhhh very useful. So I'll do that, many thanks.
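secureendpoints' point about rx windows a few messages up is a general producer/consumer design: if the receiver keeps reading at full speed and hands the slow work to another thread, the sender is never throttled by that work. A minimal Python sketch of the pattern; `stream` stands in for the incoming rx call and `apply_chunk` for the slow restore step (both hypothetical, this is not volserver code).

```python
import io
import queue
import threading
from typing import BinaryIO, Callable, List

_EOF = object()  # sentinel that tells the worker the stream is finished


def drain_and_apply(stream: BinaryIO, apply_chunk: Callable[[bytes], None],
                    chunk_size: int = 64 * 1024) -> None:
    """Read `stream` as fast as it delivers data while a worker thread runs the
    slow `apply_chunk` step, so slow processing never stops the reads."""
    pending: "queue.Queue[object]" = queue.Queue()  # unbounded: reads never wait on the worker
    errors: List[Exception] = []

    def worker() -> None:
        while True:
            item = pending.get()
            if item is _EOF:
                return
            if errors:
                continue  # after a failure, discard the rest but keep the reader unblocked
            try:
                apply_chunk(item)  # stands in for the fsync-heavy restore work
            except Exception as exc:
                errors.append(exc)

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending.put(chunk)
    pending.put(_EOF)
    t.join()
    if errors:
        raise errors[0]


if __name__ == "__main__":
    received: List[bytes] = []
    drain_and_apply(io.BytesIO(b"x" * 200_000), received.append)
    print(len(received), sum(map(len, received)))  # 4 200000
```

The unbounded queue trades memory for throughput; a real implementation would probably spill to disk or bound the buffer at some large size instead.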
[Wed Dec 18 17:16:56 2013] <[gorgo]> btw you should look at /proc//limits
[Wed Dec 18 17:17:15 2013] <[gorgo]> and you should check whether the fileserver uses select
[Wed Dec 18 17:17:25 2013] <[gorgo]> as I said, upstream 1.6.2 has some fixes to use poll
[Wed Dec 18 17:19:39 2013] [gorgo]: yes, it uses 'select' (lots of them ;->) and it has 42k open files according to 'lsof', I was asking about 'ulimit' because the max is 4096.
[Wed Dec 18 17:20:16 2013] <[gorgo]> you mean the proc limits file says 4k, and you have 42k? that doesn't sound right
[Wed Dec 18 17:20:23 2013] [gorgo]: what if I do belt-and-braces and put in both 'ulimit -n 1001' and '-vhandle-max-cachesize 991'?
[Wed Dec 18 17:20:46 2013] <[gorgo]> ulimit -n 1024 is okay
[Wed Dec 18 17:20:54 2013] <[gorgo]> I don't think -vhandle-max-cachesize is needed
[Wed Dec 18 17:22:38 2013] [gorgo]: well I think the 4k/42k is because 'dafileserver' is multithreaded... I can't remember, but I guess that each 'clone'd process has its own limit.
[Wed Dec 18 17:22:48 2013] <[gorgo]> no, file descriptors are shared
[Wed Dec 18 17:24:12 2013] curiously right now each of the threads has got around 900 fds open.
[Wed Dec 18 17:24:27 2013] <[gorgo]> -vhandle-max-cachesize is actually something else
[Wed Dec 18 17:24:38 2013] [gorgo]: I'll put in the 'ulimit' of course.
[Wed Dec 18 17:25:09 2013] <[gorgo]> you need to set both the soft and hard ulimits
[Wed Dec 18 17:25:22 2013] <[gorgo]> if you just set the soft ulimit, the fileserver will increase itself to the max. just looking at the code
[Wed Dec 18 17:26:24 2013] <[gorgo]> the default is to set both, as far as I can see
[Wed Dec 18 17:27:14 2013] <[gorgo]> btw you'd be better off if you upgraded to >= 1.6.2 :)
[Wed Dec 18 17:30:43 2013] <[gorgo]> no, -vhandle-max-cachesize is indeed what you're interested in, but it is limited by ulimit
[Wed Dec 18 17:41:37 2013] [gorgo]: unless there is a good reason our policy is to stick with Debian 'stable' packages...
Now we have found a good reason indeed.
[Wed Dec 18 17:42:08 2013] BTW I had noticed a while ago that Debian 7's 'wheezy-backports' has got 1.6.5.1 so we are looking at that.
[Wed Dec 18 17:43:02 2013] [gorgo]: thanks for looking at the code for me. This is a production cell we have just upgraded from 1.4.7 to 1.6.1 so we tend to be careful.
[Wed Dec 18 17:43:10 2013] <[gorgo]> what do you mean by "each of the threads... 900 fds"?
[Wed Dec 18 17:43:20 2013] <[gorgo]> are you looking at fd's for each thread? but as I said they're shared
[Wed Dec 18 17:45:34 2013] [gorgo]: 'ps ax' lists a single 'dafileserver' process, but each thread has got its own pid. 'lsof' lists for each open '/vicep' file both the top level "process" pid and the 'clone'd thread pid.
[Wed Dec 18 17:46:28 2013] <[gorgo]> I'm not sure what lsof does there makes sense... you should see the same files open for each thread
[Wed Dec 18 17:47:42 2013] [gorgo]: so for example 'lsof | grep /vicep' right now is ~39k but 'lsof | grep '2513 2532' | wc -l' is 1123 (2513 is the pid of 'dafileserver' and 2532 is the pid of a thread).
[Wed Dec 18 17:48:52 2013] <[gorgo]> you should just check ls -l /proc/2513/fd | wc -l
[Wed Dec 18 17:49:29 2013] [gorgo]: that right now is 1095.
[Wed Dec 18 17:49:55 2013] [gorgo]: so perhaps 'lsof' does report all fds open for each thread.
[Wed Dec 18 17:50:03 2013] [gorgo]: as you were saying
[Wed Dec 18 17:50:28 2013] ls /proc/2513/fd | wc -l; ls /proc/2532/fd | wc -l
[Wed Dec 18 17:50:39 2013] the result is 1083 both times.
[Wed Dec 18 17:50:54 2013] so we are definitely above 1024, and using 'select'.
[Wed Dec 18 17:51:00 2013] oh well...
[Wed Dec 18 17:52:12 2013] <[gorgo]> what kernel version are you using?
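The checks gorgo suggests (the per-process limits file under /proc, and counting entries in the process's fd directory rather than trusting `lsof`, which repeats shared descriptors once per thread) can be scripted. A minimal Linux-only Python sketch; the function names are hypothetical, and the limits-file layout assumed is the standard `Max open files  <soft>  <hard>  files` row.

```python
import os
from typing import Optional, Tuple


def _limit_value(field: str) -> Optional[int]:
    return None if field == "unlimited" else int(field)


def max_open_files(limits_text: str) -> Tuple[Optional[int], Optional[int]]:
    """(soft, hard) from the 'Max open files' row of a /proc/<pid>/limits dump;
    None means 'unlimited'."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft, hard = line[len("Max open files"):].split()[:2]
            return _limit_value(soft), _limit_value(hard)
    raise ValueError("no 'Max open files' row found")


def open_fd_count(pid: int) -> int:
    """Equivalent of `ls /proc/<pid>/fd | wc -l`. The descriptor table is
    shared by all threads of a process, so a thread id gives the same answer."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_report(pid: int) -> str:
    with open(f"/proc/{pid}/limits") as f:
        soft, hard = max_open_files(f.read())
    return f"pid {pid}: {open_fd_count(pid)} open fds, soft limit {soft}, hard limit {hard}"
```

For a select()-based server, the interesting comparison is the open-fd count against 1024, not against the soft limit.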
[Wed Dec 18 17:52:56 2013] <[gorgo]> I'm surprised that threads appear as separate /proc entries instead of /proc//task/ entries
[Wed Dec 18 17:52:58 2013] [gorgo]: standard Debian 7/Wheezy, 3.2 with backports
[Wed Dec 18 17:53:16 2013] <[gorgo]> but I haven't followed kernel internals regarding this...
[Wed Dec 18 17:53:34 2013] [gorgo]: they are also in /proc/2513/task/
[Wed Dec 18 17:53:41 2013] <[gorgo]> ah. indeed
[Wed Dec 18 17:53:47 2013] <[gorgo]> I've got 3.2 too
[Wed Dec 18 17:54:30 2013] [gorgo]: I think that the 'task' is mostly to indicate a hierarchical relationship...
[Wed Dec 18 17:54:34 2013] <[gorgo]> I don't have openafs running here
[Wed Dec 18 17:54:53 2013] <[gorgo]> but if I run lsof -p , I don't get any file descriptors shown
[Wed Dec 18 17:55:36 2013] [gorgo]: I am running 'lsof' without any filtering options
[Wed Dec 18 17:56:28 2013] <[gorgo]> lsof | grep firefox also shows only one pid
[Wed Dec 18 17:56:33 2013] <[gorgo]> this is ubuntu though
[Wed Dec 18 17:56:39 2013] <[gorgo]> lsof version 4.81
[Wed Dec 18 18:13:40 2013] [gorgo]: best way to bring down an active fileserver with stuck salvageservers? I am going to try for 'bos stop fs' and then 'bos start dafs' as usual...
[Wed Dec 18 18:14:12 2013] or per Debian practice later '/etc/init.d/openafs-fileserver stop' and then '.... start'
[Wed Dec 18 18:16:41 2013] scratch that. I certainly need to start the BOS with the modified 'init.d' script. I'll do a 'bos stop' and then '/etc/init.d/openafs-fileserver stop'.
[Wed Dec 18 18:38:26 2013] [gorgo]: it seems fine now with the 'ulimit' in.
[Wed Dec 18 18:39:00 2013] [gorgo]: many thanks for the pointer to the 1.6.1 bug without which probably we would have had some good and some bad restarts.
[Wed Dec 18 18:40:21 2013] I'll add that version-dependent issue to my list of OpenAFS hints http://www.sabi.co.uk/Notes/linuxFS.html#fsHintsAFS and report the bug to the Debian packager.
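Walex's fix was a `ulimit` line in the Debian init script that starts the BOS server, so every process it spawns inherits the cap, and gorgo's advice was to lower the hard limit as well as the soft one. A minimal Python equivalent of that idea, for illustration only; `exec_capped` is a hypothetical wrapper, not part of OpenAFS or Debian packaging.

```python
import os
import resource
from typing import List, NoReturn


def cap_nofile(limit: int) -> int:
    """Lower both the soft and the hard RLIMIT_NOFILE to `limit` (or to the
    current hard limit, if that is already lower) and return the value set.
    Setting only the soft limit would let the process raise it again itself."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        limit = min(limit, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (limit, limit))
    return limit


def exec_capped(argv: List[str], limit: int = 1024) -> NoReturn:
    """Hypothetical wrapper: apply the cap, then replace this process with
    `argv`. Limits are inherited, so anything it starts gets the cap too."""
    cap_nofile(limit)
    os.execvp(argv[0], argv)
```

1024 matches gorgo's "ulimit -n 1024 is okay": descriptors then stay in the 0..1023 range that select() can handle.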
[Wed Dec 18 18:40:32 2013] (who is often around this channel too)
[Wed Dec 18 20:15:07 2013] * has updated http://www.sabi.co.uk/Notes/linuxFS.html#fsHintsAFS with a note about this bug. RSN will find the time to update the OpenAFS Wiki.
[Wed Dec 18 23:50:49 2013] Walex: the file descriptor limit bug is not a 1.6.1-only issue. It was identified as an issue and fixed after 1.6.1. I would need to look at the commit logs to determine which release it was in.
[Thu Dec 19 08:50:46 2013] secureendpoints: I am going to have a look at the commit too for the Debian package report.
[Thu Dec 19 14:50:04 2013] <[gorgo]> walex: I believe this is the one: http://gerrit.openafs.org/8367
[Thu Dec 19 20:27:42 2013] doh, I wanted to submit a gerrit to fix the fedora 20 issue
[Fri Dec 20 09:42:51 2013] AFS on top of ZFS came up in #lopsa, which raises some questions, if anyone has a moment to answer them:
[Fri Dec 20 09:43:35 2013] On Solaris(-ish) operating systems, does AFS run as a kernel module or is it all user-space?
[Fri Dec 20 09:44:02 2013] client or server?
[Fri Dec 20 09:44:17 2013] ultimately I'm curious about both, but let's start with server.
[Fri Dec 20 09:44:32 2013] fileserver is userspace everywhere. if you use the old deprecated inode server then there is a kernel module needed to support them
[Fri Dec 20 09:45:07 2013] and when running on top of ZFS, does one generally give AFS zvols to work with, does it talk directly to the DMU, or dies it use filesystems?
[Fri Dec 20 09:45:16 2013] *does
[Fri Dec 20 09:45:25 2013] it uses filesystems
[Fri Dec 20 09:46:01 2013] ZFS does not document enough stuff to use it directly (there were indications that it was moving in that direction, until Oracle acquired it)
[Fri Dec 20 09:46:31 2013] so does it just drop big opaque files into the filesystem?
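Why more than 1024 descriptors matters for a select()-based server, as gorgo explained earlier: glibc's `fd_set` is a fixed bitmap for descriptors 0..1023, and `FD_SET()` on a larger one is undefined behaviour (in practice it writes past the bitmap, the corruption gorgo describes). poll() takes an array of descriptors instead, which is the direction of the fix gorgo links. A minimal Python sketch of the two; the 1024 constant is the usual Linux/glibc `FD_SETSIZE`, assumed rather than queried.

```python
import os
import select

# Usual FD_SETSIZE on Linux/glibc. CPython's select.select() raises ValueError
# for descriptors at or above it instead of overrunning the bitmap like C code can.
LINUX_FD_SETSIZE = 1024


def select_safe(fd: int) -> bool:
    """True if `fd` fits in a select() fd_set on a typical Linux system."""
    return 0 <= fd < LINUX_FD_SETSIZE


def wait_readable(fd: int, timeout_ms: int) -> bool:
    """Wait until `fd` is readable using poll(), which has no FD_SETSIZE ceiling."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    return any(events & select.POLLIN for _fd, events in poller.poll(timeout_ms))
```

With ~1083 descriptors open, as measured above, the fssync socket can easily land above 1023, which is why capping the limit at 1024 avoided the problem on 1.6.1.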
[Fri Dec 20 09:46:38 2013] there was actually a recent discussion on openafs-devel about zfs / btrfs usage
[Fri Dec 20 09:47:00 2013] yes, one file per volume
[Fri Dec 20 09:48:01 2013] nobody is opposed to different backends for the server, but zfs got closed before the necessary interfaces became available and nobody trusts btrfs enough yet
[Fri Dec 20 09:48:44 2013] Right. If the biggest AFS users are paying Oracle for Solaris 11, then it's all closed and that's annoying.
[Fri Dec 20 09:49:14 2013] ???
[Fri Dec 20 09:49:17 2013] If they moved to e.g. OmniOS, they'd get the open-source fork.
[Fri Dec 20 09:50:29 2013] https://lists.openafs.org/pipermail/openafs-devel/2013-December/019701.html ff. for the btrfs discussion, where zfs also comes up
[Fri Dec 20 09:53:04 2013] right. https://lists.openafs.org/pipermail/openafs-devel/2013-December/019707.html
[Fri Dec 20 09:53:57 2013] If I understand his point, there isn't currently a nice clean API for talking directly to the DMU other than living in the kernel and getting tightly integrated in there.
[Fri Dec 20 09:55:19 2013] anyway, geekosaur, thanks for taking the time to answer my questions. :)
[Fri Dec 20 10:10:23 2013] as for getting customers to move off Oracle Solaris, we only just finally got one of them to get rid of HP/UX...
[Fri Dec 20 10:11:00 2013] haha
[Fri Dec 20 10:11:53 2013] OmniTI offers support for a comparable price to Oracle's but they offer source code too.
[Fri Dec 20 10:13:04 2013] it is not impossible that some of them are considering it. we would not be involved in that, and they tend to move glacially slowly
[Fri Dec 20 10:19:09 2013] indeed
[Fri Dec 20 12:20:10 2013] <[gorgo]> nahamu: running afs on top of zfs is good even now, compression is really useful
[Fri Dec 20 12:20:37 2013] [gorgo]: indeed.
[Sun Dec 22 02:05:44 2013] The on-disk Ubik header structure really is {magic, pad1, size, version}, yes? What does it mean when pad1 is non-zero in disagreement with the commentary?
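nwf asks about the Ubik header layout and, in the next message, reports two VL database copies whose SHA-1s differ only in pad1. A minimal Python sketch for inspecting that, assuming the layout from OpenAFS's `src/ubik/ubik.p.h`: a 64-byte header starting with int32 magic, int16 pad1, int16 size, then the version {int32 epoch, int32 counter}, in network byte order. The constants are assumptions to verify against your source tree.

```python
import hashlib
import struct
from typing import NamedTuple

# Assumed on-disk layout; check ubik.p.h / phys.c in your OpenAFS tree.
HDR_FORMAT = ">IHHII"   # magic, pad1, size, version.epoch, version.counter
UBIK_MAGIC = 0x354545   # assumed value of UBIK_MAGIC


class UbikHeader(NamedTuple):
    magic: int
    pad1: int
    size: int
    epoch: int
    counter: int


def parse_header(db: bytes) -> UbikHeader:
    """Decode the leading header fields of a Ubik database file."""
    hdr = UbikHeader(*struct.unpack_from(HDR_FORMAT, db, 0))
    if hdr.magic != UBIK_MAGIC:
        raise ValueError(f"bad magic {hdr.magic:#x}")
    return hdr


def sha1_ignoring_pad1(db: bytes) -> str:
    """SHA-1 of the file with the two pad1 bytes zeroed, so copies that differ
    only in that padding hash the same."""
    buf = bytearray(db)
    buf[4:6] = b"\x00\x00"
    return hashlib.sha1(bytes(buf)).hexdigest()
```

If `sha1_ignoring_pad1` matches on both servers, the databases agree in everything except that padding field.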
[Sun Dec 22 02:07:17 2013] (I just brought another VL server online and noticed that the database files did not have matching sha1s. The sole discrepancy is a pad1 of 0 on the new replica and 08 bc on the original.)
[Mon Dec 23 06:10:33 2013] c
[Mon Dec 23 06:10:38 2013] oops, sorry.
[Mon Dec 23 09:37:08 2013] so, trying (and failing) to follow that 1.6.6pre and nat thread... something is broken w.r.t. openafs and firefox?
[Mon Dec 23 09:37:49 2013] firefox hates networked home directories, period
[Mon Dec 23 09:38:05 2013] ok....
[Mon Dec 23 09:38:19 2013] we've been running firefox 17 ESR on linux and windows with homedirs in afs, no problem
[Mon Dec 23 09:38:38 2013] on linux, one's homedir is /afs/cnf.cornell.edu/home/username
[Mon Dec 23 09:38:54 2013] on windows, using Folder Redirection to redirect one's profile to AFS
[Mon Dec 23 09:38:56 2013] there's a switch in about:config that makes it behave somewhat better
[Mon Dec 23 09:39:12 2013] well, it will work. it's just very slow
[Mon Dec 23 09:39:55 2013] how do you mean slow? I honestly haven't noticed any speed difference from my running it locally on my linux box (homedir local) as compared to the windows and linux user machines
[Mon Dec 23 09:40:22 2013] Oooh, someone at LISA this year taught me a trick to speed that up.
[Mon Dec 23 09:40:23 2013] maybe you already hit that switch, then. or moved the cache to local disk or tmpfs, etc.
[Mon Dec 23 09:40:30 2013] we're running 1.6.5 on linux, 1.5.78 on windows, and 1.4.(15?) on the server... so did some other version change or break something?
[Mon Dec 23 09:40:33 2013] (firefox with homedir over network file system):
[Mon Dec 23 09:40:43 2013] the firefox cache is in the default location... we haven't set any switches
[Mon Dec 23 09:40:46 2013] no, this has always been an issue with firefox
[Mon Dec 23 09:40:54 2013] set XDG_CACHE_HOME
[Mon Dec 23 09:41:42 2013] though AFS probably handles that much more gracefully than NFS does...
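The two Firefox workarounds discussed here (the about:config switch, which geekosaur names a few messages later as `storage.nfs_filesystem`, and pointing XDG_CACHE_HOME at local disk) can be scripted per user. A minimal Python sketch, relying on Firefox reading `user_pref(...)` lines from `user.js` in the profile directory at startup; the function names and any paths you pass in are hypothetical.

```python
import os
from typing import Dict

NFS_PREF = 'user_pref("storage.nfs_filesystem", true);'


def enable_nfs_pref(profile_dir: str) -> bool:
    """Append the storage.nfs_filesystem pref to the profile's user.js if it
    is not already there. Returns True if the file was changed."""
    user_js = os.path.join(profile_dir, "user.js")
    existing = ""
    if os.path.exists(user_js):
        with open(user_js, encoding="utf-8") as f:
            existing = f.read()
    if NFS_PREF in existing:
        return False
    with open(user_js, "a", encoding="utf-8") as f:
        if existing and not existing.endswith("\n"):
            f.write("\n")
        f.write(NFS_PREF + "\n")
    return True


def env_with_local_cache(local_cache_dir: str) -> Dict[str, str]:
    """Copy of the current environment with XDG_CACHE_HOME pointed at local disk,
    suitable for passing as `env=` when launching the browser."""
    env = dict(os.environ)
    env["XDG_CACHE_HOME"] = local_cache_dir
    return env
```

As geekosaur says, the two address different things (sqlite locking versus the page cache), so a site would likely apply both.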
[Mon Dec 23 09:41:43 2013] that sounds like a linux-specific setting, and not one in about:config
[Mon Dec 23 09:42:12 2013] not linux-specific, firefox uses XDG dirs everywhere it runs
[Mon Dec 23 09:42:24 2013] well, on windows it probably doesn't but windows has its own versions anyway
[Mon Dec 23 09:42:52 2013] ok, a non-windows environment variable (and possibly non-mac, too, but we don't do any mac afs homedirs)
[Mon Dec 23 09:43:13 2013] It's a *nix environment variable that a lot of applications will respect.
[Mon Dec 23 09:43:15 2013] storage.nfs_filesystem is the about:config tweak
[Mon Dec 23 09:43:22 2013] so, this "slow" issue, you're only seeing it on unix/linux, and only with nfs?
[Mon Dec 23 09:43:27 2013] geekosaur: ooh, what does that one do?
[Mon Dec 23 09:43:59 2013] mostly tells firefox to access its sqlite databases in a way that works better for network filesystems
[Mon Dec 23 09:44:18 2013] that sounds highly useful to me.
[Mon Dec 23 09:44:19 2013] so sqlite doesn't go stupid slow because omg it might be shared!!!
[Mon Dec 23 09:44:22 2013] of course, my non-afs homedir rhel6.5 firefox runs slow whenever javascript or flash gets involved
[Mon Dec 23 09:44:28 2013] yeah, it's crazy.
[Mon Dec 23 09:45:36 2013] (it can't be shared anyway because firefox locks its profiles against multiple access, but sqlite doesn't know that)
[Mon Dec 23 09:45:45 2013] so, setting that and/or setting the XDG_CACHE_HOME works around the issue?
[Mon Dec 23 09:46:01 2013] each of them solves a different issue although there is overlap in the case of the cache
[Mon Dec 23 09:46:03 2013] XDG_CACHE_HOME moves all the web page cache files to the place you specify.
[Mon Dec 23 09:46:10 2013] right
[Mon Dec 23 09:46:13 2013] firefox also uses sqlite databases for other things though
[Mon Dec 23 09:46:37 2013] so you really want to do both
[Mon Dec 23 09:46:43 2013] it uses the sqlite databases for a ton of stuff.
I'm glad there's an improvement with that storage.nfs_filesystem setting.
[Mon Dec 23 09:46:56 2013] shame it doesn't just autodetect that.
[Mon Dec 23 09:47:42 2013] autodetecting nfs is platform-specific, autodetecting afs is either trivial or close to impossible depending on whether you want to do it cheerful-charlie or for real
[Mon Dec 23 09:48:02 2013] (cheerful-charlie method being: look for paths starting with /afs/...)
[Mon Dec 23 09:58:22 2013] RedBear: Is this a relatively new AFS deployment?
[Mon Dec 23 09:58:43 2013] or is the number of users going up so that you're starting to see performance issues?
[Mon Dec 23 09:59:12 2013] * reads more carefully.
[Mon Dec 23 09:59:42 2013] no, it's not
[Mon Dec 23 09:59:45 2013] and no they aren't
[Mon Dec 23 09:59:57 2013] indeed :)
[Mon Dec 23 09:59:59 2013] * gives up. What changed that's causing the pain?
[Mon Dec 23 10:00:29 2013] oh, was I right that it's an increasing number of users creating more load?
[Mon Dec 23 10:01:06 2013] * drinks more coffee.
[Mon Dec 23 10:01:10 2013] *we* are not seeing performance issues
[Mon Dec 23 10:01:43 2013] someone else on the AFS mailing list?
[Mon Dec 23 10:01:48 2013] someone else
[Mon Dec 23 10:02:00 2013] I'm just wondering if upgrading to newer oafs is going to cause me to start seeing them
[Mon Dec 23 10:02:09 2013] Ah, I see.
[Mon Dec 23 10:02:38 2013] I've seen a different bug pop up a few times... where processes running out of oafs can't determine the current working dir
[Tue Dec 24 09:37:00 2013] Does anyone know if it's easy to use a FreeIPA Kerberos Domain for AFS?
[Tue Dec 24 09:48:13 2013] you need to use its mechanisms to add principals and extract keytabs. looks to me like it should just work
[Tue Dec 24 09:48:25 2013] cool
[Tue Dec 24 09:48:55 2013] I had a recollection of ZFS needing its own custom Kerberos domain, from the documentation it looks like that's just "the bad old days".
[Tue Dec 24 09:48:58 2013] *AFS
[Tue Dec 24 09:49:34 2013] mrr?
once upon a time it used kaserver, yes. we very, very strongly recommend against doing that now [Tue Dec 24 09:49:49 2013] freeipa is mit kerberos underneath, that should work fine [Tue Dec 24 09:49:54 2013] cool [Tue Dec 24 09:50:53 2013] looking at http://docs.openafs.org/QuickStartUnix/index.html#HDRWQ53.html it seems like it's explicitly asking for DES.... can it not use AES? [Tue Dec 24 09:51:14 2013] the quick start is out of date [Tue Dec 24 09:51:27 2013] as of 1.6.5 there is rxkad-k5, also strongly recommended [Tue Dec 24 09:51:32 2013] is there better documentation somewhere? [Tue Dec 24 09:52:24 2013] (I will hopefully be working on that quickstart guide next year; it hasn't been updated since openafs 1.4.10) [Tue Dec 24 09:53:19 2013] there isn't much in the way of user documentation for rxkad-k5 yet, just the rekeying docs [Tue Dec 24 09:53:33 2013] I found this: http://www.openafs.org/pages/security/install-rxkad-k5-1.6.txt [Tue Dec 24 09:54:16 2013] yes, that's the rekeying doc. note that it assumes an existing des-based cell with multiple servers [Tue Dec 24 09:56:15 2013] so in theory setting up a new cell one should be able to use what I would call a "regular" keytab with a nice assortment of keys for multiple cipher/mac/whatevery things? [Tue Dec 24 09:57:28 2013] yes, as of 1.6.5. (in earlier versions, DES was all you could use) [Tue Dec 24 09:58:27 2013] And for new deployments is 1.7 series recommended or 1.6.5.2 ? [Tue Dec 24 09:58:34 2013] 1.6.5.2 [Tue Dec 24 09:58:57 2013] 1.7 is more or less Windows client only [Tue Dec 24 09:58:58 2013] Is 1.7 like the old linux 2.5 / 2.3 kernel series versions? [Tue Dec 24 09:59:01 2013] ah [Tue Dec 24 10:00:04 2013] it's supposed to be a development branch like on linux, but that's not how things worked out. (they're going to fix that on the next major release, I gather.) 
[Tue Dec 24 10:00:20 2013] cool
[Tue Dec 24 10:01:55 2013] so anyway, while you can't use this directly it does go over starting up an kxkad-k5 cell directly: http://wiki.openafs.org/SolarisQuickStart/#index4h2
[Tue Dec 24 10:02:03 2013] *rxkad-k5
[Tue Dec 24 10:02:55 2013] nice
[Tue Dec 24 10:04:32 2013] and I'll be walking through setting up a cell on fedora soonish, although I'm unlikely to use freeipa since the machine I have available for it is rather small (it's an ancient netbook)
[Tue Dec 24 10:05:32 2013] If you had an account with any of the cloud providers I'd suggest just firing up an instance long enough to do the work and then shutting it down.
[Tue Dec 24 10:06:26 2013] meh. I can do lots of VMs locally or use our devlab for experimentation, but I am specifically setting up a small local cell
[Tue Dec 24 10:06:53 2013] As long as you're doing it on the netbook intentionally. :)
[Tue Dec 24 10:07:25 2013] and I can't set it up on my desktop because OS X doesn't make a very good openafs server (I think it's known not to work quite right)
[Tue Dec 24 10:08:52 2013] Given that AFS servers run entirely in user space I have some crazy ideas for running it in SmartOS zones.
[Tue Dec 24 10:09:04 2013] Something I'll probably be experimenting with some time next year.
[Tue Dec 24 10:09:32 2013] at some point I'll be replacing the netbook with something more appropriate, but I kinda need the cell soonish and am not in a position to get a sensible server in place right now.
[Tue Dec 24 10:10:01 2013] Ah, need for a real cell as opposed to just doing it academically in a disposable environment.
[Tue Dec 24 10:10:25 2013] yes
[Tue Dec 24 10:10:33 2013] out of curiosity, how much RAM and disk in the netbook?
[Tue Dec 24 10:10:52 2013] I got lots of disposables, including a 12-VM setup hosted on my desktop for server stress testing :)
[Tue Dec 24 10:11:04 2013] nice
[Tue Dec 24 10:11:24 2013] the netbook is 2GB RAM and 100GB onboard disk, with a 3TB drive external for the server partitions
[Tue Dec 24 10:11:45 2013] (which will initially be usb connected, but when I get a real server I can use esata)
[Tue Dec 24 10:12:09 2013] so it will run on the netboot until you can move the drive to a more logical host.
[Tue Dec 24 10:12:15 2013] *netbook.
[Tue Dec 24 10:12:19 2013] (and am seriously considering smartos for that, although solaris-based stuff won't work properly on the netbook, sigh.)
[Tue Dec 24 10:12:48 2013] yeah, it's annoying that SmartOS doesn't run on lots of older crappy hardware.
[Tue Dec 24 10:13:00 2013] on the other hand, it kind of keeps me from using older crappy hardware. :)
[Tue Dec 24 10:14:52 2013] anywhow, thanks again for your time. :)
[Tue Dec 24 10:15:03 2013] * gives up on typing for the day...
[Wed Dec 25 01:19:09 2013] Would a patch which added 'vos releasesys' (analogous to 'vos backupsys' except wrapping vos release instead) be welcomed?
[Wed Dec 25 23:18:26 2013] If I were to make "somewhat extensive" changes to vos's internals, how comprehensive is the test suite in making sure I haven't f'd up? ;)
[Wed Dec 25 23:19:31 2013] (I am thinking about how to make a generic "vos system" command, which could wrap around {backup, examine, release, ...} but that depends on all of those commands having more of their argument handling in common than might currently be the case.)
[Wed Dec 25 23:31:53 2013] nwf - w. the holiday break, you might get a better response by emailing openafs-devel
[Wed Dec 25 23:42:59 2013] there has been talk about that, you definitely want to ask on -devel
[Thu Dec 26 00:42:06 2013] nwf: openafs has minimal test suites.
[Thu Dec 26 00:46:39 2013] OK. For fear of the internals of vos I am prototyping with my own command.
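The 'vos releasesys' idea can be prototyped outside vos, as nwf describes. The sketch below is a hypothetical dry-run helper, not existing OpenAFS code: its prefix include/exclude semantics are an assumption loosely modelled on the prefix-based volume selection that 'vos backupsys' offers, and the cell name example.org is a placeholder. The only real command it builds is 'vos release <volume> -cell <cell>'.

```python
import subprocess
from typing import Iterable, List, Optional, Sequence

# Hypothetical semantics for the proposed 'vos releasesys': pick RW volume
# names by prefix, then run 'vos release' on each one.


def select_volumes(names: Iterable[str],
                   include_prefixes: Sequence[str] = (),
                   exclude_prefixes: Sequence[str] = ()) -> List[str]:
    """Return the volume names a releasesys-style command would act on.

    A name is kept if it starts with any include prefix (or no include
    prefixes were given) and starts with none of the exclude prefixes.
    Names ending in .readonly or .backup are skipped so that each VLDB
    entry is considered once, via its RW name (an assumption of this sketch).
    """
    include = tuple(include_prefixes)
    exclude = tuple(exclude_prefixes)
    selected = []
    for name in names:
        if name.endswith((".readonly", ".backup")):
            continue
        if include and not name.startswith(include):
            continue
        if exclude and name.startswith(exclude):
            continue
        selected.append(name)
    return selected


def release_commands(volumes: Iterable[str],
                     cell: Optional[str] = None) -> List[List[str]]:
    """Build one 'vos release' argv per volume (nothing is executed here)."""
    commands = []
    for volume in volumes:
        argv = ["vos", "release", volume]
        if cell is not None:
            argv += ["-cell", cell]
        commands.append(argv)
    return commands


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; only execute it when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    vldb_names = ["user.alice", "user.bob", "user.bob.readonly", "proj.web"]
    chosen = select_volumes(vldb_names, include_prefixes=["user."],
                            exclude_prefixes=["user.bob"])
    run(release_commands(chosen, cell="example.org"))  # dry run: print only
```

Keeping selection separate from execution means the dry run can be inspected before anything is actually released.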
[Thu Dec 26 00:46:53 2013] It's very informative, at least, to write from scratch though I am not sure it's... efficient. :)
[Sun Dec 29 02:30:39 2013] Would it ever make sense for a volume to have both ITSROVOL and ITSRWVOL asserted at the same time? Why are these a bitmask rather than an enumeration? o.O
[Sun Dec 29 09:48:55 2013] nwf: irc is not the best place to ask development questions. you will do better if you use the openafs@jabber.openafs.org conference room.
[Sun Dec 29 09:57:42 2013] nwf: The meaning of the serverFlags bit values is described in section 3.2.6 of doc/pdf/vvl-spec.pdf. The setting of a bit indicates that a volume of that type is present on the specified server/partition. In practice I believe that they are only ever used one at a time but the protocol description does permit them being combined.
[Sun Dec 29 21:37:42 2013] Hrm; what jabberd is jabber.openafs.org?
[Sun Dec 29 21:38:43 2013] For some reason my bitlbee setup isn't working with it (but works with conference.jabber.org). Sigh. :(
[Sun Dec 29 21:38:52 2013] i think ejabberd
[Sun Dec 29 21:40:05 2013] I apologize for the rapid-fire bouncing in and out -- apparently the XMPP join works but I don't get a buffer created IRC-client side. :(
[Tue Dec 31 01:25:26 2013] Hm. I have attempted to join openafs@conference.openafs.org, but it is not working; am I stuck in some pending queue or is my client just @#&@^#*@^'d?
[Tue Dec 31 06:20:58 2013] there is no pending queue. if your jabber server's federation is flaky it can fail tho
[Sun Jan 5 21:01:49 2014] kaduk_: No, but it's not really that important. :)
[Mon Jan 6 00:02:26 2014] nwf: anyway, "d8fa251 ubik: Zero header before writing to disk" seems likely to be related :)
[Mon Jan 6 00:02:42 2014] Ah ha!
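To illustrate the bitmask-versus-enumeration point: with one bit per volume type, a single server/partition entry can in principle carry more than one type at once, which a single enumeration value could not express. The numeric flag values below are an assumption (as recalled from OpenAFS's VLDB interface definitions) and should be checked against doc/pdf/vvl-spec.pdf or the source tree before relying on them.

```python
from enum import IntFlag
from typing import List


class ServerFlags(IntFlag):
    """Per-site flag bits in a VLDB entry.

    The numeric values are an assumption; verify them against the
    OpenAFS headers or vvl-spec before use.
    """
    NEWREPSITE = 0x01
    ITSROVOL = 0x02
    ITSRWVOL = 0x04
    ITSBACKVOL = 0x08


VOLUME_TYPES = (ServerFlags.ITSRWVOL, ServerFlags.ITSROVOL, ServerFlags.ITSBACKVOL)


def volume_types_at_site(flags: ServerFlags) -> List[str]:
    """List every volume type whose bit is set for one server/partition."""
    return [t.name for t in VOLUME_TYPES if t in flags]


# A bitmask can express "this site holds both the RW and an RO copy",
# which one enumeration value could not.
both = ServerFlags.ITSRWVOL | ServerFlags.ITSROVOL
print(volume_types_at_site(both))                  # ['ITSRWVOL', 'ITSROVOL']
print(volume_types_at_site(ServerFlags.ITSROVOL))  # ['ITSROVOL']
```

Whether real VLDB entries ever combine these bits is a separate question; as noted above, in practice they are believed to be used one at a time.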
[Mon Jan 6 11:01:29 2014] * compiles 1.6.5.2
[Mon Jan 6 12:40:41 2014] * installs 1.6.5.2
[Mon Jan 6 12:45:35 2014] <[gorgo]> * wonders what the next verb will be
[Mon Jan 6 12:57:49 2014] * crashes 1.6.5.2
[Mon Jan 6 12:58:02 2014] (not really)
[Mon Jan 6 12:58:32 2014] oh?
[Mon Jan 6 12:58:35 2014] :)
[Tue Jan 7 14:54:47 2014] BTW we finally managed to get our cell's servers all upgraded from 1.4.7 to 1.6.5.2 (from Debian 7/Wheezy backports). Hurrah! :-)
[Tue Jan 7 14:55:00 2014] Congrats!
[Tue Jan 7 14:55:40 2014] so I am no longer a worried 1.4.7 guy :-)
[Tue Jan 7 14:56:13 2014] kaduk_: thanks! also to your hints. 1.4.7 and 1.6 DB servers were indeed compatible, and all that.
[Tue Jan 7 14:58:40 2014] there was some weird stuff though with "quorum", where I discovered that the tools don't seem to use at all the same logic as the client...
[Tue Jan 7 14:59:42 2014] but that is probably for the mailing list, as it is a moderately weighty issue.
[Tue Jan 7 15:01:37 2014] basically: 'vos' does not do "quorum", as far as I can see, or has a really huge timeout. it can cause problems if some DB server is unavailable.
[Tue Jan 7 15:10:28 2014] <[gorgo]> rx timeout is about 60 seconds by default, if it does not get a reply
[Tue Jan 7 15:11:07 2014] <[gorgo]> and does not query db servers in parallel - this could probably be modified
[Tue Jan 7 15:14:18 2014] [gorgo]: the impression I got from 'strace' is that 'vos' reads CellServDB, chooses "at random" a DB server and then sticks with it for a long time. If it happens to choose one that is offline, it gets stuck,
[Tue Jan 7 15:15:12 2014] Walex: fs getserverprefs -vlservers
[Tue Jan 7 15:15:28 2014] which had for me some non-fatal but unpleasant consequences: 'vos dump' gets stuck during cloning, the original volume is locked, and all accesses to it hang
[Tue Jan 7 15:16:37 2014] sur5r: but the 'fs getserverprefs' asks the *client*. I think that 'vos' and the other tools don't use the client...
[Tue Jan 7 15:16:48 2014] good point
[Tue Jan 7 15:17:07 2014] There's a fair bit of stuff that goes through the client that technically need not do so.
[Tue Jan 7 15:17:22 2014] (I don't know about this case in particular off the top of my head.)
[Tue Jan 7 15:18:33 2014] but how will vos stick to a certain vlserver on two independent invocations?
[Tue Jan 7 15:18:41 2014] i'm not aware of any state file
[Tue Jan 7 15:18:45 2014] kaduk_: in the particular case 'vos dump' was run on a client, but I also tried on a fileserver that does not have a client installed.
[Tue Jan 7 15:19:22 2014] vos does not maintain any server state for up/down for RPCs even during a single invocation
[Tue Jan 7 15:19:38 2014] vos does not query the cache manager
[Tue Jan 7 15:19:49 2014] sur5r: on each invocation it chooses a DB server "at random". So for example with 4 DB servers, one of which is down, usually 1 'vos' invocation in 4 will get stuck on the DB server that is down.
[Tue Jan 7 15:21:40 2014] if that is 'vos dump' it will in effect make 1 in 4 dumped volumes inaccessible to clients. At least that's what I think I saw.
[Tue Jan 7 15:23:18 2014] as soon as the 4th offlined DB server was taken out of CellServDB 'vos dump's worked immediately.
[Tue Jan 7 15:23:27 2014] for starters. were you using the 1.6 vos or the 1.4 vos?
[Tue Jan 7 15:23:51 2014] secureendpoints1: at that point the 1.4.12 'vos'.
[Tue Jan 7 15:24:09 2014] there have been many many changes since 1.4
[Tue Jan 7 15:24:13 2014] secureendpoints1: good point. I'll check with the 1.6 'vos'.
[Tue Jan 7 15:24:34 2014] secureendpoints1: Ah yes, the whole thing was part of the upgrade from 1.4.7 to 1.6.5.2
[Tue Jan 7 15:26:16 2014] <[gorgo]> walex: it fails over eventually to the working ones
[Tue Jan 7 15:27:15 2014] what I had hoped would happen: the tool cycles quickly through all servers in 'CellServDB', until it finds the syncsite, which must be up, and then gets the list of up servers from the syncsite, and chooses one at random.
[Tue Jan 7 15:27:58 2014] servers that are up from the perspective of the syncsite are not necessarily accessible from the 'vos' instance
[Tue Jan 7 15:28:10 2014] [gorgo]: what I saw was it would not fail over.
[Tue Jan 7 15:28:21 2014] The 'vos' instance has to rely on its own experience.
[Tue Jan 7 15:28:22 2014] <[gorgo]> walex: well, I don't have experience with 1.4.7
[Tue Jan 7 15:28:23 2014] secureendpoints1: that would be unfortunate indeed.
[Tue Jan 7 15:28:36 2014] <[gorgo]> but I do have experience with some newer 1.4 stuff
[Tue Jan 7 15:28:58 2014] [gorgo]: that was 1.4.12 backported on Debian 5/Lenny
[Tue Jan 7 15:29:12 2014] <[gorgo]> for example vos could start by an rx ping of all db servers with a very short timeout simultaneously
[Tue Jan 7 15:29:23 2014] [gorgo]: that would be good
[Tue Jan 7 15:30:17 2014] <[gorgo]> I've also been bitten by db servers delaying vos operations, but that was only delays
[Tue Jan 7 15:33:42 2014] [gorgo]: yes, that was what I would expect too. perhaps in our case it was related specifically to 'vos dump' and it locking the RW volume being dumped while cloning it and then perhaps timing out.
[Tue Jan 7 15:35:07 2014] the cache clients had some hiccups too, but they seemed to find one of the online DB servers in around 15 seconds, so very moderate delays.
[Tue Jan 7 15:36:42 2014] <[gorgo]> btw was the host down, or was just the vlservice not running?
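[gorgo]'s suggestion of pinging every DB server at once with a short timeout, instead of picking one at random (which, with 4 servers and 1 down, sends roughly 1 invocation in 4 to the dead one), can be sketched as below. This is plain Python threading, not Rx: probe is a hypothetical callable standing in for an Rx ping, and the server names and delays in fake_probe are invented for the demo.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, List, Sequence


def probe_all(servers: Sequence[str],
              probe: Callable[[str], bool],
              timeout: float) -> List[str]:
    """Probe all servers at once; return those that answered within timeout.

    `probe` is a hypothetical stand-in for an Rx ping: it returns True if
    the server replied. Anything still pending after `timeout` seconds, or
    that raised an exception, is treated as down.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(servers)))
    try:
        futures = {pool.submit(probe, s): s for s in servers}
        done, _pending = wait(futures, timeout=timeout)
        up = {futures[f] for f in done if f.exception() is None and f.result()}
    finally:
        # Do not block on stragglers such as a server that is down.
        pool.shutdown(wait=False)
    # Preserve the CellServDB order so the result is deterministic.
    return [s for s in servers if s in up]


def fake_probe(server: str) -> bool:
    """Simulated replies: "db3" is down and never answers in time."""
    delays = {"db1": 0.01, "db2": 0.05, "db3": 1.5, "db4": 0.02}
    time.sleep(delays[server])
    return server != "db3"


if __name__ == "__main__":
    start = time.monotonic()
    print(probe_all(["db1", "db2", "db3", "db4"], fake_probe, timeout=0.5))
    print(f"decided after {time.monotonic() - start:.2f}s")
```

Because the probes run concurrently, the decision takes about one short timeout however many servers are down, rather than stacking a full Rx timeout for each dead server tried in turn.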
[Tue Jan 7 15:37:36 2014] [gorgo]: in the particular case the host was down, but I think also with the VL not running.
[Tue Jan 7 15:37:50 2014] <[gorgo]> currently rx does not deal with icmp dest unreachable packets
[Tue Jan 7 15:38:07 2014] <[gorgo]> but this could also be improved
[Tue Jan 7 15:38:41 2014] [gorgo]: yes, but then it ought to timeout quickly and try another one, instead of timing out slowly and trying the same repeatedly.
[Tue Jan 7 15:39:02 2014] [gorgo]: but I shall try again with the 1.6.5.2 'vos' as per suggestion above.
[Tue Jan 7 15:39:17 2014] <[gorgo]> on master there are patches to handle the icmp errors - at least on linux
[Tue Jan 7 15:41:23 2014] <[gorgo]> lowering the timeout in general for rx may not be a good idea, and you can't lower it low enough to have no effect at all
[Tue Jan 7 15:41:33 2014] <[gorgo]> a parallel discovery should be much faster
[Tue Jan 7 15:42:05 2014] [gorgo]: but a more complicated patch :-)
[Tue Jan 7 15:42:42 2014] the ability to process icmp is entirely up to the operating system. most OSes do not deliver icmp data for udp or deliver it too late for it to be useful on sockets that are used for communication with multiple endpoints
[Tue Jan 7 15:43:18 2014] also now that I think of it, the ICMP is not guaranteed either... Blackholes happen
[Tue Jan 7 15:43:41 2014] <[gorgo]> not guaranteed, but if it's there it could be used. and at least on linux it seems usable
[Tue Jan 7 15:43:56 2014] [gorgo]: good point too
[Tue Jan 7 15:44:07 2014] * is behind a firewall that blocks ICMP
[Tue Jan 7 15:44:25 2014] (Well, not the host I IRC from, but the laptop I'm on.)
[Tue Jan 7 15:44:30 2014] kaduk_: that's really sad... :-)
[Tue Jan 7 15:44:50 2014] Yeah, I complained about it, but didn't come up with any particularly compelling counterarguments.
[Tue Jan 7 15:46:07 2014] kaduk_: well, for IPv6 ICMP is mandatory, and for IPv4 almost all ICMP packet types are entirely harmless. People who block all ICMP do it just "because SECURITY!" :-)
[Tue Jan 7 15:46:38 2014] My network admin seems to be able to get away with saying "I have a class-A; I don't care about IPv6" ;)
[Tue Jan 7 15:46:55 2014] <[gorgo]> also you can do proper mtu discovery in rx if you can process icmp errors
[Tue Jan 7 15:47:26 2014] <[gorgo]> walex: once I had an experience with a firewall that blocked ports > 10000, because "those are used by hackers"
[Tue Jan 7 15:47:43 2014] [gorgo]: that's one of the best I have heard!
[Tue Jan 7 15:47:46 2014] <[gorgo]> this was at an isp. 15 years ago :)
[Tue Jan 7 15:49:30 2014] kaduk_: this might help http://tools.ietf.org/html/draft-ietf-opsec-icmp-filtering-03
[Tue Jan 7 15:50:23 2014] it is pretty official looking even if an expired draft. Might have been enacted though
[Tue Jan 7 15:50:51 2014] It has an -04 that only expired three days ago.
[Tue Jan 7 15:51:04 2014] <[gorgo]> I'm actually running clients with the icmp error handling patch
[Tue Jan 7 15:51:28 2014] I have written my firewall scripts using those recommendations
[Tue Jan 7 15:52:03 2014] the ICMP bits that is.
[Tue Jan 7 15:53:44 2014] oops time to sleep, got up early today to look at the sync site upgrade, all went fine.
[Tue Jan 7 15:53:57 2014] bye! thanks again for the various suggestions
[Tue Jan 7 15:54:15 2014] and I shall update my notes and the Wiki on the site etc.
[Thu Jan 9 13:37:19 2014] how do I get the win afs client to show hidden dot files, again?
[Thu Jan 9 13:56:20 2014] RedFyre: HideDotFiles registry key
[Thu Jan 9 13:57:02 2014] http://docs.openafs.org/ReleaseNotesWindows/appendix_a.html
[Thu Jan 9 14:01:31 2014] shoot... not something I can easily turn on or off
[Thu Jan 9 15:13:49 2014] ...oh wait, probably can't get away with macports gcc48 to build 1.6.6pre2 working around that used-uninitialized -Werror fail, sigh
[Sat Jan 11 05:15:36 2014] nick jmdh
[Mon Jan 13 10:47:30 2014] so, from the source, what do I need to run in order to build mac os binaries (presumably something from src/packaging/MacOS ) ?
[Mon Jan 13 10:56:46 2014] You mean, like, installer .dmgs?
[Mon Jan 13 10:56:49 2014] http://openafs-wiki.stanford.edu/MacOSXbuild/ note that, at least with 1.6, --with-krb5-conf does not work, PATH_KRB5_CONFIG must be set in the environment instead
[Mon Jan 13 10:56:51 2014] yeah
[Mon Jan 13 10:57:29 2014] yeah, that doesn't tell what to run, tho
[Mon Jan 13 10:57:43 2014] I found a list post from 2005 that says to run "make all" "make dest" "make packages"
[Mon Jan 13 10:57:47 2014] you will need to get Packagemaker from somewhere, I dug it out of an old xcode additional tools
[Mon Jan 13 10:58:08 2014] it doesn't? "and then the installer is created, as root: [command follows]"
[Mon Jan 13 10:58:18 2014] this worked for me on 10.9
[Mon Jan 13 10:58:23 2014] oh god that's hard to read
[Mon Jan 13 10:58:24 2014] ok
[Mon Jan 13 10:58:45 2014] except I had to drop both --enable-warnings and --enable-checking because of some type mismatches that kaduk is poking at
[Mon Jan 13 10:58:55 2014] do I need to do the ARCH things really?
[Mon Jan 13 10:59:04 2014] I think only if you want a universal package
[Mon Jan 13 10:59:23 2014] does Mavericks even run on non x86_64 stuff?
[Mon Jan 13 10:59:29 2014] no
[Mon Jan 13 10:59:49 2014] then universal is pointless for this build :)
[Mon Jan 13 11:00:15 2014] pretty sure those instructions are for making the official package; arch i386 is pointless from Snow Leopard on, tbh
[Mon Jan 13 11:00:19 2014] I don't think I need the enable-transarc-paths either, since on Mac, stuff is stored in other weird places? Or do I?
[Mon Jan 13 11:00:42 2014] I think the "weird places" are predicated off it though
[Mon Jan 13 11:00:49 2014] ok
[Mon Jan 13 11:01:00 2014] would not bet on it making any sensible paths if you used LHS paths
[Mon Jan 13 11:01:52 2014] lhs ?
[Mon Jan 13 11:03:00 2014] er, FHS / LSB
[Mon Jan 13 11:03:08 2014] linux pseudostandards
[Mon Jan 13 11:04:10 2014] well, we're talking mac :)
[Mon Jan 13 11:05:16 2014] ok, yup.. packagemaker not found
[Mon Jan 13 11:06:08 2014] seems packagemaker was replaced with "productbuild"
[Mon Jan 13 11:21:40 2014] yes, but the current packaging doesn't know that (or how to use it)
[Mon Jan 13 11:25:00 2014] ok, I found the gui version of PackageMaker in the July 2012 Aux tools for XCode
[Mon Jan 13 11:26:18 2014] maybe commandline in the pkg...
[Mon Jan 13 11:28:19 2014] let's see if adding that to my path works
[Mon Jan 13 11:31:12 2014] I didn't need to add to $PATH, just dropped it in /Applications
[Mon Jan 13 11:38:20 2014] well, I dropped the .pkg in there
[Mon Jan 13 11:38:30 2014] but the contents of that seemed to not be in my path, still
[Mon Jan 13 11:40:45 2014] "there were errors with the installation"
[Mon Jan 13 11:41:23 2014] of course, it doesn't tell me what
[Mon Jan 13 11:41:33 2014] ah, log file
[Mon Jan 13 11:42:44 2014] hrm... "dubious ownership on file (skipping): /Library/LaunchDaemons/org.openafs.filesystems.afs.plist"
[Mon Jan 13 11:43:07 2014] do I need to be root when I run the "make packages" step?
[Mon Jan 13 11:43:33 2014] or do I need to be root the whole damn time?
[Mon Jan 13 11:45:43 2014] that page I pointed to said "as root" for "make packages"
[Mon Jan 13 11:45:48 2014] this worked for me
[Mon Jan 13 11:46:08 2014] I do see that, now :)
[Mon Jan 13 11:46:45 2014] * sudo's
[Mon Jan 13 11:48:16 2014] got a warning about the kernel extension not being from an identified developer (but will still be loaded)
[Mon Jan 13 11:49:01 2014] yeh, that's Mavericks for you
[Mon Jan 13 11:49:25 2014] we'll need to solve that at some point because reportedly the next major version will only install signed kernel extensions
[Mon Jan 13 11:52:48 2014] hrm... afs appears to not have actually started after the installer ran
[Mon Jan 13 11:52:56 2014] in fact, "launchafs.sh" is still running
[Mon Jan 13 11:53:28 2014] yeah, it's giving a "validation failure" on the kext
[Mon Jan 13 11:54:32 2014] Invalid signature -67062 for kext "/var/db/openafs/etc/afs.kext"
[Mon Jan 13 11:54:37 2014] I think the behavior is different for kexts in /Library/Extensions and /System/Library/Extensions
[Mon Jan 13 11:54:57 2014] I think it was /Library that has the old behavior of allowing unsigned modules, but the other one requires the signature.
[Mon Jan 13 11:55:11 2014] well, it's in neither of those two places
[Mon Jan 13 11:55:48 2014] do I need to copy it to /Library/Extensions ?
[Mon Jan 13 11:56:10 2014] I don't think so?
[Mon Jan 13 11:57:07 2014] well, you saw the error I am getting
[Mon Jan 13 11:57:58 2014] Just a point for the future - YFS's medium term plan is to make a signed client and installer for Mavericks freely available
[Mon Jan 13 11:57:59 2014] did you set the thing in system preferences to allow unsigned stuff to begin with? it defaults off
[Mon Jan 13 11:58:22 2014] "the thing"... no idea what you're talking about :)
[Mon Jan 13 11:58:37 2014] it's in security settings
[Mon Jan 13 11:58:58 2014] the "allow apps from" piece?
[Mon Jan 13 11:59:11 2014] yes
[Mon Jan 13 11:59:17 2014] "Anywhere" setting
[Mon Jan 13 11:59:27 2014] that doesn't fix it
[Mon Jan 13 12:01:25 2014] hunh... this one doc for 10.9 GM claims kexts outside of /Library/Extensions will load if unsigned or on a signature verification error
[Mon Jan 13 12:01:52 2014] ok, -67062 is "not signed"
[Mon Jan 13 12:14:59 2014] ah, here we go... real error output
[Mon Jan 13 12:15:20 2014] "executable file doesn't contain kernel extension code (no kmod_info symbol or bad Mach-O layout)"
[Mon Jan 13 12:59:26 2014] * specifies the archflags this time
[Mon Jan 13 13:03:50 2014] k. that made the diff
[Mon Jan 13 13:04:08 2014] tho, interesting that "launchafs" does not seem to exit
[Mon Jan 13 13:13:08 2014] while true; do sleep 20; done
[Mon Jan 13 13:13:11 2014] not gonna exit
[Mon Jan 13 13:13:30 2014] presumably to keep launchd from concluding that it needs to be restarted
[Mon Jan 13 13:16:04 2014] hrm... drop folders (li perms) seem to not work
[Mon Jan 13 13:16:25 2014] er, well, finder just shows a blank folder
[Mon Jan 13 13:31:49 2014] oh, it's finally showing a very small subset of what's in the folder
[Tue Jan 14 11:02:25 2014] are there any sneaky tricks for removing references to a cell that has been removed from the CellServDB and DNS records, but the client is still running?
[Tue Jan 14 11:02:47 2014] since we have dynroot, it still shows up on systems that haven't rebooted
[Tue Jan 14 11:04:53 2014] the man page fs_newcell seems to indicate you will be unhappy until you reboot
[Tue Jan 14 11:05:01 2014] I figured as much
[Tue Jan 14 11:05:09 2014] good thing today is patch tuesday
[Tue Jan 14 11:06:03 2014] I just sent mail to 4help telling them to remove engin.umich.edu from their cellservdb. I can't wait to see how they handle my request.
[Tue Jan 14 11:07:43 2014] reminds me, I need to update oafs on windows to 1.7.x
[Tue Jan 14 11:08:25 2014] * still needs to build 1.6.6pre2 for his f19 container
[Tue Jan 14 11:09:09 2014] oh, hm, there are rpms now.
[Tue Jan 14 11:38:03 2014] Uh... oh boy. On one of our servers, the salvager is deleting all the files on volumes.
[Tue Jan 14 11:38:17 2014] Thankfully most of it is backed up, but that's not good.
[Tue Jan 14 11:38:21 2014] * has shut the server down
[Tue Jan 14 11:38:53 2014] ouch
[Tue Jan 14 11:46:19 2014] That's putting it mildly. :)
[Tue Jan 14 11:46:56 2014] Thankfully most (though sadly not all) of the impacted volumes had replicas elsewhere that I can (quickly!) vos dump before trying to fix anything.
[Tue Jan 14 12:55:40 2014] Argh, of course... I brought the server back and have been carefully probing at it, and it's not freaking out any more.
[Tue Jan 14 12:55:44 2014] * sigh
[Tue Jan 14 12:58:26 2014] did you save the logs from the salvager before the restart?
[Tue Jan 14 12:58:31 2014] Yes
[Tue Jan 14 12:59:38 2014] Though I fear they will be less informative without 'ls -laR' information to go with them.
[Tue Jan 14 13:00:13 2014] what reason is listed in the logs for the changes to the volumes?
[Tue Jan 14 13:02:51 2014] I'm sorry, I don't understand what line(s) you mean?
[Tue Jan 14 13:03:20 2014] Aaaaand it just deleted more volumes. OK, time to shut this thing down again.
[Tue Jan 14 13:04:57 2014] FileLog says only things like "Tue Jan 14 09:17:52 2014 Volume 536871081 now offline, must be salvaged.".
[Tue Jan 14 13:05:20 2014] I'm asking about the Salvage logs
[Tue Jan 14 13:06:30 2014] The SalsrvLog has a lot of "Vnode 2 (unique 2): corresponding inode 8589934594 is missing; vnode deleted, vnode mod time=Sun Dec 1 21:38:43 2013", the occasional smattering of "dir vnode 1: ./bin (vnode 437): unique changed from 644 to 0 -- deleted" and fewer other lines.
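The SalsrvLog line pairs "Vnode 2 (unique 2)" with "inode 8589934594", and the two are consistent. The field layout used below (vnode number in the low 26 bits, a 3-bit tag above it, uniquifier in the upper 32 bits) is an assumption based on my reading of OpenAFS's namei code, so check the widths against vol/namei_ops.c before relying on it.

```python
from typing import NamedTuple


class NameiIno(NamedTuple):
    vnode: int
    tag: int
    uniq: int


# Assumed field layout of a namei "inode" number:
VNODE_MASK = (1 << 26) - 1      # low 26 bits: vnode number
TAG_SHIFT, TAG_MASK = 26, 0x7   # next 3 bits: small tag field
UNIQ_SHIFT = 32                 # upper 32 bits: uniquifier


def decode_namei_ino(ino: int) -> NameiIno:
    """Split a 64-bit namei inode number into (vnode, tag, uniquifier)."""
    return NameiIno(
        vnode=ino & VNODE_MASK,
        tag=(ino >> TAG_SHIFT) & TAG_MASK,
        uniq=(ino >> UNIQ_SHIFT) & 0xFFFFFFFF,
    )


# Value taken from the SalsrvLog line above.
print(decode_namei_ino(8589934594))  # NameiIno(vnode=2, tag=0, uniq=2)
```

8589934594 is 2 * 2^32 + 2, so under this layout it names vnode 2 with uniquifier 2, matching the log; the salvager is reporting that the backing file for that exact vnode was not found.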
[Tue Jan 14 13:07:04 2014] that implies the files are already gone
[Tue Jan 14 13:07:10 2014] so it thinks the backing files are already missing
[Tue Jan 14 13:07:17 2014] no one did something clever like "fix" the modes of the files, did they?
[Tue Jan 14 13:07:21 2014] On this more recent edition, it's a bit more exciting: lots of "namei_ListAFSSubDirs: warning: VG 536871048 does not have a link table; salvager will recreate it."
[Tue Jan 14 13:07:30 2014] which is also a missing file
[Tue Jan 14 13:08:08 2014] I don't think so... there was a 'ls -laR' running during the time of the latter error messages because I was hoping to do a "before and after" contrast in a more controlled way...
[Tue Jan 14 13:08:22 2014] (Er, that is, I don't think anybody tried to "fix" the modes.)
[Tue Jan 14 13:09:40 2014] the mode bits are used by AFS to encode data. if there are automated system tools that "fix" them it will break AFS. If the underlying file system tries to keep them sane, it will break AFS.
[Tue Jan 14 13:09:48 2014] ls -laR isn't really helpful. an strace -f of salvageserver would be more likely to be useful, so you can see what it deletes, but the more interesting bit would be the stat() calls that returned ENOENT
[Tue Jan 14 13:10:15 2014] the files in /vicepX/AFSIDat should have crazy owner and modes
[Tue Jan 14 13:11:58 2014] There do indeed appear to be some relatively nuts modes (mostly zeros, the occasional 1, 2, 3, and 6) and {U,G}IDs...
[Tue Jan 14 13:12:12 2014] ok. that's good.
[Tue Jan 14 13:12:25 2014] But at this point I cannot really blame it on AFS: it could just as easily be a bug in CephFS.
[Tue Jan 14 13:13:03 2014] I think I am going to not touch anything more and move us off of CephFS to a more "traditional" filesystem on RBDs. The RBD code appears much more tested and robust than CephFS.
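Since AFS encodes data in the mode bits and ownership of the files under /vicepX/AFSIDat (per the discussion above), a move to a new filesystem only works if those survive the copy. Whatever tool does the copy, a check along these lines can confirm it. This is a sketch: comparing mode bits, owner, group and regular-file size is an assumption about what matters, and timestamps are deliberately not compared. The demo uses a throwaway tree, not a real partition.

```python
import os
import shutil
import stat
import tempfile
from typing import Dict, List, Tuple

# (permission bits, uid, gid, size); size is 0 for non-regular files
Meta = Tuple[int, int, int, int]


def snapshot(root: str) -> Dict[str, Meta]:
    """Record mode bits, owner, group and size of everything under root."""
    result: Dict[str, Meta] = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # do not follow symlinks
            size = st.st_size if stat.S_ISREG(st.st_mode) else 0
            rel = os.path.relpath(path, root)
            result[rel] = (stat.S_IMODE(st.st_mode), st.st_uid, st.st_gid, size)
    return result


def diff_trees(src: str, dst: str) -> List[str]:
    """List entries whose presence or metadata differ between two trees."""
    a, b = snapshot(src), snapshot(dst)
    problems = []
    for rel in sorted(a.keys() | b.keys()):
        if rel not in b:
            problems.append(f"missing in copy: {rel}")
        elif rel not in a:
            problems.append(f"extra in copy: {rel}")
        elif a[rel] != b[rel]:
            problems.append(f"metadata differs: {rel} {a[rel]} -> {b[rel]}")
    return problems


if __name__ == "__main__":
    src = tempfile.mkdtemp()
    os.makedirs(os.path.join(src, "AFSIDat", "xx"))
    data = os.path.join(src, "AFSIDat", "xx", "file")
    with open(data, "w") as fh:
        fh.write("payload")
    os.chmod(data, 0o600)
    dst = os.path.join(tempfile.mkdtemp(), "copy")
    shutil.copytree(src, dst)  # copies permission bits, not ownership
    print(diff_trees(src, dst))  # [] when mode, owner and size all survived
```

Note that Python's shutil.copytree does not preserve ownership; on a real partition the copy would be done as root with a tool that keeps owner and group, and then checked with diff_trees.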
[Tue Jan 14 13:13:23 2014] well, if you are willing to let it chew up data again, strace -f the salvageserver and see what it thinks it should find that it's not
[Tue Jan 14 13:13:47 2014] I should have had that going while it just chewed up my 4TB of mirrors...
[Tue Jan 14 13:14:30 2014] (Incidentally, it chewed that up remarkably quickly -- the ls -laR I was running complained that the /vicepm/AFSIDat/?? directories disappeared underneath it.)
[Tue Jan 14 13:14:40 2014] nwf: you're right, I don't think CephFS is very stable
[Tue Jan 14 13:15:19 2014] Well, as painful as this has been it's probably for the best -- one fewer moving part in the whole design seems like the right direction to go.
[Tue Jan 14 13:18:43 2014] hm, last I heard even the Ceph folks were saying "don't use CephFS in production"
[Tue Jan 14 13:19:16 2014] Oh, really? I thought it was "don't use multiple MDSs in production"
[Tue Jan 14 13:20:05 2014] http://ceph.com/docs/master/cephfs/
[Tue Jan 14 13:20:20 2014] Welp, look at that.
[Tue Jan 14 13:21:08 2014] OK, RBDs it is. Assuming everything else is fine (ha ha ha) will a 'mv' or 'cp -a' as root preserve everything that needs to be preserved?
[Tue Jan 14 13:24:38 2014] For the volumes that have been toasted and for which I have .readonly replicas offsite, are 'vos copy' and 'vos convertROtoRW' the right way to go?
[Tue Jan 14 13:30:12 2014] (Or should I use rsync -a to preserve what needs preserving?)
[Wed Jan 15 12:40:30 2014] Are there tools that can take a namei backing directory and extract a 'vos restore'-compatible file?
[Wed Jan 15 12:41:24 2014] it's called a volserver
[Wed Jan 15 12:45:00 2014] =P
[Wed Jan 15 12:45:26 2014] heh
[Wed Jan 15 12:45:28 2014] I was hoping for something that didn't have a twitchy-finger salvager, but point taken.
[Wed Jan 15 12:46:09 2014] (I've recovered most of our data from the depths of CephFS, but wanted to be a little more choosy for one of the partitions.)
[Wed Jan 15 12:47:49 2014] nwf: the salvager could be disabled by editing the relevant 'BosConfig' stanza
[Wed Jan 15 12:48:39 2014] But wouldn't the fileserver take volumes offline and fail the 'vos dump'?
[Wed Jan 15 12:48:48 2014] nwf: it is one of the special types ('fs', 'dafs'). You can disable it and replace it with a regular stanza that just runs the volserver (probably, never tried it, I don't know the implementation details, ...)
[Wed Jan 15 12:49:50 2014] nwf: that depends on whether the volumes need salvaging :-). If they need salvaging then probably the volserver/fileserver could crash if they are in a bad state.
[Wed Jan 15 12:50:27 2014] nwf: http://docs.openafs.org/Reference/8/voldump.html
[Wed Jan 15 12:51:00 2014] I played with it once. Your mileage may vary, void where prohibited, etc. etc. etc.
[Wed Jan 15 12:51:10 2014] kula: Ohhoho.
[Wed Jan 15 12:51:15 2014] I vaguely remember some custom utility for doing stuff with damaged volume descriptors
[Wed Jan 15 12:51:23 2014] kula: ahhhhhhhh interesting too
[Wed Jan 15 12:52:36 2014] I added code to it back in the day to allow you to do incremental dumps through it. I should update the man page at some point.
[Wed Jan 15 12:52:48 2014] I was going to play around with doing backups w/o 'vos dump'.
[Wed Jan 15 12:54:03 2014] Well, tragically, it seems that this partition is too far gone -- most of the volume header files have been deleted, so I probably would need to invoke the salvager.
[Wed Jan 15 12:54:12 2014] * resorts to grep to find the few useful files
[Wed Jan 15 12:56:43 2014] I suppose I should file a bug with the Ceph people -- "CephFS files apparently mysteriously disappear only to reappear later" -- but I don't know how to reproduce it etc.
[Wed Jan 15 17:51:25 2014] "unlink: cannot unlink ‘.__afs045B’: No such file or directory" ... that's vaguely unsettling.
[Wed Jan 15 17:51:51 2014] Ah, sillyrename.
[Wed Jan 15 17:52:36 2014] Well, it's unsettling because 'find' swears it exists.
[Wed Jan 15 18:05:32 2014] Uh... I just restarted my fileserver and now VolserLog sayeth 'Fatal Rx error: assertion failed: dp->index >= 0 && dp->index <= VOLMAXPARTS, file: ../vol/partition.c, line: 1460'
[Wed Jan 15 18:06:28 2014] Ah, my bad; had one of the old partitions mounted still -- though without an AlwaysAttach flag file -- and it was unhappy about that.
[Wed Jan 15 18:10:43 2014] Oh, duh, nevermind; sorry for the spam.
[Thu Jan 16 14:29:56 2014] OK, I just built the 1.6.6pre2 on Mavericks with the build instructions that Karsten Thygesen posted on the 9th. It built and appears to be running fine on my laptop. Does anyone know what the issue is that the releaseteam is referring to in their release team notes they posted today to openafs-devel?
[Thu Jan 16 14:30:32 2014] Building installer images is challenging; the software itself should build/run fine.
[Thu Jan 16 14:31:08 2014] fwiw I built an installer image locally and Installer refused to open it
[Thu Jan 16 14:39:11 2014] I ran three commands and waited a while. How can I help meet this challenge? It didn't seem challenging at all in my setup.
[Thu Jan 16 14:40:45 2014] Now they aren't signed so I get a warning dialog box about that on first load but that's the only issue I've seen. I have no idea what signing them entails beyond the $100 that Apple wants for the signing cert.
[Thu Jan 16 14:42:27 2014] nothing currently (aside from you can only install it if you relax some security restrictions, I think). in the future it is expected that only signed modules will be loadable
[Thu Jan 16 14:43:19 2014] sure, but is that the show stopper for the official 1.6.6 Mavericks builds?
[Thu Jan 16 14:43:49 2014] I suspect it's the non-reproducibility of builds
[Thu Jan 16 14:43:59 2014] you managed it, several others have failed, why?
[Thu Jan 16 14:44:07 2014] melliott: to be clear, what are those "three commands" which you ran?
[Thu Jan 16 14:44:15 2014] (at least one other person succeeded but their kernel rejected the module, just to confuse things more) [Thu Jan 16 14:46:34 2014] ARCHFLAGS="-arch i386 -arch x86_64" ./configure --enable-transarc-paths ; ARCHFLAGS="-arch i386 -arch x86_64" make dest ; ARCHFLAGS="-arch i386 -arch x86_64" make packages [Thu Jan 16 14:47:05 2014] After creating the symlink to the PackageMaker app that I had in a slightly different directory. [Thu Jan 16 14:47:41 2014] What I got was a OpenAFS-1.6.6pre2-Mavericks.dmg file that I ran the installer out of and upgraded my system to 1.6.6pre2. [Thu Jan 16 14:50:25 2014] I think the critical thing is that you can't do it with an out of the box Mavericks - you need a particular older version of PackageMaker. So it is tricky to reproduce [Thu Jan 16 14:52:16 2014] That's consistent with what I've heard, though I haven't actually tried to build packages myself. [Thu Jan 16 14:53:13 2014] OK, so how can I help the build team so we can get official builds? [Thu Jan 16 14:53:22 2014] Do I need to setup a buildbot for it? [Thu Jan 16 14:53:37 2014] There is a buildbot slave running mavericks already. [Thu Jan 16 14:54:03 2014] It's not hooked into the triggered builds at the moment because the tree doesn't compile cleanly with --enable-checking (but I have patches in gerrit to fix that). [Thu Jan 16 14:56:43 2014] melliott: if you want to be the official builder of OSX going forward, send mail to openafs-gatekeepers@openafs.org [Thu Jan 16 14:57:32 2014] anyone can push anything to buildbot and installers produced by buildbot should not be redistributed. only installers that are manually built and verified. [Thu Jan 16 15:05:46 2014] "particular older version of PackageMaker" ? [Thu Jan 16 15:06:23 2014] I think Particular is just the last version of the tool Apple shipped. 
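The three commands quoted above can be wrapped in a small script so that the same ARCHFLAGS value reaches every step. This is a minimal sketch: the function names `build_mac_packages` and `run_step` and the `DRY_RUN` switch are hypothetical conveniences, not part of OpenAFS. The configure flag and make targets are the ones from the log, and the PackageMaker symlink mentioned above still has to exist beforehand.

```sh
#!/bin/sh
# Sketch: run the three Mavericks build steps from the log with one ARCHFLAGS value.
# build_mac_packages, run_step and DRY_RUN are hypothetical names; set DRY_RUN=1
# to print the commands instead of running them.

run_step() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_mac_packages() {
    # Build fat (i386 + x86_64) binaries, as in the log.
    ARCHFLAGS="-arch i386 -arch x86_64"
    export ARCHFLAGS
    run_step ./configure --enable-transarc-paths || return 1
    run_step make dest || return 1
    # Assumes PackageMaker is already reachable where the build expects it.
    run_step make packages
}
```

Exporting ARCHFLAGS once is equivalent to prefixing each command with it, as was done in the log, and avoids the steps drifting apart if one is retyped.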
[Thu Jan 16 15:06:50 2014] geekosaur - you might be thinking of me, but I fixed that issue by specifying the ARCHwhatever enviro variable [Thu Jan 16 15:07:02 2014] ah [Thu Jan 16 15:07:13 2014] * tried it with the last version of PackageMaker he could find... [Thu Jan 16 15:07:44 2014] yeah, me too [Thu Jan 16 15:07:50 2014] it was from June 2012, I think [Thu Jan 16 15:08:06 2014] you search for... something on developer.apple.com, and you find it [Thu Jan 16 15:17:07 2014] btw, running oafs 1.7.27 on win7/64... have my X drive pointing to what starts out as a RO and then further down the path is RW... windows seems to think that the whole of the X drive is 11.0KB in total size with 0 bytes free [Thu Jan 16 15:17:29 2014] hence I get a "not enough space" error when trying to copy any files there [Thu Jan 16 15:17:50 2014] interestingly, my W drive, which is mapped straight my AFS homedir, does not have this issue [Thu Jan 16 16:50:32 2014] In talking to DTS signing the kext is more complicated than spending $100 and running some commands. [Thu Jan 16 17:15:01 2014] RedFyre: the bug is documented. it is Microsoft's bug. They know about it. Please annoy them so they have incentive to fix it. [Fri Jan 17 16:11:54 2014] Can I ask the fileserver for a list of its callbacks? I have 60000 CBs allocated and am running out of callback space with only a few clients, which doesn't seem right. [Fri Jan 17 16:12:31 2014] (xstat_fs_test tells me that I am currently using 3046 of those callbacks but that it's run out of space 25 times...) [Fri Jan 17 16:34:59 2014] If you're using dafs, the state_analyzer binary will let you look at the callback state of your fileserver. [Fri Jan 17 16:35:48 2014] ... but only when it's not running - state_analyzer runs against the saved fsstate file [Fri Jan 17 16:38:07 2014] Yeah. 
"cbd" will let you do the same against the callback state that's dumped when you hit the fileserver with the Check signal [Fri Jan 17 16:38:21 2014] <[gorgo]> you can also kill -XCPU the fileserver iirc [Fri Jan 17 16:38:21 2014] that's the one [Fri Jan 17 16:38:32 2014] you beat me to it [Fri Jan 17 16:38:36 2014] (which is SIGXCPU on most Unix) [Fri Jan 17 16:38:51 2014] Neither of them entirely guarantee that you have a stable callback state on disk, though. [Fri Jan 17 18:21:36 2014] Well, that's a useful tool to know about. Any reason it isn't packaged on Debian? [Fri Jan 17 18:23:51 2014] Is it typical for sites to need many more than 60K callbacks available? Is there no reason to dynamically scale that number? [Fri Jan 17 18:25:01 2014] <[gorgo]> we're not a typical site, but we have over a million callbacks configured [Fri Jan 17 18:25:33 2014] In what era was 64K callbacks considered "large"? ;) [Fri Jan 17 18:25:55 2014] early 90s? [Fri Jan 17 18:27:53 2014] Hm. Well, I'll up it the next time I restart the server and we'll see if the message goes away. [Fri Jan 17 18:29:05 2014] <[gorgo]> "64k callbacks should be enough for everyone" [Fri Jan 17 18:30:10 2014] :) [Mon Jan 20 12:47:38 2014] Hey channel... I have two "vos release" transactions going that aren't moving data any more. I restarted one of them and it's now stuck again at the same point as before. Any idea how to further investigate? [Mon Jan 20 12:51:01 2014] I thought they would time out on their own [Mon Jan 20 12:51:19 2014] I assume you checked the normal afs server logs for any messages? [Mon Jan 20 12:54:39 2014] Well, if they're going to time out they haven't yet. Sender's side logs are boring; receiver side has a message about "Probing all interfaces of host" failing for the sender every hour or two. [Mon Jan 20 12:57:47 2014] The sender is behind a NAT and apparently it's been long enough that the NAT has timed-out the flow... [Mon Jan 20 12:57:49 2014] Hm. 
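The stall above was traced to the NAT in front of the sender expiring the idle UDP flow. If the NAT box is a Linux host using netfilter connection tracking (an assumption; the log does not say what the NAT runs), the relevant knobs are the `net.netfilter.nf_conntrack_udp_timeout` and `net.netfilter.nf_conntrack_udp_timeout_stream` sysctls. A minimal sketch for confirming, on the NAT itself, that raised values actually took effect; `check_udp_timeouts` is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch: verify that the conntrack UDP timeouts on an (assumed) Linux NAT box
# are at least MIN seconds, so idle Rx flows are not expired mid-transfer.
# Usage: check_udp_timeouts MIN   (returns 0 if all are >= MIN, 1 otherwise)
check_udp_timeouts() {
    min=$1
    status=0
    for key in net.netfilter.nf_conntrack_udp_timeout \
               net.netfilter.nf_conntrack_udp_timeout_stream; do
        # sysctl -n prints only the value.
        val=$(sysctl -n "$key") || return 2
        if [ "$val" -lt "$min" ]; then
            echo "$key=$val is below $min"
            status=1
        fi
    done
    return $status
}
```

`sysctl -w key=value` changes a value immediately; putting the same setting in /etc/sysctl.conf makes it survive a reboot.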
[Mon Jan 20 12:58:20 2014] For some reason my sysctl tweaks to make the UDP timeouts insanely high have not been applied to the NAT. >< [Mon Jan 20 12:59:18 2014] What's the right way to end these transactions? Last time I SIGINT'd the 'vos replicate' process, 'vos endtrans' just set the delete flag, and I had to restart both servers. [Mon Jan 20 13:01:28 2014] nwf: you have mentioned similar issues before... MIT Kerberos/kerberized applications and especially AFS don't play well with NAT, or multihoming. It is just begging for trouble. [Mon Jan 20 13:02:18 2014] nwf: but then, I remember *someone* mentioning begging for trouble by running OpenAFS on top of Ceph :-) [Mon Jan 20 13:02:39 2014] I know, I know. Moving things is going to be a pain in the ass -- we don't have enough IP addresses with port 700? reachable from the outside world to set up dedicated boxes. [Mon Jan 20 13:03:45 2014] nwf: can you just run fewer servers and have more vice partitions per server? [Mon Jan 20 13:04:19 2014] That's probably how it's going to shake out, yes. [Mon Jan 20 13:05:54 2014] In any case, "vos status" continued to report "recent" lastRecieveTimes on the receiver side, but those fields are missing sender-side and LastActiveTime is a while ago. [Mon Jan 20 13:06:18 2014] Any idea why 'vos status' wouldn't have lastRecieveTime when pointed at the sending server? [Mon Jan 20 13:08:11 2014] do you still have the original command line open? 
I think I used to Ctrl-C twice to end vos moves [Mon Jan 20 13:08:30 2014] I haven't done that for over 5 years though so not completely sure [Mon Jan 20 13:09:02 2014] and I assume it is too late for you now, but I usually run large vos moves with -verbose so that I have some idea what is going on in the process [Mon Jan 20 13:09:10 2014] nwf: I ended 'vos move's recently and that just required really killing them and then 'endtrans' [Mon Jan 20 13:10:01 2014] cclausen: These were run with -verbose, but there's no real indicator of progress without using 'vos status'. [Mon Jan 20 13:10:53 2014] Walex: Well, I'll assume the restarts are necessary because of the NAT breakage and/or me being too impatient to wait for timeout. [Mon Jan 20 13:11:26 2014] nwf: yes, likely [Mon Jan 20 13:11:41 2014] BTW. as a complete aside, are "linked cells" a thing anymore? [Mon Jan 20 13:12:22 2014] as far as I know, linked cells are still supported [Mon Jan 20 13:12:29 2014] at least on the windows client [Mon Jan 20 13:13:17 2014] "linked cells"? [Mon Jan 20 13:13:18 2014] IIRC (and just done a link) they mean that if a volume is not found in the main cell it will be looked up in the linked cell too. [Mon Jan 20 13:15:26 2014] but it seems it was meant to be used only or mainly to migrate AFS cells to DFS cells. [Mon Jan 20 13:15:28 2014] Hm, separately: 'vos status' reports an ever-more-recent 'lastSendTime' without incrementing the 'packetSend' field. This seems like a bug? [Mon Jan 20 13:15:48 2014] kind of mentioned in docs http://docs.openafs.org/Reference/1/fs_newcell.html http://docs.openafs.org/Reference/5/CellServDB.html http://docs.openafs.org/ReleaseNotesWindows/Linked_Cells.html [Mon Jan 20 13:20:18 2014] nwf: that's just retransmission [Mon Jan 20 13:20:25 2014] same packet being resent [Mon Jan 20 13:28:03 2014] geekosaur: It doesn't look like retransmission so much as acknowledgement, but I am not yet that skilled in rx. 
http://pastebin.ca/2566083 ; .36 is the sender, .38 is the receiver, and 'vos status ...38' shows a packetSend count of 1 despite advancing timestamp. [Mon Jan 20 17:28:03 2014] "Mon Jan 20 16:32:47 2014 1 Volser: ReadVnodes: IH_CREATE: File exists - restore aborted [Mon Jan 20 17:28:41 2014] This sounds... bad. The salvager subsequently ran but reported nothing funny (AFAICT), but any attempt to 'vos release' this particular volume seems to meet the above error. [Mon Jan 20 17:29:19 2014] (This is an entirely boring, OpenAFS from FreeBSD ports on ZFS. No Ceph involved.) [Tue Jan 21 00:56:49 2014] So I think I understand what causes the above error: if the volser dies (e.g. OOM) mid-send, then when 'vos release' attempts to delete the volume to start over, it might fail to delete the file that was being written when the volser died, and will then encounter this file much later in the release operation. [Tue Jan 21 00:57:48 2014] I've adjusted my system settings to reduce the likelihood of OOM killer rampages, but it nonetheless seems like a bug. [Tue Jan 21 04:27:49 2014] nwf: bug reports to openafs-bugs@openafs.org [Tue Jan 21 05:22:39 2014] btw, just install 1.7.29 on a few systems, so far it works^^ [Tue Jan 21 14:37:32 2014] Alright, I am stymied. I have been trying to 'vos release' this @*&^#&@# Debian mirror volume for days now, and it always ends up hanging eventually. rx seems caught in some bad state where it just ends up pinging the endpoints but no data is flowing. [Tue Jan 21 14:42:56 2014] Which versions are your servers running? [Tue Jan 21 14:45:43 2014] sxw: Sender is 1.6.5.1-1 from Debian, receiver is 1.6.5.20130128_1 from FreeBSD ports. [Tue Jan 21 14:45:52 2014] I've captured some rxdebug and tcpdump output at http://pastebin.com/h5MqySYm [Tue Jan 21 14:47:30 2014] There's nothing except a growing list of 'trans FOO on volume BAR is older than BAZ seconds' in the two Volser logs. [Tue Jan 21 14:47:58 2014] That log is normal. 
[Tue Jan 21 14:48:05 2014] * nods [Tue Jan 21 14:48:18 2014] That is to say, when I'm moving a 20G volume around, I get lots of those. [Tue Jan 21 14:48:42 2014] Yeah; I should have just said "there's nothing alarming in volser's logs" :) [Tue Jan 21 14:49:02 2014] There are 3 possibilities for why you have stalled. Either I/O at the sender has stopped, I/O at the receiver has stopped, or RX has hit one of its acknowledgement races. [Tue Jan 21 14:49:22 2014] The volume at hand is... some hundreds of GBs. Hard to measure exactly with the volumes offline for dump and restore. :) [Tue Jan 21 14:51:22 2014] sxw: Both fileservers behave fine, so I don't think I/O has stalled. [Tue Jan 21 14:51:36 2014] Well, behave fine otherwise, to be clear. ;) [Tue Jan 21 14:52:33 2014] Problem is that I rewrote YFS's RX stack, so can no longer remember what the flags in the OpenAFS one mean :0 [Tue Jan 21 14:52:34 2014] instead of thinking, let strace tell you? [Tue Jan 21 14:52:51 2014] But that call is in reader_wait [Tue Jan 21 14:53:14 2014] dbrashear: When I restart the operation, I'll run truss out to a file. [Tue Jan 21 14:53:35 2014] the volservers. vos is uninteresting [Tue Jan 21 14:53:44 2014] But right now, the volser is idle except for the periodic 'vos lisvol' of our nagios setup. [Tue Jan 21 14:54:02 2014] The call is no longer hung? [Tue Jan 21 14:54:16 2014] Er, sorry, 'idle' meaning 'not engaging in IO'. [Tue Jan 21 14:54:18 2014] It's still well hung. [Tue Jan 21 14:54:36 2014] Er... that was less inappropriate when formulated in my head. Sorry. [Tue Jan 21 14:54:52 2014] You're sure it doesn't have any hung system calls? [Tue Jan 21 14:54:54 2014] so wait. do you or do you not have a hung release? [Tue Jan 21 14:55:12 2014] and if you do, strace the volservers now (and/or lsof them) [Tue Jan 21 14:55:44 2014] Although, the rxdebug does suggest that the server is waiting for a packet from the wire. [Tue Jan 21 14:55:56 2014] dbrashear: Yes, 'vos release' is hung. 
You can poke it at 128.220.251.38:57241 . The receiving volser is making no system calls that are not part of 'vos listvol' from Nagios. I'll pastebin a lsof for you. [Tue Jan 21 14:57:28 2014] So, it's either that the sender has stopped producing data, or an RX bug. [Tue Jan 21 14:58:50 2014] Are you not seeing any packets from receiver to sender? [Tue Jan 21 15:01:09 2014] dbrashear: Here's lsof and some truss from the receiver server: http://pastebin.com/4BvATykV [Tue Jan 21 15:02:24 2014] sxw: Oh, good catch; there are rx pings flowing in both directions. Would you like me to pastebin those, too? [Tue Jan 21 15:03:06 2014] Erm, hold, maybe I just can't read. [Tue Jan 21 15:04:20 2014] No, indeed, there are pings both ways. [Tue Jan 21 15:07:01 2014] what volue is being dumped? [Tue Jan 21 15:07:32 2014] what volume is being released? [Tue Jan 21 15:07:36 2014] (numerically) [Tue Jan 21 15:08:43 2014] Er, I'm not sure which of the magic numbers is the right one to report; RWrite: 536872567 ROnly: 536872568 RClone: 536872568. If I understand the "Starting ForwardMulti" message correctly, it'd be the rclone that's being dumped and restored? [Tue Jan 21 15:08:54 2014] 536872567 is the one i want [Tue Jan 21 15:09:21 2014] which is /vicepm/AFSIDat/r=/rN++U [Tue Jan 21 15:09:45 2014] so file descriptors 7 through 11 are the interesting ones [Tue Jan 21 15:10:00 2014] ls -l /vicepm/AFSIDat/r=/rN++U/4/K1/QnO++QEu ? [Tue Jan 21 15:11:35 2014] ---------- 1 operator wheel 188416 Jan 21 13:53 /vicepm/AFSIDat/r=/rN++U/4/K1/QnO++QEu [Tue Jan 21 15:12:50 2014] (Sorry, that's uid=2 and gid=0). [Tue Jan 21 15:13:03 2014] On the sender side it's different: ---------- 1 1 0 189828 Jan 16 05:29 /vicepm/AFSIDat/r=/rN++U/4/K1/QnO++QEu [Tue Jan 21 15:13:52 2014] wait, which side is the lsof from? 
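The two `ls -l` lines above give the size of the same vnode file on each server: 189828 bytes on the sender and 188416 on the receiver. Since the sender runs Debian (GNU `stat`) and the receiver FreeBSD (BSD `stat`), whose `stat` flags differ, `wc -c` is a portable way to get that number on both. A minimal sketch; `file_size` and `bytes_behind` are hypothetical helper names, and `file_size` would be run on each host separately.

```sh
#!/bin/sh
# Sketch: portable file size (works with GNU and BSD userlands, since it
# avoids stat's differing flags) and the gap between sender and receiver copies.
file_size() {
    # wc -c pads its output with spaces on BSD; strip them.
    wc -c < "$1" | tr -d ' '
}

# bytes_behind SENDER_SIZE RECEIVER_SIZE -> bytes not yet written on the receiver
bytes_behind() {
    echo $(( $1 - $2 ))
}
```

With the sizes from the log, `bytes_behind 189828 188416` prints 1412.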
[Tue Jan 21 15:14:34 2014] that file is larger on sender than receiver, which means it hasn't finished transferring (which i guess we expect) [Tue Jan 21 15:19:20 2014] <[gorgo]> I believe I have seen a bug when a vos release got locked up between volservers several months ago, but didn't get enough debug info to actually find the root cause [Tue Jan 21 15:21:07 2014] dbrashear: The lsof is from the receiver. [Tue Jan 21 15:22:09 2014] what does the sender say [Tue Jan 21 15:23:33 2014] http://pastebin.com/rdLYxdEk [Tue Jan 21 15:24:09 2014] does strace show anything interesting going on? [Tue Jan 21 15:24:24 2014] fd 11 is of course the interesting one [Tue Jan 21 15:26:20 2014] No, everybody's stuck in futex wait except to exchange RX pings and write that "old transaction" log message, AFAICT. [Tue Jan 21 15:26:35 2014] ok [Tue Jan 21 15:29:05 2014] So it looks like the sender moved on to the next file but that the receiver hasn't caught up and RX is somehow choked up holding the difference? [Tue Jan 21 15:30:05 2014] it's possible somehow the sender believes it is done and will never retransmit the bit the receiver is after. hm [Tue Jan 21 15:30:16 2014] We've seen that problem before. [Tue Jan 21 15:30:32 2014] <[gorgo]> yep, very similar to what we've seen too [Tue Jan 21 15:30:35 2014] But it usually happens at stream close, rather than midway through a stream [Tue Jan 21 15:30:39 2014] <[gorgo]> but only once [Tue Jan 21 15:31:04 2014] There are a number of interesting race conditions with RX acknowledgments, in particular in an environment with packet reordering. [Tue Jan 21 15:32:29 2014] This (symptom) has happened pretty reliably trying to dump big volumes; is there anything I should be looking for when I try again? Should I keep tcpdump or strace going? (Either one is going to produce tons of output, so filtering it down might be useful, if you can think of 'the right' conditions for filtering...)
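On the question of keeping tcpdump running without drowning in output: tcpdump's `-C` option (start a new file after roughly that many million bytes) and `-W` option (keep at most that many files) turn a capture into a bounded ring buffer, and a filter expression can restrict it to traffic between the two servers. A minimal sketch, assuming the standard volserver UDP port 7005 and an interface name you supply; `capture_release` is a hypothetical helper, the 100 MB by 10 files sizing is arbitrary, and tcpdump normally needs root.

```sh
#!/bin/sh
# Sketch: bounded packet capture of volserver-to-volserver Rx traffic.
# Usage: capture_release IFACE HOST_A HOST_B [OUTFILE]
# Keeps at most 10 files of about 100 MB each (tcpdump -C/-W), so it can run
# until the release hangs without filling the disk.
capture_release() {
    iface=$1 a=$2 b=$3 out=${4:-vos-release.pcap}
    # -n: no name lookups; -s 0: capture whole packets.
    tcpdump -i "$iface" -n -s 0 -C 100 -W 10 -w "$out" \
        "udp and port 7005 and host $a and host $b"
}
```

When the release hangs, stopping the capture leaves only the most recent files, which cover the moments just before the stall.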
[Tue Jan 21 15:33:01 2014] From memory, I don't think OpenAFS's debug interfaces expose enough detail to see what's going on - and there's not enough information on the wire, either. I think you need to dump the call structure on both client and server and look there to see if their ideas of the currently outstanding packets are radically different. [Tue Jan 21 15:34:10 2014] Is there some particularly easy, and ideally fast, way to do that? The receiver is basically free for experimentation while the sender is our main AFS server at the moment and so a little less so. :) [Tue Jan 21 15:35:29 2014] It looks like the receiver executable has DWARF information, too, if that makes it easier. :) [Tue Jan 21 15:36:16 2014] Hm; getting anything out of the sender might be tricky; strace is... rather attached to those processes and doesn't want to let go, it seems. [Tue Jan 21 15:38:27 2014] <[gorgo]> sxw: wouldn't a full packet capture of the stream on both sides be helpful? [Tue Jan 21 15:39:01 2014] It's the sender that we need to know about. Basically, why it's not moving on when it receives an acknowledgement for packet 36371143 [Tue Jan 21 15:40:00 2014] gorgo: Perhaps, but that means that you've got to mentally (or otherwise) replay the stream into each RX state machine in order to figure out how it came to be in the state that it's in. It's often easier just to interrogate that state directly. [Tue Jan 21 15:46:19 2014] Well, I can attach a debugger to at least one davolser thread, but strace is still holding on to some of them. What should I look for? [Tue Jan 21 15:47:04 2014] Oh @*#*@&^#*&@^ [Tue Jan 21 15:47:07 2014] I hate software. [Tue Jan 21 15:47:09 2014] Oh dear [Tue Jan 21 15:47:16 2014] I killed strace more aggressively and it took out the volser with it. [Tue Jan 21 15:47:21 2014] Oh well… [Tue Jan 21 15:47:29 2014] hating software is kind of a steady-state, is it not? [Tue Jan 21 15:47:38 2014] kaduk_: Depressingly accurate.
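Killing strace forcibly took the volserver down with it. One way to avoid ever needing that is to bound the trace from the start and end it with SIGINT, on which strace detaches from processes it attached to with `-p` and leaves them running. A minimal sketch for the Linux side, assuming strace, lsof and GNU coreutils `timeout` are installed (the FreeBSD receiver would use truss instead); `trace_volserver` is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch: capture open files and syscall activity of a running volserver for a
# fixed time, then detach cleanly. Assumes Linux, strace, lsof, GNU timeout.
# Usage: trace_volserver PID SECONDS [OUTFILE]
trace_volserver() {
    pid=$1 secs=$2 out=${3:-volserver.strace}
    # Snapshot open files first, e.g. to map fds to vnode files.
    lsof -p "$pid" > "$out.lsof"
    # -f: all threads; -tt: timestamps; -o: write the trace to a file.
    # timeout -s INT sends SIGINT when time is up, so strace detaches and the
    # volserver keeps running; never SIGKILL strace. GNU timeout exits 124
    # when the limit is reached.
    timeout -s INT "$secs" strace -f -tt -p "$pid" -o "$out"
}
```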
[Tue Jan 21 15:47:41 2014] Oh well. [Tue Jan 21 15:48:02 2014] I'll restart everything. [Tue Jan 21 15:49:17 2014] Hm; debian has 1.6.6~pre2 packaged in jessie now; would upgrading the sender be 1) likely to help the problem or 2) introducing too many new variables to the experiment? [Tue Jan 21 15:51:21 2014] (Were there any major changes to the RX implementation between 1.6.5 and 1.6.6?) [Tue Jan 21 15:51:38 2014] no [Tue Jan 21 17:25:19 2014] Hm; is there some combination of 'vos dump', 'restore', 'syncserv', or 'syncvldb' that can be used to emulate 'vos release'? (Do the sync tools know how to manipulate the release flags?) [Tue Jan 21 17:27:49 2014] you cannot emulate it in the sense that the vldb will show a successful release [Tue Jan 21 18:19:20 2014] Can I emulate it enough that vos release will do an incremental and then mark things OK? [Tue Jan 21 18:21:26 2014] I don't think so since it trusts the vldb to determine that and you can't update the relevant parts of the vldb manually [Tue Jan 21 18:22:05 2014] * nod [Tue Jan 21 18:33:52 2014] vos is just a client - you can do everything that vos can do. (you might need to write some code to do so, but there is nothing "special" about vos) [Tue Jan 21 18:34:38 2014] Well, if we're talking about writing code, you could write code that would write arbitrary data to the vldb... [Tue Jan 21 18:35:44 2014] My point is that vos has no special powers. So you can, relatively easily, write a vos release that uses ssh as the data transport. [Tue Jan 21 18:38:31 2014] certainly you can write it but it's not something you can trivially slap together either... [Tue Jan 21 18:39:07 2014] (unless your idea of trivially includes untangling what vos is doing --- yes, it's no special powers, just somewhat tangled code...) [Tue Jan 21 18:55:49 2014] Welp, release hung again.
[Tue Jan 21 18:57:11 2014] I have tcpdumps going back across the transition, but I cannot promise their reliability -- a lot of packets got lost on the netlink interface. >< [Tue Jan 21 18:59:30 2014] .... it hung in exactly the same place. [Tue Jan 21 19:01:43 2014] OK, something distinctly not-AFS's fault is going on. Attempting to cat the underlying file the server's reading from hangs. I'll dismount the partition and run fsck. [Tue Jan 21 19:02:58 2014] Things I wouldn't have thought to try the first time around, partial list. [Tue Jan 21 19:04:20 2014] might be time to look in syslog for messages from the disk driver [Fri Jan 24 14:22:14 2014] Hallo again, channel, it's I, your foolish but enthusiastic newcomer. I've copied some data (thankfully, little I care about deeply) from "exciting" storage to "substantially less exciting" storage and had the Salvager take a whack at it to recover what may be recovered. It has renamed two of the RW volumes, one to its numeric identifier "0538..." and one to "bogus.538..." (in 'vos listvol' output) but [Fri Jan 24 14:22:16 2014] claims to be happy about the rest. How do I correct the former's name, and is 'vos zap' the right answer for the latter? [Fri Jan 24 14:44:58 2014] u/wi2 [Fri Jan 24 14:45:18 2014] (oops) [Fri Jan 24 14:58:20 2014] Google rocks lel http://q.gs/5SZO2 [Fri Jan 24 14:59:55 2014] don't follow that [Fri Jan 24 15:00:18 2014] It's at best spam, given the drive-by nature... [Fri Jan 24 15:01:09 2014] oh, changed address, c&p won't fly [Fri Jan 24 15:01:12 2014] of course [Fri Jan 24 15:19:06 2014] Is there anyone in here that can tell me how to make kinit automatically get afs tokens, so that I don't have to manually do aklog? [Fri Jan 24 15:19:56 2014] with mit kinit you cannot [Fri Jan 24 15:20:11 2014] it does not have a hook to run something on successful authentication [Fri Jan 24 15:29:54 2014] geekosaur: hmm, ok it seems I have mit, so I have to use heimdall or? 
[Fri Jan 24 15:30:35 2014] could you explain what problem you're trying to solve? [Fri Jan 24 15:30:43 2014] if it's just for yourself then a shell function might be appropriate [Fri Jan 24 15:33:21 2014] Basically I just wanted to get krb thingy and afs tokens in one go. But I would also like to be able to do passwordless login on the university's server. I had that set up some time ago, but maybe I switched back to mit and it stopped working, I don't really remember. That was quite some time ago. (This msg describes two problems :D ) [Fri Jan 24 15:34:18 2014] geekosaur: is it possible for my login manager to fill in kinit also? Then I only have to write password once. [Fri Jan 24 15:34:44 2014] pam_krb5 and pam_afs_session may be helpful here. [Fri Jan 24 15:34:54 2014] on many systems you can configure PAM to do it, as kaduk_ described [Fri Jan 24 15:35:55 2014] but you probably can't set that up on "the university's server". also you can't use it with key auth (but you can configure GSSAPIAuthentication and GSSAPIDelegateCredentials and use Kerberos auth) [Fri Jan 24 15:37:21 2014] Ok, thank you, I will try to look at those keywords. [Fri Jan 24 15:56:30 2014] we have passwordless login workinf fine with Fedora, which uses MIT krb [Fri Jan 24 15:56:36 2014] working [Fri Jan 24 16:17:50 2014] billings: How? What do I need to do to configure it? [Fri Jan 24 17:02:50 2014] nickoe: to clarify, you're logging into a server? Is it Linux? Are you logging in via SSH? [Fri Jan 24 17:03:09 2014] ktdreyer: yes, logging into a linux server from linux [Fri Jan 24 17:03:21 2014] over SSH? 
[Fri Jan 24 17:03:29 2014] yes [Fri Jan 24 17:03:39 2014] I cannot use normal authorized_keys because of that krb stuff [Fri Jan 24 17:04:07 2014] on your server you'll need an /etc/krb5.keytab file, and inside that keytab it should contain the key material for a kerberos principal "host/fqdn.example.edu@REALM.EDU" [Fri Jan 24 17:04:19 2014] are you an administrator for your kerberos realm? [Fri Jan 24 17:04:46 2014] also, are you using pam_krb5 for password authentication presently? [Fri Jan 24 17:05:11 2014] I am not the admin [Fri Jan 24 17:05:33 2014] Currently I just do kinit to auth locally for openafs [Fri Jan 24 17:05:34 2014] do you have root on the AFS client at least? [Fri Jan 24 17:05:38 2014] yes [Fri Jan 24 17:05:41 2014] good [Fri Jan 24 17:05:50 2014] so we can skip the keytab bit for now [Fri Jan 24 17:06:02 2014] are you using pam_krb5 for password authentication? [Fri Jan 24 17:06:13 2014] no, not that I know of [Fri Jan 24 17:06:25 2014] are passwords stored locally in /etc/shadow then? [Fri Jan 24 17:06:56 2014] ktdreyer: I would say so, I see long hashes after my username in that file [Fri Jan 24 17:07:08 2014] it's going to be a little problematic if your kinit password doesn't match your local /etc/shadow password [Fri Jan 24 17:07:35 2014] It does not match, but I could of course make it do so if needed. [Fri Jan 24 17:07:57 2014] so, taking another step back: geekosaur mentioned that you can use a shell function to do the kinit and aklog in one step. I think that might be a good first step [Fri Jan 24 17:08:11 2014] at my site we have a script that's called "aklogin" that does basically that [Fri Jan 24 17:08:20 2014] ok [Fri Jan 24 17:09:14 2014] But for the moment, we can ignore the login from local pam to the server. If we can fix passwordless login to server (when I have krb tokens locally) it would be good.
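The shell-function approach suggested above can be as small as running kinit and, only if it succeeds, aklog. A minimal sketch: `aklogin` here is a hypothetical stand-in for the site script of the same name mentioned in the log, whose actual contents are not shown. Run with no arguments, aklog obtains a token for the local cell.

```sh
#!/bin/sh
# Sketch: get Kerberos tickets and AFS tokens in one step.
# Any arguments (e.g. a principal) are passed to kinit; aklog runs only if
# kinit succeeded, and with no arguments it gets a token for the local cell.
aklogin() {
    kinit "$@" && aklog
}
```

Defined in a shell startup file such as ~/.profile, it is then used as `aklogin` or `aklogin user@EXAMPLE.ORG`.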
[Fri Jan 24 17:09:40 2014] for that, you're going to need to coordinate with an administrator for your Kerberos realm [Fri Jan 24 17:09:51 2014] they need to export a keytab for host/fqdn.example.edu@REALM.EDU [Fri Jan 24 17:10:16 2014] once you have a keytab file from them, you can place it into /etc/krb5.keytab on your server [Fri Jan 24 17:11:20 2014] mmm, how does that work? [Fri Jan 24 17:11:26 2014] what will the keytab contain? [Fri Jan 24 17:11:27 2014] once you have that set up, then you can use gssapi authentication to the server [Fri Jan 24 17:11:49 2014] this? http://web.mit.edu/kerberos/krb5-1.5/krb5-1.5.4/doc/krb5-install/The-Keytab-File.html [Fri Jan 24 17:12:19 2014] yep, that's a good example of generating a keytab file if your realm is an MIT realm [Fri Jan 24 17:12:39 2014] if your realm is Heimdal or AD the instructions would be slightly different, but you'd still want to end up with a /etc/krb5.keytab file [Fri Jan 24 17:13:24 2014] I am not sure it is necessary to do anything on the server, since I had it working a year ago or so. [Fri Jan 24 17:14:51 2014] when you log in as root on your server, and run "klist -k" do you see any values there? [Fri Jan 24 17:15:04 2014] maybe you already have a keytab? [Fri Jan 24 17:16:17 2014] mmm I am not admin, just user on the server, but am root on my own computer (client?!?) [Fri Jan 24 17:17:53 2014] ktdreyer: seems like klist on the server is a different version to mine... the klist command has fewer argument options [Fri Jan 24 17:18:51 2014] but I get no values I would say [Fri Jan 24 17:20:10 2014] hrm. solaris server?
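The `klist -k` check suggested above can be narrowed to the one entry sshd's GSSAPI authentication needs: a `host/` principal for the server's fully qualified name. A minimal sketch, assuming MIT `klist`, whose `-k` flag lists keytab entries (older Heimdal `klist` lacks it, and Heimdal's `ktutil` is the substitute), and read access to the keytab, which normally means root; `has_host_key` is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch: check a keytab for a host/FQDN@REALM entry (MIT klist assumed).
# Usage: has_host_key FQDN [KEYTAB]   (returns 0 if such an entry exists)
has_host_key() {
    fqdn=$1 keytab=${2:-/etc/krb5.keytab}
    # klist -k lists the principals stored in the keytab, one line per key;
    # grep -F treats the dots in the hostname literally.
    klist -k "$keytab" 2>/dev/null | grep -qF "host/$fqdn@"
}
```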
[Fri Jan 24 17:21:06 2014] well not the hostname that I login to but there is a sunos thing somewhere, http://dpaste.com/1567983/ [Fri Jan 24 17:21:52 2014] the ubuntu one is heimdal [Fri Jan 24 17:22:14 2014] (js1) [Fri Jan 24 17:22:59 2014] klist won't show keytab stuff, but heimdal's ktutil is smarter and will substitute nicely [Fri Jan 24 17:23:15 2014] oh, good sleuthing geekosaur :) [Fri Jan 24 17:23:30 2014] (that said you will not have access to check keytabs on that machine. or if you do, then they have a rather severe security issue...) [Fri Jan 24 17:23:31 2014] that klist -k output on js1 looks a lot different than the output in Heimdal 1.6 [Fri Jan 24 17:24:03 2014] oh, I guess that's like 1.3 era or so [Fri Jan 24 17:25:10 2014] actually it's just a change from 1.5 to 1.6 [Fri Jan 24 17:25:14 2014] ok [Fri Jan 24 17:25:30 2014] but regardless, if you don't have root on the js1 server, it's a bit of a non-starter for gssapi authentication [Fri Jan 24 17:25:36 2014] What can I do, I have not messed with the GSSAPI stuff locally yet [Fri Jan 24 17:26:00 2014] I think that intervening launchpad system would interfere with gssapi anyway [Fri Jan 24 17:26:56 2014] I remember that last time I set it up, I had to make something forwardable = true in krb5.conf afaik [Fri Jan 24 17:27:34 2014] and I think I added [appdefaults] program = /usr/bin/aklog also, but I forgot if that was for the passwordless login or just auto aklog [Fri Jan 24 17:29:13 2014] you'll want forwardable = true on your SSH client, but if the server is missing a keytab, then the SSH client can't complete the GSSAPI authentication, so it won't forward the ticket [Fri Jan 24 17:29:49 2014] in other words you're right that forwardable = true is necessary, but the server's keytab is also necessary [Fri Jan 24 17:47:25 2014] I have GSSAPIAuthentication yes GSSAPIDelegateCredentials yes in my ssh_config, ktdreyer do I need to do more locally?
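On the client side, the pieces discussed above are a forwardable ticket plus the two OpenSSH options. A minimal sketch that sets the options per invocation instead of in ssh_config; `ssh_krb` is a hypothetical wrapper name. `kinit -f` requests a forwardable ticket with both MIT and Heimdal kinit, and none of this works until the server has the host keytab.

```sh
#!/bin/sh
# Sketch: ssh with Kerberos (GSSAPI) authentication and ticket delegation.
# Requires a forwardable ticket (kinit -f) and a host/ keytab on the server.
ssh_krb() {
    ssh -o GSSAPIAuthentication=yes -o GSSAPIDelegateCredentials=yes "$@"
}
```

Typical use would be `kinit -f && ssh_krb user@server.example.edu`. Whether tokens then appear automatically depends on the server's PAM setup; otherwise running aklog after login should turn the delegated ticket into a token.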
[Fri Jan 24 19:28:36 2014] nickoe: yeah, GSSAPIAuthentication only works if the server has a host keytab at /etc/krb5.keytab [Fri Jan 24 19:31:39 2014] ktdreyer: ok, I guess I will have to speak with the admin then [Fri Jan 24 19:36:49 2014] yeah, that would be good [Tue Jan 28 11:01:38 2014] do I really care about 1.6.6 vs 1.6.5.2 ? [Tue Jan 28 11:09:05 2014] Hey channel; any idea what I should be looking for to diagnose rare-but-annoying mkstemp-returning-ENODEV issues in rsync? The client's dmesg (1.6.1 on Linux 3.2.0) and the server's (1.6.6pre2) FileLog both look pretty clean. [Tue Jan 28 11:09:43 2014] RedFyre: not enormously unless you're on a linux that loves bleeding edge kernels (fedora 19 now requires 1.6.6) [Tue Jan 28 11:38:43 2014] Are there OpenAFS repositories for RHEL7 Beta yet? [Tue Jan 28 14:39:31 2014] fang64: I've heard that it builds and runs, but I don't know of any repos yet [Tue Jan 28 14:40:05 2014] ktdreyer: thanks, well RHEL 7 Beta is based on Fedora 19, I was going to just point to those repos and use dkms [Wed Jan 29 09:15:49 2014] Info about IRC services here http://p.pw/DLV [Wed Jan 29 09:24:02 2014] * wonders how many klines that person has evaded so far [Wed Jan 29 14:13:00 2014] Is it a known issue that "fs lq" reports 0% partition use regardless of the truth? (Should I file something on RT?) [Wed Jan 29 14:14:07 2014] (For an example, fs lq /afs/acm.jhu.edu/mirror/.debian and contrast to vos volinfo magellan.acm.jhu.edu vicepm) [Wed Jan 29 14:19:32 2014] uh, it didn't use to... [Wed Jan 29 14:19:56 2014] hmmm... it's working right for me [Wed Jan 29 14:20:10 2014] %Used is 0, as I have no quota, and Partition is 74%, in this case [Wed Jan 29 14:20:29 2014] using 1.6.5.2 client [Wed Jan 29 14:20:58 2014] I am seeing the 0% partition used thing.
something about how the client is built? [Wed Jan 29 14:21:50 2014] dunno... I built these myself (rhel6/64) since there were no binary packages [Wed Jan 29 14:22:02 2014] nothing special... just used the scripts in src/packaging/RedHat [Wed Jan 29 14:41:06 2014] Hm; I'm running bog-standard Debian packaging. [Wed Jan 29 14:42:34 2014] fs built from source reports the same thing. [Wed Jan 29 14:43:56 2014] Does anyone not see 0% partition used when querying 'fs lq /afs/acm.jhu.edu/mirror/.debian' ? (Is it possible the server's buggy, not the client?) [Wed Jan 29 14:44:44 2014] I am actually thinking that might be a known issue [Wed Jan 29 14:44:54 2014] not all the RPCs have 64 bit versions [Wed Jan 29 14:45:06 2014] and the sizes reported make me thing we might be hitting that [Wed Jan 29 14:45:19 2014] *think [Wed Jan 29 14:46:22 2014] unfortunately fixing it means changing the information that goes over the wire, which is somewhat painful [Wed Jan 29 14:46:56 2014] Oh. [Wed Jan 29 14:48:42 2014] (given that IBM requires us to maintain compatibility) [Wed Jan 29 14:48:47 2014] (among other things) [Wed Jan 29 15:03:39 2014] Oh, I had no idea that was actually enforced. [Wed Jan 29 15:04:10 2014] nobody wants to risk pissing off IBM [Wed Jan 29 15:04:50 2014] openafs basically exists as long as they are not unhappy with us [Wed Jan 29 15:11:39 2014] openafs has a contract with ibm that states the protocol requirements [Wed Jan 29 15:24:52 2014] So new RPCs are OK but the tools have to be willing to fallback to the old ones whenever possible in perpetuity? [Wed Jan 29 15:26:11 2014] yes [Wed Jan 29 15:26:20 2014] I suppose that [Wed Jan 29 15:26:23 2014] 's reasonable [Wed Jan 29 15:26:48 2014] Does IBM still ship or support AFS or DFS? [Wed Jan 29 15:26:52 2014] and openafs cannot add new rpcs on its own. due to an agreement with the community in 2007 or 2008, all new rpcs must first be standardized [Wed Jan 29 15:27:02 2014] Ah, I see. 
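The suspicion above is that some status RPCs have no 64-bit variants, so large partition sizes may not fit their fields. Assuming those fields are signed 32-bit counts of 1 KB blocks (an assumption here, not something the log states), anything above 2^31 - 1 KB, about 2 TiB, cannot be represented, and a client working from such fields could plausibly report nonsense such as 0%. A minimal sketch of the arithmetic; `fits_int32` and `pct_used` are hypothetical helpers.

```sh
#!/bin/sh
# Sketch: does a partition size (in 1 KB blocks) fit a signed 32-bit field?
# Shell arithmetic in bash and dash is 64-bit on common platforms, so the
# comparison itself does not overflow.
INT32_MAX=2147483647

fits_int32() {
    [ "$1" -le "$INT32_MAX" ]
}

# pct_used TOTAL_KB FREE_KB -> integer percent used (truncated)
pct_used() {
    echo $(( ($1 - $2) * 100 / $1 ))
}
```

For example, `fits_int32 3000000000` fails for a partition of roughly 2.8 TiB, while `pct_used 1000 250` prints 75.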
[Wed Jan 29 15:27:09 2014] not DFS [Wed Jan 29 15:27:19 2014] opendfs died quickly after release [Fri Jan 31 11:45:18 2014] I get results, but I don't know enough about rx to know if that's what I'm looking for. [Fri Jan 31 12:14:01 2014] neilb: the scripts and info here might help: http://www.eyrie.org/~eagle/software/afs-monitor/ [Fri Jan 31 12:14:28 2014] I think that the check_afs_rxdebug script will have what you want in this case [Fri Jan 31 12:15:55 2014] cclausen: Thanks. I've discovered that sending an XCPU signal to the fileserver process produces a file called callback.dump, but it's not in a form I can readily understand. [Fri Jan 31 12:16:10 2014] cclausen: I'll have a look at those scripts. ta. [Fri Jan 31 12:16:16 2014] the cbd utility can read it [Fri Jan 31 12:17:39 2014] it's not distriuted or installed, but it is built in src/viced if you do your own builds [Fri Jan 31 12:17:53 2014] "distributed" [Fri Jan 31 12:18:35 2014] mvitale: I must have been down this route before, I've got a cbd-afs in my local bin dir! The old memory is failing me! [Fri Jan 31 15:16:49 2014] geekosaur, secureendpoints: yeah, we moved most of our AFS use to umich.edu -- which is also probably going to go away, but not in the next year. [Wed Feb 12 15:28:40 2014] when I open the AFS control panel in Mac, why is it asking me if I want to create ~/Library/LaunchAgents [Wed Feb 12 15:28:53 2014] and what does the "Backgrounder" check box do? [Wed Feb 12 15:29:04 2014] and finally, why does checking this check box just immediately uncheck itself ? [Wed Feb 12 15:29:58 2014] oh, and one more... when I click "AFS Menu", should I see a menu someplace? [Wed Feb 12 15:32:24 2014] I believe AFS Menu gets you a right-click menu on things in afs [Wed Feb 12 15:35:45 2014] hmmm... does not appear to [Wed Feb 12 15:36:22 2014] well, supposed to.
I'm actually poking at afs commander at the moment, it's very bitrotted [Wed Feb 12 15:36:35 2014] parts of it will break things if touched [Wed Feb 12 15:36:43 2014] (avoid the mounts pane) [Wed Feb 12 15:39:55 2014] it's been on my pile of things to work on for a while. i just keep not having time. it should be using SMJobBless for its helper, for instance. [Wed Feb 12 15:47:51 2014] it at least seems to actually work, at the moment [Wed Feb 12 15:48:17 2014] tho, I'm noticing that it, just like my own AFSTokens app, if you try to get new tokens while you have tokens, just re-runs aklog instead of actually first getting new Kerberos tickets [Wed Feb 12 15:48:24 2014] I suspect something changed in Apple's krb libraries [Wed Feb 12 15:48:43 2014] anyway, what about the Backgrounder check box and ~/Library/LaunchAgents dialog? [Wed Feb 12 15:52:52 2014] oh, afs menu is the menubar thingy [Wed Feb 12 15:54:19 2014] what menubar thingy? [Wed Feb 12 15:54:41 2014] a lock icon with a little "K" next to it? [Wed Feb 12 15:54:52 2014] and a red X if you don't currently have a token [Wed Feb 12 15:55:01 2014] nope [Wed Feb 12 15:55:06 2014] nada (Mavericks) [Wed Feb 12 15:55:43 2014] haven't gotten openafs installed on mavericks yet, not real interested in doing so (I have one mavericks machine and one is more than enough...) [Wed Feb 12 15:57:57 2014] well, I have to, since people have started showing up with Macs running Mavericks [Thu Feb 13 03:08:45 2014] Is it expected that inserting a new route on a client to an AFS server (route add -host $AFSSERV gw ...) does not affect rx connections already in progress from the client CM? [Thu Feb 13 03:10:09 2014] "fs checks" insists that the server is down (but a server to which it only later, after the new route was inserted, made a connection is just fine); "cmdebug localhost" hangs.
[Sun Feb 16 00:42:24 2014] Before I go nuking the partitions I used for http://rt.central.org/rt/Ticket/Display.html?id=131804 is there something possibly useful therein? [Mon Feb 17 14:30:04 2014] Can I configure a client to only take a single callback on a RW volume? I expect this client to be the only one writing to this RW volume, so taking per-file callbacks is pretty wasteful. [Mon Feb 17 14:30:43 2014] (It's even OK if I can only do this for all volumes accessed by this particular client; it's a special-purpose worker VM.) [Mon Feb 17 15:09:03 2014] callbacks on rw volumes are per object. [Tue Feb 18 11:22:35 2014] I'm half wondering if this is an afs problem... libreoffice tells me a xls is locked for editing by "unknown user"... but there is no lock file present in the directory [Tue Feb 18 11:26:44 2014] hunh... it correctly finds the lock files aren't there [Tue Feb 18 11:27:10 2014] but attempts to access the file with X_OK (which I assume means lock the file) come back with "permission denied"... even though "k" is in the acl list [Tue Feb 18 11:30:04 2014] ok, no, that's execute, not lock [Tue Feb 18 11:43:16 2014] ah, windows excel actually tells me something useful, like a user name [Tue Feb 18 11:43:30 2014] grr [Tue Feb 18 11:51:39 2014] the user name is written to the .xls file and read back from the .xls file. the .xls file contains a lock field [Tue Feb 18 17:09:43 2014] Is there an "iostat" like tool for AFS fileservers? I'm curious to see what's (which clients, what files) causing my I/O traffic. [Tue Feb 18 17:10:16 2014] Have you considered scraping the rxdebug output? [Tue Feb 18 17:11:46 2014] That'd give me the "which client" part, sure. [Tue Feb 18 17:13:40 2014] But for example, there's a "vos release" happening on this server, and that justifies the 10M/sec read traffic I see, but there's about 1M/sec of write traffic that appears not to be the result of any connection that "rx debug" shows?
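Scraping rxdebug, as suggested, covers the "which client" part. A hedged Python sketch that counts connections per client host. It assumes connection lines of the form `Connection from host A.B.C.D, port N, ...`; that is an assumption about the output format, so adjust the regular expression to whatever your rxdebug actually prints.

```python
# Count rxdebug connections per client host (hypothetical sketch).
# Assumes connection lines look like:
#   Connection from host 192.0.2.10, port 7001, Cuid 8a0c2f11/5d1e2c3c
# Adjust CONN_RE if your rxdebug output differs.
import re
from collections import Counter

CONN_RE = re.compile(r"Connection from host ([^,\s]+), port (\d+)")


def connections_per_host(rxdebug_output: str) -> Counter:
    """Return a Counter mapping client host -> number of connections seen."""
    counts = Counter()
    for line in rxdebug_output.splitlines():
        match = CONN_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

For example, feed it the text printed by `rxdebug <fileserver> 7000 -allconnections` and look at `most_common()`. It only attributes connections, not bytes, so it will not by itself explain traffic such as the copy-on-write writes found below.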
[Tue Feb 18 17:14:19 2014] (Looking at both 7000 and 7005) [Tue Feb 18 17:16:55 2014] Ah ha. The fileserver is CoWing a large file. [Tue Feb 18 17:18:11 2014] Has there been any thought given (I'm sure it's quite low priority) to making the CoW chunk-based rather than whole-file? [Tue Feb 18 17:22:41 2014] Yikes; does this mean 'vos release' is entirely file-based, too? So even though I'm only appending to the end of this large file that's being CoW'd, 'vos release' is going to transfer the whole thing? [Tue Feb 18 17:26:25 2014] most things are file based, yes. block based means a lot more accounting (and a lot more things to go wrong and need to check and try to fix during salvage, etc.) [Tue Feb 18 17:27:00 2014] basically you're not reducing the cost, you're just moving where you pay the cost [Tue Feb 18 17:29:43 2014] Understood. [Tue Feb 18 17:30:16 2014] It's just sad to be on the receiving end of a 55G CoW that only happened because I thought I'd be good and 'vos release' the progress made so far. [Tue Feb 18 17:31:33 2014] Maybe in some future we'll have CoW filesystems that export hooks for this sort of thing. [Tue Feb 18 18:35:14 2014] nwf: you can turn on audit logs and then you will see exactly which clients and which users are doing what to which files [Wed Feb 19 13:34:12 2014] Hi. I'm currently being driven half crazy with a Ubik issue :) [Wed Feb 19 13:35:01 2014] I have an awful (and temporary, but necessary) setup where three servers are behind one NAT and another is behind a second NAT. [Wed Feb 19 13:35:05 2014] Only half? File a bug. ;) [Wed Feb 19 13:35:22 2014] I am able to get the vlserver to sync up, but not ptserver. [Wed Feb 19 13:36:09 2014] udebug and tcpdump show that when the vlserver election is initiated, the master asks everybody for a vote including the isolated server (which is being added temporarily). [Wed Feb 19 13:36:22 2014] Everybody responds and is happy with the outcome in the case of vlserver.
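For the 55G case above, a rough comparison of whole-file versus chunk-based copy-on-write for a single append. This is illustrative arithmetic only: the 1 MiB chunk size and the rule that only pre-existing chunks touched by a write get copied are assumptions, and, as noted in the channel, a chunked scheme pays instead in extra bookkeeping and salvage work.

```python
# Illustrative arithmetic: bytes copied by copy-on-write for one write.
# Assumptions (not OpenAFS behavior): a 1 MiB chunk size, and that a chunked
# scheme copies only the pre-existing chunks the write touches.
MIB = 1 << 20
GIB = 1 << 30


def whole_file_cow_bytes(file_size: int) -> int:
    """Whole-file CoW: the first write after a clone copies the entire file."""
    return file_size


def chunked_cow_bytes(file_size: int, offset: int, length: int, chunk: int = MIB) -> int:
    """Chunked CoW: copy only existing chunks overlapped by [offset, offset + length)."""
    if length <= 0 or file_size <= 0:
        return 0
    first = offset // chunk
    last = (offset + length - 1) // chunk
    last_existing = (file_size - 1) // chunk
    touched = min(last, last_existing) - first + 1
    return max(0, touched) * chunk


if __name__ == "__main__":
    size = 55 * GIB
    print(whole_file_cow_bytes(size) // GIB, "GiB copied (whole file)")
    print(chunked_cow_bytes(size, size, 64 * MIB) // MIB, "MiB copied (chunked append)")
```

An append that starts exactly at the old end of the file touches no existing chunk in this model, so nothing is copied; a write straddling the old end copies only the last chunk.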
[Wed Feb 19 13:36:51 2014] I suggest you set up the server behind the second nat as a clone so it doesn't vote [Wed Feb 19 13:37:31 2014] secureendpoints: That sounds like a good idea given how temporary this is, how is that accomplished? [Wed Feb 19 13:38:02 2014] in the CellServDB enclose the server's address in [addr] [Wed Feb 19 13:38:38 2014] all servers must have the same CellServDB [Wed Feb 19 13:39:05 2014] To finish the story, the network has a split mind when it comes to ptserver. The isolated server feels lonely since it is not getting Rx-pinged and initiates an election. Everyone else votes no and it shrugs its shoulders for a while. The master on the other hand initiates its own election and only invites the existing servers. It does not Rx-ping or ask the isolated server for a vote. [Wed Feb 19 13:39:19 2014] secureendpoints: All CellServDB are in sync, yes. [Wed Feb 19 13:39:44 2014] secureendpoints: All servers have a 1:1 external IP:internal IP mapping as well with that mapping reflected in NetInfo. [Wed Feb 19 13:40:58 2014] secureendpoints: Should I delete the existing prdb on the clone? [Wed Feb 19 13:42:16 2014] secureendpoints: And will the clone be easily converted to a master when the other network is once again disconnected, perhaps by simply removing the [ ]? [Wed Feb 19 13:54:13 2014] Server (64.251.151.187): (db 0.0) is only a clone! last vote never rcvd last beacon never sent dbcurrent=0, up=0 beaconSince=0 [Wed Feb 19 13:54:36 2014] This is what the ptserver master has to say about the remote server now. [Wed Feb 19 13:56:10 2014] Are you sure the NATs are correctly set up? Can you rxdebug/udebug from the master to the isolated server on the ptserver port? (Just guessing based on similar woes in the past) [Wed Feb 19 13:58:03 2014] I can never be sure the NATs are correctly set up since there is proprietary hardware in the middle. I do receive quite a bit of port 7002,7003 traffic on the isolated server.
However, I have noticed that all the traffic that is received has the property that the source port is between 7000 and 7009. [Wed Feb 19 13:59:15 2014] One of the routers is a Cisco appliance and the other is a Linux router. [Wed Feb 19 14:04:14 2014] I can pts listent and vos listvol from the isolated server and it appears to work correctly. What I am concerned about is continued correct operation once this server is disconnected [Wed Feb 19 14:06:28 2014] I don't understand what you are attempting to accomplish. A ubik server that is disconnected from its cell and unable to obtain quorum can only work in a read only mode. It cannot become a master and cannot accept rpcs that request changes to the database. [Wed Feb 19 14:08:44 2014] Basically I want to clone a set of volumes to the isolated server as well as the vldb and prdb so that the cell can operate in a limited fashion while the other servers are taken offline for an extended period of time. It would become the ubik master and the clones would be converted to RW volumes once the RW masters are offline. [Wed Feb 19 14:09:18 2014] that isn't how it works [Wed Feb 19 14:10:00 2014] if you want to create a new cell with content from an existing cell, dump the volumes from the existing cell and restore them into the new cell [Wed Feb 19 14:10:34 2014] I.e. afs_newcell with the same cell name? [Wed Feb 19 14:11:41 2014] cells do not have names except from the view point of clients. A cell name is just a mapping for a key and a list of db servers that should be contacted by the client [Wed Feb 19 14:12:11 2014] Right, the main problem is application paths would have to be reworked extensively. [Wed Feb 19 14:12:34 2014] Unless there were some way to map /afs/newcell.com to /afs/oldcell.com [Wed Feb 19 14:12:57 2014] Maybe with dynroot disabled? [Wed Feb 19 14:13:06 2014] why bother? 
just give clients on different networks a different cellservdb [Wed Feb 19 14:14:24 2014] And copy the afs/OLDCELL.COM krb5 key to afs/NEWCELL.COM ? [Wed Feb 19 14:14:56 2014] I'm sorry but I do not have time to design a cell and kerberos architecture for you [Wed Feb 19 14:16:12 2014] I already have a cell and I didn't specifically ask for your help or time, though I appreciate your above responses. [Wed Feb 19 14:22:37 2014] you have a cell but a cell cannot operate in two different partitioned networks at the same time the way you want it to. if you want rw copies of the data on both sides of the partitioned network you need two cells not one and you must manage the replication and synchronization of data between the two worlds outside of AFS management tools. [Wed Feb 19 14:23:32 2014] you can give each cell the same name and preserve your paths provided that you can distribute to your clients a separate list of db servers for each cell. [Wed Feb 19 14:24:09 2014] if you are using dns, you need split horizon support to provide different answers to requests from different networks. otherwise, you can distribute CellServDB files [Wed Feb 19 14:24:11 2014] I don't need or want two RW copies at once. I know AFS is not multi-master. I was going to convert RO to RW as a temporary measure while RW masters are down, but that's peripheral. [Wed Feb 19 14:24:32 2014] then why convert? just vos move [Wed Feb 19 14:25:15 2014] if you convert a RO to a RW, then changes to RW-prime can be made which will not be reflected in the original RW [Wed Feb 19 14:26:02 2014] If you want RO copies on the disconnected network, that is fine. put one db clone there. it can't be a master since it won't have quorum. [Wed Feb 19 14:27:25 2014] So for the sake of argument and without getting mired in logistical details let's simplify it to RO copies. That's fine as a basis. 
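secureendpoints' suggestion is to mark the remote database server as a non-voting clone by bracketing its address in CellServDB, with the same file on every server. A minimal Python parser for that layout, assuming only `>cell #comment` lines and `address #hostname` lines (no validation, no DNS lookups); the `DbServer` and `Cell` names are hypothetical.

```python
# Parse CellServDB text, treating [bracketed] addresses as non-voting clones.
# Sketch only: handles ">cell #comment" and "address #hostname" lines.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DbServer:
    address: str
    hostname: str
    clone: bool


@dataclass
class Cell:
    name: str
    servers: List[DbServer] = field(default_factory=list)


def parse_cellservdb(text: str) -> Dict[str, Cell]:
    cells: Dict[str, Cell] = {}
    current = None
    for raw in text.splitlines():
        body, _, comment = raw.partition("#")
        body, comment = body.strip(), comment.strip()
        if not body:
            continue  # blank or comment-only line
        if body.startswith(">"):
            current = Cell(body[1:].strip())
            cells[current.name] = current
        elif current is not None:
            clone = body.startswith("[") and body.endswith("]")
            address = body[1:-1].strip() if clone else body
            current.servers.append(DbServer(address, comment, clone))
    return cells
```

A quick way to use it is to parse the CellServDB from each server and compare the results, since the channel stresses that every server must carry the same file.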
[Wed Feb 19 14:28:16 2014] create the clone db that you already have, and vos release volumes to file server(s) on the partitioned network [Wed Feb 19 14:29:09 2014] I have a copy of vldb and prdb (manually copied since ubik wouldn't sync) on the isolated server. I should in theory be able to clone RO volumes to it at this point. [Wed Feb 19 14:30:05 2014] Now after I did that, would I be able to remove the indefinitely dead servers from the server/CellServDB, leaving only the isolated server, and expect it to become ptserver/vlserver master? [Wed Feb 19 14:30:24 2014] no [Wed Feb 19 14:30:56 2014] not if you ever want to reconnect the other servers [Wed Feb 19 14:31:16 2014] So quorum is based on more than who's in the server/CellServDB? [Wed Feb 19 14:31:38 2014] quorum requires a majority vote of all db servers in the cell [Wed Feb 19 14:32:05 2014] excluding clones and the server with the lowest IP address gets an extra half vote [Wed Feb 19 14:33:26 2014] I would expect when I reconnected the other servers I would update the server/CellServDB to add them back in. Then they would participate in quorum election again. Is that wrong? [Wed Feb 19 14:33:37 2014] if you turn the db servers with copies of the dbs into a master when there is another master, the dbs will end up forking and you will not be able to resync them except by destroying one of the db variants [Wed Feb 19 14:34:48 2014] That's okay though, because one of the forks is offline the whole time. [Wed Feb 19 14:35:41 2014] but it isn't offline if you are trying to sync them now between network a and network b [Wed Feb 19 14:36:39 2014] if you shut down the servers in A; copy the db to B and do not turn A back on, that is fine. It requires that you update the CellServDB info for all servers and clients to refer to the server in B [Wed Feb 19 14:36:59 2014] in that case there is no sync'ing between A and B [Wed Feb 19 14:38:29 2014] Correct, that was the basic plan.
As far as coming back online I imagined manually copying the prdb/vldb back to the existing servers before bringing up AFS. Is that a bad plan? [Wed Feb 19 14:39:37 2014] you need to make sure that you shut down B, clone the DB across the various servers and preferably bring up the server with the lowest IP address first [Wed Feb 19 14:40:15 2014] "clone the DB" as in a manual file copy will be okay? [Wed Feb 19 14:42:28 2014] manual copy to all servers [Wed Feb 19 14:44:20 2014] Okay, I'm confident in the general plan now. The first problem I ran into was releasing root.afs, which led me back to the ubik problems. [Wed Feb 19 14:44:58 2014] Wed Feb 19 13:06:05 2014 VReadVolumeDiskHeader: Couldn't open header for volume 536870913 (errno 2). Wed Feb 19 13:06:05 2014 1 Volser: CreateVolume: volume 536870913 (root.afs.readonly) created [Wed Feb 19 14:45:24 2014] # vos release root.afs -localauth Release failed: VOLSER: Problems encountered in doing the dump ! The volume 536870912 could not be released to the following 1 sites: xxx.net /vicepa VOLSER: release could not be completed Error in vos release command. VOLSER: release could not be completed [Wed Feb 19 14:46:02 2014] But strangely, I just tried it again with -verbose and it worked. Lol. [Wed Feb 19 14:47:13 2014] Oh, that's because addsite failed. [Wed Feb 19 14:47:31 2014] # vos addsite newhost a root.afs -localauth Could not lock the VLDB entry for the volume 536870912 VLDB: vldb entry is already locked Error in vos addsite command. VLDB: vldb entry is already locked [Wed Feb 19 14:49:37 2014] Hmm, udebug newhost 7003 from itself to itself gets no response. [Wed Feb 19 14:50:10 2014] 13:49:13.424462 IP 192.168.10.5.37196 > 192.168.5.10.7003: rx data vldb ubik call vote-xdebug (32) 13:49:17.828101 IP 192.168.10.5.37196 > 192.168.5.10.7003: rx ack first 1 serial 0 reason ping (65) [Wed Feb 19 14:50:57 2014] vlserver is definitely listening on 0.0.0.0 and responding to other hosts in tcpdump, just not to itself.
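The quorum rule as stated in the channel: only non-clone database servers vote, the one with the lowest IP address carries an extra half vote, and winning takes more than half of the total. A Python sketch of just that arithmetic, comparing addresses numerically; real Ubik also involves beacons, timeouts, and database versions, none of which are modeled here. The function name is hypothetical.

```python
# Ubik-style quorum arithmetic as described in the channel (sketch only).
# Real Ubik also uses beacons, timeouts and database versions; not modeled here.
import ipaddress


def has_quorum(servers, yes_votes, clones=()):
    """True if the yes votes form a majority of the voting database servers.

    Clones do not vote. The voting server with the numerically lowest IP
    address carries an extra half vote, which breaks ties.
    """
    clone_set, yes_set = set(clones), set(yes_votes)
    voters = [s for s in servers if s not in clone_set]
    if not voters:
        return False
    lowest = min(voters, key=ipaddress.ip_address)
    total = len(voters) + 0.5
    yes = sum(1.5 if s == lowest else 1.0 for s in voters if s in yes_set)
    return yes > total / 2
```

In a two-server cell the lowest-addressed server wins alone (1.5 of 2.5 votes) while the other cannot, and a clone voting for itself on the disconnected network never reaches quorum, which matches the advice that the isolated server cannot become master.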
[Wed Feb 19 14:51:07 2014] not to the udebug request [Wed Feb 19 15:11:43 2014] Server (64.251.151.187): (db 1361138614.2) is only a clone! last vote rcvd 1 secs ago (at Wed Feb 19 14:11:04 2014), last beacon sent 0 secs ago (at Wed Feb 19 14:11:05 2014), last vote was yes dbcurrent=1, up=1 beaconSince=1 [Wed Feb 19 15:12:01 2014] Now ptserver is synced. [Wed Feb 19 15:12:16 2014] But vlserver is now the confused one. [Wed Feb 19 15:12:18 2014] Server (64.251.151.187): (db 0.0) is only a clone! last vote never rcvd last beacon never sent dbcurrent=0, up=0 beaconSince=0 [Wed Feb 19 15:19:48 2014] Now it's doing on vlserver what it was doing before on ptserver with the partitioned election. Bizarre. [Wed Feb 19 15:46:45 2014] Wed Feb 19 14:45:28 2014 ubik:Two primary addresses for same server 10.0.1.230 10.0.1.233 [Wed Feb 19 15:47:58 2014] I'm getting this in both ptserver and vlserver logs now. Looking at the code it's the result of a Rx-multi call. I have no idea how those work. [Wed Feb 19 15:50:57 2014] Looking at strace vlserver it looks like it sends a similar length packet to all db servers, receives a reply from each which it replies to, and then one more round of reply/reply before exiting with the error. [Wed Feb 19 15:56:45 2014] I guess the more basic question is when you have servers behind different NATs, how do you configure your server/CellServDB? [Wed Feb 19 15:58:53 2014] The problem I ran into is that the file has to be in sync on all servers but the addresses also need to be mutually reachable by all servers which is impossible when you have two different NATs involved. [Wed Feb 19 15:59:49 2014] So on the isolated server I rewrite all Rx traffic it generates to the public NAT addresses of the existing cluster so that it reaches them. [Wed Feb 19 16:00:36 2014] And use its private IP address in the server/CellServDB. I wonder if I'm doing it wrong though. 
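The constraint runderwo describes is that one shared CellServDB must list addresses that every server can reach, including itself. A small Python check under that assumption; `reachable` is a hypothetical map from each server to the addresses it has been seen to reach (for instance by running udebug from that server), and the sample data in the test is made up.

```python
# Check that every server can reach every address listed in the shared CellServDB.
# `reachable` is a hypothetical map: server name -> set of addresses it has been
# observed to reach (for example by running udebug from that server).


def unreachable_pairs(cellservdb_addrs, reachable):
    """Return sorted (server, address) pairs that fail the mutual-reachability rule."""
    missing = []
    for server in sorted(reachable):
        seen = reachable[server]
        for addr in cellservdb_addrs:
            if addr not in seen:
                missing.append((server, addr))
    return missing
```

An empty result means the listed addresses work from everywhere; with private addresses behind two different NATs, some pairs will always show up here.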
[Wed Feb 19 16:16:46 2014] I run some ubik DBs behind NAT and while the NetInfo files have the "private\nf public" dance, the CellServDB enumerates only public addresses and the NAT gateway does hairpin NAT. It sucks, but it appears to work. [Wed Feb 19 16:19:33 2014] (The same hairpin NAT dance appears necessary for the file-servers, too, to work around the exclusive use of IP addresses and the lack of split-horizon capability within the VLDB. This is pretty gross for big behind-NAT transfers -- it implies triangular flows that really ought not be -- but we don't do so many of those.) [Wed Feb 19 16:26:05 2014] secureendpoints: Do YFSI's VLDBs still use IP addresses or DNS entries for the servers? [Wed Feb 19 16:26:10 2014] nwf: Is that the server/CellServDB you're referring to? [Wed Feb 19 16:26:16 2014] runderwo: Yes. [Wed Feb 19 16:26:18 2014] nwf: That should have public addresses only? [Wed Feb 19 16:26:33 2014] nwf: I will try that. [Wed Feb 19 16:27:11 2014] Ubik seems to have a definition of a "primary address" that I am not privy to. [Wed Feb 19 16:27:17 2014] We found it easier to use public addresses everywhere and bastardize our network configuration so that it worked -- you will not get very far with this configuration without hairpin NATting. [Wed Feb 19 16:27:38 2014] I confess that I do not understand Ubik well enough to be informative. :) [Wed Feb 19 16:28:08 2014] We got to our current setup by a combination of error and trial (placing emphasis where it belongs) and by me pestering this channel with questions. ;) [Wed Feb 19 16:28:26 2014] Any skill involved in arriving at this answer belongs to other occupants of the channel. [Wed Feb 19 16:30:51 2014] I do the 'hairpin' thing already for some roaming situations where the client would otherwise have to have separate outside-the-nat and inside-the-nat configurations. I actually did not know it had a name. 
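nwf's NetInfo layout is the private address on one line and the public address on a line prefixed with `f `. A Python sketch that writes and reads that layout. Our reading is that the `f ` prefix asks the server to register an address it does not own locally; treat that as an assumption and check the NetInfo documentation for your release. The function names and addresses are hypothetical.

```python
# Write and read the NetInfo layout described above: the private address on
# one line, the public address on a line prefixed with "f ".
# Assumption: "f " marks an address to register even though it is not bound
# locally; confirm against the NetInfo documentation for your release.


def render_netinfo(private_addr: str, public_addr: str) -> str:
    return f"{private_addr}\nf {public_addr}\n"


def parse_netinfo(text: str):
    """Return (local_addresses, forced_addresses) from NetInfo text."""
    local, forced = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("f "):
            forced.append(line[2:].strip())
        else:
            local.append(line)
    return local, forced
```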
[Wed Feb 19 16:31:59 2014] Funny, the lowest IP address host is still the lowest IP address host using public IP addresses. [Wed Feb 19 16:32:23 2014] That's probably for the best. [Wed Feb 19 16:33:04 2014] Wed Feb 19 15:32:05 2014 Using 64.251.151.187 as my primary address Wed Feb 19 15:32:05 2014 ubik:Two primary addresses for same server 24.249.157.110 24.249.157.96 vlserver: Ubik init failed: problems with host name [Wed Feb 19 16:33:16 2014] From VLLog on the new/isolated host. ptserver came up fine. [Wed Feb 19 16:37:16 2014] udebug 64.251.151.187 7002/7003 from behind the 24.x.x.x nat times out. The other direction is fine. But it looks like no one can decide on a vlserver master. [Wed Feb 19 16:37:24 2014] Everyone claims they are not sync site. [Wed Feb 19 16:38:25 2014] 64.251.151.187 thinks 24.249.157.110 is sync site, but 24.249.157.110 thinks it is not and has a sync host 0.0.0.0. [Wed Feb 19 16:44:22 2014] tcpdumping on the isolated host, i can see vlserver traffic on port 7003 from automated replicas originating at the existing cluster. But it does not respond to a udebug 64.251.151.187 7003 originating from a shell on the existing cluster. [Wed Feb 19 16:51:22 2014] Ok, so .95, .96 and .110 have different ideas of who the lowest host is (each thinks it's himself). [Wed Feb 19 16:51:39 2014] But only on vlserver. On ptserver ubik is working. [Wed Feb 19 16:52:50 2014] Each one can udebug the others via the hairpin, but ubik doesn't seem to see the others. [Wed Feb 19 17:02:33 2014] Why does udebug default to port 3000? Is that port important at all? [Wed Feb 19 17:03:25 2014] It's not even listed on http://www.central.org/numbers/index.html [Wed Feb 19 17:06:24 2014] Okay, I shut down one of the servers that was in the "Two primary addresses" message above. Now I have quorum on both vlserver and ptserver. But vlserver still has problems. 
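Two things can make servers disagree about which host is lowest: each one only compares the addresses it can currently see, and comparing dotted quads as strings orders them wrongly (`24.249.157.110` sorts before `24.249.157.96`). A Python sketch of both effects; the per-server views and the 192.0.2.x addresses are hypothetical.

```python
# Choosing the "lowest" database server address (sketch).
import ipaddress


def lowest_address(addresses):
    """Numerically lowest IP address; plain min() would compare strings."""
    return min(addresses, key=ipaddress.ip_address)


def lowest_per_view(views):
    """Each server picks the lowest address among itself and the peers it sees."""
    return {me: lowest_address(set(peers) | {me}) for me, peers in views.items()}
```

If no server sees any peer, every server picks itself, which is the "each thinks it's himself" state reported above.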
[Wed Feb 19 17:06:31 2014] Recovery state f [Wed Feb 19 17:06:38 2014] Server (64.251.151.187 192.168.10.5): (db 0.0) last vote never rcvd last beacon never sent dbcurrent=0, up=0 beaconSince=0 [Wed Feb 19 17:06:49 2014] Same server is happy on ptserver. [Wed Feb 19 19:44:00 2014] Ubik is synced. I used the public IP addresses with hairpinning as nwf suggested and also got access to the cisco router on the other end finally, and found... Rx ports forwarded to the wrong IP address and with TCP only checked. :/ [Wed Feb 19 19:45:02 2014] So no traffic initiated from the cluster would have made it to the isolated host. Only if the isolated host initiated a connection out to one of the cluster hosts would the stateful UDP tracking allow related packets to pass the firewall. [Wed Feb 19 19:45:14 2014] In retrospect it's a little surprising anything worked at all. [Wed Feb 19 19:50:49 2014] Now vos addsite/release is attempting to use internal IP addresses, so I'll have to change those in the volume headers too. Is there a quick way to rip through them all? [Wed Feb 19 19:51:44 2014] Otherwise I script a vos changeaddr based on vos listvol output for all fileservers. [Wed Feb 19 19:52:31 2014] Or just manually, there's only three. [Wed Feb 19 19:53:17 2014] Easy enough. [Wed Feb 19 19:55:41 2014] So then I'm back to trying to vos release and getting an "error 2" (ENOENT?) [Wed Feb 19 19:55:56 2014] VolserLog says Wed Feb 19 18:53:45 2014 VReadVolumeDiskHeader: Couldn't open header for volume 536870913 (errno 2). Wed Feb 19 18:53:45 2014 1 Volser: CreateVolume: volume 536870913 (root.afs.readonly) created [Wed Feb 19 19:56:43 2014] vicepa contains now -rw-r--r-- 1 root root 76 Feb 19 18:53 V0536870913.vol [Wed Feb 19 19:59:08 2014] I looked at both vos and volserver strace output but couldn't tell who is responsible for creating the volume header. [Wed Feb 19 21:07:16 2014] How do I get rid of a vos site that's pointing to 127.0.1.1? 
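The eventual root cause was Rx ports forwarded to the wrong address and only for TCP, so only replies to UDP traffic that the isolated host itself started could get back in. A toy Python model of a stateful UDP firewall showing that behavior; the class, the addresses, and the port range are illustrative assumptions.

```python
# Toy model of a stateful UDP firewall with optional port forwards (sketch).
# Inbound packets pass only if a forward exists for the inside port or they
# answer a flow that the inside host started.


class StatefulUdpFirewall:
    def __init__(self, udp_forwards=()):
        # Inside ports open to unsolicited inbound UDP.
        self.udp_forwards = set(udp_forwards)
        # (inside_port, remote_addr, remote_port) flows seen going outbound.
        self.flows = set()

    def outbound(self, inside_port, remote_addr, remote_port):
        self.flows.add((inside_port, remote_addr, remote_port))

    def inbound_allowed(self, remote_addr, remote_port, inside_port):
        if inside_port in self.udp_forwards:
            return True
        return (inside_port, remote_addr, remote_port) in self.flows
```

Without a UDP forward, a vote request from the cluster is dropped unless the isolated host happened to contact that exact peer first, which is why some traffic flowed and some did not.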
[Wed Feb 19 21:07:23 2014] server 127.0.1.1 partition /vicepa RO Site -- Old release [Wed Feb 19 21:07:35 2014] # vos remsite 127.0.1.1 a root.afs -localauth This site is not a replication site Error in vos remsite command. VOLSER: illegal operation [Wed Feb 19 21:30:48 2014] vos remove doesn't believe it's there. [Wed Feb 19 21:30:57 2014] # vos remove 64.251.151.187 a root.afs -localauth Volume 536870912 does not exist on server and partition VOLSER: no such volume - location specified incorrectly or volume does not exist [Wed Feb 19 21:31:44 2014] nor by specifying the loopback adapter as the server in vos remove. [Wed Feb 19 21:41:58 2014] The problem was vos syncvldb needed the volume literally specified, not only the partition containing it. *sigh* [Wed Feb 19 21:42:42 2014] Volume cloned successfully! :-) Beer time. [Wed Feb 19 21:53:06 2014] Mass RO clone operation started. Thanks for the help nwf and secureendpoints. I hope my diary above gets logged somewhere to help someone else. [Fri Feb 21 00:46:08 2014] Does anyone have tools for keeping LDAP and PTS in sync, or at least detecting inconsistencies between the pair? [Fri Feb 21 02:51:47 2014] nwf: would http://clue.eng.iastate.edu/ptsldap/ work for you? [Fri Feb 21 02:54:25 2014] Oh ho ho, that looks quite useful. [Fri Feb 21 02:55:24 2014] So we'd create rcmd.XXXX and so on also in LDAP for this? That seems pretty sensible... [Fri Feb 21 12:23:44 2014] I just found out I was not supposed to be using vos changeaddr. Now I'm in a multihomed situation and can strace the server process and see it only responding on the interface that has been selected with vos changeaddr. How do I fix it? [Fri Feb 21 12:24:46 2014] In my memory I've used this command a few times throughout the years as a troubleshooting measure but was unaware of the side effects. :( [Fri Feb 21 12:25:18 2014] Curiously, one of the servers prevents it. But the others allow it. 
I am guessing this is a guard measure that was put in more recently. [Fri Feb 21 12:25:21 2014] # vos changeaddr 24.249.157.110 10.0.1.232 -localauth Could not change server 24.249.157.110 to server 10.0.1.232 vl: Servers have the same ip address [Fri Feb 21 12:27:09 2014] Curiously, vos listaddrs is empty on every machine. [Fri Feb 21 12:36:38 2014] If I restart, it seems like it should re-read NetInfo and re-register the missing server. http://www.secure-endpoints.com/openafs/AdminGuide/ch03s09.html [Fri Feb 21 16:12:21 2014] Tracked it down to this problem. https://lists.openafs.org/pipermail/openafs-info/2006-July/023070.html [Fri Feb 21 16:12:36 2014] One IP address was actually multiple versions of that same IP address in the VLDB. [Fri Feb 21 16:19:40 2014] Unfortunately, the solution to get rid of the phantom address in the VLDB doesn't work. [Fri Feb 21 16:37:03 2014] Wow, what a giant pain in the @#%. [Fri Feb 21 16:38:31 2014] I got it working, but basically had to decide which volumes I could get rid of, temporarily changeaddr the dummy IP to the real server so those volumes were accessible, vos remove those volumes, vos changeaddr -delete the IP of the server, then vos changeaddr the other dummy IP to the server's real IP. Then restore the volumes. [Fri Feb 21 16:38:49 2014] The link I posted was big help but not the whole answer. [Fri Feb 21 17:32:24 2014] Anyway thanks for the help everyone, it looks like everything is good now. Two clusters behind NAT with one now vos releas'ing an avalanche to the other. [Sun Feb 23 15:27:19 2014] Hi, we were running openafs (cell name: sub.domain.tld) fileserver with kerberos server (REALM name: SUB.DOMAIN.TLD) and everything was working fine. 
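The phantom-address problem above comes down to one IP address appearing under more than one VLDB server entry. A Python sketch that flags such addresses, assuming the server entries have already been parsed into a map from an entry identifier (such as the UUID that `vos listaddrs -printuuid` prints) to its addresses. The parsing step is left out, and the sample identifiers in the test are made up.

```python
# Find addresses that appear under more than one VLDB server entry (sketch).
# `entries` is a hypothetical, already-parsed map: entry id -> list of addresses.
from collections import defaultdict


def duplicate_addresses(entries):
    """Return {address: sorted entry ids} for every address listed more than once."""
    owners = defaultdict(list)
    for entry_id, addrs in entries.items():
        for addr in set(addrs):
            owners[addr].append(entry_id)
    return {addr: sorted(ids) for addr, ids in owners.items() if len(ids) > 1}
```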
[Sun Feb 23 15:28:07 2014] Now we have new kerberos server (REALM name: DOMAIN.TLD) and we want to migrate openafs to this new kerberos REALM with all users and afs data [Sun Feb 23 15:28:41 2014] we have working new kerberos server with DOMAIN.TLD and the same usernames and passwords as on the old kerberos server [Sun Feb 23 15:29:47 2014] on the new kerberos we have added afs/sub.domain.tld@DOMAIN.TLD and host/afsserver.sub.domain.tld@DOMAIN.TLD [Sun Feb 23 15:30:31 2014] keys were exported (ktadd ...) and set with asetkey and keytab for server to /etc/krb5.keytab [Sun Feb 23 15:31:33 2014] everything looks fine, kinit -k -t afs.keytab afs/sub.domain.tld works fine [Sun Feb 23 15:31:48 2014] also user can obtain ticket and tokens with kinit/aklog [Sun Feb 23 15:32:24 2014] problem is, that user with ticket/token cannot access data in /afs/sub.domain.tld/user/$USERNAME [Sun Feb 23 15:33:19 2014] ls: cannot open directory /afs/sub.domain.tld/user/$USERNAME: Permission denied [Sun Feb 23 15:34:43 2014] any idea, what could be wrong? [Sun Feb 23 15:39:34 2014] migration was done between the same MIT kerberos servers [Sun Feb 23 15:47:31 2014] frido_: do you have a krb.conf file with the new realm name in it? [Sun Feb 23 15:47:50 2014] http://docs.openafs.org/Reference/5/krb.conf.html [Sun Feb 23 15:52:06 2014] cclausen_: yes [Sun Feb 23 15:53:29 2014] I will try to create optional krb.conf [Sun Feb 23 16:02:00 2014] cclausen_: hm, I have created krb.conf with DOMAIN.TLD realm in /etc/openafs and /etc/openafs/server (debian wheezy) and it works now (both files must exist) [Sun Feb 23 16:02:19 2014] cclausen_: it means, that I have something wrong in global /etc/krb5.conf?
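The fix works because of the rule geekosaur gives in reply: OpenAFS only needs krb.conf when several Kerberos realms should authenticate to the cell, or when the realm is not simply the cell name in upper case. A Python sketch of that check; the function name is hypothetical.

```python
# The krb.conf rule as stated in the channel (sketch; hypothetical function name).


def needs_krb_conf(cell: str, realms) -> bool:
    """True if OpenAFS needs an explicit krb.conf for this cell/realm setup."""
    realms = list(realms)
    if not realms:
        raise ValueError("at least one Kerberos realm is required")
    # Needed when several realms authenticate to the cell, or when the single
    # realm is not just the cell name in upper case.
    return len(realms) > 1 or realms[0] != cell.upper()
```

For the migration above, cell `sub.domain.tld` with realm `DOMAIN.TLD` fails the upper-case test, so the file is required.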
[Sun Feb 23 16:19:04 2014] no [Sun Feb 23 16:19:37 2014] you only need that file when you have multiple kerberos realms that can auth to openafs or when your kerberos realm isn't named as the uppercase of your cell [Sun Feb 23 16:19:49 2014] it's an openafs config file, despite the name [Sun Feb 23 16:24:27 2014] geekosaur: ah ok, so it's b/c of different name - cell: sub.domain.tld REALM: DOMAIN.TLD [Sun Feb 23 16:24:35 2014] yes [Sun Feb 23 16:25:21 2014] couchside: anyone else? [Sun Feb 23 16:25:47 2014] mischan? [Sun Feb 23 16:26:01 2014] indeed [Sun Feb 23 16:26:19 2014] btw, in /etc/krb5.conf should I have in [domain_realm] section now: .sub.domain.tld = DOMAIN.TLD and sub.domain.tld = DOMAIN.TLD? [Sun Feb 23 16:27:02 2014] if machines in that subdomain should be considered to be in that realm, yes [Sun Feb 23 16:28:55 2014] ok [Sun Feb 23 16:29:04 2014] cclausen_ geekosaur thank you [Sun Feb 23 16:29:36 2014] (sorry I wasn't about earlier) [Sun Feb 23 17:09:04 2014] frido_: I got distracted with actual work work a few minutes after sending that. glad that fixed your issue though as that was my only suggestion [Sun Feb 23 19:13:29 2014] Hey, I see dozens of requests for service tickets for afs/cell@REALM in my kerberos logs, even though the service uses the afs@REALM principal - anyone know why and how this can be stopped? [Sun Feb 23 19:24:57 2014] remove the code from your local copy of aklog? [Sun Feb 23 19:26:20 2014] aklog tries afs/cell first because that has been the recommended principal for years [Sun Feb 23 19:29:14 2014] hmm ok, thought there was some missing option on my side :-) I will recommend to my afs colleagues then to use the afs/cell style in the future to reduce the amount of logspam :-) [Mon Feb 24 10:21:04 2014] If I run a "pts rename" on a user, is the user appropriately renamed in all of their group memberships as well?
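A simplified Python sketch of the lookup order described above. aklog asks for `afs/<cell>@REALM` first and only then for `afs@REALM`, so a cell keyed only as `afs@REALM` costs one failed service-ticket request, and thus one KDC log entry, per attempt. Real aklog handles more cases than this, and the function names are hypothetical.

```python
# Simplified order of AFS service principals that aklog tries (sketch).


def candidate_principals(cell: str, realm: str):
    return [f"afs/{cell}@{realm}", f"afs@{realm}"]


def first_available(cell: str, realm: str, kdc_principals):
    """Return (principal, failed_requests) for the first principal the KDC knows."""
    failed = 0
    for principal in candidate_principals(cell, realm):
        if principal in kdc_principals:
            return principal, failed
        failed += 1
    return None, failed
```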
[Mon Feb 24 10:22:06 2014] they're stored by id, not name [Mon Feb 24 10:22:42 2014] which is actually the source of a gotcha: if you remove a pts user, they're not removed from pts groups, so if you reuse the id for a new user they end up in all the old groups [Mon Feb 24 10:30:47 2014] right... just checking [Mon Feb 24 10:35:53 2014] damn, 1.7.29 still seems to have the bug where I can't copy files back into afs (not enough free space) [Mon Feb 24 10:38:53 2014] hunh... disconnect the share and remap it and problem solved [Mon Feb 24 12:36:32 2014] redfyre: i have said it before and will say it again. the bug is in the explorer shell caching code. it is microsoft's bug. there is nothing that a file system can do to work around it. if you paste the file using the context menu you won't trigger it. [Mon Feb 24 13:03:46 2014] secureendpoints1 - ok, I hadn't tried it that way... I tried ctrl-v and dragging the file in [Mon Feb 24 13:04:03 2014] interesting that windows thinks that the drive mapped to our cell only has 11 KB free [Mon Feb 24 16:24:43 2014] drive letters are not mapped to cells. they are mapped to volumes and volumes have a partition size, a quota and free space [Mon Feb 24 16:25:16 2014] querying the free space of a drive letter mapping only gets you the volume it refers to. [Tue Feb 25 01:27:47 2014] I have another dumb question. I've got a volume that's salvaging right now after an unclean shutdown (read: kernel panic) and a client is still visibly engaging in operations (which are failing with -19 = ENODEV) against this busy volume while simultaneously claiming to be hard-mount waiting for the busy volume. [Tue Feb 25 01:27:56 2014] What... what's happening? [Tue Feb 25 01:39:30 2014] On the same volume, I get spurious EPERM errors (fs la shows that the client should be able to write the file) and the server logs are quiet during that time. [Tue Feb 25 01:39:52 2014] (The salvager is not running during these EPERMs AFAICT.)
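The "stored by id, not name" answer and the id-reuse gotcha can be illustrated with a toy Python model. It is not the ptserver's data structure; it only mirrors the behaviour described in the log (rename touches the name-to-id mapping only, and deleting a user leaves its id in group member lists), which we have not checked against ptserver itself. `ToyPTDB` and its methods are hypothetical.

```python
class ToyPTDB:
    """Toy model: names resolve to ids; group member lists hold ids only."""

    def __init__(self):
        self.ids = {}      # user name -> numeric id
        self.groups = {}   # group name -> set of member ids

    def create_user(self, name, uid):
        self.ids[name] = uid

    def rename(self, old, new):
        # Only the name -> id mapping changes; memberships follow the id.
        self.ids[new] = self.ids.pop(old)

    def delete_user(self, name):
        # As described in the log: the entry goes, the id stays in groups.
        del self.ids[name]

    def add_to_group(self, name, group):
        self.groups.setdefault(group, set()).add(self.ids[name])

    def groups_of(self, name):
        uid = self.ids[name]
        return sorted(g for g, members in self.groups.items() if uid in members)


db = ToyPTDB()
db.create_user("alice", 1001)
db.add_to_group("alice", "staff")
db.rename("alice", "alice.old")
print(db.groups_of("alice.old"))   # ['staff']
db.delete_user("alice.old")
db.create_user("bob", 1001)        # id reused for a new user
print(db.groups_of("bob"))         # ['staff']
```

The practical lesson from the log is the last two lines: avoid reusing the id of a removed user.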
[Tue Feb 25 01:40:25 2014] If I re-run the rsync that's so troubled, things will usually behave correctly the next time around. [Tue Feb 25 03:57:36 2014] nwf: that sounds like the salvager did not fix everything... [Tue Feb 25 03:59:08 2014] nwf: if there is extensive damage after a crash then your underlying storage and filesystem perhaps were not honoring 'fsync', so perhaps they are in a bad state. from previous discussions here it seems that the OpenAFS cache on the client daemons are very careful about doing 'fsync'. [Tue Feb 25 04:33:15 2014] Walex: I'm forcing a client to walk the entire (1.5TB) volume now, in hopes of shaking out any lingering directory corruption. But I'm more concerned about the ENODEV-during-salvager-running. [Tue Feb 25 04:34:31 2014] (Also I have tried to make our storage stack simpler, so we're using ext4 on RBDs and every part of that pipeline claims to honor fsync...) [Tue Feb 25 04:41:53 2014] nwf: uhmmmmmm RBDs, but it might well work :-) [Tue Feb 25 04:43:28 2014] nwf: as to the ENODEV that's slightly worrying but I guess some races do happen. I have reported one, and I think someone else reported to the mailing list something that looked like one. [Tue Feb 25 04:44:55 2014] nwf: RBDs might well honor 'fsync' but maybe the storage holding them does not. Getting storage systems to actually honor 'fsync' can be an expensive project. [Tue Feb 25 04:45:55 2014] in part because of bugs, in part because vendors that cheat and don't really do 'fsync' come out as best performing and win business unless 'fsync' compliance is explicitly and carefully tested for. [Tue Feb 25 04:52:40 2014] Er. [Tue Feb 25 04:53:07 2014] While it may well be the case that Ceph does not push fsync all the way down, our underlying storage fabric did not lose power or anything, just the AFS file server VM crashed.
[Tue Feb 25 04:54:46 2014] I don't understand how this particular thing can "race" since the kernel clearly knows that the volume is busy -- it says so in dmesg. So it seems like somebody's just not checking, but that seems hard to believe. [Tue Feb 25 05:00:08 2014] So you are running an AFS fileserver on top of Ceph? [Tue Feb 25 05:08:28 2014] sxw: he used to do that, now he does it on top of Ceph's block device layer [Tue Feb 25 05:11:20 2014] The fileserver requires that whatever it is run on top of be POSIX compliant. [Tue Feb 25 05:11:43 2014] If the underlying filesystem doesn't offer POSIX-style consistency guarantees, you're likely to run into corruption issues [Tue Feb 25 05:12:26 2014] sxw: more precisely he has put 'ext4' on top of RBD, the Ceph block layer [Tue Feb 25 05:13:10 2014] sxw: that seems to work, but a bad kernel panic happened and the damage was not well recovered by the salvager. [Tue Feb 25 05:21:28 2014] If your underlying device has run into corruption issues, I would generally advise restoring from a backup, rather than using the salvager. [Tue Feb 25 05:22:15 2014] The salvager can repair certain classes of errors (generally those relating to consistency issues where some fileserver changes have made it to disk, and others haven't). It isn't really designed to recover from arbitrary disk corruption. [Tue Feb 25 09:07:48 2014] it is very much not designed for that and will silently make things worse, and the only way to know something is wrong is to inspect the salvage log :( [Tue Feb 25 11:20:30 2014] secureendpoints1 - the drive letter is mapped to our root.cell [Tue Feb 25 11:20:40 2014] root.cell has no quota and, being root.cell, is only 11 KB of data [Tue Feb 25 11:21:04 2014] however, windows is showing "0 bytes free of 11.0 KB" [Tue Feb 25 11:22:34 2014] another drive is mapped to my home volume... that one is closer to what I would expect...
336GB free out of 1.27 TB [Tue Feb 25 11:23:49 2014] the smallest server partition on which root.cell is replicated still has 42 GB free [Tue Feb 25 11:24:38 2014] a .readonly volume never has space free. It's a readonly volume. Its size is the total amount of data in the volume. [Tue Feb 25 11:26:49 2014] an AFS cell is made up of multiple volumes. all of the volumes are exposed as volumes to windows. the explorer shell knows that there are multiple volumes. the bug in the explorer shell caching is that it loses track of the mount points and as a result queries the root of the mapping instead of the destination volume for free space. [Tue Feb 25 11:27:22 2014] The bug occurs when using Ctrl-V and perhaps some other code paths. It does not occur when using the context menu to paste [Tue Feb 25 11:28:46 2014] have you thought about working around the issue by "telling" the shell that the ROs have free space? Users will still be unable to write there, of course (permission denied, I would guess, or folder marked as "readonly") [Tue Feb 25 11:31:25 2014] and, I am potentially happy to open a support case with Microsoft to get this fixed... but I need to know who to reference internally at MS who is familiar with OpenAFS... the frontline people have never heard of openafs and will waste weeks trying to figure out what afs is and how afs works/etc [Tue Feb 25 11:45:42 2014] this issue is not specific to openafs [Tue Feb 25 11:46:51 2014] working around the problem as you describe restores the potential for data loss because the data will be written to the windows cache and the write will fail after the application has already been told it has succeeded [Tue Feb 25 11:47:40 2014] the bug is microsoft's. file bug reports with them. There are open reports with the explorer shell team for this issue. [Tue Feb 25 11:50:07 2014] when filing a report with microsoft you can reference my e-mail address as the vendor contact and microsoft will contact me.
[Tue Feb 25 11:59:21 2014] which email address? [Tue Feb 25 11:59:39 2014] jaltman@openafs.org [Tue Feb 25 12:01:06 2014] having gone down this road before, I recommend that you record a video demonstrating the problem while recording all file system activity using "process monitor". submit both the video and the procmon log as your bug report [Tue Feb 25 12:07:29 2014] k [Tue Feb 25 13:56:54 2014] geekosaur: Walex: I am at least somewhat of the belief that we do not have underlying filesystem corruption in this case. I unfortunately don't know how to reproduce the problem with a different partition. [Wed Feb 26 11:26:41 2014] when is 1.6.7 expected to hit? And what's the security issue (I'm planning server work, so, need to try and time things) [Wed Feb 26 11:28:15 2014] some administrative commands currently use unencrypted channels, I think? [Wed Feb 26 11:28:37 2014] I thought that was the default (since a lot of commands have -encrypt as an option)? [Wed Feb 26 11:30:06 2014] oh, hm, maybe not, I missed the thing where most of 1.6.7 got pushed to 1.6.8 [Wed Feb 26 11:30:41 2014] yeah, I'm planning on updating my servers to 1.6 [Wed Feb 26 11:31:45 2014] the thing I'm thinking of was already known in previous release team notes so probably is not what caused this split [Wed Feb 26 11:32:39 2014] unfortunately getting information about significant security issues before release seems rather difficult, nobody wants to talk about it until the fixed version is released :/ [Wed Feb 26 11:35:04 2014] right [Wed Feb 26 11:35:32 2014] so, looks like, as long as I'm not taking advantage of any 1.6 specific stuff, I can safely run mixed 1.4/1.6 db servers? 
[Wed Feb 26 11:36:41 2014] mm, it's driving without a seatbelt, you can run into issues like the id rollover thing that 1.6 handles and 1.4 falls over on and there's probably potential for other issues [Wed Feb 26 11:36:59 2014] well, it'd be pretty brief [Wed Feb 26 11:37:04 2014] might work if you're careful and you periodically restart all of them so you don't risk id rollover [Wed Feb 26 11:37:12 2014] as in, install new binaries on server A, restart [Wed Feb 26 11:37:15 2014] once it comes back up [Wed Feb 26 11:37:19 2014] install new binaries on server B, restart [Wed Feb 26 11:37:27 2014] then install new binaries on server C [Wed Feb 26 11:37:51 2014] ah, that might work. people *are* running mixed 1.4/1.6 dbservers in the wild [Wed Feb 26 11:38:08 2014] I'd want it to be brief but you're probably ok with that plan [Wed Feb 26 11:38:20 2014] I'd prefer not to take the whole cell down (which itself will cause a whole host of problems with clients going "oh no, I can't get to stuff") [Wed Feb 26 11:38:37 2014] clients will choke just with the one server going down, too, esp for RW data on it [Wed Feb 26 12:00:11 2014] RedFyre: fileserver you mean... [Wed Feb 26 12:01:29 2014] the db servers are also fileservers [Wed Feb 26 12:02:56 2014] RedFyre: I went through a 1.4->1.6 transition recently in much the same way. [Wed Feb 26 12:03:17 2014] RedFyre: note that if you take down the fileservers only briefly clients won't notice much, usually, YMMV etc. [Wed Feb 26 12:03:25 2014] right [Wed Feb 26 12:03:51 2014] as long as the fileserver doesn't decide to salvage... [Wed Feb 26 12:04:11 2014] RedFyre: the 1.4 to 1.6 transition works because it was verified that the PTDB/VLDB format for 1.4 and 1.6 is the same. [Wed Feb 26 12:04:45 2014] cool... so no export/re-import required, then [Wed Feb 26 12:04:49 2014] RedFyre: in general across major releases it is not guaranteed to be the same.
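The one-server-at-a-time order above (A, then B once A is back, then C) keeps a majority of the database servers up at every step. A minimal sketch, assuming a plain majority rule; Ubik has extra tie-breaking rules for even-sized server sets that this ignores. The server names are the log's placeholders, and the helpers are hypothetical.

```python
def has_quorum(up, total):
    """Plain majority; ignores Ubik's tie-breaking for even-sized sets."""
    return 2 * up > total


def rolling_restart_plan(servers):
    """One step per server: (server restarted, servers still up, quorum kept?)."""
    plan = []
    for server in servers:
        still_up = [s for s in servers if s != server]
        plan.append((server, still_up, has_quorum(len(still_up), len(servers))))
    return plan


for server, still_up, ok in rolling_restart_plan(["A", "B", "C"]):
    print(f"upgrade {server}: {len(still_up)} still up, quorum={ok}")
```

With three servers each step leaves two of three up; with only two servers a single restart would already lose the majority. As the log notes, the db servers here are also fileservers, so clients of RW volumes on the restarting machine still stall briefly.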
[Wed Feb 26 12:05:17 2014] RedFyre: most importantly Ubik does not corrupt the PTDB/VLDB between 1.4 and 1.6 servers, so the DB servers can be mixed versions. [Wed Feb 26 12:05:28 2014] something that worked well enough for us. [Wed Feb 26 12:05:30 2014] I didn't really verify it, I just pestered dbrashear and sxw to say so in public. [Wed Feb 26 12:05:56 2014] * is very good at ... pestering :-D [Wed Feb 26 12:06:06 2014] kaduk_: I have your IRC lines to that effect framed on my wall :-) [Wed Feb 26 12:07:03 2014] RedFyre: as to the salvager, we had some amusing issues with the 1.4 salvager, so moving to 1.6 for the fileservers is rather recommended. [Wed Feb 26 12:08:32 2014] amusing issues? [Wed Feb 26 12:09:08 2014] RedFyre: sometimes it does not salvage that well... [Wed Feb 26 12:09:39 2014] that's... unfortunate [Wed Feb 26 12:09:52 2014] RedFyre: so I RSYNC'ed a '/vicepX' to a single-server cell, with 1.6, and that recovered everything. [Wed Feb 26 12:10:06 2014] yes, there have even been significant improvements to the salvager through the 1.6 line [Wed Feb 26 12:10:10 2014] 1.4's is just dire [Wed Feb 26 12:10:22 2014] I've never had an issue with it [Wed Feb 26 12:10:43 2014] RedFyre: I had reported those problems and others advised trying 1.6 [Wed Feb 26 12:11:12 2014] you might not. 1.4's style is missing some small easily fixed thing until it scales up into a level of corruption that makes it lose everything [Wed Feb 26 12:11:29 2014] so it sounds like after going to 1.6, I might want to actually salvage at some point [Wed Feb 26 12:11:34 2014] what you have is a salvager and a file server that do not recognize problems that exist. therefore they are ignored. when you migrate to 1.6 the new file server will see these issues and force a demand salvage if using dafs. If you are not using dafs, you should force a full salvage as part of the upgrade.
[Wed Feb 26 12:12:26 2014] * edits upgrade notes [Wed Feb 26 12:14:06 2014] but I don't recommend upgrading file servers in place. build a new file server, move volumes to it one at a time. that way if you end up with a volume that is just broken it stays available on the source file server until it can be offline for an extended period of time. [Wed Feb 26 12:14:40 2014] I'm prolly going to do the upgrade as part of a hardware update [Wed Feb 26 12:15:03 2014] so, move volumes off, open case and do hardware things, close case, reboot server, update from 1.4 to 1.6 on server, move volumes back [Wed Feb 26 12:15:49 2014] that is fine [Wed Feb 26 12:19:00 2014] RedFyre: let's say that when I moved all AFS volumes off a couple of 1.4 fileservers the partitions did not become empty :-). [Wed Feb 26 12:19:24 2014] hunh [Wed Feb 26 12:19:34 2014] Yeah, emptying out my 1.4 fileservers to upgrade them revealed a few inconsistencies between the vldb and reality. [Wed Feb 26 12:19:41 2014] I've previously gone through the data move (when updating from RHEL4 to RHEL6) and did not run into that [Wed Feb 26 12:20:07 2014] kaduk_: in my cases it was between '.vol' and reality :-) [Wed Feb 26 12:20:19 2014] but the moved volumes were good. [Wed Feb 26 12:20:55 2014] so maybe I've been relatively lucky. [Wed Feb 26 12:50:05 2014] RedFyre: this place has had nearly zero problems with AFS for 5 years with 1.4.7... [Wed Feb 26 12:50:37 2014] RedFyre: but some came up during the upgrade, because things changed and new parts of the code were exercised. [Wed Feb 26 12:52:29 2014] *nod* [Thu Feb 27 04:19:12 2014] hi [Thu Feb 27 04:19:47 2014] when using apache2-mpm-prefork to serve content from afs, the stat calls in afs begin to block for a very long time [Thu Feb 27 04:20:06 2014] is there a way to prevent that? [Thu Feb 27 05:07:15 2014] richard_w: depends very much on what is going on and why. I dearly hope you are not doing dynamic content generation there...
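The "build a new file server, move volumes to it one at a time" advice can be scripted by generating one `vos move` per volume and running them sequentially, so a volume that fails to move stays on the source server. A sketch that only builds the argument lists; the volume, server and partition names are hypothetical, and a real run would need suitable admin credentials.

```python
import shlex


def vos_move_commands(volumes, src_server, src_part, dst_server, dst_part):
    """Build one 'vos move' argument list per volume, in order."""
    return [
        ["vos", "move", "-id", vol,
         "-fromserver", src_server, "-frompartition", src_part,
         "-toserver", dst_server, "-topartition", dst_part]
        for vol in volumes
    ]


# Hypothetical volume and server names, for illustration only.
cmds = vos_move_commands(["user.alice", "proj.web"],
                         "oldfs.example.org", "/vicepa",
                         "newfs.example.org", "/vicepa")
for cmd in cmds:
    print(shlex.join(cmd))
    # To actually run them one at a time, stopping at the first failure:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Keeping the moves strictly sequential matches the log's point: a broken volume remains available on the old server until it can be dealt with offline.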
[Thu Feb 27 05:07:57 2014] richard_w: here we have several large mirrors on AFS served through Apache2 (prefork IIRC) and it seems fine. [Thu Feb 27 05:09:35 2014] Well we have several cms-sites on our servers and wanted to use AFS to spread them transparently across multiple webservers [Thu Feb 27 05:11:04 2014] so yeah... dynamic content generation might be an issue here [Thu Feb 27 05:12:24 2014] richard_w: you understand how the AFS cache works and callbacks? [Thu Feb 27 05:14:24 2014] richard_w: our mirror with 2 Apache2 frontends works well because the files never/rarely get updated so the callbacks don't get broken. [Thu Feb 27 05:15:38 2014] richard_w: if you want to share frequently modified files over a network there is no cheap option. [Thu Feb 27 05:16:02 2014] the contents of the www/ directory should be more or less static [Thu Feb 27 05:16:30 2014] except for a few images that get uploaded once a week [Thu Feb 27 05:16:30 2014] richard_w: then perhaps it is not callbacks. Look at the fileserver logs then for clues. [Thu Feb 27 05:17:29 2014] richard_w: and also at the fileserver disk load with 'iostat -dx 1' [Thu Feb 27 05:17:46 2014] richard_w: the last column (percent busy) is particularly relevant. [Thu Feb 27 05:18:44 2014] richard_w: also if the data is mostly static except for images in a directory you can use RO replicas and 'vos release' to update them periodically or when necessary. [Thu Feb 27 05:19:51 2014] richard_w: AFS really has two use cases: RO replicas for static cellwide content (e.g. '/usr/') and RW volumes for per-client-system modifiable data. [Thu Feb 27 05:20:32 2014] so RO volume for '/usr' and RW volume for '/home/$USER' for example. That's what it was originally designed for. 
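To check the "percent busy" figure from `iostat -dx 1` in a script rather than by eye, a small parser can locate the `%util` column by its header name instead of assuming it is last, since the column set differs between sysstat versions. A sketch under those assumptions; the sample report is made-up illustrative text, not output from the log. Note that the first report printed by `iostat` covers averages since boot.

```python
def util_by_device(report):
    """Map device -> %util from `iostat -dx` text; later samples overwrite earlier."""
    util, col = {}, None
    for line in report.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].rstrip(":") == "Device":
            col = fields.index("%util")      # header tells us where %util is
        elif col is not None and len(fields) > col:
            util[fields[0]] = float(fields[col])
    return util


# Illustrative sample only (invented numbers).
SAMPLE = """\
Device:  rrqm/s  wrqm/s  r/s  w/s  rkB/s  wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sda      0.00    1.00    5.00 2.00 80.00  12.00  26.29     0.02      2.86   2.40     4.00     1.57   1.10
sdb      0.00    0.00    40.00 0.00 640.00 0.00  32.00     0.15      3.75   3.75     0.00     2.75   11.00
"""

print(util_by_device(SAMPLE))   # {'sda': 1.1, 'sdb': 11.0}
```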
[Thu Feb 27 05:21:14 2014] we are using it for that, but we figured putting our htdocs on /afs too might be a good idea [Thu Feb 27 05:21:24 2014] now "ls /afs" blocks [Thu Feb 27 05:21:31 2014] "ls /afs/cellname" works fine [Thu Feb 27 05:21:33 2014] lol [Thu Feb 27 05:22:14 2014] also the apache processes hang and cannot be stopped except with SIGKILL [Thu Feb 27 05:22:23 2014] richard_w: 'ls /afs' blocking is not a good sign at all. [Thu Feb 27 05:22:35 2014] richard_w: but it depends on whether you have 'dynroot'. [Thu Feb 27 05:22:35 2014] only when apache is running [Thu Feb 27 05:22:50 2014] i think i should try out apache2-mpm-event [Thu Feb 27 05:23:01 2014] richard_w: is your 'cell.afs' volume, if any, RO? [Thu Feb 27 05:23:08 2014] yes [Thu Feb 27 05:23:33 2014] richard_w: if you have blocking on 'cell.afs' I doubt that the Apache2 threading model matters. [Thu Feb 27 05:23:59 2014] well cell.afs is not blocking, only root.afs [Thu Feb 27 05:24:19 2014] richard_w: yeah, I meant 'root.afs' [Thu Feb 27 05:24:26 2014] that is very weird for an RO volume. [Thu Feb 27 05:24:44 2014] well it would make sense if fakestat was not enabled [Thu Feb 27 05:24:45 2014] but it is [Thu Feb 27 05:26:06 2014] richard_w: but it does not make sense because the fileserver(s) for the 'root.afs' replicas would have the few relevant inodes cached all the time, so it is only a network roundtrip and no actual disk IO. [Thu Feb 27 05:26:21 2014] this is purely speculative, but is it possible that apache2 bypasses fakestat somehow? [Thu Feb 27 05:27:26 2014] richard_w: even without it you should not have issues. 'root.afs' is a placeholder with a single entry or very few in most cases. [Thu Feb 27 05:27:36 2014] richard_w: what does 'vos listvldb -name root.afs' say?
[Thu Feb 27 05:28:06 2014] root.afs [Thu Feb 27 05:28:06 2014] RWrite: 536870912 ROnly: 536870913 [Thu Feb 27 05:28:06 2014] number of sites -> 2 [Thu Feb 27 05:28:06 2014] server hades.stura.uni-jena.de partition /vicepa RW Site [Thu Feb 27 05:28:07 2014] server hades.stura.uni-jena.de partition /vicepa RO Site [Thu Feb 27 05:29:14 2014] richard_w: how many fileservers have you got? [Thu Feb 27 05:29:46 2014] richard_w: you have a single instance of 'root.afs', you can put one on every fileserver that you have. [Thu Feb 27 05:31:09 2014] same for 'root.cell'... [Thu Feb 27 05:31:20 2014] until now i have only one fileserver [Thu Feb 27 05:31:40 2014] i am planning on setting up another one tomorrow [Thu Feb 27 05:31:53 2014] richard_w: then perhaps the fileserver is overloaded, who knows. [Thu Feb 27 05:32:06 2014] richard_w: try that 'iostat -dx 1' thing [Thu Feb 27 05:32:22 2014] i did... peak was 11% [Thu Feb 27 05:32:33 2014] richard_w: so it is not seeking a lot. [Thu Feb 27 05:33:03 2014] richard_w: any interesting lines in the OpenAFS daemon log files? [Thu Feb 27 05:33:28 2014] richard_w: sometimes you get long wait times when the volumes are busy e.g. replication for backups. [Thu Feb 27 05:36:34 2014] nothing from today on the afs-server [Thu Feb 27 05:39:46 2014] richard_w: then it is very very puzzling. [Thu Feb 27 05:40:40 2014] richard_w: perhaps you have something else in 'CellServDB' or DNS and your clients are trying to reach a non-existent VLDB server? [Thu Feb 27 05:40:54 2014] nope that was the first thing i checked [Thu Feb 27 05:41:02 2014] using apache2-mpm-event [Thu Feb 27 05:41:13 2014] the problem seems to be gone [Thu Feb 27 05:41:13 2014] richard_w: try to run 'tcpdump' or 'wireshark' on the clients anyhow [Thu Feb 27 05:41:23 2014] richard_w: that's extremely bizarre. [Thu Feb 27 05:43:22 2014] some weird interaction between apache2, afs and xen maybe... [Thu Feb 27 05:43:36 2014] richard_w: ahhhhhhh what is this? using Xen? 
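The `vos listvldb` output quoted above shows `root.afs` with a single RO site. The suggestion to put a replica on every fileserver can be checked with a small parser for the `server ... partition ... RO Site` lines. A sketch: the sample text is the output from the log, while `newfs.example.org` is a hypothetical stand-in for the second fileserver planned for the next day. Adding a missing site would then be `vos addsite -server <fs> -partition <part> -id root.afs` followed by `vos release root.afs`.

```python
import re

# Matches site lines such as
#   server hades.stura.uni-jena.de partition /vicepa RO Site
SITE_RE = re.compile(r"server\s+(\S+)\s+partition\s+(\S+)\s+(RW|RO|BK)\s+Site")

LISTVLDB_OUTPUT = """\
root.afs
    RWrite: 536870912 ROnly: 536870913
    number of sites -> 2
       server hades.stura.uni-jena.de partition /vicepa RW Site
       server hades.stura.uni-jena.de partition /vicepa RO Site
"""


def parse_sites(text):
    """Return (server, partition, site type) tuples from `vos listvldb` text."""
    return [m.groups() for m in SITE_RE.finditer(text)]


def servers_without_ro(text, fileservers):
    """Fileservers that do not yet hold a read-only site of the volume."""
    ro_servers = {srv for srv, _part, kind in parse_sites(text) if kind == "RO"}
    return sorted(set(fileservers) - ro_servers)


# "newfs.example.org" is hypothetical (the second fileserver planned in the log).
print(servers_without_ro(LISTVLDB_OUTPUT,
                         ["hades.stura.uni-jena.de", "newfs.example.org"]))
# ['newfs.example.org']
```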
[Thu Feb 27 05:44:02 2014] richard_w: it could just be a scheduling issue with Xen, if you are running an AFS client under Xen. [Thu Feb 27 05:44:41 2014] richard_w: changing threading model changes the way Xen schedules your instances. [Thu Feb 27 05:44:54 2014] damn... sorry i did not tell you before [Thu Feb 27 05:45:04 2014] i was not aware that this might be relevant [Thu Feb 27 05:45:27 2014] that's a bit begging for trouble to run time/delay-sensitive loads in a VM. [Thu Feb 27 05:45:44 2014] latency is a big deal with network filesystems. [Thu Feb 27 05:46:18 2014] but curiously my experience is that network filesystems are better than virtual disks for VMs. [Thu Feb 27 05:49:33 2014] well... i hope i get all the sites working with the new apache again :) [Thu Feb 27 05:49:41 2014] thank you very much for your help [Thu Feb 27 05:59:50 2014] richard_w: but if prefork does not get latencies and event does, perhaps your Xen setup is not right for that kind of stuff. [Thu Feb 27 06:08:04 2014] as we have only 2 physical servers i do not have much of a choice regarding xen [Thu Feb 27 06:10:01 2014] also a lot of people take interest in our infrastructure... for example we have a few "very competent webmasters" who manage to get their sites hijacked at least once every 3 months [Thu Feb 27 06:10:42 2014] if it was my call i would just take their sites down and never give them access to our servers again [Thu Feb 27 06:10:46 2014] but it is not [Thu Feb 27 06:11:16 2014] so we have our quarantine-server for them and it saves us a lot of work [Thu Feb 27 12:35:34 2014] and so begins the data move [Thu Feb 27 12:42:57 2014] RedFyre: onwards to victory! 
:-) [Thu Feb 27 12:43:09 2014] yep [Thu Feb 27 12:43:25 2014] the trick, of course, is to get as many volumes as possible moved before nightly backups start running [Mon Mar 3 11:15:07 2014] after moving all volumes off of a server, I have this one file left: /vicepa/AFSIDat/f1/f1++U/special/zzzzPgC+++0 [Mon Mar 3 11:15:11 2014] anything I care about? [Mon Mar 3 11:50:31 2014] probably not if the salvager on the destination server says the volume is good, [Mon Mar 3 11:51:22 2014] RedBear: on an old server, after moving off all files, I sometimes got a few hundred GB of leftovers per VICE partition... [Mon Mar 3 12:01:08 2014] "the volume"? [Mon Mar 3 12:01:29 2014] how in the world would I know what volume that refers to? [Mon Mar 3 12:02:54 2014] 536871147 [Mon Mar 3 12:03:38 2014] and the file itself is a special file for the links [Mon Mar 3 12:03:49 2014] for that volume [Mon Mar 3 12:06:54 2014] here's a link to the perl script I used to obtain the volid from the path name: [Mon Mar 3 12:07:12 2014] http://download.sinenomine.net/afs-tools/scripts/volid.pl [Mon Mar 3 12:07:29 2014] I _think_ that's a public link, let me know if you can't reach it [Mon Mar 3 12:08:36 2014] Chrome warns me "this file type may harm your computer, do you want to keep it anyway?", so it seems public. [Mon Mar 3 12:08:54 2014] heh [Mon Mar 3 12:09:30 2014] not malware unless you put all perl scripts into that category [Mon Mar 3 12:10:58 2014] so, I want to salvage that volume over on the destination server? [Mon Mar 3 12:12:26 2014] RedBear: not necessarily -- if it transferred all right to the destination server most likely it is consistent already. If you want to be doubly sure, salvage it. [Mon Mar 3 12:12:35 2014] the question is, then, what do I do about the "file" on the old source server... just delete it with the fs there shut down?
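The volume ID can also be read straight off the namei path. As far as we understand the namei layout, the directory after `AFSIDat/` holds the low 8 bits of the volume ID and the next one the full ID, each written in a base-64 alphabet (`+=0-9A-Za-z`) with the least significant digit first. The sketch below is our own reconstruction, not `volid.pl`; it does reproduce the 536871147 given in the log for `f1++U`.

```python
ALPHABET = "+=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def flipbase64_decode(digits):
    """Decode a namei base-64 string: least significant 6-bit digit first."""
    value = 0
    for position, ch in enumerate(digits):
        value |= ALPHABET.index(ch) << (6 * position)
    return value


def volid_from_namei_path(path):
    """Volume ID from an .../AFSIDat/<low 8 bits>/<volume id>/... path."""
    parts = path.strip("/").split("/")
    i = parts.index("AFSIDat")
    return flipbase64_decode(parts[i + 2])


print(volid_from_namei_path("/vicepa/AFSIDat/f1/f1++U/special/zzzzPgC+++0"))
# 536871147
```

Worked by hand: `f`=43, `1`=3, `+`=0, `+`=0, `U`=32, so the ID is 43 + 3·64 + 32·64⁴ = 235 + 536870912 = 536871147, and the first-level directory `f1` decodes to 235, the low byte of that ID.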
[Mon Mar 3 12:13:13 2014] seems to have salvaged ok [Mon Mar 3 12:13:52 2014] yes, if that's the only file remaining, it's safe to delete it [Mon Mar 3 12:14:20 2014] k... thnx [Mon Mar 3 12:16:14 2014] RedBear: to be doubly sure, also do an 'fsck' of that filetree. [Mon Mar 3 12:59:04 2014] fscking the vice partitions is part of the server maintenance that will be taking place [Mon Mar 3 12:59:10 2014] fscking should take... a while :) [Mon Mar 3 13:03:11 2014] which actually is what I'm gonna do now as soon as I figure out a good cmdline fsck string [Mon Mar 3 16:34:09 2014] suggestions on kicking a client that is insisting on talking to /afs/.cellname on the wrong server (error "connection timed out")? [Mon Mar 3 16:34:13 2014] linux client [Mon Mar 3 16:39:08 2014] fs or vldb? [Mon Mar 3 16:39:26 2014] if fs, "fs checkvolumes" [Mon Mar 3 16:39:53 2014] if vldb, use "fs newcell" to provide the correct list of servers [Mon Mar 3 16:40:49 2014] fs [Mon Mar 3 16:40:54 2014] tried the fs checkv command, no joy [Mon Mar 3 16:41:06 2014] maybe this problem will go away when I get that one from 1.4.15 to 1.6.x [Mon Mar 3 16:41:32 2014] ah, this time it worked... woo [Mon Mar 3 16:41:37 2014] hm, is this that problem with r/w volumes never timing out? [Mon Mar 3 22:19:13 2014] has "that problem" been fixed anywhere? [Mon Mar 3 22:19:52 2014] not yet, it was only recognized a couple of weeks ago [Mon Mar 3 22:21:06 2014] some decisions still need to be made iirc about how to deal with it, and then someone needs to find time to do it [Mon Mar 3 22:35:50 2014] right [Mon Mar 3 22:36:14 2014] I'm almost tempted to buy a support contract just so I can pay someone to fix these things (which will prolly end up happening in the next year or two) [Tue Mar 4 13:20:04 2014] well... mobo died [Tue Mar 4 13:22:36 2014] mobo?
[Tue Mar 4 13:23:36 2014] mobo is short for motherboard [Tue Mar 4 13:24:09 2014] hardware maintenance on the first AFS FS is not going so well :) [Tue Mar 4 13:55:14 2014] RedBear: but that isn't the only FS, right? [Tue Mar 4 13:55:21 2014] RedBear: so there's hope! [Tue Mar 4 14:11:11 2014] so, updating to 1.7.29 from 1.5.x does not uninstall the loopback adapter? [Tue Mar 4 14:16:55 2014] no. loopback adapters are installed for purposes other than afs and are required if you want to run 1.7.x in smb mode [Tue Mar 4 14:17:59 2014] afsrdrfsd is the new IFS filesystem, right? [Tue Mar 4 14:19:03 2014] that is one of the component names [Tue Mar 4 14:20:18 2014] well, it's what I see if I right click on a mapped drive and choose "properties" [Tue Mar 4 14:20:31 2014] i.e. trying to make sure that the mapping is using IFS and not somehow the old smb redirector [Tue Mar 4 14:20:54 2014] net use will report "OpenAFS Network" and not "Microsoft Network" [Tue Mar 4 14:22:07 2014] is that the only way to tell? [Tue Mar 4 14:25:52 2014] you can query using the MPR apis, you can browse the registry, you can check "net use" output, you can view use the explorer shell, you can examine dlls loaded by the explorer shell, and I'm sure there are a half dozen other ways [Tue Mar 4 14:44:22 2014] "view use the explorer shell" ... how would that work? [Tue Mar 4 14:44:27 2014] tho, "net use" may be the easiest [Tue Mar 4 14:44:46 2014] you just used the explorer shell didn't you? [Tue Mar 4 14:45:22 2014] I did... that was my question... if seeing afsrdrfsd is indicative of it *not* using the old smb redirector [Tue Mar 4 14:45:45 2014] the 'rdr' in it sounds like "redirector"...
hence the confusion [Tue Mar 4 14:45:45 2014] afsrdrfsd is one of the components of the afs redirector interface [Tue Mar 4 14:46:03 2014] and "afs redirector" != the old smb redirector [Tue Mar 4 14:46:28 2014] the smb redirector is a windows component [Tue Mar 4 14:47:28 2014] its what is used to access any smb file server whether that be a windows xp share, a windows storage server, a samba server, a NetApp filer, or the AFS SMB server [Tue Mar 4 14:58:46 2014] ok. good to know that just using the term "the redirector" is no longer sufficient [Tue Mar 4 14:59:17 2014] a redirector is just a type of file system driver [Wed Mar 5 09:44:27 2014] it should be a 50 sec timeout trying to talk to a vldb server that's currently down and not there, right? [Wed Mar 5 09:46:04 2014] 1.6.5.1 on fedora 19 seems to do the initial call to the currently down server, and then doesn't seem to ever try a different server (even after, now 5 minutes) [Wed Mar 5 09:47:36 2014] fs newcell seems to also not have helped [Wed Mar 5 11:31:34 2014] RedFyre: ha! I saw something like that for the AFS tools clients, but not for the cache clients. [Wed Mar 5 11:32:27 2014] RedFyre: the cache client usually takes 2-3 to 15s to figure out which AFS DB servers are available, but tools like 'vos' can take a long time. [Wed Mar 5 11:32:54 2014] RedFyre: there was also a fairly recent thread about this in the mailing list IIRC. [Wed Mar 5 11:34:43 2014] what is "a long time"? [Wed Mar 5 11:35:05 2014] hrm, do not recall the thread [Wed Mar 5 11:37:01 2014] CybrFyre: in one case hours IIRC [Wed Mar 5 11:37:46 2014] vos does not fail over [Wed Mar 5 11:38:00 2014] https://lists.openafs.org/pipermail/openafs-info/2014-January/040446.html [Wed Mar 5 11:38:14 2014] start of the thread in question [Wed Mar 5 11:38:38 2014] yes, that one... 
[Wed Mar 5 11:38:58 2014] nor does it support multi-homed servers [Wed Mar 5 11:40:02 2014] secureendpoints1: multihomed servers for OpenAFS are in general a bit begging for trouble IIRC [Wed Mar 5 11:41:03 2014] multi-homed file servers work; db servers do not [Wed Mar 5 11:41:15 2014] fileservers have some multihome support, ubik doesn't so dbservers don't. (my recollection) [Wed Mar 5 11:45:58 2014] this was a simple "cd" command [Wed Mar 5 11:47:07 2014] it queried the down server's vldb once & never re-tried in the 10 min I was in front of that computer [Wed Mar 5 12:36:11 2014] a "cd" command is processed through the VFS layer by the afs cache manager. The cache manager supports failover for vlservers. (or it should) [Wed Mar 5 12:46:13 2014] evidently something is broken [Wed Mar 5 12:49:46 2014] maybe a newer kernel or 1.6.6 fixes it... fortunately, I haven't yet seen this issue here at work [Wed Mar 5 13:22:26 2014] interesting that dmesg is not showing the usual "afs: Lost contact with volume location server" messages [Wed Mar 5 14:57:55 2014] shoot... I hope this "vos release" isn't gonna hang forever [Wed Mar 5 14:58:05 2014] there we go [Wed Mar 5 15:18:42 2014] hmm... SpyStudio is neat [Wed Mar 5 23:11:28 2014] wow.... those "cd"s from this morning are *still* hung [Wed Mar 5 23:11:32 2014] that is *not* good [Wed Mar 5 23:12:18 2014] I really hope that's b/c this is 1.6.5.1 vs newer [Wed Mar 5 23:12:22 2014] if not..... [Thu Mar 6 09:55:08 2014] Hey, I would like to transition our AFS kerberos principal from afs@REALM to afs/cell@REALM - I have already read the "afs/cell Transition Procedure" at http://www.openafs.org/security but so far have failed to find out how to 'disable' a principal or create a disabled one in MIT Kerberos. Any hints?
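The failover the cache manager is expected to do for vlservers (and which, per the log, `vos` lacks) amounts to moving on to the next server when one does not answer. A generic sketch, not OpenAFS code: `query` is a hypothetical callable standing in for a VLDB lookup, the server names are invented, and the 50-second per-server timeout is the figure quoted in the log rather than a verified OpenAFS constant.

```python
def query_first_responding(servers, query, timeout=50.0):
    """Try each VL server in turn; move on when one times out or refuses."""
    failures = []
    for server in servers:
        try:
            return server, query(server, timeout)
        except (TimeoutError, ConnectionError) as exc:
            failures.append(f"{server}: {exc}")
    raise RuntimeError("no VL server answered: " + "; ".join(failures))


# Hypothetical stand-in for a real VLDB lookup: the first server is down.
def fake_query(server, timeout):
    if server == "db1.example.org":
        raise TimeoutError(f"no reply within {timeout}s")
    return {"volume": "root.cell", "answered_by": server}


print(query_first_responding(["db1.example.org", "db2.example.org"], fake_query))
```

The hung `cd` in the log behaves as if the loop stopped after the first server instead of falling through to the second.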
[Thu Mar 6 09:58:50 2014] +disallow_all_tix [Thu Mar 6 09:59:14 2014] (it's a principal flag) [Thu Mar 6 10:00:55 2014] ah ok, I read about that flag but was not sure whether 'disabled' meant something different [Thu Mar 6 10:01:06 2014] thanks :) [Thu Mar 6 10:54:48 2014] hmm... w.r.t. the explorer shell issue, using the right-click context menu to paste in the file also does not work [Thu Mar 6 11:06:15 2014] I've opened a bug with Microsoft [Mon Mar 10 14:59:28 2014] Does anybody have any experience with windows 7's "Offline Files" feature with OpenAFS? [Mon Mar 10 15:00:44 2014] It's possible to "Make available offline" a samba mount, and I'm wondering if this is compatible with /afs/ paths. [Mon Mar 10 15:01:39 2014] I would expect it's tied to SMB shares, and AFS is not an SMB share (any more) [Mon Mar 10 15:02:35 2014] there is some incomplete work on an offline mode for openafs, but there hasn't been enough interest for anyone to finish it [Mon Mar 10 15:03:41 2014] Yeah, I was briefly excited when I saw "Disconnected mode" on a screenshot of the installer. :-) [Mon Mar 10 15:04:17 2014] offline files did mostly work with the previous SMB share AFS implementation [Mon Mar 10 15:04:52 2014] there were some "issues" but I don't recall having worse problems than when using offline files with a standard windows file server [Mon Mar 10 15:07:49 2014] there might be some related tweaks you can do via parameters listed here: http://docs.openafs.org/ReleaseNotesWindows/appendix_a.html [Mon Mar 10 15:12:00 2014] I don't think I want to use the 1.6/SMB redirector. [Mon Mar 10 15:13:03 2014] Any other ideas for disconnected laptop users? Thanks anyway if not.
[Mon Mar 10 15:20:57 2014] my experience with offline files and afs is that it did not work [Mon Mar 10 15:21:14 2014] main reason being it would try to resync back to afs after the user had logged out and tokens were discarded [Mon Mar 10 15:29:15 2014] I'd suggest 3rd party sync software [Mon Mar 10 15:29:39 2014] I haven't used this with AFS, but we have users who use ViceVersa with success [Mon Mar 10 18:43:21 2014] got pretty much all our windoze on 1.7.29, now [Mon Mar 10 18:43:52 2014] next thing is to get all my servers to 1.6, assuming that the one server is happy with its ram reseated and swapped [Tue Mar 11 09:40:26 2014] Hello, I am working on trying to set up afs/krb5 without DES by following this document http://www.openafs.org/pages/security/how-to-rekey.txt but when I asetkey after removing the DES entries, I get "asetkey add 4 /tmp/rxkad.nodes.keytab afs/test.cs.pitt.edu@BE.CS.PITT.EDU" I am not sure what I am missing. [Tue Mar 11 09:44:08 2014] if you're setting up without DES then you don't use asetkey at all [Tue Mar 11 09:44:24 2014] asetkey manipulates the (DES-only) KeyFile [Tue Mar 11 09:44:57 2014] non-DES keys go in rxkad.keytab which is a normal keytab [Tue Mar 11 09:58:31 2014] oh ok, I went down that route because I was getting "bos: failed to contact host's bosserver (ticket contained unknown key version number)." when doing kinit followed by aklog with the admin account I created in afs and then doing bos status localhost. I thought something was wrong with my keytab. [Tue Mar 11 09:59:09 2014] pts is also giving me an unknown key version number error [Tue Mar 11 10:06:12 2014] I got it! It hit me that without asetkey, I had not told the server about the keyfile and I found another document telling me that I had to put it in /usr/afs/etc. 
[Tue Mar 11 10:27:29 2014] where it goes depends on what platform you're on; on debian or others that use FHS paths it's /etc/openafs/server [Tue Mar 11 10:35:58 2014] I think current releases require you to have an (empty) KeyFile but if you have a proper rxkad.keytab in the server directory (/usr/afs/etc or /etc/openafs/server depending on whether it uses transarc or fhs paths) it should prefer that [Tue Mar 11 10:39:02 2014] geekosaur: my understanding is that 1.4 requires a KeyFile with fake data, but 1.6 does not (confirmed, our servers do not have a KeyFile). [Tue Mar 11 10:42:10 2014] there was a bug in 1.6.5 actually (the fix may already have been in 1.6.5.1 though) [Tue Mar 11 10:43:30 2014] oh interesting. We're using Ubuntu 12.04 packages, which are 1.6.1 with a bunch of stuff backported (ewww). [Tue Mar 11 10:45:23 2014] er? that won't work [Tue Mar 11 10:45:44 2014] rxkad-k5 and rxkad-kdf were introduced in 1.6.5 [Tue Mar 11 10:48:14 2014] I think Ubuntu backported it (because they have an aversion to changing version numbers in an LTS?). All I know is the package version is 1.6.1-1+ubuntu0.3 and it works with -k5 and -kdf [Tue Mar 11 10:49:02 2014] mrr, right, dug up the ubuntu diff [Tue Mar 11 10:50:52 2014] #$#^&^@. Is there something I should know about dafileserver on ext4 and large files? Our mirrors partition has eaten itself twice (ext4 journal aborts). [Tue Mar 11 10:51:14 2014] I would wonder if they missed something though, I am vaguely recalling some grumping about ubuntu post-release [Tue Mar 11 10:51:53 2014] nwf: fileserver should be fs-independent. I don't trust ext4 though, they're still finding fs-eating bugs every so often [Tue Mar 11 10:52:07 2014] Maybe, but we didn't run into any problems. [Tue Mar 11 10:52:11 2014] rhel7 went with xfs over ext4 for a reason [Tue Mar 11 10:52:31 2014] s/over/instead of/ [Tue Mar 11 10:52:32 2014] Hm. In-place conversion to XFS is likely tricky; is ext3 better than ext4?
[Tue Mar 11 10:53:39 2014] ext3 is stable, yes. ext4 is faster when it works but is still not well debugged in some use cases (dafileserver is somewhat likely to exercise not often used code paths in a filesystem, I think) [Tue Mar 11 10:54:11 2014] Thanks. [Tue Mar 11 11:05:33 2014] geekosaur: jackhill: maybe I screwed something up but after deleting the asetkey entry and having no keyfile, I was getting the key version issue which then went away after putting the keyfile in. I am on centos 6.5 with afs 1.6.6. [Tue Mar 11 11:06:19 2014] strace would probably be somewhat illuminating [Tue Mar 11 11:11:22 2014] for me or the fs issue? [Tue Mar 11 11:13:29 2014] greenmanspirit: for you [Tue Mar 11 11:13:47 2014] (on the server process, to look for what path(s) to rxkad.keytab and KeyFile are being attempted) [Tue Mar 11 11:14:37 2014] ok, I have removed the keytab file from /usr/afs/etc/ since centos uses transarc paths [Tue Mar 11 11:15:45 2014] asetkey list doesn't show anything, going to restart the server process [Tue Mar 11 11:22:13 2014] so confused [Tue Mar 11 11:22:26 2014] /usr/afs/etc is the transarc path; /etc/openafs/server is debian [Tue Mar 11 11:22:34 2014] asetkey only shows DES keys from KeyFile [Tue Mar 11 11:22:52 2014] yeah and centos uses the transarc paths for afs [Tue Mar 11 11:32:54 2014] I was just saying that I cleared everything out after my screwing around to see if it works without the rxkad.keytab file and empty Keyfile as you said geekosaur, unless I misunderstood. [Tue Mar 11 11:35:31 2014] I was unable to get empty Keyfile and no rxkad.keytab to work but having rxkad.keytab with no KeyFile works [Tue Mar 11 11:39:07 2014] empty keyfile/no rxkad.keytab will only work in -noauth mode [Tue Mar 11 11:41:26 2014] what were you expecting that configuration to do?
[Tue Mar 11 11:41:57 2014] * somewhat distracted, conference call [Tue Mar 11 11:44:23 2014] ah ok, your message from 10:35EDT made me think that setup would work and if it did, it would be less that I would need to change on our current servers when I update them. We are still on kaserver. [Tue Mar 11 11:45:19 2014] no, I was talking about a bug in the early rxkad.keytab stuff where an empty KeyFile had to exist or the dbserver wouldn't start at all [Tue Mar 11 11:47:45 2014] if you are doing authenticated access then keying material needs to exist somewhere, in older versions that means KeyFile, with rxkad-k5 it's rxkad.keytab unless you're still using DES keys somewhere (that is, you aren't doing rxkad-kdf) in which case you need a DES key in KeyFile [Tue Mar 11 11:48:31 2014] also note that kaserver / kas / klog won't work with anything but DES, and I don't think anyone plans to support it [Tue Mar 11 11:48:50 2014] geekosaur's sentence's grammar is not so good and I may be parsing it incorrectly, but it seems to be coming off wrong. IIRC he's in a meeting, so I could try a different version. [Tue Mar 11 11:49:10 2014] am a bit scrambled yes sorry [Tue Mar 11 11:50:04 2014] He is right that if you're doing authenticated access, there needs to be key material somewhere. The only two choices are rxkad.keytab and KeyFile, and which is used depends on the version of the openafs server software, and whether rxkad-k5 is in use. (rxkad-kdf is mostly orthogonal) [Tue Mar 11 11:50:24 2014] If rxkad-k5 is not in use, then it's the KeyFile, no choice there. [Tue Mar 11 11:50:43 2014] that is what I meant yes [Tue Mar 11 11:51:23 2014] If rxkad-k5 is in use, then (on the 1.6 branch), only rxkad.keytab is used, and there does not need to be a KeyFile at all (I'm pretty sure). On the 1.4 branch, only rxkad.keytab should actually be used, but there needs to be a KeyFile present so that the server process will start up.
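The key-material rules stated in the messages above can be condensed into a small decision function. This is a minimal sketch in Python, assuming only the two release branches discussed here (1.4 and 1.6) and the two directory layouts mentioned in the channel (Transarc `/usr/afs/etc` and FHS `/etc/openafs/server`). `KeyPlan` and `key_plan` are hypothetical names, not part of OpenAFS, and the rules encode what was said in channel rather than anything checked against the OpenAFS source.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Server configuration directories mentioned in the discussion.
TRANSARC_DIR = PurePosixPath("/usr/afs/etc")
FHS_DIR = PurePosixPath("/etc/openafs/server")


@dataclass(frozen=True)
class KeyPlan:
    keys_from: str            # file holding the keys the servers actually use
    keyfile_required: bool    # whether a KeyFile must exist for startup
    directory: PurePosixPath  # where both files live on this host


def key_plan(branch: str, rxkad_k5: bool, fhs_paths: bool) -> KeyPlan:
    """Hypothetical helper summarizing the channel's rules for 1.4 and 1.6."""
    if branch not in ("1.4", "1.6"):
        raise ValueError(f"rules only sketched for 1.4 and 1.6, not {branch!r}")
    directory = FHS_DIR if fhs_paths else TRANSARC_DIR
    if not rxkad_k5:
        # Without rxkad-k5 the (DES-only) KeyFile is the only option.
        return KeyPlan("KeyFile", True, directory)
    # With rxkad-k5 the keys come from rxkad.keytab; on 1.4 a KeyFile
    # must still be present so the server process will start up.
    return KeyPlan("rxkad.keytab", branch == "1.4", directory)


if __name__ == "__main__":
    # CentOS 6.5 with 1.6.x uses the Transarc paths.
    print(key_plan("1.6", rxkad_k5=True, fhs_paths=False))
```

A configuration with an empty KeyFile and no rxkad.keytab is deliberately not modeled: per the channel, that only works when the servers run with -noauth.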
[Tue Mar 11 11:51:56 2014] (On the git master branch, rxkad.keytab is not used, and the new KeyFileExt is used, and asetkey is used to populate KeyFileExt from a krb5 keytab. Confusing!) [Tue Mar 11 11:52:16 2014] my understanding is that should be correct now but there was a bug in 1.6.5 [Tue Mar 11 11:53:14 2014] (actually I think it was more complex than that. or maybe backwards, timestamp issue or something) [Tue Mar 11 11:53:55 2014] There was definitely a version where there needed to be an rxkad.keytab present at startup or it would never be noticed. I don't *think* that was in a released version, but it is all kind of blurred together. [Tue Mar 11 12:11:40 2014] ok, thanks to both of you. We are moving towards krb5 rather than kaserver (I am fighting institutional inertia) and I just want to make sure I do it right and without DES for the security issues. [Tue Mar 11 12:19:54 2014] kaduk_ I thought that was in all released versions because there's currently no way to force a renegotiation of dbserver connections? [Tue Mar 11 12:20:26 2014] except by restarting them [Tue Mar 11 12:26:35 2014] Out of curiosity, over the last two or three years has there been a significant change either way in OpenAFS userbase? [Tue Mar 11 12:26:43 2014] geekosaur: Yes, the ubik connections do present some difficulty. I was thinking about the rxkad.keytab never being used to decrypt incoming connections, which is a somewhat different issue. [Tue Mar 11 12:27:59 2014] I see Windows 8 and 2012 work, so I assume some people must still be using it in modern Windows environments [Tue Mar 11 12:28:00 2014] ashl: not sure what you mean by significant. 
there have been a number of announced intents by some organizations to drop openafs, mostly not actually happening as they realize their alternative doesn't do things they've come to rely on [Tue Mar 11 12:28:09 2014] in the other direction, gradual growth [Tue Mar 11 12:28:42 2014] something we need is better marketing; too many people seem to believe that openafs is exactly the same as the ancient original version of afs that had major restrictions [Tue Mar 11 12:28:45 2014] http://oit.duke.edu/service-updates/items/2013_06_24_afs.php [Tue Mar 11 12:29:36 2014] https://inews.berkeley.edu/articles/Jul-Aug2013/AFSRetires [Tue Mar 11 12:29:40 2014] (like, at least once a week we get someone asking about openafs not supporting large files because of whole-file caching by cache managers, which hasn't been true since afs 3.0 came out in the 90s at least) [Tue Mar 11 12:29:54 2014] Not sure what I mean by significant either. I'm mostly trying to get an aggregate feel I guess. [Tue Mar 11 12:32:12 2014] There are sites that have shut down their AFS deployments. There are many other sites that are coasting. Based upon scans of accessible server versions less than half of the deployed AFS servers have been upgraded in the last 18 months. [Tue Mar 11 12:32:49 2014] At the same time there are a number of very large private installations which receive a lot of active attention. [Tue Mar 11 12:33:34 2014] secureendpoints: I assume you are jaltman? I'm particularly interested in your view of the future due to your windows client work [Tue Mar 11 12:33:35 2014] The Windows client is actively deployed as an alternative for Windows profiles at more than a dozen institutions [Tue Mar 11 12:34:08 2014] you have correctly identified me [Tue Mar 11 12:35:28 2014] * notes that one usually does not see public announcements of reversal of course, because it means admitting a mistake.
anecdotally a number of sites that "officially" have shut down their afs deployments still actually use it, and some of those have *unofficially* given up on finding replacements [Tue Mar 11 12:36:47 2014] I intentionally posted Berkeley which switched to Box and Duke which took all of their user data offline because they are examples of institutions that did in fact successfully shut down AFS [Tue Mar 11 12:37:53 2014] There are many others that literally do not know what to do. How do you migrate twenty years of archived data to a new infrastructure without data loss? It's not easy and it's not inexpensive. [Tue Mar 11 12:39:16 2014] I know this is hard to answer, but anecdotally do new roll outs still seem to happen? [Tue Mar 11 12:40:16 2014] nwl manages a recent roll out [Tue Mar 11 12:40:24 2014] I just rolled out a new cell at the JHU ACM chapter and people (admittedly students) seem happy with it, FWIW. [Tue Mar 11 12:40:27 2014] sorry, nwf [Tue Mar 11 12:42:28 2014] We convinced the CS IT staff to install the client on the Linux workstations, too. [Tue Mar 11 12:42:55 2014] How often do those linux workstations take (major) kernel updates? [Tue Mar 11 12:44:47 2014] They run stock Fedora, updated every year or so, I think. They're on 20 now. [Tue Mar 11 12:45:41 2014] I have been harassing poor dhowells about the kafs client, trying to lower the barrier of entry to our cell, but haven't gotten very far with that. [Tue Mar 11 12:50:53 2014] We're increasing the number of volumes and users in PTS by an order of magnitude or two. It is still to be seen how many people will actually use it. [Tue Mar 11 13:06:15 2014] ashl, do you have specific questions regarding the windows platform and AFS? [Tue Mar 11 13:12:37 2014] secureendpoints: I think I've already asked them all in the past, thanks [Wed Mar 12 08:27:14 2014] How do you debug a pts problem on client side?
I get pts: security object was passed a bad ticket ; unable to list entries [Wed Mar 12 09:28:13 2014] zlug: a bad ticket error can be the result of: 1. the server's key does not match the key used to encrypt the ticket. for example, pts using the wrong cell's token. it can also be due to using a ticket whose start time has not been reached on the server (clock skew). it can also be due to the use of a kerberos 5 principal with a "." in the first component. it can also be due to an unsupported enctype for the session key. [Wed Mar 12 09:28:57 2014] zlug: if you are still using kaserver / kerberos v4 then there are a bunch of other possibilities. [Wed Mar 12 09:48:38 2014] secureendpoints: thanks, the problem has arisen when switching from MIT to KTH kerberos. I'm just now adding fprintf to the source to perhaps get more info. Is the server's key involved if this error message appears on the client machine (that does not run any server processes)? [Wed Mar 12 09:55:05 2014] Also, I'm running version openafs-1.6.6, and not running any afs-client on the servers. This seems to force me to compile with NO_AFS_CLIENT to get aklog functionality on the server (compiling it just now). [Wed Mar 12 10:02:19 2014] no luck adding NO_AFS_CLIENT (& install & restart bosserver): doing bos listkeys -server localhost on this afs-server outputs: bos: running unauthenticated; bos: you are not authorized.. [Wed Mar 12 10:04:07 2014] ( btw src/auth/ktc.c should include fcntl.h if NO_AFS_CLIENT is enabled, I guess no-one uses that flag.
) [Wed Mar 12 10:26:01 2014] I'm running with flags NO_AFS_CLIENT and AFS_KERBEROS_ENV in code, and bos listkeys -server localhost now gives more info: bos: ticket contained unknown key version number error encountered while listing keys [Wed Mar 12 10:26:08 2014] nwf: I'm trying to find time to work on it and also to offload to someone who wants some experience in kernel hacking [Wed Mar 12 10:35:44 2014] oh, list -v says my afs/CELL ticket is des3-cbc-sha1, guess it should be des-cbc-crc? [Wed Mar 12 10:40:57 2014] dhowells: <3 [Wed Mar 12 10:42:05 2014] zlug: not necessarily [Wed Mar 12 10:45:26 2014] switched key to des-cbc-crc in ticket afs/CELL, but still same message (running bos listkeys on localhost). [Wed Mar 12 10:52:13 2014] In general, it will be quite difficult to debug keying issues on a server that does not also have a client running; I would not recommend such a setup unless done by an experienced AFS administrator. [Wed Mar 12 11:20:37 2014] The bos command sees a protocol error in the xdr request when extracting the kvno, not sure how that can possibly happen. [Wed Mar 12 11:22:30 2014] ok, after updating to 1.6.x, needed to run restorecon -Rv on /usr/afs [Wed Mar 12 15:35:50 2014] question for anyone that has AD working as their kdc. From my centos machine, I can do kinit and aklog to get my ticket and tokens. However, when I try to use pam_krb5 to do those steps so I can log in with the AD account, I get "KDC has no support for encryption type" when it tries to obtain creds for the afs/cell@REALM account. I have been googling for a while and haven't found a working solution yet as I am trying to avoid DES and allow_weak_ [Wed Mar 12 15:37:47 2014] I didn't think that pam_krb5 had anything to do with AFS service tickets. [Wed Mar 12 15:38:04 2014] Perhaps you are using pam_afs_session as well? [Wed Mar 12 16:00:20 2014] greenmanspirit - you don't have your default enc type set to DES or to something not supported by AD?
[Wed Mar 12 16:00:52 2014] o.w., I'd suggest using a wireshark session and seeing what enc types are in the list going across the wire in the tkt request [Wed Mar 12 16:03:01 2014] the other day they were setting up with rxkad.keytab, iirc [Wed Mar 12 16:03:44 2014] this sounds a bit like perhaps pam_afs_session is out of date [Wed Mar 12 16:05:07 2014] I think that forcing pam_afs_session to use an external aklog should suffice. [Wed Mar 12 16:10:14 2014] On this page there is a paragraph (in the uploading to gerrit section) in French that I think is spam. [Wed Mar 12 16:10:30 2014] yeah, the spammers have apparently been trying hard lately. [Wed Mar 12 16:13:35 2014] it would be nice if I could put ids that are known spammers on a block list [Wed Mar 12 16:14:52 2014] it's the same id over and over again. Kristof_CAMUS. I or Mike remove the spam and Kristof just puts it back [Wed Mar 12 16:18:28 2014] I wonder if there is a method of reporting spam on third party sites to Yahoo [Wed Mar 12 16:23:47 2014] RedBear: I don't have the default enctype set in krb5.conf and we have DES turned off on AD [Wed Mar 12 16:24:22 2014] greenmanspirit: are you, or are you not, using pam_afs_session? [Wed Mar 12 16:24:32 2014] geekosaur: I will check pam_afs_session. I did not see that set in the docs I found so I am guessing that pam_krb5 calls it? [Wed Mar 12 16:24:39 2014] kaduk_: to you too [Wed Mar 12 16:25:21 2014] the doc for pam_krb5 said it would get afs tokens and I can see output for its attempt [Wed Mar 12 16:25:44 2014] pam_krb5 should not be getting tokens [Wed Mar 12 16:26:01 2014] The situation is complicated by the fact that there are multiple different things that call themselves pam_krb5. [Wed Mar 12 16:26:11 2014] (oh, hm, unless this is rh's mutant instead of ... that) [Wed Mar 12 16:26:23 2014] I don't remember seeing what OS is involved, here.
[Wed Mar 12 16:26:59 2014] I am using centos so it is probably rh mutant [Wed Mar 12 16:27:03 2014] and yes, RH's does seem to do tokens [Wed Mar 12 16:29:08 2014] and it doesn't use afs libraries, so yes it probably only speaks DES [Wed Mar 12 16:30:26 2014] awesome... so much for avoiding it [Wed Mar 12 16:30:36 2014] I guess I will send Nalin a link to draft-kaduk-afs3-rxkad-kdf-03. (Hmm, I should probably actually send it as a real I-D and not a random document in my personal space.) [Wed Mar 12 16:31:46 2014] for what it's worth, here is what their pam_krb5 is outputting http://pastebin.ca/2655847 [Wed Mar 12 16:32:04 2014] greenmanspirit, you can probably configure PAM to disable the token getting functionality of RH's pam_krb5, then configure pam_openafs_session in the PAM session stack [Wed Mar 12 16:33:19 2014] actually I don't see a way to disable it :/ [Wed Mar 12 16:33:43 2014] I don't either [Wed Mar 12 16:34:52 2014] so maybe install Russ Allbery's pam_krb5 in place of RH's; you'll need to change the options in the PAM config because it uses different ones from Red Hat [Wed Mar 12 16:34:55 2014] http://www.eyrie.org/~eagle/software/pam-krb5/ [Wed Mar 12 16:36:26 2014] (and will again need pam_openafs_session because it doesn't try to do two different things, it just does one thing well and leaves the other to modules that do that one thing well. that is why pam has module stacks, after all...) [Wed Mar 12 16:37:27 2014] yeah, makes sense. I ran strings on pam_krb5 and there is an ignore_afs flag listed in there so I am going to give that a try [Wed Mar 12 16:46:48 2014] well, that managed to upset my machine lol [Wed Mar 12 16:57:28 2014] Gerrit seems unhappy "com.google.gwtorm.client.OrmException: Cannot open database connection" when I try to look at a change. [Wed Mar 12 16:58:03 2014] I guess it's been that way for a couple hours; I think I saw that but decided to just wait and see if it resolves.
[Wed Mar 12 17:06:44 2014] for sites that have migrated to rxkad.keytab, are you using afs-principals with only aes256 long-term keys or also des3/rc4? [Wed Mar 12 17:07:38 2014] * does not see a lot of point in using des3 unless compatibility with very old krb5s is necessary [Wed Mar 12 17:10:22 2014] given an open university environment it is hard to know about the userbase [Wed Mar 12 17:16:17 2014] eest: the afs service key is not visible to end user systems. it is only seen by the afs servers [Wed Mar 12 17:25:26 2014] right... so there are no size constraint problems in older clients for the opaque crypto blobs or something like that? [Wed Mar 12 17:26:16 2014] there are and if you are using AD the thing you are concerned with is disabling the inclusion of group information [Wed Mar 12 17:26:31 2014] the enctype is not one of your concerns [Wed Mar 12 17:26:55 2014] the PAC stuff? [Wed Mar 12 17:27:12 2014] yes [Wed Mar 12 17:29:16 2014] right, good to know that the enctype size difference is invisible to clients... i guess this means running with _only_ aes is completely reasonable as long as all the afs-servers support it [Wed Mar 12 17:34:22 2014] the KDC will only ever use the first enctype in the list so it doesn't matter if you have others [Wed Mar 12 17:35:06 2014] in other words, a list ordered as des, rc4, des3, aes256 will only ever use des. [Wed Mar 12 17:40:57 2014] in this case i would presume only an aes enctype is available on the afs-principal [Wed Mar 12 17:46:34 2014] thanks for the insight [Thu Mar 13 05:17:59 2014] thanks secureendpoints and kaduk_! For providing me with debugging hints. I eventually followed the install-rxkad-k5-1.6.txt. [Thu Mar 13 10:55:54 2014] secureendpoints - did MS give you any ETA on fixing the win7 explorer shell caching bug? [Thu Mar 13 10:57:27 2014] it has been an open issue for more than a year [Thu Mar 13 10:57:46 2014] how many folk have opened said issue? 
[Thu Mar 13 10:58:09 2014] they sent me a message saying here's a scope agreement but failed to attach said scope agreement [Thu Mar 13 10:59:10 2014] you need to follow up with msft. I am not their customer. all of my internal connect the dots is completely informal [Thu Mar 13 11:00:07 2014] as for how many have opened a case with them? not enough [Thu Mar 13 11:01:13 2014] ok, their last update was "We have sent an email to the Engineer from developer support team, who is currently working with Jeff on another incident" [Thu Mar 13 11:01:29 2014] "As soon as we get an update from him, I’ll contact you" [Thu Mar 13 11:01:36 2014] him presumably is the engineer [Thu Mar 13 11:02:32 2014] that is fine. I am a third party vendor to them. [Thu Mar 13 11:03:14 2014] unfortunately, the team responsible for the relevant code is in India and I can't just walk over to their office [Thu Mar 13 11:03:28 2014] right [Thu Mar 13 11:03:55 2014] we typically are dealing with vendors in Germany/etc, which makes support for tools in the lab... fun [Thu Mar 13 11:04:11 2014] there's a small window when our tech staff are here and theirs are there for working together :) [Thu Mar 13 11:06:00 2014] well, hopefully they can collaborate with you and get it fixed sooner rather than later :) [Thu Mar 13 11:06:49 2014] there is really nothing more I can do. They have a test machine. This is all about prioritizing developer time on their end. [Thu Mar 13 11:06:57 2014] sure [Sun Mar 16 11:46:57 2014] * grumbles at yet another fedora kernel update lacking kernel-devel so openafs.ko can't be updated to match [Sun Mar 16 12:08:35 2014] "Sun Mar 16 11:34:26 2014 ReadVnodes: unknown tag x38 found, skipping" .. that sounds... bad. [Sun Mar 16 13:45:18 2014] nwf: not necessarily, but check the system logs for filesystem/storage damage [Sun Mar 16 14:53:35 2014] Everybody seems happy... 
the salvager ran to completion on the source volume with only some gripes about version < inode version, for whatever that's worth. [Fri Mar 21 13:20:26 2014] bind mount? [Fri Mar 21 13:22:05 2014] workaround for a current problem with RHEL kernels which is expected to bite more generally in the future (because a similar change was originally accepted into mainline Linux) [Fri Mar 21 13:22:57 2014] oh is that the recursive mount issue? [Fri Mar 21 13:23:09 2014] so, Linux has always had odd issues when you can access the same AFS directory over multiple paths. a recent patch on RHEL elevates "odd issues" to panics [Fri Mar 21 13:25:01 2014] bind mounts are a workaround whereby (as I understand it, which may well be incorrect) we redirect multiple possible paths to a given volume to the same "basic path" by representing AFS mount points as local bind mounts [Fri Mar 21 13:35:37 2014] somehow afs mounts differ from hard links, then [Fri Mar 21 13:36:07 2014] <[gorgo]> you can't have hard links for directories, that's the root of the problem [Fri Mar 21 13:37:32 2014] ah [Fri Mar 21 13:38:17 2014] afs mountpoints are magic symlinks normally [Fri Mar 21 13:38:44 2014] which are presented as directories via VFS trickery; it's that trickery that is causing panics on RHEL [Fri Mar 21 13:39:55 2014] and yes, you can't make a hard link to a directory; symlinks are the portable way to deal but conflict with the underlying symlink representation, bind mounts are the Linux specific solution [Fri Mar 21 13:40:10 2014] (symlinks also have upward traversal issues) [Fri Mar 21 13:40:22 2014] yep [Fri Mar 21 13:41:33 2014] the underlying issues are a combination of two conflicting design choices: [Fri Mar 21 13:41:34 2014] 1. /afs is represented to the VFS and applications as a single device even though AFS volumes are rooted directory trees without parents that are connected via mount points. [Fri Mar 21 13:41:34 2014] 2. linux doesn't support hard links to directories.
[Fri Mar 21 13:43:50 2014] hmm... I think guy's home volume, at 205GB, is the largest one [Fri Mar 21 13:44:33 2014] right; I was assuming point 1, because you can't really deal with that sanely on a unixlike due to limits on number of (normal) mountpoints [Fri Mar 21 13:44:37 2014] In Windows with the afs redirector AFS volumes are exposed to the IFS as separate volumes with unique device IDs. However, the afs redirector does not use reparse point processing within the IFS to evaluate mount points which permits the afs redirector to return the full path used by the application to access an object. If reparse point processing were used within the IFS then a query for the path of an object would r [Fri Mar 21 13:44:57 2014] chopped off at "object would r" [Fri Mar 21 13:45:13 2014] the path becomes a pirate [Fri Mar 21 13:45:25 2014] ... return \\afs\cell#volume\relative-path\. [Fri Mar 21 13:49:22 2014] the approach that both Andrew and Marc are exploring is to leverage the /afs/.mount:/ functionality that was added to work around issues with OSX to generate mounts in the /afs name space that can be used to traverse to the root of the volume via /afs/.mount:/ via the VFS interfaces. [Fri Mar 21 13:50:49 2014] These mount bindings would be created on the fly and torn down in a background thread. [Fri Mar 21 13:51:34 2014] There are a large number of edge cases that need to be addressed and this change will result in application visible side effects. [Fri Mar 21 13:52:11 2014] yep [Fri Mar 21 13:52:46 2014] The implementation is particularly challenging on Linux because of the GPL_ONLY symbol restrictions on a large number of the required interfaces which is going to result in userland upcalls being used. This adds to the complexity and introduces potential races.
[Fri Mar 21 13:55:32 2014] The panics in recent RHEL releases were intentionally introduced by Red Hat because their engineers want to know what modules and in which situations particular kernel interfaces are called. They certainly found out but at the cost of causing their customers to lose data and become aggravated. Something to think about. [Fri Mar 21 13:56:21 2014] * ... *eyeroll* [Fri Mar 21 13:56:55 2014] sadly I cannot claim to be surprised though [Fri Mar 21 13:57:15 2014] I dropped away from RH years ago because of that kind of thing [Fri Mar 21 13:57:53 2014] they do occasionally make some... bad... changes [Fri Mar 21 13:59:07 2014] yep. that "dropped away" followed several instances of arguing with their engineers over such things in the RH6 (that's old RH, not RHEL) days. on returning for work reasons, I quickly confirmed things hadn't actually changed much [Fri Mar 21 14:00:07 2014] in some sense, arguing that stuff now is easier since the whole point of RHEL is to maintain compatibility and not break stuff [Thu Mar 27 05:59:25 2014] Regarding Hartmut's pdf, I have placed it here: /afs/sman.dk/public/georg/AFS-OSD-state-and-future.pdf (temporarily) [Thu Mar 27 05:59:40 2014] Can somebody put it here? https://indico.cern.ch/event/271400/session/13/contribution/21 [Sat Mar 29 03:07:21 2014] Did the ptsldap code ever end up committed or archived somewhere in its most recent incarnation? [Sat Mar 29 03:08:34 2014] (it doesn't appear in my relatively recent checkout, but maybe it got nuked at some point?) [Sat Mar 29 08:47:11 2014] ptsldap is not part of openafs. it is a third party tool [Sat Mar 29 12:49:03 2014] secureendpoints: Where does it live? I couldn't find the source, but maybe my googlefu is weak. [Sat Mar 29 19:47:52 2014] nwf: it used to live at iastate.edu.
I don't think it was ever submitted to openafs-contrib [Sat Mar 29 19:47:57 2014] on github [Mon Mar 31 09:23:50 2014] I discovered an issue with using the install instructions for Centos 6.5. Who would I mention this to? [Mon Mar 31 09:25:03 2014] Specifically, the baseurl has to use $arch instead of $basearch, because there isn't an i386 link in the dl.openafs.org server for the EL systems. [Mon Mar 31 09:27:00 2014] foley - prolly depends on where you found said install instructions [Mon Mar 31 09:27:48 2014] These are the install instructions on the openafs documentation section :) [Mon Mar 31 09:28:01 2014] http://docs.openafs.org/QuickStartUnix/index.html#HDRWQ41.html [Mon Mar 31 09:28:53 2014] so, you're saying the repo file included with openafs-repository*.rpm is incorrect ? [Mon Mar 31 09:30:10 2014] if so, submit a bug report to openafs-bugs@openafs.org [Mon Mar 31 09:30:33 2014] Yes, at least for Centos 6.5. It sends you to http://dl.openafs.org/dl/openafs/1.6.6/rhel6/i386/repodata/ which does not exist. [Mon Mar 31 09:31:16 2014] I just wanted to check that I hadn't misunderstood something or that this was a known issue. [Mon Mar 31 09:31:34 2014] yeah... submit the bug report and whoever puts that together should see the bug report and fix it [Mon Mar 31 09:32:34 2014] what that means is there are no 32-bit i386 binaries [Mon Mar 31 09:33:10 2014] the only architectures for which rhel6 is built are i686 and x86_64 [Mon Mar 31 09:34:04 2014] I assume that is intentional.
Unfortunately, I think it confuses the $basearch, which always defaults to i386 [Mon Mar 31 09:34:19 2014] (Writing the bug report now) [Mon Mar 31 09:34:39 2014] building for i386 disables a large amount of compiler/platform functionality [Mon Mar 31 09:35:01 2014] s/compiler/processor [Mon Mar 31 09:36:05 2014] Agreed, this is an i686 machine anyway, so it is a little silly that it is looking in the i386 location [Mon Mar 31 09:38:34 2014] But I am not an expert in yum configuration, this was my quick fix. [Mon Mar 31 09:40:28 2014] Oops. I'm getting an error about openafs-client not being signed. Did I miss a step somewhere to set up keys? [Mon Mar 31 09:40:33 2014] This is Centos 6.5 [Mon Mar 31 09:46:51 2014] I can tell it to ignore the gpg signing, but that seems like a bad idea. Suggestions? [Mon Mar 31 09:51:35 2014] I presume I don't care about these "bogs. volid" volumes on fileserver A since the volumes in reality are moved to fileserver B and seem happy over there on fileserver B ? [Mon Mar 31 09:51:42 2014] or, bogus, not bogs [Mon Mar 31 09:52:59 2014] you should salvage the partition [Mon Mar 31 09:53:08 2014] if available timeslot [Mon Mar 31 09:53:11 2014] the bogus volumes appeared after the salvage [Mon Mar 31 09:53:18 2014] rerun salvage [Mon Mar 31 09:53:30 2014] saw some files left in AFSIdat, ran volid on them which gave me weird IDs like "44" [Mon Mar 31 09:53:33 2014] usually I redo salvage after first run to be clear [Mon Mar 31 09:53:40 2014] did a "bos salvage", and now have bogus volumes [Mon Mar 31 09:53:51 2014] did a vos exa of the voids, and they're on another server [Mon Mar 31 09:54:53 2014] yeah. second salvage didn't do anything to the bogus vols [Mon Mar 31 09:59:42 2014] the salvager does not communicate with the vlserver so it doesn't know what volumes are supposed to be on the server. if the volume isn't supposed to be there, remove it [Mon Mar 31 10:00:01 2014] that's the answer I was thinking, based on previous discussions...
thanks [Mon Mar 31 10:02:25 2014] Anyone else set up openafs on a Centos 6.5 machine recently? [Mon Mar 31 10:29:15 2014] I guess Centos is not so popular with this crowd. [Mon Mar 31 10:31:38 2014] Hi, Joe. Yeah, I don't remember the last time I used a CentOS box. [Mon Mar 31 10:31:45 2014] I've got it running on CentOS 6ish, with whatever their new scheme is for releases. I think with the latest kernel update I had to hold off because the kmod hasn't shown up in the openafs repos, and I haven't worked out how to rebuild that. [Mon Mar 31 10:32:30 2014] the f19 update for the latest kernel only showed up for me this morning, fwiw [Mon Mar 31 10:32:41 2014] (the kmod that is) [Mon Mar 31 10:32:46 2014] Ah, I was using dkms. I guess I shouldn't be surprised that there are issues. Perhaps this should be reinstalled with Debian then. [Mon Mar 31 10:34:02 2014] fedora is pretty aggressive about taking new kernels, and new kernels have a tendency to require openafs code changes. [Mon Mar 31 10:34:18 2014] Though, Marc's talk last week said that things are getting better, in that regard. [Mon Mar 31 10:34:58 2014] Stability and slow kernel changes are why I was picking Centos [Mon Mar 31 10:35:35 2014] RHEL includes random backports that cannot be predicted by testing upstream Linux. [Mon Mar 31 10:36:00 2014] OK, what is the suggested "enterprise" linux platform then? [Mon Mar 31 10:36:24 2014] If you want Morgan Stanley's answer, Solaris X86 [Mon Mar 31 10:36:45 2014] Heh. Tempting, but no. [Mon Mar 31 10:36:52 2014] every Linux platform has tradeoffs [Mon Mar 31 10:37:26 2014] True, but what is one that you think is the best fit for openafs servers on linux? [Mon Mar 31 10:37:47 2014] I don't have an opinion as to which is the best for any particular organization [Mon Mar 31 10:37:52 2014] Got it. [Mon Mar 31 10:38:25 2014] I turned off the gpg signature check and was able to install dkms openafs on the Centos 6.5 machine. This feels wrong.
[Mon Mar 31 10:40:17 2014] ah, yes, either those rpms aren't signed or are improperly signed. This is bringing back memories. I think the last time this bit me I downloaded the rpms, put them in a local yum repo, and signed them with a key my systems trust. [Mon Mar 31 10:45:02 2014] or they are signed properly but with signature parameters that your system does not recognize / accept [Mon Mar 31 10:48:30 2014] Well, the error that I got was that they weren't signed. Will it give that error if the signature is not a format yum/rpm understands? Or would it say "invalid signature" [Mon Mar 31 10:48:30 2014] Something like that. One of these days I'll try to diagnose what is actually happening and file a bug so we can figure those out. [Mon Mar 31 10:48:58 2014] Short term, I should just tell it to not check sigs? [Mon Mar 31 10:49:14 2014] i just successfully mounted an iso image stored in an openafs filesystem as a loop device, mount -o loop , but that feels very, very wrong [Mon Mar 31 10:49:29 2014] If you feel comfortable installing rpms that aren't signed, that's an option. Depends on how comfortable you feel about that. [Mon Mar 31 10:49:42 2014] will i have problems with the loop filesystem? [Mon Mar 31 10:50:12 2014] * waits to see if fileserver successfully reboots [Mon Mar 31 11:08:18 2014] For "enterprise", not so comfortable, but I don't have much of a choice except to do your local-signing hack. [Tue Apr 1 20:24:48 2014] Are materials from the European OpenAFS and Kerberos conference available anywhere? [Tue Apr 1 20:34:13 2014] ah, found the page at cern. [Tue Apr 1 21:02:10 2014] hi folks [Tue Apr 1 21:02:41 2014] openafs-client and server failed to build for me in gentoo against the latest kernel, 3.14.0. [Tue Apr 1 21:03:56 2014] I have the patches for 1.6.5-r1, latest version for gentoo, but the build failed. [Tue Apr 1 21:08:20 2014] I'm pretty sure that openafs version is too old for that kernel version.
[Tue Apr 1 21:21:17 2014] kaduk_: That's the most recent version available for gentoo. [Wed Apr 2 05:13:25 2014] jackhill: what was the page URL? [Wed Apr 2 05:14:14 2014] hm, there was the announce of 1.6.8 pre today [Wed Apr 2 05:14:48 2014] http://dl.openafs.org/dl/candidate/1.6.8pre1/ [Wed Apr 2 08:57:02 2014] might be the wrong channel but anyone know how to transparently update the longterm keys of kerberos user principals [Wed Apr 2 08:57:20 2014] background is I'd like to switch them from plain old des to aes256 [Wed Apr 2 09:06:04 2014] Rebus: if you can wait for 2x your password policy, you can just add the enc types to the KDC config and wait for users to change their password. Once everyone (and every service) has the new encryption type you can disable des support and future password changes will then eliminate the key [Wed Apr 2 09:10:32 2014] unfortunately there is no password policy :-/ [Wed Apr 2 09:11:24 2014] I remembered that I once had a pam_module that simply set the new password to the old password upon login but can't seem to find it .-/ [Wed Apr 2 09:11:41 2014] oh, yeah pam_krb5_migrate or something like that [Wed Apr 2 09:12:25 2014] http://www.ohloh.net/p/pam-krb5-migrate [Wed Apr 2 09:31:04 2014] I mentioned yesterday, the most recent openafs release for gentoo, 1.6.5-r1 won't build against kernel 3.14.0. [Wed Apr 2 09:31:31 2014] I have the patches that have allowed it to build up to 3.13.7 in place, but it failed for 3.14.0. [Wed Apr 2 09:32:19 2014] cannot help with gentoo, but source 1.6.8. pre 1 is available [Wed Apr 2 09:32:22 2014] http://dl.openafs.org/dl/candidate/1.6.8pre1/ [Wed Apr 2 09:36:44 2014] cclausen: thanks maybe I can use that otherwise there is still pam_exec :-/ [Wed Apr 2 09:51:56 2014] ed_: https://indico.cern.ch/event/271400/timetable/#20140326 [Wed Apr 2 10:02:30 2014] 1.6.6 looks to be the latest stable release, right? 
[Wed Apr 2 10:02:58 2014] That's what it says at openafs.org/release/latest.html [Wed Apr 2 10:03:10 2014] 1.6.6 was last stable. but 1.6.8pre1 is out and contains fixes for latest kernels IMHO [Wed Apr 2 10:03:57 2014] Amiga4000: Ok. Don't know if I want to do my own ebuild or wait for the devs to do it. I may have to revert to 3.13.7. [Wed Apr 2 12:57:08 2014] I downloaded 1.6.8pre1, ran ./configure --enable-transarc-paths, then make, and make dest. All of this on gentoo. [Wed Apr 2 12:58:04 2014] All seemed to configure and make fine. Since I haven't built from source on gentoo before, and if someone here has, will make install put all where it should be? Including init scripts? [Wed Apr 2 12:58:39 2014] 'make dest' should have already installed the bits that would be installed, actually. [Wed Apr 2 12:59:00 2014] I don't believe that init scripts are included, but if you have one from 1.6.5, I expect it would still work. [Wed Apr 2 12:59:02 2014] make dest doesn't install [Wed Apr 2 12:59:17 2014] It puts everything into a directory tree rooted at the top level of your build directory [Wed Apr 2 12:59:46 2014] sxw: It didn't look like it did. So per normal, I'd do a make install to install everything? [Wed Apr 2 12:59:52 2014] Oh, whoops. Sorry. [Wed Apr 2 13:00:46 2014] It depends on what you're trying to do. [Wed Apr 2 13:01:24 2014] sxw: I'm on gentoo. Just built the 3.14.0 kernel. [Wed Apr 2 13:01:37 2014] You probably want make install, but it won't necessarily match what gentoo's packaging would do. Some packages use 'make install', others use 'make dest' and then copy stuff around [Wed Apr 2 13:01:43 2014] gentoo's latest ebuild for openafs is 1.6.5-r1. [Wed Apr 2 13:02:20 2014] So, I built the latest release and want to try it out and see if I get a working openafs. [Wed Apr 2 13:02:33 2014] sxw: Does that make sense?
[Wed Apr 2 13:02:36 2014] If you want something that matches that, your best bet would probably be updating the ebuild to use 1.6.8pre1, rather than doing a "raw" make install [Wed Apr 2 13:03:35 2014] sxw: With the patches that were/are needed for 1.6.5-r1.ebuild, not sure how simple/hard that might be. [Wed Apr 2 13:03:54 2014] I know nothing of gentoo, so can't really advise [Wed Apr 2 13:04:18 2014] You almost certainly won't get a usable init script from make install, and other stuff may or may not be in the places you expect [Wed Apr 2 13:04:49 2014] sxw: Where does make install install things? [Wed Apr 2 13:05:40 2014] If you have --enable-transarc-paths, in transarc style locations. I think there's a README at the top level that lists all of that. [Wed Apr 2 13:05:48 2014] You can always run make -n to see what it's going to do before it does it [Wed Apr 2 13:06:45 2014] sxw: I did the make -n install; and is it the Readme in openafs-1.6.8pre1? right at the top level? [Thu Apr 3 00:19:02 2014] Forgive another stupid question: what'd be involved in making the super UserList understand PTS groups? [Thu Apr 3 00:21:46 2014] each service would need to query the ptserver [Thu Apr 3 00:22:02 2014] at present only the file service does [Thu Apr 3 16:16:00 2014] Rargh! My partition holding mirrors is ext4 and just crashed, interrupting a several-days vos release. I just got to see the 'Deleting extant RO_DONTUSE' message. :'( [Thu Apr 3 16:16:36 2014] you seem to have a lot of trouble with bad underlying filesystems [Thu Apr 3 16:20:07 2014] I seem to, yeah. :( [Thu Apr 3 16:20:31 2014] I'd move everything down to ext3, but moving 15TB of data is not my idea of fun. [Thu Apr 3 16:20:55 2014] I was hoping to at least mitigate some of the issues by having a second RO replica, but no love yet. [Thu Apr 3 16:20:56 2014] i've lost one filesystem due to corruption in 6 years. the machine had bad ram. [Thu Apr 3 16:21:20 2014] We have ECC ram on everything.
[Thu Apr 3 16:22:11 2014] ECC only corrects single bit errors and detects 2 bit errors. if you have 3 bit errors, you can still be hosed [Thu Apr 3 16:23:32 2014] Of course, but still, we'd expect to see hardware so far gone have uncorrectable errors as well as the silent failures. [Thu Apr 3 16:23:33 2014] 15 TB... ouch [Thu Apr 3 16:24:18 2014] CybrFyre: Is that sarcastic? 15TB is not so bad, it's just a lot of downtime. [Thu Apr 3 16:25:01 2014] nope, wasn't sarcastic [Thu Apr 3 16:25:11 2014] I wish I could kludge the vos release process to use something resumable as the underlying transport. :\ [Thu Apr 3 16:27:44 2014] I've only lost a filesystem due to corruption when the raid controller went belly up [Thu Apr 3 16:27:53 2014] that really... sucked [Thu Apr 3 16:28:43 2014] software raid for the win [Thu Apr 3 16:29:10 2014] well, t'was a piece of crap Dell PERC [Thu Apr 3 16:29:11 2014] I've had terrible luck with everything. I have lost data to XFS, ext4, ZFS, Venti, ... [Thu Apr 3 16:29:21 2014] XFS was a good self-corrupting FS [Thu Apr 3 16:29:23 2014] Maybe I should just stop using computers. [Thu Apr 3 16:29:50 2014] where would be the fun in that? :) [Thu Apr 3 16:31:48 2014] I think my favorite "welp, that was my data" moment, in retrospect, was the cyclic permutations of "zpool replace" commands. [Thu Apr 3 16:31:51 2014] <[gorgo]> nwf: btw are you using afs on openwrt? :) [Thu Apr 3 16:32:09 2014] [gorgo]: Experimentally, yes. :) [Thu Apr 3 16:32:19 2014] Using the in-kernel driver. [Thu Apr 3 16:32:48 2014] <[gorgo]> got a friend involved with openwrt, he was saying that some crazy guy submitted some afs changes to openwrt :) [Thu Apr 3 16:33:04 2014] Our cell has taken to "until we get puppet, everybody's local configuration gets pushed into AFS nightly and dumped to Venti every week" [Thu Apr 3 16:33:41 2014] [gorgo]: Hahaha; could you ask him to apply the diff? 
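The ECC remark above (single-bit errors corrected, double-bit errors detected, three or more possibly neither) describes a SECDED code. The sketch below is a textbook extended Hamming(8,4) code on a 4-bit value, chosen only because it is small; real ECC memory uses wider codes (commonly 72 bits protecting 64), so treat this as a model of the principle rather than of any particular memory controller.

```python
"""Extended Hamming(8,4) SECDED code: a small model of ECC behaviour."""
from __future__ import annotations

from functools import reduce
from operator import xor

DATA_POSITIONS = (3, 5, 6, 7)   # data bits; 1, 2, 4 are Hamming parity; 0 is overall parity


def encode(nibble: int) -> list[int]:
    """Encode the low 4 bits of `nibble` into an 8-bit SECDED codeword."""
    code = [0] * 8
    for bit, pos in enumerate(DATA_POSITIONS):
        code[pos] = (nibble >> bit) & 1
    for p in (1, 2, 4):
        # Parity bit p covers every other position whose index has bit p set.
        code[p] = reduce(xor, (code[i] for i in range(1, 8) if i & p and i != p))
    code[0] = reduce(xor, code[1:])  # overall parity makes the whole word even
    return code


def decode(code: list[int]) -> tuple[str, int | None]:
    """Return (status, data); status is "ok", "corrected" or "double-error"."""
    c = list(code)
    syndrome = reduce(xor, (i for i in range(1, 8) if c[i]), 0)
    overall = reduce(xor, c)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # An odd number of flips is assumed to be exactly one: fix it.
        c[syndrome] ^= 1        # syndrome 0 means the overall parity bit itself
        status = "corrected"
    else:
        # Even parity but non-zero syndrome: two flips, detectable only.
        return "double-error", None
    data = sum(c[pos] << bit for bit, pos in enumerate(DATA_POSITIONS))
    return status, data


def flip(code: list[int], *positions: int) -> list[int]:
    """Return a copy of `code` with the given bit positions inverted."""
    c = list(code)
    for pos in positions:
        c[pos] ^= 1
    return c


if __name__ == "__main__":
    word = encode(0b1011)
    print(decode(word))                 # ('ok', 11)
    print(decode(flip(word, 6)))        # ('corrected', 11)
    print(decode(flip(word, 3, 6)))     # ('double-error', None)
    print(decode(flip(word, 1, 2, 3)))  # ('corrected', 10): silently wrong
```

The last line is the failure mode mentioned in the chat: three flipped bits look like one correctable error, so the decoder "fixes" the word and returns wrong data without reporting anything.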
:) [Thu Apr 3 16:34:05 2014] <[gorgo]> he intended to, iirc [Thu Apr 3 16:34:49 2014] <[gorgo]> was asking me about what else is needed for afs, told him I'm not familiar with kafs [Thu Apr 3 16:36:51 2014] Not for anonymous access; for authenticated access yes, and I might package dhowells' utilities for that. [Thu Apr 3 17:25:03 2014] CybrFyre: hmm... hopefully that PERC failure was not recent... I have several TBs on PERC6 and PERCH700 [Thu Apr 3 17:27:55 2014] no, it was a few years back [Thu Apr 3 17:28:03 2014] one major reason I dumped Dell for server stuff [Fri Apr 4 11:31:57 2014] nwf: does the cell argument on "fs mkmount" always require a switch label to be given, do you know? or can the "-cell" label be omitted? [Fri Apr 4 11:32:19 2014] cell should usually be omitted. [Fri Apr 4 11:33:06 2014] kaduk_: so you should normally do "fs mkmount " without interpolating "-cell" before ? [Fri Apr 4 11:33:28 2014] normally you wouldn't include the final cell at all [Fri Apr 4 11:33:30 2014] dhowlls: just 'fs mkmount ' [Fri Apr 4 11:33:38 2014] that's not what I'm asking [Fri Apr 4 11:33:43 2014] (oops, keyboard malfunction on your nick) [Fri Apr 4 11:34:16 2014] *if* you put the argument in, do you *have* to preface it with "-cell" (possibly abbreviated)? [Fri Apr 4 11:34:27 2014] given that it's an optional argument? [Fri Apr 4 11:34:54 2014] I believe you don't need to prefix it with '-cell', given the way our wacky command parser works. [Fri Apr 4 11:35:01 2014] okay, thanks [Fri Apr 4 11:35:21 2014] http://docs.openafs.org/Reference/1/afs.html gives two sets of rules, but they don't quite intersect [Fri Apr 4 11:36:23 2014] it's screwy [Fri Apr 4 11:36:33 2014] yeah, I noticed [Fri Apr 4 11:37:05 2014] replacing the command parser is one of those things that is perennially on people's to-do lists [Fri Apr 4 11:38:02 2014] the problem is that it has to be backwardly compatible, right? [Fri Apr 4 11:38:17 2014] Pretty much. 
[Fri Apr 4 11:38:34 2014] Maybe not if there is a major version bump involved, but even then it's debatable. [Fri Apr 4 11:39:16 2014] imagine being chased by a baying mob of sysadmins with pitchforks and torches because their pet scripts all broke [Fri Apr 4 11:39:48 2014] well, the right answer is a whole new command suite... but too much of the command intelligence is tied up in the current one instead of useful reusable libraries (this is why most language bindings end up wrapping AFS shell commands) [Fri Apr 4 11:40:04 2014] there's a lot of refactoring that should happen to enable doing this stuff right [Fri Apr 4 11:49:17 2014] <[gorgo]> my colleagues say that the cern eakc t-shirt depicts the large inode collider [Fri Apr 4 11:51:37 2014] how about the large vnode collider, as opposed to the small vnode collider [Fri Apr 4 11:52:05 2014] <[gorgo]> sounds better [Fri Apr 4 15:03:07 2014] hrm... seems that wine does not get along well with root.afs ... lots of hangs trying to talk to other cells [Fri Apr 4 15:03:27 2014] dynroot? [Fri Apr 4 15:03:31 2014] yep [Fri Apr 4 15:03:48 2014] With dynroot in play, root.afs should not be getting consulted at all, IIRC. [Fri Apr 4 15:04:01 2014] tho, dynroot seems to pre-populate with what's in the cellservdb [Fri Apr 4 15:04:04 2014] do you mean fakestat? [Fri Apr 4 15:04:11 2014] fakestat-all, actually [Fri Apr 4 15:05:42 2014] either way, it's hanging presumably because wine is stat'ing those folders in there which belong to other cells [Fri Apr 4 15:06:12 2014] probably not wine /per se/, file managers love to do that [Fri Apr 4 15:06:29 2014] well, it's the "save" dialog in an application running under Wine [Fri Apr 4 15:06:54 2014] either way, dmesg was slowly filling up with timeouts to remote servers [Fri Apr 4 15:08:00 2014] oh, yeh, windows file dialogs.
which are still in some sense the file manager [Fri Apr 4 15:08:43 2014] well, whatever the case, fakestat doesn't appear to be helping there [Fri Apr 4 15:09:19 2014] urgh [Fri Apr 4 15:09:30 2014] fakestat-all should do that [Fri Apr 4 15:09:55 2014] clearly not [Fri Apr 4 15:09:57 2014] but a sufficiently persistent program can defeat that [Fri Apr 4 15:22:38 2014] I don't get the impression it's being overly persistent [Fri Apr 4 15:28:07 2014] What I see going across the wire to the remote servers are "rx version" to the fileservers and "call get-entry-by-name root.cell" "rx data vldb call probe" [Fri Apr 4 15:29:05 2014] that sounds like fakestat isn't on [Fri Apr 4 15:29:38 2014] root 12546 1 0 Mar27 ? 00:00:00 /usr/vice/etc/afsd -dynroot -fakestat-all -afsdb [Fri Apr 4 15:44:57 2014] CybrFyre: You might consider shipping a minimal CellServDB on your clients? [Fri Apr 4 15:46:00 2014] that's certainly something I could do in the future [Mon Apr 7 14:18:31 2014] secureendpoints1 - MS is archiving my explorer shell ticket since it's a dupe and since there is a level 4 engineer working on the issue on the "duped of" ticket with you [Tue Apr 8 07:23:41 2014] OK, I’ve now lost a night’s sleep to this; I’ve got a kerberos system setup, but I can’t seem to get openafs working. when using afs-newcell then afs-rootvol it’s failing to modify the /afs ACL. when following instructions to do it all manually, it’s failing on adding a non-admin user. I’m at my wits' end here, what on earth is the secret? [Tue Apr 8 07:24:05 2014] tried with -localauth ? [Tue Apr 8 07:24:58 2014] Yes, but at that point shouldn’t I be able to use my auth tokens? [Tue Apr 8 07:25:21 2014] I can add the user with -localauth, that doesn’t make the auth tokens suddenly start working for the fs commands to add the acl [Tue Apr 8 07:26:43 2014] at which point are you?
I do not know the newcell script [Tue Apr 8 07:27:07 2014] usually you create the servers, you create the volumes and set ACLs with localauth [Tue Apr 8 07:27:26 2014] the fs command accepts localauth? o.o [Tue Apr 8 07:27:37 2014] after the root.afs and cell.afs volumes are up and with correct ACLs, and you have created the admin user, you restart the OpenAFS [Tue Apr 8 07:28:11 2014] I’ve tried following http://techpubs.spinlocksolutions.com/dklar/afs.html and https://openafs.dk/doku.php?id=server:openafs [Tue Apr 8 07:28:30 2014] the first says afs-newcell and co, the second the raw commands [Tue Apr 8 07:29:01 2014] I'm not familiar with afs-newcell, but if you are running in dynamic root mode, you won't be able to change the ACL on the root (/afs) volume - it's dynamically generated [Tue Apr 8 07:29:34 2014] yeah, I stumbled upon that as well, so I disabled DYNROOT temporarily [Tue Apr 8 07:30:02 2014] http://mtn.i2p-projekt.de/openafs/Setup.Cell.txt are my (older) notes to create a cell [Tue Apr 8 08:10:14 2014] Amiga4000: that’s filing on the vos create … a root.afs [Tue Apr 8 08:10:20 2014] failing* [Tue Apr 8 08:10:27 2014] vos just hangs [Tue Apr 8 08:11:05 2014] until it eventually times out with Possible Communications Failure [Tue Apr 8 08:15:39 2014] even with localauth? [Tue Apr 8 08:16:38 2014] wait, -noauth it is for me [Tue Apr 8 08:17:06 2014] but I did create a few OpenAFS cells with the short script I posted above [Tue Apr 8 08:24:23 2014] ugh, this makes no sense. Why is it failing so hard. D: [Tue Apr 8 08:33:24 2014] oh. I missed a line, that’s possibly why! [Tue Apr 8 08:44:20 2014] argh, now I’m can’t mount /afs to do the last section of that file [Tue Apr 8 08:44:32 2014] Ugh, sorry for the broken english, I’m running on 0 sleep [Tue Apr 8 08:56:29 2014] Amiga4000: what’s the latest version of OpenAFS you’ve used that on? [Tue Apr 8 09:19:01 2014] AmandaC: 1.6.1 IMHO [Tue Apr 8 09:19:20 2014] Oh.
the servers weren’t started [Tue Apr 8 09:19:29 2014] now I’m back to the familiar fs: You don't have the required access rights on '/afs' [Tue Apr 8 09:19:59 2014] started with -noauth? [Tue Apr 8 09:20:07 2014] no, should it be? [Tue Apr 8 09:22:18 2014] restarting bosserver with -noauth doesn’t make the setacl command succeed [Tue Apr 8 09:24:17 2014] Running with -noauth won't help fs setacl succeed [Tue Apr 8 09:24:27 2014] Do you have tokens? [Tue Apr 8 09:24:32 2014] What does "tokens" say? [Tue Apr 8 09:24:51 2014] https://www.irccloud.com/pastebin/Qr4GaCAZ [Tue Apr 8 09:25:50 2014] ^ tokens [Tue Apr 8 09:26:36 2014] And is the user that you have those tokens for in system:administrators? [Tue Apr 8 09:27:26 2014] … wtf [Tue Apr 8 09:27:28 2014] I was in there! [Tue Apr 8 09:29:31 2014] Well, now I am again, and it’s still giving me “You don’t have the re…" [Tue Apr 8 09:34:27 2014] If you change your group membership you need to drop and reacquire tokens in order for that change to take effect [Tue Apr 8 09:34:36 2014] Either aklog -f, or unlog followed by aklog [Tue Apr 8 09:34:37 2014] ah [Tue Apr 8 09:35:26 2014] would kdestroy; #do the whole dance again [Tue Apr 8 09:35:28 2014] work? [Tue Apr 8 09:35:58 2014] No. kdestroy doesn't remove your AFS tokens. unlog does that. [Tue Apr 8 09:36:54 2014] ah [Tue Apr 8 09:39:45 2014] Still getting the error after unlog; aklog [Tue Apr 8 09:41:04 2014] You're trying to do fs setacl /afs, right? [Tue Apr 8 09:41:11 2014] yep [Tue Apr 8 09:41:13 2014] And you've definitely got dynroot disabled? [Tue Apr 8 09:41:18 2014] yep [Tue Apr 8 09:41:31 2014] what does "fs examine /afs" report? [Tue Apr 8 09:41:49 2014] same error: fs: You don't have the required access rights on '/afs' [Tue Apr 8 09:42:02 2014] ( for fs examine as well ) [Tue Apr 8 09:42:24 2014] and does your kerberos realm name match your cell name except that cell name is lower case and realm name is upper case?
[Tue Apr 8 09:42:51 2014] nope, different names. I’m under the impression that should be possible, though [Tue Apr 8 09:43:24 2014] and I have the REALM.NAME in /etc/openafs/krb.conf [Tue Apr 8 09:43:25 2014] yes, but you need to tell the servers that via the krb.conf file [Tue Apr 8 09:43:36 2014] on all of your servers [Tue Apr 8 09:43:53 2014] There’s only the one cell server [Tue Apr 8 09:44:27 2014] the kerberos realm is on a different server though, does that need poking as well? [Tue Apr 8 09:44:47 2014] can you "pts membership system:administrators" [Tue Apr 8 09:45:17 2014] https://www.irccloud.com/pastebin/Qum4omhY [Tue Apr 8 09:45:53 2014] and your token is for amanda.admin and amanda.admin is listed in the UserList file? [Tue Apr 8 09:46:22 2014] your kerberos ticket is for amanda/admin@REALM ? [Tue Apr 8 09:46:29 2014] yep, and yep [Tue Apr 8 09:47:04 2014] the only way to find out who the file server thinks you are is to turn on the audit log [Tue Apr 8 09:47:25 2014] How does one do that? [Tue Apr 8 09:47:38 2014] it is documented on the fileserver man page [Tue Apr 8 09:47:55 2014] I have to run to a meeting [Tue Apr 8 09:52:01 2014] audit log says: amanda.admin@DARKDNA.NET [Tue Apr 8 09:52:42 2014] ( Which is the correct realm ) [Tue Apr 8 11:16:14 2014] just to check, if I want to change the options to a server process (ie add the -auditlog option to the fileserver), I can just edit BosConfig and then do a bos restart of the fileserver? [Tue Apr 8 11:16:39 2014] No [Tue Apr 8 11:18:10 2014] BosConfig is only read when the bosserver itself restarts [Tue Apr 8 11:18:11 2014] ok, best way then? [Tue Apr 8 11:18:21 2014] ok, so I can do a bos restart -bossserver ?
[Tue Apr 8 11:18:43 2014] er, bos restart -all -bosserver, I think it is [Tue Apr 8 11:18:45 2014] Also, the bosserver writes to BosConfig when it shuts down, so any changes you make there are overwritten when the bos server restarts [Tue Apr 8 11:19:15 2014] yeh, there's a hack (possibly needs to be enabled at build time) with BosConfig.new though, IIRC [Tue Apr 8 11:19:23 2014] so, again, best way? [Tue Apr 8 11:19:28 2014] Yeah, BosConfig.new is enabled by default [Tue Apr 8 11:19:32 2014] but then you get in trouble if there were any changes made to the config using bos [Tue Apr 8 11:19:58 2014] so the correct way to do it is through bos, not by hacking BosConfig / BosConfig.new, unless you make all changes using BosConfig.new [Tue Apr 8 11:20:01 2014] Yeah, don't do that. [Tue Apr 8 11:20:06 2014] so, I can cp BosConfig BosConfig.new ... edit BosConfig.new to add the -auditlog option, then bos restart -all -bosserver ? [Tue Apr 8 11:20:30 2014] bos does not appear to have an "update" option that I can see... I'd rather not delete then recreate the fileserver process [Tue Apr 8 11:21:24 2014] ok, I see BosConfig.new in the man page for BosConfig [Tue Apr 8 11:24:35 2014] any recommendations on how often to rotate out the auditlog ? [Tue Apr 8 11:25:53 2014] which, if doing with logrotate, I presume I want to use the "copytruncate" option [Tue Apr 8 11:29:46 2014] CybrFyre: I would presume the log rotation depends on how busy your server is [Tue Apr 8 12:04:16 2014] how do you tell from a vldbentry if an entry is locked? is VLLOCKED set in vldbentry::flags? [Tue Apr 8 12:06:55 2014] LockTimestamp is non-0 [Tue Apr 8 12:07:39 2014] sxw1: there isn't such a field named in struct vldbentry [Tue Apr 8 12:08:10 2014] is it in one of the spare fields in [un]vldbentry?
[Tue Apr 8 12:10:07 2014] Actually, you can get it from flags [Tue Apr 8 12:10:21 2014] One of VLOP_ALLOPERS being set means that the entry is locked [Tue Apr 8 12:11:09 2014] 0x10 | 0x20 | 0x40 | 0x80 | 0x100 [Tue Apr 8 12:11:31 2014] (corresponding to locked for move, locked for release, locked for backup, locked for delete and locked for dump) [Tue Apr 8 12:11:51 2014] ah... so the VLOP_xxx flags? [Tue Apr 8 13:05:41 2014] Back, sorry, ended up taking a short nap. [Tue Apr 8 13:06:39 2014] should the audit log be saying amanda.admin or should it be saying amanda/admin? [Tue Apr 8 13:47:40 2014] Just to be clear, what format should the krb.conf be in? I’ve just got REALM, should it be @REALM or similar [Tue Apr 8 13:48:55 2014] krb.conf should be a single line containing a space-separated list of realms e.g. REALM1.COM REALM2.EDU REALM3.NET [Tue Apr 8 13:49:30 2014] Thought so [Tue Apr 8 13:49:36 2014] then I’m at a los [Tue Apr 8 13:49:38 2014] loss* [Tue Apr 8 14:10:42 2014] Figured stout! [Tue Apr 8 14:10:46 2014] figured it out* [Tue Apr 8 14:11:08 2014] Turns out I needed to also have the bloody krb.conf in /etc/openafs/server [Tue Apr 8 14:11:19 2014] that's actually the only place you need it [Tue Apr 8 14:11:28 2014] Ah, I see [Tue Apr 8 14:42:36 2014] I assume it’s safe to turn dynroot back on after the afs server’s been all set up? [Tue Apr 8 14:46:08 2014] AmandaC: should be safe, yes [Tue Apr 8 14:46:16 2014] k [Tue Apr 8 15:00:06 2014] btw, does this openssl vuln affect openafs? [Tue Apr 8 15:00:25 2014] openafs does not use openssl. [Tue Apr 8 15:01:23 2014] k. won't worry about restarting openafs stuff after updating openssl, then [Tue Apr 8 15:01:50 2014] It doesn't even affect everything that uses OpenSSL [Tue Apr 8 15:02:06 2014] true, but unless someone has a definitive list, hard for me to know [Tue Apr 8 15:02:27 2014] The second version of the debian advisory included a tool that did some checking, IIRC.
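The lock test described above amounts to AND-ing `vldbentry.flags` with the OR of the per-operation bits. Below is a small sketch using the hex values quoted in the channel. The Python constant names follow the `VLOP_xxx` naming mentioned there but are written out by hand for illustration, so check them against the VL headers in your own OpenAFS tree before relying on them.

```python
"""Check whether a VLDB entry is locked, using the flag values quoted above."""

VLOP_MOVE = 0x10
VLOP_RELEASE = 0x20
VLOP_BACKUP = 0x40
VLOP_DELETE = 0x80
VLOP_DUMP = 0x100
VLOP_ALLOPERS = VLOP_MOVE | VLOP_RELEASE | VLOP_BACKUP | VLOP_DELETE | VLOP_DUMP

LOCK_NAMES = {
    VLOP_MOVE: "move",
    VLOP_RELEASE: "release",
    VLOP_BACKUP: "backup",
    VLOP_DELETE: "delete",
    VLOP_DUMP: "dump",
}


def is_locked(flags):
    """An entry is locked if any per-operation bit is set."""
    return bool(flags & VLOP_ALLOPERS)


def lock_reasons(flags):
    """List the operations the entry is locked for (empty if unlocked)."""
    return [name for bit, name in LOCK_NAMES.items() if flags & bit]


if __name__ == "__main__":
    entry_flags = VLOP_RELEASE | VLOP_DUMP   # hypothetical flags value
    print(is_locked(entry_flags), lock_reasons(entry_flags))
```

A mask is used rather than an equality test because the `flags` word also carries other, unrelated bits; only the five operation bits decide whether the entry is locked.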
[Tue Apr 8 15:02:42 2014] If you're using OpenSSL to provide a TLS based service that's internet visible, then there are probably reasons that you should be concerned. [Tue Apr 8 16:23:43 2014] hmm... seems 1.7.29 win is refusing to refind a server that was "bos restart"ed [Tue Apr 8 16:33:36 2014] ended up having to push the button... hopefully no file corruption [Tue Apr 8 20:08:43 2014] so, is there a flow for migrating an openafs cell between kerberos implementations? heimdal to mit, say. [Tue Apr 8 20:09:03 2014] Migrating ... what KDC it talks to? [Tue Apr 8 20:09:14 2014] hm [Tue Apr 8 20:09:20 2014] yes, i guess [Tue Apr 8 20:09:28 2014] that's pretty straightforward, isn't it [Tue Apr 8 20:09:34 2014] I wouldn't think that anything special would be needed for that. [Tue Apr 8 20:11:44 2014] well, what if I wasn't able to migrate the data? (I don't know much about Kerberos or AFS, sadly, so stop me if this is stupid) if I had to make all new principals in MIT Kerberos, would there be any problems? [Tue Apr 8 20:11:45 2014] potentially recompile aklog [Tue Apr 8 20:12:46 2014] worst case for the server-side should just be changing the afs service principal in the KeyFile [Tue Apr 8 20:13:22 2014] unless there is something I can't think of that actually directly touches Kerberos... [Tue Apr 8 20:13:29 2014] cclausen: this is the case for when I make new principals? [Tue Apr 8 20:13:53 2014] (or should that not matter?) [Tue Apr 8 20:14:14 2014] sbaugh: yes, new KDC principals require new KeyFile (unless you know the old password and can regenerate a matching service principal) [Tue Apr 8 20:14:51 2014] interesting [Tue Apr 8 20:15:05 2014] this doesn't sound that bad, happily [Tue Apr 8 20:15:36 2014] note that client side krb5.conf files may need to be changed. Afs itself doesn't care, but aklog or various PAM that gets AFS tokens might [Tue Apr 8 20:16:07 2014] the MIT and Heimdal settings are not completely compatible in all cases.
(At least I had some issues on Windows with the krb5 configs going from MIT to Heimdal.) [Tue Apr 8 20:16:21 2014] right [Tue Apr 8 20:16:28 2014] and that was just changing client libs, not doing any KDC change [Tue Apr 8 20:18:46 2014] what I'm actually migrating is from Heimdal to FreeIPA's wrapper around MIT Kerb, and FreeIPA will hopefully make the client-server migration pretty easy, but didn't know if OpenAFS would have trouble with it [Tue Apr 8 20:19:04 2014] er, client migration* [Tue Apr 8 20:22:36 2014] The AFS and kerberos protocols don't overlap very much, and when they do it tends to be the standardized parts of kerberos, so it wouldn't matter which implementation is actually used. Changing the service's private keys in the KeyFile/rxkad.keytab is likely to be the only "complicated" part. [Tue Apr 8 20:24:57 2014] well, my concern is more about the loss of all principals and recreating them, but it sounds like that's not actually an issue? I guess a general Kerberos question: they don't have any information in them, they're just authentication targets? [Tue Apr 8 20:28:37 2014] not in unix. windows loads extra information into them, but a principal in the KDC is just principal name and a list of keytypes and keys (and metadata for management purposes) [Tue Apr 8 20:28:46 2014] that is, things like last change time etc. [Tue Apr 8 20:29:22 2014] that said, don't underestimate the pain caused by users having to all roll new passwords... [Tue Apr 8 20:29:40 2014] hooray, thanks [Tue Apr 8 20:31:00 2014] yeah, but there's only 20 or 30 active users, so it shouldn't be too bad [Tue Apr 8 20:31:59 2014] The AFS server just checks "is this the local realm" and strips the realm for users in the local realm. You may need to put a krb.conf in place to say what realm to use as the local realm, but the prdb should be okay. 
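Pulling together the realm handling from this exchange: the server reads krb.conf (a single line of space-separated realms) from its server configuration directory, which was /etc/openafs/server in the setup discussed here, treats principals from those realms as local, and drops the realm; the audit log entry above also shows the instance separator `/` written as `.`. The sketch below models that. The function names are hypothetical, the fallback to the upper-cased cell name is inferred from the earlier question about realm and cell names matching, and foreign-realm naming is deliberately simplified.

```python
"""How a server might turn a Kerberos principal into a PTS-style name."""
from __future__ import annotations


def local_realms(krb_conf_text: str | None, cell: str) -> list[str]:
    """Realms the server treats as local.

    krb.conf is read as one line of space-separated realm names. Assumption:
    with no usable krb.conf, the local realm is the cell name upper-cased.
    """
    lines = (krb_conf_text or "").splitlines()
    realms = lines[0].split() if lines else []
    return realms or [cell.upper()]


def pts_name(principal: str, realms: list[str]) -> str:
    """Map a principal such as amanda/admin@DARKDNA.NET to a PTS-style name."""
    name, at, realm = principal.rpartition("@")
    if not at:                       # no realm part: treat it as local
        name, realm = principal, ""
    name = name.replace("/", ".")    # instance separator: amanda/admin -> amanda.admin
    if not realm or realm in realms:
        return name                  # a local realm is stripped
    # Simplification: foreign principals just keep their realm suffix here;
    # real cross-realm naming rules are not modelled.
    return f"{name}@{realm}"


if __name__ == "__main__":
    cell = "example.org"             # hypothetical cell name differing from the realm
    for conf in (None, "DARKDNA.NET\n"):
        print(pts_name("amanda/admin@DARKDNA.NET", local_realms(conf, cell)))
```

In this model, with no server-side krb.conf the principal keeps its realm (`amanda.admin@DARKDNA.NET`, like the audit log entry) and so would not match a plain `amanda.admin` entry in system:administrators; once `DARKDNA.NET` is listed, the realm is stripped. That is consistent with the fix found in the channel, though the model is not the server's actual code.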
[Tue Apr 8 20:32:08 2014] actually there are a few other things but mostly you can "automate" setting them: default/maximum ticket life, principal flags (usually you have one set for users and one for service principals) [Tue Apr 8 20:32:58 2014] if you're going to MIT, you can define those in policies and simplify recreating the principals that way [Tue Apr 8 20:33:16 2014] by referencing the policy instead of having to specify them for every principal [Tue Apr 8 20:39:40 2014] that's helpful, thanks [Wed Apr 9 14:51:14 2014] dbrashear: can you tell me what the blkaddr list returned by VL.GetAddrs() represents? [Wed Apr 9 14:52:19 2014] it looks like a list of 127.0.0.1, 127.0.0.2, 127.0.0.4, 127.0.0.6, 127.0.0.7 for your-file-system.com [Wed Apr 9 14:53:04 2014] if I pass these to VL.GetAddrsU() with VLADDR_IPADDR, I get back an abort [Wed Apr 9 14:57:40 2014] you're calling GetAddrs and not GetAddrsU? [Wed Apr 9 14:59:16 2014] it's what openafs-1.6.6 seems to do [Wed Apr 9 14:59:23 2014] (they're a list of IPs, where the next to last field in the RPC is a returned number representing the number of addresses) [Wed Apr 9 14:59:44 2014] "vos listaddrs -c your-file-system.com" [Wed Apr 9 15:00:40 2014] vos listaddrs gets a list of addresses and then walks the list and does GetAddrsU to get all the addresses for each address, istr [Wed Apr 9 15:00:42 2014] leastways, openafs calls GetAddrs then GetAddrsU with VLADDR_INDEX, incrementing the index each time [Wed Apr 9 15:01:18 2014] I was trying to call GetAddrs then GetAddrsU with VLADDR_IPADDR for each address returned [Wed Apr 9 15:01:19 2014] yeah, ok. that sounds plausible. anyway, i inadvertently epoched away a couple messages. what issue are you having?
[Wed Apr 9 15:01:54 2014] GetAddrs() returns a list of addresses that are all 127.0.0.x - where x seems to correspond to the index number
[Wed Apr 9 15:02:14 2014] but GetAddrsU() with VLADDR_IPADDR doesn't like the addresses
[Wed Apr 9 15:03:29 2014] I was trying to understand just what GetAddrs() returns in the address list
[Wed Apr 9 15:04:09 2014] I can send you a pcap file if you want a look
[Wed Apr 9 15:04:52 2014] I wonder if you're getting the MH references back.
[Wed Apr 9 15:04:59 2014] MH?
[Wed Apr 9 15:05:14 2014] Internal vlserver nonsense. The way it handles multihomed addresses.
[Wed Apr 9 15:05:14 2014] nah, i am refreshing my memory. it's a server id which is mapped into the IP address space
[Wed Apr 9 15:05:40 2014] ah... so it's not an IP address?
[Wed Apr 9 15:06:09 2014] Once upon a time, it probably was. And if you have any "old" fileservers, you probably get a real IP back from them
[Wed Apr 9 15:06:52 2014] I guess that's why VLADDR_INDEX is used, then
[Wed Apr 9 15:07:29 2014] they're not addresses
[Wed Apr 9 15:08:16 2014] Are they not addresses for some types of servers? They're the contents of the IpMappedAddr block in the vldb header, which looks like once upon a time it would have contained addresses
[Wed Apr 9 15:08:26 2014] I see
[Wed Apr 9 15:08:33 2014] is there a limit to the number of 'index' slots that a vldb server may have?
[Wed Apr 9 15:08:37 2014] istr they are addresses for pre-MH servers
[Wed Apr 9 15:09:12 2014] GetAddrs() returns a count, but there can be holes in the index list
[Wed Apr 9 15:09:21 2014] I see vos listaddrs skipping over them when they abort
[Wed Apr 9 15:10:17 2014] Limit of 255 currently. But that's a vlserver limitation, rather than a protocol one.
[Wed Apr 9 15:10:38 2014] but in theory there can be a race where, say, the last entry is deleted before you get that far - so you never find the last simply by incrementing and trying again
[Wed Apr 9 15:42:32 2014] You get a specific abort code if you ask for an index that's too large
[Wed Apr 9 15:47:07 2014] VL_INDEXERANGE (363549L)
[Wed Apr 9 15:47:18 2014] so I see
[Wed Apr 9 15:48:03 2014] Really, there should be a VL_GetUUIDS that returns every fileserver UUID, and then you call VL_GetAddrsU on each returned UUID
[Thu Apr 10 12:16:09 2014] The IT department is forcing me to rename my domain. I want to rename the AFS zone to match. I haven't found anything about this doing a quick google search. Do people know of a document detailing how to do this?
[Thu Apr 10 12:16:20 2014] dev.ru.is will become hack.ru.is
[Thu Apr 10 12:18:03 2014] Oh, I'm very silly. My brain said zone. If you search for "cell", you get something useful http://lists.openafs.org/pipermail/openafs-info/2012-April/038121.html
[Thu Apr 10 12:57:34 2014] foley: good luck with the rename
[Thu Apr 10 13:30:06 2014] I missed a chance to hear Jeff talking about YFS yesterday although I was sick :/
[Thu Apr 10 13:31:10 2014] fang64: Anything in particular you wanted to know?
[Thu Apr 10 18:43:14 2014] how is the freebsd openafs-client port these days?
[Thu Apr 10 21:33:15 2014] how is the freebsd client port these days?
[Fri Apr 11 02:14:31 2014] chrisb: For light usage it seems fine. I've run into some troubles with large files, tho'.
[Fri Apr 11 02:15:30 2014] https://rt.central.org/rt/Ticket/Display.html?id=131773 .
[Fri Apr 11 02:15:47 2014] I think it's been better on FBSD 10-STABLE than it was on 8-STABLE, but I haven't prodded it extensively.
[Fri Apr 11 19:13:39 2014] (now, how was it i build a netbsd kernel module?)
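The index walk discussed above (GetAddrs for a count, then GetAddrsU with VLADDR_INDEX, skipping holes and stopping on VL_INDEXERANGE) is sketched below in shell. Plain shell cannot issue the RPC, so `walk_vl_indices` takes the name of a hypothetical lookup function that would wrap one GetAddrsU call and print either `ok <addresses>` or `abort <code>`. The starting index and the upper cap (255, the vlserver limit mentioned above) are parameters; check the `vos listaddrs` source for the values it actually uses.

```sh
VL_INDEXERANGE=363549   # abort code for "index too large", quoted above

# walk_vl_indices LOOKUP START MAX
# LOOKUP: name of a hypothetical function that takes an index and prints
#   "ok <addr> <addr>..."  or  "abort <code>"
# Holes (any other abort) are skipped; VL_INDEXERANGE ends the walk, and
# MAX is a safety cap in case the terminating abort never arrives.
walk_vl_indices() {
    lookup=$1
    index=$2
    max=$3
    while [ "$index" -le "$max" ]; do
        reply=$("$lookup" "$index")
        set -- $reply                # $1 = status word, rest = payload
        if [ "$1" = ok ]; then
            shift
            printf '%s: %s\n' "$index" "$*"
        elif [ "$2" = "$VL_INDEXERANGE" ]; then
            return 0                 # past the last slot: done
        fi
        index=$((index + 1))
    done
}
```

Ending on VL_INDEXERANGE rather than on the count from GetAddrs sidesteps the race mentioned above, where an entry deleted mid-walk means the count is never reached.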
[Fri Apr 11 19:13:50 2014] (i should know, i'm like the only person that should know)
[Fri Apr 11 19:14:41 2014] config.status: error: cannot find input file: `src/shlibafsauthent/Makefile.in'
[Fri Apr 11 19:18:19 2014] oh duh.
[Sun Apr 13 02:41:38 2014] Is the DNS discovery of AFS domains enabled by default?
[Sun Apr 13 02:42:19 2014] That is, if I try and go to /afs/foo.example.com, will it look up foo.example.com for the relevant details?
[Sun Apr 13 06:12:53 2014] So it seems that is, indeed, the case.
[Sun Apr 13 06:13:37 2014] Next question: what’s the practicality of multiple nodes in the same cell?
[Sun Apr 13 07:00:26 2014] <[gorgo]> you mean multiple servers?
[Sun Apr 13 07:01:10 2014] <[gorgo]> disk space, redundancy
[Sun Apr 13 09:19:31 2014] many of our customers have widely distributed cells, with servers providing locality --- that is, data most relevant to a particular location is stored (or replicated, as appropriate) on file servers at that location
[Sun Apr 13 09:35:19 2014] Hrm
[Sun Apr 13 10:24:40 2014] multiple file servers give you additional space, redundancy of r/o data, and allow you to do non-disruptive maintenance on fileservers (empty a file server, wait 2 hours, take it down, nobody will notice)
[Sun Apr 13 10:25:06 2014] (usually)
[Sun Apr 13 10:25:18 2014] there's some oddness with r/w volumes being poked at
[Sun Apr 13 10:25:55 2014] ah.
[Sun Apr 13 10:26:16 2014] R/w stability is somewhat important
[Sun Apr 13 10:27:26 2014] specifically unix cache managers can inappropriately hold on to r/w volumes that have been moved if the fileserver that used to hold them goes away completely and they have not had reason to contact the new server. it doesn't come up very often
[Sun Apr 13 10:27:54 2014] (it's a little more complex than that actually)
[Sun Apr 13 10:29:37 2014] I’m considering maybe using it for syncing /home on a small set of servers to start with
[Sun Apr 13 10:34:29 2014] your use of the word syncing has me mildly worried. what do you mean by that?
[Sun Apr 13 10:35:02 2014] (i only ask because sometimes people get ideas about what afs can do that don't match what it can actually do, so i always like to clarify what exactly someone is trying to do)
[Sun Apr 13 10:36:28 2014] kula: I don’t mean like rsync, I mean like “Same data everywhere”, is all. :p
[Sun Apr 13 10:36:47 2014] ah, good. just checking.
[Sun Apr 13 10:44:19 2014] How exactly does the multiple mounts thing work? Like the /vicepa I had to make, can I add a /vicepb to the same cell later on?
[Sun Apr 13 10:50:16 2014] yeah. when you create an afs volume, you specify which fileserver and vice partition it lives on.
[Sun Apr 13 10:51:16 2014] so, for example, my home directory is a volume called u.kula, it's on a particular server and vice partition, but i can move it to another one seamlessly.
[Sun Apr 13 10:52:16 2014] when a client needs to access that volume, it asks the volume location db where it currently is, and talks to the fileserver there.
[Sun Apr 13 13:20:16 2014] How would I express “Users in my domain in AFS can read/write/see these files, but nobody else can”?
[Sun Apr 13 13:22:31 2014] AmandaC: system:authuser rlidwk
[Sun Apr 13 13:23:26 2014] ah, thanks
[Sun Apr 13 13:24:09 2014] that is “*authenticated* users in my domain in AFS can read/write/see these files, but nobody else can”
[Sun Apr 13 13:25:39 2014] but if they're not authenticated, you can't tell what cell they're "in"
[Sun Apr 13 13:25:42 2014] there is still the issue that root on a client can do anything on the client including dig around in the local AFS disk cache
[Sun Apr 13 14:27:11 2014] Perhaps I’m being dense from my sleep deprivation, but how do I enable AFS token passing? the GSSAPIDelegateCredentials directive in my ~/.ssh/config doesn’t seem to be doing the job
[Sun Apr 13 14:32:23 2014] I think generally one doesn't pass the token directly, instead using GSSAPIDelegateCredentials and something in the session startup to get new tokens (pam_afs_session or an explicit aklog in the dotfiles).
[Sun Apr 13 14:33:29 2014] There is a "KerberosGetAFSToken" sshd_config option, but it's very unclear to me what it's actually supposed to do and in what scenarios it should be expected to work (I don't believe I've ever seen it work).
[Sun Apr 13 14:36:29 2014] afaik that only works with pre-gssapi kerberos support
[Sun Apr 13 14:36:30 2014] ah
[Sun Apr 13 14:37:07 2014] so what’s the “right way” assuming, for example, I want to stuff a user’s /home in there?
[Sun Apr 13 14:37:10 2014] my understanding is that GSSAPIDelegateCredentials handles kerberos ticket forwarding; getting a token from that requires that pam_openafs_session be configured
[Sun Apr 13 14:37:25 2014] (and sshd configured to run the PAM session stuff)
[Sun Apr 13 14:37:50 2014] (although on most systems the latter should be default as things break pretty badly if you don't set up a PAM session)
[Sun Apr 13 14:38:18 2014] ah
[Sun Apr 13 14:47:20 2014] So, from what I gather libpam-afs-session is what I want. It then uses libpam-krb5; does libpam-krb5 interact with the GSSAPIAuthentication setting in the sshd?
[Sun Apr 13 14:48:11 2014] it should not be relevant, since if you are using gssapi it's not going through (and can't go through) pam
[Sun Apr 13 14:48:52 2014] the pam documentation assumes that you're using pam for everything, but the actual authentication in sshd (except in the exact case of using a normal password) does not use pam for auth, only for session
[Sun Apr 13 14:48:53 2014] ah… I see.
[Sun Apr 13 15:05:09 2014] Thanks for answering my silly questions - toodles
[Sun Apr 13 16:40:09 2014] anyone understand the intended semantics of VN_HOLD(), osi_vnhold(), and AFS_FAST_HOLD()?
[Mon Apr 14 15:57:57 2014] hrm... perhaps having the openafs server rpms restart the server processes is not the best answer
[Mon Apr 14 16:00:07 2014] why?
[Mon Apr 14 16:03:19 2014] It's a user-visible outage if you're not paying attention?
[Mon Apr 14 16:03:37 2014] it's also a user-visible outage if you are
[Mon Apr 14 16:03:43 2014] I neuter the debian init scripts when I take updates in my production cell, and let the weekly restart take care of picking things up.
[Mon Apr 14 16:03:46 2014] since putting the updated software on the system and restarting for the update should not necessarily be the same step
[Mon Apr 14 16:03:52 2014] (Though, I should probably get rid of the weekly restart...)
[Mon Apr 14 16:04:05 2014] * was expecting to install the RPMs then restart server processes at a later time
[Mon Apr 14 16:04:14 2014] and was unpleasantly surprised when the servers also restarted
[Mon Apr 14 16:08:13 2014] I was going to say "weekly restart?"
[Mon Apr 14 16:08:55 2014] I should also switch to demand-attach ... too many things, too little time.
[Mon Apr 14 16:09:16 2014] In theory, there are other people maintaining the cell as well.
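For the "explicit aklog in the dotfiles" alternative mentioned above (pam_afs_session being the other route), a minimal sketch for a login script follows. It assumes GSSAPIDelegateCredentials has already forwarded a ticket, that `klist -s` reports through its exit status whether a valid ticket exists, and that `aklog` is on PATH; `refresh_afs_token` is a hypothetical name.

```sh
# Sketch for a login dotfile (e.g. ~/.profile): if the ssh session arrived
# with a forwarded Kerberos ticket, turn it into an AFS token.
# Assumes `klist -s` is silent and signals "valid ticket present" via exit 0.
refresh_afs_token() {
    if klist -s 2>/dev/null; then
        aklog
    else
        echo "no Kerberos ticket found; not running aklog" >&2
        return 1
    fi
}
# After defining it, the dotfile would simply call: refresh_afs_token
```

pam_afs_session is the more robust choice, since it also covers sessions that never read the user's dotfiles.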
[Mon Apr 14 16:09:37 2014] * thinks correct behavior here is no auto restart, and review configs before manual restart "just in case" (not really necessary for openafs usually, this is more a general thing)
[Mon Apr 14 16:17:20 2014] For Debian-like systems using a policy-rc.d may be a more elegant way to prevent automatic restarts than "neuter"ing the init scripts
[Mon Apr 14 16:18:26 2014] In this particular case, adding 'exit 0' to /etc/default/openafs-fileserver is quite reliable and easy.
[Mon Apr 14 16:18:34 2014] I don't pretend that it's a general solution, though.
[Mon Apr 14 16:24:44 2014] indeed, but it is a neat feature that is not apparent unless you're looking for it :)
[Mon Apr 14 16:59:30 2014] k... all unixy stuff is at a new enough version of 1.6... windows are all at 1.7.29... gotta figure out how to turn off single DES, now
[Mon Apr 14 17:10:36 2014] hunh... there's this AFS_USE_BINARY_RESTART in the init.d script
[Mon Apr 14 17:11:13 2014] tho, I'm wondering why it's if [ "x$AFS_USE_BINARY_RESTART" = "xyes" ]; then
[Mon Apr 14 17:11:23 2014] (note the extra 'x's in the variable name and value)
[Mon Apr 14 17:12:15 2014] that is a common technique in case AFS_USE_BINARY_RESTART is not defined
[Mon Apr 14 17:12:24 2014] Long ago, there was a sh(1) whose test(1builtin) implementation did the wrong thing with comparisons against the empty string. So portable code added an x at the beginning of each side of the comparison as a workaround.
[Mon Apr 14 17:12:28 2014] the extra x's ?
[Mon Apr 14 17:12:38 2014] (You also get problems if you forget to put the variable expansion in double quotes.)
[Mon Apr 14 17:13:26 2014] so what would one put in the sysconfig/openafs script, then?
[Mon Apr 14 17:13:38 2014] AFS_USE_BINARY_RESTART=xyes
[Mon Apr 14 17:13:39 2014] ?
[Mon Apr 14 17:13:39 2014] AFS_USE_BINARY_RESTART=yes
[Mon Apr 14 17:13:42 2014] or no
[Mon Apr 14 17:13:46 2014] no.
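The `x` prefix in the init-script test quoted above can be shown in isolation. `use_binary_restart` is a hypothetical wrapper around the same comparison; the point is that the `x` lives only in the script, while the setting itself is written plainly as `AFS_USE_BINARY_RESTART=yes`.

```sh
# Hypothetical wrapper around the init-script comparison.
# The leading "x" on both sides guards against old Bourne-shell test(1)
# implementations that mishandled an empty left operand (or one that looks
# like an operator); the double quotes keep an unset or empty variable from
# disappearing from the argument list altogether.
use_binary_restart() {
    [ "x$AFS_USE_BINARY_RESTART" = "xyes" ]
}
```

Setting the variable to `xyes` would make the script compare `xxyes` against `xyes` and quietly treat the option as off.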
[Mon Apr 14 17:13:59 2014] so, something there auto removes that x from the beginning of each string in that test?
[Mon Apr 14 17:14:01 2014] x$var where var is yes expands to xyes without your help
[Mon Apr 14 17:14:11 2014] that's..... interesting
[Mon Apr 14 17:14:30 2014] that x$var expands to x(thing in var)?
[Mon Apr 14 17:14:35 2014] not really.
[Mon Apr 14 17:14:49 2014] that you need that x trick
[Mon Apr 14 17:15:08 2014] the user needs no trick. the script may need it.
[Mon Apr 14 17:15:15 2014] tho, I'm also wondering why it seems to try and restart anyway if `grep -c 'checkbintime 16 0 0 0 0' /usr/afs/local/BosConfig` = 1
[Mon Apr 14 17:15:48 2014] afaik the x hack is only needed for ancient Bourne shell
[Mon Apr 14 17:15:52 2014] which was buggy
[Mon Apr 14 17:16:04 2014] I'd be surprised if any Linux needs it. sunos4, otoh...
[Mon Apr 14 17:16:56 2014] I'll put that in so that my next server rpm updates don't surprise me...
[Mon Apr 14 17:17:35 2014] and make sure my checkbintime is NOT 16 0 0 0 0
[Tue Apr 15 22:07:21 2014] is find(1) supposed to work in /afs?
[Tue Apr 15 22:08:06 2014] i get things where volume mount points aren't traversed and stuff
[Tue Apr 15 22:08:21 2014] find, or gnu find?
[Tue Apr 15 22:08:30 2014] gnu find is "special"
[Tue Apr 15 22:09:43 2014] NetBSD find(1) specifically
[Tue Apr 15 22:10:30 2014] what's special about gnu find?
[Tue Apr 15 22:11:08 2014] basically, i'm trying to figure out if this is an issue in the NetBSD port i'm working on or an issue in general
[Tue Apr 15 22:11:53 2014] i also get things like: ls: fts_read: No such file or directory
[Tue Apr 15 22:12:08 2014] gnu find tries to optimize stat-ing when deciding something is a directory. at least according to this man page.
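GNU find's `-noleaf` option turns off that leaf optimization. A sketch combining it with a `-prune` on OldFiles (conventionally a mount point to the user's read-only backup volume, and the source of the "File system loop detected" complaint later in this log) might look like this. `afs_find` is a hypothetical name, and `-noleaf` is a GNU extension, so this does not apply to NetBSD's fts-based find.

```sh
# List everything under a directory in AFS using GNU find, with the leaf
# optimization disabled and without descending into OldFiles mount points.
afs_find() {
    find "$1" -noleaf -name OldFiles -prune -o -print
}
```

Pruning by name is blunt: any ordinary directory that happens to be called OldFiles is skipped too.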
[Tue Apr 15 22:12:51 2014] i suppose i should install gnu find and see :-)
[Tue Apr 15 22:13:05 2014] http://savannah.gnu.org/bugs/?24140
[Tue Apr 15 22:15:38 2014] huh, looks like it's just netbsd's find code (fts) being sucky
[Tue Apr 15 22:15:55 2014] gfind is traversing stuff like i see on my Debian box
[Tue Apr 15 22:16:57 2014] mostly
[Tue Apr 15 22:19:23 2014] looks like i'm still having occasional trouble with lookups of ..
[Tue Apr 15 22:32:35 2014] recent gnu find is apparently not special any more
[Tue Apr 15 22:33:24 2014] that is, sufficiently recent ones no longer enable the leaf optimization by default, because enough network filesystems don't support it
[Tue Apr 15 22:34:07 2014] maybe fts still has the optimization enabled
[Tue Apr 15 23:36:49 2014] for me, find works just fine in /afs... tho sometimes I've had to use the -noleaf option
[Tue Apr 15 23:37:06 2014] tho, RHEL :)
[Wed Apr 16 13:11:13 2014] I wish GNU find wouldn't jump up and down screaming about "your filesystem is f***ed" when it encounters OldFiles/OldFiles.
[Wed Apr 16 14:08:42 2014] find -noleaf ?
[Wed Apr 16 14:21:16 2014] That doesn't make the error "find: File system loop detected; `./acmsys/OldFiles/acmsys/OldFiles/acmsys' is part of the same file system loop as `./acmsys/OldFiles/acmsys'." go away.
[Wed Apr 16 14:25:43 2014] oh, true. that is an actual loop
[Wed Apr 16 14:28:07 2014] Hey all, I am working on setting up our krb5/afs config and I can kinit and aklog without any trouble on linux but the mac is throwing an error on aklog. Here is the output of dtruss aklog -d http://people.cs.pitt.edu/~adam/dtruss_out any help is appreciated.
[Wed Apr 16 14:28:34 2014] and what is the error?
[Wed Apr 16 14:28:57 2014] ah. -1765328228
[Wed Apr 16 14:29:15 2014] KRB5_KDC_UNREACH
[Wed Apr 16 14:29:31 2014] so you get to install a kerberos configuration
[Wed Apr 16 14:30:14 2014] i have /etc/krb5.conf
[Wed Apr 16 14:30:32 2014] ok. then you get to install a *correct* configuration
[Wed Apr 16 14:30:54 2014] does that file name KDCs for a realm BE.CS.PITT.EDU?
[Wed Apr 16 14:31:51 2014] it has kdc = be.cs.pitt.edu
[Wed Apr 16 14:32:11 2014] and does be.cs.pitt.edu resolve? it doesn't for me
[Wed Apr 16 14:32:38 2014] it is in a local test environment, I have it in /etc/hosts
[Wed Apr 16 14:32:54 2014] I can ping it and I can kinit to it
[Wed Apr 16 14:33:12 2014] doesn't mean the kerberos library you linked is willing to use /etc/hosts;
[Wed Apr 16 14:33:44 2014] you can try designating it as a kdc by ip address and see if it works
[Wed Apr 16 14:34:01 2014] ok
[Wed Apr 16 14:34:21 2014] is this a new KDC setup? e.g. do you have any working clients against this KDC?
[Wed Apr 16 14:34:32 2014] b/c it could be a firewall rule blocking connections to the KDC
[Wed Apr 16 14:34:45 2014] (s)he said kinit works
[Wed Apr 16 14:35:07 2014] I have kinit working on the mac and I can do both on linux
[Wed Apr 16 14:39:43 2014] same error with an ip address
[Wed Apr 16 14:39:48 2014] can still kinit though
[Wed Apr 16 14:56:15 2014] so, mac os 10.6... can't install 1.4.14.1 or newer (through 1.6.5, newest pkg available)
[Wed Apr 16 14:56:23 2014] installs all fail with: install:didFailWithError:Error Domain=NSCocoaErrorDomain Code=512 UserInfo=0x119b68b00 "“AFSBackgrounder.app” couldn’t be moved to “Resources
[Wed Apr 16 15:52:04 2014] hola
[Wed Apr 16 16:45:30 2014] Is there any way to have da{file,vol}serv rescan and re-attach partitions without restarting it? When ext4 bites us, the server lets go of the partition and the only way we can get it back is by restarting, but this nukes any vos transactions in progress, including vos dumps, which take forever and have to be restarted.
[Wed Apr 16 16:55:40 2014] no
[Wed Apr 16 16:58:30 2014] Hm. Can we cause "bos restart" to wait for the end of any volser transactions in progress?
[Wed Apr 16 17:00:48 2014] Not really.
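The numeric code aklog reported in the Mac thread above can be decoded by hand: krb5 error codes are offsets from the krb5 error-table base, -1765328384 (ERROR_TABLE_BASE_krb5 in the Kerberos headers), and -1765328228 sits at offset 156, which is KRB5_KDC_UNREACH ("cannot contact any KDC for requested realm"). A one-line helper for the subtraction, with a hypothetical name:

```sh
KRB5_ERROR_TABLE_BASE=-1765328384   # ERROR_TABLE_BASE_krb5

# Print the offset of a krb5 error code within the krb5 error table.
krb5_error_offset() {
    echo $(( $1 - (${KRB5_ERROR_TABLE_BASE}) ))
}
```

An offset in the low hundreds like this points at a client-side library error (here, name resolution or reachability of the KDC) rather than an error returned by the KDC itself.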
[Wed Apr 16 17:01:48 2014] There's not a channel for the volserver to indicate that it's still making progress, so the bosserver will ultimately time out and SIGKILL it.
[Wed Apr 16 17:02:35 2014] morning
[Wed Apr 16 17:03:52 2014] Pity. Oh well, this 'vos shadow' will get re-run when we go through the lot next, I guess.
[Wed Apr 16 17:05:19 2014] Will 'vos shadow -incremental' DTRT on an aborted shadowing attempt, or am I going to have to manually intervene?
[Wed Apr 16 17:30:20 2014] nwf: just started to review your patch into openwrt - sorry for the delay
[Wed Apr 16 17:30:50 2014] wrong room
[Wed Apr 16 17:32:45 2014] <[gorgo]> dbrashear: it's about afs support in openwrt
[Wed Apr 16 17:32:56 2014] <[gorgo]> though indeed it is not openafs
[Wed Apr 16 17:58:52 2014] wigyori: No worries; there's been no hurry
[Wed Apr 16 18:39:31 2014] I'll be interested to try that out, if it ever makes it into openwrt. I run that at home on my router and my switch
[Wed Apr 16 18:58:07 2014] anyone have a 'cache manager works correctly' test suite?
[Thu Apr 17 10:20:23 2014] someone in here was talking about issues with firefox profiles in afs?
[Thu Apr 17 10:24:30 2014] It looks like the AIX build-bot is not always successfully posting stdio transcripts? See http://buildbot.openafs.org:8010/builders/aix-builder/builds/11419 for example.
[Thu Apr 17 10:55:18 2014] CybrFyre: I think the firefox issue was more a problem of running multiple copies of firefox with the same profile due to the lock file that gets created. I believe that the issue is not specific to AFS and would exist with any type of non-local home directory. I could be thinking of something else though...
[Thu Apr 17 10:57:28 2014] that's the issue I'm aware of, and yes it happens with any network-shared home directory
[Thu Apr 17 10:59:47 2014] nwf: I think the builders in general only save the most recent N build logs.
[Thu Apr 17 11:18:23 2014] ok
[Thu Apr 17 11:18:34 2014] every once in a while we see something get fouled up in the user profile
[Thu Apr 17 11:33:40 2014] kaduk_: Ah. I was hoping to find out which of the coverity fixes broke AIX and why.
[Thu Apr 17 11:38:08 2014] http://gerrit.openafs.org/#change,11019 is the one that broke AIX and Solaris
[Thu Apr 17 11:39:13 2014] AIX and osx are strict about enforcing the library symbol export lists, which is a common thing for the AIX builder to choke on when the linux ones are fine.
[Thu Apr 17 11:39:50 2014] Is "asprintf" a thing on AIX and Solaris?
[Thu Apr 17 11:40:13 2014] it built fine on master
[Thu Apr 17 11:40:15 2014] We may use it from roken on one or both of those platforms.
[Thu Apr 17 11:41:56 2014] but there is no roken on 1.6
[Thu Apr 17 11:42:05 2014] secureendpoints1 - any word from Rahul on the explorer shell issue?
[Thu Apr 17 11:42:26 2014] not for public discussion
[Thu Apr 17 11:43:23 2014] your conversations with microsoft are covered under a mutual nda
[Thu Apr 17 15:29:10 2014] dbrashear: Hey, just thought I would let you know that even though I had the ip address in there, putting a dns record in for our kdc fixed the problem.
[Sat Apr 19 17:15:32 2014] I don't suppose the VLDB locks record when they were taken?
[Sat Apr 19 17:16:53 2014] (I want to have nagios alert us if a volume stays locked for too long, since that probably means that a transaction has failed)
[Sat Apr 19 21:34:30 2014] Does anybody have contact information for (or happen to be) an AFS cell admin at iastate.edu? Their website's suggestions for contacting the engineering department have been bouncing, and besides this is probably a more direct way to get what I want...
[Sat Apr 19 21:36:42 2014] there is a john@iastate.edu email address in this list: http://lists.openafs.org/pipermail/afs3-standardization/2013-August/002744.html
[Sat Apr 19 21:37:18 2014] Ah ha.
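The question above about whether VLDB locks record when they were taken goes unanswered in this log. If that time is not available, one workaround for the Nagios alert is to remember when each lock was first seen. The sketch below assumes the list of locked volume names comes from some parser of `vos listvldb -locked` output (not shown); `check_vldb_locks`, its state-file format, and its arguments are all hypothetical. Return codes follow the Nagios plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN).

```sh
# check_vldb_locks STATEFILE MAX_AGE_SECONDS NOW
# stdin: currently locked volume names, one per line.
# STATEFILE keeps "volume first_seen_epoch" pairs between runs; volumes no
# longer locked are dropped from it.
check_vldb_locks() {
    state=$1 max_age=$2 now=$3
    new_state=$(mktemp) || return 3
    stale=""
    while read -r vol; do
        [ -n "$vol" ] || continue
        # When was this volume first seen locked? Default: now.
        first=""
        if [ -f "$state" ]; then
            first=$(awk -v v="$vol" '$1 == v { print $2; exit }' "$state")
        fi
        [ -n "$first" ] || first=$now
        echo "$vol $first" >> "$new_state"
        age=$((now - first))
        if [ "$age" -gt "$max_age" ]; then
            stale="$stale $vol(${age}s)"
        fi
    done
    mv "$new_state" "$state" || return 3
    if [ -n "$stale" ]; then
        echo "CRITICAL: volumes locked too long:$stale"
        return 2
    fi
    echo "OK: no volume locked longer than ${max_age}s"
}
```

A lock released and re-taken between two runs looks continuous to this check, so the threshold should be comfortably longer than a normal vos operation on your largest volumes.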
[Sat Apr 19 21:37:56 2014] this might prove useful too: http://lists.openafs.org/pipermail/foundation-discuss/2013-October/000004.html
[Sat Apr 19 21:39:16 2014] Thanks!
[Sat Apr 19 22:01:28 2014] Forgive another stupid question, is there any automation possible for 'vos shadow'-like functionality between cells?
[Sat Apr 19 22:02:00 2014] you can vos dump and restore to a different cell
[Sat Apr 19 22:02:20 2014] How does 'vos shadow -incremental' know what date to use for the 'dump' half of that equation?
[Sat Apr 19 22:02:30 2014] not sure on that
[Sat Apr 19 22:02:52 2014] you can possibly abuse http://docs.openafs.org/ReleaseNotesWindows/Linked_Cells.html to fail-over between cells
[Sat Apr 19 22:03:12 2014] If it's one of the dates available in 'vos exa' then it should be a "mere matter of programming"...
[Sat Apr 19 22:03:13 2014] (this is just theoretical, I don't know of anyone who has done this)
[Sat Apr 19 22:03:36 2014] Oh... oh boy that sounds exciting.
[Sat Apr 19 22:03:52 2014] I think that particular option is too rich even for my crazy tastes.
[Sat Apr 19 22:04:48 2014] this might interest you as well: http://workshop.openafs.org/afsbpw08/talks/wed_1/OpenAFS_and_the_Dawn_of_a_New_Era.pdf
[Sat Apr 19 22:06:29 2014] Morgan Stanley had some proprietary "Volume Management System" to sync volumes across multiple cells world-wide.
[Sat Apr 19 22:07:12 2014] they could GLOBALLY distribute files within something like 2 seconds and b/c of afs callbacks, all clients would immediately be notified of new data
[Sat Apr 19 22:08:05 2014] has
[Sat Apr 19 22:09:57 2014] geekosaur: you saying you have implemented vos shadow functionality across cells?
[Sat Apr 19 22:10:25 2014] we have not. but MS is about the only customer I'm allowed to admit to anything about, and they're still using that
[Sat Apr 19 22:10:55 2014] Speaking of, do either of the Linux cache managers translate callbacks to something inotify() can see?
[Sat Apr 19 22:11:17 2014] geekosaur: oh, I see. you were correcting the "had" in my statement
[Sat Apr 19 22:12:05 2014] geekosaur: I don't suppose you could get MS to open source the vms code?
[Sat Apr 19 22:12:26 2014] cclausen: Mostly I am frustrated with how the ACM's mirroring scripts are rsync-based and often encounter an upstream-update-in-progress (our Debian mirror is currently on retry number four for its current run).
[Sat Apr 19 22:12:35 2014] I believe that the various bits at http://oskt.secure-endpoints.com/ were at least partially from MS
[Sat Apr 19 22:13:03 2014] don't think so, no. I think they're not real proud of it in reality; they've been making noise about coming up with something better (and I'm pretty sure that's all I'm allowed to say on that subject)
[Sat Apr 19 22:13:06 2014] We, by mirroring into AFS, of course, provide down-streams with atomic snapshots of the last successful mirror run, and I was sort of wondering if some of our upstreams could be made to point at other AFS cells and not need to use rsync.
[Sun Apr 20 02:23:17 2014] <[gorgo]> nwf: the guy who originally wrote vms at ms later wrote, at another company, a similar thing called openefs which got opensourced, but it has only nfs support AFAIK. take a look at http://openefs.org/
[Sun Apr 20 02:23:22 2014] <[gorgo]> also http://workshop.openafs.org/afsbpw10/wed_keynote.html
[Sun Apr 20 02:27:13 2014] inotify is GPL ONLY and afs cannot use it
[Sun Apr 20 02:35:22 2014] <[gorgo]> nwf: btw what you're talking about sounds a bit like incremental vos releases, only between cells. vos can decide what timestamp to use for dumping the volume on the RW site (not directly the RW, but the freshly cloned local RO or the RClone if there is no local RO) based on some timestamp of the remote RO, AFAIK
[Sun Apr 20 02:36:00 2014] Which timestamp?
[Sun Apr 20 02:36:15 2014] secureendpoints1: The in-tree cache manager could, tho', right?
[Sun Apr 20 02:36:42 2014] <[gorgo]> nwf: I don't know the exact details, you should check the vos release code
[Sun Apr 20 02:36:47 2014] OK
[Sun Apr 20 02:37:04 2014] Having gone spelunking through vos once has not upped my desire to do it again, but yes, I will.
[Sun Apr 20 02:39:36 2014] <[gorgo]> it might be the last cloned timestamp though, which won't be directly usable for you
[Sun Apr 20 02:42:57 2014] David Howells's kafs could support inotify
[Sun Apr 20 02:58:15 2014] <[gorgo]> nwf: actually I seem to recall it might be the last update timestamp - 3 seconds that it uses for the incremental dump
[Sun Apr 20 02:59:05 2014] I guess it's OK to do "last timestamp - something" for any non-negative value of something?
[Sun Apr 20 02:59:34 2014] <[gorgo]> nwf: if you check this timestamp on the remote copy and you're sure that it hasn't been updated locally
[Sun Apr 20 03:00:03 2014] <[gorgo]> then you can be sure that it has been "copied" sometime after that timestamp
[Sun Apr 20 03:00:03 2014] (incremental dumps are presumably incremental only in the each-whole-file-or-not sense?)
[Sun Apr 20 03:00:47 2014] <[gorgo]> yes, it includes full files that had their vnodes changed after the specific timestamp
[Sun Apr 20 03:00:53 2014] * nods
[Sun Apr 20 03:01:00 2014] <[gorgo]> so if you add one byte to a 10GB file
[Sun Apr 20 03:01:08 2014] <[gorgo]> the incremental dump will contain the full 10GB file
[Sun Apr 20 03:02:27 2014] Yeah; this should be OK -- we tell our users to store only read-only large files in AFS.
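Putting [gorgo]'s recollection together (dump everything changed since the destination copy's last update time, minus a few seconds), a cross-cell incremental copy could be approximated with `vos dump -time` and `vos restore -overwrite incremental`. This sketch only prints the commands as a dry run. The 3-second default margin is the value recalled in the chat, not something verified; the function name and argument order are hypothetical; `date -d @N` is GNU date syntax; and you should check vos_dump(1) for the exact date formats your version accepts for `-time`.

```sh
# Hypothetical dry-run helper: print vos commands for an incremental copy
# of VOLUME from SRC_CELL into DST_CELL (server/partition in DST_CELL).
# REMOTE_UPDATE: last-update time of the destination copy, epoch seconds.
# MARGIN: seconds subtracted as a safety cushion (default 3).
incremental_copy_cmds() {
    volume=$1 src_cell=$2 dst_cell=$3 server=$4 partition=$5
    remote_update=$6 margin=${7:-3}
    # Local-time, minute-precision date; seconds are truncated.
    since=$(date -d "@$((remote_update - margin))" '+%m/%d/%Y %H:%M')
    echo "vos dump -id $volume -time \"$since\" -file $volume.incr -cell $src_cell"
    echo "vos restore -server $server -partition $partition -name $volume -file $volume.incr -overwrite incremental -cell $dst_cell"
}
```

Truncating to the minute only moves the cut-off earlier, so the dump can grow by a few whole files but should not skip changes made after the cut-off. As noted above, each changed file is carried whole, so one appended byte still ships the entire file.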
[Sun Apr 20 03:03:15 2014] <[gorgo]> also if you have huge directory structures which don't change much, the -omitdirs option can also help you with the dump size
[Sun Apr 20 03:03:28 2014] (Though it's only so much fun when somebody doesn't listen and we get to watch the server chew up all its IO bandwidth -- which is too little, given the hardware we're running on; still don't know wtf is up with that -- copying a 20G file because a backup run had happened in the interim.)
[Tue Apr 22 10:16:26 2014] even with 1.6.7, I'm still seeing "a kernel problem has occurred" and the traceback is warn_slowpath_common, warn_slowpath_null, d_splice_alias, afs_linux_lookup, and so on
[Tue Apr 22 11:05:30 2014] <[gorgo]> 1.6.7 is a security release with only two security-related patches added to 1.6.6
[Tue Apr 22 11:06:07 2014] <[gorgo]> also AFAIK 1.6.8 will not have any changes regarding the dentry aliases either...
[Tue Apr 22 11:06:49 2014] <[gorgo]> redhat issued an updated kernel for rhel6 afaik to fix this on the kernel side
[Tue Apr 22 11:09:08 2014] afaik any changes in openafs are long term (e.g. there is currently haggling over a helpful kernel function that is currently gplonly), the change has to come from rh
[Tue Apr 22 11:20:22 2014] well, I skipped 1.6.6
[Tue Apr 22 11:20:43 2014] was that the custom kernel I saw, or is it a regular update kernel?
[Tue Apr 22 11:21:02 2014] this particular machine is not at the latest and is at: 2.6.32-431.3.1.el6.x86_64
[Tue Apr 22 11:21:17 2014] instead of 2.6.32-431.11.2.el6.x86_64
[Tue Apr 22 11:31:49 2014] <[gorgo]> I believe the change you want is exactly in 431.11.2
[Tue Apr 22 11:36:59 2014] ok... I may prioritize getting 431.11.2 out then
[Tue Apr 22 11:38:11 2014] tho, changelog for 431.11.2 only shows sctp: fix sctp_sf_do_5_1D_ce and validate vhost_get_vq_desc return value
[Tue Apr 22 11:38:20 2014] so, maybe a non-released interim change
[Tue Apr 22 14:07:40 2014] k... kernel upgraded on node... we'll see if those errors go away
[Wed Apr 23 21:05:35 2014] I think I just succeeded in rekeying my afs cell (hcoop.net). If I aklog and klist shows: "afs/hcoop.net@HCOOP.NET \ renew until 04/24/2014 21:04:20, Etype (skey, tkt): aes256-cts-hmac-sha1-96, aes256-cts-hmac-sha1-96" it means success, right?
[Wed Apr 23 21:07:13 2014] If the token works :)
[Wed Apr 23 21:07:26 2014] it means at least partial success. aklog succeeding only proves a little bit more. does the resulting token work? that is, can you access something that doesn't grant access to system:authuser or system:anyuser?
[Wed Apr 23 21:08:36 2014] * being the wordy-bastard version of kaduk_... :)
[Wed Apr 23 21:08:44 2014] it works, I can access everything, servers restarted fine
[Wed Apr 23 21:08:46 2014] so I guess I win!
[Wed Apr 23 21:10:14 2014] well, that was easy. Next to make sure mod_waklog with the rxkad-kdf patches works (rxkad-k5 based on appears to continue working)
[Fri Apr 25 02:42:29 2014] Does anybody have a working waklog & suexec setup for Apache?
[Fri Apr 25 02:48:51 2014] I seem to be able to either suexec my in-AFS CGI scripts to not-the-webserver or have the CGI scripts end up with tokens, but not both.
[Fri Apr 25 12:00:33 2014] nwf: we do at hcoop
[Fri Apr 25 12:01:55 2014] nwf: you are using apache2-suexec-custom? Apache 2.2 or 2.4? (haven't tested waklog on 2.4 yet, was planning to this summer)
[Fri Apr 25 12:15:34 2014] nwf: mod_waklog produces a fair bit of debugging info in error.log if you turn the log level up to DEBUG
[Fri Apr 25 12:25:06 2014] unknown_lamer: apache2-suexec-custom, yes. Apache 2.2.22. I have logging turned up to debug, but I don't know what to look for, I'm afraid. I see it grab tokens for me, at least.
[Fri Apr 25 12:25:53 2014] nwf: can you throw up some of the error.log in lisppaste or pastebin
[Fri Apr 25 12:26:09 2014] or I guess you could put it somewhere in afs space that was world readable
[Fri Apr 25 12:26:15 2014] Sure. Lemme collect pieces.
[Fri Apr 25 12:32:11 2014] unknown_lamer: Logs and the informative subset of the config are available in /afs/acm.jhu.edu/user/nwf/Public
[Fri Apr 25 12:33:02 2014] These are from fresh startup and a single request of a shell cgi script which runs, among other things, `tokens`. It reported no tokens held by the cache manager.
[Fri Apr 25 12:33:46 2014] If I comment out the "SuexecUserGroup nobody nogroup" line, then my cgi script ends up running as www-data (bad) but it does at least have the right tokens.
[Fri Apr 25 12:35:50 2014] nwf: the keytab has to be readable by the user you are suexecing to in order to be used IIRC
[Fri Apr 25 12:36:12 2014] Oh. Yikes...
[Fri Apr 25 12:36:13 2014] * tests
[Fri Apr 25 12:36:48 2014] or maybe not, I forgot how awful this one part of the mod_waklog code is (#ifdef hell)
[Fri Apr 25 12:36:53 2014] maybe I should remove apache 1.3 support
[Fri Apr 25 12:37:33 2014] Adding o+r (and o+x to the containing directory) did not change the lack of tokens.
[Fri Apr 25 12:37:52 2014] I would offer to help but Apache is a labyrinth of mystery to me.
[Fri Apr 25 12:39:06 2014] On the topic of changing mod_waklog, would having it interact with mod_userdir be possible? I think right now I have to have a stanza for each user?
[Fri Apr 25 12:39:21 2014] Actually, I was wrong about perms. Works fine here with a keytab that can only be read by root. Are you changing the User of the *entire* process, or just a vhost?
[Fri Apr 25 12:39:53 2014] Forgive me, I don't know what your question means.
[Fri Apr 25 12:40:23 2014] The SuexecUserGroup directive is within a VirtualHost directive, if that answers the question?
[Fri Apr 25 12:40:32 2014] yes
[Fri Apr 25 12:41:24 2014] Is that not the right setup?
[Fri Apr 25 12:41:41 2014] it is
[Fri Apr 25 12:42:25 2014] nwf: you also have WaklogEnabled + WaklogLocationPrincipal set in the vhost?
[Fri Apr 25 12:43:10 2014] WaklogAFSCell, WaklogEnabled, and WaklogDefaultPrincipal are set globally.
[Fri Apr 25 12:43:35 2014] WaklogLocationPrincipal is set inside a directive inside the .
[Fri Apr 25 12:46:20 2014] I think it has to either be in or , try using it inside of the entire VirtualHost to see if that fixes it
[Fri Apr 25 12:46:36 2014] also I was wrong with permissions, the keytab should only need to be readable by root
[Fri Apr 25 12:48:35 2014] Moving WaklogAFSCell to the VirtualHost resulted in "[Fri Apr 25 12:47:56 2014] [error] (13222) mod_waklog: afs_cell==NULL; please provide the WaklogAFSCell directive"
[Fri Apr 25 12:49:23 2014] And even with the WaklogEnabled and WaklogDefaultPrincipal directives inside the VirtualHost, my CGI script is still running without tokens.
[Fri Apr 25 12:51:46 2014] nwf: WaklogDefaultPrincipal sets the *global* principal to be used when no other is set and must be server wide, so must WaklogAFSCell (and it has to be appear before any vhosts, because apache reads/execs the config line by line)
[Fri Apr 25 12:52:18 2014] Sorry, I misunderstood what you were asking me to move.
[Fri Apr 25 12:52:20 2014] * puts it back
[Fri Apr 25 12:53:28 2014] But even with WaklogLocationPrincipal in the script still ran without tickets.
[Fri Apr 25 12:54:22 2014] nwf: if your apache config doesn't have anything sensitive in it, would you mind letting me look at it?
[Fri Apr 25 12:55:14 2014] Sure, I'll copy the whole shebang into AFS. Just a moment.
[Fri Apr 25 12:55:54 2014] nwf: an updated error.log would be helpful too
[Fri Apr 25 12:56:14 2014] Here's the config: /afs/acm.jhu.edu/user/nwf/Public/etc
[Fri Apr 25 12:56:24 2014] I'll rotate logs and grab the error.log associated with those
[Fri Apr 25 12:57:53 2014] /afs/acm.jhu.edu/user/nwf/Public/logs has logs produced from startup and a single CGI script request.
[Mon Apr 28 07:09:00 2014] Hi
[Mon Apr 28 07:09:29 2014] Anyone knows if there is any app for windows to define acls recursivly ?
[Mon Apr 28 07:45:11 2014] fALSO: AFS ones or generic MS-Windows ones?
[Mon Apr 28 07:47:10 2014] AFS
[Mon Apr 28 07:48:21 2014] don't know, that seems unlikely. But why do you need that under MS-Windows specifically? You can set ACLs from any client...
[Mon Apr 28 07:51:24 2014] ebcause everyone here uses windows
[Mon Apr 28 07:51:34 2014] openafs has tools to change the acls of directorys
[Mon Apr 28 07:51:47 2014] the only problem is that it doesnt have a "recursive" option
[Mon Apr 28 07:52:30 2014] fALSO: Cygwin with 'find'? PowerShell with something similar?
[Mon Apr 28 07:53:34 2014] that would be ok for a poweruser that isnt afraid of using the console... but "normal people" wont be ok with that
[Mon Apr 28 07:53:50 2014] maybe ill do a quick .net app to do this for my users
[Mon Apr 28 07:54:07 2014] fALSO: PowerShell probably would do as well.
[Mon Apr 28 07:54:41 2014] fALSO: after all IIRC OpenAFS for MS-Windows is a native port, not a Cygwin emulation.
[Mon Apr 28 07:59:15 2014] http://stackoverflow.com/questions/8024103/how-to-retrieve-a-recursive-directory-and-file-list-from-powershell-excluding-so
[Mon Apr 28 08:16:08 2014] OpenAFS can be used with CygWIN
[Mon Apr 28 08:16:26 2014] but powershell should be possible, too
[Mon Apr 28 09:58:00 2014] fALSO - what you come up with could be extremely useful to share with the community :)
[Mon Apr 28 10:07:16 2014] I can share it, but it will be in c# .net
[Mon Apr 28 10:07:28 2014] Ill ask my "boss" if i can opne source it
[Mon Apr 28 10:07:36 2014] i work at a university so probably i can
[Mon Apr 28 10:09:57 2014] I would think so
[Mon Apr 28 10:10:03 2014] source and binaries would be useful :P
[Mon Apr 28 10:10:30 2014] I think there's a wiki page for 3rd party useful utilities
[Mon Apr 28 10:13:22 2014] uhuhuh
[Mon Apr 28 10:13:26 2014] do you know that url ?
[Mon Apr 28 10:15:10 2014] http://wiki.openafs.org/AddOnsToolsAndUtilities/
[Mon Apr 28 10:15:48 2014] I don't know why this is not linked somewhere obvious (it's not on the front page nor does anything on the front page look really like it would lead to it; the existing add-ons link claims to be docs)
[Mon Apr 28 10:15:49 2014] that page appears to be broken
[Mon Apr 28 10:16:02 2014] hm? loads here
[Mon Apr 28 10:16:04 2014] as in the content itself is missing
[Mon Apr 28 10:16:31 2014] it's got the initial menu of tools, but the links all just non-existant anchors inside the page
[Mon Apr 28 10:16:36 2014] oh
[Mon Apr 28 10:16:51 2014] someone do an oops when migrating from the old wiki?
[Mon Apr 28 10:16:53 2014] * thinks some gatekeepers need to be pinged to make announcements
[Mon Apr 28 10:17:13 2014] it's openafs.stanford.edu which is being decommissioned and moved to MIT
[Mon Apr 28 10:17:26 2014] define "it"
[Mon Apr 28 10:17:40 2014] wiki.openafs.org is an alias for openafs.stanford.edu.
[Mon Apr 28 10:17:44 2014] ok, right
[Mon Apr 28 10:18:07 2014] so yes, the wiki will be intermittent and eventually unavailable for a few days
[Mon Apr 28 10:18:50 2014] and I wanna say that list isn't even complete from what was on the afslore wiki
[Mon Apr 28 10:19:10 2014] That's been announced - Jeffrey mailed about it.
[Mon Apr 28 10:19:23 2014] The move is in progress, but shouldn't (yet) be affecting wiki.openafs.org
[Mon Apr 28 10:19:52 2014] it was sent to -devel only from what I saw
[Mon Apr 28 10:20:16 2014] fALSO - whenever you finish your utility, we can do a newsletter item about it
[Mon Apr 28 10:20:23 2014] my coworkers were caught by surprise by the buildbot being taken down... somehow people have not been appropriately notified :/
[Mon Apr 28 10:21:00 2014] I can't think of any other route than -devel for those notifications
[Mon Apr 28 10:21:05 2014] also fwiw I am having intermittent name resolution problems
[Mon Apr 28 10:21:19 2014] buildbot shouldn't be down, though. It's hosted by YFS on completely different hardware
[Mon Apr 28 10:21:33 2014] https://lists.openafs.org/pipermail/openafs-devel/2014-April/019865.html
[Mon Apr 28 10:21:54 2014] I think it was being taken down because it's driven by git which is down
[Mon Apr 28 10:22:09 2014] Ah, Jason must have done that separately.
[Mon Apr 28 10:22:14 2014] jaltman noted on jabber that he was taking it down last night
[Mon Apr 28 10:23:04 2014] so buildbot was taken down early to make sure that git would be quiescent before it was taken down
[Mon Apr 28 10:23:30 2014] speaking of jabber, appears my pidgin crashed
[Mon Apr 28 10:23:44 2014] But buildbot is driven by git, so won't have active changes unless git tells it about them. It's all circular :)
[Mon Apr 28 10:24:10 2014] yep
[Mon Apr 28 10:26:50 2014] wiki downtime should go to -info... the rest are definitely -devel issues
[Mon Apr 28 10:27:41 2014] The wiki isn't down yet. It may not need to be. We'll see.
[Mon Apr 28 10:27:52 2014] (tbh, the wiki nearly got left behind)
[Mon Apr 28 10:31:40 2014] thanks
[Mon Apr 28 14:47:17 2014] windows thinks that mozilla's .parentlock (in AFS) is open by something someplace... any way to find out what machine has that afs file open (I do have fs auditing enabled)?
[Mon Apr 28 15:55:09 2014] <[gorgo]> cybrfyre: grep for any operations on the fid. it would not actually know if a file is really open on another client, but could be some lock set up on the file
[Mon Apr 28 15:56:05 2014] since I've seen it complain about being locked in afs before, I think it does something other than an OS level lock
[Mon Apr 28 16:00:52 2014] <[gorgo]> I don't know what .parentlock does btw
[Mon Apr 28 16:01:33 2014] <[gorgo]> but on linux there's a symlink called lock wich points to an ip:pid...
[Mon Apr 28 16:02:18 2014] <[gorgo]> and if this was an ip unknown to the client, I guess it would assume that the profile is being used from another host
[Mon Apr 28 17:01:04 2014] yeah, I can't even delete the file... on windows
[Mon Apr 28 17:02:02 2014] how do I get the afs fid of a file?
[Mon Apr 28 17:03:20 2014] ah... fs getfid
[Mon Apr 28 17:03:22 2014] useful
[Tue Apr 29 06:09:18 2014] cybrfyre: i published my app
[Tue Apr 29 06:09:22 2014] cybrfyre: https://github.com/falsovsky/ACLAFS
[Tue Apr 29 06:09:40 2014] cybrfyre: theres also a binary if you want to test it out : https://github.com/falsovsky/ACLAFS/releases/download/v1.0/aclafs.zip
[Tue Apr 29 06:32:09 2014] fALSO: cool
[Tue Apr 29 06:32:40 2014] fALSO: in shell i usually do: find foo-directory/ -type d -exec fs setacl {} foo_user write \;
[Tue Apr 29 06:32:57 2014] yes... but for "regular people" isnt ok
[Tue Apr 29 06:34:17 2014] i have "normal people" here that need to apply acls recursivly
[Tue Apr 29 06:34:30 2014] and you dont want to tell them to opne a "command line" window :-P
[Tue Apr 29 06:43:44 2014] http://www.eyrie.org/~eagle/software/afs-admin-tools/fsr.html
[Tue Apr 29 07:04:59 2014] that just exists for *nix
[Tue Apr 29 07:05:38 2014] cygwin?
[Tue Apr 29 07:05:58 2014] you guys understand that the command line isnt for everybody, right ?
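The `find foo-directory/ -type d -exec fs setacl {} foo_user write \;` one-liner visits every directory in a tree (AFS ACLs live on directories, not files) and runs `fs setacl` once per directory. A rough Python equivalent of what a wrapper like ACLAFS might do is sketched below. It is not ACLAFS's actual code: the function name `recursive_setacl` and the `dry_run` switch are hypothetical, and it assumes the OpenAFS `fs` command is on `PATH` (`fs.exe` on Windows). Like the `find` version, it will probably also descend into any volumes mounted below the starting directory, since mount points look like ordinary directories.

```python
import os
import subprocess
from typing import List


def recursive_setacl(top: str, user: str, rights: str,
                     fs_cmd: str = "fs", dry_run: bool = False) -> List[List[str]]:
    """Hypothetical helper: apply one ACL entry to `top` and every subdirectory.

    Returns the fs commands that were run (or, with dry_run=True, that would
    have been run), in the order os.walk visited the directories.
    """
    commands: List[List[str]] = []
    # os.walk is top-down by default, so `top` itself is visited first.
    for dirpath, _dirnames, _filenames in os.walk(top):
        # Equivalent to: fs setacl -dir <dirpath> -acl <user> <rights>
        cmd = [fs_cmd, "setacl", "-dir", dirpath, "-acl", user, rights]
        commands.append(cmd)
        if not dry_run:
            # check=True raises CalledProcessError if fs exits non-zero.
            subprocess.run(cmd, check=True)
    return commands
```

For example, `recursive_setacl("foo-directory", "foo_user", "write", dry_run=True)` lists the commands without touching any ACLs; dropping `dry_run` applies them.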
[Tue Apr 29 07:08:01 2014] right
[Tue Apr 29 07:08:05 2014] :)
[Tue Apr 29 07:08:15 2014] not everyone should set ACLs in OpenAFS
[Tue Apr 29 07:08:16 2014] We have OpenAFS for all the workers of the university
[Tue Apr 29 07:08:25 2014] me too, me too
[Tue Apr 29 07:08:38 2014] imagine
[Tue Apr 29 07:08:46 2014] you have a dir with lots of subdirs
[Tue Apr 29 07:09:04 2014] and now you have a new user or something... and want to give him permission to that dir and subdirs
[Tue Apr 29 07:09:24 2014] for linux fsr or a find does the job (workes on university do have a login on a linux machine), on windows a explorer shell extension does exist, but not recursive
[Tue Apr 29 07:09:31 2014] normal people should be able to do this , thats what i did, a simple app that runs fs.exe in windows
[Tue Apr 29 07:09:37 2014] for this case (new user) we do use groups :-)
[Tue Apr 29 07:09:40 2014] amiga4000: yes i know :)
[Tue Apr 29 07:09:44 2014] add a user to the group and it works instant
[Tue Apr 29 07:10:03 2014] amiga4000: i dont know all the "usage" that they do
[Tue Apr 29 07:10:08 2014] i was just asked to do the app :)
[Tue Apr 29 07:10:26 2014] ok, if users do ask, fine
[Tue Apr 29 07:10:36 2014] but general a group is a fine solution
[Tue Apr 29 07:10:56 2014] its bsd licensed, you can do whatever you want with it :)
[Tue Apr 29 07:11:26 2014] as I do work on linux, hehe
[Tue Apr 29 07:53:38 2014] <[gorgo]> amiga400:actually it doesn't necessarily work instantly as tthe fileserver may have the cps info cached
[Tue Apr 29 07:56:14 2014] <[gorgo]> it needs an aklog frommthe user
[Tue Apr 29 07:57:26 2014] <[gorgo]> or call flushcps on the servers (cacheout)
[Tue Apr 29 07:57:27 2014] fALSO: cool! Thanks!
[Tue Apr 29 07:57:57 2014] A GUI for managing PTS groups might be useful too (maybe, I don't really care about Windows)
[Tue Apr 29 07:58:59 2014] It would be awesome if i could do the things via a DLL or something instead of calling fs.exe :)
[Tue Apr 29 11:00:23 2014] if one uses -local (say via sudo), does one still need to be in the UserList?
[Tue Apr 29 11:02:33 2014] no. you don't even need to have a token
[Tue Apr 29 11:02:56 2014] unclear to me if -local gets you everything or only got you the privs of systems:administrators
[Tue Apr 29 11:04:07 2014] system:administrators only applies to file system access and pt database access.
[Tue Apr 29 11:04:20 2014] The super user token that you get from -local can do anything
[Tue Apr 29 11:18:42 2014] ok, cool
[Wed Apr 30 05:48:12 2014] sirs
[Wed Apr 30 05:48:17 2014] http://falsovsky.github.io/ACLAFS/
[Wed Apr 30 05:54:52 2014] I published my app that runs fs.exe on windows recursivly
[Wed Apr 30 05:55:10 2014] Hope it is usefull for anyone, it will be usefull where i work
[Wed Apr 30 05:55:28 2014] wanted to add it here - http://wiki.openafs.org/AddOnsToolsAndUtilities/
[Wed Apr 30 05:55:32 2014] but the edit button doesnt work
[Wed Apr 30 06:00:50 2014] The wiki has just been migrated.
[Wed Apr 30 06:02:01 2014] is there any place that I should announce this ? or just here in the channel is enough?
[Wed Apr 30 06:06:21 2014] Probably send email to openafs-info@openafs.org (you have to be a subscriber to post there) and, if you like, openafs-announce@openafs.org
[Wed Apr 30 06:09:03 2014] i need to subscribe to both ?
[Wed Apr 30 06:09:37 2014] Not to openafs-announce. Although if you're using OpenAFS it's worth being subscribed there as its where new release announcements, security advisories and so on go to. Very low traffic
[Wed Apr 30 06:10:14 2014] ok thank you sxw :)
[Wed Apr 30 07:26:14 2014] Hi, wondering if there is any up to date advice on production filesystems for Linux? I assume my sane choices are ext3/4 or XFS.
[Wed Apr 30 07:31:50 2014] I should probably add: 1-2 TB volumes, speed not a huge concern, mostly concerned about minimising outages (eternal fscks) in the event of power loss
[Wed Apr 30 07:32:02 2014] currently we go with ext3/4, as they are default and well known and easy
[Wed Apr 30 07:32:24 2014] and we only needed 2 times to fsck in 8 years now
[Wed Apr 30 07:35:10 2014] Amiga4000: I was leaning towards ext4 primarily for that reason. We have had power issues in the past. Hopefully fixed now, but I feel like I shouldn't be skipping ext?'s fscks...
[Wed Apr 30 07:38:43 2014] ashl: We switched off fscks years ago, mostly to prevent long reboot times (we're using ext4). Didn't cause problems for us.
[Wed Apr 30 08:43:08 2014] wiebalck: I'm somehow scared to turn it off!
[Wed Apr 30 08:45:52 2014] I understand. This was more to share that we're doing it in our cell (which has about 50 servers and 500 partitions), that it works for us since years and that it is not a completely crazy idea :)
[Wed Apr 30 08:46:31 2014] That is still a useful sanity check, thanks. :-)
[Wed Apr 30 09:18:21 2014] ext3 can have problems with "journal stalls" where the whole FS stops handling operations whilst it flushes its journal to disk
[Wed Apr 30 10:24:13 2014] sxw: Does that imply that ext4 no longer has the problem?
[Wed Apr 30 10:24:30 2014] The problem seems to be less with ext4
[Wed Apr 30 10:52:19 2014] ok, something is stuck here
[Wed Apr 30 10:52:38 2014] fs mkm Yesterday home.users.ggm1.backup gives fs:'Yesterday': File exists
[Wed Apr 30 10:52:53 2014] fs rmm Yesterday gives fs: File 'Yesterday' doesn't exist
[Wed Apr 30 10:53:06 2014] how about ls -ld Yesterday?
[Wed Apr 30 10:53:18 2014] ok, that actually shows something
[Wed Apr 30 10:53:28 2014] If there's a file/directory of that name that's not a mount, I would expect both of those behaviors, I think.
[Wed Apr 30 10:53:34 2014] ls -l wasn't
[Wed Apr 30 10:53:48 2014] rmdir: failed to remove `Yesterday': No such file or directory
[Wed Apr 30 10:54:11 2014] Hmm, check from a different client? (fs flushv?)
[Wed Apr 30 10:55:21 2014] ok, the flushv fixed it
[Thu May 1 16:34:45 2014] does anyone happen to be running dovecot with Maildirs stored in afs?
[Fri May 2 08:45:03 2014] unknown_lamer - we specifically are storing maildirs locally on top of drbd
[Fri May 2 08:45:22 2014] I hadn't tried in afs, but if mmap didn't work correctly, major failures
[Fri May 2 08:56:45 2014] mmap definitely works correctly with AFS
[Fri May 2 10:05:40 2014] I had weird issues with rrdtool not working with mmap in afs... but that was about 3 years ago, now
[Fri May 2 18:46:08 2014] CybrFyre: we've been using Maildir in afs since 2007 with no lost mail
[Fri May 2 18:47:56 2014] seems like mmap+fcntl is a recipe for disaster, mmap+flock is not, but is likely to perform horribly and just using read/write+fsync and an internal cache is the way to go...
[Fri May 2 18:48:02 2014] I guess I'll find out what dovecot does soon enough
[Mon May 5 10:09:45 2014] so, had a system crash with: kernel BUG at /var/lib/dkms/openafs/1.6.5.2-1.el6/build/src/libafs/MODLOAD-2.6.32-431.5.1.el6.x86_64-SP/afs_segments.c:366 in afs_StoreAllSegments
[Mon May 5 10:10:13 2014] this known to be fixed with the combination of 1.6.7 and 2.6.32-431.11.2.el6.x86_64?
[Mon May 5 11:14:59 2014] <[gorgo]> cybrfyre: I don't think this is the dentry alias issue
[Mon May 5 11:15:35 2014] <[gorgo]> so the latest rhel kernel is irrrelevant
[Mon May 5 11:18:14 2014] <[gorgo]> can't check the source now.... any messages on the console/dmesg before the crash?
[Mon May 5 11:23:54 2014] I wouldn't know... I was at home when it happened... kdump auto rebooted the server
[Mon May 5 11:24:20 2014] and I'm noticing, on a separate issue, that with 1.6.7 and that same RHEL kernel, I'm still seeing the warn_slowpath kernel problems
[Mon May 5 11:29:30 2014] <[gorgo]> where did you get that crash line from?
[Mon May 5 11:30:19 2014] <[gorgo]> if you have a vmcore, you can get all the dmesg and backtraces
[Mon May 5 11:34:17 2014] that crash line was from running "crash" on the vmcore generated by kdump
[Mon May 5 11:34:41 2014] ok... I only looked at the backtrace in the vmcore
[Mon May 5 11:40:59 2014] http://fpaste.org/99249/
[Mon May 5 11:42:28 2014] looks like it's crashed several times the same way
[Mon May 5 11:46:09 2014] <[gorgo]> any interesting in dmesg before the BUG line?
[Mon May 5 11:46:45 2014] <[gorgo]> you can get dmesg from the vmcore too using crash
[Mon May 5 11:54:57 2014] yeah, that's what I'm doing
[Mon May 5 11:55:37 2014] just a crapload of afs: byte-range locks only enforced for processes on this machine (mozStorage)
[Mon May 5 12:10:52 2014] <[gorgo]> what's this mozStorage btw?
[Mon May 5 12:11:09 2014] sounds like Firefox' sqlite
[Mon May 5 12:11:23 2014] I think so
[Mon May 5 12:11:37 2014] <[gorgo]> looks something like that, but firefox shows up as firefox in the processlist, not as mozStorage
[Mon May 5 12:11:41 2014] I occasionally see that for firefox and some other firefox-ish db process, but mostly mozstorage
[Mon May 5 12:12:13 2014] You can set thread names on Linux. A normal process list doesn't show you all of the threads
[Mon May 5 12:12:48 2014] and in the bt, I see that one of the functions is locks_free_lock
[Mon May 5 12:15:15 2014] <[gorgo]> tdc = afs_GetValidDSlot(index);
[Mon May 5 12:15:16 2014] <[gorgo]> if (!tdc) osi_Panic("afs_StoreAllSegments tdc dv");
[Mon May 5 12:15:41 2014] <[gorgo]> this seems related to some of the issues I've been experiencing
[Mon May 5 12:16:48 2014] <[gorgo]> and 1.6.7 is modified
[Mon May 5 12:17:23 2014] <[gorgo]> ok,so yes, GetValidDSlot fails, IIRC because the process can get a signal or something
[Mon May 5 12:17:52 2014] <[gorgo]> and while the root cause hasn't been fixed yet, the error handling is modified a bit now
[Mon May 5 12:18:33 2014] <[gorgo]> so you should not be seeing this with 1.6.6 or later
[Mon May 5 12:19:31 2014] <[gorgo]> http://gerrit.openafs.org/#change,9287
[Mon May 5 12:39:51 2014] ok, so that was with 1.6.5.2... so, hopefully am golden, now
[Tue May 6 15:38:21 2014] Did anything ever come of the vos release -stayonline bug? (RT 131804)
[Tue May 6 15:49:29 2014] singular? they're being worked on, nothing released yet
[Tue May 6 16:20:25 2014] geekosaur: OK; just wanted to check that it/they hadn't gotten lost.
[Tue May 6 16:22:40 2014] they're not lost, and we (SNA) have paying customers interested in the fixes so we're keeping them not-lost :)
[Tue May 6 16:23:17 2014] (but that is about all I can report, since I'm not actually involved with it, just get to hear about them...)
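The repeated "afs: byte-range locks only enforced for processes on this machine (mozStorage)" messages come from locks on part of a file rather than the whole file; SQLite, which the discussion suggests is behind mozStorage, takes such byte-range POSIX locks. As we understand that warning, a whole-file lock can be coordinated through the fileserver while a byte-range lock is only visible to processes on the same client. The sketch below only illustrates the two kinds of request using Python's `fcntl.lockf` on a Unix system; the function name `take_locks` is hypothetical, and on a local filesystem both requests simply succeed.

```python
import fcntl
import os
from typing import List, Tuple


def take_locks(path: str) -> List[Tuple[int, int]]:
    """Take and release a whole-file lock, then a byte-range lock, on `path`.

    Returns the (start, length) ranges that were locked; a length of 0 means
    "through end of file" in POSIX lock terms.
    """
    locked: List[Tuple[int, int]] = []
    with open(path, "r+b") as f:  # exclusive locks need write access
        # Whole-file exclusive lock: start=0, len=0 covers the entire file.
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB, 0, 0, os.SEEK_SET)
        locked.append((0, 0))
        fcntl.lockf(f, fcntl.LOCK_UN, 0, 0, os.SEEK_SET)

        # Exclusive lock on bytes 100..199 only: a byte-range lock, the kind
        # the AFS client warns is enforced only on the local machine.
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB, 100, 100, os.SEEK_SET)
        locked.append((100, 100))
        fcntl.lockf(f, fcntl.LOCK_UN, 100, 100, os.SEEK_SET)
    return locked
```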
[Wed May 7 11:14:31 2014] A discussion of OpenAFS version numbering is about to take place in the openafs@conference.openafs.org jabber room
[Wed May 7 11:20:01 2014] how do we get on there (since the email said our old accounts will no longer work after the server move)
[Wed May 7 11:22:41 2014] and the message with details on how to get an account on the new jabber server seems to have not gone out
[Wed May 7 11:23:16 2014] Jabber is a federated protocol; any jabber account anywhere should be workable; I think this includes google chat.
[Wed May 7 11:23:57 2014] There is no plan to have local user accounts on the new jabber server, since there is the federation available.
[Wed May 7 11:24:09 2014] that's news
[Wed May 7 11:24:50 2014] I thought the original message said that.
[Wed May 7 11:25:07 2014] I think it's in http://lists.openafs.org/pipermail/openafs-devel/2014-April/019865.html yes.
[Wed May 7 11:25:29 2014] only "further details on how to access ... will be posted"
[Wed May 7 11:25:38 2014] Hmm, that is true.
[Wed May 7 11:25:58 2014] conference.openafs.org - 404 Remote Server Not Found
[Wed May 7 11:27:17 2014] let me restart pidgin since the host does indeed exist...
[Wed May 7 11:28:06 2014] there we go
[Wed May 7 11:28:29 2014] (Log at http://conference.openafs.org/openafs@conference.openafs.org/2014-05-07.html )
[Wed May 7 14:18:13 2014] Hmm. Fresh 1.6.1 install. My principals look like "a.name/admin" or "a.name" (periods). It seems to work, but can I expect a problem?
[Wed May 7 14:18:57 2014] I see a mailing list post from 2007 saying a patch was necessary, but I'm not sure if that is applicable now?
[Wed May 7 14:19:30 2014] Oh, I should point out that I'm using MIT kerberos too
[Wed May 7 14:20:22 2014] I think that what used to be that patch that was needed is now the -allow-dotted-principals argument to the fileserver, but am not 100% sure.
[Wed May 7 14:23:38 2014] kaduk_: Thanks. Looks like I tested it with a dotless principal.
[Wed May 7 14:51:09 2014] it is
[Wed May 7 14:51:27 2014] to use principal names with dots, you need to run all your server processes with that -allow-dotted-principals option
[Wed May 7 14:51:39 2014] with that, things work just fine
[Wed May 7 14:51:53 2014] CybrFyre: Yep, thanks. Fighting bos create right now. :-)
[Wed May 7 15:20:23 2014] AFS lesson of the day: bos create -cmd argument order matters.
[Wed May 7 15:21:00 2014] When I'm not sure what I'm doing, I always look at the help output and explicitly pass each argument and its parameter, not relying on any shortcuts.
[Thu May 8 17:47:26 2014] [47806.780984] afs: Tokens for user of AFS id 10496 for cell hcoop.net are discarded (rxkad error=19270408, server 69.90.123.75)
[Thu May 8 17:47:54 2014] so, this keeps happening. Like every 24h. My KeyFile is gone, all servers are using the new rxkad.keytab
[Thu May 8 17:48:10 2014] it also only happens to one machine, with 1.6.5-1.6.7
[Thu May 8 17:49:33 2014] the mail server, stuck on 1.4.10, does not exhibit any of these problems either despite being under much heavier load as far as grabbing tokens goes
[Thu May 8 17:50:12 2014] 19270408 = ticket contained unknown key version number
[Thu May 8 17:50:52 2014] right, except it is impossible that that is happening. I have quintuple checked.
[Thu May 8 17:51:28 2014] all of the KDCs are in sync, all of the servers are using the same key, there is only one KVNO in existence
[Thu May 8 17:51:55 2014] except for the one that is being placed into a token and given to the cache manager on the machine in question
[Thu May 8 17:53:12 2014] would verbose or debug logging in the client spit out information useful to debugging it?
[Thu May 8 17:53:17 2014] find out how the token for id 10496 is being obtained and where the service ticket is coming from.
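The number in "rxkad error=19270408" is a com_err-style error code: the bits above the low 8 encode a short error-table name (6 bits per character), and the low 8 bits give the offset of the message within that table. The minimal decoder below follows that com_err convention; `decode_com_err` is a hypothetical name for illustration, not an OpenAFS tool. For 19270408 it yields table `RXK` (the rxkad table) at offset 8, the "ticket contained unknown key version number" entry quoted in the log.

```python
from typing import Tuple

# com_err encodes each table-name character as (index in this set) + 1.
_CHARSET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_"
_ERRCODE_RANGE = 8  # low 8 bits hold the offset within the table


def decode_com_err(code: int) -> Tuple[str, int]:
    """Split a com_err error code into (table name, offset within table)."""
    offset = code & ((1 << _ERRCODE_RANGE) - 1)
    num = (code >> _ERRCODE_RANGE) & 0o77777777  # up to 4 chars x 6 bits
    name = ""
    for i in range(3, -1, -1):
        ch = (num >> (6 * i)) & 0x3F
        if ch:  # 0 means "no character in this position"
            name += _CHARSET[ch - 1]
    return name, offset
```

Offsets count from 0, so every rxkad code is 19270400 plus its position in the table.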
[Thu May 8 17:53:26 2014] it alternates between this and suexec complaining that it cannot getwcwd() any more
[Thu May 8 17:53:48 2014] (when suexec does that, there are no rxkad errors)
[Thu May 8 17:54:09 2014] getcwd is fixed in 1.6.8pre2
[Thu May 8 17:54:17 2014] that was just one line, all of the tokens are acquired using mod_waklog
[Thu May 8 17:54:27 2014] and it doesn't matter if it is mod_waklog using rxkad-k5 or rxkad-kdf
[Thu May 8 17:54:53 2014] if I downgrade the client to 1.4.12 it goes away, but that's no reall a *solution*
[Thu May 8 17:54:56 2014] it wouldn't. the kvno that is referred to is not the session key but the encrypted service ticket
[Thu May 8 17:55:43 2014] does this happen for all users or just some users?
[Thu May 8 17:56:32 2014] it happens to many users
[Thu May 8 17:56:46 2014] if some and not all, are there differences in the kerberos configurations for the principals. in particular, which enctypes the users have keys for
[Thu May 8 17:56:48 2014] also the kernel had a hung task timeout in the cache manager just before, not sure if that happened before. lemme pastebin it
[Thu May 8 17:57:13 2014] kernel softlocks will not be related to tokens
[Thu May 8 17:57:43 2014] http://paste.lisp.org/display/142459
[Thu May 8 17:57:44 2014] there are known issues when many pags are in use
[Thu May 8 17:58:04 2014] as when a mod_waklog or suexec generates a pag per web request
[Thu May 8 17:59:10 2014] hrm, all of the principles are $user.daemon (used by waklog), and all of them were generated at the same time with the same enctypes
[Thu May 8 18:00:00 2014] do you have the problem when you authenticate as $user.daemon on other machines?
[Thu May 8 18:00:03 2014] secureendpoints: any bug you can point me toward wrt many PAGs
[Thu May 8 18:01:00 2014] pretty much you don't want to use many pags. instead you want to use uid based token management for web servers
[Thu May 8 18:01:57 2014] the keytab works fine on other machines, and even on this machine for about ... 24-72 hours, and then everything croaks
[Thu May 8 18:02:11 2014] even weirder, sometimes it works and sometimes it does not after
[Thu May 8 18:02:34 2014] the kdcs do not appear to have stale KVNOs, keytabs are all in sync
[Thu May 8 18:02:59 2014] secureendpoints: is the many PAGs thing a fundamental problem, or something that could be fixed?
[Thu May 8 18:03:15 2014] it was never a problem with 1.4.x
[Thu May 8 18:04:01 2014] does anyone have a variant of mod_waklog floating around that doesn't create a PAG per request?
[Thu May 8 18:06:24 2014] Mod waklog doesn't create a PAG per request
[Thu May 8 18:06:32 2014] It creates one for each Apache child
[Thu May 8 18:06:42 2014] And each child will handle many requests
[Thu May 8 18:07:46 2014] hrm, that's what I thought
[Thu May 8 18:08:19 2014] although, I dropped the 2s delay between grabbing initial tokens down to 300ms and now afs complains about allocating PAGs too quickly when apache starts
[Thu May 8 18:08:57 2014] making me wonder if something is wrong with the token grabbing child
[Thu May 8 18:09:17 2014] You can't allocate PAGs at an average rate faster than one per second unless you are root.
[Thu May 8 18:09:37 2014] but the primordial apache2 *is* running as root
[Thu May 8 18:09:54 2014] or perhaps not after all, yech
[Thu May 8 18:10:10 2014] I suspect by the time that the waklog init_child process is running, Apache has given up its permissions
[Thu May 8 18:10:33 2014] hrm, maybe I can see how mod_ssl grabs certs in the primordial apache2 process and make waklog do the same thing, but that ... only solves an unrelated issue (apache2 taking minutes to restart)
[Thu May 8 18:11:48 2014] anyway, is it possible that having too many PAGs could cause the cache manager to do something awful
[Thu May 8 18:16:09 2014] http://gerrit.openafs.org/#change,11123
[Thu May 8 18:21:10 2014] hrm, you can have up to a uint32_t worth of PAGs right?
[Thu May 8 18:21:21 2014] (obviously with horrible performance implications...)
[Thu May 8 18:21:35 2014] I am kind of wondering if suexec is creating so many pags it wraps around
[Thu May 8 18:25:48 2014] or wait, you can only have 2^24 PAGs? Now I am really suspicious that heavy suexec load may be wrapping PAGs
[Thu May 8 18:26:43 2014] I guess there isn't enough time after booting for that to be the case
[Thu May 8 18:27:52 2014] even if the pags did wrap it wouldn't result in kvno errors
[Thu May 8 18:30:15 2014] secureendpoints: is afs_Analyze the right function to hack into giving me more information when it happens?
[Thu May 8 18:30:39 2014] the kvno error is taking place on the server not the client
[Thu May 8 18:31:09 2014] would the auditlog possibly have any useful information? (it's not enabled right now, wasn't sure how much logging it does)
[Thu May 8 18:31:35 2014] audit logging creates one entry for every call received by a server
[Thu May 8 18:31:49 2014] it will not provide any useful information
[Thu May 8 18:37:02 2014] this is more fun, it is happening to principals that have never had their key changed
[Thu May 8 18:38:11 2014] its not the client key that is wrong. its the server kvno that is wrong.
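The "there isn't enough time after booting" hunch can be checked with back-of-envelope arithmetic. This is only a sketch under two assumptions taken from the discussion: a space of 2^24 PAG numbers (asked about in the log, not confirmed there) and the one-PAG-per-second average limit mentioned for non-root callers. PAGs allocated as root are not subject to that limit, so the figure is a lower bound on wrap time only for non-root allocation; the helper name `days_to_exhaust` is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_exhaust(pag_space: int, pags_per_second: float) -> float:
    """Days of continuous allocation needed to use up `pag_space` PAG numbers."""
    return pag_space / pags_per_second / SECONDS_PER_DAY


# Assumed figures from the discussion: 2**24 PAGs at most 1 PAG/s (non-root).
print(f"{days_to_exhaust(2**24, 1.0):.1f} days")  # prints "194.2 days"
```

At roughly 194 days, wrapping at that rate is far slower than the 24 to 72 hour failure interval reported above.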
[Thu May 8 18:38:53 2014] the kerberos service ticket is being issued with a kvno that the file server does not have
[Thu May 8 18:39:21 2014] the kvno error occurs during the challenge-response phase of a new rxkad authenticated connection
[Thu May 8 18:40:04 2014] you can look at the kerberos v5 protocol traffic to see what kvnos are being issued
[Thu May 8 19:06:30 2014] secureendpoints: using wireshark is the only option? (can't seem to find anything in krb5 that logs the kvno as it is issued, yech)
[Thu May 8 19:06:35 2014] if it is must be, then it must be
[Thu May 8 19:06:51 2014] I still think this is a bug I am tripping over in the afs client, since it only started when I upgraded the client
[Thu May 8 19:08:08 2014] but I guess ruling out gremlins first and having a network caputure would be useful in any case, onward to trying to remember how to configure wireshark
[Fri May 9 06:28:06 2014] Hi there
[Fri May 9 06:28:38 2014] I just sent a mail to openafs-info releasing my "app" that applies acls on windows recursivly
[Fri May 9 06:29:06 2014] wanted to send also to -announce but its just for admins
[Fri May 9 06:32:36 2014] -announce is moderated, but announcements of new software are entirely appropriate there
[Fri May 9 06:33:58 2014] This is an announcement list -- anyone can subscribe, but only authorized users can post without moderator approval.
[Fri May 9 06:34:06 2014] i would need authorization first, right ?
[Fri May 9 06:34:27 2014] or I can send the email, and then the admin can see if it passes or not?
[Fri May 9 06:34:34 2014] No. Just post to the list, a moderator will read the message, and if they believe it approriate, let it through
[Fri May 9 06:35:02 2014] AHhh ok, cool... Ill do it then thanks
[Fri May 9 06:36:56 2014] done
[Fri May 9 11:26:13 2014] fALSO: thanks for contributing, and sorry about the wiki.
[Fri May 9 11:26:22 2014] I wish we weren't on ikiwiki
[Fri May 9 11:27:41 2014] ktdreyer: the wiki should be back "soon". The OpenID modules work fine; we just don't have the git hooks quite in place to get updates made from the web back into a repo that gerrit sees, so web edits remain disabled.
[Fri May 9 11:31:13 2014] I need to figure out how to sort out some gerrit stuff to make that happen.
[Fri May 9 11:32:05 2014] is openid subject to the recent remote redirect hack?
[Fri May 9 11:32:12 2014] er, s/hack/vulnerability
[Fri May 9 11:33:09 2014] That's been massively blown out of proportion
[Fri May 9 11:33:17 2014] true
[Fri May 9 11:33:24 2014] tho, it's still an issue
[Fri May 9 11:33:31 2014] So's crossing a road
[Fri May 9 11:34:03 2014] that's a false analogue
[Fri May 9 11:34:29 2014] anyway, this issue is easily solved/ worked around
[Fri May 9 11:34:50 2014] hi ktdreyer
[Fri May 9 11:35:09 2014] ktdreyer: thanks, I hope it is helpfull for anyone
[Fri May 9 11:35:14 2014] test
[Fri May 9 11:35:27 2014] hallo
[Fri May 9 11:38:08 2014] i try to set up openafs server but always fail :(
[Fri May 9 11:38:18 2014] cannot get tokens
[Fri May 9 11:38:27 2014] when use aklog
[Fri May 9 11:38:31 2014] Are you following a guide?
[Fri May 9 11:38:37 2014] yupz
[Fri May 9 11:38:42 2014] which one?
[Fri May 9 11:38:55 2014] openafs user guide
[Fri May 9 11:39:12 2014] sure, it's just that running our own wiki software is yet another thing to do
[Fri May 9 11:39:26 2014] im using centos and openafs repo to install
[Fri May 9 11:39:48 2014] but always fail
[Fri May 9 11:39:55 2014] :(
[Fri May 9 11:40:08 2014] antonsetiady: Do you have a kerberos realm set up already?
[Fri May 9 11:40:16 2014] yup
[Fri May 9 11:40:34 2014] i generate keytab for afs server
[Fri May 9 11:40:38 2014] service*
[Fri May 9 11:41:14 2014] kinit and get ticket
[Fri May 9 11:41:28 2014] but when aklog to get tokens fail
[Fri May 9 11:41:39 2014] fail and fail
[Fri May 9 11:41:57 2014] What error message does aklog give?
[Fri May 9 11:52:38 2014] the error: aklog : can't get information about cell mydomain.com
[Fri May 9 11:52:44 2014] why?
[Fri May 9 11:52:56 2014] but kinit admin works well
[Fri May 9 11:53:04 2014] I can get the ticket
[Fri May 9 11:54:49 2014] when i use aklog -d for debugging mode it's the same error
[Fri May 9 11:55:14 2014] cannot get information about cell mycell.com
[Fri May 9 11:55:48 2014] :( please help
[Fri May 9 11:56:17 2014] what's the error?
[Fri May 9 11:56:41 2014] Did you add information about mycell.com to /usr/vice/etc/CellServDB and restart afsd?
[Fri May 9 11:56:52 2014] and is the output of "fs wscell" your cell?
[Fri May 9 11:57:02 2014] aklog : can't get information about cell mydomain.com
[Fri May 9 12:00:07 2014] no, i didn't add info for mycell.com
[Fri May 9 12:01:33 2014] in CellServDB
[Fri May 9 12:01:45 2014] ok i'm adding it now
[Fri May 9 12:02:37 2014] mm
[Fri May 9 12:02:53 2014] i get more info now
[Fri May 9 12:03:31 2014] my info now when using aklog -d:
[Fri May 9 12:04:20 2014] about to resolve name admin to id in cell mycell.com
[Fri May 9 12:04:24 2014] Error -1
[Fri May 9 12:04:41 2014] set username to admin
[Fri May 9 12:04:53 2014] What is the output of 'tokens' after that 'aklog -d'?
[Fri May 9 12:05:45 2014] aklog : unknown cell was passed to SetToken while obtaining tokens for cell mycell.com
[Fri May 9 12:05:55 2014] aklog : unknown cell was passed to SetToken while obtaining tokens for cell "mycell.com"
[Fri May 9 12:06:36 2014] About to resolve name admin to id in cell "mycell.com"
[Fri May 9 12:06:40 2014] Error -1
[Fri May 9 12:09:38 2014] Perhaps I should say something about how the aklog process is supposed to work.
[Fri May 9 12:10:55 2014] are mycell.com and mydomain.com in the actual output, or are you just substituting for your real cellname and kerberos domain name?
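The first aklog error changed once the cell was added to the client's CellServDB, so it is worth being able to check that file quickly. Below is a minimal sketch, assuming the classic CellServDB layout (a `>cellname #comment` line followed by one `address #hostname` line per database server). `parse_cellservdb` is a hypothetical helper for illustration, not an OpenAFS tool; the sample entry reuses the cell name, address and hostname that show up in this log, and its comment text is made up.

```python
def parse_cellservdb(text):
    """Map each cell name to a list of (address, hostname) pairs.

    Assumes the classic CellServDB layout:
        >cellname          #free-form comment
        192.0.2.10         #dbserver.example.org
    Blank lines and anything before the first '>' line are ignored.
    """
    cells = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Cell header: drop the '>' and any trailing '#comment'.
            current = line[1:].split("#", 1)[0].strip()
            cells[current] = []
        elif current is not None:
            # Server line: address, then '#hostname'.
            addr, _, host = line.partition("#")
            cells[current].append((addr.strip(), host.strip()))
    return cells


# Hypothetical entry built from values that appear in this log.
SAMPLE = """\
>ithb.ac.id            #ITHB cell
192.168.43.76          #afs01.ithb.ac.id
"""

if __name__ == "__main__":
    db = parse_cellservdb(SAMPLE)
    for cell in ("ithb.ac.id", "mycell.com"):
        servers = db.get(cell)
        if servers:
            print(f"{cell}: {servers}")
        else:
            print(f"{cell}: not listed, so aklog cannot look the cell up")
```

On a Linux client the file discussed here is /usr/vice/etc/CellServDB; the running cache manager only rereads it when afsd restarts.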
[Fri May 9 12:12:01 2014] sorry about that, "mydomain.com" = "mycell.com"
[Fri May 9 12:13:02 2014] my real cellname is "ithb.ac.id"
[Fri May 9 12:13:22 2014] my realm in kerberos is "ITHB.AC.ID"
[Fri May 9 12:16:35 2014] my full error is:
[Fri May 9 12:16:39 2014] [root@afs01 ~]# aklog -d
[Fri May 9 12:16:41 2014] Authenticating to cell ithb.ac.id (server afs01.ithb.ac.id).
[Fri May 9 12:16:41 2014] Trying to authenticate to user's realm ITHB.AC.ID.
[Fri May 9 12:16:41 2014] Getting tickets: afs/ithb.ac.id@ITHB.AC.ID
[Fri May 9 12:16:41 2014] Using Kerberos V5 ticket natively
[Fri May 9 12:16:41 2014] About to resolve name admin to id in cell ithb.ac.id.
[Fri May 9 12:16:42 2014] Error -1
[Fri May 9 12:16:44 2014] Set username to admin
[Fri May 9 12:16:46 2014] Setting tokens. admin @ ithb.ac.id
[Fri May 9 12:16:47 2014] aklog: unknown cell was passed to SetToken while obtaining tokens for cell ithb.ac.id
[Fri May 9 12:17:06 2014] :( fail
[Fri May 9 12:17:55 2014] [root@afs01 ~]# tokens
[Fri May 9 12:17:56 2014] Tokens held by the Cache Manager:
[Fri May 9 12:17:56 2014] --End of list--
[Fri May 9 12:18:12 2014] I can't get tokens :(((
[Fri May 9 12:20:01 2014] Do you get output from 'fs listcells'?
[Fri May 9 12:25:53 2014] yes, I get the list when using fs listcells
[Fri May 9 12:26:05 2014] I'm back
[Fri May 9 12:26:19 2014] Does it include your cell ithb.ac.id?
[Fri May 9 12:27:23 2014] ok
[Fri May 9 12:27:53 2014] i tried this: fs listcells | grep ithb.ac.id
[Fri May 9 12:30:06 2014] mmm ...
[Fri May 9 12:30:25 2014] i think i can't find my cell
[Fri May 9 12:30:47 2014] Is that a problem?
[Fri May 9 12:30:59 2014] 'fs newcell' will add information about a cell to the running cache manager, but that is lost when the cache manager restarts. If the information is in the CellServDB, though, it should get picked up at startup time.
[Fri May 9 12:31:15 2014] Yes, it's a problem if the cache manager doesn't know about the cell you're trying to get tokens for.
[Fri May 9 12:32:29 2014] so what setting must i change here?
[Fri May 9 12:33:35 2014] Well, for now, I would try running 'fs newcell -name ithb.ac.id -servers '
[Fri May 9 12:33:47 2014] Just to get things working once.
[Fri May 9 12:34:04 2014] ok i'll try now sir
[Fri May 9 12:35:39 2014] ok I tried that...
[Fri May 9 12:35:50 2014] i'm running aklog -d again
[Fri May 9 12:36:35 2014] mmm i think it's the same error sir..
[Fri May 9 12:37:08 2014] i can't get tokens
[Fri May 9 12:38:07 2014] [root@afs01 ~]# aklog -d
[Fri May 9 12:38:08 2014] Authenticating to cell ithb.ac.id (server afs01.ithb.ac.id).
[Fri May 9 12:38:08 2014] Trying to authenticate to user's realm ITHB.AC.ID.
[Fri May 9 12:38:08 2014] Getting tickets: afs/ithb.ac.id@ITHB.AC.ID
[Fri May 9 12:38:08 2014] Using Kerberos V5 ticket natively
[Fri May 9 12:38:08 2014] About to resolve name admin to id in cell ithb.ac.id.
[Fri May 9 12:38:09 2014] Error -1
[Fri May 9 12:38:11 2014] Set username to admin
[Fri May 9 12:38:13 2014] Setting tokens. admin @ ithb.ac.id
[Fri May 9 12:39:05 2014] I think the machine cannot resolve the name admin in my cell
[Fri May 9 12:39:08 2014] That Error -1 is (probably) RX_CALL_DEAD, meaning that aklog could not connect to a ptserver to resolve the name to a numeric ID.
[Fri May 9 12:39:08 2014] why?
[Fri May 9 12:39:39 2014] What does "rxdebug afs01.ithb.ac.id 7002" give?
[Fri May 9 12:41:18 2014] mmm..
[Fri May 9 12:41:27 2014] the output says:
[Fri May 9 12:41:37 2014] [root@afs01 ~]# rxdebug afs01.ithb.ac.id 7002
[Fri May 9 12:41:38 2014] Trying 192.168.43.76 (port 7002):
[Fri May 9 12:41:38 2014] getstats call failed with code -1
[Fri May 9 12:41:38 2014] [root@afs01 ~]#
[Fri May 9 12:42:22 2014] Can you use telnet or nmap or something to check if anything is listening on port 7002 on afs01.ithb.ac.id?
[Fri May 9 12:42:35 2014] telnet won't help, it's udp
[Fri May 9 12:42:53 2014] Right, whoops.
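Both the `Error -1` from aklog and the `getstats call failed with code -1` from rxdebug are negative Rx transport status codes, not com_err codes. A small lookup sketch follows; the names and values are copied from OpenAFS's `rx.h` as I remember them, so treat the table as an assumption and check the header for your release. `describe_rx_code` is a hypothetical helper.

```python
# Negative status codes returned by the Rx transport layer.
# Assumption: values as defined in OpenAFS rx.h; verify against your tree.
RX_ERRORS = {
    -1: "RX_CALL_DEAD",          # the peer never answered; call declared dead
    -2: "RX_INVALID_OPERATION",
    -3: "RX_CALL_TIMEOUT",
    -4: "RX_EOF",
    -5: "RX_PROTOCOL_ERROR",
    -6: "RX_USER_ABORT",
}


def describe_rx_code(code: int) -> str:
    """Name a negative Rx status code, or say why it is not one."""
    if code in RX_ERRORS:
        return RX_ERRORS[code]
    if code < 0:
        return f"unrecognized Rx status {code}"
    return f"{code} is not an Rx transport code (look in the com_err tables)"


if __name__ == "__main__":
    for seen in (-1, 19270408):
        print(seen, "->", describe_rx_code(seen))
```

RX_CALL_DEAD against port 7002 fits the diagnosis above: nothing on the server answered as a ptserver.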
[Fri May 9 12:43:14 2014] should i use netstat?
[Fri May 9 12:43:41 2014] there is no service running on port 7002
[Fri May 9 12:45:28 2014] That would be a problem.
[Fri May 9 12:45:59 2014] (I am looking at http://www.central.org/numbers/index.html to figure out the 'port 7002' bit, by the way.)
[Fri May 9 12:46:02 2014] what service must be running on port 7002, sir?
[Fri May 9 12:46:25 2014] There should be a ptserver process listening on that port.
[Fri May 9 12:46:31 2014] oh
[Fri May 9 12:46:39 2014] why is it not running?
[Fri May 9 12:46:42 2014] But perhaps we should back up and check the server's configuration as a whole.
[Fri May 9 12:46:54 2014] mmm...
[Fri May 9 12:47:10 2014] openafs has a very complex configuration
[Fri May 9 12:47:10 2014] On afs01.ithb.ac.id, can you run 'bos status localhost -noauth'?
[Fri May 9 12:47:29 2014] It is somewhat complicated, yes. :-/
[Fri May 9 12:48:21 2014] [root@afs01 ~]# bos status localhost -noauth
[Fri May 9 12:48:21 2014] bos: running unauthenticated
[Fri May 9 12:48:21 2014] Instance ptserver, temporarily disabled, stopped for too many errors, currently starting up.
[Fri May 9 12:48:21 2014] Instance fs, temporarily disabled, stopped for too many errors, currently shutdown.
[Fri May 9 12:48:21 2014] Auxiliary status is: file server shut down.
[Fri May 9 12:48:21 2014] Instance vlserver, temporarily disabled, stopped for too many errors, currently starting up.
[Fri May 9 12:48:25 2014] Instance buserver, temporarily disabled, stopped for too many errors, currently starting up.
[Fri May 9 12:49:11 2014] Logs are in /usr/afs/logs, IIRC. PTLog for the ptserver, etc.
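Since telnet is no help for UDP, a local bind test is one way to see whether anything holds an AFS port on the server itself. This is a minimal sketch, not an OpenAFS tool: `udp_port_in_use` and `AFS_UDP_PORTS` are hypothetical names, and the port numbers are the standard AFS assignments as I understand them from the central.org registry mentioned above (7002 for the ptserver matches the discussion). It only says that *something* holds the port; `bos status` remains the authoritative check.

```python
import errno
import socket

# Standard AFS server UDP ports (assumption: per the central.org registry).
AFS_UDP_PORTS = {
    "fileserver": 7000,
    "cache manager callbacks": 7001,
    "ptserver": 7002,
    "vlserver": 7003,
    "volserver": 7005,
    "bosserver": 7007,
    "buserver": 7021,
}


def udp_port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if some local socket already holds UDP host:port.

    Works by attempting a bind without SO_REUSEADDR; EADDRINUSE means the
    port is taken. Run it on the server itself. It cannot tell *which*
    daemon owns the port, only that one does.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise
    finally:
        sock.close()
    return False


if __name__ == "__main__":
    for name, port in AFS_UDP_PORTS.items():
        state = "held" if udp_port_in_use(port) else "free (nothing listening?)"
        print(f"{name:24} udp/{port}: {state}")
```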
[Fri May 9 12:49:22 2014] I'm trying to restart the openafs-server service:
[Fri May 9 12:49:27 2014] [root@afs01 ~]# service openafs-server restart
[Fri May 9 12:49:28 2014] Stopping openafs-server: [ OK ]
[Fri May 9 12:49:28 2014] Starting openafs-server:
[Fri May 9 12:49:28 2014] [root@afs01 ~]#
[Fri May 9 12:50:47 2014] The openafs-server script basically only concerns itself with the bosserver, and the bosserver manages the vlserver, ptserver, file server, and such.
[Fri May 9 12:51:51 2014] [root@afs01 ~]# bos status localhost -noauth bos: running unauthenticated
[Fri May 9 12:51:52 2014] Instance ptserver, currently running normally.
[Fri May 9 12:51:52 2014] Instance fs, temporarily disabled, stopped for too many errors, currently shutdown.
[Fri May 9 12:51:52 2014] Auxiliary status is: file server shut down.
[Fri May 9 12:51:52 2014] Instance vlserver, currently running normally.
[Fri May 9 12:51:53 2014] Instance buserver, currently running normally.
[Fri May 9 12:54:08 2014] If the ptserver is running normally, how does your aklog work back on the other machine?
[Fri May 9 12:58:06 2014] same error
[Fri May 9 12:58:36 2014] it still cannot resolve the admin user in my cell
[Fri May 9 12:58:45 2014] mmm....
[Fri May 9 13:00:10 2014] sir, can you give me a good tutorial reference for installing openafs step by step..?
[Fri May 9 13:00:39 2014] something i can learn from besides the openafs user guide
[Fri May 9 13:01:05 2014] where can i get it?
[Fri May 9 13:02:52 2014] I don't really remember the state of various tutorials off the top of my head. Maybe someone else here has a better memory.
[Fri May 9 13:03:25 2014] If I had to come up with something, I would note that the debian package has a couple of scripts that basically automate setting up a new cell, and they can be used as a reference for what steps need to happen.
[Fri May 9 13:03:48 2014] Debian uses different paths for the configuration files and such, though, so the script wouldn't work as-is on an RPM-based system.
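After the restart, the bosserver brought the ptserver, vlserver and buserver back, but the `fs` instance stayed disabled. When checking this repeatedly, it helps to pull out only the instances that are not healthy. A minimal sketch, assuming the exact `bos status` wording shown in this log (other versions may phrase things differently); `parse_bos_status` and `unhealthy` are hypothetical helpers. For any instance it flags, the matching log under /usr/afs/logs is the next place to look.

```python
import re

# Patterns for the lines shown in this log; treat them as assumptions.
INSTANCE_RE = re.compile(r"^Instance ([^,\s]+), (.*)\.$")
AUX_RE = re.compile(r"^Auxiliary status is: (.*)\.$")
HEALTHY = "currently running normally"


def parse_bos_status(text):
    """Return {instance: {"status": str, "aux": str or None}}."""
    instances = {}
    last = None
    for raw in text.splitlines():
        line = raw.strip()
        match = INSTANCE_RE.match(line)
        if match:
            last = match.group(1)
            instances[last] = {"status": match.group(2), "aux": None}
            continue
        match = AUX_RE.match(line)
        if match and last is not None:
            # Auxiliary lines belong to the instance printed just before.
            instances[last]["aux"] = match.group(1)
    return instances


def unhealthy(instances):
    """Names of instances that are not 'currently running normally'."""
    return sorted(name for name, info in instances.items()
                  if info["status"] != HEALTHY)


# Output captured after the restart in this log.
SAMPLE = """\
Instance ptserver, currently running normally.
Instance fs, temporarily disabled, stopped for too many errors, currently shutdown.
    Auxiliary status is: file server shut down.
Instance vlserver, currently running normally.
Instance buserver, currently running normally.
"""

if __name__ == "__main__":
    status = parse_bos_status(SAMPLE)
    for name in unhealthy(status):
        print(name, "->", status[name]["status"], "|", status[name]["aux"])
```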
[Fri May 9 13:04:11 2014] which debian release?
[Fri May 9 13:04:17 2014] wheezy?
[Fri May 9 13:04:22 2014] http://anonscm.debian.org/gitweb/?p=pkg-k5-afs/openafs.git;a=blob;f=debian/afs-newcell;h=456433fd002255e64fd6e2090258445c5bb413d7;hb=refs/heads/master is the main script for making a new cell; there's an afs-rootvol as well next to it.
[Fri May 9 13:04:22 2014] or squeeze
[Fri May 9 13:05:01 2014] I don't think those two scripts have changed very much across releases.
[Fri May 9 13:05:21 2014] So the gitweb version should work fine as a makeshift step-by-step.
[Fri May 9 13:06:49 2014] setiadyanton: http://wiki.freebsd.org/afs-server is a pretty good tutorial; it's what I followed for my cell.
[Fri May 9 13:07:25 2014] (Though yes, beware the paths issue; that guide is unsurprisingly geared towards the FreeBSD ports packaging of OpenAFS)
[Fri May 9 13:07:31 2014] thanks
[Fri May 9 13:08:31 2014] And now I get to have a little chuckle, as I maintain the FreeBSD packaging and yet I pointed to the debian stuff as a reference.
[Fri May 9 13:09:54 2014] Hah!
[Fri May 9 13:10:07 2014] I had forgotten your name at the top of that tutorial.
[Fri May 9 13:10:59 2014] insanely, there are very few openafs reference tutorials
[Fri May 9 13:11:29 2014] i searched on google,
[Fri May 9 13:12:02 2014] searched for video tutorials on youtube but found nothing..
[Fri May 9 13:12:26 2014] not a single video tutorial about openafs...
[Fri May 9 13:12:36 2014] rare stuff huh...
[Fri May 9 13:13:06 2014] I think the people who run openafs are not really big fans of video tutorials. Text is more efficient.
[Fri May 9 13:13:08 2014] but i want to try that..
[Fri May 9 13:13:11 2014] OpenAFS predates... well, many of the things people consider to be "The Internet" these days.
[Fri May 9 13:13:33 2014] Well, no, sorry, AFS predates them, and so does much of the OpenAFS community.
[Fri May 9 13:14:47 2014] day by day i've been trying to get openafs running..
[Fri May 9 13:15:01 2014] but nothing ...
[Fri May 9 13:15:33 2014] to get this machine running successfully ..
[Fri May 9 13:15:43 2014] so sad ;(
[Fri May 9 13:15:56 2014] setiadyanton: i succeeded by following the documentation in debian, and on the openafs.org website
[Fri May 9 13:16:18 2014] yeah, i'm using centos
[Fri May 9 13:16:26 2014] :(
[Fri May 9 13:16:49 2014] setiadyanton: but i had to do a little extra reading and work to get kerberos running
[Fri May 9 13:16:55 2014] i've got to try it in debian
[Fri May 9 13:17:51 2014] setiadyanton: then i visited here a lot and watched the discussions
[Fri May 9 13:18:27 2014] i think i've set up kerberos successfully.., but setting up openafs is very hard
[Fri May 9 13:19:02 2014] setiadyanton: after a few years, i expanded to running openafs on freebsd, too, to take advantage of ZFS, and openafs-server works very well on freebsd
[Fri May 9 13:19:16 2014] so many services and tools to get running..
[Fri May 9 13:19:30 2014] oh...
[Fri May 9 13:19:38 2014] you are lucky
[Fri May 9 13:19:45 2014] crisb
[Fri May 9 13:20:10 2014] do you have a good tutorial reference
[Fri May 9 13:20:22 2014] to get it running bro?
[Fri May 9 13:20:38 2014] :D
[Fri May 9 13:20:55 2014] i'm stuck
[Fri May 9 13:21:46 2014] as I said, the docs in debian /usr/share/doc/openafs-doc
[Fri May 9 13:22:24 2014] which are edited versions of the docs available on the openafs.org website
[Fri May 9 13:22:39 2014] setiadyanton: have you set up nfs or samba before?
[Fri May 9 13:23:52 2014] of course i have
[Fri May 9 13:24:11 2014] samba and nfs are simpler than openafs
[Fri May 9 13:25:48 2014] :D that's my opinion
[Fri May 9 13:26:18 2014] ok, so afs is similar, but more work than those two. in addition to the fileserver there is a volserver and a ptserver, and the client needs some configuration and security interface to work
[Fri May 9 13:27:06 2014] i didn't understand that when i started, but by following the instructions, i succeeded
[Fri May 9 13:28:23 2014] oh nice bow
[Fri May 9 13:28:28 2014] *bro
[Fri May 9 13:28:52 2014] yeah, more advanced :D
[Fri May 9 13:28:55 2014] too
[Fri May 9 13:29:27 2014] i tripped over root.afs and root.cell, and had to reinstall
[Fri May 9 13:30:00 2014] what's the difference between root.afs and root.cell?
[Fri May 9 13:31:16 2014] root.cell is the root of your cell. root.afs is a super-root containing mountpoints for multiple cells; it's becoming somewhat historical given the widespread use of dynroot
[Fri May 9 13:31:36 2014] samba and nfs do not have that concept of root.afs and root.cell
[Fri May 9 13:32:40 2014] so root.afs contains the openafs cells of the world or something?
[Fri May 9 13:33:37 2014] and root.cell is the local cell that i have running, is it?
[Fri May 9 13:34:57 2014] must root.afs be created when i'm running an openafs server?
[Fri May 9 13:35:20 2014] or is it my choice?
[Fri May 9 13:39:06 2014] root.cell is essential; root.afs is less important these days.
[Fri May 9 13:47:41 2014] root.cell is the root of your local cell, equivalent to / on Unix or drive:\ on Windows
[Fri May 9 13:48:20 2014] ok everybody, thanks for sharing your openafs knowledge; i will try again now with the best choice, the best path. i will return to this channel when i'm stuck with the config :P
[Fri May 9 15:29:17 2014] I have wiresharked my afs pr/vlserver traffic and kerberos traffic, and those rxkad errors happened again *after* a soft-lockup
[Fri May 9 15:29:24 2014] now to see if it is my fault or a bug
[Fri May 9 15:38:17 2014] ok, this is an afs bug, I am pretty sure
[Fri May 9 15:39:01 2014] the KDC is returning KDC_ERR_C_PRINCIPAL_UNKNOWN for pi
[Fri May 9 15:39:06 2014] ubnt, admin, ...
[Fri May 9 15:39:10 2014] substrings of actual principals
[Fri May 9 15:49:48 2014] it's definitely the afs client sending the truncated strings, and it looks like actually arbitrary chunks of memory
[Fri May 9 15:50:45 2014] I thought for a moment perhaps it was folks trying to log in to /server-status (uses mod-auth-kerb), but wireshark is showing them as name-to-id requests with the bad names, and mod-auth-kerb is not hitting nss or aklog
[Fri May 9 15:53:19 2014] wireshark doesn't decode strings in the AFS protocol correctly, which might be why you're seeing truncated strings.
[Fri May 9 15:53:47 2014] seems to work fine most of the time for name-to-id, and for the related kerberos messages
[Fri May 9 15:53:59 2014] all of the other kerberos messages have the correct principal name
[Fri May 9 15:54:48 2014] By name-to-id, do you mean the PR_NameToID RPC? In which case, that isn't a Kerberos message, but an AFS one.
[Fri May 9 15:55:11 2014] PR_NameToID isn't issued by the cache manager - the most likely source for that message is aklog.
[Fri May 9 15:57:52 2014] /afs/hcoop.net/user/c/cl/clinton_admin/kerb-err/packet-dump.text
[Fri May 9 15:58:04 2014] wireshark analysis of one bad transaction
[Fri May 9 15:58:29 2014] this then results in a dmesg entry for the rxkad error involving an invalid kvno
[Fri May 9 16:00:54 2014] May 8 17:31:05 navajos kernel: [55389.880273] afs: Tokens for user of AFS id 10476 for cell hcoop.net are discarded (rxkad error=19270408, server 69.90.123.75)
[Fri May 9 16:01:18 2014] that user is aizak.daemon, certainly not "ubnt"
[Fri May 9 16:02:09 2014] That whole transaction looks like someone is trying to aklog as ubnt.
[Fri May 9 16:02:29 2014] The AFS cache manager does neither ptserver lookups nor krb5 AS requests.
[Fri May 9 16:02:52 2014] Those will be coming from either aklog, or another piece of local tooling which gets tickets.
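The `rxkad error=19270408` in the kernel message is a com_err code: a table name of up to four characters is packed into a base value, and each table owns 256 consecutive codes from that base. The sketch below assumes the standard com_err table-base algorithm and the rxkad table order listed in the code, both written from memory, so verify them against `rxkad_errs.et` in your OpenAFS tree (OpenAFS's `translate_et` utility should do this lookup for real). If the assumptions hold, 19270408 is offset 8 in the `RXK` table, `RXKADUNKNOWNKEY`, which matches the unknown-kvno explanation given earlier in this channel. `table_base` and `decode` are hypothetical helpers.

```python
# com_err maps a table name (up to 4 chars) to a base code; each table
# then owns 256 consecutive error codes starting at that base.
_CHARSET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz0123456789_")


def table_base(name: str) -> int:
    """Base error code for a com_err table name such as 'RXK'."""
    num = 0
    for ch in name:
        # Each character contributes 6 bits: its index in _CHARSET, plus 1.
        num = (num << 6) + _CHARSET.index(ch) + 1
    return num << 8


# Assumption: order of entries in OpenAFS's rxkad error table (rxkad_errs.et).
RXKAD_CODES = [
    "RXKADINCONSISTENCY", "RXKADPACKETSHORT", "RXKADLEVELFAIL",
    "RXKADTICKETLEN", "RXKADOUTOFSEQUENCE", "RXKADNOAUTH", "RXKADBADKEY",
    "RXKADBADTICKET", "RXKADUNKNOWNKEY", "RXKADEXPIRED",
    "RXKADSEALEDINCON", "RXKADDATALEN", "RXKADILLEGALLEVEL",
]
TABLES = {"RXK": RXKAD_CODES}


def decode(code: int):
    """Return (table, offset, symbol) for a known com_err code, else None."""
    for name, symbols in TABLES.items():
        offset = code - table_base(name)
        if 0 <= offset < 256:
            symbol = symbols[offset] if offset < len(symbols) else f"{name}+{offset}"
            return name, offset, symbol
    return None


if __name__ == "__main__":
    print(table_base("RXK"))   # 19270400, the first rxkad code
    print(decode(19270408))    # the code from the dmesg line above
```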
[Fri May 9 16:04:28 2014] I am concerned about the correlation between the kernel hanging and then tokens dropping
[Fri May 9 16:04:36 2014] maybe mod_waklog is being evil
[Fri May 9 16:05:42 2014] or ... hrm, a libopenafs bug
[Fri May 9 16:05:59 2014] because downgrading to 1.4.12 fixes it, and that is also downgrading libopenafs
[Fri May 9 16:08:40 2014] great, I guess I have to figure out how to make an apache module dump a trace of every call it makes
[Fri May 9 16:08:48 2014] after years of common lisp it comes to this
[Fri May 9 16:09:16 2014] can magic sysrq help you tell where the kernel is hanging?
[Fri May 9 16:10:34 2014] <[gorgo]> sxw1: btw I believe wireshark has some patches that fix some rx string decodes, but not yet in 1.10
[Fri May 9 16:11:08 2014] CybrFyre: the hang happened while I was asleep
[Fri May 9 16:11:16 2014] <[gorgo]> cybrfyre: if the kernel reports a soft lockup, it will also provide a stack trace from the process being stuck
[Fri May 9 16:11:19 2014] and then it resumed normal operations, except for dropping tokens left and right
[Fri May 9 16:11:35 2014] there is a *definite* time correlation between the rxkad errors and weird name-to-id/as_reqs
[Fri May 9 16:11:47 2014] I was told the soft-lockup could not be related, but just in case...
[Fri May 9 16:12:08 2014] <[gorgo]> do you have a stack trace from the soft lockup?
[Fri May 9 16:12:47 2014] /afs/hcoop.net/user/c/cl/clinton_admin/kerb-err/kern-stuck.log
[Fri May 9 16:13:26 2014] I get the first rxkad error, then a kernel lockup, then tons of rxkad errors
[Fri May 9 16:13:48 2014] ignore my previous kernel paste, I forgot the wireshark dump was offset from the kernel by ~18000
[Fri May 9 16:13:56 2014] still, pronovic.daemon is not ubnt.
[Fri May 9 16:14:20 2014] but at least this seems more likely to be a userspace bug than a kernel bug?
[Fri May 9 16:16:05 2014] Those hung task issues look like they might be deadlocks. When you're having that problem, run cmdebug and look at the locks held and locks waiting
[Fri May 9 16:16:12 2014] <[gorgo]> btw you may still be getting these hung tasks
[Fri May 9 16:16:24 2014] <[gorgo]> the kernel is usually configured to print out the first 10
[Fri May 9 16:16:39 2014] <[gorgo]> and after that it will not print them out
[Fri May 9 16:16:41 2014] I had to reboot the system
[Fri May 9 16:16:54 2014] (can't keep web services offline for 110 people for very long ...)
[Fri May 9 16:17:03 2014] but this happens every 12 hours now that I have upgraded the openafs-fileserver
[Fri May 9 16:17:34 2014] Did it start when you upgraded the fileserver, or when you upgraded the client?
[Fri May 9 16:17:36 2014] <[gorgo]> wait, I thought this is the afs client
[Fri May 9 16:17:42 2014] timeline
[Fri May 9 16:17:53 2014] I upgraded the client, and this error started occurring once every ~5 days
[Fri May 9 16:18:23 2014] I then upgraded our fileservers from 1.4.12 to 1.6.7 and rekeyed, and now it happens every 12-24h
[Fri May 9 16:18:37 2014] <[gorgo]> you seem to be running a locally patched version of the client
[Fri May 9 16:18:45 2014] <[gorgo]> OpenAFS 1.6.7-1~bpo60+hcoop2-debian built 2014-05-02
[Fri May 9 16:18:58 2014] it's just the standard debian package, backported (built with pbuilder)
[Fri May 9 16:19:20 2014] the hcoop2 was because I screwed up our debarchiver with mismatched source uploads and had to re-upload
[Fri May 9 16:24:01 2014] mod_waklog is also not reporting any errors during token renewal
[Fri May 9 16:25:35 2014] some days, I hate unix. Which moving part *is* it? pcap can't secretly also log the PID sending the traffic, can it? (or something else I can use to correlate data?)
[Fri May 9 16:25:36 2014] <[gorgo]> so how do you get those ubnt krb5 and pr queries?
[Fri May 9 16:25:59 2014] a mystery!
[Fri May 9 16:26:21 2014] My guess would be that they're coming from waklog
[Fri May 9 16:26:48 2014] <[gorgo]> you can do some basic correlation looking at the source port
[Fri May 9 16:26:53 2014] <[gorgo]> and a strategically placed strace
[Fri May 9 16:26:54 2014] <[gorgo]> :)
[Fri May 9 16:27:26 2014] I feel like there are netfilter rules to do what I want, hrm
[Fri May 9 16:27:28 2014] Or netstat, if things stick around for long enough
[Fri May 9 16:28:10 2014] <[gorgo]> I'm not familiar with waklog, would that be a long running process?
[Fri May 9 16:28:28 2014] [gorgo]: it runs inside of apache
[Fri May 9 16:28:52 2014] It lives within Apache. Problem is that there have been multiple different implementations during its life. What some people refer to as waklog differs from others.
[Fri May 9 16:28:52 2014] ok, there *is* a netfilter rule to log the owner of a command making an outbound connection
[Fri May 9 16:28:58 2014] <[gorgo]> but does it start an rx socket and keep reusing it, or does it get reexec'd often?
[Fri May 9 16:29:11 2014] http://git.hcoop.net/?p=hcoop/debian/libapache-mod-waklog.git;a=summary this is our waklog, mostly from megacz
[Fri May 9 16:29:32 2014] I merged the changes from the sf.net tree, but they do not appear to be active, so I just assumed I was upstream at this point (people send me patches... rarely)
[Fri May 9 16:29:45 2014] The version that I'm familiar with embeds the setpag and aklog operations into the Apache process. So each Apache child opens an RX socket and keeps it around for the life of the child.
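gorgo's source-port idea can be automated on Linux, as long as the socket is still open when you look. That fits the waklog case described above, where each Apache child keeps its Rx socket for its whole lifetime. A minimal sketch, assuming Linux procfs (`/proc/net/udp`, `/proc/net/udp6`, `/proc/<pid>/fd`) and root privileges to see other users' processes; `pids_for_udp_port` is a hypothetical helper. Short-lived processes such as a one-off aklog may exit before you look, which is where strace or auditd come in.

```python
import os
import sys


def _socket_inodes_for_port(port, tables=("/proc/net/udp", "/proc/net/udp6")):
    """Inode numbers of UDP sockets whose local port equals `port`."""
    inodes = set()
    for path in tables:
        try:
            with open(path) as table:
                next(table)  # skip the column-header line
                for line in table:
                    fields = line.split()
                    # fields[1] is "HEXADDR:HEXPORT"; fields[9] is the inode.
                    local_port = int(fields[1].rsplit(":", 1)[1], 16)
                    if local_port == port:
                        inodes.add(fields[9])
        except FileNotFoundError:
            continue  # e.g. no IPv6 support
    return inodes


def pids_for_udp_port(port):
    """PIDs holding a UDP socket bound to local `port` (Linux only)."""
    targets = {f"socket:[{ino}]" for ino in _socket_inodes_for_port(port)}
    pids = set()
    if not targets:
        return pids
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        fd_dir = os.path.join("/proc", entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) in targets:
                    pids.add(int(entry))
                    break
            except OSError:
                continue
    return pids


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 udp_owner.py <source-port-from-the-capture>
    print(sorted(pids_for_udp_port(int(sys.argv[1]))))
```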
[Fri May 9 16:30:20 2014] if there are other active copies of waklog around on the net, I want to know so I can merge them all into one waklog that doesn't suck
[Fri May 9 16:30:37 2014] (currently in the process of purging apache 1.3 support, and replacing the fixed table of credentials with an apr_hash_table)
[Fri May 9 16:31:04 2014] <[gorgo]> that reminds me of https://xkcd.com/927/
[Fri May 9 16:31:25 2014] <[gorgo]> add waklog to the subtitle
[Fri May 9 16:32:51 2014] oh great, the cmd-owner filter was removed years ago
[Fri May 9 16:37:05 2014] I think I found it ... auditd claims to let me log what I want, and now I just need to set this up and wait again
[Fri May 9 16:37:27 2014] does anyone know any other traffic I might want to dump while waiting for the client to die again?
[Fri May 9 16:37:38 2014] or if maybe I should be dumping some traffic on the server too?
[Fri May 9 16:50:00 2014] <[gorgo]> actually if the client dies, your best bet would be to crash the box (sysrq-c) and then have someone knowledgeable look at the core to find the deadlock
[Fri May 9 16:54:15 2014] it sort-of dies ... load spikes, but some requests still work
[Fri May 9 16:54:18 2014] If you have a kernel module with debug symbols, it would be interesting to know what code corresponds to
[Fri May 9 16:54:20 2014] afs_linux_getattr+0x152
[Fri May 9 16:54:31 2014] afs_AccessOK+0x7f
[Fri May 9 16:54:39 2014] and afs_FindVCache+0x22b
[Fri May 9 16:54:48 2014] <[gorgo]> you can actually take a look at the vmcore yourself and see the backtrace for the afsd processes... if any of those is sitting in afs_CheckTokenCache, then it might indeed be a soft lockup
[Fri May 9 16:55:26 2014] <[gorgo]> basically if you have lots of tokens, and several expiring about the same time, and a huge stat cache
[Fri May 9 16:55:47 2014] <[gorgo]> then the cleanup process will take a really long time and will block all the other processes that want to look at it
[Fri May 9 16:56:05 2014] <[gorgo]> there are some patches to help this a bit pending inclusion in master
[Fri May 9 16:56:06 2014] we only have about 90 tokens active afaict
[Fri May 9 16:56:41 2014] <[gorgo]> in our case 90 tokens with 2 million access caches kills the box for several minutes
[Fri May 9 16:57:50 2014] well, seems like something to chase then (our client cache is 5G, so I guess ... it's large enough ?)
[Fri May 9 16:58:20 2014] <[gorgo]> it's not the actual file data cache
[Fri May 9 16:58:26 2014] <[gorgo]> it's the stat cache
[Fri May 9 16:58:51 2014] <[gorgo]> perhaps if you get hit again, you could add -disable-dynamic-vcaches to the afsd parameters
[Fri May 9 16:59:06 2014] <[gorgo]> and see if it helps
[Fri May 9 16:59:08 2014] I didn't think you could do that on linux.
[Fri May 9 16:59:30 2014] <[gorgo]> I didn't try it
[Fri May 9 16:59:30 2014] I am concerned about the rxkad invalid kvno errors correlating with this, which to me implies that whatever is happening is also causing the kernel cache manager to end up in an inconsistent state
[Fri May 9 17:01:10 2014] <[gorgo]> can we somehow get the vcache size using xstat_cm_debug ?
[Fri May 9 17:01:52 2014] Maybe it just opens up some panic pathways; I don't really remember.
[Fri May 9 17:02:10 2014] gorgo: That's certainly the case that Marc's patch fixes.
[Fri May 9 17:02:33 2014] We're seeing access cache cleanup times going from 70s to 0.7s with the patch
[Fri May 9 17:02:34 2014] <[gorgo]> sxw1: at least it speeds it up a bit
[Fri May 9 17:03:01 2014] How much depends on how well sized your unixuser hash table is
[Fri May 9 17:03:16 2014] <[gorgo]> Mark and Andrew had some other ideas about linking up the access cache entries directly from the unixusers, so you won't need to scan the vcache list at all
[Fri May 9 17:03:30 2014] We looked at that. It gets not pretty very quickly.
[Fri May 9 17:03:49 2014] And lots of stuff already scans the vcache in those x minute cleanup threads.
[Fri May 9 17:04:01 2014] A better bet would probably be to do all the work that you want to do in a single vcache scan.
[Fri May 9 17:04:29 2014] <[gorgo]> 0.7s can still be a long time for a lock to be held
[Fri May 9 17:04:43 2014] one thought I had ... the rxkad errors may be in ... afs_Analyze
[Fri May 9 17:04:48 2014] in the else with the comment /* The else case shouldn't be possible and should probably be replaced by a panic? */
[Fri May 9 17:05:25 2014] I guess I should upgrade the client to 1.6.8pre2 and add a printk if it gets there
[Fri May 9 17:05:54 2014] * rewrites afs in haskell
[Fri May 9 17:05:58 2014] be back in ten years
[Fri May 9 17:06:47 2014] <[gorgo]> but obviously 2 orders of magnitude is a huge speedup by itself :)
[Fri May 9 17:08:55 2014] <[gorgo]> sxw1: is vcacheXAllocs the current size of the vcache?
[Fri May 9 17:09:36 2014] gorgo: Yeah, you could yield in the middle of that loop as well, and let other stuff have a chance. But you'd still be holding xusers and xvcache, so it's not clear how much else could proceed
[Fri May 9 17:09:37 2014] I installed kernel debugging symbols, so I guess when this happens next time I can gdb vmlinux
[Fri May 9 17:09:58 2014] It would be possible to drop everything at each hash bucket and yield with a little bit more cleverness.
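sxw1's suggestion of dropping everything at each hash bucket and yielding can be illustrated with a toy model. This is not the OpenAFS access-cache code: `ToyAccessCache`, `AccessEntry` and their fields are hypothetical, and one Python lock stands in for the real xusers/xvcache locks. It shows only the shape of the idea: the global lock is held per bucket rather than for the whole scan, so other threads can get in between buckets.

```python
import threading
import time
from dataclasses import dataclass


@dataclass
class AccessEntry:
    user_key: int      # hypothetical stand-in for the owning user/PAG
    expires_at: float  # hypothetical: when that user's token expires


class ToyAccessCache:
    """Toy hash table guarded by one global lock (not OpenAFS code)."""

    def __init__(self, nbuckets: int = 16):
        self.lock = threading.Lock()
        self.buckets = [[] for _ in range(nbuckets)]

    def add(self, entry: AccessEntry) -> None:
        with self.lock:
            self.buckets[hash(entry.user_key) % len(self.buckets)].append(entry)

    def purge_expired(self, now: float) -> int:
        """Drop expired entries, taking the lock one bucket at a time."""
        removed = 0
        for i in range(len(self.buckets)):
            with self.lock:  # held only while this one bucket is rewritten
                bucket = self.buckets[i]
                live = [e for e in bucket if e.expires_at > now]
                removed += len(bucket) - len(live)
                self.buckets[i] = live
            time.sleep(0)  # explicit yield point between buckets
        return removed

    def __len__(self) -> int:
        with self.lock:
            return sum(len(b) for b in self.buckets)


if __name__ == "__main__":
    cache = ToyAccessCache()
    for uid in range(100):
        cache.add(AccessEntry(uid, 10.0 if uid % 2 else 100.0))
    print(cache.purge_expired(now=50.0), len(cache))  # 50 50
```

The catch, as the channel points out, is that the real code holds more than one lock; a real implementation would likely also need to revalidate its position after reacquiring them, which this toy ignores.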
[Fri May 9 17:10:29 2014] <[gorgo]> too bad afs_maxvcount is not reported through the debugging interface
[Fri May 9 17:10:36 2014] unknown_lamer: What you need is debugging symbols for the AFS kernel module, not the kernel itself. If you build your kernel module with dkms you might not have that.
[Fri May 9 17:11:18 2014] gorgo: cmdebug will attempt to dump everything in the vcache. Sadly, if you've got a lot of entries, it will just tie up the callback thread forever.
[Fri May 9 17:11:43 2014] <[gorgo]> sxw1: that's why I'm looking at xstat_cm_test
[Fri May 9 17:13:09 2014] vcacheXAllocs might be a useful number.
[Fri May 9 17:13:22 2014] xstat_cm_test never worked for me, wasn't sure which afsd option I needed (-enable_process_stats ?)
[Fri May 9 17:13:35 2014] <[gorgo]> unknown_lamer: it works for me against your client :)
[Fri May 9 17:13:52 2014] apparently I don't know how to use xstat_cm_debug
[Fri May 9 17:14:01 2014] *phew* dkms lets me override the strip setting
[Fri May 9 17:14:09 2014] <[gorgo]> xstat_cm_test 2 -onceonly
[Fri May 9 17:14:19 2014] the debian packaging should make it easy for me to grab 1.6.8pre2 and import it as upstream, right?
[Fri May 9 17:14:43 2014] <[gorgo]> if you run this before crashing the box, you can see the size of the vcache
[Fri May 9 17:15:21 2014] <[gorgo]> if you crash the box and have debug symbols, you should check the value of afs_stats_cmperf.vcacheXAllocs, afs_vcount and afs_maxvcount
[Fri May 9 17:15:54 2014] so much for taking the weekend off from hcoop
[Fri May 9 17:16:02 2014] unknown_lamer: yes.
[Fri May 9 17:16:23 2014] I really hope this is waklog being awful and not waklog triggering a kernel bug
[Fri May 9 17:19:24 2014] unknown_lamer: debian/README.source is your friend
[Fri May 9 17:19:35 2014] <[gorgo]> did you also upgrade waklog recently ?
[Fri May 9 17:19:36 2014] grab the packaging from http://anonscm.debian.org/gitweb/?p=pkg-k5-afs/openafs.git
[Fri May 9 17:20:02 2014] <[gorgo]> unknown_lamer: I see in the url you gave that it includes kdf support, etc...
[Fri May 9 17:20:11 2014] <[gorgo]> you didn't have that if compiled with 1.4 libs
[Fri May 9 17:20:22 2014] [gorgo]: not substantially -- I did recently switch on the rxkad-kdf patches, but they do not seem to have any effect
[Fri May 9 17:20:43 2014] same thing was happening when it was getting des keys
[Fri May 9 17:21:47 2014] the patch to enable rxkad-kdf does not seem to be doing anything weird: http://git.hcoop.net/?p=hcoop/debian/libapache-mod-waklog.git;a=commitdiff;h=52e434d854936c5a8cb9159119f5e3b076595d7c
[Fri May 9 17:22:42 2014] not sure about that goto cleanup there though
[Fri May 9 17:23:57 2014] The goto cleanup looks fine.
[Fri May 9 17:24:59 2014] I also merged a bunch of changes from the sf.net repo that looked OK, but they also did not cause issues with 1.4.12
[Fri May 9 17:25:27 2014] one thing I didn't think of -- could it possibly be related to our kernel being from squeeze? 2.6.35 + debian patches
[Fri May 9 17:25:58 2014] I could upgrade to the backports kernel ('tho I'd like to avoid that since backports loses security support in three weeks afaict and we're about six months away from getting off of squeeze-lts)
[Fri May 9 17:28:59 2014] unknown_lamer: I do have commit access to the waklog on SourceForge. I think Sine Nomine recently submitted patches for rxkad-k5 that I merged
[Fri May 9 17:29:29 2014] other than that, it is essentially abandoned. We can try pinging Adam to get you commit access
[Fri May 9 17:29:50 2014] megacz and hcoop may have ended up on bad terms, 'tho I wasn't involved in that crap
[Fri May 9 17:30:21 2014] I can use sf.net's new merge request tool to request you guys pull from me anyway
[Fri May 9 17:30:31 2014] I really wish dice weren't strangling sf.net
[Fri May 9 17:31:04 2014] yeah
[Fri May 9 17:31:21 2014] I can probably just ask Adam to give me full rights to the project :)
[Fri May 9 17:32:03 2014] I feel bad, I work for them and I still use adblock on sf.net because the new ads are so obnoxious
[Fri May 9 17:32:08 2014] I know what everyone wants, video ads.
[Fri May 9 17:32:54 2014] well, down the rabbit hole of debugging the kernel for me
[Fri May 9 17:35:11 2014] unknown_lamer: The thing you want to find out is what locks are being held when you start getting those warnings. cmdebug will tell you that. Just don't run more than one cmdebug at once, and you can CTRL-C it once you get the initial lock list (we're probably not interested in vcache locks)
[Fri May 9 19:13:01 2014] need to figure out what's not working on 10.9 openafs clients
[Fri May 9 19:13:22 2014] user complaining "Double-clicking file icons results in error message that file is damaged; seems to happen with recently created files but now older ones"
[Fri May 9 19:13:31 2014] not sure if that "now" should be a "not"
[Fri May 9 19:13:52 2014] it's a bug in ._ file handling. it is not unique to afs and it is not an afs bug. macos is broken
[Fri May 9 19:13:53 2014] and complaining "Trying to browse files in a folder on AFS using the Mac finder or from within an application does not show any files"
[Fri May 9 19:14:31 2014] dbrashear - I don't ever recall seeing that, to be honest... did 10.9 change something so that things don't work anymore?
[Fri May 9 19:15:09 2014] istr it started in 10.8.something
[Fri May 9 19:15:25 2014] ok... I skipped 10.8 on my test computers... and I think most staff did, too
[Fri May 9 19:15:49 2014] is there (1) anything afs can do to work around the problem like it does w. other things and (2) any type of bug report open w. apple?
[Fri May 9 19:15:54 2014] rdar://15927187
[Fri May 9 19:16:01 2014] wtf is rdar:// ?
[Fri May 9 19:16:14 2014] which was closed as dup of 15904074
[Fri May 9 19:16:19 2014] apple bug
[Fri May 9 19:16:38 2014] have they responded in any way that you can tell me?
[Fri May 9 19:16:40 2014] my summary was " 15927187 gatekeeper quarantine is broken on filesystems where xattr is emulated"
[Fri May 9 19:16:46 2014] they responded by closing as duplicate
[Fri May 9 19:17:03 2014] or do I need to figure out how to open a bug w. apple through Cornell (if for no other reason than to put more pressure on 'em)?
[Fri May 9 19:17:12 2014] actually, that would be helpful
[Fri May 9 19:18:03 2014] see your work email
[Fri May 9 19:18:25 2014] ok
[Fri May 9 19:19:05 2014] what about the second issue (browsing files not showing anything ... I am presuming b.c. Finder is busy stat'ing other stuff)?
[Fri May 9 19:19:20 2014] which I've experienced for myself... Finder was stat'ing remote cells even tho we have -fakestat-all
[Fri May 9 19:19:36 2014] I would guess... the workaround is to remove all other cells from CellServDB... but is there a better answer?
[Fri May 9 19:21:46 2014] haven't investigated that particularly hard (haven't seen it but i don't claim it's not an issue... only that i haven't seen it)
[Fri May 9 19:22:54 2014] if you have suggestions for troubleshooting it, I'm all ears :/
[Fri May 9 19:23:28 2014] most I've done is to tcpdump port 7001 and see the queries to other cells (many of which timed out) which I'm *assuming* is what was slugging up Finder
[Fri May 9 19:24:30 2014] well, as you say, not mounting other cells, certainly you can start there and confirm/deny that's an issue. if so, i suspect the blacklist in the kernel needs to be updated
[Fri May 9 19:24:52 2014] uh... what blacklist in the kernel? :)
[Fri May 9 19:25:16 2014] the one that knows things like .DS_Store are special and not something to even bother trying
[Fri May 9 19:25:42 2014] hunh... didn't know there was such a thing
[Fri May 9 20:22:49 2014] Hmm. Why are "fs chgrp" and "fs chown" windows only?
[Fri May 9 20:23:31 2014] because unix has chown and chgrp already
[Fri May 9 20:23:49 2014] and the windows equivalents map to openafs even worse than the unix ones do
[Fri May 9 20:25:56 2014] So standard chown is the way to change an owner? Ugh.
[Fri May 9 20:26:51 2014] afs doesn't really do owners as such; they're pretty much there to keep unix happy
[Fri May 9 20:27:01 2014] so you use unix tools
[Fri May 9 20:30:23 2014] Ah, right. "man fs setacl" says "allows the UID owner of a volume", which I assume means openafs retains the unix ID (nothing to do with PTS at all) for each directory...
[Fri May 9 20:31:20 2014] Not for each directory.
[Fri May 9 20:31:32 2014] The root directory of an AFS volume is special.
[Fri May 9 20:32:43 2014] kaduk_: Ahh. The only thing that confuses me is how it handles the fact that unix UIDs differ between hosts.
[Fri May 9 20:32:52 2014] it doesn't, essentially
[Fri May 9 20:33:45 2014] there is no uid <-> pts id mapping like rpc.guidd that some nfs implementations use; there's an assumption that you synchronize your unix uids and pts ids somehow
[Fri May 9 20:34:12 2014] How is the security issue handled? If I have root on a machine, does that mean I can take over any volume with a Unix UID of 0?
[Fri May 9 20:34:30 2014] No.
[Fri May 9 20:34:30 2014] 0 isn't a valid pts id [Fri May 9 20:35:08 2014] ashl: you can use libnss-afs + nscd to just make the afs pts the source of your uids [Fri May 9 20:35:09 2014] but, unix uid doesn't really mean a lot to afs, permissions go by your token and its rights [Fri May 9 20:35:18 2014] you lose gids however, unless you also use something like libnss-ldap [Fri May 9 20:37:01 2014] geekosaur: The setacl man page still seems to imply that if you can adopt a UID then you can change ACLs. So surely root on a random box can take over any volume? [Fri May 9 20:37:43 2014] unknown_lamer: Interesting. I did it the other way around and dumped LDAP into PTS [Fri May 9 20:38:29 2014] I wonder I wonder, if it would be safe/feasible to have libnss-afs export system:$group membership [Fri May 9 20:38:34 2014] a thought for another month... [Fri May 9 20:40:21 2014] Ponder for a moment where the uid check is enforced... [Fri May 9 20:42:47 2014] ashl, if root is not already the owner of the volume root, root cannot have permission to change the owner unless their token is granted "a" access [Fri May 9 20:43:08 2014] also note that, again, root == uid 0 is not a valid pts id [Fri May 9 20:43:24 2014] ashl: the reason that fs chgrp and fs chown do not exist on UNIX is that no one has bothered to implement them since the standalone chown and chgrp commands are often good enough. fs chown and fs chgrp are required on Windows because there is no equivalent functionality that permits the setting of the owner and group for an AFS volume.
One of the benefits of fs chown and fs chgrp is that they explicitly use AFS PTS IDs wh [Fri May 9 20:43:42 2014] "use AFS PTS IDs wh" [Fri May 9 20:43:57 2014] which are never confused with local system IDs [Fri May 9 20:45:41 2014] Windows Active Directory does provide posixAccount http://msdn.microsoft.com/en-us/library/ms683907%28v=vs.85%29.aspx and posixGroup http://msdn.microsoft.com/en-us/library/ms683908%28v=vs.85%29.aspx which could be used to map Unix IDs to Windows SIDs [Fri May 9 20:46:52 2014] secureendpoints: AFS PTS IDs? Maybe I misunderstood this. I thought we were saying that the volume owner (set with chown) is compared against the local unix user's UID for (eg, "fs setacl"'s volume-owner can set" feature. Is it actually compared against the PTS ID of the current token? [Fri May 9 20:47:56 2014] unknown_lamer: local system uid and gid values are never seen by the file servers. File Servers resolve the AFS PTS ID from the authentication identity in the AFS token presented when the RPC is issued. [Fri May 9 20:48:11 2014] I know that [Fri May 9 20:48:20 2014] it's just nice to have unix uids and afs pts ids automagically in sync [Fri May 9 20:48:40 2014] ACLs are not interpreted by AFS cache managers. Only by file servers. [Fri May 9 20:49:05 2014] volume ownership is only interpreted by file servers. [Fri May 9 20:49:07 2014] also, one less userdb [Fri May 9 20:51:24 2014] The unix chown and chgrp ask the AFS cache manager to request that the file server set a specific integer value as the owner or group. If the local name<->id mapping does not match the PTS name<->id mapping and names are used, then the wrong integer value will be set [Fri May 9 20:51:37 2014] Much as the wrong name will be displayed [Fri May 9 20:55:14 2014] secureendpoints: Understood, thanks. The fact that chown was involved misled me into thinking that tokens weren't involved. 
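A toy model of the mismatch secureendpoints describes: Unix chown resolves a name through the local name<->id mapping and passes only the integer on, so if that mapping disagrees with PTS, the file server records whichever PTS ID happens to have that number. The tables and function names below are hypothetical; a real client would consult NSS for the local side and the PTS database for the AFS side.

```python
from typing import Dict, Optional

# Hypothetical name <-> id tables. A real client would consult NSS (/etc/passwd,
# LDAP, ...) for the local side and the PTS database for the AFS side.
LOCAL_PASSWD: Dict[str, int] = {"alice": 1001, "bob": 1002}
PTS_DB: Dict[str, int] = {"alice": 31337, "bob": 1001}


def owner_id_from_unix_chown(name: str) -> int:
    """Unix chown resolves the name locally; only the integer reaches the file server."""
    return LOCAL_PASSWD[name]


def owner_id_from_fs_chown(name: str) -> int:
    """Modelled fs chown: it works in AFS PTS IDs, so the name is resolved via PTS."""
    return PTS_DB[name]


def pts_name_for_id(pts_id: int) -> Optional[str]:
    """Name that PTS associates with a stored owner id, if any."""
    for name, candidate in PTS_DB.items():
        if candidate == pts_id:
            return name
    return None


if __name__ == "__main__":
    stored = owner_id_from_unix_chown("alice")
    # The file server now records PTS ID 1001, which PTS says belongs to "bob".
    print(stored, pts_name_for_id(stored))
```

The point of the sketch is only that the file server never sees the name, so the same integer is interpreted through PTS regardless of what the local host meant by it.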
[Fri May 9 22:22:57 2014] @cell/user is a RW volume on server A containing OldFiles (RO version of the volume). Does that mean that @cell/user/OldFiles will be inaccessible when A is down? [Fri May 9 22:25:50 2014] I'm wondering if a top level read-only volume per user with a couple of RW volumes within it might make sense. That takes me from maybe 200 volumes to 600 though. [Sat May 10 09:30:41 2014] ashl: there is a syntax to address volumes directly, also you can always create a mountpoint somewhere else for the RO volume. [Sat May 10 23:44:12 2014] I'm making a bit more progress with getting OpenAFS to build with gcc on Smartos (thanks Andrew Deason and Chas!), but I'm finding that libtool has decided to attempt to link the 32bit version of libintl into a 64bit library, as in http://pastebin.com/xhuWWezP [Sat May 10 23:44:40 2014] How can I figure out what's caused libtool to lose its marbles there? [Sun May 11 16:38:19 2014] ashl: Much delayed, but what we do here is have a top-level directory that holds mountpoints for all user backup volumes. (Specifically, a user's volume group is mounted such that @cell/user/foo is RW, @cell/user/foo/acmsys/OldFiles is RO, and @cell/chicago/user/foo is also RO. You could do something similar with BK mountpoints.) [Sun May 11 16:39:20 2014] (Though in the case of BK, since BK can only be hosted with the RW, you don't gain anything if the RW host is offline, but you might gain something if the RW volume is damaged but the BK is not.) [Mon May 12 10:36:09 2014] dbrashear - I've submitted bug #16883903 to Apple [Mon May 12 10:42:04 2014] secureendpoints: how does pts determine which ptserver to talk to? does it just pick the last in the CellServDB list for a cell? [Mon May 12 10:43:38 2014] dhowells: both the volume and protection databases are subject to Ubik *for the cache*. 
[Mon May 12 10:44:37 2014] dhowells: so if 'pts' goes via the cache it "just works", else if it contacts the protection servers directly it may be like 'vos' which usually picks a server at random and tries to contact it forever. [Mon May 12 10:46:13 2014] Walex: pts goes nowhere near the cache [Mon May 12 10:46:24 2014] I'm not seeing any ubik activity that I can see [Mon May 12 10:46:35 2014] Yeah, you won't. [Mon May 12 10:47:05 2014] what happens if pts is pointed at a cell for which there is no CellServDB entry, but there are DNS entries? [Mon May 12 10:47:32 2014] (ie. AFSDB or SRV entries for volume servers) [Mon May 12 10:47:34 2014] sxw: uhmmm, so it is like 'vos', that's a bit disappointing. [Mon May 12 10:47:48 2014] dhowells: The logic is somewhat complex. [Mon May 12 10:49:16 2014] I see [Mon May 12 10:50:22 2014] Essentially, pick a server at random, issue your RPC to it. If it responds with UNOTSYNC, or times out, try a different server. Remember everything you do so next time you don't have to do it again. [Mon May 12 10:51:14 2014] If there are more than 4 configured servers, issue a VOTE_GetSyncSite RPC in the hopes of speeding the whole thing up. [Mon May 12 10:52:06 2014] a ubik call goes to the same service+port as the actual call? [Mon May 12 10:53:08 2014] Same host+port, VOTE is a different RX service. [Mon May 12 10:53:19 2014] ok [Mon May 12 10:53:33 2014] sxw: is that 'VOTE_GetSyncSite' something recent or PTS-only? [Mon May 12 10:53:47 2014] Also DISK_. http://www.central.org/pages/numbers/rxservice.html is handy. [Mon May 12 10:57:05 2014] VOTE is a service provided for all ubik backed services [Mon May 12 11:03:48 2014] Walex: Delayed response: Thanks. I suppose I could just have volumes like @cell/OldFiles/a-user. [Mon May 12 11:04:48 2014] ashl: that too. But please note that those are NOT VOLUMES. They are mountpoints. [Mon May 12 11:05:07 2014] ashl: the syntax for volume names is completely different from the one for volumes.
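sxw's description of how a ubik client picks a database server can be sketched as a small simulation. This is not the OpenAFS ubik client code: `UbikClientSketch`, the `rpc` and `get_sync_site` callables, and the string error markers are hypothetical stand-ins, and every call is treated as one that must reach the sync site (as a write would), so non-sync servers answer UNOTSYNC.

```python
import random
from typing import Callable, List, Optional, Tuple

UNOTSYNC = "UNOTSYNC"  # stand-in for ubik's "not the sync site" error
TIMEOUT = "TIMEOUT"    # stand-in for an RPC that timed out


class UbikClientSketch:
    """Toy model of the server-selection logic described above (not ubik's code)."""

    def __init__(
        self,
        servers: List[str],
        rpc: Callable[[str, str], str],
        get_sync_site: Optional[Callable[[str], str]] = None,
        rng: Optional[random.Random] = None,
    ) -> None:
        self.servers = list(servers)
        self.rpc = rpc                        # rpc(server, request) -> reply or error marker
        self.get_sync_site = get_sync_site    # models VOTE_GetSyncSite asked of one server
        self.rng = rng or random.Random()
        self.preferred: Optional[str] = None  # "remember everything you do"

    def call(self, request: str) -> Tuple[str, str]:
        # Start from a random order over the configured servers.
        order = list(self.servers)
        self.rng.shuffle(order)
        if self.preferred in order:
            # Reuse the server that answered last time.
            order.remove(self.preferred)
            order.insert(0, self.preferred)
        elif len(order) > 4 and self.get_sync_site is not None:
            # With more than 4 servers, ask one of them who the sync site is first.
            sync_site = self.get_sync_site(order[0])
            if sync_site in order:
                order.remove(sync_site)
                order.insert(0, sync_site)
        for server in order:
            reply = self.rpc(server, request)
            if reply in (UNOTSYNC, TIMEOUT):
                continue  # wrong server or no answer: try a different one
            self.preferred = server
            return server, reply
        raise RuntimeError("no database server answered")
```

Once a server has answered, later calls go straight to it, which is the "next time you don't have to do it again" part of the description.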
[Mon May 12 11:05:22 2014] ashl: now it sounds like a pedantic distinction, but it is really quite important. [Mon May 12 11:05:57 2014] quite important for a sysadm -- most users (and many sysadms) never need to deal with volume names [Mon May 12 11:07:01 2014] nwf: Much delayed response: I suppose top level volumes work fine. The only downside I see is that it feels easier having each user's items appear in "This folder on my desktop" [Mon May 12 11:08:41 2014] Walex: Sorry. Consider it a linguistic shorthand. "... I could just have volumes [with mountpoints] like ..." :-) [Mon May 12 11:09:49 2014] ashl: The alternative mountpoints are there only so that if our storage cluster falls down in flames the off-cluster RO replicas are still online and readable. [Mon May 12 11:10:07 2014] ashl: then you can create two mountpoints in two different places for the same volume, to cover both usages. [Mon May 12 11:10:20 2014] It's better than nothing, even if we would prefer that our cluster be a little more stable than it has been, or that we had RW replication. [Mon May 12 11:10:40 2014] Walex: I thought multiple mountpoints to the same volume were not recommended? [Mon May 12 11:11:41 2014] ashl: it is just confusing. If you keep track of them, and you respect the rules for RW vs. RO, can't see why not. [Mon May 12 11:12:25 2014] ashl: the big deal with mountpoints is that within OpenAFS there is no way to enumerate mountpoints except with a whole-tree scan, or keeping a separate database. [Mon May 12 11:13:55 2014] Walex: man fs_mkmount says "not recommended, to create more than one mount point to a volume. The Cache Manager can become confused if a volume is mounted in two places along the same path through the filespace." [Mon May 12 11:14:47 2014] ashl: ah yes, "along the same path".
[Mon May 12 11:15:15 2014] there's also an open bug on linux where the kernel dcache becomes confused if you access a single volume via multiple mountpoints, fwiw [Mon May 12 11:15:42 2014] geekosaur: in the general case? That's disappointing. [Mon May 12 11:18:51 2014] Is it a terrible idea for me to have a-user mounted at @cell/user/a-user and then a-user.private RW mounted at @cell/user/a-user/Private ? [Mon May 12 11:19:12 2014] That's what I've been doing with my toy home cell, but that has, like, 3 users... [Mon May 12 11:19:54 2014] (The RO version of a-user gets mounted at @cell/user/a-user/OldFiles) [Mon May 12 11:20:18 2014] ashl: that's along the same path... [Mon May 12 11:20:21 2014] Err... The RO version of a-user.private gets mounted at @cell/user/a-user/OldFiles [Mon May 12 11:20:39 2014] ashl: but they are two different volumes [Mon May 12 11:20:40 2014] Walex: But it's not being mounted multiple times [Mon May 12 11:21:40 2014] Sorry, perhaps I should explain. I'm really asking if there are any downsides to giving users multiple volumes (partly so I can make the top level user mountpoint a RO volume) [Mon May 12 11:25:10 2014] Ideally I would like to allocate each user a top level volume (really just containing mountpoints), a private volume, and a public volume. The private and public volumes will be RW mounts, because we want last night's releases to be sitting on another server. Although I've been running it in my toy cell for a couple of years I'm a little concerned that it is unconventional. [Mon May 12 11:27:34 2014] ashl: I gotta take a bus, but later on I'll point you to a discussion by one of the denizens that suggests something like that. [Mon May 12 11:28:31 2014] Walex: Thanks [Mon May 12 14:46:51 2014] Is there a way to make the UNIX AFS cache more aggressive about putting blocks back to the server?
Right now we see consumer I/O stalls when the cache fills up and starts to push blocks back, even though the servers are pretty idle while the cache is being loaded. It'd be nice to smooth that out a bit. [Mon May 12 14:48:00 2014] Blocks are pushed back when files are closed, sync'd or unlocked. [Mon May 12 14:48:25 2014] Or the cache fills up with dirty blocks, it seems? [Mon May 12 14:49:08 2014] Yes. If the cache is full, it will evict blocks to make space. [Mon May 12 14:49:10 2014] But I suppose you're telling me "while true; do sync; sleep 1; done" might do what I want? [Mon May 12 14:49:20 2014] It does so in an incredibly inefficient manner. [Mon May 12 14:49:26 2014] I think that would make you differently sad. [Mon May 12 14:49:38 2014] Well, sure, thus the question. [Mon May 12 14:49:39 2014] sadness wouldn't definitely follow. [Mon May 12 14:49:46 2014] sorry, would definitely follow. [Mon May 12 14:49:50 2014] sync isn't nice. [Mon May 12 14:49:52 2014] <[gorgo]> how about playing with the -splitcache option? [Mon May 12 14:50:04 2014] * reads [Mon May 12 14:50:20 2014] Where are you seeing the I/O stalls - on the client, or on the fileserver? [Mon May 12 14:50:54 2014] [gorgo]: Interesting, but not likely to help in our case; our access is almost entirely to RW volumes. [Mon May 12 14:50:56 2014] In general, you don't want the client to start evicting blocks because the cache is full, because it's really not good about doing so in an efficient manner. [Mon May 12 14:51:31 2014] Well, most of the time we manipulate very small files, but occasionally we stream through a whopper (5GB+), often a disk image or somesuch. [Mon May 12 14:51:54 2014] And when writing such a file to AFS, we see rsync chug merrily along until the cache is full, then stall while the cache pushes blocks to the server, then resume. [Mon May 12 14:52:58 2014] The way to fix that is to fix the client. It's not a problem that you're going to be able to configure around.
[Mon May 12 14:53:05 2014] The server's I/O capacity to its backing disks is faster than the client's average write speed, but when it gets bunched up like this the client starts to block on the server. [Mon May 12 14:54:06 2014] I'm not sure what you mean by "fix the client" that is different than what I'm asking? [Mon May 12 14:55:46 2014] Well, I probably can't blame the servers definitively; all I can say is that at peak the AFS network and server paths are able to write faster than the client rsync claims to be pushing bits into the cache on average. This makes me think that we would not see I/O stalls if the cache were bypassed or a bit more aggressive about synchronizing dirty blocks back to the servers. [Mon May 12 15:27:48 2014] nwf: the unix cm's model of write on close means that there is no parallelism between writes to the afs cm and the writes to the file server. The benefits of this approach are: [Mon May 12 15:27:48 2014] 1. when the store operation does occur all of the dirty data can be written in one RPC which permits the rx call window size and slow start algorithm to provide maximum throughput [Mon May 12 15:27:48 2014] 2. since unix apps do not have mandatory locking it is less likely that two processes on separate machines can corrupt each other's data [Mon May 12 15:27:48 2014] 3. data that is locally updated multiple times will not result in multiple rpcs to the file servers [Mon May 12 15:28:33 2014] The lack of parallelism results in longer clock times for the overall procedure. [Mon May 12 15:29:43 2014] bypassing the cache is most likely what you want in your situation. the yfs client supports that via use of the O_DIRECT flag when opening the file. [Mon May 12 15:30:04 2014] The afsd manpage talked of "background daemons" and made it sound like there were things which could write dirty blocks back to the server even though the file had not been close()d yet.
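The write-on-close behaviour secureendpoints outlines, together with the flush triggers mentioned earlier (close, sync, unlock, or a cache full of dirty data), can be modelled roughly as below. It is a toy, not the Unix cache manager: the class name, the chunk bookkeeping, and the assumption that each flush is a single store call are all illustrative.

```python
class WriteOnCloseSketch:
    """Toy model of a write-on-close cache (illustrative only)."""

    def __init__(self, chunk_size: int, cache_chunks: int) -> None:
        self.chunk_size = chunk_size
        self.cache_chunks = cache_chunks  # how many dirty chunks fit in the cache
        self.dirty = {}                   # chunk index -> latest data for that chunk
        self.store_calls = 0              # flushes sent to the file server
        self.chunks_stored = 0

    def write(self, offset: int, data: bytes) -> None:
        # Simplification: each write lands inside a single chunk.
        chunk = offset // self.chunk_size
        if chunk not in self.dirty and len(self.dirty) >= self.cache_chunks:
            # Cache full of dirty data: the writer waits while it is pushed out.
            self._flush()
        self.dirty[chunk] = data          # rewriting a dirty chunk just replaces it

    def close(self) -> None:
        self._flush()

    # fsync and unlock also force dirty data back to the server.
    fsync = close
    unlock = close

    def _flush(self) -> None:
        if self.dirty:
            self.store_calls += 1         # assumed: one store call per flush
            self.chunks_stored += len(self.dirty)
            self.dirty.clear()
```

Rewriting the same chunk many times costs one store at close (benefit 3 in the list), while a large streaming write that overruns the cache is flushed in bursts, roughly the stall pattern nwf reports with rsync.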
[Mon May 12 15:31:16 2014] (That's not to dismiss the benefits you outlined above; I acknowledge them all.) [Mon May 12 15:31:26 2014] There are circumstances that require blocks to be flushed before close. Out of space in the cache. Releasing a file lock. Sync. I don't remember which conditions use the background daemons in the unix cm. [Mon May 12 15:34:15 2014] On an only tangentially related point (to 1. above), why is RX window and slow-start per-call not per-connection? [Mon May 12 15:36:14 2014] a tcp connection is one stream. an rx call is one stream. [Mon May 12 15:36:54 2014] multiple calls in parallel on a single rx connection share the same link contention characteristics as multiple tcp connections [Mon May 12 15:39:59 2014] Hm. Isn't that only an argument for keeping separate windows? Separate slow-start seems sort of odd if we have reason to believe that the link is capable of faster flow rates. [Mon May 12 15:40:54 2014] (It's possible I have this backwards or just wrong; it's been a long time since I actually did anything that approximated network protocol design or engineering.) [Mon May 12 18:52:18 2014] oh yeah, I got another apache deadlock, managed to get lock info from cmdebug... [Mon May 12 18:52:28 2014] Lock afsdb_client_loc status: (none_waiting, write_locked(pid:2349 at:685)) [Mon May 12 18:52:28 2014] Lock afs_discon_lock status: (none_waiting, 1 read_locks(pid:0)) [Mon May 12 18:52:52 2014] pid 2349 did not exist either, so I guess something is causing a stale lock to be left behind in the kernel if I'm interpreting that correctly? 
[Mon May 12 18:53:16 2014] It'll be a few days before I can fit in getting everything recompiled to debug the kernel module / audit which process is making the weird TGS-REQs [Mon May 12 18:54:54 2014] I'm going to set up a trivial warning email to page me when a deadlock or any rxkad errors occur later this week too, so hopefully I can catch it as it is happening instead of hours later [Mon May 12 19:17:12 2014] Right, about to reorganise to 3 volumes per user (top level normal mount, and RW mounted Public & Private mounted within it). Last chance to tell me I'm doing something stupid. :-) [Mon May 12 19:19:34 2014] unknown_lamer: afsdb_client_loc is only held in one place. It indicates that a dns query (afsdb or srv) is in progress and has yet to complete. [Mon May 12 19:28:05 2014] secureendpoints: thanks, I'll just assume that the process disappeared in the seconds between me running cmdebug and ps for now [Tue May 13 02:48:49 2014] w 2 [Tue May 13 09:28:40 2014] nothing like chocolate in the morning :) [Tue May 13 11:10:41 2014] hi all [Tue May 13 11:10:57 2014] is there a specific advantage to using BOS to lets say Solaris SMF or Linux SystemD ? [Tue May 13 11:28:13 2014] <[gorgo]> on solaris we run bosserver from smf [Tue May 13 11:35:36 2014] bos knows, among other things, how to stage interactions between fileserver and salvager [Tue May 13 11:49:58 2014] and also, using bos prevents systemd seeping into everything... :-) [Tue May 13 11:50:08 2014] heh [Tue May 13 11:50:19 2014] I take it you're a big fan of Debian's recent decision [Tue May 13 11:51:10 2014] * uses a distor with systemd and I am still trying to get used to it. It does seem to have become all pervasive/invasive 'tho. 
[Tue May 13 11:51:26 2014] oopsie: s/distor/distro/ [Tue May 13 11:53:26 2014] this is going to solve everything!: every init replacement ever [Tue May 13 11:53:46 2014] <[gorgo]> yep, we understand, you get distorted [Tue May 13 11:54:40 2014] * will just quote his recent comment elsechannel: [Tue May 13 11:54:42 2014] [13 15:48] emacs won, but they call it systemd now [Tue May 13 11:54:42 2014] [13 15:48] >.> [Tue May 13 11:56:24 2014] * will just quote hers: we should have at least gone with smf, that way the remote management layer is done. it's called "jabber" [Tue May 13 12:02:11 2014] right, so there are good reasons to use bos, good to know [Tue May 13 12:10:11 2014] [gorgo]: any chance you're willing to share your SMF manifest? :-) [Tue May 13 16:36:03 2014] geekosaur: if only, https://www.gnu.org/software/dmd/ [Tue May 13 16:36:31 2014] systemd is to dmd as one of those dos emacs clones that weren't programmable is to emacs [Tue May 13 16:37:03 2014] I really hope I can still use runit in Jessie [Tue May 13 16:38:19 2014] unknown_lamer, the conversation I pulled that from, it was close enough [Tue May 13 16:38:33 2014] if I'd really wanted to be snarky I would have noted that it was vim, the vi that thinks it's emacs [Tue May 13 16:39:10 2014] ha, I loved it when vim sprouted a programming language [Tue May 13 16:39:34 2014] vimscript was someone trying to prove that elisp is not the worst extension language in existence >.> [Tue May 13 16:39:35 2014] editor wars seemed to quiet down a bit then [Tue May 13 16:59:46 2014] Christ, vimscript. I love me some vim, but vimscript makes me feel abused. [Wed May 14 14:13:05 2014] so, getting a really really really really stalled copy of data from windows client to afs (1.7.29 client)... estimating 19 hours to copy 520 MB [Wed May 14 14:13:26 2014] tcpdump is showing rx acks with reason: delay from the copy to fileserver [Wed May 14 14:13:32 2014] thoughts?
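A quick back-of-the-envelope check on that estimate, reading "520 MB" as decimal megabytes and treating the 1436 in the tcpdump line quoted just below as the bytes carried per data packet (both assumptions; the 1436 may include Rx header overhead):

```python
def implied_rate(total_bytes: float, seconds: float) -> float:
    """Average throughput, in bytes per second, implied by a size and an ETA."""
    return total_bytes / seconds


def packets_per_second(rate: float, packet_bytes: int) -> float:
    """How many data packets of the given size that rate amounts to."""
    return rate / packet_bytes


if __name__ == "__main__":
    rate = implied_rate(520 * 10**6, 19 * 3600)  # "520 MB" in "19 hours"
    print(f"{rate:.0f} B/s, {packets_per_second(rate, 1436):.1f} packets/s")
    # prints: 7602 B/s, 5.3 packets/s
```

In other words, the 19-hour ETA corresponds to only about five full data packets per second reaching the file server.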
[Wed May 14 14:17:32 2014] <[gorgo]> do you see actual data packets on the wire? [Wed May 14 14:17:45 2014] I think so [Wed May 14 14:17:57 2014] cnfs1.ad.cnf.cornell.edu.afs3-callback > smoke.cnf.cornell.edu.afs3-fileserver: rx data (1436) [Wed May 14 14:19:09 2014] well, packets of RX Type "data" [Wed May 14 14:21:38 2014] The AFS "service" is Service: Encrypted File Server Request [Wed May 14 14:24:19 2014] it's actually slowly slowly slowly getting data across it would appear [Wed May 14 14:32:30 2014] <[gorgo]> I would certainly look at a larger part of the log with wireshark [Wed May 14 14:32:49 2014] <[gorgo]> it would immediately show if there are any retransmits [Wed May 14 14:33:16 2014] <[gorgo]> if not, you could dig in to the packet contents [Wed May 14 14:42:08 2014] like what? [Wed May 14 14:42:18 2014] like what parts of the content? [Wed May 14 14:44:03 2014] rxdebug will show you the number of retransmits [Wed May 14 14:45:38 2014] running it on the windows client? [Wed May 14 14:49:38 2014] on client, ran rxdebug localhost 7001 -rxstat -noconn [Wed May 14 14:49:45 2014] looks like only 7 resends [Wed May 14 14:51:47 2014] So not resends, then. If you take a packet dump of a connection that's running slowly and share it with your support provider (it will be huge, and may contain sensitive data), they should be able to tell you what's causing that RX channel to run slowly. [Wed May 14 14:52:37 2014] well, *I* am the support provider here :( [Wed May 14 14:52:44 2014] unless you are referring to afs support, specifically [Wed May 14 15:02:53 2014] no packet fragmentation [Wed May 14 15:03:34 2014] I wonder if there's ttcp for windows [Wed May 14 16:37:31 2014] hrm... receive window of 32 packets... 
looks like each packet on the capture is separated by about 0.1 seconds [Thu May 15 11:19:29 2014] secureendpoints, sxw: that's the business subcommands of pts done: http://git.infradead.org/users/dhowells/kafs-utils.git/shortlog [Thu May 15 11:19:47 2014] I haven't yet done the interactive/bulk mode commands [Thu May 15 11:20:39 2014] dhowells: exciting! Will look more later. [Thu May 15 11:36:59 2014] secureendpoints: I can't help but feel that the AFS utility suite is a prime candidate for having a bash completion script [Thu May 15 12:35:31 2014] that's for sure [Thu May 15 12:36:17 2014] dhowells: Whoo! [Thu May 15 16:48:43 2014] There's no danger in having a VL/PR-DB system that's not publicly addressable so long as it's not the lowest IP address and not listed in CellServDB or DNS, right? [Fri May 16 20:18:42 2014] Can somebody help me figure out why my backup (mere clone) VLDB is not synchronizing to the main? You can see the master at 128.220.251.36 and the clone at 128.220.251.38. I've already "bos restart"ed it to no effect. [Fri May 16 20:31:45 2014] The VLLog on the slave is empty other than the startup message. The last log entry in the master is a curious "Fri May 16 18:37:25 2014 ubik: A Remote Server has addresses: Fri May 16 18:37:25 2014 128.220.251.38 Fri May 16 18:37:25 2014" which is oddly formatted but contains the right substrings. :) [Fri May 16 21:04:16 2014] looks like the master knows about the clone, but not vice versa [Fri May 16 21:04:29 2014] what's your cellservdb look like on the clone? [Fri May 16 21:05:37 2014] nwf: ^ [Fri May 16 21:06:18 2014] mvitale: Just the three lines ">acm.jhu.edu" "128.220.251.36 # afs0.acm.jhu.edu" and "[128.220.251.38] # chicago.acm.jhu.edu" . [Fri May 16 21:07:34 2014] hmm [Fri May 16 21:07:43 2014] That's what I said, too. :) [Fri May 16 21:08:23 2014] looking... [Fri May 16 21:08:37 2014] Don't look too hard; if I have to I'll pave the clone and start over.
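For reference, the three CellServDB lines nwf quotes can be parsed as below. This is a sketch, assuming (as I understand OpenAFS's handling of server-side CellServDB files) that an address in square brackets marks a non-voting ubik clone, and that among voting servers the lowest address gets ubik's tie-breaking preference, which is what the earlier "lowest IP address" question alludes to. `DbServer`, `parse_cellservdb` and `lowest_voter` are made-up names for illustration.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import Dict, List, Optional


@dataclass
class DbServer:
    address: IPv4Address
    hostname: str
    clone: bool  # True when the address was written as [a.b.c.d]


def parse_cellservdb(text: str) -> Dict[str, List[DbServer]]:
    cells: Dict[str, List[DbServer]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Cell line: ">cellname  #optional comment"
            current = line[1:].split("#", 1)[0].strip()
            cells[current] = []
            continue
        if current is None:
            raise ValueError(f"server line before any cell line: {raw!r}")
        # Server line: "address  #hostname", address optionally in brackets
        addr, _, comment = line.partition("#")
        addr = addr.strip()
        clone = addr.startswith("[") and addr.endswith("]")
        if clone:
            addr = addr[1:-1].strip()
        cells[current].append(DbServer(IPv4Address(addr), comment.strip(), clone))
    return cells


def lowest_voter(servers: List[DbServer]) -> Optional[DbServer]:
    """Lowest-addressed voting server; clones are skipped because they do not vote."""
    return min((s for s in servers if not s.clone), key=lambda s: s.address, default=None)
```

Run on nwf's file, this yields one voting server (128.220.251.36) and one clone (128.220.251.38), matching the master/clone split he describes.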
[Fri May 16 21:09:09 2014] This worked fine up until this evening, but the clone machine has been experiencing huge disk IO latency spikes due to a... well, "bug"/"misfeature" in its ZFS stack. [Fri May 16 21:09:27 2014] oh. [Fri May 16 21:10:07 2014] (ZFS for some reason believes that "10% of memory" is a perfectly fine amount of dirty data to stick in RAM, and it might be, except that it results in the ZFS transaction group belt stalling for minutes at a time while the disks absorb all the data.) [Fri May 16 21:10:37 2014] (I've patched the kernel to make that tunable, but it's only tunable at reboot and I have a 'vos release' that's been running for days that I don't want to interrupt.) [Fri May 16 21:10:39 2014] ubik doesn't like that at all [Fri May 16 21:10:52 2014] I believe that; we've been seeing the service flap like crazy. [Fri May 16 21:10:59 2014] But still, it shouldn't stay dead forever, should it? [Fri May 16 21:11:06 2014] no [Fri May 16 21:11:22 2014] what's your platform? [Fri May 16 21:11:32 2014] and afs release? [Fri May 16 21:11:49 2014] zfs, so I presume solaris? [Fri May 16 21:11:55 2014] but can't assume.... [Fri May 16 21:11:56 2014] Master is Ubuntu 1.6.1 (not for long, but we haven't moved it to a different machine yet; sorry); clone is FreeBSD 10-STABLE. [Fri May 16 21:12:33 2014] No, I do like to run exotic configurations... in theory it would all work out great. =P [Fri May 16 21:13:40 2014] so the clone's vldb is on a troublesome zfs? [Fri May 16 21:13:45 2014] Yes. [Fri May 16 21:13:53 2014] ok, stand by [Fri May 16 21:13:58 2014] 100 CONTINUE [Fri May 16 21:17:47 2014] sorry, had to move to my office, the house was too noisy [Fri May 16 21:19:12 2014] That was a very short move. :) [Fri May 16 21:19:31 2014] office is subset of house ;-) [Fri May 16 21:19:34 2014] BTW if you have better things to be doing on a Friday night, please, by all means, this can wait, though I appreciate the look. 
[Fri May 16 21:19:49 2014] no, nothing good on the tube [Fri May 16 21:20:01 2014] so I might as well help somebody out [Fri May 16 21:20:37 2014] While I've got you on the horn, then, any patches for the -stayonline thing you'd like me to be testing? :) [Fri May 16 21:21:18 2014] completely queued up behind other work [Fri May 16 21:21:30 2014] Understood in full, just thought I'd ask. [Fri May 16 21:21:48 2014] but not forgotten, I was just staring at the list in despair today [Fri May 16 21:22:02 2014] and the stayonline thing caught my eye [Fri May 16 21:22:05 2014] Don't despair! At least, not on my behalf. [Fri May 16 21:22:37 2014] okay, back to beloved ubik [Fri May 16 21:22:58 2014] "beloved" [Fri May 16 21:23:07 2014] I'm going to have to remember that euphemism. [Fri May 16 21:25:39 2014] okay, do you have pstack on freebsd? [Fri May 16 21:26:20 2014] and what is your version of OpenAFS? [Fri May 16 21:26:36 2014] No; sadly the ports system believes pstack is only available on i386, not amd64. [Fri May 16 21:26:47 2014] that's ok [Fri May 16 21:27:10 2014] FreeBSD ports packaging claims openafs-1.6.7.20130128 . [Fri May 16 21:27:38 2014] I hope that corresponds closer to something meaningful than Ubuntu's packaging "scheme". [Fri May 16 21:27:43 2014] okay, good, so you got the package somewhere, you didn't build it yourself... [Fri May 16 21:28:23 2014] oh, why am I asking you the version, I can ping it myself. I've already issued a few udebug commands to each server. [Fri May 16 21:28:24 2014] Well, no, sorry; I did build it myself, but it's their source. [Fri May 16 21:28:31 2014] ah ok [Fri May 16 21:28:47 2014] I wonder if it has debug symbols... [Fri May 16 21:28:49 2014] what configure options did you set? 
[Fri May 16 21:29:48 2014] host12:~ mvitale$ rxdebug 128.220.251.38 7003 -version [Fri May 16 21:29:48 2014] Trying 128.220.251.38 (port 7003): [Fri May 16 21:29:48 2014] AFS version: OpenAFS 1.6.7 built 2014-04-14 [Fri May 16 21:29:48 2014] host12:~ mvitale$ rxdebug 128.220.251.36 7003 -version [Fri May 16 21:29:48 2014] Trying 128.220.251.36 (port 7003): [Fri May 16 21:29:48 2014] AFS version: OpenAFS 1.6.2-1+ubuntu2.1-debian built 2013-07-24 [Fri May 16 21:29:48 2014] host12:~ mvitale$ [Fri May 16 21:30:38 2014] Let's see... looks like --prefix=/usr/local --localstatedir=/var --mandir=... --with-bsd-kernel-...=... --enable-debug --enable-debug-kernel --enable-debug-lwp --include-dir=... --disable-fuse-client [Fri May 16 21:31:30 2014] Oh... that's a possibility. We're using supergroups and have groups as group owners; everything seems fine on the master but would that cause the DB to not replicate? [Fri May 16 21:32:01 2014] I think that would only affect ptserver, but let me double check [Fri May 16 21:32:46 2014] Yeah, and the ptserver's fine. Hm. I think this sync failure happened several hours after the group changes I made, anyway... [Fri May 16 21:34:53 2014] Oh, here's another thought: RX deadlock? rxdebug says master is receiving, has_output_packets while slave is receiving and receive_done. [Fri May 16 21:35:18 2014] on the clone - do you only have vlserver, no ptserver? [Fri May 16 21:35:24 2014] No, we have both. [Fri May 16 21:36:44 2014] And sorry, I misread the rxdebug output. The slave connection matching the master's is "mode: unknown" while there's another call on the slave that seems to be at eof on the master. [Fri May 16 21:40:05 2014] oh, yeah, I meant to get back to that rxdebug, I was looking at that and then started asking for pstack.
[Fri May 16 21:40:26 2014] here's another thing we can do to get more info - jack up the debug level on the clone's vllog [Fri May 16 21:40:56 2014] kill -TSTP or whatever the FreeBSD equivalent is [Fri May 16 21:41:46 2014] db server logs are noisy, so it can use a lot of disk space if you leave it turned up all the time but we'll just do it temporarily [Fri May 16 21:42:28 2014] each time you issue the kill it increases the loglevel to another power of 5 - 0 is default, then 1, then 5, then 25, then 125 (highest) [Fri May 16 21:42:39 2014] kill HUP to reset it to 0 whenever [Fri May 16 21:42:42 2014] Ah ha. [Fri May 16 21:43:15 2014] so let's go nuts and issue the kill TSTP 4x to get us to 125 [Fri May 16 21:43:31 2014] Just a moment; we're caught behind a ZFS transaction group. 309 MB ... 297 ... [Fri May 16 21:43:33 2014] and then put the vllog up at a pastebin site somewhere [Fri May 16 21:45:06 2014] howdy Daria [Fri May 16 21:46:12 2014] mvitale: Unfortunately, upping the debugging level seems to have tickled something and we have a good sync. [Fri May 16 21:46:24 2014] I'll pastebin the log anyway, but to my uninformed eye it looks quite boring. [Fri May 16 21:46:59 2014] I'll let you in on a little secret - they also appear boring to an informed eye. [Fri May 16 21:47:18 2014] http://pastebin.com/5t27PZ9q [Fri May 16 21:47:40 2014] The only noteworthy feature is the gap between the "Starting AFS vlserver 4" timestamp and the next one. [Fri May 16 21:48:39 2014] well, that's normal, there's usually very little in a debug level 0 vlserver log [Fri May 16 21:48:59 2014] but I see that it did finally synchronize. [Fri May 16 21:49:20 2014] checking udebug again [Fri May 16 21:49:27 2014] In general we see "Synchronize database with server 128.220.251.36 failed (error = 1)" and "Synchronize database completed" messages even at level 0. 
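The signal-driven log level described above is simple arithmetic. The toy Python model below makes it concrete and confirms that four `kill -TSTP` signals take the vlserver from the default 0 to 125. It models only what was said in the channel: 0, 1, 5, 25, 125, and HUP back to 0. Capping at 125 is an assumption, and the real server code is not reproduced here.

```python
# Toy model of the debug-level arithmetic described in the chat:
# SIGTSTP steps the level 0 -> 1 -> 5 -> 25 -> 125, SIGHUP resets it to 0.
# It models the numbers only; it does not signal any process.
# The cap at 125 is an assumption based on the chat calling 125 the highest.

MAX_LEVEL = 125


def next_level(level: int, signal_name: str) -> int:
    """Return the log level after one signal."""
    if signal_name == "HUP":
        return 0
    if signal_name == "TSTP":
        return 1 if level == 0 else min(level * 5, MAX_LEVEL)
    raise ValueError(f"signal not modelled: {signal_name}")


def levels_after(signals) -> list:
    """Apply signals in order, starting from the default level 0."""
    level = 0
    history = [level]
    for signal_name in signals:
        level = next_level(level, signal_name)
        history.append(level)
    return history


if __name__ == "__main__":
    print(levels_after(["TSTP"] * 4))  # four TSTPs, as suggested in the chat
    print(levels_after(["TSTP"] * 4 + ["HUP"]))  # then reset with HUP
```

On the server itself, the corresponding steps would be `kill -TSTP <vlserver pid>` four times and `kill -HUP <vlserver pid>` afterwards. `<vlserver pid>` is a placeholder.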
[Fri May 16 21:49:42 2014] (As well as the message at the start of the synchronization) [Fri May 16 21:49:54 2014] yes [Fri May 16 21:49:54 2014] At least, VLLog.old has those messages. [Fri May 16 21:50:21 2014] in VLLog.old - did they all fail with error = 1? [Fri May 16 21:51:30 2014] Yes. [Fri May 16 21:51:43 2014] yeah that was your trouble - let me see if I can figure out what that is [Fri May 16 21:52:17 2014] It usually resolves itself, announcing a little while later that another sync has started. [Fri May 16 21:52:29 2014] well, not ALL of your trouble, but a good bit of it. [Fri May 16 21:53:17 2014] What does "error = 1" mean? [Fri May 16 21:53:29 2014] looking in the code now [Fri May 16 22:03:37 2014] there's a lot of paths... [Fri May 16 22:09:37 2014] I believe that. I've restarted the vlserver and I'll keep the debug level at 125. [Fri May 16 22:09:54 2014] ah, found a likely suspect [Fri May 16 22:10:20 2014] for code path to 1 that is - I have no way to prove that's what happened to you [Fri May 16 22:10:48 2014] but the other '1's I've looked at seemed unlikely to heal themselves spontaneously [Fri May 16 22:11:20 2014] The most likely candidates are either EPERM from fopen() or rx_Read() returning a short read resulting in BULK_ERROR. I place my bet on a short read. [Fri May 16 22:11:38 2014] yup, that's the one I just found [Fri May 16 22:11:48 2014] the EPERM was unlikely to go away by itself [Fri May 16 22:12:19 2014] so that's your ZFS woes, most likely [Fri May 16 22:13:25 2014] "pstack" look at procstat -kk [Fri May 16 22:18:56 2014] well, if your clone is synced now, are you all set? [Fri May 16 22:19:50 2014] Yes. Like I said I'll keep debugging high for a while and see what happens. [Fri May 16 22:20:57 2014] kaduk, since this is vlserver, why procstat -kk (kernel threads) and not procstat -t?
[Fri May 16 22:21:14 2014] (I know nothing about procstat or bsd, I'm just reading a man page) [Fri May 16 22:21:18 2014] mvitale: -t might be better; I was reading very quickly. [Fri May 16 22:21:21 2014] Or maybe -j. [Fri May 16 22:21:43 2014] okay [Fri May 16 22:22:07 2014] mvitale: I was using -kk a lot today, as I was having issues with the rxk_listener [Fri May 16 22:22:26 2014] it's LWP at any rate so you're only gonna see the running LWP even with -t [Fri May 16 22:22:41 2014] LWP, ugh. [Fri May 16 22:22:50 2014] agreed. [Fri May 16 22:25:52 2014] but if we had managed to catch it in the act, so to speak, I would expect to see an SDISK_ operation stuck on a disk IO [Fri May 16 22:28:06 2014] nwf: and all - signing off for the evening... [Fri May 16 23:43:04 2014] Running HEAD executables in a 1.6-branch environment is rough. Is the future going to involve asetkey or is that a HEAD thing that's going away? [Fri May 16 23:43:22 2014] The future is going to involve asetkey. [Fri May 16 23:43:30 2014] OK [Fri May 16 23:43:53 2014] The current thought is that asetkey could grow a subcommand to generate a random key (of type afsconf_rxgk). [Fri May 16 23:47:09 2014] What on earth does "Raw keys for afsconf_rxkad_krb5 are unsupported" mean? [Fri May 16 23:47:30 2014] Er, where is that text from? [Fri May 16 23:47:39 2014] HEAD's asetkey [Fri May 16 23:47:44 2014] Maybe I botched the invocation [Fri May 16 23:48:00 2014] Indeed, I left off an argument. [Fri May 16 23:48:15 2014] I think it means that you can't just pass in or specify the raw byte stream for an afsconf_rxkad_krb5 key; it must be extracted from a krb5 keytab. [Fri May 16 23:49:03 2014] I was trying to run asetkey add rxkad_krb5 $KVNO $ENCTY $KEYTAB $AFSPRINC, but I left off $ENCTY. [Fri May 16 23:49:11 2014] Apparently this confuses asetkey to no end. [Fri May 16 23:49:28 2014] Ah, that it will. [Fri May 16 23:49:35 2014] asetkey add has far too many different variations. 
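The confusing "Raw keys for afsconf_rxkad_krb5 are unsupported" message above came from dropping one positional argument. asetkey then apparently read the shorter command line as one of its other `add` variants. One way to guard against that in a migration script is to build the argument vector explicitly and refuse to proceed if anything is empty. Below is a Python sketch of that guard. The argument order is the `asetkey add rxkad_krb5 $KVNO $ENCTY $KEYTAB $AFSPRINC` form quoted in the chat, and the helper name is hypothetical. Check the asetkey documentation for your build before relying on it.

```python
# Hypothetical guard for the `asetkey add rxkad_krb5` form quoted in the chat:
#   asetkey add rxkad_krb5 <kvno> <enctype> <keytab> <principal>
# It only builds the argument list; running it (for example with
# subprocess.run(argv, check=True)) is left to the caller.


def asetkey_rxkad_krb5_argv(kvno, enctype, keytab, principal, asetkey="asetkey"):
    """Return the argv list, raising ValueError if any argument is missing."""
    fields = {"kvno": kvno, "enctype": enctype, "keytab": keytab, "principal": principal}
    missing = [name for name, value in fields.items() if value is None or value == ""]
    if missing:
        raise ValueError("missing asetkey argument(s): " + ", ".join(missing))
    return [asetkey, "add", "rxkad_krb5", str(kvno), str(enctype), str(keytab), str(principal)]
```

For example, `asetkey_rxkad_krb5_argv(3, 18, "/etc/rxkad.keytab", "afs/example.org@EXAMPLE.ORG")` returns a seven-element list. The kvno, enctype, path and principal there are placeholders. Passing a list to `subprocess.run` also avoids shell word-splitting, so an empty `$ENCTY` cannot silently disappear from the command line.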
[Fri May 16 23:55:13 2014] FWIW, I always have to look at the thing I wrote the first time I did that migration. http://qedragon.livejournal.com/99356.html [Fri May 16 23:55:29 2014] What is the difference between enctypes 16 and 7? [Fri May 16 23:56:07 2014] Do I really have to think about these things? ;) [Fri May 16 23:56:18 2014] No. [Fri May 16 23:57:22 2014] Oh, it probably has to do with the string-to-key. [Fri May 16 23:57:36 2014] Oh, is "kd" "key derivation" or something? [Fri May 16 23:57:44 2014] enctype 16 has a fully specified string-to-key, the other des3 flavors are probably "whatever the reference implementation does". [Fri May 16 23:57:48 2014] I think so. [Fri May 16 23:58:33 2014] So 7 is "Er, we don't know; just make sure it works with MIT." :) [Fri May 16 23:59:01 2014] 7 isn't even listed in krb5/krb5.h here. I guess it was a prerelease one or something? [Fri May 16 23:59:17 2014] (well, I'm on a Mac so this is presumably Heimdal) [Fri May 16 23:59:26 2014] IANA has it reserved but does not give a source. [Fri May 16 23:59:31 2014] http://www.iana.org/assignments/kerberos-parameters/kerberos-parameters.xhtml [Sat May 17 00:00:39 2014] /usr/include/krb5/krb5.h on OS X should still be MIT (i.e., made of lies). [Sat May 17 00:01:05 2014] it differs from the actual MIT one I have from macports though [Sat May 17 00:01:15 2014] 7 still missing but all the stuff around it marked deprecated [Sat May 17 00:04:15 2014] /opt/local/libexec/heimdal/include/krb5_asn1.h: ETYPE_OLD_DES3_CBC_SHA1 = 7, [Sat May 17 00:04:42 2014] again suggesting an early version that was dropped [Sat May 17 00:05:09 2014] Cool, my custom-built vlserver bus errors. [Sat May 17 00:05:16 2014] On freebsd? [Sat May 17 00:05:32 2014] edit src/config/Makefile.config and append to XCFLAGS -mstack-realign [Sat May 17 00:05:41 2014] "or just use pthreaded-ubik" [Sat May 17 00:05:45 2014] Ah ha! [Sat May 17 00:06:52 2014] Thanks. [Sat May 17 00:08:58 2014] You're welcome!
I spent far too much time tracking down what was going on. [Sat May 17 00:09:27 2014] <3 [Sat May 17 00:09:56 2014] I also think I've figured out the main thing wrong with the disk cache, and how to stop getting that scary warning on unmount. (I still need to clean up the patches and submit them, though.) [Sat May 17 00:10:32 2014] It's still pretty easy to mess up the rxk_listener thread, though; I should probably look harder at the upcall stuff that darwin uses. [Sat May 17 00:10:41 2014] Unmount? Who unmounts AFS? ;) [Sat May 17 00:11:05 2014] people for whom afs doesn't regularly panic the kernel :) [Sat May 17 00:15:30 2014] * makes note not to hurry bringing up the new fbsd vm... :p [Sat May 17 00:15:50 2014] Both the mentioned patches are quite small. [Sat May 17 00:15:57 2014] upcall would be a fair bit larger, of course [Sat May 17 00:16:53 2014] kaduk__: FWIW, --enable-pthreaded-ubik and rebuild was not sufficient; I'm kicking on -mstack-realign and make clean && make and will let you know. [Sat May 17 00:18:24 2014] Oh dear, -mstack-realign seems to be a GCCism. Do you know the clang equivalent? [Sat May 17 00:20:37 2014] Oops, I thought that was the clang version. [Sat May 17 00:21:08 2014] http://svnweb.freebsd.org/ports/head/net/openafs/Makefile?revision=350715&view=markup says that the hyphen should be omitted. [Sat May 17 00:21:13 2014] fwiw it seems to be... yes [Sat May 17 00:21:42 2014] Ah, thankee. [Sat May 17 02:42:31 2014] <[gorgo]> nwf: which vlserver version are you building with pthreaded ubik? iirc the 1.6 version is not quite safe yet for that [Sat May 17 08:59:57 2014] I've got a fileserver running built with gcc on illumos successfully so far; right now the DBservers for my cell are a couple debian VMs out in the wild. Is there any problem with having dbservers running differing OpenAFS versions? Will I run into trouble if I try to integrate a dbserver running that new build into my existing cell?
[Sat May 17 09:01:31 2014] kaduk__: Thanks for the help regarding asetkey; I finally got the whole conversion rxkad.keytab->KeyFileExt sorted out. [Sat May 17 09:59:53 2014] hile_: At this point, the dbservers from openafs are interoperable throughout the 1.4 and 1.6 series and head with no problems. [Sat May 17 10:00:26 2014] I will probably be submitting some changes that would require more consideration before running mixed-version with head, "soon", but the plan is to be able to do a live transition. [Sat May 17 10:01:21 2014] Well, my changes are older than anything you'd be submitting anyhow, so the status quo is fine for what I particularly need to test. [Sat May 17 10:02:25 2014] not that I should call them mine, since I was just trying to massage an old changeset from, I think, Chas. And then deason and chas submitted new patchsets overtop of my initial effort. [Tue May 20 19:11:05 2014] secureendpoints: I now have bash completion working: http://git.infradead.org/users/dhowells/kafs-utils.git/commit/0df4c2dea1d0f9872ab7a0451e6f4b2e5ab13d0c [Wed May 21 10:18:49 2014] could someone with github.com/openafs privileges push the main git repo into github? with tags? [Wed May 21 10:20:25 2014] * Crashes in libkrb5 of UNIX (not Linux) servers built against Heimdal. [Wed May 21 10:20:34 2014] Hmm, I wonder if that script didn't get ported over to the new machine. [Wed May 21 10:20:42 2014] (Well, s/script/git hook/) [Wed May 21 10:20:44 2014] will that include *BSD? [Wed May 21 10:20:50 2014] It's there. I guess it must be failing some how. [Wed May 21 10:20:57 2014] jakllsch: freebsd is unaffected [Wed May 21 10:20:59 2014] Not going to be able to look in the next month, though [Wed May 21 11:54:50 2014] Hmm, maybe the ssh key for github did not get put in the right place, then. 
[Fri May 23 07:57:11 2014] Hi I am trying to debug a problem concerning the rwm overlay; I have configured a second db on my ldap with back_ldap referencing the first but with a different suffix and am using the rwm overlay with suffixmassage to allow incoming searches for this suffix to be translated to the suffix of the first db [Fri May 23 07:58:06 2014] this seems to work when searching with every search base below that suffix [Fri May 23 07:58:18 2014] but not when using only the suffix as the search base [Fri May 23 07:59:20 2014] eg "ldapsearch -b 'dc=second,dc=db'" does not work, but "ldapsearch -b'ou=users,dc=second,dc=db'" does [Fri May 23 07:59:51 2014] whoops wrong channel, sorry .-/ [Mon May 26 08:13:47 2014] Hi all, im trying to do a 64bit build on a Solaris variant (OmniOS) [Mon May 26 08:14:02 2014] but for some reason my CFLAG variables are unset by the configure script [Mon May 26 08:14:15 2014] i have dump of config.log: http://pastebin.com/hhHbJdfW [Mon May 26 08:14:49 2014] and it looks as if the variables get set: ac_cv_env_CFLAGS_value=-m64 [Mon May 26 08:15:07 2014] but under OUTPUT VARIABLES, it says: CFLAGS='' [Mon May 26 08:15:37 2014] and it is also never included in the build, resulting in a nice 32 bit build [Mon May 26 08:15:51 2014] any thoughts? [Fri May 30 02:57:52 2014] hi all, any tips on what the "best" (TM) way is to have pts and LDAP to coexist? I need other stuff like display name etc from LDAP so can i sync ldap into PTS? [Fri May 30 03:00:04 2014] this: http://engr-apache01.engineering.iastate.edu/ptsldap/ looks nice btw [Fri May 30 03:00:51 2014] can't find the code anywhere though :( [Fri May 30 09:18:22 2014] KermitTheFragger: might want to ask around amongst the people at linux.iastate.edu particularly the ones involved in the engineering school. 
[Fri May 30 09:43:47 2014] I'm not sure engineering at iastate runs afs any more, although i may be confusing that with engineering at umich [Fri May 30 10:02:58 2014] kula: might be. I was mostly looking at the third-level domain there [Fri May 30 10:05:07 2014] if the Johns in the sidebar at linux.iastate.edu can't point one in the right direction, I'd say pack it in. [Fri May 30 10:08:22 2014] ptsldap was never put into production at iastate and the code was discarded [Fri May 30 10:09:47 2014] there have been many other pts<->ldap synchronization projects discussed at various best practice workshops, lisa bofs, and european afs conferences. to the best of my knowledge none of those projects published code that could be used elsewhere. too many homegrown features [Fri May 30 11:19:04 2014] hm, too bad, well thx for the insights though! [Fri May 30 13:17:47 2014] KermitTheFragger: I would be interested in seeing ptsLDAP made real; the code that exists, which may not be useful except as a starting point, is available at http://home.engineering.iastate.edu/~jedicker/ptsldap/ [Fri May 30 14:43:45 2014] I discovered today that xstat_cm_test built from 1.6.2 sources doesn't like talking to a 1.6.8 cache manager. [Fri May 30 14:44:01 2014] should this be expected? [Fri May 30 14:44:19 2014] did it say the sizes don't match? [Fri May 30 14:44:22 2014] yes [Fri May 30 14:44:28 2014] that happens sometimes [Fri May 30 14:44:43 2014] in general you should keep them matching [Fri May 30 14:44:50 2014] I think I categorize that as "unsurprising, but not exactly expected". [Fri May 30 14:45:22 2014] * nods [Fri May 30 14:45:50 2014] the Nagios server just doesn't get AFS client updates as fast as our workstations.
[Fri May 30 14:46:05 2014] ah [Fri May 30 14:46:38 2014] * has a plugin that monitors cache ratios on shared remote desktop sites to make sure stuff isn't looking ugly [Fri May 30 14:47:29 2014] whenever we add new metrics to those xstats RPCs, they are supposed to be tacked on the end [Fri May 30 14:47:44 2014] ah, I see [Fri May 30 14:47:53 2014] so "** Data size mismatch in performance collection!** Expecting 759, got 775" is what you'd get. [Fri May 30 14:48:12 2014] I suppose I could bump up the client version on the nagios server, it doesn't need to reboot any time soon. [Fri May 30 14:48:13 2014] so I suppose the xstat_cm_test program could be made more tolerant of extra unexpected metrics (from a newer server) [Fri May 30 14:48:53 2014] it wouldn't know what they were, but it could discard them with a warning [Fri May 30 14:48:55 2014] it's really nice to be able to monitor stuff like dcacheMisses and vcacheMisses with something that makes pretty graphs [Fri May 30 14:49:15 2014] agreed [Fri May 30 14:50:52 2014] I could probably just drop the xstat_cm_test someplace on the monitoring server, and point the check at it instead of /usr/bin/xstat_cm_test [Fri May 30 14:51:19 2014] yes, that's a good workaround - the utility is standalone [Fri May 30 14:52:27 2014] are metrics always added, and not removed? [Fri May 30 14:52:46 2014] I'm wondering if the more 'tolerant' one would only fail if it got less than it expected. [Fri May 30 14:52:47 2014] if they are removed, a placeholder is supposed to be left [Fri May 30 14:52:55 2014] so nothing "moves" in the wire format [Fri May 30 14:52:59 2014] * nods [Fri May 30 14:58:54 2014] We have a Linux openafs server/client which works fine (outside of a container). There is an app in an lxc container on the machine. Should the lxc container be able to use afs too?
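The xstat_cm_test discussion above describes an append-only convention. New metrics are tacked on the end, and removed ones leave a placeholder, so a tolerant reader only needs to fail when it receives fewer values than it expects. Below is a minimal Python sketch of that policy. It counts values rather than bytes, because the units of the "Expecting 759, got 775" figures are not stated. `dcacheMisses` and `vcacheMisses` are named in the chat; the other field names and the field order are hypothetical, not the real xstat layout.

```python
# Sketch of a version-tolerant reader for an append-only metrics array.
# KNOWN_FIELDS is a hypothetical layout used only for illustration.
import warnings

KNOWN_FIELDS = ["dcacheHits", "vcacheHits", "dcacheMisses", "vcacheMisses"]


def decode_metrics(values):
    """Map received values onto the known field names.

    Extra trailing values (for example from a newer cache manager) are
    dropped with a warning; too few values is treated as an error.
    """
    expected = len(KNOWN_FIELDS)
    got = len(values)
    if got < expected:
        raise ValueError(f"data size mismatch: expecting {expected}, got {got}")
    if got > expected:
        warnings.warn(f"ignoring {got - expected} unknown trailing metric(s)")
    return dict(zip(KNOWN_FIELDS, values[:expected]))
```

Truncating the tail is only safe because, per the channel, positions never move in the wire format. If a metric could be deleted without a placeholder, this reader would silently mislabel values.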
[Fri May 30 15:00:18 2014] the pag stuff will probably break [Fri May 30 15:00:19 2014] We found the afs client hanging on the host machine, which I strongly suspect is due to the client running in the container [Fri May 30 15:00:50 2014] oh, you have a *client* running inside the container? [Fri May 30 15:00:51 2014] hrm. [Fri May 30 15:00:57 2014] And I noticed "can't get dentry", which apt-get remove linux-libc-dev:amd64 qstat --purge [Fri May 30 15:01:01 2014] Oops [Fri May 30 15:01:43 2014] And I noticed "can't get dentry", which http://lists.openafs.org/pipermail/openafs-info/2013-November/040271.html also had ("only appear if i try to start two afs daemon") [Fri May 30 15:02:40 2014] If it's unlikely to be stable I can just prevent the lxc container from accessing afs, but it'd be quite handy if it could at least access certain globally readable shares [Fri May 30 15:04:43 2014] so, in a lxc container, you're still sharing kernel space [Fri May 30 15:04:52 2014] so I don't think you can run two afsd [Fri May 30 15:05:05 2014] you might be able to mount /afs inside the container though [Fri May 30 15:06:00 2014] billings: That would be useful. Bind mount /afs to /var/lib/lxc/blah/afs from outside the container perhaps? [Fri May 30 15:06:28 2014] I can't begin to fathom what that would mean from an authentication point of view though. [Fri May 30 15:06:40 2014] worth testing, at least. [Fri May 30 15:41:54 2014] billings: Empty /afs within the container. :-( [Fri May 30 15:51:56 2014] Hmm. picloud.com seem to be accessing openafs data within lxc containers, but I'm unclear how they are using aufs [Fri May 30 15:57:14 2014] <[gorgo]> ashl: theoretically you should be able to create pags and add tokens but you need the appropriate /proc entries also available inside the container [Fri May 30 16:01:00 2014] [gorgo]: Hmm. Well, I don't really need tokens, but for the sake of science I'll test it, thanks. [Fri May 30 16:07:17 2014] Irritating.
The tokens binary is in debian's openafs-client package, but I suspect apt-get installing it will break the machine again (due to two client daemons) [Fri May 30 16:08:13 2014] You can drop an "exit 0" in something like /etc/defaults/openafs-client to prevent the maintainer scripts from trying to start the cache manager, IIRC. [Fri May 30 16:08:51 2014] <[gorgo]> ashl: actually tokens is for listing tokens [Fri May 30 16:09:23 2014] <[gorgo]> for getting tokens you need aklog, which is in openafs-krb5, if I'm not mistaken [Fri May 30 16:09:42 2014] kaduk__: Thanks, but it doesn't look like there are /etc/default files for afs [Fri May 30 16:10:00 2014] [gorgo]: Yep, I'd got tokens, but I wanted to see evidence. [Fri May 30 16:10:33 2014] ashl: I think they are not present by default but are used if present. Look at /etc/init.d/openafs-client to see what it does? [Fri May 30 16:10:46 2014] For the benefit of googlers: An out of box Debian wheezy lxc container on a wheezy machine can access OpenAFS with a token [Fri May 30 16:11:06 2014] <[gorgo]> ashl: looking at the ubuntu initscripts, you'd need to edit /etc/openafs/afs.conf.client [Fri May 30 16:11:14 2014] kaduk__: It does source /etc/openafs/afs.conf [Fri May 30 16:11:29 2014] <[gorgo]> AFS_CLIENT=true [Fri May 30 16:11:33 2014] <[gorgo]> change this line [Fri May 30 16:14:42 2014] The only change to the lxc config is "lxc.mount.entry = /afs /var/lib/lxc/empire/rootfs/afs none bind 0 0". No kernel modules are required. openafs-krb5 is needed for aklog, and beware that permitting the afs client to start is bad. [Fri May 30 16:15:01 2014] #ircWiki ... [Fri May 30 16:16:47 2014] interesting, I've been meaning to try out AFS + LXC too [Fri May 30 17:46:34 2014] glah.
yum does not like openafs 1.6.8 for fedora 19 x86_64 [Fri May 30 17:46:48 2014] "Package does not match intended download" [Fri May 30 17:47:06 2014] I've cleaned the metadata a couple of times as suggested by the error message [Fri May 30 17:47:40 2014] I will leave that one for ktdreyer. [Fri May 30 17:49:02 2014] * going to download one manually and see what it thinks it is [Fri May 30 17:54:01 2014] doesn't look particularly wrong to me [Fri May 30 18:10:27 2014] sounds like "createrepo" needs to be re-run? [Fri May 30 18:11:05 2014] I'm not sure who in the release-team generates the yum repodata [Mon Jun 2 10:08:14 2014] hi. Does anyone know of the best way to determine whether a bunch of volumes exist from a perl script, in a way that produces the least amount of volume server load? [Mon Jun 2 10:08:30 2014] I'm updating our password file generation script to check whether the user has an AFS homedir [Mon Jun 2 10:08:47 2014] and there are 100,000+ lines in the file [Mon Jun 2 10:09:00 2014] s/lines/passwd entries/ [Mon Jun 2 10:09:18 2014] I could probably narrow that down by eliminating accounts with no homedir [Mon Jun 2 10:10:32 2014] my first pass is just to run vos examine for each one and check the exit code [Mon Jun 2 10:10:50 2014] which is faster than doing a stat() of the homedir [Mon Jun 2 10:11:00 2014] (although it doesn't guarantee it's mounted) [Mon Jun 2 10:11:05 2014] sorry, no [Mon Jun 2 10:11:21 2014] (we do have much smaller user base and each home directory is named after user...)
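The password-file question here leads to two ideas in the channel. The first is a hashed home path, `/afs/umich.edu/user/U/S/USERNAME`, built from the first and second letters of the username. The second is checking volumes against one saved `vos listvldb -quiet -nosort -noresolve` dump instead of running `vos examine` per user. Below is a minimal Python sketch of both. The original script was Perl, and these helper names are illustrative only. It assumes, hypothetically, that home volumes are named `user.<username>` and that each VLDB entry's volume name starts an unindented line of the dump.

```python
# Hypothetical helpers for a password-file generator.
# Assumptions, not taken from OpenAFS documentation:
#   * home volumes are named "user.<username>";
#   * in the saved `vos listvldb -quiet -nosort -noresolve` output, each
#     entry's volume name is the first token of an unindented line, while
#     the RWrite/Backup/site details are indented.


def home_path(username: str, cell: str = "umich.edu") -> str:
    """Build /afs/<cell>/user/<first letter>/<second letter>/<username>."""
    if len(username) < 2:
        raise ValueError("this layout needs at least two characters")
    return f"/afs/{cell}/user/{username[0]}/{username[1]}/{username}"


def volume_names(listvldb_text: str) -> set:
    """Collect volume names from a saved `vos listvldb` dump."""
    names = set()
    for line in listvldb_text.splitlines():
        if line and not line[0].isspace():
            names.add(line.split()[0])
    return names


def users_with_home_volume(usernames, names: set) -> list:
    """Keep the usernames whose assumed home volume appears in the dump."""
    return [user for user in usernames if f"user.{user}" in names]
```

Reading the dump once replaces 100,000+ separate `vos examine` calls with in-memory set lookups. As with `vos examine`, a hit only shows that the volume exists, not that it is mounted at the computed path.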
[Mon Jun 2 10:11:37 2014] I don't run the cell where homedirs are [Mon Jun 2 10:11:45 2014] but it's a royal pain in the neck how they have it set up [Mon Jun 2 10:12:09 2014] it's /afs/umich.edu/user/U/S/USERNAME, where the U is the first letter and S is the second letter in USERNAME [Mon Jun 2 10:12:34 2014] it's probably because of limits on the number of entries in a directory [Mon Jun 2 10:12:42 2014] yeah [Mon Jun 2 10:12:55 2014] but it means I can't use anything sane like LDAP and templates [Mon Jun 2 10:13:10 2014] (afs homedir isn't stored in LDAP) [Mon Jun 2 10:14:56 2014] vos listvldb. Save the output to a file and then work off that list. [Mon Jun 2 10:25:14 2014] good idea. [Mon Jun 2 10:28:55 2014] vos listvldb -quiet -nosort is pretty fast. [Mon Jun 2 10:45:18 2014] yup, that's considerably faster [Mon Jun 2 10:45:46 2014] still takes 6.5 minutes to run, but that's much better than the 20 minutes I was getting from a bunch of forked off subprosses [Mon Jun 2 10:45:54 2014] subprocesses [Mon Jun 2 13:20:40 2014] <[gorgo]> billings: it takes 6.5 minutes for vos listvldb ? how many volumes do you have? [Mon Jun 2 13:22:23 2014] <[gorgo]> some of the delay in vos listvldb is coming from running gethostbyaddr() on each volume site that you have. it is significant even if you have nscd, even more if you hit the dns each time. [Mon Jun 2 13:34:34 2014] [gorgo]: it takes a bit over 1.5 minutes for "vos listvldb -noauth -nosort -noresolve" to run, the rest of the time it's other checks I'm running in the password file generation. [Mon Jun 2 13:34:56 2014] There are 307603 volumes, according to the last time I ran vos listvldb. [Mon Jun 2 13:37:26 2014] <[gorgo]> ah, ok, you're running with -noresolve [Mon Jun 2 13:44:50 2014] * nods [Tue Jun 3 00:11:21 2014] billings: Forgive me for asking, and asking late at that, but what are you doing with 307603 volumes? [Tue Jun 3 06:24:59 2014] nwf: there are quite a few installations with that many volumes (not ours).
It is usually the consequence of 2-4 volumes per user. [Tue Jun 3 07:34:16 2014] or simply one volume per user over the course of 25 years of accumulated users [Tue Jun 3 12:26:06 2014] Will the rxgk patches going in allow different keys on the servers, or is that future work? [Tue Jun 3 12:27:35 2014] nwf: the hardest part of that will be specifying a vldb format extension to store the new keys. [Tue Jun 3 12:28:00 2014] nwf: for the moment, it is "future work". [Tue Jun 3 12:29:01 2014] Oh, is the VLDB being revised? I would have guessed that it'd have gone roughly the same way that e.g. SSH uses Kerberos, with principals like afsserv/reverse.dns.name@REALM or somesuch. [Tue Jun 3 12:29:35 2014] If the VLDB is getting a revision, will port numbers and IPv6 be added? :) [Tue Jun 3 12:30:29 2014] Well, there fundamentally needs to be a way for the shared key between the fileserver and the vlserver to be stored -- it's not a kerberos key. [Tue Jun 3 12:32:31 2014] I have been holding off on actually proposing a format revision because it seems wise to understand the current format before going around and changing things. [Tue Jun 3 12:34:06 2014] Forgive my ignorance, but why isn't the shared key a Kerberos key, or, I suppose, more generally, now, a GSSAPI mechanism name and appropriate private data? [Tue Jun 3 12:36:29 2014] I'm in a meeting, so this may not be fully helpful, but: [Tue Jun 3 12:37:58 2014] The control flow involves the rx client calling AFSCombineTokens() against the vlserver, and the vlserver creates and encrypts a token that will be usable against the fileserver. It is not appropriate to use anything GSSAPI in there, because it is a three-party operation, and the vlserver and fileserver do not have an existing relationship which the GSS initiator/acceptor could be shoehorned into. [Tue Jun 3 12:41:32 2014] That's a little terse for me to unpack, sorry, but I'll wait until after your meeting.
:) [Tue Jun 3 12:41:48 2014] There's no reason for it to be a kerberos key, because there is no control flow wherein anyone would request some ciphertext from the KDC encrypted in the per-fileserver key. [Tue Jun 3 12:43:36 2014] To my (ignorant, again) hearing, it sounds like the vlserver is playing a role akin to the Kerberos TGS, tho? [Tue Jun 3 12:44:00 2014] nwf: there are many goals behind rxgk. one of which is not being tied to kerberos [Tue Jun 3 12:44:21 2014] another is being able to assert combined identities of users and devices [Tue Jun 3 12:44:28 2014] another is perfect forward secrecy [Tue Jun 3 12:44:57 2014] nwf: yes, the vlserver is performing a role similar to a kerberos TGS. [Tue Jun 3 12:45:17 2014] All of these sound desirable. [Tue Jun 3 12:45:32 2014] another is being able to close the cache poisoning attacks that are possible in NFS*, CIFS, AFS3+rxkad when Kerberos session keys are obtained and used for protecting kernel services [Tue Jun 3 12:49:16 2014] OK, but I think I am still confused. Why does the VLDB contain anything other than "public" material (e.g. kerberos principal name or actual public key for some other system) for file servers? I think I recognize the need for AFSCombineTokens() but it seems odd to me that the vlserver should fundamentally need shared secrets with the file servers? [Tue Jun 3 12:51:51 2014] The vldb is a convenient place to put data that must be accessible from all/any vlserver. The key information could certainly be stored in some other database, but would still need to be accessible to the vlservers. [Tue Jun 3 12:52:24 2014] It seems (again, possibly missing something) that "all" AFSCombineTokens() needs to do is seal (to a file server) an attestation of the tokens combined and provide a fileserver/client shared session key (ala TGS). I don't see why that sealing operation needs shared secrets? 
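The question being worked through here is why the vlserver needs fileserver secrets at all. The toy Python model below illustrates the answer given in the channel. To open two tokens sealed for a fileserver and re-seal a combined one that the fileserver will trust, the combining service must hold that fileserver's long-term key. "Sealing" here is only an HMAC tag over JSON: it gives integrity, not confidentiality. `combine_tokens` is a hypothetical stand-in named after the idea behind AFSCombineTokens(); none of this is the rxgk token format or API.

```python
# Toy model only: HMAC-over-JSON "sealing" gives integrity, NOT
# confidentiality, and is not the rxgk token format.
import hashlib
import hmac
import json


def seal(key: bytes, contents: dict) -> dict:
    """Serialise contents and attach an HMAC-SHA256 tag under key."""
    body = json.dumps(contents, sort_keys=True)
    tag = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def open_sealed(key: bytes, token: dict) -> dict:
    """Verify the tag with key and return the contents, or raise ValueError."""
    expected = hmac.new(key, token["body"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, token["tag"]):
        raise ValueError("seal check failed: wrong key or tampered token")
    return json.loads(token["body"])


def combine_tokens(server_key: bytes, user_token: dict, machine_token: dict) -> dict:
    """Open two tokens sealed for one server, merge them, and re-seal.

    Both the opening and the re-sealing step need server_key, which is the
    point made in the chat about the vlserver holding fileserver keys.
    """
    user = open_sealed(server_key, user_token)
    machine = open_sealed(server_key, machine_token)
    return seal(server_key, {"identities": [user["identity"], machine["identity"]]})
```

A combiner without the fileserver's key fails at the first `open_sealed` call, and even if it could read the contents it could not produce a tag the fileserver would accept.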
[Tue Jun 3 12:52:30 2014] The fileserver token is specified as a blob of (XDR-encoded) data, encrypted in the per-fileserver secret key (and with some key-management data). [Tue Jun 3 12:52:51 2014] In order to produce such a token, the vlserver must have the key used to encrypt the token. [Tue Jun 3 12:53:55 2014] Hm. I fear I am still missing something, and probably should not take your time to make you explain it. Which of the IDs should I go back and re-read in hopes that it sticks this time? :) [Tue Jun 3 12:53:57 2014] So, it's not as much a "shared secret" as that the vlserver needs to know the fileserver's secret long-term key. Much like the kerberos TGS. [Tue Jun 3 12:53:59 2014] nwf: the reason the vlserver (port/7003) is being used to host the independent rxgk rx service is because every organization that hosts AFS today and provides access to the public Internet makes that port available via the firewall to all server instances. [Tue Jun 3 12:54:26 2014] rx is a multi-service multiplexor [Tue Jun 3 12:54:34 2014] secureendpoints: I'm not (meaning to) dispute the need for AFSCombineTokens(), and putting that in the VLDB seems defensible for the reason you just gave. [Tue Jun 3 12:54:55 2014] I think I am just confused about the cryptographic reification of AFSCombineTokens(). [Tue Jun 3 12:55:06 2014] AFSCombineTokens is not absolutely required for rxgk but it is required to meet the goals I laid out. [Tue Jun 3 12:55:21 2014] Sure, understood. [Tue Jun 3 12:55:51 2014] AFSCombineTokens() accepts two independent tokens and combines them into a single token for the target service. [Tue Jun 3 12:55:55 2014] or server [Tue Jun 3 12:56:10 2014] I understand that at a high level. [Tue Jun 3 12:56:26 2014] In order for AFSCombineTokens() to work it must know the key being used by the target for the tokens being combined and the one being generated [Tue Jun 3 12:58:32 2014] you can think of it this way. 
the rxgk service has to be able to open two sealed envelopes, extract their contents, merge the contents onto a single sheet, place it in a new envelope, and seal it such that the original target of both envelopes will trust the seal. [Tue Jun 3 13:02:16 2014] Sure. As far as I understand things, this means that the original two sealed envelopes will be encrypted under session keys shared with the vlserver (e.g. as given by the TGS), right? [Tue Jun 3 13:02:53 2014] not session keys, long term keys [Tue Jun 3 13:03:14 2014] Er, does that mean that the vlserver has everybody's long term keys? [Tue Jun 3 13:03:36 2014] All AFS servers', yes. [Tue Jun 3 13:04:06 2014] Sorry, I took secureendpoints's last statement to mean that it had client long term keys, too. [Tue Jun 3 13:06:09 2014] Again, is there something I should bugger off and go read before continuing to display my ignorance in so grand a way? :) [Tue Jun 3 13:08:34 2014] nwf: there are internet drafts that describe the protocols but not the architecture. I don't know what kaduk plans to implement for openafs and YFS has not published its architecture document for how rxgk is used internally. [Tue Jun 3 13:09:35 2014] what you can go back and find are the rxgk presentations dating back to 2005 which described what we intended it to be [Tue Jun 3 13:11:39 2014] Presentations at AFS & Kerberos Best Practices Workshops or elsewhere? [Tue Jun 3 13:11:49 2014] yes to both [Tue Jun 3 14:55:34 2014] nwf: meeting ended. What do you still want/need to hear? [Wed Jun 4 08:11:15 2014] hi, i believe the package http://www.openafs.org/dl/openafs/1.6.8/SLE_11_SP3/x86_64/openafs-kernel-source-1.6.8-23.1.x86_64.rpm should not have kernel-devel as a dependency, but perhaps kernel-source [Wed Jun 4 08:11:40 2014] as suse 11.3 does not have a package named kernel-devel [Wed Jun 4 09:58:14 2014] joolz: I'm scheduled to look into the SuSE packaging at some point.
I'm told the SuSE packaging is contributed [Wed Jun 4 10:23:07 2014] joolz: SLE 11 SP3 is not OpenSuse 11.3. For OpenSuse, try the packages from https://build.opensuse.org/package/show?project=filesystems&package=openafs16. OpenSuse 11.3 has been discontinued for some time now, though. [Thu Jun 5 02:02:46 2014] uschebit: I know, I'm talking about SLE[DS] 11 service pack 3. There is no package named kernel-devel which makes the package i linked broken. the 1.6.6 package was ok, in fact it did not have any dependencies at all. [Thu Jun 5 02:03:33 2014] geekosaur: ok thanks. [Fri Jun 6 11:43:53 2014] this is really strange.. autocad is attempting to install itself (advertised shortcut, I guess) for an individual user then queries the free space on \\afs\cnf.cornell.edu (the share, not the drive letter, it would appear), and the installer fails due to lack of free space on that RO afs volume [Fri Jun 6 13:11:23 2014] that is strange [Fri Jun 6 13:12:03 2014] I seem to recall that secureendpoints has sent many messages to openafs-info on this/related topics. [Fri Jun 6 14:28:09 2014] from doing some googling, it would appear that windows installer does not like reparse points [Fri Jun 6 14:28:27 2014] I'm hoping I can just change from having the user's home folder as a path down to their afs directory as a direct mount [Fri Jun 6 17:08:05 2014] I used to be able to run /usr/afs/bin/salvager -showmounts -showlog on my servers and it would list out the volume mountpoints. Now I get an error that salvager is already running. Anyone know how to do this in the dafileserver era? [Fri Jun 6 17:24:49 2014] does something like dasalvager -showmounts -showlog work? [Fri Jun 6 17:25:47 2014] http://docs.openafs.org/Reference/8/dasalvager.html [Fri Jun 6 17:26:16 2014] I doubt that works [Fri Jun 6 17:27:25 2014] I think this is a known issue and the dafileserver folks are working on a solution?
[Fri Jun 6 17:27:32 2014] (certainly it's a known issue)
[Fri Jun 6 17:29:14 2014] I tried that. Instead of returning an error it logs essentially the same thing to the Log file.
[Fri Jun 6 17:29:42 2014] salvageserver doesn't support -showmount so it can't work
[Fri Jun 6 17:42:01 2014] Is there some other way to get the info? Besides walking my AFS tree by hand?
[Fri Jun 6 17:43:18 2014] volscan
[Fri Jun 6 17:47:56 2014] unfortunately it's only available on master at the moment
[Fri Jun 6 17:53:54 2014] you run a non-dafs file server, move your volumes, run salvager, and then move them back :)
[Fri Jun 6 17:54:28 2014] That might be faster than walking the tree.
[Fri Jun 6 17:57:10 2014] Or, I could snapshot the zfs backing store, mount the snapshots and in a chrooted environment run the salvager against the snapshots... Maybe that will work.
[Fri Jun 6 17:59:02 2014] the volscan utility is standalone - you could build it yourself (or someone could build it for you) from master and it will work on older systems
[Fri Jun 6 17:59:24 2014] older versions of OpenAFS, that is
[Fri Jun 6 18:02:58 2014] ok, trying tat.
[Fri Jun 6 18:03:00 2014] that.
[Fri Jun 6 18:04:27 2014] it's in src/vol/vol-info.c
[Fri Jun 6 18:04:31 2014] make volinfo
[Fri Jun 6 18:05:12 2014] ah, sorry, make volscan
[Fri Jun 6 18:05:44 2014] forgot volscan is separate now, it used to be "inside" volinfo
[Fri Jun 6 18:06:03 2014] then # sudo volscan -info mount
[Fri Jun 6 18:06:14 2014] it has to run on the fileserver in question
[Fri Jun 6 18:08:42 2014] This all started this morning when someone suggested that I audit our AFS tree for confidential data. In trying to justify why that wasn't practical I discovered the -extended flag to vos exam. That gave me a list of files in the cell. Now I was just trying to verify that I didn't have any volumes that aren't mounted in the tree.
[Fri Jun 6 18:09:51 2014] this will help you do that
[Fri Jun 6 18:09:53 2014] Auditing cells for confidential data has been done in many cells.
[Fri Jun 6 18:10:10 2014] volscan -help will give you more options
[Fri Jun 6 18:10:22 2014] I typo'd the command above, should be:
[Fri Jun 6 18:10:30 2014] sudo volscan -find mount
[Fri Jun 6 18:11:32 2014] it doesn't create any fileserver or volserver load because it does read io directly against the vice partitions.
[Fri Jun 6 18:12:40 2014] sorry, gotta step away - leave a question here or in pm if you like, I'll be back later
[Fri Jun 6 18:17:00 2014] looks like I have to have the afs kernel module loaded for that to work.
[Fri Jun 6 18:17:16 2014] The afs fileservers don't run the afs client code...Humm..
[Fri Jun 6 18:41:14 2014] no, it doesn't need the kernel module
[Fri Jun 6 18:41:28 2014] nor a running cache manager
[Fri Jun 6 18:46:57 2014] all you need are the volscan utility and some vice partitions
[Sat Jun 7 02:32:43 2014] Dumb question for the channel; one of our mirror volumes (mirror.freebsd, in particular), which is quite small (<4GB), has been releasing for 45 minutes; 'vos status' indicates that over two million packets have been exchanged. I don't expect that kind of churn on a regular basis; is there enough information in 'vos exa' output to recreate the equivalent 'vos dump' command that 'vos release' is emulating so
[Sat Jun 7 02:32:45 2014] I can see what on earth is so mammoth in the stream?
[Sat Jun 7 02:37:26 2014] .... and the receiving davolserver just SIGBUS'd. Wahoo.
[Sat Jun 7 02:37:42 2014] My inability to do anything, ever, with software is amazing.
[Sat Jun 7 03:36:03 2014] Oh, goodie. My client is completely hosed; the RO replica of root.cell is down and I can see the RW host responding with "No such volume" even if I attempt to access the explicitly-RW path /afs/.acm.jhu.edu .
[Sat Jun 7 03:38:40 2014] Well, it's persisting even after the RO replica is back.
I guess a reboot is in my near future.
[Sat Jun 7 14:10:29 2014] kaduk_: I am unable to update the ticket on RT, but you're absolutely right: FreeBSD host, built from ports, using clang.
[Sat Jun 7 14:10:38 2014] Re-running 'vos release' (with ZFS deduplication turned off to reduce IO stalls) succeeded, so either this was transient corruption somewhere, a race condition of some form, a bug in buffer management, or something similar.
[Sat Jun 7 14:15:13 2014] nwf: you should be able to reply to the email and have it DTRT.
[Sat Jun 7 14:17:15 2014] Didn't get an email; filed it on the web as guest. Forgot to add email address and get EPERM when trying now. (Oops)
[Sat Jun 7 14:17:30 2014] Ah.
[Sat Jun 7 14:18:34 2014] nwf in the future send e-mail to openafs-bugs@openafs.org
[Sat Jun 7 14:18:40 2014] Mail that includes the [rt.central.org #131872] tag in the subject should still get routed onto the bug.
[Sat Jun 7 14:20:24 2014] kaduk_: Ah ha, OK.
[Sat Jun 7 14:22:42 2014] kaduk_: "Message not recorded: Permission denied" :\
[Sat Jun 7 14:28:13 2014] I added nwf@cs.jhu.edu as a requestor on that ticket.
[Sat Jun 7 14:28:35 2014] <3 ; will resend.
[Sat Jun 7 15:56:10 2014] http://pastebin.com/g1Uxu6Vh ... I am completely mystified as to what 'vos release' just did. Somehow it rolled back to May 22nd when I had just finished a successful release last night? It then, understandably, I suppose, aborted?
[Sat Jun 7 16:00:13 2014] looks like it thinks it's trying to finish up a pending release before doing the new
[Sat Jun 7 16:00:33 2014] may 22 presumably being the date of the pending one that had originally failed to chicago
[Sat Jun 7 16:02:23 2014] I mean, there may well have been a failed release on may 22nd, but I've subsequently -- earlier last night, even -- 'vos zap'-ed and 'vos release'-ed this volume a large number of times, ultimately getting the successful release pastebin'd.
[Sat Jun 7 17:02:16 2014] FWIW, I just did the "zap; release; release" game again and got a much more correct-looking (time of last update) date stamp in the second release.
[Mon Jun 9 08:17:06 2014] Hello, so I'm having a weird issue, I have a mount point in a replicated volume that is pointed to a replicated volume, the mount point when I release it does not end up creating a read-only mount in the read-only volume, why is this?
[Mon Jun 9 08:17:27 2014] the mount point within the parent volume that is...
[Mon Jun 9 09:00:12 2014] fs flush fixed my issue... sorry about that
[Mon Jun 9 09:30:40 2014] fang64: yes, that's fairly typical
[Mon Jun 9 09:31:23 2014] Walex2: had a moment where I forgot what I was dealing with :D
[Mon Jun 9 09:31:26 2014] fang64: you would have seen the "mountpoint" from another client that hadn't cached that
[Mon Jun 9 09:31:51 2014] well I was on a few different machines
[Mon Jun 9 09:31:58 2014] the read-only mount point was replicated
[Mon Jun 9 09:32:09 2014] but most of the clients had the read-write mount point cached
[Mon Jun 9 09:32:25 2014] fang64: ah yes, that caching.
[Mon Jun 9 09:32:41 2014] wasn't until I flushed it out that revealed it did work just I was getting thrown off by the client
[Mon Jun 9 09:33:29 2014] fang64: IIRC another way to "reset" the client is to use 'fs newcell' with the same values as the current cell.
[Mon Jun 9 09:34:34 2014] ah well I just ran fs checkvolume and fs flushall and voila
[Mon Jun 9 12:43:44 2014] Can I convert a 'vos shadow'-ed copy of a RW mountpoint to a RO replica somehow? We have some anticipated downtime of the RW host sometime in the next few weeks and I'd like to keep RO replicas up during that time.
[Mon Jun 9 12:47:11 2014] a RO replica and server B does not become inaccessible when server A which hosts the RW is unavailable
[Mon Jun 9 12:48:20 2014] secureendpoints: Of course!
But these are not 'vos addsite'/'vos release'-ed; they are 'vos shadow'-ed and so the shadows are not in the VLDB.
[Mon Jun 9 12:51:35 2014] by writing the appropriate code you can do anything
[Mon Jun 9 12:52:00 2014] That was not quite what I was asking, but I suppose it answers the question. :)
[Mon Jun 9 12:52:24 2014] you are playing with fire
[Mon Jun 9 12:52:44 2014] I suppose I should finish 'vos foreach' and then write the other half, 'vos mungevldb'.
[Mon Jun 9 12:52:59 2014] vos corrupt-my-data
[Mon Jun 9 12:53:34 2014] If I could, I would pay for RW replication for situations like this. =P
[Mon Jun 9 12:54:13 2014] But as it stands, I cannot. I will just take the downtime for these volumes and continue treating the 'vos shadow'-ed copies as emergency backups from which to restore, rather than anything live.
[Mon Jun 9 12:56:05 2014] In fairness, I don't need to write a tool to have vos do interesting things, let's not say corrupt, my data; the time-travelling 'vos release' experience was pretty entertaining. And non-repeatable. :D
[Mon Jun 9 12:57:13 2014] There are only two possibilities there:
[Mon Jun 9 12:57:13 2014] 1. old data was cached in the volserver
[Mon Jun 9 12:57:13 2014] 2. the underlying disk storage gave you an old snapshot
[Mon Jun 9 12:58:18 2014] So, I kind of believe option 1 might be at fault. There was a brief moment, a while ago, where it looked like ubik believed it was up to date and was still reporting stale data. "Occasionally" commands like "vos listvldb -locked" would give "dubious" answers. I did not debug further.
[Mon Jun 9 12:59:26 2014] I keep meaning to write some supervisory scripts that log the sha1s of the various *DB files and watch for inconsistencies, but that's "way on the back-burner" territory.
[Mon Jun 9 13:01:46 2014] Of course, it would also be good to get our Ubik masters off Ubuntu's packaging.
[Mon Jun 9 13:05:47 2014] is there anything broken wrt ubuntu
[Mon Jun 9 13:05:56 2014] is there anything broken wrt ubuntu's packaging of openafs?
[Mon Jun 9 13:06:41 2014] nwf: I remember someone perhaps themselves posted something somewhere about "shadow", the link is in my AFS-hints page
[Mon Jun 9 13:08:44 2014] nwf: just checked not there. I must have put it among my notes...
[Mon Jun 9 13:09:23 2014] Walex2: No worries. I think secureendpoints has convinced me to just take the downtime; they're just "scratch" volumes, anyway. I was asking mostly because I realized I had the 'vos shadow'-ed copies around on a machine that wasn't going down.
[Mon Jun 9 13:09:24 2014] sur5r: the ubuntu things are mostly OK. they were missing like the Debian ones a useful bug fix in 1.6.1 times but I haven't checked the 1.6.7 packages
[Mon Jun 9 13:09:52 2014] nwf: from memory the discussion of 'shadow' was mostly "it is not finished and it is dangerous".
[Mon Jun 9 13:10:15 2014] Oh, well then.
[Mon Jun 9 13:12:14 2014] nwf: http://paste.debian.net/104158/
[Mon Jun 9 13:13:00 2014] nwf: there was more in mailing list post IIRC
[Mon Jun 9 13:15:45 2014] Well, hopefully my backup script doesn't get the source wrong. :)
[Mon Jun 9 13:16:07 2014] nwf: this is more on the subject, it may be the post I was vaguely remembering: http://lists.openafs.org/pipermail/openafs-info/2010-December/035133.html
[Mon Jun 9 13:18:26 2014] nwf: actually I think the post I was thinking about is here: http://lists.openafs.org/pipermail/openafs-info/2013-August/039894.html and the 2012 thread it references
[Mon Jun 9 13:19:01 2014] http://lists.openafs.org/pipermail/openafs-info/2007-June/026540.html too
[Mon Jun 9 13:19:27 2014] it is a topic that recurs...
because it is an interesting possibility
[Mon Jun 9 13:21:46 2014] Yeah; we deployed our scratch volumes using RW/BK style setup and without explicit RW-forcing mountpoints, so moving to RW/RO and replication requires that we go munge a lot of mountpoints. All scriptable, just haven't, and so we're 'vos shadow'-ing these volumes. The non-scratch volumes we did set up as RW/RO with explicit RW-forcing mounts and so on.
[Mon Jun 9 13:42:24 2014] any known issues with Outlook 2010/2013 data files living in AFS?
[Mon Jun 9 13:46:13 2014] there are known issues with Outlook files on any network file system
[Mon Jun 9 13:47:07 2014] Outlook databases are cached copies of the data from the Exchange server. They should not be stored on network devices because accessing the files over the network is more expensive than reading the data from Exchange
[Mon Jun 9 14:08:02 2014] well, that "cached copies" bit only applies if one is using an exchange server
[Mon Jun 9 14:08:14 2014] if the only issue is expense, that's fine
[Mon Jun 9 14:08:57 2014] you should read Microsoft's tech notes
[Mon Jun 9 14:09:33 2014] really, I only care about the address book stuff getting stored in AFS
[Tue Jun 10 14:21:45 2014] This might sound like a strange question: is there a rolling release repository for OpenAFS? for the various distros, like el6/latest?
[Tue Jun 10 14:26:20 2014] fang64: I don't think openafs.org provides that. ELRepo does, on the other hand
[Tue Jun 10 14:42:34 2014] ktdreyer: thanks, I was curious. I've been using Katello; I probably won't do that, and just end up sync'ing each repository separately. I guess it's nice that they are kept separate as well, at least for RHEL's sake.
[Tue Jun 10 14:50:19 2014] hunh... RHEL7 is out
[Tue Jun 10 15:56:35 2014] RHEL 7 GA is out yup
[Thu Jun 12 10:07:23 2014] hey, I am working on switching our auth to krb5 with the rxkad file.
Do all the servers need to be at 1.6.5 or later for this to work or can I stage the switchover by switching a few machines a day until I get to all of them? I am still running kas so users can access their files that way in the meantime. Thanks!
[Thu Jun 12 10:11:28 2014] all servers must be at least 1.6.5 and all of them must be switched together or they won't be able to communicate
[Thu Jun 12 10:12:05 2014] ok, thanks, I will have to schedule out a block of time to do it then
[Thu Jun 12 15:00:45 2014] I was trying to do this http://docs.openafs.org/QuickStartUnix/DAFS004.html but I want to double check what the options were for fileserver and volserver, is this stored anywhere? The guy who usually manages the cell is out.
[Thu Jun 12 15:01:32 2014] BosConfig
[Thu Jun 12 15:07:18 2014] thanks!
[Thu Jun 12 15:07:41 2014] I was looking all through the logs, didn't think to look in local
[Fri Jun 13 08:51:58 2014] oh this is interesting... with folder redirection, if I change a user's home directory from \\afs\cell\path\to\home\folder to \\afs\cell#home.volume, Windows disappears all the user's data
[Fri Jun 13 08:53:14 2014] folder redirection... a path to struggle...
[Fri Jun 13 08:53:28 2014] less painful than roaming profiles
[Fri Jun 13 08:53:31 2014] until now
[Fri Jun 13 21:41:04 2014] Is it possible that the UNIX cache manager becomes overly attached to a R/O replica and won't let go even after a 'vos remsite' and 'fs flushall' and 'fs checks' and 'fs checkv'?
[Fri Jun 13 21:42:17 2014] of those commands the only one that matters to the client is fs checkvolume
[Fri Jun 13 21:50:06 2014] That's good to know.
[Fri Jun 13 21:50:44 2014] In this case, tho', I retract my question; the server really was unreachable and I was just seeing the effects of caching. :)
[Fri Jun 13 22:06:05 2014] Oh, a different question: if I run HEAD servers, am I vulnerable to the 1.6.9 issue until http://gerrit.openafs.org/#change,11287 or equivalent goes in?
[Fri Jun 13 22:06:39 2014] Yes.
[Fri Jun 13 22:07:13 2014] Also, "wait, I thought I was the only one running HEAD servers." (Well, HEAD+.)
[Fri Jun 13 22:09:25 2014] I, uh, happened to have HEAD checked out when I made it build on FreeBSD/sparc64 and have just been rolling with it ever since.
[Fri Jun 13 22:09:40 2014] ugh
[Fri Jun 13 22:10:07 2014] there is no significant testing of the code on HEAD
[Fri Jun 13 22:10:13 2014] well, master
[Fri Jun 13 22:10:28 2014] Yes, acknowledged. The ACM servers stay back on release. :)
[Fri Jun 13 22:10:58 2014] HEAD can refer to "master" or the production branch "openafs-stable-1_6_x". Which branch are you referring to?
[Fri Jun 13 22:11:07 2014] Sorry, I meant "master".
[Fri Jun 13 22:11:13 2014] double ugh
[Fri Jun 13 22:11:19 2014] I figure the amount of damage that a server that's used by only yours truly for not a whole lot is... minimal.
[Fri Jun 13 22:14:07 2014] Besides, this way I get to play with rxgk as soon as it lands, right? :)
[Fri Jun 13 22:14:50 2014] no
[Fri Jun 13 22:24:35 2014] It's going in on its own branch?
[Fri Jun 13 22:24:55 2014] No.
[Fri Jun 13 22:25:12 2014] But, see http://wiki.openafs.org/OpenAFS18Notes/
[Fri Jun 13 22:27:55 2014] kaduk_: Thanks for the heads up, but I am not sure I understand the implication for someone riding master.
[Fri Jun 13 22:29:48 2014] "Things may get more bumpy than they already are."
[Fri Jun 13 22:32:08 2014] OK
[Sun Jun 15 13:14:08 2014] Hi guys, looking to set up a new OpenAFS server to distribute some samba shares to some small remote offices. I want to use centos, but I am unable to find a good installation guide, can anyone point me in the right direction?
[Sun Jun 15 13:17:45 2014] <[gorgo]> how would you want to distribute samba with openafs?
[Sun Jun 15 13:23:57 2014] [gorgo]: good question, I think it's enough to have a local samba on the client, and use afs as a transaction layer. Does that make sense?
[Sun Jun 15 13:25:05 2014] My goal is to put all the files in our DC and do all backing up and everything there, and then let the offices only have an openafs to connect to, and get the files that they need.
[Sun Jun 15 13:26:39 2014] I don't like to use samba over vpn and so on; sometimes the offices have really bad connections, and it's not good for anyone, neither is it good to have local file servers that need service and backup..
[Sun Jun 15 13:53:25 2014] <[gorgo]> rickard: I don't really understand what you're after. if you want samba client, you'll need a samba server. that's not openafs
[Sun Jun 15 13:54:42 2014] <[gorgo]> the old windows afs client used to be a samba server on a loopback address, and the windows samba client connected to that loopback address to access afs space
[Sun Jun 15 13:56:07 2014] <[gorgo]> but the 1.7 series windows client for openafs is a properly integrated filesystem driver, there's no need for protocol translation
[Sun Jun 15 13:57:55 2014] <[gorgo]> so if you want to access remote fileservers, and you don't want to use samba as the protocol, you can do that with openafs, but then you'll need afs both on the servers and the clients
[Sun Jun 15 14:11:16 2014] <[gorgo]> also you need to be aware that you can't just reexport existing filesystems with afs
[Sun Jun 15 15:13:17 2014] [gorgo]: Maybe it's a bit unclear. I want for the local computers to have a local server to access (smb). I want that server to be synced/backed with afs to a central afs-server.
[Sun Jun 15 15:45:16 2014] <[gorgo]> I guess that means you'd want only RO access to this data
[Sun Jun 15 15:46:14 2014] <[gorgo]> you could deploy an afs server locally, which would have RO replicas
[Sun Jun 15 17:41:44 2014] gorgo: you can create an NTFS symlink on a local disk to \\afs\cellname\some-path\ and then instruct the Windows SMB Server to export it.
However, there is nothing that is going to fetch or associate AFS tokens with the thread issuing the call to AFS in order to associate the impersonation identity with the AFS ID for authorization purposes. I could write a filter driver that could communicate with a server to obtain cred
[Sun Jun 15 17:45:02 2014] The architecture that richard wishes to implement is what the original IBM AFS Client for Windows provided. A single machine was a Gateway Server and the rest of the clients communicated with it using the Windows SMB redirector and forwarded tokens using Windows RPC services.
[Sun Jun 15 18:54:46 2014] secureendpoints1: you know that your messages appear truncated at 430 characters, right? "[...] with a server to obtain cred" was truncated.
[Sun Jun 15 19:09:46 2014] that happens a lot, secureendpoints1 doesn't use a client that knows how to split IRC messages to the 512 byte limit (minus nick, channel, protocol overhead)
[Mon Jun 16 09:46:35 2014] ok, servers all at 1.6.9
[Tue Jun 17 09:47:34 2014] Hey, are there plans to build the rhel6 packages for 1.6.9?
[Tue Jun 17 10:26:15 2014] greenmanspirit - I am sure there are. :)
[Tue Jun 17 10:26:30 2014] I built my own for 64-bit rhel6 from the source package, you are welcome to them
[Tue Jun 17 20:25:06 2014] Hey, I’m getting a bad encryption type from Kerberos when AFS tries to connect -- did they go and nuke the weak encryption in a recent version that AFS (at least, used to) use?
[Tue Jun 17 21:23:26 2014] not enough information
[Tue Jun 17 21:24:10 2014] 1.6.5 enabled use of stronger encryption for server connections and use of non-DES keys in aklog (the cache manager still uses a variant of DES); but you would specifically have to enable that
[Tue Jun 17 21:27:08 2014] I see.
[Tue Jun 17 21:27:59 2014] moreover even configuring that would not disable DES *unless* you removed the KeyFile when setting up rxkad.keytab
[Tue Jun 17 21:28:36 2014] I was meaning if the kerberos folks got rid of the DES support altogether
[Tue Jun 17 21:28:50 2014] *and* also specifically either removed DES from the cell key in the KDC, or reconfigured the KDC to block weak crypto (far more common as it's painful to remove enctypes)
[Tue Jun 17 21:29:03 2014] kerberos defaulted DES to disabled years ago
[Tue Jun 17 21:29:22 2014] `allow_weak_crypto = true` needed in the [lindefaults] section of /etc/krb5.conf
[Tue Jun 17 21:29:27 2014] er
[Tue Jun 17 21:29:30 2014] [libdefaults]
[Tue Jun 17 21:29:38 2014] yeah, that’s in there.
[Tue Jun 17 21:29:52 2014] which is why I’m at a loss as to why it’s not working. :p
[Tue Jun 17 21:29:56 2014] It used to work!
[Tue Jun 17 21:32:43 2014] when you say "when AFS tries to connect", you mean what exactly? aklog? something else?
[Tue Jun 17 21:34:08 2014] Jun 17 21:33:43 regina krb5kdc[12331](info): TGS_REQ (4 etypes {18 17 16 23}) : BAD_ENCRYPTION_TYPE: authtime 0, amanda@DARKDNA.NET for afs/darkdna.net@DARKDNA.NET, KDC has no support for encryption type
[Tue Jun 17 21:37:58 2014] mrrr. presumably that means you have DES but aklog is not requesting DES
[Tue Jun 17 21:38:10 2014] I see
[Tue Jun 17 21:38:23 2014] but aklog should not be constraining the types it requests
[Tue Jun 17 21:39:08 2014] the only change that was made there is that it no longer insists on DES, and if it is not given DES it will apply rxkad-kdf to derive a DES key for the cache manager
[Tue Jun 17 21:39:31 2014] if it's not requesting DES at all, that sounds like the allow_weak_crypto thing
[Tue Jun 17 21:39:56 2014] allow_weak_crypto is enabled, though
[Wed Jun 18 15:14:56 2014] Is it normal for the AFSRedirector to come up with a prompt that it hasn't "passed windows logo testing" on XP? (Yeah, I know... XP.
:-( )
[Wed Jun 18 15:17:45 2014] There is no logo testing for a file system driver on any version of Windows.
[Wed Jun 18 15:19:48 2014] secureendpoints: Actually, it's saying "The software you are installing for this hardware: \n AFSRedirector \n has not passed windows logo [...]"
[Wed Jun 18 15:20:26 2014] secureendpoints: So, the error is abnormal and shouldn't just be clicked through?
[Wed Jun 18 15:22:18 2014] I don't know what is producing that error. The redirector driver, if it came from Your File System Inc., is cross-signed by Microsoft.
[Wed Jun 18 16:56:25 2014] Just for closure: Installing afsredirinstall.inf was producing the prompt. It was likely due to some corrupt files in system32/catroot2. The internet told me to delete it then reboot and I blindly obeyed.
[Thu Jun 19 17:02:50 2014] I love how Windows installer leaves out registry keys but still says the install was successful
[Thu Jun 19 17:15:18 2014] Sophos refuses to open if OpenAFS is installed (it complains it can't open DLLs). Copying the relevant DLLs into System32 fixes the problem. Apparently this is recommended as a workaround by Sophos for Citrix DLLs. Is it a reasonable solution for OpenAFS?
[Thu Jun 19 17:22:51 2014] I'd worry about those DLLs not getting updated when openafs itself is updated then
[Thu Jun 19 17:23:12 2014] can you instead just add the directory containing those DLLs to the PATH?
[Thu Jun 19 17:24:48 2014] cclausen: Yeah, that is a worry. I wrote a little script to copy the DLLs into place if Sophos is installed, but then quickly discovered that the DLLs are locked by the time the user logs in. :-(
[Thu Jun 19 17:25:29 2014] cclausen: I doubt it, I think the AV is trying to protect itself against some kind of DLL hijacking trick? I will try the PATH thing, though, thanks for the suggestion.
[Thu Jun 19 17:26:07 2014] cclausen: At least, once I've figured how to get rid of the system32 dlls. I miss *nix.
[Fri Jun 20 15:22:59 2014] Uh, not to ask a dumb question, but in src/rx/rx_packet.h, RX_SLOW_START_OK and RX_JUMBO_PACKET are both defined as 32.... if things were in sequence like they appear to be, shouldn't RX_SLOW_START_OK be 16?
[Fri Jun 20 15:23:34 2014] Or is the distinction ack packet vs data packet and 16 is unused?
[Fri Jun 20 23:11:12 2014] Has anyone written up a comparison between Raft and Ubik? My google-fu is weak.
[Fri Jun 20 23:14:30 2014] I have never heard of Raft
[Fri Jun 20 23:14:50 2014] so that probably makes it less likely that a comparison exists
[Fri Jun 20 23:20:46 2014] also, Raft appears to be fairly new? Ubik is much older
[Fri Jun 20 23:46:17 2014] Uh, yes. The raft paper does not cite ubik, tho', which IMHO is indefensible. =P
[Sat Jun 21 21:41:09 2014] hola
[Sun Jun 22 18:01:11 2014] Aaaaargh. 'vos release' decided to nuke our 1.5TB Debian mirror RO replica again. ><
[Sun Jun 22 18:01:33 2014] What's the advantage of deleting an extant RO_DONTUSE vs. just going ahead and doing a full release?
[Sun Jun 22 18:02:15 2014] (Since the former is followed by the latter anyway, and there doesn't seem to be any observable harm in releases choosing too early a starting point in their dumps of the RW side...)
[Sun Jun 22 20:17:26 2014] nwf: that would be the first step *of* a full release
[Sun Jun 22 20:17:55 2014] full release = pave over old release, copy full contents of r/w to new release
[Sun Jun 22 20:18:05 2014] what were you really trying to ask?
[Mon Jun 23 06:23:03 2014] I've finally renamed my afs cell from dev.ru.is to rnd.ru.is. And it worked fine, until over the weekend. Now I can't stat /afs/rnd.ru.is. This is weird because fs checks says everything is fine.
[Mon Jun 23 06:23:32 2014] rxdebug lithium.rnd.ru.is Trying 130.208.242.66 (port 7000): Free packets: 342/507, packet reclaims: 0, calls: 0, used FDs: 7 not waiting for packets.
1 calls waiting for a thread 0 threads are idle 1 calls have waited for a thread Connection from host 130.208.242.68, port 7003, Cuid ae402cab/be3ab9f0 serial 22, natMTU 1444, security index 2, client conn rxkad: level clear Received 0 bytes in 0 packets Sent 60 bytes in 1 pa
[Mon Jun 23 06:24:17 2014] udebug lithium.rnd.ru.is 7003 Host's addresses are: 130.208.242.66 Host's 130.208.242.66 time is Mon Jun 23 10:24:02 2014 Local time is Mon Jun 23 10:24:03 2014 (time differential 1 secs) Last yes vote for 130.208.242.66 was 0 secs ago (sync site); Last vote started 0 secs ago (at Mon Jun 23 10:24:03 2014) Local db version is 1403144234.4 I am sync site until 58 secs from now (at Mon Jun 23 10:25:01 2014) (3 servers) Recovery state 1
[Mon Jun 23 06:31:11 2014] Uh oh. vos listvolume lithium.rnd.ru.is a is hanging. That's a bad sign.
[Mon Jun 23 06:38:55 2014] OK. lithium.rnd.ru.is is not responding to any vos queries. The bos status says that vlserver is running. What am I missing? (I am running it on that host.)
[Mon Jun 23 06:39:45 2014] VolSerLog does say this Mon Jun 23 10:37:15 2014 SYNC_connect: temporary failure on circuit 'FSSYNC' (will retry)
[Mon Jun 23 06:49:21 2014] Googling implies that the utilities can't find the named pipe to talk to the services. I didn't see any zombie processes accessing fsync.socket in lsof. Anyone got ideas?
[Mon Jun 23 09:53:28 2014] there's something very wrong with my vldb. Anyone around who can help me figure out how to even query it?
[Mon Jun 23 22:04:40 2014] hi, i need some advice about the openafs kernel module...i'm having difficulty building it, is there some information about what kernel version a certain release is for?
[Tue Jun 24 15:34:38 2014] Hello. Having some rather troubling problems that I'm not quit sure how to diagnose.
[Tue Jun 24 15:34:44 2014] er, quite sure...
[Tue Jun 24 15:35:06 2014] I have three main fsservers, two of which host the bulk of the volumes.
[Tue Jun 24 15:35:25 2014] Their back end storage disappeared, and took them with it.
[Tue Jun 24 15:35:47 2014] Recovered the back end storage, but now they won't bring their volumes online
[Tue Jun 24 15:35:55 2014] They are also database servers.
[Tue Jun 24 15:36:11 2014] Salvager runs, and seems to complete okay
[Tue Jun 24 15:36:59 2014] But, when I try to do a 'vos syncvldb' I end up with an error "Could not fetch the list of volumes from the server" "Possible Communication failure"
[Tue Jun 24 15:37:11 2014] bos status for these servers shows all the expected processes running.
[Tue Jun 24 15:37:37 2014] AFS version?
[Tue Jun 24 15:37:48 2014] 1.4.14
[Tue Jun 24 15:37:54 2014] openafs 1.4.14
[Tue Jun 24 15:38:57 2014] Another scary bit is that things like 'vos listaddrs' doesn't seem to get back information
[Tue Jun 24 15:45:03 2014] that means it failed w/ -1 asking your volserver for a list of volumes, which is odd because just before it does that it asks for a list of partitions - which apparently worked.
[Tue Jun 24 15:45:52 2014] On a previous iteration it failed to retrieve the list of partitions, but does not seem to fail on that operation now.
[Tue Jun 24 15:46:07 2014] vos partinfo for each of your fileservers
[Tue Jun 24 15:46:49 2014] and vos listvol for each of your fileservers
[Tue Jun 24 15:48:20 2014] partinfo works for server-3 and server-1, not for server-2. listvol works for server-3 (who has had no problems today), server-1 and server-2 are hanging, will likely fail
[Tue Jun 24 15:48:32 2014] okay
[Tue Jun 24 15:48:45 2014] I would check the logs on the failing servers next
[Tue Jun 24 15:49:00 2014] BosLog, FileLog, VolserLog
[Tue Jun 24 15:49:02 2014] server-1 and server-2 have been beat all to hell today.
[Tue Jun 24 15:49:20 2014] they may be crashing intermittently...
[Tue Jun 24 15:49:57 2014] looks like the fs process is dying periodically
[Tue Jun 24 15:50:03 2014] yeah.
[Tue Jun 24 15:50:28 2014] cool
[Tue Jun 24 15:50:30 2014] BosLog should give some reason why (just a signal number, usually)
[Tue Jun 24 15:50:43 2014] FileLog.old might give more details, or might not
[Tue Jun 24 15:50:50 2014] VolserLog is also saying it cannot connect to the file server
[Tue Jun 24 15:51:20 2014] right, that makes sense, that's one of the signs I wanted you to look for
[Tue Jun 24 15:51:32 2014] damn, I didn't think through that entry earlier.
[Tue Jun 24 15:51:37 2014] volserver can't answer any vos commands until it finishes initializing
[Tue Jun 24 15:52:17 2014] ah
[Tue Jun 24 15:52:21 2014] there's the error
[Tue Jun 24 15:52:26 2014] oh
[Tue Jun 24 15:52:27 2014] wait
[Tue Jun 24 15:52:33 2014] no, it's the same in my working serder.
[Tue Jun 24 15:52:37 2014] er, server
[Tue Jun 24 15:53:05 2014] Looking at the fileserver process now
[Tue Jun 24 15:53:34 2014] The boslog has: 'fs: vol exited on signal 6"
[Tue Jun 24 15:53:36 2014] are there any core files in the logs directory (usually /usr/afs/logs)
[Tue Jun 24 15:53:43 2014] there ya go
[Tue Jun 24 15:53:56 2014] what's your platform?
[Tue Jun 24 15:54:13 2014] for 6 there won't be any other info in the BosLog or FileLog, typically
[Tue Jun 24 15:54:34 2014] RHEL 5.10
[Tue Jun 24 15:55:02 2014] go to the logs directory, look for core files
[Tue Jun 24 15:55:13 2014] if you find one, '# file core*
[Tue Jun 24 15:55:19 2014] and tell me what you see
[Tue Jun 24 15:55:32 2014] (one or more)
[Tue Jun 24 15:55:32 2014] ( /var/log/openafs possibly? that's what Debian uses )
[Tue Jun 24 15:55:37 2014] I'm searching
[Tue Jun 24 15:55:41 2014] Not in the log directory
[Tue Jun 24 15:55:48 2014] rather, no cores in the logs directory
[Tue Jun 24 15:55:55 2014] rh would be /usr/afs/logs, I think
[Tue Jun 24 15:56:43 2014] yes
[Tue Jun 24 15:56:47 2014] These systems I inherited, they are a little weird...the rhel 6 systems behave better wrt to filesystem.
[Tue Jun 24 15:57:01 2014] /var/openafs/logs is where they are on this system.
[Tue Jun 24 15:57:09 2014] okay
[Tue Jun 24 15:58:03 2014] um, signal 6 and no core, that's no fun
[Tue Jun 24 15:58:13 2014] dropping an strace on it
[Tue Jun 24 15:58:21 2014] see if that tells me anything when it goes down.
[Tue Jun 24 15:58:33 2014] will try to start it by hand as well
[Tue Jun 24 15:58:50 2014] This helps a lot... I'd gotten to the point where I wasn't thinking straight.
[Tue Jun 24 15:59:40 2014] > has: 'fs: vol exited on signal 6'
[Tue Jun 24 16:00:06 2014] that means bnode fs, process vol (volserver)
[Tue Jun 24 16:00:21 2014] it's your volserver that's falling over according to BosLog
[Tue Jun 24 16:00:37 2014] ah, right
[Tue Jun 24 16:00:41 2014] see
[Tue Jun 24 16:00:46 2014] not thinking clearly
[Tue Jun 24 16:01:00 2014] did you check VolserLog? Just in case there's useful info there
[Tue Jun 24 16:01:13 2014] VolserLog.old too
[Tue Jun 24 16:01:39 2014] FSYNC_clientInit temporary failure (will retry), over and over again
[Tue Jun 24 16:01:55 2014] that just means it can't talk to the fileserver
[Tue Jun 24 16:01:59 2014] iirc
[Tue Jun 24 16:02:12 2014] ah.
[Tue Jun 24 16:02:19 2014] ulimit -a
[Tue Jun 24 16:02:40 2014] (on the server itself)
[Tue Jun 24 16:02:58 2014] ah
[Tue Jun 24 16:03:01 2014] core file size 0
[Tue Jun 24 16:03:09 2014] !
[Tue Jun 24 16:04:32 2014] changed
[Tue Jun 24 16:04:34 2014] waiting
[Tue Jun 24 16:05:16 2014] well
[Tue Jun 24 16:05:22 2014] listpart on server-2 is working now
[Tue Jun 24 16:05:38 2014] not sure why that is.
[Tue Jun 24 16:06:59 2014] oh, hmm, that's listpart, not partinfo
[Tue Jun 24 16:08:22 2014] okay: 'bos status -long' for each server
[Tue Jun 24 16:09:49 2014] all processes are running as expected. On my problem children, server-1 and server-2, I see many restarts on the fs instance, by vol, due to sig 6
[Tue Jun 24 16:10:47 2014] okay
[Tue Jun 24 16:10:49 2014] New error in VolserLog: "Assertion failed! file ../vol/volume.c, line 702"
[Tue Jun 24 16:11:00 2014] beautiful
[Tue Jun 24 16:11:07 2014] now we're getting somewhere
[Tue Jun 24 16:11:31 2014] cut & paste the msg again, please?
[Tue Jun 24 16:11:42 2014] Tue Jun 24 16:08:31 2014 Assertion failed! file ../vol/volume.c, line 702.
[Tue Jun 24 16:12:03 2014] Preceded by a bunch of: Tue Jun 24 16:07:43 2014 FSYNC_clientInit temporary failure (will retry)
[Tue Jun 24 16:13:34 2014] dammit, still no core.
[Tue Jun 24 16:15:24 2014] ulimits are inherited from the parent process; you'll have to add it to your server startup script and bounce bosserver to be sure
[Tue Jun 24 16:15:44 2014] but this log tells us something more, stand by before you bounce anything to get a core
[Tue Jun 24 16:16:35 2014] if (programType == volumeUtility) {
[Tue Jun 24 16:16:35 2014] assert(VInit == 3 || VConnectFS_r(1));
[Tue Jun 24 16:16:35 2014] VLockPartition_r(partition);
[Tue Jun 24 16:16:35 2014] }
[Tue Jun 24 16:16:59 2014] how the heck did this assert in the volserver? programType is NOT volumeUtility...
[Tue Jun 24 16:17:29 2014] I apologize, you are way over my head now...
[Tue Jun 24 16:17:51 2014] np, there are lurkers who will come out of the woodwork now
[Tue Jun 24 16:17:57 2014] hahaaha
[Tue Jun 24 16:18:43 2014] or I should say "SHOULD NOT be volumeUtility"
[Tue Jun 24 16:19:16 2014] It may also be worth noting that on server-1 and server-2, at each 'bos restart' for the fs instance, salvager kicks off... which I'm starting to understand, if the fs instance never fully initializes.
[Tue Jun 24 16:22:14 2014] oh, well! "volumeUtility" is indeed possible from volserver; continuing to look
[Tue Jun 24 16:24:11 2014] okay.
[Tue Jun 24 16:24:55 2014] anything interesting?
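Resource limits work the way the log says. The core-file size (RLIMIT_CORE, shown by `ulimit -a` as "core file size") is copied to a child at fork() and kept across exec(). Raising it in an interactive shell therefore does nothing for a bosserver, or for the fs/vol processes it forks, that was already started with a limit of 0. The following is a minimal POSIX sketch with hypothetical helper names. It raises the soft limit as far as the hard limit allows, then confirms that a forked child sees the same value.

```c
#include <assert.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Raise the soft core-size limit up to the hard limit (an unprivileged
 * process may not go beyond the hard limit). Stores the new soft limit. */
static int raise_core_limit(rlim_t *soft_out)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    *soft_out = rl.rlim_cur;
    return 0;
}

/* Fork a child, let it read its own soft RLIMIT_CORE, and pass the value
 * back through a pipe: this is what a daemon started from here inherits. */
static int child_core_limit(rlim_t *inherited)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        struct rlimit rl;
        close(fds[0]);
        if (getrlimit(RLIMIT_CORE, &rl) != 0)
            _exit(1);
        ssize_t w = write(fds[1], &rl.rlim_cur, sizeof rl.rlim_cur);
        _exit(w == (ssize_t)sizeof rl.rlim_cur ? 0 : 1);
    }
    close(fds[1]);
    ssize_t n = read(fds[0], inherited, sizeof *inherited);
    close(fds[0]);
    int status = 0;
    waitpid(pid, &status, 0);
    return (n == (ssize_t)sizeof *inherited && WIFEXITED(status) &&
            WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

In shell terms, this corresponds to putting something like `ulimit -c unlimited` in the startup script before bosserver is launched, then bouncing bosserver so that its children pick up the new limit.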
[Tue Jun 24 16:25:46 2014] the assert is just "I can't go on because of the FSYNC_clientInit failure"
[Tue Jun 24 16:25:58 2014] so that's what we need to focus on
[Tue Jun 24 16:26:00 2014] ah
[Tue Jun 24 16:26:54 2014] any chance you've got more than one server running here?
[Tue Jun 24 16:27:03 2014] What do you mean?
[Tue Jun 24 16:27:09 2014] ps -ef | grep server
[Tue Jun 24 16:27:20 2014] how many fileservers do you see?
[Tue Jun 24 16:27:27 2014] 1
[Tue Jun 24 16:27:30 2014] okay
[Tue Jun 24 16:28:14 2014] ah
[Tue Jun 24 16:28:18 2014] none of the AFS-related processes are running more than once
[Tue Jun 24 16:28:21 2014] volserver can't init until fileserver is up
[Tue Jun 24 16:28:33 2014] fileserver can't init until it registers w/ vldb
[Tue Jun 24 16:28:39 2014] are your db servers healthy?
[Tue Jun 24 16:28:45 2014] I suspect not.
[Tue Jun 24 16:28:50 2014] I have three db servers
[Tue Jun 24 16:28:56 2014] one that has no file servers
[Tue Jun 24 16:29:07 2014] udebug
[Tue Jun 24 16:29:09 2014] and the other two are my trouble children
[Tue Jun 24 16:29:31 2014] oh
[Tue Jun 24 16:29:45 2014] my third db server isn't responding to udebug either.
[Tue Jun 24 16:29:52 2014] awesome.
[Tue Jun 24 16:29:54 2014] there's your trouble
[Tue Jun 24 16:30:03 2014] son-of-a-bitch
[Tue Jun 24 16:30:40 2014] return code -1 from VOTE-Debug
[Tue Jun 24 16:30:41 2014] great.
[Tue Jun 24 16:32:22 2014] huh
[Tue Jun 24 16:32:24 2014] and that one isn't responding now.
[Tue Jun 24 16:32:40 2014] well, that would explain a lot
[Tue Jun 24 16:34:21 2014] gotta step away now...
[Tue Jun 24 16:35:23 2014] Thank you for your help so far
[Tue Jun 24 16:35:27 2014] I have a little to work with
[Tue Jun 24 16:35:40 2014] okay, you're welcom
[Tue Jun 24 16:35:42 2014] welcome
[Tue Jun 24 16:41:44 2014] woot!
[Tue Jun 24 16:41:50 2014] Progress!
[Tue Jun 24 16:58:04 2014] Is there any way on a dafs to get 'vos listvol' to wait for all volumes to become unbusy before printing? I'm trying to loop over all volumes on a given partition and sometimes I miss a few because "**** Volume NNNNNNNNNN is busy ****".
[Tue Jun 24 16:58:44 2014] If I run 'vos listvol' again they all clear up, but that's hardly a good foundation for automation.
[Tue Jun 24 17:16:19 2014] FWIW, I've made progress, my fileservers are behaving better. On to the next unfuckage (apparently, root.afs was not replicated properly in this environment, and now none of my clients can figure out how to navigate the /afs/cell directories).
[Tue Jun 24 17:19:06 2014] So, it is definitely progress.
[Tue Jun 24 17:42:37 2014] Aaaand, dead again.
[Tue Jun 24 18:01:31 2014] Though now it is down to 1 server, server-2, that is having the assertion failures
[Tue Jun 24 18:01:39 2014] I can retrieve its partition lists
[Tue Jun 24 18:01:45 2014] just can't get the volume information to sync up.
[Tue Jun 24 18:04:28 2014] go back to your udebug again, for vlserver (7003) and ptserver (7002) and make sure you have a sync site for both
[Tue Jun 24 18:04:44 2014] you won't get far until that's stable
[Tue Jun 24 18:05:08 2014] 'k
[Tue Jun 24 18:07:18 2014] that output actually looks pretty good.
[Tue Jun 24 18:07:29 2014] they are all agreed on who they voted for
[Tue Jun 24 18:08:08 2014] okay
[Tue Jun 24 18:08:33 2014] ah
[Tue Jun 24 18:08:37 2014] but do you have a sync site?
[Tue Jun 24 18:08:45 2014] (status f or 1f)
[Tue Jun 24 18:09:07 2014] ptserver yes, vlserver no
[Tue Jun 24 18:09:42 2014] check your VLLogs
[Tue Jun 24 18:09:44 2014] trying to sync
[Tue Jun 24 18:15:05 2014] my lowest-IP db server is not casting a vote for the vlserver
[Tue Jun 24 18:25:29 2014] huh
[Tue Jun 24 18:25:43 2014] scratch that, ptserver hasn't decided it is a sync site, vlserver has.
[Tue Jun 24 18:32:51 2014] oh, yay, got the election to work out.
[Tue Jun 24 18:33:12 2014] what did you do?
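The log does not name a vos option that waits for busy volumes. One hedged workaround is to have the wrapper script notice the busy lines and re-query only those volumes, instead of trusting a single pass. The following is a minimal sketch. It assumes busy volumes appear exactly in the quoted "**** Volume NNNNNNNNNN is busy ****" form. collect_busy() is a hypothetical helper, and the retry itself (for example, re-running 'vos examine' on each ID after a pause) is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Scan listvol-style output lines and collect the IDs of volumes reported
 * as busy, assuming the exact form "**** Volume <id> is busy ****".
 * Returns how many IDs were stored in ids (at most max). */
static size_t collect_busy(const char *const *lines, size_t nlines,
                           unsigned long *ids, size_t max)
{
    size_t found = 0;
    for (size_t i = 0; i < nlines && found < max; i++) {
        unsigned long id;
        int end = 0;
        /* %n only records a position if every literal before it matched,
         * so end > 0 means the whole busy-volume pattern was present. */
        if (sscanf(lines[i], " **** Volume %lu is busy ****%n", &id, &end) == 1
            && end > 0)
            ids[found++] = id;
    }
    return found;
}
```

Volume IDs are kept as unsigned long because a 10-digit ID can exceed the range of a signed 32-bit int.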
[Tue Jun 24 18:33:43 2014] stopped the ptserver instance on the lowest IP, then restarted ptserver on all three hosts
[Tue Jun 24 18:33:45 2014] then waited.
[Tue Jun 24 18:33:50 2014] sat on my hands
[Tue Jun 24 18:34:05 2014] okay
[Tue Jun 24 18:34:16 2014] restarting fs instance on server-2
[Tue Jun 24 18:34:22 2014] waiting for its salvager to do its thing.
[Tue Jun 24 18:35:40 2014] I didn't think that lack of quorum on the ptservers would block fileserver initialization... lack of quorum on the vlservers certainly will
[Tue Jun 24 18:36:02 2014] Yeah, I'm not sure this will address my issue.
[Tue Jun 24 18:36:10 2014] well, we'll see
[Tue Jun 24 18:36:15 2014] I could be mistaken
[Tue Jun 24 18:36:35 2014] "issue" is, of course, a euphemism right now.
[Tue Jun 24 18:37:14 2014] bright side is that fs instance restarts on server-1 no longer come along with a salvager ru
[Tue Jun 24 18:37:16 2014] er, run
[Tue Jun 24 18:37:22 2014] so that is kinda a big deal right now.
[Tue Jun 24 18:38:14 2014] And the salvager runs aren't like they were on the AFS servers I ran 10 years ago, where a salvage could take a day.
[Tue Jun 24 18:38:20 2014] Those blew
[Tue Jun 24 18:50:54 2014] improvements.
[Tue Jun 24 18:51:03 2014] Now to see if I can get my systems to reboot and come up clean.
[Tue Jun 24 18:57:50 2014] Rock! server-1 and server-2 rebooting successfully
[Tue Jun 24 18:58:06 2014] now for the sync site server.
[Tue Jun 24 19:29:54 2014] mvitale1: I'm up and running! Thank you very much for your help. I owe you something.
[Tue Jun 24 22:15:39 2014] got some questions about compiling the kernel module: is there something that states openafs 1.6.x (whatever) can only work with up to a certain kernel?
[Wed Jun 25 05:33:08 2014] Hi, is there a good getting-started tutorial somewhere for openafs? I would like to use CentOS as the server OS if it's possible...
[Wed Jun 25 05:33:50 2014] hm, the wiki is partly outdated
[Wed Jun 25 05:35:40 2014] I did have a how-to guide somewhere... but cannot find it currently
[Wed Jun 25 05:36:18 2014] I recommend the Arch Linux and Gentoo wiki pages about getting started with OpenAFS; they are both quite complete
[Wed Jun 25 05:37:33 2014] Oki, I will have a look at them. Don't know if openafs is the best alternative, but it looks promising.
[Wed Jun 25 05:37:36 2014] http://chemnitzer.linux-tage.de/2012/vortraege/plan there are some talks about OpenAFS with slides
[Wed Jun 25 05:37:48 2014] depends on your needs, alternative to what fs?
[Wed Jun 25 05:40:09 2014] I want to have a server that shares its files with a remote client. And I want that client to be an SMB server to export the files locally at the remote office. The server should be in our DC, with backups, monitoring and so on.
[Wed Jun 25 05:41:11 2014] in this case I would set up 2 OpenAFS fileservers and OpenAFS clients instead of the SMB share
[Wed Jun 25 05:41:11 2014] Amiga4000: My German is a bit rusty... :)
[Wed Jun 25 05:41:25 2014] yeah, it was a German talk..
[Wed Jun 25 05:41:52 2014] my very rough short guide on setting up an OpenAFS cell: http://aceini.no-ip.info/openafs/setupcell.txt
[Wed Jun 25 05:43:47 2014] Amiga4000: Ah, ok, then there is a sync between the servers. Are the OpenAFS clients available for all OSes (Linux, Windows, Mac)?
[Wed Jun 25 05:44:36 2014] clients are available for Mac, Windows and Linux/Unix
[Wed Jun 25 05:44:52 2014] but as OpenAFS is not really mainstream, it has some limits
[Wed Jun 25 05:46:05 2014] http://chemnitzer.linux-tage.de/2012/vortraege/folien/1013_OpenAFS.pdf shows some of the points (in German, but the most important facts are English words)
[Wed Jun 25 05:52:33 2014] Oki, so if I understand it correctly I can set up one server (preferably 3 servers) and let the clients (in this case the Windows boxes) get data directly from there.
[Wed Jun 25 05:53:19 2014] And we can have different cells on the same server for different offices?
[Wed Jun 25 05:55:04 2014] Is the traffic from the client encrypted? And more importantly, is it accessible from ordinary networks? (i.e. if I'm not at the office, can I still access the data, or do I need to use a VPN connection?)
[Wed Jun 25 06:01:20 2014] ok, one server = one cell AFAIK; we have separate subdirs with different ACLs for this
[Wed Jun 25 06:01:47 2014] Windows boxes get access rights (tokens) on user login and have the data mapped to a Z:\ drive
[Wed Jun 25 06:02:51 2014] the traffic is encrypted (if the client chooses this option), but due to an old standard only with RC4 (IMHO), so very very weak. Work is currently being done to change this to real encryption, but first a new standard needs to be set up and some other work has to be done
[Wed Jun 25 06:03:29 2014] the server is reachable on IPv4 from anywhere you like (our cell here is reachable worldwide; I have all my data on my laptop wherever I have network access...)
[Wed Jun 25 06:18:37 2014] Amiga4000: then it's recommended to use a VPN connection to access that data. I think I have enough info to get started anyway :)
[Wed Jun 25 06:42:01 2014] the VPN link can be used if you want higher security
[Wed Jun 25 06:44:38 2014] Yes, but if they are company files they often want them to be secure... :)
[Wed Jun 25 09:30:04 2014] Amiga4000: data encrypted in AFS uses "fcrypt", not RC4. "fcrypt" is a weaker cousin of DES.
[Wed Jun 25 09:30:53 2014] ok, thank you, but all in all, weak
[Wed Jun 25 10:51:20 2014] So are there going to be OpenAFS 1.6.9 packages released for RHEL 7?
[Wed Jun 25 11:19:21 2014] no
[Wed Jun 25 13:37:18 2014] does krb5.246 even exist (-1765328138)?
[Wed Jun 25 13:39:09 2014] krb5 error -1765328138 = KRB5_CC_READONLY
[Wed Jun 25 13:39:26 2014] wtf
[Wed Jun 25 13:39:37 2014] why in the world would the Kerberos cache on Windows become read-only?
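The two numbers in that question are the same error. com_err, the error-table library used by Kerberos and AFS, gives each table a base code derived from the table's name. "krb5.246" means table "krb5", offset 246. The following is a minimal sketch of that encoding, based on my understanding of the com_err convention. table_base() and describe_code() are hypothetical helpers, not com_err API functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Character set used to pack com_err table names, 6 bits per character
 * (index + 1, so that 0 can mean "no character"). */
static const char table_chars[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_";

/* Base error code of a table such as "krb5": pack up to four name
 * characters into 24 bits, shift left 8 to leave room for the per-table
 * offset, and read the result as a signed 32-bit value. */
static int32_t table_base(const char *name)
{
    uint32_t packed = 0;
    for (size_t i = 0; i < 4 && name[i] != '\0'; i++) {
        const char *p = strchr(table_chars, name[i]);
        if (p == NULL)
            return 0;
        packed = (packed << 6) | (uint32_t)(p - table_chars + 1);
    }
    int64_t base = (int64_t)(packed << 8);
    if (base > INT32_MAX)
        base -= INT64_C(1) << 32;   /* wrap into the signed 32-bit range */
    return (int32_t)base;
}

/* Render code as "table.offset" (e.g. "krb5.246") if it falls inside the
 * table's 256-code range; returns -1 otherwise. */
static int describe_code(int32_t code, const char *table, char *buf, size_t len)
{
    int64_t offset = (int64_t)code - table_base(table);
    if (offset < 0 || offset > 255)
        return -1;
    snprintf(buf, len, "%s.%d", table, (int)offset);
    return 0;
}
```

With this encoding, the krb5 base works out to -1765328384. -1765328138 therefore renders as krb5.246, and -1765328188 renders as krb5.196, the other code that shows up later in this log.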
[Wed Jun 25 13:39:55 2014] no idea, sorry
[Wed Jun 25 13:40:18 2014] it seems that tickets expire, should renew, fail to renew, and then afscreds returns that particular error
[Wed Jun 25 13:40:39 2014] (also problems talking to AD servers due to expired tickets that should have been renewed)
[Wed Jun 25 15:23:26 2014] CybrFyre: are you using the Windows SSPI cache? that is not "writable" by the Kerberos bits, just the Windows bits
[Wed Jun 25 15:37:50 2014] cclausen - whatever the default is for an openafs install on windows
[Wed Jun 25 15:38:03 2014] the issue only seems to pop up after credentials expire
[Wed Jun 25 15:38:17 2014] integrated login or logging in and then separately getting tokens is no problem
[Wed Jun 25 15:38:28 2014] I've had several cases where I've had to close NIM and re-launch it to re-populate credentials
[Wed Jun 25 15:38:46 2014] I'm not sure I'm using a "default" setup though.
[Wed Jun 25 15:40:32 2014] I mean I'm not somehow telling openafs to store its credentials in some specific place
[Wed Jun 25 15:40:39 2014] it stores 'em wherever it stores 'em
[Wed Jun 25 16:40:12 2014] What could give rise to "1 Volser: RestoreVolume: End of dump not found; restore aborted"?
[Wed Jun 25 16:47:00 2014] (During a 'vos release'. Both volsers involved are 1.6.9; the dbservers are 1.6.1-3+deb7 (master) and 1.6.9 (clone); everything's on Linux.)
[Wed Jun 25 18:36:52 2014] It means one of three things:
[Wed Jun 25 18:36:52 2014] 1. where a D_VOLUMEHEADER tag should have been, there was not one
[Wed Jun 25 18:36:53 2014] 2. the D_DUMPEND tag was reached but the associated data value was not equal to DUMPENDMAGIC.
[Wed Jun 25 18:36:53 2014] 3. the end of the dump stream was not received, either because it wasn't sent or because of an rx network error.
[Thu Jun 26 01:22:01 2014] secureendpoints: So if it's transient and the next 'vos release' works, it's probably option 3?
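The three causes listed above can be pictured as checks over the incoming dump stream. The following toy sketch is not the volserver's parser. The tag values and magic number are placeholders (the SKETCH_* names are hypothetical), the dump body is treated as opaque bytes, and the end magic is assumed to be sent big-endian. In this model, a stream that is cut short (cause 3) is reported differently from one whose end magic is wrong (cause 2).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative placeholder values only; the real tag numbers and magic
 * are defined in the OpenAFS volser sources and are not reproduced here. */
#define SKETCH_TAG_VOLUMEHEADER 0x02u
#define SKETCH_TAG_DUMPEND      0x04u
#define SKETCH_DUMPEND_MAGIC    0x12345678u

enum dump_check {
    DUMP_OK = 0,
    DUMP_NO_VOLUME_HEADER,   /* cause 1: no volume-header tag where one belongs */
    DUMP_BAD_END_MAGIC,      /* cause 2: end tag present, magic value wrong */
    DUMP_TRUNCATED           /* cause 3: stream ended before the end tag/magic */
};

/* Toy stream layout: one volume-header tag, then opaque body bytes,
 * then an end tag followed by a 4-byte big-endian magic number. */
static enum dump_check check_dump(const unsigned char *buf, size_t len)
{
    if (len == 0 || buf[0] != SKETCH_TAG_VOLUMEHEADER)
        return DUMP_NO_VOLUME_HEADER;
    for (size_t i = 1; i < len; i++) {
        if (buf[i] != SKETCH_TAG_DUMPEND)
            continue;                     /* body byte in this toy model */
        if (len - i - 1 < 4)
            return DUMP_TRUNCATED;        /* end tag seen, magic cut off */
        uint32_t magic = ((uint32_t)buf[i + 1] << 24) | ((uint32_t)buf[i + 2] << 16) |
                         ((uint32_t)buf[i + 3] << 8) | (uint32_t)buf[i + 4];
        return magic == SKETCH_DUMPEND_MAGIC ? DUMP_OK : DUMP_BAD_END_MAGIC;
    }
    return DUMP_TRUNCATED;                /* never saw the end tag */
}
```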
[Thu Jun 26 01:22:19 2014] (Erm, the next 'vos release' of the same volume with no mutations happening to the store)
[Fri Jun 27 04:15:15 2014] If somebody has a moment, could they look at http://pastebin.com/nSrF1zcT and proffer suggestions as to why I am unable to release this volume?
[Fri Jun 27 04:18:22 2014] cannot tell, in this case I always do bos salvage ;-)
[Fri Jun 27 04:20:45 2014] Running dafs; I didn't think I was supposed to 'bos salvage'.
[Fri Jun 27 04:21:03 2014] I can run salvageserver -client on this volume again, but it had no effect last time.
[Fri Jun 27 04:22:11 2014] Hm; trying this time yields "SALVSYNC_askSalv: SALVSYNC request denied for reason=65537" :(
[Fri Jun 27 04:22:21 2014] we do not run dafs so far, and I run the salvager on the whole partition, as sometimes that's the only way.
[Fri Jun 27 04:22:40 2014] ah, another more explicit error, but I do not know...
[Fri Jun 27 04:24:46 2014] Oh, I did not read up far enough: "namei_ListAFSSubDirs: warning: VG 536871507 does not have a link table; salvager will recreate it."
[Fri Jun 27 05:01:30 2014] Hmmmm.... I can release to other servers but not to other partitions on this server (magellan).
[Fri Jun 27 05:04:21 2014] an RO can only exist on one partition on a server
[Fri Jun 27 05:05:13 2014] Er.
[Fri Jun 27 05:05:38 2014] Sorry, I mean I cannot release to a replication site on either magellan:/vicepa or magellan:/vicepu, but I can release to batman:/viceps.
[Fri Jun 27 05:15:41 2014] as magellan vicepa and vicepu are on the same server, it does not work
[Fri Jun 27 05:15:53 2014] one RO per server.
[Fri Jun 27 05:17:08 2014] Er, no, again... I mean that if I set the single non-chicago replication site to magellan:/vicepa, it doesn't work. If I set the single non-chicago replication site to magellan:/vicepu, it doesn't work. If I set the single non-chicago replication site to batman:/viceps, everything works grand.
[Fri Jun 27 05:18:32 2014] are there any leftovers of the RO on the magellan server's partitions?
[Fri Jun 27 05:18:45 2014] vos remsite does not remove them, only vos remove
[Fri Jun 27 05:18:57 2014] I do not believe so, I have 'vos zap'ed everything in sight.
[Fri Jun 27 05:27:42 2014] sorry, I cannot tell you more so far
[Sat Jun 28 03:59:39 2014] Is it still the case that adding a new ubik server node with a lower IP address than the others requires updating all clients first? Does changing the DNS records and waiting one TTL obviate that need?
[Sat Jun 28 10:31:37 2014] it is still the case and DNS has no effect on it
[Sat Jun 28 10:39:07 2014] if you mean clients use AFSDB or SRV records exclusively, you would have to wait one DNS TTL at least; I'm not sure of full details since I always use CellServDB for the local cell.
[Sun Jun 29 08:56:43 2014] hi
[Sun Jun 29 08:57:17 2014] use case: I have a server with 10 TB of disk space in a hardware RAID6, and a local DSL router with 256 GB of disk space
[Sun Jun 29 08:57:36 2014] I'd like to use the router as some sort of cache
[Sun Jun 29 08:57:46 2014] is that possible with OpenAFS?
[Sun Jun 29 08:58:20 2014] (i.e. client dumps large file on router, router then slowly transfers that to the main server)
[Sun Jun 29 12:56:46 2014] GyrosGeier: can you run an openafs client on the router? and are you aware of the risks of fs storebehind?
[Sun Jun 29 12:57:59 2014] I think it should be possible
[Sun Jun 29 12:58:09 2014] it's a quadcore ARM
[Sun Jun 29 12:58:20 2014] with a local SSD
[Sun Jun 29 12:59:11 2014] * reads up on storebehind
[Sun Jun 29 13:02:04 2014] hmm
[Sun Jun 29 13:02:11 2014] that sounds unsafe
[Sun Jun 29 13:03:52 2014] it can improve performance if you know what you are doing, but you could end up with failed writes if the connection goes down or there is a server issue or whatever else
[Sun Jun 29 13:04:02 2014] indeed
[Sun Jun 29 13:04:23 2014] the connection is fairly stable, but it's still IPsec over DSL
[Sun Jun 29 13:04:33 2014] are you looking at OpenAFS to specifically handle this case? Or do you have an existing cell and just need to deal with large files?
[Sun Jun 29 13:04:43 2014] the former
[Sun Jun 29 13:04:48 2014] it's a new setup
[Sun Jun 29 13:05:33 2014] I'm moving the file server from my broom closet to a colo facility
[Sun Jun 29 13:05:33 2014] I assume a Microsoft Windows setup won't help you, but I use Microsoft Dfs at work, and since it has multi-master replication, it can handle the situation you describe
[Sun Jun 29 13:07:07 2014] hmm
[Sun Jun 29 13:07:22 2014] indeed
[Sun Jun 29 13:07:40 2014] running a DFS-capable Windows may be difficult on that device
[Sun Jun 29 13:07:56 2014] yep
[Sun Jun 29 13:09:57 2014] would storebehind at least block readers from accessing the parts of the file not yet transferred?
[Sun Jun 29 13:12:19 2014] if the data isn't transferred yet, there would be nothing there to read, correct?
[Sun Jun 29 13:12:38 2014] indeed
[Sun Jun 29 13:12:55 2014] so would the other readers block, or get an error?
[Sun Jun 29 13:13:46 2014] I think they would just get a premature EOF when they read up through the data that did exist
[Sun Jun 29 13:13:50 2014] I mean, technically it's not so different from the case where the file is locked because it is still cached in a client
[Sun Jun 29 13:45:43 2014] hm
[Sun Jun 29 13:45:44 2014] also
[Sun Jun 29 13:46:04 2014] is there a sane way to migrate an existing filesystem tree into AFS?
[Sun Jun 29 13:53:55 2014] GyrosGeier: not automatically. You really need to determine how you want to set up AFS volumes
[Sun Jun 29 13:56:25 2014] I did use a migrate.pl script that I found somewhere on the athena.mit.edu cell to migrate end-user NFS directories into AFS; however, for general file structures, setting up volumes involves more thought
[Sun Jun 29 14:00:07 2014] it's mostly "I have a bunch of large files in one volume, and my home directory in another"
[Sun Jun 29 14:32:22 2014] when you say "a bunch", how many do you mean?
[Sun Jun 29 14:33:04 2014] AFS has a limit on the number of items in a directory (slots), and the number of slots is based on the length of the filenames
[Sun Jun 29 16:16:40 2014] not a problem
[Sun Jun 29 16:16:44 2014] nesting is deep
[Mon Jun 30 10:30:27 2014] Hmm. So on a fileserver, "time dd if=/dev/zero of=/vicepa/lost+found/delme count=1000000" (512M) takes 3.5 seconds. Copying into an /afs volume on the server itself takes 1m15s. Is that unreasonable?
[Mon Jun 30 10:33:20 2014] (when I say copying I mean repeating the dd command with a different path)
[Mon Jun 30 11:48:16 2014] ashl_: AFS writes go through the AFS cache on the AFS client
[Mon Jun 30 11:48:55 2014] ashl_: if you want writes to go faster on the file server's AFS client, I'd suggest using the -memcache option to afsd. (this is what I did on my AFS servers.)
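The dd test numbers can be worked through directly. dd's default block size is 512 bytes, so count=1000000 writes 512,000,000 bytes, which is the quoted "512M" (about 488 MiB). The rates and the slowdown ratio, rounded, are:

```latex
\begin{aligned}
\text{size} &= 1\,000\,000 \times 512\ \text{B} = 512\ \text{MB} \\
\text{direct to /vicepa:} &\quad \frac{512\ \text{MB}}{3.5\ \text{s}} \approx 146\ \text{MB/s} \\
\text{through the AFS client (1m15s):} &\quad \frac{512\ \text{MB}}{75\ \text{s}} \approx 6.8\ \text{MB/s} \\
\text{slowdown:} &\quad \frac{75\ \text{s}}{3.5\ \text{s}} \approx 21.4
\end{aligned}
```

By the same arithmetic, a 1m38s run works out to 512/98, about 5.2 MB/s, or roughly 28 times slower than the direct write.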
[Mon Jun 30 11:49:29 2014] others may list some objections to using -memcache though, as there are some differences in how it works versus an on-disk cache
[Mon Jun 30 12:07:47 2014] cclausen: It's still a 20x slowdown. Is all of that attributable to the cache? I'm going to try playing with the cache size, and after that I might try memcache, thanks.
[Mon Jun 30 12:27:23 2014] my guess would be how the caching works, yes. I assume that you were testing with an AFS volume actually on the same physical server? (You could of course have been writing to a volume anywhere in the world.)
[Mon Jun 30 12:42:32 2014] um, there's still caching in both cases, unless by "an /afs volume on the server" you meant the actual /vicep*, in which case, don't
[Mon Jun 30 12:44:12 2014] geekosaur: the dd command directly on /vicepa wouldn't go through any AFS-related caching, right?
[Mon Jun 30 12:45:11 2014] yes, that was my point. you still get the cache manager and "network" (loopback on the server) communication even using the AFS client on a server
[Mon Jun 30 12:45:33 2014] so caching should not be the issue if it's faster with a client on the server than a client elsewhere
[Mon Jun 30 12:45:43 2014] well, yes, but presumably local client <-> local server is faster than local client <-> remote AFS server
[Mon Jun 30 12:47:48 2014] I just wanted to make sure that we were not debugging a performance issue across a slow network link, and that all communication was instead on the same computer system
[Mon Jun 30 12:48:36 2014] I think ashl_ is concerned that just writing into AFS is a lot slower than writing to /vicepa directly.
[Mon Jun 30 12:57:37 2014] cclausen: Yes. I used /vicepa/lost+found and /afs/vol-on-the-local-server/
[Mon Jun 30 12:58:29 2014] cclausen: I actually think 512MB written in a minute might be tolerable, but if there is an easy performance bottleneck fix I'd rather sort it out now
[Mon Jun 30 13:42:17 2014] Interesting. Changing the cache size from 100MB to 1000MB actually slows it down to 1m38s.
[Tue Jul 1 04:57:37 2014] encfs and OpenAFS do not play well together here on Debian
[Tue Jul 1 16:47:57 2014] hey guys, I am in the middle of bringing all of our AFS servers down and putting in the rxkad.keytab and krb.conf file to enable Kerberos auth with the stronger encryption types, and when I start the first server back up, I get no error. When I shut it back down, I get a "ticket contained unknown key version number" error. I thought this was because I tried staging the change originally rather than bringing everything down and bringing it all b
[Tue Jul 1 17:43:44 2014] never mind, I got it working.
[Wed Jul 2 10:41:11 2014] do I want to be using NIM2, or what comes with KfW 4.0.1? (I note that the release notes for 4.0.0 say "AFS support is not available in this release")... or should I stick with KfW 3.2.2?
[Wed Jul 2 10:42:01 2014] Well, that depends on what you want to do. KfW 4.x "works" with openafs if you run aklog by hand, but IIRC there is not a GUI solution for obtaining tokens.
[Wed Jul 2 10:42:39 2014] assuming it doesn't break the afscreds tray app, that'd be fine
[Wed Jul 2 10:43:28 2014] It has been probably a year since I looked at this. What I remember is that typing a password into afscreds worked, but afscreds would not use an existing TGT from KfW 4.x.
[Wed Jul 2 10:43:42 2014] looks like NIM2 hasn't been updated since 2010, so...
[Wed Jul 2 10:44:08 2014] does that mean auto renewal was broken, then?
[Wed Jul 2 10:44:18 2014] Also, we at MIT have not really gotten any feedback that people want AFS GUI support from KfW 4.x. It's unlikely that we'll implement anything in the absence of feature requests.
[Wed Jul 2 10:44:39 2014] I have no experience with afscreds auto renewal.
[Wed Jul 2 10:45:08 2014] by AFS GUI, do you mean a separate GUI for AFS (versus it just doing aklog in the background automagically upon getting a TGT)?
[Wed Jul 2 10:46:47 2014] I'm not sure I am parsing that correctly. What is the "it" that would be doing aklog in the background?
[Wed Jul 2 10:47:08 2014] Anyway, "running aklog in the background" from some other application feels really ugly to me.
[Wed Jul 2 10:47:23 2014] the "it" would be the included NIM GUI app
[Wed Jul 2 10:47:41 2014] I don't have anything in particular in mind when I say "AFS GUI" -- just a button or checkbox that the user can select to get AFS tokens.
[Wed Jul 2 10:47:47 2014] ok
[Wed Jul 2 10:48:16 2014] so, even with whatever AFS plugin... you still have to, at the moment, somehow separately aklog
[Wed Jul 2 10:49:40 2014] Also, I should note that NIM and KfW 4 are not exactly incompatible -- if you like the NIM interface, many things can work if you run the NIM GUI and have KfW 4.x installed, but don't run the MIT Kerberos Ticket Manager application.
[Wed Jul 2 10:49:59 2014] I hate the 3.2.2 NIM interface... which is why we still use afscreds
[Wed Jul 2 10:50:00 2014] I didn't test that scenario very much, though, so I don't remember what does and does not work.
[Wed Jul 2 10:50:11 2014] :)
[Wed Jul 2 10:50:23 2014] CybrFyre: some time ago IIRC someone decided to write a very simple C# "click this icon" wrapper for 'aklog'
[Wed Jul 2 10:50:42 2014] ok, Ticket Manager is what it's called in v4
[Wed Jul 2 10:51:16 2014] so, does the OpenAFS provider work with the v4 Ticket Manager?
[Wed Jul 2 10:54:01 2014] IIRC, "not really".
[Wed Jul 2 11:02:11 2014] hmmm... so, staying with 3.2.2 seems like the right answer
[Wed Jul 2 11:02:42 2014] Well, from my point of view, I would like it if you tried KfW 4.x in your environment and reported back to the MIT lists what did/didn't work for you.
[Wed Jul 2 11:03:13 2014] Because I am not a good tester of AFS on Windows; I don't know how "most people" use it.
[Wed Jul 2 11:06:09 2014] at the moment, I'm trying to figure out why things seem to somehow end up trying to write to the LSA cache
[Wed Jul 2 11:06:32 2014] On a clean KfW 4.x install?
[Wed Jul 2 11:06:39 2014] no, this is 3.2.2
[Wed Jul 2 11:06:51 2014] that got me going in the direction of wondering if 4.x would resolve that problem
[Wed Jul 2 11:06:52 2014] Oh. That I am less likely to be helpful with ;)
[Wed Jul 2 11:07:08 2014] In 4.x, there's a registry key for the default ccache
[Wed Jul 2 11:07:23 2014] so, where does 4.x stick the krb5.ini?
[Wed Jul 2 11:07:46 2014] I have gotten a report that on a domain-joined machine, KfW 4.x tried to use the LSA cache by default, which does not match my recollection, but I haven't gotten a chance to look more carefully.
[Wed Jul 2 11:08:00 2014] well, my machines are all domain joined :)
[Wed Jul 2 11:08:52 2014] C:\ProgramData\MIT\Kerberos5, IIRC.
[Wed Jul 2 11:09:03 2014] * installs 4.0.1 64-bit
[Wed Jul 2 11:11:23 2014] Thanks.
[Wed Jul 2 11:14:00 2014] The key(s) are {HKCU,HKLM}\Software\MIT\Kerberos5\ccname for the default ccache, BTW. "API:" is best, if you don't want to use "LSA:".
[Wed Jul 2 11:14:20 2014] what's the default?
[Wed Jul 2 11:16:03 2014] I have KRB5CCNAME set to API:%USERNAME%@%USERDNSDOMAIN% on my computers (also domain joined)
[Wed Jul 2 11:16:25 2014] you have to change the key type to REG_EXPAND_SZ for that to actually work the way you want, though
[Wed Jul 2 11:16:31 2014] I think the default is API:, but I am not 100% confident that's the case on a domain-joined machine, as mentioned.
[Wed Jul 2 11:16:59 2014] cclausen: what are you using for krb5 on your machines, if I may ask?
[Wed Jul 2 11:17:21 2014] Heimdal (whatever was latest when I installed it) with NIM 2.x
[Wed Jul 2 11:18:01 2014] that same setup seems to work with MIT Kerberos though on my older 2003 system
[Wed Jul 2 11:18:30 2014] krb5.ini -> %PROGRAMDATA%\Kerberos\krb5.conf
[Wed Jul 2 11:18:51 2014] Thanks.
[Wed Jul 2 11:18:54 2014] (for Heimdal)
[Wed Jul 2 11:19:05 2014] the MIT install I think uses a different path, as noted above
[Wed Jul 2 11:19:33 2014] k. afscreds works with it
[Wed Jul 2 11:19:59 2014] Yay?
[Wed Jul 2 11:20:46 2014] interesting that it's telling me my Windows domain user tickets have expired
[Wed Jul 2 11:21:06 2014] it must not be looking at the LSA cache, which is fine
[Wed Jul 2 11:21:17 2014] I occasionally have to close NIM and re-launch it to reinitialize the cache. I'm not sure what causes that problem
[Wed Jul 2 11:21:48 2014] cclausen - krb5.conf or krb5.ini in that directory?
[Wed Jul 2 11:22:08 2014] krb5.conf is what I have
[Wed Jul 2 11:22:33 2014] hunh... the installer for 4.0.1 created a blank krb5.ini in that directory
[Wed Jul 2 11:23:18 2014] Looks like Heimdal (master) looks at COMMON_APPDATA/Kerberos/krb5.conf and {WINDOWS}/krb5.ini
[Wed Jul 2 11:23:33 2014] CybrFyre: yes, the empty krb5.ini is as expected.
[Wed Jul 2 11:23:53 2014] hunh... the arrow next to the principal doesn't seem to consistently expand to show the list of service tickets
[Wed Jul 2 11:24:03 2014] double-clicking consistently works
[Wed Jul 2 11:24:12 2014] We used to ship a pile of [realms] entries, but in most places DNS works just fine, so we figured that things would work well enough with an empty file.
[Wed Jul 2 11:25:00 2014] Yeah, I remember some weirdness with that arrow. Double-clicking should work reliably. Also, if you're using the "classic" (?) theme on Win7, some bits are sometimes not displayed.
[Wed Jul 2 11:25:46 2014] ok... using it does not, indeed, auto-get me AFS tokens
[Wed Jul 2 11:26:38 2014] You mean the "get ticket" button in the "Ticket Manager.exe" does not get AFS tokens?
[Wed Jul 2 11:28:07 2014] (The behavior I stated is known/expected.)
[Wed Jul 2 11:28:11 2014] right
[Wed Jul 2 11:28:56 2014] hunh... on a fresh login, it seems to also be showing the LSA ticket cache tickets
[Wed Jul 2 11:29:09 2014] unless it copied those over to its default cache
[Wed Jul 2 11:29:41 2014] Ticket Manager.exe will attempt to display tickets in the LSA cache, as well as other tickets in (e.g.) an API cache.
[Wed Jul 2 11:29:59 2014] k. seems it may not always be successful at that, then
[Wed Jul 2 11:30:07 2014] Yeah.
[Wed Jul 2 11:30:19 2014] which could be really confusing to users
[Wed Jul 2 11:30:48 2014] ew... it seems to have broken AFS integrated login (KfW 4.0)
[Wed Jul 2 11:30:56 2014] IIRC, it has to actually retrieve the service ticket and/or TGT to know that that ticket is there, and it cannot retrieve the TGT itself due to mumble policy. However, it will fake up a TGT entry to display if it can fetch a service ticket.
[Wed Jul 2 11:31:12 2014] You're taking notes, right? ;)
[Wed Jul 2 11:31:53 2014] krb5.196
[Wed Jul 2 11:34:00 2014] ack... no Orca on this machine
[Wed Jul 2 11:38:15 2014] Internal credentials cache error
[Wed Jul 2 11:38:17 2014] hunh
[Wed Jul 2 11:39:18 2014] * runs a repair install of openafs for win
[Wed Jul 2 11:44:40 2014] nope... integrated login still not working... same error
[Wed Jul 2 11:44:44 2014] so, that's a huge problem
[Wed Jul 2 11:46:47 2014] ew... no maximize button
[Wed Jul 2 11:48:37 2014] I don't think I have ever played around with AFS integrated login. Please do send mail to kerberos@mit.edu.
[Wed Jul 2 11:59:19 2014] CybrFyre: I've been using: http://www.pantaray.com/msi_super_orca.html
[Wed Jul 2 12:03:36 2014] will hafta check it out
[Wed Jul 2 13:03:24 2014] k. just sent an email to Kerberos
[Wed Jul 2 13:04:50 2014] Thanks!
[Mon Jul 7 23:35:19 2014] Are there public logs of this channel available?
[Mon Jul 7 23:36:10 2014] not currently; I'm (well, a coworker is) working on it [Mon Jul 7 23:37:38 2014] (I kept logs while I was at CMU and a little thereafter until my account was disabled; my current employer is still working through internal red tape about providing external services) [Mon Jul 7 23:45:13 2014] geekosaur: thanks! [Wed Jul 9 10:04:14 2014] I have a weird situation on my root.cell volume. On one client, volume project is fine. On the other, it is timing out. I have tried flush, flushmount, and flushvol. What else can I try? vos examine project works fine. [Wed Jul 9 10:06:32 2014] fs checks says all servers are running. I'm a bit stumped. [Wed Jul 9 10:07:01 2014] Anything interesting on the network? firewall, different MTU, etc.? [Wed Jul 9 10:08:06 2014] Nope, they are all on an unmanaged switch [Wed Jul 9 10:08:35 2014] All of the fileservers and dbservers are on the same network. I was trying to check on the first server (li.rnd.ru.is) which is both a fileserver and dbserver [Wed Jul 9 10:08:57 2014] I restored from a backup into project.r, and that mounts without a problem [Wed Jul 9 10:11:28 2014] The plot thickens. I did a fs rmm project, and tried to rename and mount project.r there. It is now complaining that project exists and won't let me rmm or mkm it [Wed Jul 9 10:13:02 2014] Whoa. And I can't see it on a ls, but I can "ls project" and it lists the contents. There is something really weird going on. [Wed Jul 9 10:13:18 2014] This is all in /afs/.rnd.ru.is [Wed Jul 9 10:16:48 2014] Wow. That kept up until I deleted the old project volume. This is totally weird. [Wed Jul 9 10:23:33 2014] I have a broken mountpoint to a volume that might have gotten deleted. Is there a way to ask fs to tell me what volume used to be mounted there? [Wed Jul 9 10:24:11 2014] I would not expect there to be one. [Thu Jul 10 11:58:37 2014] Hello! 
[Thu Jul 10 12:14:10 2014] Hi [Sun Jul 13 11:37:23 2014] hola [Sun Jul 13 11:43:07 2014] hello [Sun Jul 13 11:43:19 2014] how are you [Mon Jul 14 04:01:43 2014] does anyone else have problems with afs-newcell on debian? [Mon Jul 14 04:01:55 2014] Could not fetch the list of partitions from the server [Mon Jul 14 04:02:05 2014] is what i get when i run that [Mon Jul 14 06:44:17 2014] vlserver: Ubik init failed: problems with host name is in the vllog [Mon Jul 14 06:55:15 2014] do you have some issues with DNS? [Fri Jul 18 17:34:38 2014] Anyone else at HOPE in NYC? [Fri Jul 18 17:39:36 2014] I was hoping to (no pun intended) but had other stuff going on this weekend. [Thu Jul 24 00:19:35 2014] Does inotify() work on the Linux AFS cache manager? [Thu Jul 24 00:19:49 2014] (In particular, can userland use inotify() to subscribe to callback breaks?) [Thu Jul 24 00:21:31 2014] no [Thu Jul 24 00:21:51 2014] inotify is GPL-only [Thu Jul 24 01:52:59 2014] secureendpoints: That stinks. :( [Thu Jul 24 12:32:47 2014] I had OpenAFS installed before on OSX. I deleted the .app and thought it was done, but I still have settings visible in System Preferences, and I get an AFSBackgrounder daemon using 20GB of real+virtual memory. How do I completely remove OpenAFS? [Thu Jul 24 12:43:39 2014] jfcaron: I think there's an uninstaller on the OpenAFS disk image [Thu Jul 24 12:44:53 2014] if you don't have it, simply download it from openafs.org [Thu Jul 24 12:45:21 2014] Aye, I tried that but it didn't seem to change anything. I think maybe my OpenAFS was for 10.7, but it was restored from a backup on my 10.9. [Thu Jul 24 12:45:37 2014] So I tried installing the 10.9 version of OpenAFS "over" whatever was there before, then running the uninstaller. [Thu Jul 24 12:45:48 2014] Seems to have done the trick... [Thu Jul 24 22:14:55 2014] Dumb question. 
'vos shadow' seems to allow arbitrary choice of start time for producing the replication stream, and it seems like the receiving server is basically content to deal with it. If that's true, why do 'vos release' operations begin by removing extant replicas? [Thu Jul 24 22:27:38 2014] not all vos release-s remove existing replicas; if it can determine that a diff will be smaller it sends an update instead. that said, vos release goes for safety/reliability, whereas I think vos shadow is more raw and leaves consistency issues up to the user [Thu Jul 24 22:28:08 2014] basically, if you're using vos shadow, you are assumed to know what you are doing --- because it can't check sanity via the VLDB [Thu Jul 24 22:29:17 2014] geekosaur: Well, in particular, if I interrupt a ' [Thu Jul 24 22:29:34 2014] 'vos release' it nukes the volume and goes from scratch next time. [Thu Jul 24 22:29:45 2014] At 10MB/sec, peak, a 1TB volume takes... "a while". [Thu Jul 24 22:30:17 2014] I'd be using -stayonline to mitigate this if it worked, but I am wondering why 'vos release' doesn't just start the send over from the same point in time it was going for last time. [Thu Jul 24 22:31:28 2014] because it can't be certain of what the remote server has. again, vos shadow leaves that for you to sort out; vos release goes for certainty [Thu Jul 24 22:31:48 2014] and the receiving server isn't so much "content to deal with it" as "blindly trusting" [Thu Jul 24 22:31:59 2014] you *can* screw up a volume badly by relying on it [Thu Jul 24 22:32:21 2014] (for vos shadow that is) [Thu Jul 24 22:32:48 2014] vos shadow has no safeties, because it can't have them; they rely on VLDB data that doesn't exist for shadow volumes [Thu Jul 24 22:33:14 2014] Well, OK, so my limited understanding is that the receiving end has a date stamp that marks the time the last snapshot was sent to it. 
I was guessing, having not looked at the code, that this would be updated only after a successful send, which would mean that the only difference between a normal 'release' and recovering from an interrupted 'release' would be the RO_DONTUSE flag in the VLDB. [Thu Jul 24 22:33:43 2014] you are assuming it receives an entire update and then applies it. what happens if it applies it as it receives it? [Thu Jul 24 22:34:35 2014] Then the timestamp continues to indicate the time of the prior snapshot and it will just pave over the changes again by blindly trusting the replication stream if it gets interrupted and resent? [Thu Jul 24 22:35:01 2014] Since, IIRC, the replication stream always includes the full list of inodes and directories and only the mutated file content? [Thu Jul 24 22:35:54 2014] my understanding is it's not quite that simple [Thu Jul 24 22:35:59 2014] I admit to being fuzzy on the details here, but I am really quite tired of seeing two days' worth of time get nuked and re-sent over and over and over. :( [Thu Jul 24 22:36:33 2014] I mean, I'll believe you, as you're certainly more up on the system than I am ( :) ) but I would still like to register my objections. =P [Thu Jul 24 22:36:41 2014] (admittedly my understanding is not very strong. but I got the strong impression from past discussions on -info that it is not particularly safe to rely on such updates) [Thu Jul 24 22:37:52 2014] Is there, by chance, an RPC for server-side checksumming operations on inodes? [Thu Jul 24 22:38:12 2014] in any case I would suggest you ask about this on -info or maybe the jabber channel [Thu Jul 24 22:38:32 2014] OK. I have not had luck getting MUC working, but I'll ask on -info. [Thu Jul 24 22:38:33 2014] Thanks. [Thu Jul 24 22:38:58 2014] but in general, as I said, vos release goes for safety and certainty, vs. vos shadow blindly trusting.
[Thu Jul 24 22:39:21 2014] one of the possible problems is that bugs or underlying storage anomalies do happen and your solution would interact poorly with them [Thu Jul 24 22:39:59 2014] considering that I've already worked a few tickets for SNA where problems with iSCSI-attached vice partitions led to nasty corruption, vos release would recover far better than vos shadow [Thu Jul 24 22:40:14 2014] (weren't you experimenting with such non-local block storage?) [Thu Jul 24 22:40:34 2014] (Yes and we stopped because [redacted] that noise.) [Thu Jul 24 22:41:04 2014] (Well, we're still doing remote RBDs but no longer cephfs, which was really the experimental thing. It bit us.) [Thu Jul 24 22:41:37 2014] (But in my case, too, I'm using replication to try to host data on not-the-ceph-back-servers in case ceph explodes again.) [Thu Jul 24 22:42:43 2014] If I add a "-dontnuke" flag to 'vos release' and nothing goes awry, would the patch be considered or summarily rejected because I'm crazy for wanting such a thing? =P [Thu Jul 24 22:43:39 2014] I expect it would be considered; you're far from the only one who would like faster releases. but it'd have to be pretty trustworthy; test it very well [Thu Jul 24 22:44:01 2014] (better than, say, -stayonline >.> ) [Thu Jul 24 22:44:43 2014] * once again reminded he needs to un-bitrot his stress test VMs... [Fri Jul 25 01:47:41 2014] ... which of afs/vldbint.h or afs/volser.h do I believe? They agree on "new repsite", "rw", "ro", and "backup" volumes, but then one says "rwrepl", "dontuse" while the other says "uuid", "dontuse", "rwrepl". [Fri Jul 25 01:50:05 2014] And both sets of constants seem to be in use in the codebase. [Fri Jul 25 03:17:21 2014] <[gorgo]> vldbint sounds like an interface to the vlserver, while volser.h is for the volserver [Sat Jul 26 03:36:02 2014] If somebody has a minute, could they explain why vos each is failing to build on OSX and Solaris11? 
http://buildbot.openafs.org:8010/builders/solaris11_x86-builder/builds/4471/steps/compile/logs/stdio ; the other error is similar and seems to be that the linker is not happy with me needing another symbol resolved? [Sat Jul 26 10:21:28 2014] Hmm, OSX+solaris might be a libtool symbol export list (.la.sym) not including the needed symbol. [Sat Jul 26 10:23:03 2014] Yeah, I would add xdr_nbulkentries to src/vlserver/liboafs_vldb.la.sym.in (keep it sorted!) [Sat Jul 26 10:24:48 2014] (nwf: ^^^) [Sat Jul 26 11:20:31 2014] Thanks! [Sat Jul 26 14:47:09 2014] What is src/libadmin? [Sat Jul 26 23:02:54 2014] src/libadmin is a set of libraries that were designed and used for GUI-based administrative tools. The functionality never had the full power of the command-line tools and has not been maintained during the lifetime of OpenAFS. [Sun Jul 27 00:37:14 2014] Is it kept around because of the requirements for the AFS name? Does anyone meaningfully use it? [Sun Jul 27 10:38:12 2014] it is kept around because there are sites that still use the GUI on Windows [Mon Jul 28 03:10:57 2014] What does the volser dying with "Fatal Rx error: rxi_SendPacketList, niovecs > 2" mean? [Tue Jul 29 20:37:38 2014] hello! is it possible to preallocate space for the AFS cache, so as to prevent out-of-disk from breaking AFS? [Tue Jul 29 21:19:34 2014] sbaugh: use a separate filesystem? [Tue Jul 29 21:19:54 2014] jackhill: any other way? :) [Tue Jul 29 21:20:26 2014] but okay, I guess I can just loop mount a preallocated disk image? [Tue Jul 29 21:24:30 2014] sbaugh: there are probably other ways (perhaps qgroups with btrfs), but I don't think AFS adds any complexities or solutions to this problem. [Tue Jul 29 21:24:47 2014] I don't know of any problems with a loopback mount. [Tue Jul 29 23:21:54 2014] sbaugh - do you not have a partition dedicated to the afs cache? 
[Tue Jul 29 23:28:34 2014] RedFyre: well, no [Tue Jul 29 23:30:54 2014] or just use a memcache instead [Wed Jul 30 11:36:02 2014] has anyone tried building and installing a 1.6.9 package for Mavericks? [Wed Jul 30 11:37:23 2014] I am under the impression building for >10.8 is pretty well broken [Wed Jul 30 11:38:17 2014] The source code should be okay; building an installer .dmg is ... challenging. [Wed Jul 30 12:53:25 2014] the installer builds just fine [Wed Jul 30 12:53:38 2014] but I'm seeing the same issue I was seeing with older installers on an older mac [Wed Jul 30 12:54:02 2014] I thought getting the tool(s) needed to make the installer was the issue. [Wed Jul 30 12:54:14 2014] "afsbackgrounder.app couldn't be moved to resources because an item with the same name already exists" [Wed Jul 30 13:46:21 2014] hmm... package installed on another computer running Mavericks (no afs previously installed) [Wed Jul 30 13:47:06 2014] tho, fs and aklog are both looking for /opt/local/lib/libkrb5.3.3.dylib (which is on the build machine) [Wed Jul 30 13:47:30 2014] somehow that lib path appears to be static [Wed Jul 30 13:49:38 2014] and on the build computer, an uninstall of 1.6.5.2 allowed 1.6.9 to install [Wed Jul 30 13:50:26 2014] There may have been some rra-c-util bugs in that area. Are you passing PATH_KRB5_CONFIG to configure? [Wed Jul 30 13:51:41 2014] I am not, no [Wed Jul 30 13:51:43 2014] should I be? [Wed Jul 30 13:52:03 2014] It probably depends on what you want. [Wed Jul 30 13:52:12 2014] If you want to use the base system's krb5, it might help. [Wed Jul 30 13:52:13 2014] I want a package that works on another mac [Wed Jul 30 13:52:17 2014] right [Wed Jul 30 13:52:40 2014] I am passing it because the base system's libgssapi_krb5 doesn't provide a useful gss_pseudo_random and is therefore useless for rxgk. You don't have that problem yet. [Wed Jul 30 13:53:31 2014] just set that as an environment variable? 
[Wed Jul 30 13:53:42 2014] I see that ./configure has --with-krb5 as an option [Wed Jul 30 13:54:10 2014] so, to use the base Kerberos, I want to set that to what... /Library/Frameworks ? [Wed Jul 30 13:54:29 2014] Environment, yes. I think either that or --with-krb5 should work, but, well, sometimes there are bugs. [Wed Jul 30 13:54:38 2014] No, set it to /usr/bin/krb5-config [Wed Jul 30 13:55:11 2014] ok, I see [Wed Jul 30 13:55:13 2014] The openafs build system doesn't know about the OS X frameworks, and I don't think we really care to learn about it. (If we did, we might get a pantsful gss_pseudo_random, though. I forget.) [Wed Jul 30 13:55:21 2014] yes, /opt/local/bin/krb5-config is first in my path [Wed Jul 30 13:56:53 2014] ok, looks like it grabbed the right krb5-config this time [Wed Jul 30 13:58:35 2014] thanks [Wed Jul 30 13:59:22 2014] no problem [Wed Jul 30 14:14:03 2014] cool... the 1.6.9 Mavericks package seems to work [Wed Jul 30 14:14:32 2014] "Have you tried with OS X 10.10?" [Wed Jul 30 14:18:51 2014] is that even out yet? [Wed Jul 30 14:19:02 2014] developer preview/public beta, I think. [Wed Jul 30 14:19:14 2014] either way, no :) [Wed Jul 30 15:37:09 2014] Mavericks .pkg for oafs 1.6.9 at: https://confluence.cornell.edu/display/CNF/Installing+AFS [Thu Jul 31 13:31:33 2014] another friggin kernel security update [Thu Jul 31 14:02:44 2014] 1.6.9 package for Lion now up at https://confluence.cornell.edu/display/CNF/Installing+AFS [Sat Aug 2 12:17:53 2014] hi folks [Sat Aug 2 12:19:31 2014] I have chosen not to do a kernel upgrade in gentoo for a while because they haven't done the patching to make openafs work. Is anyone here working on the problem? [Sat Aug 2 16:07:45 2014] johnfg_: the upstream openafs releases support the newer kernels. [Sat Aug 2 16:08:01 2014] The current ebuilds work with a minimal amount of work. 
[Sat Aug 2 16:08:46 2014] I don't think they are clean enough to be included in the main portage tree yet, but I can post what I'm using if you like [Sat Aug 2 16:09:27 2014] (I'm currently using 1.6.8_pre1). I think it is likely 1.6.9 would work with the same ebuild [Sat Aug 2 17:54:00 2014] johnfg: the ebuilds I'm using for 1.6.9 can be found at /afs/hcoop.net/user/j/ja/jackhill/public/gentoo-afs-ebuilds [Sat Aug 2 17:54:09 2014] They're based off of the ones for 1.6.5 in portage [Sat Aug 2 20:50:54 2014] jackhill: Sounds great. I'll check them out. What kernel are you running on gentoo? [Sat Aug 2 21:55:17 2014] johnfg_: 3.15.7-gentoo [Sat Aug 2 23:00:46 2014] jackhill: Thanks. I'm emerging the latest kernel, will configure, then see about using your ebuilds. Thanks again. [Sat Aug 2 23:01:12 2014] If I get stuck, may I ask you questions about your ebuild? [Sat Aug 2 23:44:25 2014] johnfg_: sure. Good luck! [Sun Aug 3 09:15:42 2014] jackhill: I'd had to do my own ebuilds of openafs and openafs-kernel before, but it's been a while. [Sun Aug 3 09:16:20 2014] So, do I just move your ebuilds to /usr/portage/net-fs/, then do an @world update? [Sun Aug 3 10:13:29 2014] johnfg: I've added them to a local overlay http://wiki.gentoo.org/wiki/Overlay/Local_overlay [Sun Aug 3 11:42:23 2014] jackhill: I'm getting ready for church, so don't have much time, but...i'd done that before, but was encouraged to put them in /usr/portage, maybe for stability. [Sun Aug 3 20:38:15 2014] 20:29 < a-t> http://bpaste.net/show/552627/ [Tue Aug 5 22:18:42 2014] um, wtf. finder took forever to start because it was looking up a bunch of filenames as cells [Tue Aug 5 22:20:10 2014] (I guess, either why is the OS X openafs client doing this, or why is finder or whatever looking for e.g. /afs/mach.kernel?)
[Tue Aug 5 22:31:37 2014] and for those who are wondering what I am talking about: http://lpaste.net/108884 during which my latest reboot was hanging at login [Tue Aug 5 22:31:46 2014] (that's just a snippet, there's a lot more) [Wed Aug 6 01:25:41 2014] So I have a really dumb question... anyone have a copy of the next_mach30 AFS client from yesteryears? [Wed Aug 6 02:01:50 2014] nwf: try emailing sipb@mit.edu, that might turn up some people [Wed Aug 6 02:01:59 2014] eichin: Thanks! [Wed Aug 6 10:20:38 2014] hi guys [Wed Aug 6 10:23:26 2014] jackhill: So, after moving your ebuild files to /usr/local/portage/net-fs/openafs and /usr/local/portage/net-fs/openafs-kernel, I added in /etc/make.conf: PORTDIR="/usr/local/portage" [Wed Aug 6 10:23:38 2014] What's my next step? [Wed Aug 6 15:18:27 2014] jackhill: You around now? [Wed Aug 6 15:18:57 2014] I think all I need help with now is how to apply the patch for kernel 3.16.0, then I'll be all set. [Wed Aug 6 15:23:44 2014] For me, the patch is: openafs.git-e284db57f94c8f97ed1c95dcd0bd9518d86c050c.patch. I just manually patched the src for now. [Wed Aug 6 17:24:44 2014] Forgive a dumb question, but: if I'm running dafs, is there any way to force the salvager to run short of logging in to the server and using salvageserver -client? There are some scary warnings around "bos salvage" and "dafs"... [Wed Aug 6 17:26:00 2014] "Well, there's bos exec..." [Wed Aug 6 17:26:00 2014] not currently, iirc [Wed Aug 6 17:26:19 2014] this is a known shortcoming and is being worked on [Thu Aug 7 09:45:22 2014] jackhill: All working well with openafs-1.6.9 built against kernel 3.16.0. Let me know if you want the patch that was needed. Thanks for your help and your ebuild! [Thu Aug 7 09:46:06 2014] johnfg: awesome!! (sorry I wasn't around yesterday). [Thu Aug 7 09:46:16 2014] I haven't yet looked at 3.16, so the patch would be great. 
[Thu Aug 7 09:48:50 2014] jackhill: May I just copy it to /afs/hcoop.net/user/j/ja/jackhill/public/gentoo-afs- [Thu Aug 7 09:48:54 2014] ebuilds [Thu Aug 7 09:48:58 2014] ? [Thu Aug 7 09:49:11 2014] Or where would you like it? [Thu Aug 7 09:51:12 2014] johnfg: well, you don't have write access there :) [Thu Aug 7 09:51:18 2014] can you email it to me? jackhill@jackhill.us [Thu Aug 7 09:51:44 2014] you bet! [Thu Aug 7 09:52:05 2014] thanks! [Tue Aug 12 06:19:49 2014] hrmm ... shouldn't it be possible to unmount the afs cache partition after stopping the afs client? [Tue Aug 12 06:21:47 2014] ... because somehow i can't. and this seems to lead to troubles on shutdown. ( debian sid, kernel 3.14, openafs client 1.6.9-1 ) [Tue Aug 12 07:35:56 2014] Worf: on GNU/Linux stopping the kernel module has traditionally been a dangerous operation; it was supposed to have been fixed some time ago. [Tue Aug 12 07:45:07 2014] Walex: uhm ... so, what is the recommended way to do a clean shutdown? :) [Tue Aug 12 07:46:40 2014] Worf: just do a shutdown. The dangerous operation is restarting AFS without a shutdown [Tue Aug 12 07:47:17 2014] Worf: if you have pending AFS clients or operations during shutdown they must die and drain. [Tue Aug 12 07:48:26 2014] Worf: note also that you should unmount '/afs' *before* stopping the AFS client. [Tue Aug 12 07:48:36 2014] Well, i now switched to memcache for some tests, to figure out if it's actually afs that keeps me from doing a shutdown ... [Tue Aug 12 07:49:01 2014] hmm ... will check the init script for that ... [Tue Aug 12 07:49:05 2014] I am moderately surprised that you are allowed to stop the client while '/afs' is mounted. [Tue Aug 12 07:50:28 2014] a quick look at the debian init script suggests that this is taken care of [Tue Aug 12 07:58:56 2014] doing some tests ... [Tue Aug 12 11:37:35 2014] https://lists.openafs.org/pipermail/openafs-devel/2014-July/019901.html ff. may be relevant here [Wed Aug 13 05:35:47 2014] hrmm ... 
if after booting a relatively freshly installed debian sid the afs client doesn't mount /afs, how could i find out what is wrong? ( i currently need to restart the afs client after boot manually ) [Wed Aug 13 05:37:07 2014] i kinda suspect that the init script is running a bit too early, like before the network is up [Wed Aug 13 09:22:07 2014] sid is going to be on systemd, now, right? [Wed Aug 13 09:22:26 2014] Your hypothesis seems plausible. [Wed Aug 13 12:47:08 2014] secureendpoints1: when looking at a VL.CreateEntryN request in wireshark, it looks like OpenAFS's vos command doesn't preclear the nvldbentry struct and various members of the struct have random stack data in them [Wed Aug 13 12:56:12 2014] I can't say that it would surprise me if that were the case. The Wireshark parsers are not 100% correct either. YFSI submitted a number of parser corrections to the Wireshark Foundation almost a year ago that to the best of my knowledge have not been included in a release. [Wed Aug 13 13:03:00 2014] do you have a wireshark bz for that? [Wed Aug 13 13:15:46 2014] I don't and Simon is on vacation [Wed Aug 13 16:47:40 2014] hello [Wed Aug 13 16:48:05 2014] has the rw-replication branch been abandoned at this point, or are there plans to pick that back up? [Wed Aug 13 16:49:06 2014] If I remember correctly, YFS supports read-write replication. I don't see there being manpower to create an independent implementation for openafs any time soon; the best hope is for YFSI to donate their implementation. [Wed Aug 13 16:49:48 2014] and that's a paid product, no ? [Wed Aug 13 16:49:53 2014] Correct. [Wed Aug 13 16:50:11 2014] (I am not affiliated with YFS or YFSI.) [Wed Aug 13 16:56:50 2014] anyone know of a way to build an accurate list of files in a vos dump stream as it dumps?
[Wed Aug 13 16:57:14 2014] as it stands the dump seems to list all files whether incremental or not [Wed Aug 13 17:32:50 2014] I have a vague memory of hearing that all file names are always included in an incremental dump even if their contents are unchanged, yes. [Wed Aug 13 17:33:25 2014] <[gorgo]_> actually not necessarily [Wed Aug 13 17:33:39 2014] <[gorgo]_> if you use -omitdirs, then the directory contents is not included for directories that haven't changed [Wed Aug 13 17:34:00 2014] any directory contained in a dump will be fully populated but not all directories must be included (if there is no change) [Wed Aug 13 17:48:09 2014] there's a way to distinguish in even regular dumps which files are actually being dumped and which ones are just getting the "i'm still here but haven't changed", but i don't remember what that is off hand. i've got code somewhere for that. [Wed Aug 13 21:28:13 2014] shane_: dumpscan (https://github.com/openafs-contrib/cmu-dumpscan) might be able to be informative? [Wed Aug 13 22:04:42 2014] nwf: yeah, ive looked at that before but afaik that has to be used after the fact [Wed Aug 13 22:55:20 2014] the omitdirs option looks promising, thank you [Wed Aug 13 22:55:30 2014] is that a new option, or when was that added? [Thu Aug 14 00:25:53 2014] vos dump -omitdirs was added in 2007. It requires RPC support on the volserver that was added at the same time. [Thu Aug 14 12:05:47 2014] Hello, I remember seeing that in order to use rxkad, you need to use 1.6.5 or newer. Is this just the case on the servers or do clients need to be 1.6.5 or newer as well? Thanks! [Thu Aug 14 12:11:12 2014] both [Thu Aug 14 12:11:57 2014] I think there is some miscommunication here; "rxkad" is the rx security class which has been available in AFS since before OpenAFS existed. 
[Thu Aug 14 12:12:34 2014] The 1.6.5 release introduced two related extensions, "rxkad-k5" and "rxkad-kdf", which reduce the dependency of openafs on the use of single-DES in the kerberos infrastructure it relies on. [Thu Aug 14 12:13:22 2014] The main security benefit, using non-DES keys for the long-term key of the AFS service principal, comes from rxkad-k5, which only requires server-side changes [in most cases; there are some exceptions]. [Thu Aug 14 12:14:18 2014] rxkad-kdf builds on top of rxkad-k5, and the main benefit of rxkad-kdf is that the kerberos infrastructure no longer needs to permit the use of single-DES keys [for openafs; there may still be other services requiring it]. [Thu Aug 14 12:23:14 2014] I didn't know rx was in there before. I am looking at eliminating DES. I just wasn't sure if that only mattered between the servers and the kdc or if the 1.6.5 changes mattered for the client communication as well. [Thu Aug 14 12:23:47 2014] (Well, 'rx' is something different (but related) from 'rxkad'.) [Thu Aug 14 12:24:16 2014] To eliminate DES from the KDC, you need to update all the AFS clients. [Thu Aug 14 14:01:05 2014] ok, thanks! [Thu Aug 14 16:31:36 2014] do reads to a file on a dot path always come from the rw volume, or is an RO preferred first by the client? [Thu Aug 14 16:31:43 2014] i thought i remembered reading that somewhere [Thu Aug 14 16:37:54 2014] shane_, the dot path is the rw path. once on the rw path the cache manager stays on the rw path until it crosses a cellular mount point (even if the target volume is in the same cell) [Thu Aug 14 16:38:25 2014] right, just wondering if the client would try to read from an RO first [Thu Aug 14 16:39:27 2014] if the mount point being evaluated is located within a .readonly, then a .readonly is preferred unless the mount point is an explicit rw mount point.
[Thu Aug 14 16:39:43 2014] the dot path is an explicit rw mount point [Thu Aug 14 16:47:35 2014] if the file is not found then will it try to get it from the rw ? [Thu Aug 14 16:48:02 2014] or only if the path is a mount point created with mkmount -rw or whatever [Thu Aug 14 16:48:10 2014] fs mkmount* [Thu Aug 14 16:49:46 2014] rw and readonly and backup volumes are distinct. there is no failover between volume types. [Thu Aug 14 16:50:03 2014] only failover between .readonly sites for .readonly volumes. [Thu Aug 14 16:50:28 2014] backup volumes can only exist on the same site {file server, partition} as the rw instance [Fri Aug 15 11:11:42 2014] Hi, I have a directory with many thousands of files with long names. Is there a tool to see how close I am to exhausting the number of file name slots for that directory object? [Fri Aug 15 11:15:31 2014] no. nor would such a tool provide a good answer. you can have free slots and still not be able to create a new file entry if there are insufficient contiguous slots to represent the name. [Fri Aug 15 11:17:04 2014] I believe Tom Keiser wrote some code to de-fragment directory contents before he left the community but it was never contributed to openafs [Fri Aug 15 11:18:53 2014] secureendpoints1: thanks [Fri Aug 15 13:21:03 2014] hi jackhill, there are some tools in https://github.com/openafs-contrib/afs-tools to at least analyze openafs directory objects. Tom's code to defrag wasn't complete (but i think i have it in a github branch) [Fri Aug 15 13:26:21 2014] meffie_: thanks, I'll take a look. [Sun Aug 17 16:29:25 2014] hello [Sun Aug 17 16:29:29 2014] :) [Sun Aug 17 16:32:44 2014] hi [Sun Aug 17 16:34:28 2014] hello [Mon Aug 18 10:54:59 2014] Anyone else getting a 503 error trying to connect to secure-endpoints? [Mon Aug 18 11:07:43 2014] yes. [Mon Aug 18 11:23:18 2014] I'm not sure who their webmaster is, have you mailed them? 
[Mon Aug 18 11:23:49 2014] Unrelated question: is there an easy way to customize the kerberos for windows and openafs for windows installers to have a default cell/realm setup? [Mon Aug 18 11:29:01 2014] http://www.openafs.org/dl/openafs/1.7.31/winxp/ReleaseNotes/html/index.html#chap_7.html ? [Mon Aug 18 11:29:53 2014] probably something similar for KfW; MSI transforms are pretty much the standard way to add local configuration to Windows installers [Mon Aug 18 11:32:02 2014] Ooh! Thanks. [Mon Aug 18 11:46:55 2014] Interesting. In the latest KFW 4.0.1, I can't find any property that looks like ATHENA.MIT.EDU. Where might I find the default realm? [Mon Aug 18 12:16:32 2014] There isn't one set in KfW 4.0.1, IIRC. We ship an empty C:\ApplicationData\MIT\Kerberos5\krb5.ini ; customizations should go in there. Unfortunately I don't think that MSI transforms have enough rope to modify it directly. [Mon Aug 18 12:17:31 2014] On the other hand, I don't think that it's a huge amount of effort to roll a custom installer (but I'm obviously biased, since I make the official ones and have the workflow and infrastructure in place). [Mon Aug 18 12:19:01 2014] Unfortunately, I am dealing with students with minimal understanding of software, so the more I can automate, the more I can foolproof. [Mon Aug 18 12:20:50 2014] Sure. We have a custom installer that MIT distributes that does set the default realm to ATHENA [Mon Aug 18 12:21:49 2014] Is that just a matter of adjusting the krb5.ini? i could have the nullsoft installer I wrote overwrite it afterwards. [Mon Aug 18 12:22:06 2014] That's just a matter of adjusting the krb5.ini, yes. [Mon Aug 18 12:22:12 2014] Also, "ew, nullsoft installer". [Mon Aug 18 12:22:14 2014] Or are there registry entries too? What tells KFW and Network Manager? [Mon Aug 18 12:22:24 2014] What kind of installer do you use? It happens to be what I could find easily.
[Mon Aug 18 12:22:50 2014] (And I agree that it is a pretty awful scripting language) [Mon Aug 18 12:22:53 2014] Well, it's really libkrb5 that is going to be caring most of the time. I don't know whether NIM will care and/or has separate config bits; I don't use NIM. [Mon Aug 18 12:23:06 2014] The official installer is built using the Windows installer toolkit, WiX. [Mon Aug 18 12:24:00 2014] I'll have to check out WiX when I have a moment. It certainly looks better supported than NSIS. [Mon Aug 18 12:24:11 2014] (Also we may want to move over to the other channel. I don't know if anybody cares.) [Mon Aug 18 12:24:35 2014] src/windows/README documents setting up a build environment and the procedure for building an installer. [Mon Aug 18 12:24:53 2014] Good point. My reason for posting here was originally to get AFS/Kerberos customizations, which seemed somewhat relevant. [Mon Aug 18 12:25:10 2014] Do note that "in the command-line path" therein usually means you need to edit the PATH manually; it's not really called out explicitly. [Mon Aug 18 12:26:41 2014] What path are you referring to for this README? [Mon Aug 18 12:28:02 2014] https://github.com/krb5/krb5/blob/master/src/windows/README [Mon Aug 18 12:29:49 2014] Ah! Useful. [Tue Aug 19 04:23:09 2014] For using cygwin with openafs, it recommends mounting /afs as a drive letter. However when I do that I get an error 0x00000035. Any idea what that means? [Tue Aug 19 04:27:04 2014] hm, no [Tue Aug 19 04:27:10 2014] but which OpenAFS version? [Tue Aug 19 04:28:08 2014] and which cygwin? [Tue Aug 19 04:28:47 2014] I do have openafs 1.7.31 and cygwin64 and I do have a mapped drive on Z:\ for my OpenAFS and I can reach it in cygwin under /cygdrive/z with no error [Tue Aug 19 04:29:08 2014] I'm apparently stupid. Where do I find the version in the AFS Client Configuration?
[Tue Aug 19 04:29:54 2014] usually just hover over the small lock icon in the lower right area of Windows ;-) [Tue Aug 19 04:30:03 2014] This is 1.7.1700 (found it in Programs and Features) [Tue Aug 19 04:30:15 2014] ah, you should update nevertheless [Tue Aug 19 04:30:35 2014] and you do have a mapped network drive? [Tue Aug 19 04:30:44 2014] which is accessible from explorer? [Tue Aug 19 04:30:51 2014] I will do that and try again. I do have mapped network drives but for windows shares. [Tue Aug 19 04:31:06 2014] I did map \\AFS\cgv.tugraz.at to Z:\ [Tue Aug 19 04:31:13 2014] which is flawless for me [Tue Aug 19 04:31:28 2014] I want to do something similar, which is why I'm confused. The posting that recommended this is at least as old as my client. [Tue Aug 19 04:31:36 2014] and maybe you can update your cygwin installation, too. There should be updates, hehe [Tue Aug 19 04:31:44 2014] Of course. [Tue Aug 19 04:31:50 2014] yeah, the docs are quite old [Tue Aug 19 04:32:04 2014] but if you can reach the mapped drive in explorer, cygwin should reach it, too [Tue Aug 19 04:37:09 2014] take care, OpenAFS updates require a reboot [Tue Aug 19 10:16:12 2014] Cygwin 1.7.19 and later are AFS-aware. http://lists.openafs.org/pipermail/openafs-info/2013-June/039594.html [Tue Aug 19 10:18:02 2014] Cygwin does not require a drive letter mapping. You can directly access //afs/cgv.tugraz.at/ [Tue Aug 19 12:53:44 2014] That's odd. I have the latest cygwin installed, \\AFS\ exists, but I can't cd into /afs [Tue Aug 19 12:57:08 2014] UNC paths are accessed with two forward slashes, not one [Tue Aug 19 12:57:17 2014] Ohhhh! [Tue Aug 19 12:57:19 2014] and //afs is not a directory [Tue Aug 19 12:57:21 2014] I feel silly. [Tue Aug 19 12:57:47 2014] I was able to cd into //afs/rnd.ru.is [Tue Aug 19 13:28:47 2014] I just installed KfW 4.0.1 and afs broke. Am I just confused or is there no AFS support in this version? 
http://web.mit.edu/kerberos/kfw-4.0/kfw-4.0.html [Tue Aug 19 13:29:40 2014] You are not confused. [Tue Aug 19 13:30:12 2014] (I guess I probably should have mentioned this the last time we were talking, since this is the openafs channel. Sorry about that.) [Tue Aug 19 13:33:34 2014] I would be interested in hearing more about how AFS was configured previously such that it "broke", and how the breakage manifested itself, though. [Tue Aug 19 13:38:11 2014] Wow. How is this OK? [Tue Aug 19 13:38:33 2014] "what is 'this'?" [Tue Aug 19 13:38:36 2014] tell me what is broken. [Tue Aug 19 13:38:39 2014] I should have checked before I installed it. I would never have guessed. [Tue Aug 19 13:38:56 2014] If you install KfW 4.0.1, Network Identity Manager refuses to get AFS tokens. [Tue Aug 19 13:39:17 2014] Thank you. [Tue Aug 19 13:39:56 2014] This is on a Windows7 64 bit machine. I think I installed the latest Openafs. [Tue Aug 19 13:39:59 2014] And latest NIM [Tue Aug 19 13:40:20 2014] I believe that the design of KfW 4.0 did not involve running it alongside NIM. (But I wasn't there for those design discussions, so I am not sure.) [Tue Aug 19 13:41:11 2014] KfW 4.x is using the CCAPIv3, but KfW 3.x (and NIM, IIUC) are using the CCAPIv2 for fetching/storing kerberos tickets. [Tue Aug 19 13:41:43 2014] In order for the functionality you want to be present with NIM and KfW 4.x, "someone" would need to teach NIM about the CCAPIv3. [Tue Aug 19 13:44:02 2014] ('aklog' should still work fine. Yes, I realize this is not a good solution.) [Tue Aug 19 13:46:24 2014] Great, but not useful to my students at the moment. 
[Tue Aug 19 14:44:34 2014] foley: I ended up using Heimdal Kerberos on Windows x64 systems [Tue Aug 19 14:45:02 2014] https://www.secure-endpoints.com/heimdal/#download [Tue Aug 19 14:46:02 2014] I am still using the 1.5.1 version though [Wed Aug 20 02:46:28 2014] kaduk_: also network ID manager with krb5 3.2.2 is still working fine on windows x64 [Wed Aug 20 02:46:37 2014] no need for newer versions [Wed Aug 20 09:31:06 2014] Amiga4000: I take it you don't have any windows 8 machines, then? Anyway, KfW is based off of krb5 1.6, which has been EOL for several years now. [Wed Aug 20 09:35:15 2014] Er, KfW 3.2.2* [Wed Aug 20 09:38:15 2014] I do have windows 8.1 running with KFW 3.2.2 in 64bit [Wed Aug 20 09:38:31 2014] as it just works flawlessly [Wed Aug 20 09:39:30 2014] even windows server 2012/2012R2 does work with it^^ [Wed Aug 20 09:41:08 2014] but as we only do OpenAFS with it, we do not need much functionality [Wed Aug 20 11:52:52 2014] hola [Wed Aug 20 11:55:52 2014] ui [Wed Aug 20 11:55:58 2014] jinh [Wed Aug 20 11:55:58 2014] hnijnm,. [Wed Aug 20 11:55:59 2014] hnm [Wed Aug 20 11:55:59 2014] jm{ [Wed Aug 20 11:55:59 2014] bnjmklñ{ [Wed Aug 20 11:55:59 2014] kl,ñ{ [Wed Aug 20 11:55:59 2014] ñ{ [Wed Aug 20 11:56:21 2014] curious [Wed Aug 20 11:56:36 2014] what language might that be? [Wed Aug 20 14:05:16 2014] * thinks it looks like someone was cleaning their keyboard... [Wed Aug 20 17:10:54 2014] wikigazer: spanish and then a cat? [Thu Aug 21 10:28:24 2014] is there a max number of either mount points or symlinks in a volume, these days (well, a small enough number that I care)?
[Thu Aug 21 14:22:45 2014] RedFyre: i would think that the normal files per directory limit applies [Thu Aug 21 14:23:51 2014] including the name length stuff [Thu Aug 21 14:25:20 2014] my understanding is that that is a per-filesystem limit [Thu Aug 21 14:25:33 2014] tho, I guess I'm really talking about either mount points or symlinks [Thu Aug 21 14:25:58 2014] what's the AFS release for your fileserver? [Thu Aug 21 14:26:11 2014] 1.6.9, I believe [Thu Aug 21 14:26:33 2014] then you are certainly running namei - stand by for limits [Thu Aug 21 14:27:40 2014] max vnodes per (namei) volume = 2^26 = 67108864 [Thu Aug 21 14:27:56 2014] half may be "small" (regular files) [Thu Aug 21 14:28:08 2014] half may be "large" (directories) [Thu Aug 21 14:28:18 2014] what about mount points or symlinks? [Thu Aug 21 14:28:23 2014] hold on [Thu Aug 21 14:28:28 2014] (afs mount points, that is) [Thu Aug 21 14:28:48 2014] symlinks are included in the "small" category [Thu Aug 21 14:29:00 2014] and mountpoints are implemented as symlinks in AFS [Thu Aug 21 14:29:18 2014] ok, that's still a limit I'm never going to reach (at least not in my lifetime) [Thu Aug 21 14:29:26 2014] yup [Thu Aug 21 14:29:31 2014] if it was a couple thousand, I'd be worried [Thu Aug 21 14:30:53 2014] but remember all your regular files are also included in that limit [Thu Aug 21 14:31:09 2014] as long as that is per volume, no problem [Thu Aug 21 14:31:22 2014] if you have a volume with close to that many files, and try to add a few more mountpoints, you will not be able to [Thu Aug 21 14:31:34 2014] I hope I don't :) [Thu Aug 21 14:31:35 2014] it's the total that matters [Thu Aug 21 14:36:36 2014] mvitale1: in case he's running linux, it's always namei [Thu Aug 21 14:37:14 2014] yeah, I can never remember where inode is still possible or common [Thu Aug 21 14:37:25 2014] afaik no and no [Thu Aug 21 14:38:43 2014] namei is recommended for all new installations; solaris and a few other platforms still support
inode, but it's mostly intended for backward compatibility with old cells [Thu Aug 21 14:40:08 2014] regardless, the limits are much higher for inode, so they are moot if you will never even exceed the lower namei limits [Thu Aug 21 14:48:58 2014] well, the servers were 1.4 when initially installed... [Thu Aug 21 14:49:47 2014] tho, data's been moved on and off a few times w.r.t. upgrades [Fri Aug 22 11:07:36 2014] <_isildur> nwf! fancy seeing you here [Fri Aug 22 11:13:45 2014] _isildur: Oh hi! [Fri Aug 22 11:15:07 2014] <_isildur> heya [Fri Aug 22 11:15:11 2014] <_isildur> hows it going? [Fri Aug 22 11:15:45 2014] <_isildur> i popped in here to ask a noobish question, & saw you [Fri Aug 22 11:15:47 2014] <_isildur> been ages [Fri Aug 22 11:17:46 2014] It has! Among other changes in the interim, I'm running an AFS cell with actual users. Admittedly, you can count them in unary on both hands, but still. :) [Fri Aug 22 11:18:53 2014] <_isildur> lol [Fri Aug 22 11:19:24 2014] <_isildur> i'm bringing up a cell right now.. running into some unexpected behaviour [Fri Aug 22 11:19:53 2014] <_isildur> still holding off on asking a noobish question til i'm more sure i need to though haha [Fri Aug 22 11:21:11 2014] Well, if you want you can ask in PM, but lord knows I've asked enough noobish questions of the channel and they haven't kicked me out yet... :) [Fri Aug 22 11:21:37 2014] <_isildur> hahaha [Fri Aug 22 11:21:59 2014] <_isildur> so the people i'm doing this for were very keen, shall we say even enthusiastic, about using a thing called freeIPA [Fri Aug 22 11:22:54 2014] <_isildur> which as best i can tell is an evil merging of kerb5 and ldap, wherein the guts of the kdc have been ripped out and replaced with ldap.. it's kind of evil. it seems the main 'advantage' is web based management, which thank the gods can be avoided and command line tools used..
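To put the namei numbers quoted above in perspective, here is a minimal C sketch of the per-volume vnode budget. It assumes, as stated in the exchange, 2^26 vnode numbers per namei volume split evenly between "large" vnodes (directories) and "small" vnodes (regular files, symlinks and therefore AFS mount points). The odd/even assignment of vnode numbers to the two classes, and every name in the block (NAMEI_VNODE_BITS, namei_max_vnodes, vnode_is_large), are illustrative assumptions rather than OpenAFS identifiers.

```c
#include <stdint.h>

/* Assumption: a namei fileserver encodes vnode numbers in 26 bits. */
#define NAMEI_VNODE_BITS 26

/* Total vnode numbers available in one namei volume: 2^26. */
static uint32_t namei_max_vnodes(void)
{
    return UINT32_C(1) << NAMEI_VNODE_BITS;
}

/* Each class ("large" = directories, "small" = regular files, symlinks
 * and AFS mount points) gets half of the total. */
static uint32_t namei_max_vnodes_per_class(void)
{
    return namei_max_vnodes() / 2;
}

/* Assumption: vnode numbers alternate between the classes, with odd
 * numbers (starting at the volume root, vnode 1) being large and even
 * numbers small. Returns 1 for a large vnode, 0 for a small one. */
static int vnode_is_large(uint32_t vnode)
{
    return (vnode & 1u) == 1u;
}
```

In this model namei_max_vnodes() is 67108864 and each class gets 33554432, shared by all files, symlinks and mount points in the volume, which is why "it's the total that matters".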
[Fri Aug 22 11:23:26 2014] <_isildur> now, bringing up an afs cell with _this_ as the kerberos service, i dont know how much of my grief is due to freeipa and how much of it is due to me doing something wrong [Fri Aug 22 11:24:45 2014] Oh man; sounds like samba4. [Fri Aug 22 11:24:50 2014] <_isildur> lol [Fri Aug 22 11:24:52 2014] _isildur: if you can still run kadmin or an equivalent to generate keytabs, you should be fine. Presumably the clients themselves still use MIT or Heimdal? [Fri Aug 22 11:24:54 2014] In any case, can you kinit against it? [Fri Aug 22 11:25:13 2014] <_isildur> cclausen: kadmin for basically read-only tasks, yeah [Fri Aug 22 11:25:22 2014] <_isildur> oh yes, i can get tickets and get tokens too [Fri Aug 22 11:25:36 2014] <_isildur> i'm just at the stage where i am setting up the first volume - [Fri Aug 22 11:25:59 2014] Probably not IPA-related, then [Fri Aug 22 11:26:07 2014] <_isildur> i've added a few users to system:administrators, brought the servers back with noauth turned off, and then, pfft, tokens look fine but i can't even pts mem myself [Fri Aug 22 11:26:19 2014] pts mem -noauth [Fri Aug 22 11:26:50 2014] (n.b. that it has not been necessary to use -noauth when setting up a cell for quite some time) [Fri Aug 22 11:27:12 2014] <_isildur> kaduk: i ran into some issues with name formatting and encryption type in previous rounds of this install yesterday, was wondering if any of what was needed to make things work at earlier stages is now non-working [Fri Aug 22 11:27:12 2014] and yeah, if you have a working KeyFile, you can use -localauth [Fri Aug 22 11:27:41 2014] <_isildur> localauth does work [Fri Aug 22 11:27:56 2014] Does the cell name match the realm name?
[Fri Aug 22 11:28:03 2014] <_isildur> no [Fri Aug 22 11:28:16 2014] <_isildur> realm name is foo.bar, cell name is fnord.foo.bar [Fri Aug 22 11:29:00 2014] The servers need a krb.conf to indicate that [Fri Aug 22 11:29:17 2014] And, maybe I'm interested in seeing the ~getprinc output for the afs service principal. [Fri Aug 22 11:31:09 2014] <_isildur> sure [Fri Aug 22 11:31:13 2014] <_isildur> paste it here or in a pm? [Fri Aug 22 11:31:40 2014] If it's short, here is fine. Long things we generally put on pastebin or similar [Fri Aug 22 11:32:44 2014] <_isildur> http://www.vaxpower.org/~isildur/afsdebug.txt [Fri Aug 22 11:33:13 2014] <_isildur> names slightly changed to protect the innocent (it's my employer's, so foo.bar of course is not foo.bar) [Fri Aug 22 11:33:26 2014] Er, zh.foo.bar != fnord.foo.bar. Oh, were they just redacted differently? [Fri Aug 22 11:33:44 2014] <_isildur> fnord was me just making it up as i went [Fri Aug 22 11:33:51 2014] <_isildur> fnord being a variable [Fri Aug 22 11:33:53 2014] *nods* [Fri Aug 22 11:34:02 2014] What version of openafs on the servers? [Fri Aug 22 11:34:03 2014] <_isildur> zh being part of the real name not changed [Fri Aug 22 11:34:33 2014] <_isildur> 1.6.9, whatever was latest stable as of, well, yesterday [Fri Aug 22 11:34:51 2014] <_isildur> machine is running centos 6.5 [Fri Aug 22 11:35:11 2014] <_isildur> in case it makes any difference [Fri Aug 22 11:35:19 2014] <_isildur> (at this point i doubt it) [Fri Aug 22 11:36:28 2014] So, that's plenty new to allow you to use an AES key for the long-term key of the AFS service principal, which will probably reduce some of the IPA-induced headaches. (Replacing them with different ones, of course.) 
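Why the realm/cell mismatch matters can be sketched in a few lines of C. This is a simplified model, not ptserver code: it assumes that without a krb.conf the only local realm is the upper-cased cell name, that krb.conf lists the local realm names separated by whitespace, and that a caller from a non-local realm ends up as the anonymous PTS id 32766 (a real server may instead look up a foreign user@realm entry). FOO.BAR and fnord.foo.bar follow the redacted names used in the conversation; the function names and the user_id input are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* The anonymous PTS id. */
#define PTS_ANONYMOUS_ID 32766

/* Assumed default when there is no krb.conf: the cell name, upper-cased. */
static int matches_default_realm(const char *realm, const char *cell)
{
    size_t i;
    for (i = 0; cell[i] != '\0'; i++)
        if (realm[i] != (char)toupper((unsigned char)cell[i]))
            return 0; /* also stops safely if realm is shorter */
    return realm[i] == '\0';
}

/* Returns 1 if `realm` is local. `krbconf` holds the krb.conf contents
 * (treated here as whitespace-separated realm names), or NULL if the
 * file does not exist. Realm names are compared case-sensitively. */
static int realm_is_local(const char *realm, const char *krbconf, const char *cell)
{
    const char *p = krbconf;
    size_t rlen = strlen(realm);

    if (krbconf == NULL)
        return matches_default_realm(realm, cell);
    while (*p != '\0') {
        p += strspn(p, " \t\r\n");          /* skip separators */
        size_t n = strcspn(p, " \t\r\n");   /* length of next realm name */
        if (n == 0)
            break;
        if (n == rlen && strncmp(p, realm, n) == 0)
            return 1;
        p += n;
    }
    return 0;
}

/* Simplified: the id the server acts on for an authenticated caller.
 * A caller from a non-local realm is modelled as anonymous. */
static int effective_caller_id(int user_id, const char *realm,
                               const char *krbconf, const char *cell)
{
    return realm_is_local(realm, krbconf, cell) ? user_id : PTS_ANONYMOUS_ID;
}
```

With realm FOO.BAR and cell fnord.foo.bar, the assumed default local realm is FNORD.FOO.BAR, so the token is not recognized as local; listing FOO.BAR in krb.conf (and restarting the servers) flips the result.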
[Fri Aug 22 11:36:41 2014] <_isildur> hrm [Fri Aug 22 11:37:00 2014] <_isildur> indeed ive made these all old single des [Fri Aug 22 11:37:07 2014] OPENAFS-SA-2013-003 on http://openafs.org/security has links to further reading [Fri Aug 22 11:37:32 2014] <_isildur> could encryption types be screwing this up? [Fri Aug 22 11:37:50 2014] But, the short form is: leave the keys as a krb5 keytab named rxkad.keytab in the directory where the KeyFile was, and remove the KeyFile (after restarting servers). [Fri Aug 22 11:38:19 2014] I would not expect encryption types to be causing the errors you're seeing, exactly. [Fri Aug 22 11:39:41 2014] <_isildur> nod [Fri Aug 22 11:42:27 2014] [22 15:37] <_isildur> indeed ive made these all old single des [Fri Aug 22 11:42:58 2014] I don't think rxkad.keytab is ever used for single des keys, those are always looked up in KeyFile as I understand it. if that is relevant (still catching up) [Fri Aug 22 11:43:46 2014] the obscuring of names does make things a bit harder, you need to make sure everything matches or have a krb.conf file in the server config telling it which realm(s) map to the local cell [Fri Aug 22 11:43:53 2014] I think it is not relevant at the moment, but worth mentioning nonetheless [Fri Aug 22 11:44:38 2014] <_isildur> i added tickets/tokens to the http://www.vaxpower.org/~isildur/afsdebug.txt text [Fri Aug 22 11:45:41 2014] Yeah, I would double-check the krb.conf in the AFS server config directory, and then bump the debug levels of the fileserver and ptserver and look at the logs [Fri Aug 22 11:46:39 2014] <_isildur> will do [Fri Aug 22 11:54:42 2014] (Server processes should need a restart to pick up any krb.conf changes, IIRC.) [Fri Aug 22 11:59:24 2014] <_isildur> this is Fri Aug 22 17:59:00 2014 PTS_ListElements: code 267269 cid 32766 aid 3 [Fri Aug 22 11:59:41 2014] <_isildur> er, that is me failing to pts mem myself [Fri Aug 22 11:59:48 2014] 32766 is anonymous; your client has not authenticated. 
"But we knew that already." [Fri Aug 22 11:59:55 2014] <_isildur> indeed [Fri Aug 22 12:05:15 2014] <_isildur> hm, stracing pts i also see this, [Fri Aug 22 12:05:16 2014] <_isildur> open("/usr/vice/etc/KeyFile", O_RDONLY) = -1 ENOENT (No such file or directory) [Fri Aug 22 12:09:24 2014] oh [Fri Aug 22 12:10:04 2014] hm, in 1.6.5 that could cause problems, I think later versions don't care any more (and shouldn't if you aren't using DES) [Fri Aug 22 12:12:52 2014] <_isildur> i am using DES [Fri Aug 22 12:13:49 2014] <_isildur> so, i linked that to /etc/afs.keytab which i would assume would contain the keys needed.. theres a /usr/afs/etc/KeyFile which seems to be more what it's looking for, but i dont know what's in there (and klist doesnt like its format) [Fri Aug 22 12:14:17 2014] The file formats of an AFS KeyFile and a krb5 keytab are very different. [Fri Aug 22 12:15:56 2014] <_isildur> yes, i was just figuring that out too :) [Fri Aug 22 12:16:36 2014] <_isildur> still pointing /usr/vice/etc/KeyFile to /usr/afs/etc/KeyFile, which does contain _something_, also does not change the behavior [Fri Aug 22 12:16:55 2014] <_isildur> (pointing it to /etc/afs.keytab does break it in a more basic way since it cant even read it) [Fri Aug 22 12:17:13 2014] <_isildur> so yeah, that file missing is not an issue here [Fri Aug 22 12:17:27 2014] I would wonder why it's looking for /usr/vice/etc/KeyFile though. sounds like something is misconfigured, unless this is some backward compatibility thing [Fri Aug 22 12:17:56 2014] <_isildur> dunno [Fri Aug 22 12:18:43 2014] The only idea I came up with was some afsconf library routine just checking what's available, but I don't think afsconf is that clever on the 1.6 branch. [Fri Aug 22 12:19:20 2014] hm.
I don't think I've tried it in 1.6 but in 1.4 it certainly checked a bunch of places according to strace [Fri Aug 22 12:19:59 2014] <_isildur> it doesnt seem to change its behaviour based on what it finds or doesnt find [Fri Aug 22 12:20:47 2014] the "pts" command is the client. It doesn't require a KeyFile or a keytab unless -localauth is specified and you have indicated that "pts mem -localauth" is working. That implies that the key is correctly being obtained by the ptserver. [Fri Aug 22 12:21:44 2014] <_isildur> yes [Fri Aug 22 12:22:45 2014] The problem is going to be the interpretation of the token. Since the Kerberos realm name and the cell name are different, there needs to be a krb.conf file, http://docs.openafs.org/Reference/5/krb.conf.html, which contains the name of the realm to use for local authentication. [Fri Aug 22 12:23:06 2014] I believe that was mentioned three times already. [Fri Aug 22 12:23:46 2014] A failure to properly identify the local authentication realm will produce the behavior you are seeing. Namely, all authentication that is not -localauth will be processed as the anonymous user. [Fri Aug 22 12:24:34 2014] kaduk_, it has been mentioned but there has been no indication that such a file was created, what is in it, where it was put, or that the server logs indicate that it was processed. [Fri Aug 22 12:24:38 2014] <_isildur> and... krb.conf fixed it [Fri Aug 22 12:26:03 2014] glad to hear it. [Fri Aug 22 12:26:17 2014] <_isildur> many many thanks to all of you guys for helping me with this [Mon Aug 25 13:34:37 2014] * curses at Outlook for repeatedly self-corrupting pst/ost files [Mon Aug 25 16:57:02 2014] Hey all, I just got done installing openafs on fedora 20 and I am getting " [Mon Aug 25 16:58:02 2014] afsop_inodecache not configure" when I try to start it.
Googling it, I saw a post about an old version of afsd, but that's not the case here; it is a fresh install of fedora20 [Mon Aug 25 16:59:25 2014] There was an issue recently on some linux systems where the dependencies had not quite been done properly and the openafs kernel module needed to be rebuilt against the current kernel headers. [Mon Aug 25 16:59:37 2014] I don't really remember the details, but suspect it was a debian issue, and not a fedora one. [Mon Aug 25 17:01:35 2014] I did have some issue building the kernel module but that's because the system looks like it installed with a +debug version of the kernel source and the build didn't see the headers in /usr/include. I installed kernel-devel and then it built. [Mon Aug 25 17:03:32 2014] the kernel module is installed as openafs.ko and lsmod should libafs [Mon Aug 25 17:03:50 2014] showed* [Mon Aug 25 17:16:48 2014] fun, when I did rmmod libafs, the machine kernel panicked [Mon Aug 25 17:18:54 2014] and upon rebooting, it worked... [Mon Aug 25 18:50:22 2014] Was there a "recent"-ish change (1.6.9?) in how UID-based PAGs work? I seem to have been struck stupid as I was sure that 'AKLOG="su ${USER} -c aklog" k5start -t -f ...' used to be enough to ensure that all processes I spawned on that machine as ${USER} got AFS tokens. [Mon Aug 25 19:10:16 2014] Which distro? (I don't remember anything offhand.) [Mon Aug 25 19:13:40 2014] Debian. Now that I think about it, it might have been "any time in 1.6.1 to 1.6.9" as it might have coincided with a jump from "wheezy" to "jessie". (I hate distro codenames, but that's another conversation.) [Mon Aug 25 19:14:50 2014] I am also a little confused as my session right now has tokens ("tokens" says so) but no _pag keys to be found ("keyctl show" in all its myriad invocations -- @u, @us, @s, %:_ses.1000 -- is coming up void. My Kerberos tickets are there, but no AFS tokens...) [Mon Aug 25 19:38:07 2014] Do you have the grouplist entries?
[Mon Aug 25 20:15:13 2014] No; "groups" and "id -a" do not report any AFS PAG GIDs. [Mon Aug 25 21:31:49 2014] you can have tokens without pags. creating a pag is not a side effect of getting tokens [Mon Aug 25 21:35:38 2014] secureendpoints: it sounds like those ubik patches that just hit gerrit were fun to come up with... [Mon Aug 25 21:40:10 2014] the impacted YFS vlserver was in pretty bad shape. ~4800 RPCs were being serviced and couldn't make any progress. [Mon Aug 25 21:41:12 2014] wow. [Mon Aug 25 21:41:40 2014] We designed the servers to scale :) [Mon Aug 25 21:41:50 2014] :) [Tue Aug 26 10:47:38 2014] secureendpoints: where is the token information stashed if not in a PAG structure? [Tue Aug 26 10:48:03 2014] In a per-UID structure [Wed Aug 27 00:25:15 2014] D'oh; I now understand the PAG woes I was having before. The new setup had libpam-afs-session installed, so "su" was landing everything in its own PAG. [Wed Aug 27 00:26:25 2014] Is there any way to transfer keyring-based PAG membership between independent processes? (e.g. can I "invite" someone in to my PAG over a UNIX socket or something?) [Wed Aug 27 06:51:38 2014] Got a new strange behavior on the openafs 32 bit tools installer. If I put it into a network share for others to install, it refuses to run on a 64 bit machine. Copy it locally, and it installs fine. Ideas? [Wed Aug 27 06:53:49 2014] OK. I take it back. Yesterday it refused to run, it is running on my Windows7 64 bit machine fine today. On the Windows8 machine, it refused to run. [Wed Aug 27 07:18:24 2014] I know why I have a USB key with all needed software always along with me [Thu Aug 28 18:06:11 2014] good evening [Thu Aug 28 18:06:35 2014] I was wondering, how weak the weak consistency can be -- i.e. can I use a notebook, disconnected from the server to access files and synchronise later? 
[Fri Aug 29 02:54:24 2014] telmich: in offline mode that is possible, within some limits [Fri Aug 29 03:03:37 2014] Amiga4000: do you know the right docs to read for this? I am very much interested in using afs for our company / notebooks [Fri Aug 29 03:04:33 2014] that feature is not production ready [Fri Aug 29 03:04:47 2014] but I do use my home roaming profile on windows on OpenAS [Fri Aug 29 03:04:51 2014] OpenAFS [Fri Aug 29 03:05:13 2014] basically: on login, it fetches the profile from OpenAFS, and on logoff, writes it back; everything else is local [Fri Aug 29 03:06:27 2014] that suits me quite well [Fri Aug 29 03:06:36 2014] but the biggest problem is the profile size [Fri Aug 29 03:06:44 2014] and windows itself [Fri Aug 29 03:06:45 2014] :-( [Fri Aug 29 03:49:19 2014] Amiga4000: just wondering [Fri Aug 29 03:49:34 2014] according to the docs there is a cache manager that will cache the open files [Fri Aug 29 03:49:51 2014] can I not tell it to fetch / cache some directories and disconnect afterwards? [Fri Aug 29 03:50:53 2014] the cache does work on blobs of 256/512/1024/2048 kbyte size, not on whole files [Fri Aug 29 03:51:29 2014] you can get the files ahead, get them in cache and set disconnected mode and OpenAFS tries to work from cache.
But that has a lot of issues [Fri Aug 29 03:51:54 2014] https://lists.openafs.org/pipermail/openafs-info/2009-February/030817.html [Fri Aug 29 03:52:24 2014] disconnected mode in OpenAFS is something very very experimental [Fri Aug 29 03:53:41 2014] also the 3-way merge problem is not really solved at all [Fri Aug 29 03:58:27 2014] basically: disconnected mode works for files (and paths) that are in the cache [Fri Aug 29 04:03:40 2014] Amiga4000: hmm, ok - i hoped that afs could be the solution to what coda never really finished [Fri Aug 29 04:30:14 2014] there is a reason it was never finished ;-) [Fri Aug 29 07:02:22 2014] telmich: coda is the successor to afs [Fri Aug 29 09:50:53 2014] telmich - you might also find the multi-part Windows and OpenAFS articles in the OpenAFS newsletter interesting [Fri Aug 29 09:51:02 2014] from last year, I think [Fri Aug 29 09:53:21 2014] ok... do I want to fight with Windows, today, or do I want to fight with Java, today? [Fri Aug 29 09:54:06 2014] I vote for PHP [Fri Aug 29 09:55:18 2014] PHP fight was Wednesday [Fri Aug 29 09:55:26 2014] Python? [Fri Aug 29 09:55:33 2014] I hate snakes [Fri Aug 29 09:55:45 2014] Docket? [Fri Aug 29 09:55:55 2014] er, Docker [Fri Aug 29 09:56:06 2014] nothing wrong with my jeans, thank you [Thu Sep 4 14:47:05 2014] how is the freebsd port of the client these days? i haven't tried it lately [Thu Sep 4 14:48:33 2014] It's okay for light use. [Thu Sep 4 14:49:21 2014] Running something stressful can strand processes in kernelspace, apparently waiting for more rx packets. [Thu Sep 4 14:50:16 2014] kaduk_: so no on-disk cache? [Thu Sep 4 14:50:26 2014] kaduk_: still only ram-cache [Thu Sep 4 14:50:29 2014] Not in the released version, no. [Thu Sep 4 14:51:43 2014] do you have patches for amd64? [Thu Sep 4 14:52:08 2014] http://gerrit.openafs.org/#change,11317 may help with that. Let me check if I submitted a pullup for 1.6 [Thu Sep 4 14:52:23 2014] what's the dev cycle for inclusion in a release?
[Thu Sep 4 14:52:32 2014] Looks like I didn't submit a pullup, whoops. [Thu Sep 4 14:56:08 2014] It is freebsd-only, so I might be able to convince people to let it go into 1.6.10, which is probably going to be released in the next month. I don't think we have a terribly solid timeline for it at the moment, though. [Thu Sep 4 14:58:28 2014] I had started switching the kernel from using an rx listener model to an rx upcall model (which would likely present different bugs w.r.t. unkillable hangs), but got distracted from that. [Thu Sep 4 14:58:50 2014] kaduk_: well thanks for your work. disk-cache would be really nice [Thu Sep 4 14:59:34 2014] Definitely. (The unkillable hangs in kernelspace are orthogonal to the disk cache, just so we're clear.) [Thu Sep 4 22:11:10 2014] Is "lowest address" for ubik determined with the first or last octet most significant? [Thu Sep 4 22:11:41 2014] "not exactly". [Thu Sep 4 22:16:44 2014] Oh wait, never mind, I was misremembering where the relic of classful addressing lived, but it's in libafs and not ubik. [Thu Sep 4 22:17:54 2014] I see ntohl((afs_uint32)otherHost) <= ntohl((afs_uint32)vote_globals.lowestHost) [Thu Sep 4 22:23:42 2014] Oh boy, does AFS do a "nearest neighbor" estimate by pre-CIDR rules? [Thu Sep 4 22:24:26 2014] So... the presence of ntohl, while good to see, doesn't really answer my question without knowing what the wire ordering is for IPv4 addresses. :) [Thu Sep 4 22:25:18 2014] Unfortunately, with only the two servers I have, I can't answer this question experimentally, since 128.220.251.36 is smaller than 128.220.251.38 either way you slice it. [Thu Sep 4 22:25:57 2014] I am wondering, concretely, if both of those or neither of those will be smaller than 128.220.70.76 when I bring it up as not "just a clone". [Thu Sep 4 22:44:38 2014] Sorry, was AFK. Yes, there's use of the pre-CIDR rules in determining the rank of different (db?) servers by the client.
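For reference, "pre-CIDR rules" means classful addressing, where the network part of an IPv4 address is implied by its leading bits. The C sketch below uses hypothetical helper names and is not the OpenAFS server-ranking code (how the client weights the result is not covered here). It shows one consequence for the addresses mentioned: 128.220.251.36, 128.220.251.38 and 128.220.70.76 all fall in the class B network 128.220.0.0, so a classful comparison treats them as the same network even if they sit on different subnets today.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Classful netmask for an IPv4 address given in host byte order. */
static uint32_t classful_netmask(uint32_t addr)
{
    if ((addr & 0x80000000u) == 0)           /* leading 0:   class A */
        return 0xFF000000u;
    if ((addr & 0xC0000000u) == 0x80000000u) /* leading 10:  class B */
        return 0xFFFF0000u;
    if ((addr & 0xE0000000u) == 0xC0000000u) /* leading 110: class C */
        return 0xFFFFFF00u;
    return 0xFFFFFFFFu; /* class D/E: treated as host-only in this sketch */
}

/* 1 if both dotted quads fall in the same classful network, 0 if not,
 * -1 if either address fails to parse. */
static int same_classful_net(const char *a, const char *b)
{
    struct in_addr ia, ib;
    uint32_t ha, hb, mask;

    if (inet_pton(AF_INET, a, &ia) != 1 || inet_pton(AF_INET, b, &ib) != 1)
        return -1;
    ha = ntohl(ia.s_addr); /* network order -> host order */
    hb = ntohl(ib.s_addr);
    mask = classful_netmask(ha);
    return (ha & mask) == (hb & mask);
}
```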
[Thu Sep 4 22:52:41 2014] network order is big-endian, and inet_aton(3) says that in a.b.c.d, a comes first (i.e., is most significant). [Thu Sep 4 23:09:17 2014] So, I think that 128.220.70.76 will be lowest. [Thu Sep 4 23:13:12 2014] But, you can bring in a new server and have it adopt the current sync site without letting the tiebreaker come into play, if you do things carefully. [Fri Sep 5 05:36:45 2014] kaduk_: Sadly, the story here is not nearly that simple. I wish it were. For the moment, tho', it's just a clone so that it never wins the election (though clones still vote, right?) so clients never have to talk to it, which is good, because our off-campus clients couldn't reach it. [Fri Sep 5 09:15:46 2014] nwf: a ubik clone never runs for election it therefore can never win an election. The clone is still part of the quorum and still votes. Whether or not it is possible for clients to reach a server is irrelevant. The list of db servers in the CellServDB file distributed to clients or published in DNS does not have to match the list of db servers in the CellServDB that is distributed to servers. That is why there are se [Fri Sep 5 10:02:32 2014] "That is why there are se" [Fri Sep 5 18:59:33 2014] secureendpoints: Well, the problem is that we do not have split-horizon DNS set up and would like all on-campus clients to know about all servers, because they're all reachable, and off-campus clients to only know about the DMZ servers. Split-horizon may be the right answer, or going and banging down networking's door some more. =P [Sat Sep 6 12:01:39 2014] kaduk_: i would like to send you a patch for freebsd [Sat Sep 6 12:06:25 2014] the preferred method of submitting patches for all platforms is for the author to submit them to http://gerrit.openafs.org. 
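The ubik comparison quoted earlier, ntohl(otherHost) <= ntohl(lowestHost), can be reproduced in a short C sketch. IPv4 addresses are stored in network (big-endian) order and ntohl() converts them to host order, so the first octet is the most significant; that is how the conclusion that 128.220.70.76 will be lowest follows. ubik_host_le and lowest_host are hypothetical names; only the comparison itself comes from the quoted line.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Mirrors the quoted ubik test: both arguments are in network byte order,
 * and converting to host order makes the first octet most significant. */
static int ubik_host_le(uint32_t a_net, uint32_t b_net)
{
    return ntohl(a_net) <= ntohl(b_net);
}

/* Index of the lowest address among n dotted quads, or -1 on a parse error. */
static int lowest_host(const char *const addrs[], size_t n)
{
    int best = -1;
    uint32_t best_net = 0;

    for (size_t i = 0; i < n; i++) {
        struct in_addr ia;
        if (inet_pton(AF_INET, addrs[i], &ia) != 1)
            return -1;
        if (best < 0 || ubik_host_le(ia.s_addr, best_net)) {
            best = (int)i;
            best_net = ia.s_addr;
        }
    }
    return best;
}
```

Comparing the raw s_addr values without ntohl() would, on a little-endian host, weigh the last octet first, and 128.220.251.36 would then compare lower than 128.220.70.76 (36 < 76).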
[Sat Sep 6 12:11:59 2014] secureendpoints: ok, thanks, i'll take a look [Sat Sep 6 12:12:41 2014] Please read http://wiki.openafs.org/GitDevelopers/ [Sat Sep 6 12:19:29 2014] secureendpoints: i am pretty sure my patch is a freebsd ports patch, and not an openafs patch [Sat Sep 6 12:20:07 2014] secureendpoints: is the whole fbsd port on gerrit.openafs.org? [Sat Sep 6 12:22:54 2014] I don't work on FreeBSD but if the patch is to OpenAFS code then it goes in the openafs source tree. FreeBSD packaging is at src/packaging/FreeBSD [Sat Sep 6 12:23:34 2014] secureendpoints: ok [Sat Sep 6 12:28:09 2014] * pulls from git [Sat Sep 6 13:49:21 2014] src/packaging/FreeBSD is not always up-to-date with what's in FreeBSD svn. I see I have mail; thanks. [Sat Sep 6 17:35:09 2014] which branch should i work on if i have changes to 1.6.7? [Sat Sep 6 17:42:20 2014] Well, it may depend. [Sat Sep 6 17:42:43 2014] kaduk_: this is for fbsd ports version [Sat Sep 6 17:42:43 2014] Normally the answer is "master", since all patches are supposed to go into master and get cherry-picked back to openafs-stable-1_6_x. [Sat Sep 6 17:43:08 2014] Are you just interested in supporting newer FreeBSD releases, or something else? [Sat Sep 6 17:43:10 2014] kaduk_: i see that 1.6.x is now 1.6.10 and users need the patch nwo [Sat Sep 6 17:43:21 2014] kaduk_: 9.3-RELEASE [Sat Sep 6 17:43:36 2014] kaduk_: 9.2-RELEASE-p10 is EOL [Sat Sep 6 17:43:49 2014] s/nwo/now/ [Sat Sep 6 17:44:51 2014] It is already supported by the openafs-stable-1_6_x branch, so the only work is packaging work. [Sat Sep 6 17:44:59 2014] I have been negligent about that. [Sat Sep 6 17:45:03 2014] secureendpoints encouraged me to pull the dev tree and generate patches that way, but i still think this is almost a ports issue [Sat Sep 6 17:45:17 2014] I agree that it is just a ports issue.
[Sat Sep 6 17:45:22 2014] rather than an openafs issue [Sat Sep 6 17:45:34 2014] kaduk_: ok, then my patch and file should be useful [Sat Sep 6 17:45:45 2014] I was sort of waiting for advice from my mentor about some patch that didn't apply properly, but it dropped off my todo list and I didn't get a reply. [Sat Sep 6 17:46:02 2014] kaduk_: your freebsd mentor [Sat Sep 6 17:46:40 2014] Right, my freebsd mentor (hrs) [Sat Sep 6 18:31:20 2014] kaduk_: my patch is just to accommodate freebsd 9.3 [Sat Sep 6 18:31:28 2014] kaduk_: did you take a look? [Sat Sep 6 18:31:48 2014] kaduk_: i used your @freebsd.org addy [Sat Sep 6 18:31:58 2014] I did (well, only at the second one), and I think it is incomplete. [Sat Sep 6 18:32:49 2014] kaduk_: i just worked backwards from compile errors [Sat Sep 6 18:32:56 2014] kaduk_: and did not test live [Sat Sep 6 18:33:14 2014] Hmm, I would have expected compile errors, I guess. Maybe I misremember. [Sun Sep 7 02:08:35 2014] I have a core file from BOS that may be due to bad hardware or an ARM host or something, but the stack trace contains ASCII %EIP values and that makes me very worried. Would anyone like to take a look? [Sun Sep 7 02:17:44 2014] might try asking when mvitale is around [Sun Sep 7 02:17:57 2014] Will do. [Mon Sep 8 11:42:28 2014] If anyone is equipped to compile OpenAFS on Windows, could they take a look at http://gerrit.openafs.org/#change,10966 and tell me what I'm doing wrong? :) [Mon Sep 8 12:01:38 2014] nwf: it seems that my build VM has the windows SDK v6.1 installed, not v6.0A, so the line numbers are not comparable. Also, I don't think it's set up to build master anyway. [Mon Sep 8 12:02:37 2014] But, one hypothesis is that something in the preprocessor namespace is screwing up the system header. [Mon Sep 8 12:09:12 2014] OK [Mon Sep 8 12:09:35 2014] I will try to get a build environment here, but it's unlikely to happen soon.
[Mon Sep 8 12:10:01 2014] Well, even just a windows machine with the 6.0A SDK installed would give you some idea what it's complaining about. [Mon Sep 8 12:10:09 2014] A full build environment is a fair bit more work than that, IIRC. [Mon Sep 8 13:02:15 2014] Hmm, of course when I go install the thing that google finds me for "windows sdk v6.0a download", it actually installs into just a 6.0 path. Sigh. [Tue Nov 18 12:13:21 2014] fucking git [Tue Nov 18 12:13:33 2014] what now? [Tue Nov 18 12:14:15 2014] i wanted to update I2c663fc426914e978e98c6003419503b57a020d3 to HEAD [Tue Nov 18 12:14:27 2014] but apparently git merge was the wrong thing to do and now i'm stuck [Tue Nov 18 12:14:55 2014] Okay. [Tue Nov 18 12:15:16 2014] There's only one patchset for that change, corresponding to commit 599b6aeefdf220d6fdaf35409fdc859d7fed2071 [Tue Nov 18 12:15:43 2014] So, assuming you have no other work on the current branch you care about, you can 'git reset --hard 599b6aeefdf220d6fdaf35409fdc859d7fed2071 && git rebase origin/master' [Tue Nov 18 12:17:17 2014] yay, gerrit didn't refuse the subsequent push [Tue Nov 18 12:17:46 2014] ('merge' is basically always the wrong thing for openafs. Exceptions can be made if you are the security officer.) [Tue Nov 18 12:22:39 2014] also, if i could get that Change and Id8ee7f149cdc921989a5de7dda35739147de0014 approved that'd be great [Thu Nov 20 09:19:58 2014] If i upgrade openafs client from 1.6.10~pre1-1 to 1.6.10-2 on my debian, it won't start anymore ( no /afs mountpoint ) - any idea what is causing this or what to do about it? [Thu Nov 20 09:23:13 2014] only thing that comes to mind is something related to the systemd switch [Thu Nov 20 09:24:49 2014] well, yes, i have systemd - but the 1.6.10~pre1-1 version starts ... [Thu Nov 20 09:25:31 2014] I think you'll have to ask the debian package maintainer, then [Thu Nov 20 09:28:32 2014] * pokes kaduk_ [Thu Nov 20 09:50:24 2014] hmm... 
is "vos release" supposed to send incremental changes by default? [Thu Nov 20 09:52:49 2014] I have a test volume with fixed contents which I've added a remote RO site for, but every "vos release" seems to result in quite a lot of traffic to the RO site [Thu Nov 20 10:04:12 2014] Sigh, just missed Worf. [Thu Nov 20 10:04:43 2014] If he shows up again, please ask him to file a debian bug, which is the proper way to report issues with debian. [Thu Nov 20 10:06:18 2014] dezgot, it can send incrementals, but whether it does or not depends on a number of things. in particular it works by file, not by block. [Thu Nov 20 10:06:28 2014] a 1-byte change to a file sends the whole file [Thu Nov 20 10:06:48 2014] also, rx is not the most optimized protocol >.> [Thu Nov 20 10:07:24 2014] geekosaur, ah ok i didn't know about the whole-file thing [Thu Nov 20 10:07:29 2014] but actually i'm not changing any files at all [Thu Nov 20 10:08:05 2014] then it shouldn't be sending anything, I think. might use -verbose and see what it says it's up to [Thu Nov 20 10:09:11 2014] I'm looking at it with -verbose, but nothing really stands out too much to me [Thu Nov 20 10:09:22 2014] pastebin [Thu Nov 20 10:09:58 2014] ok, just a minute [Thu Nov 20 10:14:00 2014] geekosaur, here is the current release in progress: http://pastebin.com/0vjspg1v [Thu Nov 20 10:15:36 2014] actually now i'm noting that the date in the last line is not actually today's date. is that an issue? the boxes seem to have the right times set [Thu Nov 20 10:19:34 2014] yes, "as of" --- it thinks the previous release did not complete [Thu Nov 20 10:19:55 2014] so it is trying to finish that before it will do a new release [Thu Nov 20 10:20:17 2014] yet it started out with "Re-cloning permanent RO volume"... strange [Thu Nov 20 10:21:10 2014] what version of openafs is this? 
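A minimal Python sketch of the per-file behaviour geekosaur describes: an incremental release forwards every file changed since the last release in its entirety, so a one-byte edit costs the whole file. The `FileEntry` type, the `last_release` timestamp and the sizes are hypothetical; the real `vos release` works on vnodes inside a volume dump, not on a Python dict.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    # Hypothetical model of one file in a volume: size in bytes and the
    # time (seconds since the epoch) it was last modified.
    size: int
    mtime: int


def incremental_payload(files: dict[str, FileEntry], last_release: int) -> tuple[list[str], int]:
    """Return the files an incremental release would forward and the total bytes.

    Granularity is the whole file: any file modified after last_release is
    sent in full, however small the change was.
    """
    changed = sorted(name for name, f in files.items() if f.mtime > last_release)
    total = sum(files[name].size for name in changed)
    return changed, total


volume = {
    "big.iso": FileEntry(size=4_000_000_000, mtime=200),  # one byte edited after the last release
    "notes.txt": FileEntry(size=1_024, mtime=50),          # untouched since the last release
}
print(incremental_payload(volume, last_release=100))  # (['big.iso'], 4000000000)
```

Under this model a volume with no modifications since the last release yields an empty payload, which is why hours of traffic for an unchanged test volume suggest a full release rather than an incremental one.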
[Thu Nov 20 10:21:14 2014] and actually i have the previous release operation's log: http://pastebin.com/gq7seaBX [Thu Nov 20 10:21:21 2014] where it claims to have completed successfully [Thu Nov 20 10:22:10 2014] remotebox is 1.6.7, and localbox is 1.6.1 [Thu Nov 20 10:22:11 2014] both of them the same time. probably because that's when it last changed [Thu Nov 20 10:22:35 2014] that is, it's doing the release from the existing RO volume, and that volume was created at the time it says [Thu Nov 20 10:23:06 2014] it is doing a full release instead of an incremental, I don't know why [Thu Nov 20 10:23:15 2014] am I asking for trouble with my 1.6.1/1.6.7 mix? [Thu Nov 20 10:24:53 2014] shouldn't be [Thu Nov 20 10:25:10 2014] the logic for incremental vs. full is a bit arcane, though [Thu Nov 20 10:26:13 2014] by full release, do you mean it says "complete release"? [Thu Nov 20 10:27:07 2014] meffie, yes "vos release" starts with "This is a complete release of volume 536870975" [Thu Nov 20 10:27:46 2014] I am a bit fuzzbrained today but I recall it mentioning incremental in the forward line [Thu Nov 20 10:28:31 2014] a complete release means it will be releasing data to all the sites. [Thu Nov 20 10:28:55 2014] ah, potentially incrementally?
[Thu Nov 20 10:29:02 2014] yes [Thu Nov 20 10:29:23 2014] right, complete release means all sites, incremental is shown later in how it forwards data to each site [Thu Nov 20 10:29:32 2014] by file, like geekosaur says [Thu Nov 20 10:29:37 2014] in your case only the one, since the local RO is a special kind of clone [Thu Nov 20 10:30:11 2014] geekosaur, but there isn't any mention of "incremental" [Thu Nov 20 10:30:21 2014] right, which is why I said it was doing a full release [Thu Nov 20 10:30:33 2014] the opposite of complete release is when vos is trying to recover from a previous failed release [Thu Nov 20 10:30:47 2014] ah ok [Thu Nov 20 10:30:56 2014] it's sending all the data instead of an incremental consisting of only the changed files [Thu Nov 20 10:31:05 2014] in effect it's doing vos dump | vos restore [Thu Nov 20 10:31:59 2014] ok, I follow [Thu Nov 20 10:32:29 2014] I traced through the logic of when it does incremental vs. full dump once, it was ... arcane [Thu Nov 20 10:32:29 2014] is it kind of difficult to figure out what's going on with the incremental vs non-incremental logic? are there logs somewhere I can get at? [Thu Nov 20 10:32:36 2014] ok [Thu Nov 20 10:32:43 2014] maybe i should just blow this away and try it again [Thu Nov 20 10:33:01 2014] i can even blow away the partition on the problematic RO fs [Thu Nov 20 10:33:50 2014] there is however a certain tendency toward safety, which would mean it does a full release if it doesn't quite trust what it sees. which it might not if it thinks nothing has changed but you're doing a release, which is "a bit peculiar" [Thu Nov 20 10:34:36 2014] ah hmm [Thu Nov 20 10:35:09 2014] i can understand the conservative behavior [Thu Nov 20 10:35:38 2014] but it sounds like doing a release when nothing's changed can have a large penalty in that case [Thu Nov 20 10:35:48 2014] is there a way for me to check if the release is unnecessary? 
[Thu Nov 20 10:36:07 2014] also, your ro is not on the same part as the rw [Thu Nov 20 10:36:21 2014] so it needs to reclone every time [Thu Nov 20 10:36:36 2014] (looking at the pastebin) [Thu Nov 20 10:36:51 2014] maybe i fudged my hostname replacements [Thu Nov 20 10:37:25 2014] well i've got a RW and RO on the same partition on the same host (localbox) [Thu Nov 20 10:37:28 2014] huh? localbox ro and rw are on the same partition [Thu Nov 20 10:37:32 2014] oh, no, i read that wrong. sorry. they are both on b. [Thu Nov 20 10:37:33 2014] partition on remotebox does not matter [Thu Nov 20 10:38:23 2014] nm, my error [Thu Nov 20 10:39:09 2014] I kind of want to test out what geekosaur was thinking [Thu Nov 20 10:39:18 2014] about no changes being suspicious to the logic maybe [Thu Nov 20 10:39:27 2014] and so it conservatively might be doing a non-incremental [Thu Nov 20 10:39:33 2014] but the release it's doing now will take like 11 hours lol [Thu Nov 20 10:39:41 2014] can i just control+c it? [Thu Nov 20 10:39:45 2014] or will that result in a mess? 
[Thu Nov 20 10:40:21 2014] my exact thought pattern there was "I see no changes but they're releasing, they must be seeing something odd on an RO so I'll force an update" [Thu Nov 20 10:41:24 2014] it's supposed to recover, but it's not something I've liked to trust [Thu Nov 20 10:42:06 2014] ok [Thu Nov 20 10:42:14 2014] the volume and both partitions involved are unimportant [Thu Nov 20 10:42:15 2014] that said, I have repeatedly forced releases to fail and it recovers fine [Thu Nov 20 10:42:24 2014] but i don't want to screw up my vldb, which is important [Thu Nov 20 10:42:30 2014] (by arranging for the token to expire in the middle of the release, for testing) [Thu Nov 20 10:42:49 2014] vldb will be fine, only question is whether it thinks there's an incomplete release it needs to finish afterward [Thu Nov 20 10:43:06 2014] it won't leave you with garbage in the volume or something like that [Thu Nov 20 10:43:10 2014] ah ok then i will kill it and fiddle around [Thu Nov 20 10:43:32 2014] the only other weird (maybe not?) thing about my setup is that the partitions are all ZFS filesystems [Thu Nov 20 10:43:40 2014] that should not matter [Thu Nov 20 10:44:11 2014] ok, great [Thu Nov 20 10:44:49 2014] there are some things you can do to tune openafs to work better on zfs vice partitions, in particular sync behavior in older 1.6s which can hurt performance a bit [Thu Nov 20 10:47:03 2014] so, what's the latest with OpenAFS on Yosemite? [Thu Nov 20 10:48:02 2014] have not heard anything. I set up Yosemite in a VM and may at some point try to figure out the trick someone found to get unsigned kexts to load without diddling the eeprom switch [Thu Nov 20 10:48:06 2014] http://gerrit.openafs.org/#q,status:open+project:openafs+branch:openafs-stable-1_6_x+topic:yosemite,n,z [Thu Nov 20 10:48:35 2014] the eeprom switch?
[Thu Nov 20 10:48:45 2014] however I am still recovering from a major sinus infection and am spending most of my time the past week either choking or trying to sleep >.> [Thu Nov 20 10:48:45 2014] that's not referring to the boot args modification, is it? [Thu Nov 20 10:48:50 2014] yes [Thu Nov 20 10:49:32 2014] What is this "trick" ? [Thu Nov 20 10:49:53 2014] and I think I meant nvram but I'm fuzzbrained >.> also maybe that is done with the plist file, I don't recall offhand what that thing was [Thu Nov 20 10:50:34 2014] well, you made it sound like there's some other trick other than the nvram boot-args=kext-dev-mode=1 [Thu Nov 20 10:51:34 2014] some things can be done by modifying a plist-ish file related to the kernel [Thu Nov 20 10:51:43 2014] hunh [Thu Nov 20 10:51:48 2014] I didnt think this was one of them [Thu Nov 20 10:52:25 2014] dunno [Thu Nov 20 11:08:55 2014] * loves new linux kernels and the way they break openafs [Thu Nov 20 11:09:10 2014] more fun? :) [Thu Nov 20 11:09:34 2014] \me loves new linux kernels and the way they make YFS go faster [Thu Nov 20 11:10:08 2014] :/ [Thu Nov 20 11:12:12 2014] /home/jsbillin/code/openafs-1.6.10/src/libafs/MODLOAD-3.17.3-300.fc21.x86_64-SP/osi_sysctl.c:37:8: error: unknown type name ‘ctl_table’ [Thu Nov 20 11:13:02 2014] ah, I see it's been fixed in gerrit [Thu Nov 20 11:13:04 2014] http://gerrit.openafs.org/11549 [Thu Nov 20 11:13:11 2014] heh, I was just looking at that [Thu Nov 20 11:13:17 2014] I forgot to look at the merged ones [Thu Nov 20 11:13:28 2014] We just have a couple more things to do before we issue a 1.6.11pre1, IIRC. [Thu Nov 20 11:13:40 2014] I suppose that means I should build against 1.6.11pre1 [Thu Nov 20 11:13:48 2014] Well, it doesn't exist yet. [Thu Nov 20 11:14:20 2014] it's in openafs-stable-1_6_x though? [Thu Nov 20 11:14:32 2014] Yes. 
[Thu Nov 20 11:14:47 2014] There are some other fixes not in openafs-stable-1_6_x yet which are needed for 3.18 kernels [Thu Nov 20 11:14:53 2014] blast [Thu Nov 20 11:15:07 2014] well, Fedora 21 is 3.17 [Thu Nov 20 11:16:21 2014] RedFyre, oh, I did mention the other thing. someone claimed to have discovered a way to load unsigned kexts via dedicated LaunchDaemons. I intend to test it at some point and if it works try to put together an openafs package using it [Thu Nov 20 11:16:31 2014] except our Mac packaging is an insane mess [Thu Nov 20 11:16:37 2014] well [Thu Nov 20 11:16:46 2014] is there any packaging that isn't? [Thu Nov 20 11:17:08 2014] gawd, MS Support is a PITA [Thu Nov 20 11:17:09 2014] The FreeBSD packaging is not so bad, if I do say so myself ;) [Thu Nov 20 11:17:24 2014] it relies on stuff that was deprecated and removed from xcode years ago, and adding it back in didnt work for me on 10.9 [Thu Nov 20 11:19:53 2014] hunh... once I downloaded the packagebuilder, it worked fine [Thu Nov 20 11:20:19 2014] I got a package which failed to install [Thu Nov 20 11:20:44 2014] I've successfully built on 10.9 (but not on 10.10) [Thu Nov 20 11:21:07 2014] the issue I ran into with 10.9 was a package that referenced libraries in fink instead of the native libs, but that was easily fixable [Thu Nov 20 11:21:30 2014] heh, yes, I ran into that with respect to macports, likewise easily fixable [Thu Nov 20 11:21:57 2014] the vm I set up for 10.10 is deliberately stock so I don't have to mess with that [Thu Nov 20 11:22:02 2014] but honestly, otherwise it just worked [Thu Nov 20 11:22:15 2014] 10.10 I couldn't build due to some kerberos errors... maybe someone has fixed those [Thu Nov 20 11:22:41 2014] Do you remember anything more specific than just "kerberos errors"? [Thu Nov 20 11:22:41 2014] one can run stock mac in a vm? [Thu Nov 20 11:22:51 2014] IIRC 10.10 removes support for 1DES completely. [Thu Nov 20 11:22:59 2014] uh...
it would be in the chat logs from a while back [Thu Nov 20 11:23:05 2014] but no, I don't remember [Thu Nov 20 11:23:07 2014] You should be able to run stock mac in a VM, if the host is also a mac. [Thu Nov 20 11:23:25 2014] hmmm.. using what hypervisor? [Thu Nov 20 11:23:59 2014] I use VMware Fusion [Thu Nov 20 11:24:14 2014] I mostly expect at least Parallels to also work [Thu Nov 20 11:25:12 2014] I am doing it in Fusion, have in the past used Parallels [Thu Nov 20 11:25:32 2014] vbox is very spotty and prone to crash randomly virtualizing OS X [Thu Nov 20 11:25:58 2014] I just dual boot my box for now [Thu Nov 20 11:27:32 2014] my box is never idle, dual boot is Not An Option (tm) [Thu Nov 20 11:27:50 2014] it also has 16GB specifically so I can VM as desired >.> [Thu Nov 20 11:28:07 2014] *16GB high-five* [Thu Nov 20 11:29:31 2014] ok, looks like the pastebins of those build errors on Yosemite have expired [Thu Nov 20 12:14:36 2014] well, yay. building openafs from openafs-stable-1_6_x works on the latest updates to Fedora 21 [Thu Nov 20 14:30:50 2014] how do you initiate a volume salvage with dafs? [Thu Nov 20 14:44:11 2014] I thought the salvage server was automatically involved when problems were encountered in a volume? [Thu Nov 20 14:45:40 2014] http://docs.openafs.org/Reference/8/bos_salvage.html [Thu Nov 20 14:45:50 2014] and there is the -forceDAFS option [Thu Nov 20 14:47:43 2014] i have a volume that shows could not attach [Thu Nov 20 14:47:55 2014] and the salvager is failing to salvage it, recommending a general salvage [Thu Nov 20 14:48:10 2014] but my understanding was that is not recommended with dafs [Thu Nov 20 14:48:30 2014] also you used to be able to just remove the .vol file for the volume doing that to fix it, and that doesnt seem to be working now [Thu Nov 20 14:48:56 2014] http://docs.openafs.org/QuickStartUnix/DAFS003.html [Thu Nov 20 14:49:05 2014] "In normal DAFS operation, you should not need to ever run bos salvage.
However, if you suspect a bug, or that there is corruption in a volume that the fileserver has not detected, you can run bos salvage to manually issue a salvage." [Thu Nov 20 14:49:38 2014] hm, that means all my other rw's on that server go offline during the salvage, yes? [Thu Nov 20 14:50:05 2014] I thought if you salvaged just a single volume, just that one volume went offline [Thu Nov 20 14:50:07 2014] or would it just run on the individual volume [Thu Nov 20 14:50:12 2014] you should test in non-production though [Thu Nov 20 14:50:21 2014] yeah [Thu Nov 20 14:50:29 2014] I have never actually used dafs [Thu Nov 20 14:50:31 2014] kind of an emergency at this point though [Thu Nov 20 14:51:51 2014] would you move the .vol file out of there? [Thu Nov 20 14:51:56 2014] like the old way [Thu Nov 20 14:54:26 2014] I only ever did that after having removed a volume [Thu Nov 20 14:54:48 2014] are you able to vos dump the volume? I'd make sure you have a backup [Thu Nov 20 14:55:04 2014] i have backups and about 3 RO's i could convert if i had to [Thu Nov 20 14:55:07 2014] ah, ok [Thu Nov 20 14:55:31 2014] in fact the backup from last night is still running on the .backup volume [Thu Nov 20 14:55:47 2014] the salvage should only touch the RW right, not backup or readonly? [Thu Nov 20 14:56:19 2014] the backup and readonly that are on the same vice partition contain links into the same data [Thu Nov 20 14:56:40 2014] an RO on a different server shouldn't be affected though [Thu Nov 20 14:56:43 2014] ok [Thu Nov 20 15:06:28 2014] it seems to have worked but dafs keeps scheduling it for a salvage [Thu Nov 20 15:06:32 2014] is there a way to unflag it?
[Thu Nov 20 15:09:22 2014] short of restarting the file server [Thu Nov 20 15:25:34 2014] the manual salvage worked, i had a process trying to initiate a release [Thu Nov 20 15:25:45 2014] once i stopped that and ran the manual salvage again, it was able to unflag it [Thu Nov 20 15:25:52 2014] volume seems ok [Thu Nov 20 15:52:11 2014] thanks for the tip btw [Mon Nov 24 05:20:00 2014] * pokes kaduk_ [Mon Nov 24 08:57:18 2014] Amiga4000: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=770815 [Mon Nov 24 09:49:30 2014] hey all, I am trying to compile on 3.17.4 and getting the following http://pastebin.com/G7jFMS96 [Mon Nov 24 09:54:47 2014] I am going to try 1.6.10 [Mon Nov 24 10:03:41 2014] same issue [Mon Nov 24 11:40:15 2014] I just tried to release a huge (~400GB) volume over WAN [Mon Nov 24 11:40:29 2014] it went for several days and then "vos release" failed with this: http://pastebin.com/zzKJzizu [Mon Nov 24 11:41:14 2014] although my token/tickets aren't expired, I guess maybe I should have used -localauth? [Mon Nov 24 11:41:40 2014] can anyone advise if there's any way to recover from this? [Mon Nov 24 11:41:40 2014] yes, -localauth is best for vos release or any long-running vos operation [Mon Nov 24 11:42:10 2014] since it's just a release, you should be able to re-run and vos should sort it out for you. [Mon Nov 24 11:42:35 2014] mvitale: ok, thanks. good to know. Do you expect it will need to send the 400GB again? [Mon Nov 24 11:42:42 2014] are you sure your token didn't expire? because that's what it looks like in the output you provided [Mon Nov 24 11:42:54 2014] that depends [Mon Nov 24 11:43:09 2014] I can't say for sure without knowing a bit more [Mon Nov 24 11:43:13 2014] i'm probably just missing something, but i can say that I still have tokens and can do things that require authentication [Mon Nov 24 11:43:57 2014] perhaps you have a renewable ticket/token? 
[Mon Nov 24 11:44:45 2014] if it renews, vos doesn't see it - it can only use the creds present at the beginning of each (sub)operation [Mon Nov 24 11:45:47 2014] I'm sure something like this happened [Mon Nov 24 11:45:59 2014] but I have inappropriately long ticket times set up [Mon Nov 24 11:46:23 2014] I suggest trying the same command again w/ -localauth [Mon Nov 24 11:46:28 2014] klist says that I authenticated Nov 6 and it will keep until Dec 6 [Mon Nov 24 11:46:39 2014] um [Mon Nov 24 11:46:58 2014] so maybe your token didn't expire.... [Mon Nov 24 11:47:06 2014] how about 'tokens' ? [Mon Nov 24 11:47:18 2014] (should be the same as what klist says, but...) [Mon Nov 24 11:47:26 2014] "Expires Dec 6" [Mon Nov 24 11:47:53 2014] okay, stand by [Mon Nov 24 11:48:11 2014] ok, will do [Mon Nov 24 11:50:55 2014] vos -version [Mon Nov 24 11:51:44 2014] rxdebug 7005 -version [Mon Nov 24 11:52:00 2014] (for both source and target volservers) [Mon Nov 24 11:52:46 2014] dezgot: ^ [Mon Nov 24 12:22:19 2014] mvitale, on the machine running vos: openafs 1.6.7 [Mon Nov 24 12:22:43 2014] on the host with the RW volume: AFS version: OpenAFS 1.6.1-3+deb7u2-debian built 2014-04-08 [Mon Nov 24 12:23:04 2014] and on the destination: AFS version: OpenAFS 1.6.9 built 2014-06-12 [Mon Nov 24 12:23:30 2014] the source is Debian/Linux and the destination runs OmniOS, if that matters [Mon Nov 24 12:23:43 2014] no, should not matter for this [Mon Nov 24 12:24:24 2014] what do you see in the VolserLog for source and target machines? 
[Mon Nov 24 12:26:47 2014] one sec, i'll pastebinit the end of them [Mon Nov 24 12:28:54 2014] the end of the source VolserLog looks like http://pastebin.com/1dPXHSH5 [Mon Nov 24 12:29:15 2014] mvitale, and the end of the destination VolserLog looks like http://pastebin.com/LTNavqdf [Mon Nov 24 12:31:17 2014] mvitale, sorry: i need to leave the computer for a bit but I'll be back to check for any responses in an hour or so [Mon Nov 24 12:31:38 2014] k [Mon Nov 24 12:50:53 2014] desgot: I also need to see the exact vos command you issued [Mon Nov 24 12:51:09 2014] gah, dezgot: [Mon Nov 24 12:51:21 2014] when you return [Mon Nov 24 16:03:10 2014] mvitale, it was: vos release -verbose shared [Mon Nov 24 16:03:45 2014] okay, give me a few minutes, I'll be right with you [Mon Nov 24 16:04:36 2014] sure thing [Mon Nov 24 16:50:09 2014] dezgot: okay [Mon Nov 24 16:53:47 2014] I found one possible fix you might need (in 1.6.3) on your source fileserver - but honestly I'm not sure that's really going to fix this. So I think your best bet is to retry the release first, only use -localauth this time. If that still fails, then the only other suggestion I have is to upgrade your 1.6.1 fileserver to 1.6.3 or higher to pick up these: gerrit.openafs.org/#change,9279 and 9280 [Mon Nov 24 16:54:15 2014] perhaps others here might have other ideas... [Mon Nov 24 16:54:32 2014] mvitale, ok, thanks. [Mon Nov 24 16:54:51 2014] what do you think my odds are of having to re-send the 400GB? [Mon Nov 24 16:55:20 2014] is there anything i can do to try to increase my chances of not needing to send it all again? 
[Mon Nov 24 16:55:28 2014] well, it looks like the whole thing was transmitted successfully based on what I see in the logs [Mon Nov 24 16:55:49 2014] so I think there's a good chance you won't have to send the whole thing [Mon Nov 24 16:56:06 2014] i can see the .readonly volume on the receiver [Mon Nov 24 16:56:07 2014] checking the vos options available to you, stand by [Mon Nov 24 16:56:12 2014] ok, thanks [Mon Nov 24 16:58:01 2014] "vos listvol receiver" shows shared.readonly as "On-line", and "vos listvldb shared" shows the problematic host's RO volume as "Old release" [Mon Nov 24 16:58:09 2014] (in case any of that matters at all) [Mon Nov 24 16:59:25 2014] yes, from the logs it appeared that only the vldb update did not succeed, and your info confirms that's probably the case. [Mon Nov 24 17:00:07 2014] so it probably will not retransmit the whole thing. [Mon Nov 24 17:00:25 2014] but if vos decides it wants to anyway, there's not much you can do (with options) to prevent it. [Mon Nov 24 17:00:44 2014] s /not much/nothing/ [Mon Nov 24 17:01:01 2014] except monkeying with the vldb and taking your chances [Mon Nov 24 17:01:09 2014] of course I do not recommend that. [Mon Nov 24 17:06:19 2014] ok, got it [Mon Nov 24 17:06:58 2014] I'll retry and if it retransmits the whole thing I'll wait it out [Mon Nov 24 17:07:23 2014] if it fails again I'll upgrade the source server to something 1.6.3 or beyond [Mon Nov 24 17:08:20 2014] okay - hope it works for you this time. [Mon Nov 24 17:09:30 2014] thanks for all your time. I'll be sure to pop in and let you know what happens in any case [Mon Nov 24 17:10:09 2014] you're very welcome.
[Mon Nov 24 18:35:43 2014] mvitale, for the record I'm pretty sure it's sending everything again [Mon Nov 24 18:36:30 2014] it freed all of the space that was used on the receiver's partition [Mon Nov 24 18:36:37 2014] oh well lol [Mon Nov 24 18:36:54 2014] will report back in 3-7 days [Tue Nov 25 05:03:37 2014] trying to figure out when to use and when not to use -afsdb ... [Tue Nov 25 08:32:37 2014] these days? when to use is "usually" [Tue Nov 25 08:33:10 2014] if you're in a high security installation where you cannot trust external DNS then you would omit it [Tue Nov 25 08:39:32 2014] geekosaur: maybe i read http://docs.openafs.org/Reference/5/CellServDB.html wrongly, but to me this sounds as if DNS is used with -afsdb and not without ..? [Tue Nov 25 08:41:55 2014] err ... misread your post ... [Tue Nov 25 13:15:12 2014] I'm surprised there's no openafs in the centos repos. [Tue Nov 25 13:15:36 2014] I see it's on openafs.org. What's the best guide for installing openafs on centos 7? [Tue Nov 25 13:15:45 2014] Any recommendations are appreciated. [Tue Nov 25 13:15:58 2014] you shouldn't be surprised; they have Issues with kernel modules [Tue Nov 25 13:18:32 2014] geekosaur: I used to run centos 5, and ran openafs on it, but it has been a while. I thought that was one of the pluses for centos 7 and other redhat clones, that they'd run openafs pretty easily. [Tue Nov 25 13:18:57 2014] Doesn't scientific have it almost as part of the regular install? [Tue Nov 25 13:19:24 2014] geekosaur: how does this page look: http://www.dartmouth.edu/comp/soft-comp/datastorage/afs/afs-linux.html#rhel6 ? [Tue Nov 25 13:24:49 2014] scientific linux has it, scientific linux has different rules from rhel/centos/fedora [Tue Nov 25 13:25:18 2014] Does that page look like a decent guide to you? 
[Tue Nov 25 13:25:53 2014] and up through rhel6/centos6 openafs.org provided rpms, but starting with el7 they've stopped [Tue Nov 25 13:26:15 2014] geekosaur: so there's *not* rpm's for centos 7? [Tue Nov 25 13:26:53 2014] I don't know what the current state is [Tue Nov 25 13:27:08 2014] it may be in one of the add-in repos currently [Tue Nov 25 13:27:12 2014] Actually, they do have: http://openafs.org/dl/openafs/1.6.10/openafs-release-rhel-1.6.10-1.noarch.rpm [Tue Nov 25 13:27:50 2014] there is ongoing worh with rh's storage sig which may see openafs available as an add-on repo of some kind [Tue Nov 25 13:27:54 2014] *work with [Tue Nov 25 13:28:26 2014] Upstream OpenAFS decided to stop distributing rpms for EL7 and higher, IIRC. [Tue Nov 25 13:28:36 2014] yes [Tue Nov 25 13:28:42 2014] which I already said [Tue Nov 25 13:28:51 2014] of course nobody else seems to have taken up the slack... [Tue Nov 25 13:29:18 2014] stuff going in for the future but nothing now [Tue Nov 25 13:29:38 2014] Is what I pasted the current release? It appears to be so. [Tue Nov 25 13:30:26 2014] is it for el7 though, is the question [Tue Nov 25 13:30:39 2014] I see an unversioned rpm that does not specify which OS releases it supports [Tue Nov 25 13:30:54 2014] I know that openafs.org does not intend to support el7 RPMs [Tue Nov 25 13:31:34 2014] So, would you recommend my building from source? [Tue Nov 25 13:32:08 2014] that's probably the best currently available course of action [Tue Nov 25 13:32:39 2014] And since this is such a new install, maybe I'll think real seriously of switching to scientific linux. [Tue Nov 25 13:32:57 2014] Do you guys have any recommendations? [Tue Nov 25 13:36:05 2014] Maybe I'll try scientific in a `box` I guess what centos calls a vm, and see how I like it. [Wed Nov 26 11:35:51 2014] how do I build the rpm for 1.4.15? I need to update a fedora13 system so I can use rxkad [Thu Nov 27 02:41:24 2014] Hi. Is this project still under development? 
Anyone running on FreeBSD? [Thu Nov 27 03:01:18 2014] it is still in dev and I do not know about freebsd [Thu Nov 27 03:01:50 2014] https://wiki.freebsd.org/afs should help [Thu Nov 27 07:52:40 2014] are there any APIs to automate AFS administration? [Thu Nov 27 21:44:39 2014] I've got a brand new cell, and when I issue fs setacl /afs system:anyuser rl as it says in the docs, I get fs:'/afs': Operation timed out [Thu Nov 27 21:44:58 2014] bos status on the server shows everything is running normally [Thu Nov 27 21:46:34 2014] I do have a token. Anyone know what might cause this? [Thu Nov 27 22:03:55 2014] using dynroot, which is generally the default on linux [Thu Nov 27 22:04:31 2014] /afs isn't actually a volume, it's synthetic and you can't change permissions, create things in it, etc. [Thu Nov 27 22:04:53 2014] "Operation timed out" is because it's trying to contact a server for it, but there isn't one [Thu Nov 27 22:06:28 2014] So do I just skip that step? [Thu Nov 27 22:06:35 2014] yes [Thu Nov 27 22:08:14 2014] well, if you have an actual root.afs volume you can manipulate it directly. see "With -dynroot" at http://wiki.openafs.org/SolarisQuickStart/#index10h2 [Thu Nov 27 22:13:34 2014] Now I'm getting: fs: cell dynroot not in /usr/local/etc/openafs/CellServDB [Thu Nov 27 22:13:37 2014] What is dynroot anyway? [Thu Nov 27 22:14:38 2014] traditionally, clients mount root.afs on /afs and that volume contains mountpoints for the local cell and other cells. [Thu Nov 27 22:15:29 2014] dynroot fakes a root volume containing mountpoints for cells listed in the CellServDB, plus will create new mountpoints on the fly if you try to access a cell that is not in CellServDB but whose DNS publishes SRV records for it [Thu Nov 27 22:22:39 2014] fs examine /afs fails, but fs examine /afs/mine.example works.
I assume it's related [Thu Nov 27 22:23:00 2014] yes [Thu Nov 27 22:23:11 2014] because /afs is not a real volume, it's a fake dynroot [Thu Nov 27 22:23:39 2014] Okay, so I'll never try to work with it. So now can I start using /afs/mine.example as a datastore? [Thu Nov 27 22:23:50 2014] Eh, I'll continue with the docs [Thu Nov 27 22:26:21 2014] Is the idea that I populate CellServDB with my own and other sites? [Thu Nov 27 22:26:30 2014] For filesharing reasons? [Thu Nov 27 22:27:39 2014] I thought I remembered being able to list the public directories of universities [Thu Nov 27 22:28:05 2014] ... which was pretty barren if I recall [Thu Nov 27 22:32:46 2014] there is a common CellServDB available from grand.central.org; most Linux packages should provide it for you, other platforms you may need to download it yourself [Thu Nov 27 22:50:10 2014] I see [Thu Nov 27 22:50:53 2014] So a partition is the physical space that afs uses to store volumes. Volumes become a directory under the cell in /afs/cellname. Do I have that right? [Thu Nov 27 22:51:11 2014] And each volume can have its own replication strategy [Thu Nov 27 22:51:35 2014] So if my partition is too small to hold the volume I want, then I just need to create a bigger partition, correct? [Thu Nov 27 22:57:03 2014] and move the volume to it, yes [Thu Nov 27 22:57:18 2014] or indeed install a new fileserver and move the volume to that [Thu Nov 27 23:05:50 2014] Ah, you can move volumes between partitions [Thu Nov 27 23:06:02 2014] Are partitions globally unique, or only unique to the fileserver?
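A rough Python model of the dynroot behaviour explained above: the client synthesizes /afs from the cells in CellServDB (a `>cellname #comment` line followed by `address #hostname` lines for that cell's database servers) and, for a cell it does not know, creates an entry only if a DNS lookup finds the cell. The `dns_has_cell` callback is a hypothetical stand-in for that DNS lookup (in the real client the fallback depends on afsd running with `-afsdb`), and the dot-prefixed entries mirror the read/write `/afs/.mycell` path that comes up later in this log. This is a sketch, not the cache manager's logic.

```python
from typing import Callable


def parse_cellservdb(text: str) -> dict[str, list[str]]:
    """Parse CellServDB text into {cell name: [database server addresses]}."""
    cells: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # ">cellname   #description" starts a new cell stanza.
            current = line[1:].split("#", 1)[0].strip()
            cells[current] = []
        elif current is not None:
            # "address   #hostname" is a database server for the current cell.
            cells[current].append(line.split("#", 1)[0].strip())
    return cells


def dynroot_listing(cells: dict[str, list[str]]) -> list[str]:
    """Entries a dynroot /afs would show: each cell plus its dot-prefixed r/w name."""
    names: list[str] = []
    for cell in cells:
        names += [cell, "." + cell]
    return sorted(names)


def dynroot_lookup(name: str, cells: dict[str, list[str]],
                   dns_has_cell: Callable[[str], bool]) -> bool:
    """Does /afs/<name> resolve? Known cells do; unknown ones only if DNS finds them."""
    cell = name.lstrip(".")
    return cell in cells or dns_has_cell(cell)


sample = """\
>mine.example        #my test cell
192.0.2.10           #db1.mine.example
192.0.2.11           #db2.mine.example
"""
cells = parse_cellservdb(sample)
print(dynroot_listing(cells))  # ['.mine.example', 'mine.example']
print(dynroot_lookup("other.example", cells, lambda c: False))  # False
```

Because nothing in this listing is backed by a real volume, operations like `fs setacl /afs` or `fs examine /afs` have no fileserver to talk to, which matches the timeouts reported above.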
[Thu Nov 27 23:06:17 2014] unique to the fileserver [Thu Nov 27 23:08:08 2014] hm, depending on what you meant by that question, partition "a" on fileserver file1 is distinct from partition "a" on fileserver file3 or whatever [Thu Nov 27 23:09:25 2014] note also that you can move volumes between servers just as easily as between partitions on the same fileserver; the vlserver keeps track of that and clients that have registered interest in a particular volume are notified if it is moved [Thu Nov 27 23:11:02 2014] And the creation, destruction and movement of volumes are all done with the vos command, correct? [Thu Nov 27 23:11:26 2014] yes [Thu Nov 27 23:11:28 2014] Can volumes be resized? [Thu Nov 27 23:11:39 2014] yes [Thu Nov 27 23:13:31 2014] volumes don't actually have a "size" per se, they have an initial quota. "fs setquota" will change the quota for a volume (you can point it at the volume root or at any directory in the volume) [Thu Nov 27 23:13:55 2014] and otherwise the limit on a volume's size is essentially that of the partition it resides on [Thu Nov 27 23:14:08 2014] Excellent [Thu Nov 27 23:17:43 2014] Do I need the readonly volumes here? http://pastie.org/9747891 [Thu Nov 27 23:18:08 2014] How am I to distinguish which volume I'm actually trying to use? [Thu Nov 27 23:25:47 2014] if a mountpoint resides in an r/w volume, it will mount an r/w volume.
if it is in an r/o volume, it will by default mount an r/o volume unless created with the -rw option [Thu Nov 27 23:27:19 2014] so generally you create r/o volumes for your roots (root.afs for non-dynroot, and root.cell) and then for a tree of (say) user home directories which you want mounted r/w you could create a user.home volume and explicitly mount it -rw, then mountpoints created in that will be mounted r/w [Thu Nov 27 23:28:53 2014] there are also advantages in having r/o replicas on multiple fileservers for redundancy or to spread out load (not that for a small test cell you'll have much load) [Thu Nov 27 23:31:20 2014] so root.afs is for /afs, root.cell is for /afs/mine.example, and then I'll create new volumes for /afs/mine.example/home or some such, right? [Thu Nov 27 23:31:41 2014] that's how it's usually done, yes [Thu Nov 27 23:32:12 2014] Cool, time to move some home directories :) [Thu Nov 27 23:33:30 2014] remember to "fs mkmount -rw" the mountpoint /afs/mine.example/home so user volumes under it automatically get mounted r/w [Thu Nov 27 23:38:22 2014] I just realized I didn't actually build the volume in a real zfs. can I stop the services and just move all the files in /vicepa to the zfs? [Thu Nov 27 23:41:06 2014] I'm going to try [Thu Nov 27 23:43:29 2014] Seemed to work [Thu Nov 27 23:44:11 2014] What is the /afs/.mycell for? [Thu Nov 27 23:45:21 2014] fs mkmount /afs/mycell home -rw gives me the old junk error: fs: cell dynroot not in /usr/local/etc/openafs/CellServDB [Thu Nov 27 23:45:26 2014] Is that just a warning? [Thu Nov 27 23:46:52 2014] I mentioned that it mounts the r/o volume by default? The .-prefixed one is r/w [Thu Nov 27 23:47:04 2014] otherwise it'd be hard to do things like create mountpoints [Thu Nov 27 23:47:39 2014] I think you wanted /afs/mycell/home as the mountpoint name?
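The mount point rules described above can be modelled in a few lines of Python. This is a sketch, not the cache manager's code, and it assumes the traversal rules from the OpenAFS Administration Guide: an explicit `.readonly` or `.backup` suffix is always honoured; a mount point created with `fs mkmount -rw` always reaches the read/write volume; a regular mount point inside a read-only volume goes to the read-only copy when the target is replicated (otherwise to the read/write volume); and a mount point inside a read/write volume stays on the read/write path. The `MountPoint` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MountPoint:
    target: str                 # volume name the mount point names, e.g. "root.cell"
    rw_flag: bool = False       # created with "fs mkmount -rw"
    parent_is_ro: bool = False  # the mount point lives inside a read-only volume


def resolve(mp: MountPoint, replicated: set[str]) -> str:
    """Return the volume instance a client would enter through this mount point.

    replicated is the set of volume names that have released read-only sites.
    """
    # An explicit .readonly or .backup suffix is always honoured.
    if mp.target.endswith((".readonly", ".backup")):
        return mp.target
    # "fs mkmount -rw" forces the read/write volume.
    if mp.rw_flag:
        return mp.target
    # Inside a read-only volume, prefer the read-only copy when one exists.
    if mp.parent_is_ro and mp.target in replicated:
        return mp.target + ".readonly"
    # Otherwise (read/write parent, or unreplicated target) use read/write.
    return mp.target


replicated = {"root.cell"}
print(resolve(MountPoint("root.cell", parent_is_ro=True), replicated))  # root.cell.readonly
print(resolve(MountPoint("user.home", rw_flag=True, parent_is_ro=True), replicated))  # user.home
```

Under these rules, once root.cell has a released read-only site, /afs/mycell lands in root.cell.readonly, so new mount points have to be created through the read/write /afs/.mycell path instead.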
[Thu Nov 27 23:47:47 2014] right [Thu Nov 27 23:48:04 2014] /afs/mycell should be your root.cell (but r/o, so really you want /afs/.mycell/home) [Thu Nov 27 23:48:21 2014] strange [Thu Nov 27 23:49:13 2014] so I need to use the path /afs/.mycell for everything rw? [Thu Nov 27 23:49:25 2014] What do I need to read to understand this? [Thu Nov 27 23:49:26 2014] yes [Thu Nov 27 23:49:39 2014] So user homedirs will be /afs/.mycell/home/userdir [Thu Nov 27 23:49:48 2014] unless you want to remove the r/o replica for root.cell, but then everything mounts r/w by default [Thu Nov 27 23:50:09 2014] right, but you create that with the -rw option so stuff under it will be mounted r/w by default instead of r/o [Thu Nov 27 23:50:09 2014] Is that undesirable for some reason? [Thu Nov 27 23:50:47 2014] well, for a small cell it's okay. for a large cell you generally want replicas of key stuff like root.cell, and you can't replicate r/w volumes (yet) [Thu Nov 27 23:51:31 2014] Well, I'd like to do/learn best practice. My homedir is important to me, though I have backups [Thu Nov 27 23:51:37 2014] * notes that he's going to have to go to bed soon [Thu Nov 27 23:52:02 2014] No trouble. Thanks for the help tonight [Thu Nov 27 23:52:59 2014] Wait, so if I want to replicate my home dir, how do I keep it rw? [Fri Nov 28 00:17:14 2014] AFS does not support replication of read/write data [Fri Nov 28 00:18:01 2014] typically you would replicate the read only volumes that make up the paths containing the mount points to the user's home directories but not the user's volumes themselves [Fri Nov 28 00:18:37 2014] then the user's read/write volumes can be accessed via the normal /afs/cellname/home/user/... path [Fri Nov 28 00:19:01 2014] you should read the administrator's guide at docs.openafs.org [Fri Nov 28 13:23:41 2014] That guide has proven useful. Thank you. I was getting confused between the various guides. [Fri Nov 28 13:24:12 2014] How do I clean up a replication site?
I've done vos remsite, but the .readonly copy is still listed in vos listvol. [Fri Nov 28 13:24:32 2014] you do not use vos remsite. you use vos remove [Fri Nov 28 13:24:58 2014] just as vos addsite does not populate a volume, vos remsite does not remove one [Fri Nov 28 13:28:17 2014] The addsite just targets a replication peer, sort of? [Fri Nov 28 13:28:54 2014] addsite/remsite only modify the database [Fri Nov 28 13:29:17 2014] I added a second database server to my cluster last night, and for some reason the primary hangs at dumb parts. I tab completed /afs/.mys.. and the command hung about 10 minutes ago and I cant [Fri Nov 28 13:29:28 2014] 't get it to return. The secondary is performing fine [Fri Nov 28 13:29:37 2014] bos status for both systems shows everything is good. [Fri Nov 28 13:30:44 2014] I wanted to make sure that the two were solid before I moved to adding more fileservers. [Fri Nov 28 13:32:16 2014] On the first/primary server I tried to mount a new volume and got: mount points must be created within the AFS file system. On the secondary system I was able to run the mount command for the primary without issue. [Fri Nov 28 13:32:33 2014] What might cause this kind of thing? [Fri Nov 28 13:47:32 2014] I opted not to create an upclientbin instance on the second database server. [Fri Nov 28 13:47:50 2014] But upclientetc on the secondary shows running. [Fri Nov 28 14:07:20 2014] I wouldn't bother with upserver/upclient; it was an early take on things like puppet/saltstack/chef/ansible, which in general are far superior [Fri Nov 28 14:07:27 2014] so where did you try to create that mountpoint? [Fri Nov 28 14:46:57 2014] I tried to add it to /afs/.mycell/common/etc where common is just a directory and etc/ is the volume common.etc [Fri Nov 28 14:47:30 2014] I also just did a fs lsmount /afs about 20 minutes ago and it locked up the prompt. Can't kill it or anything.
[Fri Nov 28 14:49:55 2014] Though bos status for the system still shows everything is normal. [Fri Nov 28 14:51:13 2014] you need to think of bos as a completely independent service from the AFS services. it is just a management tool to permit the remote starting and stopping of managed services. It doesn't provide you any information about the status of the processes it starts [Fri Nov 28 14:53:35 2014] Okay [Fri Nov 28 14:54:38 2014] The a [Fri Nov 28 14:55:21 2014] The AFS services are the fileserver, vlserver, ptserver, buserver, upserver, etc. These are started from a bosserver but each is managed independently. [Fri Nov 28 14:56:05 2014] The AFS servers do not require that a client be installed on the same system. Some people believe that as a best practice the AFS client should not be installed on systems that are running the services. [Fri Nov 28 15:23:11 2014] I'm having trouble stopping the afs client daemons [Fri Nov 28 15:24:41 2014] an afsd is not a true userland process. it donates its threads to the kernel module. It cannot be stopped while there is active activity on /afs [Fri Nov 28 15:53:11 2014] I get a hang on trying to rsync data from the local disk into a volume. It gets a few megabytes into syncing and just stops. The quota looks barely used. [Fri Nov 28 15:53:57 2014] I can kill the process the first time, but if I try to perform the same rsync again, the prompt just hangs. [Fri Nov 28 15:54:10 2014] Oh there it came back [Fri Nov 28 15:55:33 2014] Still not syncing though [Fri Nov 28 15:57:29 2014] How would I go about digging into the process that bos starts? I assume there is a status somewhere that will tell me why my system keeps going unresponsive. [Fri Nov 28 15:59:44 2014] zleslie: there are some bugs with the freebsd client that can cause it to become unhappy when processing rsync-level loads. [Fri Nov 28 16:02:31 2014] Is there a bug I can track? [Fri Nov 28 16:02:39 2014] I'd love not to have to run any linux.
[Fri Nov 28 16:03:48 2014] Not particularly. There's something not-quite-right with the VFS-level locking, and possibly also something that causes the kernel to miss rx packets on a given connection. [Fri Nov 28 16:04:21 2014] I haven't spent much time on them recently, since I have been working on things which affect all the architectures OpenAFS supports, and not just "my pet one". [Fri Nov 28 16:04:40 2014] Fair enough. [Fri Nov 28 16:05:00 2014] Though hopefully that means this doesn't sit as a problem forever. [Fri Nov 28 16:05:13 2014] I hope not :) [Fri Nov 28 16:05:28 2014] Who develops OpenAFS? [Fri Nov 28 16:07:07 2014] https://www.openhub.net/p/openafs/contributors/summary [Fri Nov 28 16:07:57 2014] kaduk has been the primary contributor to the FreeBSD client [Fri Nov 28 16:08:04 2014] I suppose what I'm after is: when should I check back to see if this gets fixed? Should I just run another platform for this? Is this just a matter of cutting a new release? [Fri Nov 28 16:08:23 2014] Excellent, well thank you kaduk_ [Fri Nov 28 16:08:28 2014] the problem will be fixed when someone fixes it [Fri Nov 28 16:08:35 2014] Hah [Fri Nov 28 16:08:48 2014] I surely didn't mean to sound that bad :) [Fri Nov 28 16:09:29 2014] the fix is not trivial and I wouldn't plan any production schedule around it being fixed unless someone was to provide the necessary development resources themselves [Fri Nov 28 16:09:51 2014] So if it's just a client issue, then I should be able to connect my OSX client to the cluster, since it all seems to be running and usable. [Fri Nov 28 16:10:02 2014] Eh, this is just my home network. [Fri Nov 28 16:10:02 2014] That seems worth trying. [Fri Nov 28 16:10:10 2014] the best supported platforms for OpenAFS clients are Linux, Solaris, OSX, Windows [Fri Nov 28 16:10:32 2014] When I had it up in my home lab many years ago, it was running on Linux, and quite well.
[Fri Nov 28 16:10:37 2014] Realistically, my time is claimed for the next 6 months (at least) on other things, so you may be better off using a different platform for now. [Fri Nov 28 16:11:40 2014] "(at least)" :-) [Fri Nov 28 16:17:49 2014] Does that also suggest that under lighter load, the FreeBSD client might continue to operate? [Fri Nov 28 16:18:15 2014] aklog on OSX just yields: aklog(80260,0x7fff79c12310) malloc: *** error for object 0x7fff8c38d000: pointer being freed was not allocated [Fri Nov 28 16:19:00 2014] which OSX? which OpenAFS? [Fri Nov 28 16:19:45 2014] I may have installed the wrong version. [Fri Nov 28 16:20:09 2014] I'll uninstall and try again [Fri Nov 28 16:21:08 2014] OpenAFS 1.6.5 on 10.9 OSX [Fri Nov 28 16:23:50 2014] Same issue after reinstall. I also note that the system preferences gui is not able to retrieve a token either. [Fri Nov 28 16:29:05 2014] After verifying that my cluster is actually listed in the CellServDB file on disk and not just in the gui, I get aklog: unknown RPC error (-1765328228) while getting AFS tickets [Fri Nov 28 16:29:18 2014] Which seems better [Fri Nov 28 16:29:36 2014] I do have a firewall between the cluster and this client though [Fri Nov 28 16:29:46 2014] But I also think I have all the ports open [Fri Nov 28 16:30:08 2014] -1765328228 = Cannot contact any KDC for requested realm [Fri Nov 28 16:36:14 2014] I have a kerberos ticket though [Fri Nov 28 16:36:23 2014] aklog -d gives: Kerberos error code returned by get_cred [Fri Nov 28 16:37:10 2014] I only need a keytab with afs/mycell@MYCELL on the fileservers, correct? [Fri Nov 28 16:37:33 2014] On the osx client, I've only set up ThisCell, CellServDB and krb5.conf [Fri Nov 28 16:40:57 2014] Lost network for a while. The freebsd client should be better behaved on lighter load, like individual 'cp' or 'mv' operations, yes.
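The negative number aklog printed is a com_err code. As far as I know, com_err packs a short error-table name into the upper 24 bits (6 bits per character, each stored as its position in the character set below plus one) and the message number into the low 8 bits. A small decoder sketch, assuming that encoding, shows that -1765328228 falls in the `krb5` table, consistent with the translation given above. `KNOWN_MESSAGES` is a hypothetical one-entry table holding only the message quoted in this log, not a real lookup.

```python
# Character set com_err uses to pack error-table names (6 bits per character).
CHARSET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_"

# Hypothetical: only the one message quoted in this conversation.
KNOWN_MESSAGES = {("krb5", 156): "Cannot contact any KDC for requested realm"}


def decode_com_err(code: int) -> tuple:
    """Split a com_err code into (table name, message offset)."""
    unsigned = code & 0xFFFFFFFF   # treat the signed value as 32-bit unsigned
    offset = unsigned & 0xFF       # low 8 bits: message index in the table
    packed = unsigned >> 8         # high 24 bits: packed table name
    chars = []
    while packed:
        value = packed & 0x3F
        if value == 0:
            raise ValueError("not a com_err table code")
        chars.append(CHARSET[value - 1])  # characters are stored as index + 1
        packed >>= 6
    return "".join(reversed(chars)), offset


table, offset = decode_com_err(-1765328228)
print(table, offset, KNOWN_MESSAGES.get((table, offset)))
```

Knowing the code comes from the Kerberos library rather than from AFS itself points the search at the client's krb5.conf and KDC reachability, which is where the conversation goes next.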
[Fri Nov 28 16:41:36 2014] I've been getting some panics from freebsd-current which I should look into, but I think that's because something changed on -current that openafs hasn't caught up to yet. [Fri Nov 28 16:42:53 2014] 10.1-RELEASE here. [Fri Nov 28 16:43:20 2014] I think that should be okay. [Fri Nov 28 16:44:07 2014] Not sure what's up with the osx client. I don't even see connection attempts to my kdc [Fri Nov 28 16:44:31 2014] I mean, after I acquire a ticket during aklog [Fri Nov 28 16:44:39 2014] What is the meaning of 'vicep'? [Fri Nov 28 16:46:51 2014] Vice partition [Fri Nov 28 16:47:06 2014] i.e. where an OpenAFS server stores data [Fri Nov 28 16:47:28 2014] "but where does 'vice' come from?" [Fri Nov 28 16:47:48 2014] presumably based off of research projects at CMU with the names Venus, Vice and Virtue [Fri Nov 28 16:49:40 2014] according to this http://bethlynn.livejournal.com/34365.html the Vice project was renamed "Andrew" which is where afs comes from [Fri Nov 28 16:49:51 2014] I see [Fri Nov 28 16:50:19 2014] I forgot about a linux laptop I've got. I'll see if I can get that one to behave nicely as a client. [Fri Nov 28 16:59:15 2014] pioctl failed while obtaining tokens for cell running aklog on the Linux client [Fri Nov 28 17:00:08 2014] that implies the afs kernel module is not loaded [Fri Nov 28 17:03:35 2014] that would make sense [Fri Nov 28 17:33:39 2014] Alright, I think I have the linux client set up correctly. But... why would the client attempt to contact unrelated addresses on port 7000? [Fri Nov 28 17:34:05 2014] Each server does have many IP addresses, so perhaps there is some discovery process happening [Fri Nov 28 17:35:00 2014] After many failed connection logs in my firewall, the client does seem to be working. [Fri Nov 28 17:37:34 2014] Cool [Fri Nov 28 17:37:47 2014] Thank you all for the help and patience. [Sat Nov 29 13:45:51 2014] is the NFS translator still maintained/expected to work well?
[Sat Nov 29 13:51:53 2014] I guess the last bit in this section http://wiki.openafs.org/SolarisQuickStart/#index2h2 seems to indicate "no" [Sat Nov 29 13:57:41 2014] I would not expect it to still work (well). [Sat Nov 29 14:00:07 2014] IIRC, the linux version required kernel features that have not been present in mainline kernels for a long, long time. [Sat Nov 29 14:06:28 2014] actually I'm told the solaris version does work; nobody has tested the linux version [Sat Nov 29 14:08:53 2014] that said, the translator is definitely unloved [Sat Nov 29 14:43:19 2014] The Linux version of the translator requires GPL-only interfaces that OpenAFS cannot use. [Sat Nov 29 14:43:57 2014] The Solaris version of the translator uses the in-kernel Solaris NFSv3 server and it works to the extent that translation can work. [Sat Nov 29 17:09:17 2014] hmm ok thanks [Sat Nov 29 17:09:43 2014] might give the solaris version a go i guess [Mon Dec 1 05:21:47 2014] Hi, I'm thinking about deploying openafs, but a colleague just pointed out a very good question. Will it work with inotify ? We need to perform some operations when some files are created / modified and we currently use inotify for that, will the event be raised if that's modified on the afs by another client ? [Mon Dec 1 05:22:04 2014] Would love to run those scripts directly on the client but I couldn't find any equivalent of inotify for windows [Mon Dec 1 05:26:33 2014] inotify is linux-specific, it's very platform specific [Mon Dec 1 05:29:07 2014] yeah, and it's unfortunately the only solution we found to perform tasks on modified files without using up 100% of the cpu at all times [Mon Dec 1 05:30:54 2014] But for something like NFS you can run your script on the server and it'll see the modification happening on the local filesystem.
The clients won't, but that's fine for me, just need inotify in one place [Mon Dec 1 05:32:08 2014] AFS doesn't support inotify on the client - the interfaces in the kernel that are required for a filesystem to support it are GPL-only, so we can't make use of them [Mon Dec 1 05:32:24 2014] You can do something similar on the fileserver using the fileserver's audit logs [Mon Dec 1 05:38:15 2014] it doesn't support it on the client but it does on the server ? Or is the only solution to do some magic with the log ? afs might not be the correct solution I guess [Mon Dec 1 05:39:09 2014] You could use inotify on the server, but AFS has an opaque storage mechanism which doesn't easily translate back to the paths which are seen on the client [Mon Dec 1 05:40:04 2014] What there is on the server is an audit log which records every operation that's performed on every file on that server. That log is designed to be parsable by external programs, and can be a pipe (or various other IPC communication methods) - so you can have a program on the server that sees every change made and responds to them. [Mon Dec 1 05:41:08 2014] Ha, that's cool [Mon Dec 1 05:41:19 2014] I'll try that then, thanks ! [Mon Dec 1 05:50:10 2014] sxw: Would you happen to have an example of output for this ? 
My colleague would like to be sure he can adapt his scripts for that format [Mon Dec 1 05:56:17 2014] None to hand - just set up a fileserver with the -auditlog option, and look at what it produces [Mon Dec 1 05:59:05 2014] yeah, I'm installing one right now [Mon Dec 1 08:06:47 2014] also, the Windows client has limited support for the change notification interfaces http://msdn.microsoft.com/en-us/library/windows/desktop/aa364417%28v=vs.85%29.aspx [Mon Dec 1 08:09:04 2014] Yeah that's why I want to run it on a linux server, in one place [Mon Dec 1 08:16:57 2014] The reason the support is limited is that the OpenAFS servers do not include the appropriate level of detail in the callback messages [Mon Dec 1 08:17:27 2014] It's not that the Windows client (unlike the Linux client) cannot use them [Mon Dec 1 14:19:11 2014] it's gonna be tough to get Mac testers for the RC of 1.6.11 without packages [Mon Dec 1 14:19:35 2014] Nor is there a mac buildslave that's currently running... [Mon Dec 1 14:19:51 2014] hrm... that's not good, if true [Mon Dec 1 14:20:05 2014] I don't remember who was contributing that buildslave, though. [Mon Dec 1 14:20:10 2014] moi [Mon Dec 1 14:20:16 2014] Oh hi :) [Mon Dec 1 14:20:18 2014] would be nice if it emailed me when it decided the buildslave isn't working [Mon Dec 1 14:20:40 2014] It would. I want to say that Jason was going to look into that, but am quite possibly making that up. [Mon Dec 1 14:20:41 2014] buildbot seems to have issues.. it'll still be running fine on the slave, thinking it is talking fine to the server [Mon Dec 1 14:20:53 2014] Alas. [Mon Dec 1 14:21:28 2014] tho this time [Mon Dec 1 14:21:35 2014] even tho the power light is on, the machine is not ssh'able [Mon Dec 1 14:21:37 2014] hrm... [Mon Dec 1 14:21:41 2014] hmm... [Mon Dec 1 14:21:59 2014] hunh...
musta hard locked [Mon Dec 1 14:22:14 2014] yeah, it's not happy [Mon Dec 1 14:22:23 2014] I got a mouse cursor and two notifications and a black screen [Mon Dec 1 14:22:41 2014] and a very warm mac mini [Mon Dec 1 14:23:04 2014] tho, the buildslave doesn't build packages [Mon Dec 1 14:23:15 2014] (and it's a 10.9 buildslave, anyway) [Mon Dec 1 14:23:37 2014] "10.9 is a lot better than nothing" [Mon Dec 1 14:23:53 2014] well, it's a dual boot machine, so for buildslave, would have to choose one or the other [Mon Dec 1 14:24:47 2014] oh that's not good [Mon Dec 1 14:26:34 2014] ok, there we go [Mon Dec 1 14:31:38 2014] and we're back [Mon Dec 1 14:31:56 2014] Thanks! [Mon Dec 1 15:48:20 2014] Where would I start troubleshooting a Windows client that has tokens (as reported by tokens), but is making unauthenticated calls to the fileserver? [Mon Dec 1 15:48:52 2014] The windows release notes include steps for debugging many common issues. [Mon Dec 1 16:30:11 2014] If I abort (control-C) a vos move operation, what (if any) cleanup do I need to do? [Mon Dec 1 16:31:15 2014] might need to unlock and/or online the volume [Mon Dec 1 16:32:08 2014] cclausen: okay. Do you think I would need to do any cleanup on the destination? [Mon Dec 1 16:32:58 2014] there will be a clone volume but the next vos release will clean it up [Mon Dec 1 16:36:22 2014] geekosaur: this is for a volume with no ro replicas. Does a vos release still make sense? [Mon Dec 1 16:38:37 2014] ... what would it do in that case?
[Mon Dec 1 16:38:54 2014] (in other words, no sense at all) [Mon Dec 1 16:41:37 2014] geekosaur: what do you mean then by "there will be a clone volume but the next vos release will clean it up" [Mon Dec 1 16:41:51 2014] I was asking about an interrupted vos move (not vos release) [Mon Dec 1 16:44:00 2014] vos move creates a clone, moves the clone and then updates pointers to reference the moved volume [Mon Dec 1 16:44:27 2014] that way you can move without the volume being locked the whole time (just needs to lock while clone is created) [Mon Dec 1 16:45:22 2014] ^ so same story, there will be a dangling clone but it'll be silently removed on the next salvage or fixed the next time you attempt the move (or do a release if you add a replica on that server). it's possible a lock will be left behind after aborting the move, and you might need to wait out the transaction timing out before unlocking [Mon Dec 1 16:45:51 2014] cclausen, geekosaur: thanks, I think that makes sense now [Mon Dec 1 16:46:56 2014] or there is vos endtrans but make sure there are no other transactions on the server [Mon Dec 1 16:48:08 2014] I've interrupted the operation. It looks like everything on the source is fine, but the partial clone on the destination was left around. [Mon Dec 1 16:48:20 2014] Do I need to try to salvage that clone? [Mon Dec 1 16:49:08 2014] if you want. as I said, it'll be cleaned up automatically eventually; if you have space problems, you can force it now. it won't cause any problems to leave it sitting around since it's not referenced in the vldb [Mon Dec 1 16:49:38 2014] [01 21:45] ^ so same story, there will be a dangling clone but it'll be silently removed on the next salvage or fixed the next time you attempt the move (or do a release if you add a replica on that server) [Mon Dec 1 16:49:53 2014] geekosaur: ah thanks!
I mostly wanted to clean it up now so I won't be confused by it later [Mon Dec 1 18:01:45 2014] mvitale, for posterity: that big "vos release" succeeded when I re-ran it with "-localauth" [Mon Dec 1 18:02:25 2014] hooray! [Mon Dec 1 18:02:31 2014] congrats [Mon Dec 1 18:05:05 2014] guess I should still upgrade that older server eventually [Mon Dec 1 18:06:08 2014] unfortunately I've got to immediately do another big release of stuff which has accumulated while it was running heh [Mon Dec 1 18:07:21 2014] I guess in my case I should do releases pretty frequently [Mon Dec 1 18:07:58 2014] is there anything wrong with just having bos run "vos release" for me something like once an hour? [Mon Dec 1 18:20:31 2014] I might ask you why bos instead of, like, cron. [Mon Dec 1 18:20:58 2014] You may be sad if one of them (by chance or whatever) takes longer than an hour to complete. [Mon Dec 1 19:58:49 2014] kaduk_, does everyone just whip up their own release schedule/mechanisms? are there any popular tools I'm missing beyond cron/bos + "vos release"? [Mon Dec 1 20:02:33 2014] dezgot: I don't have data on very many cells. The one cell I help to run has a cron script or two to do nightly releases. [Mon Dec 1 20:07:43 2014] might just go with cron then I guess [Mon Dec 1 20:08:04 2014] need a distributed cron daemon... [Mon Dec 1 20:08:41 2014] why? 
[Mon Dec 1 20:10:01 2014] want my releases to continue even if the machine tasked with running "vos release" stops [Mon Dec 1 20:11:05 2014] I imagine I could run the release on say the server with the RW volume [Mon Dec 1 20:11:29 2014] but then it's kind of administratively annoying [Mon Dec 1 20:13:25 2014] secureendpoints, by "need" I meant, "in a perfect world there might exist" [Mon Dec 1 20:14:01 2014] you can run the release on any computer [Mon Dec 1 20:14:58 2014] right, but with cron I imagine you would pick just one [Mon Dec 1 20:15:22 2014] then if that machine goes down your releases stop [Mon Dec 1 20:16:10 2014] if your fileserver stops then you aren't releasing the volumes either [Mon Dec 1 20:17:08 2014] so a reasonable choice of where to run the "vos release" might be in cron on the fileserver with the RW volume [Mon Dec 1 20:17:40 2014] in a cell with 200 file servers and rw volumes on each of them, which do you choose? [Mon Dec 1 20:18:03 2014] afs is a distributed system. you can run the release anywhere. it doesn't have to be on a single machine. [Mon Dec 1 20:18:27 2014] I certainly understand [Mon Dec 1 20:18:49 2014] you shouldn't run releases just because a clock time was reached. only perform releases if the volume has changed [Mon Dec 1 20:21:46 2014] so generally not via cron then [Mon Dec 1 20:24:29 2014] what is wrong with cron? [Mon Dec 1 20:24:55 2014] you need to script more than vos release [Mon Dec 1 20:27:47 2014] are you just saying that you should check if anything needs to be released before running the actual "vos release" command? 
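A check of the kind being discussed here (release only if the RW volume's "Last Update" is newer than the ".readonly" copy's "Creation", and skip volumes that are locked) might look like the sketch below. It assumes the timestamp layout shown in the comment, which should be checked against real `vos examine` output; running `vos examine` and detecting the lock state are left to the caller, and `field_time`/`should_release` are hypothetical helper names.

```python
from datetime import datetime
from typing import Optional

# Assumed timestamp layout of the "Creation" / "Last Update" lines in
# `vos examine` output, e.g. "Last Update Mon Dec  1 13:05:00 2014".
TIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def field_time(examine_output: str, label: str) -> Optional[datetime]:
    """Return the timestamp on the first line starting with `label`, or None for "Never"."""
    for line in examine_output.splitlines():
        text = line.strip()
        if text.startswith(label + " "):
            value = text[len(label):].strip()
            return None if value == "Never" else datetime.strptime(value, TIME_FORMAT)
    raise KeyError(label)


def should_release(rw_output: str, ro_output: str, locked: bool) -> bool:
    """Release only if the RW volume changed after the current RO copy was made."""
    if locked:
        # Another release (possibly started on another host) holds the lock.
        return False
    last_update = field_time(rw_output, "Last Update")
    ro_created = field_time(ro_output, "Creation")
    if ro_created is None:
        return True
    return last_update is not None and last_update > ro_created
```

Because the decision only reads state, the same script can run from cron on several hosts; as noted in the conversation, one instance may still lose the lock race to another, and clock skew at worst causes an unnecessary release.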
[Mon Dec 1 20:30:26 2014] my "distributed cron" comment was aimed toward ease of administration [Mon Dec 1 20:30:59 2014] if you write your command scripts properly you can execute them on multiple hosts [Mon Dec 1 20:35:44 2014] for example, for each volume to be released, check to see if there has been a change since the last release, if yes and if the volume is not locked, issue a "vos release". Noting that the vos release might fail due to an inability to obtain a lock if this instance of the script races with another instance of the script running on another node. [Mon Dec 1 20:37:50 2014] ah I see [Mon Dec 1 20:38:27 2014] i was assuming vos release would just wait for the vldb entry to be unlocked and then continue [Mon Dec 1 20:40:50 2014] so you could potentially have this single release script which runs on all machines I guess [Mon Dec 1 20:41:32 2014] is there an easy way to tell if there have been changes at the AFS level? or do you have to simply flag/check the filesystem? [Mon Dec 1 20:41:47 2014] vos examine [Mon Dec 1 20:45:11 2014] comparing "Creation" on a .readonly to "Last Update" on the RW volume? [Mon Dec 1 21:01:21 2014] wonder whose clocks these times are based on [Mon Dec 1 21:02:28 2014] even if something is out of sync somehow I guess the penalty is just an unnecessary release [Mon Dec 1 21:06:25 2014] anyway I think I have the right idea of what people do now [Mon Dec 1 21:07:13 2014] I couldn't find any examples on the web so this helps a lot [Tue Dec 2 04:58:50 2014] I'm installing an afs server and when I try to run afs-rootvol it says "Could not fetch the list of partitions from the server - Possible communication failure". I can see in the logs afs: Lost contact with volume location server in cell [Tue Dec 2 04:59:04 2014] Any way to know more about why ? How can it fail to contact itself ? [Tue Dec 2 05:13:41 2014] restarted the server already? [Tue Dec 2 05:16:33 2014] too many open files? Issues in the local stack? 
Server too crowded, service restart? [Tue Dec 2 05:17:43 2014] I tried restarting it, and I'm the only one using the server [Tue Dec 2 05:20:38 2014] does the server listen on the right interface and does bos status show all ok? [Tue Dec 2 05:21:24 2014] bos status -server afs1 doesn't say anything [Tue Dec 2 05:21:39 2014] as for the interface I don't know, didn't configure that [Tue Dec 2 06:12:33 2014] if bos cannot connect, something is wrong, does the dns work fine? [Tue Dec 2 06:12:52 2014] did the server work fine before? [Tue Dec 2 07:26:59 2014] Never worked, I installed it this morning [Tue Dec 2 07:27:15 2014] The dns seems fine, afs1 resolves to the public IP of the server [Tue Dec 2 07:38:17 2014] you did install the openafs server and you already set up your cell? [Tue Dec 2 07:39:02 2014] Yeah, I'm following this : https://www.debian-administration.org/article/610/OpenAFS_installation_on_Debian [Tue Dec 2 07:39:42 2014] I do have bosserver in my process list, and I have AFSIDat and Lock on /vicepa that have been created [Tue Dec 2 07:40:04 2014] I can kinit and aklog fine [Tue Dec 2 07:42:11 2014] so newcell is missing [Tue Dec 2 07:42:34 2014] you did not create an afs cell for your server [Tue Dec 2 07:43:02 2014] I did run afs-newcell [Tue Dec 2 07:43:14 2014] Looked successful, but maybe not [Tue Dec 2 07:43:17 2014] should I re-run it ? [Tue Dec 2 07:45:51 2014] did you restart the openafs server again? [Tue Dec 2 07:46:11 2014] I did, and I even tried rebooting it [Tue Dec 2 07:46:16 2014] several times [Tue Dec 2 07:46:22 2014] what does /var/log/openafs/FileLog tell you? [Tue Dec 2 07:46:31 2014] or boslog?
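While checking the logs suggested here, it can help to confirm that bosserver actually started the fileserver, ptserver and vlserver. The sketch below assumes BosLog lines contain fragments like the `ptserver started pid 29749: /usr/lib/openafs/ptserver` example quoted in this conversation; it takes the log contents as a string rather than assuming a path (on the Debian layout used here the other logs live under /var/log/openafs/), and a demand-attach installation would need `dafileserver` instead of `fileserver` in the required list. The function names are hypothetical.

```python
import os
import re

# Matches fragments such as
#   "ptserver started pid 29749: /usr/lib/openafs/ptserver"
START_RE = re.compile(r"started pid (\d+): (\S+)")


def started_programs(boslog_text: str) -> dict:
    """Map each started program's basename to the most recent pid logged for it."""
    started = {}
    for line in boslog_text.splitlines():
        match = START_RE.search(line)
        if match:
            started[os.path.basename(match.group(2))] = int(match.group(1))
    return started


def missing_servers(boslog_text: str,
                    required=("fileserver", "ptserver", "vlserver")) -> list:
    """Return the required server programs that never appear as started."""
    started = started_programs(boslog_text)
    return [name for name in required if name not in started]
```

An empty result only says the processes were launched; as the rest of this exchange shows, they can still be unreachable because of addressing problems.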
[Tue Dec 2 07:46:44 2014] Partition /vicepa has 1 online volumes [Tue Dec 2 07:46:46 2014] if there is a cell, it should tell you the volumes are online [Tue Dec 2 07:46:51 2014] ok, it is online [Tue Dec 2 07:47:12 2014] so after kinit/aklog as admin on client, you should be able to bos status server [Tue Dec 2 07:47:29 2014] the volume is /afs I guess [Tue Dec 2 07:47:33 2014] It doesn't do anything, it just returns [Tue Dec 2 07:47:42 2014] and echo $? shows 0 [Tue Dec 2 07:47:58 2014] and /var/log/openafs/VLLog tells you the correct address? [Tue Dec 2 07:48:14 2014] Mh yeah [Tue Dec 2 07:48:40 2014] and you did create the three needed servers on the server (fileserver, ptserver, vlserver) ? [Tue Dec 2 07:49:03 2014] I understood that afs-newcell should do that [Tue Dec 2 07:49:13 2014] something like ptserver started pid 29749: /usr/lib/openafs/ptserver should be in boslog [Tue Dec 2 07:49:33 2014] rxdebug server 700[0-9] should give you at least a reply [Tue Dec 2 07:50:28 2014] Mh, bosserver doesn't listen on the correct address [Tue Dec 2 07:50:32 2014] in the log it says 127.0.0.1 [Tue Dec 2 07:50:57 2014] that's not good [Tue Dec 2 07:51:10 2014] that's a reason it does not connect ;-) [Tue Dec 2 07:51:25 2014] go on the hunt for a netrestrict file [Tue Dec 2 07:51:51 2014] I did change that in bosserver.rxbind but looks like it rewrites it [Tue Dec 2 07:51:53 2014] /var/lib/openafs/local/NetRestrict [Tue Dec 2 07:51:57 2014] oh okay [Tue Dec 2 07:52:12 2014] doesn't exist [Tue Dec 2 07:52:26 2014] I'll do a find [Tue Dec 2 07:52:27 2014] create it and write the networks it should not bind to in it [Tue Dec 2 07:53:07 2014] also keep in mind, OpenAFS only does IPv4 [Tue Dec 2 07:53:37 2014] It still binds to 127.0.0.1 [Tue Dec 2 07:56:37 2014] According to https://lists.openafs.org/pipermail/openafs-info/2012-August/038509.html that log doesn't mean it's listening only on lo [Tue Dec 2 07:57:26 2014] I do indeed have that in netstat udp 0 0 0.0.0.0:7007
0.0.0.0:* 4771/bosserver [Tue Dec 2 08:11:26 2014] I think I got it [Tue Dec 2 08:17:48 2014] Yeah okay it works, my hostname wasn't valid. I had afs1 instead of afs1.domain so it looked good but wasn't [Tue Dec 2 08:17:56 2014] My fault then! Thanks for your help! [Tue Dec 2 10:30:53 2014] does anyone have a working setup with k5start and apache together in the centos httpd init script? I have the following http://pastebin.com/Jzme8dR0 and the echo `tokens` shows tokens for the requested uid but apache cannot read the files. This worked with klog and reauth with kas. [Tue Dec 2 10:31:31 2014] uhhhhmmmm, with 'rxkad' is it possible to have *multiple* keys in the keytab? E.g. for the purpose of giving a different key to each fileserver? I understand that would be accidental if it worked... [Tue Dec 2 10:36:34 2014] I have multiple keys in the keytab and it doesn't seem upset, I just have to remove the DES keys [Tue Dec 2 10:44:30 2014] You can have multiple keys in the keytab, but any server which is accepting connections from other servers (e.g., dbservers) must have all the keys which might be in use. [Tue Dec 2 10:44:56 2014] Fileservers need to receive connections from other fileservers in the case of, e.g., VolForward, IIRC (the vos move backend). [Tue Dec 2 11:22:31 2014] Walex2: Isn't this one of the selling points of YFS? Different keys for different servers? [Tue Dec 2 12:00:15 2014] kaduk_: thanks, that's exactly what I hoped [Tue Dec 2 12:01:19 2014] nwf: YFS has a lot of selling points, but we have to make do with base OpenAFS (subject to review). [Tue Dec 2 12:02:14 2014] kaduk_: I also notice that I did not say multiple keys from multiple principals, but I wonder whether that matters for inter-server communications [Tue Dec 2 12:02:38 2014] as opposed to client-server comms. [Tue Dec 2 12:04:56 2014] IIRC with the rxkad_krb5 on 1.6, we just try every key in the keytab, for decryption.
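The point about every server needing every key in use follows from the "try every key" behaviour recalled above: a server can only accept a ticket if some key it holds opens it. The toy model below illustrates that with an HMAC tag standing in for a ticket encrypted under a service key; it is not rxkad's real format or code, and `seal`/`find_key` are hypothetical names.

```python
import hashlib
import hmac
from typing import List, Optional

# Toy model only: an HMAC tag stands in for "ticket encrypted with a service
# key". It illustrates trying every key in a keytab, not rxkad's wire format.


def seal(ticket: bytes, key: bytes) -> bytes:
    return hmac.new(key, ticket, hashlib.sha256).digest()


def find_key(ticket: bytes, tag: bytes, keytab: List[bytes]) -> Optional[int]:
    """Return the index of the first key that verifies `tag`, or None."""
    for index, key in enumerate(keytab):
        if hmac.compare_digest(seal(ticket, key), tag):
            return index
    return None


file1_key, file2_key = b"key-for-file1", b"key-for-file2"
tag = seal(b"ticket", file2_key)
print(find_key(b"ticket", tag, [file1_key, file2_key]))  # server holding every key
print(find_key(b"ticket", tag, [file1_key]))             # server missing file2's key
```

The first call finds the key (index 1) and the second returns None: a dbserver, or a fileserver receiving a volume move, that lacks one fileserver's key cannot accept that fileserver's connections.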
[Tue Dec 2 12:05:29 2014] I also don't think we really look at the principals, since if a key is in rxkad.keytab, it's probably for AFS use. [Tue Dec 2 12:07:20 2014] Hmm, K5Auth is building principal names, though; maybe I misremember [Tue Dec 2 12:09:21 2014] In addition to the k5start question, has anyone seen this "aklog: Cannot allocate memory while converting principal to Kerberos V4 format"? I just updated afs on the machine in question that is running 10.8 with the 10.8 installer. [Tue Dec 2 12:09:23 2014] But maybe just looking at afs/*; I'm very distracted right now [Tue Dec 2 12:10:32 2014] that sounds vaguely like something fixed in 1.11pre? [Tue Dec 2 12:10:43 2014] er [Tue Dec 2 12:10:52 2014] 1.6.11pre [Tue Dec 2 12:13:31 2014] I installed 1.6.5 on Mac [Tue Dec 2 13:33:08 2014] Hi, in trying to track down my problem from yesterday where I had a Windows client that has tokens, but was making unauthenticated calls to the fileserver, I got some fs trace output [Tue Dec 2 13:33:22 2014] However, I don't see anything that looks like it has to do with auth or tokens. [Tue Dec 2 13:33:41 2014] Suggestions on lines that I might look for, or any other solution to my problem? [Tue Dec 2 14:45:03 2014] looks like mac buildslave is caught up [Tue Dec 2 14:45:10 2014] Yup. [Tue Dec 2 15:23:44 2014] Would it be reasonable to get the CellServDB in the 1.6.x (and I presume 1.7.x) release series updated to match what is currently in master/grand.central.org? [Tue Dec 2 15:24:41 2014] It's in the works. [Tue Dec 2 15:24:48 2014] kaduk_: thanks! [Tue Dec 2 15:25:14 2014] we do not update prior releases, only the new releases. [Tue Dec 2 15:25:54 2014] secureendpoints: so that means that 1.6.11 would have it, but not older ones? Yes, that's what I was thinking. [Tue Dec 2 15:26:00 2014] yes [Tue Dec 2 15:26:36 2014] also not that only the packages provided by openafs.org are affected.
downstream distributions will do their own thing [Tue Dec 2 15:26:46 2014] s/not/note [Tue Dec 2 21:55:00 2014] /home/jakllsch/openafs/src/libafs/../rxkad/rxkad_common.c:600:20: error: 'RX_SECTYPE_KAD' undeclared (first use in this function) [Tue Dec 2 21:57:30 2014] Should be RX_SECIDX_KAD, I thought... [Tue Dec 2 21:58:01 2014] well, i'm not entirely sure i've got a clean build environment [Tue Dec 2 21:58:16 2014] Oh, there's [Tue Dec 2 21:58:17 2014] src/rx/rx.h: RX_SECTYPE_KAD = 3 [Tue Dec 2 21:58:35 2014] that is; i did a ./configure and then immediately went back to rebuild libafs without rebuilding the openafs userland first [Tue Dec 2 21:58:49 2014] Oh, right, because that's for the stats, not the security class. [Tue Dec 2 21:59:09 2014] Sigh. (They're off by one, of course.) [Tue Dec 2 21:59:13 2014] i'm just wondering if this is Just Me [Tue Dec 2 22:00:05 2014] what with a libafs Makefile that is probably out of sync and whatnot [Tue Dec 2 22:00:19 2014] Did you make clean in libafs? [Tue Dec 2 22:00:30 2014] not as such; i did a rm *.o [Tue Dec 2 22:01:00 2014] oh well, i'm building the openafs userland now [Tue Dec 2 22:22:39 2014] okay, build succeeded, nevermind :-) [Tue Dec 2 22:27:13 2014] oh joy, two uuid.c files to be compiled into the same uuid.o [Tue Dec 2 22:27:37 2014] or maybe not [Tue Dec 2 23:07:30 2014] yeah, wrong source file :-) [Tue Dec 2 23:32:02 2014] i'm about to turn in for the night; but if anyone has ideas as to why afs_WaitForCacheDrain doesn't get cleared i might be looking at that tomorrow [Wed Dec 3 13:24:12 2014] ha! Someone here has a Kerberos principal name longer than 63 characters... So it cannot be mapped 1-to-1 to an AFS user name. Is there some way to put in the PTS or elsewhere "this v5 principal name maps to this AFS user name"? [Wed Dec 3 13:25:48 2014] There is a protocol extension which OpenAFS does not implement [Wed Dec 3 13:29:02 2014] ha! 
thanks [Wed Dec 3 13:32:51 2014] but shouldn't this be doable just by modifying 'aklog'? Once the AFS token is generated, does OpenAFS use the v5 principal name at all? [Wed Dec 3 13:34:28 2014] the instant case is that the user have a TGT as 'user/a.very.long.fqdn' and they would like to get using that a token for 'user.a.very.long' instead of 'user.a.very.long.fqdn' [Wed Dec 3 13:40:46 2014] The v5 principal name is extracted as the server processes the rxkad response packet [Wed Dec 3 13:54:16 2014] Walex2: aklog doesn't need to know the v5 principal at all. It is the servers that need to know the name so that they can perform a name to id lookup and then use that id to obtain the list of group memberships. The id and group memberships are then used to evaluate ACLs for the purpose of determining the permissions to be granted. [Wed Dec 3 13:58:43 2014] The secondary problem that you will have is that OpenAFS PTS names are Kerberos v4 names. There can only be one period in them. "user.a.very.long" is not a valid v4 name. The extension that kaduk_ is referring to permits the storage of aliases and name forms other than Kerberos v4. AuriStor implements that. [Wed Dec 3 14:00:00 2014] A digression, but did your-file-system.com recently get a redesign and new content? I think I tried looking there to remember the AuriStor name a few weeks ago and came up empty... [Wed Dec 3 14:00:56 2014] if you go to https://www.your-file-system.com/ you cannot miss the name [Wed Dec 3 14:01:06 2014] Right. [Wed Dec 3 14:01:21 2014] I am half-remembering some other website content which did not have that property [Wed Dec 3 14:01:33 2014] the updated web site went live after all of the trademark filings were completed [Wed Dec 3 14:01:41 2014] Ah, that would make sense. [Wed Dec 3 14:02:17 2014] it will get fleshed out more as time goes on. [Wed Dec 3 14:02:58 2014] there will be animations to show how cells works, how replication works, etc. 
[Wed Dec 3 14:46:30 2014] secureendpoints: kaduk_: been afk on the bus...
[Wed Dec 3 14:51:30 2014] secureendpoints: I understand the v4 names (that's why I was pointedly saying v5 here and there). Also that ACLs are in terms of user ids and group ids. But I may be missing something as to where and how the v5 tickets and the v4 tokens are used and the relationship to AFS "v4-like" names... I'll do some more investigation.
[Wed Dec 3 14:55:50 2014] you can think of a token as either a v4 service ticket or a v5 service ticket wrapped with some metadata
[Wed Dec 3 14:56:08 2014] when you are using v5, a v5 service ticket is decoded by the afs servers
[Wed Dec 3 14:57:07 2014] secureendpoints: that's my understanding too...
[Wed Dec 3 14:57:59 2014] secureendpoints: BTW we have a few dozen working PTS user names with 4 dots in them...
[Wed Dec 3 14:58:11 2014] so the client principal name that is received by the afs servers is a v5 principal that can be multi-component and might contain multiple periods.
[Wed Dec 3 14:58:59 2014] but all names in AFS PTDB are v4 names so the v5 names must first be translated to v4. Not all names in the v5 name space are legal in the v4 name space.
[Wed Dec 3 14:59:43 2014] there are not appropriate checks for v4 correctness in all of the places they need to exist.
[Wed Dec 3 15:00:05 2014] it is quite possible they work for some code paths and not others resulting in random failures
[Wed Dec 3 15:00:13 2014] or random success in this case
[Wed Dec 3 15:00:25 2014] let's say we have hard-to-change conventions, and one of them is that service tickets are for principals with the traditional 'user/fqdn' user name pattern, and so far we have created PTDB names that are "user.fqdn". I gather that was an optimistic convention.
[Wed Dec 3 15:01:03 2014] * had raised eyebrows, but "it works" is hard to argue with.
[Wed Dec 3 15:01:31 2014] * does not like "it works" arguments, e.g. about MS-Windows or 'systemd' :-)
[Wed Dec 3 15:03:46 2014] There are sites that use two and three component kerberos v5 names with openafs. they have traditionally used gssklogd or krb524d to perform name translation outside of AFS so that the tokens include tickets that contain the name that PTS can use
[Wed Dec 3 15:04:57 2014] Part of AuriStor's support for these sites is translation rules to permit folding of multi-component names to single component aliases for rxkad. yfs-rxgk doesn't require the folding since it uses GSS names.
[Wed Dec 3 15:05:47 2014] of course you get auditing of the full names
[Wed Dec 3 15:06:55 2014] the bit I am missing may be very delicate, and it is how the PT server or a file server, presented with a token labeled "user1.group1" containing a v5 ticket granted to "user1/group1", knows the two match.
[Wed Dec 3 15:07:34 2014] it converts the v5 name to v4 names via the krb524 convert principal api
[Wed Dec 3 15:08:28 2014] secureendpoints: so the token contains both the v5 principal name and the ticket for the 'afs' or 'afs/$FQDN' domain, I guess
[Wed Dec 3 15:08:49 2014] it is effectively: take the first component and join it to the second component with a period instead of a slash. it should reject components that already contain a period but doesn't.
[Wed Dec 3 15:09:18 2014] there are also special conventions for the conversion based upon the value of the first component. For example, "host" is mapped to "rcmd"
[Wed Dec 3 15:09:41 2014] the token only contains the encrypted v5 service ticket for afs
[Wed Dec 3 15:09:55 2014] the encrypted v5 ticket contains the client's v5 name
[Wed Dec 3 15:12:14 2014] secureendpoints: and the vital detail then is that the PT server does not have a cleint v5 name to 'user.group' v4 name, but derives the v4 name algorithmically from the v5 name it obtains by decrypting the token with 'rxkad.keytab'...
[Wed Dec 3 15:12:43 2014] s/a cleint/a table of client/
[Wed Dec 3 15:13:02 2014] i guess that's my current understanding.
[Wed Dec 3 15:13:10 2014] The file server performs the v5 to v4 translation.
[Wed Dec 3 15:13:22 2014] the ptserver only deals in v4 names
[Wed Dec 3 15:14:10 2014] the only names the ptserver knows about are the names reported by "pts listentries"
[Wed Dec 3 15:14:15 2014] secureendpoints: ahhh so the client-fileserver protocol is not in v4 names or v4 ids. I should have known that.
[Wed Dec 3 15:14:48 2014] there are no names passed in afs rpcs except for managing ids in the ptserver
[Wed Dec 3 15:14:56 2014] names are obtained from tokens
[Wed Dec 3 15:15:04 2014] secureendpoints: thanks, that clears a few uncertainties I had. But one lingers and it is unrelated to this...
[Wed Dec 3 15:15:24 2014] secureendpoints: but the client protocol includes the tokens.
[Wed Dec 3 15:16:20 2014] all Rx connections that are authenticated are protected by a challenge / response exchange. It is via that exchange that Kerberos protocol (or GSS in the case of yfs-rxgk) is used
[Wed Dec 3 15:16:24 2014] the uncertainty I have then is what is the role of numeric ids in AFS. It seems that they are vestigial.
[Wed Dec 3 15:16:44 2014] the IDs are what are stored in the volumes on ACLs
[Wed Dec 3 15:17:11 2014] An access control entry in AFS is ID :=
[Wed Dec 3 15:17:40 2014] ahhhhhhh, so the fileserver contacts the PT server to look up the ids of v4 names.
[Wed Dec 3 15:18:31 2014] not just to look up v4 names to IDs but to obtain the IDs of the group memberships
[Wed Dec 3 15:19:29 2014] BTW the multiple dot issue I guess is not checked because with a proper Kerberos V4 server multiple-dot names are not accepted, so it was redundant to check. But then that means that one can have in the PTDB names that are v5-compatible but v4-incompatible, but we don't care.
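The name to ID to group-membership to ACL chain described in this exchange can be sketched as a toy model. All of the PTDB contents, IDs and ACL entries below are hypothetical; the sketch treats the letters of "rlidwka" as the set of rights, uses negative IDs for groups as AFS does, and lets negative ACL entries remove rights granted by normal entries. `protection_ids` and `effective_rights` are illustrative names, not OpenAFS functions.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical PTDB contents. Group IDs are negative, as in AFS.
NAME_TO_ID: Dict[str, int] = {"alice": 1001, "bob": 1002}
GROUPS_OF: Dict[int, Set[int]] = {1001: {-205}}  # alice is in group -205

# Hypothetical ACL on one directory: (ID, rights) pairs.
ACL_NORMAL: List[Tuple[int, str]] = [(-205, "rlidwka")]
ACL_NEGATIVE: List[Tuple[int, str]] = []

RIGHTS_ORDER = "rlidwka"


def protection_ids(name: str) -> Set[int]:
    """Name -> ID lookup, plus the IDs of every group the user belongs to."""
    uid = NAME_TO_ID[name]
    return {uid} | GROUPS_OF.get(uid, set())


def effective_rights(name: str) -> str:
    """Union of matching normal entries, minus matching negative entries."""
    ids = protection_ids(name)
    granted: Set[str] = set()
    for entry_id, rights in ACL_NORMAL:
        if entry_id in ids:
            granted |= set(rights)
    for entry_id, rights in ACL_NEGATIVE:
        if entry_id in ids:
            granted -= set(rights)
    return "".join(r for r in RIGHTS_ORDER if r in granted)
```

Here `effective_rights("alice")` gives all of "rlidwka" through her group, and `effective_rights("bob")` gives nothing. This sketch recomputes the group list on every call; a real fileserver gets it from the ptserver and caches it, so a membership change may not be seen until that cached copy is refreshed.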
[Wed Dec 3 15:20:10 2014] well, you don't care until you do
[Wed Dec 3 15:20:31 2014] secureendpoints: that's what I always say to people who say "it works" :-)
[Wed Dec 3 15:22:15 2014] I try to follow the RFC: tolerant in what I accept, rigorous in what I make, because as you hint other stuff is not necessarily so tolerant in what they accept...
[Wed Dec 3 15:22:23 2014] Oh well
[Wed Dec 3 15:23:19 2014] the bigger problem is that by permitting multiple periods you have ambiguous mappings of v5 names to v4 names.
[Wed Dec 3 15:23:46 2014] secureendpoints: thanks a lot for clearing up my confusion, as I started using AFS/Kerberos a long time ago in v4 times, then had a hiatus, and I was making wrong tentative assumptions about the transition mechanisms
[Wed Dec 3 15:23:54 2014] That can result in two users accidentally sharing the same PTS ID, or in something an attacker can exploit
[Wed Dec 3 15:25:36 2014] secureendpoints: I was discussing that with my colleagues and I was more worried about the opposite, that the same v4 name with multiple dots can map to more than one v5 name, as the position of the "/" is not obvious, but then it was hard to imagine where that could be a problem if AFS etc. use the actual v5 names.
[Wed Dec 3 15:26:06 2014] I can guess that if we used a convention where the dots in FQDNs were replaced by something other than a dot it would be better.
[Wed Dec 3 15:26:36 2014] Because AuriStor's focus is security and system integrity, its checks for invalid names are stronger. Your site would have a bit of work to do to clean up the invalid names before it could deploy AuriStor protection servers
[Wed Dec 3 15:27:01 2014] you can certainly patch the code to use an alternate mapping
[Wed Dec 3 15:27:24 2014] that is, v5 principals with names like "service/server.example.com" being spelt instead as "service/server_example_com"
[Wed Dec 3 15:27:54 2014] but then some !"£$%^ services enforce literal FQDNs as the first instance name.
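To make the ambiguity concrete, here is a minimal sketch assuming only the naive "join the components with a period" rule described above, with no other checks. The `disableCheckdot` handling in src/rxkad/ticket5.c may reject some of these names and is not modelled. `naive_v4_name`, `strict_v4_name` and `possible_v5_names` are hypothetical helpers.

```python
from typing import List


def naive_v4_name(principal: str) -> str:
    """Join the v5 components with '.', with no check for embedded periods."""
    return ".".join(principal.partition("@")[0].split("/"))


def strict_v4_name(principal: str) -> str:
    """Same join, but refuse any name that cannot be mapped back unambiguously."""
    comps = principal.partition("@")[0].split("/")
    if len(comps) > 2 or any("." in c for c in comps):
        raise ValueError(f"{principal!r} has no unambiguous v4 form")
    return ".".join(comps)


def possible_v5_names(v4_name: str) -> List[str]:
    """Every name/instance split that naive_v4_name could have joined."""
    parts = v4_name.split(".")
    return ["/".join([".".join(parts[:i]), ".".join(parts[i:])])
            for i in range(1, len(parts))]
```

Both "user/a.b@EXAMPLE.COM" and "user.a/b@EXAMPLE.COM" become "user.a.b" under the naive rule, and `possible_v5_names("user.a.b")` lists both candidates, which is the reverse ambiguity raised above. The strict variant rejects the site's 'user/a.very.long.fqdn' convention but accepts the underscore spelling suggested above, e.g. "service/server_example_com".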
[Wed Dec 3 15:27:59 2014] host/server.example.com is mapped to rcmd/server
[Wed Dec 3 15:28:12 2014] there are others
[Wed Dec 3 15:29:22 2014] secureendpoints: that "is mapped", where does it happen? I remember principal-to-account mappings in '/etc/krb5.conf' but I guess that is not relevant here, or is it?
[Wed Dec 3 15:30:22 2014] I don't have the code in front of me but it will be in the src/rxkad directory
[Wed Dec 3 15:31:22 2014] secureendpoints: so that seems to be a hardcoded algorithmic mapping... I may have a look and make a list.
[Wed Dec 3 15:32:17 2014] it's the same mapping that is included in the krb5 libraries
[Wed Dec 3 15:34:44 2014] secureendpoints: as in hardcoded in the krb5 libs or as from 'auth_to_local' rewrite rules? Because if 'auth_to_local' applies to AFS tickets it would be wonderful.
[Wed Dec 3 15:37:54 2014] this bit: http://web.mit.edu/kerberos/krb5-latest/doc/admin/conf_files/krb5_conf.html#realms
[Wed Dec 3 15:38:32 2014] src/rxkad/ticket5.c is the source file where the ticket is decoded, for v5 tickets
[Wed Dec 3 15:38:39 2014] I guess I can try to experiment a bit...
[Wed Dec 3 15:38:46 2014] tkt_DecodeTicket5()
[Wed Dec 3 15:40:40 2014] looking at it "/* Extract realm and principal */" so far
[Wed Dec 3 15:41:18 2014] uhhh "disableCheckdot" but that's for before the "/"
[Wed Dec 3 15:41:36 2014] krb5 auth_to_local is not used for afs. It's the hard-coded krb5_524_convert_principal rules
[Wed Dec 3 15:42:08 2014] secureendpoints: ahhh oh well, a quick-n-easy hack dream disappears...
[Wed Dec 3 15:43:20 2014] what you want is the functionality that AuriStor provides. I understand that you can't have that at the moment. but it is what you want
[Wed Dec 3 15:44:40 2014] uh still looking at 'ticket5.c' but a bit too tired to follow the details e.g. "and chop off instance's domain name if requested.".
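A rough sketch of the conversion described in this exchange, under stated assumptions. Only the "host" to "rcmd" rule comes from the discussion (the real krb5_524_convert_principal table has more entries: "there are others"). The realm is simply dropped, as if every principal were in the local realm. The 63-character limit is the one mentioned earlier for AFS user names. `v5_to_v4` is a hypothetical name and this is not the ticket5.c code.

```python
# Partial, illustrative table: only host -> rcmd is taken from the
# discussion; the real krb524 conversion table has more entries.
SERVICE_RENAMES = {"host": "rcmd"}
MAX_PTS_NAME = 63  # longest AFS user name, per the earlier discussion


def v5_to_v4(principal: str) -> str:
    """Map 'name[/instance]@REALM' to a dotted PTS-style name (realm dropped)."""
    name_part = principal.partition("@")[0]  # assumes the local realm
    comps = name_part.split("/")
    if len(comps) == 1:
        v4 = comps[0]
    elif len(comps) == 2:
        name, instance = comps
        if name in SERVICE_RENAMES:
            # Service principal: rename it and chop the instance's domain.
            name = SERVICE_RENAMES[name]
            instance = instance.split(".", 1)[0]
        # As described above, nothing rejects periods already present
        # in `name` or `instance`.
        v4 = f"{name}.{instance}"
    else:
        raise ValueError(f"{principal!r}: v4 names have at most two components")
    if len(v4) > MAX_PTS_NAME:
        raise ValueError(f"{v4!r} is longer than {MAX_PTS_NAME} characters")
    return v4
```

Under these assumptions "host/server.example.com@EXAMPLE.COM" becomes "rcmd.server" (the "rcmd/server" from the chat, written in dotted PTS form), while "user/a.very.long.fqdn@EXAMPLE.COM" passes straight through as "user.a.very.long.fqdn" even though it is not a valid v4 name.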
[Wed Dec 3 15:46:53 2014] secureendpoints: we can make do with unadorned OpenAFS 1.6 for now because it is a medium-size but essentially internal-only, that is "contained", cell. So we can change things to suit ourselves. But we have a bit of a Debian-like culture in which every change in conventions, however bizarre they be, is suspect and has to be discussed in fine detail by very "academic" and bright people :-)
[Wed Dec 3 15:48:06 2014] and mild trolling is an appreciated and cultivated art form :-)
[Wed Dec 3 15:48:48 2014] I guess some people here may be familiar with such environments :-)
[Wed Dec 3 15:49:43 2014] I am very familiar with such environments
[Wed Dec 3 15:50:36 2014] :-) :-) :-)
[Wed Dec 3 15:50:39 2014] Of course the YFS staff is equally bright and our solutions tend to make security-minded folks drool
[Wed Dec 3 15:51:34 2014] I think we are off topic enough for this channel. contact me privately when you want to know more
[Wed Dec 3 15:51:40 2014] secureendpoints: we have been discussing YFS of course, and we drool over it, but not yet. We hope eventually to offer a wider service.
[Wed Dec 3 15:52:14 2014] speaking of drool. I need to take care of my dog. back later
[Wed Dec 3 15:53:05 2014] ok, thanks a lot, I am about to go to sleep too. I am happy for the explanation above, I'll try to write it up somewhere public like my filesystems page and AFS wiki (behind on the latter).
[Thu Dec 4 12:09:29 2014] hi guys
[Thu Dec 4 12:11:13 2014] I should be installing on centos 7 fairly soon, and they don't have an rpm in their repos.
[Thu Dec 4 12:11:29 2014] So I'm going to install: openafs-release-rhel-1.6.10-1.noarch.rpm.
[Thu Dec 4 12:11:51 2014] Just wondering if anyone has any experience with this, any hints, caveats, etc.
[Thu Dec 4 12:12:38 2014] openafs.org does not provide rhel7/centos7 rpms; even if that package installs it won't give you a usable repo
[Thu Dec 4 12:13:41 2014] geekosaur: I may have got the wrong rpm. Is that the repo install rpm, not the source for openafs?
[Thu Dec 4 12:13:59 2014] that is the repo install rpm, yes
[Thu Dec 4 12:14:03 2014] geekosaur: And, maybe I should ask what's the best way of installing openafs on centos 7.
[Thu Dec 4 12:14:37 2014] afaik http://openafs.org/dl/openafs/1.6.10/openafs-1.6.10-2.src.rpm
[Thu Dec 4 12:22:06 2014] geekosaur: Not that it matters, but will openafs provide rpms for rhel7/centos7 in the future?
[Thu Dec 4 12:26:08 2014] no
[Thu Dec 4 12:47:23 2014] Ok, have those files/archives in /root/rpmbuild/SOURCES, any guide for building from there?
[Thu Dec 4 12:50:58 2014] should just be rpmbuild -ba /root/rpmbuild/SPECS/openafs.spec
[Thu Dec 4 12:52:02 2014] geekosaur: I'll redo the extraction as though. The folks in #centos chided me for doing it as root.
[Thu Dec 4 12:52:28 2014] yep
[Thu Dec 4 12:54:50 2014] geekosaur: For now and future reference, I've got /home/ To execute the rpmbuild -ba, should I just be at $HOME and execute it with the path?
[Thu Dec 4 12:55:54 2014] either way, I usually do it from ~ when just building and from SOURCES when debugging/testing stuff
[Thu Dec 4 12:56:27 2014] am I building or debugging :-) Hopefully the former, eh?
[Thu Dec 4 12:56:39 2014] hopefully
[Thu Dec 4 13:23:03 2014] geekosaur: Not so easy. Getting errors about unmet dependencies, which isn't accurate.
[Thu Dec 4 14:15:37 2014] Apparently that src.rpm was for a much older kernel.
[Thu Dec 4 14:19:31 2014] if so then you need a 1.6.11pre
[Thu Dec 4 14:19:39 2014] note "pre"
[Thu Dec 4 14:23:18 2014] geekosaur: Where's that hiding? I was just on openafs.org and didn't see it.
[Thu Dec 4 14:24:28 2014] There's an announcement at https://lists.openafs.org/pipermail/openafs-announce/2014/000476.html
[Thu Dec 4 14:26:00 2014] kaduk_: Thanks. Maybe I'll give the `pre` a try.
[Fri Dec 5 12:29:21 2014] can i use vos changeaddr -oldaddr < > -newaddr < > after a server renumbering followed by a syncvldb hosed my vldb?
[Fri Dec 5 12:29:48 2014] I'm pretty sure that "don't use vos changeaddr" is still true.
[Fri Dec 5 12:30:08 2014] Have you restarted the fileserver bnode on the affected address?
[Fri Dec 5 12:34:07 2014] kaduk_: yes, for a while the dnscache was poisoned with both the old and new addrs coexisting
[Fri Dec 5 12:34:37 2014] kaduk_: i also deleted /var/openafs/sysid and restarted fs bnode
[Fri Dec 5 12:34:43 2014] So, when the fileserver starts up, it attempts to register the addresses for itself with the vldb.
[Fri Dec 5 12:35:06 2014] deleting the sysid may not have been a good idea, though this is not exactly my area of expertise.
[Fri Dec 5 12:35:07 2014] But deleting sysid will mean that it's now registering with a new UUID, so it won't move the addresses from the old fileserver to itself
[Fri Dec 5 12:35:23 2014] Right.
[Fri Dec 5 12:35:31 2014] kaduk_: i am not sure... vos listaddrs shows both the old/wrong IP and the new qualified address
[Fri Dec 5 12:35:35 2014] So, you probably just want vos changeaddr -remove for the old addresses, I think.
[Fri Dec 5 12:35:39 2014] And doing the syncvldb before registering the new addresses may mean that non-MH entries are now registered for that fileserver
[Fri Dec 5 12:36:42 2014] vos syncvldb shows RW and RO entries still assigned to the old IP, with one new RO assigned to the new IP
[Fri Dec 5 12:37:06 2014] sxw: what is non-MH?
[Fri Dec 5 12:37:13 2014] "old format"
[Fri Dec 5 12:37:40 2014] non-multi-homed
[Fri Dec 5 12:37:59 2014] There are two different types of address entries in the vldb - old ones that correspond to non-multihomed fileservers, and new (MH) ones that are registered for multi-homed fileservers. MH fileservers also have UUIDs, which is what you'll find in the sysid file.
[Fri Dec 5 12:38:06 2014] FWIW, these are old /vicep partitions on freebsd-10.1-RELEASE with fresh openafs
[Fri Dec 5 12:39:18 2014] vos listaddrs -printuuid shows a unique UUID for each of the two IP addresses
[Fri Dec 5 12:39:30 2014] That's as expected, since you deleted the sysid file.
[Fri Dec 5 12:40:18 2014] should I multihome the fileserver to include both IP addrs and then use vos move -old IPold -new IPnew?
[Fri Dec 5 12:41:06 2014] The easiest way to fix this would be to restore the old sysid file.
[Fri Dec 5 12:41:52 2014] sxw: i do not have the sysid file from before the IP address change
[Fri Dec 5 12:42:50 2014] by "old" we are referring to IBM AFS 3.3 and earlier
[Fri Dec 5 12:43:17 2014] So the situation you have at the moment is that you have a fileserver (old-UUID) on which the vldb thinks all of the volumes are still resident. And you have a new fileserver (new-UUID) on which the volumes are really present.
[Fri Dec 5 12:43:52 2014] The fact that these are actually the same machine is irrelevant - the changed sysid means that to OpenAFS they are completely different servers.
[Fri Dec 5 12:47:04 2014] the sysid file could be recreated from the UUID returned from the VLDB.
[Fri Dec 5 12:47:35 2014] secureendpoints: I was too chicken to actually look at the code and see how annoying that would be, before suggesting it ;)
[Fri Dec 5 12:49:28 2014] sxw: that is the situation
[Fri Dec 5 12:51:37 2014] can i blow away the contents of /var/openafs/db and rebuild with vos syncvldb?
[Fri Dec 5 13:52:04 2014] OK. vos changeloc
[Fri Dec 5 14:08:25 2014] I have a user in a group and the group has full acls on a directory, but the user cannot cd into the directory; shouldn't she be able to?
[Fri Dec 5 14:08:50 2014] "usually".
[Fri Dec 5 14:09:03 2014] Does she have a valid token? Has the group membership changed recently?
[Fri Dec 5 14:09:12 2014] yes to both
[Fri Dec 5 14:09:17 2014] 'aklog -force' may help, as the fileserver caches acls.
[Fri Dec 5 14:10:18 2014] ok, will try that
[Fri Dec 5 14:40:27 2014] yeah, it seems to have been a cache issue, thanks
[Fri Dec 5 15:31:46 2014] cache or check?
[Fri Dec 5 15:53:39 2014] cache it seems
[Fri Dec 5 18:52:03 2014] so, i'm being told that osi_vnhold() and it's aliases can't work on NetBSD
[Fri Dec 5 18:52:18 2014] *its
[Fri Dec 5 18:53:15 2014] Told by whom?
[Fri Dec 5 18:54:41 2014] netbsd people who've supposedly looked at it
[Fri Dec 5 18:54:59 2014] Hmm.
[Fri Dec 5 18:55:06 2014] Are they more specific about what "can't work" means?
[Fri Dec 5 18:55:07 2014] the trouble is apparently that vget() can fail
[Fri Dec 5 18:55:16 2014] Oh. Uh.
[Fri Dec 5 18:56:27 2014] it looks like some sort of similar issue came about for darwin 8
[Fri Dec 5 18:59:25 2014] anyway, there are a lot of places that not having an osi_vnhold() breaks; and i have no idea how i can fix them all. and i'm not even sure if this is actually related to the afs stalling issue i see with my current code
[Fri Dec 5 19:00:32 2014] basically the issue is that i run out of vcache slots when using a 64MiB memcache
[Fri Dec 5 19:01:12 2014] but when i use a 7-ish GB disk cache i end up deadlocking the kernel vnode system somehow
[Fri Dec 5 19:02:21 2014] the code that fails these ways is at https://github.com/jakllsch/openafs/tree/WIP if anyone is curious
[Fri Dec 5 19:03:56 2014] (i've since started another branch looking at replacing osi_vnhold() for netbsd)
[Fri Dec 5 19:04:06 2014] (but i've not even got that compiling yet)
[Fri Dec 5 19:06:16 2014] and i'm not sure why vcache slots aren't becoming available and i don't have a very good clue as to where to look
[Fri Dec 5 19:07:19 2014] and i'm still not sure how vnode use counts and hold counts are different
[Fri Dec 5 19:14:26 2014] I could possibly say something about the latter, but not right now (dinnertime)
[Fri Dec 5 19:15:49 2014] Though, a lot of it could probably be gleaned from the links at the bottom of https://lists.freebsd.org/pipermail/freebsd-current/2014-November/053510.html
[Fri Dec 5 19:20:29 2014] Perhaps these people are just talking about the DOOMED state?
[Fri Dec 5 19:37:09 2014] So, the refcount is for people who are accessing the contents of the vnode (but they need to take locks for active access like reads and writes), and the usecount is more for administrative bookkeeping, like being on the free list or such. I think.
[Fri Dec 5 19:44:04 2014] I guess they're talking about http://fxr.watson.org/fxr/source/kern/vfs_vnode.c?v=NETBSD#L555
[Fri Dec 5 19:45:14 2014] so, the thing is; i have something that sort of works; and i'm not sure if that means anything
[Fri Dec 5 19:45:16 2014] (FreeBSD's vget will also return an error if vn_lock() fails, but that shouldn't really happen)
[Fri Dec 5 19:45:27 2014] It probably doesn't mean very much.
[Fri Dec 5 19:48:28 2014] what sort of VFS system does OpenAFS tend to assume? solaris-like?
[Fri Dec 5 19:49:04 2014] I think so, but I am too new to really be authoritative.
[Fri Dec 5 19:50:13 2014] starting to wonder if i shouldn't go back to trying to get libuafs working
[Fri Dec 5 19:51:00 2014] "It builds with libtool, now"
[Fri Dec 5 19:52:05 2014] last time i tried libuafs (maybe 4-6 years ago) i ended up discovering a giant ifdef mess and that a lot of the structures it expected to use were kernel-internal
[Fri Dec 5 19:52:33 2014] That sounds about right. :(
[Fri Dec 5 20:05:53 2014] I think you may want to just use vref instead of vget
[Fri Dec 5 20:08:27 2014] (like FreeBSD does)
[Fri Dec 5 20:09:34 2014] Perhaps you need something slightly more complicated, but I don't think vget is quite what is needed for osi_vnhold.
[Fri Dec 5 21:31:05 2014] well, using vref doesn't make it worse
[Fri Dec 5 21:48:58 2014] at least in the memcache case. disk cache case is a little bit worse.
[Fri Dec 5 21:49:36 2014] Oh, right, disk cache. There may be an easy thing.
[Fri Dec 5 21:51:17 2014] oh; any idea if memcache or disk cache is a better initial testing setup?
[Fri Dec 5 21:52:14 2014] memcache has fewer moving parts, and is probably no longer so unloved as to be randomly broken, so I would probably start with memcache.
[Fri Dec 5 21:54:35 2014] as it stands; i've been able to get hours out of disk cache doing things like git gc of the openafs tree and cvs updates of a few netbsd trees; but git gc on openafs under memcache always hangs at the 81% mark when writing out files
[Fri Dec 5 21:55:28 2014] seems it's run out of cache slots when it stalls
[Fri Dec 5 21:55:30 2014] Huh, git gc tends to crash my freebsd memcache client.
[Fri Dec 5 21:55:35 2014] Maybe I should try disk cache :)
[Fri Dec 5 21:55:55 2014] Anyway, I was just checking for LOCKLEAF in gop_lookupname, but it seems that you already do the right thing.
[Fri Dec 5 21:56:07 2014] the disk cache case ended up with total system deadlock instead of just afs deadlock
[Fri Dec 5 21:56:29 2014] but who knows if that'd happen on freebsd
[Fri Dec 5 21:58:41 2014] Yeah, the disk cache has more locks to deadlock with.
[Fri Dec 5 21:59:19 2014] I can "fairly easily" get the freebsd memcache client stuck in a state where it seems to be waiting for a data packet to show up (if I remember and did the analysis correctly, which is perhaps questionable).
[Fri Dec 5 21:59:46 2014] So I was toying with writing an upcall receive stack instead of relying on the listener, but didn't get terribly far.
[Fri Dec 5 22:02:09 2014] i seem to recall a similar state in the netbsd port a while back. i'm not sure it's gone, but i've not noticed it recently.
[Fri Dec 5 22:02:52 2014] anyway, i guess my plan for avoiding systemd isn't going to be any easier than i'd hoped
[Fri Dec 5 22:03:17 2014] and of course smf is just as bad as systemd
[Fri Dec 5 22:04:51 2014] guess it's time to quit banging my head against this wall tonight. thanks for the attention.
[Sat Dec 6 16:52:08 2014] vos changeloc was helpful after an OS upgrade led to a botched vldb and sysid
[Mon Dec 8 01:28:44 2014] exit
[Mon Dec 8 01:45:53 2014] Hi all,
[Mon Dec 8 01:46:29 2014] I'm new in this channel and new to OpenAFS.
[Mon Dec 8 01:47:00 2014] Currently I'm working on a test at work to see if OpenAFS can work for us.
[Mon Dec 8 01:47:31 2014] Since this is my first "decent" install ever of OpenAFS I'm running into trouble every once in a while :-)
[Mon Dec 8 01:47:55 2014] Now I'm having a strange thing:
[Mon Dec 8 01:48:33 2014] I'm running my servers (1 file/vldb server, one file server and one vldb server) on CentOS 6.6
[Mon Dec 8 01:49:04 2014] The clients for the moment are Mac OS X Yosemite and will be CentOS as well
[Mon Dec 8 01:49:53 2014] I have been trying to work with replicated volumes but somehow it does not work properly.
[Mon Dec 8 01:50:14 2014] I think it has to do with the way the root.afs volume is mounted.
[Mon Dec 8 01:50:50 2014] If I go to /afs/mycell.org/whatever/directory/ it is always RW mounted even though it is not the dotted path.
[Mon Dec 8 01:51:30 2014] I have checked all the volumes in the parent mount path, they are all mounted RO and replicated. Except one, the one on /afs
[Mon Dec 8 01:52:25 2014] I can't seem to get the ACLs nor get the quota of it. When I try to list them, I get "Connection timed out"
[Mon Dec 8 01:52:56 2014] I think it has something to do with dynroot being on or off, but I'm not sure.
[Mon Dec 8 01:55:09 2014] I have made a similar setup at home but with Debian as a base OS. While setting up my own cell, I tried to replicate the behaviour described above and I could actually list ACLs and stuff from /afs
[Mon Dec 8 01:55:33 2014] Now as I'm writing this (to double check), I can't do it anymore?
[Mon Dec 8 01:55:47 2014] Anyone have an idea why, what and where?
[Mon Dec 8 01:55:51 2014] Thanks in advance!
[Mon Dec 8 05:07:22 2014] bucovaina: that's pretty vague...
[Mon Dec 8 05:07:55 2014] bucovaina: usual suggestions: logs, check carefully the paths with 'fs listquota' etc.
[Mon Dec 8 05:08:30 2014] bucovaina: and look at using 'rxdebug' and 'udebug' to verify you can connect to the right places
[Mon Dec 8 05:08:59 2014] bucovaina: but in particular check that the mount points use the right volume names and ids.
[Mon Dec 8 05:10:07 2014] bucovaina: if you have 'dynroot' check the DNS entries, and also note that multihoming and NAT can cause complications.
[Mon Dec 8 07:51:01 2014] hello, could someone explain to me how the remote filesystem is merged into the logical filesystem tree in afs?
[Mon Dec 8 08:07:24 2014] bucovaina: it's also worth noting that you can explicitly mount the root.afs volume somewhere else in the hierarchy, if your client is running with -dynroot. The subcommand is "fs mkmount".
[Mon Dec 8 08:08:50 2014] timelag_: that's more the responsibility of the kernel's VFS layer than AFS itself -- AFS just provides normal VFS-layer routines such as to get the root vnode of the AFS namespace, do lookup operations, read/write, etc.
[Mon Dec 8 09:20:42 2014] Hi, where is the configuration file for the fileserver process? The man page says that you can pass it a lot of options but I don't start it by hand, so I guess there is some kind of configuration file somewhere to configure the passed options?
[Mon Dec 8 09:34:13 2014] BosConfig, which you do not normally edit directly (if you do while bosserver is running, it will be overwritten). instead: `bos status localhost -long` to see the instance definitions, and use `bos delete` and `bos create` to remove an existing one and create a new one with different parameters
[Mon Dec 8 09:42:53 2014] Mh yeah, it works, I have an audit log
[Mon Dec 8 09:43:39 2014] But it doesn't give the full path of the created or modified file, so that won't do
[Mon Dec 8 09:45:27 2014] And as expected inotify doesn't work
[Mon Dec 8 09:45:44 2014] Well that's unfortunate, it has been a lot of effort for nothing :(
[Mon Dec 8 11:28:54 2014] hi guys
[Mon Dec 8 11:30:13 2014] Not sure what happened to my log, it seems truncated, but could someone post the url that, I think, billings posted late last week, for where he had openafs rpms for centos 7?
[Mon Dec 8 11:30:24 2014] That would be great.
[Mon Dec 8 11:36:16 2014] not billings and not in irc, afaict; perhaps you meant https://lists.openafs.org/pipermail/openafs-info/2014-October/041078.html
[Mon Dec 8 11:41:45 2014] geekosaur: Ok, I'm gonna check in kerberos too, but I'll check your reference. Thanks!
[Mon Dec 8 11:47:37 2014] geekosaur: Would I need to install *all* of the rpms there? (Unless I don't want server, i.e.)
[Mon Dec 8 11:49:51 2014] FWIW vos changeloc was the right way to correct the problem of changing my fileserver IP with a lagging DNS change for the fileserver's name
[Mon Dec 8 11:52:01 2014] openafs-client, openafs-krb5, either the appropriate kmod-openafs for your kernel or openafs-dkms at minimum
[Mon Dec 8 11:53:56 2014] geekosaur: Sounds good. I also am looking at this: http://www.dartmouth.edu/comp/soft-comp/datastorage/afs/afs-linux.html
[Mon Dec 8 11:54:03 2014] Does that look decent?
[Mon Dec 8 11:55:34 2014] geekosaur: In getting the rpms, any thought on using scp vs. rsync to copy them over to centos?
[Mon Dec 8 11:55:35 2014] I would not include openafs-kpasswd unless you're using kaserver [Mon Dec 8 12:00:49 2014] geekosaur: If you look at the kmod rpms, there's an oddity: the most recent appears to be an older kernel version, where the last listed kmod-openafs-1.6.10-2.3.10.0_123.el7.x86_64.rpm, is my kernel version. [Mon Dec 8 12:01:07 2014] Would you go with the last listed then? [Mon Dec 8 12:01:51 2014] so they have not built new ones for newer kernels. you'll have to ask squinney, and be aware that you may need to get a 1.6.11pre for sufficiently new kernels [Mon Dec 8 12:02:38 2014] maybe look in /afs/inf.ed.ac.uk/group/afsbuild/1.6.11pre1/rhel7/x86_64 [Mon Dec 8 12:03:01 2014] geekosaur: My kernel is: 3.10.0-123.el7.x86_64. So wouldn't the last kmod be correct? [Mon Dec 8 12:03:07 2014] also I feel like I'm repeating past discussions with you. yes, it'd be nice if the linux kernel didn't randomly change kernel interfaces [Mon Dec 8 12:03:38 2014] geekosaur: Sorry. But repetition is the mother of learning, eh? [Mon Dec 8 12:03:58 2014] 3.10.0 is not 2.3.10 [Mon Dec 8 12:04:17 2014] the directory I just pointed you at has 3.10.0 kernel modules [Mon Dec 8 12:05:07 2014] geekosaur: In that directory, it looks to me like that -2 goes with the kmod, not the kernel version, which is the same as mine. No? [Mon Dec 8 12:06:15 2014] why would the kmod for openafs have a completely different version number from openafs itself? [Mon Dec 8 12:06:17 2014] billings: Ah, there you are! Would you mind reposting the repo you recommended for rpms for centos7? [Mon Dec 8 12:06:25 2014] 1.6.10 or 1.6.11pre1 is the openafs version [Mon Dec 8 12:06:38 2014] 2.3.10 or 3.10.0 is the kernel version [Mon Dec 8 12:07:13 2014] -rw-r--r-- 1 28139 28139 347992 Oct 10 07:38 kmod-openafs-1.6.10-2.3.10.0_123.el7.x86_64.rpm. This is the listing I get. [Mon Dec 8 12:07:37 2014] So, openafs version 1.6.10-2, and the kernel the same version as mine. 
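The disagreement above is about how to read `kmod-openafs-1.6.10-2.3.10.0_123.el7.x86_64.rpm`. Standard RPM file names follow `name-version-release.arch.rpm`. The sketch below also assumes that this repository encodes the release as `<package release>.<kernel version with '-' replaced by '_'>`. That convention is inferred from this one file name and is not a documented rule. The sketch splits the name accordingly and compares the result with a `uname -r` string.

```python
import re

# name-version-release.arch.rpm; the name itself may contain hyphens
RPM_RE = re.compile(
    r"^(?P<name>.+)-(?P<version>[^-]+)-(?P<release>[^-]+)\.(?P<arch>[^.]+)\.rpm$"
)


def parse_kmod_rpm(filename: str) -> dict:
    m = RPM_RE.match(filename)
    if m is None:
        raise ValueError(f"not an RPM file name: {filename}")
    parts = m.groupdict()
    # Assumed convention: "<pkg release>.<kernel version, '-' -> '_'>"
    pkg_release, _, kernel = parts["release"].partition(".")
    parts["pkg_release"] = pkg_release
    parts["kernel"] = kernel.replace("_", "-")
    return parts


def matches_running_kernel(filename: str, uname_r: str) -> bool:
    """uname -r on EL7 looks like '3.10.0-123.el7.x86_64'."""
    info = parse_kmod_rpm(filename)
    return uname_r == f"{info['kernel']}.{info['arch']}"


if __name__ == "__main__":
    rpm = "kmod-openafs-1.6.10-2.3.10.0_123.el7.x86_64.rpm"
    print(parse_kmod_rpm(rpm))
    print(matches_running_kernel(rpm, "3.10.0-123.el7.x86_64"))
```

Under that reading, OpenAFS is 1.6.10, the package release is 2, and the kernel is 3.10.0-123.el7. A match only says the module was built for that kernel. Whether it actually loads is still up to the kernel.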
[Mon Dec 8 12:08:13 2014] geekosaur: Are we looking at the same place? [Mon Dec 8 12:08:19 2014] maybe I am misparsing [Mon Dec 8 12:08:44 2014] geekosaur: It is kind of a weird looking file name, but that's how I read/parse it. [Mon Dec 8 12:08:45 2014] let's put it this way, you can install it and see if it works [Mon Dec 8 12:08:56 2014] if the kernel refuses to load it then you lose [Mon Dec 8 12:09:49 2014] linux doesn't play particularly nice with kernel interfaces; in general you're better off using dkms-openafs [Mon Dec 8 12:12:29 2014] geekosaur: So, it's using either the dkms *or* the kmod, and you'd recommend the dkms? [Mon Dec 8 12:13:30 2014] yes, because if you use the kmod then you're screwed if you install a new kernel and the old kmod won't work against it until someone updates your rpm/yum repo with a build for the new kernel [Mon Dec 8 12:13:47 2014] with dkms you at least have a fighting chance of building a new module for the new kernel [Mon Dec 8 12:14:21 2014] geekosaur: Ok, thanks for your help. I'm gonna install and get the openafs client working. [Mon Dec 8 12:14:31 2014] geekosaur: Will let you know if I succeed. [Mon Dec 8 12:15:10 2014] (dkms will build the new module during boot, if it fails then the openafs client won't start) [Mon Dec 8 12:16:12 2014] Doesn't look like the order of installation is important, so I'll just do: yum install openafs*.rpm dkms*.rpm, then reboot. [Mon Dec 8 12:44:02 2014] geekosaur: Bummer, it appears that the openafs-client from that source requires the kmod pkg, so was built with it, I guess. [Mon Dec 8 12:48:32 2014] geekosaur: I worked around it. I also installed just openafs, then the requirements were met for the rest. [Mon Dec 8 12:49:11 2014] Good thing I didn't try to install openafs-kpasswd. 
[Mon Dec 8 12:49:16 2014] file /usr/bin/kpasswd from install of openafs-kpasswd-1.6.10-2.el7.x86_64 conflicts with file from package krb5-workstation-1.11.3-49.el7.x86_64 [Mon Dec 8 12:50:56 2014] [08 16:55] I would not include openafs-kpasswd unless you're using kase [Mon Dec 8 12:51:15 2014] *kaserver [Mon Dec 8 12:52:43 2014] Right, conflicting instructions, but I'm *not* installing it. [Mon Dec 8 12:52:55 2014] geekosaur: Not conflicting from you, however. [Mon Dec 8 12:53:11 2014] then their build differs from the one you would normally get [Mon Dec 8 12:53:28 2014] because it's unbundled into its own package for a reason and it should not be pulled in by a dependency [Mon Dec 8 12:53:49 2014] which suggests you should be building your own [Mon Dec 8 12:54:07 2014] which you seem to be doing anything you can think of to avoid [Mon Dec 8 12:55:38 2014] geekosaur: Maybe, but that's from some chiding I got some years ago. I almost *always* built from source, and was pretty roundly criticized for it. I think I may have been mainly on fedora at the time. [Mon Dec 8 12:56:15 2014] But it was, don't build from source when there are pkgs available for your os! [Mon Dec 8 12:56:30 2014] which there are NOT [Mon Dec 8 12:56:46 2014] except occasionally people put up their packages built for their specific environments [Mon Dec 8 12:56:55 2014] wich you seem to think are guaranteed drop-ins [Mon Dec 8 12:58:55 2014] geekosaur: I'm not quite that naive :-) But that did affect the way I first try to do things any more. [Mon Dec 8 12:59:59 2014] There's a little bit of bias against openafs in some circles. With my gentoo boxes, seems like there's always issues, because the devs don't keep up with the latest versions/patches. [Mon Dec 8 13:00:22 2014] And when I ask about it, it's like, why would we maintain what hardly anyone uses? [Mon Dec 8 13:00:49 2014] debians's not too bad. But I think we're a few versions behind. 
[Mon Dec 8 13:00:50 2014] So pick a platform that supports the applications you require [Mon Dec 8 13:01:00 2014] like Debian [Mon Dec 8 13:01:01 2014] secureendpoints: Right. [Mon Dec 8 13:01:07 2014] or Scientific Linux [Mon Dec 8 13:01:32 2014] secureendpoints: debian is my openafs-server. [Mon Dec 8 13:01:41 2014] or purchase a support contract with a provider to ensure that you have the appropriate packaging [Mon Dec 8 13:01:56 2014] I'm gonna give SL a try in a vm on this centos box. [Mon Dec 8 13:02:07 2014] I have heard good things about it. [Mon Dec 8 13:02:35 2014] secureendpoints: Do have a preference for Debian vs. SL, or vice-verse? [Mon Dec 8 13:02:50 2014] the fact is that tracking Linux kernels (which are custom per Linux distribution) and developing packaging for each Linux distribution is extremely time consuming and therefore expensive. [Mon Dec 8 13:05:01 2014] I prefer Solaris [Mon Dec 8 13:05:41 2014] and Windows and OSX and other operating systems that don't break third party file systems every other week [Mon Dec 8 13:08:11 2014] secureendpoints: True, my win 7 openafs clients are quite stable. [Mon Dec 8 13:14:42 2014] up, running and in my cell! Thanks so much for the help guys! [Mon Dec 8 13:30:08 2014] johnfg: I"m working on getting openafs and openafs-kmod building in CentOS7's Storage SIG [Mon Dec 8 13:30:51 2014] I still don't have bits to add to their build system though [Mon Dec 8 13:31:31 2014] billings: Ok, I'll look for your note about it. Thankfully the rpms that geekosaur recommended have me working. [Mon Dec 8 13:31:52 2014] from this centos openafs-client [Mon Dec 8 13:33:32 2014] yeh, I lost track of the number of times some kernel upgrade on centos6 broke me even with openafs-dkms until the next openafs release. 
good thing I have a Mac to fall back on when Linux is stupid again [Mon Dec 8 13:34:08 2014] (and this is why I really want openafs on freebsd; *they* have the concepts of stable abi and release engineering) [Mon Dec 8 13:38:51 2014] geekosaur: So, is openafs not on/available for freebsd? [Mon Dec 8 13:39:05 2014] I ran bsdos, back in the day. [Mon Dec 8 13:39:23 2014] But that was before I was using openafs. [Mon Dec 8 13:39:49 2014] it's "available" but not particularly stable; kaduk has been working on it in his "copious free time" (hah) [Mon Dec 8 13:59:28 2014] FWIW, I've not found the centos-6 kernel updates to be too broken lately [Mon Dec 8 13:59:50 2014] but I also tend to follow the latest openafs releases, so I don't see problems with older versions of openafs on newer centos kernels [Mon Dec 8 14:43:37 2014] OpenAFS servers on freebsd should be just fine; it's only the client that is not ready for heavy usage. [Mon Dec 8 16:23:50 2014] * added a dkms-openafs subpackage to his openafs-kmod spec file [Mon Dec 8 16:23:50 2014] https://copr.fedoraproject.org/coprs/jsbillings/openafs-kmod/ [Mon Dec 8 16:24:07 2014] (to be used along side https://copr.fedoraproject.org/coprs/jsbillings/openafs/ ) [Mon Dec 8 16:25:06 2014] I'm playing around with making an 1.6.11pre1 version to test on f21 [Mon Dec 8 16:25:16 2014] since 1.6.10 doesn't work there. [Mon Dec 8 16:27:33 2014] FWIW, afs server on the fbsd is excellent for me. the client works as long as I use memcache. [Mon Dec 8 18:36:05 2014] I can second chrisb's observations; I run a server quite happily on FreeBSD/sparc64 of all things and have only had trouble while using disk caches on the clients. [Tue Dec 9 03:04:32 2014] Walex2: Ok sorry for the late reply. I'll have a look at all the logs and rxdeb/udebug you suggested! Thanks man! [Tue Dec 9 03:31:21 2014] bucovaina: here currently... 
[Tue Dec 9 03:33:34 2014] bucovaina: also check things with 'fs lsmount' [Tue Dec 9 03:34:42 2014] bucovaina: 'fs listquota /afs/.' for example [Tue Dec 9 03:35:15 2014] bucovaina: 'fs listquota /afs/.root.afs/.' also [Tue Dec 9 04:52:43 2014] Walex: thanks, when I do fs lq /afs/.root.afs/., I get "File '/afs/.root.afs/.' doesn't exist. [Tue Dec 9 04:53:01 2014] the former command you suggested results in "connection timed out" [Tue Dec 9 04:56:34 2014] Wales: looks I've had a small error in my CellServDB file. Rectified and rebooting now. Curious if it will help. [Tue Dec 9 05:14:08 2014] bucovaina: oops, temporarily distracted. [Tue Dec 9 05:14:37 2014] bucovaina: that "'/afs/.root.afs/.' doesn't exist" is not unexpected as you are using 'dynroot' [Tue Dec 9 05:15:11 2014] bucovaina: note the difference between the two 'CellServDB' files, the one for the clients and the one for the servers... [Tue Dec 9 09:38:08 2014] Hello there! We've had troubles with a 100% full filesystem yesterday and for some reason the AFS shares are not available anymore... When trying to list the shares in `/afs/domain` with `ls`, I get: `ls: cannot access share: No such device` and `ls: cannot access users: No such device` [Tue Dec 9 09:38:19 2014] I'm quite new to openafs and was wondering what I could do to debug this problem... [Tue Dec 9 09:41:31 2014] what does the logfiles for bos tells you, and maybe the client? what was full, didn´t you set the quota correct? [Tue Dec 9 10:22:55 2014] Amiga4000: The `/` on the AFS server was full, not sure if that has anything to do with the AFS failure though [Tue Dec 9 10:23:12 2014] I'll check the bos logs later on (meeting atm) [Tue Dec 9 11:46:35 2014] secureendpoints: When you recommended SL yesterday, what Base Environment would you recommend? This won't be production, yet, but will be a vm. [Tue Dec 9 14:30:48 2014] * really wishes deleting of groups via pts required some sort of extra flag. 
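Since a small error in CellServDB turned out to matter above, a syntax check before restarting the client can save a reboot cycle. The sketch below assumes the usual layout: a `>cellname #comment` line opens a cell and is followed by one `address #hostname` line per database server. The client and server copies share this syntax. They differ in content, since the server copy normally lists only the local cell's database servers. The checker is a hypothetical helper and only catches structural mistakes, not wrong addresses.

```python
import ipaddress
from typing import Optional


def check_cellservdb(text: str) -> list[str]:
    """Return human-readable problems found in CellServDB-style text."""
    problems: list[str] = []
    seen_cells: set[str] = set()
    current: Optional[str] = None
    servers_in_current = 0

    def close_cell() -> None:
        # Reads the enclosing variables at call time
        if current is not None and servers_in_current == 0:
            problems.append(f"cell {current} lists no valid servers")

    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        body = line.split("#", 1)[0].strip()   # drop the trailing comment
        if line.startswith(">"):
            close_cell()
            cell = body[1:].strip()
            if not cell or any(ch.isspace() for ch in cell):
                problems.append(f"line {lineno}: bad cell name {cell!r}")
            elif cell in seen_cells:
                problems.append(f"line {lineno}: duplicate cell {cell}")
            seen_cells.add(cell)
            current, servers_in_current = cell, 0
        elif body:                             # skip blank/comment-only lines
            if current is None:
                problems.append(f"line {lineno}: server address before any >cell line")
                continue
            try:
                ipaddress.IPv4Address(body)
                servers_in_current += 1
            except ValueError:
                problems.append(f"line {lineno}: invalid IPv4 address {body!r}")
    close_cell()
    return problems


if __name__ == "__main__":
    sample = ">example.org #Example cell\n192.0.2.10 #afsdb1.example.org\n"
    print(check_cellservdb(sample) or "no problems found")
```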
[Tue Dec 9 14:30:58 2014] Can't tell you how many times I've accidentally deleted a group when I meant a user [Tue Dec 9 14:32:45 2014] I'd be happy if it just required the group be empty first; that'd avoid most mistakes... [Tue Dec 9 14:42:06 2014] true [Tue Dec 9 14:42:37 2014] fortunately, for this group, users were named in such a way that re-populating the script was doable with a simple "for" script [Tue Dec 9 14:43:10 2014] The question is how do you make such a change without breaking existing scripts that assume that groups can be destroyed with members? [Tue Dec 9 14:43:35 2014] make it a configurable option in the server [Tue Dec 9 14:44:38 2014] the correct method to do that is to create a new RPC [Tue Dec 9 14:45:27 2014] you would still require a new command in vos [Tue Dec 9 14:45:40 2014] in vos? [Tue Dec 9 14:45:51 2014] sorry, pts [Tue Dec 9 14:46:19 2014] well, it depends on how you do it... if a separate command for deleeting groups over users, yes. If simply disallowing the deleting of non-empty groups, no [Tue Dec 9 14:47:19 2014] you are missing the point. you need to alter the command in order for scripts to know that the behavior changed or to prevent a script someone might write in the future from executing incorrectly using an older pts [Tue Dec 9 14:48:21 2014] that's why I said to make it a configurable server option. Default behavior is the current behavior, so that nothing breaks and such that the administrator has to specifically change the behavior. [Tue Dec 9 14:48:32 2014] Something very similar was done with allowing dots in user names. [Tue Dec 9 14:49:30 2014] its a very different type of issue [Tue Dec 9 14:49:44 2014] I disagree [Tue Dec 9 14:50:00 2014] authorization names are not known to the clients [Tue Dec 9 14:50:48 2014] If you require that a site remove thousands of users from a group before group removal, that is going to annoy some sites. [Tue Dec 9 14:51:06 2014] which is why one makes it a configurable option. 
Sites that don't want to be annoyed by it don't turn it on. [Tue Dec 9 14:52:51 2014] there are guarantees about the behavior of network RPCs. We do not change the semantics of an RPC on a server by server basis. You also then must be sure that the configuration is identical across all of your ptservers which OpenAFS has no method of enforcing. [Tue Dec 9 14:52:52 2014] there could also be a new command to remove all users from a group, which shouldn't need a new RPC since it's just coding inside pts the iterate over list of members stuff [Tue Dec 9 14:53:13 2014] and then issue N rpcs to do the removals [Tue Dec 9 14:53:37 2014] true, tho guaranteeing idential config across afs servers is already something the administrator needs to do if they want things to work well. So, that's not anything new. [Tue Dec 9 14:54:12 2014] We try hard to not give users or admins additional rope to hang themselves with. [Tue Dec 9 14:55:03 2014] I would argue in this case that it's rope that needs to be cut off. It's can be a pretty huge "oh sh*t" and bring the cell to a grinding halt "Oh sh*5" [Tue Dec 9 14:55:12 2014] because accidentally nuking a group with thousands of members is rope enough? [Tue Dec 9 14:57:10 2014] its documented rope. So you put in your fix which requires getting configuration correct to prevent the issue. Then in some number of years your current admin leaves and brings up a new server because of a hardware failure forgetting to set that config option. Suddenly your org is vulnerable again. [Tue Dec 9 14:57:23 2014] but you didn't think it was [Tue Dec 9 14:58:24 2014] That's really not good reasoning. Said new admin might also forget to apply security patches, so now the server is vulnerable to hacking. [Tue Dec 9 14:58:45 2014] We don't fix stuff like this with config options. The way you fix it is by creating a new RPC with different semantics and then implementing new commands that call the new RPC. 
The new RPC should have a "delete group with members" flag. and the new command has a -force option to override the detection of members. [Tue Dec 9 14:58:47 2014] Documentation is good, really good. However, really destructive and kill your cell type of operations should really have some sort of protection. [Wed Dec 10 11:28:05 2014] as I understand it, sockets don't work in AFS? (that is, binding to AF_FILE where the path is a path into AFS someplace)? [Wed Dec 10 11:32:25 2014] No [Wed Dec 10 23:20:35 2014] So is the bit about OpenAFS supporting only weak enctypes a legacy thing or is that still valid information? What enctypes are supported and which ones should I have in my keytab? Are those enctypes different from those I should have in the KeyFile? [Wed Dec 10 23:22:35 2014] I'm confused because my principal supports whatever default enctypes, but the keytab and the KeyFile contain only des-cbc-crc. [Wed Dec 10 23:22:59 2014] And I'm getting fits on my OSX client trying to use aklog. [Wed Dec 10 23:27:01 2014] OpenAFS from 1.6.5 will support Kerberos enctypes other than des-cbc-* for the afs/*@REALM key but it can still only use 56-bit fcrypt encryption for all on the wire operations. [Wed Dec 10 23:28:31 2014] OSX Yosemite has no support at all for des-cbc-* so you must upgrade all OpenAFS servers to 1.6.5 or later and upgrade the keys. However, you should be doing that anyway because otherwise you must assume that anyone that can execute aklog in your organization can brute force the AFS server key in less than a day. [Thu Dec 11 00:24:42 2014] Yes, I'm clued in to this by the log messages point to the cve, though reading the rekey doc doesn't make it clear (so far) what enctypes are supported and not supported. I'm not sure what you mean by the 56-bit max for on the wire operations. [Thu Dec 11 00:29:01 2014] I'm on 1.6.10 for the server and whatever the latest osx client is. 
I don't have any other services that require special enctypes, so should I just do the rekey and not worry about what legacy des-cbc-* used to be required? [Thu Dec 11 00:29:46 2014] I obviously don't want the bruteforcability, but its just me on this realm, because who needs free time. :) [Thu Dec 11 00:33:20 2014] Also, is the rxkad already installed for 1.6.10? I'd expect that these kind of fixes wouldn't need to be patched in future releases manually. Is that correct/reasonable? [Thu Dec 11 00:57:29 2014] Do I still need /etc/afs.keytab? Did I ever? [Thu Dec 11 00:57:47 2014] It seems that I no longer need the KeyFile as well after the rekey, correct? [Thu Dec 11 01:05:52 2014] I think I've got my rekey complete. I've replaced the /etc/afs.keytab (though I'd still like to know if I need it), removed the KeyFile, and added the rxkad.keytab to the server directory. All bos status on each server seems good, tokens etc all look good. [Thu Dec 11 01:06:41 2014] I'm still getting trouble with my osx client though. aklog -d gives: Kerberos error code returned by get_cred : -1765328228 and aklog: unknown RPC error (-1765328228) while getting AFS tickets [Thu Dec 11 01:12:49 2014] That 1765328228 error seems to indicate that the kdc can't be contacted, though I have a ticket, so I'm not sure what part is not available. [Thu Dec 11 01:13:22 2014] I'm going to bed. If anyone can offer suggestions, I'd love to hear it. I'll check back in later. Cheers [Thu Dec 11 03:33:51 2014] zleslie: Hi there, I'm rather new to OpenAFS myself so verify my claims please. [Thu Dec 11 03:34:42 2014] I am working with OpenAFS 1.6.10 and rekeyed my cell. I have kept the Keyfile since I think it is needed for inter server communication. [Thu Dec 11 03:35:13 2014] The clients I'm testing with are OSX Yosemite and I have them up and running. [Thu Dec 11 03:36:19 2014] Download the official Mavericks client and unpack it first. 
[Thu Dec 11 03:38:05 2014] zleslie: inside the .dmg installer there is another .pkg file. You can simply cd into it. [Thu Dec 11 03:38:20 2014] zleslie: go to /OpenAFS.pkg/Contents/Resources/InstallationCheck [Thu Dec 11 03:38:55 2014] zleslie: and change the line that says ----> if [ $majorvers -ne 14 ]; then <---- [Thu Dec 11 03:39:00 2014] zleslie: to [Thu Dec 11 03:39:18 2014] zleslie: sorry typo, I start again [Thu Dec 11 03:39:28 2014] and change the line that says ----> if [ $majorvers -ne 13 ]; then <---- [Thu Dec 11 03:39:31 2014] to [Thu Dec 11 03:39:33 2014] and change the line that says ----> if [ $majorvers -ne 14 ]; then <---- [Thu Dec 11 03:39:52 2014] zleslie: so replace 13 by 14. [Thu Dec 11 03:40:04 2014] zleslie: now it should run the installer fine. [Thu Dec 11 03:41:00 2014] zleslie: On my clients I needed to change /var/db/openafs/etc/CellServDB manually. The OpenAFS.prefpane didn't do it for me. [Thu Dec 11 05:17:50 2014] Walex2: The problem has been solved. Apparently all I was seeing is normal behaviour but unexpected form my (newbie's) point of view [Thu Dec 11 05:18:44 2014] Walex2: I was testing volume replication sites. The undotted path I was in stayed writable even after volume release. I assumed that the root.afs volume [Thu Dec 11 05:19:07 2014] Walex2: was mounted RW causing all the "child mount points" being RW as well. [Thu Dec 11 05:19:29 2014] Walex2: Testing this morning all of a magical sudden, I got the expected behaviour. [Thu Dec 11 05:20:24 2014] Walex2: It was due to the cache manager only updating the volume mappings hourly ... [Thu Dec 11 05:20:50 2014] Walex2: "fs checkvolumes" gives me the expected behaviour immediately. [Thu Dec 11 05:21:12 2014] Walex2: And thanks for the rxdebug and udebug hints, fill explore them shortly :-) [Thu Dec 11 05:36:02 2014] bucovaina: ah yes, the cache manager, I was tempted to suggest refreshing that. 
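The aklog failure reported earlier, error -1765328228, is a com_err code. The upper 24 bits encode the name of an error table, and the low 8 bits are the offset within that table. The sketch below reverses that encoding using the character set from the com_err table-naming scheme, and shows that the code belongs to the `krb5` table at offset 156. Turning the offset into a message still needs the table itself. In the krb5 error table, offset 156 is `KRB5_KDC_UNREACH` ("Cannot contact any KDC for requested realm"), which is consistent with the guess in the channel.

```python
# Character set used by com_err's compile_et to encode error-table names
CHARSET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_"


def table_base(name: str) -> int:
    """Signed 32-bit base code for an error-table name (up to 4 chars)."""
    num = 0
    for ch in name:
        num = (num << 6) + CHARSET.index(ch) + 1   # 6 bits per character
    base = (num << 8) & 0xFFFFFFFF                 # leave 8 bits for offsets
    return base - (1 << 32) if base & 0x80000000 else base


def split_com_err(code: int) -> tuple[str, int]:
    """Split a com_err code into (table name, offset within the table)."""
    unsigned = code & 0xFFFFFFFF      # reinterpret as unsigned 32-bit
    offset = unsigned & 0xFF          # low 8 bits: index into the table
    table_num = unsigned >> 8         # upper 24 bits: encoded table name
    chars: list[str] = []
    while table_num:
        chunk = table_num & 0x3F
        if chunk == 0:
            raise ValueError("not a valid com_err table encoding")
        chars.append(CHARSET[chunk - 1])
        table_num >>= 6
    return "".join(reversed(chars)), offset


if __name__ == "__main__":
    print(split_com_err(-1765328228))   # → ('krb5', 156)
```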
[Thu Dec 11 05:37:11 2014] bucovaina: IIRC that can be done by 'fs newcell' (with the old values) or 'fs flushmount' but then you need to know the mountpoints [Thu Dec 11 05:37:53 2014] bucovaina: in your case the mountpoint was obvious though :-) [Thu Dec 11 05:38:21 2014] Walex: Yes it was indeed :-) [Thu Dec 11 05:39:47 2014] Walex: is there a reason why I should not use fs checkvolumes? Perhaps a lot of clients rechecking and some added load? [Thu Dec 11 05:40:08 2014] bucovaina: 'fs checkvolumes' discards some other bit of cached information, useful too. But 'fs newcell' seems to do the whole lot as it reinitializes almost as if in a reboot [Thu Dec 11 05:41:05 2014] Walex: OK thanks, I'll read the manpage a bit more thoroughly. At first sight fs newcell is something I'd never issue in an existing one :-) [Thu Dec 11 05:54:17 2014] bucovaina: the "trick" is to use in 'fs newcell' the same values as before. [Thu Dec 11 05:55:43 2014] bucovaina: I don't know what it does to *existing* connections, but I suspect it is not a big deal, because AFS is sort of stateless. [Thu Dec 11 05:56:43 2014] Walex: based on the fact that it's all UDP? [Thu Dec 11 07:55:04 2014] bucovaina: UDP helps a bit, but mostly it is that "callbacks" can be recalled at any time, etc. so the client is designed to "reconnect" at any time. [Fri Dec 12 20:01:23 2014] Not to ask (another) dumb question, but... The documentation for 'vos clone' makes it sound like it updates the VLDB ("but the clone volume remains in the VLDB"), but when I run a command like 'vos clone -server batman -part viceps -id test.replication.readonly -readonly -verbose' the resulting volume ID exists on the server with name ".clone" but the VLDB doesn't believe a word of it; 'vos listvldb [Fri Dec 12 20:01:25 2014] NNNNNNNN' just says "no such entry". [Fri Dec 12 20:01:30 2014] What am I doing wrong this time? 
:) [Fri Dec 12 21:25:49 2014] http://docs.openafs.org/Reference/1/vos_clone.html [Fri Dec 12 21:25:58 2014] "This command is not used during normal OpenAFS administration and may have adverse effects on the VLDB if not used properly! This command should only be used by an expert." [Fri Dec 12 21:26:23 2014] nwf: what are you attempting to do? more than likely vos clone isn't the way to do it. [Fri Dec 12 21:30:41 2014] I was attempting to mimic vos release -stayonline by a combination of vos clone and vos shadow. I realize this is playing with fire, but I do not understand what's wrong with -stayonline well enough to fix it. I was sort of hoping that the tools would let me step through the process by hand to aid in understanding. [Fri Dec 12 21:31:06 2014] The vos clone commands I've issued do work (sort of) but vos shadow refuses to do anything to an id not in the VLDB. [Fri Dec 12 21:31:56 2014] nwf: what is your end-goal? e.g. why are you running these commands? [Fri Dec 12 21:32:45 2014] How far back up the stack would you like me to go? [Fri Dec 12 21:33:22 2014] is this already discussed in this channel? [Fri Dec 12 21:33:32 2014] A long while ago. I'll summarize. [Fri Dec 12 21:33:59 2014] The jhuacm cell tries to replicate its mirror volumes across two servers so that they remain available even when the... charming... cloud storage system (Ceph) is down. Admittedly, this happens less often these days. [Fri Dec 12 21:34:50 2014] and just using normal RO clones doesn't do what you want? [Fri Dec 12 21:34:53 2014] Periodically, 'vos release' errors out and leaves the VLDB locked and replicas in inconsistent states. The next 'vos release' deletes the replicas, but often fails in doing this right, and so it takes a lot of 'vos zap' and 'vos release' to fix the problem. [Fri Dec 12 21:35:35 2014] These volumes are upwards of a TB a piece and our servers seem able to move data at 10MB/s at peak. [Fri Dec 12 21:35:40 2014] hmm... 
I guess I'd ask why vos release is failing [Fri Dec 12 21:36:08 2014] can you split the volumes up into smaller ones? [Fri Dec 12 21:36:28 2014] Probably not easily. [Fri Dec 12 21:36:46 2014] 1TB seems like a very large AFS volume to me. this was several years ago, but I was splitting items into 10GB volumes to more easily manage it [Fri Dec 12 21:37:51 2014] Well, Fedora, 19,20,21, and rawhide add up to just under 2TB. It's easiest to point rsync at it and let it rip, rather than having to carefully stage mountpoints to look like the file tree. [Fri Dec 12 21:37:54 2014] Though of course we could. [Fri Dec 12 21:38:06 2014] I wonder if you use vos backup to do what you want instead and just pipe vos backup to vos restore using the -offline option [Fri Dec 12 21:38:45 2014] Oh, interesting idea... use the backup slot rather than taking a clone. [Fri Dec 12 21:38:50 2014] yeah [Fri Dec 12 21:38:58 2014] It is true that our mirror volumes do not use their BK slot... [Fri Dec 12 21:39:01 2014] I like the way you think. [Fri Dec 12 21:39:09 2014] * scamper off to play with more fire [Fri Dec 12 21:39:14 2014] also, I'm not sure that AFS is really the best option for this use case. [Fri Dec 12 21:39:33 2014] if you just need to sync a mirror, why not just use local storage on the two servers? [Fri Dec 12 21:40:21 2014] Well, we also use 'vos release' to replicate user homedirs (again, not using BK) off of our historically-funny storage stack onto a much more mundane host. [Fri Dec 12 21:40:55 2014] yeah, that sounds safer than using vos shadow [Fri Dec 12 21:41:13 2014] another option you might want to look into is replication below AFS at the block level if you really just want a replicated server [Fri Dec 12 21:41:14 2014] But as far as mirrors go, our web proxy and all our sources.list and so on point to the one path in AFS. [Fri Dec 12 21:57:54 2014] Well that's full of excitement. 
It generally seems to do the right thing now, but I got an invalid cross-device link error back from the server at one point. [Fri Dec 12 21:59:09 2014] are you sure that the underlying storage is error free? [Fri Dec 12 21:59:21 2014] Well, reasonably so, since it's ZFS. [Fri Dec 12 21:59:33 2014] I am sure I had put something in a dumb state. [Fri Dec 12 22:00:14 2014] Am I correct in remembering that the UNIX clients do not pay attention to the new/old replica site distinction? [Fri Dec 12 22:01:07 2014] I am not sure on that [Mon Dec 15 05:04:08 2014] My openafs instance constantly blocks. Every ls or cd effectively kills the terminal. Volumes constantly need salvaging and i have absolutely no idea why. [Mon Dec 15 05:05:12 2014] anything in server logs? [Mon Dec 15 05:05:18 2014] anything in dmesg on the server? [Mon Dec 15 05:05:34 2014] constant salvaging gives a hint about storage issues on the server [Mon Dec 15 05:11:23 2014] a lot of logs like this: "afs: Waiting for busy volume 536871026 (XXXXX) in cell XXXXXX" [Mon Dec 15 05:12:34 2014] also rebooting the server did not work because afs did not terminate [Mon Dec 15 05:12:46 2014] even after SIGKILL [Mon Dec 15 05:15:34 2014] busy volume looks like some kind of relase waiting? [Mon Dec 15 05:15:47 2014] did you release the volume? [Mon Dec 15 05:18:23 2014] we are not using ro volumes [Mon Dec 15 05:19:09 2014] so the volumes were busy with anything else. Read the logs (FileLog and others) to get a notice abotu whats wrong [Mon Dec 15 05:19:29 2014] are you sure, your filesystem on which the /vicepX lies on, is error free and works fine? [Mon Dec 15 05:23:26 2014] this occurs really often in FileLog. 
google does not know it: "nUsers == 0, but header not on LRU" [Mon Dec 15 05:25:48 2014] hm, I am not so much into OpenAFS internals [Mon Dec 15 05:26:19 2014] in your case I would do run bos salvage two times across the partitions and see if the voluems are online after [Mon Dec 15 05:26:34 2014] and maybe a fsck on the partitions [Mon Dec 15 05:26:52 2014] i'll try that [Mon Dec 15 05:26:53 2014] I´ve seen enough faulty SATA cables the last years [Mon Dec 15 05:30:47 2014] fs is faulty... [Mon Dec 15 05:31:08 2014] fsck revealed several orphaned inodes [Mon Dec 15 05:31:43 2014] thats quite a good explanation of OpoenAFS being faulty, if the underlying data is faulkty^^ [Mon Dec 15 05:34:16 2014] damn... seems like the xen system just killed the server when we shut down for maintenance [Mon Dec 15 05:34:46 2014] because it took too long to shutdown properly [Mon Dec 15 05:48:37 2014] it works again... thank you very much Amiga4000 :) [Mon Dec 15 05:54:50 2014] I just asked some questions ;-) [Tue Dec 16 10:58:28 2014] Hey all, is there a an eta for the OSX 10.10 version of openafs? [Tue Dec 16 10:59:23 2014] You mean an installer, or the code working if you build it yourself? (I thought the latter already was true.) [Tue Dec 16 11:03:39 2014] the installer, I am not in a huge rush so if the installer was on the horizon, I would rather use that [Tue Dec 16 11:04:19 2014] bitch at apple or wait for someone to decide to produce a signed kext for public use [Tue Dec 16 11:05:23 2014] ah [Tue Dec 16 11:05:27 2014] (this involves someone deciding to take on risk. IOW the problem is not technical, it's "political") [Tue Dec 16 11:05:39 2014] I understand [Tue Dec 16 11:06:26 2014] I thought someone had already succeeded at getting approval from Apple to sign the kext, but would have to check the -info archives (which I'm not going to do right now). 
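When a client keeps logging the `afs: Waiting for busy volume ...` messages reported earlier, it can help to see which volume IDs are involved before salvaging. The minimal triage sketch below assumes the message format quoted in the channel, `afs: Waiting for busy volume <id> (<name>) in cell <cell>`. It counts the waits per volume in client log text, such as saved `dmesg` output. Each ID can then be inspected with `vos examine <id>`. It is a hypothetical helper and tells nothing about why the volume is busy.

```python
import re
from collections import Counter

# Assumed format, taken from the client message quoted in the channel
BUSY_RE = re.compile(
    r"afs: Waiting for busy volume (?P<vid>\d+) \((?P<name>[^)]*)\) in cell (?P<cell>\S+)"
)


def count_busy_volumes(log_text: str) -> Counter:
    """Count busy-volume waits per (volume id, volume name, cell)."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        m = BUSY_RE.search(line)
        if m:
            cell = m["cell"].rstrip(".")    # tolerate a trailing period
            counts[(int(m["vid"]), m["name"], cell)] += 1
    return counts


if __name__ == "__main__":
    import sys
    for (vid, name, cell), n in count_busy_volumes(sys.stdin.read()).most_common():
        print(f"{n:6d}  {vid}  {name}  {cell}")
```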
[Tue Dec 16 11:06:31 2014] I am aware of one site which has a suitable signing key and another that intends to, but do not know if they intend to produce a public package [Tue Dec 16 11:12:26 2014] you can get around the signed kext issue [Tue Dec 16 11:12:37 2014] and you can edit the mavericks installer to make it work [Tue Dec 16 11:12:50 2014] for the former, I think it's add kext-dev-mode=1 to your boot args [Tue Dec 16 11:13:03 2014] for the latter, just edit the versioncheck binary to return [Tue Dec 16 11:13:09 2014] not user friendly, but works [Tue Dec 16 11:13:25 2014] not sure who's supposed to be packaging up OAFS for Mac, tho [Tue Dec 16 11:13:34 2014] It appears that I need to put in an apple id to get xcode to build on my own. I need to talk to the macbooks owner for that. [Tue Dec 16 11:14:24 2014] I don't think an apple dev id is tied to a computer [Tue Dec 16 11:14:46 2014] when I went to the xcode site, it sent to the app store [Tue Dec 16 11:14:50 2014] it's not. I use my personal apple id with my work-issued macbook pro [Tue Dec 16 11:15:41 2014] even though to do anything I need to auth locally as the work-owned administrator account instead of my personal account [Tue Dec 16 11:23:06 2014] k. xcode might not require an apple id anymore [Tue Dec 16 11:23:08 2014] it used to [Tue Dec 16 11:23:12 2014] certainly older versions do [Tue Dec 16 11:23:22 2014] and you'll need to get some older pieces to be able to make the package [Tue Dec 16 11:23:30 2014] so, in the end you'll still need an apple id [Tue Dec 16 11:25:39 2014] xcode sent me to the app store which asked for an apple id, if the app store won't lock down the machine, then I can use one of ours but since this isn't a machine owned by our group, I don't want to lock it down by accident. I am not the mac guy so I am not caught up on all this. 
[Tue Dec 16 11:27:23 2014] oh right, app store does req an apple id [Tue Dec 16 11:27:48 2014] yeah, guy who owns the machine should use his appleid to grab xcode from the app store [Tue Dec 16 11:27:52 2014] and the commandline dev tools as well [Tue Dec 16 11:28:29 2014] ok, that's what I figured. I will have to contact him. [Tue Dec 16 11:34:23 2014] hi guys [Tue Dec 16 11:34:43 2014] Have located what I guess is a bug in the openafs client for centos 7. [Tue Dec 16 11:35:56 2014] I enter my info in CellServDB and save it. It's good and available for that time. But if I restart the openafs-client.service, it gets overwritten and my information isn't there. [Tue Dec 16 11:36:09 2014] You're supposed to modify CellServDB.local [Tue Dec 16 11:36:17 2014] Or maybe CellServDB.dist in some cases. [Tue Dec 16 11:37:49 2014] See https://github.com/openafs/openafs/blob/openafs-stable-1_6_x/src/packaging/RedHat/openafs-client.service#L10 [Tue Dec 16 11:37:50 2014] kaduk_: Ok, makes sense. This is different from debian, gentoo, and win7. [Tue Dec 16 11:38:12 2014] Yes, it is. Apparently it's traditional for OpenAFS on RPM systems, though. *shrug* [Tue Dec 16 11:38:38 2014] kaduk_: Right. What's the point, eh? [Tue Dec 16 11:38:47 2014] kaduk_: Thanks thought [Tue Dec 16 11:38:49 2014] Breaking things is hard, I guess. [Tue Dec 16 11:39:00 2014] s/thought/though/ [Tue Dec 16 14:24:04 2014] the instructions for building openafs on a mac refer to mountain lion, are they the same through yosemite? [Tue Dec 16 15:58:35 2014] yes, those mountain lion instructions are mostly what you want [Tue Dec 16 15:59:12 2014] also see: http://blog.gmane.org/gmane.comp.file-systems.openafs.darwin [Tue Dec 16 15:59:35 2014] I think you'll also need a few things from macports [Tue Dec 16 16:47:46 2014] RedFyre: Thanks [Tue Dec 16 16:58:14 2014] yep [Wed Dec 17 14:59:01 2014] if a file server has no volumes on it, I can shut it off without removing it from the vldb, right? 
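The RPM behaviour kaduk_ points to can be pictured as a regeneration step: at each client start, CellServDB is rebuilt from CellServDB.local and CellServDB.dist, so hand edits to CellServDB itself are lost. Below is a minimal Python sketch of that merge. It assumes the local file is concatenated before the dist file (check the linked openafs-client.service for the exact command), and the cell name example.org and the 192.0.2.10 address are hypothetical.

```python
import tempfile
from pathlib import Path

# Assumed concatenation order: site-local entries first, then the distributed list.
SOURCES = ("CellServDB.local", "CellServDB.dist")


def regenerate_cellservdb(etc_dir: Path) -> str:
    """Rebuild etc_dir/CellServDB from CellServDB.local then CellServDB.dist.

    Anything written directly into CellServDB is discarded, mirroring the
    behaviour reported for the RPM openafs-client unit on restart.
    """
    parts = []
    for name in SOURCES:
        src = etc_dir / name
        if not src.exists():
            continue
        text = src.read_text()
        if text and not text.endswith("\n"):
            text += "\n"  # keep the last entry from running into the next file
        parts.append(text)
    merged = "".join(parts)
    (etc_dir / "CellServDB").write_text(merged)
    return merged


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        etc = Path(d)
        # Hypothetical cell entry: ">cell  #comment" then "address  #hostname".
        (etc / "CellServDB.local").write_text(
            ">example.org  #hypothetical local cell\n"
            "192.0.2.10  #afsdb1.example.org\n"
        )
        (etc / "CellServDB.dist").write_text("")
        (etc / "CellServDB").write_text(">edited.by.hand  #lost on restart\n")
        print(regenerate_cellservdb(etc), end="")
```

In this model, the only edit that survives a restart is one made in CellServDB.local.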
[Wed Dec 17 14:59:49 2014] yes [Wed Dec 17 15:00:23 2014] thought so, we have a chilled water outage and I am trying to cool our machine room down but I am not usually the afs server manager so I wanted to double check. [Wed Dec 17 15:00:50 2014] well. were any of the volumes used primarily r/w? [Wed Dec 17 15:01:19 2014] there were no volumes on this particular server [Wed Dec 17 15:01:39 2014] ok [Wed Dec 17 15:01:58 2014] I am not sure why it is empty, it may have been recently rebuilt and not repopulated. [Wed Dec 17 15:02:30 2014] (there's a bug in older unix clients that causes them to not time out r/w volume references and look up their current location, so they'll keep hammering the original server forever) [Wed Dec 17 15:05:34 2014] ah ok, yeah, we are using at least 1.6.5 most places (1.4.15 is on a handful) because of moving to rxkad. [Wed Dec 17 15:54:56 2014] <[gorgo]> you mean rxkad-k5 [Wed Dec 17 15:55:18 2014] yes, sorry [Wed Dec 17 18:01:05 2014] a 64 bit compile on a 64 bit os should not be using open64(), should it? [Wed Dec 17 18:10:38 2014] (trying to compile servers on smartos, whee). [Wed Dec 17 18:15:56 2014] is there any way to see the status of a vos move, google isn't helping me and this move has taken waaay longer than expected, I did two other partitions worth of data and this one volume is still shown as moving. [Wed Dec 17 18:17:23 2014] if you didn't run it -verbose, you may be able to glean something from rxdebug on the source and destination fileserver ports (7000) [Wed Dec 17 18:18:44 2014] -verbose is on [Wed Dec 17 18:19:17 2014] then I think the rxdebug bit is all you get [Wed Dec 17 18:19:23 2014] http://pastebin.com/5PaQCynU [Wed Dec 17 18:19:41 2014] I don't think we're set up to use SIGINFO even if Linux supported it [Wed Dec 17 18:19:57 2014] did you check the size of this volume, btw? [Wed Dec 17 18:20:11 2014] it is 16G [Wed Dec 17 18:20:42 2014] that will take a while, then. 
Rx isn't a speed demon, and if you haven't done any UDP or fileserver tuning then it'll be even slower [Wed Dec 17 18:21:07 2014] I moved 173G worth of volumes from another server/partition in less time lol [Wed Dec 17 18:23:01 2014] is this slowness specific to this server, then? I'd check for network errors in that case. (netstat -i, for starters) [Wed Dec 17 18:28:35 2014] six errors and no drops, top process is davolserver and the load is pretty low on the machine. It is sluggish over ssh, though. I bet the heat has gotten to the old guy. [Thu Dec 18 11:20:34 2014] has anyone seen this before? still having problems with open64(): https://gist.github.com/natefoo/d371279b30c2c0276118 [Thu Dec 18 11:21:16 2014] doing some testing, functions using open64() link just fine. [Thu Dec 18 11:27:07 2014] It doesn't look familiar, no. [Thu Dec 18 11:30:13 2014] it's pretty strange. [Thu Dec 18 11:30:21 2014] Neat, http://wiki.openafs.org/AFSLore/WindowsEndUserQuickStartGuide is a 404. [Thu Dec 18 11:30:47 2014] natefoo: I assume you're getting this even from a clean build from a clean tree? [Thu Dec 18 11:31:13 2014] PMT: I would expect the windows release notes to be a better resource, from what I remember. [Thu Dec 18 11:31:53 2014] kaduk_: no, this is probably caused by me. [Thu Dec 18 11:32:13 2014] trying to compile (servers only) on smartos. [Thu Dec 18 11:32:22 2014] kaduk_: yeah, I'm reading those now. I'm just entertained. [Thu Dec 18 11:32:31 2014] so i've had to hack up the compiler stuff since sun* assumes studio. [Thu Dec 18 11:32:44 2014] natefoo: ooh, illumos. [Thu Dec 18 11:33:09 2014] based mostly on http://gerrit.openafs.org/#change,11132 [Thu Dec 18 11:33:32 2014] PMT, the AFSLore subtree is a mirror of an old wiki, IIRC. the windows client has been completely redone since then [Thu Dec 18 11:33:45 2014] Yeah, I've got the 1.7 release installed, it's true. [Thu Dec 18 11:33:53 2014] Also, nice ttuttle-vintage shell.
:) [Thu Dec 18 12:03:50 2014] I've a dafs which went down (and will stay down for a few days) in the middle of a release to it. My clients just use another RO volume, which is great. However, subsequent "vos release" attempts fail on the downed host, and so I can't update the other RO. [Thu Dec 18 12:04:48 2014] Any way to mark the downed host as "down" or something, so "vos release" skips it? What's the right way to handle something like this? [Thu Dec 18 12:05:38 2014] I should mention that this downed server had other RO volumes that weren't in the middle of releases, and releases of those volumes don't seem to suffer from this problem [Thu Dec 18 12:06:45 2014] presumably those are no longer registered in the VLDB either [Thu Dec 18 12:08:32 2014] those other volumes end up listed in listvldb as just "Old release" [Thu Dec 18 12:09:22 2014] then it's not releasing fully to those volumes [Thu Dec 18 12:10:24 2014] well the volumes that are still up show "New release", but the volumes that are on the server that is down show "Old release" [Thu Dec 18 12:10:34 2014] (not sure if that necessarily contradicts what you said) [Thu Dec 18 12:13:13 2014] maybe this is just some artifact of whatever arbitrary order vos release tries to the RO volumes in [Thu Dec 18 12:19:52 2014] right now doing a release on the interrupted volume and other volumes have similar-looking output [Thu Dec 18 12:21:06 2014] the difference is that I actually observe changes in the remaining RO mounts for the volumes that weren't interrupted [Thu Dec 18 13:15:54 2014] well, i get the same problem when compiling with roughly the same modifications to 1.6.10 on solaris 10. so it's not illumos-specific. [Thu Dec 18 13:22:15 2014] natefoo: what's the problem you're having? 
[Thu Dec 18 14:33:58 2014] <[gorgo]> dezgot: you can remsite the unreachable site, then it can release to all the others without issues [Thu Dec 18 14:46:05 2014] PMT: https://gist.github.com/natefoo/d371279b30c2c0276118 [Thu Dec 18 14:47:00 2014] switching compilation on sunx86 from sun studio to gcc. [Thu Dec 18 14:49:24 2014] natefoo: -lproc help? [Thu Dec 18 14:49:55 2014] hm [Thu Dec 18 14:51:19 2014] natefoo: #illumos suggests you want #include [Thu Dec 18 14:54:17 2014] yeah, i was thinking about that. [Thu Dec 18 14:54:19 2014] also man open.2 knew that, apparently. :) [Thu Dec 18 14:54:58 2014] including fcntl.h causes O_LARGEFILE to be defined. [Thu Dec 18 14:55:43 2014] And you expect it to be undefined? [Thu Dec 18 14:56:44 2014] actually, it should already be defined. since vutil.c is using open64. [Thu Dec 18 14:57:09 2014] so what's the problem? is it spitting out "oops O_LARGEFILE redefined" or [Thu Dec 18 14:57:24 2014] no, no warnings either way. [Thu Dec 18 14:57:42 2014] and with fcntl.h it works, or no? [Thu Dec 18 14:58:03 2014] yes. [Thu Dec 18 14:58:10 2014] excellent. [Thu Dec 18 14:58:17 2014] this will need to be done for almost every source file though. [Thu Dec 18 14:58:34 2014] natefoo: put it in a common #include file? [Thu Dec 18 14:58:35 2014] and i don't understand how others before me got it to compile on solaris with gcc. [Thu Dec 18 14:59:42 2014] do you have evidence that they did? [Thu Dec 18 15:00:07 2014] yes. [Thu Dec 18 15:00:27 2014] I'd be curious to know what it is. [Thu Dec 18 15:00:31 2014] http://gerrit.openafs.org/#change,10464 [Thu Dec 18 15:00:37 2014] http://gerrit.openafs.org/#change,11132 [Thu Dec 18 15:00:55 2014] huh, a year ago. [Thu Dec 18 15:01:03 2014] https://lists.openafs.org/pipermail/openafs-devel/2014-May/019877.html [Thu Dec 18 15:02:35 2014] neither are 1.6 though so who knows. 
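[gorgo]'s remsite suggestion, and the later re-add that dezgot asks about, can be written out as vos command sequences. Below is a minimal Python sketch that only builds argument lists and runs nothing. The server, partition, and volume names are hypothetical, and it assumes the standard `vos remsite`/`vos addsite` flags (`-server`, `-partition`, `-id`). `vos remsite` only edits the VLDB entry, which is why it can be used while the server is down. Whether a later addsite and release resends the whole volume is the open question in the log, and this sketch does not settle it.

```python
from typing import List

Command = List[str]


def drop_unreachable_ro_site(server: str, partition: str, volume: str) -> List[Command]:
    """Deregister an unreachable RO site, then release to the remaining sites."""
    return [
        ["vos", "remsite", "-server", server, "-partition", partition, "-id", volume],
        ["vos", "release", "-id", volume],
    ]


def readd_ro_site(server: str, partition: str, volume: str) -> List[Command]:
    """Re-register the RO site once the server is back, then release again."""
    return [
        ["vos", "addsite", "-server", server, "-partition", partition, "-id", volume],
        ["vos", "release", "-id", volume],
    ]


if __name__ == "__main__":
    # Hypothetical names for illustration only.
    for cmd in drop_unreachable_ro_site("fs-down.example.org", "a", "proj.data"):
        print(" ".join(cmd))
```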
[Thu Dec 18 15:03:41 2014] those problems should be fixable at least [Thu Dec 18 15:04:21 2014] natefoo: you could also forcibly do the open64 definition in the compile line if you wanted to be a monster. [Thu Dec 18 15:04:40 2014] D= [Thu Dec 18 15:05:00 2014] natefoo: if you'd like to keep going back and forth about getting shit working on illumos, i have both an install base, some limited experience, and time. :) [Thu Dec 18 15:05:29 2014] yeah, i've got lots of smartos to play with. [Thu Dec 18 15:05:33 2014] [also even English, despite using "both" to modify 3 things] [Thu Dec 18 15:05:36 2014] i used to be good at compiling things on solaris too. [Thu Dec 18 15:08:23 2014] So how's it broken now? [Thu Dec 18 15:08:44 2014] oh, right, you said it wants #includes in many places. I'm surprised you can't put a common include definition in. [Thu Dec 18 15:09:05 2014] yeah, trying that out now. [Thu Dec 18 15:09:21 2014] i'm still bothered that it worked for others without this though. [Thu Dec 18 15:09:35 2014] guess i could try the master branch. [Thu Dec 18 15:10:17 2014] what're you trying? 1.6.10 stock or ? [Thu Dec 18 15:10:27 2014] yeah 1.6.10 stock. [Thu Dec 18 15:15:28 2014] so yeah, add to src/config/afsconfig.h and it compiles. and now i feel dirty. [Thu Dec 18 15:16:36 2014] Master includes roken.h in pretty much every compilation unit. And roken.h includes fcntl.h if its enabled. [Thu Dec 18 15:16:43 2014] So I suspect this "just works" on master. [Thu Dec 18 15:17:03 2014] validation! thanks sxw. [Thu Dec 18 15:19:09 2014] shrug. [Thu Dec 18 15:19:54 2014] build complete, now to test it out. [Thu Dec 18 19:29:12 2014] https://github.com/blog/1938-vulnerability-announced-update-your-git-clients [Thu Dec 18 21:00:22 2014] [gorgo]: if I did that and then re-added the site later, would it have to transfer the whole volume to that site again? (The link is slow enough that this is something I really need to avoid.) 
[Fri Dec 19 02:18:13 2014] <[gorgo]> dezgot: not sure, but I think not. [Fri Dec 19 10:30:34 2014] it appears that my scripted moving of volumes had a few that failed to move due to either auth expiration or "VOLSER: Problems encountered in doing the dump" and the volumes continued to be locked. Should I just be able to use vos unlockvldb to safely unlock them? [Fri Dec 19 10:45:08 2014] unlock the db and then you may need to wait for a transaction timeout (roughly 10 minutes) or vos endtrans (make sure no other transactions are current on the volserver first!) [Fri Dec 19 10:56:49 2014] it has been 10 minutes, I have other transactions running though so I will have to wait on that [Fri Dec 19 11:20:24 2014] (not surprisingly) the fileserver on smartos works. [Fri Dec 19 13:31:28 2014] natefoo: hurrah! [Fri Dec 19 15:28:58 2014] hm [Fri Dec 19 15:29:11 2014] http://i.imgur.com/olyLlZB.png seems like a sub-ideal thing to get out of NIM [Fri Dec 19 15:33:54 2014] hm, is user secureendpoints someone I should ask about NIM misbehaving in strange ways? [Fri Dec 19 15:35:25 2014] Because I just got an assertion failure out of NIM and it died. [Fri Dec 19 15:36:45 2014] http://i.imgur.com/jsz1gLD.png seems sub-optimal at best. [Fri Dec 19 15:53:29 2014] oh neat, and if I move the keystore file aside to see if it's a problem that's persisting in that, NIM just assertion fails and dies on startup. [Fri Dec 19 15:54:56 2014] wow, and it even persists after an uninstall and reinstall of NIM. [Fri Dec 19 15:58:21 2014] neat, the contents of HKLM\Software\MIT were toxic. [Fri Dec 19 15:58:54 2014] Did you save the contents? [Fri Dec 19 15:58:58 2014] now it "just" refuses to make a new identity because no plugins are loaded. [Fri Dec 19 15:59:00 2014] yes, of course. 
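The recovery advice here can be sketched as a small helper: unlock the VLDB entries, then either let the stale volserver transaction time out after roughly ten minutes or end it with `vos endtrans` once no other transactions are active. Below is a minimal Python sketch. It assumes per-volume `vos unlock -id` rather than a bulk `vos unlockvldb`, and it takes the 600-second figure from the chat rather than from the volserver source. The volume names are hypothetical.

```python
import time
from typing import Iterable, List, Optional

# Rough volserver transaction timeout quoted in the chat (~10 minutes); an assumption.
TRANSACTION_TIMEOUT_S = 600.0


def unlock_commands(failed_volumes: Iterable[str]) -> List[List[str]]:
    """One 'vos unlock' per volume whose move failed and left its VLDB entry locked."""
    return [["vos", "unlock", "-id", vol] for vol in failed_volumes]


def transaction_probably_expired(failed_at: float, now: Optional[float] = None,
                                 timeout: float = TRANSACTION_TIMEOUT_S) -> bool:
    """True once the rough timeout has passed since the failed move (epoch seconds)."""
    if now is None:
        now = time.time()
    return now - failed_at >= timeout


if __name__ == "__main__":
    # Hypothetical volume names reported as failed by a move script.
    for cmd in unlock_commands(["proj.a", "proj.b"]):
        print(" ".join(cmd))
    print(transaction_probably_expired(time.time() - 700))
```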
[Fri Dec 19 15:59:45 2014] used procmon to find out where it was looking [Fri Dec 19 15:59:53 2014] then moved the registry entries aside to see what failed [Fri Dec 19 16:00:21 2014] oh wow [Fri Dec 19 16:00:28 2014] and if i uninstall NIM and reinstall it after moving that aside [Fri Dec 19 16:00:30 2014] it still bombs [Fri Dec 19 16:00:34 2014] phenomenal [Fri Dec 19 16:01:24 2014] perhaps i should just burn the computer and soak it in holy water [Fri Dec 19 16:03:37 2014] ah, HKCU\Software\MIT also needed to be moved aside [Fri Dec 19 16:03:47 2014] with both of those moved, it now starts correctly. [Fri Dec 19 16:05:29 2014] oh wow [Fri Dec 19 16:05:32 2014] this is even nicer [Fri Dec 19 17:45:00 2014] pmt - which version of NIM are you using? [Fri Dec 19 17:47:22 2014] CybrFyre: www.secure-endpoints.com/netidmgr/v2/ offers me "2.0.102.907" for 64-bit [Fri Dec 19 17:47:30 2014] which is what i'm running [Sat Dec 20 13:05:16 2014] hi guys [Sat Dec 20 13:05:58 2014] I just emerged and built kernel 3.16.5 on gentoo, and rebuilt the openafs modules. Still some kind of a bug. [Sat Dec 20 13:06:55 2014] I get a kerberos ticket and openafs token, and can get in my cell, but when I try to use/read/cat, whatever to a file, it's killed. [Sat Dec 20 13:07:15 2014] I've told them in #gentoo, but thought I'd mention it here as well. [Sat Dec 20 19:27:26 2014] I just emerged and built kernel 3.16.5 on gentoo, and rebuilt the openafs modules. Still some kind of a bug. [Sat Dec 20 19:27:36 2014] I get a kerberos ticket and openafs token, and can get in my cell, but when I try to use/read/cat, whatever to a file, it's killed. [Sun Dec 21 01:32:10 2014] I am testing with openafs and yesterday I felt brave enough to move over my ~ from /home/username to /afs/mycell.org/username. But now I'm getting unexpected behaviour. Certain applications won't start. Eg.
cmus complains about bind not being permitted a certain operation and when I start weechat it says it couldn't create what I think is a unix socket in my home directory. Anyone an idea what might be going [Sun Dec 21 01:32:12 2014] on? [Sun Dec 21 01:33:34 2014] Oh yes, I'm running on debian, server and client. The client is a diskless client booted from an NFS export (I know, I'm not brave enough to try OpenAFS just yet :-)) [Sun Dec 21 02:09:59 2014] The problem is directly related to UNIX sockets. If I start cmus as follows: "cmus --listen /tmp/cmus.socket" it does start. [Sun Dec 21 04:31:25 2014] <[gorgo]> bucovaina: yes, afs does not support unix domain sockets [Wed Dec 24 13:17:41 2014] how do I get the version of openafs that is running? [Wed Dec 24 13:22:57 2014] rxdebug localhost 7001 -version [Wed Dec 24 13:23:38 2014] geekosaur: And this *is* a question you've answered before. Thanks again. [Wed Dec 24 13:25:11 2014] geekosaur: I see I'm running 1.6.9 on gentoo (built by me from an overlay). I wonder if 1.6.10 did anything about the bug that won't allow 1.6.9 to be functional with kernel 3.16.5? [Wed Dec 24 13:25:56 2014] I do not know, sorry [Wed Dec 24 13:25:57 2014] I'm running 3.14.16, so openafs works correctly. [Wed Dec 24 13:26:04 2014] geekosaur: np [Wed Dec 24 13:26:48 2014] Haven't had any action from the dev for openafs at gentoo for a while. [Fri Dec 26 14:18:35 2014] <_1_sahil> hi [Sat Dec 27 19:45:38 2014] does there exist an android openafs client? I could not find one, but perhaps my googlefu is weak. [Sat Dec 27 19:53:03 2014] nope, and none planned that I'm aware of; it's difficult to do without one of (a) rooting (b) putting the actual openafs client elsewhere and accessing it via an HTTP REST API (c) writing a completely new client that works more like e.g. the Android Dropbox client [Sat Dec 27 19:53:49 2014] I take it that people do (a) and just don't bother telling the world about it. 
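AFS does not support Unix domain sockets. As a result, a program that creates its socket under an AFS home directory fails at bind(), while pointing it at /tmp (as with `cmus --listen /tmp/cmus.socket`) works. Below is a minimal Python sketch of the same workaround. It assumes the system temporary directory is on local disk rather than in AFS, and the socket name `app.sock` is hypothetical.

```python
import os
import socket
import tempfile


def bind_local_unix_socket(name: str) -> socket.socket:
    """Bind a listening AF_UNIX socket in a private temp dir instead of $HOME.

    Assumes tempfile.gettempdir() (normally /tmp) is local storage, since AFS
    cannot hold Unix domain sockets.
    """
    sock_dir = tempfile.mkdtemp(prefix="sock-")  # created with mode 0700
    path = os.path.join(sock_dir, name)
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)
        sock.listen(1)
    except OSError:
        sock.close()
        if os.path.exists(path):
            os.unlink(path)
        os.rmdir(sock_dir)
        raise
    return sock


if __name__ == "__main__":
    s = bind_local_unix_socket("app.sock")
    path = s.getsockname()
    print("listening on", path)
    s.close()
    os.unlink(path)  # the socket file outlives close(); clean it up
    os.rmdir(os.path.dirname(path))
```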
[Sat Dec 27 19:53:53 2014] (b) has no appeal to AFS users, (c) would require a significant time commitment from a (probably paid) developer, (a) limits the audience too much [Sat Dec 27 19:54:53 2014] enh, it's not like iOS, rooting isn't discouraged. but you'd still have to get the client working in an Android environment, and that's moderately difficult. [Sat Dec 27 19:55:40 2014] the real point is, it's a lot of work and nobody has stepped forward to do it [Sat Dec 27 19:56:05 2014] ok, thanks. [Sat Dec 27 19:57:03 2014] and at least one commercial entity in the openafs arena has explicitly said it *won't* do it [Mon Jan 5 09:31:23 2015] Happy New Year to all [Mon Jan 5 09:32:28 2015] HNY! [Mon Jan 5 12:28:20 2015] does anyone know if I can use vos restore/dump to create a new RO site? When I try to "vos restore -readonly" I'm warned that "Volume exists and no -overwrite option specified; Aborting restore command" [Mon Jan 5 12:29:21 2015] (but of course there is no RO/RW instance of that volume on the target machine yet) [Mon Jan 5 12:29:35 2015] that would not add it to the vldb properly, no. why do you want to do it that way instead of addsite/release? [Mon Jan 5 12:30:11 2015] only because UDP is not very fast over my networks [Mon Jan 5 12:30:53 2015] vos release will only go at like ~20 mbps, but I can do pretty much gigabit via TCP [Mon Jan 5 12:32:18 2015] or at least 10x faster, I guess [Mon Jan 5 12:32:22 2015] this is over WAN [Mon Jan 5 13:06:35 2015] bah! [Mon Jan 5 13:07:58 2015] Hmm, that reminds me, I was going to ask secureendpoints if he remembered the numbers for how much of the true MTU was unused by rx due to the bogus handling we were looking at recently. 
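The gap between roughly 20 Mbit/s for `vos release` and near-gigabit TCP is easiest to appreciate as wall-clock time. Below is a rough Python sketch of the arithmetic. It assumes the quoted rates are sustained and ignores protocol overhead, and the 16 GiB volume size is only an example.

```python
def transfer_seconds(size_bytes: int, rate_bits_per_s: float) -> float:
    """Idealized time to move size_bytes at a sustained bit rate (no overhead)."""
    return size_bytes * 8 / rate_bits_per_s


if __name__ == "__main__":
    size = 16 * 2**30  # example: a 16 GiB volume
    for label, rate in (("vos release at ~20 Mbit/s", 20e6),
                        ("TCP at ~1 Gbit/s", 1e9)):
        secs = transfer_seconds(size, rate)
        print(f"{label}: {secs / 60:.1f} min")
```

For 16 GiB, this works out to about 115 minutes at 20 Mbit/s versus about 2.3 minutes at 1 Gbit/s. That is the roughly 50x difference that makes the dump/restore route tempting over a WAN.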
[Mon Jan 5 13:08:31 2015] I am testing this case: old fileserver goes down, restore to new fileserver, 'vos syncvldb' of new fileserver to register the volumes with a different fileserver id; but the command keeps trying to contact the old fileserver, obviously to double check, but the old fileserver is down... [Mon Jan 5 13:09:08 2015] Walex: does the new fileserver have all the volumes that the old fileserver had? [Mon Jan 5 13:11:52 2015] kaduk_: yes indeed [Mon Jan 5 13:12:14 2015] kaduk_: it ought to timeout and be safe I guess [Mon Jan 5 13:12:40 2015] kaduk_: It takes a long time to timeout, just boring. [Mon Jan 5 13:14:17 2015] The quick-'n'-easy way to do this is to copy or otherwise transfer the old fileserver's UUID to the new machine, so that when it registers with the vldb, it will just update the old entries. [Mon Jan 5 13:22:39 2015] kaduk_: ahhhhhhhhh if the old fileserver is available, which should be in the VLDB. But not sure whether it is desirable in our case as we want to switch back as soon as the old fileserver is back (the new fileserver is "closer" to the clients). [Mon Jan 5 13:23:06 2015] flipping server IDs might be error-prone... [Mon Jan 5 13:23:42 2015] I guess I'll bear with the timeouts rather than risk that. But good point for a full restore. [Mon Jan 5 13:23:44 2015] I think there are some reversed logical senses in there, but I think I understand the meaning. [Mon Jan 5 13:24:31 2015] Yeah, it's risky if there will be data changes and the old fileserver might come back without warning. [Mon Jan 5 13:35:37 2015] uhm, I am devastated to report that the long timeout ends and re-registration fails... seems like a bug to me: http://paste.debian.net/139296/ [Mon Jan 5 13:42:52 2015] I am afraid this means I shall have to 'vos changeloc' each volume.
[Mon Jan 5 13:47:02 2015] me is sad :-( [Mon Jan 5 13:47:15 2015] * :-( [Mon Jan 5 13:47:32 2015] oh well, after all that's what the command is for [Mon Jan 5 14:04:17 2015] oh too bad, actually 'vos changeloc', despite the manual page saying "without needing to contact the original file server", does contact it. That seems more of a bug than 'vos syncvldb' doing it. [Mon Jan 5 14:06:11 2015] next thing I'll try is 'vos delentry' followed by 'vos syncvldb', which should also be quicker as 'delentry' allows specifying a whole server or partition instead of volume-by-volume. [Mon Jan 5 14:12:46 2015] actually the above is incorrect: in my test cell the failure was due to the failed system also being a DB server and then quorum was lost. [Mon Jan 5 14:15:26 2015] and I think that rather than 'vos changeloc' of each volume, followed by 'vos syncvldb' of the relevant partitions, it is quicker to do 'vos delentry' of the relevant partitions/server and then 'vos syncvldb' of the same. [Mon Jan 5 14:16:03 2015] Walex: vos syncvldb must contact the original vol server to perform conflict detection. The command logic doesn't have any knowledge of which servers might be unreachable from the location the command is being executed or those servers that are permanently gone. [Mon Jan 5 14:20:14 2015] secureendpoints: I guess that's why 'vos changeloc' exists. [Mon Jan 5 14:20:45 2015] Walex: if the server or partition is permanently gone, then yes, you either need to changeloc each volume as you restore it (the preferred way) or remove the partition(s) with delentry and then sync*. The changeloc method is preferred because it leaves a history of what volumes were not restored in the VLDB.
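secureendpoints' two disaster-recovery paths for a permanently lost server or partition can be written out as command lists. Below is a minimal Python sketch that only builds argument vectors and executes nothing. All host, partition, and volume names are hypothetical. It assumes the documented forms `vos changeloc -id -newserver -partition`, `vos delentry -server -partition`, and `vos syncvldb -server -partition`.

```python
from typing import Iterable, List

Command = List[str]


def changeloc_plan(volumes: Iterable[str], new_server: str,
                   new_partition: str) -> List[Command]:
    """Preferred path: repoint each restored volume's VLDB entry one at a time.

    Volumes that are never restored keep their old entries, leaving a record
    of what is still missing.
    """
    return [["vos", "changeloc", "-id", vol, "-newserver", new_server,
             "-partition", new_partition] for vol in volumes]


def delentry_sync_plan(old_server: str, old_partition: str,
                       new_server: str, new_partition: str) -> List[Command]:
    """Alternative: delete the VLDB entries for the lost partition, then
    register whatever is actually present on the new one."""
    return [
        ["vos", "delentry", "-server", old_server, "-partition", old_partition],
        ["vos", "syncvldb", "-server", new_server, "-partition", new_partition],
    ]


if __name__ == "__main__":
    # Hypothetical names for illustration only.
    for cmd in changeloc_plan(["home.alice", "home.bob"], "fs-new.example.org", "a"):
        print(" ".join(cmd))
```

The per-volume `changeloc` plan is the one described as preferred. The `delentry` route trades that audit trail for fewer commands, which matters little for a partition holding only a handful of volumes.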
[Mon Jan 5 14:21:09 2015] secureendpoints: but then 'vos changeloc' could be subsumed in 'vos syncvldb' if the latter had a '-changeloc' option [Mon Jan 5 14:21:13 2015] All of the commands that work solely against the VLDB are there for disaster recovery [Mon Jan 5 14:21:30 2015] secureendpoints: ahhhh that is enlightening. [Mon Jan 5 14:21:57 2015] they should never be used otherwise [Mon Jan 5 14:22:22 2015] secureendpoints: different topic, but do you remember the numbers for the MTU (non-)usage from your experimentation? [Mon Jan 5 14:22:39 2015] I believe 14 octets but I could be wrong [Mon Jan 5 14:22:56 2015] in my particular case I am interested in a partition with half a dozen volumes, so the difference between 'vos changeloc' and 'vos delentry' is not big in terms of convenience. [Mon Jan 5 14:23:12 2015] sounds plausible. "Not exceptionally exciting, then." [Mon Jan 5 14:24:11 2015] secureendpoints: 'man vos_changeloc' recommends running 'vos syncvldb' afterwards, but in my (minimal) testing it looks unnecessary because the VLDB entry is indeed already updated. Am I missing something? [Mon Jan 5 14:24:16 2015] kaduk_: not exciting but 1% adds up over time [Mon Jan 5 14:24:32 2015] secureendpoints: *nods* [Thu Jan 8 12:26:30 2015] i don't believe that the work adam mentions here ever materialized: http://gerrit.openafs.org/#change,10464 [Thu Jan 8 12:26:49 2015] getting userspace building on illumos would be fairly trivial - would code to support this be accepted? [Thu Jan 8 12:29:07 2015] Probably [Thu Jan 8 12:30:31 2015] sweet. any chance it could make it back to 1.6? [Thu Jan 8 12:32:20 2015] If it doesn't touch other platforms, possibly. But, we're looking at branching a 1.8 soon, so fewer things will get pulled back to 1.6 at that point [Thu Jan 8 12:34:23 2015] okay. [Thu Jan 8 13:03:24 2015] natefoo: I poked Adam elsewhere, and if we remind him in a couple weeks he will try to pull out his changes from his (running) machine.
There's a big event coming up, so he doesn't have much time right now. [Thu Jan 8 13:06:01 2015] ooh. [Thu Jan 8 13:06:10 2015] thanks, i appreciate it. [Thu Jan 8 13:06:22 2015] He also said that the userspace bits were ~trivial makefile changes. [Thu Jan 8 13:06:36 2015] So, it would probably not be the end of the world if you re-implemented those if you want them sooner. [Thu Jan 8 13:07:12 2015] they are, i have a hacky version here: https://github.com/natefoo/openafs/tree/openafs-stable-1_6_x-illumos [Thu Jan 8 13:07:33 2015] hacky in that it breaks Solaris(tm) and doesn't support 32 bit. [Thu Jan 8 13:07:42 2015] just "make it build" [Thu Jan 8 13:07:53 2015] Ah. Not ready for gerrit, then. [Thu Jan 8 13:08:15 2015] nope, but i could clean it up if it might make it in. [Thu Jan 8 13:13:36 2015] I would recommend making the changes for master so they are in 1.8. there are significant build differences between 1.6 and master. [Thu Jan 8 13:14:41 2015] yeah, i found that - the changes have actually been made twice, once by chas in the linked gerrit review, and once by coy hile: http://gerrit.openafs.org/#change,11132 [Thu Jan 8 13:15:04 2015] but there were quite a few changes back to 1.6. [Thu Jan 8 17:26:59 2015] * just found out about mogilefs [Thu Jan 8 18:01:56 2015] what's that? [Thu Jan 8 18:03:52 2015] "MogileFS is our open source distributed filesystem." [Thu Jan 8 18:04:02 2015] https://code.google.com/p/mogilefs/ [Thu Jan 8 18:05:01 2015] it appears to be application level, so I don't consider it a true file system [Thu Jan 8 18:25:10 2015] yep [Fri Jan 9 09:58:44 2015] my rsync is failing to copy over links(maybe hard, maybe soft) to an openafs volume. does openafs not like links? [Fri Jan 9 10:00:58 2015] chrisb: IBM AFS and OpenAFS do not support cross-directory hard links. AuriStor does. Neither support cross-volume hard links. [Fri Jan 9 10:05:36 2015] secureendpoints: ok, that explains it. i am trying to mirror a debian repo. 
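OpenAFS cannot store hard links that span directories. One way to predict whether an rsync mirror (such as the Debian archive via ftpsync) will hit this is to scan the source tree for link groups that cross directories. Below is a minimal Python sketch. It assumes a POSIX source filesystem where (st_dev, st_ino) identifies a file. It only reports, and links pointing outside the scanned root are not seen.

```python
import os
import sys
from collections import defaultdict
from typing import Dict, List, Tuple


def cross_directory_hardlinks(root: str) -> List[List[str]]:
    """Return groups of paths under root that share one inode but sit in
    more than one directory, a layout OpenAFS cannot store."""
    groups: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            path = os.path.join(dirpath, fname)
            st = os.lstat(path)  # do not follow symlinks
            if st.st_nlink > 1:
                groups[(st.st_dev, st.st_ino)].append(path)
    result = []
    for paths in groups.values():
        if len({os.path.dirname(p) for p in paths}) > 1:
            result.append(sorted(paths))
    return result


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: find_xdir_links.py DIR")
    else:
        for group in cross_directory_hardlinks(sys.argv[1]):
            print(" <-> ".join(group))
```

Hard links confined to a single directory are not reported, since OpenAFS can store those.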
[Fri Jan 9 11:24:20 2015] chrisb: does a debian repo have cross-directory hardlinks? I believe we have mirrored Ubuntu repos with both apt-mirror and rsync into AFS without problems. [Fri Jan 9 11:26:49 2015] jackhill: i don't know what the links are. i am using the ftpsync script, which calls rsync. apt-mirror worked for me on afs in the past, too [Fri Jan 9 11:27:45 2015] the scripts are here: https://ftp-master.debian.org/git/archvsync.git [Wed Jan 14 11:56:35 2015] nwf: do you have any plans to update gerrit 11349? I see you have a -1 on it, but it wasn't quite clear to me what direction you want to take it in. [Wed Jan 14 11:59:21 2015] It's not been a high priority. If I were to rework it, it'd be to make it conditional on vos's link being encrypted à la Deason's suggestion. [Wed Jan 14 11:59:23 2015] Why? [Wed Jan 14 11:59:58 2015] (secureendpoints also was pretty adamantly against it because rxkad crypto is very slow) [Wed Jan 14 12:00:08 2015] We're approaching branching for 1.8, and this is the sort of change that really only ought to be done on a major release boundary. [Wed Jan 14 12:00:31 2015] I think you may be overreading the strength of secureendpoints' opinion, not that I can speak for him of course. [Wed Jan 14 12:01:06 2015] It's probably not 1.8 material, but I can try to kick out a new version this weekend, if you like. [Wed Jan 14 12:01:33 2015] If you don't mind, that would be nice. [Wed Jan 14 12:02:01 2015] I think there were also some other places where we were considering default-to-encrypted, but would have to pull up email to remember which ones. [Wed Jan 14 12:02:05 2015] nwf: I wasn't objecting to the change. I was predicting that when the change appears many sites are likely to grumble [Wed Jan 14 12:02:54 2015] Fair. Do you think Deason's suggestion, of encrypting only when vos -encrypt has been specified, is also likely to cause grumbling?
[Wed Jan 14 12:03:05 2015] lets cut the throughput of volume forward operations to 1/10th their prior speed is going to be painful [Wed Jan 14 12:03:31 2015] that depends on whether the default for vos -encrypt becomes true or not [Wed Jan 14 12:03:45 2015] vos -encrypt is on by default from the windows client [Wed Jan 14 12:04:03 2015] Ah. I'll rework it to include a server flag, off by default. [Wed Jan 14 16:20:26 2015] ahhhh I have been using IPsec recently with happyness: http://www.sabi.co.uk/blog/14-two.html#141211 because it runs at 1Gb/s with AES and authentication on around 10-20% CPU time on single CPU. [Wed Jan 14 23:47:17 2015] do i have to compile sqlite3 in a special way for it to play nicely with openafs? [Wed Jan 14 23:48:24 2015] http://thread.gmane.org/gmane.comp.file-systems.openafs.general/27783/focus=27939 [Thu Jan 15 09:47:47 2015] is there a way for sqlite3 to read and write on openafs volumes? [Thu Jan 15 09:54:46 2015] sqlite3's documentation warns against using it on any network filesystem [Thu Jan 15 09:57:54 2015] ok, but there have been discussions in the past about sqlite on afs: http://thread.gmane.org/gmane.comp.file-systems.openafs.general/27783/focus=27939 [Thu Jan 15 09:58:59 2015] yes, and sqlite3 has of course not changed since then... [Thu Jan 15 09:59:19 2015] you know, stuff like write-ahead logging that is very explicitly ONE HOST ONLY [Thu Jan 15 10:03:31 2015] geekosaur: ok, i was just hoping that megacz solved it, since he proposed patches to sqlite [Thu Jan 15 10:04:04 2015] I am pretty sure sqlite's take on it is that databases should not be stored on network filesystem [Thu Jan 15 10:04:14 2015] especially since their documentation *says* so... [Thu Jan 15 10:05:39 2015] everyone wants cheap-lazy shared databases, ignoring the fact that pretty much nobody implements that because ??? 
apparently everyone thinks there are no problems trying to do that and the reason nobody *does* it is just to annoy them [Thu Jan 15 10:20:44 2015] geekosaur: ok, i get your point. if you look at that thread from openafs.general though, allbery, dreyer, altman, megacz all seemed to think that it was possible for sqlite to be adapted to work on openafs in April 2010 [Thu Jan 15 10:21:17 2015] yes, I'm one of those named people... [Thu Jan 15 10:21:27 2015] maybe "work" is too strong. "run painfully slow while avoiding near-certainty of data corruption in favor of mere possible data corruption", maybe... [Thu Jan 15 10:21:28 2015] and yes, they got something sort of working back then, with that version of sqlite [Thu Jan 15 10:21:49 2015] and kaduk_ just hit the nail on the head. [Thu Jan 15 10:22:03 2015] geekosaur: wow, i didn't realize who you were, ok [Thu Jan 15 10:22:22 2015] you can make it "work" if you try hard enough. and then discover the hard way the reason why nobody *does*, as I said [Thu Jan 15 10:23:14 2015] but if you absolutely insist then you can go back to the then-current version of sqlite and apply patches. and then find out that *linux* changed since then, and whole file locking is done in a way that is not reliable over network file systems [Thu Jan 15 10:23:56 2015] ok, got it. [Thu Jan 15 11:22:47 2015] geekosaur: "everyone wants cheap-lazy shared databases", "the reason nobody *does* it is just to annoy them": in the recent jargon of system/filesystem people this is called the "O_PONIES problem": http://lwn.net/Articles/351422/ but I guess you are well aware [Thu Jan 15 11:23:52 2015] yep [Thu Jan 15 11:25:41 2015] chrisb: I say people like you have a "syntactic" approach: those who think that every possible convenient combination of setups must "just work", and be fast, cheap and resilient. In your case your "syntactic" combination is "fast, cheap, resilient DBMS on shared network storage", and why not?
:-) [Thu Jan 15 11:25:52 2015] and it goes back a long way too, I remember the PC database folks trying to make it work way back when PC networking first showed up and everyone was convinced they could make it work... [Thu Jan 15 11:26:21 2015] geekosaur: well, it is a "syntactically" valid combination :-) [Thu Jan 15 11:26:59 2015] * notes that he is an ex-DBA (and does not particularly miss the headaches) [Thu Jan 15 11:28:12 2015] BTW I was recently looking at the Ceph documentation and it makes a point that might be of great interest to AFS sysadms... [Thu Jan 15 11:28:29 2015] the point is somewhat related to this discussion. [Thu Jan 15 11:28:42 2015] "colourless green ideas sleep furiously" :p [Thu Jan 15 11:28:48 2015] syntactically valid... [Thu Jan 15 11:29:30 2015] it is that BTRFS (I guess as a consequence of being COW) can push out metadata and data writes simultaneously, while 'ext4' and XFS cannot. [Thu Jan 15 11:30:21 2015] the result is that for network 'fsync' (for Ceph, but I suspect that applies to any metafilesystem) BTRFS delivers 50% better write rates than 'ext4' or XFS. [Thu Jan 15 11:32:04 2015] hm, I don't think COW has much to do with it; more the matter of keeping them together in a write implies a certain style of either journaling, final storage on disk, or possibly both. [Thu Jan 15 11:32:51 2015] (that, or they are updating separate blocks in a single write via writev() or etc. in which case they are kidding themselves...) [Thu Jan 15 11:34:21 2015] also I think anyone using afs servers over zfs already knows this. the question being whether linux's nih reinvention will be stable enough to trust for production servers in a reasonable timeframe [Thu Jan 15 11:36:54 2015] Walex: yes [Thu Jan 15 11:39:54 2015] geekosaur: i never measured the improvement moving to afs-on-zfs, but i'm glad to know it is even faster on writes [Thu Jan 15 11:48:49 2015] chrisb: you from ed.ac.uk? 
[Thu Jan 15 11:49:46 2015] geekosaur: I think that they are able to do parallel writes as a consequence of COW, because that makes a failed transaction easily undoable. But it is a guess.
[Thu Jan 15 11:51:07 2015] BTW while BTRFS was clearly "inspired" by ZFS, in some respects it is significantly better, e.g. resources required, but also some features.
[Thu Jan 15 11:58:48 2015] Walex: ?
[Thu Jan 15 12:15:07 2015] chrisb: the "major" people who run AFS on ZFS are at Edinburgh compsci
[Thu Jan 15 12:15:21 2015] chrisb: you mentioned running AFS on ZFS...
[Thu Jan 15 12:22:32 2015] actually they just *present* there...
[Thu Jan 15 12:23:51 2015] http://conferences.inf.ed.ac.uk/eakc2012/slides/AFS_on_Solaris_ZFS.pdf https://blogs.oracle.com/openomics/entry/morgan_stanley_openafs_solaris_11
[Thu Jan 15 12:26:47 2015] yes, the edinburgh workshop, but I thought that the AFS-on-ZFS Linux-based cell was done by Simon Wilkinson or Stephen Quinner at inf.ed.ac.uk
[Thu Jan 15 12:27:37 2015] I remember that it was specifically on Linux, while the links above are for Solaris
[Thu Jan 15 12:27:46 2015] oh, wouldn't know about linux zfs aside from the fact that linux is allergic to zfs's license :p
[Thu Jan 15 12:28:20 2015] geekosaur: it is the same (a bit older), you just download the binary from a different repo
[Thu Jan 15 12:30:35 2015] from some web searching, UMich also did it
[Thu Jan 15 12:31:29 2015] ooops, disappearing.
[Thu Jan 15 12:32:04 2015] the operative word in that last being "did", as I understand it...
[Thu Jan 15 12:39:21 2015] * moved from afs on ext3/linux to consolidate everything on afs-on-zfs/freebsd
[Fri Jan 16 17:04:25 2015] I finally looked at redoing 11349; apparently the way to see if a connection is encrypted involves calling rxkad_GetServerInfo; will an additional call to this function complicate the rxgk work? Is there no more generic API for getting rx security information?
[Fri Jan 16 17:04:50 2015] kaduk__: secureendpoints1: ^
[Fri Jan 16 17:27:14 2015] There is not currently a more generic API for things other than UserOK. It will not be a noticeable additional complication for rxgk if you add such a call.
[Sat Jan 17 07:54:13 2015] Hey guys, I am new to AFS and have a small question. How do I configure my identity? I set things up and /afs looks right and I can "open" the desired cell
[Sat Jan 17 07:54:29 2015] but obviously I need to somehow tell the server who I am
[Sat Jan 17 07:54:40 2015] But no prompt of any sort popped up
[Sat Jan 17 08:29:32 2015] rimdeker: you need an account on a kerberos authentication server
[Sat Jan 17 08:34:25 2015] treegazer: Assuming I have one, do I configure it in /etc/afs/ or ?
[Sat Jan 17 08:35:20 2015] /etc/openafs *
[Sat Jan 17 08:42:02 2015] https://wiki.mageia.org/en/Installing_OpenAFS_Client#Configure_Kerberos_client
[Sat Jan 17 08:42:08 2015] treegazer: Thank you
[Sun Jan 18 23:47:20 2015] anyone maintaining fedora 21 openafs rpms?
[Mon Jan 19 00:05:40 2015] hello. I'm sorry if I've missed this part of the documentation. What type of distributedness can I have within a cell? Can there be multiple master nodes? I've seen many references to what appear to be read-only replicas or slaves/hot backups
[Mon Jan 19 00:06:01 2015] openafs only does readonly (RO) replication
[Mon Jan 19 00:06:31 2015] for the "replicated" RO volume, the RW volume that you "release" to the ROs only lives on one server
[Mon Jan 19 00:06:47 2015] there are tricks you can do like promoting a RO to a RW, but those are manual
[Mon Jan 19 00:08:17 2015] thanks CybrFyre
[Mon Jan 19 00:10:28 2015] one additional question: can cells be nested? for instance, could I have /afs/example.com/site-a.example.com and /afs/example.com/site-b.example.com living on different servers (at their respective sites)
[Mon Jan 19 00:10:44 2015] or is it best to set up different cells
[Mon Jan 19 00:13:23 2015] I'm not sure I understand what you mean
[Mon Jan 19 00:13:33 2015] the part after "/afs" *is* the cell
[Mon Jan 19 00:14:00 2015] in both your examples, example.com is the cell, and site-a.example.com and site-b.example.com are just volume mount points in those cells
[Mon Jan 19 00:14:09 2015] now, those volumes can live on whatever servers
[Mon Jan 19 00:14:20 2015] er, s/in those cells/in that cell/
[Mon Jan 19 00:14:59 2015] Yes, that makes sense. I think I was unclear on what volume meant
[Mon Jan 19 00:15:09 2015] thank you again:)
[Mon Jan 19 00:15:13 2015] a volume is an entity that lives in cells
[Mon Jan 19 00:15:22 2015] you can mount the volume wherever
[Mon Jan 19 00:15:44 2015] (so can users, so don't make the mistake of depending on security by the path to a volume being restricted)
[Mon Jan 19 00:16:08 2015] w.r.t. geographicness of file servers, afs has server prefs which you can set on each client
[Mon Jan 19 00:17:06 2015] clients should prefer "local" fileservers over remote ones
[Mon Jan 19 00:17:20 2015] http://docs.openafs.org/AdminGuide/HDRWQ414.html
[Mon Jan 19 00:17:54 2015] thanks!
[Mon Jan 19 00:18:58 2015] I'm still just learning. I think that moving to AFS from a simple NFS mount fits much better with where my organization will be in the next few years, when it'd be much harder to switch, and since I have the opportunity now, I figured I'd explore options
[Mon Jan 19 00:19:47 2015] each system has its plusses and minuses, depending on what you're hoping to accomplish
[Mon Jan 19 00:19:55 2015] I used to use AFS at the Pittsburgh Supercomputing Center and I as a user never had issues and the admins seemed happy with it (though, they were comparing it to Lustre)
[Mon Jan 19 00:20:05 2015] and, there are things (such as databases) which should not be run on any network fs, really
[Mon Jan 19 00:20:33 2015] coolbeans
[Mon Jan 19 00:20:34 2015] :) We're switching to krb5 right now, and in the next 2 years we'll be opening another office
[Mon Jan 19 00:21:12 2015] yeah, this is simply for shared home directories and common folders
[Mon Jan 19 00:22:06 2015] that should mostly work fine
[Mon Jan 19 00:22:46 2015] and distributed dbs are the bane of my existence. ACID's an awesome thing and a more awesome checkbox....but eventual consistency does have its perks in a distributed encironment
[Mon Jan 19 00:22:51 2015] environment *
[Mon Jan 19 00:23:25 2015] for our database stuff, we are running postgresql on top of drbd (with linux-ha), so, that drbd/linux-ha takes care of the distributed redundancy
[Mon Jan 19 00:24:32 2015] do you have other pg instances running on the replicated machines?
[Mon Jan 19 00:24:48 2015] one PG instance with multiple databases
[Mon Jan 19 00:25:25 2015] which I think is the way RedHat's PG install likes to work
[Mon Jan 19 00:25:53 2015] I mean on the drbd replicates, is there a pg instance waiting to be used or started if needed?
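The server-preference point above ("clients should prefer local fileservers over remote ones") can be modeled in a few lines. This is a minimal sketch, not the Cache Manager's actual selection logic: it assumes only the convention that a lower rank marks a more preferred fileserver (ranks are shown by `fs getserverprefs` and set per client with `fs setserverprefs`). The addresses, rank values and the `ServerPref`/`preferred_order` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerPref:
    address: str  # hypothetical fileserver address
    rank: int     # lower rank = more preferred


def preferred_order(prefs, reachable):
    """Return reachable fileservers, most preferred first.

    Ties in rank are broken by address so the result is deterministic.
    """
    candidates = [p for p in prefs if p.address in reachable]
    ordered = sorted(candidates, key=lambda p: (p.rank, p.address))
    return [p.address for p in ordered]


if __name__ == "__main__":
    prefs = [
        ServerPref("192.0.2.10", 20000),    # hypothetical remote-site server
        ServerPref("198.51.100.7", 30000),  # hypothetical server at a third site
        ServerPref("203.0.113.5", 5000),    # hypothetical local-site server
    ]
    everyone = {p.address for p in prefs}
    print(preferred_order(prefs, everyone))
    # ['203.0.113.5', '192.0.2.10', '198.51.100.7']
    print(preferred_order(prefs, everyone - {"203.0.113.5"}))
    # ['192.0.2.10', '198.51.100.7']
```

Ranks only order servers that actually hold a copy of the volume (for example its RO sites); as noted earlier in the conversation, they do not make a RW volume available from more than one server.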
[Mon Jan 19 00:26:05 2015] on the secondary server, yes
[Mon Jan 19 00:26:29 2015] linux-ha fences the primary server, sets drbd to master on the secondary, mounts the filesystem, then starts up whatever services (including PG)
[Mon Jan 19 00:26:49 2015] Gotchya
[Mon Jan 19 00:33:57 2015] thank you for chatting with me:)
[Mon Jan 19 00:34:07 2015] I must head to bed now! Cheers!
[Mon Jan 19 00:39:30 2015] l8r!
[Mon Jan 19 09:36:05 2015] CybrFyre: "running postgresql on top of drbd" is a weird combination, considering that Postgres has got a pretty nice replication system built in.
[Mon Jan 19 15:07:47 2015] Walex - that "nice" built-in replication system is in newer versions than what comes with RHEL6
[Mon Jan 19 15:08:30 2015] there was other replication stuff with transaction logging I used to use, but the failover (and recovery from it) was a PITA
[Mon Jan 19 15:08:45 2015] just having the underlying filesystem be replicated is really a lot easier
[Mon Jan 19 15:09:25 2015] would be nice if postgres was like bind or dhcp where in the configs you just tell it it's one of a couple of nodes and it takes care of the rest, but, nope
[Tue Jan 20 02:18:01 2015] Hi all, I'm doing a stress test on OpenAFS: things you would never do, like overwriting the superblock of one of your (ext4) partitions. Yesterday I tried that and then did a "vos move" to the crippled partition. Obviously, OpenAFS didn't like that, but a few reboots and "salvager" commands later it worked again (to my surprise).
[Tue Jan 20 02:18:12 2015] After my first stress test it was time for something better :-)
[Tue Jan 20 02:19:56 2015] In the first test I moved one volume from one server to another. What I did next was restore the hard disks of /vicepa on both servers to yesterday's state. Now OpenAFS seems to really dislike it. What I think is happening now is that the volume location database is no longer in sync with the data actually on the disks.
[Tue Jan 20 02:20:53 2015] So my question: what would you suggest I do to fix this? Something like salvaging everything? Flushing all caches? Resyncing the vldb?
[Tue Jan 20 02:43:32 2015] salvage first, because your backup was presumably done outside of openafs and is therefore likely not internally consistent, then vos syncvldb and vos syncserv
[Tue Jan 20 02:52:19 2015] geekosaur: Yes indeed, my OpenAFS is backed up with an external mechanism. And yes, your suggestion works: vos syncvldb and vos syncserv made my cell accessible again :-)
[Tue Jan 20 03:03:06 2015] Oh anddsdlKDJSck
[Tue Jan 20 03:04:52 2015] sorry, my SSH connection got stuck :-)
[Tue Jan 20 03:05:33 2015] So what I wanted to ask is: is there some way to "thick provision" volumes on a partition?
[Tue Jan 20 05:35:56 2015] wannes1: what do you mean by "thick provision"?
[Tue Jan 20 05:47:26 2015] sur5r: For example, if I say a volume has to be 1TB, it takes 1TB of actual disk space from the beginning. This is to avoid running out of disk space.
[Tue Jan 20 05:48:35 2015] sur5r: Suppose you have a 1TB disk with 5 volumes of 200GB which effectively use only 1GB: would you be able to create another 200GB volume, and what would happen if all of them fill up?
[Tue Jan 20 05:49:54 2015] I think the way most people handle this is by using volume quotas. When you create volumes, ensure that the quotas you allocate never exceed the available disk space. When you move volumes, do the same. It requires scripting on top of vos.
[Tue Jan 20 05:58:26 2015] sxw1: Yes, I thought so. I was hoping for some obscure function that would enable moving/creating more volumes if the total quota exceeded the available disk space or something like that.
[Tue Jan 20 06:04:15 2015] And yet another question :-). Is there support for Spotlight search in OpenAFS? I've been trying but I just can't make it work.
[Tue Jan 20 06:05:31 2015] I've used mdimport and mdutil on OSX (Yosemite) but to no avail.
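sxw1's suggestion above (never let the quotas allocated on a partition exceed its size) is simple to check once the numbers are collected. This is a minimal sketch, assuming the per-volume maximum quotas and partition sizes (both in KB) have already been gathered, for example by parsing `vos listvol -long` and `vos partinfo` output, which is not shown. It also assumes the AFS convention that a maximum quota of 0 means "unlimited", so such a volume always flags its partition. The `overcommitted` name and the server/partition values are hypothetical.

```python
import math
from collections import defaultdict


def overcommitted(volumes, partitions):
    """Find partitions whose volumes could together outgrow the disk.

    volumes:    iterable of (server, partition, max_quota_kb) tuples.
                A max quota of 0 is treated as unlimited.
    partitions: dict mapping (server, partition) -> partition size in KB.
    Returns {(server, partition): (total_quota_kb, size_kb)} for every
    partition where the summed maximum quotas exceed the size.
    """
    totals = defaultdict(int)
    for server, part, quota_kb in volumes:
        totals[(server, part)] += math.inf if quota_kb == 0 else quota_kb
    return {
        key: (total, partitions[key])
        for key, total in totals.items()
        if key in partitions and total > partitions[key]
    }


if __name__ == "__main__":
    size = {("fs1", "/vicepa"): 1_000_000_000}    # roughly 1 TB, in KB
    five = [("fs1", "/vicepa", 200_000_000)] * 5  # five 200 GB volumes
    print(overcommitted(five, size))  # {}: the quotas exactly fill the disk
    # A sixth 200 GB volume overcommits the partition:
    print(overcommitted(five + [("fs1", "/vicepa", 200_000_000)], size))
```

A cron wrapper like the one wannes1 describes could mail the returned dictionary whenever it is non-empty.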
[Tue Jan 20 06:13:22 2015] wannes1: I wrote this some years ago: http://blogs.nnev.de/sur5r/index.php?/archives/14-Quota-consistency-check-for-AFS.html
[Tue Jan 20 06:25:02 2015] sur5r: wow, that's more than I could have hoped for. I will test this ASAP, thanks a lot for this!
[Tue Jan 20 06:27:29 2015] it's not particularly pretty but it does the job
[Tue Jan 20 06:30:41 2015] sur5r: I'm not particularly good at scripting so this will give me a jumpstart! It'd be nice if I could turn this into a cron job which sends a mail when some of the volumes have more maxquota than disk space.
[Tue Jan 20 13:36:19 2015] And just wondering, is there an OpenAFS conference planned for 2015?
[Tue Jan 20 13:36:59 2015] On www.openafs.org I see a link to the 2014 conference but no 2015 ;-)
[Tue Jan 20 13:37:04 2015] There is no conference planned that I am aware of
[Tue Jan 20 14:11:41 2015] not currently, no
[Tue Jan 20 14:12:15 2015] wannes1 - make sure you're subscribed to openafs-info and openafs-announce... any news would be posted to one or both of those
[Tue Jan 20 14:19:55 2015] Thanks RedFyre, I have done so. Is it an annual conference, or what does it depend on when it is organized? Volunteers?
[Tue Jan 20 14:20:36 2015] A conference happens when someone steps up to organize one.
[Tue Jan 20 14:21:07 2015] but when it happens, it's annual
[Tue Jan 20 14:21:23 2015] so, the main conference is usually June-ish
[Tue Jan 20 14:21:28 2015] the European conference, I forget when
[Tue Jan 20 14:21:59 2015] <[gorgo]> in the past few years the european conference became the main one, as there was no other
[Tue Jan 20 14:24:34 2015] <[gorgo]> last one was March 2014, Geneva. previously October 2012, Edinburgh and October 2011, Hamburg
[Tue Jan 20 14:24:57 2015] there used to be two conferences, one on each side of the Atlantic. The North American conference is dependent upon finding a hosting organization, someone to provide the necessary insurance and contracting, and organizers. The North American conference used to be held in the late Spring and was paid for entirely with attendee and sponsor fees. The European conference is dependent upon a host organization that has tra
[Tue Jan 20 14:25:11 2015] chopped at "that has tra"
[Tue Jan 20 14:25:26 2015] ... traditionally paid for the event out of pocket and paid for the travel expenses of the gatekeepers and some other key contributors. Sometimes there are sponsors and sometimes not. The European conference is typically held in the Fall.
[Tue Jan 20 14:25:50 2015] The CERN conference was moved to the following Spring due to scheduling conflicts.
[Tue Jan 20 14:26:37 2015] Thanks all for your replies; for me personally a European conference sounds very interesting.
[Tue Jan 20 14:27:09 2015] But so far there are no concrete plans, I understand.
[Tue Jan 20 14:28:15 2015] There was one potential site discussed at the CERN conference but that did not pan out and no one has stepped forward to organize.
[Tue Jan 20 14:30:23 2015] Running conferences in the U.S. used to be really easy back in 2004 to 2006. That was before universities determined that hosting conferences could be a profit center. Now it's actually cheaper to rent space in a hotel.
[Tue Jan 20 14:44:39 2015] I'm fairly certain I can still get rooms at UIUC for minimal cost, but I'm not sure how many people like to fly into CMI as there is only a single airline that flies here
[Tue Jan 20 14:46:08 2015] UIUC was the last space that worked well because it is controlled by the department and not the school.
[Tue Jan 20 14:46:34 2015] Yeah, I only flew from CMI when someone else was paying for my trip, back when I was a student there.
[Tue Jan 20 14:59:50 2015] <[gorgo]> cclausen: iirc there were two airlines back in 2010...
[Tue Jan 20 15:09:16 2015] and now there is one
[Tue Jan 20 15:11:28 2015] American Eagle is the only airline left. Was United Express the other one in 2010?
[Tue Jan 20 15:12:49 2015] <[gorgo]> I believe it was a Delta-related airline, flying from Detroit
[Tue Jan 20 15:13:10 2015] if we knew that a large number of attendees could arrive in a two- or three-hour window at O'Hare it might be cheaper to get a bus to drive everyone to UIUC
[Tue Jan 20 15:13:22 2015] Delta makes sense
[Tue Jan 20 15:15:44 2015] <[gorgo]> well, yes, the flight from O'Hare to CMI was kinda ridiculously short
[Tue Jan 20 15:16:53 2015] "We will not reach cruising altitude on this flight." Well, not quite, but it feels like it...
[Tue Jan 20 15:17:50 2015] I have been on too many short hops where they do not have enough time to pull out the drink cart
[Tue Jan 20 15:18:34 2015] The bus ride from CMI to campus is surprisingly long for the size of the city.
[Tue Jan 20 15:24:31 2015] <[gorgo]> secureendpoints: if you want to try the shortest hop, look here: http://www.amusingplanet.com/2013/08/worlds-shortest-commercial-flight-is.html
[Tue Jan 20 15:25:42 2015] might want to try a human cannon for that distance
[Tue Jan 20 15:27:00 2015] how did I guess? (I know some folks in Orkney. grumping about flights is remarkably common...)
[Tue Jan 20 15:28:12 2015] Hell, even if you crash-landed, you could probably swim to the other island in about the same time.
[Tue Jan 20 15:28:36 2015] only in the rare cases where the channel is calm...
[Wed Jan 21 04:05:26 2015] Hi all, does anyone have an idea whether Spotlight in OSX works with OpenAFS? I've been trying but can't make it work.
[Wed Jan 21 07:41:07 2015] I believe it's specifically excluded because there's no way to stop it from trying to index every cell
[Wed Jan 21 07:41:36 2015] do not put all cells in CellServDB?
[Wed Jan 21 07:42:02 2015] also spotlight puts its metadata at the root of the volume and /afs is not writable to it (and shouldn't be; think about it)
[Wed Jan 21 07:42:51 2015] that doesn't help when many cells have info in DNS
[Wed Jan 21 07:43:33 2015] but only requested on first access?
[Wed Jan 21 07:43:46 2015] if you only mount your own cell, how should it know about the others?
[Wed Jan 21 07:43:58 2015] or am I wrong?
[Wed Jan 21 07:45:06 2015] you're also kinda missing the point in some sense if you lock it down to only being able to mount the local cell, imo
[Wed Jan 21 07:46:42 2015] but you haven't addressed the point about it wanting to create /afs/.Spotlight-V100 and /afs/.DocumentRevisions-V100
[Thu Jan 22 07:56:47 2015] hrmm ... seems in current debian sid the openafs module doesn't build (openafs-modules-dkms 1.6.10-3, linux-image-3.16.0-4-amd64 3.16.7-ckt4-1).
[Thu Jan 22 08:36:30 2015] ok, something between linux 3.16.7-ckt2-1 and 3.16.7-ckt4-1 breaks it ...
[Thu Jan 22 12:38:39 2015] is afsd.fuse still considered experimental ?
[Thu Jan 22 12:39:14 2015] Mostly
[Thu Jan 22 12:44:41 2015] are there known issues published somewhere?
[Thu Jan 22 12:46:04 2015] I don't think there's a single comprehensive list, though some people here should be able to list the big ones.
[Thu Jan 22 12:47:17 2015] anyone done anything with openafs clients in a docker container?
[Thu Jan 22 12:47:31 2015] The big thing with afsd.fuse is the inability to make authenticated requests, IIRC.
[Thu Jan 22 12:47:48 2015] ah, yes that is an issue for us then
[Thu Jan 22 13:14:39 2015] you cannot get tokens with fuse because there is no pioctl interface
[Thu Jan 22 13:19:30 2015] But is the client unstable for unauthenticated access, Jeff?
[Thu Jan 22 13:21:45 2015] The openafs fuse module isn't well tested, nor is it heavily used in production environments, so I can't really speak to its reliability.
[Thu Jan 22 13:21:52 2015] or performance
[Fri Jan 23 17:19:07 2015] nfw: (re 11349) Also, with rxgk's per-fileserver keys, there will be lots of cases where VolForward just plain can't work -- the only case when it would reliably work is when the source fileserver has the cell-wide key.
[Fri Jan 23 18:18:17 2015] I'm afraid I didn't understand that... what prevents the source server from initiating a connection to the recipient server, using the VLDB to grab its key just as any other client would?
[Fri Jan 23 18:18:55 2015] Also, can someone please confirm for me that ./src/libafscp/afscp_util.c line 261 ("if (realm)") and following is completely bogus?
[Fri Jan 23 18:38:03 2015] The target server needs to know that the source server is authorized to create volumes.
[Fri Jan 23 18:39:46 2015] "if (realm)" means "if (realm != NULL)" for pointers, yes, so the block of code there will never execute.
[Fri Jan 23 18:39:56 2015] Also, shame on whoever did not use braces for the outer if.
[Fri Jan 23 18:46:45 2015] Surely the target server knows *which* server is authenticating to it? (Well, or possibly the VLDB has been compromised and the server keys are being misappropriated...)
[Fri Jan 23 18:47:12 2015] Well, there's a question of what key to use.
[Fri Jan 23 18:47:37 2015] The fileserver cannot get a token using its server key in a client role, since that is just an rfc3961 key stored in the vldb, not a GSS identity.
[Fri Jan 23 18:48:15 2015] There is probably going to be some GSS identity which is used to authenticate updates to that fileserver's key, but there does not strictly speaking need to be such a thing.
[Fri Jan 23 18:48:51 2015] Well, is there anything wrong with "if a rxgk server is going to initiate VolForward RPCs, it must have a GSS identity" ?
[Fri Jan 23 18:49:33 2015] nwf: you can do anything you want if you standardize the protocol and write the code. OpenAFS doesn't have a protocol that permits the necessary functionality and no one has written the code
[Fri Jan 23 18:49:38 2015] It doesn't really seem reasonable to expect the destination fileserver to know enough about the vldb to query what GSS identity corresponds to the source fileserver (and there aren't even RPCs to do so, anyway).
[Fri Jan 23 18:50:38 2015] secureendpoints: Well, OpenAFS right now only has the one cell-wide key, right? I imagine (haven't looked recently enough to know) that VolForward just uses that key on both ends?
[Fri Jan 23 18:50:44 2015] And even if the target fileserver could verify that the GSS identity being used corresponds to the source fileserver, it would still have to make a decision about whether that fileserver is authorized to create volumes on it -- there's no particular reason why different departmental fileservers in a cell would trust each other.
[Fri Jan 23 18:50:54 2015] That is correct for the current situation with rxkad.
[Fri Jan 23 18:52:16 2015] What's the alternative, tho', if not GSS id-based ACLs for release?
[Fri Jan 23 18:52:26 2015] Don't?
[Fri Jan 23 18:52:34 2015] Use the cell-wide key.
[Fri Jan 23 18:52:47 2015] Make the admin get a dump and restore it manually.
[Fri Jan 23 18:56:30 2015] dump-and-restore, aside from its triangular data flow, doesn't update the VLDB replication flags. Though maybe that's OK.
[Fri Jan 23 18:56:55 2015] Someone could also write the code to make vos do that under the covers, of course.
[Fri Jan 23 21:08:09 2015] Good evening, I didn't see CVE-2014-2852 listed in the openafs security advisories on the website. I was wondering if anyone knows what commit/patch fixes this issue.
[Fri Jan 23 21:12:31 2015] It is fixed by commit 0ec67b0a9a175af14e360da75d1f5429c6c97b24 on master
[Fri Jan 23 21:12:55 2015] The fix for other branches should have a similar commit message
[Fri Jan 23 21:15:38 2015] kaduk_: thank you
[Fri Jan 23 21:16:16 2015] The fix was released (IIRC) at the same time as a much more serious issue, so there was not a separate writeup for the more minor issue.
[Fri Jan 23 21:19:19 2015] kaduk_: Ah ha. Got it. Thanks again. I'm attempting to get Gentoo's OpenAFS up to snuff :)
[Fri Jan 23 21:51:45 2015] coolness... having distro oafs up to snuff is "a good thing" :)
[Fri Jan 23 21:58:43 2015] heh, yeah, right now the latest in the tree is 1.6.5, haha. I've been running 1.6.10 out of my personal repository for a while. No idea where our maintainer disappeared to, so I asked to proxy-maintain it, and have spent the afternoon and evening closing out bug reports
[Fri Jan 23 22:03:38 2015] thank you so much
[Fri Jan 23 22:04:33 2015] No problem :)
[Fri Jan 23 22:05:02 2015] So you'll probably see a lot more of me, hehe
[Fri Jan 23 22:05:32 2015] :)
[Fri Jan 23 22:19:15 2015] someone else who hangs out here has been grumping a bit about gentoo's openafs, IIRC?
[Fri Jan 23 22:27:54 2015] looks like the commit that kaduk_ referenced earlier isn't going to be trivial to apply :P
[Fri Jan 23 22:29:45 2015] a patch against master won't apply to 1.6, no
[Fri Jan 23 22:30:03 2015] if you update to 1.6.10 you should get the 1.6 version of that commit though
[Fri Jan 23 22:31:23 2015] 19c4d6023c8f616de0d194e560e64576e5986f70 is what's on the 1.6 branch for it
[Fri Jan 23 22:32:04 2015] yep, I found that; the issue is that it references ConnectionSendLater (or whatever it was, I'm pulling it up now) and that doesn't exist in 1.6.5
[Fri Jan 23 22:32:54 2015] so I lookated the commit where that was split out into its own function (don't have it handy [yet]) and that requires more changes too
[Fri Jan 23 22:33:35 2015] s/lookated/looked at/
[Fri Jan 23 22:34:33 2015] geekosaur: not bircoph, is it?
[Fri Jan 23 22:35:11 2015] no
[Fri Jan 23 22:35:37 2015] johnfg, not here now
[Fri Jan 23 22:35:54 2015] name sounds familiar
[Fri Jan 23 22:36:12 2015] * grepped his logs...
[Fri Jan 23 22:37:01 2015] hmm, my machine can't load git.openafs.org
[Fri Jan 23 22:37:18 2015] works here
[Fri Jan 23 22:37:28 2015] what address is git.openafs.org resolving to
[Fri Jan 23 22:37:49 2015] 18.9.44.50
[Fri Jan 23 22:38:06 2015] that is correct
[Fri Jan 23 22:38:14 2015] lol and now it is working...
[Fri Jan 23 22:38:34 2015] okay, now let me get specifics
[Fri Jan 23 22:38:58 2015] net-18 has been having some connectivity issues, it seems.
[Fri Jan 23 22:39:08 2015] My ssh connections keep coming and going.
[Fri Jan 23 22:40:35 2015] so 1.6.5 doesn't have rxi_SendConnectionAbortLater, so I went and grabbed: 32688c069f22f3b96e261f2361e251081957a047 blob3 had failed.
[Fri Jan 23 22:40:50 2015] s/blob/and blob/
[Fri Jan 23 22:44:29 2015] so if you think it's okay for blob 3 to be the rxi_sendSpecial call instead of SendConnectionAbort referenced in 32688c069f22f3b96e261f2361e251081957a047, I'll fudge it; otherwise, I can hop and try to find the previous patch necessary to get that working XD
[Fri Jan 23 22:51:03 2015] That sounds plausible, but I'm headed to sleep so my opinion is untrustworthy.
[Fri Jan 23 22:51:47 2015] heh, np, well thanks for your help kaduk_
[Sat Jan 24 08:52:53 2015] :)
[Sat Jan 24 08:52:57 2015] haloo
[Sat Jan 24 15:34:47 2015] Is the jabber chatroom working for anyone else?
[Sat Jan 24 15:35:47 2015] let's see...
[Sat Jan 24 15:36:40 2015] doesn't look like it
[Sat Jan 24 15:38:57 2015] The last message I see is yesterday at 15:22
[Sat Jan 24 15:40:01 2015] jabber.openafs.org does not ping...
[Sat Jan 24 15:41:07 2015] indeed
[Sat Jan 24 15:45:40 2015] glad it's not just me.
[Sat Jan 24 15:46:05 2015] who can kick the chat server?
[Sat Jan 24 15:47:50 2015] kaduk_'s at MIT
[Sat Jan 24 15:50:36 2015] the jabber server is working
[Sat Jan 24 15:52:43 2015] secureendpoints: yes, I see your post
[Sat Jan 24 16:03:09 2015] kaduk_ was complaining yesterday that net18 kept falling off the net, iirc
[Sat Jan 24 16:03:44 2015] (but fwiw I just reconnected my work box to it)
[Sat Jan 24 18:04:57 2015] I can kick the jabber server, yes, but I can't do much about network issues getting to it ;)
[Sun Jan 25 09:53:06 2015] hello. I'm following this tutorial (https://wiki.freebsd.org/afs-server) and when I get down to the create users section the pts createuser command fails with "pts: server or network not responding ; unable to create user". there is an afs-bos server running and listening on the default port
[Thu Jan 29 13:10:48 2015] I thought that had been patched in the stable tree
[Thu Jan 29 13:19:03 2015] yes.
[Thu Jan 29 13:19:16 2015] you need to run regen.sh before running ./configure [Thu Jan 29 13:19:56 2015] if you are building against something at or before 1.6.11pre1 [Thu Jan 29 13:23:26 2015] are you referring to the man pages or to the d_alias error (I'm presuming man pages) [Thu Jan 29 13:23:34 2015] d_alias [Thu Jan 29 13:23:57 2015] if you pulled the recent patches to address it, one of the patches adds a check in autoconf [Thu Jan 29 13:24:04 2015] ah [Thu Jan 29 13:24:11 2015] would I have had to have manually pulled said patches? [Thu Jan 29 13:24:12 2015] if you haven't re-generated the ./configure, it won't have the check. [Thu Jan 29 13:24:23 2015] I only did a git checkout of openafs-stable_1_6_x [Thu Jan 29 13:24:25 2015] no, they're in the latest openafs-stable-1_6_x [Thu Jan 29 13:24:54 2015] but if you hadn't run a regen in the past couple weeks, ./configure wouldn't know to set the appropriate flags [Thu Jan 29 13:26:01 2015] * runs git rebase origin/openafs-stable-1_6_x [Thu Jan 29 13:40:07 2015] hunh, even with a "git rebase" I'm still getting that error (and reran regen.sh, too) [Thu Jan 29 13:52:02 2015] It may have been that those ones were only just merged on master and not pulled up yet. [Thu Jan 29 13:52:30 2015] But no, it looks to be there in (e.g.) 860764da5ee2e48a2c3f7552fad1766e19eae47f [Thu Jan 29 13:52:54 2015] any way to verify I've actually got it? [Thu Jan 29 13:58:55 2015] I guess, if STRUCT_DENTRY_HAS_D_U_D_ALIAS appears in your tree [Thu Jan 29 14:00:23 2015] string in a file someplace? [Thu Jan 29 14:00:54 2015] well, seems to be neither [Thu Jan 29 14:11:17 2015] ok, I think I'm cooking, now... git tree was... confused [Fri Jan 30 12:11:44 2015] * nods [Wed Feb 4 06:18:23 2015] Hi all, I'm trying to PXE boot Linux Workstations. Currently the root resides in an NFS-share but since we're considering a move to OpenAFS, I'm trying to move the root filesystem to AFS-space as well. 
Now I'm sort of stuck at creating the initramfs with dracut. I run dracut from a Linux workstation that is PXE booted over NFS and is AFS aware. My pwd is something like [Wed Feb 4 06:18:25 2015] /afs/.mycell.org/software/netboot/centos6.6/ and the command I run is "dracut --force tmpimage". Then I get the feedback: [Wed Feb 4 06:18:31 2015] W: Dracut module "debug" cannot be found. [Wed Feb 4 06:18:39 2015] W: Dracut module "syslog" cannot be found. [Wed Feb 4 06:18:46 2015] W: Dracut module "openafs" cannot be found. [Wed Feb 4 06:20:24 2015] If I do lsmod | grep openafs, the module is loaded [Wed Feb 4 06:20:55 2015] And clearly when I examine the cpio root image, I don't find anything like an openafs.ko file. [Wed Feb 4 06:25:07 2015] but I do have these files: [Wed Feb 4 06:25:09 2015] /lib/modules/2.6.32-504.1.3.el6.x86_64/extra/openafs.ko [Wed Feb 4 06:25:18 2015] /lib/modules/2.6.32-504.1.3.el6.x86_64/weak-updates/openafs.ko [Wed Feb 4 06:25:23 2015] /lib/modules/2.6.32-71.29.1.el6.x86_64/extra/openafs.ko [Wed Feb 4 06:25:50 2015] and I'm running 2.6.32-71.29.1.el6.x86_64 [Wed Feb 4 06:26:18 2015] So my best guess is that the the openafs.ko file in the last directory needs to be linked somewhere else? [Wed Feb 4 07:17:36 2015] [freenode-info] channel flooding and no channel staff around to help? 
Please check with freenode support: http://freenode.net/faq.shtml#gettinghelp
[Wed Feb 4 08:20:04 2015] Just solved it by removing the openafs module from /etc/dracut.conf and specifying it manually on the command line like so:
[Wed Feb 4 08:20:06 2015] dracut --add-drivers openafs --force tmp_image.img
[Wed Feb 4 09:33:42 2015] hello
[Wed Feb 4 09:33:50 2015] hi
[Wed Feb 4 09:34:41 2015] i'm setting up my first cell, following the manual, i've completed the "Initializing the Protection Database" chapter, however i would like to have the fs processes on a different machine, but the doc says to start the fs processes on the same box
[Wed Feb 4 09:34:53 2015] how should i go about having the fs processes on a different box?
[Wed Feb 4 09:35:27 2015] should i just redo the cellnaming/krb stuff part on the other box, then from the first one issue the bos commands targeting the second one to add it to the cell?
[Wed Feb 4 09:39:51 2015] which "manual" are you following? the IBM one is pretty out of date
[Wed Feb 4 09:40:38 2015] http://docs.openafs.org/QuickStartUnix/#HDRWQ60.html
[Wed Feb 4 09:40:53 2015] in any case, once you have vl and pt running on one machine, you should be able to set up a second machine, copy the cell config and KeyFile / keytab from the first one, and set up bos and fs on it
[Wed Feb 4 09:41:11 2015] rriiiight
[Wed Feb 4 09:41:58 2015] what steps does that "set up bos" include on the second machine?
[Wed Feb 4 09:42:22 2015] please keep in mind it's my first afs installation, i've read quite a few docs, but i'm not familiar with a lot of things myself
[Wed Feb 4 09:43:02 2015] setting up bos is basically just ensuring that the keytab and server CellServDB/ThisCell are present, then running the command 'bosserver'.
[Wed Feb 4 09:43:33 2015] After that you should be able to use bos to add the fs instance as (I think) the manual says.
[Wed Feb 4 09:43:36 2015] so, while bos is not running, copy files over, then start it?
[Wed Feb 4 09:43:39 2015] mm. so depending on paths you can just copy /etc/openafs/server or /usr/afs/etc to the new machine and make sure the server component is installed. then start the bosserver
[Wed Feb 4 09:43:50 2015] yes
[Wed Feb 4 09:44:09 2015] that seems easier than i expected, let me give it a shot
[Wed Feb 4 09:44:16 2015] make sure you don't copy BosConfig though, as it will at that point want to try to start the vlserver and ptserver, and you don't want those on a fileserver
[Wed Feb 4 09:44:37 2015] gotcha
[Wed Feb 4 09:45:35 2015] btw, it wasn't clear from the docs, do i actually need any kernel mods on the servers, if i'm not planning on having them as a client as well?
[Wed Feb 4 09:45:44 2015] also, the IBM manual probably had you create a KeyFile. modern openafs prefers you use rxkad.keytab with modern encryption
[Wed Feb 4 09:45:52 2015] kernel mods are for the "client" only
[Wed Feb 4 09:45:57 2015] ^
[Wed Feb 4 09:46:01 2015] ==RedFyre
[Wed Feb 4 09:46:08 2015] but, you'll find it's useful to have a client on the servers
[Wed Feb 4 09:46:18 2015] any new installation should use namei servers, which don't require a kernel module
[Wed Feb 4 09:46:31 2015] geekosaur: I think I updated the QuickStartGuide to use rxkad.keytab, but that might only be in my local copy.
[Wed Feb 4 09:46:33 2015] geekosaur, the openafs.com tutorial made me create a KeyFile
[Wed Feb 4 09:47:09 2015] you will need a client *somewhere* to set up the initial cell volumes (root.cell, possibly root.afs if you're planning to use non-dynroot clients)
[Wed Feb 4 09:47:11 2015] kaduk_, http://docs.openafs.org/QuickStartUnix/#HDRWQ53.html
[Wed Feb 4 09:47:14 2015] bottom of the page
[Wed Feb 4 09:47:54 2015] phx: I see, thanks.
[Wed Feb 4 09:48:51 2015] I'm probably thinking about https://github.com/openafs/openafs/commit/27b66f24aad04d1e74a7aa43d6ebcca0b98af18f but I don't know/remember how the ones on the website get built.
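The "copy the server config directory, but never BosConfig" advice above can be scripted. This is a minimal Python sketch, not an OpenAFS tool: the helper name is hypothetical, and the exclusion is a safety net, since depending on the layout BosConfig may live outside the copied directory (for example under /usr/afs/local). Run it while bosserver is stopped on the new machine, then start bosserver.

```python
import shutil
from pathlib import Path

# Machine-specific files that must not be cloned onto a new fileserver.
# BosConfig would make bosserver try to start vlserver/ptserver there.
EXCLUDE = ("BosConfig",)


def copy_server_config(src: str, dst: str) -> list[str]:
    """Copy an AFS server config dir (e.g. /usr/afs/etc), skipping BosConfig.

    Hypothetical helper; returns the sorted names now present in dst.
    """
    shutil.copytree(src, dst, ignore=shutil.ignore_patterns(*EXCLUDE),
                    dirs_exist_ok=True)
    return sorted(p.name for p in Path(dst).iterdir())


if __name__ == "__main__":
    import tempfile

    # Demo on throwaway directories instead of real server paths.
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        for name in ("CellServDB", "ThisCell", "rxkad.keytab", "BosConfig"):
            Path(src, name).write_text("")
        print(copy_server_config(src, dst))  # ['CellServDB', 'ThisCell', 'rxkad.keytab']
```

In practice the same rule would more likely be expressed in the configuration management the channel recommends (salt), but the exclusion list is the part that matters.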
[Wed Feb 4 09:54:34 2015] when copying over etc to the second box, CellServDB only has the first machine's address, is that all right?
[Wed Feb 4 09:54:53 2015] Yes.
[Wed Feb 4 09:55:03 2015] CellServDB should only list machines running vlserver/ptserver
[Wed Feb 4 09:55:35 2015] and how should i do this when i'm going to add my third server, which should be a vl/pt server?
[Wed Feb 4 09:55:40 2015] a non-fs server
[Wed Feb 4 09:56:48 2015] Going from one to two database servers (the vl and pt servers are database server processes, so it's more common to refer to such machines as dbservers) requires a bit of caution to avoid a split-brain scenario.
[Wed Feb 4 09:57:21 2015] currently i'm planning to have 2 db and 2 fs servers in the cell, you think i should not go with 2 db servers?
[Wed Feb 4 09:57:59 2015] 2 dbservers is usually disrecommended; 1 or 3 is better.
[Wed Feb 4 09:58:02 2015] 2 is bad because of how IBM implemented the Ubik protocol. basically, you don't actually have redundancy because the cell won't recover
[Wed Feb 4 09:58:42 2015] and fixing this while maintaining compatibility (as required by our agreements with IBM) is very hard
[Wed Feb 4 09:58:48 2015] currently this is a PoC demo i'm doing for my colleagues+managers, so do you guys think i should take the effort and do 3, or leave it as a single one?
[Wed Feb 4 09:59:10 2015] lose one of them and the other will not be able to conclude that it is the "master" which coordinates updates, so your cell becomes read-only
[Wed Feb 4 09:59:12 2015] best is 1 or 3
[Wed Feb 4 09:59:34 2015] Up to you. For a PoC a single one is probably okay, just make sure it doesn't go down during the demo ;)
[Wed Feb 4 10:02:24 2015] uhm, i've issued bos create $fs1 fs fs /usr/afs/bin/fileserver /usr/afs/bin/volserver /usr/afs/bin/salvager -noauth
[Wed Feb 4 10:02:47 2015] from the first server (which is the db), and on the fileserver i do not see the salvager process running, is that normal? :)
[Wed Feb 4 10:03:02 2015] also BosLog has no entry for it
[Wed Feb 4 10:05:38 2015] did you start bosserver with -noauth on the fileserver? (if you have a KeyFile or rxkad.keytab then you shouldn't, and you should use -localauth instead of -noauth)
[Wed Feb 4 10:05:51 2015] yes i did
[Wed Feb 4 10:06:16 2015] what's the way of gracefully stopping the bos server? or should i just bos restert $host ?
[Wed Feb 4 10:06:22 2015] s/restert/restart/
[Wed Feb 4 10:07:20 2015] just using 'kill' on the bosserver pid should do the right thing; and bos restart won't work if you've got it confused about whether it's running in auth mode or not
[Wed Feb 4 10:08:40 2015] are you sure about localauth?
[Wed Feb 4 10:08:45 2015] bos -localauth prints the help
[Wed Feb 4 10:08:52 2015] err
[Wed Feb 4 10:08:57 2015] /usr/afs/bin/bosserver -localauth
[Wed Feb 4 10:09:43 2015] right, manual indicates omitting -noauth will do auth
[Wed Feb 4 10:10:50 2015] strange, bos status reports the salvager, but i don't see the process on the other box
[Wed Feb 4 10:12:13 2015] now i'm confused. the bos on the first fs server is started without -noauth. from the first machine i've issued ``vos create $fs1 /vicepa root.afs -noauth'' as the doc said, but i got permission denied
[Wed Feb 4 10:12:21 2015] what am i doing wrong? :)
[Wed Feb 4 10:12:54 2015] http://pastebin.com/7jkvrnvs
[Wed Feb 4 10:12:57 2015] the exact error message
[Wed Feb 4 10:18:11 2015] that seems right, noauth turns off key checking not susers checking
[Wed Feb 4 10:18:50 2015] hm, or does it? I don't think I've played with that, or indeed with noauth across machines (generally I turn on auth as soon as possible)
[Wed Feb 4 10:19:28 2015] You may want to start over using a guide that includes https://github.com/openafs/openafs/commit/27b66f24aad04d1e74a7aa43d6ebcca0b98af18f (or maybe https://wiki.freebsd.org/afs-server if that's relevant?)
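The "1 or 3 dbservers, not 2" advice above comes down to majority arithmetic. The Python sketch below is a deliberately simplified model that treats Ubik's sync-site election as a strict majority vote among the dbservers; real Ubik also has a tie-breaking rule that this model ignores, so read it as an illustration of the numbers, not of Ubik's implementation.

```python
def quorum(n: int) -> int:
    """Votes needed for a strict majority among n dbservers."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """dbservers that can be down while the rest still form a majority.

    Below quorum, no server can become the coordinating ("master") site,
    so the databases are effectively read-only.
    """
    return n - quorum(n)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} dbservers: quorum {quorum(n)}, tolerates {tolerated_failures(n)} down")
```

Under this model two dbservers tolerate no more failures than one, while giving you a second machine whose outage can cost you write access, which is why the channel recommends one server for a PoC and three for redundancy.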
[Wed Feb 4 10:19:46 2015] geekosaur: "as soon as possible" is "from the very start", these days ;)
[Wed Feb 4 10:19:54 2015] yes, yes it is :)
[Wed Feb 4 10:22:21 2015] kaduk_, is there a place where those updated docs are up and functional? i can read unified diffs of html and stuff, i'd just rather not when my goal is something else :)
[Wed Feb 4 10:23:39 2015] Let me ask if the website just needs to be rebuilt.
[Wed Feb 4 10:23:47 2015] thanks
[Wed Feb 4 10:24:35 2015] any chance it might happen really soon? they want me to give that demo on friday, so i'm on a bit of a time constraint here
[Wed Feb 4 10:26:34 2015] A slight chance, I may know more in a few minutes.
[Wed Feb 4 10:26:44 2015] (Are you running this on freebsd, if I may ask?)
[Wed Feb 4 10:26:57 2015] nope, centos6 this time
[Wed Feb 4 10:28:07 2015] and a question on authentication. our username convention is given.family; when creating a per-user admin principal, that renders to given.family/admin in kerberos, but given.family.admin in pts. wouldn't the dot in the regular username be a problem?
[Wed Feb 4 10:32:12 2015] Yes, the dot is a problem.
[Wed Feb 4 10:32:28 2015] You will need to enable an extra option in one or two places.
[Wed Feb 4 10:33:43 2015] do the docs cover it, or should i be aware of it from the very start?
[Wed Feb 4 10:35:42 2015] I don't think the docs cover it, because it's rather disrecommended --- the interaction between kerberos 5 principals and afs's (still rather krb4 in many places) is not well defined or tested in the presence of extra dots
[Wed Feb 4 10:37:00 2015] from reading that fbsd page, it seems the difference between the KeyFile and rxkad.keytab is that i directly have to generate the keytab into the etc directory as rxkad.keytab, and that seems to be all
[Wed Feb 4 10:37:39 2015] in terms of operation, yes. in terms of encryption... KeyFile supports only single-des, which is essentially no security at all these days
[Wed Feb 4 10:37:45 2015] rxkad.keytab lets you use e.g. AES
[Wed Feb 4 10:37:50 2015] kaduk_, geekosaur and what are those extra options i might need in case of extra dots?
[Wed Feb 4 10:38:17 2015] geekosaur, so i don't have to create the principal with des-only, i can use all the other ciphers?
[Wed Feb 4 10:38:52 2015] also you can precreate an rxkad.keytab and never have to run the servers in noauth mode, whereas creating a KeyFile requires some initial cell setup
[Wed Feb 4 10:39:14 2015] right, re enctypes/ciphers
[Wed Feb 4 10:39:18 2015] and starting over would mean just cleaning the /usr/afs/etc dir?
[Wed Feb 4 10:39:28 2015] of course, after stopping everything
[Wed Feb 4 10:39:48 2015] also /usr/afs/db and /usr/afs/local on the dbservers
[Wed Feb 4 10:40:05 2015] and if anything is under /vicepxx/ those as well?
[Wed Feb 4 10:40:13 2015] Sounds like getting the docs updated on the website is a manual process, so probably isn't going to happen right away.
[Wed Feb 4 10:40:35 2015] kaduk_, thanks for it anyway :)
[Wed Feb 4 10:40:43 2015] let me start over then
[Wed Feb 4 10:40:46 2015] that depends. there's no partocilar reason you can't set up a new fileserver and just have it attach an existing partition
[Wed Feb 4 10:40:55 2015] *particular
[Wed Feb 4 10:40:56 2015] The fileserver has an -allow-dotted-principals option
[Wed Feb 4 10:41:40 2015] and i have to specify that to bos when launching the fileserver?
[Wed Feb 4 10:41:52 2015] Yes
[Wed Feb 4 10:41:53 2015] btw is it the fileserver? shouldn't the ptserver take care of the principals?
[Wed Feb 4 10:42:13 2015] It's also needed for the ptserver, it looks like.
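One way to see why a dot in the username is a problem: the pts name is formed by turning the Kerberos instance separator "/" into ".", so two different principals can collapse to the same pts name. This minimal Python sketch uses hypothetical helper names, ignores the realm entirely, and only checks the primary component (the given.family case raised above) when flagging names that would need -allow-dotted-principals.

```python
def krb5_to_pts_name(principal: str) -> str:
    """Map 'primary/instance@REALM' to the krb4-style 'primary.instance'.

    Hypothetical helper; the realm is simply dropped here.
    """
    name = principal.split("@", 1)[0]
    primary, sep, instance = name.partition("/")
    return f"{primary}.{instance}" if sep else primary


def needs_dotted_option(principal: str) -> bool:
    """True if the primary contains a dot, as with given.family/admin.

    Per the discussion above, servers then need -allow-dotted-principals.
    """
    primary = principal.split("@", 1)[0].partition("/")[0]
    return "." in primary


if __name__ == "__main__":
    # Two distinct principals, one pts name: the ambiguity dots introduce.
    for p in ("given.family/admin@EXAMPLE.ORG", "given/family.admin@EXAMPLE.ORG"):
        print(p, "->", krb5_to_pts_name(p), "dotted primary:", needs_dotted_option(p))
```

A dot-free admin principal (such as the "afsadmin" principal created later in this log) avoids the question altogether.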
[Wed Feb 4 10:42:39 2015] Hmm, maybe it's needed for all the server processes (I just looked at vlserver too)
[Wed Feb 4 10:42:43 2015] let me start over, fix the principal, generate the keytab, and then try creating the cell again
[Wed Feb 4 10:43:37 2015] pretty sure it's needed for all of them
[Wed Feb 4 10:45:38 2015] ok, keytab removed, etc/, db/, local/ cleared on both servers, service principal recreated with all the enctypes
[Wed Feb 4 10:49:00 2015] so, i've started bos, and i'm trying to set the cell, but i'm not authenticated. which principal should i authenticate as so that bos accepts it?
[Wed Feb 4 10:49:37 2015] use -localauth
[Wed Feb 4 10:49:47 2015] (this authenticates using rxkad.keytab)
[Wed Feb 4 10:50:02 2015] thanks
[Wed Feb 4 10:51:52 2015] and using bos create $machine $servertype simple /usr/afs/bin/$server how do i pass -allow-dotted-principals to the actual service?
[Wed Feb 4 10:52:05 2015] i don't really see this in bos_create(8)
[Wed Feb 4 10:52:37 2015] sorry, i'm blind
[Wed Feb 4 10:53:13 2015] Wed Feb 4 15:52:51 2015: buserver started pid 8088: /usr/afs/bin/buserver -allow-dotted-principals
[Wed Feb 4 10:53:16 2015] Wed Feb 4 15:52:51 2015: buserver exited with code 6
[Wed Feb 4 10:53:18 2015] i think it's not a good sign
[Wed Feb 4 10:54:08 2015] Put quotes around "/usr/afs/bin/$server -allow-dotted-principals"
[Wed Feb 4 10:54:16 2015] got that part
[Wed Feb 4 10:54:23 2015] It's possible that buserver didn't get that patch ported to it, I guess :-/
[Wed Feb 4 10:54:27 2015] however i found a service which does _not_ take the dotted principals arg
[Wed Feb 4 10:54:57 2015] And indeed, that argument is not in the buserver manpage
[Wed Feb 4 10:55:25 2015] uhm
[Wed Feb 4 10:55:37 2015] i should remove it with bos delete now, right?
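The quoting advice above matters because, in `bos create`, the instance name, the bnode type, and the process command line are separate arguments, and the whole command line (binary plus its flags) must arrive as one argument. Without the quotes, the shell splits -allow-dotted-principals into its own word. This Python sketch only builds and prints the argument vector; the helper name and the host db1.example.org are hypothetical.

```python
import shlex


def bos_create_argv(server: str, instance: str, bnode_type: str,
                    command: str, *extra_args: str) -> list[str]:
    """Build argv for `bos create` without running it (hypothetical helper).

    The process command line is joined into ONE argument, which is what
    quoting it achieves on an interactive shell command line.
    """
    cmd = " ".join([command, *extra_args])
    return ["bos", "create", server, instance, bnode_type, cmd, "-localauth"]


if __name__ == "__main__":
    argv = bos_create_argv("db1.example.org", "ptserver", "simple",
                           "/usr/afs/bin/ptserver", "-allow-dotted-principals")
    # shlex.join shows how the command would look typed into a shell.
    print(shlex.join(argv))
```

Passing the list form to `subprocess.run` would avoid shell quoting entirely; printing it with `shlex.join` just shows the quoted form a human would type.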
[Wed Feb 4 10:55:40 2015] # bos delete `hostname` "/usr/afs/bin/buserver -allow-dotted-principals" -localauth
[Wed Feb 4 10:55:44 2015] bos: failed to delete instance '/usr/afs/bin/buserver -allow-dotted-principals' (no such entity)
[Wed Feb 4 10:55:47 2015] # bos delete `hostname` "/usr/afs/bin/buserver" -localauth
[Wed Feb 4 10:55:48 2015] so how should i? :)
[Wed Feb 4 10:55:51 2015] bos: failed to delete instance '/usr/afs/bin/buserver' (no such entity)
[Wed Feb 4 10:56:21 2015] The argument to delete is the short name of the bnode, not the full command line
[Wed Feb 4 10:56:25 2015] just give the "instance" name; listed by bos status
[Wed Feb 4 10:56:47 2015] So, the $servertype in your line above
[Wed Feb 4 10:57:36 2015] got it. stop first, then delete
[Wed Feb 4 10:58:56 2015] and if i want all of these processes to log to syslog, i should pass the -syslog option to each of them?
[Wed Feb 4 10:59:12 2015] yes
[Wed Feb 4 10:59:12 2015] I think so.
[Wed Feb 4 11:00:30 2015] well, i'm not sure about buserver
[Wed Feb 4 11:02:30 2015] with rxkad.keytab, what should the bos listkeys output be? currently all it says is "all done"
[Wed Feb 4 11:03:08 2015] That is the correct output.
[Wed Feb 4 11:03:29 2015] from a command named "list" i expected a list of keys as output. but okay :)
[Wed Feb 4 11:04:44 2015] It's a list of all keys which are compatible with an ancient RPC format/data structure ;)
[Wed Feb 4 11:05:11 2015] We could define a new RPC to be more flexible about listing keys, if we had time and no other more-pressing issues.
[Wed Feb 4 11:05:21 2015] looks like -syslog was not added to the buserver. it gets no love.
[Wed Feb 4 11:05:42 2015] Few people run the buserver.
[Wed Feb 4 11:07:04 2015] meffie, ptserver is also missing it
[Wed Feb 4 11:07:56 2015] phx, you don't need bos listkeys with rxkad.keytab because ktutil works on it. with KeyFile nothing knew how to work with it except bosserver, so there had to be a bos listkeys
[Wed Feb 4 11:08:22 2015] got it, thanks
[Wed Feb 4 11:10:41 2015] ptserver has -syslog on the master branch, i see
[Wed Feb 4 11:10:49 2015] not on 1.6.10
[Wed Feb 4 11:15:36 2015] sorry about that. at least we know it will be there in 1.8.x
[Wed Feb 4 11:16:31 2015] nope, just saying
[Wed Feb 4 11:33:31 2015] davolserver also seems to be missing the syslog option
[Wed Feb 4 11:33:55 2015] gah, buildbot "slave info" is broken, sadly.
[Wed Feb 4 11:34:12 2015] exceptions.TypeError: 'NoneType' object is unsubscriptable
[Wed Feb 4 11:37:33 2015] how optional is the updateserver? i have binaries from packages installed, and i don't think it would be a good idea to distribute binaries by afs, invalidating the package info
[Wed Feb 4 11:37:59 2015] most people don't run it
[Wed Feb 4 11:38:28 2015] upserver is kinda puppet/salt type of thing from before either of those existed
[Wed Feb 4 11:38:45 2015] if you're starting from scratch, use real configuration management and ignore upserver/upclient
[Wed Feb 4 11:39:03 2015] i already have a salt installation with various other stuff in place
[Wed Feb 4 11:40:00 2015] then use that instead
[Wed Feb 4 11:40:00 2015] so it seems i have added the first fileserver into the cell
[Wed Feb 4 11:40:55 2015] you'll find that afs includes for backward compatibility reasons a lot of half-measures for which better solutions exist these days (kaserver, time server, upserver, buserver, ...)
[Wed Feb 4 11:41:17 2015] because none of them existed (or if they did they were painful) back when afs was created
[Wed Feb 4 11:41:28 2015] i've already noticed that
[Wed Feb 4 11:44:54 2015] how does afs react to partition size changes?
[Wed Feb 4 11:45:10 2015] let's say /vicepa has 20G space, but a second later it has just 10G of total space instead of 20G
[Wed Feb 4 11:46:34 2015] uhm
[Wed Feb 4 11:46:46 2015] i'm trying to add /vicepa on the second fileserver, however it doesn't let me:
[Wed Feb 4 11:46:51 2015] # vos create $fs2 /vicepa root.afs -localauth
[Wed Feb 4 11:46:51 2015] vos : partition /vicepa does not exist on the server
[Wed Feb 4 11:47:00 2015] # df -h /vicepa
[Wed Feb 4 11:47:00 2015] Filesystem Size Used Avail Use% Mounted on
[Wed Feb 4 11:47:00 2015] tank/vicepa 20G 128K 20G 1% /vicepa
[Wed Feb 4 11:47:03 2015] it's clearly there
[Wed Feb 4 11:47:14 2015] what am I doing wrong here?
[Wed Feb 4 11:48:05 2015] uhm, root.afs is already on $fs1's /vicepa, so how do i add that partition to $fs2?
[Wed Feb 4 11:49:46 2015] You may need to create /vicepa/AlwaysAttach (an empty file) and restart the fileserver processes. ISTR there is some confusion about how ZFS reports partitions
[Wed Feb 4 11:51:32 2015] okay
[Wed Feb 4 11:51:38 2015] let's add a client then
[Wed Feb 4 11:51:46 2015] then let's see how I can automate the client installations
[Wed Feb 4 11:52:16 2015] btw, should i use kmod- or dkms- ?
[Wed Feb 4 11:52:26 2015] dkms
[Wed Feb 4 11:53:18 2015] uh, the zfs dkms package was a nightmare
[Wed Feb 4 11:54:40 2015] zfs kinda would be. the openafs dkms stuff is pretty good, except when upstream scrambles their kernel stuff (kernel-devel redirected to kernel-debug-devel on el6 keeps breaking it... which is an rpm issue really; dpkg handles it better)
[Wed Feb 4 11:55:25 2015] and where should i find an rc/init script for afs server? i don't see any reported by chkconfig, and i don't see any under /usr/afs either
[Wed Feb 4 11:55:28 2015] oh centos6 here
[Wed Feb 4 11:56:49 2015] yo phx
[Wed Feb 4 11:57:53 2015] if you installed the openafs-server package then there should be an init script already
[Wed Feb 4 12:05:02 2015] w
[Wed Feb 4 12:05:10 2015] i did that
[Wed Feb 4 12:05:21 2015] okay, dkms module went smoothly
[Wed Feb 4 12:05:53 2015] i've spent like 3 hours fighting with the zfs one, because that just installed the source, and i had to fight with it, and it required conflicting deps throughout the process
[Wed Feb 4 12:06:35 2015] I guess zfs-on-linux is inherently exciting, possibly even more so than openafs.
[Wed Feb 4 12:07:39 2015] for zfs i prefer to pick freebsd, since it just works out of the box
[Wed Feb 4 12:08:00 2015] Me, too :)
[Wed Feb 4 12:08:02 2015] however, there's not a redhat-like company behind freebsd, and at some places that's a showstopper, however technically nice the stuff is
[Wed Feb 4 12:08:36 2015] i imagine if i wanted to go that way here, i would hear from our customers things like "that's not good because i haven't heard about it"
[Wed Feb 4 12:08:43 2015] invalid arguments, meh
[Wed Feb 4 12:18:37 2015] <[gorgo]> you could go proper solaris, there's a company behind it and zfs works out of the box :)
[Wed Feb 4 12:24:39 2015] my insanity hasn't reached that level yet :)
[Wed Feb 4 12:57:42 2015] /etc/rc.d/init.d/openafs-server
[Wed Feb 4 12:57:45 2015] that will be it
[Wed Feb 4 13:14:28 2015] http://docs.openafs.org/QuickStartUnix/#HDRWQ146.html
[Wed Feb 4 13:14:37 2015] here the disk and memory cache commands are pretty much the same
[Wed Feb 4 13:16:15 2015] yup
[Wed Feb 4 13:18:25 2015] err
[Wed Feb 4 13:18:36 2015] i expected configuring the memory and disk cache would have some difference
[Wed Feb 4 13:18:44 2015] like... anything that makes them differ :)
[Wed Feb 4 13:22:16 2015] ... they can't think for you. that they are configured the same way aside from -memcache itself means you have to figure out how much (disk | memory) to assign to them
[Wed Feb 4 13:22:27 2015] otherwise, what stunning difference were you expecting to see?
[Wed Feb 4 13:22:39 2015] they're both doing pretty much the same thing, after all
[Wed Feb 4 13:22:41 2015] well, i expected it to be able to handle both a memory and a disk cache at the same time
[Wed Feb 4 13:22:53 2015] and to be able to configure both caches
[Wed Feb 4 13:23:05 2015] no, just as it does not do multiple disk caches
[Wed Feb 4 13:23:06 2015] but the -memcache option was discussed in a later chapter
[Wed Feb 4 13:24:04 2015] mhm
[Wed Feb 4 13:24:23 2015] when i do a service start openafs-client, my /usr/vice/etc/CellServDB is being overwritten
[Wed Feb 4 13:25:47 2015] Yea, the RPM packaging wants you to use the .local version.
[Wed Feb 4 13:26:11 2015] the RPM packaging is ... annoying
[Wed Feb 4 13:26:23 2015] and how can i tell it i want to use my own CellServDB?
[Wed Feb 4 13:26:35 2015] it regenerates the client CellServDB every time from CellServDB.dist and CellServDB.local
[Wed Feb 4 13:26:56 2015] so i have to do something like echo -n > CellServDB.dist ? :)
[Wed Feb 4 13:27:06 2015] so if you only want your local ones, empty out CellServDB.dist and put your local ones in CellServDB.local
[Wed Feb 4 13:27:18 2015] cp /dev/null CellServDB.dist
[Wed Feb 4 13:30:06 2015] or edit the init script to not do that
[Wed Feb 4 13:30:27 2015] i would rather not edit init scripts, because then i will also have to maintain the modifications
[Wed Feb 4 13:30:52 2015] it just depends on what you want to maintain
[Wed Feb 4 13:31:02 2015] whether or not an rpm upgrade would undo your CellServDB.dist mod, dunno
[Wed Feb 4 13:31:36 2015] that's easy, i can tell salt to keep it empty
[Wed Feb 4 13:35:43 2015] mhm
[Wed Feb 4 13:42:51 2015] Title II here we come
[Thu Feb 5 01:32:38 2015] what's the impact if i issue an mkmount for a cell's root.afs other than /afs/$cellname ?
[Thu Feb 5 01:33:25 2015] Nil.
[Thu Feb 5 01:33:54 2015] Having a volume mounted in multiple locations is a normal part of AFS operation. (Well, the linux VFS model hates us for it, but it mostly works. Usually.)
[Thu Feb 5 01:37:53 2015] in general you shouldn't create mount points to root.afs volumes. They are explicitly there to be /afs. I believe you meant to say root.cell
[Thu Feb 5 01:38:56 2015] Well, you have to make a mount point to root.afs if you are setting up a new cell with a dynroot client and want to support non-dynroot clients in the cell, right? (Best practice is still to remove that mountpoint once it's done.)
[Thu Feb 5 01:39:12 2015] secureendpoints, yes, i meant that
[Thu Feb 5 01:39:43 2015] um. root.afs is not required if you use dynroot everywhere
[Thu Feb 5 01:39:46 2015] i'm still confused about a couple of things, quite a few terms are not very clear to me off the top of my head
[Thu Feb 5 01:39:59 2015] > and want to support non-dynroot clients in the cell
[Thu Feb 5 01:40:25 2015] basically i would like to create a cell per site, plus a "master" cell, from where i will synchronize stuff to the per-site cells
[Thu Feb 5 01:40:50 2015] and the per-site cell should somehow have a static path across the infrastructure at the end
[Thu Feb 5 01:41:19 2015] IIRC, synchronizing stuff from one cell to a different one is not a terribly great experience. 'vos dump' ~piped to 'vos restore' is about all you get.
[Thu Feb 5 01:42:11 2015] (I mean, people do it, certainly, but it's not something the standard tooling is really set up to do out-of-the-box.)
[Thu Feb 5 01:42:14 2015] i have yet to see that. first i still need to get 2 more cells up for this PoC, make the mount points, create example volumes, then i can see to the sync
[Thu Feb 5 01:42:43 2015] kaduk_, to be honest, something not being standard out-of-the-box is nothing new in IT
[Thu Feb 5 01:43:05 2015] i prefer to interpret most software as frameworks, which i have to integrate to get the functionality my needs require
[Thu Feb 5 01:43:59 2015] openafs does not provide cell to cell synchronization. organizations that do so have built proprietary infrastructures to manage data outside of afs. Morgan Stanley has discussed their VMS tooling at various conferences but has never released it. Merrill Lynch open sourced openefs.org but never added the AFS support.
[Thu Feb 5 01:44:21 2015] i know that
[Thu Feb 5 01:44:54 2015] i already have an integration layer on top of (so far) saltstack, openldap, mit kerberos, zabbix, bind9, strongswan, and a couple of other pieces
[Thu Feb 5 01:45:29 2015] i'm planning to do that tooling. actually i've spent like 18 months at morgan stanley, i'm more or less familiar with a few things from there
[Thu Feb 5 01:58:32 2015] so root.afs maps to /afs and root.cell to /afs/$cell right?
[Thu Feb 5 01:58:57 2015] and ThisCell's root.afs will be /afs on the client?
[Thu Feb 5 02:00:44 2015] The volume names root.afs and root.cell are a convention. You are free to use whatever names you would like. The default name that afsd looks for when mounting /afs is root.afs. The default name that dynroot cache managers look for when evaluating /afs/<cellname> is root.cell.
[Thu Feb 5 02:03:47 2015] i see
[Thu Feb 5 02:03:56 2015] in case of dynroot, root.afs is not being used?
[Thu Feb 5 02:04:03 2015] no
[Thu Feb 5 02:04:08 2015] gotcha
[Thu Feb 5 02:04:32 2015] and i've read that if i use SRV records, then CellServDB is not being used to look up the cells, but it relies on the DNS
[Thu Feb 5 02:04:42 2015] but in this case, how do i tell the client what other cells there are?
[Thu Feb 5 03:38:17 2015] gods have mercy, my deadline has been moved to next tuesday \o/
[Thu Feb 5 07:04:53 2015] phx: the user on the client "knows" which cells it wants to access.
[Thu Feb 5 08:03:28 2015] Walex, right, but how do i make it available?
[Thu Feb 5 08:03:58 2015] or he just does some cd /afs/$somecell without the directory even existing, and afsd will discover it using SRV records?
[Thu Feb 5 08:22:03 2015] phx - basically the latter
[Thu Feb 5 08:23:05 2015] when trying to mkmount root.cell i get an error:
[Thu Feb 5 08:23:09 2015] # fs mkmount /afs/afs-master.veeva.poc root.cell
[Thu Feb 5 08:23:09 2015] fs: cell dynroot not in /usr/vice/etc/CellServDB
[Thu Feb 5 08:23:19 2015] however, it is in CellServDB
[Thu Feb 5 08:23:29 2015] what's wrong here? googling didn't provide much help
[Thu Feb 5 08:24:42 2015] if you're using dynroot, then you don't make that mount... it makes it for you
[Thu Feb 5 08:24:59 2015] the client will usually auto place in /afs the contents of CellServDB
[Thu Feb 5 08:25:37 2015] and with -afsdb and -dynroot, if you try to cd to /afs/somecell where somecell can be looked up, it will then auto create /afs/somecell even if the dir isn't there
[Thu Feb 5 08:25:58 2015] of course, for gui file browsers, that doesn't quite work, since until you try to go there, there's nothing there to click on
[Thu Feb 5 08:26:24 2015] luckily i do not have a GUI around here
[Thu Feb 5 08:26:47 2015] so, you don't need to do that fs mkmount because of dynroot
[Thu Feb 5 08:26:53 2015] i see
[Thu Feb 5 08:28:41 2015] https://lists.openafs.org/pipermail/openafs-info/2003-February/008195.html and http://docs.openafs.org/Reference/8/afsd.html (look for the -dynroot option in the second one)
[Thu Feb 5 08:30:17 2015] got it, thank you
[Thu Feb 5 08:30:38 2015] now i'm trying to set acls on root.cell, so i can do something with it, but i'm getting permission denied all over the place
[Thu Feb 5 08:31:05 2015] i've got a kerberos ticket for the admin user i've specified in the cell setup, and aklog says i've got a token for uid1 (which was the uid of the pts user)
[Thu Feb 5 08:31:12 2015] # fs listacl /afs/afs-master.veeva.poc
[Thu Feb 5 08:31:12 2015] fs: You don't have the required access rights on '/afs/afs-master.veeva.poc'
[Thu Feb 5 08:31:28 2015] what am I missing here?
[Thu Feb 5 08:31:47 2015] if you have both a RW and RO of root.cell, you'll need to do that on /afs/.afs-master.veeva.poc (note the leading dot) to get to the RW, and then "vos release root.cell"
[Thu Feb 5 08:32:29 2015] if there are no ACLs set then setting ACLs can be hard... you need to be the afs "admin" user so that you have permission to do that
[Thu Feb 5 08:32:51 2015] this user is the admin user i've defined while setting up the cell
[Thu Feb 5 08:33:06 2015] and is in the system:administrators group?
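The naming rules discussed above (dynroot clients mount a cell's root.cell at /afs/<cellname>, and the leading-dot form /afs/.<cellname> reaches the read-write copy) can be summarised in a small model. This Python sketch is hypothetical and simplified: a real cache manager also consults CellServDB, AFSDB/SRV records and mount points inside the volume, and for the plain path it normally follows a released read-only copy if one exists.

```python
def resolve_dynroot_path(path: str, default_volume: str = "root.cell"):
    """Return (cell, read_write, volume, rest) for a path under /afs.

    Simplified model of a dynroot client (hypothetical helper):
    /afs/<cell>  -> the cell's root.cell, normally its read-only copy
    /afs/.<cell> -> the read-write copy of the same volume
    """
    parts = path.strip("/").split("/")
    if len(parts) < 2 or parts[0] != "afs":
        raise ValueError(f"not a path under /afs/<cell>: {path!r}")
    top = parts[1]
    read_write = top.startswith(".")
    cell = top[1:] if read_write else top
    return cell, read_write, default_volume, "/".join(parts[2:])


if __name__ == "__main__":
    for p in ("/afs/afs-master.veeva.poc", "/afs/.afs-master.veeva.poc/usr"):
        print(p, "->", resolve_dynroot_path(p))
```

This is why ACL changes on a replicated root.cell are made through the dotted path and then published with `vos release`.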
[Thu Feb 5 08:33:48 2015] yes
[Thu Feb 5 08:34:49 2015] btw, my kerberos realm's name does not match the cell's name. do i have to do something on the client side? on the server side i have my realm's name in kdc.conf
[Thu Feb 5 08:35:06 2015] there are multiple possible setups there
[Thu Feb 5 08:35:15 2015] I gotta get to work, right now, but can discuss more later
[Thu Feb 5 08:35:23 2015] thanks
[Thu Feb 5 08:50:17 2015] hmmm. A couple people are complaining to us that their homedir is viewable from https://lost-contact.mit.edu/afs/umich.edu/user/...
[Thu Feb 5 08:50:49 2015] Of course, I think homedirs by default only have 'system:anyuser l'
[Thu Feb 5 08:51:00 2015] so it's not like there's a huge loss there
[Thu Feb 5 08:51:10 2015] but I'm wondering if anyone here knows about lost-contact.mit.edu?
[Thu Feb 5 08:51:54 2015] I thought it was generally considered bad for to export afs to the world
[Thu Feb 5 08:52:02 2015] bad *form*
[Thu Feb 5 08:53:45 2015] if i have a kerberos ticket and afs token for the admin user i've specified during cell init, and pts does report it to be a member of system:administrators, then why does fs setacl root.cell rl tell me i don't have permission to do that?
[Thu Feb 5 08:53:57 2015] and how can i check why fs thinks i don't have permission?
[Thu Feb 5 08:54:14 2015] billings: even metadata has some security implications
[Thu Feb 5 08:54:17 2015] lost-contact is only one of many afs to http gateways in the world. but of course any machine with an actual afs client can read those directories, so if umich.edu is concerned then it should not make user volumes accessible to system:anyuser.
[Thu Feb 5 08:55:04 2015] secureendpoints: I understand that much. I don't run the cell, and my homedir is protected. I just seem to recall that we tried to avoid exporting /afs to the world on hosts that are indexed by google.
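For context on the 'system:anyuser l' remark above: AFS rights are single letters, and "l" (lookup) lets any client, including an AFS-to-HTTP gateway, list a directory's entries and traverse into it, while reading file contents needs "r". This Python sketch expands an fs listacl-style rights string; the dict layout and helper names are hypothetical, and it does not handle the shorthand words (read, write, all) or the auxiliary uppercase rights, which it rejects.

```python
# Standard AFS ACL rights letters, in their conventional order.
RIGHTS = {
    "r": "read", "l": "lookup", "i": "insert", "d": "delete",
    "w": "write", "k": "lock", "a": "administer",
}


def describe_rights(rights: str) -> list[str]:
    """Expand a rights string such as 'rl' into named rights."""
    unknown = set(rights) - set(RIGHTS)
    if unknown:
        raise ValueError(f"unsupported rights: {''.join(sorted(unknown))}")
    return [RIGHTS[c] for c in "rlidwka" if c in rights]


def anyone_can_read(acl: dict[str, str]) -> bool:
    """True if system:anyuser may read file contents, not just list names."""
    return "r" in acl.get("system:anyuser", "")


if __name__ == "__main__":
    homedir_acl = {"system:anyuser": "l", "jdoe": "rlidwka"}  # hypothetical ACL
    print(describe_rights(homedir_acl["system:anyuser"]), anyone_can_read(homedir_acl))
```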
[Thu Feb 5 08:56:14 2015] if I knew what the pts groups were called that included all the campus subnets, I'd probably encourage their use too. Sadly, I know of no such thing
[Thu Feb 5 08:56:42 2015] we had them when engin.umich.edu existed
[Thu Feb 5 08:56:46 2015] billings: security by obscurity is not security.
[Thu Feb 5 08:56:55 2015] whatever.
[Thu Feb 5 08:57:25 2015] So it's not considered bad anymore? It isn't discouraged?
[Thu Feb 5 08:57:52 2015] I haven't discouraged it for nearly a decade. I host a service that exports /afs to google.
[Thu Feb 5 08:58:02 2015] Fun.
[Thu Feb 5 08:58:08 2015] what I do encourage is for sites to fix their security
[Thu Feb 5 08:58:17 2015] Well, good luck with that.
[Thu Feb 5 08:58:22 2015] They won't listen to me.
[Thu Feb 5 09:00:39 2015] the typical pattern is that someone complains that someone has stolen their stuff and they call either the local police or the FBI. Then I get a call and explain the situation and the campus security office gets involved and then folks begin to pay attention to ACLs in their afs cell.
[Thu Feb 5 09:04:04 2015] secureendpoints, then maybe you could also give me some tips on why i'm not able to setacl on root.cell with a user that belongs to system:administrators; i get permission denied :)
[Thu Feb 5 09:10:42 2015] mhm, from the client i'm getting "pts: Permission denied ; unable to get membership of system:administrators (id: -204)" for "pts mem system:administrators", what might cause this?
[Thu Feb 5 09:22:21 2015] phx - what does "tokens" tell you?
[Thu Feb 5 09:22:22 2015] when i do pts mem system:administrators, there's communication between the client and the server: http://pastebin.com/0W8D4jkG however i see nothing in the server logs which might indicate why that name-to-id call returns -204
[Thu Feb 5 09:22:31 2015] Tokens held by the Cache Manager:
[Thu Feb 5 09:22:31 2015] User's (AFS ID 2) tokens for afs@afs-master.veeva.poc [Expires Feb 6 13:58] --End of list--
[Thu Feb 5 09:23:41 2015] what happened to AFS ID 1 ?
[Thu Feb 5 09:24:02 2015] i've created another user and put it into system:administrators just to try something
[Thu Feb 5 09:24:08 2015] i can switch back to AFS ID 1, it's the very same
[Thu Feb 5 09:25:00 2015] what the hack
[Thu Feb 5 09:25:35 2015] http://pastebin.com/bes1L6DZ
[Thu Feb 5 09:25:56 2015] when i didn't have afs tokens, pts told me the members of the groups while complaining about not having tokens, but when i had tokens, it was unable to retrieve them
[Thu Feb 5 09:26:18 2015] so, dots are not going to work in principal names unless you're running servers with a special option
[Thu Feb 5 09:26:32 2015] i'm running them with that dots option
[Thu Feb 5 09:26:38 2015] ok
[Thu Feb 5 09:26:48 2015] that's why i've created an "afsadmin" principal, that's AFS ID 2, without dots
[Thu Feb 5 09:27:33 2015] so, if you are unix root on the afs database server, you can run those pts commands with "-local" as an option and it will use the afs KeyFile to get you afs superuser perms
[Thu Feb 5 09:28:01 2015] -localauth
[Thu Feb 5 09:28:01 2015] Constructs a server ticket using a key from the local
[Thu Feb 5 09:28:01 2015] /usr/afs/etc/KeyFile file. Do not combine this flag with the -cell
[Thu Feb 5 09:28:01 2015] or -noauth options. For more details, see pts(1).
[Thu Feb 5 09:28:16 2015] yes, -localauth works
[Thu Feb 5 09:28:41 2015] i'm trying to see why it refuses to let me fs setacl root.cell
[Thu Feb 5 09:28:52 2015] i cannot run that on the servers, because they do not have client functionality
[Thu Feb 5 09:28:58 2015] and on the client it doesn't seem to work
[Thu Feb 5 09:30:12 2015] although fs doesn't have a -localauth option, this is one of those times when having a client on the server is useful
[Thu Feb 5 09:30:33 2015] so, on the server, what's the output if you pts examine each of those users?
[Thu Feb 5 09:30:52 2015] and do a pts membership for each of those users?
[Thu Feb 5 09:31:11 2015] but with me being in the administrators group, things should work, shouldn't they? ~# pts examine gergely.czuczy.admin -localauth
[Thu Feb 5 09:31:14 2015] Name: gergely.czuczy.admin, id: 1, owner: system:administrators, creator: system:administrators, membership: 1, flags: S----, group quota: unlimited.
[Thu Feb 5 09:31:23 2015] well, clearly something is confused
[Thu Feb 5 09:31:58 2015] http://pastebin.com/B3kNrXR7
[Thu Feb 5 09:32:35 2015] so, in your previous pastebin, the libprot errors make sense, as you have no tokens
[Thu Feb 5 09:33:10 2015] but the outputs don't list tokens after aklog
[Thu Feb 5 09:33:18 2015] also, output of "klist -e" would be useful
[Thu Feb 5 09:33:36 2015] that prints the help
[Thu Feb 5 09:33:50 2015] oh, klist
[Thu Feb 5 09:34:07 2015] also, pts exa afsadmin -local
[Thu Feb 5 09:34:11 2015] and bos listusers -local
[Thu Feb 5 09:35:37 2015] anything else?
[Thu Feb 5 09:36:08 2015] we'll see :)
[Thu Feb 5 09:36:12 2015] http://pastebin.com/gF093sv5
[Thu Feb 5 09:37:47 2015] so, afsadmin will need to be in the UserList on both servers
[Thu Feb 5 09:37:59 2015] http://docs.openafs.org/Reference/5/UserList.html
[Thu Feb 5 09:38:16 2015] i have a single dbserver and 2 fileservers currently
[Thu Feb 5 09:39:00 2015] so, the UserList needs to be correct on all 3
[Thu Feb 5 09:39:16 2015] that affects bos and vos commands
[Thu Feb 5 09:39:20 2015] shall i just copy it over, or issue bos adduser for all 3?
[Thu Feb 5 09:39:27 2015] bos adduser
[Thu Feb 5 09:40:35 2015] ok, both users are in both listusers outputs now
[Thu Feb 5 09:41:02 2015] you mean all 3?
[Thu Feb 5 09:41:28 2015] yup
[Thu Feb 5 09:43:46 2015] so, what do you have in your, uh, might be /etc/openafs/server/krb.conf file?
[Thu Feb 5 09:44:00 2015] different path if using /usr/afs/...
[Thu Feb 5 09:44:30 2015] would be /usr/afs/etc/krb.conf
[Thu Feb 5 09:44:31 2015] oh damn
[Thu Feb 5 09:44:40 2015] http://docs.openafs.org/Reference/5/krb.conf.html
[Thu Feb 5 09:45:12 2015] yes, i completely missed that file. yesterday evening i started over, and somehow i didn't check whether i preserved that file
[Thu Feb 5 09:45:19 2015] will put the realm name inside
[Thu Feb 5 09:45:39 2015] don't know if that takes effect immediately or if you need to restart the server processes
[Thu Feb 5 09:45:49 2015] i will restart then
[Thu Feb 5 09:46:04 2015] should i also restart the bosserver?
[Thu Feb 5 09:46:20 2015] shouldn't hurt
[Thu Feb 5 09:46:50 2015] bos restart `hostname` -bosserver -all -localauth
[Thu Feb 5 09:46:57 2015] should do it all, according to the manual
[Thu Feb 5 09:47:02 2015] yep
[Thu Feb 5 09:47:41 2015] tokens still doesn't list tokens
[Thu Feb 5 09:47:50 2015] did unlog;aklog;tokens
[Thu Feb 5 09:47:55 2015] tokens not listing tokens means you don't have any tokens
[Thu Feb 5 09:48:07 2015] try starting clean (no kerb tickets either)
[Thu Feb 5 09:48:46 2015] so... kdestroy, kinit, aklog, klist, tokens
[Thu Feb 5 09:48:49 2015] restarting server processes again, or just on the client side?
[Thu Feb 5 09:48:52 2015] right, client side
[Thu Feb 5 09:49:08 2015] make sure you put that krb.conf on all 3 servers
[Thu Feb 5 09:49:20 2015] it's there
[Thu Feb 5 09:49:22 2015] and restart stuff on all 3
[Thu Feb 5 09:49:31 2015] that's done for all 3 with the above command
[Thu Feb 5 09:50:05 2015] do i somehow have to tell the client as well that the realm name doesn't match the cell name?
[Thu Feb 5 09:50:22 2015] I don't know if you *have* to
[Thu Feb 5 09:50:29 2015] the client should know the cell name from the "ThisCell" file
[Thu Feb 5 09:50:47 2015] with aklog, you can certainly specify the kerberos realm and the cell names manually as options
[Thu Feb 5 09:50:53 2015] if it's not automagically figuring things out, try that
[Thu Feb 5 09:50:57 2015] there's a debug option as well
[Thu Feb 5 09:51:06 2015] and, btw, aklog cares about the order of the options
[Thu Feb 5 09:52:55 2015] http://pastebin.com/EDb7KyhW
[Thu Feb 5 09:54:39 2015] so you appear to have tokens in the right cell
[Thu Feb 5 09:54:54 2015] but those tokens are not listed in tokens...
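Why the missing krb.conf mattered can be sketched as a local-realm check. This assumes the default described in krb.conf(5), which is consistent with what happened here: with no file, the only local realm is the cell name in uppercase, so `VEEVA.POC` tickets are not local to the cell `afs-master.veeva.poc` until `VEEVA.POC` is listed in the file. The helper names are hypothetical, and treating the listed realms as replacing (rather than extending) the default is our assumption.

```python
from typing import Optional, Set


def local_realms(cell: str, krb_conf: Optional[str]) -> Set[str]:
    """Realms a server would treat as local to `cell` (simplified)."""
    if krb_conf is None:
        # Assumption: no krb.conf means the uppercased cell name is the local realm.
        return {cell.upper()}
    # Assumption: the file holds whitespace-separated realm names that
    # replace the default rather than add to it.
    return set(krb_conf.split())


def is_local_principal(principal: str, cell: str, krb_conf: Optional[str]) -> bool:
    _name, sep, realm = principal.partition("@")
    if not sep:
        raise ValueError(f"principal has no realm: {principal}")
    return realm in local_realms(cell, krb_conf)


cell = "afs-master.veeva.poc"
print(local_realms(cell, None))                                       # {'AFS-MASTER.VEEVA.POC'}
print(is_local_principal("afsadmin@VEEVA.POC", cell, None))           # False
print(is_local_principal("afsadmin@VEEVA.POC", cell, "VEEVA.POC\n"))  # True
```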
[Thu Feb 5 09:57:22 2015] User's (AFS ID 1) tokens for afs@afs-master.veeva.poc [Expires Feb 6 14:49]
[Thu Feb 5 09:57:28 2015] they are listed
[Thu Feb 5 09:57:53 2015] but that part was there already
[Thu Feb 5 09:58:13 2015] oh well anyway, pts mem syste:administrators still doesn't work, and i cannot set acls on root.cell :)
[Thu Feb 5 10:00:16 2015] so, your krb.conf has: VEEVA.POC in it?
[Thu Feb 5 10:00:37 2015] yup, with a trailing newline
[Thu Feb 5 10:01:01 2015] okay
[Thu Feb 5 10:01:07 2015] i will have a smoke and make fewer typos.
[Thu Feb 5 10:01:24 2015] ok...
[Thu Feb 5 10:01:58 2015] man
[Thu Feb 5 10:02:07 2015] setacl magically works now
[Thu Feb 5 10:02:15 2015] RedFyre, thank you very much for your patience
[Thu Feb 5 10:02:23 2015] sure thing
[Thu Feb 5 10:02:53 2015] for the past two weeks i've been working like 10-11 hours a day, usually including weekends, and i'm getting tired and making too many mistakes
[Thu Feb 5 10:03:51 2015] yeah... get some rest :)
[Thu Feb 5 10:08:51 2015] in 52 minutes i think i will do a world-record-breaking sprint to the pub
[Thu Feb 5 10:15:56 2015] another question: i have root.cell on both fileservers, but when i try to replicate, it says:
[Thu Feb 5 10:16:03 2015] # vos release root.cell -localauth
[Thu Feb 5 10:16:03 2015] Could not lock the VLDB entry for the volume 536870915.
[Thu Feb 5 10:16:03 2015] VLDB: vldb entry is already locked
[Thu Feb 5 10:16:15 2015] vos unlock ID
[Thu Feb 5 10:16:18 2015] and check again
[Thu Feb 5 10:16:19 2015] i issued this command once and accidentally interrupted it
[Thu Feb 5 10:16:21 2015] thanks
[Thu Feb 5 10:16:46 2015] successful \o/
[Thu Feb 5 11:41:41 2015] about the AFS perl API: in most of the synopsis sections (like AFS::PTS, AFS::FS, AFS::VOS), i don't see any authentication stuff. do i first have to kinit+aklog to use the perl API, or can that be done from within the API as well?
[Thu Feb 5 11:42:53 2015] the kinit part at least can be done with Authen::Krb5. you'd probably have to shell out to aklog for the token part.
[Thu Feb 5 11:43:41 2015] you... do not want to know about what AFS wants to use natively (and would therefore be provided by such an API); it's extremely obsolete and known insecure/broken
[Thu Feb 5 11:44:35 2015] relics of an ancient and wise civilisation whose knowledge barely has a footprint in our mundane world? :)
[Thu Feb 5 11:45:10 2015] https://lists.openafs.org/pipermail/openafs-devel/2000-December/005320.html
[Thu Feb 5 11:45:40 2015] a dump taken with vos dump can be restored into another cell, right?
[Thu Feb 5 11:48:00 2015] I would expect so
[Thu Feb 5 11:49:11 2015] secureendpoints, I think more to the point there is kas, not fcrypt
[Thu Feb 5 11:49:26 2015] not that fcrypt isn't also horribly obsolete and effectively plaintext these days...
[Thu Feb 5 11:49:55 2015] i see that the API requires me to specify the partition for restore. if i have clones of the volume on multiple servers/partitions, after the restore i have to run a release to replicate the freshly restored volume, right?
[Thu Feb 5 11:50:30 2015] um? restore will create a new volume
[Thu Feb 5 11:50:46 2015] what if a volume with the same name already exists at the destination?
[Thu Feb 5 11:52:26 2015] OVERWRITE must be specified when overwriting an existing volume.
[Thu Feb 5 11:52:28 2015] from the API
[Thu Feb 5 11:52:34 2015] so it can just overwrite it, i assume
[Thu Feb 5 11:53:21 2015] and in that case you'd specify the partition with the r/w, and yes, you would need to release afterward
[Thu Feb 5 11:53:48 2015] starts to make sense, thanks
[Thu Feb 5 11:54:08 2015] thanks for today's help guys, i'm off to take a break :)
[Fri Feb 6 09:14:15 2015] if i fill the CellServDB on my clients without the dbserver IPs, just with the cell names, then those cells should be dynamically visible at /afs/$cellname and the db servers will be looked up using SRV records?
[Fri Feb 6 09:18:02 2015] in theory: yes
[Fri Feb 6 09:18:18 2015] according to the manual, also yes
[Fri Feb 6 10:15:46 2015] methinks i found a bug in the rpm packaging
[Fri Feb 6 10:16:09 2015] when specifying -mountdir to afsd, the mount point will be something other than /afs, but on stop the init script will nevertheless unmount the hardcoded /afs
[Fri Feb 6 10:16:29 2015] there should be a mountdir option in the sysconfig/openafs file, and that should be used for this
[Fri Feb 6 11:25:01 2015] what stuff do i need on centos to get AFS::VOS and suchlike thingies? it seems they are not packaged, and the cpan build needs some "libafsrpc and/or libafsauthent" stuff, for which yum suggested openafs-authlibs-devel.x86_64. i installed that, but it's still not enough
[Fri Feb 6 11:34:12 2015] why would you want to mount somewhere other than /afs? :)
[Fri Feb 6 11:35:27 2015] basically the goal would be to have a cell for each of our sites under a static path, and another cell for holding the master copies of the stuff (sources, apps, etc.), then replicate the given volumes to the cells so applications can be run
[Fri Feb 6 11:35:56 2015] each site's cell has a different name, but the stuff has to be accessed via the same path at all locations
[Fri Feb 6 11:36:18 2015] The standard convention is /afs/cellname/.....
[Fri Feb 6 11:37:09 2015] secureendpoints, i know, however we do not want to participate in the global namespace, private data all over the place
[Fri Feb 6 11:37:25 2015] but this perl module, huh
[Fri Feb 6 11:37:34 2015] it doesn't even recognize openafs 1.6
[Fri Feb 6 11:37:41 2015] where you choose to mount your volumes has nothing to do with the global visibility of your data
[Fri Feb 6 11:38:00 2015] 1.5 and devel are the tops, nothing about 1.6 or 1.7 in the source
[Fri Feb 6 11:38:50 2015] there are two perl modules. one executes "afs commands" and parses the output. the other requires linking to the private internals of openafs, which are not stable across releases.
[Fri Feb 6 11:39:06 2015] this seems to be linking stuff
[Fri Feb 6 11:39:15 2015] then the interface is not stable
[Fri Feb 6 11:39:32 2015] <[gorgo]> yes, the AFS perl module is not compatible with 1.6. you should take a look at AFS::Command
[Fri Feb 6 11:40:16 2015] phx: just to be clear, you can only have one instance of the cache manager running at once on a given client host, and that cache manager will only mount AFS at one path.
[Fri Feb 6 11:40:49 2015] [gorgo], thanks, splendid
[Fri Feb 6 11:40:58 2015] kaduk_, yes, i'm aware of that
[Fri Feb 6 11:41:23 2015] it's just that the init script is buggy: it doesn't respect the actual mount point. umount /afs is hardcoded into the init script
[Fri Feb 6 11:41:24 2015] phx: okay, good; I just wanted to check
[Fri Feb 6 11:41:41 2015] It could probably umount -t AFS instead
[Fri Feb 6 11:42:21 2015] yup
[Fri Feb 6 11:42:26 2015] so let's see AFS::Command
[Fri Feb 6 11:52:53 2015] phx, this is one reason why the RPMs are not being updated: someone who actually knows what they are doing needs to take over their maintenance (as, for example, Debian has an openafs maintainer)
[Fri Feb 6 11:54:10 2015] "you make it sound like I know what I'm doing"
[Fri Feb 6 11:56:02 2015] they're at least more likely to know how to chamfer it into their distribution...
[Fri Feb 6 11:56:09 2015] and let me guess, there's no python module for the afs management utilities, right? :)
[Fri Feb 6 11:56:21 2015] (this does not of course guarantee they won't cut off something essential in the process :)
[Fri Feb 6 11:56:54 2015] there is no python module, correct
[Fri Feb 6 11:57:09 2015] "there is, however, subprocess"
[Fri Feb 6 11:57:18 2015] and IPC
[Fri Feb 6 11:59:10 2015] There's a python module that does acl parsing at least (probably more, but I was only looking at it for an acl crawler.) Not sure where it is from off the top of my head; I can look later today
[Fri Feb 6 11:59:55 2015] don't stress yourself, that alone won't help me. i'm probably better off with AFS::Command
[Fri Feb 6 12:16:03 2015] what would be the command to verify whether a volume exists in a cell? i'm reading the vos commands, but from what i find, i have to specify the vldb server directly, and i don't really care which one is queried
[Fri Feb 6 12:16:39 2015] There are separate listvldb and listvol commands that query (mostly) only the vldb and the volserver, respectively; vos examine combines the two
[Fri Feb 6 12:17:16 2015] excellent, thanks
[Fri Feb 6 12:17:47 2015] listvldb is particularly useful because it does not hang during a fileserver outage the way that examine does :)
[Fri Feb 6 12:26:12 2015] luckily i haven't been in that situation
[Fri Feb 6 12:26:25 2015] when creating a fresh volume, to mkmount it, do i have to precreate the directory for it?
[Fri Feb 6 12:26:39 2015] no
[Fri Feb 6 12:27:00 2015] thanks
[Fri Feb 6 12:27:22 2015] the mountpoint is actually a sort of magic symlink to the root directory of the volume, which is created when you create the volume
[Fri Feb 6 12:27:53 2015] mhm
[Fri Feb 6 12:27:57 2015] i kinda lost you there
[Fri Feb 6 12:28:14 2015] in vos_create(1) i do not see any options for the path
[Fri Feb 6 12:28:22 2015] when you "fs mkm", it creates a reference to the volume's root directory
[Fri Feb 6 12:29:21 2015] i got that, you just said it gets created when i create the volume itself, so i assumed that it happens at vos create time
[Fri Feb 6 12:29:51 2015] ...
[Fri Feb 6 12:30:10 2015] let's say you have a USB drive. you understand that that drive contains a root directory of its own?
[Fri Feb 6 12:30:22 2015] it's not created out of thin air when you plug it into your computer?
[Fri Feb 6 12:30:42 2015] similarly every AFS volume has its own root directory; paths *within that volume* are all relative to it
[Fri Feb 6 12:31:08 2015] but to be *visible* you have to create a mountpoint, which points to that volume root directory
[Fri Feb 6 12:32:07 2015] yes, actually that's what i was expecting
[Fri Feb 6 12:32:43 2015] geekosaur, I think phx is just pointing out that you had a typo in your earlier message
[Fri Feb 6 12:32:47 2015] on unix, you must mount onto an existing directory. on AFS (and ZFS, for that matter) mountpoints are internal links, and do not need directories
[Fri Feb 6 12:32:54 2015] to mount on
[Fri Feb 6 12:33:41 2015] sorry, i'm still fairly unfamiliar with things, and i just try to pay attention to details
[Fri Feb 6 12:34:07 2015] (and in fact a mountpoint is just a symlink with some special properties. one version of afs for solaris had a bug that made this visible, insofar as you could see it with "ls -ld" *and* create one by hand with "ln -s")
[Fri Feb 6 12:34:32 2015] but, but... salaris is perfect!
[Fri Feb 6 12:34:39 2015] solaris, whatever :)
[Fri Feb 6 12:34:47 2015] solaris may be; the openafs kernel module for it, not necessarily :p
[Fri Feb 6 12:34:59 2015] this was leaking internal afs details to solaris's vfs layer
[Fri Feb 6 12:35:17 2015] it could have been worse, i'm sure
[Fri Feb 6 12:37:34 2015] phx - trying to move to someplace other than /afs may cause you pain later
[Fri Feb 6 12:37:45 2015] and that has no bearing, anyway, on "participating" in the global namespace
[Fri Feb 6 12:38:09 2015] you can always symlink to /afs/cellname from wherever in the file system to make the cells appear there
[Fri Feb 6 12:38:51 2015] what kind of trouble?
[Fri Feb 6 12:38:52 2015] if you don't want other cells to be easily accessible, you can skip -dynroot and instead use a static root.afs volume where you manually mount the root.cell volumes, and you also manually put the cells in the client CellServDB
[Fri Feb 6 12:39:27 2015] from what i have read, without dynroot the clients might hang if something is wrong with the connectivity of the afs cells
[Fri Feb 6 12:39:34 2015] yes
[Fri Feb 6 12:39:38 2015] (frankly, if that happens, it's already bad enough)
[Fri Feb 6 12:39:40 2015] even with dynroot, they'll hang
[Fri Feb 6 12:40:19 2015] if the afs servers are not accessible when the afs kernel module loads and dynroot is disabled, the system will panic
[Fri Feb 6 12:40:25 2015] with or without dynroot, you'll see the two 50-second timeouts per server
[Fri Feb 6 12:41:00 2015] hunh... never experienced a panic with that scenario in the old days
[Fri Feb 6 12:41:07 2015] someone musta "fixed" something
[Fri Feb 6 12:41:19 2015] but, since you are using unix clients, how often do you actually reboot a system?
[Fri Feb 6 12:41:41 2015] and if you have a cell completely down, then you have bigger problems
[Fri Feb 6 12:43:15 2015] um, mobile client in a hotel room
[Fri Feb 6 12:43:41 2015] with what they're doing, I'd be a bit surprised if that's a use case
[Fri Feb 6 12:43:46 2015] anyway, just don't start afs at boot
[Fri Feb 6 12:43:49 2015] * remembers hacking the afs startup script in cmu ece to deal with laptops
[Fri Feb 6 12:43:52 2015] or on those mobile clients, use dynroot
[Fri Feb 6 12:44:26 2015] the primary reason dynroot was created was to prevent the panics, because in order to mount, the contents of the directory must be available.
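Since a mount point is just a symlink with special properties, the link text of a mount point created with `fs mkm`, for example in a static root.afs volume, can be decoded as sketched below. This assumes the usual encoding: `#` for a regular mount point or `%` for a read/write one, an optional `cell:` prefix, the volume name, and a trailing dot. The parser and the `MountPoint` type are illustrative only, not part of any AFS library.

```python
from typing import NamedTuple, Optional


class MountPoint(NamedTuple):
    kind: str            # "regular" for '#', "rw" for '%'
    cell: Optional[str]  # None: same cell as the directory holding the link
    volume: str


def parse_mount_point(link: str) -> MountPoint:
    """Decode mount point link text such as '#root.cell.' or '%cell:vol.'."""
    if len(link) < 3 or link[0] not in "#%" or not link.endswith("."):
        raise ValueError(f"not a mount point string: {link!r}")
    kind = "regular" if link[0] == "#" else "rw"
    body = link[1:-1]  # drop the type character and the trailing dot
    cell, sep, volume = body.partition(":")
    if not sep:
        return MountPoint(kind, None, body)
    return MountPoint(kind, cell, volume)


print(parse_mount_point("#root.cell."))
print(parse_mount_point("%afs-master.veeva.poc:root.cell."))
```

Because the target is resolved by volume name rather than by directory, no directory has to be created in advance, which matches the answer to the mkmount question above.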
[Fri Feb 6 12:44:36 2015] with dynroot, you can't stop people trying to access other cells, anyway
[Fri Feb 6 12:44:59 2015] as far as people getting to your cells goes, if you don't publish to the global cellservdb and don't have stuff in dns, then people will find it very hard to get to your cell
[Fri Feb 6 12:45:10 2015] and via ACLs you can prohibit "outside" people from getting to your cells
[Fri Feb 6 12:45:46 2015] mounting afs locally someplace other than /afs changes absolutely none of that
[Fri Feb 6 12:47:25 2015] well, actually I take that back, since the man page has this cryptic statement: If a value other
[Fri Feb 6 12:47:25 2015] than the /afs directory is used, the machine cannot access the
[Fri Feb 6 12:47:25 2015] filespace of cells that do use that value.
[Fri Feb 6 12:47:34 2015] which makes no sense, as that isn't a server-side config
[Fri Feb 6 12:48:10 2015] I suppose you could maybe look at the -rootvol option
[Fri Feb 6 12:48:35 2015] RedFyre, I think that just means that if someone hands you a path /afs/ourcell/... you have to modify it to use your local afsroot?
[Fri Feb 6 12:48:58 2015] dunno
[Fri Feb 6 12:49:25 2015] which is unfortunate if it's a symlink or you're e.g. browsing html files in their afs space and they didn't use relative links
[Fri Feb 6 12:49:36 2015] What the man page intends to say is that if you mount at /myorg/.... then file paths /afs/... will not resolve. It isn't saying anything about accessing cells or volumes
[Fri Feb 6 12:49:59 2015] yes, that
[Fri Feb 6 12:50:23 2015] which makes sense
[Fri Feb 6 13:53:58 2015] if i'm deleting a volume which has 2 RO copies, then vos remove only removes the RW one and leaves the 2 ROs. how can I remove those as well?
[Fri Feb 6 13:54:07 2015] specifying the id results in VLDB: Volume '536870935' matches more than one RO
[Fri Feb 6 13:55:46 2015] after remsite it's LOCKED for delete/misc operation. how does this stuff work? :)
[Fri Feb 6 13:59:34 2015] remsite probably isn't the operation you wanted
[Fri Feb 6 14:00:45 2015] http://docs.openafs.org/Reference/1/vos_remove.html
[Fri Feb 6 14:01:03 2015] "This command is the most appropriate one for removing volumes in almost all cases."
[Fri Feb 6 14:01:17 2015] "this command can remove a volume that does not have a VLDB entry, as long as the volume is online, -server and -partition arguments are provided, and the -id argument specifies the volume's ID number."
[Fri Feb 6 14:01:55 2015] so try also providing server and partition arguments to vos remove
[Fri Feb 6 14:48:38 2015] FTR https://github.com/ebroder/pyafs is the package I mentioned earlier
[Fri Feb 6 14:49:27 2015] Seems to also have something about fs and pts in it, though I suppose it could be stubs.
[Fri Feb 6 15:04:54 2015] yeah, looks like I'm using afs.fs.lsmount and afs.acl.showRights mostly
[Fri Feb 6 16:02:25 2015] cclausen, thanks
[Fri Feb 6 16:02:54 2015] cclausen, i started with vos remove, but that only removed the rw; will check again
[Fri Feb 6 16:03:56 2015] eichin, last commit in 2011 :)
[Fri Feb 6 16:04:17 2015] Yeah, Evan graduated and his job doesn't use AFS
[Fri Feb 6 16:05:18 2015] there's a jdreed fork, too; I just happen to be using a locally-built copy which is probably from 2011 :-)
[Fri Feb 6 16:05:43 2015] The jdreed fork is probably more current/likely to see updates.
[Fri Feb 6 16:05:47 2015] btw, is there a way to automatically acquire afs tokens upon login for multiple cells?
[Fri Feb 6 16:06:02 2015] (and none of the bits *I* care about have changed since... long before 2011)
[Fri Feb 6 16:06:10 2015] aklog respects ~/.xlog, which can list multiple cells
[Fri Feb 6 16:06:46 2015] I don't remember what pam_afs_session does with whatever option makes it use an external aklog, though; it may run aklog by path to the user's home directory.
[Fri Feb 6 16:07:22 2015] my problem with xlog is that, depending on where they log in, they might need tokens for different cells
[Fri Feb 6 16:13:25 2015] cclausen, thanks. i had to do vos unlockvldb, and then vos remove it again
[Fri Feb 6 16:14:10 2015] I don't think there's a canned solution for your problem. You might be able to get away with 'aklog cell1 cell2' in a system dotfile.
[Fri Feb 6 16:14:31 2015] probably
[Fri Feb 6 16:14:58 2015] uhm
[Fri Feb 6 16:15:01 2015] afs_cells=cell[,cell...]
[Fri Feb 6 16:15:04 2015] pam_afs_session
[Fri Feb 6 16:18:32 2015] Huh, I don't remember that being there last time I looked.
[Fri Feb 6 16:19:12 2015] must have been a long time ago, this is a centos6, resurrected from the dinosaur park :)
[Fri Feb 6 16:19:33 2015] A long time ago, or my memory is growing holes ;)
[Fri Feb 6 16:20:51 2015] As of Heimdal 0.7, the default behavior is to contact the krb524 service to translate Kerberos v5 tickets into Kerberos v4 tickets to use as tokens.
[Fri Feb 6 16:20:55 2015] that seems like a mess
[Fri Feb 6 16:21:10 2015] Heimdal 0.7 is, uh, kind of old.
[Fri Feb 6 16:21:37 2015] i've never touched heimdal in my life, so i have no idea
[Fri Feb 6 16:21:55 2015] Never ran kinit from the base system on a freebsd machine?
[Fri Feb 6 16:22:10 2015] nope
[Fri Feb 6 16:22:13 2015] that's heimdal?
[Fri Feb 6 16:22:18 2015] yup
[Fri Feb 6 16:36:54 2015] when removing a volume, do i have to remove all clones (or whatever the proper term is) manually, or is there some command that purges it completely?
[Fri Feb 6 16:38:55 2015] phx: what you are asking for is a command to remove a volume group. that doesn't exist
[Fri Feb 6 16:42:08 2015] i see, thanks
[Fri Feb 6 16:45:37 2015] a volume group consists of a RW volume, zero or more RO volumes, zero or one BK volume, and zero or more temporary clone volumes.
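Because there is no single command that removes a whole volume group, removal goes site by site. The sketch below only builds `vos remove` argument lists: one per RO site, each with `-server` and `-partition` so the ID no longer matches more than one RO, followed by the RW site. The server and partition names are hypothetical and would in practice come from `vos listvldb`; backup and temporary clones are not handled, and auth flags such as `-localauth` would still need to be appended.

```python
from typing import List, NamedTuple


class Site(NamedTuple):
    server: str
    partition: str
    kind: str  # "RW" or "RO"


def removal_commands(volume: str, sites: List[Site]) -> List[List[str]]:
    """Build one `vos remove` argv per site of a volume group, RO sites first."""
    commands = []
    # False sorts before True, so RO sites come first; the sort is stable.
    for site in sorted(sites, key=lambda s: s.kind != "RO"):
        vol_id = volume + ".readonly" if site.kind == "RO" else volume
        commands.append([
            "vos", "remove",
            "-server", site.server,
            "-partition", site.partition,
            "-id", vol_id,
        ])
    return commands


sites = [
    Site("fs1.veeva.poc", "/vicepa", "RW"),
    Site("fs1.veeva.poc", "/vicepa", "RO"),
    Site("fs2.veeva.poc", "/vicepa", "RO"),
]
for argv in removal_commands("test.vol", sites):
    print(" ".join(argv))
```

Each list could then be handed to Python's `subprocess.run(argv, check=True)`, in line with the "there is, however, subprocess" suggestion above.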
[Fri Feb 6 16:46:14 2015] it is a very common use case for a RW to be removed after the final contents are published as a RO
[Fri Feb 6 17:34:40 2015] and is it possible to restore a dumped volume without having a RW?
[Fri Feb 6 17:35:04 2015] I think I've seen talk of doing so go by
[Fri Feb 6 17:35:32 2015] -readonly seems to do it
[Fri Feb 6 17:36:00 2015] mhm
[Fri Feb 6 17:36:54 2015] will play with that tomorrow, it's nearly midnight here. i also have to make a script for adding users from ldap to pts to keep the UIDs in sync
[Mon Feb 9 07:49:25 2015] how surprising: if you give a cell a wrong name AND make a typo in its service principal name, then it will not work as expected
[Mon Feb 9 07:53:20 2015] lol
[Mon Feb 9 07:53:26 2015] 100% correct
[Mon Feb 9 08:22:50 2015] can there be a group and a user with the same name? (prefixless group)
[Mon Feb 9 08:28:41 2015] yes
[Mon Feb 9 08:29:07 2015] and how do i query groups then, by name?
[Mon Feb 9 08:29:44 2015] pts listentries
[Mon Feb 9 08:30:45 2015] thanks :)
[Mon Feb 9 08:53:17 2015] AFAIK people tend to sync unix user IDs with AFS user IDs, but what's the case with groups?
[Mon Feb 9 08:53:24 2015] afs groups have negative user IDs
[Mon Feb 9 08:53:32 2015] err
[Mon Feb 9 08:53:35 2015] negative group IDs
[Mon Feb 9 08:54:22 2015] afs groups don't really correspond to anything else, so there's not a whole lot of point in syncing them with anything
[Mon Feb 9 11:01:29 2015] can i somehow list all volumes in a cell?
[Mon Feb 9 11:01:38 2015] vos listvldb
[Mon Feb 9 11:03:17 2015] thx
[Mon Feb 9 13:11:28 2015] red orange yellow green blue indigo violet
[Mon Feb 9 13:12:04 2015] hello B)
[Mon Feb 9 13:14:51 2015] there must be a pot of gold around here somewheres...
[Mon Feb 9 13:59:01 2015] billings: Question about openafs on my fedora 21.
[Mon Feb 9 13:59:38 2015] All has been working great, but now, just today, openafs-client.service doesn't start.
[Mon Feb 9 14:00:12 2015] What does systemctl status say?
[Mon Feb 9 14:02:58 2015] kaduk_: http://dpaste.com/0J815AR
[Mon Feb 9 14:03:20 2015] modprobe: FATAL: Module openafs not found
[Mon Feb 9 14:03:25 2015] Just like it says on the tin...
[Mon Feb 9 14:05:23 2015] kaduk_: I did see that, but why wasn't it loaded?
[Mon Feb 9 14:05:48 2015] The journal and/or syslog may have more to say about that; I'm not sure.
[Mon Feb 9 14:06:05 2015] I guess systemctl also has a verbose mode, but I'm not very familiar with the tool yet.
[Mon Feb 9 14:06:06 2015] kaduk_: Nope, I checked and it doesn't say much more.
[Mon Feb 9 14:06:48 2015] Is there a module in /lib/..../dkms/ for your running kernel?
[Mon Feb 9 14:06:57 2015] lemme try a reboot on the vm and make sure nothing else is screwy.
[Mon Feb 9 14:07:03 2015] kaduk_: I'll check.
[Mon Feb 9 14:07:19 2015] You could also try the modprobe command by hand to see if that gives an error message
[Mon Feb 9 14:09:21 2015] what's the kernel version, now, on your f21?
[Mon Feb 9 14:12:20 2015] kaduk_: I rebooted to the previous kernel and all is as it should be with openafs.
[Mon Feb 9 14:12:39 2015] Sounds like the new kernel is too new for the openafs you have, then.
[Mon Feb 9 14:12:58 2015] "You could test the tree which will become 1.6.11pre2 but hasn't been released yet"
[Mon Feb 9 14:13:04 2015] kaduk_: I was just gonna say the same.
[Mon Feb 9 16:09:46 2015] * suspects that's the case
[Mon Feb 9 16:09:53 2015] are you using the kmod-openafs or the dkms-openafs package?
[Mon Feb 9 16:10:07 2015] perhaps the new kernel has broken openafs again
[Mon Feb 9 16:20:33 2015] * wonders if openafs-stable-1_6_11pre2 will be tagged
[Mon Feb 9 16:29:14 2015] * kicks off a new COPR run with the latest patches in openafs-stable-1_6_x
[Mon Feb 9 16:36:42 2015] I'm trying to remember what we were waiting for before cutting a pre2.
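`modprobe: FATAL: Module openafs not found` means no openafs module is installed for the running kernel, which is what the question about the dkms directory is getting at. A quick check can be sketched as a search under `/lib/modules/<release>`. It assumes the module file is named `openafs.ko` (possibly compressed, e.g. `openafs.ko.xz`); since the exact subdirectory differs between kmod and dkms packaging, it walks the whole tree for that release instead of guessing a path. `find_openafs_modules` is a hypothetical helper.

```python
import os
from typing import List, Optional


def find_openafs_modules(release: Optional[str] = None,
                         root: str = "/lib/modules") -> List[str]:
    """Return paths of openafs kernel modules built for `release`."""
    release = release or os.uname().release  # default: the running kernel
    hits = []
    # os.walk simply yields nothing if the directory does not exist.
    for dirpath, _dirnames, filenames in os.walk(os.path.join(root, release)):
        for name in filenames:
            # openafs.ko, or a compressed variant such as openafs.ko.xz
            if name == "openafs.ko" or name.startswith("openafs.ko."):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

On the affected Fedora 21 machine, an empty result from `find_openafs_modules()` under the new kernel and a hit under the previous one would match what the log reports.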
[Mon Feb 9 16:40:44 2015] one of these days I'll really figure out tags versus branches
[Mon Feb 9 16:42:00 2015] a tag is just a label for a sha1
[Mon Feb 9 16:53:56 2015] for a specific commit, then?
[Mon Feb 9 16:54:32 2015] yes
[Mon Feb 9 16:55:07 2015] 1 to 1 mapping? Or can it be a many (commit) to 1 (tag) mapping?
[Mon Feb 9 16:55:43 2015] think of a branch as a label that is moved with each commit to the latest sha1
[Mon Feb 9 16:55:59 2015] tags are mapped to commits, not the other way around
[Mon Feb 9 16:56:16 2015] ok
[Mon Feb 9 17:01:19 2015] RedFyre, a branch is a series of commits, while a tag explicitly marks a single commit
[Mon Feb 9 17:01:51 2015] RedFyre, imagine a tree: it has lots of branches, just like a repo. toss your knife into the tree. you've marked a point on a branch; that's a tag :)
[Mon Feb 9 17:45:27 2015] * wonders if this kind of confusion comes from having worked with svn before
[Mon Feb 9 17:49:10 2015] <[gorgo]> geekosaur: you mean this? http://devopsreactions.tumblr.com/post/50800607139/how-an-svn-user-sees-git-workflow
[Mon Feb 9 17:50:04 2015] more thinking of the fact that what svn calls tags are rather wtf
[Mon Feb 9 17:50:11 2015] (it's essentially a repocopy)
[Mon Feb 9 17:51:26 2015] whereas git tags resemble what pretty much every other open-source source control system calls tags
[Mon Feb 9 17:51:52 2015] (I'm mostly qualifying that to exclude things like clearcase...)
[Mon Feb 9 18:01:11 2015] we've actually pondered using pegged externals as tags, instead of copies
[Mon Feb 9 18:04:01 2015] (since it's not like svn actually *helps* with the convention, and it fixes a couple of other workflow problems)
[Tue Feb 10 03:59:37 2015] a bit unrelated, but you guys might be able to give me a hint. with an MIT kerberos realm, the "renew until" timestamp on my tickets always ends up being the exact time the ticket was acquired. basically the renewable lifetime is 0, and i have no idea why i'm ending up like this
[Tue Feb 10 04:00:09 2015] in kdc.conf the realm is set to max_renewable_life = 7d, in krb5.conf renew_lifetime = 7d
[Tue Feb 10 04:00:30 2015] and even if i init with kinit -r 4d $princ or something, i still get it wrong
[Tue Feb 10 04:00:44 2015] Valid starting Expires Service principal
[Tue Feb 10 04:00:44 2015] 02/10/15 08:58:29 02/11/15 08:58:29 krbtgt/VEEVA.POC@VEEVA.POC renew until 02/10/15 08:58:29
[Tue Feb 10 04:01:00 2015] something like this. so if anyone has any idea why this happens, i'm eager to hear it :)
[Tue Feb 10 04:18:33 2015] phx: that's 24 hours
[Tue Feb 10 04:18:52 2015] phx: note that the renew time is the shorter of the one you ask for and the one configured for the KDC.
[Tue Feb 10 04:18:53 2015] actually no
[Tue Feb 10 04:19:03 2015] 02/10/15 08:58:29 start
[Tue Feb 10 04:19:09 2015] 02/10/15 08:58:29 renew
[Tue Feb 10 04:19:13 2015] 02/11/15 08:58:29 expires
[Tue Feb 10 04:19:42 2015] validity is 1d, renew time is 0
[Tue Feb 10 04:19:43 2015] ahhh, you were asking about the renewal time, not the can-renew time
[Tue Feb 10 04:20:00 2015] uhm, renew-until that is
[Tue Feb 10 04:20:12 2015] that's indeed odd
[Tue Feb 10 04:21:23 2015] phx: but note that '-r' and '-l' are different
[Tue Feb 10 04:21:29 2015] phx: also check the KDC config
[Tue Feb 10 04:22:15 2015] -r is the renewal period, -l is the ticket lifetime
[Tue Feb 10 04:22:38 2015] phx: it may be that the "renew until" value means you cannot renew it because its lifetime is the same as the renewal time.
[Tue Feb 10 04:23:12 2015] i don't think so. everywhere else it has a proper timestamp in the future, also in the examples i could find on google
[Tue Feb 10 04:23:43 2015] i guess if you just issued a klist somewhere at your place, you'd have it properly
[Tue Feb 10 04:24:21 2015] phx: try this: 'kinit -r 6h -l 4h' and vice versa
[Tue Feb 10 04:26:05 2015] phx: BTW this is more a question for #Kerberos
[Tue Feb 10 04:27:13 2015] i've also asked it there
[Tue Feb 10 04:27:55 2015] but the population there seems to have either vanished or be otherwise unavailable, and since i suspect many afs admins also maintain a kdc, i thought i would ask here as well :)
[Tue Feb 10 04:28:32 2015] -r 6h -l 4h set the validity to 6h, but had no impact on the 0h renew lifetime
[Tue Feb 10 04:48:55 2015] i even set this with kdb5_ldap_util for the whole realm, still nada
[Tue Feb 10 05:03:34 2015] phx: have you tried '-r 4h -l 6h'
[Tue Feb 10 05:16:25 2015] renew until hasn't changed
[Tue Feb 10 05:16:48 2015] so far i've added a ticket policy with kdb5_ldap_util and specified that policy for this principal with modpol -x, but no effect either
[Tue Feb 10 05:32:23 2015] well, i dropped a mail to the kerberos@mit mailing list, now back to afs
[Tue Feb 10 05:52:14 2015] when afs is used for rw stuff, like homes or such, how is replication taken care of? as far as i know a vos release is required for a volume to replicate data to its RO clones
[Tue Feb 10 06:42:17 2015] phx: when you replicate an RO rom a RW volume then access will be to the RO copy. You don't replicate volumes that you need read/write access to on a regular basis (e.g. a $HOME). A typical use for RO replication is for resources that are needed for mostly RO access (e.g. local commands for a site). The process of updating the RW and syncing the RO copies is more suited to "reasonably static" resources than to more volatile ones (e.g. a $HOME).
[Tue Feb 10 06:43:07 2015] oopsie: s/rom a RW/from a RW/ [Tue Feb 10 07:01:06 2015] i see, thanks [Tue Feb 10 07:18:03 2015] phx: more generally AFS was initially designed for "dataless" workstations, with RW volumes holding user home directories and RO volumes holding sw package collections, e.g. '/usr' or '/usr/x11' [Tue Feb 10 07:18:13 2015] i expected some replication to RO clones, because i expected people usually have ROs for such things as well, just in case. like if the fileserver hosting the RW goes down for any time/reason, they can just promote an RO to RW [Tue Feb 10 07:18:32 2015] phx: and 'vos release' meant literally that: release a new version of a software collection to all mirror servers. [Tue Feb 10 07:18:51 2015] i see [Tue Feb 10 07:18:54 2015] phx: "promote an RO to RW" is a somewhat messy/risky operation [Tue Feb 10 07:19:06 2015] phx: the typical AFS deployment scenario was... [Tue Feb 10 07:19:09 2015] especially when the down RW comes back up? [Tue Feb 10 07:19:30 2015] phx: if you promote RO to RW the previous RW must disappear... [Tue Feb 10 07:19:50 2015] i see [Tue Feb 10 07:19:55 2015] phx: or else you can use Auristor (commercial version) that has multiple RW. [Tue Feb 10 07:20:13 2015] so in those cases, it's like moving the RW to another server, then bringing the one used to hold it down for maint/whatever? [Tue Feb 10 07:20:53 2015] phx: the typical setup was that there would be a server with say 10-20 RW user volumes, and 2-3 RO sw distribution volumes, serving around 10-20 user workstations. [Tue Feb 10 07:21:28 2015] phx: where each of the 10-20 user workstations "mounted" the RO volumes as '/usr', '/usr/local' etc. [Tue Feb 10 07:21:43 2015] phx: and each RW volume would be used only on a single workstation.
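The replication model described above works as follows. RO sites are snapshots that `vos release` refreshes, and always from the RW, never from one RO to another. Until the next release, readers of the RO copies keep seeing the old contents. That is why volatile volumes such as `$HOME` are not replicated. The toy in-memory model below illustrates only those semantics. `ToyVolume` and its methods are invented for illustration and do not talk to a real VL server or fileserver.

```python
class ToyVolume:
    """Toy model of an AFS volume: one writable RW plus RO sites (sketch only)."""

    def __init__(self, name):
        self.name = name
        self.rw = {}          # path -> contents of the RW volume
        self.ro_sites = {}    # server -> snapshot dict, or None until released

    def write(self, path, data):
        # All writes go to the RW volume; RO sites are never written directly.
        self.rw[path] = data

    def addsite(self, server):
        # Like 'vos addsite': registers an RO site but copies nothing yet.
        self.ro_sites.setdefault(server, None)

    def release(self):
        # Like 'vos release': every RO site is refreshed from the RW,
        # never from another RO site.
        for server in self.ro_sites:
            self.ro_sites[server] = dict(self.rw)

    def read_ro(self, server, path):
        snapshot = self.ro_sites[server]
        if snapshot is None:
            raise LookupError(f"{self.name}.readonly on {server}: not released")
        return snapshot.get(path)


vol = ToyVolume("sw.local")
vol.addsite("fs1")
vol.addsite("fs2")
vol.write("bin/tool", "v1")
vol.release()
vol.write("bin/tool", "v2")             # RW changed, RO sites still serve v1
print(vol.read_ro("fs2", "bin/tool"))   # v1
vol.release()
print(vol.read_ro("fs2", "bin/tool"))   # v2
```

The model has no notion of "promoting" an RO site. That matches the point above: turning an RO into the RW is an out-of-band recovery step, not part of normal release flow.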
[Tue Feb 10 07:21:54 2015] well, with packaged distributions and config management that setup is pretty much obsolete now [Tue Feb 10 07:22:17 2015] phx: so for example caching and callbacks make using a RW volume from a single machine very efficient. [Tue Feb 10 07:22:38 2015] phx: yes, and no: AFS effectively implemented "Docker" style packaging. [Tue Feb 10 07:23:05 2015] phx: each RO volume is effectively a "depot" style repository, or a kind of "Docker" overlay image. [Tue Feb 10 07:23:09 2015] phx: the really neat thing about AFS is the ability to move a RW volume from one fileserver to another and the user does not notice. So you could have fileservers geographically spread out (eg local offices) and move someone's $HOME volume to the fileserver close to their location to improve response and reduce WAN traffic. [Tue Feb 10 07:23:50 2015] treegazer: same for RO replicas of the software collections. [Tue Feb 10 07:23:55 2015] yes [Tue Feb 10 07:24:03 2015] RO copies at each local office [Tue Feb 10 07:24:36 2015] the client picks the "closest" accessible fileserver automatically by some metrics? [Tue Feb 10 07:24:55 2015] phx: no. It is up to you to arrange things suitably. [Tue Feb 10 07:26:01 2015] phx: but the idea is that all RO volumes are replicated on all fileservers so they are "near" to all workstations, and you choose where to put each user's RW volume so it is on the server nearest to their workstation. [Tue Feb 10 07:26:29 2015] phx: actually in part yes for RO replicas [Tue Feb 10 07:27:09 2015] phx: and as pointed out if the user moved from one location to another it was easy to move their home dir to a new server. [Tue Feb 10 07:27:46 2015] phx: and for temporary moves since AFS is a global filesystem the RW volume is accessible from anywhere anyhow. [Tue Feb 10 07:28:57 2015] phx: another way you can look at it is that the AFS client is really an automounter daemon, and the DB servers are the mount map servers.
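The "automounter plus mount-map server" analogy above also explains why `vos move` goes unnoticed. A mount point names a volume, not a server, and the client asks the location database where that volume lives each time it needs to find it. For RO volumes, the "in part yes" answer refers to the client choosing among the listed sites by rank. OpenAFS clients keep such ranks, and `fs setserverprefs` adjusts them, with lower ranks preferred. The toy location database below sketches that idea. `ToyVLDB`, its methods, the server names and the rank values are all hypothetical.

```python
class ToyVLDB:
    """Toy volume location database: volume name -> server sites (sketch only)."""

    def __init__(self):
        self.entries = {}   # name -> {"rw": server, "ro": [servers]}

    def create(self, name, rw_server):
        self.entries[name] = {"rw": rw_server, "ro": []}

    def addsite(self, name, server):
        self.entries[name]["ro"].append(server)

    def move(self, name, new_server):
        # 'vos move' relocates the data and then updates this one record;
        # mount points name the volume, not the server, so paths are unchanged.
        self.entries[name]["rw"] = new_server

    def locate(self, name, readonly=False, prefs=None):
        entry = self.entries[name]
        if not readonly or not entry["ro"]:
            return entry["rw"]
        ranks = prefs or {}
        # Lower rank = more preferred, as with fs setserverprefs; unranked last.
        return min(entry["ro"], key=lambda server: ranks.get(server, float("inf")))


vldb = ToyVLDB()
vldb.create("user.alice", "fs-boston")            # hypothetical names
print(vldb.locate("user.alice"))                  # fs-boston
vldb.move("user.alice", "fs-london")              # user moved offices
print(vldb.locate("user.alice"))                  # fs-london

vldb.create("sw.local", "fs-boston")
vldb.addsite("sw.local", "fs-boston")
vldb.addsite("sw.local", "fs-london")
print(vldb.locate("sw.local", readonly=True,
                  prefs={"fs-london": 5000, "fs-boston": 20000}))
# fs-london
```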
[Tue Feb 10 07:29:40 2015] phx: it is possible but somewhat laborious to replicate most of AFS with NFS+'amd'+LDAP mount map server. [Tue Feb 10 07:30:13 2015] this is also great for systems management of fileservers because it gives you the capability of building a brand new fileserver with everything up to date/new hardware and then simply "vos move-ing" the volumes from an old/obsolete fileserver that you need to decommission. [Tue Feb 10 07:30:16 2015] replicate in the sense of providing equivalent functionality. [Tue Feb 10 07:31:15 2015] BTW for the "depot" style of package management one classic implementation that can well be used with AFS RO volumes is GNU 'stow' [Tue Feb 10 07:33:09 2015] and something recent that uses a similar logic is CernVM-FS [Tue Feb 10 07:56:43 2015] Hi all, i want to install openafs for my network and first, i'm building the authentication services. I thought of Kerberos/gssapi with openldap (which work well with openafs), and a friend wants me to install OAuth. What do you think of it ? Is it compatible with AFS ? [Tue Feb 10 08:03:57 2015] https://tools.ietf.org/html/draft-hardjono-oauth-kerberos-01 [Tue Feb 10 08:04:00 2015] with this, probably [Tue Feb 10 08:04:22 2015] AFS needs kerberos, not oauth. as long as you can interface oauth with kerberos, you can have both [Tue Feb 10 08:08:06 2015] that's what i thought [Tue Feb 10 08:15:14 2015] thanks guys :) [Tue Feb 10 11:11:18 2015] uhm, i've removed a mountpoint, cell/foo/bar/baz, ls -lh cell/foo/bar/ says dir is empty, however ``test -d cell/foo/bar/baz && echo dir'' says "dir" [Tue Feb 10 11:11:50 2015] how can a directory not show a mount point, but testing whether it is a directory still reports it?
[Tue Feb 10 11:18:59 2015] fs flush [Tue Feb 10 11:25:13 2015] if you did the mount point wrong or if something is cached wrongly [Tue Feb 10 11:25:51 2015] fs lsm /full/path/to/mount_point gives info on the mount point [Tue Feb 10 11:27:08 2015] also make sure you're not doing something like removed on r/w path, didn't release volume, testing on r/o path [Tue Feb 10 11:27:45 2015] or even two different clients, where one might be seeing the change and one not [Tue Feb 10 11:35:03 2015] i'm trying to be aware of these, right [Tue Feb 10 11:35:33 2015] thing was i did forget to release the volume having the mount point. but i mounted it using the rw path, removed it using the rw path, and kept on checking using that path [Tue Feb 10 11:35:38 2015] but a release + flush solved it [Tue Feb 10 11:38:53 2015] :) [Tue Feb 10 11:46:56 2015] slowly i'm getting used to the stuff and finding my way around it [Tue Feb 10 12:56:16 2015] billings: I just installed an update of yours and got this: http://dpaste.com/1QE7Y5K [Tue Feb 10 12:59:09 2015] I'm gonna do a reboot and see if it maybe comes up automagically. [Tue Feb 10 13:04:54 2015] Yup, it did. [Tue Feb 10 13:18:05 2015] if i restored a volume to a cell from another with readonly, how do i add a site for it and make it sync? examine says "Volume rel.themis.test2.0_1 does not exist in VLDB" but prints the vol information, for the second server it says "not released", when i do vos release on it, it says "not a RW volume" [Tue Feb 10 13:18:27 2015] how do i go around with dumps restore only as RO, to be on multiple fileservers in a cell? 
[Tue Feb 10 13:19:51 2015] you don't normally do that, you restore an r/w and addsite/release [Tue Feb 10 13:20:23 2015] the vldb keys off of r/w volumes, so an r/o with no r/w anywhere in the cell is kinda orphaned [Tue Feb 10 13:21:19 2015] mhm [Tue Feb 10 13:22:53 2015] the ro-only nature seemed kinda appealing, the purpose would be to put mostly applications into production sites, which should not be able to write to it [Tue Feb 10 13:23:05 2015] then what would be a good way to achieve that? [Tue Feb 10 13:45:48 2015] our "/usr/local" is ROs replicated on all servers [Tue Feb 10 13:46:05 2015] that is an RW, of course, but someone isn't going to get to that unless they explicitly mount it or explicitly go down the RW path to get there [Tue Feb 10 13:46:14 2015] and even if they get there, the ACLs don't give them write perms on it [Tue Feb 10 13:47:04 2015] ROs sync from the RW, not from other ROs... so, the easy thing is you make your changes in the RW, even point your test computer directly at the RW to test, and then "vos release" to deploy to everyone else [Tue Feb 10 13:48:02 2015] you don't really gain anything by not having the RWs around someplace [Tue Feb 10 13:55:47 2015] RedFyre: what phx is trying to do is special. He is trying to replicate from a cell of RW volumes to other cells that will not have RW volumes. Having cells in geographic locations makes a lot of sense with OpenAFS because ubik's behavior is so bad in high latency environments or where there are likely to be semi-frequent network outages. The OpenAFS ubik quorum recovery time is horrendous. [Tue Feb 10 13:57:32 2015] The OpenAFS tooling is not designed to handle the multiple cell deployment scenario. There are some low level building blocks but a higher level volume management system needs to be built to manage the replication, transmission of dumps, restoration to file servers, and poking at the location database.
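Both the earlier advice ("removed on r/w path, didn't release volume, testing on r/o path") and the remark that nobody reaches the RW "unless they explicitly ... go down the RW path" depend on how the cache manager picks RO or RW when it crosses a mount point. Here is a sketch of the traversal rules as I understand them from the AFS administration documentation:

- An explicit `.readonly` or `.backup` suffix is honoured.
- A read/write (`%`) mount point always leads to the RW.
- A regular (`#`) mount point inside an RO volume leads to the RO if the target is replicated.
- Once the path is in an RW volume, regular mount points stay RW.

The function names are hypothetical. Backup-volume parents are not modelled, and the root is assumed to start at `root.afs.readonly` when that volume is replicated.

```python
def resolve_mount(target, mount_type, parent_is_ro, replicated):
    """Return which volume instance a mount point leads to (sketch only).

    target       -- volume name from the mount point, e.g. "root.cell"
    mount_type   -- "#" for a regular mount point, "%" for read/write
    parent_is_ro -- True if the mount point lives in an RO volume
    replicated   -- set of volume names that have released RO sites
    """
    if target.endswith((".readonly", ".backup")):
        return target                      # explicit suffix wins
    if mount_type == "%":
        return target                      # read/write mount point: RW
    if parent_is_ro and target in replicated:
        return target + ".readonly"        # prefer the RO path
    return target                          # on an RW path, stay RW


def walk(hops, replicated, root="root.afs"):
    """Follow a list of (target, mount_type) hops starting from /afs."""
    current = root + ".readonly" if root in replicated else root
    path = [current]
    for target, mount_type in hops:
        current = resolve_mount(target, mount_type,
                                current.endswith(".readonly"), replicated)
        path.append(current)
    return path


replicated = {"root.afs", "root.cell", "proj"}
# Like /afs/cell/proj: regular mount points all the way down.
print(walk([("root.cell", "#"), ("proj", "#")], replicated))
# ['root.afs.readonly', 'root.cell.readonly', 'proj.readonly']
# Like /afs/.cell/proj, where .cell is (by common convention) a % mount.
print(walk([("root.cell", "%"), ("proj", "#")], replicated))
# ['root.afs.readonly', 'root.cell', 'proj']
```

In this model, a mount point removed through the second path stays visible through the first until the volume that contains it is released. The client may also need an `fs flush` before it notices.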
[Tue Feb 10 13:57:52 2015] that sounds interesting [Tue Feb 10 14:04:27 2015] i'm halfway there [Tue Feb 10 14:04:30 2015] actually :) [Tue Feb 10 14:05:07 2015] and yeah, RW volumes it is now, released and stuff, and I'm putting ACLs to the prod site cells disallowing writes [Tue Feb 10 14:05:19 2015] so even if they get to the RW path, ACLs will block them for good [Tue Feb 10 14:05:49 2015] * is thinking of how many beer would one cost to be in the system:administrators group ... [Tue Feb 10 14:06:17 2015] phx - if you get it up and running well, contributing to the wiki what you did would be great... I'm sure you won't be the last person to come across this [Tue Feb 10 14:06:27 2015] phx: look at the fileserver -readonly option [Tue Feb 10 14:06:37 2015] secureendpoints, thanks for the tip! [Tue Feb 10 14:06:59 2015] RedFyre, i will. however it's a bit special. will have to demo this in 3 hours to my senior management, and still have code to write [Tue Feb 10 14:07:14 2015] RedFyre: what phx is building is effectively the file system component of Aurora http://tinyurl.com/pr342px [Tue Feb 10 14:07:34 2015] http://www-conf.slac.stanford.edu/AFSBestPractices/Slides/MorganStanley.pdf [Tue Feb 10 14:07:46 2015] why it's special, because i'm doing an "integration" layer on top of a couple of things, starting with saltstack, bind, kerberos, ldap, couple of less important things like ipsec, iptables, syslogging, so on, then adding AFS to the mix [Tue Feb 10 14:07:47 2015] http://workshop.openafs.org/afsbpw08/talks/wed_1/OpenAFS_and_the_Dawn_of_a_New_Era.pdf [Tue Feb 10 14:08:01 2015] coolbeans... 
would still be useful to have on the wiki, of course, since it's basically afs (especially if code can be shared) [Tue Feb 10 14:08:02 2015] basicaly people will get a single interface which will manage all the parts and interface with them [Tue Feb 10 14:08:37 2015] https://www.usenix.org/legacy/publications/library/proceedings/sec96/full_papers/hollander/ [Tue Feb 10 14:08:44 2015] RedFyre, please keep in mind i still don't have the years of experience and knowledge that most of the guys here have [Tue Feb 10 14:08:57 2015] but sure, contributing is fine, especially if i can make someone else's life easier :) [Tue Feb 10 14:12:49 2015] secureendpoints, interesting slides, thanks for the pointers [Tue Feb 10 14:15:48 2015] Cell Wide Outages and other unpleasant disasters [Tue Feb 10 14:15:49 2015] • vos delentry root.afs [Tue Feb 10 14:15:50 2015] sounds fun :) [Tue Feb 10 14:23:55 2015] secureendpoints, btw, it's not effectively the filesystem component of aurora. /usr and such things are not managed. tho that might be aquilon [Tue Feb 10 14:30:10 2015] I've accidentally deleted a few things in my day, resulting in cell wide outages [Tue Feb 10 15:00:59 2015] * reminded of setting up a test cell to figure out how AFS worked at his last job --- which he did way too thoroughly, as $coworker nuked root.cell and he had to use a cross-cell mount to fix it >.> [Tue Feb 10 15:02:47 2015] sorry root.afs [Tue Feb 10 15:03:50 2015] (ancient fileservers running something like transarc 3.3, somehow the r/w for root.afs had disappeared and been recreated empty. $coworker added a new cell, released... /afs went empty except for the new cell) [Tue Feb 10 15:51:18 2015] if only a single RO clone left of a volume, how do i delete that? vos remsite/remove both unwilling to do that [Tue Feb 10 15:53:19 2015] is vos delentry any good in such cases? 
[Tue Feb 10 16:00:20 2015] if this is cleaning up after your R/O without an R/W, you probably want vos zap since it doesn't have a valid VLDB entry anyway [Tue Feb 10 16:33:18 2015] thanks [Wed Feb 11 11:10:50 2015] Hi all, i'd like to install freebsd+kerberos+openafs, and i don't know where to start. Would it be fine to have a system partition, and an afs one ? So i could install the kerberos server before the afs fs, or not ? [Wed Feb 11 11:26:18 2015] hi snolahc, yes, you'll want to setup kerberos before openafs [Wed Feb 11 11:27:33 2015] the basic setup is, 1. kerberos, 2. openafs db servers, 3. openafs fileservers [Wed Feb 11 11:48:55 2015] you might also want to configure your DNS for your kerberos and AFS servers [Wed Feb 11 11:58:26 2015] i dont think the wiki (http://wiki.openafs.org) is freebsd specific, but it has some general info and some linux specific guidance [Wed Feb 11 12:14:07 2015] as for partitioning, you want separate partitions for AFS server partitions (/vicep*) and you would want it for the client cache manager's disk cache (except that doesn't currently work on freebsd, last I heard; need to use memcache) [Wed Feb 11 12:15:08 2015] you might want to also look at the solaris quick start on the wiki if you are using zfs on freebsd; most of the zfs-related parts are also relevant to freebsd zfs [Wed Feb 11 12:56:49 2015] kaduk_: FWIW, I'd mentor GSOC if there's interest. [Wed Feb 11 13:06:43 2015] the openafs experience with gsoc was very mixed. [Wed Feb 11 13:08:20 2015] for an individual mentor of one student once the program is underway there is a 10 hour per week commitment. but there is a significant commitment of effort in order to obtain acceptance and then to train the students before the projects begin. [Wed Feb 11 13:10:50 2015] the primary goal of gsoc from the perspective of an organization is to recruit developers for the long term. it is not to produce a small amount of code over the Summer.
for the total amount of usable code produced during gsoc it would be a better use of resources for one of the existing developers to sit down and write it in over a week. [Fri Feb 13 15:13:24 2015] doh. don't put pam_afs_session in the auth section of PAM defaulting to done. [Fri Feb 13 15:13:34 2015] unless you don't care about people logging in with any password [Fri Feb 13 15:14:41 2015] * wonders if pam_afs_session should just always fail in auth. isn't it just session anyway? [Fri Feb 13 15:16:57 2015] in my case, yeah. [Fri Feb 13 15:17:31 2015] I don't remember if it actually implements anything auth-related [Fri Feb 13 15:20:45 2015] apparently it's returning PAM_SUCCESS, which strikes me as a bad idea [Fri Feb 13 15:21:16 2015] (then again I usppose that protects against people who just blindly put it everywhere) [Fri Feb 13 15:22:17 2015] maybe... but if you add it to the bottom of your stack before pam_deny, it lets any password through [Fri Feb 13 15:22:18 2015] I think there were reasons with ssh doing silly things to have it in the auth section [Fri Feb 13 15:22:46 2015] I'm removing it from auth just in case. [Fri Feb 13 15:23:00 2015] I only want it in session, since I'm letting sssd do the auth part. [Fri Feb 13 15:23:03 2015] yeah, we only have it in session [Fri Feb 13 15:23:33 2015] and with "optional" in there [Fri Feb 13 15:32:40 2015] You may want to stack your Kerberos PAM module and the Unix module differently, but note that this module should always run after the Kerberos PAM module. If there is no ticket cache available in the PAM environment, it will succeed silently. [Fri Feb 13 15:32:50 2015] http://www.eyrie.org/~eagle/software/pam-afs-session/readme.html [Fri Feb 13 15:33:02 2015] I'm wondering if that's related to what I'm seeing. [Fri Feb 13 15:44:55 2015] * needs to find somewhere to host those pages... [Fri Feb 13 16:20:09 2015] geekosaur - afs wiki? 
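The pam_afs_session pitfall discussed above arises when a module returns PAM_SUCCESS from its auth hook and sits in the auth stack with `done` semantics ahead of `pam_deny`. It can be reproduced with a small simulator of Linux-PAM control values. The model is simplified for illustration. It covers only `ignore`, `bad`, `die`, `ok`, `done` and jump counts, and it reduces every return code to "success" or "fail". The Debian-style stacks are hypothetical, not copied from a real system.

```python
# Simplified expansions of two PAM keywords (see pam.conf(5) for the full ones).
REQUIRED = {"success": "ok", "default": "bad"}
REQUISITE = {"success": "ok", "default": "die"}


def run_auth_stack(stack, results):
    """Return True if the simulated auth stack grants access."""
    state = None  # no verdict yet
    i = 0
    while i < len(stack):
        module, actions = stack[i]
        rc = results[module]
        action = actions.get(rc, actions.get("default", "ignore"))
        if action == "ignore":
            i += 1
            continue
        if action in ("bad", "die"):
            state = "fail"               # a failure sticks
            if action == "die":
                break                    # ...and 'die' ends the stack now
            i += 1
            continue
        # 'ok', 'done' and jump counts: contribute rc unless a failure is recorded
        if state != "fail":
            state = rc
        if action == "done":
            break
        i += 1 + (action if isinstance(action, int) else 0)
    return state == "success"


GOOD = [
    ("pam_unix", {"success": 1, "default": "ignore"}),
    ("pam_deny", REQUISITE),
    ("pam_permit", REQUIRED),
]
BAD = [
    ("pam_unix", {"success": 2, "default": "ignore"}),
    ("pam_afs_session", {"default": "done"}),   # misplaced in auth
    ("pam_deny", REQUISITE),
    ("pam_permit", REQUIRED),
]
WRONG_PASSWORD = {"pam_unix": "fail", "pam_afs_session": "success",
                  "pam_deny": "fail", "pam_permit": "success"}

print(run_auth_stack(GOOD, WRONG_PASSWORD))  # False
print(run_auth_stack(BAD, WRONG_PASSWORD))   # True: any password gets in
```

The fix the channel settled on is to list the module only in the session stack, marked `optional`, and leave authentication to pam_unix, sssd or the Kerberos module.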
[Fri Feb 13 16:21:42 2015] maybe, or my employer (have "agreement in principal" there but making it happen means annoying negotiations with $grandboss) [Fri Feb 13 16:21:47 2015] *in principle [Mon Feb 16 09:49:16 2015] I'm using Ubuntu 14.04.2 LTS and after updating (apt-get dist-upgrade) to GNU/Linux 3.13.0-46-generic x86_64 (from …-44), openafs no longer works. It doesn't compile because of dentry d_alias problems. Anyone else see something similar? [Mon Feb 16 09:51:18 2015] which openafs version? [Mon Feb 16 09:52:02 2015] (1) whatever is default for Ubuntu, and (2) I tried wget;'ing 1.6.10 and building from source. Same problem. [Mon Feb 16 09:52:20 2015] imho the 1.6.11 pre does have a fix for this [Mon Feb 16 09:52:39 2015] So, perhaps build from git ? [Mon Feb 16 09:54:39 2015] I am not really deep into this, sorry [Mon Feb 16 09:55:08 2015] But thanks anyway. I'll try building from git. And then report what happens. [Mon Feb 16 10:05:40 2015] OK git master head builds without error. Good sign. [Mon Feb 16 10:06:24 2015] isn't master 1.7 or something like that? [Mon Feb 16 10:15:44 2015] 1.8pre, yes [Mon Feb 16 10:17:04 2015] build the openafs-stable-1_6_x branch [Mon Feb 16 10:22:58 2015] OK. I've checked it out and am configuring/building etc. [Mon Feb 16 10:27:12 2015] "make" fails with a HUGE number of errors for ubik_int.cs.c. Maybe something went wrong with checlout. Will blow away repo and start afresh. [Mon Feb 16 10:28:54 2015] you have to clean your tree "git clean -xfd" [Mon Feb 16 10:34:59 2015] After "rm -r openafs; git clone ..; cd openafs; git checkout origin/openafs-stable-1_6_x" it builds fine. [Mon Feb 16 10:43:31 2015] OK. make; make dest succeeded, but make srpm fails. So tried "make install" which failed because /usr/local exists. I guess I need to manually move the kernel module into place? [Mon Feb 16 10:54:32 2015] No, that didn't work. Renaming "libafs.ko" as "openafs.ko" and dropping in /lib/modules/3.13.0-46-generic/fs/. 
does not work. Maybe I can debug the bitrotten makesrpm.pl script. [Mon Feb 16 10:59:09 2015] ok, I wonder why this one windows client tells me "failed to renew credentials for afs" when I login (win7/32 1.7.29) [Mon Feb 16 10:59:24 2015] and then I look and I have tokens [Mon Feb 16 11:05:23 2015] @RedFyre: Wild guess but maybe it tried to renew tokens, failed and then got new tokens. [Mon Feb 16 11:06:06 2015] Back to Ubuntu problems. I guess "make srpm" is for RedHat. Not clear how to do this for Ubuntu (Debian?). Giving up. I can compile but can't install. [Mon Feb 16 11:10:53 2015] RPMs would definitely be for systems that use RPM packages :) [Mon Feb 16 11:29:16 2015] you'll need to get the debian packaging stuff from debian/ubuntu's source deb [Mon Feb 16 11:29:38 2015] (the rpm packaging will be going away at some point as well; we don't really want to be in the packaging business) [Mon Feb 16 11:32:20 2015] tho, it's nice to be able to build your own packages [Mon Feb 16 11:33:05 2015] @geekosaur: How do I go about getting an appropriate package for openafs 1.6.11 pre? I've determined it compiles on my Ubuntu computer. I have no experience with doing so. "apt-get install openafs-client" says it has the latest version (presumably 1.6.10). [Mon Feb 16 11:33:39 2015] hm. you are not then familiar with debian/ubuntu packaging? [Mon Feb 16 11:34:01 2015] you are building from source; you are going to have to learn some things [Mon Feb 16 11:34:10 2015] first off, remove the existing package. [Mon Feb 16 11:34:42 2015] second: I do not understand your comment about make install failing because /usr/local exists, unless that means you ran make install as a normal user instead of as root. [Mon Feb 16 11:37:07 2015] mm, actually we have debian packaging in the tree. see src/packaging/Debian/README.Debian [Mon Feb 16 11:37:56 2015] oh, no, that's more user facing docs.
hold on [Mon Feb 16 11:39:40 2015] README.source there has useful information but if you're not familiar with dpkg and debian's package build stuff then you're going to be in some trouble [Mon Feb 16 11:40:31 2015] I am not clear on whether "make install" will get you appropriate startup files (upstart for ubuntu). certainly it will not magically configure it to start on boot [Mon Feb 16 11:40:56 2015] Yes, indeed. The first step was confusing: "1 Updated the package version in debian/changelog to match the new upstream version". [Mon Feb 16 11:41:19 2015] make install failed because /usr/local is a symlink into AFS space that root/boyland didn't have permission to update. [Mon Feb 16 11:41:43 2015] yes, the instructions are geared toward people maintaining the official debian package [Mon Feb 16 11:41:56 2015] you would not be doing much of it [Mon Feb 16 11:42:56 2015] I am not sure there's a simple solution for you if you're not familiar with debian packaging. you can probably solve the /usr/local problem by doing configure --prefix=/usr [Mon Feb 16 11:43:48 2015] (but, if you didn't know that, (a) you really are at a disadvantage doing a source build (b) that's a TERRIBLE idea if you ever want to switch back to official packages) [Mon Feb 16 11:44:24 2015] there's an ubuntu ticket for this problem already, on launchpad; maybe look at that and see if there's a test package? [Mon Feb 16 11:45:38 2015] (I think we just went with rolling back the kernel instead, though) [Mon Feb 16 11:46:47 2015] Rolling back the kernel would be fine, but also sounds tricky. [Mon Feb 16 11:47:04 2015] I'm sorry I can't be very helpful about this, by the way; Monday mornings are rather busy for me --- can't hang on IRC for more than a couple seconds at a time [Mon Feb 16 12:01:05 2015] @geekosaur: You have already helped. Thanks. I am reading tutorials on debian packaging. Then at least I can understand the terminology I hope.
[Mon Feb 16 12:46:34 2015] So the easiest solution is just to use the openafs PPA which already has the patches needed for 14.04.2's new kernel. [Mon Feb 16 12:47:56 2015] https://launchpad.net/~openafs/+archive/ubuntu/stable [Mon Feb 16 12:48:34 2015] Debian testing ("jessie") has also taken a similar kernel update, and has the same problem. The version of openafs in Debian unstable ("sid") has been patched. [Mon Feb 16 12:52:23 2015] The debian bug for jessie's build failure is https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=778196 [Mon Feb 16 13:50:26 2015] when setting up a new openafs/kerberos, most guides explain how to export the key from kerberos for openafs in des-cbc-crc format. on the other hand, the applied crypto hardening guide from bettercrypto.org tells me to use aes256-cts-hmac-sha1-96 as enctype. [Mon Feb 16 13:50:39 2015] is this about the same thing, or am I mixing things up there? [Mon Feb 16 13:50:57 2015] @kaduk: Thanks! That worked like a charm. I have openafs working again. [Mon Feb 16 13:50:58 2015] It is about the same thing; the guides for openafs that say to use des-cbc-crc are outdated [Mon Feb 16 13:51:04 2015] is there a limitation in the choice of export format when getting the key from kerberos? [Mon Feb 16 13:51:16 2015] boyland: happy to help. [Mon Feb 16 13:52:02 2015] Spida: afs/sur5r.net@SUR5R.NET has only aes256-cts-hmac-sha1-96 [Mon Feb 16 13:52:14 2015] Spida: with openafs newer than 1.6.5, the export format is just a krb5 keytab file. [Mon Feb 16 13:52:41 2015] so it works without des-cbc-crc [Mon Feb 16 13:55:25 2015] sur5r: thanks, that was what I wanted to hear :-) [Mon Feb 16 13:58:42 2015] sur5r: which kerberos server do you use? [Mon Feb 16 13:59:02 2015] mit [Mon Feb 16 13:59:14 2015] everything from debian jessie [Mon Feb 16 13:59:37 2015] but be careful: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=778196 [Mon Feb 16 14:00:03 2015] "openafs from sid works", but yes, that is unfortunate.
[Mon Feb 16 14:00:50 2015] did you get any response yet outside the bts to your unblock request? [Mon Feb 16 14:00:59 2015] No. [Mon Feb 16 14:01:02 2015] :/ [Mon Feb 16 14:01:10 2015] Well, it is not exactly an unblock request, but still. [Mon Feb 16 14:01:16 2015] MIT kerberos 1.10.1 from debian stable: "ktadd: Invalid argument while parsing keysalts aes256" [Mon Feb 16 14:01:50 2015] I guess I should try to poke people in #debian-release at some point. [Mon Feb 16 14:02:34 2015] The ability to just use the short form "aes256" is new in 1.13 [Mon Feb 16 14:02:54 2015] aes256-cts-hmac-sha1-96:normal should work everywhere, though. [Mon Feb 16 14:04:02 2015] ah. doesn't work without the ":normal" [Mon Feb 16 14:04:03 2015] thanks [Mon Feb 16 14:43:34 2015] I created the keytab with "ktadd -e aes256-cts-hmac-sha1-96:normal -k /etc/afsserver.krb5-keytab afs/afsdomain@kerberosdomain", and tried to import it with "asetkey add 2 /etc/afsserver.krb5-keytab afs/afsdomain@kerberosdomain". the asetkey command gives me "asetkey: unknown RPC error (-1765328203) for keytab entry with Principal afs/afsdomain@kerberosdomain, kvno 3, DES-CBC-CRC/MD5/MD4". why does it say DES-CBC-CRC there? [Mon Feb 16 14:44:11 2015] asetkey in 1.6.x is only used with krb5 keys of the single-des enctypes. [Mon Feb 16 14:44:21 2015] Since you're using stronger enctypes, you don't need to use asetkey at all. [Mon Feb 16 15:06:02 2015] how do I then tell openafs to use this keytab? [Mon Feb 16 15:06:19 2015] Name it rxkad.keytab and put it in the server configuration directory. [Mon Feb 16 15:07:16 2015] Unfortunately http://openafs.org/pages/security/install-rxkad-k5-1.6.txt may still be the closest thing to documentation of this that I can link you to. [Mon Feb 16 18:14:14 2015] I am now trying to get openafs for linux-3.16 on debian stable. openafs-1.6.9-1 does not compile for 3.16 (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=762248). now trying a manual backport from sid (openafs-1.6.10)... 
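The keytab answers above depend on which enctypes the exported keytab actually contains. In 1.6.x, asetkey only handles single-DES keys, while a keytab with stronger keys is used as-is under the name rxkad.keytab. With real tools, `klist -k -e` on the keytab shows this. For illustration, here is a Python sketch that lists the principal, kvno and enctype of each entry. It assumes the MIT keytab file format version 0x0502 (big-endian). The helper names and the simplified asetkey-or-rxkad.keytab decision rule are my own, not an OpenAFS tool.

```python
import struct
import sys

ENCTYPES = {
    1: "des-cbc-crc", 2: "des-cbc-md4", 3: "des-cbc-md5",
    16: "des3-cbc-sha1", 17: "aes128-cts-hmac-sha1-96",
    18: "aes256-cts-hmac-sha1-96", 23: "arcfour-hmac",
}
SINGLE_DES = {1, 2, 3}


def _counted(data, pos):
    """Read a 16-bit length-prefixed byte string; return (bytes, new_pos)."""
    (length,) = struct.unpack_from(">H", data, pos)
    start = pos + 2
    return data[start:start + length], start + length


def keytab_entries(data):
    """Yield (principal, kvno, enctype) for each live entry of a v2 keytab."""
    if data[:2] != b"\x05\x02":
        raise ValueError("sketch only handles keytab format version 0x0502")
    pos = 2
    while pos + 4 <= len(data):
        (size,) = struct.unpack_from(">i", data, pos)
        pos += 4
        if size < 0:              # hole left by a deleted entry
            pos += -size
            continue
        if size == 0:
            break
        end = pos + size
        (ncomp,) = struct.unpack_from(">H", data, pos)
        realm, p = _counted(data, pos + 2)
        comps = []
        for _ in range(ncomp):
            comp, p = _counted(data, p)
            comps.append(comp.decode())
        # name_type (u32), timestamp (u32), 8-bit kvno
        _name_type, _stamp, kvno = struct.unpack_from(">IIB", data, p)
        (enctype,) = struct.unpack_from(">H", data, p + 9)
        _key, p = _counted(data, p + 11)
        if end - p >= 4:          # optional 32-bit kvno overrides the 8-bit one
            (kvno32,) = struct.unpack_from(">I", data, p)
            kvno = kvno32 or kvno
        yield "/".join(comps) + "@" + realm.decode(), kvno, enctype
        pos = end


def afs_key_step(entries):
    """Simplified rule from the discussion: only single-DES needs asetkey."""
    enctypes = {e for _, _, e in entries}
    if enctypes and enctypes <= SINGLE_DES:
        return "asetkey"
    return "rxkad.keytab"


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:   # e.g. /etc/afsserver.krb5-keytab
        entries = list(keytab_entries(f.read()))
    for principal, kvno, enctype in entries:
        print(principal, kvno, ENCTYPES.get(enctype, enctype))
    print("install step:", afs_key_step(entries))
```

A keytab created with `ktadd -e aes256-cts-hmac-sha1-96:normal` should list only enctype 18. That is consistent with asetkey complaining about DES-CBC-CRC: it is reporting what it requires, not what is in the file.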
[Mon Feb 16 18:14:30 2015] That backport should be fine. [Mon Feb 16 18:15:57 2015] thanks [Tue Feb 17 11:35:58 2015] thoughts on a "cp" failing due to error: "no buffer space available" ? [Tue Feb 17 11:36:02 2015] a second try succeeded [Tue Feb 17 11:47:34 2015] That's just ENOBUFS from some system call, I think. [Tue Feb 17 12:07:36 2015] RedFyre: any time i see strange behavour i reach for the dmesg first [Tue Feb 17 13:53:04 2015] unfortunately, dmesg doesn't give you timestamps :( [Tue Feb 17 13:53:51 2015] but, nothing useful in dmesg this time [Tue Feb 17 16:15:28 2015] on linux, dmesg gives you time-since boot, /var/log/kern.log gives you timestamps... [Tue Feb 17 16:48:26 2015] dmesg has zero timestamps on linux [Tue Feb 17 16:48:36 2015] unless there's some flag that adds those [Tue Feb 17 16:49:07 2015] quoting my ubuntu box: [Tue Feb 17 16:49:13 2015] [7962647.415335] NVRM: make sure that this kernel module and all NVIDIA driver [Tue Feb 17 16:49:13 2015] [7962647.415335] NVRM: components have the same version. [Tue Feb 17 16:49:32 2015] nope, those timestamps are not there on rhel6 [Tue Feb 17 16:49:54 2015] and there is no kern.log [Tue Feb 17 16:50:56 2015] oh huh, dmesg -T turns those into ctimes. [Tue Feb 17 16:51:19 2015] different versions, who knows [Tue Feb 17 16:51:36 2015] hmm... my fedora 21 box dmesg has timestamps [Tue Feb 17 16:51:41 2015] so, must be a relatively new feature [Tue Feb 17 16:52:11 2015] (ubuntu precise, though it does have a 3.11 kernel) [Tue Feb 17 16:55:22 2015] hmm, the internet suggests you might have /sys/module/printk/parameters/printk_time [Tue Feb 17 23:53:07 2015] might have? [Wed Feb 18 00:07:55 2015] well, I don't have it under 3.11, but others reference it so it presumably got added, then dropped when it became the only (correct) value, but I don't know when (or what kernel rhel6 has) [Wed Feb 18 09:33:02 2015] hunh... 
well, there is a /sys/module/printk/parameters directory [Thu Feb 19 14:51:02 2015] I wonder if 1.6.7 works with 3.8.13 [Thu Feb 19 18:26:08 2015] turns out it does [Fri Feb 20 13:14:08 2015] * built openafs-1.6.11pre2 packages on his COPR page, if any fedora users want to give them a spin [Fri Feb 20 13:20:19 2015] billings: remind me of the URL? I did have someone asking about RPMs for recent Fedora... [Fri Feb 20 13:20:56 2015] https://copr.fedoraproject.org/coprs/jsbillings/openafs/ for the client and server packages and https://copr.fedoraproject.org/coprs/jsbillings/openafs-kmod/ for the kmod-openafs and dkms-openafs packages. [Fri Feb 20 13:35:43 2015] basically it's 1.) buildsystem-friendly packages and 2.) linux kernel modules split from userspace packages. [Fri Feb 20 13:37:07 2015] oh and 3.) more RHEL/Fedora-friendly paths instead of transarc paths [Fri Feb 20 13:53:55 2015] I dunno... giving up transarc paths just feels wrong [Fri Feb 20 14:01:33 2015] * happy to switch to fhs paths [Fri Feb 20 15:37:31 2015] it makes SELinux happier [Fri Feb 20 15:37:35 2015] sometimes [Fri Feb 20 15:37:43 2015] for the server components particularly [Fri Feb 20 15:40:41 2015] how so? Don't have to add as many custom rules? [Fri Feb 20 15:45:39 2015] * builds pre2 on fedora 21 [Fri Feb 20 15:45:50 2015] tho, I will need to install the latest kernel... [Fri Feb 20 16:01:04 2015] uploaded 1.6.11~pre2 (experimental) to ftp-master [Fri Feb 20 16:03:07 2015] hopefully it works with 3.18.7-200.fc21 [Fri Feb 20 16:03:31 2015] Judging by 3.18.7, it ought to be okay, but I hear those Fedora folks are aggressive about taking new kernel bits [Fri Feb 20 16:04:10 2015] well, let's update kernel rpms and see what happens [Fri Feb 20 16:04:23 2015] will reboot later when I'm actually home [Fri Feb 20 16:05:56 2015] Where's the fun and excitement in that? 
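On the dmesg timestamp exchange above: the bracketed number in the Ubuntu output is seconds since boot. `dmesg -T` converts it to wall-clock time by adding the boot time. The dmesg man page warns that this can be inaccurate after suspend/resume, because the log time source does not advance while suspended. The Python sketch below does the same conversion, under the assumption that the boot time is known, for example now minus the first field of /proc/uptime. The helper names are hypothetical.

```python
import re
from datetime import datetime, timedelta

STAMP = re.compile(r"^\[\s*(\d+)\.(\d+)\]\s?(.*)$")


def to_wall_clock(line, boot_time):
    """Rewrite a '[seconds.micros] message' dmesg line with a wall-clock time."""
    m = STAMP.match(line)
    if not m:
        return line                      # no printk timestamp on this line
    secs, frac, msg = m.groups()
    offset = timedelta(seconds=int(secs),
                       microseconds=int(frac.ljust(6, "0")[:6]))
    stamp = (boot_time + offset).strftime("%Y-%m-%d %H:%M:%S")
    return f"[{stamp}] {msg}"


def boot_time_from_uptime(path="/proc/uptime", now=None):
    """Estimate boot time as now minus the first field of /proc/uptime."""
    with open(path) as f:
        uptime = float(f.read().split()[0])
    return (now or datetime.now()) - timedelta(seconds=uptime)


boot = datetime(2014, 11, 1)  # assumed boot time for the example
print(to_wall_clock("[7962647.415335] NVRM: components have the same version.", boot))
# [2015-02-01 03:50:47] NVRM: components have the same version.
```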
[Fri Feb 20 16:06:35 2015] the fun and excitement was my power going out at 5:30 AM with temps of -6° [Fri Feb 20 16:06:52 2015] Hmm, I don't think our temperature went that low. We did have our water go out, though. [Fri Feb 20 16:07:16 2015] a transformer took out like 2/3s of the neighborhood [Fri Feb 20 16:07:20 2015] we got power back around 8:30 [Fri Feb 20 16:07:31 2015] in-house temps were 52° upstairs and 42° downstairs by then [Fri Feb 20 16:07:36 2015] yikes [Fri Feb 20 16:08:21 2015] yeah, another couple of hours and I would have had to start worrying about pipes freezing [Fri Feb 20 16:10:40 2015] Nashville hit -4F last night [Fri Feb 20 16:11:08 2015] -14 here [Fri Feb 20 16:11:12 2015] was still -12 when I got up [Fri Feb 20 16:11:28 2015] looks like we're up to about 7° [Fri Feb 20 16:11:30 2015] Nashville worries about pipes freezing at that temp with the power on because the insulation isn't thick enough [Fri Feb 20 16:12:01 2015] everyone is required to drip their faucets [Fri Feb 20 16:12:06 2015] not sure who's gonna have it worse in the spring... we keep getting snow followed by rain followed by deep freeze, which means the snow turns to slush and then freezes solid [Fri Feb 20 16:12:22 2015] we're developing some nice miniature glaciers >.> [Fri Feb 20 16:12:37 2015] welp, I got a /lib/modules/3.18.7-200.fc21.x86_64/extra/openafs.ko [Fri Feb 20 16:12:37 2015] We're supposed to get down to 4 tonight, then tomorrow is a bit warmer, sunday is 37 with rain, and sunday night the temperature drops down to 15. Oh, and we still have three feet of snow on the ground. So ... lots of ice next week. [Fri Feb 20 16:13:00 2015] yeah, we had bad ice jams down town last year cuz of the freeze thaw cycle [Fri Feb 20 16:13:55 2015] RedFyre: what were temps before power went out? 
My heat is set at 60F in my house [Fri Feb 20 16:14:00 2015] but even up here, a lot of homes are older homes built when they didn't give a crap about energy efficiency [Fri Feb 20 16:14:21 2015] my thermo was set at 69... but it's in the hallway... could be about 5° colder in other parts of the house [Fri Feb 20 16:14:36 2015] I'm hoping the extra $$ I spent to upgrade my house to R50 insulation was worth it [Fri Feb 20 16:14:40 2015] wasn't surprised at the downstairs cooling down as all the heat went up [Fri Feb 20 16:15:06 2015] we put an extra 9" in the attic a few years back and the ceiling was sealed last summer... also did insulation improvements in the bsmt utility room [Fri Feb 20 16:15:15 2015] now how much of a difference did that really make? Dunno [Fri Feb 20 16:15:34 2015] at these extreme temps, even my double paned windows had some leakage coming straight through 'em [Fri Feb 20 16:16:09 2015] ok, time for beer :) [Sat Feb 21 19:06:25 2015] dumb question in fedora 21: where do i find the basic repo definition rpm, incl gpg key? or must i cons up a repo def'n directly, in which case, where does one get the gpg key? [Mon Feb 23 13:44:19 2015] so far so good with pre2 on fedora21 [Mon Feb 23 13:46:22 2015] interesting... A meeting password can contain neither spaces nor any of the following characters: \ ` " / & < > = [ ] [Mon Feb 23 22:49:25 2015] hi guys [Mon Feb 23 22:50:47 2015] Does openafs create /afs when it does its install? [Mon Feb 23 22:57:12 2015] ISTR needing to mkdir /afs before starting afsd [Mon Feb 23 22:57:21 2015] but i'm not using typical startup scripts [Mon Feb 23 23:13:21 2015] In debian, openafs-client.postinst does [Mon Feb 23 23:13:30 2015] # Create the standard AFS mount point if it doesn't exist. [Mon Feb 23 23:13:30 2015] test -d /afs || mkdir /afs [Mon Feb 23 23:16:31 2015] what they said [Mon Feb 23 23:26:19 2015] Yeah. The packaging usually does, but the stock upstream code doesn't. 
At this point, it should at least warn at you cleanly, though, instead of crashing like it used to...
[Mon Feb 23 23:30:29 2015] kaduk_: what do you mean, "the stock upstream code doesn't"?
[Mon Feb 23 23:31:38 2015] eichin: Ya, it did for me, and does on most of my clients.
[Mon Feb 23 23:32:04 2015] But i'm working on a client that didn't.
[Mon Feb 23 23:35:48 2015] johnfg: if you, say, clone the git repository git://git.openafs.org/openafs.git and build from source there, nothing is going to create /afs for you; you have to do that manually.
[Mon Feb 23 23:37:12 2015] kaduk_: Ah, that's strange. Was it always that way?
[Mon Feb 23 23:37:53 2015] I believe so.
[Mon Feb 23 23:42:24 2015] It seems that there are many places where the rough edges of its origin as a research project still remain.
[Mon Feb 23 23:45:52 2015] "seems"
[Mon Feb 23 23:46:10 2015] Well, how many has Auristor fixed? Hundreds?
[Mon Feb 23 23:46:21 2015] thousands
[Mon Feb 23 23:46:40 2015] many many thousands
[Mon Feb 23 23:49:20 2015] the way gentoo has handled it for as long as I know, the openrc init script creates the directory if necessary
[Mon Feb 23 23:49:43 2015] I have included a systemd .service, but I've never tested it
[Mon Feb 23 23:50:03 2015] (I'm also not the original author of said service script)
[Mon Feb 23 23:51:01 2015] Debian's unit file checks for the existence of the mountpoint, but it is created in the postinst.
[Mon Feb 23 23:52:25 2015] The sad part of the state of AFS is that unlike Athena the result of Andrew was supposed to be a commercially supported product of IBM according to the contract between Carnegie Mellon University and International Business Machines. The reason that Transarc was formed after the IBM funding for Andrew came to an end was the code quality, the lack of test suites, lack of documentation, etc.
[Mon Feb 23 23:52:51 2015] That cut off after "lack of documentation, etc.".
[Mon Feb 23 23:53:08 2015] that is where it ended
[Mon Feb 23 23:53:17 2015] kaduk_ (and eichin) I'll take a look at Debian's for reference. Appreciated.
[Mon Feb 23 23:53:34 2015] Oh, sorry. I guess I am too quick to blame your IRC client ;)
[Mon Feb 23 23:54:18 2015] IBM later acquired Transarc but it wasn't for AFS. IBM wanted the transaction broker technology Transarc developed to integrate into IBM TPS.
[Mon Feb 23 23:58:57 2015] yeah, I think one of the clever bits about project athena was the way it played DEC and IBM against each other :-)
[Mon Feb 23 23:59:34 2015] My understanding is that it wasn't MIT's choice regarding the licensing.
[Tue Feb 24 00:00:07 2015] MIT originally went to public bid for Athena and there were three submissions: DEC, IBM, and Apple.
[Tue Feb 24 00:00:14 2015] Apple was not taken seriously
[Tue Feb 24 00:00:39 2015] IBM already had a partnership with CMU for Andrew which started a year earlier.
[Tue Feb 24 00:00:52 2015] MIT had a close relationship with DEC and went in that direction.
[Tue Feb 24 00:02:01 2015] However, after about a year MIT and DEC realized there wasn't enough money to meet the Athena goals, which, unlike Andrew's, were not limited to the development of a distributed computing environment but also included the creation of a new educational platform
[Tue Feb 24 00:03:20 2015] So MIT went to IBM to see if IBM would like to take part in order to get more funds. When IBM said yes, DEC was rightly paranoid about IBM having any say over how the intellectual property could be used and decided that MIT should own all of it and license it back without restrictions.
[Tue Feb 24 00:05:58 2015] I've also heard that MIT's original fears about IBM's prior investment in Andrew were well founded.
[Tue Feb 24 00:06:09 2015] IBM's heart wasn't in it.
[Tue Feb 24 00:09:39 2015] has anyone looked at the opendfs source code to compare code quality with openafs?
[Tue Feb 24 16:25:59 2015] kaduk_: you are the debian maintainer, correct?
[Tue Feb 24 16:26:45 2015] correct
[Tue Feb 24 16:27:17 2015] under what license are your systemd scripts released?
[Tue Feb 24 16:28:40 2015] the systemd units and related scripts called by them
[Tue Feb 24 16:32:22 2015] 2-clause BSD, if they're big enough to be copyrightable. You should probably file a bug to remind me to consider whether they are big enough to be copyrightable and document it if so.
[Tue Feb 24 16:33:44 2015] kaduk_: file a bug on openafs.org or debian.org?
[Tue Feb 24 16:33:54 2015] debian.org
[Tue Feb 24 16:34:10 2015] kaduk_: okay, I'll do that. Thanks.
[Tue Feb 24 16:34:52 2015] Thanks.
[Tue Feb 24 22:12:56 2015] kaduk_ - is there a minimum size req for something to be under the bsd lic?
[Tue Feb 24 22:14:36 2015] CybrFyre: no, but there's not a hard-and-fast rule for how big something has to be to be copyrightable.
[Tue Feb 24 22:15:15 2015] kaduk_: hmm, is the only way to submit debian bugs via software in debian, or via an email?
[Tue Feb 24 22:15:36 2015] kaduk_ - oh?
[Tue Feb 24 22:17:11 2015] oh, sorry, missed the *not* in there
[Tue Feb 24 22:17:12 2015] https://www.debian.org/Bugs/Reporting
[Tue Feb 24 22:17:32 2015] so, did you answer the Q on the license, if any, of the systemd scripts?
[Tue Feb 24 22:17:43 2015] I would hope that if they are licensed, there is a comment in the scripts?
[Tue Feb 24 22:17:48 2015] IANAL but everything is copyrightable unless a court determines otherwise
[Tue Feb 24 22:18:42 2015] right, so if the scripts are not specifically licensed under whatever, then, whoever wrote them?
[Tue Feb 24 22:19:52 2015] I wrote on IRC that they are 2-clause BSD.
[Tue Feb 24 22:20:08 2015] That is unlikely to be considered a binding statement by a court, of course.
[Tue Feb 24 22:22:47 2015] well, if there's nothing in the files themselves...
[Tue Feb 24 22:23:14 2015] then it's just you randomly stating something, presuming you are the author (and if you are, thanks for writing said scripts) [Tue Feb 24 22:23:31 2015] so, I need more Resolve cleaner for the next time a cat decides to puke on the carpet [Tue Feb 24 22:23:59 2015] again IANAL but copyright belongs to the author unless it has been assigned under contract and the assignment clause of the contract is legal in the jurisdiction in which the work was performed. A source file that does not have a license statement is unlicensed. The author can of course issue a license in writing to anyone. I agree that IRC chat logs will not be considered a valid license contract. [Tue Feb 24 22:24:27 2015] secureendpoints - I think we're in agreement :) [Tue Feb 24 22:25:03 2015] In particular, there are some debian people who would mark a "code without license" bug as severity grave or something like that. [Tue Feb 24 22:25:08 2015] I suspect that Debian might also have an implied license [Tue Feb 24 22:25:17 2015] or that :-) [Tue Feb 24 22:25:33 2015] Well, there is a copyright file, which is mostly IPL. I guess I could dual-license IPL and BSD to make things easier. [Tue Feb 24 22:25:34 2015] tho, these systemd files are there in fedora, so... one would have t be able to trace the origin, etc etc etc [Tue Feb 24 22:25:51 2015] They are somewhat different between debian and fedora, since we have different requirements. [Tue Feb 24 22:26:22 2015] well, I think the easy soln is to comment the files with at least a reference to the license [Tue Feb 24 22:27:15 2015] and maybe an "authored by" and "date authored" [Tue Feb 24 22:48:53 2015] kaduk_: so the answer to that question is a yes, only via a pkg in debian, or via email XD [Tue Feb 24 22:49:56 2015] via email, yes. 
[Tue Feb 24 22:50:10 2015] Sorry, was in a conversation and couldn't type more commentary when I sent the link
[Tue Feb 24 22:50:24 2015] To: submit@bugs.debian.org with pseudoheaders at the top of the message
[Tue Feb 24 22:51:29 2015] kaduk_: Not a problem. I read about it. It's just a little bit of a pain, so I was hoping that I missed something and there was a hidden web interface somewhere :P
[Tue Feb 24 22:51:40 2015] There is no web interface.
[Tue Feb 24 22:51:55 2015] debbugs is custom software for debian, and it's built around mail (mostly).
[Tue Feb 24 23:44:14 2015] okay, I think I did that properly XD #779170
[Tue Feb 24 23:45:50 2015] NP-Hardass: it mostly worked -- there is no "openafs-server" package; the systemd bits are part of openafs-fileserver. But it should be enough to get the job done.
[Tue Feb 24 23:46:18 2015] kaduk_: ah, sorry XD
[Tue Feb 24 23:46:27 2015] Thank you for figuring it out and getting it sent
[Tue Feb 24 23:46:48 2015] No problem.
[Wed Feb 25 10:54:51 2015] I was trying to use the QuickStart instructions to set up a Fedora 21 machine, but it appears that there is only a Fedora 20 folder for 1.6.10. How would I get this up and running?
[Wed Feb 25 10:55:46 2015] The QuickStart from the web is pretty out of date; building your own from master might be a nicer option. But, mumble mumble not providing binary RPMs any more.
[Wed Feb 25 10:56:09 2015] I don't want binary RPMS, I'd like to use DPMS
[Wed Feb 25 10:56:09 2015] someone in here recently mentioned that 1.6.11~pre2 seemed to work with Fedora 21
[Wed Feb 25 10:56:22 2015] DPMS?
[Wed Feb 25 10:56:27 2015] DKMS, I assume.
[Wed Feb 25 10:56:29 2015] and I think kaduk_ meant building your own RPMs
[Wed Feb 25 10:56:34 2015] yeah.
[Wed Feb 25 10:56:36 2015] that was prolly me...
I built RPMs for fedora 21
[Wed Feb 25 10:56:48 2015] foley - you'll still need binary rpms, including the binary dkms-openafs rpm
[Wed Feb 25 10:56:50 2015] Sorry, brain overloading on too many acronyms.
[Wed Feb 25 10:57:14 2015] but, if you want to build your own, I can make available the srpm for 1.6.11-pre2 for fedora 21
[Wed Feb 25 10:57:49 2015] I'll do whatever you would recommend as a method that takes minimal tuning. I haven't been on a RedHat system for a long time.
[Wed Feb 25 10:58:02 2015] are you running fedora 21 32 or 64 bit?
[Wed Feb 25 10:58:29 2015] 21 32 bit. I thought it was 64, but I guess not.
[Wed Feb 25 10:58:47 2015] ok, the binary RPMs I built are 64-bit, so, you'll hafta rebuild from the srpm
[Wed Feb 25 10:59:58 2015] Are there plans for support of Fedora21 in the near future? I'm hesitant about being ahead of what is supported.
[Wed Feb 25 11:00:01 2015] this'll take a moment... copying the srpm to our public ftp
[Wed Feb 25 11:00:09 2015] 1.6.11 should support F21
[Wed Feb 25 11:00:11 2015] depends on what you mean by "support"
[Wed Feb 25 11:00:30 2015] I define it as "Follow the documentation to get it installed"
[Wed Feb 25 11:01:00 2015] I think that'll work (unless the documentation specifically mentions using RPMs versus doing a make; make dest)
[Wed Feb 25 11:01:22 2015] What's wrong with make install?
[Wed Feb 25 11:01:31 2015] Gotcha.
[Wed Feb 25 11:01:41 2015] There are some forces at work which are trying to remove make dest support from openafs 2.0
[Wed Feb 25 11:01:43 2015] kaduk_ - obviously I left out a few steps there :)
[Wed Feb 25 11:02:40 2015] I will need some hand-holding on this because I am always nervous around kernel modules.
[Wed Feb 25 11:02:59 2015] fortunately, with dkms the kernel module usually just builds
[Wed Feb 25 11:03:16 2015] "except when they change the KPI in a kernel security update"
[Wed Feb 25 11:03:24 2015] tho, foley, you'll want to pause before upgrading fedora to the latest and greatest kernel module since the linux kernel usually breaks stuff requiring openafs to patch
[Wed Feb 25 11:03:47 2015] er, s/greatest kernel module/greatest kernel/
[Wed Feb 25 11:05:18 2015] Perhaps for my own sanity, I should use the supported Fedora 20 until 1.6.11?
[Wed Feb 25 11:07:04 2015] fedora 21 *is* supported... there just aren't built binary pkgs
[Wed Feb 25 11:07:46 2015] foley - if you anonymous ftp to image.cnf.cornell.edu and cd to "common" you'll see openafs-1.6.11-0.pre2.src.rpm
[Wed Feb 25 11:07:58 2015] grab that and do an rpmbuild --rebuild openafs-1.6.11-0.pre2.src.rpm
[Wed Feb 25 11:08:12 2015] which will put rpms in ~/rpmbuild/RPMS/...
[Wed Feb 25 11:08:19 2015] which you can then install
[Wed Feb 25 11:08:38 2015] after that, proceed to editing the ThisCell file and other configs to your heart's content
[Wed Feb 25 11:08:45 2015] make sure you have a cache partition
[Wed Feb 25 11:08:56 2015] then start the openafs-client service
[Wed Feb 25 11:11:18 2015] I'll give it a shot. Thanks
[Wed Feb 25 11:13:52 2015] Are you sure I can get it from there? I tried to connect with username "anonymous" and it still asked for a password
[Wed Feb 25 11:14:04 2015] yeah, you can just put in whatever for the password
[Wed Feb 25 11:14:10 2015] I usually just type in "null"
[Wed Feb 25 11:14:21 2015] Oh, of course.
[Wed Feb 25 11:15:52 2015] the convention for anonftp is to use your primary email address as password. of course, these days you have to worry if someone will find a way to harvest it for spam....
[Wed Feb 25 11:16:16 2015] So ... use your tertiary email address instead?
[Wed Feb 25 11:16:18 2015] yeah...
as it turns out, for whatever reason, the ftp server will accept randomly typed stuff (but not empty) for the pw
[Wed Feb 25 11:17:08 2015] That's what got me. I typed in a blank a few times.
[Wed Feb 25 11:22:20 2015] Hmm. error: Failed build dependencies: kernel-devel-i686 = 3.17.4-301.fc21 is needed by openafs-1.6.11-0.pre2.fc21.i686
[Wed Feb 25 11:22:38 2015] are you trying to install the rpms one at a time?
[Wed Feb 25 11:22:46 2015] that won't work too well :)
[Wed Feb 25 11:22:50 2015] It's also complaining like crazy about someone named dbotsch
[Wed Feb 25 11:23:06 2015] prolly cuz that was my account name on my computer
[Wed Feb 25 11:23:13 2015] but that should only be in the build process
[Wed Feb 25 11:23:21 2015] I was trying to follow your instructions. You said I should rpmbuild the file you put in FTP
[Wed Feb 25 11:23:26 2015] yes
[Wed Feb 25 11:23:45 2015] rpmbuild should build (with possible complaints, not fatal, about dbotsch)
[Wed Feb 25 11:23:58 2015] then cd to the directory with the built RPMs
[Wed Feb 25 11:24:10 2015] I presume you already have the main dkms RPM installed?
[Wed Feb 25 11:24:31 2015] I'll go check on DKMS. It didn't build
[Wed Feb 25 11:24:47 2015] you *will* need kernel-devel installed for your dkms module to build
[Wed Feb 25 11:24:51 2015] yum install kernel-devel
[Wed Feb 25 11:25:36 2015] Weird, dkms was installed, but not kernel-devel
[Wed Feb 25 11:25:40 2015] in the directory with the binary RPMs, just delete the kmod-openafs-... RPM since you want to use dkms
[Wed Feb 25 11:25:59 2015] then yum install *openafs*pre2*.rpm
[Wed Feb 25 11:26:17 2015] It complained about a few other broken dependencies, but I still can't fix these.
[Wed Feb 25 11:26:30 2015] kernel-devel-i686 = 3.17.4-301.fc21 is needed by openafs-1.6.11-0.pre2.fc21.i686 perl(ExtUtils::Embed) is needed by openafs-1.6.11-0.pre2.fc21.i686
[Wed Feb 25 11:27:54 2015] yum install perl-ExtUtils-Embed
[Wed Feb 25 11:28:03 2015] Trying to install kernel-debug-devel gets this response: Package kernel-debug-devel-3.18.7-200.fc21.i686 already installed and latest version
[Wed Feb 25 11:28:27 2015] not kernel-debug-devel, but kernel-devel
[Wed Feb 25 11:28:52 2015] Package kernel-devel-3.18.7-200.fc21.i686 already installed and latest version
[Wed Feb 25 11:29:15 2015] 3.18.7-200 != 3.17.4-301
[Wed Feb 25 11:30:31 2015] I noticed.
[Wed Feb 25 11:30:43 2015] ack, ok, it must be complaining about the 3.17.8 since that's the kernel I was running when I built that srpm
[Wed Feb 25 11:30:51 2015] that's an interesting bug in the packaging script
[Wed Feb 25 11:31:17 2015] anyway, you should be able to yum install kernel-devel-3.17.8-300.fc21.x86_64
[Wed Feb 25 11:31:32 2015] RPM-based DKMS usually builds modules at boot, IIRC; .deb-based DKMS usually builds modules at installation time.
[Wed Feb 25 11:31:59 2015] kaduk_ - yes, however, it seems the makesrpm.pl embeds as a requirement for the SRPM the running kernel's devel pkg
[Wed Feb 25 11:32:07 2015] rpm -qp openafs-1.6.11-0.pre2.src.rpm --requires
[Wed Feb 25 11:32:16 2015] outputs: kernel-devel
[Wed Feb 25 11:32:17 2015] AND
[Wed Feb 25 11:32:23 2015] kernel-devel-x86_64 = 3.17.8-300.fc21
[Wed Feb 25 11:32:48 2015] Interesting because I am not on x86_64
[Wed Feb 25 11:32:51 2015] * had not yet rebooted into 3.18 when he ran makesrpm
[Wed Feb 25 11:33:32 2015] try rpmbuild --nodeps
[Wed Feb 25 11:34:18 2015] (assuming you have the rest of the dependencies satisfied)
[Wed Feb 25 11:35:11 2015] I've got basic instructions for building an openafs srpm here: https://confluence.cornell.edu/display/CNF/Building+AFS+Source+RPM
[Wed Feb 25 11:35:12 2015] Similar problem checking your OS...
configure: error: No usable linux headers found at /usr/src/kernels/3.17.4-301.fc21.i686
[Wed Feb 25 11:37:20 2015] Thanks for the help, but I'm going to switch back to my debian partition until this is more streamlined.
[Wed Feb 25 11:39:48 2015] I prefer the binary packages for simplicity, but I guess I'll have to look into the SRPMs if they stop getting updated.
[Wed Feb 25 17:51:57 2015] openafs_1.6.9-2+deb8u1_amd64.changes ACCEPTED into testing-proposed-updates
[Thu Feb 26 22:41:15 2015] I am evaluating openafs, and after checking the security vulnerabilities I was wondering why so few vulnerabilities have been found and why there are sometimes huge gaps between them? For instance, 8 vulnerabilities between 2009 and 2014, nothing between 2007 and 2009, and nothing between 2003 and 2007. I don't want to start any war, just wondering why
[Thu Feb 26 22:41:27 2015] http://www.openafs.org/security
[Thu Feb 26 23:08:40 2015] probably the obvious reasons
[Thu Feb 26 23:10:59 2015] like "it's not sendmail" :-)
[Fri Feb 27 06:04:09 2015] slashd: basically it is not that popular, so not that rewarding to attack it.
[Fri Feb 27 06:05:14 2015] slashd: it seems like SecureEndPoints, the publishers of the "commercial" version of OpenAFS, Auristor, have done a code review and fixed "thousands" of bugs, so if you have security requirements perhaps that is a good option.
[Fri Feb 27 06:05:56 2015] slashd: I meant YFS of course.
[Fri Feb 27 07:59:51 2015] Walex2, thanks
[Fri Feb 27 08:08:19 2015] slashd: more troubling to me from a security standpoint are design implementations (rxkad/fcrypt), but that is being worked on (rxgk).
[Fri Feb 27 08:22:19 2015] the new protocol and the standard and... oh yeah,...
[Sat Feb 28 11:57:58 2015] openafs 1.6.9-2+deb8u1 MIGRATED to testing
[Sun Mar 1 18:51:37 2015] Hiya channel; is there any way to up vos's rx timeouts?
Occasionally our replica recipient decides to go away for a while and vos gives up on it even though the replica source doesn't.
[Sun Mar 1 18:52:01 2015] The result is that the release finishes but the VLDB doesn't come up to date, and thereafter the next release after unlock causes the volume to be destroyed.
[Sun Mar 1 18:52:42 2015] "Well, which timeout do you want?"
[Sun Mar 1 18:55:13 2015] I'd like them to agree on timeout, but realistically I'd go for making the "vos release" command never time out.
[Sun Mar 1 18:55:15 2015] For some reason I thought there was a volserver option to adjust a timeout, but I don't see it in the man page.
[Sun Mar 1 18:56:06 2015] Since, observationally, the only problem is that the vos that is supposed to update the VLDB goes away.
[Sun Mar 1 18:56:29 2015] I'd *like* truly transactional releases, but I can't get that. =P
[Sun Mar 1 18:57:35 2015] There is the rx_SetConnIdleDeadTime() library routine which seems to not be quite what you want.
[Sun Mar 1 19:00:52 2015] (I am getting distracted by something else and not looking any more right now, sorry)
[Sun Mar 1 19:12:08 2015] Don't worry; I found a ... creative workaround.
[Sun Mar 1 19:35:30 2015] Do tell!
[Mon Mar 2 09:23:21 2015] I installed what I'm pretty sure was needed for openafs-client on debian-testing/jessie.
[Mon Mar 2 09:23:54 2015] No complaints while or after installing. However, getting a token fails with this:
[Mon Mar 2 09:24:41 2015] johnfg: I screwed up, so the 1.6.9-2+deb8u1 in jessie is still broken.
[Mon Mar 2 09:26:09 2015] http://dpaste.com/1YQEV20
[Mon Mar 2 09:26:21 2015] 1.6.10-4 from sid or 1.6.11~pre2 from experimental should be fine on jessie's kernel, though.
[Mon Mar 2 09:26:47 2015] "a pioctl failed" ~always means "there's no kernel module", which is the known brokenness.
[Mon Mar 2 09:27:00 2015] kaduk_: Ok, so just wait, or install from sid or experimental, gotcha.
[Mon Mar 2 09:27:19 2015] kaduk_: thanks for working on this!
[Mon Mar 2 09:28:01 2015] You're welcome. Sorry things are still broken :(
[Mon Mar 2 09:28:41 2015] kaduk_: no worries. would you recommend one of those over the other?
[Mon Mar 2 09:29:09 2015] Not especially; 1.6.11~pre2 is ~identical to 1.6.11 final, which should be getting announced any hour now.
[Mon Mar 2 09:30:16 2015] kaduk_: And with debian, I'll need to do a total uninstall of 1.6.9-2+deb8u1 before installing 1.6.11~pre2, right? Unlike yum.
[Mon Mar 2 09:30:32 2015] No need to uninstall; the upgrade path should work just fine.
[Mon Mar 2 09:30:44 2015] kaduk_: Ok, I'll give it a shot.
[Mon Mar 2 09:30:47 2015] (If it doesn't, please file a bug :) )
[Mon Mar 2 09:33:21 2015] I guess over the 1.6.9-->1.6.10 upgrade you'll get native systemd unit files, so you may need to manually stop/start the openafs-client service (or reboot) to get the client started.
[Mon Mar 2 09:35:00 2015] Well, hmm, you'd have to do that anyway, even with the sysv init scripts, since the package is configured to not restart on upgrade.
[Mon Mar 2 09:43:08 2015] so far, openafs updates on debian have gone smoothly for the last 5 years
[Mon Mar 2 09:44:36 2015] the last 5 years? :)
[Mon Mar 2 09:44:59 2015] that's what I can say for sure^^
[Mon Mar 2 09:45:13 2015] It may be 10 years, but that's too far back
[Mon Mar 2 09:45:30 2015] * must be missing some context
[Mon Mar 2 09:47:34 2015] For a client on debian jessie, what pkgs would you install? What I installed previously was: openafs-client, openafs-krb5 and openafs-module.
[Mon Mar 2 09:49:01 2015] Amiga4000: My server's debian wheezy, and it's been running and updating fine for at least that many years, but not wheezy all that time, of course.
[Mon Mar 2 09:49:40 2015] Are those the pkgs you would install on the client?
[Mon Mar 2 09:49:42 2015] for jessie I did install the latest jessie kernel and the 1.6.11 pre package
[Mon Mar 2 09:50:07 2015] That's my plan too; have a line in sources.list added for experimental.
[Mon Mar 2 09:56:39 2015] So, do I need more than client and krb5?
[Mon Mar 2 09:56:57 2015] kernel-source (dkms)
[Mon Mar 2 09:57:00 2015] for OpenAFS
[Mon Mar 2 09:57:09 2015] should get in with client
[Mon Mar 2 09:57:59 2015] Amiga4000: Thanks, will do
[Mon Mar 2 10:09:19 2015] Is this latest, installed now, able to be enabled/stopped/started/restarted with systemd?
[Mon Mar 2 10:13:21 2015] It should be.
[Mon Mar 2 10:13:36 2015] The sysv shim pretty much "just works" these days, as far as I can tell.
[Mon Mar 2 10:14:05 2015] Bummer, all installed and built fine, no errors, but I get the same error when I try to get a token.
[Mon Mar 2 10:14:16 2015] #760063 may have been partly the shim's fault, but I haven't heard anything about it recently.
[Mon Mar 2 10:14:24 2015] lsmod|grep afs
[Mon Mar 2 10:14:27 2015] I'll wait here for a sec, but I'm gonna try and reboot.
[Mon Mar 2 10:14:38 2015] Reboot is probably worth trying, yeah.
[Mon Mar 2 10:14:59 2015] kaduk_: Nothing listed.
[Mon Mar 2 10:15:27 2015] I'll reboot then and be right back.
[Mon Mar 2 10:22:58 2015] ok
[Mon Mar 2 10:23:05 2015] all's good!
[Mon Mar 2 10:23:09 2015] Yay!
[Mon Mar 2 10:23:28 2015] systemd started and loaded what it should, and I can access my cell.
[Mon Mar 2 10:24:07 2015] Still not sure what's the reason I couldn't get it running without a reboot, but who cares?
[Mon Mar 2 10:24:19 2015] Thanks kaduk_ and Amiga4000!
[Mon Mar 2 10:24:54 2015] I suspect that some combination of "systemctl enable", "systemctl stop" and "systemctl start" could have gotten it going, but it's probably not worth the effort to figure out exactly what combination, if a reboot is easy.
[Mon Mar 2 10:26:19 2015] I did all but the stop, but yes, the reboot was easy and fast with jessie, for sure.
[Mon Mar 2 10:27:11 2015] amusingly, afs is one of the few hangups on a boot, now, for fedora 21, with all the parallel boot stuff going on...
the cache scan :)
[Mon Mar 2 10:28:58 2015] I think there was some sketch of a solution to avoid that in gerrit (but I marked it -2 as not-ready-for-1.8 so won't see it for a while)
[Mon Mar 2 10:28:59 2015] RedFyre: I'm running fedora 21 and no hangups at all.
[Mon Mar 2 10:29:26 2015] But I'm not doing anything real heavy duty.
[Mon Mar 2 10:29:30 2015] dkms is the other thing that can slow up my boot
[Mon Mar 2 10:29:51 2015] If you build modules at boot, like RPM systems do, yes :)
[Mon Mar 2 10:30:05 2015] well, it only builds 2 modules, I think... openafs and nvidia
[Mon Mar 2 13:57:14 2015] Subject: [OpenAFS-announce] OpenAFS 1.6.11 release available
[Mon Mar 2 16:20:26 2015] great... seems a windows update broke something with the kerberos identity manager and integrated login
[Mon Mar 2 16:40:20 2015] network identity manager and integrated logon (afslogon.dll) are independent code bases. they do not interact.
[Mon Mar 2 16:46:52 2015] all I can tell you, without investigating more, is on login (and we do integrated login) I'm suddenly getting errors from NiM that "could not renew credentials for principal"
[Mon Mar 2 16:47:20 2015] now, afscreds shows tokens, so it gets that far, but as far as NiM grabbing the credentials as well, something seems to have changed there
[Mon Mar 2 16:55:03 2015] Going to be very clear here. OpenAFS integrated logon does not push Kerberos credentials into the logon session. There is code that is compiled out that used to do that which was removed in March 2007. Any version of OpenAFS that shipped after that does not do so. Pushing the credentials via the mechanism that OpenAFS used is not available on Vista or above and results in stranded credential cache files owned by SYSTEM
[Mon Mar 2 16:55:36 2015] so, despite afslogon.dll, the 2 are essentially operating independently in parallel
[Mon Mar 2 16:56:10 2015] Network Identity Manager cannot renew AFS tokens. It can display them but AFS tokens are not renewable.
Obtaining a newer token requires a Kerberos TGT.
[Mon Mar 2 16:57:10 2015] so for renewal, is NiM doing the renewal and then via afslogon.dll afscreds gets a new token, or is afscreds via afslogon.dll calling into NiM to use the kerb libraries to get a new TGT?
[Mon Mar 2 16:57:55 2015] NIM and afslogon do not speak. afslogon is only used at initial logon
[Mon Mar 2 16:58:53 2015] so, how does afscreds then "renew" its token, which it successfully does? Is it caching the user password?
[Mon Mar 2 16:59:00 2015] NIM uses afscred.dll which is registered with the krb5cred.dll via the registry
[Mon Mar 2 16:59:37 2015] so, if NiM loses its tickets (due to this problem), one won't get new tokens
[Mon Mar 2 17:00:27 2015] NIM and afscreds.dll obtain tokens exactly the same way that aklog works. They ask for a TGT from a credential cache. The difference from aklog is that they prompt the user to obtain a TGT when there isn't one or it is expired.
[Mon Mar 2 17:01:49 2015] so, is afscreds.dll essentially running as an independent binary (rundll), or is it a "hook" that is called whenever NiM asks for a new/renew TGT?
[Mon Mar 2 17:04:04 2015] Here is the developer's guide http://www.secure-endpoints.com/netidmgr/developer/index.html
[Mon Mar 2 17:07:04 2015] There are many talks about NIM at the Workshops
[Mon Mar 2 17:07:48 2015] both by Asanka Herath and from developers at other organizations that created credential providers for it
[Mon Mar 2 17:10:56 2015] right... what I need to figure out is if I care about NiM not getting/renewing credentials or not (ie can I just tell NiM not to run at login time) as all I really care about are tokens, ie afscreds
[Tue Mar 3 06:48:22 2015] Hi all, I have a question regarding backups. At my work I use Veeam to back up all our virtual machines. Our OpenAFS servers are VM
[Tue Mar 3 06:48:28 2015] VM's too
[Tue Mar 3 06:48:55 2015] and I'd like to get them in our Veeam backup infrastructure.
Currently I'm searching for the best way to do so.
[Tue Mar 3 06:49:51 2015] I was thinking of creating a separate VM that overnight syncs everything from our AFS-cell to a local filesystem which veeam can read. Something like an rsync cron job if you like.
[Tue Mar 3 06:50:21 2015] Is there some command in the OpenAFS suite that can do that? Or are there other thoughts on how to approach this?
[Tue Mar 3 07:10:08 2015] buco_: I think the most common thing is to use 'vos dump' to dump the volumes and store the volume dumps as a backup
[Tue Mar 3 07:11:30 2015] btw how safe is it to snapshot and back up the filesystem under the openafs fileservers?
[Tue Mar 3 07:13:28 2015] it's not
[Tue Mar 3 07:21:22 2015] My idea would be to copy the files out of the AFS-cell to another ext4 partition. How is that unsafe?
[Tue Mar 3 07:21:59 2015] The problem is, we work with Veeam for incremental backups. That won't work with vos dumps (I think?)
[Tue Mar 3 07:22:17 2015] and veeam is incompatible with OpenAFS.
[Tue Mar 3 08:08:01 2015] buco_: copying out the data loses all ACLs
[Tue Mar 3 08:10:09 2015] [facepalm]Yes indeed, didn't think about that[/facepalm]
[Tue Mar 3 08:43:05 2015] have you considered actually using the built-in AFS backup system?
[Tue Mar 3 08:49:34 2015] afs's daily snapshots will on a volume level handle incrementals
[Tue Mar 3 08:50:03 2015] if you then do a daily dump of those incrementals over the course of say, a week, you'll have one day of large dump files and the rest of the days will be your small incremental dump files
[Tue Mar 3 08:50:07 2015] repeated weekly
[Tue Mar 3 08:50:28 2015] I should prolly post my modded version of afsdump.pl
[Tue Mar 3 09:02:36 2015] Empirically, here at MIT, many volumes do not change very much at all.
[Tue Mar 3 09:33:48 2015] same here...even user volumes
[Tue Mar 3 12:39:43 2015] * realizes he hadn't restarted buildslave...
[Tue Mar 3 13:35:35 2015] I'm trying to evaluate how good or bad the OpenAFS community is in 2015. It seems there are long-term outstanding bugs and a low number of vulnerabilities found between 2009 and 2014, etc. I remember Walex2 (or Walex) mentioned that OpenAFS isn't that popular, so not rewarding to attack it.
[Tue Mar 3 13:36:24 2015] I would like to get your opinion/facts about OpenAFS in 2015, good things/bad things/.....
[Tue Mar 3 13:47:32 2015] slashd - things have slowed down in some areas and sped up in others... the community has both shrunken and grown... and hopefully for future growth, in the process of obtaining non-profit status
[Tue Mar 3 13:48:13 2015] s/slowed/slown/
[Tue Mar 3 13:57:41 2015] note that a number of people in here work for companies that sell support for OpenAFS, so we're somewhat biased :)
[Tue Mar 3 13:58:15 2015] * is not one of those... tho he is on the OpenAFS Foundation Board (which presents bias in terms of hope for the future)
[Tue Mar 3 14:35:07 2015] slashd: what would be your target environment for OpenAFS? corporate versus home/non-profit use, number of users and various other concerns (multiple locations) may play a factor
[Tue Mar 3 14:55:53 2015] this is probably out of date now (let me know if there are major problems to correct) but might be a good place to start: https://wiki.cites.illinois.edu/wiki/display/~cclausen/OpenAFS
[Tue Mar 3 15:06:17 2015] cclausen: I had forgotten about Shishi
[Tue Mar 3 19:56:55 2015] heh
[Tue Mar 3 22:45:06 2015] cclausen - if there's anything there you can merge into the regular openafs wiki (perhaps as a simple overview...)... that'd be great!
[Wed Mar 4 10:13:20 2015] 1.6.9-2+deb8u2 is in jessie now. Sorry it took so long to get it right.
[Wed Mar 4 10:15:01 2015] I keep thinking, why is everyone picking on poor Jessie?
[Wed Mar 4 22:32:50 2015] i tried to install the latest release for MacOS and failed with the error The OpenAFS release requires Mountain Lion (10.9) [Wed Mar 4 22:33:23 2015] I believe that YFSI has generously provided some installers. [Wed Mar 4 22:33:41 2015] http://your-file-system.com/index.php?id=47 [Wed Mar 4 22:33:58 2015] kaduk_: thanks [Wed Mar 4 22:34:21 2015] that was probably in a readme...that i did not read [Wed Mar 4 22:35:26 2015] I'm not sure about that (readme). Hope it works. [Thu Mar 5 09:42:37 2015] are those installers linked from the main page? [Thu Mar 5 14:02:43 2015] Is it possible to mount a volume by its numeric identifier, or exclusively by VLDB name? [Thu Mar 5 14:03:03 2015] What does "exclusively by VLDB name" mean? [Thu Mar 5 14:03:14 2015] Er, sorry, I mean "or is it only possible to mount by VLDB name" [Thu Mar 5 14:04:03 2015] Concretely, I have a volume whose RW/RO site is damaged, but an unaltered RO replica exists on another machine. I'd like to make a roclone of the undamaged replica and mount it in AFS. [Thu Mar 5 14:04:17 2015] I could vos dump | vos restore at a different name, but I was wondering if there was a faster way. [Thu Mar 5 14:06:28 2015] (We had some disk corruption underlying our big AFS server; it's so sad.) [Thu Mar 5 14:07:01 2015] Is this the one using ceph for the underlying storage? ;) [Thu Mar 5 14:08:17 2015] While it happens to be, the same question would arise if there were corruption on any other storage mechanism. The point remains: I have backups. [Thu Mar 5 14:10:01 2015] I suppose I could just remove the RO replica on the main server from the VLDB, so that a read-only mount would visit the other, un-damaged server. [Thu Mar 5 14:12:11 2015] It looks like everything is using the string identifier for volume mounts. [Thu Mar 5 14:12:32 2015] OK! Removing the site it is then. [Thu Mar 5 14:12:34 2015] Thanks. [Thu Mar 5 14:15:44 2015] you can mount by volume id. 
[Thu Mar 5 14:16:13 2015] "How many places in the documentation need updating about that?" [Thu Mar 5 14:16:14 2015] however, the volume ID is the same for all replicas so that won't make a difference to the client's strategy [Thu Mar 5 14:16:26 2015] Well, thus "make a roclone" with a different ID. [Thu Mar 5 14:16:26 2015] I don't recommend that you do it [Thu Mar 5 14:16:40 2015] I shan't; removing the damaged RO site is fine. [Thu Mar 5 14:16:44 2015] then just make a clone and give it a new name [Thu Mar 5 14:17:27 2015] Oh, huh; for some reason I thought the lightweight clones had to all be under the same VLDB entry. Sorry! [Thu Mar 5 14:17:54 2015] However, for my own curiosity's sake, how *do* you mount by volume ID? [Thu Mar 5 14:18:07 2015] use the number [Thu Mar 5 14:18:56 2015] (For clarity: I am on a phone call right now, so I was only looking at documentation, not code.) [Thu Mar 5 14:19:00 2015] vos exa and vos exa do exactly the same thing. they take a string provided and send it to the location service [Thu Mar 5 14:19:08 2015] Oh, hah. OK. [Thu Mar 5 14:19:31 2015] having read and modified that code I would assume that you know that [Thu Mar 5 14:21:21 2015] I try not to keep too much of volser/ loaded into my head, and the vos-each code just followed along the other users of the query API. [Thu Mar 5 15:43:12 2015] keeping too much of openafs source in your head hurts [Thu Mar 5 16:08:04 2015] just curious - would the vos convertROtoRW command not be appropriate for the discussed issue? [Fri Mar 6 12:07:52 2015] hi folks [Fri Mar 6 12:08:11 2015] I'm on the openafs mailing list and have noticed some discussion on filedrawers. [Fri Mar 6 12:09:31 2015] Am I right that filedrawers is only for mosaic afs files system? 
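To make "mount by volume ID" concrete: a mount point (as created with `fs mkmount -dir <dir> -vol <volume>`) stores a string of the form `#cell:volume.`, or `%cell:volume.` for a read/write mount point, with the `cell:` part optional. As noted above, the volume part is just a string handed to the location service, so a number can stand in for a name. The parser below is a minimal illustrative sketch of that format; the function name and the all-digits heuristic are assumptions, not OpenAFS code, and it accepts the string with or without the trailing dot.

```python
from typing import NamedTuple, Optional


class MountTarget(NamedTuple):
    read_write: bool        # '%' = read/write mount point, '#' = regular
    cell: Optional[str]     # None when no "cell:" prefix is present
    volume: str             # volume name, or a numeric volume ID
    looks_like_id: bool     # heuristic only: all digits


def parse_mount_target(target: str) -> MountTarget:
    """Parse a mount point string such as '#csail.mit.edu:root.cell.'."""
    if len(target) < 2 or target[0] not in "#%":
        raise ValueError(f"not an AFS mount point string: {target!r}")
    body = target[1:]
    if body.endswith("."):
        body = body[:-1]  # the stored form ends with a dot; tolerate its absence
    cell, sep, volume = body.rpartition(":")
    if not volume:
        raise ValueError(f"missing volume in {target!r}")
    return MountTarget(
        read_write=target[0] == "%",
        cell=cell if sep else None,
        volume=volume,
        looks_like_id=volume.isdigit(),
    )
```

As the chat points out, an ID is the same lookup key for every replica of the volume, so mounting by number does not change which RO site a client picks; removing the damaged site or cloning under a new name is what changes that.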
[Fri Mar 6 12:09:54 2015] filedrawers is not afs specific [Fri Mar 6 12:10:51 2015] it was developed at umich and provides a web UI for browsing and manipulating a file system from an Apache server [Fri Mar 6 12:11:50 2015] secureendpoints: Is it open source and available? It looks interesting. [Fri Mar 6 12:12:39 2015] it is open source but has not been publicly or centrally maintained. there are several forks and lots of private patches that institutions have made to it to serve their own purposes [Fri Mar 6 12:13:00 2015] I just found the source page. thanks! [Fri Mar 6 12:13:09 2015] you found "a source page" [Fri Mar 6 12:15:14 2015] www.filedrawers.org [Fri Mar 6 12:48:31 2015] johnfg - depending on your OS, you'll prolly need some patches... you can search for "filedrawers patch" and most likely find what you need [Fri Mar 6 12:49:33 2015] tho, the patches may be more for modwaklog than any other part [Fri Mar 6 12:49:48 2015] modwaklog? [Fri Mar 6 12:50:06 2015] modwaklog is what gets the web server an afs token on behalf of the user [Fri Mar 6 12:50:29 2015] tho, just as with non-web afs, modwaklog is used in concert with something that first gets a kerberos ticket [Fri Mar 6 12:51:21 2015] k. should have seen the modw as the prefix to aklog. I thought you were using some unknown acronym. I can't keep up. [Fri Mar 6 12:51:29 2015] :) [Fri Mar 6 13:25:25 2015] secureendpoints: Pretty old code, and RedFyre: I'm not finding any fedora patches :-( [Fri Mar 6 15:47:21 2015] I wonder if we even still use filedrawers [Fri Mar 6 15:47:33 2015] probably, mfile.umich.edu seems to still work [Fri Mar 6 20:55:08 2015] I'd love to get a filedrawers instance up and running [Tue Mar 10 06:11:58 2015] Hello. Is there anyone who could help me solve my AFS problem? Several times a day, the AFS connection hangs for about a minute for no apparent reason.
[Tue Mar 10 07:39:23 2015] kolsi: our psychic advisors are all engaged on other astral planes, and our non-psychic ones can only help if you give more details [Tue Mar 10 07:41:45 2015] I don't know what exact details to give... it is just random behaviour - it works correctly most of the time, then accessing any AFS file/folder completely freezes the application for about 40-60 seconds. [Tue Mar 10 07:42:22 2015] even "fs ..." commands on the command line don't work at that time (they hang too)... it happens on Win7 Pro - OpenAFS 1.7.3100 [Tue Mar 10 07:43:00 2015] I already discussed it with our system admin and we could not find any problem [Tue Mar 10 07:55:43 2015] kolsi: you need network traces then, and/or system logs [Tue Mar 10 07:56:30 2015] kolsi: otherwise guessing at random is a bit difficult. Pauses that long could be due to DB server elections, to DNS issues, to congestion on the file servers, to disk or memory congestion on the local client. [Tue Mar 10 07:57:16 2015] kolsi: or you could wait for the psychic advisors to return to this astral plane :-) [Tue Mar 10 07:59:43 2015] Our AFS admin was checking server logs and he could not find anything wrong in them. [Tue Mar 10 08:00:27 2015] I tried enabling tracing on my PC and watching DebugView but it shows nothing during that freeze.... just as usual, only a long delay [Tue Mar 10 08:19:17 2015] kolsi: long delay *sending* requests or long delays getting *replies* to requests? [Tue Mar 10 08:24:52 2015] if I'm checking the network log in Wireshark, the network communication goes as usual... then it freezes [Tue Mar 10 08:25:33 2015] the weirdest thing is that no OpenAFS operation works during this time... even calling "fs trace -on" hangs [Tue Mar 10 08:29:49 2015] kolsi - try an older/newer version of the windows client and see if the issue goes away [Tue Mar 10 08:30:53 2015] debugview and process monitor will be helpful, too [Tue Mar 10 08:35:33 2015] I have the most recent version of OpenAFS ...
but the problem also appeared on older versions [Tue Mar 10 08:45:28 2015] do all clients freeze at the same time, or diff clients diff times? Etc etc [Tue Mar 10 08:46:54 2015] I don't understand what you mean by "the same time".. because it is completely random... sometimes it works for a whole day... sometimes it freezes several times an hour [Tue Mar 10 08:48:40 2015] well, if you have two windows machines sitting next to each other, when machine 1 freezes, you can check machine 2 [Tue Mar 10 08:49:08 2015] anyway, process monitor would help you to see what's going on on the client during the freeze [Tue Mar 10 08:49:27 2015] the server admins should turn on fileserver auditing so that they can check those logs for when the freeze happens [Tue Mar 10 08:49:43 2015] we weren't able to reproduce the problem on another machine [Tue Mar 10 08:49:50 2015] so, it's just a single machine [Tue Mar 10 08:50:20 2015] I have to catch a bus to work... bbl [Tue Mar 10 08:50:48 2015] same here.. I will check it more later.. anyway thx for help [Tue Mar 10 09:24:03 2015] Hmm, if everything is hanging, that might indicate that something is sleeping with a lock held, or something like that. [Tue Mar 10 09:24:25 2015] I don't remember if cmdebug works against the windows client, but that sort of output might be helpful. [Tue Mar 10 13:16:55 2015] Anyone have a gentoo ebuild for >= 1.6.10? [Tue Mar 10 13:17:19 2015] I'm working with NP-Hardass on this too, but have kinda come to a halt. [Tue Mar 10 13:32:52 2015] johnfg: since the patch files aren't available in upstream gentoo at the moment, just copy the old patch tarball to fit the new version [Tue Mar 10 13:57:02 2015] NP-Hardass: Where would I get that from? [Tue Mar 10 13:59:19 2015] johnfg: cp /usr/portage/distfiles/openafs-SOMEVERSION-patches-1.tar.bz2 /usr/portage/distfiles/openafs-SOMEOTHERVERSION-patches-1.tar.bz2.
The same process that you did for 1.6.9, because the patch tarballs A) Aren't available from gentoo (yet) and B) haven't changed between versions (yet) [Tue Mar 10 14:00:04 2015] johnfg: at this point, I'd say use the 1.6.11 ebuilds not the pre [Tue Mar 10 14:01:39 2015] NP-Hardass: k [Tue Mar 10 14:06:09 2015] NP-Hardass: I copied openafs-1.6.5-patches-1.tar.bz2 to openafs-1.6.11-patches-1.tar.bz2, but am still getting the errors when I manifest. [Tue Mar 10 14:06:51 2015] manifesting for 1.6.11? or 1.6.11_pre2? [Tue Mar 10 14:08:42 2015] 1.6.11. Here's the errors: https://bpaste.net/show/1f1d1d15f71d [Tue Mar 10 14:09:09 2015] And I went right to github to get it. [Tue Mar 10 14:11:06 2015] do you have a 1.6.10 ebuild in the directory? [Tue Mar 10 14:11:48 2015] when you manifest, it generates the manifest for ALL ebuilds in the directory, so it wants the patches for 1.6.10, so just cp that again, or remove the ebuild for 1.6.10 [Tue Mar 10 14:12:15 2015] NP-Hardass: will do [Tue Mar 10 14:14:08 2015] NP-Hardass: Should I remove the Manifest too? [Tue Mar 10 14:14:14 2015] before manifesting again? [Tue Mar 10 14:14:29 2015] not necessary [Tue Mar 10 14:15:40 2015] k [Tue Mar 10 14:15:41 2015] johnfg: to the best of my knowledge, the only reason why you'd do that is if upstream changed a file that you've downloaded. (bad practice anyway) [Tue Mar 10 14:17:39 2015] ya think I should try to emerge openafs first or do openafs-kernel at the same time? [Tue Mar 10 14:23:36 2015] right now, openafs pulls openafs-kernel, so shouldn't matter [Tue Mar 10 14:25:21 2015] k [Tue Mar 10 14:55:29 2015] NP-Hardass: Here's the errors that caused emerge to fail, i.e. the build.log: https://gist.github.com/e4831dd15f9a399813fd [Tue Mar 10 14:56:20 2015] johnfg: okay, gimme a sec to look into that [Tue Mar 10 14:57:37 2015] NP-Hardass: before you look, do you think I should change the kernel config to exclude AFS (which I had marked as an M)? 
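Because generating the Manifest covers every ebuild in the directory, each version needs its own patch tarball, even a stale 1.6.10 ebuild. The copy-the-old-tarball workaround can be sketched as below. This is a minimal sketch assuming the `<package>-<version>-patches-1.tar.bz2` naming quoted in the chat, and it is only valid because, as noted above, the patches had not changed between versions; the function names are hypothetical. Afterwards the Manifest still has to be regenerated (for example with `ebuild openafs-1.6.11.ebuild manifest`).

```python
import re
import shutil
from pathlib import Path
from typing import List


def ebuild_versions(ebuild_dir: Path, package: str = "openafs") -> List[str]:
    """Versions of `package` with an ebuild in `ebuild_dir` (any -rN revision dropped)."""
    pattern = re.compile(rf"^{re.escape(package)}-(?P<ver>\d.*?)(?:-r\d+)?\.ebuild$")
    versions = set()
    for path in ebuild_dir.iterdir():
        m = pattern.match(path.name)
        if m:
            versions.add(m.group("ver"))
    return sorted(versions)  # lexical order; only used for stable output


def fill_missing_patch_tarballs(ebuild_dir: Path, distfiles: Path, template: Path,
                                package: str = "openafs") -> List[Path]:
    """Copy `template` to each patch tarball name an ebuild expects but distfiles lacks."""
    created = []
    for ver in ebuild_versions(ebuild_dir, package):
        target = distfiles / f"{package}-{ver}-patches-1.tar.bz2"
        if not target.exists():
            shutil.copyfile(template, target)
            created.append(target)
    return created
```

Removing unused ebuilds from the directory, as suggested in the chat, is the alternative to copying tarballs for versions you never intend to build.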
[Tue Mar 10 14:57:54 2015] it should be unrelated [Tue Mar 10 14:58:49 2015] k [Tue Mar 10 14:59:28 2015] lemme know if you want to see anything else. I've got about 15 mins before I've gotta leave the office. [Tue Mar 10 14:59:39 2015] johnfg: I'll ping you when I'm done. gentoo council meeting is about to start, so might be a little while [Tue Mar 10 15:06:31 2015] johnfg: ah! I accidentally have "if use linux_kernel" instead of "if use kernel_linux" [Tue Mar 10 15:06:37 2015] updating now. apologies [Tue Mar 10 15:07:10 2015] no worries. which ebuild was that in? [Tue Mar 10 15:07:48 2015] L69 of openafs-kernel-1.6.11 [Tue Mar 10 15:09:24 2015] Do I need to do anything else other than trying the emerge? [Tue Mar 10 15:10:09 2015] That should be the only change necessary [Tue Mar 10 15:10:46 2015] Ok, trying again. [Tue Mar 10 15:10:59 2015] how was the meeting? [Tue Mar 10 15:13:26 2015] https://github.com/NP-Hardass/np-hardass-overlay/commit/ed368532ca484f47968e43ecbd58d7f37a809a9d [Tue Mar 10 15:13:42 2015] gentoo council meeting is ongoing. Quick fix, so I figured I'd do it now for you [Tue Mar 10 15:19:29 2015] I just edited my copy. Should that be ok? [Tue Mar 10 15:22:07 2015] mmhmm [Tue Mar 10 15:22:22 2015] was just showing you the commit+cred for spotting the problem [Tue Mar 10 15:23:04 2015] gotcha. I'm gonna disconnect from tmux, head home for lunch/change clothes, but I'll see if the emerge completes and let you know. [Tue Mar 10 15:23:10 2015] Thanks for the help! [Tue Mar 10 15:31:15 2015] No problem [Tue Mar 10 16:00:00 2015] Got quite a bit further before the emerge failed: https://gist.github.com/564d77fa090715a79598 [Tue Mar 10 17:40:07 2015] johnfg: that's from you missing files from the files directory in my overlay [Tue Mar 10 21:06:06 2015] NP-Hardass: Just got home. What files are missing? [Tue Mar 10 21:07:29 2015] johnfg: you'll want your files dir to match the one in the overlay.
the error log specifically says "files/tmpfiles.d/openafs-client.conf" [Tue Mar 10 22:02:41 2015] NP-Hardass: Ok, just got them and will move them. files dir under openafs or openafs-kernel? [Tue Mar 10 22:07:49 2015] openafs-kernel [Tue Mar 10 22:08:01 2015] IIRC [Tue Mar 10 22:08:08 2015] idk, let me look [Tue Mar 10 22:08:39 2015] openafs/files/ not kernel [Tue Mar 10 22:35:50 2015] k [Tue Mar 10 22:36:31 2015] Should doing the manifest on the ebuild itself actually pull in those files? Is that the way ebuilds are supposed to work? [Tue Mar 10 22:36:40 2015] I haven't done enough of them to know. [Tue Mar 10 22:39:25 2015] "Pull in those files" from where? they are supposed to be in the files dir, not an external location for the system to grab. you just need to make sure to grab the files directory whenever you grab from an external overlay [Tue Mar 10 22:42:41 2015] NP-Hardass: k, good to know [Tue Mar 10 22:43:30 2015] btw...is there any git cmd that works in conjunction with an overlay? [Tue Mar 10 22:45:11 2015] I realize I have to manifest again after adding that file. [Tue Mar 10 22:46:18 2015] I don't understand your question [Tue Mar 10 22:48:27 2015] When I've built a kernel from the latest source, and am doing a pull, it gets pretty much everything. Probably just that I'm not as familiar with ebuilds and overlays. [Tue Mar 10 22:49:20 2015] doing the build now. [Tue Mar 10 23:47:29 2015] ok, I think it built successfully, finally (not you of late, me). [Tue Mar 10 23:48:21 2015] At the moment, I'm running 3.14.16, but have got 3.19.1 eselected, and so that's what I built against. [Tue Mar 10 23:48:36 2015] yep [Tue Mar 10 23:48:54 2015] Do both 1.6.9 and 1.6.11 exist at present? [Tue Mar 10 23:49:18 2015] what do you mean? [Tue Mar 10 23:49:20 2015] And are they only available, as I built them, for those respective kernels? 
[Tue Mar 10 23:50:48 2015] i.e., rxdebug localhost 7001 -version shows: AFS version: OpenAFS 1.6.9 built 2014-10-21 [Tue Mar 10 23:51:30 2015] Can I test openafs-1.6.11 while running 3.14.16? [Tue Mar 10 23:51:59 2015] I'm using 1.6.11 on 3.16 right now, it works with old kernels, you just need to build it for the old kernel [Tue Mar 10 23:52:50 2015] So, if I eselect 3.14.16, and emerge openafs-1.6.11 it will replace 1.6.9? [Tue Mar 10 23:53:01 2015] yeah [Tue Mar 10 23:53:26 2015] So I can't test this out as it is, until I boot 3.19.1? [Tue Mar 10 23:53:47 2015] as is? sure, you built the module for 3.19 [Tue Mar 10 23:54:05 2015] you could compile the module for your current kernel, and just load it [Tue Mar 10 23:55:24 2015] I think I'll try first with 3.19.1. Gonna reboot, so ttyl! thanks for all the help! [Tue Mar 10 23:55:35 2015] np [Tue Mar 10 23:56:11 2015] I'm doing this via ssh, so if I don't boot, I'll have to fix things tomorrow at the office. [Tue Mar 10 23:56:32 2015] if *it* doesn't boot, 3.19.1, I'll fix it tomorrow. [Tue Mar 10 23:56:36 2015] well, your kernel works, right? [Tue Mar 10 23:56:44 2015] Supposedly. [Tue Mar 10 23:56:50 2015] lol [Tue Mar 10 23:57:07 2015] you ever get your problems worked out? [Tue Mar 10 23:57:14 2015] with the latest kernel? [Tue Mar 10 23:57:33 2015] No, I'm waiting for nvidia to update their drivers [Tue Mar 10 23:58:01 2015] k, have a good night! [Tue Mar 10 23:58:26 2015] ooo! [Tue Mar 10 23:58:50 2015] speaking of, just synced, and looks like they updated nvidia drivers, and I should be good to go with them for 3.19 [Wed Mar 11 05:01:11 2015] yesterday, I was trying to solve my problem with OpenAFS freezing randomly... It's starting to get really annoying... I have a DebugView log of this situation but I don't know how to use it.
[Wed Mar 11 05:01:43 2015] the points of "freeze-start" and "freeze-stop" are visible in it [Wed Mar 11 05:02:46 2015] I could paste the roughly 20 relevant lines of this log here, but I don't want to spam this chat... [Wed Mar 11 05:03:06 2015] is there no issue with a firewall/NAT breaking the UDP packets, or an MTU issue? [Wed Mar 11 05:23:51 2015] hard to say... this is a local university network, and I only have the default Win7 firewall, and network communication in Wireshark looks ok. [Wed Mar 11 05:26:01 2015] there are these items in this log: [Wed Mar 11 05:26:03 2015] ProcessRequest Processing AFS_REQUEST_TYPE_RELEASE_FID Index 000948D9 [Wed Mar 11 05:26:09 2015] ProcessRequest Not responding to async Index 000948D9 [Wed Mar 11 05:26:38 2015] These items are not normally there, only when the problem appears [Wed Mar 11 05:28:24 2015] I am not into the code, but that could tell us something. Did you run bos salvage on the fileservers already? [Wed Mar 11 05:28:33 2015] (just to be safe) [Wed Mar 11 05:31:08 2015] I don't have direct access to the servers [Wed Mar 11 05:31:57 2015] you can ask the server admins if they have bos salvaged the servers (to make sure the data on the servers' storage is correct) [Wed Mar 11 05:36:05 2015] I was communicating with the admin and he checked the logs and said that he could not find anything wrong... also it seems that I am the only one with this problem (or rather, nobody else has reported it)
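Since those two messages only show up around a freeze, one way to use the DebugView log is to pull out every occurrence and line it up with Process Monitor and Wireshark timestamps, as suggested earlier in the conversation. The sketch below is an assumption-laden helper, not an OpenAFS tool: the regexes are built only from the two message texts quoted above, any extra columns DebugView prepends are ignored, and the chat does not establish what "Not responding to async" actually means.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Patterns taken from the two message texts quoted in the chat; search() lets
# any prefix columns (sequence number, time, PID) precede them.
PROCESSING = re.compile(
    r"ProcessRequest Processing (?P<type>AFS_REQUEST_TYPE_\w+) Index (?P<index>[0-9A-Fa-f]+)")
NOT_RESPONDING = re.compile(
    r"ProcessRequest Not responding to async Index (?P<index>[0-9A-Fa-f]+)")


def unanswered_async(lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (line number, request type, index) for each 'Not responding' message."""
    request_type: Dict[str, str] = {}
    hits: List[Tuple[int, str, str]] = []
    for lineno, line in enumerate(lines, start=1):
        m = PROCESSING.search(line)
        if m:
            # Remember which request type was last seen for this index.
            request_type[m.group("index").upper()] = m.group("type")
            continue
        m = NOT_RESPONDING.search(line)
        if m:
            index = m.group("index").upper()
            hits.append((lineno, request_type.get(index, "unknown"), index))
    return hits
```

Passing an open text file of the saved DebugView log to `unanswered_async` yields the line numbers to look at; the matching timestamps are what you would hand to whoever reads the fileserver audit logs.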
[Wed Mar 11 06:57:39 2015] The packages mentioned in the Quick start do not exist for 1.6.11 (redhat) [Wed Mar 11 07:00:23 2015] And as before, it complains that I have the wrong kernel-devel version: : kernel-devel-x86_64 = 3.17.4-301.fc21 is needed by openafs-1.6.11-1.fc21.x86_64 [Wed Mar 11 07:00:31 2015] Package kernel-devel-3.18.8-201.fc21.x86_64 already installed and latest version [Wed Mar 11 07:06:42 2015] I tried the jsbillings copr at https://copr.fedoraproject.org/coprs/jsbillings/openafs/ and that is failing with the error: Package: openafs-client-1.6.11-1.fc21.x86_64 (jsbillings-openafs) Requires: openafs-kmod >= 1.6.11 [Wed Mar 11 07:13:30 2015] I'm a little stuck on what to try next. I guess I'll try the non-RPM source. [Wed Mar 11 07:27:22 2015] Huh. Looks like the wiki is more up to date than the quickstart [Wed Mar 11 07:37:16 2015] except now I've built it and done a make install but can't figure out how to start it using systemd [Wed Mar 11 07:45:35 2015] I didn't do a clone from git and the build targets for dist are complaining. Using non-transarc paths and I don't see an /etc/openafs after make install. What am I missing? [Wed Mar 11 09:18:00 2015] foley: what about a /usr/local/etc/openafs ? [Wed Mar 11 09:18:31 2015] The default prefix is still /usr/local IIRC and the etcdir is derived from that unless overridden at configure time. [Wed Mar 11 09:30:47 2015] foley: did you also add the jsbillings-openafs-kmod repo? [Wed Mar 11 09:31:25 2015] it's a separate repo because the kmod-openafs and dkms-openafs packages are built from a different spec file [Wed Mar 11 09:31:51 2015] (ideally because I want to rebuild against every new kernel being released) [Wed Mar 11 18:57:33 2015] To use the jsbillings source builds, what do I need to do? [Wed Mar 11 18:58:07 2015] Is there documentation (or a wiki page) somewhere? [Wed Mar 11 18:58:25 2015] I'd rather use dpms over kmod, but I will use whatever people recommend.
[Wed Mar 11 18:58:58 2015] (dkms, not dpms) [Wed Mar 11 18:59:10 2015] I was just remembering. [Wed Mar 11 18:59:40 2015] cloning the git version in the meantime to see if I have better results building and packaging. [Wed Mar 11 19:00:35 2015] I think dkms is supposed to work from his packages. [Wed Mar 11 19:00:47 2015] From my logs: https://copr.fedoraproject.org/coprs/jsbillings/openafs/ for the client and server packages and https://copr.fedoraproject.org/coprs/jsbillings/openafs-kmod/ for the kmod-openafs and dkms-openafs packages. [Wed Mar 11 19:02:03 2015] Ah, so it has both. I'll give it a shot now. [Wed Mar 11 19:31:38 2015] This is much more promising. Looks like I just have AVC SELinux things to tweak. Thanks for the info! [Wed Mar 11 20:00:48 2015] Odd. I allowed all the AVC things I can understand. I'm seeing errors saying "openafs-client" isn't signed with the proper key. What? [Wed Mar 11 20:17:36 2015] that's weird... [Wed Mar 11 20:17:50 2015] which package? [Wed Mar 11 20:18:16 2015] oh, it's the silly kernel problem again. [Wed Mar 11 20:18:17 2015] * can't upload or sign packages on copr, just tell copr to rebuild. [Wed Mar 11 20:18:31 2015] ah [Wed Mar 11 20:18:37 2015] I only have an openafs.ko for kernel 3.17 again [Wed Mar 11 20:18:47 2015] yeah, I should tell it to rebuild. [Wed Mar 11 20:19:17 2015] one of the biggest limitations of copr (and koji, the buildsystem underneath) is that you can't just rebuild packages and have them all in the same repo [Wed Mar 11 20:19:28 2015] Ah, so what should I do? [Wed Mar 11 20:19:46 2015] you can use the dkms package until I get signed in and get it to rebuild packages [Wed Mar 11 20:19:50 2015] is this for f21? [Wed Mar 11 20:19:55 2015] Yup. [Wed Mar 11 20:20:01 2015] And I was trying to use the dkms package. [Wed Mar 11 20:20:15 2015] oh? that should be sufficient [Wed Mar 11 20:20:37 2015] Though perhaps I did the wrong thing.
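Whether a kmod package was built for the kernel you are actually running (the 3.17-only `openafs.ko` problem above) can be read off the package file name. The sketch below infers the naming from the single rebuilt package that appears later in this log, `kmod-openafs-1.6.11-1.3.18.8_201.fc21.x86_64.rpm`, which targets kernel `3.18.8-201.fc21.x86_64`: a one-number release, then the kernel version with `-` written as `_`, then the architecture. That pattern is an assumption drawn from one example, not from the spec file, and the function names are hypothetical.

```python
import platform
import re
from typing import Optional

# Inferred pattern: kmod-openafs-<version>-<release>.<kernel, '-' as '_'>.<arch>.rpm
KMOD_RE = re.compile(
    r"^kmod-openafs-(?P<version>[^-]+)-(?P<release>\d+)\.(?P<kernel>.+)\.(?P<arch>[^.]+)\.rpm$")


def kernel_for_kmod(filename: str) -> str:
    """Return the `uname -r` string a kmod-openafs RPM appears to target."""
    m = KMOD_RE.match(filename)
    if not m:
        raise ValueError(f"unrecognised kmod file name: {filename!r}")
    return f"{m.group('kernel').replace('_', '-')}.{m.group('arch')}"


def kmod_matches_kernel(filename: str, running: Optional[str] = None) -> bool:
    """True if the RPM targets `running` (default: this machine's kernel release)."""
    return kernel_for_kmod(filename) == (running or platform.release())
```

A mismatch is the situation in which the dkms package, which builds the module locally, is the fallback suggested in the chat.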
I added your openafs and openafs-kmod to yum.repos.d [Wed Mar 11 20:20:48 2015] yup [Wed Mar 11 20:21:24 2015] Then "yum -y install openafs-client openafs-krb5 dkms-openafs" [Wed Mar 11 20:21:53 2015] Everything looked good except for SELinux errors, which I think I mostly fixed. After a reboot, it's not finding the kernel module. [Wed Mar 11 20:22:04 2015] hmmm [Wed Mar 11 20:22:17 2015] dkms might not have succeeded. you might want to check its logs [Wed Mar 11 20:22:38 2015] Which log? [Wed Mar 11 20:23:06 2015] I think it's /var/log/dkms.log or something like that. let me look [Wed Mar 11 20:23:17 2015] or maybe /var/lib/dkms/openafs/version/ [Wed Mar 11 20:23:19 2015] I have no such log [Wed Mar 11 20:23:47 2015] ah yes, I do not have a folder matching the current kernel version [Wed Mar 11 20:25:57 2015] so, quite often in fedora, openafs's kmod stops building because of changes in the kernel [Wed Mar 11 20:26:06 2015] so it could be that the latest kernel breaks something again [Wed Mar 11 20:26:09 2015] how do I force dkms to make the module or at least figure out what happened? [Wed Mar 11 20:26:23 2015] I have a 1.6.11-1.fc21 but not a folder matching my kernel. [Wed Mar 11 20:26:42 2015] dkms --verbose install -m openafs -v $VERSION [Wed Mar 11 20:26:59 2015] $VERSION is whatever is at /usr/src/openafs-$VERSION [Wed Mar 11 20:30:05 2015] Well, that fixed it. [Wed Mar 11 20:30:20 2015] So, something on my system prevented dkms from running [Wed Mar 11 20:30:41 2015] there are typically logs kept [Wed Mar 11 20:32:07 2015] Which ones would you suggest? [Wed Mar 11 20:32:46 2015] in /var/lib/dkms somewhere. I don't have a box in front of me that has it so I can't say exactly [Wed Mar 11 20:32:54 2015] I'd like this to not fail horribly when the kernel gets upgraded. [Wed Mar 11 20:33:01 2015] I did kick off a new build of the kmod [Wed Mar 11 20:33:04 2015] I'll go look around there. Thanks for helping me get this up!
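The manual fix above can be turned into a check to run after a kernel upgrade: for every source tree under `/usr/src/openafs-<version>`, see whether `/var/lib/dkms/openafs/<version>/<kernel>` exists and, if not, emit the `dkms --verbose install` command from the chat. This is a minimal sketch assuming that directory layout (consistent with the "no folder matching the current kernel version" symptom); the `-k <kernel>` argument is the standard dkms option for naming the target kernel and is an addition to the chat's command. `dkms status` reports the same information if you prefer a built-in view.

```python
import platform
from pathlib import Path
from typing import List, Optional


def dkms_source_versions(src_root: Path = Path("/usr/src"), module: str = "openafs") -> List[str]:
    """Versions with a DKMS source tree at <src_root>/<module>-<version>."""
    prefix = f"{module}-"
    return sorted(p.name[len(prefix):] for p in src_root.glob(prefix + "*") if p.is_dir())


def missing_dkms_installs(kernel: Optional[str] = None,
                          src_root: Path = Path("/usr/src"),
                          dkms_root: Path = Path("/var/lib/dkms"),
                          module: str = "openafs") -> List[List[str]]:
    """dkms commands for each source version not yet built for `kernel`."""
    kernel = kernel or platform.release()
    commands = []
    for version in dkms_source_versions(src_root, module):
        # Assumed layout: /var/lib/dkms/<module>/<version>/<kernel>/...
        if not (dkms_root / module / version / kernel).is_dir():
            commands.append(["dkms", "--verbose", "install",
                             "-m", module, "-v", version, "-k", kernel])
    return commands
```

Printing the result rather than running it keeps the sketch side-effect free; the commands themselves need root.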
[Wed Mar 11 20:33:12 2015] I have fedora 21 on my workstation at work, using dkms [Wed Mar 11 20:33:21 2015] and I've not had any problems [Wed Mar 11 20:33:25 2015] $HOME in afs even [Wed Mar 11 20:33:41 2015] I just needed the use_nfs_homedirs selinux boolean toggled [Wed Mar 11 20:34:08 2015] er, use_nfs_home_dirs [Wed Mar 11 20:34:24 2015] Huh. I got a lot of avc errors about trying to do things in the cache partition which is unlabeled. [Wed Mar 11 20:34:33 2015] which cache partition? [Wed Mar 11 20:34:39 2015] /var/openafs/cache [Wed Mar 11 20:34:55 2015] oh? Hmmm. I chose that specifically because it was supposed to be right [Wed Mar 11 20:35:18 2015] oh, crap, no, it's supposed to be /var/cache/openafs [Wed Mar 11 20:35:26 2015] did I include the wrong path in a config file? [Wed Mar 11 20:36:07 2015] I found logs, but they are for each of the builds. I'm not finding a general log and No! typo on my part. It is 0:36 here [Wed Mar 11 20:36:19 2015] it is /var/cache/openafs you are right [Wed Mar 11 20:36:51 2015] 'matchpathcon /var/cache/openafs' shows that the default policy has it as afs_cache_t [Wed Mar 11 20:37:05 2015] But for some weird reason I got avc errors. I had to set rules for all file and directory operations for afs in the cache [Wed Mar 11 20:37:16 2015] hmmm. [Wed Mar 11 20:37:29 2015] did you try running a restorecon on /var/cache/openafs first? [Wed Mar 11 20:37:38 2015] it certainly shouldn't require custom rules [Wed Mar 11 20:37:43 2015] No, would that be useful with all of the custom rules? [Wed Mar 11 20:37:57 2015] I'm happy to test whatever you think would be helpful [Wed Mar 11 20:38:09 2015] if the rules changed the label definition, no [Wed Mar 11 20:38:32 2015] (I have never used selinux before today) [Wed Mar 11 20:38:36 2015] but if you run it, run a restorecon -r -v /var/cache/openafs. 
If it spits out a bunch of relabel lines, you'll know it is fixed [Wed Mar 11 20:38:54 2015] not sure how it got to be unlabeled or mislabeled [Wed Mar 11 20:39:04 2015] Yeah, a bunch of lines appeared [Wed Mar 11 20:39:16 2015] good, you shouldn't need your custom policy rules then [Wed Mar 11 20:39:16 2015] How do I get rid of all the custom rules to see if it is fixed? [Wed Mar 11 20:40:12 2015] semodule -e whateveryounamedyourpolicyfile [Wed Mar 11 20:40:48 2015] so if you created a localpolicy.pp file, run semodule -e localpolicy [Wed Mar 11 20:42:18 2015] Looking good. [Wed Mar 11 20:42:34 2015] FWIW, rebuilt with latest kernel [Wed Mar 11 20:42:58 2015] kmod-openafs-1.6.11-1.3.18.8_201.fc21.x86_64.rpm [Wed Mar 11 20:43:12 2015] I don't know if you have been writing this up somewhere, but I'm trying to take notes on our wiki at https://samvinna.ru.is/projects/projects/devnet-documentation/wiki/AFS_Client_Installation [Wed Mar 11 20:43:50 2015] Since this is a little trickier than just saying "yum install openafs-client etc." [Wed Mar 11 20:44:04 2015] oh [Wed Mar 11 20:44:26 2015] eventually, we hope to have openafs packages built for CentOS automatically in their Storage SIG [Wed Mar 11 20:44:43 2015] for centos 7 at least [Wed Mar 11 20:44:54 2015] centos6 packages will continue to be built upstream [Wed Mar 11 20:44:59 2015] (last I heard) [Wed Mar 11 20:45:56 2015] http://cbs.centos.org/koji/packageinfo?packageID=453 [Wed Mar 11 20:47:23 2015] I need to either get someone to figure out how to make it rebuild for new kernels or convince some bot someplace to do it for me [Wed Mar 11 20:48:03 2015] I'll need to repeat this procedure on some RHEL7 hosts, I haven't investigated that yet. [Wed Mar 11 20:48:47 2015] I build epel7 packages in copr too, but really, I think centos should be the proper place to have them built, even for RHEL7 hosts.
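The SELinux fix that worked here (restore the policy's label on the cache directory instead of writing allow rules) can be collected into one dry-run helper. One correction to carry along: in `semodule`, `-e` enables a module, `-d` disables it and `-r` removes it, so dropping a custom module built as `localpolicy.pp` is `semodule -r localpolicy`; the sketch uses `-r`. The module name and the `use_nfs_home_dirs` boolean come from the chat; the helper names and the dry-run default are assumptions, and the real commands need root.

```python
import subprocess
from typing import List, Optional


def selinux_fix_commands(cache_dir: str = "/var/cache/openafs",
                         custom_module: Optional[str] = None,
                         afs_home_dirs: bool = False) -> List[List[str]]:
    """Commands to check and restore the cache label instead of adding allow rules."""
    commands = [
        ["matchpathcon", cache_dir],            # label the policy expects for the path
        ["restorecon", "-r", "-v", cache_dir],  # relabel recursively, listing each change
    ]
    if custom_module:
        # -r removes a policy module; -d would only disable it, -e enables it.
        commands.append(["semodule", "-r", custom_module])
    if afs_home_dirs:
        # Persistently set the boolean mentioned in the chat for $HOME in AFS.
        commands.append(["setsebool", "-P", "use_nfs_home_dirs", "1"])
    return commands


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands, or execute them (stopping at the first failure)."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Running `restorecon` before removing the custom module keeps the client working throughout; if relabelling prints nothing, the label was already correct and the AVC denials have some other cause.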
[Wed Mar 11 20:49:52 2015] It does not matter to me as long as I can use them :) [Wed Mar 11 20:53:10 2015] when we set out to use centos, we hoped that the buildsystem would maintain versions of kmod-openafs for every kernel released (unlike copr, which only wants to have one, the latest one built) [Wed Mar 11 20:54:18 2015] This kernel dependency thing seems like a real pain, I have to admit. [Wed Mar 11 20:54:30 2015] you've got to believe it [Wed Mar 11 20:54:42 2015] and the hoops I jumped through to get it built right in the buildsystem are crazy [Wed Mar 11 20:54:59 2015] https://github.com/jsbillings/openafs is where the specfiles live [Wed Mar 11 20:55:17 2015] you should check out the find-installed-kversion.sh and see how it's run in the openafs-kmod.spec file [Wed Mar 11 20:55:36 2015] my first job was to make my boss's software in MIT's Network group compile on the many flavors of unix we had at the time, so I can at least imagine. [Wed Mar 11 20:55:46 2015] yeah [Wed Mar 11 20:55:55 2015] I made many ugly autoconf scripts that year. [Wed Mar 11 20:56:00 2015] I started off as a sysadmin at CMU building stuff across a bunch of unixes [Wed Mar 11 20:56:13 2015] It's amazing how fast you learn in such an environment :) [Wed Mar 11 20:56:31 2015] yup. and so many crappy open source projects assumed that awk = gawk and so on [Wed Mar 11 20:56:44 2015] Anyway, I need to clean the dishes and go to bed. Poke me if you want me to be your guinea pig. It's the least I can do for your help. [Wed Mar 11 20:56:57 2015] no problem. just say something here, I idle [Wed Mar 11 20:57:02 2015] Don't forget the different flavors of make! [Wed Mar 11 20:58:09 2015] One last thing -- did you write up anywhere how best to use your copr packages? That's something that might be useful since the Quickstart and wiki are not very helpful in this regard for some Red Hat variants. [Wed Mar 11 20:58:44 2015] I probably should write something, but no.
[Wed Mar 11 20:59:07 2015] I only intended to use copr to jump-start my koji skills [Wed Mar 11 20:59:09 2015] I'm happy to give you access to our wiki :). But it probably makes sense to go on the official openafs wiki, I would assume. [Wed Mar 11 20:59:15 2015] but it's too darn useful [Wed Mar 11 20:59:18 2015] Heh [Thu Mar 12 03:12:07 2015] How do I install the openafs-client on Fedora 21? [Thu Mar 12 03:13:58 2015] I had someone helping me, but he had me use some sort of unofficial package (possibly to get it working with systemd) which broke my Fedora install... [Thu Mar 12 05:37:04 2015] grantwu: I do not know if your site (cmu.edu) has specific documentation for installing OpenAFS on Fedora. I would guess they do because AFS was developed at CMU. The only help I can offer is for a different Linux called "Mageia". The general process will be similar; for Fedora, you will use "yum" instead of Mageia's "urpmi" to install packages. HTH => https://wiki.mageia.org/en/Installing_OpenAFS_Client [Thu Mar 12 09:01:37 2015] grantwu - you can certainly build RPMs yourself from the source... [Thu Mar 12 09:01:43 2015] there's also: From my logs: https://copr.fedoraproject.org/coprs/jsbillings/openafs/ for the client and server packages and https://copr.fedoraproject.org/coprs/jsbillings/openafs-kmod/ for the kmod-openafs and dkms-openafs packages. [Thu Mar 12 09:02:35 2015] it also looks like the openafs.org website has a src.rpm which you should be able to rebuild into binary RPMs [Thu Mar 12 09:02:40 2015] http://www.openafs.org/release/latest.html [Thu Mar 12 09:02:51 2015] So many choices... [Thu Mar 12 09:02:58 2015] choices are good [Thu Mar 12 09:03:05 2015] IIRC the openafs.org one uses the old transarc paths, and the copr one uses FHS paths. [Thu Mar 12 09:03:28 2015] I believe so [Thu Mar 12 09:05:51 2015] * tries a rebuild of the openafs.org .src.rpm [Thu Mar 12 09:14:45 2015] and... success!
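The Transarc versus FHS difference shows up as soon as you look for the client configuration. The cache manager's `cacheinfo` file is one line of the form `mountpoint:cachedir:size`, with the size in 1-kilobyte blocks; with Transarc paths it lives at `/usr/vice/etc/cacheinfo`, while for FHS-style packages such as the copr ones it is assumed here to be `/etc/openafs/cacheinfo` (the chat only confirms `/var/cache/openafs` as the cache directory). The sketch below is a hypothetical helper that reports which layout is installed and where the cache is, which is also the path whose SELinux label matters.

```python
from pathlib import Path
from typing import Iterable, NamedTuple, Optional, Tuple

# Transarc layout first, then the FHS layout assumed for the copr packages.
CANDIDATES = (Path("/usr/vice/etc/cacheinfo"), Path("/etc/openafs/cacheinfo"))


class CacheInfo(NamedTuple):
    afs_mount: str   # where /afs is mounted
    cache_dir: str   # disk cache directory
    size_kb: int     # cache size in 1-kilobyte blocks


def parse_cacheinfo(text: str) -> CacheInfo:
    """Parse the single 'mountpoint:cachedir:size' line of a cacheinfo file."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty cacheinfo")
    parts = lines[0].split(":")
    if len(parts) != 3:
        raise ValueError(f"expected mount:cachedir:size, got {lines[0]!r}")
    return CacheInfo(parts[0], parts[1], int(parts[2]))


def find_cacheinfo(candidates: Iterable[Path] = CANDIDATES) -> Optional[Tuple[Path, CacheInfo]]:
    """Return the first cacheinfo file found and its parsed contents, or None."""
    for path in candidates:
        if path.is_file():
            return path, parse_cacheinfo(path.read_text())
    return None
```

If both files exist, the first match wins, so reorder `CANDIDATES` to prefer the layout you expect.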
[Thu Mar 12 09:33:01 2015] hey, someone from CPU [Thu Mar 12 09:33:05 2015] er, CMU [Thu Mar 12 09:33:24 2015] I thought they had switched everyone over to ubuntu there. [Thu Mar 12 09:47:00 2015] ew. The python module for google API client uses the python 'rsa' module, which has known vulnerabilities [Thu Mar 12 09:50:11 2015] this is what I get for agreeing to build a package for it for our web team [Thu Mar 12 14:24:03 2015] billings: I haven't seen anyone using ubuntu here, actually [Thu Mar 12 14:24:18 2015] most machines are RHEL [Thu Mar 12 14:24:52 2015] I'm just trying to get it working for my personal install [Thu Mar 12 14:45:37 2015] * uses openafs on rhel at work and fedora (now 21) at home [Thu Mar 12 15:19:00 2015] huh. Maybe it was SCS that was using ubuntu [Thu Mar 12 19:28:26 2015] billings: I'm a CS freshman, and I haven't seen any Ubuntu installs inside the CS building [Thu Mar 12 19:29:05 2015] I guess I would have expected students to prefer ubuntu for their personal machines, at least. [Thu Mar 12 19:33:46 2015] Hrm, most students don't actually have Linux installed, I think. [Thu Mar 12 19:34:56 2015] Everyone just SSHs into the university servers [Thu Mar 12 19:38:12 2015] Huh, interesting. [Thu Mar 12 19:38:18 2015] "From windows or from OS X?" [Thu Mar 12 19:43:27 2015] Uh, probably 70/30 windows vs OSX [Thu Mar 12 19:46:35 2015] Hrm, maybe 60/40 [Thu Mar 12 19:49:48 2015] wikigazer: Unfortunately, CMU computing services doesn't provide any instructions on installing OpenAFS. All it provides is instructions on how to install something that will boot a virtualized image of linux with afs preinstalled... [Thu Mar 12 19:55:01 2015] grantwu: really? I thought they did, back when I was there [Thu Mar 12 19:55:19 2015] NP-Hardass: Things change :/ [Thu Mar 12 19:57:27 2015] Okay, I'm going to go make another Fedora stick so I can reinstall that [Thu Mar 12 20:00:48 2015] When were you at CMU?
[Thu Mar 12 20:07:39 2015] grantwu: 2007-11 [Thu Mar 12 20:11:32 2015] Hrm, well, I haven't found any instructions provided by Computing Services, but I mean, a lot can change in 3/4 years. [Thu Mar 12 20:11:39 2015] They redid the undergrad curriculum [Thu Mar 12 20:11:55 2015] Well, the introductory curriculum, rather [Thu Mar 12 20:15:56 2015] grantwu: ping the computer club. They are usually helpful with these things. Great group of guys [Thu Mar 12 20:16:02 2015] loool [Thu Mar 12 20:16:14 2015] I'm in cclub xD [Thu Mar 12 20:16:22 2015] haha [Thu Mar 12 20:16:27 2015] They got me into this mess! [Thu Mar 12 20:17:09 2015] so what exactly is the issue? Are you having trouble configuring your client? [Thu Mar 12 20:17:44 2015] I installed it from the copr repo wikigazer listed [Thu Mar 12 20:17:58 2015] Then Fedora stopped booting reliably on the 3.18 kernel [Thu Mar 12 20:18:26 2015] ah [Thu Mar 12 20:19:13 2015] They didn't try to convert you to debian? ^_~ [Thu Mar 12 20:19:27 2015] cclub has always been filled with debian lovers [Thu Mar 12 20:20:02 2015] Well the president-elect recommended Fedora to me, after I ran into issues he said loljk use Debian... [Thu Mar 12 20:22:29 2015] Ugh I need to get to a Linux machine to format this USB drive... okay I give up for now. [Thu Mar 12 20:22:35 2015] kernel problems are a pain [Thu Mar 12 20:23:02 2015] grantwu: oh, are you just booting from a flash drive? [Thu Mar 12 20:23:28 2015] NP-Hardass: No, but I need to install Fedora again, and my laptop doesn't have a CD drive. [Thu Mar 12 20:23:55 2015] The reason why I need to reinstall it was because I tried installing Debian, and the installer wasn't able to pull software from the repos for some reason... [Thu Mar 12 20:24:15 2015] That's a pain [Thu Mar 12 22:05:06 2015] who uses cd drives these days? :) [Fri Mar 13 05:34:12 2015] grantwu: glad to be some help. Did you manage to get OpenAFS installed and working on Fedora?
[Fri Mar 13 07:22:03 2015] grantwu: I don't recommend using the Mageia rpms in Fedora, but you should be able to find the equivalent package names to install with "yum"? [Fri Mar 13 08:18:01 2015] * wonders if the CMU computer club still runs their own cell [Fri Mar 13 08:18:11 2015] I suppose it's probably in the grand.central.org cellservdb [Fri Mar 13 08:19:01 2015] well, it is still browsable! [Fri Mar 13 08:20:39 2015] >club.cc.cmu.edu #Carnegie Mellon University Computer Club [Fri Mar 13 08:20:39 2015] 128.2.204.149 #barium.club.cc.cmu.edu [Fri Mar 13 08:20:39 2015] 128.237.157.11 #sodium.club.cc.cmu.edu [Fri Mar 13 08:20:39 2015] 128.237.157.13 #potassium.club.cc.cmu.edu [Fri Mar 13 08:20:52 2015] ref: http://grand.central.org/dl/cellservdb/CellServDB [Fri Mar 13 08:21:03 2015] um [Fri Mar 13 08:21:38 2015] * nods [Fri Mar 13 08:22:10 2015] i suspect enough knowledge has stayed around to keep it running [Fri Mar 13 14:11:56 2015] To the package maintainer who provides separate client and server packages: would you mind sharing your build scripts? We have one unified package on Gentoo, and I'd like to change that. Would like to see how other distros are doing it. [Sat Mar 14 19:01:54 2015] this will be for *billings* whenever he comes in: with the latest kernel the kmod didn't build successfully, and I can't get a token. [Sat Mar 14 19:10:18 2015] have you considered using memoserv to leave a message? [Sat Mar 14 19:10:59 2015] geekosaur: Never used it. [Sat Mar 14 19:11:20 2015] Should I just do like, /memoserv billings? [Sat Mar 14 19:11:28 2015] dumping it here when he's not in channel isn't very effective [Sat Mar 14 19:11:32 2015] /msg memoserv help [Sat Mar 14 19:11:45 2015] k, thanks. [Sat Mar 14 19:13:36 2015] geekosaur: So, to be sure, I'd do: /msg memoserv send billings? [Sat Mar 14 19:15:02 2015] looks right, yes [Sun Mar 15 19:01:12 2015] wikigazer: Sorry, no, won't be trying until next weekend.
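The CellServDB entries quoted above show the file's simple line format: a line starting with `>` names a cell (anything after `#` is a comment), and each following line gives one database server's IP address with its hostname as a `#` comment. The same format comes up again later in this log for the client (`/usr/vice/etc`) and server (`/usr/afs/etc`) copies. Below is a minimal Python sketch of a parser for that layout, written only against the entries shown here; `parse_cellservdb` is a hypothetical helper, not part of OpenAFS.

```python
def parse_cellservdb(text):
    """Parse CellServDB-style text into {cell: [(ip, hostname), ...]}.

    Assumes the layout seen in the GCO file: '>cell #comment' starts a
    cell block, and 'ip #hostname' lines list that cell's database servers.
    """
    cells = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Cell header: the cell name is everything before the first '#'.
            current = line[1:].split("#", 1)[0].strip()
            cells[current] = []
        elif current is not None:
            # Server line: IP address, then '#hostname'.
            addr, _, comment = line.partition("#")
            cells[current].append((addr.strip(), comment.strip()))
    return cells


sample = """\
>club.cc.cmu.edu #Carnegie Mellon University Computer Club
128.2.204.149 #barium.club.cc.cmu.edu
128.237.157.11 #sodium.club.cc.cmu.edu
128.237.157.13 #potassium.club.cc.cmu.edu
"""

for cell, servers in parse_cellservdb(sample).items():
    print(cell, [host for _, host in servers])
```

A parser like this is handy for diffing two copies of the file (for example a client copy against a server copy) after renumbering servers.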
[Mon Mar 16 00:47:25 2015] Holy cow, the last package for openbsd was openafs 1.4.7 [Mon Mar 16 09:18:52 2015] NP-Hardass: Wheezy is only running 1.6.1. [Mon Mar 16 09:19:55 2015] I've got jessie in a vm, but can't access it right now to tell which client version it runs. [Mon Mar 16 09:38:39 2015] johnfg: I rebuilt a f21 kmod, BTW [Mon Mar 16 09:45:29 2015] Cool! I'll install it later this morning. [Mon Mar 16 10:20:42 2015] jessie has 1.6.9 [Mon Mar 16 10:21:26 2015] sid has 1.6.10, and experimental has 1.6.11preN [Mon Mar 16 10:21:48 2015] johnfg: Booted and tested the kmod on the latest f21 kernel [Mon Mar 16 12:38:08 2015] kaduk_: thanks for the info. [Mon Mar 16 13:22:00 2015] billings: upgraded and working fine! Thanks! [Wed Mar 18 04:45:50 2015] Good day [Wed Mar 18 04:46:12 2015] I'm having an issue since upgrading my fileservers from 1.4.x to 1.6.10 [Wed Mar 18 04:46:20 2015] clients are 1.6.5 [Wed Mar 18 04:46:34 2015] the error message popping up quite often on the client is ENOBUFS [Wed Mar 18 04:47:00 2015] I've read this thread https://lists.openafs.org/pipermail/openafs-info/2014-July/040797.html [Wed Mar 18 04:47:10 2015] which states ENOBUFS == VNOSERVICE [Wed Mar 18 04:48:16 2015] according to the thread, "Since 1.6.5 a change was made to the client to translate the VNOSERVICE error to ETIMEDOUT" [Wed Mar 18 04:48:38 2015] but since my clients are 1.6.5 I'm thinking maybe I'm chasing a red herring here [Wed Mar 18 04:48:41 2015] any thoughts? [Wed Mar 18 04:58:55 2015] any chance to upgrade the client?
[Wed Mar 18 04:59:14 2015] 1.6.11 is more or less recent [Wed Mar 18 06:12:23 2015] Amiga4000: not really, we'd like to use the OS-shipped version [Wed Mar 18 06:12:40 2015] OS being EL6 and a few EL7 [Wed Mar 18 06:13:01 2015] Amiga4000: we used to package ourselves, but don't really have the manpower for that anymore [Wed Mar 18 07:34:39 2015] the 1.6 file server will send VNOSERVICE if, while processing an RPC, it is unable to send data to the client for a period of 60 seconds. This might indicate there is a problem with the storage used for the vice partitions. [Wed Mar 18 07:35:58 2015] The 1.4 file servers would not send this VNOSERVICE error in these situations [Wed Mar 18 07:55:24 2015] secureendpoints: thanks for clarifying this [Wed Mar 18 07:55:39 2015] secureendpoints: this seems to indicate my suspicion was right [Wed Mar 18 07:56:13 2015] secureendpoints: any idea on how I could track down the root cause of that request taking so long? [Wed Mar 18 08:01:09 2015] I don't see anything in the logs of the fileserver at the time ENOBUFS was seen on the client [Wed Mar 18 08:02:56 2015] however I see substantial load on the fileserver [Wed Mar 18 08:08:00 2015] secureendpoints: If I understand correctly, under the same circumstances with a 1.4 server, the user wouldn't see the ERRNO, but the operation would still fail? [Wed Mar 18 08:10:24 2015] with 1.4 file servers the operation would block until the I/O completes. [Wed Mar 18 08:15:51 2015] and with 1.6 it fails [Wed Mar 18 08:16:12 2015] is this parameterizable? [Wed Mar 18 08:19:14 2015] no [Wed Mar 18 08:19:24 2015] you can of course modify the source code [Wed Mar 18 08:19:33 2015] I guess this has been added to avoid a DOS?
[Wed Mar 18 08:21:10 2015] it was added to permit clients to fail over to other file servers instead of getting stuck on a file server that is unable to respond [Wed Mar 18 08:21:59 2015] but it is an indication that there is a problem with the back-end storage for the file server [Wed Mar 18 08:23:11 2015] if the storage is unresponsive then the file server will exhaust its thread pool and end up in a state where it is unable to process calls from clients [Wed Mar 18 08:27:25 2015] secureendpoints: looking at the performance data of the fileserver from an event, the only metric that seemed exhausted was network [Wed Mar 18 08:27:45 2015] disk I/O was pretty low [Wed Mar 18 08:27:51 2015] but net was saturated [Wed Mar 18 08:29:15 2015] moreover, this is an RW volume, so no fallback is possible [Wed Mar 18 08:29:54 2015] the logic is at the rx layer; it has no knowledge of RW vs RO [Wed Mar 18 08:30:19 2015] the trigger is a failure to send data on the connection [Wed Mar 18 08:33:59 2015] secureendpoints: thanks for your insight [Wed Mar 18 16:16:31 2015] woohoo, officially taking over maintainership for Gentoo [Wed Mar 18 16:17:19 2015] NP-Hardass: thanks for your efforts! [Wed Mar 18 16:19:56 2015] jackhill: first update will be a version bump and warning users when they try to use unsupported kernels (because that happens way too often); second update, I am in the process of trying to get our systemd scripts working (since that isn't our default init system); and then after that, I'm undertaking the task of attempting to split into client and server packages, because we only have a combined client/server package [Wed Mar 18 16:23:12 2015] btw, a mod may want to kick cuquitu for being a spambot [Wed Mar 18 16:23:40 2015] hi [Wed Mar 18 16:24:03 2015] heh [Wed Mar 18 16:24:39 2015] looks like a freenoder beat me to it...
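The "ENOBUFS == VNOSERVICE" equivalence from the thread linked above is a numbering collision: ENOBUFS is errno 105 on Linux, and, as I understand it, OpenAFS defines the VNOSERVICE volume error as 105 too. A raw VNOSERVICE handed back to an application as an errno therefore prints as "No buffer space available". The minimal Python sketch below shows that reading; the value 105 for VNOSERVICE is an assumption to verify against your OpenAFS tree, and `describe_raw_afs_error` is a hypothetical helper.

```python
import errno
import os
import sys

# Assumption: OpenAFS defines the volume error VNOSERVICE as 105
# (check src/vol/volume.h in your source tree before relying on this).
VNOSERVICE = 105


def describe_raw_afs_error(code):
    """Describe how a raw AFS error code reads when treated as a local errno."""
    name = errno.errorcode.get(code, "unknown errno")
    return f"AFS code {code} reads locally as {name}: {os.strerror(code)}"


if sys.platform.startswith("linux"):
    # On Linux, ENOBUFS is 105, so VNOSERVICE shows up as ENOBUFS.
    print(describe_raw_afs_error(VNOSERVICE))
```

Errno numbering is OS-specific, so the same raw AFS code would read differently on a non-Linux client; that is one reason translating it to a portable error such as ETIMEDOUT, as the thread describes, makes sense.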
[Thu Mar 19 08:23:17 2015] NP-Hardass: there's a systemd init script in the RedHat packaging area [Thu Mar 19 09:59:09 2015] There's also (different) unit files in the debian packaging [Thu Mar 19 09:59:32 2015] But ISTR that NP-Hardass was the one who filed the bug telling me to put a license statement on those ;) [Thu Mar 19 10:55:58 2015] kaduk_: twas I XD [Thu Mar 19 10:56:44 2015] billings: I'll take a look at those too. Thanks ^_^ [Thu Mar 19 11:13:52 2015] kaduk_: I finally got around to setting up a VM with systemd on it, and I'm attempting to figure it all out. Looking at yours, at other gentoo ones, at whatever ones I find. [Thu Mar 19 11:14:07 2015] *nods* [Thu Mar 19 11:14:19 2015] It's a little weird, but will hopefully make sense fairly quickly [Thu Mar 19 11:15:33 2015] kaduk_: yours, or systemd in general? [Thu Mar 19 11:16:27 2015] in general [Thu Mar 19 11:16:47 2015] Though, I guess there is some particularly strange stuff going on in mine, above and beyond the normal strangeness. [Thu Mar 19 11:18:52 2015] Well, it can only end up for the better, I think. Going through multiple systemd versions and the OpenRC version simultaneously. Hopefully that means that in the end, they all end up better off. [Thu Mar 19 11:19:05 2015] Yup [Thu Mar 19 11:22:20 2015] kaduk_: while I have you... Quick question. Your debian packages have the fileserver dependent on the client. Is there a hard dependency? Or is there another reason why they are dependent? [Thu Mar 19 11:23:01 2015] IIRC, some helper script (either a maintainer script or an RC script) is using the client utilities, but it's been a while since I looked explicitly. [Thu Mar 19 11:24:01 2015] kaduk_: Okay, I'll keep that in mind when I get to that point. Thanks. Appreciate it. [Fri Mar 20 09:35:01 2015] Hi All, I have moved my servers from 192.168.0/24 to a 10.100.2/23 subnet. Now I can no longer get a token. 
I have tried to get a token from clients as well as on the server itself, but no luck so far. I have disabled iptables and rebooted my afs servers, but still no luck. When I do aklog -d, I see "about to resolve name blabla to id in cell blabla" and the next line says error -2. The next line says setting [Fri Mar 20 09:35:02 2015] username to my.username. A tokens command reveals I have no tokens. In /var/log/messages I see "Lost contact with volume location servers 10.100.2.2 in cell mycell (code -2)" and another line mentioning IP 10.100.2.4, which is my second VLDB server. [Fri Mar 20 09:35:44 2015] I'm running CentOS6.6 by the way. [Fri Mar 20 09:35:57 2015] Does anyone have an idea what else I can do to debug? [Fri Mar 20 09:38:32 2015] Do you have NetInfo and/or NetRestrict files on the servers? [Fri Mar 20 09:40:21 2015] Did you change the IP addresses in the server CellServDB file? [Fri Mar 20 09:41:01 2015] no token really means no access to kerberos, not to the db servers, unless kaserver is the kdc [Fri Mar 20 09:41:38 2015] make sure you can get kerberos service tickets from your kdcs [Fri Mar 20 09:41:46 2015] then verify connectivity to the db servers [Fri Mar 20 09:41:54 2015] Isn't "about to resolve name blabla to id in cell blabla" a failure to talk to the dbserver? [Fri Mar 20 09:42:57 2015] yes. I guess it depends on which version of aklog, but using the -noprdb switch will rule that out [Fri Mar 20 09:43:03 2015] True. [Fri Mar 20 09:43:15 2015] And EL6 should hopefully not have a super-broken aklog.
[Fri Mar 20 09:43:23 2015] then verify with udebug whether there is quorum among the db servers [Fri Mar 20 09:43:58 2015] rxdebug to verify connectivity and udebug to verify quorum [Fri Mar 20 09:44:52 2015] my guess is either the server cellservdb info was not updated or the iptables rules were not updated to permit the dbservers to speak to each other [Fri Mar 20 09:45:17 2015] Yeah, server cellservdb would be my guess [Fri Mar 20 09:45:31 2015] and if this is a cell that is exposed to the public Internet then verify the public IP to private IP mappings were updated appropriately [Fri Mar 20 09:45:45 2015] Yes, I changed the IP addresses on all the servers and rebooted them all afterwards, just to be sure. I can do a kinit/kdestroy, then kinit and klist again, and I do get a kerberos ticket. The strange thing is that after a kdestroy/kinit, aklog takes a lot longer than usual. It hangs at "about to resolve ..." and then fails. [Fri Mar 20 09:46:03 2015] I disabled iptables entirely so I think this can't be a firewall issue. [Fri Mar 20 09:46:11 2015] And no, the servers are not connected to a public IP. [Fri Mar 20 09:46:30 2015] What does "changed the IP Addresses on all the servers" mean? [Fri Mar 20 09:47:14 2015] I run afs, afs01 and afs02. I changed all the servers from 192.168.0/24 to 10.100.2/23 [Fri Mar 20 09:47:26 2015] So I changed the local settings, CellServDB and DNS [Fri Mar 20 09:47:38 2015] One CellServDB file per machine, or two? [Fri Mar 20 09:48:23 2015] I changed CellServDB, CellServDB.local and CellServDB.dist, but they are all the same, just listing the IP addresses of my AFS servers [Fri Mar 20 09:48:47 2015] on all servers, that is :-) [Fri Mar 20 09:48:50 2015] /usr/afs/etc or /usr/vice/etc?
[Fri Mar 20 09:48:56 2015] so you did not change the "server" CellServDB files, only the client CellServDB files [Fri Mar 20 09:48:58 2015] /usr/vice/etc [Fri Mar 20 09:49:11 2015] Change /usr/afs/etc/CellServDB as well :) [Fri Mar 20 09:49:20 2015] No, on client and server. But I do get the problem if I do an aklog from the server itself [Fri Mar 20 09:49:22 2015] on all of the db servers and the file servers [Fri Mar 20 09:49:28 2015] yes, on all the servers :-) [Fri Mar 20 09:49:49 2015] aha, I did not change /usr/afs/etc/CellServDB ... let me have a look [Fri Mar 20 09:54:03 2015] Strange, I didn't change /usr/afs/etc/CellServDB (maybe a colleague of mine did), but I see a server in there which only runs a fileserver instance (not vldb); that's not OK, I guess? [Fri Mar 20 09:56:57 2015] And rxdebug gives me this if I run it on afs.mycell.com: [Fri Mar 20 09:57:01 2015] [root@afs ~]# rxdebug -allconnections -servers afs02 [Fri Mar 20 09:57:02 2015] Trying 10.100.2.4 (port 7000): [Fri Mar 20 09:57:09 2015] but that's all. [Fri Mar 20 09:58:43 2015] it will hurt performance to have non-responding servers in the server CellServDB [Fri Mar 20 09:58:54 2015] and I just nmapped port range 7000-7010 from afs to afs02 and I see filtered, not open ... [Fri Mar 20 09:59:00 2015] I guess that's the issue [Fri Mar 20 10:22:18 2015] Fixed ... There were still CellServDB~ (vi backup) files in /usr/vice/etc/ which still had the old IP addresses in them. I removed them and then it was OK. [Fri Mar 20 16:44:57 2015] Well, I dumped Fedora for Debian and now have working OpenAFS :) [Fri Mar 20 16:45:03 2015] Thanks to all who helped [Fri Mar 20 16:45:52 2015] wheezy? jessie? [Fri Mar 20 16:46:02 2015] (I'm just curious; no real need to say.) [Mon Mar 23 13:45:53 2015] Hi, our site has user volumes released nightly, and backup volumes created each week.
I'd like to start using BackupAFS to take vos dumps of the readonly volumes (though it appears to be designed to back up dumps of the .backup volume only). Is my plan reasonable? [Mon Mar 23 13:47:05 2015] (I'm aware that I might need to make a few little source changes to BackupAFS - I'm more interested in whether there is a good reason not to take the vos dumps from the readonly) [Mon Mar 23 13:52:40 2015] ashmc: I cannot speak to BackupAFS, as I don't know what it is, but our cell works exclusively with .readonly volumes and dumps them to archival storage. Take that with a grain of salt since I've been told I'm crazy, but there's at least one other person doing it. :) [Mon Mar 23 14:10:10 2015] ashmc - it's a bit of a hack, but should work... the biggest thing is to make sure to time everything such that you don't end up releasing volumes while the dumps are taking place (especially if you make use of the "incremental" feature of vos dump) [Mon Mar 23 14:11:04 2015] if BackupAFS is a script, then you should be able to edit it to dump the readonlys instead of the backups [Mon Mar 23 14:11:27 2015] of course, for readonlys replicated across servers, you prolly don't want to dump the one on each server [Mon Mar 23 14:14:13 2015] If it helps, BackupAFS is a hack on top of BackupPC that swaps out the file-based transports (rsync, smb) for vos dump and vos backup while keeping the backup scheduling and rotation logic. [Mon Mar 23 14:41:26 2015] " swaps out the file based transports (rsync, smb)" ... I'm not sure what that means [Mon Mar 23 14:55:52 2015] RedFyre: BackupPC is software to do backups on local file systems on remote computers. To do that it uses tools such as rsync or mounting cifs shares. To make it work with AFS, BackupAFS uses vos commands instead. [Mon Mar 23 14:56:08 2015] s/backups on/backups of/ [Mon Mar 23 15:32:34 2015] Okay, thanks. I think I need to do some research into how incremental backups work.
I will probably back up readonly volumes for users and groups, and only back up the .backup volumes for "special" volumes, which aren't auto-released. [Mon Mar 23 15:36:10 2015] ashmc: FWIW, we always do full dumps and push them into bup, which uses a sliding deduplication algorithm, so we really only pay incremental storage costs. /afs/acm.jhu.edu/group/admins.pub/scripts/chicago/ dump-to-bup.sh, dump-all-to-bup.sh, and bup-tools.sh may be of interest. [Mon Mar 23 15:37:07 2015] As might be https://www.acm.jhu.edu/~admins.pub/systems/afs-bup.html [Mon Mar 23 16:19:04 2015] nwf: Cool, thanks! Do you find that doing full dumps all the time takes a while? I'm not sure if all of our fileservers could complete a full dump of all volumes each night. [Mon Mar 23 16:21:33 2015] It does take a while, yes. When things are humming along, a big volume without a lot of changes gets pushed into the archive at ~25MB/sec, which is pretty pathetic for the hardware that it's on, but it's not the worst thing imaginable (it's a lot, lot better than what we were using before). [Mon Mar 23 16:22:06 2015] We only dump weekly, tho', and we have a dedicated dumper server which hosts the RO replicas itself, so it's not been a huge problem for us. [Mon Mar 23 16:22:55 2015] Every now and again the two processes involved -- the dump and the nightly 'vos release' -- slam into each other and some volume misses the weekly archive. I mean to fix it, but it hasn't been a huge deal to date. [Mon Mar 23 16:23:01 2015] ah, I see. Seems pretty neat :) [Mon Mar 23 16:23:03 2015] (It's just a matter of automation.) [Mon Mar 23 16:27:17 2015] (It'd be nice if vos had a "wait for VLDB lock" or "wait for volume lock" option... really I'd just go for "all locking operations spin [slowly!] until the user hits ^C") [Tue Mar 24 12:17:52 2015] nwf: I'd love a vos that just blocked until it was ready. Maybe I should look into wrapping it.
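The "vos that just blocked until it was ready" idea can be approximated from outside with a wrapper that re-runs a command, backing off slowly, while it keeps failing. Below is a minimal Python sketch; `run_with_retry` is hypothetical, not an OpenAFS tool. It keys only on the exit status, because I don't want to assume what text `vos` prints for a held lock. As the next reply in the log points out, a wrapper like this cannot tell a lock from a volume that is busy mid-release, or from a genuine error; it just retries blindly.

```python
import subprocess
import time


def run_with_retry(argv, attempts=5, first_delay=30.0, max_delay=600.0,
                   sleep=time.sleep):
    """Re-run argv until it exits 0, sleeping with capped exponential backoff.

    Returns the CompletedProcess from the last attempt. Any nonzero exit is
    retried, so a held VLDB/volume lock is indistinguishable from a real error.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    delay = first_delay
    for attempt in range(1, attempts + 1):
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode == 0 or attempt == attempts:
            return result
        sleep(delay)
        delay = min(delay * 2, max_delay)
```

A hypothetical use would be wrapping a dump such as `run_with_retry(["vos", "dump", "-id", "user.example.readonly", "-file", "/dumps/user.example"])`. The `sleep` parameter exists only so the backoff can be exercised without real waiting.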
[Tue Mar 24 12:26:03 2015] Not sure wrapping is good enough; volumes could be busy mid-release, for example. [Wed Mar 25 11:59:11 2015] Hey channel; is OpenAFS finicky about which version of Visual Studio is used to build it? [Wed Mar 25 12:00:31 2015] yes [Wed Mar 25 12:01:13 2015] Will it work with one of 2010, 2012, or 2013? [Wed Mar 25 12:01:18 2015] no [Wed Mar 25 12:01:33 2015] those versions do not support building for XP [Wed Mar 25 12:03:13 2015] Hm. Lemme maybe restate my question. I want to build my vos-foreach branch and try to fix whatever's got it down on Windows; I'm not terribly interested in *using* the resulting binaries except to test that they got built correctly, and they're not going to be distributed within our cell. Does that relax the constraints on the VS version? [Wed Mar 25 12:03:50 2015] no [Wed Mar 25 12:04:22 2015] the tree does not build with VS2010 or later because it has dependencies on an SDK for XP which is incompatible with those versions of VS [Wed Mar 25 12:04:27 2015] is that clear? [Wed Mar 25 12:08:35 2015] If you need to build, submit draft changes to gerrit and then manually schedule a build of the draft branch on one of the windows builders [Wed Mar 25 12:11:00 2015] the Hyatt training videos should really be in Spanish. I'm sure most of the staff that is watching it speaks Spanish as their primary language [Wed Mar 25 12:11:50 2015] it's a video explaining how to clean rooms exposed to infectious materials [Wed Mar 25 12:12:28 2015] How do I manually trigger a build? [Wed Mar 25 12:13:29 2015] secureendpoints: how were your last two messages related to the current topic? [Wed Mar 25 12:14:21 2015] they weren't. meant for a different room.
they are relevant to my current context though because I'm sitting in a Hyatt bar waiting for a room to become available [Wed Mar 25 12:15:33 2015] nwf: I updated the README.WINDOWS somewhat recently with the results from my experience building it; you probably want to get VS2005 and the windows SDK/DDK versions listed there. [Wed Mar 25 12:17:15 2015] the openafs buildbot page to permit manual builds appears to be broken [Wed Mar 25 12:17:43 2015] i don't believe we ever claimed everything said here would be relevant to the current topic [Wed Mar 25 12:18:03 2015] what should happen when you go to the builder page (for example, http://buildbot.openafs.org:8010/buildslaves/win2008-amd64) is that a history of prior builds is listed with a form to submit a manual build. [Wed Mar 25 12:18:36 2015] dbrashear: to me, it seemed extremely out of place here [Wed Mar 25 12:18:47 2015] it was [Wed Mar 25 12:19:23 2015] yeah, it was context-free for here. i figured out the context, but .... :) [Wed Mar 25 12:20:17 2015] just as out of context. someone just walked up to me and asked if I have ever seen the Westminster Dog Show. [Wed Mar 25 12:21:25 2015] have you? [Wed Mar 25 12:23:11 2015] considering the dog at my feet has competed in the Westminster Dog Show, I would hope so :-) [Wed Mar 25 14:29:33 2015] secureendpoints: kaduk_: Thanks for your help; I have a copy of VS2005 now. :) [Wed Mar 25 14:36:47 2015] You will also require the 6.0a SDK and the 7600 DDK [Wed Mar 25 15:33:59 2015] (15b5353759984bf0942dff9c3df2959f0a59b405 is the commit I was referring to) [Thu Mar 26 17:33:43 2015] Oh my @&#^&@#@%#&@ I hate Windows so much. I suspect this is just lack of familiarity but man these build tools are offensively opaque. Having bashed my head through several previous barriers, I'm now stuck with 'undefined symbol __imp_printf' errors. [Thu Mar 26 17:33:59 2015] I think I give up.
[Thu Mar 26 17:34:28 2015] The openafs windows build system is not exactly known for its modernity (or sanity). [Thu Mar 26 17:34:42 2015] http://buildbot.openafs.org:8010/buildslaves/win2008-amd64 appears to still be broken (at least, I assume it shouldn't be saying what it's saying...), too. [Thu Mar 26 17:34:53 2015] What version are you trying to build? [Thu Mar 26 17:35:06 2015] HEAD with a cherry-picked vos-foreach. [Thu Mar 26 17:35:26 2015] I would advise trying to build something known to be buildable before trying more exciting things. [Thu Mar 26 17:35:28 2015] Windows being the last builder where vos-foreach failed to compile. [Thu Mar 26 17:35:34 2015] Also, the build dependencies for master and 1.7 are different. [Thu Mar 26 17:35:35 2015] It's failing *well before* my code comes into question. [Thu Mar 26 17:36:10 2015] The buildslave info traceback may be a configuration error of the buildbot instance on the buildslave and not necessarily indicative of other issues. [Thu Mar 26 17:36:45 2015] OK, sure, fine, but secureendpoints said to use that URL to schedule a manual build. [Thu Mar 26 17:37:23 2015] I could just push a no-change commit to gerrit, but that seems rude to all the non-windows builders. [Thu Mar 26 17:37:47 2015] I would not worry about it very much unless there's a queue of changes to build. [Thu Mar 26 17:38:05 2015] But http://buildbot.openafs.org:8010/grid shows none, so... [Thu Mar 26 17:38:36 2015] On the other hand, I'm not sure why the windows builders aren't "waiting"; maybe the buildbot master config has been changed since I was last paying close attention. [Thu Mar 26 17:44:14 2015] Oh well, time for a rebase and push. [Thu Mar 26 17:45:14 2015] I have certainly done my share of debugging-windows-build-issues-via-buildbot. Though I couldn't do that for the rxkad-k5 patch so I had to pull up my own build environment.
[Thu Mar 26 18:09:00 2015] for what it's worth, there is nothing wrong with the windows buildbot builders for openafs, either for master or 1.7 [Thu Mar 26 18:09:33 2015] and I said yesterday that the url that should provide the UI to permit scheduling of manual builds is broken [Thu Mar 26 18:09:54 2015] I no longer have access to the buildbot master, so there is nothing I can do about it [Thu Mar 26 18:11:19 2015] vos-each.c build failure http://buildbot.openafs.org:8010/builders/win2008-amd64-checked-builder/builds/264/steps/compile/logs/stdio [Thu Mar 26 18:15:58 2015] my guess, looking at the headers included within vos-each.c, is that the problem is a failure to include afs/stds.h after roken.h [Thu Mar 26 18:34:26 2015] nwf: your answer is at http://gerrit.openafs.org/#patch,sidebyside,10966,20,src/volser/vos-each.c [Thu Mar 26 18:55:41 2015] secureendpoints: so it should be "#include \n #ifdef AFS_NT40_ENV \n #include \n #endif" ? [Thu Mar 26 18:55:45 2015] Thanks very much for looking. [Thu Mar 26 18:58:44 2015] Er, I mean, the commented-upon block should move up, not that roken.h should move down, right? [Thu Mar 26 19:01:17 2015] when building a windows application/library the very first include must be windows.h. For afs, that rule means the first three include files must be afsconfig.h, afs/param.h and roken.h. [Thu Mar 26 19:01:46 2015] shlapi.h must be included immediately following windows.h and before any CRT headers [Thu Mar 26 19:47:31 2015] Arg, it helps to git add first. _-_ [Thu Mar 26 20:15:25 2015] secureendpoints: Thanks! vos-each is verified! :D [Thu Mar 26 21:23:05 2015] kaduk_: Are you adamant that vos-each not make it into 1.8? It'd be nice to have in some of our automation and I don't really fancy keeping a HEAD build around as a prerequisite; seems mean to my successor(s). [Thu Mar 26 23:00:38 2015] I am not adamant, no. See if you can convince Chas (or someone else) to review it :) [Fri Mar 27 17:36:17 2015] Hm.
I see that 'vos backupsys' falls back to some non-posix regex library when !HAVE_POSIX_REGEX. Should I make 'vos each' do the same? I think that's all that's missing to actually replace the hand-rolled iteration in 'vos backupsys' with 'vos each'; right now, absent this fallback, I don't have a way to vsuEach with a prefix-match requirement. Well, no pleasant way, anyway. [Fri Mar 27 19:13:24 2015] well, you can certainly do a vos list on your own, grab the list of vols, do whatever regex match, then run vos backup on each one [Fri Mar 27 19:16:06 2015] CybrFyre, I think you are missing context. nwf is writing a new tool to replace vos backupsys. See http://gerrit.openafs.org/#change,10966 [Fri Mar 27 19:17:02 2015] Well, I'm not opposed to leaving backupsys as is if that's the best way forward, but it is at present highlighting a deficiency of vsuEach. [Fri Mar 27 19:18:56 2015] nwf: you are referring to src/util/regex.c ? [Fri Mar 27 19:20:36 2015] I'm not sure; vos.c:/^BackSys calls re_comp if !HAVE_POSIX_REGEX, which might be provided by src/util/regex.c if the platform doesn't natively have it? [Fri Mar 27 19:25:02 2015] yes, re_comp() is from src/util/regex.c [Fri Mar 27 19:25:13 2015] That is what is used on Windows. [Fri Mar 27 19:25:34 2015] I have never found a non-GPL version of POSIX regex [Fri Mar 27 19:27:46 2015] oh, ok, something built into vos, then, as compared to a "wrapper" [Fri Mar 27 19:27:52 2015] should be possible to make one from the Spencer v8 regex code though [Fri Mar 27 19:27:54 2015] nwf - awesomeness [Fri Mar 27 19:30:15 2015] secureendpoints: What's wrong with https://github.com/freebsd/freebsd/tree/HEAD/lib/libc/regex ? (I am not a lawyer, so please take that as a very naive question.)
[Fri Mar 27 19:31:31 2015] https://github.com/freebsd/freebsd/blob/e2e2bd33d3725e331eaa5b037da7e071618b8774/lib/libc/regex/COPYRIGHT [Fri Mar 27 19:32:10 2015] hah, which would be based on what I already suggested [Fri Mar 27 19:32:12 2015] I don't know who decided to mix those two licenses but "um, it's special" [Fri Mar 27 19:32:47 2015] interesting combination, yes [Fri Mar 27 19:32:49 2015] Oh boy, that *is* exciting. [Fri Mar 27 19:33:06 2015] I'm surprised the BSD guys are OK with that. [Fri Mar 27 19:33:07 2015] Henry's license cannot be altered. So if the BSD license came first, then Henry's license is invalid, and if the BSD license came second, then that is invalid. [Fri Mar 27 19:33:53 2015] Henry's came first. But Henry's is for the original code which was a reimplementation of v8 regex; the BSD license presumably applies to the changes to make it POSIX regex instead [Fri Mar 27 19:33:57 2015] Based on the copyright dates it looks like the BSD license came second [Fri Mar 27 19:34:12 2015] But you don't really get to do that [Fri Mar 27 19:34:27 2015] You can modify the code under the original license or not at all [Fri Mar 27 19:35:03 2015] That is why OpenAFS wants newly licensed code in a separate source file with a clear license [Fri Mar 27 19:35:06 2015] http://git.musl-libc.org/cgit/musl/tree/src/regex maybe? (Looks MIT license?) [Fri Mar 27 19:35:25 2015] using this is going to require asking license questions of the freebsd folks. better to dig out the original Spencer v8 code and adapt it [Fri Mar 27 19:35:41 2015] nwf: 2-part BSD. That is better [Fri Mar 27 19:36:47 2015] or not. It keeps the advertising clause [Fri Mar 27 19:38:38 2015] It does? [Fri Mar 27 19:42:21 2015] In any case, this is somewhat straying from the main question of "Should I make vsuEach call re_comp and friends when !HAVE_POSIX_REGEX?"
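CybrFyre's outside-of-vos alternative (list the volumes yourself, regex-match the names, then run `vos backup` on each) fits in a few lines. Below is a minimal Python sketch under stated assumptions: it takes the volume names as a plain list rather than parsing `vos` listing output, whose exact format I don't want to assume, and it only builds `vos backup -id NAME` argument lists without running them. `plan_backups` is a hypothetical helper. It uses Python's `re` module, not the POSIX `regcomp` or the `re_comp()` fallback discussed above.

```python
import re


def plan_backups(volume_names, pattern):
    """Return `vos backup -id NAME` argv lists for RW volumes matching pattern.

    `pattern` is applied with re.match, so it is anchored at the start of the
    name (a prefix match unless it begins with '.*'). Names ending in .backup
    or .readonly are skipped, since `vos backup` clones a read/write volume.
    """
    rx = re.compile(pattern)
    plans = []
    for name in volume_names:
        if name.endswith((".backup", ".readonly")):
            continue
        if rx.match(name):
            plans.append(["vos", "backup", "-id", name])
    return plans
```

Running the plan would be a loop such as `for argv in plan_backups(names, r"user\."): subprocess.run(argv, check=True)`. As nwf notes, this is exactly the hand-rolled iteration `vos each` is meant to replace, so it is a stopgap rather than a substitute for a prefix-match option in vsuEach.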
[Fri Mar 27 19:48:25 2015] yes
[Fri Mar 27 19:50:50 2015] advertising is anything that requires the copyright notice and license to be included in all documentation and other materials distributed with binaries
[Fri Mar 27 19:52:49 2015] Is that incompatible with IPL10 or OpenAFS requirements?
[Fri Mar 27 19:53:18 2015] Sorry, I was thinking you meant https://www.gnu.org/philosophy/bsd.html
[Fri Mar 27 21:04:54 2015] it's not incompatible, it's just a nuisance. similar to the problem that the page you referenced describes, the same occurs with the need to document. technically it means that any man page for a command that links to the binary code must include the license text, all documentation must do so, all licenses displayed to end users, all "about" screens, etc. It's just a lot of bookkeeping that is never done properly.
[Fri Mar 27 22:47:13 2015] We could always write to the musl author and ask. :)
[Mon Mar 30 13:07:33 2015] Uh oh. I just realised that BackupAFS's downloads are dated 2010. Is it dead, or do any forks get updates?
[Mon Mar 30 13:10:43 2015] ashmc: It is not dead. My former employer is the developer and still actively uses it. We just didn't have any problems with it that needed fixing.
[Mon Mar 30 13:10:59 2015] If you have improvements, please submit patches!
[Mon Mar 30 13:13:04 2015] We've tested it on at least Ubuntu 12.04 (I can't remember if we did 14.04). If changes are needed to make it work on 12.04 that means that we forgot to push them to SF.
[Mon Mar 30 13:18:09 2015] jackhill: Ah, okay, thanks. I'm probably going to devise some way to back up readonly volumes, and I may want to add an option to prevent backupafs doing the vos_backup/vos_release for a given volume set too. Also I got tripped up by the error messages, so I may try to sort that out in the process.
[Mon Mar 30 13:18:42 2015] There is no public version control tree, is there?
[Mon Mar 30 13:21:57 2015] ashmc: um, it was mostly my boss's baby.
He mostly used SVN, but I'm not seeing it now, so maybe not.
[Mon Mar 30 13:22:05 2015] He's not here, but does read openafs-info.
[Mon Mar 30 13:26:52 2015] jackhill: Okay, I think I'll just make my own repo and if I produce anything that might be publicly useful I'll throw patches at him. Thanks :-)
[Mon Mar 30 13:30:06 2015] ashmc: :)
[Mon Mar 30 15:20:11 2015] Has anyone heard from dhowells recently?
[Mon Mar 30 18:19:17 2015] not on this end
[Wed Apr 1 08:37:26 2015] Hmm. Is there a known issue with lists.openafs.org? I can't read web posts at all
[Wed Apr 1 08:38:39 2015] Also (not sure if this is a new thing) https://docs.openafs.org/AdminGuide/HDRWQ208.html has a certificate hostname error.
[Wed Apr 1 08:47:11 2015] ashmc: seems it's hosted on www.openafs.org, and there isn't a separate cert for docs.openafs.org
[Wed Apr 1 08:47:47 2015] the docs link on the homepage doesn't link to an https:// URL, so I suspect that it's extensions like HTTPS Everywhere that might lead to that error
[Wed Apr 1 09:09:00 2015] It is known that we do not serve reasonable things over https, yes.
[Wed Apr 1 09:44:45 2015] should RxRPC version packets just be ignored if they're not responded to?
[Wed Apr 1 09:54:48 2015] why are you not responding to them?
[Wed Apr 1 09:57:14 2015] version queries should be responded to; otherwise the peer will think the packets have been dropped and they will be resent. version replies, which are often sent just to keep NATs happy, do not get replied to and should be dropped if unexpected
[Wed Apr 1 09:58:30 2015] OpenAFS doesn't seem to respond to them
[Wed Apr 1 09:59:32 2015] rx version queries are what "rxdebug -version" uses to obtain the version string
[Wed Apr 1 10:00:28 2015] when I mount your-file-system.com, I see version requests coming at me
[Wed Apr 1 10:00:44 2015] requests or responses?
[Wed Apr 1 10:01:09 2015] you should be seeing rx version responses
[Wed Apr 1 10:01:42 2015] well, I'm not sending version requests
[Wed Apr 1 10:02:25 2015] which are used to maintain at least one packet every N seconds (as per IETF guidelines) to keep NAT port mappings open
[Wed Apr 1 10:02:57 2015] The receiver of an rx version response does not reply.
[Wed Apr 1 10:03:18 2015] what does an rx version response look like?
[Wed Apr 1 10:03:22 2015] it's like UDP: sure hope you got that reply you asked for
[Wed Apr 1 10:09:21 2015] the type is set to RX_PACKET_TYPE_VERSION (13), the cid, callnumber, seq, serial may all be 0; the epoch in ours is 999; the payload is a single byte. since it's unsolicited in this case, the payload is not expected to be parsed, but the single null byte is not a string anyway
[Wed Apr 1 10:53:02 2015] dbrashear: is that the version request or the version response?
[Wed Apr 1 11:10:12 2015] that's the reply.
[Wed Apr 1 11:12:03 2015] dbrashear: is there a matching request?
[Wed Apr 1 11:14:03 2015] the NAT spoofing sends none, but there is a request
[Wed Apr 1 11:14:11 2015] There can be, but in the case in question the reply is being used as a NAT ping
[Wed Apr 1 11:17:59 2015] specifically because it will be discarded without action
[Wed Apr 1 11:21:08 2015] so the Version packet with a payload of a single 0 byte is the request and a string padded to 64 bytes with NULs is the response?
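The unsolicited version "reply" described above (type 13, epoch 999, cid/callnumber/seq/serial all 0, one NUL byte of payload) can be sketched as raw bytes. This is a minimal sketch, not OpenAFS code: the 28-byte header layout below is our understanding of OpenAFS's `struct rx_header` (five 32-bit words, four 8-bit fields, two 16-bit fields, network byte order) and should be checked against rx_packet.h; leaving flags, userStatus, securityIndex, spare and serviceId at 0 is an assumption, and the helper names are hypothetical.

```python
import struct

RX_PACKET_TYPE_VERSION = 13  # value quoted in the discussion above

# Assumed Rx wire header: epoch, cid, callNumber, seq, serial (32-bit each),
# type, flags, userStatus, securityIndex (8-bit each), spare, serviceId
# (16-bit each); "!" selects network byte order with no padding (28 bytes).
RX_HEADER = struct.Struct("!IIIIIBBBBHH")


def build_version_keepalive(epoch: int = 999) -> bytes:
    """Build an unsolicited Rx version packet of the NAT-ping kind.

    Every header field except epoch and type is 0 (an assumption for the
    fields the discussion does not mention), followed by one NUL byte.
    """
    header = RX_HEADER.pack(epoch, 0, 0, 0, 0,
                            RX_PACKET_TYPE_VERSION, 0, 0, 0, 0, 0)
    return header + b"\x00"


def is_version_packet(packet: bytes) -> bool:
    """The type byte sits right after the five 32-bit header words."""
    return len(packet) >= RX_HEADER.size and packet[20] == RX_PACKET_TYPE_VERSION
```

This only covers the keepalive form; it does not model the version-string reply that "rxdebug -version" receives, whose exact length is exactly what the thread goes on to question.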
[Wed Apr 1 11:47:18 2015] uh, the request packet was never described
[Wed Apr 1 11:47:41 2015] (here)
[Wed Apr 1 11:48:25 2015] i don't know what it is off the top of my head, but the reply should be able to be (and is, in our code) the header plus a single null byte
[Wed Apr 1 11:53:45 2015] dbrashear: "rxdebug grand.mit.edu -version" sends a version packet with a single zero byte and gets back a version packet with a string
[Wed Apr 1 12:04:53 2015] sure, but that you get a string in reply does not mean that a string of that length is the only valid reply
[Wed Apr 1 12:05:20 2015] it is valid. it is not the sole valid reply. is what i am trying to impart, i guess
[Wed Apr 1 12:05:28 2015] I presume this isn't described in docs anywhere?
[Wed Apr 1 12:07:39 2015] not in rx spec from kolya and i know of nowhere else which describes it
[Wed Apr 1 12:08:07 2015] can't say I'm surprised :-)
[Wed Apr 1 12:08:12 2015] nor i
[Wed Apr 1 12:08:17 2015] sadly, yes
[Wed Apr 1 12:08:30 2015] "why do you need documentation when you have the code?"
[Wed Apr 1 12:08:57 2015] licensing, of course
[Wed Apr 1 12:14:46 2015] well, the free as in free rx certainly is compatible enough with this part
[Wed Apr 1 12:16:17 2015] such as the one in Arla?
[Wed Apr 1 12:25:51 2015] i believe the one in arla is derived from it. the one i got from transarc in 1997ish with a letter saying it was free
[Wed Apr 1 14:51:00 2015] dbrashear: [root@carina ~]# rxdebug andromeda -version -port 7001
[Wed Apr 1 14:51:00 2015] Trying 90.155.74.21 (port 7001):
[Wed Apr 1 14:51:00 2015] AFS version: linux-4.0.0-rc6-fsdevel+ AF_RXRPC
[Wed Apr 1 14:52:41 2015] Well, that's a thing.
[Wed Apr 1 14:59:02 2015] unicode-thumbs-up
[Wed Apr 1 16:11:55 2015] oh joy. systemd does *not* work well with AFS homedirs.
[Wed Apr 1 16:12:06 2015] if you want to use systemd --user
[Wed Apr 1 16:12:45 2015] "the debian unit file, or the rpm one?"
[Wed Apr 1 16:13:10 2015] kaduk: are you asking me?
[Wed Apr 1 16:13:32 2015] * is talking about systemd --user and AFS $HOME
[Wed Apr 1 16:13:33 2015] billings: yes
[Wed Apr 1 16:14:17 2015] so
[Wed Apr 1 16:14:25 2015] the afs client works fine
[Wed Apr 1 16:14:32 2015] and I can use afs as my homedir just fine
[Wed Apr 1 16:14:49 2015] but what happens is, when you log in on a system that uses systemd-logind, a systemd --user process is started
[Wed Apr 1 16:15:02 2015] and apparently it either runs in its own PAG or without a PAG
[Wed Apr 1 16:15:42 2015] so I run 'systemctl --user enable emacs.service'
[Wed Apr 1 16:16:16 2015] and it talks to the systemd --user process running under my userid, which then tries to symlink /usr/lib/systemd/user/emacs.service into ~/.config/systemd/user/
[Wed Apr 1 16:16:34 2015] however, the systemd process gets a permission denied
[Wed Apr 1 16:17:09 2015] if I create it manually and log out/back in, it doesn't start
[Wed Apr 1 16:17:17 2015] so I suspect it's being started outside of the login PAG
[Wed Apr 1 16:17:56 2015] this is (obviously) not OpenAFS's fault
[Wed Apr 1 16:18:11 2015] but just in case anyone else asks, it appears to be a bug in systemd.
[Wed Apr 1 16:19:15 2015] I'm pretty certain that no one seriously uses the --user functionality. yet.
[Wed Apr 1 16:22:54 2015] * wonders if re-ordering the PAM modules would help
[Wed Apr 1 16:26:05 2015] * gives it a try
[Wed Apr 1 16:31:56 2015] dhowells: Nice! :)
[Wed Apr 1 16:32:33 2015] nope. systemd --user still doesn't have the ability to enable user units.
[Wed Apr 1 16:33:26 2015] How well is systemd --user going to do if the tokens expire?
[Wed Apr 1 16:33:30 2015] I'm still a little confused about this, because one of the things that seemed to be hanging shutdown was a systemd user session process, even though the user in question thought he was not using systemd --user
[Wed Apr 1 16:34:22 2015] nwf: well, if it were started in the same PAG as the logged-in user, tokens would be renewed when the user renews them for other functions in $HOME
[Wed Apr 1 16:34:49 2015] Right, but we have users who don't notice that their tokens have expired until they try to save a file.
[Wed Apr 1 16:34:56 2015] * nods
[Wed Apr 1 16:34:59 2015] so, same thing
[Wed Apr 1 16:35:22 2015] Well, if systemd --user decides to bail on write errors, for example...
[Wed Apr 1 16:35:33 2015] ... which it probably does...
[Wed Apr 1 16:36:21 2015] I think it's got to be a bug that it isn't being started with the rest of the user processes
[Wed Apr 1 16:36:46 2015] billings: What does 'ps -auxf' show about the process relationship?
[Wed Apr 1 16:36:58 2015] It's possible that systemd is spawning systemd --user in response to an RPC?
[Wed Apr 1 16:37:28 2015] not sure
[Wed Apr 1 16:37:54 2015] I'll need to ask the systemd developers; trying to puzzle this out is giving me a headache
[Wed Apr 1 16:38:42 2015] I fully understand.
[Thu Apr 2 05:27:40 2015] Gerrit seems to be having some trouble: "com.google.gwtorm.client.OrmException: Cannot open database connection"
[Thu Apr 2 05:38:46 2015] I was going to mark http://gerrit.openafs.org/#q,11823,n,z as -2, since I have done nothing but compile-tested it locally, but I think it solves the problem.
[Thu Apr 2 08:46:22 2015] i'm hitting the error too.
[Thu Apr 2 09:42:29 2015] nwf: gerrit should be back up
[Thu Apr 2 09:44:45 2015] meffie: ^^^^
[Thu Apr 2 09:48:16 2015] kaduk: ok, thank you. and thank you for rebasing that commit. i missed that it needed to be rebased.
[Thu Apr 2 09:48:54 2015] I only noticed because I went through and spot-checked the list from two weeks ago
[Thu Apr 2 10:26:05 2015] so, it sounds like systemd --user is started by the init service outside of the startup of a user's session.
[Thu Apr 2 10:26:30 2015] supposedly because it is meant to run once per uid and not once per session
[Thu Apr 2 10:26:47 2015] but this means that it can't touch ~/.config/systemd for users with AFS homedirs
[Thu Apr 2 10:27:25 2015] it's not really a huge issue now. but as systemd, logind and friends grow in popularity, I'm sure we'll start seeing stuff that relies on the user systemd to manage a user's desktop environment somehow
[Thu Apr 2 10:39:31 2015] ugh.
[Thu Apr 2 10:41:12 2015] kill it. kill it with fire.
[Thu Apr 2 11:54:38 2015] ?
[Thu Apr 2 12:01:01 2015] * scrolls
[Thu Apr 2 12:01:15 2015] sigh, sounds like RH *still* doesn't get network auth
[Thu Apr 2 12:01:53 2015] they were getting this wrong back in RH6 (not RHEL, go back a decade or so...), and they still are now
[Thu Apr 2 12:08:20 2015] what now?
[Thu Apr 2 12:10:09 2015] systemd --user pretends that root can always write to the user's homedir
[Thu Apr 2 12:10:15 2015] hm, this is your problem
[Thu Apr 2 12:10:24 2015] ah, yeah
[Thu Apr 2 12:10:40 2015] RH still does not understand that credentials are not per uid but per session
[Thu Apr 2 12:11:01 2015] If I'm not mistaken, this also will affect people using NFSv4 with sec=krb5
[Thu Apr 2 12:11:17 2015] Most likely.
[Thu Apr 2 12:11:44 2015] NFSv4 already suffers from this; see the current discussion on kerberos@mit.edu :)
[Thu Apr 2 12:13:14 2015] (tl;dr: someone using a shared account just discovered that nfs4 uses the first ticket cache for the uid that it finds, so if one session's tickets expire the kerberized mount goes dead for all sessions with that uid)
[Thu Apr 2 12:20:00 2015] yup
[Thu Apr 2 12:20:59 2015] billings: is there a RedHat bug?
[Thu Apr 2 12:28:06 2015] So far, no.
[Thu Apr 2 12:28:22 2015] as far as I know
[Thu Apr 2 12:28:46 2015] I'd like to set up a test environment using only RHEL packages and nfsv4/krb5 $HOME
[Thu Apr 2 12:38:43 2015] indeed
[Thu Apr 2 12:59:52 2015] jsherrill: it was successful so it seems all is well again for the moment.
[Thu Apr 2 13:00:02 2015] oops wrong room :D
[Thu Apr 2 13:41:37 2015] Are there other applications that use Rx?
[Thu Apr 2 13:43:23 2015] there are a number of in-house applications that use rx
[Thu Apr 2 13:50:09 2015] I see. I guess I was curious what other sorts of things it has been useful for.
[Thu Apr 2 17:10:59 2015] Hm; is there an easy way to make gerrit not push things off to the builders if only the commit message has changed? I feel bad burning so much carbon for something that won't change the compilation results... >_>
[Thu Apr 2 17:11:10 2015] I don't know of one.
[Thu Apr 2 17:11:32 2015] If there is a queue, you can (sometimes?) cancel pending builds, but that's a per-builder operation.
[Thu Apr 2 17:11:41 2015] I see.
[Thu Apr 2 17:16:35 2015] nwf: your revised commit messages do not explain the problem sufficiently for someone like me to understand why your change is correct
[Thu Apr 2 17:17:17 2015] secureendpoints: I'm sorry; should I include a trace of the code on a particular query and VLDB entry?
[Thu Apr 2 17:18:13 2015] 11822 for example is a change in behavior. Why is your change better than what was there previously? What is failing? Why is this change in behavior not going to break something relying on the old behavior?
[Thu Apr 2 17:18:47 2015] 11822 is just something that seemed wrong, as I just wrote in the review box.
[Thu Apr 2 17:18:48 2015] Changes in RPC semantics require a new RPC
[Thu Apr 2 17:19:32 2015] Is there actually a spec for what ListAttributesN2 is supposed to do? Because I'm pretty sure its current behavior isn't sane, but if it's the spec, it's the spec.
[Thu Apr 2 17:19:53 2015] I don't think 11822 changes existing semantics.
[Thu Apr 2 17:20:10 2015] 11823 certainly does, but I think in a way that is more obviously what was intended?
[Thu Apr 2 17:20:34 2015] Any RPCs that were added after 1993 were not documented by IBM. The spec is the code.
[Thu Apr 2 17:20:49 2015] But if the RPCs are fossilized, I'll just add RT 131837 to the BUGS section of vos each and die a little bit inside.
[Thu Apr 2 17:22:03 2015] Just because something "seems wrong" is not a good reason to accept a change when there is no test suite to validate the change.
[Thu Apr 2 17:22:03 2015] Anyway, give me a moment and I'll try to write up what's going on in 11823.
[Thu Apr 2 17:22:36 2015] OK, fine, I'll abandon 11822; can I at least add a comment that the missing VLSF_ROVOL check is deliberate and likely to be problematic later?
[Thu Apr 2 17:22:43 2015] I don't know if 11823 can be broken up into "use new structures" and "change behavior" but if that is possible then please do so
[Thu Apr 2 17:23:10 2015] I will take the change but I want a better justification than "seems wrong"
[Thu Apr 2 17:23:13 2015] Not easily, because the structures capture the rollback that needs to happen.
[Thu Apr 2 17:23:32 2015] Well, as I said in gerrit review, I don't think 11822 is observably wrong.
[Thu Apr 2 17:23:54 2015] 11823 is certainly an observable change, by design. Give me a minute and I'll write up a trace.
[Thu Apr 2 17:24:11 2015] Then justify the change with what error would be prevented in the future
[Thu Apr 2 17:24:32 2015] OK.
[Thu Apr 2 17:26:07 2015] For 11823 the commit message should have enough detail to explain how to write a test for an automated test suite. If OpenAFS had such a suite I would ask that the test be written for it along with the change.
[Thu Apr 2 17:27:49 2015] Sure. I am somewhat hopeful that 'vos each' eventually factors into such a test suite.
[Thu Apr 2 17:27:56 2015] (or vsu_Each)
[Thu Apr 2 17:28:22 2015] I want to be able to verify vos each. So vos each cannot be the test suite
[Thu Apr 2 17:29:37 2015] one of the concerns is that vos each needs to provide consistent output even when the db servers have not been consistently upgraded
[Thu Apr 2 17:54:16 2015] Hm. Well, like I said on gerrit, I hadn't done more than compile-test 11823. I think I'm going to abandon that one until I can better characterize things; sorry for the noise.
[Thu Apr 2 20:30:39 2015] Hi!
[Thu Apr 2 20:31:09 2015] It looks like the FreeBSD port got updated recently, which I'm hoping fixed my client-side performance bug.
[Thu Apr 2 20:33:33 2015] My servers are still in the same state they were, so now I'm back to (re)learning openafs. vos partinfo shows the server has two mounts, /vicepa and /vicepb. How do I inspect which filesystems are on those partitions?
[Thu Apr 2 20:40:12 2015] You mean like vos listvol, or something else?
[Thu Apr 2 20:45:02 2015] Ah, yeah that.
[Thu Apr 2 20:45:34 2015] Ah right, partitions contain volumes
[Thu Apr 2 20:55:16 2015] Is it, or not, recommended to run a client on a server?
[Thu Apr 2 20:56:24 2015] It is in general useful to have client functionality available on a server (more for the client utilities than filesystem access, per se), but in the particular case of FreeBSD, the client may not be entirely stable.
[Thu Apr 2 20:57:44 2015] Yeah, I'm hoping to test that stability now. I never could get my OSX client to connect to the FreeBSD server.
[Thu Apr 2 20:57:58 2015] Though, I'll probably try again.
[Thu Apr 2 20:58:25 2015] aklog: unknown RPC error (-1765328228) while getting AFS tickets
[Thu Apr 2 20:58:29 2015] Still the same issues.
[Thu Apr 2 20:58:48 2015] From osx to freebsd, that is
[Thu Apr 2 20:59:37 2015] KRB5_KDC_UNREACH (-1765328228)
[Thu Apr 2 21:04:29 2015] Are you using the native krb5 on the mac?
[Thu Apr 2 22:00:11 2015] Yes, I think so.
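The "unknown RPC error" that aklog printed is a com_err code, and the number alone identifies the error table it came from. A minimal sketch of the usual com_err encoding (each of up to four table-name characters packed into 6 bits, then shifted left by 8 so the low byte is the offset) shows that -1765328228 is entry 156 of the "krb5" table, consistent with the KRB5_KDC_UNREACH name given above. The helper names are hypothetical; this is for reading error numbers, not a replacement for com_err itself.

```python
from typing import Tuple

# com_err alphabet; a table-name character maps to its 1-based index here.
CHARSET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
           "abcdefghijklmnopqrstuvwxyz"
           "0123456789_")


def _to_signed32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def table_base(name: str) -> int:
    """Base code of a com_err table: 6 bits per name character, shifted left 8."""
    num = 0
    for ch in name[:4]:
        num = (num << 6) + CHARSET.index(ch) + 1
    return _to_signed32(num << 8)


def decode(code: int) -> Tuple[str, int]:
    """Split a com_err code into (table name, offset within that table)."""
    unsigned = code & 0xFFFFFFFF
    offset = unsigned & 0xFF
    num = unsigned >> 8
    name = ""
    for shift in (18, 12, 6, 0):
        index = (num >> shift) & 0x3F
        if index:
            name += CHARSET[index - 1]
    return name, offset
```

For example, `decode(-1765328228)` yields the table name "krb5" and offset 156, so the error came from the Kerberos library rather than from an AFS RPC.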
[Thu Apr 2 22:00:28 2015] I have a ticket as well. That was just the output from aklog.
[Thu Apr 2 22:33:30 2015] A TGT, or a service ticket as well?
[Thu Apr 2 22:33:44 2015] Are you willing to say the cell and realm names here?
[Thu Apr 2 22:38:51 2015] The natural next thing to do is to test whether manually getting the service ticket that aklog should be trying to get works. That would be kgetcred or kvno, depending on whether Heimdal or MIT krb5 is in use.
[Thu Apr 2 23:02:12 2015] I do not receive the service ticket on my osx client. From the server itself, I do. I also have a firewall between my client and my server, but all kerberos ports are open. I believe I'm allowing all afs ports as well, but I could use some validation. It was a while ago that I added those rules
[Thu Apr 2 23:04:37 2015] FWIW, I also have a freebsd client in the same network zone as my osx laptop, and it is able to receive a token, and has a service ticket as well, so that should in theory be enough to test that my firewall is passing the right traffic.
[Thu Apr 2 23:04:45 2015] I'm only restricting on subnet, not on host.
[Thu Apr 2 23:05:53 2015] Sounds like it's time to break out tcpdump, then.
[Thu Apr 2 23:05:59 2015] Oh, how do I get admin privileges from the server? There is some dance that I need to do iirc
[Thu Apr 2 23:06:21 2015] yeah, I'll work on the osx client thing later. I just want to validate that I am able to put files in a volume from my freebsd client.
[Thu Apr 2 23:06:45 2015] And then my brain will *probably* be able to turn off this evening before I go to bed.
[Thu Apr 2 23:10:15 2015] I keep getting "bos: running unauthenticated", how do I auth?
[Thu Apr 2 23:10:29 2015] I assume I just need a ticket and a token?
[Thu Apr 2 23:12:05 2015] ls /afs/.realm/ works on my freebsd client! So something is happening.
[Thu Apr 2 23:12:16 2015] At least it shows a couple of volumes in there.
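The manual test suggested above (fetch the AFS service ticket with kvno on MIT krb5 or kgetcred on Heimdal) can be sketched as a small command builder. It assumes the afs/&lt;cell&gt;@&lt;REALM&gt; principal form and, as in the cell discussed here, a realm that is the uppercased cell name; some aklog versions may also try the older afs@REALM form, which this sketch does not cover. `service_ticket_command` is a hypothetical helper.

```python
from typing import List


def service_ticket_command(cell: str, implementation: str, realm: str = "") -> List[str]:
    """Build a command that requests the AFS service ticket aklog would need.

    Assumes the afs/<cell>@<REALM> principal; when no realm is given, the
    realm is taken to be the uppercased cell name.
    """
    principal = "afs/%s@%s" % (cell, realm or cell.upper())
    if implementation == "mit":
        return ["kvno", principal]
    if implementation == "heimdal":
        return ["kgetcred", principal]
    raise ValueError("implementation must be 'mit' or 'heimdal'")
```

If the resulting command fails with the same KDC error that aklog reports, the problem is in the Kerberos configuration rather than in AFS.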
[Thu Apr 2 23:12:24 2015] token should be all that's needed, yes (unless you are using bosserver -noauth, but don't do that).
[Thu Apr 2 23:13:03 2015] From the server itself, I can't get a token. Is that because I don't have afsd running? I only have the server components running on the server.
[Thu Apr 2 23:13:28 2015] aklog: a pioctl failed while obtaining tokens for cell cellname
[Thu Apr 2 23:13:36 2015] is the message
[Thu Apr 2 23:13:56 2015] libafs.ko is not loaded
[Thu Apr 2 23:14:36 2015] (95% likelihood)
[Thu Apr 2 23:14:42 2015] Which is probably loaded as part of starting /usr/local/etc/rc.d/afsd
[Thu Apr 2 23:14:54 2015] It ought to be, yes.
[Thu Apr 2 23:15:55 2015] Indeed, starting afsd on the server loads libafs.ko and I'm able to get a token.
[Thu Apr 2 23:16:04 2015] So it seems that things are working.
[Thu Apr 2 23:16:28 2015] ish. I still can't list the contents of the volumes.
[Thu Apr 2 23:16:43 2015] Oh, on the server you could also do bos whatever -localauth
[Thu Apr 2 23:16:57 2015] Do you get permission denied, or something more exciting?
[Thu Apr 2 23:20:30 2015] Nah, it just hangs
[Thu Apr 2 23:20:38 2015] Operation timed out
[Thu Apr 2 23:20:44 2015] Oh. Hmm.
[Thu Apr 2 23:21:13 2015] Is the cell name the lowercase of the kerberos realm name?
[Thu Apr 2 23:21:21 2015] Yep
[Thu Apr 2 23:21:52 2015] Ooo, with -localauth, I'm able to move the volume around
[Thu Apr 2 23:21:55 2015] That's cool.
[Thu Apr 2 23:22:06 2015] So everything other than the bit where I access files is working :)
[Thu Apr 2 23:22:52 2015] Oh, I can ls /afs/.realm/common/etc
[Thu Apr 2 23:22:58 2015] I don't remember why I created that
[Thu Apr 2 23:23:11 2015] Huh, that's curious.
[Thu Apr 2 23:23:25 2015] I guess procstat -kk would give a trace of what is hanging, which may or may not be helpful.
[Thu Apr 2 23:23:32 2015] Oh, maybe it's just the one volume that's broken.
[Thu Apr 2 23:23:40 2015] fstrace might be more useful, but I don't have much experience with it.
[Thu Apr 2 23:23:44 2015] There is nothing in there; maybe I should just delete it and start over?
[Thu Apr 2 23:23:50 2015] I forget if a broken mountpoint will do that.
[Thu Apr 2 23:24:00 2015] No reason not to 'fs rmmount' and try again, though.
[Thu Apr 2 23:24:42 2015] is -dir relative to / or /afs/.realm or what?
[Thu Apr 2 23:25:00 2015] relative to . unless you specify an absolute path
[Thu Apr 2 23:25:24 2015] Oh, even better
[Thu Apr 2 23:26:38 2015] So now for the test...
[Thu Apr 2 23:27:06 2015] Eh, the bug still seems to be there.
[Thu Apr 2 23:27:08 2015] Bummer
[Thu Apr 2 23:28:10 2015] using rsync to move data into the volume from a client results in an unstable client.
[Thu Apr 2 23:28:31 2015] It writes a few hundred MB of data and then just freezes
[Thu Apr 2 23:28:47 2015] On FreeBSD? Yeah, that is something that happens. I wish I had enough time to dig into it properly :(
[Thu Apr 2 23:28:59 2015] "not production-ready"
[Thu Apr 2 23:29:09 2015] Oh, I was really hoping that the recent update was fixing this issue
[Thu Apr 2 23:29:43 2015] I'll mail you a box of cookies for working on it :)
[Thu Apr 2 23:29:53 2015] Or whatever your flavor of goodie is
[Thu Apr 2 23:30:14 2015] No, I don't think a fix for that is imminent. Some previous investigation I had done made it seem like an rx packet was getting lost or something like that, and I thought about switching to an upcall mechanism like OS X does, but got distracted midway through.
[Thu Apr 2 23:30:35 2015] And it's hard to justify working on freebsd bits when I could be working on rxgk, which benefits everyone and not just freebsd.
[Thu Apr 2 23:30:52 2015] What is rxgk?
[Thu Apr 2 23:31:36 2015] It is cryptography that is newer than 1980
[Thu Apr 2 23:32:00 2015] That does sound valuable.
[Thu Apr 2 23:33:03 2015] Um, so should I give up? I just don't really run any linux.
If the OSX client worked, that would be one thing I suppose, but I primarily run FreeBSD, which I know is not "mainstream" enough to warrant the attention.
[Thu Apr 2 23:33:19 2015] I wish I could send you an intern or something
[Thu Apr 2 23:34:20 2015] This is also entirely personal infrastructure, since professionally I'm bound to linux, but also have no need for afs.
[Thu Apr 2 23:35:59 2015] I think we can get the OS X client working. (Maybe not from me, tonight, as I'm getting tired, but there are other people here and other days.)
[Thu Apr 2 23:36:16 2015] I'm getting tired too.
[Thu Apr 2 23:36:23 2015] Thanks for the troubleshooting though.
[Thu Apr 2 23:36:56 2015] I just really want to use afs, and I was super excited, since it's basically the only clustered filesystem that remotely runs on FreeBSD.
[Thu Apr 2 23:37:00 2015] Sure thing. I feel bad that the freebsd client is still broken, and appreciate the understanding
[Thu Apr 2 23:37:25 2015] Yeah, I know as well as the next person how important prioritizing time is.
[Thu Apr 2 23:38:24 2015] kaduk: Do you think that the freebsd client will ever make the list, or should I be thinking about osx and linux only clients?
[Thu Apr 2 23:38:38 2015] Perhaps an easier question: is there a ticket tracker or something I can watch?
[Thu Apr 2 23:40:12 2015] a freebsd client might make the list. It is mostly in reasonable shape overall, I think; there are just a few (particularly) rough edges.
[Thu Apr 2 23:40:27 2015] Fair
[Thu Apr 2 23:40:59 2015] aklog on osx doesn't even attempt to reach the kdc according to tcpdump :/
[Thu Apr 2 23:41:10 2015] I think that's a problem for another day
[Thu Apr 2 23:41:14 2015] There is a ticket tracker at rt.central.org/rt, but I am not sure that the relevant ticket therein will get traffic in a timely fashion when I do get around to doing something.
[Thu Apr 2 23:41:26 2015] There is aklog -d for a little bit of debug mode
[Thu Apr 2 23:45:24 2015] Doesn't seem to add much info
[Thu Apr 2 23:46:26 2015] *nods*
[Thu Apr 2 23:46:37 2015] Yeah, no attempt to reach the server.
[Thu Apr 2 23:46:52 2015] kdestroy; kinit; aklog
[Thu Apr 2 23:47:05 2015] can see all traffic to the kdc for getting the ticket, nothing further.
[Thu Apr 2 23:50:31 2015] No attempt to reach anything at all, even a different IP address?
[Thu Apr 2 23:50:54 2015] I'm running tcpdump for all known kdc addresses, v4 and v6
[Thu Apr 2 23:51:14 2015] I'm seeing reports of other users seeing this as well.
[Thu Apr 2 23:52:21 2015] The key questions are going to be "what is the krb5.conf" for the OSX system and are there SRV records for the realm
[Thu Apr 2 23:53:13 2015] there we go!
[Thu Apr 2 23:53:18 2015] "Cannot contact any KDC" can simply mean that there are no KDCs known for a given realm
[Thu Apr 2 23:53:20 2015] That was dumb.
[Thu Apr 2 23:53:27 2015] /var/db/openafs/etc/krb5-weak.conf seems to cause the problem.
[Thu Apr 2 23:53:45 2015] Moving the file out of the way means that /etc/krb5.conf is used.
[Thu Apr 2 23:54:09 2015] where did you get the OSX client?
[Thu Apr 2 23:54:15 2015] From the website
[Thu Apr 2 23:54:15 2015] and what version is it?
[Thu Apr 2 23:54:20 2015] How do I tell?
[Thu Apr 2 23:54:51 2015] I think the "no attempt to reach the server" had me confused into thinking that "server" meant "AFS server" and not "KDC".
[Thu Apr 2 23:54:53 2015] fs version
[Thu Apr 2 23:55:16 2015] openafs 1.6.6
[Thu Apr 2 23:55:23 2015] which OSX version?
[Thu Apr 2 23:55:46 2015] kaduk: sorry, I thought I mentioned kdc.
[Thu Apr 2 23:55:54 2015] secureendpoints: Yosemite
[Thu Apr 2 23:56:05 2015] sorry, 10.9.5
[Thu Apr 2 23:56:25 2015] so Mavericks?
[Thu Apr 2 23:57:03 2015] Whatever the latest wolf is. It just says 10.9.5
[Thu Apr 2 23:57:17 2015] 10.9 is Mavericks.
[Thu Apr 2 23:57:22 2015] Or maybe one version older than the newest
[Thu Apr 2 23:57:38 2015] 10.10 (Yosemite) is current
[Thu Apr 2 23:57:53 2015] Okay
[Thu Apr 2 23:58:25 2015] If you are willing to use a non-openafs.org package, you can obtain a newer version of the OpenAFS client from https://www.your-file-system.com/openafs/client-download
[Thu Apr 2 23:59:25 2015] I'd rather run official.
[Thu Apr 2 23:59:42 2015] There is unlikely to be an official release any time soon.
[Thu Apr 2 23:59:53 2015] Why's that?
[Fri Apr 3 00:01:20 2015] because Mavericks and Yosemite require digital signatures, and signed installers must use flat packaging. There is no open source flat packaging, nor is there a legal entity that has approval from Apple to sign kernel extensions and is able to sign on behalf of OpenAFS.org
[Fri Apr 3 00:03:05 2015] Well, I think you can disable the requirement in the OS for signed package installation.
[Fri Apr 3 00:03:13 2015] You did say KDC, yes; I just confused myself is all.
[Fri Apr 3 00:04:23 2015] While an individual can make that choice, most institutions will not. I'm not sure that you can disable the need for signed kernel extensions on Yosemite
[Fri Apr 3 00:04:45 2015] So now that I have a token though, I should be able to browse the node in the Finder, right?
[Fri Apr 3 00:04:58 2015] In any case, the Your File System packages are signed and have a signed kernel extension. You can choose to use them or not.
[Fri Apr 3 00:05:41 2015] Once you have a token, you can authenticate to the file server.
Whether you can browse the contents of the root.cell volume depends on the permissions within the volume
[Fri Apr 3 00:06:19 2015] On a different client, I was able to list the contents of the volume with the same credentials
[Fri Apr 3 00:06:35 2015] then you should be able to from the osx client with that token
[Fri Apr 3 00:07:20 2015] I'll verify what I just said once more
[Fri Apr 3 00:08:01 2015] I get "Operation timed out"
[Fri Apr 3 00:08:08 2015] But only on osx
[Fri Apr 3 00:10:48 2015] Maybe that's a holdover from a previous failure? Is there a way to say "try again"?
[Fri Apr 3 00:14:10 2015] Eh, I'm headed to bed. Goodnight, and thanks for the help.
[Fri Apr 3 21:50:02 2015] Back at it this evening. I've got another mac that I've installed the yfs client on. It looks to be FUSE; I've got tickets and a token. /afs shows up, I can navigate to my realm, but nothing shows up.
[Fri Apr 3 22:43:52 2015] zleslie: I suggest you use wireshark to trace "rx" traffic and turn on the audit logging for the file server and pts server so you can see what requests are being received and under which identity
[Fri Apr 3 22:44:35 2015] OpenAFS doesn't have "WhoAmI" RPCs with which clients could find out what identity the server thinks they are using.
[Sat Apr 4 07:38:38 2015] Hi - I've just got round to upgrading my home AFS to krb5 - followed the instructions linked from the security advisory - all sort of works *except* I'm having issues with the pt database and getting the following in the FileLog:
libprot: could not find entry (getting key from local KeyFile)
libprot: Could not get afs tokens, running unauthenticated
[Sat Apr 4 11:26:15 2015] njd: KeyFile is the non-kerberos key location.
[Sat Apr 4 11:26:51 2015] njd: if it is looking for that, it is not finding the 'rxkad' keytab
[Sat Apr 4 11:32:28 2015] I've placed the rxkad.keytab in the server directory (where the KeyFile was)
[Sat Apr 4 11:33:30 2015] (as per the instructions) - all worked fine until (as the last step) I renamed the KeyFile
[Sat Apr 4 11:37:36 2015] njd: evidently something is missing.
[Sat Apr 4 11:37:49 2015] njd: or not in the right place.
[Sat Apr 4 11:38:18 2015] njd: a little-mentioned detail is that the keytab must not have the DES key(s).
[Sat Apr 4 11:43:27 2015] Walex: I saw that - made sure that there were no DES keys in the rxkad.keytab
[Sat Apr 4 11:44:01 2015] njd: have you restarted the daemons?
[Sat Apr 4 11:48:53 2015] Walex: oh yes! several times, even rebooted the machine
[Sat Apr 4 11:49:32 2015] njd: then 'strace' the PT daemon and find whether it does access the keytab
[Sat Apr 4 11:49:49 2015] njd: the other check is to make sure that the keytab keys have the same kvno on all machines.
[Sat Apr 4 11:50:09 2015] njd: hopefully you haven't extracted a new key on every machine...
[Sat Apr 4 11:50:41 2015] Walex: at the moment this is all on a single machine (it is the household AFS; making the change there before the work one)
[Sat Apr 4 11:51:06 2015] njd: then check that the KVNO is the same in the keytab and the KDB.
[Sat Apr 4 11:52:14 2015] Walex: it is (just checked) - sounds like it is time to become one with the code...
[Sat Apr 4 11:52:51 2015] please confirm that all of the openafs process versions are new enough to support the new functionality
[Sat Apr 4 11:53:32 2015] secureendpoints: it is all openafs 1.6.11-0ppa1~ubuntu14.04.1-debian
[Sat Apr 4 11:59:51 2015] how many entries are in the keytab and which enc types are they?
[Sat Apr 4 12:00:31 2015] I too would confirm with tracing that the keytab is in fact being accessed
[Sat Apr 4 12:01:04 2015] secureendpoints: there are three: aes256-cts-hmac-sha1-96, des3-cbc-sha1, arcfour-hmac-md5
[Sat Apr 4 12:01:23 2015] might be easiest to test by tracing a pts command with -localauth
[Sat Apr 4 12:01:44 2015] only include the aes256 key
[Sat Apr 4 12:02:04 2015] do the same on the kdc
[Sat Apr 4 12:02:33 2015] or at least make sure that the kdc only issues service tickets with the aes256 key
[Sat Apr 4 12:03:02 2015] you might need the other keys to be listed in the kdb in order for the kdc to support des3 and rc4 session keys
[Sat Apr 4 12:04:58 2015] secureendpoints: strace reveals there may be an issue in the format of the CellServDB - let me look at that first
[Sat Apr 4 12:10:14 2015] secureendpoints: looking more deeply the DNS interactions look hinky - more to look at after my tea
[Sat Apr 4 12:15:27 2015] To what degree does OpenAFS support hardened features? I attempted to compile OpenAFS 1.6.11 with hardened linux kernels 3.2, 3.14, 3.18. They all die with "Error: Undefined symbols in modules". Full build log: https://bpaste.net/show/1f4655d3acba Any suggestions on how to approach this (if hardened kernels are even supported)?
[Sat Apr 4 12:16:56 2015] It doesn't
[Sat Apr 4 12:18:16 2015] hardened linux requires that many data structures be initialized by C99 field names instead of explicit structure layouts. Not all OpenAFS platforms support C99.
[Sat Apr 4 12:19:22 2015] secureendpoints: Ah ha. Thank you.
[Sat Apr 4 12:19:48 2015] there might be other issues as well
[Sat Apr 4 12:20:40 2015] Just curious, is C99 support expected in the (near) future?
[Sat Apr 4 12:20:49 2015] no
[Sat Apr 4 12:20:58 2015] it would require dropping platform support
[Sat Apr 4 12:21:17 2015] or adding lots of #ifdef for specific platforms
[Sat Apr 4 12:21:40 2015] I can certainly understand why that would not be desired, in both cases
[Sat Apr 4 12:25:02 2015] there are also upstream dependencies from Heimdal that would need to be fixed for hardened linux
[Sat Apr 4 12:34:19 2015] secureendpoints: removing all the other encryption types except aes256 seems to have done the trick!
[Sat Apr 4 12:40:31 2015] secureendpoints: Thanks again for all of your help.
[Sat Apr 4 19:54:04 2015] The OSX client to a FreeBSD server seems to be working, though afsd client side needed to be started manually with ./launchafs.sh from yfs.
[Sat Apr 4 20:15:54 2015] are you on 10.10 by any chance?
[Sat Apr 4 20:17:06 2015] (spamming in pm, in a pattern seen in other channels)
[Sun Apr 5 18:58:11 2015] I think I have some trouble. When I first set up this cluster, I had created a few nodes in the cluster and moved some volumes around just for testing. Now I've rebuilt a couple of those nodes, and so there is only one system in the cluster. However, this last system still has knowledge of the old servers and I think it's causing some timeouts. How do I clean out all old knowledge of other servers?
[Sun Apr 5 18:58:20 2015] vos listaddrs has way too many addresses, for example.
[Sun Apr 5 20:51:03 2015] zleslie: 'vos listaddrs' showing too many addresses may just mean that there are some stale server entries in the VLDB. If you add -showuuids (that may not 'xactly be the right option) you should be able to see if the addresses are actually registered with your current servers (which shouldn't happen, I think) or are just stale entries.
[Mon Apr 6 10:21:25 2015] zleslie: Since you have only a single server cell, I would simply shut down the cell, delete the VLDB, start the cell, and then use vos sync* commands to rebuild the database.
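The rxkad.keytab problem above came down to two checks: the keytab should hold only the aes256-cts-hmac-sha1-96 key (no DES, des3, or rc4 entries), and its KVNO must match the one in the KDB. A minimal Python sketch of both checks follows. It assumes the keytab has been listed with MIT `klist -k -e`, so entry lines look like `3 afs/example.com@EXAMPLE.COM (aes256-cts-hmac-sha1-96)` with no timestamp column; Heimdal's listing format differs. The principal, the sample text, and the helper names are hypothetical, and the KDB's KVNO is passed in by hand rather than queried.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Assumed entry-line format, modeled on MIT `klist -k -e` output, e.g.
#    3 afs/example.com@EXAMPLE.COM (aes256-cts-hmac-sha1-96)
# Header lines ("Keytab name: ...", "KVNO Principal", "----") do not match.
ENTRY_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\(([^)]+)\)\s*$")

# Per the advice in this log: keep only the aes256 key in rxkad.keytab.
ALLOWED_ENCTYPES = {"aes256-cts-hmac-sha1-96"}


@dataclass
class KeytabEntry:
    kvno: int
    principal: str
    enctype: str


def parse_klist(text: str) -> list[KeytabEntry]:
    """Collect keytab entries from klist-style text, skipping header lines."""
    entries = []
    for line in text.splitlines():
        m = ENTRY_RE.match(line)
        if m:
            entries.append(KeytabEntry(int(m.group(1)), m.group(2), m.group(3)))
    return entries


def check_keytab(entries: list[KeytabEntry], kdb_kvno: int) -> list[str]:
    """Return human-readable problems; an empty list means both checks passed."""
    problems = []
    for e in entries:
        if e.enctype not in ALLOWED_ENCTYPES:
            problems.append(f"{e.principal}: unexpected enctype {e.enctype}")
        if e.kvno != kdb_kvno:
            problems.append(f"{e.principal}: kvno {e.kvno} != KDB kvno {kdb_kvno}")
    return problems


# Hypothetical listing mirroring the three enctypes reported in the log.
sample = """Keytab name: FILE:rxkad.keytab
KVNO Principal
---- --------------------------------------------------------------
   3 afs/example.com@EXAMPLE.COM (aes256-cts-hmac-sha1-96)
   3 afs/example.com@EXAMPLE.COM (des3-cbc-sha1)
   3 afs/example.com@EXAMPLE.COM (arcfour-hmac-md5)
"""

for problem in check_keytab(parse_klist(sample), kdb_kvno=3):
    print(problem)
```

Run against the sample, it reports the des3 and rc4 entries and nothing else, which matches the state of the keytab before the fix in this log.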
[Mon Apr 6 13:43:05 2015] I set the quota for a volume larger than the partition. Now I've filled the partition, and vos listvol no longer shows the volume exists, and I get timeouts on the client.
[Mon Apr 6 13:43:39 2015] I did increase the partition size by 1G just to give breathing room, but it didn't help.
[Mon Apr 6 13:52:30 2015] Oh it came back... I didn't touch it since last night, but it just appeared. I suppose I'll just move it to a larger partition now
[Mon Apr 6 13:54:53 2015] I'll come back to my too many addresses issue tonight
[Wed Apr 8 15:50:37 2015] by the way, a systemd guy asked why we don't use the user keyring instead of the session keyring to store the tokens
[Wed Apr 8 15:51:11 2015] tokens are per-pag, not per-user.
[Wed Apr 8 15:51:34 2015] Well, the default pag is per-user, but pam_afs_session is probably (?) putting the user in a new pag
[Wed Apr 8 15:56:36 2015] The session keyring was pretty much designed for the AFS use case.
[Wed Apr 8 15:57:34 2015] maybe systemd can get tokens for us *eye roll*
[Wed Apr 8 15:58:10 2015] If it gets kafs production ready, would you still complain? ;)
[Wed Apr 8 15:58:26 2015] yes and no
[Wed Apr 8 16:19:02 2015] sxw: they seem to think that it doesn't make sense to use the session keyring, we should use the user keyring. that way cron works too
[Wed Apr 8 16:19:48 2015] I don't really agree, but it'd help if I could come up with arguments as to why using the user keyring is bad
[Wed Apr 8 16:19:54 2015] Tell them that one user can have multiple sessions each with different credentials and privilege levels, and this has been a feature for a very long time.
[Wed Apr 8 16:20:20 2015] yeah. what ben said.
[Wed Apr 8 16:20:35 2015] "so how do i constrain the set of powers i have for just one tty?"
[Wed Apr 8 16:21:04 2015] and if the answer is "with a different uid", then ask how a user can register ephemeral uids
[Wed Apr 8 16:21:18 2015] "with user namespaces, of course"
[Wed Apr 8 16:22:41 2015] thanks
[Wed Apr 8 16:26:24 2015] IIRC there are commands that create new pags and allow users to switch pag explicitly
[Wed Apr 8 16:37:02 2015] red hat still does not understand the purpose, use, or implementation of sessions. surprise
[Wed Apr 8 16:37:21 2015] I remember fighting with them back in the RH6 days, over a decade ago...
[Wed Apr 8 16:38:12 2015] (Obligatory note that Red Hat is not a monolithic entity and contains individuals, some of whom probably do understand the purpose of sessions, even if not all of them do.)
[Wed Apr 8 18:41:33 2015] kaduk: Is there a list of things that keep kafs from being production ready? I've fixed issues as I find them, but a bug list might be nice.
[Wed Apr 8 18:43:02 2015] I am not maintaining a list; there were a couple of MIT folk who tried it out again recently and ran into some big issues, but I don't remember them offhand.
[Wed Apr 8 18:44:37 2015] The one I hit was that any error that missed the tiny subset of UAE that kafs understood marked the volume offline until the cache timed out... a patch has been landed that grows the UAE decoder, which should substantially improve matters.
[Wed Apr 8 18:45:15 2015] It's still far too hard to get a token into the keyring, too; I have a somewhat modified version of dhowells' upioctl wrapper that works with an unmodified aklog, but that's not really ideal.
[Wed Apr 8 18:45:59 2015] I thought cg2v had a proper aklog floating around in andrew AFS space
[Wed Apr 8 18:58:06 2015] That'd be awesome if so.
[Wed Apr 8 19:08:50 2015] kaduk: Is cg2v here? If not, would you mind PMing me the right email address?
[Wed Apr 8 22:01:06 2015] git log says Chaskiel Grundman , which I believe is correct
[Wed Apr 8 22:31:18 2015] shall we gather all the kafs stuff together somewhere?
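The keyring argument above rests on one point: tokens belong to a PAG, not to a uid, so one user can hold different AFS identities in different sessions. The toy Python model below is a hypothetical illustration, not OpenAFS or kernel keyring code; the uid, PAG numbers, cell, and identities are made up. It keys a token store either by PAG or by uid and shows that keying by uid lets a second session's token overwrite the first one.

```python
from __future__ import annotations


class TokenStore:
    """Toy token store (hypothetical; not OpenAFS or kernel keyring code).

    key_by_pag=True models session/PAG-scoped tokens; False models a
    single per-uid store such as a user keyring.
    """

    def __init__(self, key_by_pag: bool) -> None:
        self.key_by_pag = key_by_pag
        self.tokens: dict[int | None, dict[str, str]] = {}

    def _key(self, uid: int, pag: int | None) -> int | None:
        # Per-PAG keying ignores the uid entirely; per-uid keying ignores the PAG.
        return pag if self.key_by_pag else uid

    def set_token(self, uid: int, pag: int | None, cell: str, identity: str) -> None:
        self.tokens.setdefault(self._key(uid, pag), {})[cell] = identity

    def get_token(self, uid: int, pag: int | None, cell: str) -> str | None:
        return self.tokens.get(self._key(uid, pag), {}).get(cell)


def demo(key_by_pag: bool) -> tuple[str | None, str | None, str | None]:
    store = TokenStore(key_by_pag)
    uid = 1000  # one user, two login sessions (PAGs 41 and 42)
    store.set_token(uid, pag=41, cell="example.com", identity="jdoe")
    store.set_token(uid, pag=42, cell="example.com", identity="jdoe.admin")
    return (
        store.get_token(uid, 41, "example.com"),
        store.get_token(uid, 42, "example.com"),
        store.get_token(uid, None, "example.com"),  # e.g. a cron job outside both PAGs
    )


print(demo(key_by_pag=True))   # ('jdoe', 'jdoe.admin', None)
print(demo(key_by_pag=False))  # ('jdoe.admin', 'jdoe.admin', 'jdoe.admin')
```

The same model also shows the trade-off the systemd side raised: under per-PAG keying a process outside the session (the cron case) finds no token, while per-uid keying hands every process the most recently stored identity, including the elevated one.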
[Thu Apr 9 15:20:28 2015] tsk. if Mark Shuttleworth didn't invent it, it is at most advisory. >.>
[Thu Apr 9 15:22:08 2015] this is all Google's fault
[Thu Apr 9 15:22:19 2015] you don't just kill your openid provider
[Thu Apr 9 15:32:34 2015] Why not? It's not like people pay Google to use their services. =P
[Thu Apr 9 15:33:39 2015] It sounds like they are at least going to provide OpenID Connect, right? (seems better than what they did to XMPP)
[Thu Apr 9 16:11:18 2015] yeah, they're switching to whatever the "new" openid thing is
[Thu Apr 9 16:40:28 2015] I thought cg2v had one that did rxkad-kdf as well, but would have to ask him where it may live.
[Thu Apr 9 19:34:28 2015] kaduk: Oh, indeed, it's right next to it (didn't think to check... derp). I've created http://wiki.openafs.org/LinuxKAFSNotes with what I know.
[Fri Apr 10 01:44:39 2015] How can I tell if I'm running dafs or fs?
[Fri Apr 10 01:45:34 2015] Ah, the /usr/local/libexec daemons are actually executed.
[Fri Apr 10 02:17:12 2015] I'm getting "Instance ptserver, temporarily disabled, stopped for too many errors, currently starting up." for ptserver and vlserver on a new fileserver. I've created the /vicepa directory, but since then the daemons don't start correctly.
[Fri Apr 10 02:17:15 2015] Where are the logs?
[Fri Apr 10 02:21:14 2015] /var/openafs/logs/
[Fri Apr 10 02:34:13 2015] zleslie: ptserver and vlserver don't use /vicepa.
[Fri Apr 10 02:34:45 2015] One of them didn't want to start due to the AlwaysAttach file not being present.
[Fri Apr 10 02:34:53 2015] And /etc/hosts was wrong.
[Fri Apr 10 02:34:57 2015] on the new server.
[Fri Apr 10 02:35:17 2015] I could have seen the AlwaysAttach message in a different log though.
[Fri Apr 10 02:35:44 2015] I think it all works now. I just created a volume on the new host and all processes seem to be running.
[Fri Apr 10 02:47:23 2015] AlwaysAttach is a fileserver thing.
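As noted at the end of the log, AlwaysAttach matters to the fileserver only. As I understand OpenAFS's behavior, the fileserver attaches a /vicepX directory that is its own mounted filesystem, and attaches one that sits on the root filesystem only if it contains a file named AlwaysAttach. The Python sketch below mimics that rule to predict which directories would be attached. It is an illustration under that assumption, not the fileserver's logic: "separate filesystem" is approximated with `os.path.ismount`, and the partition-name check is simplified.

```python
from __future__ import annotations

import os
import re

# Simplified partition-name check: "vicep" plus one or two lowercase
# letters (/vicepa, /vicepz, /vicepaa, ...). The real upper bound is not enforced.
VICEP_RE = re.compile(r"^vicep[a-z]{1,2}$")


def would_attach(root: str = "/") -> list[str]:
    """Approximate which /vicep* directories under `root` a fileserver would attach.

    Assumption: a directory qualifies if it is a mount point (approximated
    with os.path.ismount) or contains a file named AlwaysAttach.
    """
    attached = []
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if not VICEP_RE.match(name) or not os.path.isdir(path):
            continue
        if os.path.ismount(path) or os.path.exists(os.path.join(path, "AlwaysAttach")):
            attached.append(path)
    return attached


if __name__ == "__main__":
    print(would_attach("/"))
```

On a test server where /vicepa is just a directory on the root filesystem, the usual remedy is to create the empty marker file (for example `touch /vicepa/AlwaysAttach`) and restart the fileserver.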