[This article was written by Tom Lord and sent to the arch-users list in February 2003 with the subject "Diagnosing svn". It is stored here for reference because the arch-users archive from that period is no longer available. See "undiagnosing" in this directory for my response. --ghudson]

People have recently said things here along the lines of "svn fails to significantly improve upon CVS and, to the degree it does, meta-CVS and dcvs do the same job in a better way" (I pretty much agree) and "it looks like an ego-driven project" (perhaps, but then I'd like to think that arch is a pride-driven project, and ultimately, isn't that just a slight difference in spin?).

I've thought a lot about "what went wrong" with svn (and take it as axiomatic, on this list, that _something_ went wrong) for two reasons: (1) like Bob, I really tried to like svn; (2) as I started to think about "what went wrong", it seemed like what went wrong was a bunch of mistakes of exactly the sort that I am inclined towards myself and therefore have to actively resist: there, but for the grace of something, stand I.

Here's what I think went wrong. This is just my unscientific impression based on following news of the project over the years.

A) It started with a brilliant idea for a hack: a transactional, write-once, versioned, hierarchical filesystem database.

Around the time svn started, that idea was "going around" -- I even had my own version for a little while. As an abstract data structure, that kind of database is a neat thing with many potential applications. If you ever spend time trying to write robust data-intensive apps on top of a unix filesystem without using a database, you really long for that kind of functionality.

Moreover, it's _conceptually_ simple to implement: it's essentially just trees written in a functional (side-effectless) style. To "modify" a tree, you build a new tree, sharing unmodified nodes with the previous tree. That seems relatively cheap, and transactions just fall out of it nearly for free. (There's a small sketch of this path-copying style in code a few paragraphs below.)

So here's the first mistake: the idea of a transactional FS is like a shiny new hammer. It's pretty natural to let it possess you and start running around looking for nails.

B) It took off from there with an underdeveloped notion of revision control.

Suppose you have the same intuition that Walter expressed a while back, which I'll paraphrase as: "The first and most fundamental task of a revision control system is to take snapshots of working directories." If you don't believe that that's a seductive (even though wrong) intuition, go back and look at how I replied. It took many, quite abstract paragraphs. What revision control is really about (archival, access, and manipulation of changesets) is subtle and _non_-intuitive. (Anecdotally, a few years before arch, I made an earlier attempt at revision control based on, guess what: snapshotting.)

What's worse is that a set of working-tree snapshots combined with a little meta-data is a kind of dual space to the kinds of records revision control is really about (they're logically interconvertible representations). Anything you say to a snapshotting fan about what you want to do with a changeset-librarian orientation, they can reply to with "Yeah, but we could do that, too." So it's not even that the snapshot intuition is completely wrong: it's just putting the emphasis on the wrong details.

Now, the transactional filesystem DB takes snapshots handily. It's ideal for that.
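To make that concrete, here is a minimal sketch -- mine, in Python, not anything from svn -- of both ideas at once: a path-copying ("functional") tree whose commits share unmodified nodes, and the snapshot/changeset duality, with each representation computable from the other. All of the names here are invented for illustration:

    class Dir:
        """An immutable directory node: maps name -> child (Dir or file text)."""
        def __init__(self, entries):
            self.entries = dict(entries)

    def write(root, path, text):
        """Return a NEW root with `text` stored at `path` (a tuple of names).
        Only the nodes along that one path are copied; every other subtree
        is shared with the old root."""
        name, rest = path[0], path[1:]
        new_entries = dict(root.entries)          # copy this node only
        if rest:
            child = root.entries.get(name, Dir({}))
            new_entries[name] = write(child, rest, text)
        else:
            new_entries[name] = text
        return Dir(new_entries)

    def files(root, prefix=()):
        """Flatten a snapshot into {path: text}."""
        out = {}
        for name, child in root.entries.items():
            if isinstance(child, Dir):
                out.update(files(child, prefix + (name,)))
            else:
                out[prefix + (name,)] = child
        return out

    def changeset(old, new):
        """Two snapshots -> a changeset (one direction of the duality)."""
        a, b = files(old), files(new)
        return {p: (a.get(p), b.get(p))
                for p in a.keys() | b.keys()
                if a.get(p) != b.get(p)}

    def apply_changeset(root, cs):
        """A changeset + a snapshot -> a new snapshot (the other direction;
        deletions elided for brevity)."""
        for path, (_before, after) in cs.items():
            if after is not None:
                root = write(root, path, after)
        return root

    # Every "revision" is just a root pointer; old roots stay valid forever.
    r1 = write(Dir({}), ("src", "main.c"), "int main() { return 0; }")
    r2 = write(r1, ("src", "util.c"), "/* utilities */")
    print(changeset(r1, r2))   # {('src', 'util.c'): (None, '/* utilities */')}

Notice how little code the snapshot side takes. That economy is exactly what makes the hammer so shiny.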
So if you have the snapshot intuition, and the transactional-FS hammer, you're apt to leap to a wrong conclusion: you've solved the problem! And if, as some of the original svn contributors were, you're coming from hacking CVS and its screwy (historically constrained) repository format, an apparent escape route from that mess is just going to strengthen your convictions.

Second mistake: the assumption that "a filesystem DB pretty much solves revision control -- all the rest is just a small matter of hacking".

C) It underwent fuzzy design conceptualization.

I infer from some of the design documents and other materials that, early on, there must have been some bull sessions to plan out how svn would work. As an example, "history-sensitive merging" has been part of the plan (such as it is) for as long as I've been aware of the project. Whatever planning there was, it didn't nail the details. Instead, it reduced a lot of problems, in a sort of hand-wavy manner, to meta-data mechanisms. I'm guessing (and inferring from docs), for example, that somebody straw-manned an intelligent merge operator, never really worried about its limitations, but worried more about what kind of meta-data it needed. Since functionality like file properties seemed more than adequate to record that meta-data, the problem was considered reduced to "a small matter of hacking".

Well, that's the problem with some design patterns, like attaching property lists to everything under the sun: they don't really solve design problems, but they give you an operational language in which to _restate_ design problems. It's sometimes very hard to recognize the difference between a restatement of a design problem in operational terms and its actual solution. Application of patterns like property lists in a design bull session all too easily gives rise to the feeling that "all the problems we're thinking about have natural solutions in this design" even though all you're really saying is "the problems we need to solve can be expressed in terms of associative lookup". (A toy illustration of this restatement-without-solution appears just before section F, below.)

Third mistake: insufficient skepticism about their own design sketches, early on.

D) Narrow design focus combined with grand technology ambitions.

The original contributors included people who worked on CVS, people who used CVS, and people working on products that incorporate CVS. In some sense, the itch they must have had in common was "Get this CVS monkey off my back; I'm sick of it." At the same time, they (justifiably) had their eyes on a real treasure -- that transactional filesystem database. In that context, it'd be hard to get behind the idea of just incrementally fixing CVS. It'd be hard to invent meta-CVS, for example. As the project has progressed over the years, those conflicting goals have tended to be resolved in the "get 1.0 out the door" direction -- a scaling back of functionality ambitions in the direction of CVS, necessitated by the degree of difficulty of the grand technology ambition.

Fourth mistake: conflicting goals at opposite ends of the ambition spectrum -- the low end ultimately defining official success, the high end providing the personal motivations.

E) Leaping to unstable proto-standards.

SVN came into being at a time when it looked to many like HTTP and Apache were the spec and best implementation of the new distributed OS for the world that would solve everything beautifully. There was a kind of dog-pile onto everything W3C, based on irrational exuberance. Well, they weren't that OS, and they don't solve everything beautifully.

Fifth mistake: jumping on the W3C bandwagon.
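To illustrate the difference (in Python, with invented names -- this is not svn's actual property API), here is the whole "solution" that an associative meta-data mechanism buys you for history-sensitive merging:

    # The handwavy mechanism: an associative table from
    # (path, property-name) to value.
    properties = {}

    def set_prop(path, name, value):
        properties[(path, name)] = value

    def get_prop(path, name):
        return properties.get((path, name))

    # Recording merge meta-data is now trivial...
    set_prop("trunk/foo.c", "merged-from", "branches/bar@1204")

    # ...but the design problem itself is untouched:
    def history_sensitive_merge(ours, theirs):
        # Which common ancestor do we pick?  What happens when the
        # "merged-from" records of the two branches disagree?  The
        # property list answers none of this; it merely stores
        # whatever answer someone eventually works out.
        raise NotImplementedError("the hard part")

Everything around the stub is easy; the stub _is_ the problem, and it is exactly as unsolved as it was before the properties existed.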
F) The API Fallacy.

When you lack confidence about your intended way to implement something, a common pattern is to decide to hide the implementation under an API. That way you can always change the implementation later, right? The problems are: (1) unless you have at least one fully worked design for how to implement your API, you shouldn't have any confidence that good implementations can exist; (2) unless you have at least two fully worked designs for how to implement your API, and they make usefully contrary trade-offs, you should really start to wonder whether doing extra work to make an abstraction here is the right way to proceed.

Sixth mistake: assuming that defining APIs all over the place would improve chances for success.

G) Collision with reality.

The transactional filesystem idea is, indeed, conceptually simple and elegant. The reality of implementing it, however, is a swamp of external considerations and inconvenient realities. Suppose you want to achieve high transaction rates and size-scalability. You have a _lot_ to consider: locking (contention over the root of the tree is especially fun), deadlocks, logging, crash recovery, physical layout of data, I/O bottlenecks, network protocols, etc., etc. In short, implementing a really good, high-performance transactional FS is an undertaking comparable in scope and complexity to implementing a really good, high-performance relational database storage manager -- only, while there's tons of literature and experience about RDB implementation, transactional filesystems are fresh territory. (As an aside, if you were to seriously undertake to make a transactional FS, I think you would not want to burden yourself with the extra work of concurrently building a revision control system on top of it -- give that task to a separate team after you have something working.)

Wanting to make progress simply and quickly, they spotted the Berkeley DB library. After all: it provides transactions with ACID properties for our favorite handwavy design tool -- the associative lookup table. As we all know, design problems can be "solved" simply by restating them in terms of associative tables. And anyway, even if Berkeley DB isn't the _best_ choice to implement this, it'll be the fastest way to get something working, and anyway it'll be hidden behind an API.

Well, I think Berkeley DB is a lousy choice for this application. It creates administrative headaches, and it's optimized for simple associations, not hierarchical filesystems. It doesn't natively provide any sort of delta-compression -- you'll have to layer that. Ultimately, _all_ that it buys you is transactions and locks -- every other aspect is a force-fit. (The sketch after this section gives a feel for what that force-fit looks like.) And what resulted? Sure enough: years of fighting against excessive space consumption, disk-filling log files, and poor performance, characterized by substantial rewrites of core functionality and API changes.

A similar mistake happened with network protocols: W3C solves everything from authentication to proxying to browsers-for-free; WebDAV just sweetens the deal, right? Except, no: as with Berkeley DB for physical storage, a lot is left out and, again, the result has been years of rewrites and API changes trying to get somewhere in the neighborhood of good performance, plus lots of dependencies on unstable externally developed libraries.

Seventh mistake: underestimating the degree of difficulty of a transactional FS.

Eighth mistake: overconfidence in dubiously selected building blocks.
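For a feel of that force-fit, here is a rough sketch -- an invented toy schema in Python, emphatically not svn's real one -- of what pushing a versioned, hierarchical filesystem through a flat, transactional get/put table involves:

    # Stand-in for a Berkeley DB table: a flat, transactional
    # mapping from key to value, and nothing more.
    table = {}

    def put_file(revision, path, text):
        # The hierarchy has to be flattened into the keys...
        table[("text", revision, path)] = text
        # ...and every directory invariant maintained by hand, as
        # still more associative entries in the same transaction.
        parent, _, name = path.rpartition("/")
        entries = table.get(("dir", revision, parent), frozenset())
        table[("dir", revision, parent)] = entries | {name}

    def get_file(revision, path):
        # Nothing here is delta-compressed: either every revision
        # stores full text (space blows up) or you design and layer
        # your own delta chains on top -- more keys, more code.
        return table[("text", revision, path)]

    put_file(1, "trunk/foo.c", "int main() { return 0; }")
    print(get_file(1, "trunk/foo.c"))

The transactions come for free; every hierarchical and historical invariant is hand-maintained bookkeeping over flat keys.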
H) Failure to Fail.

If a team went away for six months and came back with SVN as it works today, I think it'd be pretty easy to say: "That's a great and even useful prototype. It definitely proves, at least schematically, the idea of using a transactional FS to back revision control. There are clearly some bad choices in the implementation, and clearly some important neglected revision control issues that competing projects are starting to leapfrog you over. And there's a heck of a lot of code here for what it does. Let's suspend development on it for a little while, and invest in a design effort to see how to get this thing _right_."

That isn't the situation, though. A team spent _years_ on this, and they justified the project institutionally and publicly not by saying "let's build a proof of concept" but by saying "let's replace CVS". And, sure enough, if you suggest to them a stop-and-think phase at this late date you get back, basically: "Um... no... not gonna happen."

I won't label that one of their mistakes, because I don't think the root cause is something they could have easily avoided. I'll label it:

First bad circumstance: crappy socio-economic circumstances for smart, ambitious programmers in the free software industry and community -- way too much weight given to what supposedly successful projects "smell like" and too much resulting pressure on hackers to project an image resembling that false idol.

So, in summary, I don't think they're a bunch of egomaniacs or anything. I think they rushed into something that looked like it would be easier than it is, got boxed in by the mythologies of open source and W3C, and now have way too much momentum and too many constraints to do much about it.

What's really disappointed me most, though, is that while I do perceive them as smart and ambitious, they don't seem terribly open-minded about stepping back to review their project for deep structural mistakes that need fixing. My sense is that most of them are pretty young, and several have been associated with some successful projects (like Apache and CVS) -- good young programmers, since they tend to be capable of so much more than their average peers, often fall into a pit of overconfidence which is hard to recognize from the inside until you've experienced a few disasters. The situation is made worse since there's so little effective mentoring in the industry from old salts who are good at making a religion of the K.I.S.S. principle and making fun of the wealth of bloated, crappy, yet slow-to-fail stall-ware projects that dominate so much of the landscape.

If you ask me, explosive growth during the dot-com bubble really blunted the technology edges of the free software movement and our industry generally. It left us collectively struggling to do things the hard way, svn being just one small example.

-t