| Decentralized DeduplicationEliminating duplicate data from a file system can vastly reduce
storage requirements, especially in large, shared storage
environments. Unlike existing deduplication systems, which all
require either a centralized component or modification of the
storage system itself, our system supports symmetrically
decentralized, cluster-scale deduplication in a shared-disk file
system. | Decentralized Deduplication in SAN Cluster File Systems | | Austin T. Clements, Irfan Ahmad, Murali Vilayannur, Jinyuan Li | June 17, 2009 | | USENIX '09 | Abstract PDF Slides (PDF) Slides (transcript) Slides (printable) Slides (video) | File systems hosting virtual machines typically contain many
duplicated blocks of data resulting in wasted storage space
and increased storage array cache footprint. Deduplication
addresses these problems by storing a single instance of each
unique data block and sharing it between all original sources
of that data. While deduplication is well understood for file
systems with a centralized component, we investigate it in a
decentralized cluster file system, specifically in the context
of VM storage.
We propose DeDe, a block-level deduplication system for live
cluster file systems that does not require any central
coordination, tolerates host failures, and takes advantage of
the block layout policies of an existing cluster file system.
In DeDe, hosts keep summaries of their own writes to the
cluster file system in shared on-disk logs. Each host
periodically and independently processes the summaries of its
locked files, merges them with a shared index of blocks, and
reclaims any duplicate blocks. DeDe manipulates metadata
using general file system interfaces without knowledge of the
file system implementation. We present the design,
implementation, and evaluation of our techniques in the
context of VMware ESX Server. Our results show an 80%
reduction in space with minor performance overhead for
realistic workloads. |
| Decentralized Data Deduplication in VMware's SAN File System | | Austin T. Clements, Irfan Ahmad, Murali Vilayannur, Jinyuan Li | September 15, 2008 | | VMworld '08 Poster | Abstract PDF Extended abstract (PDF) | Recent inroads towards online duplicate data elimination have
demonstrated the feasibility of deduplication beyond archival
and backup applications. However, existing approaches do not
complement the scalability and fault tolerance of
decentralized shared storage systems such as those that form
the cornerstone of large virtual infrastructures. Our system
builds atop VMware's shared storage file system to provide
completely decentralized, scalable, online deduplication with
minimal performance overhead.
Our working prototype has demonstrated a two-to-one reduction
in the storage requirements of actively used virtual machines.
Deduplication is highly effective as part of a virtual
infrastructure because one datacenter may store hundreds of
identical copies of system files, application files, and files
shared between users of the infrastructure. Our system
detects and eliminates such duplication on a cluster-wide
scale, leading to lower storage and management costs, shorter
backup and recovery times, and better cache utilization. |
XocXoc is an extension-oriented compiler for C based on a custom
meta-language, concrete generic syntax manipulation, and
pass-free lazy compilation. To the best of our knowledge, it is
the first compiler to support an extension-oriented
paradigm, meaning that the user can select which
independently-written language extensions to compose at compiler
run-time, as opposed to the traditional monolithic extensible
paradigm, in which each extension represents a whole new
compiler. Extensions can extend both the syntax and the
semantics of C, and are intended to be both terse and easy to
write without knowledge of the internals of Xoc.
See also the Xoc
homepage. | Xoc, an Extension-Oriented Compiler for Systems Programming | | Russ Cox, Tom Bergan, Austin T. Clements, Frans Kaashoek, Eddie Kohler | March 1, 2008 | | ASPLOS '08 | Abstract PDF | Today's system programmers go to great lengths to extend the
languages in which they program. For instance,
system-specific compilers find errors in Linux and other
systems, and add support for specialized control flow to Qt
and event-based programs. These compilers are difficult to
build and cannot always understand each other's language
changes. However, they can greatly improve code
understandability and correctness, advantages that should be
accessible to all programmers.
We describe an extension-oriented compiler for C called
xoc. An extension-oriented compiler, unlike a conventional
extensible compiler, implements new features via many
small extensions that are loaded together as needed. Xoc
gives extension writers full control over program syntax and
semantics while hiding many compiler internals. Xoc
programmers concisely define powerful compiler extensions
that, by construction, can be combined; even some parts of the
base compiler, such as GNU C compatibility, are structured as
extensions.
Xoc is based on two key interfaces. Syntax patterns allow
extension writers to manipulate language fragments using
concrete syntax. Lazy computation of attributes allows
extension writers to use the results of analyses by other
extensions or the core without needing to worry about pass
scheduling.
Extensions built using xoc include xsparse, a
345-line extension that mimics Sparse, Linux's C front end,
and xlambda, a 170-line extension that adds
function expressions to C. An evaluation of xoc using these
and 13 other extensions shows that xoc extensions are
typically more concise than equivalent extensions written for
conventional extensible compilers and that it is possible to
compose extensions. |
| A Comparison of Designs for Extensible and Extension-Oriented Compilers | | Austin T. Clements | February 4, 2008 | | Master's thesis | Abstract PDF PDF (compact) | Today's system programmers go to great lengths to extend the
languages in which they program. For instance,
system-specific compilers find errors in Linux and other
systems, and add support for specialized control flow to Qt
and event-based programs. These compilers are difficult to
build and cannot always understand each other's language
changes. However, they can greatly improve code
understandability and correctness, advantages that should be
accessible to all programmers.
This thesis considers four extensible and extension-oriented
compilers: CIL, Polyglot, xtc, and Xoc. These four
compilers represent four distinctly different approaches to
the problem of bridging the gap between language design and
system implementation. Taking an extension author's point of
view, this thesis compares the design of each compiler's
extension interface in terms of extension structure, syntactic
analysis, semantic analysis, and rewriting.
To perform the comparison, this thesis uses three extensions
implemented variously in the four compilers: a bitwise
rotation operator, function expressions, and lock checking.
These extensions are designed to span a representative space
of analysis and rewriting needs.
Based on this comparison, this thesis identifies the following
implications of the design decisions of each extension
interface: the expressiveness, understandability, and
correctness of extension implementations can benefit from
domain specific languages and language features tailored to
the extension interface; compiler-managed scheduling trades
loss of control for automatic extension composability;
unifying internal and external program representation improves
ease of use and extension composability, but gives up
potentially useful control over the internal representation;
concrete syntax patterns provide a natural interface to
internal program representation, but must be more powerful
than simple tree matching to be practical; grammars, types,
and syntax interfaces have a natural correspondence; and
accounting for semantic information in the final output
enables hygienic rewriting, which can simplify extensions. |
Optimizing Distributed Read-Only TransactionsThis algorithm and implementation explores the viability of an
alternate approach to distributed transactions that optimizes
for read-only transactions - the common case in file system
workloads and many modern database applications. While typical
distributed transactional systems weaken consistency guarantees
in order to provide acceptable performance, our approach instead
weakens causality guarantees while still ensuring true
serializability. | Optimizing Distributed Read-Only Transactions Using
Multiversion Concurrency | | Dan R. K. Ports, Austin T. Clements, Irene Y. Zhang | December 15, 2007 | | Final paper for 6.830 (Database Systems) | Abstract PDF Slides (PDF) | | Distributed transactional systems typically achieve efficiency
by abandoning true serializability for weaker forms of
consistency that are difficult to reason about because they
expose the concurrency in the underlying system. We explore
an alternate route: weakening causality instead of
consistency. Our proposed algorithm achieves global
serializability by sacrificing global causality, which we
argue is reasonable in many situations. This allows our
algorithm to achieve efficiency by permitting read-only
transactions to operate on stale but locally available cache
data. We present the details of a transactional block storage
protocol that implements this form of concurrency control, as
well as a performance evaluation of an experimental
implementation of this protocol and comparison against
conventional optimistic concurrency control. |
PlaidPlaid is a pattern matching system for Scheme in which
the pattern language is a declarative, reversible subset of
Scheme itself. This allows pattern matching for abstract
datatypes where a single definition serves as both the
constructor and deconstructor for the data type. The pattern
matching control-flow construct provided by Plaid is
itself reversible, and is therefore available within the
pattern language. This makes it possible to express
canonicalization, alternate views of data, as well as
non-determinism in a highly declarative style. | Plaid: Pattern Language for Abstract Datatypes | | Dan R. K. Ports, Austin T. Clements, Irene Y. Zhang | May 14, 2007 | | Final paper for 6.891 (Adventures in Advanced Symbolic Programming) | Abstract PDF Slides (PDF) | The expressiveness of traditional syntactic pattern matching
is severely limited by its lack of abstraction. Because
syntax patterns are mired in the built-in types understood by
the pattern matching system, they lack the ability to express
patterns over abstract data types (ADT's). More advanced
pattern matching techniques, such as semantic matching, can
overcome this, but at the per-ADT cost of the complex code
required to add new pattern combinators to the system.
Plaid defines a new pattern language that captures a
strict subset of Scheme capable of both regular computation,
as well as reverse computation. This allows it to overcome
both the limitations of syntactic patterns and the cost of
semantic patterns by providing a means by which programmers
can write a single specification of the mapping between the
abstract and concrete representations of an ADT that
simultaneously serves as constructor, predicate, accessor, and
pattern combinator for that ADT. This specification is
written virtually identically to how a regular constructor
would be written.
Furthermore, the Plaid pattern language is capable of
capturing non-determinism and decisions within pattern
matching, thus admitting a very broad interpretation of what
can be considered an ADT constructor. This leads to variety
of interesting capabilities, such as the ability to view
concrete data in multiple abstract ways, the ability to
canonicalize multiple concrete representations in one abstract
way, and the ability to imagine more convenient
representations of existing data. |
Guarded Atomic Actions for HaskellGAAH is a Haskell library that implements a parallelism paradigm
known as guarded atomic actions, a form of transactional memory.
A module consists of a set of actions, each of which has a
guard. At any given time, all actions whose guards are
satisfied can fire in parallel. Actions running in parallel are
always guaranteed an atomic view of the module's state and their
modifications of the state are transactional. | Guarded Atomic Actions for Haskell | | Austin Clements and Yang Zhang | December 13, 2006 | | Final paper for 6.827 (Multithreaded Parallelism) | Abstract PDF | | The guarded atomic actions model is a programming model
introduced in the Bluespec high-level hardware description
language. The model expresses parallel behavior with a high
degree of modularity and composability. In this project, we
present an implementation of guarded atomic actions as a
library for the Haskell software programming language, thus
introducing the guarded atomic action model of parallelism
into the realm of software. |
CanopyCanopy is an environment for experimentation and debugging of
network systems. It provides total control over the system by
running every node and the network in virtual machines. Time is
carefully controlled and synchonized between all virtual
machines, allowing complete control over network latencies, as
well as behavior such as packet dropping. Furthermore, the
entire system can be rolled back to any point in the past to
determistically experiment with the effects of differing network
behavior. Canopy is implemented as a distributed system to help
alleviate the performance impacts of full emulation. | Canopy: A Controlled Emulation Environment for Network
System Experimentation | | Dan Ports, Austin Clements, Jeff Arnold | December 15, 2005 | | Final paper for 6.829 (Computer Networks) | Abstract PDF Slides (PDF) | | Network systems are hard to debug because they are inherently
parallel and non-deterministic. Canopy assists with network
debugging by putting the entire network system into a
controlled emulation environment constructed from
virtual machines and a simulated network. This puts all
variables under the user's control and provides a coherent,
omniscient viewpoint of the entire system. To aid the user in
observing and manipulating the system, Canopy provides tools
such as traffic visualization, packet manipulation, rollback
and replay. |
PersiFSPersiFS is a versioned file system that retains a
history of its contents. The entire contents of the file system
as it appeared at any point in the past can be accessed from a
special automount-like directory. The current version of the
file system can be accessed and modified almost as efficiently
as in regular non-persistent file systems. Unlike other
versioned file systems, PersiFS provides the same
efficiency when accessing past versions. Furthermore,
PersiFS optimizes storage space by efficiently recognizing and
coalescing common substrings across different versions of files
and across different files. | PersiFS: A Versioned File System with an Efficient
Representation | | Dan R. K. Ports, Austin T. Clements, Erik D. Demaine | October 24, 2005 | | SOSP '05 Poster | Abstract PDF Extended abstract (PDF) | Most file systems are ephemeral, meaning that once a
change has been made, there is no way to recall the previous
contents of the file system. PersiFS is a continuously
versioned file system. It stores every modification to
the file system, allowing access to any past version
of any file. PersiFS achieves this without sacrificing access
time to either current versions or past versions, using
inordinate amounts of disk space, or requiring modification to
existing applications.
In order to make continuous versioning feasible, PersiFS
tackles the problems of time and space efficiency with a
number of efficient data structures for indexing and retaining
files. |
| Structures for Efficient File System-Scale Partial Persistence | | Dan R. K. Ports, Austin T. Clements | May 12, 2005 | | Final paper for 6.897 (Advanced Data Structures) | Abstract PDF Slides (PDF) | | A persistent file system stores every previous state
of each file, allowing convenient access to the full state of
the file system as it appeared at any point in the
past. Achieving this convenient feature presents a challenging
data structural problem because the amount of data involved is
so large: it must use as little space as possible, and provide
efficient operations for modifying the current state and
accessing both current and past states. We formalize
persistent file systems as a problem in data structures, and
analyze it in the context of the external memory model. We
begin by considering the design of our initial solution to
this problem from the PersiFS1 file system, which
is based on a log of metadata changes and an indirection layer
for storing file data. These "systems" data structures support
the desired operations, but are not asymptotically
efficient. Applying more advanced data structures, we refine
the design into the next version, PersiFS2. We use
B+-trees for file content indexing in order to
improve the space efficiency of the system, and we present a
novel partially-persistent B+-tree design, which
can be used to track changes to files with logarithmic
modification and query cost. PersiFS2 has been
implemented as a working file system with these data
structures, and our measurements indicate that the new file
system data structure provides dramatically improved access
time for previous revisions with a small increase in cost for
modifications. |
| PersiFS: A Continuously Versioned Network File System | | Austin T. Clements, Dan R. K. Ports, Ben A. Schmeckpeper,
Hector Yuen | May 12, 2005 | | Final paper for 6.824 (Distributed Systems Engineering) | Abstract PDF Slides (PDF) | | Most file systems are ephemeral, meaning that once a
change has been made, there is no way to recall the previous
contents of the file system. Backups, version control
systems, and user interface improvements such as "trash cans"
attempt to alleviate this problem; however, these are all
rough approximations of persistent file system
structures, giving users restricted access to a restricted set
of past states of the file system. PersiFS is a fully
persistent file system, providing access to any
past state of the entire file system. PersiFS achieves full
persistence without sacrificing access time to either current
versions or past versions, using inordinate amounts of disk
space, or requiring modification to existing applications. |
√X√X is a graphical window system developed in a few days as
an answer to the "do something cool" challenge problem at the
end of MIT's operating systems engineering course. Dan Ports
and I figured it was the only obvious thing to do, given that
for the last lab we had added a basic UNIX-like shell to the
operating system we had written in the class. In
exokernel-style, √X consisted largely of isolated
user-space programs and libraries. Over the few days we had to
implement our "something cool" for the class, we wrote a full
VESA graphics driver (including a Virtual-8086 mode driver), a
mouse driver, various IPC mechanisms, a graphics library
(including translucency and anti-aliasing support), the window
system itself, and a number of graphical applications (including
a terminal emulator). And on the due date we rested. | √X: A Window System for the JOS Operating System | | Austin T. Clements, Dan R. K. Ports | December 8, 2004 | | Project technical notes for 6.828 (Operating Systems
Engineering) | Notes (incomplete) Screenshot |
ArpeggioArpeggio started as a summer research project to design and
develop a completely decentralized peer-to-peer file sharing
system based on the Chord distributed hash table. The goal was
to provide a provable level of completeness in search results,
using carefully designed distributed algorithms to maintain both
time and space efficiency. The design has been completed and
published, and my co-conspirator, Dan Ports, is working on the
implementation for his Master's thesis. | Arpeggio: Metadata Searching and Content Sharing with Chord | | Austin T. Clements, Dan R. K. Ports, David R. Karger | January 21, 2005 | | IPTPS '05 | Abstract PDF Slides (PDF) | | Arpeggio is a peer-to-peer file-sharing network based
on the Chord lookup primitive. Queries for data whose metadata
matches a certain criterion are performed efficiently by using
a
distributed keyword-set index, augmented with
index-side filtering. We introduce
index gateways, a
technique for minimizing index maintenance overhead. Because
file data is large,
Arpeggio employs subrings to
track live source peers without the cost of inserting the data
itself into the network. Finally, we introduce
postfetching, a technique that uses information in
the index to improve the availability of rare files. The
result is a system that provides efficient query operations
with the scalability and reliability advantages of full
decentralization, and a content distribution system tuned to
the requirements and capabilities of a peer-to-peer network. |
| Apeggio: Efficient Metadata-based Searching and File Transfer
with DHTs | | Austin T. Clements, Dan R. K. Ports, David R. Karger | November 8, 2004 | | ISW '04 Poster | Abstract PDF | | Arpeggio is a peer-to-peer file-sharing network based
on the Chord distributed hash table. Queries for files whose
metadata matches a certain criterion are performed efficiently
by using a
distributed keyword-set index, augmented
with
index-side filtering. We introduce
metadata
gateways, a technique for minimizing index maintenance
overhead.
Arpeggio also uses the DHT for
indirect
storage of file contents, maintaining pointers from
content to the live peers that provide it. Finally, we
introduce
postfetching, a technique that uses
information in the index to improve the availability of rare
files. The result is a system that provides efficient query
operations with the scalability and reliability advantages of
full decentralization, and a content distribution system tuned
to the requirements of a peer-to-peer file-sharing network. |
GizmoballMy 6.170 (Software Engineering) team implemented a pinball-like
game called Gizmoball for our final project. Our implementation
includes numerous innovations under-the-hood as well as in the
user interface. The engine made extensive use of Java
reflection to produce a highly decoupled system that essentially
figured out what it was capable of at runtime, and our
generalized "interaction engine" provided an abstract platform
on which to implement the physics system. Our user interface
made use of "frobbers" (ie, pie menus) to make the game board
editor intuitive and efficient to use, and won the 6.170
Usability Award. EmulabI spent two summers working with the Flux group at the
University of Utah on a controlled network simulation and
emulation testbed called Emulab. One of these summers
I spent integrating our local-area system with wide-area
distributed experimentation platform called PlanetLab, allowing
experiments to integrate use of Emulab's nodes and highly
controlled network environment with use of PlanetLab's
world-wide network of nodes. | Implementing the Emulab-PlanetLab Portal: Experiences and
Lessons Learned | | Kirk Webb, Mike Hibler, Robert Ricci, Austin Clements, Jay Lepreau | December 5, 2004 | | WORLDS '04 | Abstract PDF | Emulab's PlanetLab portal, hereafter known as the "portal,"
provides access to the large-scale, geographically distributed
resources of the PlanetLab testbed using the integrated Emulab
interface. The portal provides sophisticated resource
allocation, configuration, and management services, while
hiding from the user the underlying low-level detail and
complexity of distributed resource provisioning and failure
management. Moreover, it does so while minimizing the impact
on the underlying PlanetLab system.
In the process of creating this portal and tracking
PlanetLab's evolving third-party service API, we identified
several key issues in the design of such platforms and the
management systems built on top of them. This paper uses our
portal as a basis for discussing these issues, and presents
the lessons we have learned during its design and
implementation. |
Active DoomI spent the summer after my junior year of high school
developing test and demonstration applications for the
Java-based active network operating system JANOS, being
developed at the University of Utah. The fruit of my work was
an experiment in redesigning a real-world non-active protocol as
an active protocol to see what power and efficiency could be
gained. And what better protocol to start with than the
multi-player protocol of Id Software's DOOM? | Active Doom: Applying Active Networks to Traditional Protocols | | Austin T. Clements, Patrick Tullmann, Jay Lepreau | October 22, 2001 | | SOSP '01 WIP Presentation | Slides (PDF) Extended abstract (PDF) |
|