This post is a work in progress.
Inspired by a recent purchase of the Red Book, which provides
a curated list of important papers around database systems, I’ve decided
to begin assembling a list of important papers in distributed systems.
Similar to the Red Book, I’ve grouped the papers into a series of
categories, each highlighting a progression of related ideas over time
within a specific area of research in the field.
Keeping with the tradition of the Red Book, I’ve included both papers that
resulted in very successful systems or techniques and papers that
introduced a concept which was either immediately dismissed or
proven incorrect. This emphasizes the progression of ideas that led
to the development of these systems.
Consensus
The problems of establishing consensus in a distributed system.
Consistency
Types of consistency, and practical solutions for ensuring atomic
operations across a set of replicas.
- Highly Available Transactions: Virtues and Limitations
Peter Bailis, Aaron Davidson, Alan Fekete, Ali Ghodsi, Joseph M. Hellerstein, Ion Stoica
2013
- Consistency Tradeoffs in Modern Distributed Database System Design
Daniel J. Abadi
2012
- CAP Twelve Years Later: How the “Rules” Have Changed
Eric Brewer
2012
- Calvin: Fast Distributed Transactions for Partitioned Database Systems
Alexander Thomson, Thaddeus Diamond, Shu-Chun Weng, Kun Ren, Philip Shao, Daniel J. Abadi
2012
- Optimistic Replication
Yasushi Saito and Marc Shapiro
2005
- Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services
Seth Gilbert, Nancy Lynch
2002
- Harvest, Yield, and Scalable Tolerant Systems
Armando Fox, Eric A. Brewer
1999
- Linearizability: A Correctness Condition for Concurrent Objects
Maurice P. Herlihy, Jeannette M. Wing
1990
- Time, Clocks, and the Ordering of Events in a Distributed System
Leslie Lamport
1978
Conflict-free data structures
Studies on data structures which do not require coordination to ensure
convergence to the correct value.
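As a rough illustration of what "no coordination" means here, below is a minimal sketch of a grow-only counter, one of the simplest structures of this kind. The class name `GCounter`, its methods, and the replica ids are my own choices for illustration, not taken from any particular paper or library. Each replica only increments its own slot, and merging takes the per-replica maximum, so replicas can exchange state in any order, any number of times, and still agree on the total.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter: one non-decreasing slot per replica (illustrative)."""
    replica_id: str
    counts: dict = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # A replica only ever writes to its own slot.
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum over all known replica slots.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Per-slot max is commutative, associative, and idempotent,
        # so merge order and repeated merges do not change the result.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


# Two replicas update independently, then exchange state.
a = GCounter("a")
b = GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
b.merge(a)  # merging the same state again changes nothing
assert a.value() == b.value() == 5
```

This sketch ships the whole state on every merge and never removes replica slots, which is fine for illustration but is a simplification a real system would likely revisit.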
Distributed programming
Languages aimed towards disorderly distributed programming as well as
case studies on problems in distributed programming.
- Logic and Lattices for Distributed Programming
Neil Conway, William Marczak, Peter Alvaro, Joseph M. Hellerstein, David Maier
2012
- Dedalus: Datalog in Time and Space
Peter Alvaro, William R. Marczak, Neil Conway, Joseph M. Hellerstein, David Maier, Russell Sears
2011
- MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean, Sanjay Ghemawat
2004
- A Note On Distributed Computing
Samuel C. Kendall, Jim Waldo, Ann Wollrath, Geoff Wyant
1994
Systems
Implemented and theoretical distributed systems.
- Spanner: Google’s Globally-Distributed Database
James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, Dale Woodford
2012
- ZooKeeper: Wait-free coordination for Internet-scale systems
Patrick Hunt, Mahadev Konar, Flavio P. Junqueira, Benjamin Reed
2010
- A History Of The Virtual Synchrony Replication Model
Ken Birman
2010
- Cassandra — A Decentralized Structured Storage System
Avinash Lakshman, Prashant Malik
2009
- Dynamo: Amazon’s Highly Available Key-Value Store
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels
2007
- Stasis: Flexible Transactional Storage
Russell Sears, Eric Brewer
2006
- Bigtable: A Distributed Storage System for Structured Data
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber
2006
- The Google File System
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung
2003
- Lessons from Giant-Scale Services
Eric A. Brewer
2001
- Towards Robust Distributed Systems
Eric A. Brewer
2000
- Cluster-Based Scalable Network Services
Armando Fox, Steven D. Gribble, Yatin Chawathe, Eric A. Brewer, Paul Gauthier
1997
- The Process Group Approach to Reliable Distributed Computing
Ken Birman
1993
Books
Overviews and details covering many of the above papers and concepts compiled into single resources.
I’m hoping to make this into a living document, so please submit pull
requests or leave comments!