Readings in conflict-free replicated data types
This is a work in progress post outlining research topics related to conflict-free replicated data types, or CRDTs.
Yesterday, Basho announced the release of Riak 2.0.0 RC1, which contains a comprehensive set of “data types” that can be used for building more robust distributed applications. For an overview of how to use these data types in Riak to avoid custom, and error prone, merge functions, see the Basho documentation site.
You’re probably more familiar with another name for these data types: conflict-free replicated data types (CRDTs). Simply put, CRDTs are data structures which capture some aspect of causality, along with providing interfaces for safely operating over the value and correctly merging state with diverged and concurrently edited structures.
This provides a very useful property when combined with an eventual consistency, or AP-focused, data store: Strong Eventual Consistency (SEC). Strong Eventual Consistency is an even stronger convergence property than eventual consistency: given that all updates are delivered to all replicas, there is no need for conflict resolution, given the conflict-free merge properties of the data structure. Simply put, correct replicas which have received all updates have the same state.
Here’s a great overview by one of the inventors of CRDTs, Marc Shapiro, where he discusses conflict-free replicated data types and their relation to strong eventual consistency.
In this Hacker News thread, there was an interesting discussion about why one might want to implement these on the server, why implementing them is non-trivial, and what the most recent research related to them consists of.
This post serves as a reading guide on the the various areas of conflict-free replicated data types. Papers are broken down into various areas and sorted in reverse chronologically.
Basics
Overviews and history of the various conflict-free replicated data types, implementation details, problem statements, etc.
- A comprehensive study of Convergent and Commutative Replicated Data Types 2011
- Conflict-free replicated data types 2011
- CRDTs: Consistency without concurrency control 2009
- A Commutative Replicated Data Type for Cooperative Editing 2009
- Designing a commutative replicated data type 2007
Causality
Advanced causality tracking mechanisms, each focusing on a particular specific problem. Important in the implementation of some CRDTs and for historical purposes.
- Scalable and Accurate Causality Tracking for Eventually Consistent Stores 2014
- Dotted Version Vectors: Logical Clocks for Optimistic Replication 2010
- Interval Tree Clocks 2008
- Bounded Version Vectors 2004
Uses
Various insights into developing on, and using, CRDTs in systems.
- Collaborative offline web applications using conflict-free replicated data types 2015
- On the composability of the Riak DT map: expanding from embedded to multi-key structures 2014
- Riak DT map: a composable, convergent replicated dictionary 2014
- Riak PG: distributed process groups on dynamo-style distributed storage 2013
- Incremental stream processing using computational conflict-free replicated data types 2013
- SwiftCloud: Fault-Tolerant Geo-Replication Integrated all the Way to the Client Machine 2013
- Evaluating Dotted Version Vectors in Riak 2011
Optimizations
Optimizations, critical to making some CRDTs practical.
- Merging OT and CRDT algorithms 2014
- Making operation-based CRDTs operation-based 2014
- Efficient state-based CRDTs by decomposition 2014
- Scalable Eventually Consistent Counters over Unreliable Networks 2013
- An optimized conflict-free replicated set 2012
Verification
Verification of CRDTs and their use in eventually consistent applications.
- Towards verifying eventually consistent applications 2014
- Vector Clocks in Coq: An Experience Report 2014
Computations
Computing with CRDTs.
- Selective Hearing: An Approach to Distributed, Eventually Consistent Edge Computation 2015
- Lasp: A Language for Distributed, Coordination-Free Programming 2015
- Lasp: A Language for Distributed, Eventually Consistent Computations with CRDTs 2015
- A Study of CRDTs that do Computations 2015
- Joining Forces: Toward a Unified Account of LVars and Convergent Replicated Data Types 2014
- Freeze After Writing: Quasi-Deterministic Parallel Programming with LVars 2014
- LVars: Lattice-based Data Structures for Deterministic Parallelism 2013
- Logic and Lattices for Distributed Programming 2012
Talks
Finally, for those of you who learn more by watching, here’s some talks.
- Distributed, Eventually Consistent Computations 2015
- CRDT: Datatype for the Apocalypse 2015
- Eventually Consistent Computation with CRDTs 2014
- SyncFree: Large-Scale Computation Without Synchronization 2014
- Designing for Partition Tolerance with CRDTs 2014
- Berlin Buzzwords 2014 - Consistency without Consensus 2014
- EmberConf 2014 - Convergent/Divergent 2014
- Wicked Good Ruby 2013 - Bloom: A Language for Disorderly Distributed Programming 2013
- Strange Loop 2012 - Eventually-Consistent Data Structures 2012
- Strong Eventual Consistency and Conflict-free Replicated Data Types 2011
I’m hoping to make this into a living document, so please submit pull requests or leave comments!