Readings in conflict-free replicated data types

22 Jul 2014

This is a work in progress post outlining research topics related to conflict-free replicated data types, or CRDTs.

Yesterday, Basho announced the release of Riak 2.0.0 RC1, which contains a comprehensive set of “data types” that can be used for building more robust distributed applications. For an overview of how to use these data types in Riak to avoid custom, and error prone, merge functions, see the Basho documentation site.

You’re probably more familiar with another name for these data types: conflict-free replicated data types (CRDTs). Simply put, CRDTs are data structures which capture some aspect of causality, along with providing interfaces for safely operating over the value and correctly merging state with diverged and concurrently edited structures.

This provides a very useful property when combined with an eventual consistency, or AP-focused, data store: Strong Eventual Consistency (SEC). Strong Eventual Consistency is an even stronger convergence property than eventual consistency: given that all updates are delivered to all replicas, there is no need for conflict resolution, given the conflict-free merge properties of the data structure. Simply put, correct replicas which have received all updates have the same state.

Here’s a great overview by one of the inventors of CRDTs, Marc Shapiro, where he discusses conflict-free replicated data types and their relation to strong eventual consistency.

In this Hacker News thread, there was an interesting discussion about why one might want to implement these on the server, why implementing them is non-trivial, and what the most recent research related to them consists of.

This post serves as a reading guide on the the various areas of conflict-free replicated data types. Papers are broken down into various areas and sorted in reverse chronologically.

Basics

Overviews and history of the various conflict-free replicated data types, implementation details, problem statements, etc.

A comprehensive study of Convergent and Commutative Replicated Data Types Marc Shapiro, Nuno Preguiça, Carlos Baquero, Marek Zawirski 2011
Conflict-free replicated data types Marc Shapiro, Nuno Preguiça, Carlos Baquero, Marek Zawirski 2011
CRDTs: Consistency without concurrency control Mihai Letia, Nuno Preguiça, Marc Shapiro 2009
A Commutative Replicated Data Type for Cooperative Editing Nuno Preguica, Joan Manuel Marques, Marc Shapiro, Mihai Letia 2009
Designing a commutative replicated data type Marc Shapiro, Nuno Preguiça 2007

Causality

Advanced causality tracking mechanisms, each focusing on a particular specific problem. Important in the implementation of some CRDTs and for historical purposes.

Scalable and Accurate Causality Tracking for Eventually Consistent Stores Paulo Sérgio Almeida, Carlos Baquero, Ricardo Gonçalves, Nuno Preguiça, and Victor Fonte 2014
Dotted Version Vectors: Logical Clocks for Optimistic Replication Nuno Preguiça, Carlos Baquero, Paulo Sérgio Almeida, Victor Fonte, Ricardo Gonçalves 2010
Interval Tree Clocks Paulo Sérgio Almeida, Carlos Baquero, Victor Fonte 2008
Bounded Version Vectors José Bacelar Almeida, Paulo Sérgio Almeida, Carlos Baquero. 2004

Uses

Various insights into developing on, and using, CRDTs in systems.

Collaborative offline web applications using conflict-free replicated data types Santiago J. Castiñeira and Annette Bieniusa 2015
On the composability of the Riak DT map: expanding from embedded to multi-key structures Christopher Meiklejohn 2014
Riak DT map: a composable, convergent replicated dictionary Russell Brown, Sean Cribbs, Sam Elliott, Christopher Meiklejohn 2014
Riak PG: distributed process groups on dynamo-style distributed storage Christopher Meiklejohn 2013
Incremental stream processing using computational conflict-free replicated data types David Navalho, Sérgio Duarte, Nuno Preguiça, Marc Shapiro 2013
SwiftCloud: Fault-Tolerant Geo-Replication Integrated all the Way to the Client Machine Marek Zawirski, Annette Bieniusa, Valter Balegas, Sérgio Duarte, Carlos Baquero, Marc Shapiro, Nuno Preguiça 2013
Evaluating Dotted Version Vectors in Riak Ricardo Goncalves, Paulo Sergio Almeida, Carlos Baquero, Victor Fonte, and Nuno Preguica 2011

Optimizations

Optimizations, critical to making some CRDTs practical.

Merging OT and CRDT algorithms Ahmed-Nacer Mehdi, Pascal Urso, Valter Balegas, Nuno Perguiça 2014
Making operation-based CRDTs operation-based Carlos Baquero, Paulo Sérgio Almeida, Ali Shoker 2014
Efficient state-based CRDTs by decomposition Paulo Sérgio Almeida, Ali Shoker, Carlos Baquero 2014
Scalable Eventually Consistent Counters over Unreliable Networks Paulo Sérgio Almeida, Carlos Baquero 2013
An optimized conflict-free replicated set Annette Bieniusa, Marek Zawirski, Nuno Preguiça, Marc Shapiro, Carlos Baquero, Valter Balegas, Sérgio Duarte 2012

Verification

Verification of CRDTs and their use in eventually consistent applications.

Towards verifying eventually consistent applications Burcu Kulahcioglu Ozkan, Erdal Mutlu, Serdar Tasiran 2014
Vector Clocks in Coq: An Experience Report Christopher Meiklejohn 2014

Computations

Computing with CRDTs.

Selective Hearing: An Approach to Distributed, Eventually Consistent Edge Computation Christopher S. Meiklejohn and Peter Van Roy 2015
Lasp: A Language for Distributed, Coordination-Free Programming Christopher S. Meiklejohn and Peter Van Roy 2015
Lasp: A Language for Distributed, Eventually Consistent Computations with CRDTs Christopher S. Meiklejohn and Peter Van Roy 2015
A Study of CRDTs that do Computations David Navalho, Sérgio Duarte, Nuno Preguiça 2015
Joining Forces: Toward a Unified Account of LVars and Convergent Replicated Data Types Lindsey Kuper, Ryan R. Newton 2014
Freeze After Writing: Quasi-Deterministic Parallel Programming with LVars Lindsey Kuper, Aaron Turon, Neelakantan R. Krishnaswami, Ryan R. Newton 2014
LVars: Lattice-based Data Structures for Deterministic Parallelism Lindsey Kuper Ryan R. Newton 2013
Logic and Lattices for Distributed Programming Neil Conway, William Marczak, Peter Alvaro, Joseph M. Hellerstein, David Maier 2012

Talks

Finally, for those of you who learn more by watching, here’s some talks.

Distributed, Eventually Consistent Computations Christopher Meiklejohn 2015
CRDT: Datatype for the Apocalypse Alexander Songe 2015
Eventually Consistent Computation with CRDTs Christopher S. Meiklejohn 2014
SyncFree: Large-Scale Computation Without Synchronization Annette Bieniusa and Christopher S. Meiklejohn 2014
Designing for Partition Tolerance with CRDTs Carlos Baquero 2014
Berlin Buzzwords 2014 - Consistency without Consensus Peter Bourgon 2014
EmberConf 2014 - Convergent/Divergent Christopher Meiklejohn 2014
Wicked Good Ruby 2013 - Bloom: A Language for Disorderly Distributed Programming Christopher Meiklejohn 2013
Strange Loop 2012 - Eventually-Consistent Data Structures Sean Cribbs 2012
Strong Eventual Consistency and Conflict-free Replicated Data Types Marc Shapiro 2011

I’m hoping to make this into a living document, so please submit pull requests or leave comments!

Christopher Meiklejohn