Readings in conflict-free replicated data types

22 Jul 2014

This is a work in progress post outlining research topics related to conflict-free replicated data types, or CRDTs.

Yesterday, Basho announced the release of Riak 2.0.0 RC1, which contains a comprehensive set of “data types” that can be used for building more robust distributed applications. For an overview of how to use these data types in Riak to avoid custom, and error prone, merge functions, see the Basho documentation site.

You’re probably more familiar with another name for these data types: conflict-free replicated data types (CRDTs). Simply put, CRDTs are data structures which capture some aspect of causality, along with providing interfaces for safely operating over the value and correctly merging state with diverged and concurrently edited structures.

This provides a very useful property when combined with an eventual consistency, or AP-focused, data store: Strong Eventual Consistency (SEC). Strong Eventual Consistency is an even stronger convergence property than eventual consistency: given that all updates are delivered to all replicas, there is no need for conflict resolution, given the conflict-free merge properties of the data structure. Simply put, correct replicas which have received all updates have the same state.

Here’s a great overview by one of the inventors of CRDTs, Marc Shapiro, where he discusses conflict-free replicated data types and their relation to strong eventual consistency.

In this Hacker News thread, there was an interesting discussion about why one might want to implement these on the server, why implementing them is non-trivial, and what the most recent research related to them consists of.

This post serves as a reading guide on the the various areas of conflict-free replicated data types. Papers are broken down into various areas and sorted in reverse chronologically.

Basics

Overviews and history of the various conflict-free replicated data types, implementation details, problem statements, etc.

Causality

Advanced causality tracking mechanisms, each focusing on a particular specific problem. Important in the implementation of some CRDTs and for historical purposes.

Uses

Various insights into developing on, and using, CRDTs in systems.

Optimizations

Optimizations, critical to making some CRDTs practical.

Verification

Verification of CRDTs and their use in eventually consistent applications.

Computations

Computing with CRDTs.

Talks

Finally, for those of you who learn more by watching, here’s some talks.

I’m hoping to make this into a living document, so please submit pull requests or leave comments!