In my previous post, we discussed the failure semantics of `pg2` during partitions and when dealing with dynamic cluster management. In this post, we’re going to look at a common alternative to `pg2`, the extended process registry, `gproc`.
gproc?

`gproc`, as described by Ulf Wiger in his 2007 paper, outlines an extended process registry for Erlang which provides a number of benefits over the default process registry. It also provides a global mode, implemented by leveraging `gen_leader`, a leader election algorithm based roughly on a 2005 paper and original implementation by Thomas Arts and Hans Svensson.
Briefly, some of the notable improvements include the ability to register processes under arbitrary terms rather than just atoms, non-unique properties for tagging groups of processes, counters and aggregated counters, and an efficient query interface.
We’re specifically going to look at emulating a `pg2`-like interface with `gproc`, by tagging groups of processes with a name and querying `gproc` to get the list of processes.
Let’s explore the API using just the local distribution model of `gproc`.
First, register the current process and ask for the registered processes under that name.
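Here’s a minimal sketch of what that looks like in an Erlang shell; the property name `group` is just an example term, and the pid shown is illustrative:

```erlang
%% Register the current (shell) process under a non-unique, local property.
1> gproc:reg({p, l, group}).
true

%% Query gproc for every process registered under that property.
2> gproc:lookup_pids({p, l, group}).
[<0.33.0>]
```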
To explain the registration key: `group` is a term alias for the process, `p` represents a non-unique property, so we can tag multiple processes, and `l` means register locally.
Now, let’s look at the global distribution model of `gproc`. We start off by adding a dependency on `gproc` to the test application we used last time to test `pg2`. We also need to add `gen_leader` as a dependency to support the global distribution of the registry. We’ll use the version of `gen_leader` currently recommended by `gproc`, which is known to have multiple problems during netsplits.
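In rebar terms, the dependencies end up looking something like the following sketch; the exact fork and branch of `gen_leader` are assumptions, so use whatever `gproc`’s own rebar.config points at:

```erlang
%% rebar.config (sketch)
{deps, [
    %% A gen_leader fork; assumed here, check gproc's rebar.config
    %% for the fork it currently recommends.
    {gen_leader, ".*",
     {git, "https://github.com/garret-smith/gen_leader_revival.git",
      {branch, "master"}}},
    {gproc, ".*",
     {git, "https://github.com/uwiger/gproc.git", {branch, "master"}}}
]}.
```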
We’ll also modify the `vm.args` file to enable global distribution.
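The relevant additions look roughly like this; the node name and cookie are placeholders, and `-gproc gproc_dist all` sets the `gproc_dist` application environment variable so that all connected nodes are treated as candidates:

```
## vm.args additions (sketch)
-name gproc1@127.0.0.1
-setcookie gproc

## Enable gproc's global (gen_leader-based) distribution.
-gproc gproc_dist all
```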
Done.
It’s important to note that nodes must be connected prior to starting the `gproc` application (which is ultimately responsible for starting `gen_leader` with a candidates list).
Let’s see this in action. Starting with our same `riak_core` cluster of three nodes, we’ll start the first two nodes and inspect the `gproc` state.
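A sketch of the kind of session this produces, registering a global property on the first node and querying both nodes; node names and pids are illustrative:

```erlang
%% On the first node: register under a non-unique, global property.
(gproc1@127.0.0.1)1> gproc:reg({p, g, group}).
true
(gproc1@127.0.0.1)2> gproc:lookup_pids({p, g, group}).
[<0.196.0>]

%% On the second node: the registration is nowhere to be seen.
(gproc2@127.0.0.1)1> gproc:lookup_pids({p, g, group}).
[]
```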
Again, to explain the registration key: `group` is a term alias for the process, `p` represents a non-unique property, so we can tag multiple processes, and `g` means register globally.
As the `gproc` application was started prior to the nodes being connected, each node is its own leader with its own state, which is not replicated.
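We can see this by asking each node who it believes the leader is; a sketch, relying on `gproc_dist:get_leader/0` to report the leader node:

```erlang
%% Each node, having started gproc before connecting, elected itself.
(gproc1@127.0.0.1)3> gproc_dist:get_leader().
'gproc1@127.0.0.1'

(gproc2@127.0.0.1)2> gproc_dist:get_leader().
'gproc2@127.0.0.1'
```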
Now, let’s restart the `gproc` application once we know for sure the nodes are connected.
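A sketch of doing that from one shell with rpc, assuming the nodes are already connected, so that `gen_leader` starts with a full candidates list:

```erlang
%% Confirm the nodes see each other first.
(gproc1@127.0.0.1)4> nodes().
['gproc2@127.0.0.1']

%% Restart gproc on every connected node.
(gproc1@127.0.0.1)5> rpc:multicall([node() | nodes()], application, stop, [gproc]).
{[ok,ok],[]}
(gproc1@127.0.0.1)6> rpc:multicall([node() | nodes()], application, start, [gproc]).
{[ok,ok],[]}
```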
First, let’s make the first node, which has the registered process, unreachable.
The process is available for a short period, but gets removed shortly after the net tick time expires and the connection is timed out.
And, when we heal that partition, we see the process return.
Next, let’s look at how `gproc` handles dynamic cluster membership. First, identify the leader.
Now, take the leader down.
Then, join the third node.
Finally, restore the original leader.
Now, let’s check in on the third node.
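A sketch of what that looks like; the `get_leader` call hangs rather than returning:

```erlang
%% Distribution sees both of the other nodes...
(gproc3@127.0.0.1)1> nodes().
['gproc1@127.0.0.1','gproc2@127.0.0.1']

%% ...but gproc cannot agree on a leader; this call hangs and
%% eventually exits with a timeout.
(gproc3@127.0.0.1)2> gproc_dist:get_leader().
```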
We see that it’s fully connected; however, the `get_leader` call continues to fail with timeouts until we restart `gproc` on the first two nodes. In fact, if we restart `gproc` on all three nodes too quickly, simulating small partitions during the restart period, we run into the same situation again, with nodes unable to determine who the leader is due to deadlock. Some of these issues are also documented in the future work section of Ulf’s 2007 paper.
Let’s register a unique name on the primary node, instead of the non-unique property type we used before.
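A sketch of what that registration looks like; `my_name` is just an example key, and `gproc:where/1` resolves a unique name to a pid:

```erlang
%% Register the current process under a unique, global name.
(gproc1@127.0.0.1)1> gproc:reg({n, g, my_name}).
true
(gproc1@127.0.0.1)2> gproc:where({n, g, my_name}).
<0.196.0>
```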
Now, let’s partition the network.
And, as soon as we heal the partition:
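Checking from the other node, in an illustrative session reusing the name registered above:

```erlang
%% The unique name has vanished from the registry.
(gproc2@127.0.0.1)1> gproc:where({n, g, my_name}).
undefined
```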
What? What happened to the registered name? Did the process die?
Nope.
But, it eventually returns. This is the behaviour when the leader becomes unavailable and there is a period of time during which no leader is elected.
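Once a new leader is elected, the same lookup resolves again; an illustrative check:

```erlang
%% After leader election settles, the name is visible once more.
(gproc2@127.0.0.1)2> gproc:where({n, g, my_name}).
<0.196.0>
```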
Permanent data loss is also considered a possibility with some of the data structures, as mentioned in the paper. It’s important to note that I couldn’t trigger this type of behaviour with the non-unique property types.
I didn’t review the counter types (single, shared, or aggregated) for data loss, as I’m saving that topic for a future blog post.
While `gproc` has been tested very thoroughly using utilities like Erlang QuickCheck, its reliance on `gen_leader` is problematic. Despite numerous forks and reimplementations of `gen_leader` attempting to address the data loss and netsplit issues, none has been proven to operate correctly and guarantee termination. Above, using a few small simulated tests across a maximum cluster size of three nodes, we’ve observed data loss, failure to handle dynamic membership, and timeout situations due to deadlock and failed leader election.
In conclusion, `gproc` seems to operate extremely well when used in a local environment, but any use of it in a distributed system, where you have dynamic cluster management and network partitions, appears to be non-deterministic due to its reliance on `gen_leader`.
Feedback encouraged!
Thanks to OJ Reeves and Heinz N. Gies for providing valuable feedback on this blog post.
View this story on Hacker News.