TWS-4: Gossip protocol: Epidemics and rumors to the rescue

Having successfully completed a grueling yet enjoyable ‘Cloud Computing Concepts’ course at Coursera, from the University of Illinois at Urbana-Champaign, by Prof Indranil Gupta, I continue on my “Thinking Web Scale (TWS)” series of posts. In this post, I would like to dwell on Gossip Protocol.

Gossip protocol finds its way into distributed system from Epidemiology, a branch of science, which studies and models how diseases, rumors spread through society. The gossip protocol disseminates information – the way diseases, rumors spread in society or the way a computer virus is able to infect large networks very rapidly

Gossip protocol is particularly relevant in large distributed systems with hundreds and hundreds of servers spread across multiple data centers for e.g. Social networks like Facebook, Google or Twitter etc.. The servers that power Google’s search, or the Facebook or Twitter engine is made of hundreds of commercial off the shelf (COTS) computers. This is another way of saying that the designers of these systems should fold extremely high failure rates of the servers into their design. In other words “failures will be the norm and not the exception”

As mentioned in my earlier post, in these large distributed systems servers will be fail and new servers will be continuously joining the system. The distributed system must be able to accommodate servers joining or leaving the system. There is no global clock and each server has its own clock. To handle server failures data is replicated over many servers which obviously leads to issues of maintaining data consistency between the replicas.

A well-designed distributed system must include in its design key properties of

Availability – Data should be available when you want it
Consistency – Data should consistent across multiple copes
Should be fault tolerant
Should be scalable
Handle servers joining or leaving the systems transparently

One interesting aspect of Distributed Systems much like Operating System (OS) is the fact that a lot of the design choices are based on engineering judgments. The design choices are usually a trade-off of slightly different performance characteristics. Some of them are obvious and some not so obvious.

Why Gossip protocol? What makes it attractive?

Here are some approaches

Centralized Server:

Let us assume that in a network of servers we have a server (Server A) has some piece of information which it needs to spread to other servers. One way is to have this server send the message to all the servers. While this would work there are 2 obvious deficiencies with this approach

The Server A will hog the bandwidth in transmitting the information to all other servers
Server A will be a hot spot besides also being a Single Point of Failure

Cons: In other words if we have a central server always disseminating information then we run into the issue of ‘Single point of Failure’ of this central server.

Directed Graph

Assuming that we construct a directed overlay graph over the network of servers, we could transmit the message from server A to all other servers. While this approach, has the advantage of lesser traffic as each server node will typically have around a 1 -3 children. This will result in lesser bandwidth utilization. However the disadvantage to this approach, will be that , when an intermediate non-leaf node fails then information will not reach all children of the failed nodes.

Cons: Does not handle failures of non-leaf nodes well

Ring Architecture

In this architecture we could have Server A, pass the message round the ring till it gets to the desired server. Clearly each node has one predecessor and one successor. Like the previous example this has the drawback that if one or more servers of the ring fail then the message does not get to its destination.

Cons: Does not handle failures of nodes in the ring well

Note: We should note that these engineering choices only make sense in certain circumstances. So for e.g. the directed graph or the ring structure discussed below have deficiencies for the distributed system case, however these are accepted design patterns in computer networking for e.g. the Token Ring IEEE 802.5 and graph of nodes in a network. Hierarchical trees are the norm in telecom networks where international calls reach the main trunk exchange, then the central office and finally to the local office in a route that is a root-non-leaf-leaf route.

Gossip protocol

Enter the Gossip protocol (here is a good summary on gossip protocol). In the gossip protocol each server sends the message to ‘b’ random peers. The value ‘b’ typically a small number is called the fan-out. The server A which has the data is assumed to be ‘infected’. In the beginning only server A is infected while all other servers are ‘susceptible’. Each server receiving the message is now considered to be infected. Each infected server transmits to ‘b’ other servers. It is likely that the receiving sever is already infected in which case it will drop the message.

In many ways this is similar to the spread of a disease is through a virus. The disease spreads when an infected person comes in contact with another person.

The nice part about the gossip protocol is that is light weight and it can infect the entire set of servers in the order of O (log N)

This is fairly obvious as each round the ‘b’ infected servers will infect ‘b*n’ other servers where ‘n’ is the fan-out.
The computation is as follows

Let x0 = n (Initial state, all un-infected) and y0 =1 (1 infected server) at time t = 0
With x0 + y= n + 1 at all times

Let β be the contact rate between the ‘susceptible’ and ‘infected’ (x*y), then the rate of infection can be represents as
dx/dt= -βxy

The negative sign indicates that the number of ‘non-infected’ servers will decrease over time
(It is amazing how we can capture the entire essence of the spread of disease through a simple, compact equation)

The solution for the above equation (which I have taken in good faith, as my knowledge in differential equations is a faint memory. Hope to refresh my memory when I get the chance, though!)
x=n(n+1)/(n+e^β(n+1)t ) – 1
y=(n+1)/(1+ne^(-β(n+1)t)) – 2

The solution (1) clearly shows that the number ‘x’ of un-infected servers at time‘t’ rapidly to 0 as the denominator becomes too large. The number of infected units ‘y’ as t increases tends to n+1, or in other words all servers get infected

This method where infected server sends a message to ‘b’ servers is known as the ‘push’ approach.

Pros: The Gossip protocol clearly is more resilient to servers failing as the gossip message is sent a ‘b’ random targets and can handle failures better.
Cons: There is a possibility that the ‘b’ random targets selected for infection are already infected, in which case the infection can die rapidly if these infected servers fail.

The solution for the above is to have a ‘pull’ approach where after a time ‘t’ the un-infected servers pull the data from random servers. This way the un-infected servers will also get infected if they pull the data from already infected servers

A third approach is to have a combination of a push-pull approach.
Gossip has been used extensively in Facebook’s and Apache’s Cassandra NoSQL database. Amazon’s Dynamo DB and Riak NoSQL DB also use forms of Gossip Protocol

Failure detection: Gossip protocol has been used extensively in detecting failures. The failed servers are removed from the membership list and this is list is gossiped so that all servers have a uniform view of the set of live servers. However, as with any approach this is prone to high rate false-positives, where servers are assumed to have failed even though this may have been marked as ‘failed’ because of a temporary network failure. Moreover the network load on epidemic style membership lists are also high.

Some methods to handle false positives is to initially place failed servers under a ‘suspicion’. When the number of messages attributing failure to this server increases above a threshold ‘t’, then the server is assumed to have failed and removed from the membership list.

Cassandra uses a failure ‘accrual’ mechanism to detect failures in the distributed NoSQL datanase

Epidemic protocols, like the gossip protocol are particularly useful in large scale distributed systems where servers leave and join the system.

One interesting application of the epidemic protocol is to simply to collect the overall state of the system. If we consider an information exchange where all nodes have set an internal value xi = 0 except node 1 which has x1=1 (infected) (from the book Distributed Systems: Principles & paradigms by Andrew Tannenbaum and Maarten Van Steen)

where xi = 1 if i =1, or 0 if i > 1
If the nodes gossip this value and compute the average (xi + xj) /2, then after a period of time this value will tend towards 1/N where N is the total number of nodes in the system. Hence all the servers in the system will become aware of the total size of the system.

Conclusion: Gossip protocol has widespread application in distributed systems of today, from spreading information, membership, failure detection, monitoring and alarming. It is really interesting to note that the theory of epidemics or disease spread from a branch of sociology become so important in a field of computer science.

Also see
1. A crime map of India in R: Crimes against women
2. What’s up Watson? Using IBM Watson’s QAAPI with Bluemix, NodeExpress – Part 1
3. Bend it like Bluemix, MongoDB with autoscaling – Part 2
4. Informed choices through Machine Learning : Analyzing Kohli, Tendulkar and Dravid
5. Thinking Web Scale (TWS-3): Map-Reduce – Bring compute to data