Why a CA System Is Not Possible in a Distributed World

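We all know the CAP theorem for distributed systems. It says that a distributed system cannot guarantee all three of the following properties at the same time.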

Consistency: Any node that is not down must return the latest write when data is read from it.

Availability: Any node that is not down must return a non-error response for a read. The data might not be the most recent.

Partition Tolerance: If the network between nodes breaks down, the system should keep functioning and be able to serve requests.

Now, can we have a distributed system that never sees a network partition? For all practical purposes, the answer is no. Any distributed system at scale, with hundreds or thousands of nodes (whether in the same data center or in different data centers across the globe), will see network failures, and we have to handle them. The only question is whether we make our system CP, AP or, most likely, something in between the two.

Suppose there is a network partition caused by a network breakdown or network delay. Group-A has 5 nodes and Group-B has 3 nodes. Nodes can communicate within the same group, but communication between the two groups is broken. So an update on a Group-A node won't be visible to Group-B nodes, and vice versa.
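To make the scenario concrete, here is a minimal Python sketch of such a partition. The `Node` class and the `replicate` function are hypothetical names made up for this illustration, and replication is reduced to a simple loop; a real system would do this over the network.

```python
class Node:
    """A toy replica holding a key-value store (hypothetical, for illustration)."""

    def __init__(self, name, group):
        self.name = name
        self.group = group
        self.data = {}


def replicate(source, nodes, key, value, partitioned):
    """Apply a write accepted by `source` to every node it can still reach."""
    for node in nodes:
        # During a partition only nodes in the same group are reachable.
        reachable = (not partitioned) or node.group == source.group
        if reachable:
            node.data[key] = value


# 5 nodes in Group-A and 3 nodes in Group-B, as in the example above.
nodes = [Node(f"a{i}", "A") for i in range(5)] + [Node(f"b{i}", "B") for i in range(3)]
group_a = [n for n in nodes if n.group == "A"]
group_b = [n for n in nodes if n.group == "B"]

# A write accepted by a Group-A node while the network is partitioned.
replicate(group_a[0], nodes, "x", 1, partitioned=True)

print([n.data.get("x") for n in group_a])  # [1, 1, 1, 1, 1]
print([n.data.get("x") for n in group_b])  # [None, None, None]
```

The write reaches all five Group-A nodes, but none of the Group-B nodes ever sees it.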

If we want to make our system CP, we need to cordon off one group, say Group-B, and send all requests only to Group-A. We are compromising on availability, but our system is CP. It's not that our system is completely unavailable; we still have partial availability of nodes. If Group-A can handle the additional traffic, then availability-wise our system is doing just fine from the user's point of view. Once the network partition has recovered and the nodes in Group-B are back in sync, they can be added back to the distributed system.
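A rough Python sketch of the CP choice could look like the following. `CPCluster`, `Unavailable` and the method names are all hypothetical, and keeping the largest group serving is an assumption made for this sketch (the text only says that one group gets cordoned off). Each group is modelled as a single dictionary, since nodes inside a group stay in sync.

```python
class Unavailable(Exception):
    """Raised when a cordoned group refuses a request."""


class CPCluster:
    def __init__(self, groups):
        # groups: mapping of group name -> number of nodes in that group
        self.groups = groups
        self.store = {name: {} for name in groups}
        self.cordoned = set()

    def on_partition(self):
        # Assumption: the largest group keeps serving, the rest are cordoned off.
        serving = max(self.groups, key=self.groups.get)
        self.cordoned = set(self.groups) - {serving}
        return serving

    def write(self, group, key, value):
        if group in self.cordoned:
            raise Unavailable(f"{group} is cordoned off")
        # The write is replicated to every group that is still serving.
        for name in self.groups:
            if name not in self.cordoned:
                self.store[name][key] = value

    def read(self, group, key):
        if group in self.cordoned:
            raise Unavailable(f"{group} is cordoned off")
        return self.store[group].get(key)

    def on_heal(self):
        # Bring cordoned groups back in sync before adding them back.
        serving = next(g for g in self.groups if g not in self.cordoned)
        for name in self.cordoned:
            self.store[name] = dict(self.store[serving])
        self.cordoned = set()


cluster = CPCluster({"Group-A": 5, "Group-B": 3})
cluster.write("Group-B", "x", 1)   # before the partition, every group gets it
cluster.on_partition()             # Group-B is cordoned off
cluster.write("Group-A", "x", 2)   # only Group-A sees this write

try:
    cluster.read("Group-B", "x")
except Unavailable as err:
    print("refused:", err)         # Group-B gives an error instead of stale data

cluster.on_heal()
print(cluster.read("Group-B", "x"))  # 2
```

Group-B never returns the stale value 1. It either refuses the request or, after the partition heals, returns the latest write.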

Now, if we want to make our system AP, both Group-A and Group-B keep serving requests. But the data won't be in sync when subsequent requests go to different groups, so a user might see stale or inconsistent results for some requests. That may or may not be fine for your use case. Once the network partition has recovered, there has to be some mechanism to bring the groups back to a consistent state (there may be some data loss in the process).
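As an illustration of the AP choice, the Python sketch below lets both groups accept writes and then reconciles them with last-write-wins. Last-write-wins is only one possible mechanism, picked here as an assumption because it makes the data loss easy to see. The function names and the caller-supplied logical timestamps are hypothetical too.

```python
def write(replica, key, value, timestamp):
    """Store the value together with the timestamp of the write."""
    replica[key] = (timestamp, value)


def read(replica, key):
    """Always answers, even if the value may be stale."""
    entry = replica.get(key)
    return entry[1] if entry else None


def merge_last_write_wins(a, b):
    """For every key keep the entry with the newest timestamp."""
    merged = dict(a)
    for key, entry in b.items():
        if key not in merged or entry[0] > merged[key][0]:
            merged[key] = entry
    return merged


# Each group is modelled as one dictionary during the partition.
group_a, group_b = {}, {}

# Both groups keep accepting writes for the same key.
write(group_a, "cart", ["book"], timestamp=1)
write(group_b, "cart", ["pen"], timestamp=2)

print(read(group_a, "cart"))  # ['book']
print(read(group_b, "cart"))  # ['pen']

# The partition heals and both groups converge on the merged state.
healed = merge_last_write_wins(group_a, group_b)
group_a, group_b = dict(healed), dict(healed)

print(read(group_a, "cart"))  # ['pen']
```

Both groups agree again after the merge, but the "book" write accepted by Group-A is gone. That is the kind of data loss an AP system has to be ready for.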

But then why does the CA system come up at all? CA can be a valid scenario if you have a single-node system, but not in a distributed system. In the practical distributed world, we always try to find a trade-off somewhere between a CP and an AP system. Network partitions are real, after all.
