Distributed Systems
CAP, PACELC, and Consensus (feat. Apache Kafka).
Distributed Systems Theory
A distributed system is a group of computers working together that appears to the user as a single computer.
The defining characteristic is Partial Failure. In a monolith, if the CPU dies, the whole system stops. In a distributed system, a node can die or the network can drop a packet, yet the system must keep running.
Note on Examples
To make these abstract concepts concrete, I use Apache Kafka as the primary case study throughout this page. It is the quintessential distributed system that forces you to make these trade-offs explicitly via configuration.
The 8 Fallacies
L. Peter Deutsch's famous list. If you assume these are true, your system will fail.
- The network is reliable. (It isn't. Packets drop.)
- Latency is zero. (Speed of light is a hard limit.)
- Bandwidth is infinite.
- The network is secure.
- Topology doesn't change. (Pods scale up/down constantly.)
- There is one administrator.
- Transport cost is zero. (Serialization is expensive.)
- The network is homogeneous.
CAP Theorem
You can only have two of the three: Consistency, Availability, or Partition Tolerance.
Since Network Partitions (P) are a fact of physics (cables get cut), you must choose between CP and AP.
CAP in Practice: CP vs AP Systems
CP System (Consistency over Availability)
Characteristics:
- Primary node handles all writes
- Data synchronously replicated to replicas
- If partition occurs, system becomes unavailable rather than inconsistent
- Use Case: Financial transactions, inventory management
- Example: Traditional banking systems with strong consistency requirements
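To make the CP behavior concrete, here is a minimal sketch (the Replica interface and the "all replicas must confirm" rule are invented for illustration, not taken from any particular database): the write is acknowledged only after every replica confirms it, and it is rejected outright if any replica is unreachable.

```java
import java.util.List;

// Hypothetical replica abstraction; not a real client API.
interface Replica {
    boolean replicate(String key, String value); // false if unreachable
}

class CpStore {
    private final List<Replica> replicas;

    CpStore(List<Replica> replicas) {
        this.replicas = replicas;
    }

    // Synchronous replication: the write succeeds only if EVERY replica
    // acknowledges it. During a partition the write is rejected, so the
    // system chooses consistency over availability.
    boolean write(String key, String value) {
        for (Replica replica : replicas) {
            if (!replica.replicate(key, value)) {
                return false; // refuse the write rather than let replicas diverge
            }
        }
        return true;
    }
}
```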
AP System (Availability over Consistency)
Characteristics:
- Any node can handle writes
- Data eventually synchronized in background
- System remains available during partitions
- Use Case: Social media feeds, logging systems
- Example: Instagram feed updates, distributed caching
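An AP-style node does the opposite: it acknowledges writes from local state immediately and reconciles with peers later. A minimal sketch, again with invented types and a deliberately naive merge:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical AP-style node: accepts writes locally, syncs in the background.
class ApNode {
    private final Map<String, String> data = new ConcurrentHashMap<>();

    // Acknowledged immediately from local state, even if peers are unreachable.
    void write(String key, String value) {
        data.put(key, value);
    }

    String read(String key) {
        return data.get(key); // may be stale until reconciliation runs
    }

    // Naive background reconciliation: copy everything the peer has.
    // Real systems resolve conflicts with timestamps, versions, or CRDTs.
    void syncFrom(ApNode peer) {
        data.putAll(peer.data);
    }
}
```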
Case Study: Kafka and CAP
Kafka is a useful case study because it lets you choose CP-leaning or AP-leaning behavior via configuration.
- CP Mode (Strong Consistency):
  - Config: `acks=all`, `min.insync.replicas=2`
  - Behavior: The producer waits for the Leader AND a Follower to acknowledge the data. If the Follower is down (Partition), the write fails (Unavailable). Data is safe, but uptime suffers.
- AP Mode (High Availability):
  - Config: `acks=1` (or `acks=0`)
  - Behavior: The Leader acknowledges immediately. If the Leader dies before replicating to a Follower, data is lost. But the system accepts writes faster and stays up even if Followers are down.
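As a rough illustration of the CP configuration with the standard Java producer client (topic name, key, and bootstrap address are placeholders; `min.insync.replicas=2` is assumed to be set on the topic or broker):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class CpProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // CP mode: wait for the leader AND all in-sync replicas.
        // min.insync.replicas is a topic/broker setting, not a producer one;
        // together they make the write fail when too few replicas are alive.
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("payments", "order-42", "charged"),
                (metadata, exception) -> {
                    if (exception != null) {
                        // e.g. NotEnoughReplicasException during a partition
                        System.err.println("Write rejected: " + exception.getMessage());
                    }
                });
        }
    }
}
```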
PACELC Theorem
CAP only applies when there is a failure. What about when the system is running normally?
If there is a Partition (P), trade off Availability (A) and Consistency (C). Else (E) when running normally, trade off Latency (L) and Consistency (C).
The Kafka Trade-off:
- High Consistency = High Latency: Using `acks=all` forces the system to wait for network replication to multiple nodes. This adds milliseconds to every request.
- Low Consistency = Low Latency: Using `acks=0` (fire and forget) is near-instant, but risky.
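The latency half of the trade-off is visible in the same producer settings. A sketch of a low-latency, low-durability configuration (values are illustrative, not recommendations):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class LowLatencyConfigSketch {
    public static Properties lowLatencyProps() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        // acks=0: fire and forget. The producer never waits for a broker
        // acknowledgement, so a dead leader silently loses data.
        props.put(ProducerConfig.ACKS_CONFIG, "0");

        // linger.ms=0: send batches immediately instead of waiting to fill them,
        // trading throughput for lower per-message latency.
        props.put(ProducerConfig.LINGER_MS_CONFIG, "0");
        return props;
    }
}
```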
Consistency Models
It is not just "Strong" vs "Weak." It is a spectrum.
- Linearizability (Strongest): Acts like a single machine.
- Sequential Consistency: All nodes see operations in the same order, consistent with each client's program order (but not necessarily with real time).
- Eventual Consistency: Everyone will agree... eventually.
Kafka's Ordering Guarantee
Kafka provides Partition-level Ordering.
If you send Message A then Message B to Partition 1, Kafka guarantees B will always come after A.
However, if you send A to Partition 1 and B to Partition 2, there is no global guarantee which one a consumer sees first. This is a trade-off: sacrificing Global Ordering for Massive Parallelism.
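In practice, you recover per-entity ordering by giving related messages the same key, which routes them to the same partition. A brief sketch with the Java client (topic and event names are made up):

```java
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class OrderingSketch {
    // All events with the same key hash to the same partition, so Kafka
    // preserves their relative order. Events for different keys may land
    // on different partitions, with no global ordering between them.
    static void recordUserEvents(Producer<String, String> producer, String userId) {
        producer.send(new ProducerRecord<>("user-events", userId, "logged_in"));
        producer.send(new ProducerRecord<>("user-events", userId, "added_to_cart"));
        producer.send(new ProducerRecord<>("user-events", userId, "checked_out"));
    }
}
```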
Consensus Algorithms
How do nodes agree on shared state, such as who the Leader is?
Raft (The Modern Standard)
Raft is designed to be understandable. It relies on a Leader receiving writes and replicating them to Followers. A write is "Committed" only when a Quorum (Majority) confirms it.
Raft Consensus Flow
Key Raft Concepts:
- Leader Election: Nodes vote for leader based on term and log completeness
- Log Replication: Leader appends to local log, then replicates to followers
- Commit Rules: An entry is committed once a majority of the cluster (the leader counts itself) has stored it (see the sketch after this list)
- Safety: At most one leader can be elected per term, and a committed entry is never overwritten or lost
- Liveness: System continues operating as long as majority of nodes are available
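Here is a toy sketch of the two quorum checks described above (state is heavily simplified; a real Raft node tracks much more, including which candidate it already voted for in the current term):

```java
// Simplified Raft-style bookkeeping, for illustration only.
class RaftNodeSketch {
    long currentTerm;
    long lastLogIndex;
    long lastLogTerm;

    // Commit rule: an entry is committed once it is stored on a majority
    // of the cluster (acks includes the leader's own copy).
    static boolean isCommitted(int acks, int clusterSize) {
        return acks > clusterSize / 2;
    }

    // Election rule: grant a vote only to a candidate whose term is not
    // behind ours and whose log is at least as up-to-date as ours.
    boolean grantVote(long candidateTerm, long candidateLastLogTerm, long candidateLastLogIndex) {
        if (candidateTerm < currentTerm) {
            return false;
        }
        if (candidateLastLogTerm != lastLogTerm) {
            return candidateLastLogTerm > lastLogTerm;
        }
        return candidateLastLogIndex >= lastLogIndex;
    }
}
```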
Kafka's Consensus (KRaft vs ZooKeeper)
- Historically (ZooKeeper): Kafka used an external system (ZooKeeper/ZAB) to elect leaders. This was complex to manage.
- Modern (KRaft): Kafka now uses an internal Raft-based controller. The metadata log is just a topic inside Kafka itself.
ZooKeeper vs KRaft Architecture Comparison
Benefits of KRaft:
- Simplified Operations: No external coordination service
- Lower Latency: Metadata changes no longer require a round trip to an external ZooKeeper ensemble
- Reduced Complexity: Fewer moving parts to manage
- Better Scalability: The quorum-based controller supports far more partitions per cluster than the ZooKeeper-based design
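For reference, KRaft mode is enabled through broker configuration alone. A sketch of the relevant server.properties keys (node IDs and host names are placeholders):

```properties
# This node acts as both a broker and a KRaft controller (combined mode).
process.roles=broker,controller
node.id=1

# The Raft quorum that replicates the cluster metadata log: id@host:port
controller.quorum.voters=1@kafka-1:9093,2@kafka-2:9093,3@kafka-3:9093

# Listener used for controller (Raft) traffic.
controller.listener.names=CONTROLLER
```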
Real-World Distributed System Examples
Example 1: E-Commerce Platform
Distributed Challenges:
- Order Processing: Must ensure ACID properties across services
- Inventory Management: Real-time stock updates across multiple warehouses
- Payment Processing: Exactly-once processing to prevent double charges
- User Session Management: Consistent session state across services
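For the exactly-once requirement, Kafka's idempotent and transactional producer is the usual building block. A minimal sketch with the Java client (transactional ID, topics, and keys are placeholders); note that end-to-end exactly-once also requires consumers to read with isolation.level=read_committed:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ExactlyOnceSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");     // dedup on retries
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "payments-1"); // placeholder id

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            try {
                producer.beginTransaction();
                producer.send(new ProducerRecord<>("payments", "order-42", "charge:19.99"));
                producer.send(new ProducerRecord<>("ledger", "order-42", "debit:19.99"));
                producer.commitTransaction(); // both records become visible atomically
            } catch (Exception e) {
                // Fatal errors (e.g. ProducerFencedException) would require
                // closing the producer instead of aborting.
                producer.abortTransaction();  // neither record is exposed to consumers
            }
        }
    }
}
```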
Example 2: Microservices with Event Sourcing
Event Sourcing Benefits:
- Complete Audit Trail: Every state change is recorded
- Temporal Queries: Reconstruct system state at any point in time
- Debugging: Replay events to reproduce issues
- Scalability: Read and write sides scale independently
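A minimal sketch of the core event-sourcing idea, using an invented bank-account aggregate: current state is never stored directly, it is rebuilt by replaying the event log, which is what makes temporal queries and replay-based debugging possible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical domain events; in a real system these would live in a log or topic.
record Deposited(long cents) {}
record Withdrawn(long cents) {}

class AccountAggregate {
    private final List<Object> events = new ArrayList<>();
    private long balanceCents;

    // Commands append events instead of mutating state in place.
    void deposit(long cents)  { apply(new Deposited(cents)); }
    void withdraw(long cents) { apply(new Withdrawn(cents)); }

    long balance() { return balanceCents; }

    private void apply(Object event) {
        events.add(event);
        if (event instanceof Deposited d) balanceCents += d.cents();
        if (event instanceof Withdrawn w) balanceCents -= w.cents();
    }

    // Temporal query: replay the first N events to reconstruct past state.
    long balanceAfter(int eventCount) {
        long balance = 0;
        for (Object event : events.subList(0, eventCount)) {
            if (event instanceof Deposited d) balance += d.cents();
            if (event instanceof Withdrawn w) balance -= w.cents();
        }
        return balance;
    }
}
```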
Logical Clocks
In distributed systems, you cannot rely on wall-clock time (Date.now()) because server clocks drift.
Clock Synchronization Issues
How Kafka Solves This: Kafka uses Offsets. An Offset is simply an integer (0, 1, 2, 3...) that records the order in which events were appended to a partition. It doesn't matter what the wall-clock time is; if Message A has Offset 50 and Message B has Offset 51 in the same partition, we know A happened before B.
This is, in effect, a logical clock: ordering comes from a counter rather than from wall-clock time, the same idea behind a Lamport Timestamp.
Logical Clock Example
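A minimal Lamport clock sketch: each node keeps a plain counter, increments it on every local event or send, and fast-forwards past any larger timestamp it receives, so a received message is always ordered after its send.

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal Lamport clock: a logical counter, no wall-clock time involved.
class LamportClock {
    private final AtomicLong counter = new AtomicLong(0);

    // Local event or message send: tick once and use the new value as the timestamp.
    long tick() {
        return counter.incrementAndGet();
    }

    // Message receive: jump ahead of the sender's timestamp so that
    // the receive is always ordered after the send.
    long onReceive(long remoteTimestamp) {
        return counter.updateAndGet(local -> Math.max(local, remoteTimestamp) + 1);
    }

    long current() {
        return counter.get();
    }
}
```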
Benefits of Logical Clocks:
- No Synchronization Overhead: Don't need NTP or time sync protocols
- Causal Ordering: If event A caused event B, A's logical timestamp is smaller (detecting causality in full requires vector clocks)
- Consistent Ordering: With a tie-breaker such as node ID, all nodes can agree on a single total order of events
- Fault Tolerant: Works even if some nodes are down