Engineering Playbook

Distributed Systems

CAP, PACELC, and Consensus (feat. Apache Kafka).

Distributed Systems Theory

A distributed system is a group of computers working together that appears to the user as a single computer.

The defining characteristic is Partial Failure. In a monolith, if the CPU dies, the whole system stops. In a distributed system, one node can die or the network can drop a packet, and the rest of the system must keep running.

Note on Examples

To make these abstract concepts concrete, I use Apache Kafka as the primary case study throughout this page. It is the quintessential distributed system that forces you to make these trade-offs explicitly via configuration.


The 8 Fallacies

L. Peter Deutsch's famous list of false assumptions. If you design as though they hold, your system will eventually fail.

  1. The network is reliable. (It isn't. Packets drop, so every remote call needs a timeout and a retry policy; see the sketch after this list.)
  2. Latency is zero. (Speed of light is a hard limit.)
  3. Bandwidth is infinite.
  4. The network is secure.
  5. Topology doesn't change. (Pods scale up/down constantly.)
  6. There is one administrator.
  7. Transport cost is zero. (Serialization is expensive.)
  8. The network is homogeneous.
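
Fallacy #1 in particular drives everyday defensive code: every remote call gets a timeout, bounded retries, and backoff. A minimal sketch, assuming the Callable stands in for an arbitrary remote call; the attempt count and delays are illustrative, not recommendations.

```java
import java.time.Duration;
import java.util.concurrent.*;

public class RetryingClient {

    private static final ExecutorService POOL = Executors.newCachedThreadPool();

    // Bounded retries with exponential backoff and a per-attempt timeout.
    // Only safe for idempotent operations: a "failed" call may still have succeeded remotely.
    static <T> T callWithRetry(Callable<T> remoteCall, int maxAttempts, Duration timeout) throws Exception {
        Duration backoff = Duration.ofMillis(100);      // illustrative starting delay
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Future<T> inFlight = POOL.submit(remoteCall);
            try {
                return inFlight.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
            } catch (TimeoutException | ExecutionException e) {
                inFlight.cancel(true);                  // the network is not reliable: give up on this attempt
                last = e;
                Thread.sleep(backoff.toMillis());       // back off before the next attempt
                backoff = backoff.multipliedBy(2);      // exponential backoff
            }
        }
        throw last;                                     // give up after maxAttempts
    }
}
```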

CAP Theorem

You can only have two of the three: Consistency, Availability, or Partition Tolerance.

Since network partitions (P) are a fact of life in any real network (cables get cut, switches fail), you must choose between CP and AP.

CAP in Practice: System Diagrams

CP System (Consistency over Availability)

Characteristics:

  • Primary node handles all writes
  • Data synchronously replicated to replicas
  • If a partition occurs, the system becomes unavailable rather than inconsistent (see the quorum-write sketch after this list)
  • Use Case: Financial transactions, inventory management
  • Example: Traditional banking systems with strong consistency requirements
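
A toy sketch of the CP behaviour described above, assuming a primary that only accepts a write when it can synchronously reach a quorum; the Replica interface is hypothetical, not a real API.

```java
import java.util.List;

// Hypothetical replica interface: returns true if the write was durably applied.
interface Replica {
    boolean replicate(String key, String value);
}

public class CpPrimary {
    private final List<Replica> replicas;

    public CpPrimary(List<Replica> replicas) {
        this.replicas = replicas;
    }

    // A write succeeds only if a majority of the cluster acknowledges it.
    // During a partition the quorum may be unreachable, so the write fails:
    // the system chooses unavailability over inconsistency.
    public boolean write(String key, String value) {
        int acks = 1;                                   // the primary's own copy
        for (Replica r : replicas) {
            try {
                if (r.replicate(key, value)) acks++;
            } catch (RuntimeException partitioned) {
                // unreachable replica: it simply does not count toward the quorum
            }
        }
        int clusterSize = replicas.size() + 1;
        return acks > clusterSize / 2;                  // quorum reached -> committed, else rejected
    }
}
```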

AP System (Availability over Consistency)

Characteristics:

  • Any node can handle writes
  • Data is eventually synchronized in the background
  • The system remains available during partitions (see the sketch after this list)
  • Use Case: Social media feeds, logging systems
  • Example: Instagram feed updates, distributed caching
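
The AP counterpart, sketched under the same kind of assumptions: the node applies the write locally right away and replicates in the background, so it stays available even when peers are unreachable (all names are illustrative).

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical peer interface for background replication.
interface Peer {
    void replicate(String key, String value);
}

public class ApNode {
    private final Map<String, String> localStore = new ConcurrentHashMap<>();
    private final List<Peer> peers;
    private final ExecutorService background = Executors.newSingleThreadExecutor();

    public ApNode(List<Peer> peers) {
        this.peers = peers;
    }

    // The write always succeeds locally and returns immediately;
    // replication to peers happens in the background ("eventually").
    public void write(String key, String value) {
        localStore.put(key, value);                      // available immediately
        background.submit(() -> {
            for (Peer peer : peers) {
                try {
                    peer.replicate(key, value);          // best effort
                } catch (RuntimeException unreachable) {
                    // the peer is partitioned away; it is reconciled later (anti-entropy, hints, etc.)
                }
            }
        });
    }
}
```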

Case Study: Kafka and CAP

Kafka is unique in that it lets you choose CP or AP via configuration; the producer sketch after this list shows the key settings.

  • CP Mode (Strong Consistency):
    • Config: acks=all, min.insync.replicas=2.
    • Behavior: The producer waits for the Leader AND at least one Follower to acknowledge the data. If too few in-sync replicas are reachable (for example, during a partition), the write is rejected (Unavailable). Data is safe, but uptime suffers.
  • AP Mode (High Availability):
    • Config: acks=1 (or 0).
    • Behavior: The Leader acknowledges immediately. If the Leader dies before replicating to a Follower, data is lost. But the system accepts writes faster and stays up even if Followers are down.
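
A minimal Java producer sketch of the CP configuration. The bootstrap address and topic name are placeholders; min.insync.replicas is a topic/broker setting rather than a producer setting, and the AP variant only changes acks.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class CpProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // CP mode: wait for all in-sync replicas. Combined with the topic-level
        // setting min.insync.replicas=2, a write fails when a quorum of replicas
        // is not reachable: consistency over availability.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        // AP mode would instead use: props.put(ProducerConfig.ACKS_CONFIG, "1");  // or "0"

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("payments", "order-42", "charged"),
                    (metadata, exception) -> {
                        if (exception != null) {
                            // e.g. NotEnoughReplicasException when the ISR shrinks below min.insync.replicas
                            System.err.println("Write rejected: " + exception.getMessage());
                        } else {
                            System.out.println("Committed at offset " + metadata.offset());
                        }
                    });
        }
    }
}
```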

PACELC Theorem

CAP only applies when there is a failure. What about when the system is running normally?

If there is a Partition (P), trade off Availability (A) against Consistency (C). Else (E), when the system is running normally, trade off Latency (L) against Consistency (C).

The Kafka Trade-off:

  • High Consistency = High Latency: Using acks=all forces the system to wait for replication across the network to multiple nodes, adding milliseconds to every request (see the timing sketch after this list).
  • Low Consistency = Low Latency: Using acks=0 (fire and forget) is near-instant but risky: the producer never learns whether the broker actually received the message.
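
A rough way to observe the latency side of the trade-off (not a benchmark): block on the send future and time it. With acks=all the wait includes replication to the in-sync replicas; with acks=0 the future completes almost immediately because the producer never waits for the broker. Assumes a producer configured as in the earlier sketch.

```java
import java.util.concurrent.Future;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class LatencyProbe {

    // Sends one record and waits for the acknowledgement, returning the elapsed time.
    // The topic name, key, and value are placeholders.
    static long timedSendMillis(KafkaProducer<String, String> producer, String topic) throws Exception {
        long start = System.nanoTime();
        Future<RecordMetadata> ack = producer.send(new ProducerRecord<>(topic, "k", "v"));
        ack.get();                                       // block until acknowledged (or failed)
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```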

Consistency Models

It is not just "Strong" vs "Weak." It is a spectrum.

  1. Linearizability (Strongest): The system behaves as if there were a single copy of the data; every operation appears to take effect atomically at some point between its start and completion.
  2. Sequential Consistency: Every node observes all operations in the same order, though not necessarily the real-time order.
  3. Eventual Consistency: Everyone will agree... eventually.

Kafka's Ordering Guarantee

Kafka provides Partition-level Ordering.

If a single producer sends Message A and then Message B to Partition 1, Kafka guarantees consumers will always see A before B.

However, if you send A to Partition 1 and B to Partition 2, there is no global guarantee which one a consumer sees first. This is a trade-off: sacrificing Global Ordering for Massive Parallelism.
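
In practice, per-entity ordering is achieved with message keys: Kafka's default partitioner hashes the key, so every record with the same key lands in the same partition and keeps its relative order. A short sketch (topic, keys, and values are placeholders):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class OrderedByKey {
    // All events for a given entity (here: one customer) use the same key,
    // so they hash to the same partition and keep their relative order.
    // Events for different customers may interleave across partitions.
    static void publishCustomerEvents(KafkaProducer<String, String> producer) {
        producer.send(new ProducerRecord<>("customer-events", "customer-17", "created"));
        producer.send(new ProducerRecord<>("customer-events", "customer-17", "updated")); // always after "created"
        producer.send(new ProducerRecord<>("customer-events", "customer-99", "created")); // other partition, no global order
    }
}
```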


Consensus Algorithms

How do nodes agree on who is the Leader?

Raft (The Modern Standard)

Raft is designed to be understandable. It relies on a Leader receiving writes and replicating them to Followers. A write is "Committed" only when a Quorum (Majority) confirms it.

Raft Consensus Flow

Key Raft Concepts:

  • Leader Election: Candidates request votes; a node grants its vote only to a candidate whose term is current and whose log is at least as up-to-date as its own
  • Log Replication: The Leader appends to its local log, then replicates the entry to Followers
  • Commit Rules: An entry is committed once a majority of the cluster (Leader included) has written it; a toy sketch of this rule follows the list
  • Safety: At most one Leader can be elected per term, and a committed entry is never overwritten or lost
  • Liveness: The cluster keeps making progress as long as a majority of nodes can communicate
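
A deliberately over-simplified sketch of just the quorum-commit rule; leader election, terms, heartbeats, and log repair are all omitted, and the Follower interface is hypothetical rather than any real Raft library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical follower interface: true if the entry was appended to the follower's log.
interface Follower {
    boolean appendEntry(long index, String entry);
}

public class ToyRaftLeader {
    private final List<Follower> followers;
    private final List<String> log = new ArrayList<>();
    private long commitIndex = -1;                       // highest index known to be committed

    public ToyRaftLeader(List<Follower> followers) {
        this.followers = followers;
    }

    // 1. Append to the leader's own log.
    // 2. Replicate to followers.
    // 3. Commit once a majority of the cluster (leader included) holds the entry.
    public boolean propose(String entry) {
        log.add(entry);
        long index = log.size() - 1;

        int replicas = 1;                                // the leader's copy counts
        for (Follower f : followers) {
            try {
                if (f.appendEntry(index, entry)) replicas++;
            } catch (RuntimeException unreachable) {
                // a partitioned follower simply does not contribute to the quorum
            }
        }

        int clusterSize = followers.size() + 1;
        if (replicas > clusterSize / 2) {
            commitIndex = index;                         // safe: a majority holds the entry
            return true;
        }
        return false;                                    // not committed; may be overwritten by a future leader
    }
}
```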

Kafka's Consensus (KRaft vs ZooKeeper)

  • Historically (ZooKeeper): Kafka depended on an external coordination service (ZooKeeper, running the ZAB protocol) to elect the controller and store cluster metadata. This was complex to manage.
  • Modern (KRaft): Kafka now uses an internal Raft-based controller. The metadata log is just a topic inside Kafka itself.

ZooKeeper vs KRaft Architecture Comparison

Benefits of KRaft:

  • Simplified Operations: No external coordination service to deploy, secure, and upgrade
  • Lower Latency: Controller failover and metadata propagation no longer require round trips to an external system
  • Reduced Complexity: Fewer moving parts to manage
  • Better Performance: The metadata log is handled like any other Kafka log, which scales to far more partitions per cluster

Real-World Distributed System Examples

Example 1: E-Commerce Platform

Distributed Challenges:

  • Order Processing: Must preserve transactional integrity for each order across multiple services
  • Inventory Management: Real-time stock updates across multiple warehouses
  • Payment Processing: Exactly-once processing to prevent double charges (see the transactional producer sketch after this list)
  • User Session Management: Consistent session state across services
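
For the exactly-once bullet, Kafka's idempotent and transactional producer is the usual building block: enable.idempotence suppresses duplicates caused by retries, and a transactional.id lets several sends commit or abort atomically. A sketch with simplified error handling; the address, topic names, and transactional id are placeholders.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ExactlyOnceChargeProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");       // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");                // dedupe broker-side retries
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "payment-service-1");     // placeholder id

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            try {
                producer.beginTransaction();
                producer.send(new ProducerRecord<>("charges", "order-42", "charged:19.99"));
                producer.send(new ProducerRecord<>("ledger", "order-42", "debit:19.99"));
                producer.commitTransaction();        // both records become visible atomically
            } catch (RuntimeException e) {
                producer.abortTransaction();         // neither record is exposed to read_committed consumers
                throw e;
            }
        }
    }
}
```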

Example 2: Microservices with Event Sourcing

Event Sourcing Benefits:

  • Complete Audit Trail: Every state change is recorded
  • Temporal Queries: Reconstruct system state at any point in time
  • Debugging: Replay events to reproduce issues (a minimal replay sketch follows this list)
  • Scalability: Read and write sides scale independently
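
A minimal in-memory sketch of the pattern, deliberately decoupled from Kafka; in a real system the event list would live in a Kafka topic or another append-only store, and all names here are illustrative (uses a Java 16+ record).

```java
import java.util.ArrayList;
import java.util.List;

public class AccountEventStore {

    // An immutable event: the fact that something happened, never updated in place.
    record AccountEvent(String accountId, long amountCents) {}

    private final List<AccountEvent> log = new ArrayList<>();    // append-only audit trail

    // Writes only ever append; nothing is overwritten or deleted.
    public void append(AccountEvent event) {
        log.add(event);
    }

    // Current state is derived by replaying every event from the beginning.
    // Replaying only a prefix of the log reconstructs the state at an earlier point in time.
    public long balanceOf(String accountId) {
        long balance = 0;
        for (AccountEvent e : log) {
            if (e.accountId().equals(accountId)) {
                balance += e.amountCents();
            }
        }
        return balance;
    }
}
```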

Logical Clocks

In distributed systems, you cannot rely on wall-clock time (Date.now()) because server clocks drift.

Clock Synchronization Issues

How Kafka Solves This: Kafka uses Offsets. An Offset is simply an integer (0, 1, 2, 3...) that records the order in which events were appended to a partition. It doesn't matter what the wall-clock time is; if Message A has Offset 50 and Message B has Offset 51 in the same partition, we know A was written before B.

This is a practical implementation of a Lamport Timestamp (or logical counter).

Logical Clock Example
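
A minimal sketch of a Lamport-style logical clock: each node increments its counter on every local event and merges the sender's counter on receive. This is the same idea an offset implements for a single partition, generalised to many independent nodes.

```java
import java.util.concurrent.atomic.AtomicLong;

public class LamportClock {
    private final AtomicLong counter = new AtomicLong(0);

    // Called for every local event, including just before sending a message.
    public long tick() {
        return counter.incrementAndGet();
    }

    // Called when a message stamped with the sender's clock arrives:
    // jump ahead of whatever the sender had seen, then count this receive event.
    public long onReceive(long senderTimestamp) {
        long current;
        long next;
        do {
            current = counter.get();
            next = Math.max(current, senderTimestamp) + 1;
        } while (!counter.compareAndSet(current, next));
        return next;
    }
}
```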

Benefits of Logical Clocks:

  • No Synchronization Overhead: Don't need NTP or time sync protocols
  • Causal Ordering: If event A causally precedes event B, A's counter is guaranteed to be smaller (detecting causality in the reverse direction requires vector clocks)
  • Distributed Consistency: All nodes can derive the same total ordering of events, breaking ties with node IDs
  • Fault Tolerant: Works even if some nodes are down