Consistency Models in Distributed Systems
After understanding causal ordering, the logical next step is to dive deep into consistency models, as they directly influence how distributed systems like Zanzibar handle correctness, availability, and performance.
Why Consistency Models?
Consistency models define the rules for how operations appear to users in a distributed system. Zanzibar’s strength lies in balancing external consistency (strict correctness) with bounded staleness (optimized performance). To design or work on similar systems, it’s crucial to understand these models and their trade-offs.
Key Concepts to Explore
Types of Consistency Models
1. Strong Consistency:
- Guarantees that once a write completes, every later read from any client reflects that write.
- Example: Spanner provides strong (external) consistency by using TrueTime to assign globally meaningful commit timestamps.
- Trade-Off: Higher latency due to global coordination.
2. Eventual Consistency:
- Ensures that all replicas converge to the same state eventually, but immediate consistency is not guaranteed.
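- Example: Amazon DynamoDB, whose reads are eventually consistent by default (strongly consistent reads are available as an option).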
- Trade-Off: Lower latency but might return stale data temporarily.
3. Causal Consistency:
- Preserves causality between dependent events (as explained earlier).
- Example: Zanzibar ensures causal ordering to prevent “new enemy” problems.
4. Snapshot Consistency (Bounded Staleness):
- Allows reads at slightly older, consistent snapshots to improve performance while guaranteeing a coherent state.
- Example: Zanzibar uses snapshot reads to reduce latency while adhering to causality.
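To make the difference between the first two models concrete, here is a minimal sketch in Python. It assumes a toy store with one primary and one asynchronously updated replica; the class `ToyReplicatedStore` and its method names are made up for illustration and do not correspond to any real database API.

```python
class ToyReplicatedStore:
    """Hypothetical primary/replica pair with asynchronous replication."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # writes accepted by the primary, not yet on the replica

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def read_strong(self, key):
        # Served by the primary, so it always reflects the latest write.
        return self.primary.get(key)

    def read_eventual(self, key):
        # Served by the replica, which may lag behind the primary.
        return self.replica.get(key)

    def replicate(self):
        # Apply queued writes in order; afterwards both copies have converged.
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()


store = ToyReplicatedStore()
store.write("doc:1#viewer", "alice")
stale = store.read_eventual("doc:1#viewer")      # replica has not caught up yet
fresh = store.read_strong("doc:1#viewer")        # primary already has the write
store.replicate()
converged = store.read_eventual("doc:1#viewer")  # replica now agrees
```

The eventual read is cheap because it never waits for the primary, but it can miss a write that has already been acknowledged. That gap is exactly the trade-off listed above.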
TrueTime and Spanner
Zanzibar relies on Google Spanner’s TrueTime API for external consistency.
- TrueTime: A clock API that reports the current time as an interval with bounded uncertainty, rather than as a single instant.
- Learning Focus: How TrueTime enables external consistency (linearizability across transactions) by delaying each commit until the clock uncertainty around its timestamp has passed.
- Impact of clock drift and synchronization delays on system performance.
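The idea of waiting out clock uncertainty can be sketched in a few lines of Python. This is only a simulation under stated assumptions: `SimulatedTrueTime` is a hypothetical manual clock with a fixed uncertainty, times are integer microseconds, and `commit_with_wait` is an illustrative helper, not Spanner's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: int  # true time is guaranteed to be >= earliest
    latest: int    # true time is guaranteed to be <= latest


class SimulatedTrueTime:
    """Hypothetical stand-in for TrueTime: a manual clock with fixed uncertainty."""

    def __init__(self, epsilon_us):
        self.epsilon_us = epsilon_us
        self._now_us = 0

    def now(self):
        return TTInterval(self._now_us - self.epsilon_us,
                          self._now_us + self.epsilon_us)

    def advance(self, delta_us):
        self._now_us += delta_us


def commit_with_wait(tt, tick_us=1000):
    """Pick a commit timestamp, then wait until it is certainly in the past."""
    commit_ts = tt.now().latest
    waited_us = 0
    while tt.now().earliest <= commit_ts:
        tt.advance(tick_us)  # a real system would sleep here instead
        waited_us += tick_us
    return commit_ts, waited_us


tt = SimulatedTrueTime(epsilon_us=4000)
commit_ts, waited_us = commit_with_wait(tt)
```

In this sketch the wait comes out at a little over twice the uncertainty, which shows why larger clock uncertainty translates directly into higher write latency.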
Consistency vs. Availability (CAP Theorem)
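Understand the CAP theorem, which states that when a network partition occurs, a system must give up either consistency or availability. Its three properties are: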
- Consistency: All nodes see the same data at the same time.
- Availability: Every request receives a response, even during failures.
- Partition Tolerance: The system continues to function despite network partitions.
Trade-Offs in Zanzibar
- Prioritizes consistency (external consistency) while optimizing latency and availability with techniques like caching and snapshot reads.
Quorum-Based Systems
Zanzibar achieves consistency through Spanner’s quorum-based replication.
Key Ideas:
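- Read/Write Quorums: A write commits only after a majority of replicas acknowledge it. More generally, read and write quorums must overlap so that every read reaches at least one replica holding the latest write.
- Trade-Offs: Each write waits for a round trip to a majority of replicas, which adds latency and limits throughput under write-heavy workloads.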
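The overlap requirement is easy to check by brute force. The sketch below is a generic quorum illustration in Python, not a model of Spanner's Paxos replication; the function names are my own. It confirms the familiar rule that read and write quorums of sizes r and w over n replicas always intersect when r + w > n.

```python
from itertools import combinations


def quorums_always_overlap(n, r, w):
    """True if every read quorum of size r shares a replica with every write quorum of size w."""
    replicas = range(n)
    return all(
        set(read_q) & set(write_q)
        for read_q in combinations(replicas, r)
        for write_q in combinations(replicas, w)
    )


def majority(n):
    """Smallest number of replicas that forms a majority of n."""
    return n // 2 + 1


five_node_majorities_overlap = quorums_always_overlap(5, majority(5), majority(5))
small_read_quorum_overlaps = quorums_always_overlap(5, 2, 3)
```

With five replicas, two majorities of three always share a node, while a read quorum of two can miss a write quorum of three entirely.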
Techniques for Balancing Consistency and Performance
Snapshot Reads:
- Serve data from a consistent snapshot to reduce coordination overhead.
Caching:
- Local caches improve read performance but may introduce staleness.
- Zanzibar uses snapshot-aware caching to maintain consistency.
Staleness Tuning:
- Zanzibar dynamically adjusts snapshot staleness for different workloads to balance performance and correctness.
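Here is a rough Python sketch of how these three techniques can fit together. Everything in it is an assumption made for illustration: `VersionedStore`, `SnapshotCache`, and `choose_snapshot` are hypothetical names, timestamps are plain integers, and rounding snapshots down to a fixed quantum is one simple way to let nearby requests share cache entries, not a description of Zanzibar's internals.

```python
import bisect


class VersionedStore:
    """Hypothetical multi-version store: each key keeps timestamped values."""

    def __init__(self):
        self._versions = {}  # key -> (sorted timestamps, values in the same order)

    def write(self, key, value, ts):
        stamps, values = self._versions.setdefault(key, ([], []))
        i = bisect.bisect_right(stamps, ts)
        stamps.insert(i, ts)
        values.insert(i, value)

    def read_at(self, key, snapshot_ts):
        """Return the newest value written at or before snapshot_ts, or None."""
        stamps, values = self._versions.get(key, ([], []))
        i = bisect.bisect_right(stamps, snapshot_ts)
        return values[i - 1] if i else None


class SnapshotCache:
    """Cache keyed by (key, snapshot), so results from different snapshots never mix."""

    def __init__(self, store):
        self.store = store
        self.entries = {}
        self.hits = 0

    def read(self, key, snapshot_ts):
        cache_key = (key, snapshot_ts)
        if cache_key in self.entries:
            self.hits += 1
        else:
            self.entries[cache_key] = self.store.read_at(key, snapshot_ts)
        return self.entries[cache_key]


def choose_snapshot(now, staleness, quantum):
    """Largest multiple of quantum that is no newer than now - staleness."""
    return ((now - staleness) // quantum) * quantum


store = VersionedStore()
store.write("doc:1#viewer", {"alice"}, ts=990)
store.write("doc:1#viewer", {"alice", "bob"}, ts=1005)

cache = SnapshotCache(store)
snap_a = choose_snapshot(now=1007, staleness=5, quantum=10)
snap_b = choose_snapshot(now=1009, staleness=5, quantum=10)
first = cache.read("doc:1#viewer", snap_a)   # goes to the store
second = cache.read("doc:1#viewer", snap_b)  # same snapshot, served from cache
```

Both requests land on the same snapshot, so the second one is a cache hit and still returns a coherent (if slightly old) view. Raising `staleness` or `quantum` increases the hit rate at the cost of older answers, which is the tuning knob described above.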
Practical Applications
- Design Distributed Systems: Apply the right consistency model based on the use case (e.g., strong consistency for financial transactions, eventual consistency for social media timelines).
- Understand Trade-Offs: Explore how systems like Spanner and DynamoDB choose their models to handle real-world constraints.
- Learnings from Zanzibar:
- How bounded staleness allows low-latency decisions while maintaining correctness for global users.
- Techniques like zookies (consistency tokens) to ensure causal relationships.
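To see why a consistency token matters, consider a toy version of the “new enemy” scenario in Python. The sketch assumes a zookie can be treated as a bare timestamp and that a check simply picks the fresher of its default snapshot and the zookie; `ToyAcl` and `check` are hypothetical names, and the real token format and protocol are more involved.

```python
class ToyAcl:
    """Hypothetical versioned ACL: a history of (timestamp, allowed users)."""

    def __init__(self):
        self.history = []  # appended in increasing timestamp order

    def update(self, ts, allowed):
        self.history.append((ts, frozenset(allowed)))

    def allowed_at(self, snapshot_ts):
        current = frozenset()
        for ts, allowed in self.history:
            if ts <= snapshot_ts:
                current = allowed
        return current


def check(acl, user, default_snapshot, zookie_ts=None):
    """Evaluate at a possibly stale snapshot unless a zookie demands a fresher one."""
    if zookie_ts is None:
        snapshot = default_snapshot
    else:
        snapshot = max(default_snapshot, zookie_ts)
    return user in acl.allowed_at(snapshot)


acl = ToyAcl()
acl.update(ts=1, allowed={"alice", "bob"})
acl.update(ts=10, allowed={"alice"})  # bob is removed at ts=10

content_zookie = 12   # new content was saved at ts=12, after the removal
stale_snapshot = 5    # the server's default snapshot lags behind

bob_without_zookie = check(acl, "bob", stale_snapshot)
bob_with_zookie = check(acl, "bob", stale_snapshot, zookie_ts=content_zookie)
alice_with_zookie = check(acl, "alice", stale_snapshot, zookie_ts=content_zookie)
```

Without the token, the stale snapshot still shows bob as a viewer, so he could see content added after his removal. Passing the token stored with the content forces the check onto a snapshot at least as fresh as that content, and bob is denied.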
Suggested Resources
Papers:
- “Spanner: Google’s Globally Distributed Database.”
- “Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services” by Gilbert and Lynch.
Hands-On:
- Experiment with distributed databases (e.g., CockroachDB, Amazon DynamoDB) to observe how consistency models impact performance.
By mastering consistency models, you’ll gain the foundational knowledge to design scalable, reliable, and high-performing distributed systems like Zanzibar, while making informed trade-offs tailored to specific use cases.
---
This article is part of a series on the paper “Zanzibar: Google’s Consistent, Global Authorization System,” which I am currently reading. More content from this paper is on the way.
If you find this useful, follow me for more such content.