Lots of things can go wrong in distributed systems.

How can we build fault-tolerant distributed systems, assuming all the problems we’ve discussed can occur?

We can use a similar approach as with transactions: allow our application to rely on general-purpose abstractions with useful guarantees.

Consistency Guarantees

Most databases provide a guarantee of eventual consistency, which is honestly a pretty weak guarantee.

Here, let’s explore some stronger consistency models and their tradeoffs.

A quick roadmap

  • What is linearizability (the strongest commonly used consistency model)
  • Ordering guarantees
  • How to atomically commit a distributed transaction

Linearizability

  • Synonyms: atomic consistency, strong consistency, immediate consistency, external consistency
  • Core idea: give clients the illusion that there is only one replica (i.e., separate replicas never return different data)
  • Basically, it’s a recency guarantee

What Makes a System Linearizable?

Every read must return the value set by the most recent write!

Note: Serializability vs. Linearizability

  • Serializability: an isolation property of transactions. It guarantees that transactions behave as if they had been executed in some serial order (which can differ from the order in which the transactions were actually run)
  • Linearizability: a recency guarantee on reads and writes of a register (an individual object). It doesn’t group operations into transactions, so it doesn’t prevent problems like write skew
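
To make the recency guarantee concrete, here is a minimal Python sketch of the idea: a single copy of the data whose operations are made atomic with a lock, so every read trivially returns the most recent write. `LinearizableRegister` is an illustrative name, not a real library class:

```python
import threading

class LinearizableRegister:
    """Single-copy register: one value, one lock.

    With only one copy and mutually exclusive operations, every read
    returns the value set by the most recent write -- the essence of
    linearizability. (Illustrative sketch, not a real library class.)
    """

    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def write(self, value):
        with self._lock:
            self._value = value

    def read(self):
        with self._lock:
            return self._value

reg = LinearizableRegister(0)
reg.write(42)
assert reg.read() == 42  # the most recent write is always visible
```

The interesting (and hard) part, covered below, is preserving this single-copy illusion once the data is actually replicated across nodes.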

Relying on Linearizability

When is linearizability useful or even essential?

Locking and leader election

Use case: single-leader replication requires exactly one leader. How do you make sure you elect just one?

  • Distributed locking with linearizable operations
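
A hedged sketch of how leader election can ride on a linearizable compare-and-set: whichever node’s CAS succeeds acquires the lock and becomes leader. `CASStore` and `try_become_leader` are hypothetical stand-ins for a real coordination service such as ZooKeeper or etcd:

```python
import threading

class CASStore:
    """Toy linearizable key-value store supporting compare-and-set.
    (Illustrative only; in practice use ZooKeeper, etcd, or similar.)"""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def compare_and_set(self, key, expected, new):
        # Atomic: check and update happen under one lock,
        # so concurrent CAS calls are linearized into one order.
        with self._lock:
            if self._data.get(key) == expected:
                self._data[key] = new
                return True
            return False

def try_become_leader(store, node_id):
    # Only one node's CAS from None can succeed -> at most one leader.
    return store.compare_and_set("leader", None, node_id)

store = CASStore()
assert try_become_leader(store, "node-a") is True
assert try_become_leader(store, "node-b") is False  # node-a already won
```

If the store were not linearizable (e.g., two replicas answering independently), two nodes could both see `leader == None` and both “win”, which is exactly the split-brain the lock is meant to prevent.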

Constraints and uniqueness guarantees

Use case: a username or email address must uniquely identify one user, and a file storage service cannot have two files with the same path and filename

  • Similar to a lock: we need linearizability
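
The uniqueness constraint is the same pattern as the lock: claiming a username is effectively an atomic check-and-set on the register for that name. A small illustrative sketch (`UniqueNames` is a hypothetical class, not a real API):

```python
import threading

class UniqueNames:
    """Illustrative sketch: claiming a username is an atomic
    check-and-set on that name's register (absent -> owner id)."""

    def __init__(self):
        self._owners = {}
        self._lock = threading.Lock()

    def claim(self, name, user_id):
        # Check and insert happen atomically, so two concurrent
        # claims for the same name cannot both succeed.
        with self._lock:
            if name in self._owners:
                return False  # already taken
            self._owners[name] = user_id
            return True

names = UniqueNames()
assert names.claim("alice", "user-1") is True
assert names.claim("alice", "user-2") is False  # uniqueness enforced
```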

Cross-channel timing dependencies

Use case: multiple communication channels

Implementing Linearizable Systems

How can we actually implement a system with linearizable semantics?

Basic idea of linearizability: “behave as if only one copy of the data and all operations are atomic.”

Naive approach: use one copy of the data

But when using replication, how amenable are different methods to linearizability?

  • Single-leader replication: potentially linearizable
    • The leader holds the primary copy of the data and followers maintain backup copies
    • If you read from the leader, or from followers that are updated synchronously, the system can be linearizable
  • Consensus algorithms: linearizable
  • Multi-leader replication: not linearizable
    • They concurrently process writes on multiple nodes and asynchronously replicate them, which can create conflicts
  • Leaderless replication: probably not linearizable
    • Some claim you can get strong consistency by requiring quorum reads and writes; this is generally not quite true
    • Last-write-wins conflict resolution based on time-of-day clocks is almost certainly nonlinearizable, and the same goes for sloppy quorums

Linearizability and quorums

Strict quorum reads and writes seem like they should be linearizable, but this is only possible at the cost of reduced performance: a reader must perform read repair synchronously, before returning results
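
A sketch of what that synchronous read repair looks like: the reader takes the value with the highest version among the replicas it contacted, writes it back to any stale replicas, and only then returns. The dict-based replicas and version numbers here are illustrative, not a real database API:

```python
def quorum_read_with_repair(replicas, r):
    """Illustrative quorum read: consult r replicas, pick the value
    with the highest version, and synchronously repair stale replicas
    before returning -- the extra round trip that costs performance."""
    responses = replicas[:r]  # assume the first r replicas respond
    latest = max(responses, key=lambda rep: rep["version"])
    for rep in replicas:
        if rep["version"] < latest["version"]:
            # synchronous read repair
            rep["version"], rep["value"] = latest["version"], latest["value"]
    return latest["value"]

replicas = [
    {"version": 2, "value": "new"},
    {"version": 1, "value": "old"},
    {"version": 1, "value": "old"},
]
assert quorum_read_with_repair(replicas, r=2) == "new"
assert all(rep["value"] == "new" for rep in replicas)  # stale copies repaired
```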

The cost of Linearizability

It’s useful to explore the pros and cons of linearizability, especially since only some replication methods can provide it

Consider impact of network interruption between two datacenters

  • In a multi-leader database, each datacenter can continue to operate normally
  • With a single leader, the leader must be in one of the datacenters, so:
    • Linearizable: clients connected to the follower datacenter cannot make reads
    • Non-linearizable: they can still make (possibly stale) reads

CAP Theorem

Any linearizable database has this problem- the tradeoff:

  • If your application requires linearizability and some replicas are disconnected from the others, then those replicas cannot process requests while they’re disconnected
  • If your application does not require linearizability, it can be written so that each replica processes requests independently. It can then remain available in the face of a network problem, but without being linearizable

Insight: Applications that don’t require linearizability can be more tolerant of network problems

Note: this is known as the CAP theorem

CAP

  • Originally: a broad rule of thumb
  • Created a movement/shift
    • Before: focus on distributed systems with linearizable semantics
    • After: a wider design space was explored
  • Now superseded by more precise results

Linearizability and network delays

Very few systems are actually linearizable in practice

Not even RAM on a modern multi-core CPU is linearizable

  • Each CPU core has its own memory cache and store buffer
  • Memory accesses go to the cache first by default; changes are asynchronously written out to main memory
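
A toy model of that store-buffer behavior, with a dictionary standing in for main memory (purely illustrative; real CPUs buffer and reorder at a much finer grain):

```python
# Shared "main memory" that all cores eventually write back to.
main_memory = {"x": 0}

class Core:
    """Toy CPU core: writes go into a local store buffer first and
    reach main memory only when flushed, so another core reading
    main memory can observe a stale value in the meantime."""

    def __init__(self):
        self.store_buffer = {}

    def write(self, key, value):
        self.store_buffer[key] = value  # buffered, not yet globally visible

    def flush(self):
        # Asynchronous write-back in real hardware; explicit here.
        main_memory.update(self.store_buffer)
        self.store_buffer.clear()

core_a = Core()
core_a.write("x", 1)
assert main_memory["x"] == 0  # another core would still read the old value
core_a.flush()
assert main_memory["x"] == 1  # visible only after the write-back
```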

Why make this tradeoff? Performance! Linearizability is slow

Ordering Guarantees

Linearizability implies that operations are executed in a well-defined order

Remember:

  • Leader in single leader replication determines order of writes in the replication log
  • Serializability is all about ensuring transactions behave as if executed in some sequential order
  • Clocks

Ordering and Causality

One reason why ordering keeps coming up:

  • It helps preserve causality
    • TODO: Why do we care about causality?
  • Causality imposes an ordering on events

Distributed Transactions and Consensus