Basics
On this page
“Everything is a trade-off”
- Distributed System → a network of computers that work together to perform task(s)
- Characteristics of a Distributed System
- Availability → % uptime (ex: 99.99%)
- Reliability → remain operational despite component(s) failure, implies Availability
- Both achieved by Replication of data and Redundancy of services
- Scalability / Scaling → cope with increased demand without drop in performance
- Vertical Scaling - get a bigger machine, increase CPU, RAM, Storage, etc.
- Horizontal Scaling - add more machines/nodes, require Load Balancing
- Efficiency
- Latency / Response Time
- Throughput / Bandwidth
- Manageability / Maintainability / Serviceability → simplicity and speed to repair system, impacts Availability
- Theorems
- Definitions
- Consistency → system maintains a consistent state across all nodes
- Availability → system should always respond to requests regardless of the current state of the system
- Partition Tolerance → system continues to function even if there is a network partition
- Latency → time taken for system to respond to requests
- CAP → in the presence of a network partition, system must choose between Availability and Consistency
- PACELC → extension of CAP theorem, in the absence of network partitions, system must choose between Latency (Lower) and Consistency (Strong)
- Definitions
- Consistency Patterns
- Weak Consistency
- Eventual Consistency → data replicated asynchronously, takes milliseconds
- Strong Consistency → data replicated synchronously
- Availability Patterns
- Complementary Patterns: Fail-Over and Replication
- Fail-Over
- Active-Passive → heartbeats sent between active and passive servers, if the heartbeat is interrupted, the passive server assumes the active server’s IP address and resumes service
- Active-Active → both servers are managing traffic, spreading the load between them
- Public Facing → DNS need to know IPs of servers
- Internal Facing → Application logic need to know IPs of servers
- Replication → improve reliability and fault tolerance
- Single Leader, Multi Leader, Leaderless
- Single Leader Replication → writes/reads go to leader, reads go to followers (read-replicas), suitable for read-heavy system
- Multi Leader Replication → both leaders serves reads and writes and coordinate with each other on writes
- Leaderless Replication
- Synchronous, Asynchronous, Semi-Synchronous
- Synchronous → Strong Consistency
- Asynchronous → Eventual Consistency
- Single Leader, Multi Leader, Leaderless
- Datastores
- Relational / SQL → structured data, stored in tables/rows/columns, has primary key(s) and foreign key(s), able to perform joins between tables, support transactions, ACID-compliant, require atomic reads/writes, normalization
- Non-relational / NoSQL → semi-structured and unstructured data, BASE-compliant, denormalization
- Key-value → like a HashMap
- In-Memory Key-value
- Document → similar to JSON
- Wide Column → use tables, rows and columns but names and format of the columns can vary from row to row in the same table
- Graph → model relationships, use nodes (entities) and edges (relationships), think of Adjacency List, Neo4j
- Key-value → like a HashMap
- Data Consistency Models
- Trade-off between Consistency and Availability
- ACID (Atomicity, Consistency, Isolation, Durability) → associated with relational DB, support transactions and data integrity features, need strict consistency and isolation
- Offer stronger consistency guarantees but may be less available
- BASE (Basically Available, Soft state, Eventually consistent) → associated with non-relational DB, easily horizontally scalable, need high availability and scalability
- Offer weaker consistency guarantees but is more available
- Data Replication → process of making multiple copies of data and storing them on different servers, improves Availability and Durability of data
- Sharding or Data Partitioning → process of distributing data across a set of servers, improves the scalability and performance of the system
- Horizontal Partitioning / Sharding→ store rows in different nodes, use some key to partition data, key is an attribute of the data
- Range-based → ranges of a key
- Hash-based → pass key to hash function
- Directory-based → Lookup table
- Custom-based
- Vertical Partitioning → store columns in different nodes
- Horizontal Partitioning / Sharding→ store rows in different nodes, use some key to partition data, key is an attribute of the data
- Hashing → hashing function that maps key to value
- Consistent Hashing → Distributed Hash Table (DHT) algorithm, only
n / m
keys need to be remapped when a node gets added/removed (n = no. of keys, m = no. of slots) - Database Indexes
- Proxies
- Reverse Proxy → intermediary between Web Servers and the Internet
- Forward Proxy → intermediary between clients and the Internet
- Load Balancing/Balancer → distribute traffic across cluster of servers (traffic police)
- Redundant LB (active/passive) → eliminate single Point of Failure (PoF)
- Traffic Distribution
- Perform Health Checks
- Naive: Round Robin, Weighted Round Robin
- Better: Least Response Time
- Traffic Distribution
- Location: Between
- User and Web Server
- Web Server and Internal Platform Layer (Application Servers or Cache Servers)
- Internal Platform Layer and Database
- Redundant LB (active/passive) → eliminate single Point of Failure (PoF)
- Cache → store data temporarily in-memory
- Time to Live (TTL)
- Cache Invalidation
- Write-through Cache
- Write-around Cache
- Write-back Cache
- Cache Eviction Policies
- Least Recently Used (LRU)
- Content Delivery Network (CDN) → geographically distributed cache servers, push content closer to the users, improve performance by reducing latency (shortened physical distance)
- Push CDN → push content to CDN whenever changes occur on your server
- Pull CDN → grab new content from your server
- Network Protocols → help transmit data over the internet
- Transmission (TCP) → transport-layer protocol, ensure that data is delivered to its destination without errors and in the correct order
- User Datagram Protocol (UDP) → transport-layer protocol, does not guarantee the delivery of data or the order in which it is delivered, used for real-time applications (online gaming or voice over IP), low latency is more important than reliability
- Hypertext Transfer Protocol (HTTP) → application-layer protocol, used to transfer data over the web
- HTTP1 → stateless protocol, request-response model, text-based encoding, built on top of TCP
- HTTP2 → similar to previous version but improved: supports multiplexing (able to send multiple requests over a single connection), binary encoding, support server push, header compression
- HTTP3 → stateful protocol, based on UDP, new binary encoding (QUIC), server push
- Communication Protocols
- Polling → AJAX/Fetch requests at regular intervals
- Long Polling → like Polling, but server holds on to the connection longer until new data is available (timeout if too long without response)
- WebSockets → full duplex (bidirectional) connection over TCP, establish “Websocket handshake” between client and server
- Server Sent Events → client establishes a persistent and long-term connection with server, server sends real-time data to client (unidirectional)
- Architecture Patterns
- Monolith → deploy as a single, large, stand-alone application (single codebase)
- Microservices → deploy application as a collection of small, independent services that communicate with each other over the network
- Event-Driven Architecture (EDA) → events to trigger and communicate between decoupled services, consist of Event Producers, Event Routers (Message Brokers), and Event Consumers (Workers)
- Serverless → Functions that respond to requests, event-driven
- Components of Web-based Distributed System
- Clients → Mobile, Desktop, etc.
- Domain Name System (DNS) → map domain names to IP addresses
- Content Delivery Network (CDN) → geographically distributed, store static content
- API Gateway → authentication/authorization, route to proper services
- Load Balancer → traffic police
- Web Server(s) → handle HTTP requests and responses only
- Message Queue → store and transmit requests for further processing by Application Server(s)
- Application Server(s) → execute business logic to application programs through any number of protocols
- File System → store and manage files
- Cache Server(s) → temporary, store hot DB rows in memory
- Database Server(s) → read and write data to disk
- Popular Services
- Relational Database → Amazon RDS/Aurora, Azure SQL Database, Google Spanner
- Non-relational Database → Amazon DynamoDB, Azure CosmosDB, Google Datastore
- Object Storage → AWS S3, Azure Blob Storage, Google Cloud Storage
- Cache (in-memory datastore) → Amazon ElastiCache, Azure Cache, Google Memorystore
- Redis, Memcached
- CDN → Amazon CloudFront, Azure CDN, Google CDN
- Serverless → AWS Lambda, Azure Functions, Google Cloud Functions