Basics

“Everything is a trade-off”

  • Distributed System → a network of computers that work together to perform task(s)
  • Characteristics of a Distributed System
    • Availability → % uptime (ex: 99.99%)
    • Reliability → remain operational despite component(s) failure, implies Availability
      • Both achieved by Replication of data and Redundancy of services
    • Scalability / Scaling → cope with increased demand without drop in performance
      • Vertical Scaling - get a bigger machine, increase CPU, RAM, Storage, etc.
      • Horizontal Scaling - add more machines/nodes, require Load Balancing
    • Efficiency
      • Latency / Response Time
      • Throughput / Bandwidth
    • Manageability / Maintainability / Serviceability → simplicity and speed to repair system, impacts Availability
  • Theorems
    • Definitions
      • Consistency → system maintains a consistent state across all nodes
      • Availability → system should always respond to requests regardless of the current state of the system
      • Partition Tolerance → system continues to function even if there is a network partition
      • Latency → time taken for system to respond to requests
    • CAP → in the presence of a network partition, system must choose between Availability and Consistency
    • PACELC → extension of CAP theorem, in the absence of network partitions, system must choose between Latency (Lower) and Consistency (Strong)
  • Consistency Patterns
    • Weak Consistency
    • Eventual Consistency → data replicated asynchronously, takes milliseconds
    • Strong Consistency → data replicated synchronously
  • Availability Patterns
    • Complementary Patterns: Fail-Over and Replication
    • Fail-Over
      • Active-Passive → heartbeats sent between active and passive servers, if the heartbeat is interrupted, the passive server assumes the active server’s IP address and resumes service
      • Active-Active → both servers are managing traffic, spreading the load between them
        • Public Facing → DNS need to know IPs of servers
        • Internal Facing → Application logic need to know IPs of servers
    • Replication → improve reliability and fault tolerance
      • Single Leader, Multi Leader, Leaderless
        • Single Leader Replication → writes/reads go to leader, reads go to followers (read-replicas), suitable for read-heavy system
        • Multi Leader Replication → both leaders serves reads and writes and coordinate with each other on writes
        • Leaderless Replication
      • Synchronous, Asynchronous, Semi-Synchronous
        • Synchronous → Strong Consistency
        • Asynchronous → Eventual Consistency
  • Datastores
    • Relational / SQL → structured data, stored in tables/rows/columns, has primary key(s) and foreign key(s), able to perform joins between tables, support transactions, ACID-compliant, require atomic reads/writes, normalization
    • Non-relational / NoSQL → semi-structured and unstructured data, BASE-compliant, denormalization
      • Key-value → like a HashMap
        • In-Memory Key-value
      • Document → similar to JSON
      • Wide Column → use tables, rows and columns but names and format of the columns can vary from row to row in the same table
      • Graph → model relationships, use nodes (entities) and edges (relationships), think of Adjacency List, Neo4j
  • Data Consistency Models
    • Trade-off between Consistency and Availability
    • ACID (Atomicity, Consistency, Isolation, Durability) → associated with relational DB, support transactions and data integrity features, need strict consistency and isolation
      • Offer stronger consistency guarantees but may be less available
    • BASE (Basically Available, Soft state, Eventually consistent) → associated with non-relational DB, easily horizontally scalable, need high availability and scalability
      • Offer weaker consistency guarantees but is more available
  • Data Replication → process of making multiple copies of data and storing them on different servers, improves Availability and Durability of data
  • Sharding or Data Partitioning → process of distributing data across a set of servers, improves the scalability and performance of the system
    • Horizontal Partitioning / Sharding→ store rows in different nodes, use some key to partition data, key is an attribute of the data
      • Range-based → ranges of a key
      • Hash-based → pass key to hash function
      • Directory-based → Lookup table
      • Custom-based
    • Vertical Partitioning → store columns in different nodes
  • Hashing → hashing function that maps key to value
  • Consistent Hashing → Distributed Hash Table (DHT) algorithm, only n / m keys need to be remapped when a node gets added/removed (n = no. of keys, m = no. of slots)
  • Database Indexes
  • Proxies
    • Reverse Proxy → intermediary between Web Servers and the Internet
    • Forward Proxy → intermediary between clients and the Internet
  • Load Balancing/Balancer → distribute traffic across cluster of servers (traffic police)
    • Redundant LB (active/passive) → eliminate single Point of Failure (PoF)
      • Traffic Distribution
        • Perform Health Checks
        • Naive: Round Robin, Weighted Round Robin
        • Better: Least Response Time
    • Location: Between
      • User and Web Server
      • Web Server and Internal Platform Layer (Application Servers or Cache Servers)
      • Internal Platform Layer and Database
  • Cache → store data temporarily in-memory
    • Time to Live (TTL)
    • Cache Invalidation
      • Write-through Cache
      • Write-around Cache
      • Write-back Cache
    • Cache Eviction Policies
      • Least Recently Used (LRU)
    • Content Delivery Network (CDN) → geographically distributed cache servers, push content closer to the users, improve performance by reducing latency (shortened physical distance)
      • Push CDN → push content to CDN whenever changes occur on your server
      • Pull CDN → grab new content from your server
  • Network Protocols → help transmit data over the internet
    • Transmission (TCP) → transport-layer protocol, ensure that data is delivered to its destination without errors and in the correct order
    • User Datagram Protocol (UDP) → transport-layer protocol, does not guarantee the delivery of data or the order in which it is delivered, used for real-time applications (online gaming or voice over IP), low latency is more important than reliability
    • Hypertext Transfer Protocol (HTTP) → application-layer protocol, used to transfer data over the web
      • HTTP1 → stateless protocol, request-response model, text-based encoding, built on top of TCP
      • HTTP2 → similar to previous version but improved: supports multiplexing (able to send multiple requests over a single connection), binary encoding, support server push, header compression
      • HTTP3 → stateful protocol, based on UDP, new binary encoding (QUIC), server push
  • Communication Protocols
    • Polling → AJAX/Fetch requests at regular intervals
    • Long Polling → like Polling, but server holds on to the connection longer until new data is available (timeout if too long without response)
    • WebSockets → full duplex (bidirectional) connection over TCP, establish “Websocket handshake” between client and server
    • Server Sent Events → client establishes a persistent and long-term connection with server, server sends real-time data to client (unidirectional)
  • Architecture Patterns
    • Monolith → deploy as a single, large, stand-alone application (single codebase)
    • Microservices → deploy application as a collection of small, independent services that communicate with each other over the network
    • Event-Driven Architecture (EDA) → events to trigger and communicate between decoupled services, consist of Event Producers, Event Routers (Message Brokers), and Event Consumers (Workers)
      • Serverless → Functions that respond to requests, event-driven
  • Components of Web-based Distributed System
    • Clients → Mobile, Desktop, etc.
    • Domain Name System (DNS) → map domain names to IP addresses
    • Content Delivery Network (CDN) → geographically distributed, store static content
    • API Gateway → authentication/authorization, route to proper services
      • Load Balancer → traffic police
    • Web Server(s) → handle HTTP requests and responses only
    • Message Queue → store and transmit requests for further processing by Application Server(s)
    • Application Server(s) → execute business logic to application programs through any number of protocols
    • File System → store and manage files
    • Cache Server(s) → temporary, store hot DB rows in memory
    • Database Server(s) → read and write data to disk
  • Popular Services
    • Relational Database → Amazon RDS/Aurora, Azure SQL Database, Google Spanner
    • Non-relational Database → Amazon DynamoDB, Azure CosmosDB, Google Datastore
    • Object Storage → AWS S3, Azure Blob Storage, Google Cloud Storage
    • Cache (in-memory datastore) → Amazon ElastiCache, Azure Cache, Google Memorystore
      • Redis, Memcached
    • CDN → Amazon CloudFront, Azure CDN, Google CDN
    • Serverless → AWS Lambda, Azure Functions, Google Cloud Functions