Key Steps


  1. Requirements / Clarifications / Minimum Viable Product (MVP) / Goals
    • Functional Requirements
      • Basic functionalities of the app
    • Non-functional Requirements
      • Availability, Consistency, Latency, Scalability…
      • High Availability, High Reliability, Low Latency, Highly Scalable
      • Can consistency take a hit in favor of availability/lower latency?
      • Low latency in …
    • Extended Requirements (recommended if you have more time)
  2. Estimation and Constraints / Back of the Envelope Calculation / Rough Estimates
    • Read heavy? Write Heavy? Read to Write Ratio
      • Optimize the system for the more common operation
    • Traffic
      • Total Users
      • Daily Active Users (DAU)
      • Queries Per Second (QPS) / Requests Per Second (RPS)
    • Bandwidth (manage traffic and balance load between servers)
    • Storage
    • Memory / Cache
  3. Data Model Design / Database Design
    • Relations
    • Type of Database
      • Is the data relational? Require joins?
    • Schema / Table
  4. API Design
    • Function Signatures
  5. High Level Component Design
    • Identify components that are needed
      • API Gateway, Load Balancers, Multiple Application Servers
      • Separate read and write servers
      • Datastores
        • Database
        • Distributed File Storage System (photos and videos) / CDN
  6. Detailed Component Design
    • No right answer, consider trade-offs
      • How will we partition our data to distribute it to multiple databases?
      • How do we handle hot active users?
      • Should recent data be stored in a layout optimized for scanning the latest records?
      • How much and at which layer should we introduce caching?
      • What components need better load balancing?
  7. Identify and Resolve Bottlenecks
    • Is there a Single Point of Failure?
      • Enough Data Replication?
      • Enough Copies of Services?
    • Performance Monitoring
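
Step 2 (Estimation and Constraints) can be sketched as a short calculation. This is a rough illustration only: the inputs (10 million DAU, 20 reads and 2 writes per user per day, 1 KB per write, caching the hottest 20% of a day's data) are made-up assumptions, and `estimate` and `Estimate` are hypothetical names, not part of any library.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


@dataclass
class Estimate:
    read_qps: float
    write_qps: float
    read_write_ratio: float
    storage_bytes_per_year: float
    cache_bytes: float


def estimate(dau, reads_per_user, writes_per_user, bytes_per_write,
             hot_fraction=0.2):
    """Rough capacity numbers from assumed per-user daily activity."""
    reads_per_day = dau * reads_per_user
    writes_per_day = dau * writes_per_user
    bytes_per_day = writes_per_day * bytes_per_write
    return Estimate(
        read_qps=reads_per_day / SECONDS_PER_DAY,
        write_qps=writes_per_day / SECONDS_PER_DAY,
        read_write_ratio=reads_per_user / writes_per_user,
        storage_bytes_per_year=bytes_per_day * 365,
        # 80-20 rule: size the cache for the hot fraction of one day's data.
        cache_bytes=bytes_per_day * hot_fraction,
    )


# All inputs below are illustrative assumptions (1 KB = 10^3 bytes).
e = estimate(dau=10_000_000, reads_per_user=20, writes_per_user=2,
             bytes_per_write=1_000)

print(f"Read QPS:         {e.read_qps:,.0f}")
print(f"Write QPS:        {e.write_qps:,.0f}")
print(f"Read:write ratio: {e.read_write_ratio:.0f}:1")
print(f"Storage per year: {e.storage_bytes_per_year / 10**12:.1f} TB")
print(f"Cache size:       {e.cache_bytes / 10**9:.1f} GB")
```

Average QPS understates peak load, so it is common to multiply by a peak factor chosen for the workload before sizing servers.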

Universal Tricks

  • Introduce Cache
    • For read-heavy system
    • Reduce load on database
    • Multiple instances and replicas of our globally distributed cache
    • Redis / Memcached
    • Cache Eviction: LRU
    • Pareto Principle: 80-20 rule
  • CDN for static assets
    • Geographically distributed
    • Address latency
    • Cache Eviction: LRU
    • Pareto Principle: 80-20 rule
  • Redundancy and Replication
  • Introduce Load Balancers
    • For horizontal scaling
    • Consistent Hashing - useful strategy for distributed caching system and distributed hash tables
    • Start with Round Robin (RR), then move to a dynamic (load-aware) strategy
  • Consistent Hashing
    • Distribute requests uniformly across nodes so that nodes can be added or removed with minimal remapping of keys
  • High Reliability
    • Multiple copies
  • Multiple Datastores
    • Relational Database
      • Sharding / Horizontal Partitioning
      • Multiple Read Replicas (part of handling heavy reads)
    • Non-relational Database
      • Easily scalable (horizontal scaling)
  • Relational Database
    • Read-heavy system
    • Indexing for faster search
      • Makes columns faster to query by creating pointers to where the data is stored in the database
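
The Consistent Hashing item above can be sketched as a hash ring. This is a minimal illustration under stated assumptions, not a production implementation: MD5 is assumed as the hash function (for spread, not security), each server gets 100 virtual nodes, and `HashRing` and the `cache-*` node names are hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # MD5 is used here only for its even spread, not for security.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []    # sorted hashes of all virtual nodes
        self._owner = {}   # virtual-node hash -> physical node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Place several virtual nodes per server to smooth the distribution.
        for i in range(self.vnodes):
            h = _hash(f"{node}#{i}")
            self._owner[h] = node
            bisect.insort(self._ring, h)

    def remove_node(self, node):
        for i in range(self.vnodes):
            h = _hash(f"{node}#{i}")
            if self._owner.get(h) == node:
                del self._owner[h]
                self._ring.remove(h)

    def get_node(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First virtual node clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._ring, _hash(key)) % len(self._ring)
        return self._owner[self._ring[idx]]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"user:{i}" for i in range(1000)]

before = {k: ring.get_node(k) for k in keys}
ring.add_node("cache-d")
after = {k: ring.get_node(k) for k in keys}

# Only keys that now belong to the new node change owner.
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved after adding a node")
```

With plain `hash(key) % number_of_nodes`, adding a fourth node would remap most keys; on the ring, only the keys claimed by the new node move.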

Popular Services

  • SQL
    • Amazon RDS (MySQL, PostgreSQL)
    • Azure SQL Database
  • NoSQL
    • Apache Cassandra
    • Google Cloud Datastore
  • Key-Value Store
    • Amazon DynamoDB
  • Object Store
    • Amazon S3 (Simple Storage Service)
    • Azure Blob Storage
    • Google Cloud Storage
  • Graph Database
    • Neo4j
  • CDN
    • Amazon CloudFront
    • Azure CDN
    • Google Cloud CDN
  • Cache
    • Redis
    • Memcached
  • Search
    • Elasticsearch

Not commonly talked about

  • Security

Other Outlines:


1 byte = 8 bits

1 KB = 10^3 bytes

1 MB = 10^6 bytes

1 GB = 10^9 bytes

1 TB = 10^12 bytes

1 PB = 10^15 bytes

1 EB = 10^18 bytes

In UTF-8, one character takes 1 to 4 bytes
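
The 1 to 4 byte range can be checked directly in Python. The four sample characters below are arbitrary choices for illustration, one for each encoded length.

```python
# One sample character per UTF-8 length: "A", "é", "€", and an emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

sizes = []
for ch in samples:
    encoded = ch.encode("utf-8")
    sizes.append(len(encoded))
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s)")

# For storage estimates, measure encoded bytes rather than character count.
text = "caf\u00e9"
print(len(text), "characters,", len(text.encode("utf-8")), "bytes")
```

When estimating storage for text fields, the character count is only a lower bound on the byte count unless the content is known to be ASCII.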