Short Media Sharing Platform

Similar services: Instagram, TikTok

System Requirements / Goals

Functional Requirements
- Upload, download, and view photos and short videos
- Search photos and videos by metadata
- Follow other users
Non-functional Requirements
- High availability
  - Consistency can be relaxed in favor of availability (eventual consistency): it is acceptable if a user does not see a new photo/video for a while
- High reliability (media is never lost)
- Acceptable (low) latency when loading media

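Read-heavy system: media is viewed far more often than it is uploaded, so the design should optimize for fast media retrieval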
Users
- Total Users: 500 M
- Daily Active Users (DAU): 1 M
Storage
- For 1 day: 2 M uploaded media files * 200 KB average file size = 400 GB / day
- For 10 years: 400 GB / day * 365 days * 10 years = 1,460,000 GB = ~1,460 TB (~1.46 PB; ~1,425 TB if 1 TB is taken as 1,024 GB)
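The arithmetic below reproduces the storage estimate. It is a rough sketch: it ignores leap days and counts a single copy of each file, so replication would multiply the total. In decimal units it gives about 1,460 TB. The ~1,425 figure comes from treating 1 TB as 1,024 GB.

```python
# Back-of-the-envelope storage estimate using the figures from the notes.
# Assumption: decimal units (1 GB = 1,000,000 KB, 1 TB = 1,000 GB).
KB_PER_GB = 1_000_000
GB_PER_TB = 1_000

uploads_per_day = 2_000_000   # media files uploaded per day
avg_file_size_kb = 200        # average file size in KB
years = 10

daily_gb = uploads_per_day * avg_file_size_kb / KB_PER_GB
total_gb = daily_gb * 365 * years          # leap days ignored
total_tb_decimal = total_gb / GB_PER_TB
total_tb_binary = total_gb / 1024          # treating 1 TB as 1,024 GB

print(f"Per day:  {daily_gb:,.0f} GB")
print(f"10 years: {total_tb_decimal:,.0f} TB (1 TB = 1,000 GB)")
print(f"10 years: {total_tb_binary:,.0f} TB (1 TB = 1,024 GB)")
```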

UserFollow (composite primary key: UserID1, UserID2)
UserID1	int	PK
UserID2	int	PK
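The following in-memory sketch shows what the composite primary key on UserFollow buys: the same (UserID1, UserID2) pair can never be stored twice. It rests on one hypothetical assumption that the notes do not state, namely that a row (UserID1, UserID2) means "UserID1 follows UserID2". The class name and the two lookup maps are illustrative only and do not describe a real database API.

```python
from __future__ import annotations

from collections import defaultdict


class UserFollowTable:
    """Hypothetical in-memory stand-in for the UserFollow table.

    Assumption: a row (user_id1, user_id2) means user_id1 follows user_id2.
    The pair is the composite primary key, so duplicates are rejected.
    """

    def __init__(self) -> None:
        self._rows: set[tuple[int, int]] = set()  # primary-key index
        # Extra lookup maps so both directions can be queried cheaply.
        self._following: dict[int, set[int]] = defaultdict(set)
        self._followers: dict[int, set[int]] = defaultdict(set)

    def follow(self, user_id1: int, user_id2: int) -> bool:
        """Insert a row; return False if the composite key already exists."""
        key = (user_id1, user_id2)
        if key in self._rows:
            return False
        self._rows.add(key)
        self._following[user_id1].add(user_id2)
        self._followers[user_id2].add(user_id1)
        return True

    def following(self, user_id: int) -> set[int]:
        """Users that user_id follows (a copy, so callers cannot mutate state)."""
        return set(self._following.get(user_id, set()))

    def followers(self, user_id: int) -> set[int]:
        """Users that follow user_id."""
        return set(self._followers.get(user_id, set()))
```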

Relational Database
- Queries across tables (e.g., Media and UserFollow) require joins
- Hard to scale horizontally as the application grows
NoSQL Database
- Distributed key-value store
- Schema
  - Media
  - User
  - UserFollow
- Apache Cassandra
  - Maintains replicas of each row across multiple nodes (configurable replication factor) for high reliability
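To show how replicas can be placed, here is a simplified hash-ring sketch. It is similar in spirit to Cassandra's SimpleStrategy, where the first replica goes to the node that owns the key's token and the extra replicas go to the next nodes clockwise. This is not Cassandra's code. Real Cassandra uses the Murmur3 partitioner and virtual nodes, and this sketch uses MD5 and one token per node only to keep it short. `Ring`, `token`, and the node names are hypothetical.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key to a ring position (MD5 chosen here only for simplicity)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class Ring:
    """Hypothetical ring with one token per node and a fixed replication factor."""

    def __init__(self, nodes: list, replication_factor: int = 3) -> None:
        if replication_factor > len(nodes):
            raise ValueError("replication factor exceeds number of nodes")
        self.rf = replication_factor
        # Sort nodes by their position on the ring.
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str) -> list:
        """First replica: first node at or after the key's token (wrapping
        around); remaining replicas: the next distinct nodes clockwise."""
        start = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(self.rf)]


ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"], replication_factor=3)
print(ring.replicas("media:42"))  # a write would go to all three nodes
```

A write is stored on every node returned by `replicas`, so losing any single node still leaves two copies of the row.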
Media Data Store
- Distributed File Storage (HDFS)
- Distributed Object Storage (Amazon S3)

High Availability and Reliability
- Store multiple copies of media
- Run multiple instances of each service, so the system keeps running even if one instance dies
- Eliminate single points of failure (SPoF)
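On the client side, running redundant service instances can be as simple as trying the next instance when one is unreachable. The function below is a minimal, hypothetical sketch of that idea. It assumes a failed instance surfaces as a `ConnectionError`. A production setup would usually put this logic behind a load balancer with health checks.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(instances: Sequence[Callable[[], T]]) -> T:
    """Try each redundant service instance in order; return the first success.

    Assumption (hypothetical): an instance that is down raises ConnectionError.
    """
    last_error: Optional[Exception] = None
    for call_instance in instances:
        try:
            return call_instance()
        except ConnectionError as err:  # this instance is down; try the next one
            last_error = err
    raise RuntimeError("all service instances failed") from last_error
```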

Metadata Sharding (Options)
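1. Based on UserID (not recommended): popular users create hot shards, storage is spread unevenly because some users upload far more media than others, and one user's media may not fit on a single shard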
2. Based on MediaID
  1. Generate a unique MediaID first, then find the shard number as MediaID % 10 (for 10 shard servers)
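This sketch shows MediaID-based shard selection with 10 shards. Sequential IDs spread evenly across the shards. `shard_for_media` is a hypothetical helper name.

```python
from collections import Counter

NUM_SHARDS = 10  # the "10 servers" from the notes


def shard_for_media(media_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Return the shard that stores the metadata for this MediaID."""
    if media_id < 0:
        raise ValueError("MediaIDs are assumed to be non-negative")
    return media_id % num_shards


# Sequential MediaIDs (e.g. from an auto-increment generator) spread evenly.
distribution = Counter(shard_for_media(i) for i in range(1, 1001))
print(sorted(distribution.items()))
```

Plain modulo has one drawback. If the number of shards changes, most IDs map to a different shard. A common mitigation is to use many logical partitions per physical server, or consistent hashing, so growth moves less data.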
Generate MediaIDs
- Dedicate a separate database instance to generate auto-incrementing IDs
- Define a table containing only an ID field
- When new media is added to the system, insert a new row into this table and use the returned ID as the new media's MediaID
- Single point of failure (SPoF)?
  - Run two DB instances, one generating even numbers and the other odd numbers; if one goes down, the other keeps issuing IDs
- The same design can be extended to generate IDs for Users and Media-Comments
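Here is an in-memory sketch of the two-instance ID scheme: one instance issues odd IDs, the other even IDs, and requests alternate between them. In a real database this would be an auto-increment column. In MySQL, for example, the same effect is usually configured with the `auto_increment_increment` and `auto_increment_offset` settings. The class names here are hypothetical stand-ins. The simple round-robin is an assumption, not something the notes prescribe.

```python
class IdInstance:
    """Hypothetical stand-in for one ID database: an auto-increment counter
    that starts at `offset` and steps by 2 (odd-only or even-only IDs)."""

    def __init__(self, offset: int, step: int = 2) -> None:
        self.next_value = offset
        self.step = step
        self.up = True

    def insert_row(self) -> int:
        """Simulate inserting a row and returning its auto-increment ID."""
        if not self.up:
            raise ConnectionError("ID instance is down")
        value = self.next_value
        self.next_value += self.step
        return value


class MediaIdService:
    """Round-robins between the odd and even instances; if one is down,
    the other keeps issuing IDs, so neither is a single point of failure."""

    def __init__(self) -> None:
        self.instances = [IdInstance(offset=1), IdInstance(offset=2)]
        self._turn = 0

    def next_id(self) -> int:
        for _ in range(len(self.instances)):
            instance = self.instances[self._turn]
            self._turn = (self._turn + 1) % len(self.instances)
            try:
                return instance.insert_row()
            except ConnectionError:
                continue  # skip the failed instance
        raise RuntimeError("no ID instance available")
```

IDs stay unique but are not strictly ordered by time. While one instance is down, the other's sequence runs ahead.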

Need a large-scale, global media delivery system
Pareto Principle: 80-20 rule
- Roughly 20% of media (the most popular items) generates about 80% of the daily read traffic
- Cache that hot 20% of media files and metadata
CDN for static assets
- Geographically distributed cache servers
- Serve content from locations close to the user to reduce latency
Cache for Metadata
- Cache hot database rows
- Memcached
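- Application servers check the cache first and query the database only on a cache miss
- Cache eviction policy: Least Recently Used (LRU), which discards the least recently used rows first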
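The sketch below shows the metadata read path: check the cache, fall back to the database on a miss, and evict with LRU. It is an in-process stand-in for a Memcached-style cache and does not use Memcached's client API. `LRUCache`, `get_media_metadata`, and the `db_lookup` callback are hypothetical names.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable

_MISSING = object()  # sentinel, so a cached None is still treated as a hit


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._entries: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable, default: Any = None) -> Any:
        if key not in self._entries:
            return default
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used


def get_media_metadata(media_id: int, cache: LRUCache,
                       db_lookup: Callable[[int], Any]) -> Any:
    """Check the cache first; query the database only on a miss."""
    row = cache.get(media_id, _MISSING)
    if row is not _MISSING:
        return row
    row = db_lookup(media_id)
    cache.put(media_id, row)
    return row
```

With a capacity of 2, reading media 1, 1, 2, 3, 1 hits the database for 1, 2, 3, and then 1 again. Reading 3 evicted 1, the least recently used row.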