Short Media Sharing Platform
Notes from Grokking the System Design Interview
Services: Instagram, TikTok
System Requirements / Goals #
- Functional Requirements
- Upload, download, view photos and short videos
- Video or photo metadata search
- Able to follow other users
- Non-functional Requirements
- High availability
- Consistency can take a hit (in the interest of availability); it is acceptable if a user doesn't see a new photo/video for a short while
- High reliability (media is never lost)
- Acceptable latency (loading media)
Capacity Estimation / Constraints #
- Read-heavy system
- Users
- Total Users: 500 M
- Daily Active Users (DAU): 1 M
- Storage
- For 1 day: 2 M uploaded media * 200 KB average file size = 400 GB / day
- For 10 years: 400 GB / day * 365 days * 10 years = 1,460,000 GB ≈ 1425 TB
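A quick sanity check of the arithmetic above (it mixes a decimal KB→GB step with a binary GB→TB step, which is how the ~1425 TB figure falls out):

```python
# Back-of-envelope storage estimate, following the note's arithmetic.
uploads_per_day = 2_000_000      # media uploaded per day
avg_file_size_kb = 200           # average media size, KB

daily_gb = uploads_per_day * avg_file_size_kb / 1_000_000   # KB -> GB (decimal), 400 GB/day
ten_year_gb = daily_gb * 365 * 10                            # 1,460,000 GB
ten_year_tb = ten_year_gb / 1024                             # ~1425 TB (binary conversion, as in the note)

print(f"{daily_gb:.0f} GB/day, ~{int(ten_year_tb)} TB over 10 years")
```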
High Level System Design #
- (Write) Upload media service
- (Read) View/Search media service
- Object storage servers - store media
- Database servers - store media metadata and user metadata
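A minimal sketch of how the two services might sit on top of the stores; the in-memory dictionaries and function names are illustrative stand-ins, not from the source:

```python
import uuid

# Illustrative in-memory stand-ins for the object store and metadata DB.
object_store = {}     # media_path -> raw bytes (would be S3/HDFS in practice)
metadata_db = {}      # media_id -> {"media_path": ..., "user_id": ...}

def upload_media(user_id: int, data: bytes) -> str:
    """Write path: store the file, then record its metadata."""
    media_id = uuid.uuid4().hex
    path = f"media/{media_id}"
    object_store[path] = data
    metadata_db[media_id] = {"media_path": path, "user_id": user_id}
    return media_id

def view_media(media_id: str) -> bytes:
    """Read path: look up metadata, then fetch the file from the object store."""
    meta = metadata_db[media_id]
    return object_store[meta["media_path"]]
```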
Database Schema #
Media

| Column | Type |
|---|---|
| MediaID | int (PK) |
| MediaPath | varchar |
| CreationDate | datetime |

User

| Column | Type |
|---|---|
| UserID | int (PK) |
| Name | varchar |
| Email | varchar |
| DateOfBirth | datetime |
| CreationDate | datetime |
| LastLogin | datetime |

UserFollow

| Column | Type |
|---|---|
| UserID1 | int (PK) |
| UserID2 | int (PK) |
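The same schema expressed as dataclasses, a rough sketch to make the field types concrete (only the columns from the tables above; the follower/followee reading of UserFollow is an assumption):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Media:
    media_id: int            # PK
    media_path: str
    creation_date: datetime

@dataclass
class User:
    user_id: int             # PK
    name: str
    email: str
    date_of_birth: datetime
    creation_date: datetime
    last_login: datetime

@dataclass
class UserFollow:
    user_id1: int            # composite PK (assumed: the follower)
    user_id2: int            # composite PK (assumed: the user being followed)
```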
- Relational Database
- Requires joins (e.g., to fetch all media of the users someone follows)
- Challenging to scale as the application grows
- NoSQL Database
- Distributed key-value store
- Store metadata in key-value form, e.g., key = MediaID, value = an object holding the media path and other attributes (illustrated after this list)
- Apache Cassandra
- Maintain replicas for high reliability
- Media Data Store
- Distributed File Storage (HDFS)
- Distributed Object Storage (Amazon S3)
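A rough illustration of the key-value layout mentioned above; the key prefix, JSON value shape, and the example object-store path are assumptions for the sketch (Cassandra in practice would use a table with MediaID as the partition key):

```python
import json

# Key-value view of the Media metadata: key = MediaID, value = serialized attributes.
def media_key(media_id: int) -> str:
    return f"media:{media_id}"

def media_value(media_path: str, creation_date: str) -> str:
    return json.dumps({"media_path": media_path, "creation_date": creation_date})

# Illustrative usage against some key-value client (hypothetical API):
# kv_store.put(media_key(42), media_value("media-bucket/42.mp4", "2024-01-01T00:00:00Z"))
```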
Component Design #
- Split reads and writes into separate services so they cannot overload each other and can scale independently (microservices)
- Web servers have a connection limit (assume 500 concurrent connections per server, so a single server cannot handle more simultaneous reads/writes than that; a worked estimate follows this list)
- (Write) Upload media service
- (Read) View media service
- Reads are faster, especially when served from a cache
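To make the connection limit concrete, a hedged back-of-envelope; the peak-concurrency figure is an assumption for illustration, not from the source:

```python
import math

max_connections_per_server = 500      # assumed per-server connection limit (as above)
peak_concurrent_requests = 50_000     # assumed peak simultaneous reads + writes

servers_needed = math.ceil(peak_concurrent_requests / max_connections_per_server)
print(f"Need at least {servers_needed} web servers at peak")   # 100
```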
Reliability and Redundancy #
- High Availability and Reliability - Store multiple copies of media
- Multiple copies of the services, system will run even if one instance of a service dies
- Eliminate single points of failure (SPOF)
Data Sharding #
- Metadata Sharding (Options)
- Based on UserID (not recommended: hot users and non-uniform storage per user lead to unbalanced shards)
- Based on MediaID
- Generate unique MediaIDs first, then find the shard number via MediaID % 10 (for 10 database servers)
- Generate MediaIDs
- Dedicate a separate database instance to generate auto-incrementing IDs
- Define a table containing only an ID field
- As new media is added to the system, insert a new row into this table and use that row's ID as the MediaID of the new media
- Single point of failure?
- Use two DB instances, one generating even numbers and the other odd numbers (see the sketch after this list)
- Design can be extended for Users, Media-Comments
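A rough sketch of the even/odd key generation and MediaID-based sharding described above; this is a minimal in-process stand-in, where a real design would use two dedicated auto-incrementing database sequences:

```python
import itertools

# Two "key-generating DB instances": one hands out even IDs, the other odd IDs,
# so losing one still leaves a working generator (no single point of failure).
even_ids = itertools.count(start=2, step=2)
odd_ids = itertools.count(start=1, step=2)
round_robin = itertools.cycle([even_ids, odd_ids])

NUM_SHARDS = 10

def next_media_id() -> int:
    """Pick the next ID from the two generators in round-robin fashion."""
    return next(next(round_robin))

def shard_for(media_id: int) -> int:
    """Shard number is MediaID modulo the number of metadata DB shards."""
    return media_id % NUM_SHARDS

mid = next_media_id()
print(mid, "-> shard", shard_for(mid))
```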
Cache and Load Balancing #
- Need large-scale media delivery system (Global)
- Pareto Principle (80-20 rule)
- 20% of media generates 80% of the daily read traffic
- Cache that hot 20% of media files and metadata
- CDN for static assets
- Geographically distributed cache servers
- Push content closer to the user
- Cache for Metadata
- Cache hot database rows
- Memcached
- Application servers check cache first before hitting DB
- Cache Eviction Policy: LRU
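A minimal cache-aside sketch for the metadata path with LRU eviction; an in-process OrderedDict stands in for Memcached, and the names are illustrative:

```python
from collections import OrderedDict

class LRUCache:
    """Tiny LRU cache: recently used entries stay, the oldest gets evicted."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict the least recently used entry

metadata_cache = LRUCache(capacity=10_000)

def get_media_metadata(media_id, db_lookup):
    """Cache-aside read: check the cache first, fall back to the DB, then populate."""
    meta = metadata_cache.get(media_id)
    if meta is None:
        meta = db_lookup(media_id)          # hit the metadata DB only on a miss
        metadata_cache.put(media_id, meta)
    return meta
```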