Video Streaming Platform
Notes from grokking system design
Services: YouTube, Netflix
Requirements
- Functional
- Upload, view and share videos
- Search videos from titles/description
- Record stats (likes, number of views)
- Comment on videos
- Non-functional
- High Availability
- Consistency can take a hit in the interest of availability/lower latency; it is fine if an uploaded video takes a while to become available
- High Reliability (uploaded videos are never lost)
- Minimal Latency (while watching videos)
Capacity Estimation / Constraints
- Read-heavy system
Read (View) to Write (Upload) Ratio 200:1
- Traffic
- Total Users:
1.5 B
- Daily Active Users (DAU):
800 M
- Assume:
A user watches 5 videos / day
- (Read)
800 M DAU * 5 videos / day * 1 day / 86400 s = 46 K videos / s
- (Write)
46 K videos / s * 1/200 = 230 videos / s
- Storage
- Assume
500 hr worth of videos are uploaded / min
- On average,
1 min of video needs 50 MB of storage (videos need to be stored in multiple formats)
500 hr of videos uploaded / min * 60 min / hr * 50 MB / min = 1500 GB / min = 25 GB / s
- These numbers ignore video compression and replication
- Bandwidth
- Assume
Uploading 1 min of video takes 10 MB of bandwidth (10 MB / min)
- (Write / Ingoing)
500 hr of videos uploaded / min * 60 min / hr * 10 MB / min = 300 GB / min = 5 GB / s
- (Read / Outgoing)
200 * 5 GB / s = 1 TB / s
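The estimates above can be reproduced with a few lines of arithmetic. A minimal sketch using only the figures assumed in these notes; decimal units (1 GB = 1000 MB) are an assumption, the variable names are illustrative, and small differences from the figures above come from rounding (46,296 vs. 46 K reads / s).

```python
# Back-of-the-envelope estimates using the assumptions in these notes.
# Decimal units are assumed: 1 GB = 1000 MB, 1 TB = 1000 GB.
SECONDS_PER_DAY = 86_400

dau = 800_000_000             # daily active users
videos_per_user_per_day = 5
read_write_ratio = 200        # views per upload

reads_per_s = dau * videos_per_user_per_day / SECONDS_PER_DAY
writes_per_s = reads_per_s / read_write_ratio

video_minutes_per_min = 500 * 60    # 500 hr of video uploaded every minute
storage_mb_per_video_min = 50       # storage for 1 min of video
bandwidth_mb_per_video_min = 10     # upload bandwidth for 1 min of video

storage_gb_per_s = video_minutes_per_min * storage_mb_per_video_min / 1000 / 60
ingress_gb_per_s = video_minutes_per_min * bandwidth_mb_per_video_min / 1000 / 60
egress_tb_per_s = ingress_gb_per_s * read_write_ratio / 1000

print(f"reads:   {reads_per_s:,.0f} videos/s")
print(f"writes:  {writes_per_s:,.0f} videos/s")
print(f"storage: {storage_gb_per_s:.0f} GB/s")
print(f"ingress: {ingress_gb_per_s:.0f} GB/s")
print(f"egress:  {egress_tb_per_s:.0f} TB/s")
```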
System APIs
- upload_video(api_dev_key, video_contents, video_title, video_description, tags[], category_id, default_language, recording_details)
- Parameters:
- api_dev_key (string): identifies the caller, used to limit (throttle) users
- video_contents (stream): video to be uploaded
- video_title (string)
- video_description (string)
- tags (string[])
- category_id (string)
- default_language (string)
- recording_details (string): Location info
- Returns:
- result (JSON):
- status: a successful upload returns HTTP 202 (request accepted); once processing finishes, the user can also be notified by email with the link
- link: access to the video
- search_video(api_dev_key, search_query, user_location, maximum_videos_to_return, page_token)
- Parameters:
- api_dev_key (string)
- search_query (string): search term
- user_location (string)
- maximum_videos_to_return (number)
- page_token (string): specify a page in the result set that should be returned
- Returns:
- result (JSON): contains information about the list of video resources matching the search query
- videos (array): each has video title, a thumbnail, a video creation date, and a view count
- stream_video(api_dev_key, video_id, offset, codec, resolution)
- Parameters:
- api_dev_key (string)
- video_id (string)
- offset (number): time offset in seconds from the beginning of video
- codec (string), resolution (string): sent by the client so that playback can be paused on one device and resumed on another that supports a different codec/resolution
- Returns: stream
- Media stream (video chunk) from given offset
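One way stream_video could turn the offset parameter into a video chunk is sketched below. This assumes videos are stored as fixed-length chunks; the 10-second chunk length and the name chunk_for_offset are hypothetical and not part of the API described in these notes.

```python
CHUNK_SECONDS = 10  # assumed fixed chunk duration, not specified in the notes


def chunk_for_offset(offset_s: float, chunk_seconds: int = CHUNK_SECONDS):
    """Map a playback offset to (chunk index, seconds to skip inside that chunk)."""
    if offset_s < 0:
        raise ValueError("offset must be non-negative")
    index = int(offset_s // chunk_seconds)
    return index, float(offset_s - index * chunk_seconds)


# A client resuming at 2 min 5 s would be served chunk 12, starting 5 s in.
index, skip = chunk_for_offset(125)
print(index, skip)
```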
Database Schema
- Video Metadata - SQL
- Video
- VideoID (PK), Title, Description, Size, Thumbnail, Uploader/UserID, TotalNumberOfLikes, TotalNumberOfViews
- Comment
- CommentID (PK), VideoID (FK), UserID (FK), Comment, CreationDate
- User Metadata - SQL
- UserID (PK), Name, Email, Address, Age, RegistrationDetails
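The tables above could be declared roughly as follows. This is a sketch using SQLite through Python's sqlite3 module so that it runs anywhere; the column types, NOT NULL/DEFAULT constraints, and sample rows are assumptions, since the notes only list column names and keys.

```python
import sqlite3

SCHEMA = """
CREATE TABLE User (
    UserID INTEGER PRIMARY KEY,
    Name TEXT NOT NULL,
    Email TEXT NOT NULL UNIQUE,
    Address TEXT,
    Age INTEGER,
    RegistrationDetails TEXT
);
CREATE TABLE Video (
    VideoID INTEGER PRIMARY KEY,
    Title TEXT NOT NULL,
    Description TEXT,
    Size INTEGER,
    Thumbnail TEXT,
    UserID INTEGER NOT NULL REFERENCES User(UserID),
    TotalNumberOfLikes INTEGER NOT NULL DEFAULT 0,
    TotalNumberOfViews INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE Comment (
    CommentID INTEGER PRIMARY KEY,
    VideoID INTEGER NOT NULL REFERENCES Video(VideoID),
    UserID INTEGER NOT NULL REFERENCES User(UserID),
    Comment TEXT NOT NULL,
    CreationDate TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(SCHEMA)

# Sample rows (made up) to exercise the schema.
conn.execute("INSERT INTO User (Name, Email) VALUES (?, ?)", ("Ada", "ada@example.com"))
conn.execute("INSERT INTO Video (Title, UserID) VALUES (?, ?)", ("Intro", 1))
conn.execute("INSERT INTO Comment (VideoID, UserID, Comment) VALUES (?, ?, ?)", (1, 1, "Nice"))

# Recording a view is a counter increment on the Video row.
conn.execute("UPDATE Video SET TotalNumberOfViews = TotalNumberOfViews + 1 WHERE VideoID = ?", (1,))
views = conn.execute("SELECT TotalNumberOfViews FROM Video WHERE VideoID = 1").fetchone()[0]
print(views)
```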
High Level Design
- Processing Queue - videos to be dequeued for encoding, thumbnail generation and storage
- Encoder - encode uploaded videos into multiple formats
- Thumbnails Generator
- Video and Thumbnail Storage - distributed file storage / object storage
- User Datastore
- Video Metadata Datastore
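How the components above fit together can be sketched with an in-process queue and one worker. Everything here is a stand-in: the dictionaries replace the real storage and metadata datastores, encode and make_thumbnail are placeholder functions, and the format names are hypothetical.

```python
import queue
import threading

FORMATS = ("360p", "720p", "1080p")  # hypothetical target formats

processing_queue = queue.Queue()
object_store = {}    # stand-in for video and thumbnail storage
metadata_store = {}  # stand-in for the video metadata datastore


def encode(raw: bytes, fmt: str) -> bytes:
    return fmt.encode() + b":" + raw  # placeholder for a real encoder


def make_thumbnail(raw: bytes) -> bytes:
    return b"thumb:" + raw[:8]  # placeholder for a real thumbnail generator


def worker(uploads: dict) -> None:
    """Dequeue uploaded videos, then encode, thumbnail, and store them."""
    while True:
        video_id = processing_queue.get()
        if video_id is None:  # sentinel: no more work
            break
        raw = uploads[video_id]
        for fmt in FORMATS:
            object_store[f"{video_id}/{fmt}"] = encode(raw, fmt)
        object_store[f"{video_id}/thumbnail"] = make_thumbnail(raw)
        metadata_store[video_id] = {"status": "ready", "formats": list(FORMATS)}


uploads = {"v1": b"raw-bytes-of-video-1"}
t = threading.Thread(target=worker, args=(uploads,))
t.start()
processing_queue.put("v1")  # the upload service enqueues a task
processing_queue.put(None)
t.join()
print(metadata_store["v1"])
```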
Detailed Component Design
- Read-heavy system - more views than uploads
- Video stored using a Distributed File Storage System (HDFS) or Distributed Object Storage (S3)
- Maintain multiple copies of videos
- Microservices Architecture - separate the services, scale them independently
- Read Service
- LB - distribute traffic to different servers since we have multiple copies of videos
- Write Service
- Single Leader Replication
- Can cause some staleness in data, but still acceptable (takes some milliseconds to update)
- Video Uploads - Support resuming from the same point if connection fails, store uploaded videos on server
- Video Encoding - New task added to processing queue to encode video into multiple formats
- Thumbnails
- Read-heavy - Higher thumbnail views than video views
- While a user watches one video, the page shows thumbnails of many other videos
- Each thumbnail is small in size (5-10 KB)
- Store in Bigtable
- Combines multiple files into one block to store on the disk
- Efficient in reading a small amount of data
- Cache hot thumbnails to improve latencies
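The resumable upload mentioned under Write Service can be sketched as follows. The idea is that the server remembers how many bytes it has received, and the client always continues from that offset. UploadSession, upload, and the fail_after switch are hypothetical names for illustration; a real service would persist the session state rather than keep it in memory.

```python
class UploadSession:
    """Server-side state for one resumable upload (in-memory stand-in)."""

    def __init__(self, total_size: int):
        self.total_size = total_size
        self.data = bytearray()

    @property
    def offset(self) -> int:
        return len(self.data)  # number of bytes safely received so far

    @property
    def complete(self) -> bool:
        return self.offset == self.total_size

    def append(self, offset: int, chunk: bytes) -> int:
        if offset != self.offset:
            raise ValueError(f"expected offset {self.offset}, got {offset}")
        if self.offset + len(chunk) > self.total_size:
            raise ValueError("chunk exceeds declared upload size")
        self.data.extend(chunk)
        return self.offset


def upload(session, payload: bytes, chunk_size: int, fail_after=None) -> None:
    """Client loop: start from the offset the server reports, not from zero."""
    sent = 0
    pos = session.offset
    while pos < len(payload):
        if fail_after is not None and sent == fail_after:
            raise ConnectionError("simulated network failure")
        pos = session.append(pos, payload[pos:pos + chunk_size])
        sent += 1


payload = bytes(range(256)) * 4  # 1024 bytes of sample data
session = UploadSession(len(payload))
try:
    upload(session, payload, chunk_size=100, fail_after=3)
except ConnectionError:
    pass
resumed_from = session.offset             # bytes kept across the failure
upload(session, payload, chunk_size=100)  # resumes instead of restarting
print(resumed_from, session.complete)
```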
Metadata Sharding
- Based on UserID - store all data for a specific user on one server
- Pass UserID to hash function which will map the user to a database server
- Querying videos for a particular user is straightforward: pass the UserID to the hash function to find the server
- To search videos by title, query all servers; each server returns a set of videos, and a centralized server then aggregates and ranks these results before returning them to the user
- Issues
- Performance bottleneck if a user becomes popular (a lot of queries on the server holding the user)
- Nonuniform distribution of data as some users may be storing a lot more than others
- Solution - use Consistent Hashing to balance the load between servers or repartition/redistribute the data
- Based on VideoID - a hash function maps each VideoID to a random server, which stores that video’s metadata
- To find the videos of a user, query all servers; each server returns a set of videos, and a centralized server then aggregates and ranks these results before returning them to the user
- This approach solves our problem of popular users but shifts it to popular videos
- Improve performance by caching hot videos
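Sharding by VideoID and the resulting scatter-gather query can be sketched as below. This is a toy model: the shards are in-memory lists, a plain hash modulo the shard count stands in for the hash function, and ranking by view count is an assumption (the notes do not say how results are ranked).

```python
import hashlib

NUM_SHARDS = 4  # assumed number of metadata servers


def shard_for(key: str) -> int:
    """Stable hash of a key, mapped onto one of the shards."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


shards = [[] for _ in range(NUM_SHARDS)]  # each list stands in for one server


def insert_by_video_id(video: dict) -> None:
    shards[shard_for(video["video_id"])].append(video)


def videos_of_user(user_id: str) -> list:
    """Scatter-gather: ask every shard, then merge and rank centrally."""
    partial = [[v for v in shard if v["user_id"] == user_id] for shard in shards]
    merged = [v for part in partial for v in part]
    return sorted(merged, key=lambda v: v["views"], reverse=True)


for video in (
    {"video_id": "v1", "user_id": "u1", "views": 10},
    {"video_id": "v2", "user_id": "u2", "views": 99},
    {"video_id": "v3", "user_id": "u1", "views": 500},
    {"video_id": "v4", "user_id": "u1", "views": 42},
):
    insert_by_video_id(video)

u1_videos = videos_of_user("u1")
print([v["video_id"] for v in u1_videos])
```

With UserID-based sharding the same lookup would touch only the single shard given by shard_for(user_id), which is the trade-off the two schemes above describe.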
Video Deduplication
- Perform deduplication early in the process
- Run matching algorithms (Block Matching, Phase Correlation, etc.) to find video duplications
- Replace the existing video if
- the new video is of higher quality
- the new video is longer, or the existing video is a subpart of the new video
- Intelligently divide the video into smaller chunks, only upload the new parts
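The "only upload the new parts" idea can be illustrated with chunk hashing. Note the swap: this sketch uses exact SHA-256 hashes of fixed-size chunks, which only detects byte-identical chunks. It does not implement the Block Matching or Phase Correlation algorithms named above, which match visually similar content. The function names and the tiny chunk size are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; unrealistically small, for demonstration only


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Hash every fixed-size chunk of a video."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def chunks_to_upload(new_video: bytes, known_hashes: set,
                     chunk_size: int = CHUNK_SIZE) -> list:
    """Return (offset, chunk) pairs whose content the server has not seen."""
    missing = []
    for i in range(0, len(new_video), chunk_size):
        chunk = new_video[i:i + chunk_size]
        if hashlib.sha256(chunk).hexdigest() not in known_hashes:
            missing.append((i, chunk))
    return missing


existing = b"AAAABBBBCCCC"       # already stored
longer = b"AAAABBBBCCCCDDDD"     # new upload that extends the existing video
known = set(chunk_hashes(existing))
to_send = chunks_to_upload(longer, known)
print(to_send)
```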
Load Balancing
- Use Consistent Hashing among cache servers
Cache & Content Delivery Network
- Cache hot database rows for metadata servers
- Eviction Policy: LRU, discard least recently viewed row first
- Pareto Principle: 80-20 rule
- 20% of videos generate 80% of traffic
- Cache the 20%
- CDN - Geographically distributed cache servers
- Move popular videos to CDNs
- CDNs replicate content in multiple places - videos will stream from a location closer to the user, over a friendlier network with fewer hops
- CDN machines make heavy use of caching and can mostly serve videos out of memory
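The LRU eviction policy for hot metadata rows can be sketched with an ordered dictionary. This is a minimal single-process illustration; the class name and the capacity of 2 are chosen for the example only.

```python
from collections import OrderedDict


class LRUCache:
    """Keeps the most recently used rows; evicts the least recently used."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._rows = OrderedDict()

    def get(self, key):
        if key not in self._rows:
            return None               # cache miss: caller falls back to the DB
        self._rows.move_to_end(key)   # mark as most recently used
        return self._rows[key]

    def put(self, key, row) -> None:
        self._rows[key] = row
        self._rows.move_to_end(key)
        if len(self._rows) > self.capacity:
            self._rows.popitem(last=False)  # drop the least recently used row


cache = LRUCache(capacity=2)
cache.put("v1", {"title": "first"})
cache.put("v2", {"title": "second"})
cache.get("v1")                       # v1 is now more recent than v2
cache.put("v3", {"title": "third"})   # evicts v2
print(cache.get("v2"), cache.get("v1"))
```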
Fault Tolerance
- Use Consistent Hashing for distribution among DB servers
- Help replace dead servers
- Help distribute load among servers
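Consistent hashing, which these notes use for both cache servers and DB servers, can be sketched as a sorted ring of virtual nodes. The sketch shows the property that matters for fault tolerance: when a server dies, only the keys it owned move. HashRing, the server names, and the choice of 100 virtual nodes per server are assumptions for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) virtual nodes
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        # First virtual node clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["db-1", "db-2", "db-3"])
keys = [f"video-{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.remove("db-2")  # simulate a dead server
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```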
Other Details
- YouTube was originally built using a relational database (MySQL)
- The work of scaling it resulted in Vitess