December 1, 2025

System Design Interview - Instagram

ChatGPT, Deep research, GPT 5.1 Thinking

----

prompt:

I am preparing for software development system design interviews in the United States, targeting top big tech companies, mid-sized companies, and popular startups. I want to prepare for designing an Instagram. I now need materials to prepare for these interviews. This involves principal, senior principal, and even more senior-level positions. I know that in the actual interview there is only one hour or less, but I want my preparation to be comprehensive. Please prepare a very large scenario for me (including all related questions and answers, which might take many hours, dozens of hours, or even more, for me to study). For example, starting from the opening statement, the overall system framework, all aspects before and after the system design, and then for each knowledge point the possible questions that might be asked, including potentially very deep questions. If your context length has no limit, then output as much information as possible. If there is a context length limit, then you can omit a few questions here (such as in-depth questions about the details of various types of databases, in-depth questions about the details of load balancers, but anything that is essential for Instagram still needs to be included; if something is omitted, then add an uppercase TODO. If there are other large modules that need to be omitted, then include the major topics of the omitted questions and also mark them with uppercase TODO). If there is no length limit, then include all these questions. (If there are even deeper questions that can be asked, add a lowercase todo.)

----

ChatGPT:

To help you comprehensively prepare for an Instagram system design interview, I need to confirm a few key details:

Do you want me to write this preparation material in English so it aligns with the U.S. interview environment?

Besides high-level design (such as architecture diagrams, module breakdowns, etc.), do you also want to go deeper into code structure or pseudocode for certain components?

Do you need me to provide different styles of questions based on interviewer styles at different companies (such as Google, Meta, Stripe, etc.)?

Do you prefer the content to be output as structured notes, or as a step-by-step explanatory format?

Please confirm the above questions so that I can start building a complete and in-depth system design preparation outline for you.

----

Prompt: 

You can ask me more questions and then output, or if you only want to ask questions once, you can just start according to your own flow.

English.

I need high-level design and module breakdown, and you can describe architecture diagrams in text, for example showing multiple lines like API Gateway-> Service A, Service A<---> Service B, Service B-> DB A, and so on.

Some interviewers may ask about detailed flows; wherever they are likely to dig into details, I need pseudocode.

I need questions in different styles according to the styles of interviewers at different companies (but label this on the questions. For example, when asking about a certain knowledge point, there may be multiple ways to ask it; after each wording you can mark the company name. If basically all companies will ask the same question in the same way, then there is no need to label it).

As for whether you prefer the content as structured notes or step-by-step explanation → I still prefer it to be output in an interview scenario style. For example, first state the requirements, then show various possible opening styles from interviewers, and then they might dig deep into image upload and all the design knowledge points. Due to time constraints or interviewer style, they might only ask one or two questions, but you need to output all the possible branches. And for each deep dive, if similar questions come up at multiple stages (for example, storing user information indexes and storing post information indexes both require using a database), raise the related questions and deep-dive questions. But for common questions such as what kind of tree structure a DB uses to implement indexes, only bring it up the first time; when it appears again later, just mention that it was already discussed above.

----

ChatGPT: 

Great — I’ll start building a comprehensive mock system design interview scenario for Instagram, structured like a real interview. It will cover requirements gathering, evolving architecture, service/module breakdown, tradeoffs, and all question types across varying depths, tailored to roles like Principal, Senior Principal, and higher. I’ll also include company-specific question variations where appropriate.

This will be a large, structured output organized by interview phases, and will serve as a deeply detailed study and practice resource.
You can continue chatting with me while I prepare it.

----

ChatGPT

Comprehensive Instagram System Design Interview Preparation

ChatGPT

OK, let me see. The prompt involves high-level system design, especially for a social media application. It needs to include architecture, pseudocode, and responses to various interview styles.

I’m thinking through the need to standardize background colors across pages for consistency. The primary background should be white, with an alternative in a light medium color.

I'm mapping out the interview format, using bold terms for roles and headings for each phase. This ensures clear progress, especially in areas like requirements and high-level architecture.

Searched for B+ tree in relational database index

ChatGPT

I’m looking through ByteByteGo for B+ trees snippets. This is interesting—it helps in understanding different phrasing options.

Read blog.bytebytego.com

ChatGPT

OK, let me see. The ByteByteGo article explains B+ tree structures for relational databases' indexing. Should I cite this? Considering common knowledge for S3 or CDN usage.

Searched for CDN static image system design

ChatGPT

OK, let me see. CDNs cut latency by placing cached static content closer to users, optimizing access speeds.

Read geeksforgeeks.org

ChatGPT

I’m mapping out various Instagram design elements, including CDNs for image delivery, database indexing, and scalability to handle millions of users and photos.

Searched for Instagram photos uploaded per day 2025

ChatGPT

I'm noting that tens of millions of photos are uploaded daily, emphasizing the need for design to handle this high volume effectively.

Read metricool.com

ChatGPT

I'm thinking through the assembly of system components, database design choices, media storage solutions, scaling strategies, caching mechanisms, and user feed generation algorithms.

I'm focusing on leveraging the lengths of the adapter and the goal, adjusting each dynamic to meet the primary objective while ensuring the second adapts accordingly.

Read dev.to

ChatGPT

I’m gathering insights on object storage for storing high-resolution images and using CDNs for swift global delivery, referencing a specific line from the article. The rest concerns refining citations and verifying details.

Read metricool.com

ChatGPT

I'm mapping out the best way to cite Instagram's image sharing stats. Focusing on including the header and surrounding details seems effective.

Searched for Instagram feed fanout design

ChatGPT

Mapping Instagram system design. LinkedIn might just be a blog.

Read linkedin.com

ChatGPT

Hm, I’m thinking about the article's thoroughness. It covers Instagram's design intricacies, including feed mechanics. The extent of its detail is impressive, but its extensive length suggests it's a valuable source for a deep dive.

Read more from linkedin.com

ChatGPT

I'm thinking through the complexity of including a detailed upload flow with direct upload options. This might complicate the pseudocode and not align with the central requirement of simplifying the flow.

Read more from linkedin.com

ChatGPT

I'm thinking through how to seamlessly incorporate relevant citations into the response. This approach ensures the content flows naturally and maintains trustworthiness.

Research completed in 8m · 8 sources

----

System Design Interview: Photo Sharing Social Network (e.g., Instagram)

Interviewer (Opening): "Design a photo-sharing social network where users can upload images and follow other users."
Interviewer (Google): "How would you design an Instagram-like system for hundreds of millions of users?"
Interviewer (Amazon): "Let's design a service for sharing photos among users. What are your initial thoughts and requirements?"

Candidate: First, I'd like to clarify the requirements and constraints for this photo-sharing social network:

Requirements Clarification

  • Functional Requirements:

    • Users can create an account (sign up), log in, and have a profile.

    • Users can upload photos (with captions) and view their own and others’ photos.

    • Users can follow or unfollow other users.

    • A user’s home feed should show recent posts (photos) from users they follow (ideally in some order, e.g. chronological or ranked).

    • Users can like and comment on posts. (This implies tracking likes/comments, but we can focus on the core photo posting and feed system first.)

  • Non-Functional Requirements:

    • Scalability: The system should handle a large number of users and posts. (Instagram, for example, sees over a billion images shared per day metricool.com, so our design should scale horizontally to handle high read and write throughput.)

    • High Availability: The service should be reliable – minimal downtime, with redundancies for failure.

    • Low Latency: Users should be able to load their feed and view images quickly (a target could be <200ms for feed data and fast image loading globally).

    • Consistency: It’s acceptable to have eventual consistency for some parts (e.g. slight delays in seeing a newly uploaded photo in all follower feeds), but core actions like posting a photo or following someone should be durable and not get lost.

    • Durability: User data and photos must be stored safely (no data loss, even if servers crash). Photos especially should be stored with high durability (multiple copies, backups).

  • Capacity Estimates (Assumptions): (If needed for context)

    • Let’s assume we might have on the order of 100 million daily active users, and say 10 million photo uploads per day at steady state (this is just to gauge scale). Peak could be higher.

    • If each photo is ~1 MB, 10 million photos/day is about 10 TB of new data per day, which necessitates a very scalable storage solution.

    • Each active user’s feed might involve fetching dozens to hundreds of photos from people they follow.
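To make these estimates concrete, here is a small illustrative calculation using only the assumed numbers above (not real Instagram figures; the 5x peak factor is also an assumption):

# Illustrative back-of-the-envelope numbers from the assumptions above.
daily_active_users = 100_000_000
uploads_per_day    = 10_000_000
avg_photo_mb       = 1

new_storage_tb_per_day  = uploads_per_day * avg_photo_mb / 1_000_000   # ~10 TB of new photos/day
new_storage_pb_per_year = new_storage_tb_per_day * 365 / 1000          # ~3.65 PB/year

upload_qps_avg  = uploads_per_day / 86_400                              # ~116 uploads/sec on average
upload_qps_peak = upload_qps_avg * 5                                    # assume a 5x peak factor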

(Interviewer likely acknowledges these requirements. Once clarified, I would move on to proposing a high-level design.)

High-Level Architecture Design

Candidate: At a high level, I propose a microservices architecture with separate services for handling different responsibilities (this helps with scalability and clarity). The core components and their interactions could be:

Client (Mobile/Web App) -> API Gateway (Load Balancer)
API Gateway -> Auth Service (authentication, sessions)
API Gateway -> User Service (manages user profiles, follows)
API Gateway -> Post Service (manages photo posts & feed)
User Service -> Database for Users (relational DB)
Post Service -> Database for Posts (relational or NoSQL)
Post Service <---> Media Storage Service (for image files)
Media Storage Service -> Blob Storage (for photos)
Blob Storage -> CDN (for delivery)

For example:

  • API Gateway / Load Balancer → routes requests from clients to the appropriate service. This could also handle things like request throttling or act as a reverse proxy.

  • Auth Service → handles sign-up, login, and issuing auth tokens (e.g., JWT). It might use a lightweight database or in-memory store for sessions if needed.

  • User Service → handles user profiles and relationships: getting user info, follow/unfollow actions. It will interact with a User Database (likely a SQL database storing user accounts, profile info, follower lists, etc.).

  • Post Service → handles operations related to posts (photos): creating a new post, fetching posts for a user’s feed, and recording likes/comments (though likes/comments could even be separate microservices if needed at scale). The Post Service will use a Posts Database for metadata about posts (who posted, captions, timestamp, etc.) and likely will interact with caching and feed generation components (discussed later).

  • Media Storage Service → responsible for handling the image files. When a user uploads a photo, this service (or module) will store the photo in a Blob/Object Storage system and return a URL. It ensures images are stored efficiently outside the relational DB. We will use a distributed object storage (like AWS S3 or similar) to store the image files, because object storage is highly scalable (can handle billions of objects), durable (often 11+ nines durability via replication), and optimized for large files dev.to. The Media service will also interface with a Content Delivery Network (CDN) for delivering images quickly to users around the world. CDNs cache images on servers globally, so when a user loads an image, it can be served from a nearby edge server, greatly reducing latency geeksforgeeks.org.

Each of these services can be scaled horizontally as needed (multiple instances behind the gateway). Clients (mobile app or web) communicate via RESTful APIs (or GraphQL) exposed by the gateway. Here's a textual architecture diagram of the key components and data stores:

Client App -> API Gateway/Load Balancer
API Gateway -> Auth Service -> User DB
API Gateway -> User Service -> User DB
API Gateway -> Post Service -> Post DB
Post Service <---> Cache (Redis) for feeds/posts (optional)
Post Service -> Media Service -> Blob Storage (Images)
Blob Storage -> CDN (Global Edge Cache)

(The arrows show request flow. For example, a client’s request to upload a photo goes through the Gateway to the Post Service (for post metadata) and Media Service (for the file itself).)

Interviewer: "Why separate the services? Could it be one big service?"

Candidate: We could start with a monolithic design for simplicity, but separating services by domain (users, posts, media, etc.) offers several advantages as the system grows: independent scalability (e.g., media handling might need more bandwidth), clarity of responsibilities, and easier future maintenance. Each service can have its own database optimized for its access patterns. This is a common approach in high-scale systems.

Now, I'll go into detail on key components and how we handle critical issues like data storage, indexing, image handling, feed generation, and scaling.

Database Design for Users and Posts

Interviewer: "What kind of database would you use to store user information and posts? SQL or NoSQL, and how would you design the schema?"
Interviewer (Amazon): "How will you store the user profiles and posts? Consider scalability and query patterns."
Interviewer (Google): "Given the scale, how would you ensure quick lookups for user profiles and their posts? Would sharding or a NoSQL solution be needed?"

Candidate: For core user data and post metadata, I would start with a relational database (SQL) – for example, a MySQL or PostgreSQL instance (with replication and sharding as needed). A relational DB offers transactions and strong consistency which is important for things like account creation, ensuring a follow action is properly recorded, etc. It also allows us to use joins (for example, to fetch posts for a set of followed users, though we might handle feeds differently as the system scales). Most importantly, relational databases support indexes on important fields to optimize queries.

  • User Table: would store users with fields like user_id (primary key), username, hashed_password, profile info, etc. We would index the username (for login or profile lookup) and of course the primary key (user_id). We also need a way to get a user's followers or followings. This could be a separate Follow table (mapping follower_id to followee_id). That follow table would be large (many rows for popular users), and we’d index it by followee_id to quickly fetch all followers of a user (for fan-out if needed), and by follower_id to fetch whom a user is following.

  • Post Table: stores each photo post metadata: post_id (primary key), user_id (who posted), caption text, timestamp, maybe a pointer/URL to the image in storage, etc. We would index by user_id (to get all posts by a user quickly, e.g. for that user's profile page), and maybe index by timestamp if we often query recent posts (e.g. to retrieve the most recent N posts for a user's feed, we might query posts of multiple users sorted by time). In a relational DB, an index on timestamp or [user_id, timestamp] composite can help retrieve recent posts for a given user or set of users.

Interviewer: "Why not use a NoSQL database?"

Candidate: We can consider NoSQL (like a document store or wide-column store) especially as we scale. For example, storing the feed might be better in a NoSQL wide-column DB (like Cassandra) where each user's feed is a row with a sorted list of recent post IDs — we'll discuss feed storage soon. But initially, user data and post metadata have relational aspects (users, their posts, and relationships). A SQL database ensures strong consistency for things like ensuring a post is tied to a valid user, or a like is tied to a post and user. Also, SQL can handle moderate scale with replication and sharding.

That said, as the number of posts grows to the billions, we likely will shard the Post database by user_id or post_id range. For instance, users could be partitioned across multiple databases (e.g., user IDs 1-1M on DB shard 1, etc.), and their posts go to the corresponding shard. Sharding will distribute load. NoSQL could also be introduced if we find the relational model too slow or if we need schemaless flexibility for certain features.
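As an illustration of how the application layer might route queries to a shard, here is a minimal sketch; hash-mod is shown for simplicity, while a production system would more likely use consistent hashing or a directory service so shards can be added without rehashing everything (the shard count and DSN naming are assumptions):

import hashlib

NUM_POST_SHARDS = 16  # assumed shard count for illustration

def shard_for_user(user_id: int) -> int:
    # Hash the user_id so consecutive ids do not all land on the same shard.
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_POST_SHARDS

def posts_db_for(user_id: int) -> str:
    # Hypothetical connection-name scheme; real deployments would use a config service.
    return f"posts-shard-{shard_for_user(user_id)}.db.internal"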

Interviewer: "How do we ensure quick retrievals? What about database indexing in practice – do you know how indexes are implemented under the hood?"

Candidate: Yes. In a relational database, when we add an index (say on user_id in the Posts table), the database builds a data structure to allow fast lookups by that key. Most relational databases use a B+ Tree for indexes, which keeps the indexed values sorted and allows logarithmic time lookups blog.bytebytego.com. For example, an index on user_id in the Posts table would let the DB quickly jump to all entries for a given user_id without scanning the whole table. B+ Trees are essentially an optimized variation of B-Trees that are great for range scans and disk-based storage (all leaves at the same level, linked leaves for in-order traversal, etc.). This is why using the right indexes (on columns like user_id, or user_id+timestamp) is critical for performance. (If the interviewer wants more detail, I can discuss how B+ tree nodes and leaves work, but generally showing awareness is enough.)

(Having addressed the structured data storage, now I’ll discuss how we handle the actual image files.)

Image Storage & Delivery (Media Service)

Interviewer: "How and where will you store the photos themselves? In the database, or elsewhere?"
Interviewer (Facebook): "Images are large binary data – what's your approach to store and serve potentially millions of images efficiently?"
Interviewer (Amazon): "We have AWS S3, would you use something like that for storing user-uploaded images?"

Candidate: We will not store the raw image files in the relational database. Storing large BLOBs in a SQL DB is not efficient – it would bloat the database and make backups and replication harder reddit.com. Instead, we use a Blob/Object Storage service for the images. In a cloud environment, this could be Amazon S3, Google Cloud Storage, or an internal distributed file system. Object storage is ideal for this use-case: it’s highly scalable (handles millions or billions of objects easily) and provides high durability (through replication) for the images dev.to.

How it works: When a user uploads a photo, our Media Service will handle it in coordination with the Post Service:

  1. Receive Upload Request: The client app will call an API (e.g., POST /upload) with the image file (or some handle to it). This goes to the Post Service/Media Service via the Gateway.

  2. Store Image in Blob Storage: The Media Service takes the image file and stores it in the blob storage system. For example, if using AWS S3, it would put the file in an S3 bucket. The storage returns a URL or key for the stored image.

  3. Generate Image URL: The Media Service or Post Service then generates a publicly accessible URL for the image (possibly a signed URL if we want access control). This URL is what we store in the Posts database as the image link in the post record.

  4. Return/Confirm Upload: The Post Service creates a new post entry in the Posts DB with the image URL, user_id, timestamp, etc., and returns success to the client. Now the client knows the post is created.

To serve images quickly to users worldwide, we integrate a Content Delivery Network (CDN) in front of the blob storage. The image URL that clients use might actually be a CDN URL (like https://cdn.ourapp.com/{photo_id}). The CDN will cache the images in edge servers globally. So when a user in Europe requests an image that was uploaded in the US, the CDN will deliver it from the nearest cache node if available, or fetch from the origin storage only if needed linkedin.com. This drastically reduces latency and offloads traffic from our servers and storage geeksforgeeks.org.

Interviewer: "What if an image upload is large? Does the file go through your server? Isn't that a lot of bandwidth on the app server?"

Candidate: Great question. For efficiency, we can use a technique to avoid piping the entire image through our application server. One common approach is using pre-signed URLs for direct upload. For example, the client first asks our service for an upload URL. The Media Service (or an API in Post Service) would get a pre-signed URL from the blob storage (this is essentially a time-limited credential to upload directly). The client then uploads the file directly to the blob storage using that URL linkedin.com, bypassing our app server for the heavy file transfer. Once the upload finishes, the client notifies our server (perhaps calling a POST /posts API with the metadata and file path). This way, our servers handle metadata and coordination but don’t become a bottleneck for streaming the file data. This approach is used by Instagram and others to handle large media efficiently.
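As a concrete illustration, with AWS S3 and boto3 the Media Service could generate a time-limited upload URL roughly like this (bucket name, key layout, and expiry are placeholders; error handling omitted):

import uuid
import boto3

s3 = boto3.client("s3")

def create_upload_url(user_id: int) -> dict:
    # One object key per upload; the key doubles as the media id stored with the post.
    key = f"uploads/{user_id}/{uuid.uuid4()}.jpg"
    url = s3.generate_presigned_url(
        ClientMethod="put_object",
        Params={"Bucket": "photo-uploads", "Key": key, "ContentType": "image/jpeg"},
        ExpiresIn=300,  # URL is only valid for 5 minutes
    )
    return {"upload_url": url, "media_key": key}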

Image Upload Flow (Step-by-Step)

Let's summarize the photo upload flow in steps (with the presigned URL approach in mind, which is more efficient):

  1. Client requests upload URL: The client app requests an upload URL for a new photo (maybe hitting POST /posts with no file, just intent).

  2. Server generates URL: The Post/Media Service generates a pre-signed upload URL from the storage service (e.g., S3) and returns it to the client.

  3. Client uploads to storage: The client directly uploads the image file to the blob storage using that URL. This offloads bandwidth.

  4. Client notifies completion: After successfully uploading, the client calls the Post Service (e.g., POST /posts/complete) with info like the image’s storage key, caption, etc.

  5. Post Service creates record: The Post Service creates a new entry in the Posts DB with the image URL (which likely points to CDN domain or storage key), user_id, timestamp, etc. It could also publish an event (e.g., to a queue) for other processes (like updating followers’ feeds or triggering image processing tasks if needed).

  6. Feed update (if push model): The service can now propagate this new post to followers’ feeds (we’ll discuss feed handling next).

  7. Response: The server responds to the client that the post is created. The client can now show the photo in the user’s profile or feed.

(If we didn't use direct upload, steps 1-3 would be replaced by the client directly sending the image to the server, and the server forwarding it to storage. But at high scale, direct-to-storage is preferred.)

  8. CDN Caching: When other users view this photo URL, the CDN will serve it. The first request might fetch from origin (blob storage), but then it’s cached on the CDN node. Subsequent viewers (especially in similar regions) get it quickly from cache.

By using object storage with CDN, we ensure images are stored durably and delivered fast. Object storage typically replicates data across multiple servers/locations (e.g., S3 offers 11 nines durability by storing copies in multiple AZs dev.to). And CDNs ensure even a user across the globe gets the photo with minimal delay (no transoceanic slow fetch each time).
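Putting the completion step of this flow into rough pseudocode (posts_db and event_bus are assumed service clients, not a specific framework; this is a sketch of the approach, not a definitive implementation):

import time
import uuid

def complete_upload(posts_db, event_bus, user_id, media_key, caption):
    """Called by the client after it has uploaded the file directly to blob storage.
    posts_db / event_bus are injected clients with assumed interfaces."""
    post_id = str(uuid.uuid4())                # real systems often use snowflake-style ids
    posts_db.insert(
        post_id=post_id,
        user_id=user_id,
        caption=caption,
        media_key=media_key,                   # points at the object in blob storage / CDN
        status="processing",                   # image variants not generated yet
        created_at=time.time(),
    )
    # Heavy work (resizing, feed fan-out) happens asynchronously via events.
    event_bus.publish("media.uploaded", {"post_id": post_id, "media_key": media_key})
    return {"post_id": post_id, "status": "processing"}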

(Having covered how we store and serve images, I'll move on to how to handle the feed and scalability.)

Image Processing, Multiple Image Sizes, and Publish Latency

Another important aspect is that the system should not serve the original uploaded image directly to all consumers. For example, a user may upload a 5 MB high-resolution photo, but a user scrolling through the home feed on a mobile device might only need a compressed 500 KB version, while a thumbnail in a profile grid might require an even smaller file. To optimize bandwidth, latency, and rendering performance, the system should generate multiple derived versions of each uploaded image. In practice, after the original image is stored in blob storage, an asynchronous Image Processing Service is triggered to create several variants such as thumbnail, small, medium, large, profile-crop, and high-resolution display versions. If needed, we can maintain around seven standard sizes so that different surfaces in the product (feed, profile, detail page, search results, etc.) can request the most appropriate asset instead of downloading the original every time.

This resizing and compression step should happen asynchronously rather than inline with the upload request. The upload path should remain fast: once the original file is durably stored and the post metadata is written, the system publishes an event to a message queue. Media worker machines then consume that event, download the original image, validate it, strip unsafe metadata if necessary, generate the resized variants, compress them to target quality levels, and write the outputs back to object storage. The Post Service can track a post state such as uploaded -> processing -> published. This is important because the post may exist logically before all image variants are ready to serve.

Regarding latency for other users: yes, image processing can affect how quickly a post becomes visible, but we typically design the system so that it does not block for all seven variants to finish. A practical approach is to define a minimum publish set — for example, a thumbnail and a medium-sized feed image. As soon as those essential variants are generated successfully, the post can be marked as published and inserted into followers’ feeds. The remaining variants can continue processing in the background. This means other users may see the post only a short time after upload, rather than waiting for the full media pipeline to complete. In other words, there is usually a small delay between “user finished upload” and “followers can see the post,” but that delay is controlled and minimized by asynchronous processing and partial readiness rules.

The end-to-end sequence would look like this: (1) the client uploads the original image, often directly to blob storage using a pre-signed URL; (2) the client notifies the Post Service, which creates the post record with status = processing; (3) the Post Service emits a media-processing event; (4) background workers generate the seven image variants and store them back in object storage; (5) once the minimum required variants are ready, the post status changes to published; (6) the feed system fans out the post to followers or makes it eligible for feed retrieval; and (7) when a follower opens the app, the feed API returns metadata pointing to the best image size for that device, which is then served through the CDN. This approach ensures that upload latency for the creator stays low, while viewers receive device-appropriate images that load much faster than the original 5 MB file.

There are also operational benefits to this design. Because image processing is asynchronous and queue-based, the system can absorb spikes in uploads by scaling media workers independently from the API servers. If processing is temporarily slow, uploads can still succeed and the queue smooths the workload. We can also retry failed transformations, monitor processing lag, and avoid making the entire posting experience unavailable just because one image worker cluster is overloaded. This separation of concerns improves both scalability and reliability.
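A minimal sketch of such a media worker, assuming Pillow for resizing and hypothetical blob_store/posts_db clients; the variant list and the "minimum publish set" mirror the description above and are illustrative values:

import io
from PIL import Image   # Pillow, used here to illustrate resizing

VARIANTS = {"thumb": 150, "small": 320, "medium": 640, "large": 1080}   # longest-edge pixels
MINIMUM_PUBLISH_SET = {"thumb", "medium"}   # post becomes visible once these exist

def process_media_event(event, blob_store, posts_db):
    # blob_store / posts_db are assumed clients; event comes from the "media.uploaded" queue.
    original = blob_store.get(event["media_key"])
    img = Image.open(io.BytesIO(original))
    ready, published = set(), False
    for name, max_px in VARIANTS.items():
        variant = img.convert("RGB")            # returns a copy; also avoids alpha-in-JPEG issues
        variant.thumbnail((max_px, max_px))     # resizes in place, keeping aspect ratio
        buf = io.BytesIO()
        variant.save(buf, format="JPEG", quality=80)
        blob_store.put(f'{event["media_key"]}/{name}.jpg', buf.getvalue())
        ready.add(name)
        if not published and MINIMUM_PUBLISH_SET <= ready:
            posts_db.set_status(event["post_id"], "published")   # feed fan-out can start now
            published = True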

Feed Generation and Retrieval

Interviewer: "How will the home feed be generated? When a user logs in, how do they see posts from those they follow?"
Interviewer (Facebook): "Describe how you would design the News Feed for this system. Would you compute it on the fly or in advance?"
Interviewer (Google): "If each user follows many others, fetching a feed could require aggregating a lot of posts. How do we do this efficiently at scale?"

Candidate: The news feed (home timeline) is a crucial part of a social platform. There are two classic approaches to generate a feed:

  • Fan-out on write (Push model): Every time a user posts a new photo, we push that post out to all their followers’ feed storage. In practice, this means when User A (with 1000 followers) posts, we instantly insert a reference to that post into 1000 feeds (could be rows in a feed table or cache for each follower). This makes reading the feed very fast – since each follower’s feed is pre-assembled, a simple query can fetch the feed posts. But the write cost is high (for a celebrity with 1M followers, that one post triggers 1M insert operations). Instagram uses a hybrid model where normal users’ posts are pushed, but celebrity posts might not be fully pushed to avoid huge fan-out linkedin.com.

  • Fan-out on read (Pull model): We do not pre-store anything. When a user opens their feed, we assemble it on the fly by pulling the latest posts from all users they follow (e.g., query recent posts of each followee and merge-sort them by time). This avoids heavy writes at post time (a post is just one record in the Posts DB), but reading the feed becomes expensive (if you follow 1000 people, the system must fetch from 1000 lists of posts). It can be slow and database-intensive to do this for every feed view, especially with many users and many follows.

Solution - Hybrid: Many real systems, including what Instagram evolved to, use a hybrid approach linkedin.com:

  • For regular users with manageable followers count, do fan-out on write: when they post, push to followers’ feeds. This gives quick reads for the majority of users.

  • For very popular users (celebrities) who have millions of followers, doing millions of writes per post is impractical. For those, the system can use fan-out on read: the followers’ feeds might not have the celebrity’s new post pre-inserted; instead, when a follower’s feed is requested, the system will pull in those celebrity posts on the fly (or handle them separately). This avoids a write explosion for huge followings at the cost of slightly slower feed for those few cases.

  • At read time, we merge posts pulled from these very popular accounts with the precomputed feed entries from regular accounts, so the user still sees a single, correctly ordered feed.

Data storage for feeds: We could have a dedicated Feed Service or let the Post Service handle it. A typical implementation: use a fast storage (maybe a wide-column NoSQL DB or an in-memory cache) to store each user’s feed as a sorted list of recent post IDs. For example, a Redis sorted set or a Cassandra table keyed by user_id with columns for post_ids (sorted by time). When a user posts, we add that post_id to each follower's feed list (except maybe very large ones). When a user requests their feed, we retrieve the list of post IDs from this feed store (a cheap range read, roughly O(log n + k) for the top k entries, since the data is already ordered), then fetch the actual post details (and images) for those IDs.

We also likely need caching here: even with pre-computation, reading from disk for every feed view is heavy. So we can cache each user's feed in memory. For instance, store the top N posts of a user’s feed in Redis. The first time it’s computed or updated, put it in cache with a TTL. Subsequent feed refreshes hit the cache (fast). If the cache expires or on cache miss, regenerate from the feed DB.

Additionally, to keep feeds fresh, whenever a new post arrives (fan-out on write), we might directly update the in-memory cache of followers if they have one. Or we set a short TTL so that within a few seconds it refreshes.
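For example, with Redis sorted sets the fan-out write and the feed read could look roughly like this (the key format, feed size cap, and the assumption that follower lists are fetched in batches are all illustrative; redis-py calls shown are real client methods):

import time
import redis

r = redis.Redis()
FEED_SIZE = 500   # keep only the most recent 500 entries per feed (assumption)

def fan_out_post(post_id: str, author_id: int, follower_ids: list[int]):
    score = time.time()                                  # chronological ordering
    for follower_id in follower_ids:
        key = f"feed:{follower_id}"
        r.zadd(key, {post_id: score})
        r.zremrangebyrank(key, 0, -(FEED_SIZE + 1))      # trim entries older than the newest FEED_SIZE

def read_feed(user_id: int, count: int = 20) -> list[str]:
    # Newest-first page of post ids; post details are fetched separately (and cached).
    return [pid.decode() for pid in r.zrevrange(f"feed:{user_id}", 0, count - 1)]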

Interviewer: "How do you rank or order the feed? Just time, or something else?"

Candidate: Initially, we can sort by reverse chronological (most recent first) for simplicity. That’s easy to implement by timestamps. Later, if we want more complex ranking (like Instagram now uses an algorithmic feed with ML ranking), that could be an extra layer: a ranking service that reorders fetched feed items based on user preferences, engagement likelihood, etc. But that goes beyond core system design, so I'd stick to chronological for now.

Interviewer: "What about the case where a user has a huge number of follows or followers? How do we handle that scale in feed generation?"

Candidate: We touched on the hybrid push/pull model for celebrity accounts. If a user follows 10k accounts, generating the feed fully on read can be expensive; but if one account has 10 million followers, pushing every new post to all followers is also too expensive. So I would use a hybrid strategy.

For most users, I would use fan-out on write: when they post, we asynchronously push the post ID into their followers’ feed stores. To keep post creation fast, this should not happen inline on the write path. Instead, the post service publishes an event to a queue such as Kafka, and background feed workers consume that event and update follower feeds asynchronously. This keeps write latency low while still making feed reads fast for the majority of users.

For very popular users, I would not fully fan out their posts to every follower, because that creates a massive write amplification problem. Instead, I would treat those accounts differently: their posts would be fetched on demand using a fan-out on read approach.

That means when a user opens their home feed, the system first loads the precomputed feed entries from normal accounts they follow, and then separately fetches recent posts from any very popular accounts they follow. After that, the feed service performs a merge step: it merges the precomputed feed items with the pulled posts from those very popular accounts, typically by timestamp or ranking score, and returns the top N items.

So the read path becomes:

  1. Load the user’s precomputed feed from feed storage

  2. Fetch recent posts from very popular followed accounts

  3. Merge the two streams

  4. Apply filtering/ranking/pagination

  5. Return the final feed

This hybrid design gives us the best of both worlds:

  • fast reads for normal users through precomputed feeds

  • controlled write amplification by avoiding massive fan-out for celebrity accounts

  • and a scalable way to handle users who either follow many accounts or are followed by very large audiences

A further optimization is to make the “very popular user” threshold dynamic rather than static, based on follower count, posting frequency, and system load.
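To make the merge step concrete, here is a rough sketch of the hybrid read path; feed_store, post_store, and social_graph are assumed service clients with illustrative method names, not a real API:

import heapq

def get_home_feed(feed_store, post_store, social_graph, user_id, count=20):
    # 1. Precomputed entries from normal accounts (already newest-first).
    pushed = feed_store.recent_entries(user_id, limit=count)
    # 2. On-demand pull from the very popular accounts this user follows.
    pulled = []
    for celeb_id in social_graph.celebrity_followees(user_id):
        pulled.extend(post_store.recent_posts(celeb_id, limit=count))
    # 3. Merge both streams by timestamp and keep the newest `count` items.
    merged = heapq.nlargest(count, pushed + pulled, key=lambda p: p["created_at"])
    # 4. Filtering (blocked users, deleted posts) and ranking would be applied here.
    return merged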

Scalability and Performance

Interviewer: "How will your design scale to millions of users and daily uploads? Where are the bottlenecks and how would you address them?"
Interviewer (Google): "If this system needs to handle 10 million requests per second, how would you scale it out?"
Interviewer (Amazon): "Talk about scaling this design. How do we handle high read and write throughput, and ensure high availability?"

Candidate: Scalability is addressed at multiple layers:

  • Stateless Services & Horizontal Scaling: The application servers (API Gateway, Auth, User Service, Post Service, etc.) will be stateless (they don't store user session data on local disk; any state is in databases or caches). This means we can run many instances behind load balancers. If traffic increases, we add more server instances (or auto-scale). A Load Balancer will distribute incoming requests to these instances. This handles scaling of compute easily.

  • Database Scaling: For relational databases (User DB, Post DB), we will employ replication and sharding. For example, each database can have one primary and multiple read replicas. The primary handles writes and replicates to secondaries for load-balanced reads. Many more reads (like profile fetches, feed fetches if hitting the DB) can be served by replicas. For write scaling beyond a single machine's capacity, we shard (partition) the data. A simple sharding scheme might be by user_id range or a hash of user_id, so that different groups of users’ data live on different DB servers. That way, queries and writes get distributed. The application logic (or a sharding middleware) would route requests to the correct shard.

  • NoSQL for specific scale points: We might introduce a NoSQL store for certain high-throughput data. For instance, the Feed storage could be handled by a NoSQL DB like Cassandra or DynamoDB which can scale horizontally easily and handle a high volume of writes (especially sequential writes for feed fan-out). We could also use a search engine like Elasticsearch if we later add searching of posts or tags.

  • Caching (Memory): Caching is critical for performance. We have a few places to use caches:

    • CDN for images – already discussed (caches images globally, offloading traffic by ~90%+ linkedin.com).

    • Application layer cache: Use an in-memory cache store like Redis or Memcached to cache frequent queries. For example, user profile data (username, follower count) can be cached, so repeated profile lookups don't always hit the database. Similarly, feeds as mentioned can be cached in Redis. Redis can also store sessions or token data if needed.

    • Database query caching: Many SQL databases have internal cache for frequent reads, but explicitly using an external cache gives more control in a distributed environment.

    We must also consider cache invalidation: e.g., if a user profile updates, we should invalidate that cache entry, and when a user posts a new photo, we need to update or invalidate cached feeds for followers so they eventually see the new post. (A small cache-aside sketch follows this list.)

  • Asynchronous Processing & Queueing: For expensive operations, use message queues and async workers. As mentioned, feed fan-out can be done via a queue (like Kafka, RabbitMQ). Another example: image processing – if we need to generate thumbnails or transcode images (or videos), we shouldn't do that inline on the upload request. Instead, we save the original and send a message to a processing service to create resized versions. This keeps the user-facing operations fast, and heavy lifting is done in the background.

  • Load Balancing and Failover: Each service group would be behind a load balancer. If an instance goes down, the LB stops sending traffic to it. We also design stateless services so that any instance can handle any request (session state can be passed via tokens, etc.). For DB, we have replicas – if a primary fails, we promote a replica. For object storage, it’s inherently redundant behind the scenes. For the CDN, they have multiple edge servers; if one is down, user requests automatically route to another.

  • Partitioning Content: If we grow immensely, we might partition by region as well. For example, have separate clusters for Americas, Europe, Asia, etc. with some federation. This reduces cross-world traffic and improves latency by keeping data closer to users. However, that adds complexity (especially if users from different regions interact), so it may be a later optimization.

  • Monitoring & Scaling: We would include monitoring (logs, metrics, alerts) to know when a component is overloaded. Then we scale out that component. For instance, if the Post DB writes become the bottleneck, we shard more aggressively or move some data (like comments maybe to a separate service/DB).

Interviewer: "What about consistency and reliability across these distributed components?"

Candidate: We would ensure atomicity for critical operations. For example, when a post is created, two things happen: storing the image and saving the post record. We should handle failures in between gracefully (e.g., if image upload succeeded but DB insert failed, we have an orphan image – we might periodically scan storage for unreferenced blobs or use a two-phase approach where we only finalize upload after DB success). In practice, wrapping the sequence in a distributed transaction is heavy, so simpler approaches are used: often the application will do one then the other and have retry or cleanup logic on failure.

For reliability: use acknowledgements and retries for queue processing, so if a feed update fails it can be retried from the queue. Use redundant infrastructure (multiple servers, multiple replicas, multi-data-center deployments if possible) to avoid single points of failure. Also, employ rate limiting and backpressure to prevent overload cascades (e.g., if a million users suddenly join and follow a celebrity, the system should degrade gracefully, perhaps by temporarily switching to the pull model for that scenario or by queueing the fan-out).

Interviewer (Amazon): "How would you optimize or reduce costs for this system?"

Candidate: Cost-wise, major costs come from storage and data transfer (especially images). Using CDNs not only improves speed but also saves bandwidth from origin (CDN edge caching can cut origin traffic massively, e.g. 90-95% reduction linkedin.com). Storing images in tiers (hot vs cold storage) can save cost – e.g., older images that are rarely accessed could be moved to cheaper storage (like Amazon Glacier or a cold blob store) after some time, as long as retrieval is slower (which is okay for old content). Also, we should match resources to demand – auto-scaling down during low traffic, etc. For databases, using efficient indexes and data models ensures we don't need oversized clusters unnecessarily.

Finally, security and privacy should be mentioned: all communications should use HTTPS/TLS (especially uploads/downloads to protect data in transit). Stored images can be encrypted at rest by the storage service. Access control: make sure only authorized users can, say, delete or view private content (if any privacy settings exist). We might implement token-based access for images if needed (e.g., signed URLs that expire, though for a social network typically images are public to followers).

(Wrap up with a quick recap of design choices.)

Conclusion

Candidate (Summary): In summary, we designed a scalable photo-sharing service with a microservices approach:

  • We store structured data (users, posts, follows) in relational databases (sharded, replicated) with proper indexing (B+ tree indexes for fast lookups blog.bytebytego.com).

  • We store the large photo files in distributed object storage and serve them via a CDN for global low-latency access geeksforgeeks.org.

  • We implemented a feed system using a combination of push and pull strategies to handle different scales of fan-out, caching feeds in memory for quick reads.

  • The system scales horizontally at each layer: web/app servers, databases, storage, using load balancing, caching, and partitioning to handle growth. We also considered using asynchronous processing (queues) for heavy tasks like distributing feed updates and processing media.

  • Throughout, we aimed for high availability and durability (replicating data, removing single points of failure) and acceptable consistency trade-offs (allowing eventual consistency in feed updates to achieve scalability).

This design should be able to serve a platform like "Instagram-lite" efficiently, and we can evolve each component (database optimizations, more microservices, advanced ranking, etc.) as needed as the user base and feature set grow.

----


metricool.com
Instagram Statistics in 2025 Every Marketer Should Know
The average US adult spends 33.1 minutes per day on Instagram. Users share 1.3 billion images on Instagram every day. Only 16% of US adults use Instagram for their news consumption. 99.9% of Instagram users are active on at least one other social media platform.
dev.to
Why Use Object Storage for Large Files in System Design? - DEV Community
7️⃣ Optimized for Large Files & Streaming
geeksforgeeks.org
Designing Content Delivery Network (CDN) | System Design - GeeksforGeeks
A Content Delivery Network (CDN) is a globally distributed network of servers designed to enhance the performance and availability of web content. CDNs reduce latency and accelerate the delivery of static assets, such as images, videos, and scripts, by storing copies on servers strategically positioned around the world. When a user requests content, the CDN automatically routes the request to the nearest server, minimizing the physical distance and, consequently, reducing load times.
blog.bytebytego.com
Database Indexing Strategies - ByteByteGo Newsletter
Indexes are typically stored on disk. They are associated with a table to speed up data retrieval. Keys made from one or more columns in the table make up the index, which, for most relational databases, are stored in a B+ tree structure. This structure allows the database to locate associated rows efficiently.
blog.bytebytego.com
Database Indexing Strategies - ByteByteGo Newsletter
For relational databases, indexes are often implemented using a B+ Tree, which is a variant of B-Tree.
reddit.com
Is it better to store images in database or filesystem? - Reddit
If you store images in a database, it will considerably increase the size of your database, and you might run into issues around database backup ...
linkedin.com
Instagram System Design: The Blueprint to Crack FAANG Interviews
Why use CDN?
linkedin.com
Instagram System Design: The Blueprint to Crack FAANG Interviews
2️⃣ Direct upload to blob store
dev.to
Why Use Object Storage for Large Files in System Design? - DEV Community
2️⃣ Durability & Availability
linkedin.com
Instagram System Design: The Blueprint to Crack FAANG Interviews
Hybrid approach (what Instagram uses)
linkedin.com
Instagram System Design: The Blueprint to Crack FAANG Interviews
* Upload services throttle large video uploads to protect processing pipeline. * Blobs are encrypted at rest + in transit (TLS). * Using CDN reduces

----

ChatGPT 5.1 Thinking

prompt

Could you take a look and see whether there are any missing details here that an interviewer might ask about — assuming this is for a very senior-level position?

ChatGPT

----

My overall take first:
Your answer is already solid and complete enough for a mid-to-senior system design interview. But for a higher-level role, interviewers usually will not stop at “list the components + mention common patterns.” They will keep drilling into:

What exactly are the trade-offs? What is the source of truth? What can be eventually consistent? How do you handle hot spots? How do you recover from failures? What product semantics do you guarantee? How does the system evolve?

The main characteristic of your current version is broad coverage that is not yet sharp enough.
At higher levels, interviewers care much more about your ability to identify the true core problems, define boundaries, explain trade-offs, and survive deep follow-up questions.


First, what your current version already does well

These parts are good:

  • Requirements clarification is complete

  • Separating image binaries from metadata is the right direction

  • You mentioned CDN / object storage / presigned URLs

  • You covered fanout-on-write vs fanout-on-read, and a hybrid model

  • You know you need cache, queues, and async processing

  • You know the database needs replication and sharding

  • You understand that not everything must be strongly consistent

These are all signs that you have good system design fundamentals.


But for a very senior role, here is what is still missing

1. It needs a much stronger design spine

Right now the answer reads a bit like “I listed the standard components.”
At a higher level, interviewers usually want to feel that you have identified a very clear core narrative, something like:

The core challenges in this system are not “how to upload an image,” but
high-fanout feed distribution, globally low-latency image delivery, asynchronous consistency control, and hot-user handling.

Then build the design around those 4 things.

In other words, you want the interviewer to feel that you know what is truly hard here, not that you are distributing equal attention across every subsystem.


2. The capacity estimation is too shallow, and it is not actually driving the design

You have some estimates, but you do not really use them to justify architectural choices. That matters a lot for senior interviews.

Interviewers may expect you to continue like this:

  • 100M DAU

  • Assume each user opens the feed 20 times/day

  • Each feed request returns 20 items

  • That gives roughly 2B feed reads/day

  • If each feed item’s metadata is 1KB, then what is the daily metadata read volume?

  • 10M uploads/day is important, but may not be the main throughput bottleneck

  • The real bottlenecks are likely:

    • feed fanout writes

    • feed read QPS

    • celebrity hot keys

    • social graph lookups

So the point is:
use numbers to prove where the bottleneck is, then explain why hybrid feed generation is the correct choice.
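Continuing that line of reasoning, a quick worked calculation with the numbers above (all inputs and the 4x peak factor are assumptions used only to illustrate the method):

# Illustrative calculation from the assumed numbers above (not measurements).
dau                = 100_000_000
feed_opens_per_day = 20
items_per_open     = 20
item_metadata_kb   = 1
uploads_per_day    = 10_000_000

feed_reads_per_day = dau * feed_opens_per_day               # 2,000,000,000 feed requests/day
feed_read_qps_avg  = feed_reads_per_day / 86_400             # ~23,000 requests/sec on average
feed_read_qps_peak = feed_read_qps_avg * 4                   # assume a ~4x diurnal peak

metadata_tb_per_day = feed_reads_per_day * items_per_open * item_metadata_kb / 1e9   # ~40 TB of metadata read/day
write_read_ratio    = uploads_per_day / feed_reads_per_day   # 0.005 -> the system is heavily read-dominated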


3. “Start with microservices” is not always the best senior-level framing

You opened with a microservices architecture. In interviews, that is not always a plus.

A more mature framing is usually:

Logically, I would separate the system into domains such as User, Social Graph, Media, Feed, and Engagement.
Deployment-wise, I would not necessarily start with independently deployed microservices. I could begin with a modular monolith or a small number of services, and split only when the scaling pattern or team boundary requires it.

That sounds much more senior because it shows you understand:

  • logical boundaries are not the same as deployment boundaries

  • microservices come with costs: cross-service calls, distributed transactions, deployment complexity, observability overhead

  • not every system should start with full microservice decomposition


4. The feed section mentions the standard patterns, but does not fully explain product semantics

This is one of the most common deep-dive areas.

An interviewer may ask:

If a user just followed someone, should we backfill old posts into the feed?

  • Backfill the last 20?

  • Or only show posts created after the follow?

  • Is backfill synchronous or asynchronous?

  • What if backfill fails?

After unfollow / block / private account changes, what happens?

  • If a post has already been fanned out into feed storage, how do you remove it?

  • Do you filter at read time, or asynchronously clean the materialized feed?

  • If a user deletes a post, how do you remove it from all follower feeds?

How do you paginate the feed?

  • Offset or cursor?

  • How do you prevent duplicates and gaps?

  • If ordering is time-based or rank-based, how does the cursor work?

Is the feed chronological or ranked?

  • If ranked, does the ranking service produce a candidate set or the final order?

  • Where do ads get inserted?

  • How do recall and ranking get separated?

These are exactly the kinds of feed questions that come up in senior+ interviews.


5. You are missing a clean “source of truth vs materialized view” framing

This is a very strong senior signal.

You should explicitly say:

  • Posts DB + Social Graph DB are the source of truth

  • The user’s home feed storage is only a materialized view / derived dataset

  • The feed is allowed to be stale

  • The feed can be lost and rebuilt

  • Rebuild options:

    • recompute from follow graph + recent posts

    • or replay from Kafka / an event stream

That framing is extremely valuable because once you say it, many later answers become cleaner:

  • asynchronous feed updates are acceptable

  • cache loss is not a disaster

  • feed storage can use NoSQL / Redis / Cassandra

  • repair and recovery paths are much clearer


6. The social graph modeling and partitioning are not deep enough

A senior interviewer may definitely go there.

You mentioned a Follow table, but that is only the start. They may ask:

  • You need efficient lookup for both followers and followings. How exactly do you model that?

  • Does a celebrity’s follower list become a hot shard?

  • If you shard by user_id hash, does querying all followers of one user still land on a single hot partition?

  • Do you need bucketing for follower edges by (followee_id, bucket_id)?

  • How do you support fast isFollowing(A, B) checks for privacy enforcement?

A stronger answer often includes:

  • social graph is a dedicated service / storage layer

  • separate read patterns matter:

    • followers(of user X)

    • following(of user X)

    • isFollowing(A, B)

  • you may maintain two adjacency lists

  • very large accounts need bucketing to avoid partition hot spots


7. The media pipeline is still incomplete

You correctly mentioned object storage + CDN + direct upload. That is good. But at a higher level, interviewers may continue with:

  • Do you generate thumbnails?

  • Multiple resolutions?

  • WebP / AVIF transcoding?

  • Strip EXIF metadata?

  • Fix image orientation?

  • Run malware scanning?

  • Run content moderation?

  • What happens if upload succeeds but post creation fails?

  • How do you guarantee idempotency?

  • How do you handle multipart upload for large files?

  • How do you leave room for future video support?

Your current design is still a bit “store and serve the file.” In a real system there is usually a media pipeline:

upload -> validation -> virus scan -> moderation -> thumbnail/transcode -> metadata extraction -> publish event

That is a big thing to add for a senior-level answer.


8. Idempotency is missing

This is a classic senior-level question.

For example:

Create post

If the client times out and retries twice, how do you prevent duplicate posts?

  • client-generated id

  • idempotency key

  • server-side deduplication

Fanout

If the same post event is consumed twice, what happens?

  • feed entry upsert

  • unique key (user_id, post_id)

Like

If the user taps like multiple times very quickly, what is the exact semantic?

  • toggle?

  • dedupe?

  • last-write-wins?

A strong senior candidate often proactively says:

For write APIs like create post, follow, and like, I would introduce idempotency keys because clients and gateways will retry on timeout.

That single sentence adds a lot of credibility.
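A minimal sketch of how an idempotency key could be enforced on the create-post API, using a Redis SET NX reservation (key naming, TTL, and the posts_db client are assumptions; a real implementation would also handle the "still in flight" case by waiting or returning a 409):

import redis

r = redis.Redis()
IDEMPOTENCY_TTL = 24 * 3600    # remember keys for a day (assumption)

def create_post(posts_db, user_id, caption, media_key, idempotency_key):
    # The client generates idempotency_key once and reuses it on timeouts/retries.
    reserved = r.set(f"idem:{user_id}:{idempotency_key}", "pending",
                     nx=True, ex=IDEMPOTENCY_TTL)
    if not reserved:
        # A previous attempt already ran (or is still running); return its result
        # instead of creating a duplicate post. If the value is still "pending",
        # a production system would wait or reject rather than return it.
        prior = r.get(f"idem:{user_id}:{idempotency_key}")
        return {"post_id": prior.decode(), "deduplicated": True}
    post_id = posts_db.insert_post(user_id, caption, media_key)   # assumed DB client
    r.set(f"idem:{user_id}:{idempotency_key}", post_id, ex=IDEMPOTENCY_TTL)
    return {"post_id": post_id, "deduplicated": False}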


9. Multi-region / disaster recovery / locality are not developed enough

For a very senior role, interviewers often push on this boundary.

They may ask:

  • A user posts in the US and followers in Europe open the feed — where is the data?

  • Is this single-region primary or multi-active?

  • Are images globally replicated or only served globally through CDN?

  • Is metadata replicated cross-region?

  • What happens during a regional outage?

  • What are your RPO and RTO targets?

You do not need to propose a globally multi-master system. In fact, a better senior answer is often:

I would start with a single write-primary region plus cross-region disaster recovery replication.
Images are globally served through CDN.
Read paths can be localized over time, but I would keep writes single-region initially to avoid conflict complexity.
Multi-active writes only become worth it if the product truly needs them.

That is a very mature position.


10. Consistency semantics are not explicit enough

A senior interviewer may ask for very precise answers:

  • After a user posts, do they see it immediately?

  • How long until followers see it?

  • If the user deletes the post, how long until it disappears from feeds?

  • Must like count and comment count be strongly consistent?

  • Must follower count be exact in real time?

A stronger answer would say:

  • after post creation succeeds, the author should immediately see it on their own profile

  • follower home feeds can be eventually consistent within seconds

  • like/comment counters may be near-real-time aggregates

  • delete and block use high-priority asynchronous invalidation, but read-time authorization checks remain the final guardrail

That is much more senior because it ties system behavior to product semantics.


11. Hotspot handling is mentioned, but not fully grounded

You mentioned celebrities, but you did not fully explain how you would operationally handle them.

Possible follow-up questions:

  • A user with 100M followers posts a photo. How do you avoid fanout explosion?

  • A single viral post becomes a read hot key. What do you do?

  • A profile page suddenly becomes extremely hot. How do you handle that?

Things you can add:

  • celebrity accounts use read-time merge instead of full push

  • feed fanout uses queue-based workers with throttling

  • hot cache keys may be replicated

  • request coalescing / single-flight to prevent thundering herds (sketched below)

  • more aggressive CDN and metadata caching for trending content
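
The single-flight idea is small enough to sketch. This is a simplified in-process version; a production one would also handle loader errors, timeouts, and coordination across processes:

Python
import threading

class SingleFlightCache:
    """Concurrent misses on the same hot key share one backend load."""
    def __init__(self, loader):
        self.loader = loader            # function: key -> value (e.g. a DB read)
        self.cache = {}
        self.inflight = {}              # key -> Event that waiters block on
        self.lock = threading.Lock()

    def get(self, key):
        with self.lock:
            if key in self.cache:
                return self.cache[key]
            event = self.inflight.get(key)
            leader = event is None
            if leader:
                event = threading.Event()
                self.inflight[key] = event
        if leader:
            value = self.loader(key)    # exactly one backend call per burst of misses
            with self.lock:
                self.cache[key] = value
                self.inflight.pop(key, None)
            event.set()
            return value
        event.wait()                    # followers wait for the leader's result
        with self.lock:
            return self.cache[key]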


12. Delete, privacy, and compliance are missing

This is another favorite senior-level direction.

Examples:

  • user deletes a post

  • user deletes their account

  • GDPR / right to be forgotten

  • account changes from public to private

  • block user

  • report / moderation takedown

  • DMCA / legal removals

If these never come up in your answer, it can make the design feel like it only supports the happy path.


13. Observability and operations are not fully developed

A higher-level role is not just about drawing a system. It is also about operating it.

You can proactively mention key metrics:

  • upload success rate

  • post-to-visible latency

  • feed p95 / p99 latency

  • cache hit ratio

  • fanout queue lag

  • moderation pipeline latency

  • object storage error rate

  • CDN hit ratio

  • shard imbalance / hot partition rate

And for debugging:

  • distributed tracing

  • structured logging

  • per-post lifecycle tracing

  • dead-letter queues for failed feed fanout events

That makes you sound much more like someone who has actually run large systems.


Questions the interviewer is very likely to ask next

I would strongly suggest having at least a 2–3 sentence answer ready for each of these.

Feed-related

  1. Why hybrid, instead of pure push or pure pull?

  2. How do you define the celebrity threshold? Static or dynamic?

  3. Do you backfill posts when someone follows a new account?

  4. How do unfollow / block / delete remove content from feeds?

  5. How do you paginate feeds without duplicates or gaps?

  6. How do you rebuild a feed after cache or feed-store loss?

  7. What is the source of truth? What if the feed store is corrupted?

Write path

  1. What if upload succeeds but DB write fails?

  2. How do you make post creation idempotent?

  3. How do you handle duplicate consumption of fanout events?

  4. If Kafka or the queue is backlogged, how long before users see new posts?

Data model

  1. How do you model the follow graph? Why not use a graph database?

  2. How do you shard posts — by post_id or user_id?

  3. How do you store likes/comments counts — realtime or async aggregated?

  4. How do secondary indexes behave at this scale?

Globalization / availability

  1. How would you design multi-region support?

  2. What is the regional failover plan?

  3. What if CDN origin fetch fails?

  4. Do images need multi-region replication?

Product semantics / safety

  1. How do private accounts work?

  2. After block, what happens to old feed entries?

  3. Where is moderation done?

  4. How do you mitigate spam likes / spam comments / abuse?

Cost / evolution

  1. What is the biggest cost driver, and how do you reduce it?

  2. How do you evolve from 10M DAU to 100M DAU?

  3. If you add short videos/Reels, what parts of the design change first?


There are also a few places in your answer I would weaken or rewrite

1. Do not spend too much time on B+ tree internals

Unless the interviewer explicitly asks “how are indexes implemented under the hood?”, this is usually not the most valuable use of time in the main design interview.

For a senior role, what matters more is:

  • what indexes you create

  • why those indexes fit the access patterns

  • what the write amplification and storage trade-offs are

  • whether they support point lookups or range scans efficiently

That matters more than going deep on B+ tree mechanics.


2. Do not emphasize “microservices” too early

A stronger phrasing is:

I would first separate the system logically into Auth, User, Social Graph, Post Metadata, Media, Feed, and Engagement.
Deployment-wise, I can start with a smaller number of services or a modular monolith, and split further only when scaling patterns or organizational boundaries justify it.

That sounds much more mature than opening with “I propose a microservices architecture.”


3. External stats should not be overused or relied on too heavily

In interviews, saying something like “Instagram has 1.3 billion images per day” can be risky if the number is challenged.

A safer approach is:

I will assume 100M DAU, 10M uploads/day, and peak QPS roughly 5–10x the average. Then I will use those assumptions consistently to drive the design.

In system design interviews, self-consistent estimation is usually more valuable than web-sourced numbers.


4. Be much more explicit about “must-have now” vs “later evolution”

Senior interviewers like staged thinking.

You can say:

MVP must-have:

  • upload

  • profile

  • follow graph

  • chronological feed

  • object storage + CDN

  • hybrid fanout

Phase 2 / later:

  • ranking

  • more aggressive caching

  • advanced moderation

  • multi-region active-active

  • ads insertion

  • ML recommendation

That makes it sound like you are not just stacking concepts — you are making roadmap decisions.


If I were the interviewer, these are the 8 things I would most want you to add

If you add these, your answer becomes much more senior-looking.

1. Explicit source of truth

Posts + Follow Graph are the truth; Feed is a materialized view and can be rebuilt.

2. Explicit product semantics

  • author sees own post immediately

  • followers see it within seconds

  • delete/block have read-time enforcement plus async cleanup

3. Follow / unfollow / backfill behavior

This is one of the classic feed-design follow-up topics.

4. Idempotency

Create post / follow / like / fanout all need idempotent behavior.

5. Hot user and hot content strategy

Celebrity fanout, hot cache keys, degradation strategy.

6. Media pipeline

Thumbnails, multiple sizes, moderation, EXIF stripping, format conversion.

7. Multi-region position

Start with single write-primary plus disaster recovery, not global multi-master.

8. Observability

Post-to-visible latency, queue lag, feed p99, CDN hit ratio.


A more senior-looking answer structure

Next time, I would recommend answering in this order. It will sound much stronger.

1. Define the scope

“For this interview, I will focus on photo upload, follow graph, and home feed. I will start with chronological ranking and not go deep into recommendation ML.”

2. Give scale and identify bottlenecks

“Uploads matter, but the hardest problem here is feed distribution and read amplification, especially with celebrity accounts.”

3. Define the core data model

  • User

  • FollowEdge

  • PostMetadata

  • FeedEntry as a materialized view

  • Like / Comment can be deferred

4. Walk through the two critical paths

  • write path: upload + publish post

  • read path: fetch home feed

5. Explain feed trade-offs

push vs pull vs hybrid, and why hybrid wins

6. Explain consistency and source of truth

especially what is eventually consistent and what must be immediate

7. Explain failures, hotspots, and idempotency

this is where senior candidates separate themselves

8. End with evolution

chronological to ranked feed
single region to geo-expansion
photo to video


A stronger summary paragraph you can use

You can upgrade your conclusion to something like this:

The core problem in this system is not simply storing images in object storage, but coordinating posting, social graph lookups, feed distribution, hot-user traffic, and global low-latency reads.
I would treat Posts and Follow Graph as the source of truth, and Home Feed as a rebuildable materialized view.
For ordinary users, I would use fanout-on-write; for celebrities, I would use read-time merge.
Images would go through object storage plus CDN, while metadata and graph data would be partitioned separately.
Then I would rely on queues, caches, idempotency, permission checks, and observability to keep the system scalable, recoverable, and operable at high throughput.

That sounds much more like a Staff/Principal-level answer.


My blunt final assessment

The issue is not that you forgot a few components.
The issue is that you have not yet pushed deeply enough into what is actually hard about this system.

A very senior interviewer usually will not penalize you for forgetting to say “Redis.”
But they absolutely may penalize you if you do not clearly explain:

  • source of truth

  • product semantics

  • failure handling

  • hot-user strategy

  • backfill / delete / block behavior

  • rebuild strategy

  • how the design evolves

That is where the real differentiation happens.


Additional things I would add beyond my previous reply

Since you asked me to also check whether anything else should be added, here are several more senior-level dimensions that are worth including on top of everything above.

14. Ranking pipeline separation: retrieval vs ranking vs filtering

If the interviewer pushes beyond chronological feed, a strong senior answer should break feed serving into stages:

  • candidate retrieval: fetch recent posts from precomputed feed, followed users, or hot sources

  • filtering: privacy, block, delete, muted users, moderation status

  • ranking: chronological or ML score

  • blending: ads, recommended posts, stories, etc.

Even if you choose chronological for the interview, it is strong to say:

If we later move to an algorithmic feed, I would separate candidate generation from ranking, because the scaling, latency, and experimentation requirements are different.

That is a very good senior signal.
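
A sketch of what that staged separation looks like in code. Every collaborator here (feed_store, policy, ranker, blender) is a hypothetical interface; the point is only that each stage can scale, evolve, and be experimented on independently:

Python
def serve_feed(viewer_id, feed_store, graph, policy, ranker, blender, limit=20):
    # 1. candidate retrieval: precomputed feed, followed users, hot sources
    candidates = feed_store.recent_candidates(viewer_id, max_items=500)

    # 2. filtering: privacy, blocks, deletes, mutes, moderation status
    visible = [c for c in candidates if policy.viewer_may_see(viewer_id, c)]

    # 3. ranking: chronological today, an ML scorer later behind the same interface
    ranked = ranker.rank(viewer_id, visible)

    # 4. blending: ads, recommended posts, stories trays
    return blender.blend(viewer_id, ranked)[:limit]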


15. Read path budget and latency decomposition

Senior interviewers often like candidates who think in latency budgets.

For example, for a feed request you could say:

  • API Gateway: 10–20 ms

  • Feed cache/store lookup: 20–40 ms

  • Metadata hydration: 30–50 ms

  • ranking/filtering: 20–40 ms

  • total server-side p95 target: under ~150 ms

  • image bytes are served separately via CDN

That helps show that you understand feed metadata latency and image delivery latency are different problems.


16. Hydration strategy for feed items

This is another subtle but good addition.

A feed request often returns:

  • post IDs

  • author basic info

  • caption

  • media URLs

  • like/comment counters

  • whether viewer liked the post

You do not want a single feed request to fan out into separate downstream calls for every one of its items.

So you should mention:

  • batch hydration APIs

  • denormalized feed card metadata where appropriate

  • partial precomputation for frequently displayed fields

  • batched cache lookups

A senior answer might say:

I would avoid N+1 fanout on the read path by batch-fetching post metadata and denormalizing some frequently displayed fields into feed-serving storage.
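
A sketch of batched hydration, where each entity type costs one batched lookup rather than one call per feed item. All store clients and method names here are hypothetical:

Python
def hydrate_feed(post_ids, post_store, user_store, counter_store, viewer_id):
    posts = post_store.multi_get(post_ids)                        # 1 batched call
    authors = user_store.multi_get({p["author_id"] for p in posts.values()})
    counters = counter_store.multi_get(post_ids)                  # likes / comments
    liked = counter_store.multi_get_viewer_likes(viewer_id, post_ids)

    cards = []
    for pid in post_ids:                                          # preserve feed order
        post = posts.get(pid)
        if post is None:
            continue                                              # deleted or filtered out
        cards.append({
            "post_id": pid,
            "author": authors[post["author_id"]],
            "caption": post["caption"],
            "media_urls": post["media_urls"],
            "like_count": counters.get(pid, {}).get("likes", 0),
            "comment_count": counters.get(pid, {}).get("comments", 0),
            "viewer_has_liked": pid in liked,
        })
    return cards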


17. Counter architecture

Likes, comments, follower counts, and view counts are deceptively tricky.

Possible follow-up:

  • Do counts need to be exact in real time?

  • How do you avoid write contention on hot counters?

  • Do you store raw events and aggregate asynchronously?

A strong answer:

  • raw like/comment events go to durable storage

  • user-visible counters are eventually consistent

  • counters may be updated via asynchronous aggregation

  • for hot posts, use sharded counters or event aggregation rather than row-level contention

This is a common senior-level extension.
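
A sketch of the sharded-counter idea: writes land on one of N sub-rows at random, reads sum the shards. The `db` client and its increment/get methods are hypothetical:

Python
import random

NUM_SHARDS = 32

def increment_like_count(db, post_id):
    shard = random.randrange(NUM_SHARDS)
    # an atomic increment on the sub-row avoids contention on one hot row
    db.increment(("like_count", post_id, shard), 1)

def read_like_count(db, post_id):
    # summing 32 small rows is cheap, and the result is usually cached anyway
    return sum(db.get(("like_count", post_id, s)) or 0 for s in range(NUM_SHARDS))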


18. Timeline repair and reprocessing

Once you say feed is a materialized view, the natural next step is to explain repair paths.

For example:

  • if a fanout worker fails and misses some followers

  • if a bug corrupts feed generation logic

  • if privacy rules change retroactively

Then what?

A strong answer:

  • keep append-only post events

  • version the feed generation logic

  • support offline rebuild jobs per user range / shard

  • run repair backfills asynchronously

This is very senior-sounding because it shows you think beyond happy path correctness.


19. Privacy and authorization as a first-class read concern

This deserves even more emphasis.

For example:

  • public vs private accounts

  • follower-only visibility

  • blocked users

  • muted users

  • age restrictions / regional legal restrictions

  • moderation-removed posts

The important senior point is:

Even if feed data is precomputed, the read path must still apply permission filtering, because precomputed data may be stale or invalid after privacy changes.

That is a very strong and practical point.


20. API semantics and contract design

For a very senior role, sometimes interviewers want to see that you can think at API-contract level too.

Examples:

  • POST /media/upload-url

  • POST /posts with idempotency key

  • GET /feed?cursor=...

  • POST /follow/{user_id}

  • DELETE /posts/{post_id}

Useful details:

  • cursor-based pagination rather than offset

  • idempotency key on write APIs

  • versioned APIs

  • explicit error semantics for partial media pipeline failure

You do not need to overdo this, but showing 2–3 concrete APIs can help.
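
For the cursor detail, here is a small sketch of an opaque cursor that encodes (timestamp, post_id), which stays stable under new inserts in a way numeric offsets do not. Field names are illustrative:

Python
import base64
import json

def encode_cursor(last_item):
    raw = json.dumps({"ts": last_item["created_at"], "id": last_item["post_id"]})
    return base64.urlsafe_b64encode(raw.encode()).decode()

def decode_cursor(cursor):
    data = json.loads(base64.urlsafe_b64decode(cursor.encode()).decode())
    return data["ts"], data["id"]

# GET /feed?cursor=...  ->  decode, fetch items strictly older than (ts, id),
# return the page plus encode_cursor(page[-1]) as the next cursor.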


21. Abuse prevention / platform integrity

At high scale, every social network has abuse problems.

Possible areas:

  • spam accounts

  • mass follow/unfollow abuse

  • fake likes/comments

  • bot uploads

  • malicious image payloads

Even a brief mention helps:

I would include rate limiting, abuse detection, and moderation signals as cross-cutting concerns, because at large scale platform integrity becomes as important as serving performance.

That sounds much more realistic.


22. Storage lifecycle and retention

You briefly touched on cold storage, but you can sharpen it.

For example:

  • original image retained permanently?

  • derived thumbnails re-creatable?

  • deleted content hard-deleted immediately or after retention window?

  • logs retained for audit?

  • event streams retained long enough for feed rebuild?

This ties directly to:

  • cost

  • compliance

  • recovery


23. Schema evolution and backward compatibility

This is subtle but often appreciated for very senior candidates.

As the product evolves:

  • posts may gain new metadata

  • media formats change

  • privacy rules expand

  • ranking features add fields

A more senior perspective:

  • use backward-compatible event schemas

  • version feed item schemas

  • separate immutable post identity from mutable presentation metadata


24. Explicitly call out what not to optimize too early

This is also a senior signal.

You can say something like:

I would not start with global active-active writes, graph databases, or a full ML ranking stack on day one. The simplest design that solves feed scale, media delivery, and social graph correctness is enough initially, and I would only add complexity when the measured bottleneck justifies it.

That sounds very mature.


If you want your answer to sound truly high-level, here is the mindset shift

Instead of sounding like:

Here are the components I know should exist.

Try to sound like:

I know which parts are the source of truth, which parts are derived, which latencies matter, which consistency guarantees the product really needs, where the hot spots will emerge, and what failure/recovery paths I need before this can run safely at scale.

That is the difference between a competent answer and a very senior answer.


Final one-line takeaway

Your answer is already good, but to make it strong for a very senior role, you need to add much more depth around:

source of truth, feed semantics, hotspots, idempotency, repairability, privacy enforcement, observability, and staged evolution.

----

Gemini, Deep research, Gemini 3 Pro Thinking

----

prompt:

----

I am preparing for software development system design interviews in the United States, aiming at top big tech companies, mid-sized companies, and popular startups. I want to prepare for designing an Instagram-like system. I now need materials to prepare for these interviews. This is for principal, senior principal, and even higher-level roles. I know that in the actual interview there will only be about one hour or less, but I want my preparation to be comprehensive. Please prepare a very large scenario for me (including all related questions and answers, which might take many hours, dozens of hours, or even more for me to study). For example, starting from the opening, the overall system architecture, all aspects before and after the system design, and then for each knowledge point the possible questions that might be asked, including potentially very deep questions. If your context length has no limit, then output as much information as possible. If there is a context length limit, then you can skip a few questions here (such as in-depth questions about the details of different types of databases, in-depth questions about load balancer details; but anything that is essential for Instagram still needs to be included. If something is skipped, mark it with an uppercase TODO. If there are other large modules that have to be omitted, then list the major topics of the omitted parts and also mark them with an uppercase TODO). If there is no length limit, then include all of these questions. (If there are even deeper questions that could be asked, mark them with a lowercase todo.)

You can ask me more questions and then produce the output, or you can just start directly according to your own flow.

Output in English.

I need high-level design and module breakdown, and you can describe the architecture diagram in text, for example using multiple lines like
API Gateway-> Service A, Service A<---> Service B, Service B-> DB A, and so on.

Some interviewers may ask about detailed flows; wherever they are likely to dig into details, I need pseudocode.

I need questions in different styles according to the styles of interviewers at different companies (but label this on the questions. For example, for a given knowledge point there may be multiple ways to ask it; after each wording you can annotate the company name. If basically all companies would ask essentially the same question, then there is no need to label it).

As for whether you prefer the content in the form of structured notes or step-by-step explanation → I still prefer it to be presented in an interview scenario style. For example, first propose the requirements, then show various possible opening styles from interviewers, and then they might drill down into image upload and all related design knowledge points. Due to time limits or interviewer style, they might only ask one or two questions, but you need to output all the possible branches. And for each deep dive, if multiple stages involve similar questions—for example, storing user information indexes and storing post information indexes both require using a database—raise the related questions and deep-dive questions. But for common questions such as which tree structure a DB uses to implement indexes, only bring it up the first time; when it appears again later, just mention that it was already discussed above.

----

Revising Scope Definition

I'm now zeroing in on defining the scope more concretely, specifically around system design interview preparation for principal and above roles. I'm focusing on US-based tech companies across various sizes.

Clarifying Requirements Further

I'm drilling down on the nuances. The need for exhaustive coverage, even for a brief interview, is becoming clearer. I'm prioritizing the US-based tech landscape and incorporating varying interviewer styles by company size. The structure will be dialogue-driven with staged progression and architecture diagrams with pseudocode when suitable. I'm planning for constraints by using the 'TODO' & 'todo' notations for necessary omissions and deep dives. Common themes will be covered once and referenced later.

Refining Design Principles

I'm now prioritizing the design's trade-offs, particularly emphasizing scalability, consistency, and availability. The goal is to address the specific needs of principal-level roles by focusing on a complex distributed systems approach. I'm focusing on defining the architectural choices, and the associated business considerations that differentiate Principal level designs.

Outlining the Structure

The Instagram design interview structure is now taking shape, moving from broad strokes to more granular detail. I'm focusing on the dialogue-based format, mapping out the stages from initial scope definition through deep dives. The emphasis is on simulating the Principal level perspective, addressing both business considerations and deep technical intuition. The scenario will encompass write path, data modeling, read path and system design at extreme scale.

Developing Granular Details

I'm now structuring the deep dives based on the planned scope and structure. I've broken it down to make sure I cover the write path (upload handling), data modeling, and read path (feed generation) more precisely. I'm focusing on those Principal level challenges. I've started adding 'TODO' and 'todo' tags to guide potential deep dives based on interviewer styles and company types.

Defining Architectural Scope

I'm now focusing on a more granular approach to data modeling and storage for the write path, specifically for managing user data, social graphs, and feed metadata. My plan emphasizes scalability challenges by assessing the pros and cons of relational versus NoSQL databases and sharding techniques. 'TODO' tags are placed.

Analyzing Instagram's Scale

I'm now diving into the specifics of scaling Instagram, considering the challenges of massive user growth and traffic. The focus is on the impact of a high read-to-write ratio and the complexities of handling extreme concurrency. I'm prioritizing efficient data models and resilient distributed systems for optimal performance.

----

This is an exciting and challenging undertaking. Preparing for a Principal/Senior Principal Engineer role at top-tier US companies (FANG/MAANG, Uber, Airbnb, etc.), pre-IPO startups, or major established tech firms requires moving far beyond standard system design templates.

Interviewers at this level are not just looking for a working solution; they are looking for deep technical intuition, the ability to foresee multi-year architectural evolution, business strategic alignment, and a mastery of complex trade-offs at massive scale. They want to see how you handle failure modes that only appear at "planet scale."

Below is a comprehensive, "super-sized" interview simulation designed for Principal-level preparation for designing Instagram. It is structured as an ongoing dialogue, broken down by phases, with deep dives, variations in interviewer styles, and necessary technical artifacts.


System Design Interview Simulation: Designing Instagram (Principal Level Focus)

Setting the Stage:

  • Candidate Role: Principal/Senior Principal Software Engineer.

  • Interviewer Persona: Varies (VPs of Engineering, Distinguished Engineers, Senior Principal Architects).

  • Time Constraint: In reality: 45-60 mins. In this simulation: Unlimited deep-dive preparation covering many possible paths.


Phase 1: The Opening & Requirements Definition (Scoping)

Interviewer (VP of Engineering - Top Tier Tech Company Style): "Welcome. Let's get right to it. We want to build a new photo-sharing platform to compete with existing market leaders. Let's call it 'InstaClone'. We need it to scale globally and fast. Walk me through how you'd design it."

Candidate (You - Principal Mindset): (Thought Process: At this level, don't just jump into drawing boxes. A Principal Engineer first defines the business goals and the scale of the ambition. I need to scope this down to a manageable session but ensure the architecture won't collapse under success. I need to drive the conversation.)

"Sure. Before we dive into the architecture, I need to understand the scope and constraints to ensure we are solving the right problems. 'Competing with market leaders' implies massive scale. Are we designing for a Day 1 MVP launch, or are we designing for the target state 3-5 years from now where we have achieved product-market fit?"

Interviewer: "Let's design for the target state. Assume we have achieved exponential growth."

Candidate: "Understood. That fundamentally changes the design choices. We will focus on high availability, extremely low latency for media consumption, and eventual consistency for social graph updates to prioritize user experience over strict data consistency in non-critical areas.

Here are the core features I propose we focus on for this session:

  1. User Management & Graph: User profiles and the following/followers relationships.

  2. Content Creation (Write Path): Uploading photos (let's park video for now to simplify, but design the storage so it's extensible), and storing metadata (captions, location).

  3. Content Consumption (Read Path): Generating a personalized 'Home Feed' of posts from people a user follows. This is likely the hardest challenge.

  4. Discovery/Profile View: Viewing a specific user's grid of past posts.

Is this scope acceptable, or would you like to prioritize features like Stories, Direct Messaging, or Reels instead?"

Interviewer (Hot Startup CTO Style): "That scope is fine. But speed is everything for user acquisition. How fast can users see a photo after it's uploaded? And how many users are we talking about?"

Candidate: "Latency is a critical non-functional requirement.

  • Upload Latency: We should aim for sub-second perceived latency for the uploader (optimistic UI update), while the actual background processing might take a few seconds.

  • Read Latency: Feed rendering should be under 200ms globally.

Let's align on the scale constraints (Back-of-the-envelope math to justify architectural choices):

  • DAU (Daily Active Users): Let's assume 500 Million globally.

  • Read/Write Ratio: Instagram is heavily read-intensive. Let's assume a 100:1 read-to-write ratio.

  • Write QPS (New Posts): If 10% of DAUs post once a day -> 50M posts/day.

    • Average write QPS = ~600 uploads/second.

    • Peak write QPS (events, holidays) could be 5x-10x that -> ~3K to 6K QPS.

  • Read QPS (Feed Views): If users view their feed 10 times a day -> 5 Billion feed views/day.

    • Average read QPS = ~60K QPS.

    • Peak read QPS -> ~300K+ QPS.

  • Storage Constraints: 50M photos/day * avg 2MB/photo (compressed) = 100 TB/day of new media generation.

    • Key Insight: Storage capacity and bandwidth costs will be our biggest infrastructure challenge over time, not just compute.

Summary: We need a system optimized for massive read throughput, high availability, and enormous blob storage growth."


Phase 2: High-Level Architecture & Foundational Choices

Interviewer (Senior Principal Architect - Any Company): "Okay, the scale is clear. Sketch out the 30,000-foot view. How do you organize this system logically? Monolith or microservices?"

Candidate: "Given the 500M DAU scale and the distinct behaviors of uploading media versus consuming feeds, a monolithic architecture is a non-starter. It would be a single point of failure and impossible to scale teams independently.

We need a microservices-based architecture. I would effectively adopt a CQRS (Command Query Responsibility Segregation) pattern. The write path (uploading) has very different characteristics (high latency, bandwidth-intensive, async) from the read path (low latency, highly cacheable, complex aggregation logic).

Here is the high-level block diagram:

Plaintext
                                         [Global CDN (e.g., Cloudfront/Akamai)]
                                                        ^ (Serves images/static content directly)
                                                        |
[Mobile Client] <-----> [Global Load Balancer (L7)] <-> [API Gateway]
                                                              |
        +-----------------------------------------------------+----------------------------------+
        | (Write Path Route)                                  | (Read Path Route)                |
        v                                                     v                                  v
[Media Upload Service]                               [Feed Generation Service]           [User/Graph Service]
(Handles metadata write, orchestrates flow)          (Complex logic, aggregation)        (High connectivity data)
        |                                                     |                                  |
        | (Metadata only)                                     | (Query post IDs)                 | (Query profiles/follows)
        v                                                     v                                  v
[Post Metadata DB (NoSQL Cluster)]                   [Feed Cache (Redis Cluster)]        [User DB (SQL Sharded)]
        |
        +-----> [Media Processing Message Queue (e.g., Kafka/SQS)]
                      |
                      v
              [Image Process Workers (resize, filter, safety check)]
                      |
                      v
              [Object Storage (e.g., S3/GCS)] <---- (Origin for CDN)

Interviewer (Top Tech - Infrastructure Focus): "I see you have an API Gateway. What protocol are the clients using to talk to it? REST, GraphQL, gRPC?"

Candidate: "This is a crucial decision affecting client performance and developer velocity.

For a complex, graph-like consumer product like Instagram, I strongly lean towards GraphQL.

  • Why GraphQL overcomes REST limitations here:

    • Over-fetching/Under-fetching: On a mobile device with flaky networks, bandwidth is precious. A REST endpoint like GET /posts/{id} might return huge amounts of data (comments, likes, user info). If the feed view only needs the photo URL and username, REST wastes bandwidth. GraphQL allows the client to specify exactly what fields it needs.

    • Aggregation: To render a feed, we need data from the Feed Service, User Service (for profiles), and Metadata Service (for captions). With REST, the mobile client might have to make 3 separate calls over high-latency mobile networks. A GraphQL gateway can act as an orchestration layer, stitching these disparate backend responses into a single JSON response to the client in one round-trip.

  • Alternative - gRPC: I would use gRPC for internal service-to-service communication (e.g., Feed Service talking to Metadata Service) because of its binary serialization efficiency (Protobuf) and HTTP/2 multiplexing. We might consider gRPC for the mobile client later if performance demands it, but GraphQL offers better flexibility for rapid product iteration initially."


Phase 3: Deep Dive - The Write Path (Media Upload)

Interviewer (Startup Style - Practical Implementation Focus): "Let's zoom in. Walk me through the exact flow when a user on a shaky 4G connection taps 'Post' on a 10MB high-res photo. How do we ensure it doesn't time out and lock up their app?"

Candidate: "A naive implementation where the client sends the 10MB binary file directly in the body of a POST request to our API server is a massive anti-pattern at this scale.

  1. It ties up expensive API server threads waiting on slow mobile I/O.

  2. It complicates retry logic on the client.

We must separate metadata upload from actual binary upload. We should use a Presigned URL (or Secure Upload URL) pattern to upload directly to Object Storage (S3).

The Robust Upload Flow:

  1. Initiate Request: The mobile client sends a lightweight POST request to the Media Upload Service with metadata only: {"user_id": "123", "file_type": "jpg", "expected_size": 10485760, "caption": "..."}.

  2. Generate Secure Link: The Media Upload Service authenticates the user, validates the metadata, and requests a time-limited (e.g., 5 minute expiry), unique Presigned PUT URL from S3. It also generates a temporary post_id and stores a draft entry in the Post Metadata DB with status PENDING_UPLOAD.

  3. Return URL to Client: The service returns this secure S3 URL and the post_id to the mobile client.

  4. Client Direct Upload: The mobile client now performs a PUT request directly to S3 using that URL, uploading the 10MB binary. Our web servers are completely bypassed for the heavy lifting.

  5. Completion Trigger:

    • Option A (Client confirms): Client calls POST /media/{id}/complete upon success. (Less reliable, client might die before confirmation).

    • Option B (Infrastructure event - Preferred): We configure S3 to trigger an event notification (via AWS SNS/SQS) whenever a new object lands in the 'raw-uploads' bucket.

  6. Async Processing: The S3 event pushes a message into the Media Processing Message Queue (Kafka).

  7. Worker Execution: Image Process Workers pull messages from Kafka. They download the raw image from S3, resize it into standard resolutions (thumbnail, low-res, high-res), apply compression, perhaps run safety checks (NSFW detection models), upload processed versions back to a public S3 bucket, and finally update the status in Post Metadata DB to ACTIVE.

Interviewer (Top Tech - Mobile/Client Focus): "Can you provide pseudocode for how the client handles Step 4 to ensure robustness against network failure?"

Candidate: "Certainly. The client needs smart retry logic with exponential backoff.

Python
# Client-side pseudo-code for robust upload (Python-ish format).
# Assumes an `api_client` HTTP wrapper, a `local_feed_cache`, and custom
# NonRetriableUploadError / UploadFailedException exception types exist.
import random
import time

import requests

def upload_photo_flow(photo_data, user_id, caption):
    # Step 1: Get secure upload URL and metadata
    init_payload = {
        "user_id": user_id,
        "caption": caption,
        "size": len(photo_data),
        "type": "image/jpeg"
    }
    # This is a quick, low-latency call to our API Gateway
    response = api_client.post("/media/initiate", data=init_payload)
    presigned_url = response.s3_url
    post_id = response.post_id

    # Step 4: Direct upload to Object Storage with retry logic
    max_retries = 5
    base_delay = 1.0 # seconds

    for attempt in range(max_retries):
        try:
            print(f"Attempting upload {attempt + 1} directly to S3...")
            # Set a reasonable timeout for the upload itself
            s3_response = requests.put(presigned_url, data=photo_data, timeout=60)

            if s3_response.status_code == 200:
                print("Upload successful!")
                # Optimistically show the post in the user's local feed
                local_feed_cache.add_temporary_post(post_id, photo_data, caption)
                return True
            else:
                # Handle non-retriable S3 errors (e.g., 403 Forbidden if URL expired)
                raise NonRetriableUploadError(s3_response.text)

        except (requests.exceptions.ConnectionError, requests.exceptions.Timeout) as e:
            print(f"Network error: {e}")
            if attempt == max_retries - 1:
                 print("Max retries reached. Marking upload failed in UI.")
                 raise UploadFailedException()

            # Exponential Backoff with Jitter to prevent thundering herd on S3
            sleep_time = (base_delay * (2 ** attempt)) + random.uniform(0, 1)
            print(f"Sleeping for {sleep_time:.2f}s before retry...")
            time.sleep(sleep_time)

Interviewer (Principal Level - Failure Modes & Consistency): "Okay, good client logic. But look at the backend. What if the user successfully uploads to S3, but the SQS message to trigger processing gets lost, or the worker crashes mid-process? The user sees 'Posting...' forever. How do you handle this distributed transaction failure?"

Candidate: "This is a classic problem where we need to ensure the metadata state matches the object storage state. We cannot rely solely on the happy path event trigger. We need reconciliation.

  1. Dead Letter Queue (DLQ): If an Image Process Worker fails to process an image multiple times (e.g., due to a corrupted file crashing the imagemagick library), the Kafka/SQS message should be moved to a DLQ so it doesn't block the queue. On-call engineers need alerts on DLQ depth.

  2. The Safety Net: Reconciliation Sweeper Job.

    • We need a background cron job that runs periodically (e.g., every 10 minutes).

    • It queries the Post Metadata DB for posts that have been in PENDING_UPLOAD state for longer than a reasonable threshold (e.g., > 15 minutes).

    • For each "stuck" post, the job checks S3: "Does the file corresponding to this post_id actually exist?"

    • Scenario A (S3 file exists): The event trigger vanished. The sweeper job manually re-queues a message into Kafka to trigger processing.

    • Scenario B (S3 file missing): The client failed the upload. The sweeper job marks the post status as FAILED in the DB, which triggers a notification to the user's client to retry.

This ensures eventual consistency between our metadata recording and the actual media storage."
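
A minimal sketch of that sweeper job (the metadata DB, object store, queue, and notifier clients are hypothetical stand-ins):

Python
import time

STUCK_AFTER_SECONDS = 15 * 60

def sweep_stuck_uploads(metadata_db, object_store, queue, notifier):
    cutoff = time.time() - STUCK_AFTER_SECONDS
    for post in metadata_db.find(status="PENDING_UPLOAD", created_before=cutoff):
        if object_store.exists(post["raw_key"]):
            # Scenario A: the file landed but the event was lost -> re-enqueue processing
            queue.publish({"post_id": post["post_id"], "raw_key": post["raw_key"]})
        else:
            # Scenario B: the client never finished the upload -> fail and notify
            metadata_db.update(post["post_id"], status="FAILED")
            notifier.push(post["user_id"], "Upload failed, please retry.")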


Phase 4: Deep Dive - Data Modeling & Storage Strategy

Interviewer (Database Expert / Infrastructure): "Let's talk storage. This is where systems live or die. You have User profiles, Social Graph (follows), and Post Metadata. What specific database technologies do you choose for each, and why? Justify your choices for 500M users."

Candidate: "At 500M DAU, 'one DB to rule them all' is impossible. We must polyglot persistence based on data access patterns and consistency requirements.

1. User Profile Data (User ID, Username, Email, Bio)

  • Characteristics: Highly structured, relatively low volume per user, high read/write ratio, requires strong consistency (ACID). If I change my username, it shouldn't take 5 minutes to reflect.

  • Choice: Relational Database (e.g., PostgreSQL or MySQL with Vitess).

  • Justification: SQL databases are mature and guarantee strong consistency. Since user profile rows are largely independent of one another, the data shards cleanly, and we should plan for sharding from day one.

  • Sharding Strategy: Shard based on User_ID. A lookup service maps user IDs to specific physical DB shards.

2. Post Metadata (Post ID, User ID, Caption, Timestamp, S3 URL pointers)

  • Characteristics: Enormous volume (billions of rows), write-once-read-many, immutable (mostly).

  • Access Pattern (Crucial): We almost ALWAYS query posts in two ways:

    1. "Give me the last X posts created by User Y" (Profile View).

    2. "Give me the post details for a list of Post IDs" (Feed rendering).

  • Choice: NoSQL Wide-Column Store (e.g., Apache Cassandra or DynamoDB).

  • Justification:

    • Scale: We need seamless linear horizontal scaling for writes and storage. Relational DBs struggle here without immense operational overhead.

    • Data Modeling for Performance: We can model this to satisfy the primary access pattern in a single query.

  • Cassandra Data Model:

    • Partition Key: user_id (all posts for one user live on the same physical nodes).

    • Clustering Key: timestamp (DESC) (posts within a partition are pre-sorted by time).

    • Query: SELECT * FROM posts WHERE user_id = '123' LIMIT 20; This is incredibly efficient as it hits one partition and reads sequentially from disk (SSTables). (See the CQL sketch below.)
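
As a sketch, the CQL behind this model could look like the following (shown as Python string constants so any Cassandra client can execute them; table, column names, and types are illustrative, not a confirmed schema):

Python
# Illustrative DDL and query for the posts table described above.
CREATE_POSTS_TABLE = """
CREATE TABLE posts (
    user_id     bigint,
    created_at  timeuuid,
    post_id     bigint,
    caption     text,
    media_urls  list<text>,
    PRIMARY KEY ((user_id), created_at)
) WITH CLUSTERING ORDER BY (created_at DESC);
"""

# One partition, already sorted newest-first by the clustering key.
LATEST_POSTS_BY_USER = "SELECT * FROM posts WHERE user_id = ? LIMIT 20;"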

Interviewer (Follow up on NoSQL): "If you use Cassandra and partition by user_id, how do you handle a celebrity like Cristiano Ronaldo who gets millions of profile views a day? You'll create a massive 'hot partition' and overload the nodes holding his data."

Candidate: "Excellent point. Sharding by user_id works for 99.9% of users, but fails for the top 0.1% (celebrities). Hot partitions are a major risk in Cassandra/DynamoDB.

  • Mitigation Strategies:

  1. Aggressive Multilayer Caching (Primary defense): Ronaldo's profile metadata and latest post list should almost never hit the DB. It should live in a highly replicated Redis cluster (or Memcached). We use a cache-aside pattern with a reasonable TTL (Time To Live).

  2. Read Replicas/Consistency Tuning: In Cassandra, we can increase the replication factor for the keyspace holding celebrity data and issue queries with CL=ONE (Consistency Level One) to spread reads across replicas, accepting slight eventual consistency risks.

  3. TODO [Advanced DB]: (If time permitted, I would discuss more complex strategies like splitting celebrity data across partitions artificially, e.g., user_id#YYYYMM, but caching is usually sufficient for reads).

3. Social Graph (Who follows whom)

  • Characteristics: Highly interconnected data. Needs to answer "Who does A follow?" (out-degree) and "Who follows A?" (in-degree).

  • Choice: Sharded Relational Database (MySQL) with custom graph application logic.

  • Why not a native Graph DB (e.g., Neo4j)? While conceptually a perfect fit, at Instagram/Facebook scale native graph DBs have historically struggled with massive write throughput and horizontal sharding, and they carry more operational overhead than mature sharded SQL infrastructure. The industry-standard approach (proven by Facebook's TAO paper) is sharded MySQL optimized for graph operations.

  • Implementation: We use adjacency lists stored in SQL tables:

    • Table: Following (user_id, follows_user_id, timestamp) -> Index on user_id (fast retrieval of who I follow).

    • Table: Followers (user_id, follower_user_id, timestamp) -> Index on user_id (fast retrieval of my followers).

  • We shard these tables heavily by the user_id in the primary index column.


Phase 5: Deep Dive - The Read Path (Feed Generation)

Interviewer (The "Hardest Problem" Interviewer): "You've stored the data. Now solve the hardest problem in social networks. I open the app. Generate my home feed. I follow 500 people. Some are friends who post rarely, some are high-frequency news accounts, some are celebrities. I want to see a relevant, mixed feed instantly."

Candidate: "This is the core 'Fan-Out' challenge. The naive approach of querying the DB on demand will fail.

Approach 1: Fan-out on Read (Pull Model - Naive)

When User A requests their feed:

  1. Fetch list of 500 users A follows (Graph Service).

  2. For each of those 500 users, query the Metadata Service (Cassandra) for their latest 5 posts.

  3. Aggregate 2500 posts in memory, sort by time, return top 20.

Verdict: Unscalable latency. 500 DB queries per feed load is too slow.

Approach 2: Fan-out on Write (Push Model - Better for reads)

We pre-compute feeds. Every user has a personalized 'mailbox' in a fast storage layer (Redis). When User B (whom User A follows) posts a photo:

  1. The Write Service fetches all of B's followers (User A is one of them).

  2. The service "pushes" the new post_id into the Redis list for every single follower.

When User A opens the app, we just read their pre-computed Redis list: blazing fast, a single in-memory read instead of hundreds of DB queries.

Interviewer: "Push model sounds great. But what about the Cristiano Ronaldo problem again? He has ~600 Million followers. When he posts, are you going to execute 600 Million writes to Redis instantly? That will crush your write infrastructure and cause massive delays."

Candidate: "Exactly. The pure push model fails for high-fanout users (celebrities).

The Principal Solution: Hybrid (Push/Pull) Approach

We classify users based on follower count thresholds (e.g., > 500k followers = 'Celebrity', otherwise 'Normal').

  1. Normal Users (Push): When my friend with 200 followers posts, we use Fan-out-on-Write. We push the post_id to those 200 followers' Redis timelines asynchronously via a queue.

  2. Celebrity Users (Pull): When Ronaldo posts, we do not fan out. We just write the post to the DB.

  3. Feed Generation Time (The Magic): When I (a user who follows both friends and Ronaldo) request my feed:

    • Step A (Pull pre-computed): Fetch my pre-computed timeline from Redis (contains posts from normal friends). E.g., Get 200 post IDs.

    • Step B (Pull celebrity): The Feed Service knows which celebrities I follow. It queries the Metadata DB directly for the latest posts only from those specific celebrities. E.g., Get latest 5 posts from Ronaldo, latest 5 from The Rock.

    • Step C (Merge & Rank): The Feed Service merges the friend IDs and the celebrity posts in memory, sorts them (chronologically or via a relevance algorithm), and returns the finalized feed to the client.

This hybrid approach balances write load for celebrities with read latency for regular users.

Interviewer (Optimization Focus): "Redis is expensive RAM. Are you storing the whole post data there for 500M users?"

Candidate: "No. We only store a list of Post_IDs (long integers) in Redis. The flow is:

  1. Client requests feed.

  2. Feed Service gets list of Post IDs from Redis (and celebrity merge).

  3. Feed Service performs a MultiGet request to the Metadata Service (Cassandra/Memcached layer) using those IDs to retrieve the actual captions, user info, and photo URLs.

  4. Construct final JSON response.

We also need LRU (Least Recently Used) eviction policies. If a user hasn't logged in for 30 days, we evict their Redis timeline to save RAM. We rebuild it from the DB on their next login (the first load will be slower, subsequent ones fast)."
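
A sketch of that read flow, merging the precomputed Redis timeline with freshly pulled celebrity posts before hydration (the graph and metadata clients are hypothetical stand-ins; the Redis call uses standard LRANGE semantics):

Python
def get_home_feed(user_id, redis_client, graph, metadata, page_size=20):
    # Step A: precomputed timeline of post IDs written by the fan-out workers
    precomputed_ids = redis_client.lrange(f"feed:{user_id}", 0, 199)

    # Step B: pull the latest posts directly for the celebrities this user follows
    celebrity_posts = []
    for celeb_id in graph.followed_celebrities(user_id):
        celebrity_posts.extend(metadata.latest_posts(celeb_id, limit=5))

    # Step C: hydrate the precomputed IDs in one batched call, then merge and sort
    normal_posts = metadata.multi_get(precomputed_ids)
    merged = list(normal_posts) + celebrity_posts
    merged.sort(key=lambda p: p["created_at"], reverse=True)
    return merged[:page_size]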


Phase 6: Principal-Level Advanced Concerns & Resilience

Interviewer (Distinguished Engineer - Operational Resilience): "We're global. An AWS region (e.g., us-east-1) just went completely dark due to a hurricane. What happens to Instagram? Can users in Europe still post? Can users in California still view feeds?"

Candidate: "At this level, regional failure is an expected event, not an anomaly. Our architecture must survive it.

1. The Read Path (Viewing Feeds): High Availability is the Priority

  • Data Replication: Our databases (PostgreSQL for users, Cassandra for metadata) must be configured for cross-region asynchronous replication. US-West should have a near-real-time copy of US-East data.

  • Stateless Services: Our API Gateways and Feed Services are deployed in multiple regions globally behind a DNS Load Balancer (like Route53) with health checks.

  • Failure Scenario: Route53 detects us-east-1 is down. It redirects traffic to us-west-2. The services there connect to the local read-replicas of the databases. Users see feeds, perhaps slightly stale data (seconds behind), but the service is UP.

2. The Write Path (Posting): The CAP Theorem Trade-off

This is harder. We have to choose between consistency and availability during a partition.

  • Option A (Strong Consistency - Spanner/CockroachDB): Writes go to a leader. If the leader region is down, a new election happens. Writes might pause for seconds/minutes globally. High complexity.

  • Option B (Eventual Consistency / Multi-Master - DynamoDB Global Tables / Cassandra): We allow writes to any local region. If I'm in Europe, I write to eu-central-1. That region asynchronously replicates to the US.

  • Failure Scenario (Option B chosen for Instagram): If us-east-1 is down, users in NY are routed to us-west-2. They can still post. Their data is written to us-west-2. When us-east-1 comes back online, the databases reconcile the data (handling conflicts via Last-Write-Wins or vector clocks).

For a social app, Option B is usually preferred. Being able to post is more important than instant global consistency.

3. Media Storage (S3): S3 has Cross-Region Replication (CRR). Buckets in East replicate to West. If East fails, the CDN origin changes to the West bucket. This is mostly handled by the cloud provider, but we must configure it and pay for the transfer costs.

Interviewer: "Final question. What are the top 3 metrics you have on your dashboard to know if the system is healthy right now?"

Candidate: "I focus on SLO (Service Level Objective) metrics that reflect user experience, not just CPU usage.

  1. Feed Load P99 Latency: Is the 99th percentile of users seeing their feed in under 500ms? If this spikes, users are suffering.

  2. Upload Success Rate (End-to-End): The percentage of initiated uploads that result in a visible post within 30 seconds. This catches issues anywhere in the complex async processing pipeline.

  3. CDN Cache Hit Ratio: If this drops suddenly, it means our origin servers are about to get hammered by traffic, indicating a configuration mistake or a change in user behavior."


Topics Omitted for Time/Depth (TODOs)

  • TODO [Big Topic]: Search & Discovery. How do we search for hashtags or users? (Requires designing an ingestion pipeline into Elasticsearch/OpenSearch).

  • TODO [Big Topic]: Recommendations Engine. How do we populate the "Explore" tab? (Requires ML pipelines, batch processing of user activity logs, collaborative filtering).

  • TODO [Deep Dive]: Security. OAuth2 flows, rate limiting at the gateway, preventing scraping, securing internal service-to-service comms (mTLS).

  • TODO [Deep Dive]: Cassandra Internals. Deep dive into LSM trees, SSTables, compaction strategies, and quorum consistency levels (R+W > N) if the interviewer pushes hard on NoSQL expertise.

  • TODO [Deep Dive]: Load Balancer Internals. L4 vs L7 balancing, SSL termination strategies at scale.

  • todo [minor]: Video specific processing pipeline (transcoding into HLS/DASH protocols for adaptive bitrate streaming).

  • todo [minor]: Handling "soft deletes" (user deletes post, we don't remove from DB immediately, just flag hidden).

----

