
Suppose 100 GB of files arrive in an S3 bucket every day and need to be processed and loaded into a data warehouse. Which AWS services would you use and why? Consider factors such as flexibility, cost, scalability, and error recovery.

🔴 Hard · System Design · Senior level
Times asked: 1
Last seen: Jun 2026
First seen: Jun 2026

💡 Model Answer

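I would use S3 as the landing zone and run an AWS Glue crawler, on a schedule or when new objects arrive, to register the incoming data's schema in the Glue Data Catalog. For processing, Glue ETL jobs are cost-effective at this volume (100 GB per day averages only about 1.2 MB/s) and scale without servers to manage; if the data requires heavy transformations or machine learning, I'd run Spark on an EMR cluster instead. After transformation, I'd load the data into Amazon Redshift with the COPY command, which reads from S3 in parallel across the cluster's slices, so the ETL step should write its output as multiple similarly sized files rather than one large file.

Flexibility: Glue is serverless, which eliminates cluster management overhead, while EMR gives full control over Spark configuration and libraries when a workload outgrows Glue.

Cost: Glue bills per DPU-hour, metered per second with a 1-minute minimum on Glue 2.0 and later, while EMR bills per second for each instance on top of the underlying EC2 price. I'd choose Glue for lighter or bursty workloads and EMR for heavy, long-running ones, where Spot Instances can reduce the EC2 cost further.

Scalability: Glue Auto Scaling (Glue 3.0 and later) adds and removes workers during a run, and EMR managed scaling resizes the cluster to match the workload.

Error recovery: Glue job bookmarks track which S3 files have already been processed successfully, so a rerun picks up only data that has not yet been loaded instead of reprocessing everything. Step Functions orchestrates the crawler, ETL, and COPY steps, retries failed steps with backoff, and raises an alert on final failure, for example through an Amazon SNS topic or a CloudWatch alarm on failed executions.

This architecture balances flexibility, cost, scalability, and robust error handling.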
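
The Redshift COPY step from the answer can be sketched in Python as follows, assuming the ETL job writes gzipped CSV files. The table name, S3 prefix, IAM role ARN, and the 256 MiB target file size are hypothetical placeholders, and `output_file_count` and `build_copy_sql` are illustrative helpers, not part of any AWS SDK. The file-count helper follows AWS's loading guidance as I understand it: split the data into similarly sized files (roughly 1 MB to 1 GB after compression) whose number is a multiple of the cluster's slice count.

```python
import math


def output_file_count(expected_bytes: int, slices: int,
                      target_file_bytes: int = 256 * 1024**2) -> int:
    """Return a file count near the target file size that is a multiple of `slices`.

    COPY spreads input files across the cluster's slices, so a file count
    that is a multiple of the slice count keeps every slice busy.
    The 256 MiB default target is an assumption, not an AWS constant.
    """
    files = max(1, math.ceil(expected_bytes / target_file_bytes))
    # Round up to the next multiple of the slice count.
    return math.ceil(files / slices) * slices


def build_copy_sql(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Build a COPY statement that loads every gzipped CSV file under s3_prefix."""
    for value in (s3_prefix, iam_role_arn):
        # Single quotes would break the SQL string literals below.
        if "'" in value:
            raise ValueError("quotes are not allowed in the S3 prefix or role ARN")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV\n"
        "GZIP;"
    )


if __name__ == "__main__":
    print(output_file_count(100 * 10**9, slices=16))
    print(build_copy_sql(
        "analytics.daily_events",                              # hypothetical table
        "s3://example-bucket/processed/2026-06-01/",           # hypothetical prefix
        "arn:aws:iam::123456789012:role/example-redshift-copy",  # hypothetical role
    ))
```

For example, if the transformed output were about 100 GB and the cluster had 16 slices (the real count can be read from the STV_SLICES system table), `output_file_count(100 * 10**9, 16)` returns 384, i.e. files of roughly 260 MB each. The generated SQL string would then be executed through any Redshift client, such as the Redshift Data API.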

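The Step Functions retry-and-alert behavior mentioned in the answer could look roughly like the following Amazon States Language definition, built as a Python dict so it can be serialized with `json.dumps`. This is a minimal sketch covering only the Glue ETL step; the job name, SNS topic ARN, retry interval and attempt count, and state names are assumptions. Alerts go through an Amazon SNS topic here; a CloudWatch alarm on the state machine's failed-execution metric is an alternative. The crawler and COPY steps would be added as further states in the same pattern.

```python
import json


def build_state_machine(glue_job_name: str, alert_topic_arn: str) -> dict:
    """Return an Amazon States Language definition for the daily Glue run.

    glue_job_name and alert_topic_arn are placeholders supplied by the caller.
    """
    return {
        "Comment": "Daily S3 -> Glue ETL run with retries and failure alerts",
        "StartAt": "RunGlueJob",
        "States": {
            "RunGlueJob": {
                "Type": "Task",
                # The .sync suffix makes Step Functions wait for the job run to finish.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job_name},
                # Retry failed runs with exponential backoff (60 s, then 120 s).
                "Retry": [
                    {
                        "ErrorEquals": ["States.TaskFailed"],
                        "IntervalSeconds": 60,
                        "MaxAttempts": 2,
                        "BackoffRate": 2.0,
                    }
                ],
                # Once retries are exhausted, keep the error details and alert.
                "Catch": [
                    {
                        "ErrorEquals": ["States.ALL"],
                        "ResultPath": "$.error",
                        "Next": "NotifyFailure",
                    }
                ],
                "End": True,
            },
            "NotifyFailure": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sns:publish",
                "Parameters": {
                    "TopicArn": alert_topic_arn,
                    # Forward the failure cause captured by the Catch clause.
                    "Message.$": "$.error.Cause",
                },
                "Next": "PipelineFailed",
            },
            # Mark the execution as failed so it shows up in monitoring.
            "PipelineFailed": {"Type": "Fail", "Error": "GlueJobFailed"},
        },
    }


if __name__ == "__main__":
    definition = build_state_machine(
        "daily-s3-to-redshift",                          # hypothetical job name
        "arn:aws:sns:us-east-1:123456789012:etl-alerts",  # hypothetical topic
    )
    print(json.dumps(definition, indent=2))
```

The serialized JSON could be passed as the `definition` argument when creating the state machine, for example with the boto3 `stepfunctions` client's `create_state_machine`. Assuming job bookmarks are enabled on the Glue job, each retry reprocesses only the files that were not committed by an earlier successful run.
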
This answer was generated by AI for study purposes. Use it as a starting point — personalize it with your own experience.
