
How would you design an ETL pipeline that uses AWS DMS CDC to capture change data from an Oracle database, stores the raw change data as JSON in a raw zone, and then processes it into a target system?

🔴 Hard Conceptual Senior level
Times asked: 1
Last seen: May 2026
First seen: May 2026

💡 Model Answer

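First, set up AWS Database Migration Service (DMS) with a CDC task (or a full-load-plus-CDC task) that connects to the Oracle source. DMS reads changes from Oracle’s redo logs, through either LogMiner or Binary Reader, so the source must run in ARCHIVELOG mode with supplemental logging enabled on the replicated tables. This lets DMS capture inserts, updates, and deletes along with the key columns needed to identify each row.

Next, land the changes in an Amazon S3 bucket that serves as the raw zone. Note that the DMS S3 target endpoint writes CSV or Parquet rather than JSON. To get JSON into the raw zone, one option is to point DMS at a streaming target such as Amazon Kinesis Data Streams, which receives each change as a JSON message, and then deliver the stream to S3 (for example with Amazon Data Firehose). Whichever route you choose, each record should carry the operation type, the primary key, the column values, and a commit timestamp or sequence number so that changes can be replayed in order. Partition the raw objects by table and date, and treat the raw zone as immutable so it can be reprocessed later.

Then create an AWS Glue job (or a Lambda function for small, event-driven batches) that reads the S3 objects, parses the JSON, and writes the data into a staging table in Amazon Redshift or a curated layer of the data lake. Register the schemas in the Glue Data Catalog or a schema registry so that source schema changes are detected instead of silently breaking the load. Finally, build a downstream transformation layer (e.g., Redshift SQL or Spark) that deduplicates by primary key, applies the changes in commit order as upserts and deletes, and loads the cleaned data into the target schema.

Throughout the pipeline, use CloudWatch metrics such as CDCLatencySource and CDCLatencyTarget, together with DMS task logs, to monitor lag and error rates. This design uses DMS for log-based CDC, S3 for durable raw storage, and Glue or Lambda for flexible processing, so change data is captured once, kept in replayable form, and transformed independently of the source database.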
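
To make the processing step concrete, here is a minimal Python sketch of applying raw JSON change events to a target keyed by primary key. The record shape (`seq`, `op`, `key`, `columns`) is a hypothetical one chosen for illustration, not the format DMS itself emits, and the in-memory dictionary stands in for a staging table; a real job would read the objects from S3 and issue merges against Redshift or a lake table.

```python
import json
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def apply_changes(lines: Iterable[str], target: Dict[Any, Row]) -> Dict[Any, Row]:
    """Apply newline-delimited JSON change events to a table keyed by primary key.

    Assumed (hypothetical) event shape:
      {"seq": <int>, "op": "insert"|"update"|"delete", "key": <pk>, "columns": {...}}
    """
    events = [json.loads(line) for line in lines if line.strip()]
    # Raw files may arrive out of order, so sort by the commit sequence first.
    events.sort(key=lambda e: e["seq"])
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            # Upsert: merge new column values over whatever row already exists.
            target[key] = {**target.get(key, {}), **event["columns"]}
        elif op == "delete":
            # Deleting a missing key is a no-op, so replays do not fail.
            target.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return target


# Four change events, deliberately out of order.
raw = [
    '{"seq": 1, "op": "insert", "key": 101, "columns": {"name": "Ada", "city": "London"}}',
    '{"seq": 3, "op": "delete", "key": 102, "columns": {}}',
    '{"seq": 2, "op": "insert", "key": 102, "columns": {"name": "Bob", "city": "Paris"}}',
    '{"seq": 4, "op": "update", "key": 101, "columns": {"city": "Cambridge"}}',
]
table = apply_changes(raw, {})
print(table)  # {101: {'name': 'Ada', 'city': 'Cambridge'}}
```

Sorting by sequence and treating inserts and updates as upserts means the same batch can be replayed against the target without changing the result, which is useful when a Glue or Lambda run is retried after a failure.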

This answer was generated by AI for study purposes. Use it as a starting point — personalize it with your own experience.
