
Suppose we have some records that do not match the expected schema, containing null or invalid price values. How would you separate bad records to ensure only valid data is loaded for valuation?

🟡 Medium · Conceptual · Mid level
Times asked: 1
Last seen: Jun 2026
First seen: Jun 2026

💡 Model Answer

I would first define a strict schema for the incoming data, specifying each column's data type and nullability, with price declared as a non-nullable numeric column. Using Apache Spark (for example, in an AWS Glue job), I would read the CSV into a DataFrame against that schema and validate every row. Rows that fail validation, such as those with a null or non‑numeric price, are flagged as bad, ideally with the rejection reason attached. I would then write the good rows to the target location (e.g., a data lake or database) and the bad rows to a separate "bad records" location for audit or reprocessing. A single Glue job can automate this split by writing to two S3 prefixes or two tables. I would also set up monitoring (CloudWatch metrics or Glue job logs) to alert when the volume of bad records rises, and optionally trigger a Lambda function to notify the data owner. Validation is a single pass over the data, so its cost is linear in the number of rows, O(n), and it ensures that only validated records reach the downstream valuation calculations.
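
The split described in this answer can be illustrated with a minimal sketch in plain Python, using only the standard library instead of Spark or Glue so that it runs anywhere. The column names (`asset_id`, `price`), the rule that a price must be a positive finite number, and the helper names `validate_row` and `split_records` are all assumptions made for illustration; the question does not specify the real schema. In Spark, the same predicate would typically be expressed as filters on the DataFrame.

```python
import csv
import io
from decimal import Decimal, InvalidOperation

# Hypothetical expected columns; the real schema is not given in the question.
EXPECTED_COLUMNS = ("asset_id", "price")


def validate_row(row):
    """Return a rejection reason for a bad row, or None if the row is valid."""
    # csv.DictReader stores surplus fields under the key None and fills
    # missing fields with None, so both kinds of mismatch are caught here.
    if None in row or any(row.get(col) is None for col in EXPECTED_COLUMNS):
        return "column count mismatch"
    if not row["asset_id"].strip():
        return "missing asset_id"
    raw_price = row["price"].strip()
    if not raw_price or raw_price.lower() == "null":
        return "null price"
    try:
        price = Decimal(raw_price)
    except InvalidOperation:
        return "non-numeric price"
    # Assumption: a usable price is finite and strictly positive.
    if not price.is_finite() or price <= 0:
        return "price out of range"
    return None


def split_records(csv_text):
    """Make one pass over the CSV text and return (good_rows, bad_rows)."""
    good, bad = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        reason = validate_row(row)
        if reason is None:
            good.append({
                "asset_id": row["asset_id"].strip(),
                "price": Decimal(row["price"].strip()),
            })
        else:
            # Keep the raw row, its line number, and the reason for audit.
            bad.append({"line": reader.line_num, "reason": reason, "raw": row})
    return good, bad


sample = """asset_id,price
A1,100.50
A2,
A3,abc
A4,-5
A5,42
A6,NULL
"""

good, bad = split_records(sample)
# Share of rejected rows; a monitoring alert could fire above a threshold.
bad_rate = len(bad) / (len(good) + len(bad))
```

Here `good` would be written to the target location and `bad` to the separate bad-records location. Recording a reason and line number with each rejected row is a design choice that makes later reprocessing easier; it is not something the question requires.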

This answer was generated by AI for study purposes. Use it as a starting point — personalize it with your own experience.
