Did you ensure that re‑running a job does not create duplicate records?
💡 Model Answer
Yes, I make batch jobs idempotent by combining stable unique identifiers with upsert logic. First, each record in the source system has a natural key (e.g., customer_id) or a surrogate key that remains stable across runs. In the ETL process, I compute a hash of the key and the record's payload and store it alongside the row, so a later run can compare hashes and tell whether the record has changed. When loading into the target database, I use an upsert: INSERT … ON CONFLICT … DO UPDATE in PostgreSQL, or MERGE in SQL Server. This relies on a unique constraint on the key in the target table, so a record that already exists is updated rather than inserted a second time. Additionally, I maintain a job metadata table that records the last successful run timestamp and a checksum of the processed batch. Before processing a new batch, the job checks this table; if the incoming batch's checksum matches the stored one, the batch has already been loaded and the job skips it. Together, these techniques mean that re-running the job, whether after a failure or a manual trigger, does not create duplicate records while still allowing incremental updates.
This answer was generated by AI for study purposes. Use it as a starting point — personalize it with your own experience.