Did you ensure that re‑running a job does not create duplicate records?

Question

Assisting AI · Accepted Answer

Yes. I make batch jobs idempotent by combining stable unique identifiers with upsert logic. First, each record in the source system has a natural key (e.g., customer_id) or a surrogate key that remains stable across runs. In the ETL process, I compute a hash of the key and the record’s payload so that changed records can be detected by comparing hashes. When loading into the target database, I use an upsert: INSERT … ON CONFLICT … DO UPDATE in PostgreSQL, or MERGE in SQL Server. The ON CONFLICT form relies on a unique constraint or primary key on the key column, so if a record with the same key already exists, it is updated rather than inserted a second time. Additionally, I maintain a job metadata table that records the last successful run timestamp and a checksum of the processed batch. Before processing a new batch, the job checks this table; if the checksum matches the one already recorded, the batch has already been loaded and the job skips it. Together, these techniques mean that re‑running the job, whether after a failure or a manual trigger, does not create duplicate records, while incremental updates still go through.
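The approach described above can be sketched roughly as follows. This is a minimal illustration, not the answerer's actual pipeline: it uses Python's built-in sqlite3 module (SQLite 3.24 or later, which accepts the same ON CONFLICT … DO UPDATE form as PostgreSQL) so that it runs without a database server. The table names (customers, job_runs), the column names, and the helper functions are all hypothetical.

```python
import hashlib
import json
import sqlite3


def payload_hash(record):
    """Stable SHA-256 over the record's key and payload (keys sorted)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def batch_checksum(records):
    """Order-independent batch checksum: hash of the sorted row hashes."""
    joined = "".join(sorted(payload_hash(r) for r in records))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def init_schema(conn):
    # Hypothetical schema: the PRIMARY KEY on customer_id is what lets
    # ON CONFLICT detect an existing record.
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS customers (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            email       TEXT NOT NULL,
            row_hash    TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS job_runs (
            job_name        TEXT PRIMARY KEY,
            last_success_at TEXT NOT NULL,
            batch_checksum  TEXT NOT NULL
        );
    """)


def run_job(conn, job_name, records):
    """Load a batch idempotently; returns 'skipped' or 'loaded'."""
    checksum = batch_checksum(records)
    row = conn.execute(
        "SELECT batch_checksum FROM job_runs WHERE job_name = ?",
        (job_name,),
    ).fetchone()
    if row is not None and row[0] == checksum:
        return "skipped"  # identical batch was already loaded successfully

    # One transaction: the data and the job metadata commit together,
    # or neither does if an exception is raised.
    with conn:
        for r in records:
            conn.execute(
                """
                INSERT INTO customers (customer_id, name, email, row_hash)
                VALUES (?, ?, ?, ?)
                ON CONFLICT(customer_id) DO UPDATE SET
                    name     = excluded.name,
                    email    = excluded.email,
                    row_hash = excluded.row_hash
                WHERE customers.row_hash <> excluded.row_hash
                """,
                (r["customer_id"], r["name"], r["email"], payload_hash(r)),
            )
        conn.execute(
            """
            INSERT INTO job_runs (job_name, last_success_at, batch_checksum)
            VALUES (?, datetime('now'), ?)
            ON CONFLICT(job_name) DO UPDATE SET
                last_success_at = excluded.last_success_at,
                batch_checksum  = excluded.batch_checksum
            """,
            (job_name, checksum),
        )
    return "loaded"
```

Calling run_job twice with the same batch loads it once and skips it the second time. Calling it with a batch in which one record has changed updates that row in place, because the WHERE clause on the upsert only rewrites rows whose stored hash differs. Writing the job_runs row in the same transaction as the data is a design choice in this sketch: a failed run leaves no metadata behind, so the retry is not skipped by mistake.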
