Is there another scenario where a data engineering team updates a 50 TB ISO table often using copy‑on‑write?
💡 Model Answer
Copy‑on‑write (CoW) works well for append‑heavy, read‑heavy workloads, because readers only ever scan fully rewritten base files. It becomes problematic when a large table takes frequent updates or deletes. If a 50 TB ISO table is updated often (say, daily incremental loads that modify a small fraction of rows), CoW rewrites every data file that contains at least one changed row. The result is high write amplification, long write times, and extra storage, since superseded file versions stay on disk until old snapshots are expired. The effect is worst when the changed rows are scattered across many files or partitions rather than clustered in a few, because each touched file is rewritten in full.
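To get a feel for the write amplification, here is a rough back-of-the-envelope sketch. It assumes a fixed data file size and that updated rows land in files uniformly at random (a pessimistic case); the function name and the 512 MiB file size are illustrative choices, not values from any particular engine.

```python
import math


def cow_rewrite_estimate(table_bytes: int, file_bytes: int, updated_rows: int):
    """Estimate how much data a copy-on-write commit rewrites.

    Assumption (hypothetical model): every data file has the same size and
    each updated row falls into a file chosen uniformly at random.
    Returns (number of files, expected files rewritten, expected bytes rewritten).
    """
    n_files = math.ceil(table_bytes / file_bytes)
    # Probability that a given file contains at least one updated row.
    p_touched = 1 - (1 - 1 / n_files) ** updated_rows
    files_rewritten = n_files * p_touched
    return n_files, files_rewritten, files_rewritten * file_bytes


TABLE = 50 * 2**40   # 50 TiB table
FILE = 512 * 2**20   # 512 MiB data files (assumed)

for rows in (1, 10_000, 1_000_000):
    n, files, rewritten = cow_rewrite_estimate(TABLE, FILE, rows)
    print(f"{rows:>9} updated rows: ~{files:,.0f} of {n:,} files, "
          f"~{rewritten / 2**40:.2f} TiB rewritten")
```

Under these assumptions, even a modest number of scattered row updates can touch most files in the table, which is why clustering updates (for example, by partitioning on the column that drives them) matters so much for CoW.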
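An alternative is merge‑on‑read (MoR). MoR writes updates and deletes as small delta or delete files next to the unchanged base files and merges them at read time, which cuts write amplification. The trade‑off is slower reads until the deltas are compacted into the base files, so MoR tables need regular compaction. Open table formats such as Apache Hudi, Apache Iceberg, and Delta Lake support this style of write (MoR tables in Hudi, delete files in Iceberg, deletion vectors in Delta Lake). They also provide snapshot‑based time travel, which is available regardless of the write mode. For workloads with frequent small updates, MoR usually gives faster writes and lower storage overhead than CoW.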
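The merge-on-read idea can be sketched with a toy in-memory table. This is only a conceptual model, not how any real table format lays out files: `MorTable`, its methods, and the dict-based "base file" are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MorTable:
    """Toy merge-on-read table: an immutable base plus an append-only delta log."""
    base: dict = field(default_factory=dict)    # stands in for base data files
    deltas: list = field(default_factory=list)  # stands in for delta/delete files

    def upsert(self, key, row):
        # A write only appends a small delta; the base is not rewritten.
        self.deltas.append(("upsert", key, row))

    def delete(self, key):
        self.deltas.append(("delete", key, None))

    def read(self) -> dict:
        # Readers pay the merge cost: replay deltas over a copy of the base.
        merged = dict(self.base)
        for op, key, row in self.deltas:
            if op == "upsert":
                merged[key] = row
            else:
                merged.pop(key, None)
        return merged

    def compact(self):
        # Compaction folds the deltas into a new base so later reads are cheap.
        self.base = self.read()
        self.deltas = []


table = MorTable(base={1: "a", 2: "b"})
table.upsert(2, "B")
table.delete(1)
print(table.read())   # merged view; base is still untouched
table.compact()
print(table.base, table.deltas)
```

The sketch shows the trade-off in miniature: writes are cheap appends, reads do extra work proportional to the number of pending deltas, and compaction periodically resets that cost.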