
Cloud storage shows that a table is consuming over 200 TB of space. What do you think is causing the problem and what would be your approach to optimize it?

🟡 Medium Conceptual Mid level
Times asked: 1
Last seen: May 2026
First seen: May 2026

💡 Model Answer

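A table that consumes 200 TB is likely suffering from a combination of small-file proliferation, inefficient partitioning, and a lack of compaction. Small files increase metadata overhead, reduce read throughput, and raise costs: each file carries its own footer and statistics, often compresses less effectively than a larger file would, and adds per-request charges on object storage. Poor partitioning forces queries to scan many partitions, and over-partitioning fragments the data into even more small files. Without compaction, every write adds new files that are never merged, so the file count keeps growing and reads slow down over time.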
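To get a feel for the scale of the small-file problem, the arithmetic below estimates how many files a 200 TB table holds at different average file sizes. It is a rough sketch: the sizes are illustrative assumptions, binary units (TiB, MiB) are used for simplicity, and `file_count` is a hypothetical helper, not part of any library.

```python
TIB = 1024 ** 4
MIB = 1024 ** 2


def file_count(table_bytes: int, avg_file_bytes: int) -> int:
    """Files needed to hold table_bytes at the given average file size (rounded up)."""
    return -(-table_bytes // avg_file_bytes)


table_bytes = 200 * TIB  # the table size from the question

# Compare a small-file layout against a layout near the usual target size.
for avg_mib in (1, 16, 256):
    n = file_count(table_bytes, avg_mib * MIB)
    print(f"{avg_mib:>4} MiB average -> {n:,} files")
```

At a 1 MiB average the table holds roughly 210 million files, while at 256 MiB it holds about 820 thousand, which is why file listing, planning, and metadata handling become so much cheaper after compaction.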

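To optimize, first analyze the file size distribution and partition layout. Aim for Parquet/ORC files of roughly 100–400 MB; merge small files with Spark's repartition or coalesce before writing, or with Delta Lake's OPTIMIZE command. Re-partition the table on a low-to-moderate-cardinality column that queries filter on often (e.g., date or region) so the engine can prune partitions; a high-cardinality partition key would itself produce many tiny partitions and files. Enable data skipping by keeping file-level statistics up to date. If the workload is update- or delete-heavy, consider a merge-on-read table mode to reduce write amplification (Apache Iceberg and Apache Hudi support it, and Delta Lake's deletion vectors work similarly); append-only workloads benefit more from larger write batches. Finally, schedule regular compaction jobs, clean up the files that compaction and old table versions leave behind (VACUUM in Delta Lake, snapshot expiration in Iceberg), since otherwise compaction adds to storage instead of reclaiming it, and monitor storage metrics to keep the table size in check.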
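The first two steps, analyzing the file size distribution and deciding which files to merge, can be sketched in plain Python. This is a minimal illustration under stated assumptions: the file listing is a hand-written sample (in practice it would come from an object-store listing or the table's metadata), the 100 MiB small-file threshold and 256 MiB target are assumed values inside the range above, and `small_file_report` and `plan_compaction` are hypothetical helpers. The sketch only plans the merge groups; the actual rewrite would be done by an engine such as Spark or by Delta Lake's OPTIMIZE.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

MIB = 1024 ** 2
SMALL_FILE_BYTES = 100 * MIB   # assumed threshold: lower end of the target range
TARGET_FILE_BYTES = 256 * MIB  # assumed target size for compacted files

FileEntry = Tuple[str, str, int]  # (partition, path, size in bytes)


def small_file_report(files: List[FileEntry]) -> Dict[str, Tuple[int, int, int]]:
    """Per partition: (total files, small files, total bytes)."""
    report = defaultdict(lambda: [0, 0, 0])
    for partition, _path, size in files:
        row = report[partition]
        row[0] += 1
        row[1] += 1 if size < SMALL_FILE_BYTES else 0
        row[2] += size
    return {p: tuple(r) for p, r in report.items()}


def plan_compaction(
    files: List[FileEntry], target: int = TARGET_FILE_BYTES
) -> Dict[str, List[List[str]]]:
    """Greedily group each partition's small files into bins of at most `target` bytes."""
    by_partition = defaultdict(list)
    for partition, path, size in files:
        if size < SMALL_FILE_BYTES:
            by_partition[partition].append((size, path))

    plan = {}
    for partition, entries in by_partition.items():
        bins, current, current_size = [], [], 0
        for size, path in sorted(entries, reverse=True):  # largest first
            if current and current_size + size > target:
                bins.append(current)
                current, current_size = [], 0
            current.append(path)
            current_size += size
        if current:
            bins.append(current)
        # A bin with a single file gains nothing from being rewritten.
        plan[partition] = [b for b in bins if len(b) > 1]
    return plan


# Hypothetical sample listing for two date partitions.
files = [
    ("date=2024-01-01", "a.parquet", 300 * MIB),
    ("date=2024-01-01", "b.parquet", 90 * MIB),
    ("date=2024-01-01", "c.parquet", 80 * MIB),
    ("date=2024-01-01", "d.parquet", 70 * MIB),
    ("date=2024-01-01", "e.parquet", 60 * MIB),
    ("date=2024-01-02", "f.parquet", 5 * MIB),
]

print(small_file_report(files))
print(plan_compaction(files))
```

The greedy grouping is deliberately simple; a production compaction job would also weigh row-group sizes, sort order, and concurrent writers, which this sketch ignores.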

