Why does coalesce not perform a full shuffle?

🟢 Easy Conceptual Junior level
Times asked: 1
Last seen: May 2026
First seen: May 2026

💡 Model Answer

In Apache Spark, the coalesce transformation reduces the number of partitions in an RDD or DataFrame. Unlike repartition, it does not trigger a full shuffle. Coalesce creates a narrow dependency: each output partition is built by merging a group of existing partitions whole, so no record has to be individually reassigned, and there is no shuffle write or extra stage boundary. Spark tries to group partitions that live on the same executor, but when the parents of one output partition sit on different executors, their data is still read over the network. This is usually much cheaper than a shuffle, though it is not guaranteed to be free of data movement.

Because partitions are merged as they are, coalesce preserves whatever imbalance already exists, so it works best when the data is roughly evenly distributed. It also cannot increase the partition count (on RDDs, only when shuffle = true is passed), and a drastic reduction, for example to a single partition, can lower the parallelism of the upstream computation in the same stage. If the data is skewed or you need balanced partitions, use repartition, which performs a full shuffle to redistribute data evenly across the new partitions.
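The difference can be sketched with a toy model in plain Python. This is not Spark's implementation: partitions are modeled as lists, and the function names `coalesce` and `repartition` here are illustrative stand-ins, with a contiguous grouping and a round-robin reassignment chosen as simplifying assumptions. It only aims to show that merging whole partitions keeps skew, while reassigning every record balances it.

```python
from typing import List

Partition = List[int]


def coalesce(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy narrow-dependency merge (assumed grouping: contiguous parents).

    Each parent partition is appended whole to exactly one output
    partition, so no individual record is reassigned.
    """
    if n >= len(partitions):
        # Without a shuffle the partition count cannot grow.
        return [list(p) for p in partitions]
    out: List[Partition] = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        out[i * n // len(partitions)].extend(part)
    return out


def repartition(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy full shuffle: every record is reassigned round-robin."""
    out: List[Partition] = [[] for _ in range(n)]
    k = 0
    for part in partitions:
        for record in part:
            out[k % n].append(record)
            k += 1
    return out


# A skewed input: the first partition holds most of the records.
parents = [[1, 2, 3, 4, 5, 6], [7], [8], [9, 10]]

merged = coalesce(parents, 2)
shuffled = repartition(parents, 2)

print([len(p) for p in merged])    # [7, 3]  skew is preserved
print([len(p) for p in shuffled])  # [5, 5]  records are balanced
```

In PySpark the corresponding calls are `df.coalesce(n)` and `df.repartition(n)`, and `df.rdd.getNumPartitions()` reports the resulting partition count.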

