What is the difference between repartition and coalesce queries?

🟡 Medium Conceptual Junior level
Times asked: 1
Last seen: May 2026
First seen: May 2026

💡 Model Answer

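In Spark, both repartition and coalesce change the number of partitions of a DataFrame or RDD, but they differ in how data moves. repartition(n) always performs a full shuffle: every row may be sent across the network, and the result is n partitions of roughly equal size (for a DataFrame, rows are dealt round-robin unless partitioning columns are given, in which case they are hashed by those columns). It can both increase and decrease the partition count, so it is the right choice when you need more parallelism or need to fix skew. coalesce(n) reduces the partition count without a full shuffle by merging existing partitions, ideally ones already on the same executor. That makes it cheap, but because whole partitions are combined rather than split, the result can be uneven; on a DataFrame it also cannot increase the partition count, so asking for more partitions than currently exist leaves the count unchanged. In short: use repartition to increase partitions or to get balanced data, and use coalesce when you are only reducing partitions (for example, before writing a small number of output files) and can tolerate some skew.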
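To make the balanced-versus-skewed point concrete, here is a minimal pure-Python sketch. It is a toy model, not Spark's implementation: partitions are plain lists, and the helper names toy_repartition and toy_coalesce are hypothetical. It assumes repartition behaves like a round-robin redistribution of every row and coalesce like a merge of neighbouring partitions. In PySpark itself the corresponding calls would be df.repartition(n) and df.coalesce(n), with df.rdd.getNumPartitions() to inspect the result.

```python
from typing import List

Partition = List[int]


def toy_repartition(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy full shuffle: every row may move; rows are dealt round-robin into n partitions."""
    out: List[Partition] = [[] for _ in range(n)]
    i = 0
    for part in partitions:
        for row in part:
            out[i % n].append(row)
            i += 1
    return out


def toy_coalesce(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy narrow coalesce: whole neighbouring partitions are merged, never split."""
    current = len(partitions)
    if n >= current:
        # Assumption mirroring DataFrame.coalesce: it cannot increase the count.
        return [list(p) for p in partitions]
    out: List[Partition] = [[] for _ in range(n)]
    for idx, part in enumerate(partitions):
        # Map contiguous groups of input partitions onto one output partition.
        out[idx * n // current].extend(part)
    return out


def sizes(partitions: List[Partition]) -> List[int]:
    return [len(p) for p in partitions]


# Four input partitions with skewed sizes 8, 1, 1, 2 (made-up illustration data).
skewed = [list(range(8)), [8], [9], [10, 11]]

print(sizes(toy_repartition(skewed, 2)))  # [6, 6]  balanced, but every row moved
print(sizes(toy_coalesce(skewed, 2)))     # [9, 3]  cheap merge, skew remains
print(len(toy_coalesce(skewed, 8)))       # 4       coalesce does not grow the count
print(len(toy_repartition(skewed, 6)))    # 6       repartition can grow it
```

The sketch leaves out everything that makes the real operations expensive or cheap (network transfer, executor locality), so it shows only the shape of the outcome: even sizes after a shuffle, inherited skew after a merge.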

This answer was generated by AI for study purposes. Use it as a starting point — personalize it with your own experience.
