
How can you compute a cumulative sum (running balance) ordered by date using an unbounded preceding window specification in a Spark DataFrame?

🟡 Medium · Conceptual · Junior level
Times asked: 1
First seen: Mar 2026
Last seen: Mar 2026

💡 Model Answer

In Spark SQL or the DataFrame API, you define a window that orders rows by the date column and spans from the first row of the partition to the current row. Example:

```python
from pyspark.sql import Window
from pyspark.sql.functions import sum as _sum

# No partitionBy(): the whole DataFrame becomes one window partition, so
# Spark will warn that all rows move to a single task. Add
# partitionBy('account') (or similar) when a per-group balance is wanted.
w = (
    Window.orderBy('date')
          .rowsBetween(Window.unboundedPreceding, Window.currentRow)
)

df_with_balance = df.withColumn('running_balance', _sum('amount').over(w))
```

This adds a running_balance column holding the cumulative sum of amount up to and including each row, ordered by date. Because rowsBetween with Window.unboundedPreceding defines a physical-row frame, two rows sharing the same date still get distinct running totals; with the default range-based frame (what Spark uses when only orderBy is given), tied dates would all receive the same total. The result can then be displayed or written to storage.
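The rows-based frame semantics can be checked without a Spark cluster. The sketch below is plain Python (not Spark code): it mimics what a rowsBetween(unboundedPreceding, currentRow) frame computes after sorting by date. The sample rows and values are illustrative, including a deliberate duplicate date.

```python
from itertools import accumulate

# Illustrative sample rows: (date, amount); note the tied dates.
rows = [
    ("2026-03-01", 100.0),
    ("2026-03-02", -40.0),
    ("2026-03-02", 25.0),
    ("2026-03-03", 10.0),
]

# Sort by date (the window's orderBy); Python's sort is stable, so
# tied dates keep their input order. A rows-based frame from
# unboundedPreceding to currentRow is then just a running sum over
# the sorted physical rows.
rows_sorted = sorted(rows, key=lambda r: r[0])
running = list(accumulate(amount for _, amount in rows_sorted))

for (date, amount), bal in zip(rows_sorted, running):
    print(date, amount, bal)
# The two 2026-03-02 rows get distinct running totals (60.0, then 85.0),
# mirroring rowsBetween; a range-based frame would assign both 85.0.
```

This is only a mental model for the frame boundaries; the real computation is distributed and should be done with the Spark window shown above.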

