How can you compute a cumulative sum (running balance) ordered by date using an unbounded preceding window specification in a Spark DataFrame?
💡 Model Answer
In Spark SQL or the DataFrame API, you can define a window that orders rows by the date column and spans from the first row up to the current row. Example:

```python
from pyspark.sql import Window
from pyspark.sql.functions import sum as _sum

# Frame over all rows, ordered by date, from the first row to the current row.
# Note: with no partitionBy() columns, Spark pulls all data into a single
# partition and will warn about performance on large datasets.
w = (
    Window.partitionBy()
    .orderBy('date')
    .rowsBetween(Window.unboundedPreceding, Window.currentRow)
)

df_with_balance = df.withColumn('running_balance', _sum('amount').over(w))
```

This creates a new column running_balance containing the cumulative sum of amount up to and including each row, ordered by date. The rowsBetween clause with Window.unboundedPreceding ensures the frame covers every preceding row, giving a running total. To compute a separate balance per entity (e.g. per account), pass that column to partitionBy(). The result can then be displayed or written to storage.
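Outside of Spark, the frame "unbounded preceding to current row" is just a prefix sum over date-ordered rows. A minimal plain-Python sketch of the same semantics, using made-up sample dates and amounts for illustration:

```python
from itertools import accumulate

# Hypothetical ledger rows: (date, amount)
rows = [("2026-03-01", 100), ("2026-03-02", -40), ("2026-03-03", 25)]

# Sort by date, then take a prefix sum of the amounts --
# the same frame as rowsBetween(unboundedPreceding, currentRow).
rows.sort(key=lambda r: r[0])
running_balance = list(accumulate(amount for _, amount in rows))
print(running_balance)  # [100, 60, 85]
```

Each element of running_balance is the sum of all amounts on or before that row's date, which is exactly what the Spark window computes per row.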