Give PySpark code to calculate the friend acceptance rate for each date when friend requests were sent. A request is sent if action = 'sent' and accepted if action = 'accepted'. If a request is not accepted, there is no record of it being accepted in the table. The output should only include dates where requests were sent and at least one of them was accepted (acceptance can occur on any date after the request is sent).
💡 Model Answer
To compute the acceptance rate per date, first read the data into a DataFrame. Filter the sent and accepted actions into separate DataFrames, count sent requests per date, and count accepted requests matched back to the date on which the request was sent. Because an acceptance may be recorded on a later date, the sent and accepted rows must be joined on the (sender, receiver) pair; an inner join then naturally limits the output to dates with at least one accepted request.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count
spark = SparkSession.builder.getOrCreate()
# Assume df has columns: user_id_sender, user_id_receiver, date, action
# Separate sent and accepted actions; alias each side so join columns can be disambiguated
sent = df.filter(col("action") == "sent").alias("sent")
accepted = df.filter(col("action") == "accepted").alias("accepted")
# Count sent requests per date
sent_counts = sent.groupBy("date").agg(count("*").alias("sent_count"))
# Join sent with accepted on the sender/receiver pair; acceptance may occur on any later date
accepted_sent = sent.join(
    accepted,
    (col("sent.user_id_sender") == col("accepted.user_id_sender"))
    & (col("sent.user_id_receiver") == col("accepted.user_id_receiver")),
    "inner",
)
# Count accepted requests per *sent* date
accepted_counts = (
    accepted_sent.select(col("sent.date").alias("date"))
    .groupBy("date")
    .agg(count("*").alias("accepted_count"))
)
# Combine counts and compute the rate; the inner join keeps only dates
# with at least one accepted request
result = sent_counts.join(accepted_counts, "date", "inner")
result = result.withColumn("acceptance_rate", col("accepted_count") / col("sent_count"))
# Select desired columns
final = result.select("date", "sent_count", "accepted_count", "acceptance_rate")
final.show()

Complexity: Filtering and aggregation are linear in the number of rows, and the join on the (sender, receiver) pair requires a shuffle. The final result contains one row per date on which at least one request was sent and later accepted, so this approach scales to the large datasets typical of Spark workloads.
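To sanity-check the logic without a running Spark session, the same computation can be sketched in plain Python on a toy dataset (the rows and dates below are invented for illustration): count sent requests per date, mark a sent request accepted if its (sender, receiver) pair ever appears with an accepted action, and keep only dates where at least one request was accepted.

```python
from collections import Counter

# Toy rows: (sender, receiver, date, action) -- illustrative data only
rows = [
    (1, 2, "2024-01-01", "sent"),
    (1, 2, "2024-01-02", "accepted"),  # accepted one day later
    (3, 4, "2024-01-01", "sent"),      # never accepted
    (5, 6, "2024-01-03", "sent"),
    (5, 6, "2024-01-03", "accepted"),  # accepted the same day
]

sent = [(s, r, d) for s, r, d, a in rows if a == "sent"]
accepted_pairs = {(s, r) for s, r, d, a in rows if a == "accepted"}

# Sent requests per date, and accepted requests attributed to the sent date
sent_counts = Counter(d for _, _, d in sent)
accepted_counts = Counter(d for s, r, d in sent if (s, r) in accepted_pairs)

# Keep only dates with at least one acceptance, mirroring the inner join
rates = {
    d: accepted_counts[d] / sent_counts[d]
    for d in sent_counts
    if accepted_counts[d] > 0
}
print(rates)  # {'2024-01-01': 0.5, '2024-01-03': 1.0}
```

On 2024-01-01 two requests were sent and one was eventually accepted (rate 0.5); 2024-01-03's single request was accepted the same day (rate 1.0); the never-accepted pair contributes to the sent count but produces no extra output date.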