Count occurrences of each event type in real time, write the result to a BigQuery table, and handle connection and streaming errors.
💡 Model Answer
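Start with a Spark Structured Streaming job that reads the topic from Kafka. Kafka delivers each record's value as bytes, so cast it to a string and parse the JSON payload (for example with from_json or get_json_object) to extract the event_type field, discarding records where it is missing. Then use groupBy("event_type").count() to keep a running count per event type. With micro-batch triggers the counts are near real time, refreshed once per trigger interval.
Write the aggregated DataFrame to BigQuery with the spark-bigquery-connector inside a foreachBatch function, which lets you reuse the connector's batch writer and put your own error handling around it. A streaming aggregation without a watermark must run in complete or update output mode, and the write strategy has to match. In complete mode every micro-batch holds the full count table, so overwriting the BigQuery table keeps it correct. In update mode only the changed counts arrive, so they must be merged into the target (for example with a BigQuery MERGE keyed on event_type) rather than appended, or the table fills with duplicate rows.
Inside the foreachBatch function, wrap the write call in a try/except block; on an exception, log the error and retry a few times with a short, growing delay. If the retries are exhausted, re-raise the exception instead of swallowing it. Once the function returns normally Spark commits that batch, so a swallowed error would silently drop those results, whereas a failed query replays the batch when it is restarted. Set a checkpointLocation so that Spark can resume from the last committed Kafka offsets and aggregation state if the stream stops. Kafka connection problems that the consumer cannot recover from on its own also stop the query (query.awaitTermination() raises a StreamingQueryException), so run the job under a supervisor or restart loop that starts it again from the same checkpoint. Because foreachBatch gives at-least-once delivery, a replayed batch can be written twice; an overwrite or a MERGE keyed on event_type makes that harmless.
Together, this gives continuously updated per-event counts, persistence in BigQuery, and recovery from both transient write errors and longer outages.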
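A minimal PySpark sketch of this pipeline follows. It assumes the Kafka messages are JSON objects with an event_type field, that the spark-bigquery-connector is on the classpath, and that its indirect write path (staging through a GCS bucket) is used. It runs in complete output mode and overwrites the target table on every batch. The helper names (write_with_retry, make_batch_writer, start_event_counts), the exponential backoff, the retry counts, the 30-second trigger, and every connection value are illustrative assumptions, not a fixed API.

```python
import logging
import time

log = logging.getLogger("event_counts")


def write_with_retry(write_fn, batch_id, max_attempts=3, base_delay_s=2.0, sleep=time.sleep):
    """Call write_fn(), retrying with exponential backoff; re-raise when attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            write_fn()
            return attempt  # number of attempts the write needed
        except Exception:
            log.exception("BigQuery write failed for batch %s (attempt %d/%d)",
                          batch_id, attempt, max_attempts)
            if attempt == max_attempts:
                # Re-raise so the batch is NOT committed: the query fails and
                # replays this batch from the checkpoint after a restart.
                raise
            sleep(base_delay_s * 2 ** (attempt - 1))


def make_batch_writer(table, gcs_bucket):
    """Build a foreachBatch callback that overwrites `table` with the current counts."""
    def write_batch(batch_df, batch_id):
        def do_write():
            (batch_df.write.format("bigquery")
                .option("temporaryGcsBucket", gcs_bucket)  # indirect write staged in GCS
                .mode("overwrite")  # complete mode: each batch holds the full count table
                .save(table))
        write_with_retry(do_write, batch_id)
    return write_batch


def start_event_counts(spark, bootstrap_servers, topic, table, gcs_bucket, checkpoint_dir):
    """Start the Kafka -> per-event counts -> BigQuery query and return the StreamingQuery."""
    counts = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .load()
        # Kafka values are bytes: cast to string and extract event_type from the JSON.
        .selectExpr("get_json_object(CAST(value AS STRING), '$.event_type') AS event_type")
        .where("event_type IS NOT NULL")  # drop malformed or untyped events
        .groupBy("event_type")
        .count()
    )
    return (
        counts.writeStream
        .outputMode("complete")
        .foreachBatch(make_batch_writer(table, gcs_bucket))
        .option("checkpointLocation", checkpoint_dir)  # resume offsets and state after restart
        .trigger(processingTime="30 seconds")
        .start()
    )
```

A possible call, with placeholder values: query = start_event_counts(spark, "broker:9092", "events", "my_dataset.event_counts", "my-staging-bucket", "gs://my-bucket/checkpoints/event_counts"), followed by query.awaitTermination(). If awaitTermination() raises, making the same call again with the same checkpoint directory resumes from the last committed batch. Complete mode keeps every event type in state, which suits a small set of event types; a large or unbounded key space would call for update mode with a MERGE instead.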
This answer was generated by AI for study purposes. Use it as a starting point — personalize it with your own experience.