Write a Python function using Apache Spark Structured Streaming that reads messages from a Kafka topic.
1Times asked
Jun 2026Last seen
Jun 2026First seen
💡 Model Answer
Define a function read_kafka_stream(spark, topic, bootstrap_servers) that returns a streaming DataFrame. Inside, use spark.readStream.format("kafka") with options kafka.bootstrap.servers, subscribe, and startingOffsets="latest". Cast the binary value column to string and optionally parse JSON. Return the DataFrame so it can be further transformed or written. Example:
def read_kafka_stream(spark, topic, bootstrap_servers):
df = (spark.readStream
.format("kafka")
.option("kafka.bootstrap.servers", bootstrap_servers)
.option("subscribe", topic)
.option("startingOffsets", "latest")
.load())
return df.selectExpr("CAST(value AS STRING) as json")This function can be used in a larger streaming pipeline.
This answer was generated by AI for study purposes. Use it as a starting point — personalize it with your own experience.
🎤 Get questions like this answered in real-time
Assisting AI listens to your interview, captures questions live, and gives you instant AI-powered answers — invisible to screen sharing.
Get Assisting AI — Starts at ₹500