HomeInterview QuestionsWrite a Python function using Apache Spark Structu…

Write a Python function using Apache Spark Structured Streaming that reads messages from a Kafka topic.

🟡 Medium Coding Junior level
1Times asked
Jun 2026Last seen
Jun 2026First seen

💡 Model Answer

Define a function read_kafka_stream(spark, topic, bootstrap_servers) that returns a streaming DataFrame. Inside, use spark.readStream.format("kafka") with options kafka.bootstrap.servers, subscribe, and startingOffsets="latest". Cast the binary value column to string and optionally parse JSON. Return the DataFrame so it can be further transformed or written. Example:

def read_kafka_stream(spark, topic, bootstrap_servers):
    df = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", bootstrap_servers)
          .option("subscribe", topic)
          .option("startingOffsets", "latest")
          .load())
    return df.selectExpr("CAST(value AS STRING) as json")

This function can be used in a larger streaming pipeline.

This answer was generated by AI for study purposes. Use it as a starting point — personalize it with your own experience.

🎤 Get questions like this answered in real-time

Assisting AI listens to your interview, captures questions live, and gives you instant AI-powered answers — invisible to screen sharing.

Get Assisting AI — Starts at ₹500