
Is AWS Glue used for real-time or batch data processing, and is the same Glue service used for both batch and streaming workloads?

🟡 Medium Conceptual Mid level
Times asked: 1
Last seen: Jun 2026
First seen: Jun 2026

💡 Model Answer

Yes. AWS Glue is a fully managed, serverless extract, transform, load (ETL) service, and the same service handles both batch and streaming workloads.

For batch workloads, you create Glue jobs that run on demand, on a schedule, or in response to an event. They read from sources such as S3, RDS, or Redshift, transform the data with Apache Spark (PySpark or Scala) or a plain Python shell job, and write the results to a target.

For near-real-time processing, Glue offers streaming ETL jobs built on Apache Spark Structured Streaming. A streaming job runs continuously, consumes records from Kinesis Data Streams, Apache Kafka, or Amazon MSK, processes them in micro-batches, and writes to destinations like S3, Redshift, or DynamoDB. Because of the micro-batch model, latency depends on the configured batch window (typically seconds to minutes), so it is near-real-time rather than strictly real-time.

Batch and streaming jobs differ in job type and in how they run (triggered and finite versus long-running), not in the underlying engine: both Spark job types run on Glue's Spark runtime. The key advantage is the shared Glue Data Catalog, which stores metadata for all datasets and keeps schema management consistent across batch and streaming pipelines. In practice, you might use Glue batch jobs for nightly ETL and a Glue streaming job for near-real-time analytics, sharing the same catalog and IAM roles but configuring separate job definitions.
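To make the "same service, different job type" point concrete, here is a minimal sketch of how the two job definitions might differ. It only builds the keyword arguments that boto3's `glue.create_job` accepts and sends nothing to AWS. The command names `glueetl` and `gluestreaming` are the values Glue uses for Spark batch and Spark streaming jobs; the helper `build_job_request`, the job names, the role ARN, the S3 script paths, and the worker settings are all hypothetical placeholders.

```python
from typing import Any, Dict

# Glue job command names: "glueetl" is a Spark batch job,
# "gluestreaming" is a Spark Structured Streaming job.
COMMAND_BY_MODE = {"batch": "glueetl", "streaming": "gluestreaming"}


def build_job_request(name: str, mode: str, script_s3_uri: str, role_arn: str) -> Dict[str, Any]:
    """Build kwargs for boto3's glue.create_job (hypothetical helper; no AWS call is made)."""
    if mode not in COMMAND_BY_MODE:
        raise ValueError(f"mode must be one of {sorted(COMMAND_BY_MODE)}")
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": COMMAND_BY_MODE[mode],   # the only field that separates batch from streaming here
            "ScriptLocation": script_s3_uri,
            "PythonVersion": "3",
        },
        # Assumed sizing for illustration; tune for a real workload.
        "GlueVersion": "4.0",
        "WorkerType": "G.1X",
        "NumberOfWorkers": 2,
    }


# Hypothetical role and script locations, shared or parallel across both jobs.
ROLE = "arn:aws:iam::123456789012:role/example-glue-role"

nightly = build_job_request("nightly-orders-etl", "batch", "s3://example-bucket/scripts/nightly.py", ROLE)
clicks = build_job_request("clickstream-etl", "streaming", "s3://example-bucket/scripts/stream.py", ROLE)

print(nightly["Command"]["Name"])  # glueetl
print(clicks["Command"]["Name"])   # gluestreaming
```

With credentials configured, either dictionary could be passed as `boto3.client("glue").create_job(**nightly)`. The batch job would then usually be attached to a scheduled or on-demand trigger, while the streaming job is started once and left running.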

This answer was generated by AI for study purposes. Use it as a starting point — personalize it with your own experience.
