I have some CSV files located in an S3 bucket. I want to convert all CSV files to Parquet to save cost and improve performance. What approach should I use?

🟡 Medium Conceptual Junior level
Times asked: 1
Last seen: Jul 2026
First seen: Jul 2026

💡 Model Answer

To convert CSV files in S3 to Parquet, two common options are an AWS Glue ETL job or a Spark job on EMR.

With Glue, you can optionally run a crawler to catalog the CSV files as tables, then write a Glue ETL script that reads the CSV data, applies either an inferred or a predefined schema, and writes the result back to S3 in Parquet format. In the job you choose the partition keys and the compression codec (e.g., Snappy), and the job can run on a schedule or from a trigger.

Alternatively, launch an EMR cluster with Spark and run a simple Spark job. No mounting step is needed, because Spark on EMR reads and writes s3:// paths directly: val df = spark.read.option("header", "true").csv("s3://bucket/input/"); df.write.mode("overwrite").parquet("s3://bucket/output/"). Without an explicit schema or the inferSchema option, Spark reads every CSV column as a string, so supply a schema if you want typed Parquet columns. You can also partition the output by a key (partitionBy) so that queries filtering on that key scan fewer files.

Both approaches typically reduce storage costs, because Parquet is columnar and compressed, and speed up downstream analytics in engines such as Athena and Redshift Spectrum, which read only the columns a query needs and bill by the amount of data scanned.
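The Spark route described above can be sketched as a small standalone job. This is a minimal illustration, not a production pipeline: the bucket paths are the placeholders from the answer, and the column names (order_id, order_date, amount), the schema, and the partition key are hypothetical and would need to match the real CSV files.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

object CsvToParquet {
  def main(args: Array[String]): Unit = {
    // Placeholder locations taken from the answer; replace with real S3 prefixes.
    val inputPath  = "s3://bucket/input/"
    val outputPath = "s3://bucket/output/"

    val spark = SparkSession.builder().appName("csv-to-parquet").getOrCreate()

    // Hypothetical schema (assumption): declaring types up front avoids
    // every column being read as a string and skips an inference pass.
    val schema = StructType(Seq(
      StructField("order_id", StringType, nullable = true),
      StructField("order_date", StringType, nullable = true),
      StructField("amount", DoubleType, nullable = true)
    ))

    // Read all CSV files under the input prefix, treating the first row as a header.
    val df = spark.read
      .option("header", "true")
      .schema(schema)
      .csv(inputPath)

    // Write Snappy-compressed Parquet, one directory per order_date value.
    // Overwrite replaces whatever already exists under outputPath.
    df.write
      .mode(SaveMode.Overwrite)
      .option("compression", "snappy")
      .partitionBy("order_date")
      .parquet(outputPath)

    spark.stop()
  }
}
```

Snappy is already Spark's default Parquet codec, so the compression option is set here only to make the choice explicit. A partition key with few distinct values (such as a date) is usually a safer choice than a high-cardinality one, which would produce many small files.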

This answer was generated by AI for study purposes. Use it as a starting point — personalize it with your own experience.
