Currently data is present on S3; how would you move it into Redshift?

Question

Assisting AI · Accepted Answer

I would use the COPY command: it reads files from S3 in parallel across the cluster's slices, which makes it much faster than row-by-row INSERT statements for bulk loads. The steps are:
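1. Stage the files in an S3 bucket and ensure they are in a format COPY supports (for example CSV, JSON, Avro, Parquet, or ORC). Text files such as CSV and JSON can be compressed with gzip, bzip2, lzop, or Zstandard; Parquet and ORC files carry their own internal compression (commonly Snappy), so no compression option is passed to COPY for them.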
2. Create an IAM role that grants Redshift read access to the bucket.
3. In Redshift, create a table that matches the schema of the data.
4. Run COPY with the appropriate options: `COPY table FROM 's3://bucket/path/' IAM_ROLE 'arn:aws:iam::account:role/RedshiftCopy' FORMAT AS CSV DELIMITER ',' IGNOREHEADER 1;`
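5. Monitor the load using Redshift system tables such as STL_LOAD_COMMITS and STL_LOAD_ERRORS. If the load is slow, split the input into multiple similarly sized files (ideally a multiple of the number of slices in the cluster) so that COPY can load them in parallel.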
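Steps 4 and 5 could be scripted roughly as follows. This is a minimal sketch, not a definitive implementation: the helper names `build_copy_sql` and `submit_sql`, and the cluster, database, and user arguments, are hypothetical. It assumes a provisioned cluster reachable through the Redshift Data API via `boto3`, and it does no escaping of its inputs, so it should only be fed trusted values.

```python
def build_copy_sql(table, s3_prefix, iam_role_arn, ignore_header=1, gzip=False):
    """Assemble a COPY statement for CSV files under an S3 prefix."""
    parts = [
        f"COPY {table}",
        f"FROM '{s3_prefix}'",
        f"IAM_ROLE '{iam_role_arn}'",
        "FORMAT AS CSV",
        f"IGNOREHEADER {ignore_header}",
    ]
    if gzip:
        # Only for gzip-compressed text files; not used with Parquet/ORC.
        parts.append("GZIP")
    return "\n".join(parts) + ";"


# Query for step 5: the most recent load errors, newest first.
RECENT_LOAD_ERRORS_SQL = (
    "SELECT query, filename, line_number, colname, err_reason "
    "FROM stl_load_errors ORDER BY starttime DESC LIMIT 10;"
)


def submit_sql(sql, cluster_id, database, db_user):
    """Submit a statement through the Redshift Data API (asynchronous)."""
    import boto3  # imported lazily so the builder works without boto3 installed

    client = boto3.client("redshift-data")
    response = client.execute_statement(
        ClusterIdentifier=cluster_id,  # hypothetical cluster name
        Database=database,
        DbUser=db_user,
        Sql=sql,
    )
    # The returned Id can be passed to describe_statement to poll for completion.
    return response["Id"]
```

A call such as `submit_sql(build_copy_sql("sales", "s3://bucket/path/", role_arn), "my-cluster", "dev", "loader")` would start the load; running `RECENT_LOAD_ERRORS_SQL` afterwards shows any rows COPY rejected and why.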
If the data requires transformation, I would first run an AWS Glue job to clean and transform the data, write the output back to S3, and then use COPY to load the transformed files.
