Explain the COPY command in Amazon Redshift.

Question

Assisting AI · Accepted Answer

The COPY command is Amazon Redshift's bulk data loading utility. It imports data from external sources such as Amazon S3, DynamoDB, Amazon EMR, or remote hosts over SSH into a Redshift table. It is designed for high throughput: the slices on the compute nodes read the source in parallel, and the loaded rows are distributed according to the table's distribution style. Key features include:

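1. Parallel loading: Redshift reads multiple files simultaneously, so splitting the input into several similarly sized files (ideally a multiple of the number of slices) keeps every slice busy.
2. Format support: CSV and other delimited text, fixed-width, JSON, Avro, Parquet, and ORC.
3. Compression handling: Reads gzip, bzip2, lzop, and Zstandard files when the matching option (GZIP, BZIP2, LZOP, or ZSTD) is specified.
4. Security: Authenticates with an IAM role or temporary credentials.
5. Error handling: MAXERROR sets how many bad rows are tolerated before the load fails, TRUNCATECOLUMNS truncates strings that exceed the column width, and ACCEPTINVCHARS replaces invalid UTF-8 characters.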

Typical syntax:
COPY table_name
FROM 's3://bucket/path'
IAM_ROLE 'arn:aws:iam::account-id:role/RedshiftCopyRole'
FORMAT AS CSV
DELIMITER ','
IGNOREHEADER 1;
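
The compression and error-handling options from the feature list can be combined with the basic syntax. The following is a minimal sketch, not a definitive recipe: the table name `sales`, the bucket, the `part-` key prefix, and the account ID in the role ARN are all hypothetical placeholders.

```sql
-- Hypothetical table, bucket, key prefix, and role ARN; substitute your own.
-- The FROM value is a key prefix, so every object whose key starts with
-- 'sales/part-' is loaded, and the files are read in parallel.
COPY sales
FROM 's3://example-bucket/sales/part-'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS CSV
IGNOREHEADER 1          -- skip the header row of each file
GZIP                    -- the input files are gzip-compressed
MAXERROR 10             -- tolerate up to 10 bad rows before failing the load
TRUNCATECOLUMNS         -- truncate strings that exceed the column width
ACCEPTINVCHARS AS '?';  -- replace invalid UTF-8 characters with '?'
```

Tolerant settings such as MAXERROR and TRUNCATECOLUMNS trade strictness for a load that completes, so they are best chosen with an idea of how much bad data is acceptable.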

The command is preferred over INSERT for large datasets because the compute nodes load directly from the source in parallel instead of receiving rows statement by statement through the leader node, and because it provides built-in error handling.
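
When a load fails or skips rows, the rejected records can usually be inspected after the fact. This sketch assumes a provisioned cluster where the STL_LOAD_ERRORS system table is available; the LIMIT of 10 is an arbitrary choice for illustration.

```sql
-- Show the most recent load errors: which file and line failed, and why.
SELECT starttime,
       filename,
       line_number,
       colname,
       err_code,
       err_reason
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 10;
```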
