Redshift Bulk Loader

Description

The Redshift Bulk Loader transform loads data from Apache Hop to AWS Redshift using the COPY command.

Make sure your target Redshift table has a layout that is compatible with the Parquet data types, e.g. use int8 instead of int4 data types.
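
As an illustration of a Parquet-compatible layout (the table and column names below are made up, not defaults used by Hop), 64-bit integer values belong in int8 (BIGINT) columns; loading them into int4 (INTEGER) columns can make the COPY fail with a type mismatch:

-- Hypothetical target table with Parquet-friendly column types.
CREATE TABLE public.sales_staging (
    sale_id   INT8,          -- BIGINT, matches 64-bit Parquet integers; avoid INT4
    amount    FLOAT8,        -- DOUBLE PRECISION
    sale_date DATE,
    customer  VARCHAR(256)
);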

Supported Engines

Hop Engine: Supported
Spark: Maybe Supported
Flink: Maybe Supported
Dataflow: Maybe Supported

The Redshift Bulk Loader is linked to the database type. It will fetch the JDBC driver from the hop/lib/jdbc folder.

General Options

Transform name: Name of the transform.

Connection: Name of the database connection on which the target table resides.

Target schema: The name of the target schema to write data to.

Target table: The name of the target table to write data to.

AWS Authentication: Choose which authentication method to use with the COPY command. Supported options are AWS Credentials and IAM Role.

Use AWS system variables: (Credentials only) Pick up the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY values from your operating system's environment variables.

AWS_ACCESS_KEY_ID: (if Credentials is selected and Use AWS system variables is unchecked) Specify a value or variable for your AWS_ACCESS_KEY_ID.

AWS_SECRET_ACCESS_KEY: (if Credentials is selected and Use AWS system variables is unchecked) Specify a value or variable for your AWS_SECRET_ACCESS_KEY.

IAM Role: (if IAM Role is selected) Specify the IAM Role to use, in the syntax arn:aws:iam::<aws-account-id>:role/<role-name>. Both authentication methods are illustrated in the sketch after this list.

Truncate table: Truncate the target table before loading data.

Truncate on first row: Truncate the target table before loading data, but only when the first data row is received. The table is not truncated when the pipeline runs with an empty stream (0 rows).

Specify database fields: Specify the mapping between the database fields and the incoming stream fields.
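
To make the two authentication options more concrete, here is a sketch of what the resulting COPY statements can look like. The bucket, file, table and placeholder values are illustrative only; the exact statement Hop generates may differ.

-- AWS Credentials: the access key pair is passed in the CREDENTIALS clause.
COPY public.sales_staging
FROM 's3://my-bucket/hop/sales_staging.csv'
CREDENTIALS 'aws_access_key_id=<AWS_ACCESS_KEY_ID>;aws_secret_access_key=<AWS_SECRET_ACCESS_KEY>'
FORMAT AS CSV;

-- IAM Role: Redshift assumes the role, no access keys are passed.
COPY public.sales_staging
FROM 's3://my-bucket/hop/sales_staging.csv'
IAM_ROLE 'arn:aws:iam::<aws-account-id>:role/<role-name>'
FORMAT AS CSV;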

Main Options

Stream to S3 CSV: Write the current pipeline stream to a CSV file in an S3 bucket before performing the COPY load.

Load from existing file: Do not stream the contents of the current pipeline, but perform the COPY load from a pre-existing file in S3. Supported formats are CSV (comma delimited) and Parquet.

Copy into Redshift from existing file: Path to the file in S3 to COPY the data from (see the sketch after this list).

Database fields: Map the current stream fields to the Redshift table's columns.
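
As a sketch of a load from a pre-existing Parquet file in S3 (the bucket, path, table and role are placeholders, not values used by Hop):

-- COPY load from an existing Parquet file in S3; for a CSV source,
-- replace FORMAT AS PARQUET with FORMAT AS CSV.
COPY public.sales_staging
FROM 's3://my-bucket/exports/sales_staging.parquet'
IAM_ROLE 'arn:aws:iam::<aws-account-id>:role/<role-name>'
FORMAT AS PARQUET;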