Amazon Redshift is a fast cloud data warehouse from AWS. It lets you run large analytical workloads with high performance and efficiency, so you can scale your data without worrying about the cost of running queries on it.
Setting up a Redshift cluster
Before adding Redshift as a destination in RudderStack, it is recommended that you create a new Redshift cluster depending on the type of instance needed.
The following sections contain step-by-step instructions on setting up a Redshift cluster.
Choosing the Redshift instance type
Amazon Redshift provides two types of clusters: Dense Compute and Dense Storage clusters.
- Dense Compute clusters maximize CPU usage, resulting in an increased query performance. However, there is a trade-off with respect to the storage.
- Dense Storage clusters maximize storage for customers with hundreds of millions of rows of data. However, there is a trade-off in the CPU usage, resulting in a lower query performance.
Creating a new Redshift cluster
Follow the steps below to create a new Redshift cluster:
- Open the Redshift Console.
- Click the Create Cluster option.
- Enter the cluster details. First, fill in the Cluster identifier and choose the instance type.
- Enter the number of nodes for your cluster. This will primarily depend on the amount of data you expect to work with.
- Enter the database name, and create the admin user with the name of your choice.
- Finish creating the cluster by allowing the default options for Additional Configurations.
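If you prefer to script the cluster creation instead of using the console, the steps above map roughly onto a single API request. The following is a minimal sketch, assuming the boto3 Redshift client and its `create_cluster` parameters; the helper name `build_create_cluster_params` and all values (identifier, node type, credentials) are placeholders for illustration, not RudderStack requirements.

```python
def build_create_cluster_params(cluster_id, node_type, number_of_nodes,
                                db_name, admin_user, admin_password):
    """Assemble keyword arguments for a Redshift create-cluster request."""
    if number_of_nodes < 1:
        raise ValueError("number_of_nodes must be at least 1")
    params = {
        "ClusterIdentifier": cluster_id,
        "NodeType": node_type,
        "DBName": db_name,
        "MasterUsername": admin_user,
        "MasterUserPassword": admin_password,
    }
    if number_of_nodes == 1:
        # A single-node cluster does not take a node count.
        params["ClusterType"] = "single-node"
    else:
        params["ClusterType"] = "multi-node"
        params["NumberOfNodes"] = number_of_nodes
    return params


# Placeholder values; choose your own identifier, node type, and credentials.
params = build_create_cluster_params(
    cluster_id="rudder-warehouse",
    node_type="dc2.large",
    number_of_nodes=2,
    db_name="rudder_events",
    admin_user="admin_user",
    admin_password="<admin password goes here>",
)

# With boto3 installed and AWS credentials configured, this could be sent as:
#   boto3.client("redshift").create_cluster(**params)
```

Keeping the request as a plain dictionary makes it easy to review the settings before anything is created in your AWS account.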
With the Redshift cluster created and ready to use, the next sections cover how to set up the required user permissions and configure Redshift as a destination in RudderStack.
Setting user permissions in Redshift
This section contains the steps to create a new user to access the Redshift cluster and create temporary tables in it.
- Click the Editor option in the left pane to open the Query editor, where you can run the queries that create the new user.
- The queries to create a new user are listed below:
-- create a user named "rudder" that RudderStack can use to access Redshift
CREATE USER rudder PASSWORD '<password goes here>';
-- grant schema creation permission to the "rudder" user on the database you chose earlier
GRANT CREATE ON DATABASE "<database name goes here>" TO "rudder";
- Log into the Redshift cluster with the newly created user credentials.
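If you generate these statements from a script, for example to provision several environments, a small renderer keeps the quoting consistent. This is a minimal sketch; `render_user_setup_sql` is a hypothetical helper, and it simply rejects quote characters rather than attempting to escape them.

```python
def render_user_setup_sql(user, password, database):
    """Return the two statements that create the user and grant CREATE."""
    for identifier in (user, database):
        if '"' in identifier:
            raise ValueError("identifiers must not contain double quotes")
    if "'" in password:
        raise ValueError("password must not contain single quotes")
    return [
        f'CREATE USER "{user}" PASSWORD \'{password}\';',
        f'GRANT CREATE ON DATABASE "{database}" TO "{user}";',
    ]


# Placeholder values for illustration only.
for statement in render_user_setup_sql("rudder", "<password goes here>", "rudder_events"):
    print(statement)
```

Run the printed statements in the Query editor, or pass them to the SQL client of your choice.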
Setting up network and security access
IPs to be allowlisted
To enable network access to RudderStack, allowlist the following RudderStack IPs depending on your region and RudderStack Cloud plan:
Plan | ||
---|---|---|
Free, Starter, and Growth |
|
|
Enterprise |
|
|
Adding a security group
Follow these steps to add a security group and assign it to your Redshift cluster:
- In your AWS console, go to EC2 from the list of services.
- Go to Security Groups under Network & Security, followed by Create Security Group.
- Enter the details of the security group. The Security group name will be used to select the group later.
- Add an Inbound rule with the IPs listed above, and enter the Redshift port 5439 in the Port range field.
- Next, go to the Redshift cluster and select Properties, where you can modify the network and security rules of the cluster.
- Edit the Network and security option and choose the VPC security group that you created earlier.
- Finally, click Modify cluster to finish the Network and Security setup.
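The inbound rule from the steps above can also be expressed as data. The following is a minimal sketch, assuming the `IpPermissions` structure accepted by boto3's EC2 `authorize_security_group_ingress` call; the helper name and the address used here are placeholders from a documentation-only range, not actual RudderStack IPs.

```python
import ipaddress

REDSHIFT_PORT = 5439


def build_ingress_permissions(cidrs, port=REDSHIFT_PORT):
    """Build an IpPermissions list allowing TCP traffic on the Redshift port."""
    ranges = []
    for cidr in cidrs:
        # Raises ValueError for malformed addresses or networks with host bits set.
        network = ipaddress.IPv4Network(cidr)
        ranges.append({"CidrIp": str(network), "Description": "RudderStack"})
    return [{
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": ranges,
    }]


# Placeholder address; substitute the IPs listed for your region and plan.
permissions = build_ingress_permissions(["203.0.113.10/32"])

# With boto3 installed and AWS credentials configured, this could be applied as:
#   boto3.client("ec2").authorize_security_group_ingress(
#       GroupId="<security group id>", IpPermissions=permissions)
```

Validating each entry with `ipaddress` catches typos before the rule reaches AWS.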
Configuring Redshift destination in RudderStack
To send event data to Redshift, you first need to add it as a destination in RudderStack and connect it to your data source. Once the destination is enabled, events will automatically start flowing to Redshift via RudderStack.
To configure Redshift as a destination in RudderStack, follow these steps:
- In your RudderStack dashboard, set up the data source. Then, select Redshift from the list of destinations.
- Assign a name to your destination and then click Next.
Connection settings
- Host: The host name of your Redshift service.
- Port: The port number associated with the Redshift database instance.
- Database: The database name in your Redshift instance where the data will be sent.
- User: The name of the user with the required read/write access to the above database.
- Password: The password for the above user.
- Namespace: Enter the schema name where RudderStack will create all the tables. If you don't specify a namespace, RudderStack sets it to the source name by default.
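To keep these settings together in your own tooling, you could model them as a small record. This is an illustrative sketch only: the class and field names are hypothetical, the host is a placeholder, and any normalization RudderStack applies to the source name when deriving the default namespace is not modeled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RedshiftConnectionSettings:
    host: str
    database: str
    user: str
    password: str
    port: int = 5439                  # the Redshift port used earlier on this page
    namespace: Optional[str] = None   # schema for RudderStack tables, if set

    def effective_namespace(self, source_name: str) -> str:
        """Fall back to the source name when no namespace is configured."""
        return self.namespace or source_name


settings = RedshiftConnectionSettings(
    host="<cluster endpoint goes here>",
    database="rudder_events",
    user="rudder",
    password="<password goes here>",
)
print(settings.effective_namespace("web_app"))
```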
SSH connection
SSH tunneling is a method of transferring data over an encrypted SSH connection. You can use it to add encryption to your legacy applications and achieve compliance with regulations like HIPAA, PCI-DSS, etc., without having to modify the existing applications.
RudderStack lets you connect to your Redshift database securely over an SSH connection by configuring these settings:
- SSH Connection: Enable this setting to use the SSH connection while connecting to your Redshift database.
- SSH Host: Enter the IP address of your bastion host.
- SSH Port: Enter the port for the above host.
- SSH User: Enter the username you use to access the bastion host.
- SSH Public Key: Copy the public key provided in this field and add it to the authorized_keys file on your bastion host. RudderStack will use the private key corresponding to this public key to establish the connection successfully.
To enable the SSH connection for an existing Redshift destination, navigate to the destination's Configuration tab, select Edit configuration and enable the SSH connection setting.
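Adding the public key to the bastion host can be scripted. The following is a minimal sketch, assuming a standard OpenSSH layout in which the key goes on its own line in `authorized_keys`; the function name and the key value are placeholders, and the demo writes to a temporary directory rather than a real `~/.ssh`.

```python
import tempfile
from pathlib import Path


def add_authorized_key(public_key: str, authorized_keys: Path) -> bool:
    """Append the key if it is not already present. Returns True when added."""
    key = public_key.strip()
    text = authorized_keys.read_text() if authorized_keys.exists() else ""
    if key in (line.strip() for line in text.splitlines()):
        return False
    authorized_keys.parent.mkdir(parents=True, exist_ok=True)
    with authorized_keys.open("a") as handle:
        if text and not text.endswith("\n"):
            handle.write("\n")  # keep the new key on its own line
        handle.write(key + "\n")
    authorized_keys.chmod(0o600)  # restrict the file to its owner
    return True


# Demo against a temporary directory standing in for the SSH user's home.
with tempfile.TemporaryDirectory() as home:
    target = Path(home) / ".ssh" / "authorized_keys"
    placeholder_key = "ssh-rsa AAAAplaceholder rudderstack"
    print(add_authorized_key(placeholder_key, target))  # first call adds the key
    print(add_authorized_key(placeholder_key, target))  # second call is a no-op
```

On the bastion host itself, you would point the function at the `authorized_keys` file of the SSH user entered in the settings above.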
Sync settings
- Sync Frequency: Specify how often RudderStack should sync the data to your Redshift database.
- Sync Starting At: This optional setting lets you specify the particular time of the day (in UTC) when you want RudderStack to sync the data to the warehouse.
- Exclude Window: This optional setting lets you set a time window when RudderStack will not sync the data to your database.
- JSON Columns: Use this optional setting to specify the required JSON column paths in dot notation, separated by commas. This option applies to all the incoming track events for this destination.
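To illustrate the format of the JSON Columns setting, the sketch below splits a comma-separated value into dot-notation paths and resolves each path against a sample event. The paths, the event, and both helper functions are hypothetical; which part of the track event RudderStack resolves the paths against is an assumption here, so check the behavior against your own data.

```python
from typing import Any, Dict, List


def parse_json_column_paths(setting: str) -> List[List[str]]:
    """Split 'a.b, c.d' into [['a', 'b'], ['c', 'd']], skipping empty entries."""
    paths = []
    for raw in setting.split(","):
        path = raw.strip()
        if path:
            paths.append(path.split("."))
    return paths


def resolve_path(payload: Dict[str, Any], path: List[str]) -> Any:
    """Follow a path through nested dictionaries; None when any key is missing."""
    current: Any = payload
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Hypothetical setting value and event payload.
paths = parse_json_column_paths("cart.items, shipping.address")
event_properties = {"cart": {"items": [{"sku": "A1"}]}, "shipping": {}}
for path in paths:
    print(".".join(path), "->", resolve_path(event_properties, path))
```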
Configuring the object storage
RudderStack lets you configure the following object storage settings while setting up your Redshift destination:
- Use RudderStack-managed object storage: Enable this setting to use RudderStack-managed buckets for object storage.
- Staging S3 Storage Bucket Name: If Use RudderStack-managed object storage is disabled in the dashboard, enter the relevant Amazon S3 bucket storage settings.
S3 Permissions
As an alternative to providing AWS credentials for S3 access, you can set up permissions for your bucket as described in the following sections:
RudderStack-hosted data plane
You need to edit your bucket policy to allow RudderStack to write to your bucket with the following JSON:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::422074288268:user/s3-copy"
      },
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:PutObjectAcl",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::YOUR_BUCKET_NAME/*",
        "arn:aws:s3:::YOUR_BUCKET_NAME"
      ]
    }
  ]
}
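If you manage several buckets, you may prefer to generate this policy rather than edit the bucket name by hand. The following is a minimal sketch that reproduces the policy above for a given bucket and principal; `build_bucket_policy` is a hypothetical helper and the bucket name is a placeholder.

```python
import json


def build_bucket_policy(bucket_name, principal_arn):
    """Return the bucket policy document for one bucket and one principal."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:PutObjectAcl",
                    "s3:ListBucket",
                ],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}/*",
                    f"arn:aws:s3:::{bucket_name}",
                ],
            }
        ],
    }


# Placeholder bucket name; the principal is the one given in the policy above.
policy = build_bucket_policy(
    "my-staging-bucket",
    "arn:aws:iam::422074288268:user/s3-copy",
)
print(json.dumps(policy, indent=2))
```

The same helper can produce the self-hosted variant by passing the ARN of your own IAM user as the principal.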
Self-hosted data plane
- Create an IAM policy with the following JSON:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "*",
      "Resource": "arn:aws:s3:::*"
    }
  ]
}
- Create an IAM user with programmatic access keys and attach the above created IAM policy. Copy the ARN of this user.
- Edit your bucket policy to allow the data plane to write to your bucket with the following JSON. Make sure to replace the account ID and user ARN placeholders with your AWS account ID and the ARN of the user created above:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::ACCOUNT_ID:user/USER_ARN"
      },
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:PutObjectAcl",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::YOUR_BUCKET_NAME/*",
        "arn:aws:s3:::YOUR_BUCKET_NAME"
      ]
    }
  ]
}
- Finally, add the programmatic access credentials to the environment of your data plane, as shown:
RUDDER_AWS_S3_COPY_USER_ACCESS_KEY_ID=<above created user access key>
RUDDER_AWS_S3_COPY_USER_ACCESS_KEY=<above created user access key secret>
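Before starting a self-hosted data plane, it can help to confirm that both variables are actually present in its environment. This is a minimal sketch of such a pre-flight check; the function name is hypothetical and the check is not part of RudderStack itself.

```python
import os

REQUIRED_VARIABLES = (
    "RUDDER_AWS_S3_COPY_USER_ACCESS_KEY_ID",
    "RUDDER_AWS_S3_COPY_USER_ACCESS_KEY",
)


def missing_s3_copy_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARIABLES if not env.get(name)]


missing = missing_s3_copy_credentials()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("S3 copy credentials are set.")
```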
Column compression encoding
Compression encoding specifies the type of compression applied to a column of data values as rows are added to a table.
If not specified, Redshift automatically assigns compression encoding. RudderStack explicitly sets the runlength encoding for Boolean columns.
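To show what an explicit encoding looks like in a table definition, the sketch below renders a `CREATE TABLE` statement that applies `ENCODE RUNLENGTH` to Boolean columns and leaves every other column to Redshift's automatic choice. It illustrates the Redshift column syntax only; the helper, table, and column names are hypothetical, and this is not how RudderStack generates its own DDL.

```python
def render_create_table(schema, table, columns):
    """Render DDL from (name, type) pairs, encoding Boolean columns as RUNLENGTH."""
    definitions = []
    for name, column_type in columns:
        definition = f'"{name}" {column_type}'
        if column_type.upper() in ("BOOLEAN", "BOOL"):
            definition += " ENCODE RUNLENGTH"
        definitions.append(definition)
    body = ",\n  ".join(definitions)
    return f'CREATE TABLE "{schema}"."{table}" (\n  {body}\n);'


# Hypothetical schema, table, and columns.
ddl = render_create_table(
    "web_app",
    "tracks",
    [("id", "VARCHAR(64)"), ("is_active", "BOOLEAN"), ("sent_at", "TIMESTAMP")],
)
print(ddl)
```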
FAQ
How are reserved words handled by RudderStack?
There are some limitations when it comes to using reserved words in schema, table, or column names. If such words are used in event names, traits, or properties, they are prefixed with a _ when RudderStack creates tables or columns for them in your schema.
In addition, a schema or table name cannot start with an integer, so such schema, column, or table names are also prefixed with a _.
For instance, '25dollarpurchase' will be changed to '_25dollarpurchase'.
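The renaming rule described above can be sketched in a few lines. This is an approximation for illustration: `safe_identifier` is a hypothetical helper, and `RESERVED_WORDS` is a small illustrative subset, not the full list of words that Redshift reserves or that RudderStack checks.

```python
# Illustrative subset only; the full reserved word list is much longer.
RESERVED_WORDS = {"select", "table", "user", "order", "group"}


def safe_identifier(name: str) -> str:
    """Prefix names that are reserved words or that start with a digit."""
    if name.lower() in RESERVED_WORDS or name[:1].isdigit():
        return "_" + name
    return name


for original in ("25dollarpurchase", "user", "purchase"):
    print(original, "->", safe_identifier(original))
```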
Contact us
For more information on the topics covered on this page, email us or start a conversation in our Slack community.