This guide explains RudderStack’s data retention policy and your options for opting in or out of data storage.
By default, RudderStack does not store your event data. However, you may sometimes need to access or replay recent event data. Refer to the Event Replay guide for more details on this feature.
RudderStack provides 3 options for retaining your event data:
- Do not store event data
- Store in your own cloud storage (Recommended)
- Store in RudderStack Cloud on a rolling 30-day basis
The following sections define different types of RudderStack data and provide steps on opting in to the right setup for your needs.
Data definitions
RudderStack does not permanently store any customer data except the following:
- Aggregate “Count” data on Event Name, Event Type, Source ID, Destination ID.
- Error codes.
- Customer user records (for example, usernames, billing-related details).
All other customer data is classified as either transient or non-transient. It may be stored either in your own location (for example, AWS) or by RudderStack for up to 30 days.
Transient customer data
Transient customer data can be defined as follows:
- All data that is in transit, that is, stored for less than 3 hours, as an essential part of delivering the RudderStack product experience.
- Data plane: Events that hit the RudderStack gateway. Refer to the data plane architecture for more details.
- Control plane: The in-transit data captured in the Live Events tab of the RudderStack dashboard.
Non-transient customer data
Non-transient customer data can be defined as follows:
- Data that can persist for more than 3 hours only if configured by the RudderStack user.
- Data plane: Processing errors, gateway dumps.
- Control plane: Data in the reporting service (sample events, sample responses).
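The 3-hour threshold that separates the two categories can be sketched as follows. This only illustrates the definitions above and is not RudderStack code: the names `TRANSIENT_LIMIT` and `classify_customer_data` are hypothetical, and treating data held for exactly 3 hours as non-transient is an assumption, since the definitions do not cover that boundary.

```python
from datetime import timedelta

# Threshold taken from the definitions above; the constant name is hypothetical.
TRANSIENT_LIMIT = timedelta(hours=3)


def classify_customer_data(stored_for: timedelta) -> str:
    """Classify data as transient or non-transient by how long it is stored."""
    if stored_for < timedelta(0):
        raise ValueError("stored_for must not be negative")
    # Assumption: data stored for exactly 3 hours counts as non-transient.
    return "transient" if stored_for < TRANSIENT_LIMIT else "non-transient"


print(classify_customer_data(timedelta(minutes=20)))  # transient
print(classify_customer_data(timedelta(days=2)))      # non-transient
```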
Data retention options
RudderStack provides 3 options for your event data storage. To choose how you want to store the event data, follow these steps:
- Log in to your RudderStack dashboard.
- Go to Settings > Data Management.
- Choose one of the 3 data storage options in the Data retention section.
The following sections explain the data retention options in detail.
1. Do not store event data
If you choose this option, RudderStack will not store any of your event data. This is the default setting.
2. Store event data in your own cloud storage (Recommended)
This is the recommended event storage option and is available in the Starter, Growth, and Enterprise plans. Selecting this option opens a modal that lets you connect a storage bucket for your RudderStack data.
When connecting your cloud storage provider to RudderStack, you will first need to create a storage bucket and configure the credentials for RudderStack to access the datastore. Follow the steps listed below depending on your cloud provider:
- Object storage bucket:
  - Create your object storage bucket.
  - Configure the relevant permissions for your bucket.
  - Connect your storage provider in the RudderStack dashboard.
- Azure:
  - Log in to the Azure portal and create a storage account.
  - Click Containers under Blob service and create a new container.
  - Connect your storage provider in the RudderStack dashboard.
- MinIO:
  - Log in to your MinIO service and set up your bucket.
  - Connect your storage provider in the RudderStack dashboard.
3. Store event data in RudderStack cloud storage
Choosing this option allows RudderStack to store and delete your event data on a rolling 30-day basis.
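As a rough illustration of what a rolling 30-day basis means, the sketch below checks whether an event stored at a given time would still fall inside the window. The exact deletion schedule is not described here, so this is an assumption about the behavior rather than RudderStack's implementation; `RETENTION_WINDOW` and `is_retained` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# Window length from the text above; the constant name is hypothetical.
RETENTION_WINDOW = timedelta(days=30)


def is_retained(stored_at: datetime, now: datetime) -> bool:
    """Return True if data stored at `stored_at` is still inside the rolling window."""
    # Assumption: data is dropped once it is 30 days old or older.
    return now - stored_at < RETENTION_WINDOW


now = datetime(2024, 3, 31, tzinfo=timezone.utc)
print(is_retained(datetime(2024, 3, 10, tzinfo=timezone.utc), now))  # True (21 days old)
print(is_retained(datetime(2024, 2, 20, tzinfo=timezone.utc), now))  # False (40 days old)
```

Because the window rolls, the cutoff moves forward with the current time, so each day the oldest day of stored data falls out of retention.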
Sample event data
When the Sample event data setting is enabled, RudderStack stores and deletes sample events and responses on a rolling 30-day basis. This data may be helpful for debugging your events.
Plan-based options
Based on your plan, RudderStack provides different options for event storage, giving you the ability to enable or disable retention for the following kinds of data:
- Sample events and responses: As mentioned above, RudderStack will store and delete sample events and responses on a rolling 30-day basis.
- Processing errors: These correspond to the events that are rejected at various stages of the data pipeline, including errors from user transformation, destination transformation (internal to RudderStack), and events rejected by the destination after three hours of retry attempts.
- Gateway dumps: These correspond to the raw data for every successfully ingested event.
The following table lists the storage items supported by each RudderStack pricing plan:
Storage options | Free tier | Starter/Growth | Enterprise |
---|---|---|---|
Sample events/responses | ✅ | ✅ | ✅ |
Processing errors | ❌ | ✅ | ✅ |
Gateway dumps | ❌ | ❌ | ✅ |
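The table can also be read as a simple lookup from plan to the data types whose retention can be enabled. The sketch below encodes the table as it appears above; the identifiers (`RETENTION_BY_PLAN`, `can_retain`, and the plan and item keys) are hypothetical and do not correspond to any RudderStack API.

```python
# Storage items per plan, copied from the table above.
# All keys and names here are illustrative, not RudderStack identifiers.
SAMPLES = "sample_events_responses"
ERRORS = "processing_errors"
DUMPS = "gateway_dumps"

RETENTION_BY_PLAN = {
    "free": {SAMPLES},
    "starter": {SAMPLES, ERRORS},
    "growth": {SAMPLES, ERRORS},
    "enterprise": {SAMPLES, ERRORS, DUMPS},
}


def can_retain(plan: str, item: str) -> bool:
    """Return True if the given plan supports retaining the given storage item."""
    key = plan.lower()
    if key not in RETENTION_BY_PLAN:
        raise ValueError(f"unknown plan: {plan!r}")
    return item in RETENTION_BY_PLAN[key]


print(can_retain("Free", DUMPS))        # False
print(can_retain("Growth", ERRORS))     # True
print(can_retain("Enterprise", DUMPS))  # True
```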
Contact us
For more information on the topics covered on this page, email us or start a conversation in our Slack community.