Send Cohorts to Amazon S3

Set up this integration to send cohorts to an Amazon S3 bucket. This enables you to export groups of users out of Amplitude and synchronize them with other databases or stored procedures you've built off your Amazon S3 bucket. From there, you can use Amplitude cohorts in internal analytics dashboards and internal personalization engines.

Other Amplitude + Amazon S3 Integrations

This integration sends Amplitude cohorts to Amazon S3. Amplitude offers two other integrations with Amazon S3:

Setup¶

Prerequisites¶

From your Amazon S3 console, find the S3 bucket you would like Amplitude to sync with. Copy its name, path, and region.

Amplitude setup¶

In Amplitude Data, click Catalog and select the Destinations tab.
In the Cohort section, click Amazon S3 (Cohorts). Don't select Amazon S3 for this integration.
Enter the bucket name, select a region, enter a bucket path (optional), and enter a name for the destination. The name is used when syncing a cohort from Amplitude.
Select the Amplitude user property to match users between Amazon S3 and Amplitude.
Click Copy Bucket Policy.

Amazon S3 setup¶

In the Amazon S3 console, go to the S3 bucket and navigate to Permissions → Bucket Policy. Paste the Amplitude bucket policy into the Amazon S3 console.
Optionally, you can also set the following two parameters for your buckets: - Require suffix: When set, this allows users to append a string at the end of every file exported to S3. - User property: You can select a single user property to sync along with each user as an extra column in each file exported.

Send a cohort¶

After you connect the S3 bucket to Amplitude, you can sync any cohort to that bucket. To do so, follow these steps:

From the Cohorts page in Amplitude, click the cohort to send, or create a cohort.
Click Sync.
Select Amazon S3, then click Next.
Select the S3 location. This is what you named the bucket when setting up the integration.
(Optional). Set the following two optional parameters:
- User Property: Here, you can append a user property to each user exported in this cohort. The user property appears as a column in the exported CSV file.
- Routing Key: Enter a string to be appended to the end of the cohort file name in S3.
Choose a sync cadence.
When finished, click Sync.

Cohorts in S3¶

Your cohort is synced as a CSV to the bucket you specified. Within the folder, there is a list of CSV files.

Each sync generates three CSV files:

One with users who entered the cohort since the last sync.
One with users who exited the cohort since the last sync.
One containing the users that existed in the cohort at the time of the last sync. This way, you always have a complete historical log of S3 cohort membership.

The CSV files all use this naming convention:

path/integer_cohortID_YYYY-MM-DDTHH:SS_difftype_routingkey.csv

Where:

path : The optional folder prefix on the path the file should be written to.
cohortID : The unique identifier for your cohort. You can find this number in the URL of your cohort in Amplitude.
YYYY-MM-DDTHH-SS : The timestamp when the cohort was synced.
difftype : This describes which of the three user groups the CSV file contains. Acceptable values are entering , exiting , or existing.
routingkey: The optional string suffix entered before.

The timestamp in the CSV name refers to the day/time the cohort was synced. If you have an hourly/daily scheduled sync, Amplitude creates a new file for every sync with the full list of users who qualify in that cohort at that time. You can keep a historic log of audience membership.

Each CSV file contains a list of users, with data broken into the following columns:

amplitudeID: The internal amplitude identifier for the user.
userID: Your unique database identifier for the user.
userProperty: The value for a user property you added in step 3 of the exporting cohorts section; there is one column for each user property. In portfolio projects, there is a separate column for each source app.