Warehouse Connectors
Overview
Warehouse Connectors enable you to load Mixpanel with data from data warehouses (DWH) like BigQuery, Snowflake, Databricks, and Redshift. By unifying business data with product usage events, you can answer many more questions in Mixpanel:
- What percentage of our Enterprise revenue uses the features we shipped last year?
- Did our app redesign reduce support tickets?
- Which account demographics have the best retention?
- We spent $50,000 on a marketing campaign, did the users we acquire stick around a month later?
In this guide, we'll walk through how to set up Warehouse Connectors. The integration is completely codeless, but you will need someone with access to your DWH to help with the initial set up.
Getting Started
Step 1: Connect a warehouse
Navigate to Project Settings → Warehouse Sources (opens in a new tab). Select your warehouse and follow the instructions to connect it. Note: you only need to do this once.
Step 2: Load a warehouse table
Navigate to Project Settings → Warehouse Data (opens in a new tab) and click +Event Table.
Select a table (or view) representing an event from your warehouse and tell Mixpanel about the table. Once satisfied with the preview, click run and we’ll establish the sync. The initial load may take a few minutes depending on the size of the table, we show you progress as it’s happening.
🎉 Congrats, you’ve loaded your first warehouse table into Mixpanel! From this point onward, the table will be kept in sync with Mixpanel. You can now use this event throughout Mixpanel’s interface.
Warehouse-Specific Walkthroughs
The BigQuery connector works by giving a Mixpanel service account the permission to read/write from BigQuery in your GCP project. You will need your GCP Project ID, which you can find in the URL of Google Cloud Console (https://console.cloud.google.com/bigquery?project=YOUR_GCP_PROJECT (opens in a new tab)). You will also need the gcloud
CLI.
Table Types
Mixpanel’s Data Model consists of 4 types: Events, User Profiles, Group Profiles, and Lookup Tables. Each have properties, which are arbitrary JSON. Warehouse Connectors lets you turn any table or view in your warehouse into one of these 4 types of tables, provided they match the required schema.
Events
An event is something that happens at a point in time. It’s akin to a “fact” in dimensional modeling or a log in a database. Events have properties, which describe the event. Learn more about Events here.
Here’s an example table that illustrates what can be loaded as events in Mixpanel. The most important fields are the timestamp (when) and the user id (who) — everything else is optional.
Timestamp | User ID | Item | Brand | Amount | Type |
---|---|---|---|---|---|
2024-01-04 11:12:00 | alice@example.com | shoes | nike | 99.23 | in-store |
2024-01-12 11:12:00 | bob@example.com | socks | adidas | 4.56 | online |
Here are more details about the schema we expect for events:
Column | Required | Type | Description |
---|---|---|---|
Event Name | Yes | String | The name of the event. Eg: Purchase Completed or Support Ticket Filed. Note: you can specify this value statically, it doesn’t need to be a column in the table. |
Time | Yes | Timestamp | The time at which the event occurred. |
User ID | No | String or Integer | The unique identifier of the user who performed the event. Eg: 12345 or grace@example.com. |
Device ID | No | String or Integer | An identifier for anonymous users, useful for tracking pre-login data. Learn more here |
JSON Properties | No | JSON or Variant | A field that contains key-value properties in JSON format. If provided, Mixpanel will flatten this field out into properties. |
All other columns | No | Any | These can be anything. Mixpanel will auto-detect these columns and attach them to the event as properties. |
User Profiles
A User Profile is a table that describes your users. It’s akin to a “dimension” in dimensional modeling or a relational table in a database. Learn more about User Profiles here.
Here’s an example table that illustrates what can be loaded as user profiles in Mixpanel. The only important column is the User ID, which is the primary key of the table.
User ID | Name | Subscription Tier | |
---|---|---|---|
12345 | grace@example.com | Grace Hopper | Pro |
45678 | bob@example.com | Bob Noyce | Free |
Profile History is in beta. While Profiles typically only store the state of a user as of now, Profile History enables storing the state of a user over time. When creating a User Profile sync, set the Table Type to “History Table” — this will require you to supply a Start Time column in the sync configuration. Request beta access here (opens in a new tab).
Group Profiles
A Group Profile is a table that describes an entity (most often an Account, if you’re a B2B company). They are functionally identical to User Profiles, just used for other non-User entities. Group Profiles are only available if you have the Group Analytics add-on. Learn more about Group Analytics here.
Here’s an example table that illustrates what can be loaded as group profiles in Mixpanel. The only important column is the Group Key, which is the primary key of the table.
Group Key | Name | Domain | ARR | Subscription Tier |
---|---|---|---|---|
12345 | Notion | notion.so | 45000 | Enterprise |
45678 | Linear | linear.so | 2000 | Pro |
Lookup Tables
A Lookup Table is a table that describes an entity. It’s useful for enriching events with metadata about other concepts in your product, like content or skus. Learn more about Lookup Tables here (opens in a new tab).
Here’s an example table that illustrates what can be loaded as group profiles in Mixpanel. The only important column is the ID, which is the primary key of the table.
ID | Song Name | Artist | Genre |
---|---|---|---|
12345 | One Dance | Drake | Pop |
45678 | Voyager | Daft Punk | Electronic |
Sync Types
Warehouse Connectors continuously detect new data from your warehouse tables to load into Mixpanel. The Sync Type determines the method of detecting new rows:
Time-Based
Time-Based syncs require an Insert Time column in your table. Mixpanel remembers the maximum Insert Time it saw in the previous run of the sync and looks for only rows that have an Insert Time greater than that. This is useful and efficient for append-only tables (usually events) that have a column indicating when the data was appended.
Full
Full syncs periodically make a snapshot of the source table and sync it entirely to Mixpanel. If a row has new properties in your warehouse, the corresponding profile in Mixpanel will be overridden with those new properties. This mode is available for all tables except events.
Mirror
Mirror syncs leverage warehouse change-data-capture (CDC) capabilities to mirror insert, updates, and deletes from your warehouse to Mixpanel. It provides the efficiency of time-based syncs with the simplicity/reliability of full syncs and works for all table types.
Mirror is in beta for Snowflake only. Request access here (opens in a new tab).
FAQ
What tables are valuable to load into Mixpanel?
Anything that is event-based (has a user_id and timestamp) and that you want to analyze in Mixpanel. Examples, by data source are:
- CRM: Opportunity Created, Opportunity Closed
- Support: Ticket Created, Ticket Closed
- Billing: Subscription Created, Subscription Upgraded, Subscription Canceled, Payment Made
- Appplication Database: Signup, Purchased Item, Invited Teammate
We also recommend loading your user and account tables, to enrich events with demographic attributes about the users and accounts who performed them.
How fast do syncs run?
Syncs have a throughput of ~30K events/second or ~100M events/hour.
I already track data to Mixpanel via SDK or CDP, can I still use Warehouse Connectors?
Yes! You can send some events (eg: web and app data) directly via our SDKs and send other data (eg: user profiles from CRM or logs from your backend) from your warehouse and analyze them together in Mixpanel.
How much does Warehouse Connectors cost?
The events generated by Warehouse Connectors are billed identically to all other events you track to Mixpanel. Learn more about Mixpanel’s event-based billing here (opens in a new tab). Otherwise, there is no special cost for using Warehouse Connectors.
What will be the cost impact of this on my DWH?
In short, very little. There are 3 major aspects of DWH cost: storage, compute, and network egress.
- Storage: None. We don't store anything additional in your warehouse, so there are no extra storage costs.
- Compute: Low. We make highly efficient queries to extract data from your warehouse. To be extra careful, we recommend provisioning the smallest possible warehouse for use with Warehouse Connectors (eg: X-Small in Snowflake (opens in a new tab)). If syncs run slowly with this size, you can scale up to trade compute cost for speed.
- Network Egress: Low. We only extract rows that have changed and leverage gzip compression to reduce bytes over the wire. Assuming an egress rate of $0.08 per GB and 100 compressed bytes per event, this is a cost of less than $0.01 per million events.
How can I get help setting up a warehouse sync?
Reach out (opens in a new tab) to our team — we’re happy to walk you through the set up. If you bring a data engineer who has credentials to access your warehouse, it takes < 10 minutes to get up and running.
Was this page useful?