[Ultimate]{class="badge positive"}
Snowflake streaming source
-
The Snowflake streaming source is available in the API to users who have purchased Real-Time CDP Ultimate.
-
You can now use the Snowflake streaming source when running ÃÛ¶¹ÊÓÆµ Experience Platform on Amazon Web Services (AWS). Experience Platform running on AWS is currently available to a limited number of customers. To learn more about the supported Experience Platform infrastructure, see the Experience Platform multi-cloud overview.
ÃÛ¶¹ÊÓÆµ Experience Platform allows data to be ingested from external sources while providing you with the ability to structure, label, and enhance incoming data using Experience Platform services. You can ingest data from a variety of sources such as ÃÛ¶¹ÊÓÆµ applications, cloud-based storage, databases, and many others.
Experience Platform provides support for streaming data from a Snowflake database.
Understanding the Snowflake streaming source
The Snowflake streaming source works by having data loaded by periodically executing an SQL query and creating an output record for each row in the resulting set.
By using Kafka Connect, the Snowflake streaming source tracks the latest record that it receives from each table, so that it can start in the correct location for the next iteration. The source uses this functionality to filter data and only get the updated rows from a table on each iteration.
Prerequisites
The following section outlines prerequisite steps to complete before you can stream data from your Snowflake database to Experience Platform:
IP address allowlist
You must add region-specific IP addresses to your allowlist prior to connecting your sources to Experience Platform. For more information, read the guide on allowlisting IP addresses to connect to Experience Platform for more information.
The documentation below provides information on how to connect Amazon Redshift to Experience Platform using APIs or the user interface:
Gather required credentials
In order for Flow Service to connect with Snowflake, you must provide the following connection properties:
table 0-row-2 1-row-2 2-row-2 3-row-2 4-row-2 5-row-2 6-row-2 7-row-2 | |
---|---|
Credential | Description |
account |
The full account identifier (account name or account locator) of your Snowflake account appended with the suffix
For more information, read the . |
warehouse |
The Snowflake warehouse manages the query execution process for the application. Each Snowflake warehouse is independent from one another and must be accessed individually when bringing data over to Experience Platform. |
database |
The Snowflake database contains the data you want to bring the Experience Platform. |
username |
The username for the Snowflake account. |
password |
The password for the Snowflake user account. |
role |
(Optional) A custom-defined role that can be provided for a user, for a given connection. If unprovided, this value defaults to public . |
connectionSpec.id |
The connection specification returns a source’s connector properties, including authentication specifications related to creating the base and source connections. The connection specification ID for Snowflake is 51ae16c2-bdad-42fd-9fce-8d5dfddaf140 . |
To use key-pair authentication, you must generate a 2048-bit RSA key pair and then provide the following values when creating an account for your Snowflake source.
table 0-row-2 1-row-2 2-row-2 3-row-2 4-row-2 5-row-2 6-row-2 | |
---|---|
Credential | Description |
account |
An account name uniquely identifies an account within your organization. In this case, you must uniquely identify an account across different Snowflake organizations. To do this, you must prepend your organization name to the account name. For example: orgname-account_name . Read the guide on retrieving your Snowflake account identifier for additional guidance. For more information, refer to the . |
username |
The username of your Snowflake account. |
privateKey |
The Base64-encoded private key of your Snowflake account. You can generate either encrypted or unencrypted private keys. If you are using an encrypted private key, then you must also provide a private key passphrase when authenticating against Experience Platform. Read the guide on retrieving your Snowflake private key for more information. |
passphrase |
The passphrase is an additional layer of security that you must use when authenticating with an encrypted private key. You are not required to provide the passphrase if you are using an unencrypted private key. |
database |
The Snowflake database that contains the data you want to ingest to Experience Platform. |
warehouse |
The Snowflake warehouse manages the query execution process for the application. Each Snowflake warehouse is independent from one another and must be accessed individually when bringing data over to Experience Platform. |
For more information about these values, refer the .
Retrieve your account identifier retrieve-your-account-identifier
To authenticate your Snowflake instance with Experience Platform, you need to obtain your account identifier from the Snowflake UI dashboard.
Follow these steps to find your account identifier:
- Navigate to your account on the .
- In the left navigation, select Accounts, followed by Active Accounts from the header.
- Next, select the information icon and then select and copy the domain name of the current URL.
Retrieve your private key retrieve-your-private-key
If you plan to use key-pair authentication for your Snowflake connection, you need to generate a private key before connecting to Experience Platform.
To generate your encrypted Snowflake private key, run the following command on your terminal:
code language-shell |
---|
|
If successful, you should receive your private key in PEM format.
code language-shell |
---|
|
To generate your unencrypted Snowflake private key, run the following command on your terminal:
code language-shell |
---|
|
If successful, you should receive your private key in PEM format.
code language-shell |
---|
|
After generating your private key, encode it directly in Base64 without making any changes to its format or content. Before encoding, make sure there are no extra spaces or blank lines (including trailing newlines) at the end of the private key.
Verify configurations
Before you can create a source connection for your Snowflake data, you must also ensure that the following configurations are met:
- The default warehouse assigned to a given user must be the same as the warehouse that you input when authenticating to Experience Platform.
- The default role assigned to a given user must have access to the same database that you input when authenticating to Experience Platform.
To verify your role and warehouse:
- Select Admin on the left navigation and then select Users & Roles.
- Select the appropriate user and then select the ellipses (
...
) on the top-right corner. - In the Edit user window that appears, navigate to Default Role to view the role associated with the given user.
- In the same window, navigate to Default Warehouse to view the warehouse associated with the given user.
Once successfully encoded, you may then used that Base64-encoded private key on Experience Platform to authenticate your Snowflake account.
Configure role settings configure-role-settings
You must configure privileges to a role, even if the default public role is assigned, to allow your source connection to access the relevant Snowflake database, schema, and table. The various privileges for different Snowflake entities is as follows:
For more information on role and privilege management, refer to the .
Limitations and frequently asked questions limitations-and-frequently-asked-questions
-
The data throughput for the Snowflake source is 2000 records per second.
-
Pricing can vary depending on the amount of time that a warehouse is active and the size of the warehouse. For the Snowflake source integration, the smallest size, x-small warehouse is sufficient. It is suggested to enable auto-suspend so that the warehouse can suspend on its own when not in use.
-
The Snowflake source polls the database for new data every 10 seconds.
-
Configuration options:
-
You can enable a
backfill
boolean flag for your Snowflake source when creating a source connection.- If backfill is set to true, then the value for timestamp.initial is set to 0. This means that data with a timestamp column greater than 0 epoch time are fetched.
- If backfill is set to false, then the value for timestamp.initial is set to -1. This means that data with a timestamp column greater than the current time (the time in which the source begins ingesting) are fetched.
-
The timestamp column should be formatted as type:
TIMESTAMP_LTZ
orTIMESTAMP_NTZ
. If the timestamp column is set toTIMESTAMP_NTZ
, then the corresponding timezone in which the values are stored should be passed via thetimezoneValue
parameter. If unprovided, the value will default to UTC.TIMESTAMP_TZ
cannot be used a timestamp column or in a mapping.
-
Next steps
The following tutorial provides steps on how to connect your Snowflake streaming source to Experience Platform using the API: