Enable change data capture for source connections in the API
Change data capture in Adobe Experience Platform sources is a capability that you can use to maintain real-time data synchronization between your source and destination systems.
Currently, Experience Platform supports incremental data copy, which ensures that newly created or updated records in the source system are periodically copied to the ingested datasets. This process relies on a timestamp column, such as LastModified, to track changes and capture only the newly inserted or updated data. However, this method does not account for deleted records, which can lead to data inconsistencies over time.
With change data capture, a given flow captures and applies all changes, including inserts, updates, and deletes. As a result, Experience Platform datasets remain fully synchronized with the source system.
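The difference between the two approaches can be sketched with a toy model. The example below is only an illustration under stated assumptions, not a description of how Experience Platform is implemented: the row fields, the `type` key, and both function names are hypothetical.

```python
# Illustrative only: a toy model of the two sync strategies, not Experience
# Platform internals. Row fields and function names are hypothetical.

def incremental_copy(dataset, source_rows, last_sync):
    """Upsert source rows modified after last_sync; deleted rows go unnoticed."""
    synced = dict(dataset)
    for row in source_rows:
        if row["LastModified"] > last_sync:
            synced[row["id"]] = row
    return synced


def apply_changes(dataset, changes):
    """Apply a change feed in order: 'u' upserts a row, 'd' deletes it."""
    synced = dict(dataset)
    for change in changes:
        if change["type"] == "d":
            synced.pop(change["id"], None)
        else:
            synced[change["id"]] = change["row"]
    return synced


# Dataset state after the first sync (at time 1).
dataset = {
    1: {"id": 1, "name": "Ada", "LastModified": 1},
    2: {"id": 2, "name": "Bob", "LastModified": 1},
}

# Afterwards the source updates row 1, deletes row 2, and inserts row 3.
source_now = [
    {"id": 1, "name": "Ada L.", "LastModified": 5},
    {"id": 3, "name": "Cy", "LastModified": 6},
]
change_feed = [
    {"type": "u", "id": 1, "row": source_now[0]},
    {"type": "d", "id": 2},
    {"type": "u", "id": 3, "row": source_now[1]},
]

after_incremental = incremental_copy(dataset, source_now, last_sync=1)
after_cdc = apply_changes(dataset, change_feed)
```

In this toy model, the timestamp-based copy still holds the deleted row 2, while applying the change feed leaves only rows 1 and 3.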
You can use change data capture for the following sources:
Amazon S3
Ensure that _change_request_type is present in the Amazon S3 file that you intend to ingest to Experience Platform. Additionally, you must ensure that the following valid values are included in the file:
u: for inserts and updates
d: for deletions
If _change_request_type is not present in your file, then the default value of u will be used.
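As a rough sketch of what such a file could look like, the following example builds a small CSV with a _change_request_type column. The CSV format, the `id` and `name` columns, and the `build_change_file` helper are assumptions made for illustration; only the column name _change_request_type and the values u and d come from this guide. The helper writes u for any record that does not specify a change type, mirroring the documented default.

```python
import csv
import io

# Valid values for _change_request_type as listed in this guide.
VALID_CHANGE_TYPES = {"u", "d"}


def build_change_file(records):
    """Return CSV text that includes a _change_request_type column.

    Hypothetical helper: records that omit the change type are written
    as 'u' (insert or update); any other unknown value raises an error.
    """
    fieldnames = ["id", "name", "_change_request_type"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for record in records:
        change_type = record.get("_change_request_type", "u")
        if change_type not in VALID_CHANGE_TYPES:
            raise ValueError(f"Invalid _change_request_type: {change_type!r}")
        writer.writerow(
            {
                "id": record["id"],
                "name": record.get("name", ""),
                "_change_request_type": change_type,
            }
        )
    return buffer.getvalue()


# One upsert (change type omitted) and one deletion.
sample_csv = build_change_file(
    [
        {"id": 1, "name": "Ada"},
        {"id": 2, "name": "Bob", "_change_request_type": "d"},
    ]
)
```

The resulting text can be saved as a file and uploaded to your storage location. The same column and values apply to the other file-based sources described in this guide.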
Read the following documentation for steps on how to enable change data capture for your Amazon S3 source connection:
Azure Blob
Ensure that _change_request_type is present in the Azure Blob file that you intend to ingest to Experience Platform. Additionally, you must ensure that the following valid values are included in the file:
u: for inserts and updates
d: for deletions
If _change_request_type is not present in your file, then the default value of u will be used.
Read the following documentation for steps on how to enable change data capture for your Azure Blob source connection:
Azure Databricks
You must enable change data feed in your Azure Databricks table in order to use change data capture in your source connection.
Use the following commands to explicitly enable the change data feed option in Azure Databricks:
New table
To apply change data feed to a new table, you must set the table property delta.enableChangeDataFeed to TRUE in the CREATE TABLE command.
CREATE TABLE student (id INT, name STRING, age INT) TBLPROPERTIES (delta.enableChangeDataFeed = true)
Existing table
To apply change data feed to an existing table, you must set the table property delta.enableChangeDataFeed to TRUE in the ALTER TABLE command.
ALTER TABLE myDeltaTable SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
All new tables
To apply change data feed to all new tables, you must set your default properties to TRUE.
set spark.databricks.delta.properties.defaults.enableChangeDataFeed = true;
For more information, read the Azure Databricks documentation on change data feed.
Read the following documentation for steps on how to enable change data capture for your Azure Databricks source connection:
Data Landing Zone
You must enable change data feed in your Data Landing Zone table in order to use change data capture in your source connection.
Use the following commands to explicitly enable the change data feed option in Data Landing Zone.
Read the following documentation for steps on how to enable change data capture for your Data Landing Zone source connection:
Google BigQuery
To use change data capture in your Google BigQuery source connection, navigate to your Google BigQuery page in the Google Cloud console and set enable_change_history to TRUE. This property enables change history for your data table.
For more information, read the Google BigQuery documentation on change history.
Read the following documentation for steps on how to enable change data capture for your Google BigQuery source connection:
Google Cloud Storage
Ensure that _change_request_type is present in the Google Cloud Storage file that you intend to ingest to Experience Platform. Additionally, you must ensure that the following valid values are included in the file:
u: for inserts and updates
d: for deletions
If _change_request_type is not present in your file, then the default value of u will be used.
Read the following documentation for steps on how to enable change data capture for your Google Cloud Storage source connection:
SFTP
Ensure that _change_request_type is present in the SFTP file that you intend to ingest to Experience Platform. Additionally, you must ensure that the following valid values are included in the file:
u: for inserts and updates
d: for deletions
If _change_request_type is not present in your file, then the default value of u will be used.
Read the following documentation for steps on how to enable change data capture for your SFTP source connection:
Snowflake
You must enable change tracking in your Snowflake tables in order to use change data capture in your source connections.
In Snowflake, enable change tracking by using the ALTER TABLE command and setting CHANGE_TRACKING to TRUE.
ALTER TABLE mytable SET CHANGE_TRACKING = TRUE
For more information, read the Snowflake documentation on change tracking.
Read the following documentation for steps on how to enable change data capture for your Snowflake source connection: