Manage Experience Event Dataset Retention in the data lake using TTL
Efficient data management is critical for optimal performance, cost control, and data integrity. Use Experience Event Dataset Retention Time-To-Live (TTL) to enforce row-level expiration, automatically removing outdated records from datasets in the data lake while ensuring optimal storage efficiency and data relevance.
This guide explains how to evaluate, set, and manage TTL using the Catalog Service API. You’ll learn when and why to apply TTL, how to configure and update TTL values using API calls, and best practices to ensure effective implementation.
Why use TTL for row-level data management
As datasets grow, efficient data management becomes increasingly important to preserve performance, control costs, and keep data relevant. TTL-based row-level expiration automates data cleanup by removing outdated records without manual intervention, helping to optimize storage and improve system efficiency.
TTL is useful when managing time-sensitive data that loses relevance over time. Consider implementing TTL if you need to:
- Reduce storage costs by automatically removing outdated records.
- Improve query performance by minimizing irrelevant data.
- Maintain data hygiene by keeping only relevant information.
- Optimize data retention to support business objectives.
Use TTL configurations to optimize storage based on entitlements. While Profile Store data (used in Real-Time CDP) may be considered stale and removed after 30 days, the same event data in the data lake can remain available for 12–13 months (or longer based on entitlement) for analytics and Data Distiller use cases.
Industry example industry-example
As an example, consider a video streaming service that tracks user interactions, such as video views, searches, and recommendations. While recent engagement data is crucial for personalization, older activity logs (for example, interactions from over a year ago) lose relevance. By using row-level expiration, Experience Platform automatically removes outdated logs, ensuring only current and meaningful data is used for analytics and recommendations.
Evaluate TTL suitability evaluate-ttl-suitability
Before applying a retention policy, assess whether your dataset is a good candidate for row-level expiration. Consider the following:
- Data relevance over time: Does older data provide value, or does it become obsolete?
- Impact on downstream processes: Will removing data affect reporting, analytics, or integrations?
- Storage cost versus retention value: Does the value of older data justify the cost of storing it?
If historical records are essential for long-term analysis or business operations, TTL may not be the right approach. Reviewing these factors ensures that TTL aligns with your data retention needs without negatively affecting data availability.
Best practices for setting TTL best-practices
Select the right TTL value to ensure your Experience Event Dataset Retention policy balances data retention, storage efficiency, and analytical needs. A TTL that is too short may cause data loss, while one that is too long can increase storage costs and unnecessary data accumulation. Ensure that the TTL aligns with your dataset’s purpose by considering how often the data is accessed and how long it remains relevant.
The table below provides common TTL recommendations based on dataset type and usage patterns:
Review TTL settings periodically to ensure they continue to align with your storage policies, analytical needs, and business requirements.
Key considerations when setting TTL key-considerations
Follow these best practices to ensure that TTL settings align with your data retention strategy:
- Audit TTL changes regularly. Every TTL update triggers an audit event. Use audit logs to track TTL modifications for compliance, data governance, and troubleshooting purposes.
- Disable TTL if data must be retained indefinitely. To disable TTL, set ttlValue to null. This prevents automatic expiration and retains all records permanently. Consider the storage implications before making this change (see the request sketch after this list).
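For illustration, a minimal sketch of disabling TTL with a PATCH request to the Catalog Service API follows; the endpoint and payload shape mirror the set/update examples later in this guide, and the placeholders are the same ones used throughout.

```shell
curl -X PATCH \
  'https://platform.adobe.io/data/foundation/catalog/v2/datasets/{DATASET_ID}' \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'Content-Type: application/json' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {ORG_ID}' \
  -d '{
    "extensions": {
      "adobe_lakeHouse": {
        "rowExpiration": {
          "ttlValue": null
        }
      }
    }
  }'
```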
Limitations of TTL limitations
Be aware of the following limitations when using TTL:
- Experience Event Dataset Retention using TTL applies to row-level expiration, not dataset deletion. TTL removes records based on a defined retention period but does not delete entire datasets. To remove a dataset, use the dataset expiration endpoint or manual deletion.
- TTL configuration remains active until explicitly disabled. Once set, a TTL stays in effect until you disable it; disabling TTL stops expiration and ensures all records in the dataset are retained.
- TTL is not a compliance tool. While TTL optimizes storage and lifecycle management, you must implement broader governance strategies to ensure regulatory compliance.
Analyze dataset size and relevance before applying TTL analyze-dataset-size
Before applying TTL, use queries to analyze dataset size and relevance. Run targeted queries (such as counting records within specific date ranges) to preview the impact of various TTL values. Then use this information to choose an optimal retention period that balances data utility and cost-effectiveness.
Running targeted queries helps determine how much data would be retained or removed under different TTL configurations. For example, the following SQL query counts the number of records created within the last 30 days:
SELECT COUNT(1) FROM [datasetName] WHERE timestamp > date_sub(now(), INTERVAL 30 DAY);
Running similar queries for different time intervals helps validate TTL settings and ensure they balance storage efficiency and data accessibility.
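To compare several candidate retention windows in a single pass, you can aggregate conditional counts. The following is a minimal sketch that assumes your dataset exposes a standard timestamp column and that your query engine supports CASE expressions; the dataset name is a placeholder.

```sql
SELECT
  SUM(CASE WHEN timestamp > date_sub(now(), INTERVAL 30 DAY)  THEN 1 ELSE 0 END) AS last_30_days,
  SUM(CASE WHEN timestamp > date_sub(now(), INTERVAL 90 DAY)  THEN 1 ELSE 0 END) AS last_90_days,
  SUM(CASE WHEN timestamp > date_sub(now(), INTERVAL 180 DAY) THEN 1 ELSE 0 END) AS last_180_days,
  COUNT(1) AS total_records
FROM [datasetName];
```

Comparing each count against the total shows roughly how many rows a given TTL value would retain or expire.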
Get started with TTL management
Before you can evaluate, set, and manage Experience Event Dataset Retention using the Catalog Service API, you must understand how to format your requests correctly. This includes knowing the API paths, providing required headers, and formatting request payloads. Refer to the Catalog Service API getting started guide for this essential information.
Check your TTL constraints check-ttl-constraints
Use the Data Hygiene API /ttl/{DATASET_ID} endpoint to help plan TTL configurations. This endpoint returns the minimum and maximum TTL values supported for your organization, along with a recommended value (defaultValue) for the dataset type.
See the ÃÛ¶¹ÊÓÆµ Developer documentation for more information.
To check the TTL currently applied to a dataset, make a GET request to the /dataSets/{DATASET_ID} endpoint instead.
Requests to the Catalog Service API use the base path https://platform.adobe.io/data/foundation/catalog. The base path for the Data Hygiene API is https://platform.adobe.io/data/core/hygiene.
API format
GET /ttl/{DATASET_ID}
{DATASET_ID}: The ID of the dataset whose TTL constraints you want to retrieve. You can look up dataset IDs using the /datasets endpoint. See the list catalog objects API guide for instructions on filtering responses for relevant datasets.
Request
The following request retrieves your organization’s TTL constraints for a particular dataset.
curl -X GET \
'https://platform.adobe.io/data/core/hygiene/ttl/{DATASET_ID}' \
-H 'Authorization: Bearer {ACCESS_TOKEN}' \
-H 'x-api-key: {API_KEY}' \
-H 'x-gw-ims-org-id: {ORG_ID}' \
-H 'x-sandbox-name: {SANDBOX_NAME}' \
-H 'x-sandbox-id: {SANDBOX_ID}'
Response
A successful response returns the recommended, maximum, and minimum TTL values based on your organization’s entitlements, along with a suggested TTL (defaultValue) for the dataset. This defaultValue is a recommended TTL duration, provided for guidance only. It is not applied unless explicitly configured by you. The response does not include any custom TTL values that may already be set. To view the current TTL for a dataset, use the GET /catalog/dataSets/{DATASET_ID} endpoint.
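The exact response body is not reproduced in this guide; purely as an illustration, a response containing the three constraint properties described below might look like the following, with placeholder values rather than your organization’s actual entitlements.

```json
{
  "defaultValue": "P1Y",
  "maxValue": "P10Y",
  "minValue": "P30D"
}
```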
- defaultValue: The recommended TTL duration for this dataset type.
- maxValue: The maximum TTL your organization can configure (for example, P10Y).
- minValue: The minimum TTL your organization can configure (for example, P30D).
How to check applied TTL values check-applied-ttl-values
To check the current TTL value that has been applied to a dataset, use the following API call:
GET /dataSets/{DATASET_ID}
This call returns the current ttlValue (if set) in the extensions.adobe_lakeHouse.rowExpiration section.
Request
The following request retrieves the TTL value currently applied to a particular dataset.
curl -X GET \
'https://platform.adobe.io/data/foundation/catalog/dataSets/{DATASET_ID}' \
-H 'Authorization: Bearer {ACCESS_TOKEN}' \
-H 'x-api-key: {API_KEY}' \
-H 'x-gw-ims-org-id: {ORG_ID}' \
-H 'x-sandbox-name: {SANDBOX_NAME}'
Response
A successful response includes the extensions
object, which contains the current TTL configuration applied to the dataset. The response example below is truncated for brevity.
{
"{DATASET_ID}": {
"name": "Acme Sales Data",
"description": "This dataset contains sales transaction records for Acme Corporation.",
"imsOrg": "{ORG_ID}",
"sandboxId": "{SANDBOX_ID}",
"extensions": {
"adobe_lakeHouse": {
"rowExpiration": {
"ttlValue": "P3M",
}
}
}
...
}
}
Set or update TTL for a dataset set-update-ttl
TTL can only be configured for datasets that capture time-series data (schemas whose behavior is https://ns.adobe.com/xdm/data/time-series). To confirm that a dataset’s schema uses this behavior, check the schema’s meta:extends property. See the Schema endpoint documentation for guidance on how to do this.
You can configure Experience Event Dataset Retention by setting a new TTL or updating an existing TTL using the same API method. Use a PATCH request to the /v2/datasets/{DATASET_ID} endpoint to apply or adjust TTL.
API format
PATCH /v2/datasets/{DATASET_ID}
{DATASET_ID}: The ID of the dataset you want to configure.
Request
In the example below, the ttlValue is set to P3M. This means that records older than three months are automatically deleted. Adjust the retention period to suit your business needs (for example, P6M for six months or P12M for one year).
curl -X PATCH \
'https://platform.adobe.io/data/foundation/catalog/v2/datasets/{DATASET_ID}' \
-H 'Authorization: Bearer {ACCESS_TOKEN}' \
-H 'Content-Type: application/json' \
-H 'x-api-key: {API_KEY}' \
-H 'x-gw-ims-org-id: {ORG_ID}' \
-d '{
"extensions": {
"adobe_lakeHouse": {
"rowExpiration": {
"ttlValue": "P3M" // A 3 month retention period
}
}
}
}'
rowExpiration.ttlValue: The TTL duration in ISO-8601 format (for example, P3M for 3 months, or P30D for 30 days).
Response
A successful response returns a reference to the updated dataset but does not explicitly include the TTL settings. To confirm the TTL configuration, make a follow-up GET /dataSets/{DATASET_ID}
request.
[
"@/dataSets/{DATASET_ID}"
]
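To confirm the change, a quick follow-up check might look like the sketch below; it assumes the jq utility is available and reuses the GET request shown earlier to extract only the configured TTL value.

```shell
curl -s 'https://platform.adobe.io/data/foundation/catalog/dataSets/{DATASET_ID}' \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {ORG_ID}' \
  -H 'x-sandbox-name: {SANDBOX_NAME}' \
  | jq '.[].extensions.adobe_lakeHouse.rowExpiration.ttlValue'
```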
Example scenario example-scenario
Consider a video streaming platform that initially sets the TTL to three months to ensure fresh engagement data for personalization. However, if subsequent analysis reveals that older interactions still provide valuable insights, the TTL can be extended to six months with the following request:
curl -X PATCH \
'https://platform.adobe.io/data/foundation/catalog/v2/datasets/{DATASET_ID}' \
-H 'Authorization: Bearer {ACCESS_TOKEN}' \
-H 'Content-Type: application/json' \
-H 'x-api-key: {API_KEY}' \
-H 'x-gw-ims-org-id: {ORG_ID}' \
-d '{
"extensions": {
"adobe_lakeHouse": {
"rowExpiration": {
"ttlValue": "P6M" // Extend to 6 months
}
}
}
}'
Dataset retention policy FAQs faqs
This FAQ covers practical questions about dataset retention jobs, immediate effects of TTL changes, recovery options, and how retention periods differ across Platform services.
What types of datasets can I apply retention policy rules to?
You can apply TTL-based retention policies to any dataset that uses time-series behavior. This includes datasets based on the standard XDM ExperienceEvent class, as well as custom schemas designed to capture time-series data.
Row-level expiration requires the following technical conditions:
- The schema must be designed to capture time-series data.
- The schema must include a timestamp field used to evaluate expiration.
- The dataset should store event-level data, typically using or extending the XDM ExperienceEvent class.
- The dataset must be registered in Catalog Service, as TTL settings are applied via extensions.adobe_lakeHouse.rowExpiration.
- TTL values must use the ISO-8601 duration format (for example, P30D, P6M, P1Y).
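As one way to verify the time-series condition above, the sketch below queries the Schema Registry for a schema so you can inspect its meta:extends property for the time-series behavior (https://ns.adobe.com/xdm/data/time-series). It assumes the schema lives in your tenant container; adjust the container and the Accept header to match your setup.

```shell
curl -X GET \
  'https://platform.adobe.io/data/foundation/schemaregistry/tenant/schemas/{SCHEMA_ID}' \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {ORG_ID}' \
  -H 'x-sandbox-name: {SANDBOX_NAME}' \
  -H 'Accept: application/vnd.adobe.xed-full+json; version=1'
```

If the meta:extends array in the response includes https://ns.adobe.com/xdm/data/time-series, the schema captures time-series data and its datasets are eligible for row-level expiration.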
How soon will the Dataset Retention job delete data from data lake services?
Can I set different retention policies for data lake and Profile services?
How can I check my current dataset usage?
You can check the latest dataset storage size for data lake and Profile stores as separate metrics on the Dataset inventory workspace. Sort the columns to identify the largest datasets and verify that retention policies are applied.
For sandbox-level usage, refer to the License Usage dashboard. See the License Usage documentation for details.
How can I verify if the data retention job was successful?
You can verify the last data retention job by checking its timestamp in the Dataset Retention Configuration UI or on the Data Inventory page.
Alternatively, you can make a GET request to the following endpoint:
GET https://platform.adobe.io/data/foundation/catalog/dataSets/{DATASET_ID}
The response includes the property extensions.adobe_lakeHouse.rowExpiration.lastCompleted
, which indicates the Unix timestamp (in milliseconds) of when the most recent TTL job was completed.
Historical dataset usage reporting is currently unavailable.
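For convenience, the sketch below extracts the lastCompleted property described above and converts it to a readable date; it assumes the jq utility is installed and follows the response structure shown earlier in this guide.

```shell
curl -s 'https://platform.adobe.io/data/foundation/catalog/dataSets/{DATASET_ID}' \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {ORG_ID}' \
  -H 'x-sandbox-name: {SANDBOX_NAME}' \
  | jq '.[].extensions.adobe_lakeHouse.rowExpiration.lastCompleted / 1000 | floor | todate'
```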
Can I recover deleted data?
What is the minimum TTL I can configure on a data lake Experience Event dataset?
What if I need to retain some data lake fields longer than my TTL policy allows?
Use Data Distiller to retain specific fields beyond your dataset’s TTL while staying within your utilization limits. Create a job that regularly writes only the necessary fields to a derived dataset. This workflow ensures compliance with a shorter TTL while preserving critical data for extended use.
For more details, see the Create derived datasets with SQL guide.
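As a rough illustration of that workflow, the sketch below uses a CREATE TABLE AS SELECT statement to copy a small set of fields into a derived dataset before the source rows expire. The table and field names are placeholders, and the exact syntax available to you depends on your Data Distiller entitlement.

```sql
CREATE TABLE retained_engagement_fields AS
SELECT
  _id,
  timestamp,
  eventType
FROM [datasetName]
WHERE timestamp > date_sub(now(), INTERVAL 30 DAY);
```

The first run creates the derived dataset; subsequent scheduled runs would typically use INSERT INTO instead, so that new rows keep accumulating while the source dataset continues to honor its shorter TTL.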
Next steps next-steps
Now that you’ve learned how to manage TTL settings for row-level expiration, review the following documentation to further your understanding of TTL management:
- Retention jobs: Learn to schedule and automate dataset expirations in the Experience Platform UI with the data lifecycle UI guide, or check Dataset Retention configurations and verify that expired records are deleted.
- Dataset Expiration API endpoint guide: Discover how to delete entire datasets rather than just rows. Learn how to schedule, manage, and automate dataset expiration using the API to ensure efficient data retention.
- Data usage policies overview: Learn how to align your data retention strategy with broader compliance requirements and marketing use restrictions.