Data Ingestion overview

In Adobe Experience Platform, data ingestion is the transportation of data from assorted sources to a storage medium where it can be accessed, used, and analyzed by an organization. Data ingestion in Experience Platform can be grouped into two main categories: streaming ingestion and batch ingestion.

Under streaming and batch ingestion are a number of different methods that you can use to ingest your data into Experience Platform. These methods include connecting to a variety of sources and then bringing data from those sources into Experience Platform.

Read this document for an overview of the many different ways that data can be ingested into Experience Platform.

Streaming ingestion

You can use streaming ingestion to send data from client and server-side devices to Experience Platform in real-time. Experience Platform supports the use of data inlets to stream incoming experience data, which is persisted in streaming-enabled datasets within the data lake. Data inlets can be configured to automatically authenticate the data they collect, ensuring that the data is coming from a trusted source.
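To make the streaming flow concrete, the sketch below assembles a JSON event shaped for a streaming-enabled dataset. This is an illustrative sketch only, not Adobe's official client: the inlet ID, endpoint URL, and schema ID are hypothetical placeholders, and the actual HTTP POST is omitted so the example stays offline.

```python
import json
from datetime import datetime, timezone

# Hypothetical data inlet ID and endpoint -- replace with your own values.
INLET_ID = "example-inlet-id"
ENDPOINT = f"https://dcs.adobedc.net/collection/{INLET_ID}"

def build_xdm_event(email: str, page_name: str) -> dict:
    """Assemble a minimal XDM-shaped JSON payload for streaming ingestion."""
    return {
        "header": {
            "schemaRef": {
                # Hypothetical schema ID; use the ID of your own XDM schema.
                "id": "https://ns.adobe.com/example/schemas/abc123",
                "contentType": "application/vnd.adobe.xed-full+json;version=1",
            }
        },
        "body": {
            "xdmEntity": {
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "personalEmail": {"address": email},
                "web": {"webPageDetails": {"name": page_name}},
            }
        },
    }

event = build_xdm_event("test@example.com", "home")
payload = json.dumps(event)
# The payload would then be POSTed to ENDPOINT with
# Content-Type: application/json, using any HTTP client.
```

In a real flow, the data inlet validates the incoming payload against the referenced schema, which is how the authentication and trust guarantees described above are enforced.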

For more information, read the streaming ingestion overview.

Batch ingestion

In Experience Platform, a batch is a set of data collected over a period of time and processed together as a single unit. Datasets are made up of batches. You can use batch ingestion to ingest data into Experience Platform as batch files. Once ingested, batches provide metadata that describes the number of records successfully ingested, as well as any failed records and associated error messages.

Manually uploaded data files, such as flat CSV files (mapped to XDM schemas) and Parquet files, must be ingested using this method.
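The idea of a batch as "a set of data processed together as a single unit" can be sketched as preparing one or more part files that are uploaded and completed together. The helper below is a hypothetical illustration of that preparation step; the actual upload and completion calls to the Batch Ingestion API are omitted.

```python
import json
import os
import tempfile

def write_batch_files(records, out_dir, max_records_per_file=1000):
    """Split records into JSON Lines part files that together form one batch."""
    paths = []
    for i in range(0, len(records), max_records_per_file):
        chunk = records[i:i + max_records_per_file]
        path = os.path.join(out_dir, f"part-{i // max_records_per_file:05d}.json")
        with open(path, "w", encoding="utf-8") as f:
            for record in chunk:
                f.write(json.dumps(record) + "\n")
        paths.append(path)
    return paths

records = [{"id": n, "email": f"user{n}@example.com"} for n in range(5)]
with tempfile.TemporaryDirectory() as out_dir:
    parts = write_batch_files(records, out_dir, max_records_per_file=2)
    num_parts = len(parts)  # 5 records at 2 per file yields 3 part files
```

Keeping part files to a bounded size is what makes the per-batch metadata (record counts, failed records, error messages) tractable to report after ingestion.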

For more information, read the batch ingestion overview.

Sources

You can also ingest data by connecting to Experience Platform Sources. Experience Platform maintains a catalog of a variety of different data sources that you can connect to and ingest data from. These sources can be native Adobe applications such as the Adobe Analytics source or the Marketo Engage source. You can also connect to third-party sources such as the Amazon S3 source and the Google Cloud Storage source.

Sources are grouped into different categories like cloud storage, databases, and CRM systems. A given source may support batch or streaming ingestion.

With sources, you can ingest data from a number of different data sources spanning a variety of use case categories. Additionally, data ingestion via a source gives you the opportunity to authenticate against the external data source, configure an ingestion schedule, and manage ingestion throughput.

For more information, read the sources overview.

ML-assisted schema creation

To quickly integrate new data sources, you can now use machine learning algorithms to generate a schema from sample data. This automation simplifies the creation of accurate schemas, reduces errors, and speeds up the process from data collection to analysis and insights.
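The general idea of deriving a schema from sample data can be illustrated with a small sketch. This is not Adobe's algorithm, just a hypothetical example of type inference over sample records, widening conflicting types toward a common one:

```python
def infer_schema(samples):
    """Infer a flat field-to-type mapping from sample records (illustration only)."""
    schema = {}
    for record in samples:
        for key, value in record.items():
            # bool must be checked before int: isinstance(True, int) is True.
            t = ("boolean" if isinstance(value, bool)
                 else "integer" if isinstance(value, int)
                 else "number" if isinstance(value, float)
                 else "string")
            prev = schema.get(key)
            if prev is None or prev == t:
                schema[key] = t
            elif {prev, t} == {"integer", "number"}:
                # Widen integer to number when both appear.
                schema[key] = "number"
            else:
                # Fall back to string on any other type conflict.
                schema[key] = "string"
    return schema
```

For example, a field that appears as both an integer and a float in the sample is widened to a numeric type, while irreconcilable types fall back to string; a production system would also handle nested objects, arrays, and semantic types like dates.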

See the ML-assisted schema creation guide for more information on this workflow.

Data Prep

While data prep is not a method of ingestion, it is an important part of the data ingestion process. Use data prep functions to map, transform, and validate data to and from Experience Data Model (XDM) before creating a dataflow to ingest your data to Experience Platform. Data prep appears as the "Mapping" step in the Experience Platform user interface during the data ingestion process.
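The kind of transformation the Mapping step performs can be sketched in plain Python. The snippet below is an illustration only, not Adobe's data prep engine or its mapping-expression syntax; the source field names and XDM-style target paths are hypothetical. It shows both pass-through mappings and a calculated field (concatenating first and last name):

```python
# Hypothetical mapping table: XDM-style target path -> function over a source row.
MAPPINGS = {
    "person.name.firstName": lambda src: src["first_name"],  # pass-through
    "person.name.lastName": lambda src: src["last_name"],    # pass-through
    # Calculated field: concatenate two source fields into one target field.
    "person.name.fullName": lambda src: f"{src['first_name']} {src['last_name']}",
}

def apply_mappings(source_row, mappings=MAPPINGS):
    """Map a flat source record onto nested, dotted XDM-style target paths."""
    target = {}
    for path, fn in mappings.items():
        node = target
        *parents, leaf = path.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = fn(source_row)
    return target
```

Applying this to a row like `{"first_name": "Ada", "last_name": "Lovelace"}` produces a nested record whose `person.name.fullName` field is the concatenation of the two source fields, mirroring the calculated-field workflow described above.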

For more information, read the data prep overview.

Streaming ingestion methods

The following table outlines the variety of methods that you can use to ingest streaming data to Experience Platform.

Method: Adobe Web/Mobile SDK
Common use cases:
  • Data collection from websites and mobile apps.
  • Preferred method for client-side collection.
Protocols: Push, HTTP, JSON
Considerations:
  • Implement multiple Adobe applications leveraging a single SDK.

Method: HTTP API Connector
Common use cases:
  • Collection from streaming sources, transactions, and relevant customer events and signals.
Protocols: Push, REST API, JSON
Considerations:
  • Raw or XDM data is streamed directly to the hub, with no real-time Edge segmentation or event forwarding.

Method: Edge Network API
Common use cases:
  • Collection from streaming sources, transactions, and relevant customer events and signals from the globally distributed Edge Network.
Protocols: Push, REST API, JSON
Considerations:
  • Data is streamed through the Edge Network, with support for real-time segmentation and event forwarding on the Edge.

Method: Adobe Applications
Common use cases:
  • Data ingestion from applications like Adobe Analytics, Marketo Engage, Adobe Campaign Managed Services, Adobe Target, and Adobe Audience Manager.
Protocols: Push, Source Connectors, and API
Considerations:
  • The recommended approach is to migrate to the Web/Mobile SDK instead of using traditional application SDKs.

Method: Streaming Sources
Common use cases:
  • Ingestion of an enterprise event stream, typically used for sharing enterprise data with multiple downstream applications.
Protocols: Push, REST API, JSON
Considerations:
  • Data is streamed in JSON format and can be mapped to an XDM schema.

Method: Streaming Sources SDK
Common use cases:
  • Use the self-service capabilities of the Self-Serve Sources Streaming SDK to integrate your own data source into the Experience Platform sources catalog.
Protocols: Push, HTTP API, JSON
Considerations:
  • Examples of partner-integrated streaming sources include Braze, Pendo, and RainFocus.

Batch ingestion methods

The following table outlines the variety of methods that you can use to ingest batch data to Experience Platform.

Method: Batch Ingestion API
Common use cases:
  • Ingestion from an enterprise-managed queue. Use batch ingestion if your data needs to be prepared and formatted prior to ingestion.
Protocols: Push, JSON or Parquet
Considerations:
  • You must manage batches and files for ingestion.

Method: Batch Sources
Common use cases:
  • Common approach for ingestion of data from cloud storage, CRM, and marketing automation applications.
  • Ideal for ingesting large amounts of historical data.
Protocols: Pull, CSV, JSON, Parquet
Considerations:
  • Source ingestion runs on pre-configured scheduled intervals.

Method: Data Landing Zone
Common use cases:
  • Adobe-provisioned, cloud-based file storage. You have access to one Data Landing Zone container per sandbox.
  • Push your files to the Data Landing Zone for later ingestion into Experience Platform.
Protocols: Push, CSV, JSON, Parquet
Considerations:
  • Experience Platform enforces a strict seven-day expiration time on all files and folders uploaded to a Data Landing Zone container. All files and folders are deleted after seven days.

Method: Batch Sources SDK
Common use cases:
  • Use the self-service capabilities of the Self-Serve Sources Batch SDK to integrate your own data source into the Experience Platform sources catalog.
  • Ideal for partner connectors or for a tailored workflow experience for setting up an enterprise connector.
Protocols: Pull, REST API, CSV or JSON
Considerations:
  • Examples of partner-integrated batch sources include Mailchimp, OneTrust, and Zendesk.

Next steps and additional resources

This document provided a brief introduction to the different aspects of data ingestion in Experience Platform. Please continue to read the overview documentation for each ingestion method to familiarize yourself with their different capabilities, use cases, and best practices. You can also supplement your learning by watching the ingestion overview video below. For information on how Experience Platform tracks the metadata for ingested records, see the Catalog Service overview.

WARNING
The term "Unified Profile" that is used in the following video is out-of-date. The terms "Profile" or "Real-Time Customer Profile" are the correct terms used in the Experience Platform documentation. Please refer to the documentation for the latest functionality.


Transcript
Hi there, I'm going to give you a quick overview of how to ingest data into Adobe Experience Platform. Data ingestion is a fundamental step to getting your data into Experience Platform so you can use it to build 360-degree, real-time customer profiles and use them to provide meaningful experiences. Adobe Experience Platform allows data to be ingested from various external sources while giving you the ability to structure, label, and enhance incoming data using Platform services. You can ingest data from various sources such as Adobe applications, enterprise sources, and databases, stream data using a web or mobile SDK, and many others. Platform is API friendly and lets you ingest data using batch and streaming APIs. Experience Platform provides tools to ensure that the ingested data is XDM compliant and helps prepare that data for real-time customer profiles and other services. You can ingest data into Platform using various sources. You can configure a streaming source connector in Platform that provides an HTTP API endpoint, and then you can either do a batch ingestion or stream data into Platform using the endpoint. You can drag and drop files into the UI and ingest them in batch mode. You can also configure a source connector in the UI that will ingest data from the origin system using the most appropriate mode for that system. Source connectors ingest data using either batch ingestion or streaming ingestion. Platform provides you with a state-of-the-art streaming infrastructure to collect, enrich, and activate data in real time. Streaming ingestion APIs make it easy for customers to ingest data from real-time messaging systems, other first-party systems, and partners. When data is streamed to Platform, the data is verified to ensure that it's coming from trusted sources and is in the XDM format. The data is then placed on the Experience Platform pipeline for consumption by other services as fast as possible.
Different services within Platform then consume the data from the pipeline. In the next step, the data is stored in the Data Lake as a dataset, comprised of batches and files that can be accessed by various Platform components. All datasets contain a reference to the XDM schema that constrains the format and structure of the data that they can store. Attempting to upload data that does not conform to the dataset's XDM schema will cause ingestion to fail. Any data that is configured to be processed into the profile gets flagged for immediate processing into the identity graph and profile store. With Real-Time Customer Profile, you can see a holistic view of each individual customer by combining data from multiple channels, including online, offline, CRM, and third party. Profile allows you to consolidate your customer data into a unified view, offering an actionable, timestamped account of every customer interaction. With Adobe Experience Platform Query Service, you can bring all your stored customer datasets, including behavioral, CRM, point-of-sale data, and more, into one place and run fast petabyte-scale SQL queries to discover the story behind customer behavior and generate impactful insights using a BI tool of your choice. Although Real-Time Customer Profile requires real-time data ingestion and activation, there are still many use cases where batch ingestion is needed. Many first-party and third-party systems do not support streaming ingestion yet. Plus, you might want to completely refresh the data in Platform with an updated version from your own data lake, such as a monthly refresh of your product catalog. In addition, if you want to upload large volumes of data, batch ingestion is still the optimal method to load terabytes of data into Platform. To support these use cases, Platform provides batch data ingestion pipelines that allow you to ingest data from any system.
The batch pipeline validates, transforms, and partitions data before it's stored in the Data Lake. This ensures that the data is stored in the most optimized format to support easy access at petabyte scale. Let's take a quick look at a source connector example to get a better understanding. When you log into Platform, you will see sources in the left navigation. Clicking sources will take you to the source catalog screen where you can see all of the source connectors currently available in Platform. For our video, let's use the Amazon S3 cloud storage to perform a batch ingestion. Click on the add data option, choose an existing Amazon account, and then move to the next step. In this step, we choose the source file for data ingestion and verify the file data format. Note that the ingested file data can be formatted as XDM JSON, XDM Parquet, or delimited. Currently, for delimited files, you have an option to preview sample data of the source file. You can also choose a custom delimiter for your source data. For streaming and batch ingestion, Adobe Experience Platform currently supports the following file formats. For data ingestion, another requirement is to have a dataset to store the incoming data. A dataset is a storage and management construct for a collection of data, typically a table, that contains columns derived from a schema, and the ingested data gets stored as rows. All datasets are based on existing XDM schemas, which provide constraints for what the ingested data should contain and how it should be structured. Experience Platform uses schemas to describe the structure of data in a consistent and reusable way. Before ingesting data into Platform, a schema must be composed to describe the data structure and provide constraints on the type of data that can be contained within each field, so data can be validated as it moves between systems. A schema consists of a base class and zero or more mix-ins. First, you assign a class that defines what the schema is.
For example, an individual profile or an experience event. Next, you can add mix-ins, which are reusable components defining fields like personal details, preferences, or addresses. Adobe Experience Platform provides standard classes and mix-ins related to these classes. If there is a need, you can also define a custom class or a custom mix-in for your use case. Data prep allows data engineers to map, transform, and validate source data to and from Experience Data Model (XDM). Data prep appears as a mapping step in the data ingestion process. Data engineers can use data prep to perform data manipulation during ingestion. You can define simple pass-through mappings to assign source input attributes to XDM target attributes, or create calculated fields to perform in-row calculations that can be assigned to XDM attributes. In this example, you can combine the first name and the last name source fields to populate the full name field in the target schema using a concatenation operation. Similarly, you can also transform a particular field by applying string, numeric, or date manipulation functions provided by Platform. Let's select a frequency for this batch ingestion and move to the next step. With the help of error diagnostics, Platform allows users to generate error reports for newly ingested batches. Error diagnostics for failed records can be downloaded using the API. Partial ingestion enables the ingestion of valid records of new batch data within a specified error threshold. The error threshold enables the configuration of the acceptable number of errors before the entire batch fails. Let's review the changes and save the configuration. At this step, we have successfully configured a data ingestion flow from a source location to Platform. Adobe Experience Platform allows data to be ingested from various external sources while giving you the ability to structure, label, and enhance incoming data using Platform services.