Data Ingestion overview

In Adobe Experience Platform, data ingestion is the transportation of data from assorted sources to a storage medium where it can be accessed, used, and analyzed by an organization. Data ingestion in Experience Platform can be grouped into two main categories: streaming ingestion and batch ingestion.

Under streaming and batch ingestion are a number of different methods that you can use to ingest your data into Experience Platform. These methods include connecting to a variety of sources and then bringing data from those sources into Experience Platform.

Read this document for an overview of the many different ways that data can be ingested into Experience Platform.

Streaming ingestion

You can use streaming ingestion to send data from client and server-side devices to Experience Platform in real-time. Experience Platform supports the use of data inlets to stream incoming experience data, which is persisted in streaming-enabled datasets within the data lake. Data inlets can be configured to automatically authenticate the data they collect, ensuring that the data is coming from a trusted source.
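To make the streaming flow concrete, the sketch below assembles a JSON event shaped for a streaming-enabled dataset. This is an illustrative sketch only, not Adobe's official client: the inlet ID, endpoint URL, and schema ID are hypothetical placeholders, and the actual HTTP POST is omitted so the example stays offline.

```python
import json
from datetime import datetime, timezone

# Hypothetical data inlet ID and endpoint -- replace with your own values.
INLET_ID = "example-inlet-id"
ENDPOINT = f"https://dcs.adobedc.net/collection/{INLET_ID}"

def build_xdm_event(email: str, page_name: str) -> dict:
    """Assemble a minimal XDM-shaped JSON payload for streaming ingestion."""
    return {
        "header": {
            "schemaRef": {
                # Hypothetical schema ID; use the ID of your own XDM schema.
                "id": "https://ns.adobe.com/example/schemas/abc123",
                "contentType": "application/vnd.adobe.xed-full+json;version=1",
            }
        },
        "body": {
            "xdmEntity": {
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "personalEmail": {"address": email},
                "web": {"webPageDetails": {"name": page_name}},
            }
        },
    }

event = build_xdm_event("test@example.com", "home")
payload = json.dumps(event)
# The payload would then be POSTed to ENDPOINT with
# Content-Type: application/json, using any HTTP client.
```

In a real flow, the data inlet validates the incoming payload against the referenced schema, which is how the authentication and trust guarantees described above are enforced.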

For more information, read the streaming ingestion overview.

Batch ingestion

In Experience Platform, a batch is a set of data collected over a period of time and processed together as a single unit. Datasets are made up of batches. You can use batch ingestion to ingest data into Experience Platform as batch files. Once ingested, batches provide metadata that describes the number of records successfully ingested, as well as any failed records and associated error messages.

Manually uploaded data files, such as flat CSV files (mapped to XDM schemas) and Parquet files, must be ingested using this method.
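The idea of a batch as "a set of data processed together as a single unit" can be sketched as preparing one or more part files that are uploaded and completed together. The helper below is a hypothetical illustration of that preparation step; the actual upload and completion calls to the Batch Ingestion API are omitted.

```python
import json
import os
import tempfile

def write_batch_files(records, out_dir, max_records_per_file=1000):
    """Split records into JSON Lines part files that together form one batch."""
    paths = []
    for i in range(0, len(records), max_records_per_file):
        chunk = records[i:i + max_records_per_file]
        path = os.path.join(out_dir, f"part-{i // max_records_per_file:05d}.json")
        with open(path, "w", encoding="utf-8") as f:
            for record in chunk:
                f.write(json.dumps(record) + "\n")
        paths.append(path)
    return paths

records = [{"id": n, "email": f"user{n}@example.com"} for n in range(5)]
with tempfile.TemporaryDirectory() as out_dir:
    parts = write_batch_files(records, out_dir, max_records_per_file=2)
    num_parts = len(parts)  # 5 records at 2 per file yields 3 part files
```

Keeping part files to a bounded size is what makes the per-batch metadata (record counts, failed records, error messages) tractable to report after ingestion.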

For more information, read the batch ingestion overview.

Sources

You can also ingest data by connecting to Experience Platform Sources. Experience Platform maintains a catalog of a variety of different data sources that you can connect to and ingest data from. These sources can be native Adobe applications such as the Adobe Analytics source or the Marketo Engage source. You can also connect to third-party sources such as the Amazon S3 source and the Google Cloud Storage source.

Sources are grouped into different categories like cloud storage, databases, and CRM systems. A given source may support batch or streaming ingestion.

With sources, you can ingest data from a number of different data sources spanning a variety of use case categories. Additionally, data ingestion via a source gives you the opportunity to authenticate against the external data source, configure an ingestion schedule, and manage ingestion throughput.

For more information, read the sources overview.

ML-assisted schema creation

To quickly integrate new data sources, you can now use machine learning algorithms to generate a schema from sample data. This automation simplifies the creation of accurate schemas, reduces errors, and speeds up the process from data collection to analysis and insights.
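The general idea of deriving a schema from sample data can be illustrated with a small sketch. This is not Adobe's algorithm, just a hypothetical example of type inference over sample records, widening conflicting types toward a common one:

```python
def infer_schema(samples):
    """Infer a flat field-to-type mapping from sample records (illustration only)."""
    schema = {}
    for record in samples:
        for key, value in record.items():
            # bool must be checked before int: isinstance(True, int) is True.
            t = ("boolean" if isinstance(value, bool)
                 else "integer" if isinstance(value, int)
                 else "number" if isinstance(value, float)
                 else "string")
            prev = schema.get(key)
            if prev is None or prev == t:
                schema[key] = t
            elif {prev, t} == {"integer", "number"}:
                # Widen integer to number when both appear.
                schema[key] = "number"
            else:
                # Fall back to string on any other type conflict.
                schema[key] = "string"
    return schema
```

For example, a field that appears as both an integer and a float in the sample is widened to a numeric type, while irreconcilable types fall back to string; a production system would also handle nested objects, arrays, and semantic types like dates.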

See the ML-assisted schema creation guide for more information on this workflow.

Data Prep

While data prep is not a method of ingestion, it is an important part of the data ingestion process. Use data prep functions to map, transform, and validate data to and from Experience Data Model (XDM) before creating a dataflow to ingest your data to Experience Platform. Data prep appears as the "Mapping" step in the Experience Platform user interface during the data ingestion process.
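The kind of transformation the Mapping step performs can be sketched in plain Python. The snippet below is an illustration only, not Adobe's data prep engine or its mapping-expression syntax; the source field names and XDM-style target paths are hypothetical. It shows both pass-through mappings and a calculated field (concatenating first and last name):

```python
# Hypothetical mapping table: XDM-style target path -> function over a source row.
MAPPINGS = {
    "person.name.firstName": lambda src: src["first_name"],  # pass-through
    "person.name.lastName": lambda src: src["last_name"],    # pass-through
    # Calculated field: concatenate two source fields into one target field.
    "person.name.fullName": lambda src: f"{src['first_name']} {src['last_name']}",
}

def apply_mappings(source_row, mappings=MAPPINGS):
    """Map a flat source record onto nested, dotted XDM-style target paths."""
    target = {}
    for path, fn in mappings.items():
        node = target
        *parents, leaf = path.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = fn(source_row)
    return target
```

Applying this to a row like `{"first_name": "Ada", "last_name": "Lovelace"}` produces a nested record whose `person.name.fullName` field is the concatenation of the two source fields, mirroring the calculated-field workflow described above.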

For more information, read the data prep overview.

Streaming ingestion methods

The following table outlines the variety of methods that you can use to ingest streaming data to Experience Platform.

Method: Adobe Web/Mobile SDK
Common use cases:
  • Data collection from websites and mobile apps.
  • Preferred method for client-side collection.
Protocols: Push, HTTP, JSON
Considerations:
  • Implement multiple Adobe applications leveraging a single SDK.

Method: HTTP API Connector
Common use cases:
  • Collection from streaming sources, transactions, and relevant customer events and signals.
Protocols: Push, REST API, JSON
Considerations:
  • Raw or XDM data is streamed directly to the hub, with no real-time Edge segmentation or event forwarding.

Method: Edge Network API
Common use cases:
  • Collection from streaming sources, transactions, and relevant customer events and signals from the globally distributed Edge Network.
Protocols: Push, REST API, JSON
Considerations:
  • Data is streamed through the Edge Network, with support for real-time segmentation and event forwarding on the Edge.

Method: Adobe Applications
Common use cases:
  • Data ingestion from applications like Adobe Analytics, Marketo Engage, Adobe Campaign Managed Services, Adobe Target, and Adobe Audience Manager.
Protocols: Push, Source Connectors, and API
Considerations:
  • The recommended approach is to migrate to the Web/Mobile SDK instead of using traditional application SDKs.

Method: Streaming Sources
Common use cases:
  • Ingestion of an enterprise event stream, typically used for sharing enterprise data with multiple downstream applications.
Protocols: Push, REST API, JSON
Considerations:
  • Data is streamed in JSON format and can be mapped to an XDM schema.

Method: Streaming Sources SDK
Common use cases:
  • Use the self-service capabilities of the Self-Serve Sources Streaming SDK to integrate your own data source into the Experience Platform sources catalog.
Protocols: Push, HTTP API, JSON
Considerations:
  • Examples of partner-integrated streaming sources include Braze, Pendo, and RainFocus.

Batch ingestion methods

The following table outlines the variety of methods that you can use to ingest batch data to Experience Platform.

Method: Batch Ingestion API
Common use cases:
  • Ingestion from an enterprise-managed queue. Use batch ingestion if your data needs to be prepared and formatted prior to ingestion.
Protocols: Push, JSON or Parquet
Considerations:
  • You must manage batches and files for ingestion.

Method: Batch Sources
Common use cases:
  • Common approach for ingestion of data from cloud storage, CRM, and marketing automation applications.
  • Ideal for ingesting large amounts of historical data.
Protocols: Pull, CSV, JSON, Parquet
Considerations:
  • Source ingestion runs on pre-configured scheduled intervals.

Method: Data Landing Zone
Common use cases:
  • Adobe-provisioned, cloud-based file storage. You have access to one Data Landing Zone container per sandbox.
  • Push your files to the Data Landing Zone for later ingestion into Experience Platform.
Protocols: Push, CSV, JSON, Parquet
Considerations:
  • Experience Platform enforces a strict seven-day expiration time on all files and folders uploaded to a Data Landing Zone container. All files and folders are deleted after seven days.

Method: Batch Sources SDK
Common use cases:
  • Use the self-service capabilities of the Self-Serve Sources Batch SDK to integrate your own data source into the Experience Platform sources catalog.
  • Ideal for partner connectors or for a tailored workflow experience for setting up an enterprise connector.
Protocols: Pull, REST API, CSV or JSON
Considerations:
  • Examples of partner-integrated batch sources include Mailchimp, OneTrust, and Zendesk.

Next steps and additional resources

This document provided a brief introduction to the different aspects of data ingestion in Experience Platform. Please continue to read the overview documentation for each ingestion method to familiarize yourself with their different capabilities, use cases, and best practices. You can also supplement your learning by watching the ingestion overview video below. For information on how Experience Platform tracks the metadata for ingested records, see the Catalog Service overview.

WARNING
The term "Unified Profile" that is used in the following video is out-of-date. The terms "Profile" or "Real-Time Customer Profile" are the correct terms used in the Experience Platform documentation. Please refer to the documentation for the latest functionality.


Transcript
Hi there, I'm going to give you a quick overview of how to ingest data into Adobe Experience Platform. Data ingestion is a fundamental step to getting your data into Experience Platform so you can use it to build 360-degree, real-time customer profiles and use them to provide meaningful experiences. Adobe Experience Platform allows data to be ingested from various external sources while giving you the ability to structure, label, and enhance incoming data using Platform services. You can ingest data from various sources such as Adobe applications, enterprise sources, and databases, stream data using a web or mobile SDK, and many others. Platform is API friendly and lets you ingest data using batch and streaming APIs. Experience Platform provides tools to ensure that the ingested data is XDM compliant and helps prepare that data for real-time customer profiles and other services. You can ingest data into Platform using various sources. You can configure a streaming source connector in Platform that provides an HTTP API endpoint, and then you can either do a batch ingestion or stream data into Platform using the endpoint. You can drag and drop files into the UI and ingest them in batch mode. You can also configure a source connector in the UI that will ingest data from the origin system using the most appropriate mode for that system. Source connectors ingest data using either batch ingestion or streaming ingestion. Platform provides you with a state-of-the-art streaming infrastructure to collect, enrich, and activate data in real time. Streaming ingestion APIs make it easy for customers to ingest data from real-time messaging systems, other first-party systems, and partners. When data is streamed to Platform, the data is verified to ensure that it's coming from trusted sources and is in the XDM format. The data is then placed on the Experience Platform pipeline for consumption by other services as fast as possible.
Different services within Platform then consume the data from the pipeline. In the next step, the data is stored in the Data Lake as a dataset, comprised of batches and files that can be accessed by various Platform components. All datasets contain a reference to the XDM schema that constrains the format and structure of the data that they can store. Attempting to upload data that does not conform to the dataset's XDM schema will cause ingestion to fail. Any data that is configured to be processed into the profile gets flagged for immediate processing into the identity graph and profile store. With Real-Time Customer Profile, you can see a holistic view of each individual customer by combining data from multiple channels, including online, offline, CRM, and third party. Profile allows you to consolidate your customer data into a unified view, offering an actionable, timestamped account of every customer interaction. With Adobe Experience Platform Query Service, you can bring all your stored customer datasets, including behavioral, CRM, point-of-sale data, and more, into one place and run fast petabyte-scale SQL queries to discover the story behind customer behavior and generate impactful insights using a BI tool of your choice. Although Real-Time Customer Profile requires real-time data ingestion and activation, there are still many use cases where batch ingestion is needed. Many first-party and third-party systems do not support streaming ingestion yet. Plus, you might want to completely refresh the data in Platform with an updated version from your own data lake, such as a monthly refresh of your product catalog. In addition, if you want to upload large volumes of data, batch ingestion is still the optimal method to load terabytes of data into Platform. To support these use cases, Platform provides batch data ingestion pipelines that allow you to ingest data from any system.
The batch pipeline validates, transforms, and partitions data before it's stored in the Data Lake. This ensures that the data is stored in the most optimized format to support easy access at petabyte scale. Let's take a quick look at a source connector example to get a better understanding. When you log into Platform, you will see sources in the left navigation. Clicking sources will take you to the source catalog screen where you can see all of the source connectors currently available in Platform. For our video, let's use the Amazon S3 cloud storage to perform a batch ingestion. Click on the add data option, choose an existing Amazon account, and then move to the next step. In this step, we choose the source file for data ingestion and verify the file data format. Note that the ingested file data can be formatted as XDM JSON, XDM Parquet, or delimited. Currently, for delimited files, you have an option to preview sample data of the source file. You can also choose a custom delimiter for your source data. For streaming and batch ingestion, Adobe Experience Platform currently supports the following file formats. For data ingestion, another requirement is to have a dataset to store the incoming data. A dataset is a storage and management construct for a collection of data, typically a table, that contains columns derived from a schema, and the ingested data gets stored as rows. All datasets are based on existing XDM schemas, which provide constraints for what the ingested data should contain and how it should be structured. Experience Platform uses schemas to describe the structure of data in a consistent and reusable way. Before ingesting data into Platform, a schema must be composed to describe the data structure and provide constraints on the type of data that can be contained within each field, so data can be validated as it moves between systems. A schema consists of a base class and zero or more mix-ins. First, you assign a class that defines what the schema is.
For example, an individual profile or an experience event. Next, you can add mix-ins, which are reusable components defining fields like personal details, preferences, or addresses. Adobe Experience Platform provides standard classes and mix-ins related to these classes. If there is a need, you can also define a custom class or a custom mix-in for your use case. Data prep allows data engineers to map, transform, and validate source data to and from Experience Data Model (XDM). Data prep appears as a mapping step in the data ingestion process. Data engineers can use data prep to perform data manipulation during ingestion. You can define simple pass-through mappings to assign source input attributes to XDM target attributes, or create calculated fields to perform in-row calculations that can be assigned to XDM attributes. In this example, you can combine the first name and the last name source fields to populate the full name field in the target schema using a concatenation operation. Similarly, you can also transform a particular field by applying string, numeric, or date manipulation functions provided by Platform. Let's select a frequency for this batch ingestion and move to the next step. With the help of error diagnostics, Platform allows users to generate error reports for newly ingested batches. Error diagnostics for failed records can be downloaded using the API. Partial ingestion enables the ingestion of valid records of new batch data within a specified error threshold. The error threshold enables the configuration of the acceptable number of errors before the entire batch fails. Let's review the changes and save the configuration. At this step, we have successfully configured a data ingestion flow from a source location to Platform. Adobe Experience Platform allows data to be ingested from various external sources while giving you the ability to structure, label, and enhance incoming data using Platform services.