Data Distiller 101
This Data Distiller overview will demonstrate how to overcome common Data Distiller challenges, along with the key use cases and best practices for success.
Key Discussion Points
- Data Distiller overview
- Data Distiller FAQs and their solutions
- Key use cases
Hi, everyone.
We're going to give about a minute for folks to join. So as people are filtering in, the topic of our webinar today is Data Distiller 101. If you've been curious about Data Distiller, the purpose of this webinar is to provide an overview of the product, the key use cases it supports, and some of the most common customer solutions for Data Distiller. Some housekeeping items as we get started. This webinar is being recorded, and the recording will be shared with you after the session. All participants are in listen-only mode. However, if you do have questions, go ahead and post those in the meeting chat pod, and we'll have time for Q&A at the end. We'll go through your questions, and if there's anything that we are unable to answer live today, we will do our best to take those away and follow up. I'll go ahead and go through our agenda for today as well. First and foremost, let me introduce our presenters. We firstly have Rizal Alam. Rizal is a principal consultant with Adobe for 17 years. He's a multi-solution architect focusing on Adobe Experience Cloud solutions and Adobe Experience Platform App Services. And I will be the other presenter today. I'm Brenna Skurlock. I'm a senior consultant with Adobe for a little over two and a half years. I'm a field engineer focusing on Experience Platform and related applications. Our agenda for today is going to include the Data Distiller overview. Within that first section, we're going to discuss the primary business drivers for using Data Distiller. We'll cover the value proposition of the tool as well as the key capabilities. And then we're going to jump into some use cases, so we'll walk you through the primary use case patterns that have been identified for the Data Distiller product.
And then we have selected five use cases to do a closer walkthrough on with you. All right. Then we'll wrap up the rest of the session by providing you with some additional resources to learn more about Data Distiller, and we'll do Q&A at the end. All right. We're just about at five after the hour, so I think we can go ahead and get started. I'm going to pass it over to Rizal for the Data Distiller overview. Thank you very much, Brenna. So we're going to do an overview of what Data Distiller is all about. Let's go to the next slide, where we'll discuss the main drivers for using Data Distiller. We usually have different personas when it comes to users of Data Distiller: the data architect, the data engineer, the data scientist, and also the marketing analyst. Why is there an opportunity for those personas to leverage Data Distiller? Primarily because, once you have ingested the data into the data lake, there is sometimes additional data massaging that needs to happen. An example here would be that, as an author, I want to be able to build the segments that the business needs with curated or even additional contextual data. That means that sometimes, after you have ingested the data into the data lake, there are additional insights that need to be derived from that data. What you can do is use Data Distiller, through Query Service, to retrieve those insights from the data you have ingested. An example here would be customer lifetime value. Usually this particular data point is not available as part of the data that you have ingested. If it's not available, you can leverage Data Distiller to derive that data point and use it in your marketing, in your segments. The other aspect here is that I want to use this data in ways that suit the different use cases of my stakeholders. The idea here is that the data comes from multiple areas of the business.
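The customer-lifetime-value example mentioned above can be sketched in SQL. The following is a minimal illustration using Python's built-in `sqlite3` module standing in for Query Service; the table and column names (`purchases`, `customer_id`, `amount`) are hypothetical, not actual AEP schema names.

```python
import sqlite3

# In-memory database standing in for the AEP data lake.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE purchases (customer_id TEXT, amount REAL)")
con.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [("c1", 120.0), ("c1", 80.0), ("c2", 40.0)],
)

# Derive a lifetime-value data point that was not part of the ingested data.
rows = con.execute(
    """
    SELECT customer_id, SUM(amount) AS lifetime_value
    FROM purchases
    GROUP BY customer_id
    ORDER BY customer_id
    """
).fetchall()
print(rows)  # [('c1', 200.0), ('c2', 40.0)]
```

The derived value can then be written to a new dataset and referenced in segmentation, which is the pattern the presenters describe.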
You have multiple use cases, so sometimes you need to bring all that data together and derive insights from those multiple data points. The way you deliver multiple use cases on all that data is by creating a new data set from those data sets. Say, for example, you have ingested five data sets and you want to derive information from all five. What you can do is use Data Distiller to bring all that data together into one data set, which is called a derived data set. That satisfies multiple stakeholders' desire to use the data, and also the multiple use cases that you have. The other aspect where we see Data Distiller being used is in gaining operational and transparency insights around the profile. Usually what happens is that you have your data from Adobe Journey Optimizer, and you have your data from Adobe Experience Platform in the data lake. What you want to be able to do is report on that data from Journey Optimizer and on your segments in AEP. What you can do is bring this information together and create operational dashboards that allow you to gain insights such as how many emails were sent, opened, and clicked, and you can tie this to the different segments that were driving that activity. You can build these dashboards using Data Distiller. Let's go on to the next slide. The value proposition here is the three main use cases that we have come across. As mentioned, we have the derived data set, where you have multiple data sets that you have ingested; you run your SQL queries, bring all that data set information together into one derived data set, and then leverage that data set for your segmentation.
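The derived-data-set idea above, several ingested datasets joined into one table that downstream segmentation reads, might look like the sketch below. Again `sqlite3` stands in for Query Service, and the dataset names (`crm`, `web_events`, `derived_profile`) are illustrative assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Two of the (hypothetical) ingested datasets.
con.execute("CREATE TABLE crm (customer_id TEXT, region TEXT)")
con.execute("CREATE TABLE web_events (customer_id TEXT, page_views INTEGER)")
con.executemany("INSERT INTO crm VALUES (?, ?)", [("c1", "BR"), ("c2", "AR")])
con.executemany("INSERT INTO web_events VALUES (?, ?)", [("c1", 12), ("c2", 3)])

# CREATE TABLE AS SELECT materializes the join as a single derived dataset,
# which multiple stakeholders and use cases can then read from one place.
con.execute(
    """
    CREATE TABLE derived_profile AS
    SELECT c.customer_id, c.region, COALESCE(w.page_views, 0) AS page_views
    FROM crm c
    LEFT JOIN web_events w ON w.customer_id = c.customer_id
    """
)
derived = con.execute(
    "SELECT * FROM derived_profile ORDER BY customer_id"
).fetchall()
print(derived)  # [('c1', 'BR', 12), ('c2', 'AR', 3)]
```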
The other aspect is reporting, as mentioned, around AJO and CDP, where you are able to bring all this information together and then create operational reporting dashboards. And in 2023 we also launched the AI/ML feature pipeline, which we will go through as well. So these are the three main use cases, but we have come across other use cases too. What I would like to mention here is that even though you're going to have all those use cases, they will fall into four primary categories. You will have the cleaning of the data that you need to do. Once you've done that, you can shape the data the way you want. Then you manipulate the data, and then you enrich it. As we go through these use cases, we're going to dive into the cleaning, the shaping, the manipulating, and the enriching of the data using Data Distiller. Let's go on to the next slide. So where does Data Distiller sit? As you bring the data into AEP, you have batch ingestion and streaming ingestion. As part of the data ingestion, whether it is streaming or batch, you have data prep. Data prep allows you to make some changes; you can manipulate the data as it streams into AEP. But once the data lands in the data lake, there's not much transformation that can be done there. As in the example I gave earlier, you have multiple data sets and you want to derive meaningful data from all the information you brought into AEP. This is where the red line comes in: from the data lake onwards is where Data Distiller starts. From the data lake onwards, you will be able to do the cleaning of the data, the enriching of the data, and the shaping and manipulation of the data. Once you've done all of this, you can create other data sets. You can create the dashboards.
You can send these across to multiple app services, such as Customer Journey Analytics or Adobe Journey Optimizer, and any BI tools that you may have. Let's go on to the next slide. This is the slide where we bring up the different capabilities of Data Distiller. It's SQL-based: we have the SQL-based processing engine, and it's scalable. The way we measure the use of Data Distiller is using compute. The other aspect here is that you have what we call Adobe-defined functions. You have anonymous blocks, snapshot-based incremental processing, and sampling. Anonymous blocks, for example, let you chain queries: you start with one query and only go to the next query after the first query has run. Let's say you have three data sets and you are running a query on those three data sets. You want to make sure that query has finished before you move on to the next step of your findings. That's what anonymous blocks do. In terms of operationalizing data processing workflows, we have easy automation and scheduling as part of Data Distiller. You also have monitoring and alerting, and license usage monitoring. In terms of integration workflows for delivering extended insights, as mentioned, you can take the derived data set that you have built through this manipulation of your data, connect it to your BI tools, and extend the reporting capabilities that you currently have in AEP using Data Distiller. You can share all that data with any third-party clients as well. I'll pass it on to you now, Brenna. Yes, thank you, Rizal. All right, so now that you have a good understanding of the business purpose of Data Distiller, let's talk about some of those specific use cases.
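The anonymous-block behavior described above, each statement running only after the previous one has finished, can be mimicked locally with `sqlite3.executescript`, which executes a multi-statement script in order. This is a sketch of the ordering guarantee only, not Data Distiller's actual block syntax; the `raw`/`staging`/`final` table names are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Each statement depends on the one before it, so strict ordering matters:
# the second CREATE TABLE AS would fail if `staging` did not exist yet.
con.executescript(
    """
    CREATE TABLE raw (id INTEGER, amount REAL);
    INSERT INTO raw VALUES (1, 10.0), (2, 20.0), (2, 5.0);

    CREATE TABLE staging AS
    SELECT id, SUM(amount) AS total FROM raw GROUP BY id;

    CREATE TABLE final AS
    SELECT id, total FROM staging WHERE total > 10.0;
    """
)
result = con.execute("SELECT id, total FROM final ORDER BY id").fetchall()
print(result)  # [(2, 25.0)]
```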
All right, so before we dig into the specific use cases, let me outline the four use case patterns, or categories, that have been identified for Data Distiller. These are patterns identified and defined based on customer feedback and interactions, as well as actual usage of the product. Our first pattern is clean. Cleaning allows the user to standardize data and do rule-based filtering, such as filtering noisy bot data out, identity cleansing, or performing data quality checks. The second category is shape. This category allows you to reshape the data. Let's say you want to reformat your incoming data in terms of array manipulation, transposing the data, joining data from an existing data set, or mapping some additional IDs into it. Currency standardization is another example of reshaping the data.
Our third use case category is manipulate. So this pattern is where we augment the data to get to the granularity that we’re looking for. So for example, this could be adding a new field that includes some aggregate data to support BI reporting use cases, or for example, using a windowing function to limit the data to only data within a defined reporting period. All right, and then finally, our fourth use case category is enrich. So this use case pattern allows enrichment of the data set by deriving additional attributes for downstream audience or campaign activation or analysis use cases. All right, so next we’re going to describe some of those specific use cases with you, and you’ll see how they fit into these four use case patterns.
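The four patterns above, clean, shape, manipulate, and enrich, can be illustrated together in one small query. This is a toy sketch with `sqlite3` standing in for Query Service; the `hits` table, the bot filter, the reporting window, and the `value_tier` attribute are all invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE hits (customer_id TEXT, agent TEXT, revenue REAL, day TEXT)")
con.executemany(
    "INSERT INTO hits VALUES (?, ?, ?, ?)",
    [
        ("c1", "browser", 50.0, "2024-01-05"),
        ("c1", "bot", 0.0, "2024-01-06"),       # noisy bot row to clean out
        ("c2", "browser", 20.0, "2023-12-30"),  # outside the reporting window
        ("c2", "browser", 30.0, "2024-01-10"),
    ],
)

rows = con.execute(
    """
    SELECT customer_id,
           SUM(revenue) AS total_revenue,                -- manipulate: aggregate
           CASE WHEN SUM(revenue) >= 50 THEN 'high'
                ELSE 'low' END AS value_tier             -- enrich: derived attribute
    FROM hits
    WHERE agent != 'bot'                                 -- clean: filter bot traffic
      AND day BETWEEN '2024-01-01' AND '2024-01-31'      -- manipulate: reporting window
    GROUP BY customer_id                                 -- shape: one row per customer
    ORDER BY customer_id
    """
).fetchall()
print(rows)  # [('c1', 50.0, 'high'), ('c2', 30.0, 'low')]
```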
All right, so these are the five use cases that we are going to be walking you through today. Let me start with use case number one. This first use case is about a South American retail company that wanted to understand how its customers interact across its multiple brands. The problem was that their transaction records, browsing history, CRM data, payment and profile data were all defined at the root company level, not at the brand level. What this meant was that if a customer purchased items from three different brands, or logged into their three different accounts tied to the different brands, the company had no way to see this across those brand silos, right? They only knew that three items had been purchased and that the user had logged in multiple times. They only had insight into the totals for the company, but what they wanted was to get more granular, at the brand level. So this is where Data Distiller came into play. By using Data Distiller, they were able to take all their data, group it by the individual brands, and generate a brand-centric data model, which was then used within their BI dashboards. So let's look at the specific solution here, and you can see how Data Distiller was leveraged. Under the source tables are the feeds that they were originally ingesting: browsing history, CRM, online and offline transactions, payment app, profile attributes.
By using Data Distiller, they were able to join all of these data sets together via the ECID, customer ID, and device IDs, and regenerate that into a data model defined at the brand level, which is the output that you see to the right of those source tables. After the data had been shaped and manipulated, they now had new tables like spend per customer, overall sales, average order value, and number of customers, all rolled up to the brand level, which is what they were looking for in their reporting. Now, within their reporting tool, they could not only view data at the brand level, but also compare brands to each other and understand the synergies and interactions between them. Just to summarize the key benefits of Data Distiller in this use case: first, this customer was able to create their own data model that was optimized for their reporting needs. Secondly, the resulting reports allowed them to understand the engagement level of customers by brand, as well as to analyze the demographics of the more engaged customers. And finally, Data Distiller allowed them to view brand performance individually as well as compare those brands to each other. All right, let's look at our second use case. This use case is for a telecommunications company that needed to enrich their next-best-offer emails with more personalized data. They wanted to understand what products a customer was browsing but not ultimately purchasing, and then use this information in an email sent out of Adobe Campaign. Using segmentation in AEP, they were able to find the customers that met that criteria, but segmentation wasn't allowing them to extract the details of the products that the customers were viewing but not purchasing. So this is where Data Distiller comes into play again. Using Data Distiller, they were able to derive those products that were part of the abandoned cart.
And they did this by joining together browsing history, product pricing, and customer information. Let's take a look at what that specific solution looked like. In this case, they were able to use Data Distiller to schedule a query to run hourly that joined their analytics data, the customer browsing behavior, to their profile attributes. Within this, they were able to select only existing customers, filtering out any prospects, and join this data to their product pricing table to select only the most expensive SKUs browsed. Then they were able to join this to the ECID mapping, which allowed them to link anonymous browsing sessions back to a customer.
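The pricing join just described, keeping only the most expensive viewed-but-not-purchased SKU per customer, might look like the sketch below. `sqlite3` again stands in for Query Service, and the `views`/`purchases`/`pricing` tables are hypothetical stand-ins for the browsing, transaction, and product-pricing datasets. (It relies on SQLite's documented behavior that bare columns in a `MAX()` aggregate come from the max row.)

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE views (customer_id TEXT, sku TEXT)")
con.execute("CREATE TABLE purchases (customer_id TEXT, sku TEXT)")
con.execute("CREATE TABLE pricing (sku TEXT, price REAL)")
con.executemany("INSERT INTO views VALUES (?, ?)",
                [("c1", "tv"), ("c1", "phone"), ("c2", "router")])
con.executemany("INSERT INTO purchases VALUES (?, ?)", [("c1", "phone")])
con.executemany("INSERT INTO pricing VALUES (?, ?)",
                [("tv", 900.0), ("phone", 600.0), ("router", 80.0)])

# Viewed-but-not-purchased SKUs, keeping the most expensive one per customer.
abandoned = con.execute(
    """
    SELECT v.customer_id, v.sku, MAX(p.price) AS price
    FROM views v
    JOIN pricing p ON p.sku = v.sku
    LEFT JOIN purchases b
      ON b.customer_id = v.customer_id AND b.sku = v.sku
    WHERE b.sku IS NULL
    GROUP BY v.customer_id
    ORDER BY v.customer_id
    """
).fetchall()
print(abandoned)  # [('c1', 'tv', 900.0), ('c2', 'router', 80.0)]
```

In the real use case, the result would be written back as profile-level attributes for segmentation and personalization.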
Then, using Data Distiller, they took this joined data and were able to derive those products viewed that were not purchased, and save those un-purchased products as profile-level attributes within a new data set. So they're using all four pillars of the use case patterns: cleaning the data, reshaping, manipulating, and enriching. Now they know which profiles abandoned a product, and they're able to create a segment built to select only those profiles which had un-purchased products, and then send that segment to Adobe Campaign. Additionally, because they now have the actual product SKUs that were not purchased, they were also able to export that list of derived products and use it within their retargeting email as personalization attributes. The key benefits of Data Distiller in this use case are the ability to derive attributes on a scheduled basis, the ability to use those derived attributes as part of segmentation, and then the ability to use those same attributes to power personalization downstream. And then our third use case. This use case is for a luxury retailer who was looking to optimize their data for reporting and attribution modeling. Their goals were to create a more advanced data-driven attribution model, to be able to explore customer journeys from email click to store or site purchase, and to unify that data under a single identity. They were able to use Data Distiller to achieve this, as well as to enrich their transaction data with details about the store in which a purchase was made and the loyalty status of the customer at the time of purchase.
Let's look at this specific solution. What the customer did in this case: the key source tables that they started with were their transaction details, product information, store information, acquisition, and loyalty member information. These tables were not directly used in CJA, but they were used as inputs to generate data sets specific to their analysis use cases. Data Distiller was leveraged in a couple of different ways. First, they used Data Distiller to shape those source data sets, including the analytics data set, by creating a new common customer ID across the data sets. They could then use that within CJA as the single identifier across all data sets. Secondly, they used Data Distiller to shape the transaction data by adding new columns to it for region and loyalty segment. They did this in a couple of different ways: firstly, by joining transaction records to the store table to find the region where the transaction took place, and secondly, by joining transactions to customer segments to retrieve the loyalty status at the time of purchase. This was important because a customer's loyalty status could change weekly, so accurate attribution to a loyalty status mattered in this case. Finally, they used Data Distiller to create new data sets that could be used within CJA. Those new data sets included a repurchasers table, which only included customers from the transactions table who had made more than a single purchase in a calendar year, as well as a gross spend per customer table, which included transaction data aggregated up to the master customer ID level. The benefits that they got out of this particular use case: they now have the ability to analyze how online touch points and engagements impact in-store purchases, and the ability to analyze marketing performance across channels, regions, and loyalty statuses.
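The repurchasers and gross-spend tables described above are straightforward aggregations. A minimal sketch, with `sqlite3` standing in for Query Service and an invented `transactions` table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE transactions (master_customer_id TEXT, year INTEGER, amount REAL)"
)
con.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [("c1", 2024, 100.0), ("c1", 2024, 250.0), ("c2", 2024, 60.0)],
)

# Repurchasers: customers with more than one purchase in a calendar year.
con.execute(
    """
    CREATE TABLE repurchasers AS
    SELECT master_customer_id, year, COUNT(*) AS purchases
    FROM transactions
    GROUP BY master_customer_id, year
    HAVING COUNT(*) > 1
    """
)

# Gross spend aggregated up to the master customer ID level.
con.execute(
    """
    CREATE TABLE gross_spend AS
    SELECT master_customer_id, SUM(amount) AS gross_spend
    FROM transactions
    GROUP BY master_customer_id
    """
)
repurch = con.execute("SELECT master_customer_id FROM repurchasers").fetchall()
spend = con.execute(
    "SELECT * FROM gross_spend ORDER BY master_customer_id"
).fetchall()
print(repurch, spend)  # [('c1',)] [('c1', 350.0), ('c2', 60.0)]
```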
And they are able to join their source data together as inputs to generate overall richer output data sets to power increasingly meaningful reporting. All right. And now I will pass it back over to Rizal to walk you through the last two use cases. Thank you, Brenna. So for this particular use case, we're going to look at how you can leverage Data Distiller to customize the different insights for operational dashboarding. For this particular use case, we'll be shaping the data, manipulating the data, and enriching the data as we go through. Why did our customer want to do this particular use case? Primarily to go beyond the out-of-the-box metrics available in AEP. They needed more metrics, and you can use Data Distiller to achieve that. The customer objective here was to harness the key capabilities of Data Distiller along with the out-of-the-box reporting data models in AEP, so that they are able to enrich and reshape the data for their unique reporting needs. Let's go on to the next slide. The idea, as I mentioned earlier on, is that you are able to customize those metrics from Adobe Journey Optimizer and RT-CDP. You can bring in the data related to your profiles, your segments, even your destinations, all those metrics available in AEP. You can customize all of these using Data Distiller, and the journeys and campaign dashboards as well. Bringing all those metrics together, you are able to bundle more than 60 metrics and create more than 10 different dashboards in AEP. How can you do that? You can quickly do this by building custom charts and dashboards on top of Data Distiller. But more and more of our customers want to go beyond the foundation metrics, based on their needs and unique use cases. Let's go on to the next slide. So what is it that Data Distiller provides?
It provides a variety of data visualizations. There is a slide after this one where I will show you the complexity of dashboard that you can create. In terms of data visualization, you can create these according to your requirements, and they will be in real time. With dashboard authoring, the data also comes in real time: you will be able to create dashboards based on the visualizations you have created, and that still remains real time. You have SQL data modeling. The SQL data modeling, you can think of it in terms of the use case patterns Brenna mentioned: you have to shape the data, you have to manipulate the data. For you to do that, you need to have those data sets linked to each other. How do you do that? You do data modeling, and with Data Distiller you are able to do that; it provides you the capability of doing that. Then the accelerated store. The previous two features I talked about, the data visualization and dashboard authoring, I mentioned are real time. Why is it real time? Because the accelerated store is the SQL engine that runs in real time. While you're dragging those visualizations and building them, the data comes in, the accelerated store runs the SQL in real time, and you are able to see the result in your dashboard in real time, using the accelerated store. On top of that, the key capability with those operational dashboards is that you have the ability to connect to the BI tools you use internally. And all these capabilities you can do with Data Distiller. Let's go to the next slide. The idea here is that the data comes into the data lake, and you are able to generate snapshots of the data as it comes through into AEP, into the real-time customer profile, whether it is B2B or B2C; all those metrics come in. And you have the data model.
Remember we were talking about how Data Distiller will allow you to understand how the data links between each of the data sets. So you have the profile snapshot coming in from your unified profile in AEP. You can bring this directly into your BI dashboard, or even into the dashboards in AEP. But as I mentioned, our customers need additional metrics that will provide more information to their business. How do they do that? By providing that data to Data Distiller. Data Distiller will shape the data, clean the data, enrich the data, and then create a data set. Then you have those additional metrics on top, such as the customer lifetime value we talked about. And now that this is available, you push it into the dashboard in your BI tool, or even the dashboards that you create inside AEP. Let's go to the next slide. This is an example of a dashboard that you can create using Data Distiller. As you can see here, you have the funnel, you have the different trend lines, you have the graph charts, and you have the geolocation graph there as well. So the possibilities are there for you to leverage Data Distiller with all the different capabilities, especially around the accelerated store. That dashboard is driven in real time, using the data as it comes through in AEP. Let's go on to the next slide. Now we go into use case five, where we have the capability within AEP to have an AI/ML feature pipeline with Data Distiller. What is it that our customers want to do here? They want to be able to train their AI models with Adobe data. We are talking about taking the data out of AEP into their environment, into their landscape, and then leveraging those predictions for use cases in Adobe Experience Platform.
What that means is that you are able to take that data out of AEP, run your model, and then, once you've done your predictions and your modeling, bring this back into AEP. The objective here is to leverage Data Distiller to explore and optimize the data in order to share it with your AI/ML environment for training and scoring. Once you've done the training and the scoring, you bring that data back into AEP. That's what this AI/ML pipeline allows you to do: you are able to shape the data, manipulate the data, and then bring it back into Adobe Experience Platform. Next slide, please. So we have those two segments, right? You have Data Distiller on the left, where you are able to explore the data and engineer the different features, and you share them with your AI/ML environment. Once you've done that, you train and score the models, and then you bring it back into Experience Platform apps. Cool. Next slide, please. The challenge that our customers have come across is how to train AI/ML models with Adobe data, and that's what this pipeline allows you to do. We previously didn't have this pipeline; the question mark referred to the fact that we didn't have it before. But now you're able to leverage Data Distiller to share Adobe Experience Platform data with your AI/ML environment, such as Databricks, SageMaker, etc. You run your model, and then you're able to bring the outputs, such as propensity scores, back in. So you can bring that propensity score back into the profiles of your customers.
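The export, train/score, re-ingest loop above can be sketched end to end in a few lines. This is a deliberately toy "model" (per-segment conversion rate as a propensity score) with invented feature names; a real pipeline would hand the exported features to Databricks, SageMaker, or a similar environment for actual training.

```python
# Rows "exported" from the data lake: (customer_id, segment, converted).
features = [
    ("c1", "loyal", 1),
    ("c2", "loyal", 1),
    ("c3", "new", 0),
    ("c4", "new", 1),
]

# "Training": per-segment conversion rate stands in for a propensity model.
totals, hits = {}, {}
for _, segment, converted in features:
    totals[segment] = totals.get(segment, 0) + 1
    hits[segment] = hits.get(segment, 0) + converted
model = {seg: hits[seg] / totals[seg] for seg in totals}

# "Scoring": attach a propensity score to each profile, ready to be
# ingested back into the real-time customer profile in AEP.
scores = {cid: model[segment] for cid, segment, _ in features}
print(scores)  # {'c1': 1.0, 'c2': 1.0, 'c3': 0.5, 'c4': 0.5}
```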
Next slide, please. And we have templates available as well for quick starting. If you want to learn more about the Python notebook templates, we're going to share the links with you on these Python notebooks as well. They are templates which allow you, for example, to train a propensity model, and then ingest and activate on the model predictions. So you are able to explore, train, and activate; you're able to do all of this with the AI/ML pipeline, and it allows you to bring your scoring data into AEP. Let's go on to the next page. Our product managers at Adobe have built really concise documentation around all the capabilities that are available. On top of that, they have provided solutions with example scripts that you can leverage, whether it is for dashboarding or for SQL queries that you can run. You can click on this link when we send you the slides.
You can explore these. It’s really, really powerful. A lot to learn from there. That’s it from me.
Excellent. Thank you, Rizal. That is all of the content that we have for you today on Data Distiller. Hopefully you now have a good idea of the main business drivers for using Data Distiller, as well as those key use cases that we walked you through. Now we can open up for Q&A. If you do have any questions, feel free to post those in either the Q&A or the chat pod; I can see we have access to both of them. We'll give a few minutes for questions to come through. And while we do that, I'm actually going to launch a quick poll. This is just to give us some high-level feedback on the webinar session today, as well as to gather ideas for future webinars. So I'm going to go ahead and launch that poll. Just two questions there.
And let's take a look at the questions. I see that we do have one from Abhishek. Yep, I can see that as well. The question is about joining experience event data to profile data using Data Distiller. I think what we are saying here, Abhishek, is that you can combine that data so that you have a derived data set out of those data sets. Yes, if you're referring to use case two, by leveraging your event data set and profile data set, you can derive additional information and then bring this into another data set.
Yep, exactly. And I think in this use case, so they were also not just deriving, but they were also enriching from a look up data set of the product pricing information as well.
Any other questions? All right. Great. Well, thank you everyone for attending today. We will be distributing the recording and the slides from today’s webinar and look forward to future webinars in the series.
Thank you, everyone. Thank you. Cheers. Bye-bye. Bye.
Key takeaways
Overview and Purpose of Data Distiller
The webinar provides an overview of Data Distiller, the key use cases it supports, and common customer solutions. Data Distiller supports data architects, data engineers, data scientists, and marketing analysts by enabling data segmentation, curation, and the addition of contextual data.
Primary Use Cases
The webinar highlighted five primary use cases for Data Distiller:
- Creating brand-centric data models for a South American retail company.
- Enriching next-best-offer emails with personalized data for a telecommunications company.
- Optimizing data for reporting and attribution modeling for a luxury retailer.
- Customizing insights for operational dashboarding.
- Leveraging AI and ML feature pipelines for training and scoring models.
Key Capabilities
Data Distiller offers SQL-based processing, scalable data management, Adobe-defined functions, automation and scheduling, monitoring and alerting, and integration with third-party tools for extended insights.
Data Transformation and Enrichment
Data Distiller allows for cleaning, shaping, manipulating, and enriching data. This includes standardizing data, reshaping data formats, augmenting data for granularity, and deriving additional attributes for downstream use.
Operational Dashboards and AI/ML Integration
Data Distiller enables the creation of real-time operational dashboards and supports AI/ML feature pipelines. This allows users to train models with Adobe data, score models, and integrate predictions back into Adobe Experience Platform for enhanced data-driven decision-making.