Introduction to Adobe PDF Services API

Adobe’s PDF Services API lets developers create, combine, export, and extract data from PDFs through powerful and flexible cloud-based APIs. In this session, learn how you can get started using Adobe PDF Services API to integrate document experiences into your apps and customer experiences.

Transcript
All right, I think we can get started. So welcome, everybody. Thank you for coming to the session. I’m Ben Vandenberg. I’m Principal Developer Evangelist for our Adobe Document Services, which is part of Document Cloud. To get started here today, what we’re going to do is we’re going to learn more about Adobe PDF Services API and how you can use that for a lot of automation of different document processes. Now, if you joined us for the keynote, you saw all the different APIs that are available. We’re going to focus specifically on the PDF Services part in this session. If you caught Ray’s session a little bit earlier, we were just talking about PDF Embed API, and we’re going to have some more sessions throughout today and tomorrow. But just as a note, after we go through this session, if you go over to the Document Cloud networking section, me and everybody else will be going over there. If you have any questions after the fact or you want to ask things, please feel free to do that. While we’re going through the session today, please also drop your questions into the session chat. You can also do it in the Q&A, but I think we’re a small enough group here today that we can do it all from within the chat. And feel free to also connect on Twitter, LinkedIn, Facebook, as I’d always love to hear from you about some of the things that you are working on every day. All right, so what we’re going to do as part of the session is first, I’m going to tell you a little bit about what PDF Services API is, and we’re going to then learn how you can use it, some examples of how you can get started, and then some resources for where you can get it underneath your fingers and start using it yourself. So to start off, PDF Services, along with the rest of Document Services, is really focused on how you can transform documents in many of the ways that people transform them using things like Adobe Acrobat.
So if you think of all the different ways that you work with PDFs, like creating PDFs, combining them together, exporting them into other formats, OCRing them so that you can make them searchable, all these types of actions are part of our PDF Services API. And when we look at those, there’s a number of different actions that are available depending on what you’re trying to do with your documents. Now, one of the first things that you can do is we have our PDF Properties API that allows you to get information about your PDF: whether it has been OCR’d, how many pages it has, what fonts are contained in there, is it certified, is it signed. All that type of information we make available as part of our PDF properties, so it can help inform what type of actions you need to take on your document. There’s also PDF Extract API, which gives you all of the content from your actual PDFs as JSON data. And for that, if you want to learn more about PDF Extract, stay tuned later on today, where Joel is going to be showing some of the cool ways you can use some of that data. Now, when you’re trying to create PDFs, because that’s a lot of times what we’re trying to do, whether it’s taking HTML and turning that into a PDF, whether it’s taking data and merging it into a Word document, which is our Document Generation API, or it’s taking PDFs and then converting them back into other formats, we have a number of those services available in there. As a quick note, if you’re interested in learning more about the Word and JSON data, have a look at Ray’s session later on, I think it’s tomorrow, where he’s going to go through some of the Document Generation API and how you can easily integrate that there.
But when you’re trying to manipulate documents, like split PDFs into multiple smaller PDFs, or you’re trying to optimize them so that if they’re really, really large, you want to make them much smaller to load on a web page and linearize them, or if, say, you have scanned documents that you need to make searchable, all our different services for optimizing your PDFs are available in there. And then if you’re wanting to actually transform by being able to protect your PDFs or replace certain pages or add and remove those, we have those services available as well. But one thing that’s important to note is that when you subscribe to PDF Services API, all of those different services, whether it’s properties, document generation, PDF services, or extract, are available as part of that. So if you start using our PDF Services API and you want to leverage some of the other ones available in there, it’s all part of the same SDKs, it’s all part of the same APIs, and so forth. Then Adobe Sign, which is very complementary to our PDF Services, is a separate service, but you can use those in conjunction. And then PDF Embed API, taking that output and rendering it on a web page, is also available as well. And Ray, thanks a lot for clarifying on the timing for tomorrow. Okay, so one of the important things is also, well, I want to incorporate this into my project. So how do I do that? Well, we give a lot of different ways to do that. We have our SDKs for Node, Python for the Extract API, Java, .NET, as well as REST-based APIs. And what we’ve also seen is a lot of times people are using low-code tools like Power Automate. So we now have connectors into Power Automate for a lot of those, for people within the Microsoft ecosystem. So those are all available. But just to give you a little bit of context on what you can do with this, let’s walk through a couple of quick examples here.
So let’s say I have a web page here where, when I upload a document, I need to check that the document has been signed with an electronic signature, whether that’s from Adobe Sign or maybe they signed it inside of Adobe Acrobat. I want to validate that this document is actually signed. And so the PDF Properties API can give me that information. So in this case, if I’m uploading a PDF here and I submit that, it can get the information and validate that this document has been signed using Adobe Sign. And how do we know this? Well, if we scroll through here, this is actually the information that comes back from our Properties API to help us understand. So if we scroll down to the document information here, we can see that the producer was Adobe Sign. So we know that it was produced by Adobe Sign. We can also get information like whether it was actually signed or not. And things like Adobe Sign put encryption on the document to make sure that it’s tamper evident. We know that it hasn’t been changed, and we seal that with a digital certificate, which we can see from our properties here: it is certified and it is encrypted. So all of this information becomes really helpful for me to know, is this a valid document or is this not a valid document? And we provide that there. But along with that, we have a lot of other information in here that can be really useful for a number of different scenarios. You can see that we have things like our font information that tells us what fonts are contained in that PDF. And these are really helpful, as they might be used for things like checking branding and making sure that PDFs have appropriate branding inside of them. If you are working with compliance standards for certain PDFs, like PDF/A or PDF/X or other standards, you can see the information in there to validate whether it has a certain compliance level that is reported by that PDF.
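As a rough illustration of the signature check described above, here is a minimal Node sketch that reads a properties-style JSON object and reports whether the document looks signed, certified, and encrypted. The field names (`document.info_dict.Producer`, `is_signed`, `is_certified`, `is_encrypted`) and the helper names are assumptions made for this example, not something shown in the session, so compare them with the JSON your own run returns before relying on them.

```javascript
// Sketch only. The field names used here (info_dict.Producer, is_signed,
// is_certified, is_encrypted) are ASSUMPTIONS about the properties JSON.

// Hypothetical helper: pull the signature-related flags out of the JSON.
function summarizeSignature(properties) {
  const doc = (properties && properties.document) || {};
  const producer = (doc.info_dict && doc.info_dict.Producer) || "";
  return {
    producedByAdobeSign: /adobe sign/i.test(producer),
    signed: doc.is_signed === true,
    certified: doc.is_certified === true,
    encrypted: doc.is_encrypted === true,
  };
}

// Hypothetical policy: treat the upload as valid only if all three flags are set.
function looksLikeValidSignedDocument(properties) {
  const summary = summarizeSignature(properties);
  return summary.signed && summary.certified && summary.encrypted;
}

// Hand-written sample data, not real API output.
const sample = {
  document: {
    info_dict: { Producer: "Adobe Sign" },
    is_signed: true,
    is_certified: true,
    is_encrypted: true,
  },
};

console.log(summarizeSignature(sample));
console.log(looksLikeValidSignedDocument(sample));
```

Keeping the policy in its own small function means an application can tighten or loosen what counts as "valid" without touching the code that reads the JSON.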
And then if you’re also looking for some of the information like the XMP data, we do provide all the XMP data back in there so that you can extract that out and use it. But all this information we make available as part of the API so you can easily make decisions. So one example might be if you needed to optimize a document when it gets uploaded. So for example, if I look at, say, this document here, which I have open in Adobe Acrobat, you’ll notice that I can’t actually select any of the text inside of here because, well, this was a scanned document. So ideally, inside of our web application, we want to make sure that we optimize this document when it’s coming into our system so that we don’t have a whole bunch of scanned documents that aren’t searchable. So that’s where PDF Services with the OCR feature becomes helpful, as we can use that to optimize and normalize our documents as they come in. So if I go back to our app here, I have this form where I’m taking a document and I’m going to upload it. We’re going to take our input document here, which is the scanned document. And when I click on Submit, what it’s going to do is take that, pass it to our cloud-based services, OCR that document, and provide that document back. So then when we render it on the page here, now we can see that it has searchable and selectable text, so it’s optimized. So that’s an example of a couple of different ways that you can use some of those services in the context of your applications. Along with that, one of the things that we’ve also done is we’ve made it easy for you to incorporate it into workflows. So that’s an example of a web-based application where you can leverage this. But you can also use it in a lot of scenarios where you might be using workflow engines, or you might be using business process management applications.
And one example would be things like Microsoft Power Automate, where we’ve added all of those different actions so that you can add automated steps. When you’re, say, wanting to take a number of different PDFs and go through a compression or conversion process, you can do that inline, directly within Microsoft Power Automate. Although if you wanted to leverage this in any other workflow solution or tool, you can do that simply using our APIs. So that then begs the question, well, how can you get started using this yourself? Because these are powerful, these are cool. But can I get my hands on this today? And the answer is yes, you can. So if we go to our web browser, and we go to home, and we go to Adobe.io or developers.adobe.io, that’ll bring you to the landing page here. And if we go to our products, and we scroll down to Document Cloud, and we go to our Document Services API, we’ll see that it brings us to our landing page where we can learn a lot more about our PDF Services, Extract, PDF Embed. All of these are available in here. But if you want to get started using this, you can go to Get Credentials. And you’ll need an Adobe ID in order to create your credentials. I’m already logged in. And when you click on Create Credentials, this will ask you to log in using your creds. And you can say PDF Services. And when you do this, it looks like I’m logged in as the wrong account here. When I do this, it’ll ask you for your credentials. And then it’ll give you the option to choose which SDK you want to download: you can download the Node, the Java, .NET, etc. And when you download that, one of the great things about it is, if I go over to the zip package that gets downloaded, we’ll see that it includes, directly inside of the package, our PDF Services credentials and our private key.
So we don’t even have to go create any additional credentials. Once it downloads the zip file from when we create that, it will include that in the package so I can actually get started here. Now, I am using the Node SDK for my examples here. But all of the examples are available across the different SDKs. So here we can then see that if I download this, one of the first things I’ll do here is say npm install, to make sure that all the dependencies are installed. And then if I go into the src folder here, we’ll see that all of those different actions are available as different items in here. So let’s say we want to get the PDF properties first. We’ll see that there’s a couple of examples in here. So we have the PDF properties as file or as JSON; let’s do the as JSON one here. And if I want to run any of these different actions, I just simply run the script in here. So I am in the root folder here. So I’m gonna say node, I’m gonna say src, PDF properties, and say PDF properties as JSON. Now as this is running, it’s provided the output there, but let me explain what it does here. So first off, there’s the SDK we’re choosing here, and then it’s going to load in the credentials from the PDF Services credentials JSON file, which is for our JWT token authentication there. And once it does that, first thing, it’s going to just instantiate the execution context. And then from here, we’re asking it to set the input file. So in this case, this is the PDF properties input. If I scroll up here to the resources folder, we’ll see that we’ve provided a whole bunch of different sample documents that you can use for your own reference to give things a try. So if you want to use your own, you can use your own, or you can use each of these files as a means to get started. Now this example is this vendor security document that is attached in here.
So if we take that, and let’s go back to our code here, I’m actually going to run the one that outputs as a file. And when I run it, it’s going to output into this output file. And it creates this JSON structure here. Let’s pretty this up. And so we’ll see that it provides output that I can use as part of my code to learn about the document: it tells us all the font information, the other context, you name it. Great. So once I’ve realized that the document has a certain characteristic, like whether it’s OCR’d or not, we would then use that to run some of our other actions. But this is basically getting the credentials, and it’s specifying what we’re going to have as our input here. So it’s setting the input based on that PDF properties input file there. And then we are specifying what the action is that we’re going to use. And you’ll notice that the pattern of how this is written is similar across all the different examples in here. Basically, it’s asking for the credentials, you’re providing the input, you’re telling it essentially what it’s going to do, in this case get the PDF properties. And then we’re going to tell it to run and, in this case, output that file. So, simple concept. So this is for the PDF properties. But if I wanted to do something like the OCR that we saw before, we can see that under the OCR PDF section here, similarly, most of the beginning is the same: we’re creating our credentials. And in this case, we’re saying that we’re going to use an OCR operation on the document. And in here, we’re going to say that the input is this OCR input file. And the output is then going to be saved as a file into this folder.
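The idea mentioned above, using the properties output to decide which action to run next, could be sketched in Node roughly as follows. This is only an illustration under stated assumptions: the `planActions` helper and the action labels are hypothetical, and the field names `pages[].is_scanned` and `document.is_linearized` are guesses about the properties JSON that should be checked against real output.

```javascript
// Sketch only. `pages[].is_scanned` and `document.is_linearized` are ASSUMED
// field names; the action labels ("ocr", "linearize") are made up here.

// Hypothetical helper: choose follow-up actions from the properties JSON.
function planActions(properties) {
  const pages = Array.isArray(properties.pages) ? properties.pages : [];
  const doc = properties.document || {};
  const actions = [];

  // Any scanned page means the text is not searchable yet, so queue OCR.
  if (pages.some((page) => page.is_scanned === true)) {
    actions.push("ocr");
  }

  // Only queue linearization when the flag is explicitly false.
  if (doc.is_linearized === false) {
    actions.push("linearize");
  }

  return actions;
}

// Hand-written sample data, not real API output.
const scanned = {
  document: { is_linearized: true },
  pages: [{ is_scanned: true }, { is_scanned: false }],
};

console.log(planActions(scanned));
```

Each label returned here would then map onto one of the SDK samples, for instance the OCR sample walked through above.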
Now, if we wanted to take this and store this into a folder, you could do that, or another scenario is you might want to store this into an AWS S3 bucket or Azure storage. And, you know, these are portable, so you can put them into an Azure function or a Lambda function in order to do that. But we can see this return here that’ll say result, save as file, and it outputs that into the file. So let’s go ahead and run this. We’ll say node src, we’re going to say OCR, and it’s the OCR PDF.js. Okay, so that’s running that action. And it’ll take a quick second, and it’ll output that. Here we go. Let’s go ahead and open this up. And here’s our OCR’d document, just as we saw before. Cool. And again, if you have any questions as we’re going along, please feel free to drop them into the session chat, and I’m happy to answer them as we’re going along here. Okay, so along with that, we can then see we’ve OCR’d our document, we’ve gotten properties. Similarly, if we want to combine PDFs together or want to compress, we have all these different actions available. Now, Ray is going to get more into things like the Document Generation API. But you’ll see here that things like the document merge, which is the Document Generation API, are also available as samples in here. So in this case, say we wanted to generate a Word document. Similarly here, it is taking a JSON data object and a Word document. So we can see down below, it’s taking the Word document. So we have the… where is it here? There we go. So it’s taking the Word document from our resources. We have our merge template.docx. And we can choose to specify whether the output is going to be a docx file or a PDF file. And then once you take that, it’s going to generate and merge the data into that document. So if we go and say, OCR, remove that and say document merge, we’ll say merge document to docx. Run that.
It’s going to take the JSON data that we have inline inside of here. Sorry, inside of here. It renders that out into the Word document. So then when we see this document that we have here, we will open it up. And then in Microsoft Word, we’ll see the merged document. Again, if you want to learn more about this, please go and check out Ray’s session tomorrow about Document Generation API. It’s definitely cool, but we can then see the output document there. So all of these different tools are available in there for you to easily get started. It’s really then a question of what action you need to take on your different documents. Do you need to combine those documents together? Do you need to export them? Do you need to turn them into a different format, like take a PDF document and turn it into a Word or an Excel file? All of those are just different services that are easily available in here for you to incorporate. So with that, I wanted to leave you also with a couple of resources, and I’ll make the slides available in here. But a couple of the things in here: if you do want to learn more and get started, you can go to Adobe IO, you can get credentials, and you get a thousand free transactions for you to get started playing around and using that. And then you can choose to subscribe there. Along with that, you have all of the different SDKs as well as the REST APIs that you can leverage to integrate those into your projects. But if you do want to join the conversation and learn more as you are going along, I also encourage you to go on to our community website, where you can ask any questions that you have. We have a live and active community where people answer any of your questions as you’re going along. That is a great community.
Along with that, if you are also interested in learning more about any of those different services, if you go to the Adobe Tech blog, which is medium.com slash Adobe Tech, and you go into the Document Cloud section, we have a number of different articles that walk through different ways to use our services. Like we can see Ray’s article here about using PDF Services with an Amazon Lambda function, using PDF Embed API with Experience Platform Launch, creating your own schema. So all of these different articles are helpful ways for you to get started in there. But with that, we have a few minutes left. So I wanted to take the last, say, five minutes. And if people have any questions, please feel free to drop them into the session chat or the Q&A, and I’m happy to answer them. So one of the common questions that sometimes comes up is whether we have some samples for things like our REST APIs. And those are available as part of Adobe IO. So again, if you go to our website, and you go into our section around PDF Services, you’ll see samples from our different documentation in here. And there’s also the API reference that allows you to utilize our REST-based APIs. And you can also download Postman collections to try using some of the different services that way as well. So SDKs, REST APIs, all those options are available in there. And then also, if you’re just trying to get a general idea, if you’re on the website here, you have each of the different, what we call verbs in here, and you can see the sample in some of the different languages. Also, if you’re wanting to learn more about how to use the different services, if you go into our documentation on the website, and you go to PDF Services API, this will break down each of the different things, including your authentication and how to use this for different languages.
So we can go in here and see, for example, for Node, it breaks down the guide for the different languages in there for you to get started. All right, let’s have a quick look. Are there examples for converting an AEM Sites web page into a PDF? So that’s an awesome question. I don’t have an example specifically for turning AEM Sites web pages into a PDF. But the Create PDF action, or the HTML to PDF action, allows you to create PDFs in a couple of different ways. One, you can provide a URL, and it will generate a PDF from that URL. So that might be one way to look at with AEM. Another way you can do that is you can package up your HTML and any of your dependencies, like your CSS and so forth, into a zip file and upload it to our service to get the PDF back that way. Or if you render out a page as, say, HTML with all the CSS inline, so that you just have one HTML file, you can also upload that to our services, as with option two. So I don’t have a specific example as it relates to outputting from AEM, but each of those options would be available to you. All right. So we’re getting close to time. If you have any last-minute questions, I am happy to answer them. If you have more questions, I’m also happy to go over to the Document Cloud networking section and learn more there. Is there any information about using API keys in different run modes in AEM? So good question, Joe. I think I would have to understand the context a little bit more, because AEM is not my main set of tools. So I need to understand that a little bit better, and maybe we need to get back to you on some of those questions. All right. So we are at time. Thank you so much for coming to the session. I hope this was a helpful session for you to get started and learn more about how to use PDF Services API. I hope to see some of you over in the Document Cloud networking section. But thank you so much for coming today, and enjoy the rest of Adobe Developers Live.
Please also go check out later today, Joel, who’s going to be walking through the PDF extract API and how you can use that for extracting information out of PDFs.
