Import PDF Data into Excel with Power Query
Historically we have been able to get PDF data into Excel using a standard copy/paste or some third-party tools. Now we have another option. We can use Power Query to retrieve values from PDF files. This post walks through the basics.
We have a PDF file, perhaps it looks like this:
We would like a way to extract the commission table into Excel, like this:
We can accomplish this using Power Query.
Note: depending on when you are reading this and which version of Excel you are using, you may or may not have the From PDF option. At the time of this writing, this data source is available in O365. If your Excel version doesn’t support this, you may be able to accomplish it with Power BI instead.
We’ll walk through these steps together.
- Connect to PDF
- Select desired table
- Load to Excel
Step 1: Connect to PDF
First, we use the Data > Get Data > From File > From PDF command. We then browse to the desired PDF file and click Import. This will open the Navigator dialog.
Note: this command is found within the Get & Transform group of commands. Depending on your version of Excel, you may not have this command. At the time of this writing, this data source is available in O365. If your Excel version doesn’t support this, you may be able to accomplish it with Power BI instead.
Step 2: Select desired table
In the Navigator dialog, we can see a list of items that are recognized by Power Query that we can import.
We can see Tables and Pages, and when we select one we can view a preview:
When we inspect the PDF file, we note that the sales commission values were not stored in a formal “table” … but were instead created with tabs. In other words, Power Query was able to detect the tabs and translate that into the type of tabular table data we want.
When we inspect the Table002, we see this:
When we look at the PDF, we notice that this section wasn’t stored in a true “table” either … this was a bulleted list. So, Power Query translated this bulleted list into an importable table.
We can also view the entire page, like this:
Also, if the tabular data you want to import spans multiple PDF pages, you can check the Select multiple items checkbox. Select all desired items and once in Power Query, you can Append multiple tables into a single table to send to Excel.
Step 3: Load to Excel
Since we would like to retrieve the commission table, we select it:
Since our data is clean and ready to go, we use the Load > Load To command and send the results back to an Excel table and apply the desired format:
If we needed to clean up the data, for example by removing extra rows, columns, unpivoting, adding calculated columns, and so on, then we could click the Transform Data button instead. Then from within the Power Query Editor, we could apply any desired transformations before sending the data back to Excel.
If you have any other alternatives to importing PDF data into Excel, or ways to improve this post, please share by posting a comment below.
If you would like to practice, feel free to download the PDF practice file below.
- Practice file: Agreement.pdf
Excel is not what it used to be.
You need the Excel Proficiency Roadmap now. Includes 6 steps for a successful journey, 3 things to avoid, and weekly Excel tips.
Want to learn Excel?
Access all Undergrad and Masters lessons with a Campus Pass or CPE Pass. Includes on-demand training plus live office hours.
What if I do not have the pdf option in my “get data from file” menu path? Is this additional functionality I must purchase for excel?
This is free, and is being made available to O365 subscribers over time. If you have Excel O365 for Windows, perhaps you can get it sooner by changing your update channel to Insiders.
Oh I was so excited when I saw your post, but like Jane, there’s no pdf option in my Excel. I have Excel 365 but updates take 6 – 9 months to show up after I see a post of a new feature. Thanks Jeff for the video and details, unfortunately I’ll need to just bookmark and wait for Excel to get the update.
Love your site and all the help you provide. It’s truly invaluable.
Hi Shair … you may be able to receive this update sooner by changing your Update Channel to Insiders. In summary, Microsoft pushes updates out according to the schedule defined in your office settings. Typically, they are pushed out semi-annually. But, you can switch your update channel to monthly or insiders to get them faster.
Have a great weekend!
I get how to copy a folder of PDF’s and import them into Excel. By using Excel and going to the Data tab of the ribbon > Get Data > From File .etc.
Problem is it puts all the data into one Excel page. I would like it to create separate pages for each PDF.
Any suggestions or coding would be great.
What are the steps to update your channel, in order to receive faster updates, for example to receive from pdf option?
It depends on how office was installed, but here is a Microsoft document that should be able to get you started:
Does this work in Excel 2010?
Jeff, I’m wondering if you have some “preview version” available to people such as yourself but not necessarily widely distributed. The PDF access doesn’t appear to be included in version 2005 of Microsoft 365 that was released just yesterday, so I guess I’ll have to wait a while longer to try it out. Thanks for the tip.
Andrew … I have a 365 subscription and my update channel is set to insiders (beta). You may be able to receive this update sooner by changing your Update Channel to Insiders as well. In summary, Microsoft pushes updates out according to the schedule defined in your Office settings. Typically, they are pushed out semi-annually. But, you can switch your update channel to monthly or insiders to get them faster. Your IT dept may be able to help as well if it is a corporate computer.
Do you know which excel version has this release?
I’m on Version 2005 (Build 12827.20336) and I don’t appear to have it yet.
You may be able to receive this update sooner by changing your Update Channel to Insiders. In summary, Microsoft pushes updates out according to the schedule defined in your office settings. Typically, they are pushed out semi-annually. But, you can switch your update channel to monthly or insiders to get them faster. If you are on a PC at your company, your IT department may be able to help with this.
Hi, I’m trying to build a query to import data from supplier quotes received in PDF. The source documents are uniform in that they contain quote number, date and so on at the header. The issue I run into is the body of the quote will vary in length, sometimes one page, sometimes several.
The import process is different to importing from other sources, in that you need to select the tables to import. How can I build an importer that caters for the fact that the table containing the data will sometimes be on one page, sometimes on two, or more?
is there a way to essentially do the opposite? i want to print my query to a pdf and update the pdf if the query updates, like with a paste link in word
How can I tell PQ to select all the available tables in a PDF file dynamically. I am having a situation where I have multiple tables but it may vary, sometimes the file has only one table, sometime two or three. However, The table structure remains absolutely same.
Is is possible to read data from a fillable PDF form including the fields that are populated by users (Text, checkbox, dropdown ect.)
Is there a way to take a multiple page pdf to one single tab within the worksheet instead of bunches of multiple tabs as it does by using the above technique?
Yes you can. Just create a new sheet, and click on the “Queries & Connections” button on the ribbon for the Data tab, and on the right side taskbar, right-click on the Querie you want to Load, then go to Load To…and you can choose “Existing Worksheet” and then it will insert it into the cell you are in on that sheet. Then repeat so you are placing each set of data where you want it to be on that sheet.