Import PDF Data into Excel with Power Query
Historically we have been able to get PDF data into Excel using a standard copy/paste or some third-party tools. Now we have another option. We can use Power Query to retrieve values from PDF files. This post walks through the basics.
We have a PDF file, perhaps it looks like this:
We would like a way to extract the commission table into Excel, like this:
We can accomplish this using Power Query.
Note: depending on when you are reading this and which version of Excel you are using, you may or may not have the From PDF option. At the time of this writing, this data source is available in O365. If your Excel version doesn’t support this, you may be able to accomplish it with Power BI instead.
We’ll walk through these steps together.
- Connect to PDF
- Select desired table
- Load to Excel
Step 1: Connect to PDF
First, we use the Data > Get Data > From File > From PDF command. We then browse to the desired PDF file and click Import. This will open the Navigator dialog.
Note: this command is found within the Get & Transform group of commands. Depending on your version of Excel, you may not have this command. At the time of this writing, this data source is available in O365. If your Excel version doesn’t support this, you may be able to accomplish it with Power BI instead.
Step 2: Select desired table
In the Navigator dialog, we can see a list of items that are recognized by Power Query that we can import.
We can see Tables and Pages, and when we select one we can view a preview:
When we inspect the PDF file, we note that the sales commission values were not stored in a formal “table” … but were instead created with tabs. In other words, Power Query was able to detect the tabs and translate that into the type of tabular table data we want.
When we inspect the Table002, we see this:
When we look at the PDF, we notice that this section wasn’t stored in a true “table” either … this was a bulleted list. So, Power Query translated this bulleted list into an importable table.
We can also view the entire page, like this:
Also, if the tabular data you want to import spans multiple PDF pages, you can check the Select multiple items checkbox. Select all desired items and once in Power Query, you can Append multiple tables into a single table to send to Excel.
Step 3: Load to Excel
Since we would like to retrieve the commission table, we select it:
Since our data is clean and ready to go, we use the Load > Load To command and send the results back to an Excel table and apply the desired format:
If we needed to clean up the data, for example by removing extra rows, columns, unpivoting, adding calculated columns, and so on, then we could click the Transform Data button instead. Then from within the Power Query Editor, we could apply any desired transformations before sending the data back to Excel.
If you have any other alternatives to importing PDF data into Excel, or ways to improve this post, please share by posting a comment below.
If you would like to practice, feel free to download the PDF practice file below.
- Practice file: Agreement.pdf
Excel is not what it used to be.
You need the Excel Proficiency Roadmap now. Includes 6 steps for a successful journey, 3 things to avoid, and weekly Excel tips.
Want to learn Excel?
Access all Undergrad and Masters lessons with a Campus Pass or CPE Pass. Includes on-demand training plus live office hours.