Import PDF Data into Excel with Power Query

Historically we have been able to get PDF data into Excel using a standard copy/paste or some third-party tools. Now we have another option. We can use Power Query to retrieve values from PDF files. This post walks through the basics.

Video

Objective

We have a PDF file, perhaps it looks like this:

 

We would like a way to extract the commission table into Excel, like this:

We can accomplish this using Power Query.

Note: depending on when you are reading this and which version of Excel you are using, you may or may not have the From PDF option. At the time of this writing, this data source is available in O365. If your Excel version doesn’t support this, you may be able to accomplish it with Power BI instead. 

Steps

We’ll walk through these steps together.

  1. Connect to PDF
  2. Select desired table
  3. Load to Excel

Step 1: Connect to PDF

First, we use the Data > Get Data > From File > From PDF command. We then browse to the desired PDF file and click Import. This will open the Navigator dialog.

Note: this command is found within the Get & Transform group of commands. Depending on your version of Excel, you may not have this command. At the time of this writing, this data source is available in O365. If your Excel version doesn’t support this, you may be able to accomplish it with Power BI instead. 

Step 2: Select desired table

In the Navigator dialog, we can see a list of items that are recognized by Power Query that we can import.

We can see Tables and Pages, and when we select one we can view a preview:

When we inspect the PDF file, we note that the sales commission values were not stored in a formal “table” … but were instead created with tabs. In other words, Power Query was able to detect the tabs and translate that into the type of tabular table data we want.

When we inspect the Table002, we see this:

When we look at the PDF, we notice that this section wasn’t stored in a true “table” either … this was a bulleted list. So, Power Query translated this bulleted list into an importable table.

We can also view the entire page, like this:

Also, if the tabular data you want to import spans multiple PDF pages, you can check the Select multiple items checkbox. Select all desired items and once in Power Query, you can Append multiple tables into a single table to send to Excel.

Step 3: Load to Excel

Since we would like to retrieve the commission table, we select it:

Since our data is clean and ready to go, we use the Load > Load To command and send the results back to an Excel table and apply the desired format:

If we needed to clean up the data, for example by removing extra rows, columns, unpivoting, adding calculated columns, and so on, then we could click the Transform Data button instead. Then from within the Power Query Editor, we could apply any desired transformations before sending the data back to Excel.

Conclusion

If you have any other alternatives to importing PDF data into Excel, or ways to improve this post, please share by posting a comment below.

If you would like to practice, feel free to download the PDF practice file below.

 

 

 

Posted in ,

Jeff Lenning

I love sharing the things I've learned about Excel, and I built Excel University to help me do that. My motto is: Learn Excel. Work Faster.

Stay Connected

If you'd like to be notified when I write a new Excel article, enter your name and email and click SUBSCRIBE. You can unsubscribe anytime, and I will never sell your email address.

11 Comments

  1. Jane Aldea on June 5, 2020 at 8:00 am

    What if I do not have the pdf option in my “get data from file” menu path? Is this additional functionality I must purchase for excel?

    • Jeff Lenning on June 5, 2020 at 8:13 am

      Hi Jane,
      This is free, and is being made available to O365 subscribers over time. If you have Excel O365 for Windows, perhaps you can get it sooner by changing your update channel to Insiders.
      Thanks
      Jeff

  2. Shair on June 5, 2020 at 9:28 am

    Oh I was so excited when I saw your post, but like Jane, there’s no pdf option in my Excel. I have Excel 365 but updates take 6 – 9 months to show up after I see a post of a new feature. Thanks Jeff for the video and details, unfortunately I’ll need to just bookmark and wait for Excel to get the update.
    Love your site and all the help you provide. It’s truly invaluable.

    • Jeff Lenning on June 5, 2020 at 9:47 am

      Hi Shair … you may be able to receive this update sooner by changing your Update Channel to Insiders. In summary, Microsoft pushes updates out according to the schedule defined in your office settings. Typically, they are pushed out semi-annually. But, you can switch your update channel to monthly or insiders to get them faster.
      Have a great weekend!
      Thanks
      Jeff

  3. DOUG SPROWLS on June 5, 2020 at 11:03 am

    What are the steps to update your channel, in order to receive faster updates, for example to receive from pdf option?
    Thank You

  4. Vivek on June 6, 2020 at 2:47 am

    Does this work in Excel 2010?

  5. Andrew Parker on June 10, 2020 at 12:58 pm

    Jeff, I’m wondering if you have some “preview version” available to people such as yourself but not necessarily widely distributed. The PDF access doesn’t appear to be included in version 2005 of Microsoft 365 that was released just yesterday, so I guess I’ll have to wait a while longer to try it out. Thanks for the tip.

    • Jeff Lenning on June 12, 2020 at 8:46 am

      Andrew … I have a 365 subscription and my update channel is set to insiders (beta). You may be able to receive this update sooner by changing your Update Channel to Insiders as well. In summary, Microsoft pushes updates out according to the schedule defined in your Office settings. Typically, they are pushed out semi-annually. But, you can switch your update channel to monthly or insiders to get them faster. Your IT dept may be able to help as well if it is a corporate computer.
      Thanks
      Jeff

  6. Joel on June 11, 2020 at 7:13 pm

    Do you know which excel version has this release?

    I’m on Version 2005 (Build 12827.20336) and I don’t appear to have it yet.

    • Jeff Lenning on June 12, 2020 at 8:48 am

      You may be able to receive this update sooner by changing your Update Channel to Insiders. In summary, Microsoft pushes updates out according to the schedule defined in your office settings. Typically, they are pushed out semi-annually. But, you can switch your update channel to monthly or insiders to get them faster. If you are on a PC at your company, your IT department may be able to help with this.
      Thanks
      Jeff

Leave a Comment





For security, use of Google's reCAPTCHA service is required which is subject to the Google Privacy Policy and Terms of Use.

I agree to these terms.