Import PDF Data into Excel with Power Query

Historically we have been able to get PDF data into Excel using a standard copy/paste or some third-party tools. Now we have another option. We can use Power Query to retrieve values from PDF files. This post walks through the basics.

Video

Objective

We have a PDF file, perhaps it looks like this:

 

We would like a way to extract the commission table into Excel, like this:

We can accomplish this using Power Query.

Note: depending on when you are reading this and which version of Excel you are using, you may or may not have the From PDF option. At the time of this writing, this data source is available in O365. If your Excel version doesn’t support this, you may be able to accomplish it with Power BI instead. 

Steps

We’ll walk through these steps together.

  1. Connect to PDF
  2. Select desired table
  3. Load to Excel

Step 1: Connect to PDF

First, we use the Data > Get Data > From File > From PDF command. We then browse to the desired PDF file and click Import. This will open the Navigator dialog.

Note: this command is found within the Get & Transform group of commands. Depending on your version of Excel, you may not have this command. At the time of this writing, this data source is available in O365. If your Excel version doesn’t support this, you may be able to accomplish it with Power BI instead. 

Step 2: Select desired table

In the Navigator dialog, we can see a list of items that are recognized by Power Query that we can import.

We can see Tables and Pages, and when we select one we can view a preview:

When we inspect the PDF file, we note that the sales commission values were not stored in a formal “table” … but were instead created with tabs. In other words, Power Query was able to detect the tabs and translate that into the type of tabular table data we want.

When we inspect the Table002, we see this:

When we look at the PDF, we notice that this section wasn’t stored in a true “table” either … this was a bulleted list. So, Power Query translated this bulleted list into an importable table.

We can also view the entire page, like this:

Also, if the tabular data you want to import spans multiple PDF pages, you can check the Select multiple items checkbox. Select all desired items and once in Power Query, you can Append multiple tables into a single table to send to Excel.

Step 3: Load to Excel

Since we would like to retrieve the commission table, we select it:

Since our data is clean and ready to go, we use the Load > Load To command and send the results back to an Excel table and apply the desired format:

If we needed to clean up the data, for example by removing extra rows, columns, unpivoting, adding calculated columns, and so on, then we could click the Transform Data button instead. Then from within the Power Query Editor, we could apply any desired transformations before sending the data back to Excel.

Conclusion

If you have any other alternatives to importing PDF data into Excel, or ways to improve this post, please share by posting a comment below.

If you would like to practice, feel free to download the PDF practice file below.

 

 

 

Posted in ,

Jeff Lenning

I love sharing the things I've learned about Excel, and I built Excel University to help me do that. My motto is: Learn Excel. Work Faster.

Excel is not what it used to be.

You need the Excel Proficiency Roadmap now. Includes 6 steps for a successful journey, 3 things to avoid, and weekly Excel tips.

Want to learn Excel?

Our training programs start at $29 and will help you learn Excel quickly.

20 Comments

  1. Jane Aldea on June 5, 2020 at 8:00 am

    What if I do not have the pdf option in my “get data from file” menu path? Is this additional functionality I must purchase for excel?

    • Jeff Lenning on June 5, 2020 at 8:13 am

      Hi Jane,
      This is free, and is being made available to O365 subscribers over time. If you have Excel O365 for Windows, perhaps you can get it sooner by changing your update channel to Insiders.
      Thanks
      Jeff

  2. Shair on June 5, 2020 at 9:28 am

    Oh I was so excited when I saw your post, but like Jane, there’s no pdf option in my Excel. I have Excel 365 but updates take 6 – 9 months to show up after I see a post of a new feature. Thanks Jeff for the video and details, unfortunately I’ll need to just bookmark and wait for Excel to get the update.
    Love your site and all the help you provide. It’s truly invaluable.

    • Jeff Lenning on June 5, 2020 at 9:47 am

      Hi Shair … you may be able to receive this update sooner by changing your Update Channel to Insiders. In summary, Microsoft pushes updates out according to the schedule defined in your office settings. Typically, they are pushed out semi-annually. But, you can switch your update channel to monthly or insiders to get them faster.
      Have a great weekend!
      Thanks
      Jeff

    • Marty on November 20, 2021 at 10:13 pm

      I get how to copy a folder of PDF’s and import them into Excel. By using Excel and going to the Data tab of the ribbon > Get Data > From File .etc.

      Problem is it puts all the data into one Excel page. I would like it to create separate pages for each PDF.
      Any suggestions or coding would be great.

  3. DOUG SPROWLS on June 5, 2020 at 11:03 am

    What are the steps to update your channel, in order to receive faster updates, for example to receive from pdf option?
    Thank You

  4. Vivek on June 6, 2020 at 2:47 am

    Does this work in Excel 2010?

  5. Andrew Parker on June 10, 2020 at 12:58 pm

    Jeff, I’m wondering if you have some “preview version” available to people such as yourself but not necessarily widely distributed. The PDF access doesn’t appear to be included in version 2005 of Microsoft 365 that was released just yesterday, so I guess I’ll have to wait a while longer to try it out. Thanks for the tip.

    • Jeff Lenning on June 12, 2020 at 8:46 am

      Andrew … I have a 365 subscription and my update channel is set to insiders (beta). You may be able to receive this update sooner by changing your Update Channel to Insiders as well. In summary, Microsoft pushes updates out according to the schedule defined in your Office settings. Typically, they are pushed out semi-annually. But, you can switch your update channel to monthly or insiders to get them faster. Your IT dept may be able to help as well if it is a corporate computer.
      Thanks
      Jeff

  6. Joel on June 11, 2020 at 7:13 pm

    Do you know which excel version has this release?

    I’m on Version 2005 (Build 12827.20336) and I don’t appear to have it yet.

    • Jeff Lenning on June 12, 2020 at 8:48 am

      You may be able to receive this update sooner by changing your Update Channel to Insiders. In summary, Microsoft pushes updates out according to the schedule defined in your office settings. Typically, they are pushed out semi-annually. But, you can switch your update channel to monthly or insiders to get them faster. If you are on a PC at your company, your IT department may be able to help with this.
      Thanks
      Jeff

  7. Denis on November 2, 2020 at 6:49 pm

    Hi, I’m trying to build a query to import data from supplier quotes received in PDF. The source documents are uniform in that they contain quote number, date and so on at the header. The issue I run into is the body of the quote will vary in length, sometimes one page, sometimes several.

    The import process is different to importing from other sources, in that you need to select the tables to import. How can I build an importer that caters for the fact that the table containing the data will sometimes be on one page, sometimes on two, or more?

    Thanks
    Denis

  8. Kurt on July 22, 2021 at 4:54 pm

    is there a way to essentially do the opposite? i want to print my query to a pdf and update the pdf if the query updates, like with a paste link in word

  9. Manish on August 25, 2021 at 7:39 am

    How can I tell PQ to select all the available tables in a PDF file dynamically. I am having a situation where I have multiple tables but it may vary, sometimes the file has only one table, sometime two or three. However, The table structure remains absolutely same.

  10. Sean on August 31, 2021 at 1:40 am

    Is is possible to read data from a fillable PDF form including the fields that are populated by users (Text, checkbox, dropdown ect.)

  11. Ken Metts on March 3, 2022 at 12:13 pm

    Is there a way to take a multiple page pdf to one single tab within the worksheet instead of bunches of multiple tabs as it does by using the above technique?

    • Brandon de la Torre on August 5, 2022 at 2:45 pm

      Yes you can. Just create a new sheet, and click on the “Queries & Connections” button on the ribbon for the Data tab, and on the right side taskbar, right-click on the Querie you want to Load, then go to Load To…and you can choose “Existing Worksheet” and then it will insert it into the cell you are in on that sheet. Then repeat so you are placing each set of data where you want it to be on that sheet.

  12. Peter Turrell on August 8, 2023 at 1:51 am

    Is there a method to import a pdf file into Power Query that has a password attched to the pdf file?

    Thanks
    Peter

  13. Nicolaj Thorndahl on June 13, 2024 at 6:55 am

    Thangs for some great help. Just a small problem, some of the PDF’s that I want to load into Excel are actual Pictures, and not tables. Normaly I can use “On error goto xxx” in the vba code, but I cant get it to work with the “Call ActiveWorkbook.Connections(“Query – LOAD PDF TO SHEET”).Refresh” when Power Query loading the PDF that is not a table, but a picture the code creates a “error”. Can someone provide a code example to test if the pdf is a table or a picture ??

Leave a Comment