Retrieve Values from Many Workbooks

Alright my friends, this week I’ll tackle a question I received about retrieving values from workbooks. Here is the basic idea of the question. I have a folder with several hundred workbooks, and each workbook may contain any number of worksheets. For example, some workbooks have two sheets, some have three sheets, and some have up to thirty worksheets. I need to create a single summary sheet that retrieves specific cell values from every sheet in all of these workbooks. Currently, I copy/paste, but, I want to automate this lookup process. Thanks Regina for your question, and for inspiring this post.

Objective

Let’s just be crystal clear about our objective. Here is the set up. We have a folder that contains hundreds of Excel files. And, just to make this a bit trickier, let’s assume that at any time, new workbooks can be added to the folder, or, some workbooks can be removed. In other words, our solution needs to adapt to a changing number of workbooks, and essentially grab values from all workbooks in the folder, however many there are at that moment.

Plus, workbooks do not contain the same number of worksheets. Some contain two, some three, and some up to thirty. Our solution needs to retrieve values from all sheets, plus, sheets may be added or removed at any time. And, there is no naming convention for the worksheets…they are named without any consistent pattern. So, our solution needs to adapt to this crazy and dynamic situation.

Plus, each worksheet may contain one or more values to retrieve. And, the values can appear on any row. So, it isn’t like we can use a formula that retrieves a value from say row 10…the value can appear on any row, and, there may be more than one value on each sheet. Craziness!

Now, we finally do get a bit of good news. The values we need are always stored in column C, and, Column B always contains the word Total next to the value. For example, here is the basic structure of each worksheet.

Example worksheet by Jeff Lenning

But remember, the total can appear on any row, and, some sheets have multiple totals, like this one.

Accounting sheet by Jeff Lenning

So, our mission, should we choose to accept it, is to create a single summary sheet that retrieves values from various rows in any number of worksheets, in any number of workbooks in the folder. And, we want to be able to refresh our summary sheet at any time to capture the current values. Also, we don’t want to write any VBA code. And, going forward, we want to be able to update our summary in less than 10 seconds since we have to update it every day.

Can we do this? Yes, of course. We’ll explore how to accomplish all of this with a single Get & Transform query. Let’s get to it.

Details

We’ll accomplish our objective with three basic steps:

  • Create a basic query
  • Customize the query
  • Return the results

This is going to be so fun!

Note: The steps below are presented with Excel for Windows 2016. If you are using a different version of Excel, please note that the features presented may not be available or you may need to download and install the Power Query Add-in.

Create a basic query

First we need to create a query that retrieves a list of all workbooks in the folder. This can be done by selecting the Data > New Query > From File > From Folder icon. Excel displays the Folder dialog (as shown below), and you can type or Browse to identify the folder that contains all of the Excel workbooks.

Folder dialog by Jeff Lenning

Clicking OK will display a preview dialog that lists the workbooks in the folder. Click the Edit button to open the Query Editor as shown below.

Query editor by Jeff Lenning

At this point, we have a query that retrieves the folder contents, and provides a list of all workbooks in the folder. With this basic query set up, it is time to customize it.

Note: if the folder contains a mix of file types, you can filter the Extension column to retain only the xlsx files.

Customize the query

At this point, we have a query that includes a list of the workbooks in the folder. Now we need to add a new column to include each worksheet in each of the workbooks. To do this from within the Query Editor, we select the Add Column > Add Custom Column command.

In the resulting dialog, we enter a name for our new column, something like Sheets would be fine, and then the formula:

=Excel.Workbook(File.Contents([Folder Path]&[Name]))

This is displayed below:

Custom formula by Jeff Lenning

Click OK to add the new column to the query.

Our new column, Sheets, needs to be expanded. We can do this by clicking the little icon in the column header, or the Transform > Structured Column > Expand command. We can just expand all columns for now.

The query editor now includes all sheets and all workbooks, as shown below.

Updated query editor by Jeff Lenning

Note: if needed, apply filters to remove any rows that aren’t needed.

Next, we need to bring the sheet contents from columns B and C into the query. So, we expand the Sheets.Data column. We do this by clicking the little icon in the header, or by clicking the Transform > Structured Column > Expand icon. Since the Total label is in column B, and the Value in column C, we don’t need to bring in all columns, we can just check the Column2 and Column3 boxes.

Now our query includes the values from the worksheets, as shown below.

Updated query by Jeff Lenning

Now we have all the data we need! We just have to do some clean up and remove the rows and columns we don’t need.

The only query rows we need to keep are those with the text string Total in Column2. So, we apply a filter by using the drop-down for Column2, and selecting Total. This leaves only the rows that include the total values we are seeking.

Next, we can remove any columns we don’t want or need returned to our summary sheet. We can pick and choose the columns and use the Remove Columns icon for this. Here, I’ve removed all columns except the ones that contain the workbook name, worksheet name, and value. The updated query is shown below.

Query results by Jeff Lenning

Now, all the hard work is done! And we can just move on to the last step.

Note: thank you Ken Puls (Power Query guru) for your post.

Return the results

Finally, we can use the Close and Load command to return the query results to our summary sheet, as shown below.

Summary worksheet by Jeff Lenning

Unbelievable…we did it! And, the best part is that we don’t need to go through all of that trouble next time we want to update the summary. We can simply right-click the results table and select Refresh. When we do, Excel goes back to the folder, and automatically includes any files that were added since last time. Plus, it automatically includes any new worksheets! And, it will retrieve the totals from all sheets, regardless of how many there are on each sheet or which row they are in. Wow!

But Jeff, come on man, we only had 3 files, I could have copy/pasted faster than that. Well, maybe, but, since this summary needs to be updated each day, week, or month, the Refresh will certainly be faster next period. Out of curiosity, I copied these three files, and pasted, pasted, pasted, and pasted, over an over, until I had about 420 Excel files in the folder. Then I hit Refresh. 6.76 seconds later, my summary table included 1,680 rows of data…freaking awesome!

If you have any other options to accomplish this task, or any other fun Get & Transform query tips, please share by posting a comment below…thanks!

Resources

 

 

 

This article was written by Jeff Lenning

11 comments:

  1. George
    Reply

    Jeff, I had the same problem, but I worked it out somewhat in a different way. This example is used in a financial report for a year to date and a month report along with a previous YTD and month report all on one report. You pick/choose a report by picking from a drop down list indicating month and year. Once the timeframe is chosen all figures, calculations are updated to reflect that timeframe
    At present my workbook is looking through 27 worksheets containing 6 columns and 229 rows of numbers of which 1 column is text.
    In my program all worksheet reside in one workbook and each workbook has an ID name that reflects mth & yr. and each row has a individual account number that needs to be modified to reflect the month and year. This is done by running a VBA program which take seconds. Now one the master worksheet I have a number of hidden columns that will tie in the month and year from those worksheets by a vlookup formula and from that row where I want to return a number to reflect that year and month I use the following formula which is an array.
    =VLOOKUP($E5,INDIRECT(“‘”&INDEX(SSheetList,MATCH(1,–(COUNTIF(INDIRECT(“‘”&SSheetList&”‘!$A$5:$G$232″),$E5)>0),0))&”‘!$A$5:$G$232”),3,FALSE)
    I can add a new sheet and get results in less then 2 minutes and once added I can pick from my drop down mth. yr. list and have the worksheet updated in less than 10 sec.
    Now this didn’t get made in a day or two, it took me a couple of days just to get this formula in place and working but now that I have it, it saves me 100’s of analyze hours.

    1. Jeff Lenning Post author
      Reply

      That sounds awesome…I love hearing how Excel can help save time, and how there are typically multiple ways to accomplish any given task. Thanks for sharing your comment!

  2. Mindy
    Reply

    Can you do this if the files are in multiple folders? Can it bring back the name of the folder that it is in?

    I ask because I’m collecting a large amount of A/P data that the main folder is information received, then it goes down to the vendor and from there has the file containing the ID number, address, etc. for that specific vendor.

    1. Jeff Lenning Post author
      Reply

      Hi Mindy! Yes and yes! This same exact query will include all files AND subfolders in the source folder. The query includes a column for full path (folder) as well. Totally awesome!
      Thanks
      Jeff

  3. Mindy
    Reply

    Wow! that IS totally awesome! thank you!

    1. Jeff Lenning Post author
      Reply

      Welcome 🙂

  4. Lisa
    Reply

    I use the Power Query, and love it. It is nice to be able to pull in information from other Workbooks, without changing the data, or missing some. We use it to combine several Sheets into one, for a month report, and then in another file, use Power Query to combine the months into the report for a YTD. We use 2013, and it does not have the Query option, so I learned the Power Query. Really nice and easy to use, once you understand it. I haven’t tried to have it figure the number of sheets per book, but that is something I will have to try. Thank you for another great post!

    1. Jeff Lenning Post author
      Reply

      Power Query is a game changer…I love it too 🙂

  5. Torstein Johnsen
    Reply

    Thanks Jeff! Your explanation is excellent and Power Query is avesome. . I have troubled with the Query formulas several times! They seems to be case sensitive!

    1. Jeff Lenning Post author
      Reply

      Ah, yes, they are case sensitive! A great book for learning more is M is for (Data) Monkey.
      Thanks
      Jeff

  6. Moshe Slonim
    Reply

    Thank you Jeff.
    Your explanations and demos are very helpful.
    Moshe

Leave a Reply

Your email address will not be published. Required fields are marked *