Retrieve Values from Many Workbooks
Alright my friends, this week I’ll tackle a question I received about retrieving values from workbooks. Here is the basic idea of the question. I have a folder with several hundred workbooks, and each workbook may contain any number of worksheets. For example, some workbooks have two sheets, some have three sheets, and some have up to thirty worksheets. I need to create a single summary sheet that retrieves specific cell values from every sheet in all of these workbooks. Currently, I copy/paste, but, I want to automate this lookup process. Thanks Regina for your question, and for inspiring this post.
Let’s just be crystal clear about our objective. Here is the set up. We have a folder that contains hundreds of Excel files. And, just to make this a bit trickier, let’s assume that at any time, new workbooks can be added to the folder, or, some workbooks can be removed. In other words, our solution needs to adapt to a changing number of workbooks, and essentially grab values from all workbooks in the folder, however many there are at that moment.
Plus, workbooks do not contain the same number of worksheets. Some contain two, some three, and some up to thirty. Our solution needs to retrieve values from all sheets, plus, sheets may be added or removed at any time. And, there is no naming convention for the worksheets…they are named without any consistent pattern. So, our solution needs to adapt to this crazy and dynamic situation.
Plus, each worksheet may contain one or more values to retrieve. And, the values can appear on any row. So, it isn’t like we can use a formula that retrieves a value from say row 10…the value can appear on any row, and, there may be more than one value on each sheet. Craziness!
Now, we finally do get a bit of good news. The values we need are always stored in column C, and, Column B always contains the word Total next to the value. For example, here is the basic structure of each worksheet.
But remember, the total can appear on any row, and, some sheets have multiple totals, like this one.
So, our mission, should we choose to accept it, is to create a single summary sheet that retrieves values from various rows in any number of worksheets, in any number of workbooks in the folder. And, we want to be able to refresh our summary sheet at any time to capture the current values. Also, we don’t want to write any VBA code. And, going forward, we want to be able to update our summary in less than 10 seconds since we have to update it every day.
Can we do this? Yes, of course. We’ll explore how to accomplish all of this with a single Get & Transform query. Let’s get to it.
We’ll accomplish our objective with three basic steps:
- Create a basic query
- Customize the query
- Return the results
This is going to be so fun!
Note: The steps below are presented with Excel for Windows 2016. If you are using a different version of Excel, please note that the features presented may not be available or you may need to download and install the Power Query Add-in.
Create a basic query
First we need to create a query that retrieves a list of all workbooks in the folder. This can be done by selecting the Data > New Query > From File > From Folder icon. Excel displays the Folder dialog (as shown below), and you can type or Browse to identify the folder that contains all of the Excel workbooks.
Clicking OK will display a preview dialog that lists the workbooks in the folder. Click the Edit button to open the Query Editor as shown below.
At this point, we have a query that retrieves the folder contents, and provides a list of all workbooks in the folder. With this basic query set up, it is time to customize it.
Note: if the folder contains a mix of file types, you can filter the Extension column to retain only the xlsx files.
Customize the query
At this point, we have a query that includes a list of the workbooks in the folder. Now we need to add a new column to include each worksheet in each of the workbooks. To do this from within the Query Editor, we select the Add Column > Add Custom Column command.
In the resulting dialog, we enter a name for our new column, something like Sheets would be fine, and then the formula:
This is displayed below:
Click OK to add the new column to the query.
Our new column, Sheets, needs to be expanded. We can do this by clicking the little icon in the column header, or the Transform > Structured Column > Expand command. We can just expand all columns for now.
The query editor now includes all sheets and all workbooks, as shown below.
Note: if needed, apply filters to remove any rows that aren’t needed.
Next, we need to bring the sheet contents from columns B and C into the query. So, we expand the Sheets.Data column. We do this by clicking the little icon in the header, or by clicking the Transform > Structured Column > Expand icon. Since the Total label is in column B, and the Value in column C, we don’t need to bring in all columns, we can just check the Column2 and Column3 boxes.
Now our query includes the values from the worksheets, as shown below.
Now we have all the data we need! We just have to do some clean up and remove the rows and columns we don’t need.
The only query rows we need to keep are those with the text string Total in Column2. So, we apply a filter by using the drop-down for Column2, and selecting Total. This leaves only the rows that include the total values we are seeking.
Next, we can remove any columns we don’t want or need returned to our summary sheet. We can pick and choose the columns and use the Remove Columns icon for this. Here, I’ve removed all columns except the ones that contain the workbook name, worksheet name, and value. The updated query is shown below.
Now, all the hard work is done! And we can just move on to the last step.
Note: thank you Ken Puls (Power Query guru) for your post.
Return the results
Finally, we can use the Close and Load command to return the query results to our summary sheet, as shown below.
Unbelievable…we did it! And, the best part is that we don’t need to go through all of that trouble next time we want to update the summary. We can simply right-click the results table and select Refresh. When we do, Excel goes back to the folder, and automatically includes any files that were added since last time. Plus, it automatically includes any new worksheets! And, it will retrieve the totals from all sheets, regardless of how many there are on each sheet or which row they are in. Wow!
But Jeff, come on man, we only had 3 files, I could have copy/pasted faster than that. Well, maybe, but, since this summary needs to be updated each day, week, or month, the Refresh will certainly be faster next period. Out of curiosity, I copied these three files, and pasted, pasted, pasted, and pasted, over an over, until I had about 420 Excel files in the folder. Then I hit Refresh. 6.76 seconds later, my summary table included 1,680 rows of data…freaking awesome!
If you have any other options to accomplish this task, or any other fun Get & Transform query tips, please share by posting a comment below…thanks!
- Sample data files (zipped): DataFiles
If you'd like to be notified when I write a new Excel article, enter your name and email and click SUBSCRIBE. You can unsubscribe anytime, and I will never sell your email address.
Want to learn Excel?
Our Campus Pass includes access to our entire Undergrad and Masters catalog. Gamification ensures it is the most fun you can have learning Excel :)