# Retrieve Values from Many Workbooks

Alright my friends, this week I’ll tackle a question I received about retrieving values from workbooks. Here is the basic idea of the question. I have a folder with several hundred workbooks, and each workbook may contain any number of worksheets. For example, some workbooks have two sheets, some have three sheets, and some have up to thirty worksheets. I need to create a single summary sheet that retrieves specific cell values from every sheet in all of these workbooks. Currently, I copy/paste, but, I want to automate this lookup process. Thanks Regina for your question, and for inspiring this post.

## Objective

Let’s just be crystal clear about our objective. Here is the set up. We have a folder that contains hundreds of Excel files. And, just to make this a bit trickier, let’s assume that at any time, new workbooks can be added to the folder, or, some workbooks can be removed. In other words, our solution needs to adapt to a changing number of workbooks, and essentially grab values from all workbooks in the folder, however many there are at that moment.

Plus, workbooks do not contain the same number of worksheets. Some contain two, some three, and some up to thirty. Our solution needs to retrieve values from all sheets, plus, sheets may be added or removed at any time. And, there is no naming convention for the worksheets…they are named without any consistent pattern. So, our solution needs to adapt to this crazy and dynamic situation.

Plus, each worksheet may contain one or more values to retrieve. And, the values can appear on any row. So, it isn’t like we can use a formula that retrieves a value from say row 10…the value can appear on any row, and, there may be more than one value on each sheet. Craziness!

Now, we finally do get a bit of good news. The values we need are always stored in column C, and, Column B always contains the word Total next to the value. For example, here is the basic structure of each worksheet.

But remember, the total can appear on any row, and, some sheets have multiple totals, like this one.

So, our mission, should we choose to accept it, is to create a single summary sheet that retrieves values from various rows in any number of worksheets, in any number of workbooks in the folder. And, we want to be able to refresh our summary sheet at any time to capture the current values. Also, we don’t want to write any VBA code. And, going forward, we want to be able to update our summary in less than 10 seconds since we have to update it every day.

Can we do this? Yes, of course. We’ll explore how to accomplish all of this with a single Get & Transform query. Let’s get to it.

## Details

We’ll accomplish our objective with three basic steps:

• Create a basic query
• Customize the query
• Return the results

This is going to be so fun!

Note: The steps below are presented with Excel for Windows 2016. If you are using a different version of Excel, please note that the features presented may not be available or you may need to download and install the Power Query Add-in.

### Create a basic query

First we need to create a query that retrieves a list of all workbooks in the folder. This can be done by selecting the Data > New Query > From File > From Folder icon. Excel displays the Folder dialog (as shown below), and you can type or Browse to identify the folder that contains all of the Excel workbooks.

Clicking OK will display a preview dialog that lists the workbooks in the folder. Click the Edit button to open the Query Editor as shown below.

At this point, we have a query that retrieves the folder contents, and provides a list of all workbooks in the folder. With this basic query set up, it is time to customize it.

Note: if the folder contains a mix of file types, you can filter the Extension column to retain only the xlsx files.

### Customize the query

At this point, we have a query that includes a list of the workbooks in the folder. Now we need to add a new column to include each worksheet in each of the workbooks. To do this from within the Query Editor, we select the Add Column > Add Custom Column command.

In the resulting dialog, we enter a name for our new column, something like Sheets would be fine, and then the formula:

`=Excel.Workbook(File.Contents([Folder Path]&[Name]))`

This is displayed below:

Click OK to add the new column to the query.

Our new column, Sheets, needs to be expanded. We can do this by clicking the little icon in the column header, or the Transform > Structured Column > Expand command. We can just expand all columns for now.

The query editor now includes all sheets and all workbooks, as shown below.

Note: if needed, apply filters to remove any rows that aren’t needed.

Next, we need to bring the sheet contents from columns B and C into the query. So, we expand the Sheets.Data column. We do this by clicking the little icon in the header, or by clicking the Transform > Structured Column > Expand icon. Since the Total label is in column B, and the Value in column C, we don’t need to bring in all columns, we can just check the Column2 and Column3 boxes.

Now our query includes the values from the worksheets, as shown below.

Now we have all the data we need! We just have to do some clean up and remove the rows and columns we don’t need.

The only query rows we need to keep are those with the text string Total in Column2. So, we apply a filter by using the drop-down for Column2, and selecting Total. This leaves only the rows that include the total values we are seeking.

Next, we can remove any columns we don’t want or need returned to our summary sheet. We can pick and choose the columns and use the Remove Columns icon for this. Here, I’ve removed all columns except the ones that contain the workbook name, worksheet name, and value. The updated query is shown below.

Now, all the hard work is done! And we can just move on to the last step.

Note: thank you Ken Puls (Power Query guru) for your post.

### Return the results

Finally, we can use the Close and Load command to return the query results to our summary sheet, as shown below.

Unbelievable…we did it! And, the best part is that we don’t need to go through all of that trouble next time we want to update the summary. We can simply right-click the results table and select Refresh. When we do, Excel goes back to the folder, and automatically includes any files that were added since last time. Plus, it automatically includes any new worksheets! And, it will retrieve the totals from all sheets, regardless of how many there are on each sheet or which row they are in. Wow!

But Jeff, come on man, we only had 3 files, I could have copy/pasted faster than that. Well, maybe, but, since this summary needs to be updated each day, week, or month, the Refresh will certainly be faster next period. Out of curiosity, I copied these three files, and pasted, pasted, pasted, and pasted, over an over, until I had about 420 Excel files in the folder. Then I hit Refresh. 6.76 seconds later, my summary table included 1,680 rows of data…freaking awesome!

If you have any other options to accomplish this task, or any other fun Get & Transform query tips, please share by posting a comment below…thanks!

## Resources

Get a quick email notice when a new Excel article is available

• This field is for validation purposes and should be left unchanged.

This article was written by Jeff Lenning

## 40 comments:

1. George
Reply

Jeff, I had the same problem, but I worked it out somewhat in a different way. This example is used in a financial report for a year to date and a month report along with a previous YTD and month report all on one report. You pick/choose a report by picking from a drop down list indicating month and year. Once the timeframe is chosen all figures, calculations are updated to reflect that timeframe
At present my workbook is looking through 27 worksheets containing 6 columns and 229 rows of numbers of which 1 column is text.
In my program all worksheet reside in one workbook and each workbook has an ID name that reflects mth & yr. and each row has a individual account number that needs to be modified to reflect the month and year. This is done by running a VBA program which take seconds. Now one the master worksheet I have a number of hidden columns that will tie in the month and year from those worksheets by a vlookup formula and from that row where I want to return a number to reflect that year and month I use the following formula which is an array.
=VLOOKUP(\$E5,INDIRECT(“‘”&INDEX(SSheetList,MATCH(1,–(COUNTIF(INDIRECT(“‘”&SSheetList&”‘!\$A\$5:\$G\$232″),\$E5)>0),0))&”‘!\$A\$5:\$G\$232”),3,FALSE)
I can add a new sheet and get results in less then 2 minutes and once added I can pick from my drop down mth. yr. list and have the worksheet updated in less than 10 sec.
Now this didn’t get made in a day or two, it took me a couple of days just to get this formula in place and working but now that I have it, it saves me 100’s of analyze hours.

1. Jeff Lenning Post author
Reply

That sounds awesome…I love hearing how Excel can help save time, and how there are typically multiple ways to accomplish any given task. Thanks for sharing your comment!

2. Mindy
Reply

Can you do this if the files are in multiple folders? Can it bring back the name of the folder that it is in?

I ask because I’m collecting a large amount of A/P data that the main folder is information received, then it goes down to the vendor and from there has the file containing the ID number, address, etc. for that specific vendor.

1. Jeff Lenning Post author
Reply

Hi Mindy! Yes and yes! This same exact query will include all files AND subfolders in the source folder. The query includes a column for full path (folder) as well. Totally awesome!
Thanks
Jeff

3. Mindy
Reply

Wow! that IS totally awesome! thank you!

1. Jeff Lenning Post author
Reply

Welcome 🙂

4. Lisa
Reply

I use the Power Query, and love it. It is nice to be able to pull in information from other Workbooks, without changing the data, or missing some. We use it to combine several Sheets into one, for a month report, and then in another file, use Power Query to combine the months into the report for a YTD. We use 2013, and it does not have the Query option, so I learned the Power Query. Really nice and easy to use, once you understand it. I haven’t tried to have it figure the number of sheets per book, but that is something I will have to try. Thank you for another great post!

1. Jeff Lenning Post author
Reply

Power Query is a game changer…I love it too 🙂

5. Torstein Johnsen
Reply

Thanks Jeff! Your explanation is excellent and Power Query is avesome. . I have troubled with the Query formulas several times! They seems to be case sensitive!

1. Jeff Lenning Post author
Reply

Ah, yes, they are case sensitive! A great book for learning more is M is for (Data) Monkey.
Thanks
Jeff

6. Moshe Slonim
Reply

Thank you Jeff.
Your explanations and demos are very helpful.
Moshe

7. Sadie
Reply

I have a question about this. I’m not sure if this is even possible or if this is even the best way to do what I want to accomplish. However, what I am trying to accomplish is this:

– I have a file full of employee attendance records. In the attendance records I put for example “A” for absence, “AA” for approved absence, “SI” for shift interruption, etc.

– On a completely different sheet, I want all those individual attendance records to communicate to this other “Time Off Request” sheet so if in “Bob’s” attendance record he has an approved absence (“AA”) that it will pop up on the right date in my “Time Off Request” sheet.

So the value the time off request sheet will be looking for is always “AA” but it could be on any given day, and from all the attendance records in the file. This way I can look at one day and see how many people are scheduled to be out that day.

So is this possible and is it doable with using this Power Query function or is there a different thing I could do to achieve this? I’m not super knowledgeable with excel, but I am also not completely terrible with it. Haha. Any help would work. Thanks!

1. Jeff Lenning Post author
Reply

Hi Sadie,
I think you are trying to find the count of the number of people with a given code, A, AA, or SI. I believe the COUNTIFS function may be able to help, as it provides the count of the number of rows where multiple conditions are true. For example, where the number of codes is AA. It can also count based on date conditions as well.
I’ve written an article about COUNTIFS here:
https://www.excel-university.com/articles/cal-cpa/comparing-spreadsheet-lists-with-ease/
Hope it helps you get started!
Thanks
Jeff

8. Sadie King
Reply

Thank you 🙂 I did play around with the COUNTIFS and I don’t think it’s quite what I need. I honestly don’t know if what I am trying to accomplish can be done or not, so I might have to rethink how else I can do it using the Power Query or the COUNTIFS function. Thank you for the reply!

9. Clay
Reply

I found this article after a marathon of Google searches trying to find out how return the sum of similar cells across multiple workbooks. This seems to do the first steps of that (i.e., pulling the values from all of those workbooks). Is there another step or two in Power Query where I could combine/add all of these results? For example, I want to sum the values of all B7 cells from hundreds of workbooks.

1. Jeff Lenning Post author
Reply

Clay,
Several options here to sum up the resulting values. Probably the easiest to view just the sum is to select the column and then Transform > Statistics > Sum command.
Another option if you want to see the details would be to dump the results back to Excel and then click the Total Row checkbox on the results table. This is demonstrated here:
https://www.excel-university.com/get-transform-an-alternative-to-simple-pivottable-reports/
Another option if you want to provide a summary by item would be to use the Group By command in PQ. This is demonstrated here:
https://www.excel-university.com/use-get-transform-to-reconcile-two-lists/
Hope these help…thanks!

10. Clay
Reply

This was very helpful. Is it possible to add a column to show the row number of the source worksheet? Something like custom column > Row(Sheets.Data.Column1) ?

11. crasher
Reply

I just stumbled across this when trying to write a Macro to do the same thing – this has saved me a huge amount of time, thank you for sharing

12. Thomas
Reply

Thank you! This was extremely helpful for folders stored on my computer. However I was hoping you could help me draw a power query just like this from a Sharepoint folder. There is an option for Sharepoint folders, but in the custom columns where you entered “=Excel.Workbook(File.Contents([Folder Path]&[Name]))” I’m not sure what to put. If you could help me with this it would really increase the functionality of this powerful tool for me.

Thanks in advance!

1. Jesse
Reply

I have the same question. I am trying to retrieve values from a SharePoint folder, but the query mentioned above doesn’t work for a SharePoint folder.

13. Petio Ivanov
Reply

Thank you Jeff for posting this article. I had written a macro to combine multiple workbooks into a single one resulting in multiple sheets and was just about to figure out how to combine those sheets until I read your article which literally saved me hours. I have bookmarked this website and look forward to getting even more value out of it.

14. Jay Shukla
Reply

Let say I have workbook name “A”. I am fetching some data from the web on this A workbook.
I conditioned this workbook to refresh data every 60 min.

Now I created workbook name “B” through Power Query, I am getting all the data from A to B.

Will workbook A will refresh data without opening it so that I get updated data in workbook B?

15. Dave Maggs
Reply

Can this same process be used for a single workbook. I have a workbook with 1 sheet for each month, each sheet contains a list of all work done under a specific project code for the month. What I want is to show a list of all project codes used on a separate sheet without duplicates. I could copy and paste from each month, but I am all for automation, so is the process you have use applicable to what I want or is there another process. Thanks

16. Muhammad
Reply

Hey Jeff..u are awesome man…just one question why power query is not showing complete text in a cell. it just show only 45-50 characters in a cell while my original excel file contains more than 200 characters in a cell. please help

17. Michelle
Reply

This is awesome. I have an additional question. I have multiple worksheets in each file in the folder. Each worksheet has the same layout. I need to retrieve the data from all the worksheets in all the files in the folder. I would really appreciate your help. Thank you.

18. Shrikant
Reply

Hi Jeff,

First of all, thank you for this amazing idea.
I wanted to know whether it is possible to input values for column filter through an excel sheet. This would make it more dynamic and scalable. Please help me resolve this.

Regards
Shrikant

19. Jay Shukla
Reply

Hi Jeff,

Thank you for your this excellent article.

I have a question before doing all this.

suppose I have Imported data from one particular website by applying data refresh every day.

in this case,
1) will data will keep getting refreshed even if I don’t open an excel.
2) Also, imported data is not in a set format and so first I need to format in the table for my easy understanding.
Do we have any particular method to arrange a date so that no error comes while data gets refreshed?

Thanks

20. Brittany
Reply

Hi Jeff,

Thank you again for posting this. I have a question I’m hoping yourself or other commenters can assist with. I find when running this for a large number of data points (more than 100) that the query maxes out at 100 rows. I’m running excel 32-bit and my hunch is that’s the issue, but wondering if anyone else has come across this?

21. Chet
Reply

We have daily files setup in a hierarchy of Annual folder > Monthly Folder > Daily Files. So, can the concept in this article be applied to pull data from all the daily files within all of the monthly folders in the annual folder? If so, how does the new column formula need to be setup to accomplish it?

22. Danny Chen
Reply

Hi

I am actually looking for a way to get data from the last one or two worksheet on a single excel file. Does anyone know how to direct a formula to take values from daily incremented worksheets excel? For example, on the first day of the month I can only use a formula that checks the last worksheet for the data, on the second day, it will be the average of the values from day 1 to day 2 , on the third day it would be averaged values from day 2 to day 3, etc. etc. until end of the month.

If anyone got any suggestion please just email me, I had been stuck on this for a while and can’t seems to find a solution on my own.
Thanks beforehand to all those that will be commenting.

23. Douglas Wirnowski
Reply

The solution sounds like a great one, yet, the newest version of Excel offers a totally different (and bewildering) UEX — no way to add a Custom Column. The UEX displays “Combine”, “Load” and “Transform Data”… nothing else. I believe this is the only way I can accomplish what I’m trying, but I’m stuck at the moment…

24. Samuel
Reply

Thanks for this, a great help in extracting data. Is it possible to move the columns that I am extracting from multiple sources to a new column for each source, rather than the current two columns that I currently have? I am trying to extract to show reference information in a first column, then values in a second and repeat for subsequent extracted data.

Thanks

25. Michelle
Reply

I think this may help with a puzzle we are trying to solve for attendance records. Is there a way to create a new row from new data being retrieved on the main spreadsheet? For example if column 1 is names – and then the new columns coming in are numbers of absences. We are exporting new excel files weekly and want to pull data to one main sheet. (the only data we want to pull is name and number of absences.) The column headers would be the date.

26. Gautam Gupta
Reply

Hi,
Firstly thank you for such a informative description of Power Query.
I am facing one issue.
I have the data set in Power Query with all the data transformed as required.
There are 100 Files names picking data through the power query
I need to place the files in such as way that the data for each file becomes one row instead of columns since for each file it is generating 30 rows which when placed as a column while loading can be very helpful
The Columns and rows of the data extracted is same, just the name of the file differs everytime.

27. James
Reply

Thanks Jeff,

this guide 99% solves my problem.. the only bit im missing is that from each worksheet I only want row 4..

How do I add this to the query?

thanks

J

28. Vinicius
Reply

Is there a way to do these things without power query? I’m on a company computer and don’t have admin rights and we only have excel 2013. I basically need exactly thsi but can’t install power query

1. Jeff Lenning Post author
Reply

Power Query is also available in the free Power BI desktop app, powerbi.com.
Thanks
Jeff

29. Les
Reply

Hi,
I have setup my query and would have the same number of 12 repeated values in column 1 which I would like to use as column headers. Ie First Name, Last Name, eMail etc.
Is there a way to use the query to place the column 2 values under the column 1 headers after each workbook is scanned?

Thanks,
Les

30. Ezra
Reply

Hi Jeff,

I have a similar issue I am working on but my filter for my row is not based on any adjacent cells, but I am trying to make it so it filters by the cell name. The problem is that I have named the cell (which is actually multiple rows and columns merged into one named cell), and when I try to filter by the named cell, the data shows up as being (null). Can I correctly filter the data based not on rows and columns but by the named cell?

Thanks

31. Bob
Reply

Hi Jeff,
I have an even crazier variation on your example – I have been tasked with retrieving about a dozen specific pieces of data from worksheets that were modeled after paper forms (non-tabular data) in multiple worksheets within each workbook. Overall there are several thousand workbooks in multiple folders & subfolders that need to be queried. In most cases, the cells that contain the data are merged cells, and the cell that identifies the content can be either to the left or above the cell with the contents again, often in a merged cell for formatting purposes.
I am well versed in Excel (currently using Excel 365) but I am totally new to Power Query – any advice on tackling this very daunting task?
Thanks – Bob