This post discusses ways to retrieve aggregated values from a table based on the column labels.
Beginning with Excel 2007, we can store data in a table with the Insert > Table Ribbon command icon. If you haven’t yet explored this incredible feature, please check out this CalCPA Magazine article Excel Rules.
Frequently, we need to retrieve values out of data tables for reporting or analysis. This task is fairly easy using traditional lookup functions or conditional summing functions. However, when preparing workbooks to be used on an ongoing basis, we need to keep the formula consistency principle in mind. This means we write consistent formulas within a range, so that we can fill them down and right. Often, this is just a matter of setting the cell references properly, such as relative, absolute, or mixed. However, when our data is stored in a table, we can use structured table references and column headers to build consistent formulas.
Let’s identify our goals and objectives before we begin. We have exported some invoice information from our accounting system, and have stored it in a table named tbl_inv, pictured below:
We would like to retrieve values from the table, and aggregate them by customer ID, in order to populate our little summary report, pictured below:
Our first goal is to write a single formula in C8, and then fill it down and to the right. That is, to use consistent formulas.
Next, let’s examine our data and our report. One thing to notice is that each customer may appear on many rows. Thus, we need to use a conditional summing function such as SUMIFS rather than a traditional lookup function, which would only return the related value from the first matching item. If you are not familiar with the SUMIFS function, please check out Multiple Condition Summing in Excel with SUMIFS.
Next, we notice that the report column order differs from the data column order. The report order is custid, amount, tax, shipping, and total. The order of the data columns are custid, shipping, tax, amount, and total. That means we need to write a formula that can accommodate the column order differences.
So, to summarize our objectives:
- Use consistent formulas
- Aggregate multiple rows
- Accommodate column order differences between the data and report
We can meet our objectives by nesting the INDEX and MATCH functions inside of our SUMIFS function to dynamically select the proper sum column. Let’s unpack the formula step by step.
The first argument of the SUMIFS function is the column of numbers to add. When populating our report’s first column, the column to add is the table’s amount column tbl_inv[Amount]. Our formula would be something like this:
However, when populating the next report column, the column of numbers to add is the table’s tax column tbl_inv[Tax]. Our formula would be something like this:
Since the first function argument needs to be different for each report column, we are forced to write unique formulas for each column. This does not fall in line with our formula consistency objective. So, the question is, how do we express the first SUMIFS argument so it is dynamic?
One approach is to use the INDEX/MATCH functions. If you are not familiar with the INDEX/MATCH functions, please feel free to check out How to Return a Value Left of VLOOKUP’s Lookup Column for more information. In that blog post, we discussed how the INDEX function returns a cell value, but, it does much more than that.
The INDEX function can actually return a cell value or a range reference. Microsoft describes the function as having two “forms”, the array form and the reference form. It is a fancy way to say that the function can return either a cell value (array form) or a range reference (reference form). We’ll ask the function to return a range reference that can be used as the first SUMIFS argument.
When we are done, the first argument of the SUMIFS function will be dynamic, and it will use INDEX to return a range reference and MATCH to dynamically figure out which column. It will look a little something like this:
The INDEX/MATCH functions will provide the SUMIFS function with the column of numbers to add. The basic idea is that we will ask the INDEX function to return a reference and we will ask the MATCH function to tell the INDEX function which column to refer to based on the header value. MATCH will look for our report column header, such as Amount, in the table’s header row. The assumption here is that the report header labels match the data header labels.
Since the MATCH function returns the relative position number of a list item, we ask it to tell us the column number of the matching report label. For example, the Amount column is the 6th column, so, it would return 6 to the INDEX function. INDEX uses this information to return the Amount column reference to the SUMIFS function. SUMIFS uses this reference as the column of numbers to add.
Since there are several moving parts, we’ll just ease into this formula. First, let’s replace the first argument of the SUMIFS function with an INDEX function. The resulting formula looks like this:
This formula uses the SUMIFS function to add a column of numbers. The column of numbers to add is the first argument, which is an INDEX function. The first argument of the INDEX function provides the initial range, the whole table, tbl_inv. The second argument of the INDEX function is the row_num argument, and tells the INDEX function which row number to return. Since we want to return all rows, we leave this argument blank. The third argument of the INDEX function is the column_num argument, and we entered 6 because the table’s amount column is the 6th column. Thus, the INDEX function above returns the range reference corresponding to the 6th column in the table to the SUMIFS function.
However, we can’t stop here because we hard-coded the column_num argument 6. If we fill this formula to the right, the 6 would remain and all report column would return the same result, the sum of the amount column. Instead, we need to ask Excel to dynamically figure out which column number the INDEX function should use. And this is accomplished with the MATCH function. The following MATCH function will figure out which column has a matching column header label.
This function looks for the value in C$7 in the headers row of the table. We use a relative column reference (C) so that it updates as it is filled right, and an absolute row reference ($7) so that it is locked onto the report header row. Note the special structured table reference that refers to the headers row. It starts with the table’s name, tbl_inv, and then #Headers enclosed in square brackets. You can type in this reference or just use your mouse to select it interactively. The third argument, 0, tells the function we are looking for an exact match.
Since the MATCH function above returns 6, we need to nest it inside of the INDEX function. If we nest the MATCH function inside the INDEX function, we get the following:
This formula segment returns the column reference needed by the SUMIFS function. So, nesting the INDEX/MATCH functions inside the SUMIFS function results in the following:
We could write this formula in C8, and then fill it down and right to create the report. But, there is one more little detail.
If this workbook is designed to be used on a recurring basis and to remain in place for a long time, then we need to think about ways that a future user could break the workbook and address any risks up front, as discussed in Excel University Volume 1 Chapter 19. We therefore need to consider the different ways that future users may try to fill our formula down and to the right, because the way they fill right could accidentally break our formula.
Specifically, if a user tries to fill the formula right with the fill command, a copy/paste, or, with Ctrl+Enter, then Excel treats the tbl_inv[CustID] reference as absolute. This is good because as we fill the formula right we do want all formulas to reference the customer id column. However, if a user chooses to fill the formula right by dragging the fill handle, then, the reference is treated as relative, meaning, it will also slide to the right and change to tbl_inv[Shipping], tbl_inv[Tax], and so on.
So, to be safe, we’ll modify the reference slightly so that Excel treats it as absolute even if a user fills right with the fill handle. We’ll update it to a single-column range reference: tbl_inv[[CustID]:[CustID]]. The final formula is:
We enter the formula in C8, and fill it down and right, and the resulting report is pictured below.
Although it may feel like it took us a long time to get here, the end result is a formula that meets our objectives. It can be filled down and to the right and continue to work, it accommodates column order differences, and it aggregates values.
Generally, it is worth the time to create consistent formulas that can be filled down and right and continue to work for recurring use workbooks because it makes updating them over time faster. Additionally, writing formulas that can accommodate minor structure changes, such as the column orders moving around over time, will help reduce errors and improve efficiency. Although the upfront time investment getting a formula like the one above to work may be significant, you’ll receive your investment back each subsequent period through improved productivity.
To check out the Excel file that was used to prepare the screenshots, feel free to download the sample file:
To check up the updated version that replaces the $B8 cell reference with a named reference, feel free to download the sample file:
- If you need to retrieve a text string, rather than the sum of numbers, you’ll probably want to use traditional lookup functions, such as VLOOKUP. The workbook sample file includes a VLOOKUP sheet with the relevant formulas.
- If you’d like more assistance with the SUMIFS, INDEX, and MATCH functions, please feel free to check out our online Excel training courses.
- If you have another approach you prefer, please post a comment, we would love to hear about it!