VLOOKUP Return Multiple Matching Rows and Columns
In this post, we’ll discuss a way to simulate using VLOOKUP to return multiple matching rows and/or columns. What do you mean by “simulate” Jeff? Well, VLOOKUP is designed to return a single value, not multiple values. That is, VLOOKUP scans down the lookup range and stops at the first matching row … ignoring any additional matching rows. Once VLOOKUP finds a matching row, it shoots to the right to retrieve the related value from a single column. But let’s say we want to return the values from multiple columns and all of the matching rows. Well, this is where VLOOKUP breaks down. So, we’ll simulate this functionality by turning to Power Query. Let’s get to it … I have a video and full written narrative below.
Before we get too far, let’s just clarify our objective. Let’s say we have a list of invoices, and our list contains summary information such as InvoiceID and Customer … something like this (Summary table):
We also have an invoice detail list, which contains line-item details for each invoice. It contains the InvoiceID, Item, Amount, and perhaps other info. Something like this (Detail table):
Now, we want to go through each item in the summary table (each invoice) and retrieve related values from the detail table. You’ll notice there are multiple matching rows in the detail table for each invoice. Plus, we want to retrieve multiple columns from the detail table, for example, the Item and Amount columns.
When we are done, we want to combine multiple matching rows and columns like this:
Since VLOOKUP is limited to retrieving a single cell value from the first matching row and the specified return column, we’ll use Power Query instead.
We’ll take it in three steps:
- Merge query
- Specify multiple columns
- Combine multiple rows
So, let’s get to it.
Note: if this were a one-time project, you may be able to use VLOOKUP to retrieve the CustID from the summary table into the detail table if you use approximate match and sort the summary in ascending order by the lookup column. Then write one formula for each column value you want returned. But in practice, Power Query will create a solution that is easily refreshed in future periods as the data tables change.
In this first step, we will load each table into Power Query and then merge them. So, first, we select any cell in the Summary table and select the Data > From Table command. The table is loaded into Power Query:
Then, we click Home > Close & Load To… and select Only Create Connection as shown below:
We do the same steps again to load the Detail table into Power Query.
Now, we need to merge these two tables with Power Query. One way to do this from inside the Power Query editor is to select the Home > Merge Queries command. One way to do this from inside Excel is to select the Data > Get Data > Combine Queries > Merge command. Either way, you’ll be presented with the Merge dialog where you identify the Summary table first and then the Detail table. Then, you identify the lookup column by selecting the InvoiceID column in both tables, as shown below.
Note: the Join Kind is Left Outer because we selected the summary table first and then the detail table. If you selected Detail first and then Summary you’ll want to change the Join Kind to Right Outer. Depending on what you are working on, other Join Kind options may be helpful so check them out.
Now, we click OK and we find ourselves inside the Power Query editor:
With our queries merged, it is time to perform our next step.
Specify multiple columns
Recall that one thing we want to accomplish is to return multiple column values. VLOOKUP can return a value from a single column, but we can easily return multiple column values with Power Query.
To do so, just click the Expand icon on the right side of the Detail column header, or the Transform > Structured Column > Expand command. You’ll be able to pick one or multiple columns to return from the detail table:
Here, we want to return the Item and Amount columns. We click OK and bam … we have returned multiple columns:
Now that we have returned multiple columns, let’s address the rows.
Specify multiple rows
At the end of the day, we want one row for each InvoiceID, so, we select the InvoiceID column and then the Transform > Group By command. Since we want to display a variety of columns for each InvoiceID, we select Advanced in the resulting Group By dialog:
Now, we define each additional column desired. In our case, we would like 3 columns:
- New column named Amount, that computes the Sum, of the existing Amount column.
- New column named Count, that will Count Rows.
- New column named Items, that includes All Rows
The updated dialog is here:
We click OK and:
And, we are almost there! The final step is to convert the Items column into a comma-separated list of the actual item values found in the detail table.
To do this, we’ll use a trick that Ken and Miguel taught me in their workshop (by the way, check out their amazing M is for Data Monkey book).
We click Add Column > Custom Column. In the resulting Custom Column dialog, we enter a desired column name, like ItemList, and write the formula shown below.
Note: the [Items] argument references the table name and the “Item” references the column name.
Clicking OK results in a new ItemList column, like this:
Now, we need to tell Power Query that we want to convert each list in the ItemList column into a comma-separated list of values. To do this, we select the Expand button on the right side of the ItemList header and select Extract Values… . In the resulting Extract values from list dialog, we pick our preferred delimiter. Here, we’ll go with a Comma:
Click OK and bam!
We can remove the Items column, and then click Home > Close & Load To … and send it to a Table. The results appear in Excel like this:
And, that is exactly what we set out to do!
Now, it took quite a few steps to get here, but now for the best news. Once this is set up, all we need to do in future periods is right-click the results table and select Refresh. New rows in the summary or detail tables are automatically included. This means that the investment of time to set this up will pay efficiency dividends every subsequent period.
If you have any other Power Query tips or ways to improve this process, please share by posting a comment below … thanks!