Fuzzy Match with Power Query

A few years ago, I wrote a post about how to perform fuzzy lookups using the “Fuzzy Lookup Add-In for Excel.” However, this capability is now available in Power Query!! Wait, what? Yes! Performing a fuzzy match is so much easier and far more intuitive in Power Query. This post walks through the basics … and a huge THANK YOU to the Microsoft developers for this feature 🙂

Objective

Before we get to the mechanics, let’s back up and understand what we are trying to do and what a fuzzy match is.

Generally speaking, Excel lookup functions expect that the lookup values match on both lists. The case can be different (eg, “Microsoft” and “microsoft” would match), but generally, the idea is that the text strings need to be the same. There are some exceptions to this as noted below, but generally, the idea is that the lookup values are the same between both lists.

When the lookup values are different, for example ABC Company and ABC Company, Inc., traditional lookup functions aren’t always reliable. Now, to us humans, these two names represent the same underlying company. But to Excel, these represents different lookup values. The idea with a fuzzy lookup is to enable Excel to match them. 

For example, here is a table (named CompanyName) with some company names:

We would like Excel to match them to this table (named CompanyList):

You’ll notice there are some differences: ABC Company vs ABC Company, Inc., Micro soft vs Microsoft, and Xeon vs Xeon Products.

With a traditional lookup function such as VLOOKUP, they don’t match:

With Power Query’s fuzzy merge option, they do:

Now, let’s get to the mechanics.

Note: some lookup functions provide an option for “approximate” match. For example, when you set the 4th argument of VLOOKUP to TRUE, or the 3rd argument of MATCH to 1 or -1, it enables us to perform range lookups. And, depending on the data set, lookup values, and sort order, this may or may not provide the expected approximate match desired. Additionally, some lookup functions support the use of a wildcard which has helped in some situations as well. But, we can now use the built-in fuzzy merge option in Power Query to perform a true fuzzy match 🙂

Details

We will walk through this process in three steps:

  • Import List 1
  • Import List 2
  • Fuzzy merge

Let’s do this thing.

Note: depending on when you are reading this and your version of Excel, you may not have the fuzzy merge option, which was recently released via O365 subscription (Office Insiders update channel).

Import List 1

We begin by getting our first list (CompanyName) into Power Query by selecting the table (or other data source) and using the Data > From Table/Range command. A preview is displayed in the Power Query Editor:

Next, we Home > Close & Load To … a connection only query … and no need to load it into the data model:

Now, it is time to get our second list.

Import List 2

We basically rinse and repeat to import the lookup table (CompanyList). We again Close & Load To … and create a connection only query.

At this point, we should see both queries listed in the Queries & Connections pane on the right side of the Excel window:

With our first and second tables in Power Query, we can now perform the fuzzy merge.

Fuzzy merge

To start this step, from within Excel we select Data > Get Data > Combine Queries > Merge. The Merge dialog appears, and we select our first list (CompanyName) and our second list (CompanyList). We also identify the lookup columns by clicking the column header from both, as shown below:

You’ll notice the bottom of the screenshot says, “The selection matches 0 of 3 rows from the first table.” This is because ABC Company is not the same as ABC Company, Inc., and so on.

But … and here is the good part … you’ll notice a checkbox called Use fuzzy matching to perform the merge.

Check it and … interesting! We now have 2 of 3 rows that match:

Wow! Now let’s see which items matched by clicking OK. 

Note: don’t worry, we’ll explore Fuzzy merge options shortly.

Back in the Power Query Editor, we Expand the CompanyList Table field and we can see the matches below:

Our first two companies match … awesome!

Can we get our third company, Xeon, to match as well? Time to explore a few fuzzy merge options. Click the gear icon on the Source applied step and expand the Fuzzy merge options:

  • The Similarity threshold field allows us to specify how similar the two values must be to make a match. The range is 0 to 1, and you can set as desired based on your data.
  • The Ignore case is checked by default (so MICROSOFT will match microsoft).
  • The Match by combining text parts is checked by default (so Micro soft would match Microsoft).
  • The Maximum number of matches allows us to set how many matches are returned for each lookup value.
  • The Transformation table allows us to specify a separate “mapping table” that increases reliability for lookups we perform frequently (more on this below).

Let’s change the Similarity threshold to 0.5, and bam … 3 of 3 items now match:

We click OK and … yes … it worked!

We Close & Load To … a Table, and the results are now back in Excel:

And that, my friends, is a fuzzy lookup in Excel!

Now, one last note. If we perform a similar lookup on similar lists over and over, we can increase the reliability of our match by incorporating a transformation “mapping” table.

Transformation Table

To create a Transformation Table, you create a new table with a From and To column, as shown below:

Then, you pull it into Power Query (Data > From Table/Range) and save the query as a connection-only query as we did previously.

Now, we open the Merge dialog by editing the fuzzy merge query (double-click the merge query in the Queries & Connections pane). Once again, click the gear icon in the Source applied step. Expand Fuzzy merge options and select your table in the Transformation table drop-down:

Think of this table as providing a reliable match for known differences, and if a value can’t be found in it, then Power Query uses the fuzzy match logic.

Well, for me, this is a very exciting capability in Power Query as it solves a common issue that has historically been hard to solve. Now that it is built-in to Power Query, it is easy to use and will really help in practice. Again, thanks Microsoft for this gift 🙂

Sample File: FuzzyPQ.xlsx

Get a quick email notice when a new Excel article is available

  • This field is for validation purposes and should be left unchanged.

This article was written by Jeff Lenning

4 comments:

  1. Sylvain
    Reply

    Hi Jeff, This is great to know. Not having to change case in a column to make a match with another column is one less thing to do! I am guessing that the fuzzy logic works on more than one column match? I. E. Matching 2 or more columns in table 1 vs 2 or more columns in table 2. Eg Name and address.
    Lastly is there a quick way of showing which rows were a match because of fuzzy logic being applied. (you could do a separate normal table merge match and then compare… )

  2. Greg Wolff
    Reply

    Very useful

  3. Thomas
    Reply

    I’m currently on the latest version of office 365 ProPlus and i don’t see the fuzzy option appear when i follow these steps, is there some setting or add on I’m missing?

    Thanks in advance.

    1. Jeff Lenning Post author
      Reply

      O365 has a few different update channels, and these determine when you receive updates. So, if you want to receive updates and new features quickly, update your channel to office insiders 🙂
      Thanks
      Jeff

Leave a Reply

Your email address will not be published. Required fields are marked *

For security, use of Google's reCAPTCHA service is required which is subject to the Google Privacy Policy and Terms of Use.

I agree to these terms.