Identify Missing IDs and Sequence Gaps

There are several ways in Excel to find missing IDs (or gaps) in a big list of sequential IDs, such as check numbers or invoice numbers. In this post, we’ll use Power Query so that each time we have a new list, we simply click Refresh. Excel then creates an updated list of the missing IDs. This approach makes recurring tasks fast because there is no need to manually sort, write formulas, filter, or apply conditional formatting. Just click Refresh and Excel provides a list of the missing IDs.

Objective

Before we jump to Power Query, let’s clarify our goal. Let’s say we export a list that includes a column of sequential IDs, for example, a list of checks. Our goal is to review the list and identify any missing check numbers, that is, gaps in the ID column.

Here is an idea of the list:

Our goal is to find any missing CheckNums in the sequence. As this is Excel, there are of course many ways to accomplish this task. For example, if this was a one-time project or a short list, we could choose to do this manually.

A common approach is to sort the data by the CheckNum column. Once sorted, we could visually scan this list looking for gaps, or, write a formula. For example, we could write a formula that subtracts the check number from the row above, or something similar. If the difference is 1, we know it is sequential, otherwise, we know there is a gap. Then, perhaps we could either apply a filter or conditional formatting or something to make it a bit easier to identify the gaps. And, this approach would get the job done.

But, when this is a task we perform frequently, such as every week or month, we may choose to eliminate all of these manual steps so that we can get our work done faster. This is where Power Query can help. There is a trade-off of course, because the initial set-up in PQ may take more time … but, that is an investment that pays efficiency dividends each subsequent month.

I’ve created a video and written narrative to walk through the process of setting up Power Query.

Video

Narrative

We’ll perform the following steps:

  • Import the data
  • Create a list of all IDs
  • Generate list of missing IDs

Let’s get to it.

Import the data

The first thing we need to do is import the data into Power Query. To do this, just head to Data > Get Data and pick the corresponding data location. In this case, our sample data is in an Excel table so we use the Get Data > From Table/Range. Excel then loads the data and opens the Power Query Editor, as shown below.

We click Close & Load To … and pick Only Create Connection in the resulting Import Data dialog, as shown below.

Now, on to our next step … creating a list of all IDs.

Create a list of all IDs

The goal of this step is to create a list of all IDs within the range … that is, between the min and max ID values. To achieve this step, we’ll create three new queries. The first will determine the min value. The second will determine the max value. And the third will create the range of all values between the min and max, inclusive.

Min

To create a query that returns the min ID value, we right-click the query created previously, in this case it is named Table1, and select Reference as shown below.

This simply creates a new query that uses as its starting point the results of the Table1 query, as shown below.

Then, to get the min value in the CheckNum column, we select the CheckNum column and then click the Transform > Statistics > Minimum command. Power Query finds and returns the smallest CheckNum, as shown below.

At this point, we just give the query a descriptive Name, such as IDMin, and Close & Load To … Create Connection Only.

We now see the new IDMin query in the Queries & Connections pane, as shown below.

Max

We basically rinse and repeat to create another query that returns the maximum value in the range. We name the new query IDMax.

Range

The final step is to use the IDMin and IDMax values to create a range of all IDs. We do this by setting up a new blank query (Data > Get Data > From Other Sources > Blank Query).

We simply enter the following formula into the formula bar (which updates the Source step):

={IDMin .. IDMax}

And hit enter. Bam:

Wait, what!? I know, right … it is super cool! The curly braces tell PQ to create a list of values between the IDMIn and IDMax values.

Note: PQ is case-sensitive, so, be sure to use the exact query names (IDMax vs idmax).

The final thing is to convert the resulting “list” into a “table” so we can use it in our next step. This is easy. We just click the To Table command in the Convert group shown above. We click OK in the resulting dialog, and now we have a table we can use in our final step, as shown below.

We can clean it up a bit by changing the Column1 column label to ID, and changing the query name to IDRange, as shown below.

We just Close and Load To … Create Connection Only.

Now we are ready for the final step.

Generate list of missing IDs

The goal of final step is to have PQ create a list of the missing IDs. More specifically, to create a list of IDs in the IDRange query that are not found in the original data source query. To accomplish this, we’ll merge the two queries and specify the desired Join Kind.

First, we select the Data > Get Data > Combine Queries > Merge command. In the resulting Merge dialog, we pick our IDRange query and then our original data source query (Table1) as shown below.

Then, we identify the columns that are shared between the two tables, in this case ID and CheckNum by selecting them as shown below:

Finally, we need to pick the desired Join Kind. In this case, we want to create a list that includes the IDs in the first table that do not appear in the second table. So, we select Left Anti (rows only in first) as shown below.

We click OK, and bam:

We did it! Now, we can just delete the Table1 column (select it and press the Delete key on your keyboard) and Close and Load To … a Table in a New worksheet. The missing IDs are displayed in a table in Excel, as shown below.

And now that the work is done, this list of missing IDs is easy to update each subsequent period. Whenever there is new data, we just need to right-click the results table and click Refresh. Power Query instantly repeats all of the steps automatically, and updates the results table. Outstanding!

If you prefer any other approaches to perform this task, please share by posting a comment below.

Sample file:  MissingIDs.xlsx

 

Get a quick email notice when a new Excel article is available

  • This field is for validation purposes and should be left unchanged.

This article was written by Jeff Lenning

Leave a Reply

Your email address will not be published. Required fields are marked *

For security, use of Google's reCAPTCHA service is required which is subject to the Google Privacy Policy and Terms of Use.

I agree to these terms.