Retrieve First Occurrence After a Date
In this post, we’ll discuss one approach for retrieving the first occurrence (or min date) after a specified date. Let me back up. Michael asked for a way to find the first occurrence of an office visit (a date) after the date of being discharged from the hospital (and within 30 days). As this is Excel, there are of course many options. The approach I present below uses one of my favorite Excel tools: Power Query. Thanks Michael for the question!
Before we get too far, let’s try to visualize the issue. There are many ways that this issue can appear, but in Michael’s case, it is related to hospital and office visits. You may be working on other data, such as loans and payments, stock values, or sales data. But here, we have a list of hospital visits like this:
We also have a list of office visits for each patient with their doctor, like this:
The idea is that we want to find the first time the patient visited their doctor’s office after getting discharged from the hospital, and that office visit should only appear if it was within 30 days of getting discharged.
Excel offers many possible ways to accomplish this. In this post, I’ll use Power Query because the solution will be easy to update in future periods and it doesn’t rely on the source data table being sorted.
We’ll walk through the following steps:
- Load and combine the tables
- Filter dates outside of range
- Group by and keep the min
Let’s take them one at a time.
Load and combine tables
First, we need to get these tables into Power Query. To do that, we select any cell in the table and click Data > From Table/Range. From within PQ, we click Close & Load To … and save as a connection only query.
We do this for each of the two tables.
To combine them, we click Data > Combine Queries > Merge. In the resulting Merge dialog, we select the Hospital table and the Office table and then click the PatientID column in each:
The results appear in Power Query:
We expand the Office column (click the expand icon on the right side of the column label) and select the OfficeDate column. The results are shown below:
With that complete, we move on to our next step of filtering out the rows that don’t meet our criteria.
Filter dates outside of range
The only rows that we want to consider in our results table are those where the OfficeDate is greater than the DischargeDate AND are within 30 days of the DischargeDate. All rows that don’t meet that criteria need to be removed from consideration.
One way to accomplish this is to compute the number of days between the OfficeDate and the DischargeDate and then apply a filter.
So, we select the OfficeDate column AND THEN the DischargeDate column, and click Add Column > Date > Subtract Days. The results are shown below:
Next, we just apply a filter to the Subtraction column. Use the drop-down and select Number Filter > Greater Than. In the resulting Filter Rows dialog, we opt to keep rows where the number of days is greater than 0 and is less than or equal to 30:
The updated results are shown below:
All that remains is to keep only the earliest OfficeDate for each patient.
Group by and keep the min
To keep the earliest OfficeDate for each patient, we first select the PatientID and DischargeDate columns. Then, we use the Transform > Group By command. In the resulting dialog, we create a new column called FirstVisitDate and set it to the Min of OfficeDate:
Yes, we got it:
Now, we can update data types for the date columns to Date if desired, and then Close & Load To … an Excel worksheet.
Next period, when the Hospital and OfficeVisit tables are updated, we just right-click the results table and click Refresh.
If you have any other approaches to this issue you prefer, please share by posting a comment below … thanks!
- Sample file: FirstDate.xlsx