Automate File Cleanup
I was recently asked the following question:
I download an excel file every week and make changes like deleting columns, counting and adding rows and so on. Can I automate this repetitive task ?
In this post, I’ll attempt to answer this question. The built-in Excel feature we’ll use to accomplish this is Power Query. If you haven’t used Power Query inside of Excel before, don’t worry, I’ll walk you though the steps.
Video
Written Guide
With Excel, routine tasks don’t have to be mundane and time-consuming. Excel’s Power Query tool provides a practical solution to automate repetitive tasks, such as deleting columns, counting, and adding rows. If you’ve never used Power Query before, this guide will be a smooth introduction to understanding this handy tool.
Getting Started with Power Query
First, let’s pull our data into the Power Query editor. Follow the steps below:
1. Go to Data > Get Data > From File > From Excel Workbook
2. In the resulting Import Data dialog, navigate to and select your Workbook
3. In the resulting Navigator dialog, select the desired worksheet, then click Transform Data
This will open the Power Query Editor and provide a preview of the data in your file.
While Power Query can perform many cleanup tasks, called transformations, this post will focus on answering the specific question asked. But, if you’d like to learn more about Power Query, we have tons of Power Query tutorials here.
Let’s start by deleting columns.
Note: deleting a column in Power Query DOES NOT delete it from the original data source. In fact, the data in the original data source is read-only to Power Query. Power Query does not modify the original data source. It simply makes changes while a copy of the data is making its way into Excel.
Deleting columns in Power Query is super easy.
1. Simply select the desired column.
2. Then, either hit the delete key on your keyboard or use the Home > Remove Columns command.
Each time we perform a transformation, such as deleting a column, our actions are added to the Applied Steps box, creating a to-do list for Excel every time we click Refresh.
After completing the basic transformations, we send the data back to Excel by clicking the Home > Close & Load To command. You can load it as a table into an existing worksheet. The data from the source, after the transformations, is placed into an Excel table:
The second part of the question was about counting and adding.
One easy option is to select any cell in the results table, and then use the Table Design > Total Row checkbox. This adds a row at the bottom of the table.
Excel will provide a default Total row, but you can customize it. For example, if we wanted to show the Count of the number of rows in the Account column, we select the total row cell in that column to reveal a drop-down.
We can pick the Count function from the drop-down to display that total.
You can change the aggregate function for any column at anytime!
Another option for counting and adding is with a PivotTable. A PivotTable enables far more powerful options. Think of the flow like this: Power Query retrieves and cleans the data, and then you feed the results table into a PivotTable to build a summary report. If you haven’t explored PivotTables, we have tons of PivotTable tutorials here.
In either case, once this is set up, updating the results table with a new Excel file next week just requires a refresh (no need to build everything again).
What Happens when New Files Arrive?
When a new weekly file arrives for import, we can update our results table a couple different ways.
One option is to replace the old file with a new one (use the same directory and the same filename). In this case, we can simply click Data > Refresh All. Power Query will go to the same folder path, file name, and sheet name, and import the data. It will apply the transformations and update the results table.
Another option is to browse to a different folder path / file name. To do so, edit the query by double-clicking it in the Queries & Connections pane. (You can toggle the Queries & Connections pane on/off by clicking the Data > Queries & Connections command.)
Then, inside the Power Query Editor, click the gear icon on the Source.
In the resulting dialog, simply click Browse, select your file, and then Close & Load.
But wait a second … what if the column labels change? Or, what if there are new columns? No worries … let’s talk about that next.
Changes to file structure
Over time, there may be changes made to the file structure, such as new columns or renamed columns. Generally speaking, new rows and columns will flow in just fine. If a column label is changed, or columns are removed, you may get an error when doing a refresh. If so, you’ll want to examine any step that explicitly references the column name. You can change that reference or remove it.
For example, let’s say a column name changes from CkNum to Check. When we refresh, we get an error. To track down the error, we open the query by double-clicking it in the Queries & Connections pane (as mentioned above).
Inside the Power Query Editor, we may see a yellow alert with a Go To Error button.
If so, click the button to be taken to the step that produces the error. If not, simply click through each step in order until you see the error.
In this case, we can simply change the old column reference (CkNum) to the new updated column label (Check) by editing it in the formula bar. So change it from this:
To this:
Hitting Enter commits this change and the error is resolved. You can then Close & Load the results to Excel as before.
Wrapping Up
So there we have it! Power Query certainly makes file cleanup tasks, including deleting columns and counting and adding rows, a breeze. Hopefully, you found this guide helpful. Using Excel effectively is all it takes to make seemingly complex tasks quite simple!
If you have any alternatives, suggestions, or questions, please share by posting a comment below … thanks!
Sample File
FAQs
Q: What is Power Query?
Power Query is a powerful data connection technology that enables you to discover, connect, combine, and shape data across a variety of sources.
Q: How do I open the Power Query Editor?
To open Power Query Editor, go to Data > Get Data > Launch Power Query Editor.
Q: Can Power Query handle large datasets?
Power Query can efficiently handle large datasets, making it a great tool for big data handling.
Q: How do I delete multiple columns in Power Query?
To delete multiple columns, select the columns you want to delete by holding CTRL key and then click Home > Remove Columns.
Q: How do I undo a step in Power Query?
To undo a step, just click the ‘X’ in the ‘Applied Steps’ box beside the step you want to undo.
Q: Can I use Power Query in other Microsoft apps apart from Excel?
Yes, Power Query is also available in other Microsoft apps like Power BI Desktop.
Q: Can Power Query automate complex transformation tasks?
Absolutely! Power Query can automate a wide variety of complex transformation tasks, beyond just row/counting or column deletion.
Q: How do I refresh data in Power Query?
To refresh the data, you can simply go to Data > Refresh All on the Excel workbook.
Q: Can I change the data source in Power Query?
Yes, you can. Follow these steps: double click the query > click the gear icon > browse > select the new file > click import > click close and load.
Q: What happens if file structures change in Power Query?
Power Query has solutions for that too. You can update the column names in Power Query or just redo the step to avoid error.
Excel is not what it used to be.
You need the Excel Proficiency Roadmap now. Includes 6 steps for a successful journey, 3 things to avoid, and weekly Excel tips.
Want to learn Excel?
Our training programs start at $29 and will help you learn Excel quickly.