⌘ Using Rsync and Keyboard Maestro to Merge and Purge Thousands of Redundant Files

A perfect example of combining the power of the Shell and Keyboard Maestro.

Quite some time ago I removed my individual external drives from their cases and combined them into several multi-unit drive enclosures. This was very convenient, but led to the issue of many, many duplicates across multiple drives. Consolidating all these files has turned out to be more of a chore than I expected.

Merging folders is either a foreign concept or a dirty word. Apps have no idea how to handle it. Say I am merging Drive 2 into Drive 1: if a file already exists on Drive 1, simply skip it. If there is a new file, copy it over, but don’t waste time copying something that already exists in the destination.

At the end, delete all the files off Drive 2 so I can reclaim the space. Don’t leave it 98% full like it was before. For this task, only a handful of files should be copied.

The apps I’ve found can’t handle this scenario correctly. They approach it from a remove-the-duplicates standpoint: spend hours scanning the drives, then have the user spend hours selecting all the files that are dupes. A total waste of time, since I already know 98% of the files are duplicates.

Other apps just copy/move everything. Again, a waste of time. Why copy the file again? There are thousands of files and this will take hours, possibly days to complete.

Turns out the answer is built right into the OS. Rsync handles this and makes for a better, faster, simpler approach.

Combine this with Keyboard Maestro to select the source and destination folders, and I now have a full-fledged sync tool with a UI.

Using this approach, I was able to consolidate 4 drives and 12TB of data in around an hour.

In my case, the rsync command turned out to be:

rsync -arvt "$KMVAR_instance_source_location" "$KMVAR_instance_target_location" --remove-source-files

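This tells rsync to run in archive mode (-a), recursively (-r), verbosely (-v), and to retain timestamps (-t). Archive mode already implies -r and -t, so spelling them out is harmless but redundant.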

The --remove-source-files option removes files (not folders) from the source drive once they have been successfully duplicated on the destination.
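
Because this command deletes from the source, it may be worth previewing a run first. The following is a minimal sketch, not the macro itself: merge_preview and merge_and_purge are hypothetical helper names, and the two arguments stand in for the Keyboard Maestro source and target variables. It relies only on rsync's standard --dry-run option.

```shell
#!/bin/sh
# Sketch only: merge_preview and merge_and_purge are hypothetical helper
# names. $1 is the source folder, $2 is the target folder.

# List what rsync would copy and remove, without touching either drive.
merge_preview() {
    rsync -av --dry-run --remove-source-files "$1" "$2"
}

# Do the real merge: copy anything missing or changed on the target,
# then remove each source file that was successfully duplicated.
merge_and_purge() {
    rsync -av --remove-source-files "$1" "$2"
}
```

One detail to watch when calling either helper: rsync treats a source path ending in / as "the contents of this folder" and a path without it as "this folder itself", so merge_preview "/path/to/src/" "/path/to/dst/" merges the contents of src directly into dst.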

That’s it. That’s the core of a $30 app.

This is what I ended up putting together.

  • I used Keyboard Maestro to prompt for the source and destination locations. This is easier than trying to type it in by hand each time I want to select folders. A UI is faster and more accurate.
  • For sanity, there is a quick check to make sure I haven’t chosen the same location twice.
  • Before kicking off the command, there is a confirmation window showing the source and target, with a chance to back out if something is wrong.
  • If everything looks good, the process kicks off, compares thousands of files, copies any differences, then removes the source files.
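
The sanity check in the list above could also be done on the shell side. This is a rough sketch under assumptions: check_locations and resolve_dir are hypothetical helpers, not part of the macro, and the check here compares resolved physical paths so that a trailing slash or a symlink does not hide a duplicate selection.

```shell
#!/bin/sh
# Sketch only: resolve_dir and check_locations are hypothetical helpers.

# Resolve a folder to its physical path, so "/Volumes/A/x" and
# "/Volumes/A/x/" (or a symlink to it) compare as equal.
resolve_dir() {
    (cd "$1" 2>/dev/null && pwd -P)
}

# Return 0 only when both arguments are existing folders that differ.
check_locations() {
    src=$(resolve_dir "$1") || { echo "source is not a folder: $1" >&2; return 1; }
    dst=$(resolve_dir "$2") || { echo "target is not a folder: $2" >&2; return 1; }
    if [ "$src" = "$dst" ]; then
        echo "source and target are the same folder: $src" >&2
        return 1
    fi
    return 0
}
```

In a shell step this might be called as check_locations "$KMVAR_instance_source_location" "$KMVAR_instance_target_location" || exit 1, so the rsync command never runs on a bad selection.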

The only drawback: --remove-source-files won’t remove empty folders.

That is easily fixed with:

find "$KMVAR_instance_source_location" -mindepth 1 -type d -empty -delete
rmdir "$KMVAR_instance_source_location"

If either command fails or throws an error, a file was left behind and needs to be checked. But again, it’s all built into the OS.
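
To make that leftover check less manual, the cleanup can report what is still there. A minimal sketch, assuming a hypothetical helper named purge_empty_dirs that takes the source folder as its only argument:

```shell
#!/bin/sh
# Sketch only: purge_empty_dirs is a hypothetical helper name.

# Remove empty folders under "$1", then the folder itself.
# If anything is left behind, list it and return non-zero.
purge_empty_dirs() {
    # -delete works depth-first, so nested empty folders collapse upward.
    find "$1" -mindepth 1 -type d -empty -delete
    if rmdir "$1" 2>/dev/null; then
        return 0
    fi
    echo "Left over in $1:" >&2
    find "$1" -mindepth 1 ! -type d >&2
    return 1
}
```

A non-zero return here means rsync skipped or failed on at least one file, and the listing shows exactly which ones need a look.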

Not only did I not find a simple tool to handle this task, the ones that would have wasted hours of my time cost $9.99 to $29.99.

No thanks. For an hour’s worth of work and testing, I can handle it myself.

By making a couple of prompts in Keyboard Maestro and tying them into the Rsync command, I made an incredibly powerful sync tool that I’m still using today. There are two dozen drives to process.

This has saved a huge amount of time and proves how powerful Keyboard Maestro can be.
