A Day Of Fuzzies

I spent almost the whole day today using the Excel Fuzzy Lookup add-in.

I’m not going to describe how to use it or how it works, because there are guides out there. Instead, I’ll describe my experience.

So first thing, I should have cleaned my data up more before using it. I got a lot of easy matches in the beginning, over a thousand from nine thousand rows. Pretty good going! However, in my final sweep where I cleaned up the remaining input data, I had a lower number of matches (around 400) but it was a higher proportion of the data left (around 80%).

Clean your data people!

On the other hand, I did get to see how the fuzzy matching works, so it’s not such a bad thing I suppose.

The other thing is that you can customise how it runs, but making a table of easy substitutions it can use (such as changing ‘corp’ to ‘corporation’ and a token weight table. I don’t rightly know how the token weights table works, but I didn’t use either of these options today. Maybe I should have…

Oh well, too late now.

Something that I found helpful, was to create some helper columns after the lookup has been performed that follows certain rules for error checking. For example, if there are certain words that absolutely must be in the source data and the match, you can use that to verify the matches are accurate rather than checking by eye.

In the end I did check a large proportion by eye, but it was certainly a much smaller amount of checking than I would have had to do myself otherwise.

Tell me what you think

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: