A practical application of edit distance

When was the last time you had a direct, practical use for an algorithm?

At work today we needed to filter out malformed email domains from a list of 4.2 million email ids, e.g. gnail.com instead of gmail.com.

Looking at the problem, it occurred to me: oh, this can be done using Levenshtein distance (edit distance).
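For anyone who has not met it before, here is a minimal sketch of Levenshtein distance in plain Python. It is only an illustration of the idea, not the code we ran; the function name levenshtein is my own choice for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop

    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


print(levenshtein("gnail.com", "gmail.com"))  # 1
print(levenshtein("kitten", "sitting"))       # 3
```

A typo like gnail.com is a single substitution away from gmail.com, which is why a small distance to a well-known domain is a good hint that the domain is malformed.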

I decided to use pandas, as it provides high-level constructs for manipulating CSV data. I also found a simple package implementing edit distance: editdistance, available on PyPI.

In less than an hour, I hacked a script together using these, and we processed the 4.2M email ids to get 1342 malformed email domains, along with the count of email ids using each of them.
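A rough sketch of what such a script might look like is below. It is not the original script. To keep it dependency-free it uses collections.Counter in place of pandas and a small pure-Python distance function in place of the editdistance package. The list of known domains, the threshold of 2, and all the names in it are assumptions made up for this example.

```python
from collections import Counter

# Hypothetical reference list and threshold, chosen only for illustration.
KNOWN_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com"}
MAX_DISTANCE = 2


def levenshtein(a: str, b: str) -> int:
    """Edit distance using a two-row dynamic programming table."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def suspect_domains(emails, known=KNOWN_DOMAINS, max_distance=MAX_DISTANCE):
    """Map each likely-misspelled domain to (nearest known domain, count)."""
    # Count how many email ids use each domain.
    counts = Counter(
        email.rsplit("@", 1)[-1].strip().lower()
        for email in emails
        if "@" in email
    )
    suspects = {}
    for domain, count in counts.items():
        if domain in known:
            continue  # exact match, nothing wrong with it
        nearest = min(known, key=lambda k: levenshtein(domain, k))
        if levenshtein(domain, nearest) <= max_distance:
            suspects[domain] = (nearest, count)
    return suspects


sample = [
    "a@gmail.com",
    "b@gnail.com",
    "c@gnail.com",
    "d@yaho.com",
    "e@mycompany.example",
]
print(suspect_domains(sample))
```

Computing the distance once per unique domain rather than once per email id is what keeps this cheap: millions of ids usually collapse to a much smaller set of distinct domains.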