When was the last time you had a direct implementation for an algorithm?
At work today we needed to filter out malformed email domains from a list of 4.2 Million email ids. e.g gnail.com instead of gmail.com
Looking at the problem, it just occurred to me Oh this can be done using - Levenshtein distance (edit distance)
I decided to use pandas.py as it provided some high level constructs for manipulating csv. Also found this simple package implementing editdistance: editdistance · PyPI
In less than an hour, i hacked a script together using and we processed the 4.2M email ids to get 1342 email domains along with count of email ids using such domains.
When was the last time you had a direct implementation for an algorithm?
— Shubham Sethi (@suave_sethi) August 16, 2022
At work today we needed to filter out malformed email domains from a list of 4.2 Million email ids.
e.g https://t.co/CtHbFCEc0w vs https://t.co/R8Qi3DgZCD, https://t.co/dIpqGlmaOm vs https://t.co/9SJSZ7sdRT