Pure Python Spell Checking


pyspellchecker is a pure Python spell checker based on Peter Norvig’s blog post on setting up a simple spell checking algorithm.

It uses a Levenshtein Distance algorithm to generate every string within an edit distance of 2 from the original word. It then compares all of these candidates (produced by insertions, deletions, replacements, and transpositions) to known words in a word frequency list. Candidates that appear more often in the frequency list are more likely to be the correct result.
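The steps above can be sketched in a few lines of plain Python. This is a minimal illustration in the spirit of Norvig's post, not pyspellchecker's actual source; the toy WORD_COUNTS table and the helper names edits1, edits2, known, and correction are assumptions chosen for this example.

```python
import string
from collections import Counter

# Toy frequency list (assumed data); a real list holds many thousands of words.
WORD_COUNTS = Counter({"spelling": 50, "spilling": 5, "selling": 20, "the": 1000})


def edits1(word):
    """Return every string exactly one edit away from `word`."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """Return every string reachable by applying two single edits."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


def known(words):
    """Keep only the words that appear in the frequency list."""
    return {w for w in words if w in WORD_COUNTS}


def correction(word):
    """Pick the most frequent known word, preferring smaller edit distances."""
    candidates = (
        known([word]) or known(edits1(word)) or known(edits2(word)) or {word}
    )
    return max(candidates, key=lambda w: WORD_COUNTS[w])
```

With this toy list, correction("speling") returns "spelling", because inserting a single "l" produces a known word before any distance-2 candidates are considered.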

pyspellchecker supports multiple languages including English, Spanish, German, French, and Portuguese. Dictionaries were generated using the WordFrequency project on GitHub.

pyspellchecker supports Python 3 and Python 2.7 but, as always, Python 3 is the preferred version!

pyspellchecker lets you set the Levenshtein Distance used when checking. For longer words, it is highly recommended to use a distance of 1 rather than the default of 2. See the quickstart for how to change the distance parameter.

Installation

The easiest way to install is with pip, by running pip install pyspellchecker.

To install from source, clone the repository, change into its directory, and run python setup.py install.

As always, it is highly recommended to use the Pipenv package to help manage dependencies!

Quickstart

After installation, using pyspellchecker should be fairly straightforward.

If the Word Frequency list is not to your liking, you can add additional text to generate a more appropriate list for your use case.

If the words that you wish to check are long, it is recommended to reduce the distance to 1. This can be accomplished either when initializing the spell check class or after the fact.

Additional Methods

correction(word): Returns the most probable result for the misspelled word

candidates(word): Returns a set of possible candidates for the misspelled word

known([words]): Returns those words that are in the word frequency list

unknown([words]): Returns those words that are not in the frequency list

word_probability(word): The frequency of the given word out of all words in the frequency list