Linux provides all sorts of tools for data analysis and automation, and it can also help with an issue that we all struggle with from time to time: spelling! Whether you’re grappling with the spelling of a single word while you’re writing your weekly report, or you want a set of computerized “eyes” to find any typos before you submit a business proposal, maybe it’s time to check out how Linux can help.
Using aspell
Probably the most widely used and most capable spell checker on Linux systems is aspell. It can quickly run through a long and complicated text file and help you fix every typo it detects.
Here’s a simple example of how to use it. It starts by using the fortune command to create a simple text file and then runs aspell on it. The lack of output means that aspell found no typos, so don’t start thinking that it’s ignoring you!
$ fortune > myluck; aspell check myluck
$
The following example shows what you should expect to see if the file you’re checking does include typos. First, I’ll create a small file that contains one typo.
$ echo "Let's dispose of typose!" > goal
The following command then checks the file for typos.
$ aspell check goal
When a file includes a misspelling, the aspell response will include a number of potential replacements for misspelled words. It will look something like this:
Let's dispose of typose!

1) typos                                6) Topsy
2) types                                7) tapes
3) typo's                               8) pose
4) depose                               9) type
5) tops                                 0) type's
i) Ignore                               I) Ignore all
r) Replace                              R) Replace all
a) Add                                  l) Add Lower
b) Abort                                x) Exit
The entries numbered 1 through 0 in the top five lines of the menu above are the potential replacements for the misspelled word. Note that the first entry in this case is the closest match. Press the number of the replacement you want and aspell will do the rest. To replace the word with something else, press “r” (Replace) and you’ll be prompted for the replacement.
You could also choose to ignore this misspelling and move on to the next by pressing “i” (Ignore), or ignore every occurrence of the word with “I” (Ignore all). To replace all instances of a particular misspelling (one that appears multiple times in the file), choose “R” (Replace all) and you’ll be prompted for the replacement. The options in the bottom four lines of the menu let you select what you want to do.
If a file contains multiple typos, aspell will manage them one at a time. Decide what should be done with the first and it will move to the second, etc.
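The aspell tool gets its vocabulary from its own compiled dictionaries (installed with language packages such as aspell-en) rather than from the system words file (/usr/share/dict/words on my system). Even if a file contains my first name “Sandra”, it’s accepted as OK. The English dictionary includes a good collection of first names.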
The aspell command also saves a backup (goal.bak in the example above) if changes are made to the file in case you need to revert to the original file for any reason.
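When the interactive menu isn’t practical, say inside a script, aspell also has a list mode that simply prints the misspelled words it finds on standard input. The sketch below wraps it in a small helper. The function name list_typos is made up for illustration, and the --lang=en option assumes that an English aspell dictionary is installed.

```shell
# list_typos (hypothetical helper name): print each distinct misspelled
# word found in the file named by $1, one per line.
# Assumption: an English aspell dictionary is installed (--lang=en).
list_typos() {
    aspell --lang=en list < "$1" | sort -u
}

# Example use: list_typos goal
```

Because nothing is changed in list mode, no backup file is written. It only reports what it finds.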
You can also use aspell to check the spelling of individual words. Type “aspell -a” as shown below, then type a word or two and press Enter to see the results. If aspell responds with an asterisk (*), the word was spelled correctly. Otherwise, it responds with a line starting with & that gives the misspelled word, the number of suggestions, the word’s offset in the line you typed and then the suggested replacements. Replacements for the misspelling “quagmyre” are shown below following the misspelling. The word “existential” was properly spelled, so it’s followed by an asterisk.
$ aspell -a
@(#) International Ispell Version 3.1.20 (but really Aspell 0.60.8)
quagmyre
& quagmyre 3 0: quagmire, quagmires, quagmire's
existential
*
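The pipe-mode output is meant to be read by other programs, so it’s easy to post-process. As a rough sketch, the helper below (summarize_pipe_output is a made-up name) reads that output on standard input and prints one line per misspelled word. It relies on the ispell-style format in which a line starting with & carries the word and its suggestions. The # prefix for a word with no suggestions comes from the same format, though it doesn’t appear in the example above.

```shell
# summarize_pipe_output (hypothetical helper name): read "aspell -a"
# output on stdin and print "word -> suggestions" for each misspelling.
summarize_pipe_output() {
    awk '
        /^& / {
            # Format: & WORD COUNT OFFSET: SUGGESTION, SUGGESTION, ...
            word = $2
            print word " -> " substr($0, index($0, ": ") + 2)
            next
        }
        /^# / {
            # Format: # WORD OFFSET (misspelled, no suggestions)
            print $2 " -> (no suggestions)"
        }
    '
}

# Example use: echo "quagmyre existential" | aspell -a | summarize_pipe_output
```

Correctly spelled words (the asterisk lines) and the version banner match neither pattern, so they are silently skipped.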
The aspell tool is a fast and easy-to-use spell checker. It makes replacing most typos very easy by offering close matches from its dictionary.
Using enchant-2
The enchant-2 command is another tool that you can use to find the typos in your files.
With the -l (list misspellings only) option, this tool displays a list of the typos it has detected. Here’s an example, run against a different version of the goal file that contains several typos:
$ enchant-2 -l goal
dispoze
typoz
enchant-2
typoz
To include line numbers, add a -L option like this:
$ enchant-2 -lL goal
3 dispoze
3 typoz
3 enchant-2
4 typoz
If you expect duplicated typos, you can use a command like the one below to tell you how many times each typo appears in the file. Note that we’re sorting the list of typos and then using uniq -c to count each group.
$ enchant-2 -l goal | sort | uniq -c
      1 dispoze
      1 enchant-2
      2 typoz
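When a file has many typos, it can help to see the most frequent ones first. One way to do that, sketched here with a made-up helper name (rank_typos), is to add a reverse numeric sort on the counts. The helper reads one word per line on standard input, so it should work with the output of enchant-2 -l or any similar list.

```shell
# rank_typos (hypothetical helper name): read one word per line on stdin
# and print each distinct word with its count, most frequent first.
rank_typos() {
    sort | uniq -c | sort -rn
}

# Example use: enchant-2 -l goal | rank_typos
```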
To list the options available, just type “enchant-2” with no arguments like this:
$ enchant-2
Usage: enchant-2 -a|-l|-h|-v [-L] [-d DICTIONARY] [FILE]
  -d DICTIONARY  use the given dictionary
  -p FILE        use the given personal word list
  -a             list suggestions in ispell pipe mode format
  -l             list only the misspellings
  -L             display line numbers
  -h             display help and exit
  -v             display version information and exit
Note that you can use a personal word list (-p) or an alternate dictionary (-d).
Using look
The look command won’t find typos in your files, but it will help you avoid spelling errors by matching the string you provide against lines from the words file. In the example below, the output is passed to the column command to make it a little easier to read.
$ look quag | column
quag            quaggier        quaggle         quagmired       quagmiriest
quagga          quaggiest       quaggy          quagmires       quagmiry
quaggas         quagginess      quagmire        quagmirier      quags
One serious limitation is that look only grabs lines that start with the letters you provide. The grep command is not as limited.
Using grep
The grep command can match strings anywhere in words. In the example below, the $ ensures that the grep command only selects words that end in “look”.
$ grep 'look$' /usr/share/dict/words | column
belook          inlook          look            onlook          overlook        side-look       unlook
flook           klook-klook     off-look        Outlook         plook           skylook         uplook
forelook        landlook        offlook         outlook         relook          underlook
It would have found 242 words had I not used the $ to ensure that the command above only looked for words ending in “look”.
$ grep look /usr/share/dict/words | wc -l
242
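The same approach extends to other positions in a word: ^ anchors a pattern to the start of a line (roughly what look does) and $ anchors it to the end. Here’s a small sketch that wraps the three cases in one function. The name match_words and its arguments are invented for illustration, and the default words file path is the one used above, so adjust it if yours lives elsewhere.

```shell
# match_words (hypothetical helper name):
#   match_words start|end|any STRING [WORDFILE]
# Prints the lines of WORDFILE that start with, end with, or contain STRING.
# STRING is treated as a regular expression, so it matches literally only
# if it contains no regex metacharacters.
match_words() {
    mode=$1
    string=$2
    file=${3:-/usr/share/dict/words}   # default path is an assumption
    case $mode in
        start) grep -- "^$string" "$file" ;;
        end)   grep -- "$string\$" "$file" ;;
        any)   grep -- "$string" "$file" ;;
        *)     echo "usage: match_words start|end|any STRING [WORDFILE]" >&2
               return 2 ;;
    esac
}

# Example use: match_words end look | column
```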
Wrap-up
Linux provides some excellent commands to help you avoid spelling errors. Some of them may need to be installed on your system first.
Source: Network World