7 ways to compare text files on Linux

There are numerous ways to compare text files on a Linux system from the command line. This post describes seven commands that can help you do this and explains how to interpret the results.

To help you follow the commands below, here is the content of the files used in this post. Two of them, file1 and file2, contain these lines:

Kids, the seven basic food
groups are gum, puff pastry,
pizza, pesticides, antibiotics,
and milk duds!!

The third file, file3, contains these lines:

Kids, the six basic food
groups are fruits, vegetables,
grains, protein and dairy
and milk duds!!
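
To follow along, the three files can be recreated with a few commands. This is a minimal sketch that assumes a POSIX shell; the scratch directory created with mktemp is an assumption added here so that no existing files are overwritten.

```shell
# Work in a scratch directory (an assumption) so existing files are not overwritten.
cd "$(mktemp -d)"

# file1 and file2 have identical content.
printf '%s\n' 'Kids, the seven basic food' \
    'groups are gum, puff pastry,' \
    'pizza, pesticides, antibiotics,' \
    'and milk duds!!' > file1
cp file1 file2

# file3 differs from the others in its first three lines.
printf '%s\n' 'Kids, the six basic food' \
    'groups are fruits, vegetables,' \
    'grains, protein and dairy' \
    'and milk duds!!' > file3
```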

Using the diff command

The diff command is one of the easiest ways to see the differences between two text files. In the first command below, we’re comparing two files that happen to be identical, so there is no output. In the second command, we compare two files with different content. The < and > symbols mark lines from the first and second files respectively, and the three dashes separate the two sets of differing lines. The header line above them gives the line ranges that changed in each file. Any identical content in these files is omitted, as was the “and milk duds!!” line.

$ diff file1 file2

$ diff file1 file3
1,3c1,3
< Kids, the seven basic food
< groups are gum, puff pastry,
< pizza, pesticides, antibiotics,
---
> Kids, the six basic food
> groups are fruits, vegetables,
> grains, protein and dairy
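
In a script, you often only need to know whether two files differ, not how. The sketch below relies on diff's exit status (0 for identical files, 1 for differences, 2 for trouble) and on the -q option, which prints a one-line summary instead of the differences. The scratch directory and the echo messages are assumptions made for illustration.

```shell
# Recreate the sample files in a scratch directory (an assumption).
cd "$(mktemp -d)"
printf '%s\n' 'Kids, the seven basic food' 'groups are gum, puff pastry,' \
    'pizza, pesticides, antibiotics,' 'and milk duds!!' > file1
cp file1 file2
printf '%s\n' 'Kids, the six basic food' 'groups are fruits, vegetables,' \
    'grains, protein and dairy' 'and milk duds!!' > file3

# -q reports only whether the files differ; identical files produce no output.
diff -q file1 file2 && echo "file1 and file2 are identical"

# Branch on the exit status: 0 means identical, 1 means different.
if diff -q file1 file3 > /dev/null; then
    echo "file1 and file3 are identical"
else
    echo "file1 and file3 differ"
fi
```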

Using the comm command

The comm command will compare text files, but it has one special requirement. It expects the file content to be in sorted order. In the example below, you will see some complaints when non-sorted file content is used.

$ comm file1 file3
        Kids, the six basic food
comm: file 2 is not in sorted order
        groups are fruits, vegetables,
        grains, protein and dairy
Kids, the seven basic food
comm: file 1 is not in sorted order
groups are gum, puff pastry,
pizza, pesticides, antibiotics,
and milk duds!!
comm: input is not in sorted order

If the files are sorted first, the output is displayed in three columns, as shown below. Lines found only in the first file appear in the first column, lines found only in the second file appear in the middle column, and lines that exist in both files appear in the final column.

$ sort file1 > f1; sort file3 > f3
$ comm f1 f3
                and milk duds!!
        grains, protein and dairy
        groups are fruits, vegetables,
groups are gum, puff pastry,
Kids, the seven basic food
        Kids, the six basic food
pizza, pesticides, antibiotics,
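
The comm command can also suppress any of its three columns, which is handy when you want only the shared lines or only the lines unique to one file. The sketch below sets LC_ALL=C for both sort and comm so that the two commands agree on the sort order; that locale choice and the scratch directory are assumptions, and the C locale sorts uppercase before lowercase, so the line order differs from the listing above.

```shell
# Recreate the sample files in a scratch directory (an assumption).
cd "$(mktemp -d)"
printf '%s\n' 'Kids, the seven basic food' 'groups are gum, puff pastry,' \
    'pizza, pesticides, antibiotics,' 'and milk duds!!' > file1
printf '%s\n' 'Kids, the six basic food' 'groups are fruits, vegetables,' \
    'grains, protein and dairy' 'and milk duds!!' > file3

# Sort and compare under the same locale so comm accepts the sort order.
LC_ALL=C sort file1 > f1
LC_ALL=C sort file3 > f3

# -12 suppresses columns 1 and 2, leaving only the lines common to both files.
LC_ALL=C comm -12 f1 f3

# -23 suppresses columns 2 and 3, leaving the lines found only in the first file.
LC_ALL=C comm -23 f1 f3
```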

Using the cmp command

The cmp command with no additional options confirms that files are different and points out the position of the first difference. In this case, that’s the 12th character in the first line.

$ cmp file1 file3
file1 file3 differ: byte 12, line 1

As with the diff command, there is no output if the files have identical content.

$ cmp file1 file2
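
Because cmp stops at the first difference, it is a quick way to test for equality in a script. The sketch below uses the -s option, which suppresses all output so that only the exit status reports the result. The scratch directory and the echo messages are assumptions made for illustration.

```shell
# Recreate the sample files in a scratch directory (an assumption).
cd "$(mktemp -d)"
printf '%s\n' 'Kids, the seven basic food' 'groups are gum, puff pastry,' \
    'pizza, pesticides, antibiotics,' 'and milk duds!!' > file1
cp file1 file2
printf '%s\n' 'Kids, the six basic food' 'groups are fruits, vegetables,' \
    'grains, protein and dairy' 'and milk duds!!' > file3

# -s prints nothing; an exit status of 0 means the files are identical.
if cmp -s file1 file2; then
    echo "file1 and file2 match"
fi

# A non-zero exit status means the files differ (or could not be read).
if ! cmp -s file1 file3; then
    echo "file1 and file3 differ"
fi
```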

Using the diff3 command

The diff3 command is similar to the diff command, but allows you to compare three files instead of just two. In addition, the formatting of the output is quite different. The ====3 included in the output below means that the third file is different. It then shows the differences between the first two files and the third.

$ diff3 file1 file2 file3
====3
1:1,3c
2:1,3c
  Kids, the seven basic food
  groups are gum, puff pastry,
  pizza, pesticides, antibiotics,
3:1,3c
  Kids, the six basic food
  groups are fruits, vegetables,
  grains, protein and dairy
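
If you only want to know which file is the odd one out, you can filter the diff3 output down to its ==== header lines. This is a small sketch that pipes the output through grep; the scratch directory is an assumption.

```shell
# Recreate the sample files in a scratch directory (an assumption).
cd "$(mktemp -d)"
printf '%s\n' 'Kids, the seven basic food' 'groups are gum, puff pastry,' \
    'pizza, pesticides, antibiotics,' 'and milk duds!!' > file1
cp file1 file2
printf '%s\n' 'Kids, the six basic food' 'groups are fruits, vegetables,' \
    'grains, protein and dairy' 'and milk duds!!' > file3

# Keep only the header lines; ====3 marks a section in which file3 differs.
diff3 file1 file2 file3 | grep '^===='
```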

Using the sdiff command

The sdiff command compares files side by side. In the first example, you can see that the content of both files is the same. In the second example, a vertical bar marks each pair of lines that differ. Only the last line is the same in both files, so it has no vertical bar.

$ sdiff file1 file2
Kids, the seven basic food             Kids, the seven basic food
groups are gum, puff pastry,           groups are gum, puff pastry,
pizza, pesticides, antibiotics,        pizza, pesticides, antibiotics,
and milk duds!!                        and milk duds!!

$ sdiff file1 file3
Kids, the seven basic food             | Kids, the six basic food
groups are gum, puff pastry,           | groups are fruits, vegetables,
pizza, pesticides, antibiotics,        | grains, protein and dairy
and milk duds!!                          and milk duds!!
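
With longer files, the identical lines can crowd out the ones you care about. The sketch below uses the -s option of sdiff to suppress lines that are the same in both files, then counts the changed lines by counting the vertical bars. The scratch directory is an assumption, and the count only works here because none of the sample lines contain a | character.

```shell
# Recreate the sample files in a scratch directory (an assumption).
cd "$(mktemp -d)"
printf '%s\n' 'Kids, the seven basic food' 'groups are gum, puff pastry,' \
    'pizza, pesticides, antibiotics,' 'and milk duds!!' > file1
printf '%s\n' 'Kids, the six basic food' 'groups are fruits, vegetables,' \
    'grains, protein and dairy' 'and milk duds!!' > file3

# -s hides lines that are the same in both files.
sdiff -s file1 file3 || true   # ignore the exit status of 1 that signals differences

# Count the changed lines by counting the "|" markers between the columns.
sdiff file1 file3 | grep -c '|'
```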

Using the colordiff command

The colordiff command displays the differences between two files like the diff command but adds color when the content is different. The first command below has no output because the files are the same. The second displays the differences between the two files. On your computer screen, the first set of lines would be displayed in red and the second in green.

$ colordiff file1 file2
$ colordiff file1 file3
1,3c1,3
< Kids, the seven basic food
< groups are gum, puff pastry,
< pizza, pesticides, antibiotics,
---
> Kids, the six basic food
> groups are fruits, vegetables,
> grains, protein and dairy
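
The colordiff command is often not installed by default, so a script that uses it may want a fallback. The sketch below defines show_diff, a hypothetical helper that is not part of either tool, which calls colordiff when it is available and plain diff otherwise. The scratch directory is also an assumption.

```shell
# Recreate the sample files in a scratch directory (an assumption).
cd "$(mktemp -d)"
printf '%s\n' 'Kids, the seven basic food' 'groups are gum, puff pastry,' \
    'pizza, pesticides, antibiotics,' 'and milk duds!!' > file1
printf '%s\n' 'Kids, the six basic food' 'groups are fruits, vegetables,' \
    'grains, protein and dairy' 'and milk duds!!' > file3

# Hypothetical helper: prefer colordiff, fall back to plain diff if it is missing.
show_diff() {
    if command -v colordiff > /dev/null 2>&1; then
        colordiff "$@"
    else
        diff "$@"
    fi
}

show_diff file1 file3 || true   # ignore the non-zero status returned when files differ
```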

Using the wdiff command

The wdiff command displays the content of the compared files (one copy) if they are identical. If they are different, it marks the differences in place: text found only in the first file is enclosed in square brackets with minus signs, and text found only in the second file is enclosed in curly braces with plus signs, as in the second example below.

$ wdiff file1 file2
Kids, the seven basic food
groups are gum, puff pastry,
pizza, pesticides, antibiotics,
and milk duds!!
$ wdiff file1 file3
Kids, the [-seven-] {+six+} basic food
groups are [-gum, puff pastry,
pizza, pesticides, antibiotics,-] {+fruits, vegetables,
grains, protein and dairy+}
and milk duds!!

Wrap-up

Don’t take the simple examples in this post to mean that these commands lack additional options. Use a command like wdiff --help to get a listing of a command’s options.

Source: Network World