The comm command in Linux is a text utility that is used to compare two lexically sorted files line by line.
The primary purpose is to categorize lines into three columns: lines unique to the first file, lines unique to the second file, and lines common to both files.
In this guide, we learn about the comm command in Linux with examples.
Syntax
The syntax for the comm command is as follows:
comm [options] file1 file2
Options
Here are some common options you can use with the comm command.
- -1: Suppress column 1 (lines unique to file1).
- -2: Suppress column 2 (lines unique to file2).
- -3: Suppress column 3 (lines that appear in both files).
- --check-order: Check that the input is correctly sorted, even if all input lines are pairable.
- --nocheck-order: Do not check that the input is correctly sorted.
- --output-delimiter=STR: Separate columns with STR.
- --total: Output a summary line with the number of lines in each column.
- -z, --zero-terminated: Line delimiter is NUL, not newline.
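To see a couple of these options in action, here is a minimal sketch. It assumes GNU coreutils (the long options are GNU extensions) and uses a throwaway directory with hypothetical sample files; the `|` delimiter is an arbitrary choice for illustration.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Pin the collation order so sorting and comparing agree.
export LC_ALL=C

# Hypothetical sample data, already sorted.
printf '%s\n' Apple Basketball Football Volleyball > file1.txt
printf '%s\n' Apple Banana Cherry > file2.txt

# Replace the default tab separator with a visible character.
piped=$(comm --output-delimiter='|' file1.txt file2.txt)
printf '%s\n' "$piped"

# Suppress all three columns and print only the summary line:
# unique-to-file1, unique-to-file2, common, then the word "total".
summary=$(comm -123 --total file1.txt file2.txt)
printf '%s\n' "$summary"
```

With this sample data the first command prints `||Apple`, `|Banana`, `Basketball`, `|Cherry`, `Football`, `Volleyball`, and the summary line reads `3`, `2`, `1`, `total` separated by tabs.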
Basic Comparison
Let's take three text files. Here file1.txt and file2.txt are already sorted, while file3.txt is not.
$ cat file1.txt
Apple
Basketball
Football
Volleyball
$ cat file2.txt
Apple
Banana
Cherry
$ cat file3.txt
Apple
Science
Math
English
Lines that appear in only one of the files are unique lines, while lines that appear in both files are common lines.
Let's compare the two sorted files, file1.txt and file2.txt:
$ comm file1.txt file2.txt
Output:
		Apple
	Banana
Basketball
	Cherry
Football
Volleyball
Here you can see the output is arranged in three columns: lines with no leading tab are present only in file1.txt, lines preceded by a single tab are present only in file2.txt, and lines preceded by two tabs are common to both file1.txt and file2.txt.
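Because tabs are hard to see, it can help to turn the indentation into explicit labels. The following is only a sketch: it counts tab-separated fields with awk, and it assumes that no line is empty or contains a tab of its own. The directory, sample files, and label text are made up for illustration.

```shell
# Work in a throwaway directory with hypothetical sample data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
export LC_ALL=C

printf '%s\n' Apple Basketball Football Volleyball > file1.txt
printf '%s\n' Apple Banana Cherry > file2.txt

# One field means no leading tab (column 1), two fields mean one
# leading tab (column 2), three fields mean two leading tabs (column 3).
labeled=$(comm file1.txt file2.txt | awk -F '\t' '
    NF == 1 { print "only in file1: " $1 }
    NF == 2 { print "only in file2: " $2 }
    NF == 3 { print "in both: " $3 }
')
printf '%s\n' "$labeled"
```

For the sample files this prints `in both: Apple` first, followed by `only in file2: Banana`, `only in file1: Basketball`, and so on in sorted order.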
Now let's check what happens when we compare two unsorted files:
$ comm file1.txt file3.txt
		Apple
Basketball
Football
	Science
comm: file 2 is not in sorted order
	Math
	English
Volleyball
comm: input is not in sorted order
In the above output, you can see that comm is unable to produce proper output and reports "input is not in sorted order". This means comm requires its input to be sorted before it is passed in.
Note: comm requires lexicographic sorting, not numeric sorting, i.e. use plain sort without the -n option. Both sort and comm follow the current locale's collation order, so run them under the same locale.
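If you are not sure whether a file is sorted the way comm expects, `sort -c` can check it without changing anything. The sketch below wraps that check in a helper called `is_sorted` (a hypothetical name) and also shows why numerically sorted input fails the check. It pins the locale to C for predictable results; the sample files are made up.

```shell
# Work in a throwaway directory with hypothetical sample data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
export LC_ALL=C

printf '%s\n' Apple Basketball Football Volleyball > file1.txt
printf '%s\n' Apple Science Math English > file3.txt

# The same numbers sorted two ways.
printf '%s\n' 10 9 100 | sort -n > numeric.txt   # 9, 10, 100
printf '%s\n' 10 9 100 | sort    > lexical.txt   # 10, 100, 9

# Succeeds only if the file is already in lexicographic order.
is_sorted() {
    sort -c "$1" 2>/dev/null
}

for f in file1.txt file3.txt numeric.txt lexical.txt; do
    if is_sorted "$f"; then
        echo "$f: sorted"
    else
        echo "$f: NOT sorted, run it through plain sort first"
    fi
done
```

Here file1.txt and lexical.txt pass the check, while file3.txt and numeric.txt do not.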
Approach 1:
$ comm <(sort file1.txt) <(sort file3.txt)
		Apple
Basketball
	English
Football
	Math
	Science
Volleyball
To display the lines without tab indentation, you can strip the tabs with tr. You may also use --output-delimiter, but it's not perfect. Keep in mind that once the tabs are removed, you can no longer tell which column a line came from.
$ comm <(sort file1.txt) <(sort file3.txt) | tr -d '\t'
Apple
Basketball
English
Football
Math
Science
Volleyball
Here, <( ) is process substitution, a shell feature (available in Bash, Zsh, and ksh) that lets you use the output of a command as if it were a file. In this case, it is used to sort the files before passing them to comm.
What does the output show? Lines unique to file1.txt, lines unique to file3.txt, and lines common to both files.
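Process substitution is not available in a plain POSIX sh. When only one of the files needs sorting, a possible alternative is to pipe the sorted data into comm and pass `-` in place of that file name, which comm reads as standard input. This is a sketch with hypothetical sample files in a throwaway directory.

```shell
# Work in a throwaway directory with hypothetical sample data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
export LC_ALL=C

printf '%s\n' Apple Basketball Football Volleyball > file1.txt
printf '%s\n' Apple Science Math English > file3.txt

# file1.txt is already sorted; file3.txt is sorted on the fly and
# handed to comm on standard input via the "-" operand.
result=$(sort file3.txt | comm file1.txt -)
printf '%s\n' "$result"
```

Only one of the two operands can be `-`, so if both files are unsorted you still need process substitution or intermediate files.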
Approach 2:
$ sort -o file2.txt file2.txt
$ sort -o file3.txt file3.txt
$ comm file2.txt file3.txt
First, use sort -o to overwrite each file with its sorted contents. Then compare the two files using the comm command.
The output is indented with the default tab separators.
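If you would rather keep the original files untouched, one option is to write sorted copies and compare those instead. A minimal sketch, assuming a throwaway directory and made-up sample data; the `.sorted` suffix is just an illustrative naming choice.

```shell
# Work in a throwaway directory with hypothetical sample data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
export LC_ALL=C

printf '%s\n' Apple Banana Cherry > file2.txt
printf '%s\n' Apple Science Math English > file3.txt

# Write sorted copies next to the originals instead of overwriting them.
sort file2.txt > file2.sorted
sort file3.txt > file3.sorted

result=$(comm file2.sorted file3.sorted)
printf '%s\n' "$result"
```

The trade-off is extra files to clean up afterwards, in exchange for leaving the original line order of file3.txt intact.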
Suppressing Columns
By using the -1, -2, and -3 options, alone or in combination, you can suppress columns.
By using ‘-1’: Suppress the lines unique to file1.txt
$ comm -1 file1.txt file2.txt | tr -d '\t'
Apple
Banana
Cherry
The output shows the lines that are common to both file1.txt and file2.txt and unique lines in file2.txt.
By using ‘-2’: Suppress the lines unique to file2.txt
$ comm -2 file1.txt file2.txt | tr -d '\t'
Apple
Basketball
Football
Volleyball
The output shows the lines that are common to both file1.txt and file2.txt and unique lines in file1.txt.
By using ‘-3’: Suppress the lines that are common in both file1.txt and file2.txt.
$ comm -3 file1.txt file2.txt | tr -d '\t'
Banana
Basketball
Cherry
Football
Volleyball
The output shows all lines from file1.txt and file2.txt except the common lines.
By using ‘-123’: Suppress all columns (no columns displayed)
$ comm -123 file1.txt file2.txt
As expected, the output is blank.
By using ‘-12’: Show only the lines that are common to both files.
$ comm -12 file1.txt file2.txt | tr -d '\t'
Apple
By using ‘-13’: Suppress the lines unique to the first file and the lines common to both files, and display only the lines unique to the second file (file2.txt).
$ comm -13 file1.txt file2.txt | tr -d '\t'
Banana
Cherry
By using ‘-23’: This is the opposite of -13. Suppress the lines unique to the second file and the lines common to both files, and display only the lines unique to the first file (file1.txt).
$ comm -23 file1.txt file2.txt | tr -d '\t'
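The column combinations above map neatly onto set operations. As a rough sketch, they can be wrapped in small shell functions; the function names here are hypothetical, and each one expects two files that are already sorted.

```shell
# Work in a throwaway directory with hypothetical sample data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
export LC_ALL=C

printf '%s\n' Apple Basketball Football Volleyball > file1.txt
printf '%s\n' Apple Banana Cherry > file2.txt

# Intersection: lines present in both files.
lines_in_both() { comm -12 "$1" "$2"; }

# Difference: lines present only in the first file.
lines_only_in_first() { comm -23 "$1" "$2"; }

# Difference the other way: lines present only in the second file.
lines_only_in_second() { comm -13 "$1" "$2"; }

# Symmetric difference: two columns remain, so strip the tab indent.
lines_in_one_but_not_both() { comm -3 "$1" "$2" | tr -d '\t'; }

lines_in_both file1.txt file2.txt
lines_only_in_first file1.txt file2.txt
lines_only_in_second file1.txt file2.txt
lines_in_one_but_not_both file1.txt file2.txt
```

When two columns are suppressed, the remaining column is printed without any leading tabs, which is why only the -3 case needs tr here.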
A few more examples:
Suppose you have two unsorted files and want the lines unique to the first file redirected to a new file. Example:
$ comm -23 <(sort file2.txt) <(sort file3.txt) > file4.txt
Results:
$ cat file4.txt
Banana
Cherry
Another use case is when you want to list all packages currently installed on a Debian-based system that depend on python3:
$ comm -12 <(dpkg -l | awk '{print $2}' | sort) <(apt-cache rdepends python3 | awk '{print $1}' | sort)
comm doesn't have a direct option to ignore case, but you can convert the text in both input files to lowercase or uppercase using the tr command before comparing.
Example:
$ comm <(tr '[:upper:]' '[:lower:]' < file1.txt | sort) <(tr '[:upper:]' '[:lower:]' < file2.txt | sort)
By converting the text to lowercase in this way, you make the comparison case-insensitive. Note that the output is shown in lowercase as well.
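To make the effect of the lowercase conversion concrete, here is a small sketch that compares the same two files with and without it. The file names, their contents, and the `lower_sorted` helper are all hypothetical, and intermediate files are used instead of process substitution so it also runs in a plain sh.

```shell
# Work in a throwaway directory with hypothetical sample data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
export LC_ALL=C

printf '%s\n' Apple banana > a.txt
printf '%s\n' apple Banana Cherry > b.txt

# Case-sensitive comparison: sort as-is, then list common lines.
sort a.txt > a.sorted
sort b.txt > b.sorted
strict=$(comm -12 a.sorted b.sorted)

# Case-insensitive comparison: lowercase first, then sort.
lower_sorted() { tr '[:upper:]' '[:lower:]' < "$1" | sort; }
lower_sorted a.txt > a.norm
lower_sorted b.txt > b.norm
loose=$(comm -12 a.norm b.norm)

echo "common lines, case-sensitive:   ${strict:-(none)}"
echo "common lines, case-insensitive:"
printf '%s\n' "$loose"
```

The case-sensitive run finds no common lines, while the lowercased run reports apple and banana. The conversion has to happen before sorting, because changing case can change the sort order.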