comm Command in Linux Explained [with Examples]

The comm command in Linux is a text utility that is used to compare two lexically sorted files line by line.

The primary purpose is to categorize lines into three columns: lines unique to the first file, lines unique to the second file, and lines common to both files.

In this guide, we learn how to use the comm command in Linux, with examples.

Syntax

The syntax for the comm command is as follows:

comm [options] file1 file2
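Either file argument (but not both) can be a single "-", which makes comm read that input from standard input. The sketch below is a plain sh script that builds file2.txt in a temporary directory. The words piped in (Kiwi, Apple) are illustrative only, and the C locale is an assumption chosen so that sort and comm agree on the order.

```shell
#!/bin/sh
# Sketch: read one of comm's inputs from standard input by naming it "-".
# Assumptions: plain sh with mktemp; sample files live in a temporary directory.
LC_ALL=C; export LC_ALL     # same collation for sort and comm
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

printf '%s\n' Apple Banana Cherry > file2.txt

# The sorted output of the pipeline becomes comm's first input.
result=$(printf '%s\n' Kiwi Apple | sort | comm - file2.txt)
printf '%s\n' "$result"
```

The piped side can come from any command, which is handy when one side of the comparison is generated on the fly.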

Options

Here are some common options you can use with the comm command:

-1: Suppress column 1 (lines unique to file1).
-2: Suppress column 2 (lines unique to file2).
-3: Suppress column 3 (lines that appear in both files).
--check-order: Check that the input is correctly sorted, even if all input lines are pairable.
--nocheck-order: Do not check that the input is correctly sorted.
--output-delimiter=STR: Separate columns with STR.
--total: Output a summary.
-z, --zero-terminated: Line delimiter is NUL, not newline.
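As a quick illustration of --output-delimiter, the sketch below recreates the file1.txt and file2.txt samples used in the next section and prints a comma wherever comm would normally print a column tab. It assumes GNU comm, since --output-delimiter is a GNU coreutils option.

```shell
#!/bin/sh
# Sketch: replace comm's column tab with a visible separator (GNU comm).
LC_ALL=C; export LC_ALL
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

printf '%s\n' Apple Basketball Football Volleyball > file1.txt
printf '%s\n' Apple Banana Cherry > file2.txt

# Column 2 lines get one "," in front, column 3 lines get two.
result=$(comm --output-delimiter=',' file1.txt file2.txt)
printf '%s\n' "$result"
```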

Basic Comparison

Let's take three text files. Here file1.txt and file2.txt are already sorted, while file3.txt is not.

$ cat file1.txt
Apple
Basketball
Football
Volleyball

$ cat file2.txt
Apple
Banana
Cherry

$ cat file3.txt
Apple
Science
Math
English

Lines that appear in only one of the files are called unique lines, while lines that appear in both files are called common lines.

Let's compare the two sorted files, file1.txt and file2.txt:

$ comm file1.txt file2.txt

Output:

                Apple
        Banana
Basketball
        Cherry
Football
Volleyball

Here you can see that the output has three columns: lines with no leading tab are present only in file1.txt, lines with one leading tab are present only in file2.txt, and lines with two leading tabs are common to both file1.txt and file2.txt.

Now let's see what happens when one of the files (file3.txt) is not sorted:
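Because the column is encoded only as the number of leading tabs, a short awk filter can turn it into an explicit label. This is a sketch that assumes the input lines themselves contain no tab characters; the label wording is made up for illustration.

```shell
#!/bin/sh
# Sketch: label every comm output line with the column it belongs to.
# Assumption: input lines contain no tabs, because comm marks the
# column only by the number of leading tabs.
LC_ALL=C; export LC_ALL
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

printf '%s\n' Apple Basketball Football Volleyball > file1.txt
printf '%s\n' Apple Banana Cherry > file2.txt

# Split on tabs: NF is 1, 2 or 3 for columns 1, 2 and 3, and the
# line itself is always the last field ($NF).
result=$(comm file1.txt file2.txt | awk 'BEGIN { FS = "\t" }
  NF == 1 { print "only in file1.txt: " $NF }
  NF == 2 { print "only in file2.txt: " $NF }
  NF == 3 { print "in both files:     " $NF }')
printf '%s\n' "$result"
```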

$ comm file1.txt file3.txt
                Apple
Basketball
Football
        Science
comm: file 2 is not in sorted order
        Math
        English
Volleyball
comm: input is not in sorted order

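In the output above, comm warns that file 2 is not in sorted order and finishes with "input is not in sorted order". The columns cannot be trusted in this case: Math and English are printed after Science, and a common line that appeared out of order could be reported as unique. comm requires both inputs to be sorted before they are passed to it.

Note: comm expects lexicographic sorting, not numeric sorting, so use plain sort without the -n option. Also sort and compare under the same locale, because comm uses the collating sequence of the current locale (LC_COLLATE).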
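In a script you usually want to detect this case rather than read the warning. A minimal sketch, assuming GNU comm: a default run on unsorted input still prints its output but exits with a non-zero status, and --check-order stops at the first out-of-order line instead. The sample files mirror file1.txt and file3.txt above; errors.log is an arbitrary name.

```shell
#!/bin/sh
# Sketch: detect unsorted input through comm's exit status (GNU comm).
LC_ALL=C; export LC_ALL     # also keeps the error messages in English
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

printf '%s\n' Apple Basketball Football Volleyball > file1.txt
printf '%s\n' Apple Science Math English > file3.txt   # not sorted

# Default mode: comm prints what it can, warns, then exits non-zero.
comm file1.txt file3.txt > /dev/null 2> errors.log
default_status=$?

# --check-order: comm stops at the first out-of-order line.
comm --check-order file1.txt file3.txt > /dev/null 2>&1
strict_status=$?

echo "default run: exit status $default_status"
echo "--check-order run: exit status $strict_status"
cat errors.log
```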
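The -n pitfall is easy to reproduce: sort -n places 9 before 10, but comm compares lines as text, where "10" comes before "9". The sketch below uses two made-up numeric files and GNU comm's --check-order so that the mismatch shows up as a failing exit status.

```shell
#!/bin/sh
# Sketch: numeric sort order is not the order comm expects.
# Assumptions: GNU comm (for --check-order); the numbers are made up.
LC_ALL=C; export LC_ALL
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

printf '%s\n' 9 10 100 > a.txt
printf '%s\n' 10 20 > b.txt

# sort -n gives 9, 10, 100, but as text "10" sorts before "9".
sort -n a.txt > a.num
sort -n b.txt > b.num
comm --check-order a.num b.num > /dev/null 2>&1
numeric_status=$?

# Plain sort gives 10, 100, 9: the text order comm compares in.
sort a.txt > a.lex
sort b.txt > b.lex
result=$(comm --check-order a.lex b.lex)
lex_status=$?

echo "sort -n: exit status $numeric_status"
echo "sort:    exit status $lex_status"
printf '%s\n' "$result"
```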

Approach 1:

$ comm <(sort file1.txt) <(sort file3.txt)
                Apple
Basketball
        English
Football
        Math
        Science
Volleyball

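To display the lines without tab indentation, pipe the output through tr -d '\t'. The --output-delimiter option only replaces the tab between columns with another string, so it cannot remove the indentation cleanly. Keep in mind that removing the tabs also hides which column each line came from.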

$ comm <(sort file1.txt) <(sort file3.txt) | tr -d '\t'
Apple
Basketball
English
Football
Math
Science
Volleyball

Here, <() is process substitution, which lets you use the output of a command as if it were a file. In this case, it sorts each file before passing it to comm. Process substitution is available in bash, zsh and ksh, but not in a plain POSIX sh such as dash.

The output lists the lines unique to file1.txt, the lines unique to file3.txt, and the lines common to both files (after tr, all three columns are merged into a single list).
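Since process substitution is not available in every shell, here is a portable sketch of the same comparison for a plain sh. It writes the sorted copies to temporary files (the .tmp names are arbitrary), or sorts one input to a file and feeds the other through standard input as "-".

```shell
#!/bin/sh
# Sketch: "comm <(sort file1.txt) <(sort file3.txt)" without process
# substitution, for shells such as dash. The *.tmp names are arbitrary.
LC_ALL=C; export LC_ALL
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

printf '%s\n' Apple Basketball Football Volleyball > file1.txt
printf '%s\n' Apple Science Math English > file3.txt

# Option 1: sort both inputs into temporary files.
sort file1.txt > sorted1.tmp
sort file3.txt > sorted3.tmp
result=$(comm sorted1.tmp sorted3.tmp)

# Option 2: sort one input into a file and pipe the other in as "-".
alt=$(sort file3.txt | comm sorted1.tmp -)

printf '%s\n' "$result"
```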

Approach 2:

$ sort -o file2.txt file2.txt
$ sort -o file3.txt file3.txt
$ comm file2.txt file3.txt

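First, use sort -o to overwrite each file with its sorted contents. Unlike sort file.txt > file.txt, which empties the file before sort reads it, sort -o can safely write to one of its own input files. Then compare the two files with comm.

The output uses the default tab-indented columns.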
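A self-contained sketch of Approach 2, recreating file2.txt and file3.txt in a temporary directory so the originals are not touched. It assumes a plain sh and the C locale.

```shell
#!/bin/sh
# Sketch of Approach 2 on throwaway copies in a temporary directory.
LC_ALL=C; export LC_ALL
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

printf '%s\n' Apple Banana Cherry > file2.txt
printf '%s\n' Apple Science Math English > file3.txt

# sort -o may name its own input; "sort file3.txt > file3.txt" would
# empty the file before sort could read it.
sort -o file2.txt file2.txt
sort -o file3.txt file3.txt

result=$(comm file2.txt file3.txt)
printf '%s\n' "$result"
```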

Suppressing Columns

You can suppress columns with the -1, -2 and -3 options, or with a combination of them such as -12.

By using ‘-1’: Suppress the lines unique to file1.txt

$ comm -1 file1.txt file2.txt | tr -d '\t'

Apple
Banana
Cherry

The output shows the lines that are common to both file1.txt and file2.txt and unique lines in file2.txt.

By using ‘-2’: Suppress the lines unique to file2.txt

$ comm -2 file1.txt file2.txt | tr -d '\t'

Apple
Basketball
Football
Volleyball

The output shows the lines that are common to both file1.txt and file2.txt and unique lines in file1.txt.

By using ‘-3’: Suppress the lines that are common in both file1.txt and file2.txt.

$ comm -3 file1.txt file2.txt | tr -d '\t'

Banana
Basketball
Cherry
Football
Volleyball

The output shows all lines from file1.txt and file2.txt except the common ones.

By using ‘-123’: Suppress all columns (nothing is displayed).

$ comm -123 file1.txt file2.txt

As expected, the output is empty.

By using ‘-12’: Show only the lines that are common to both files.

$ comm -12 file1.txt file2.txt | tr -d '\t'

Apple
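One detail worth knowing about -12: comm pairs lines one to one, so a line repeated in one file is counted as common only as many times as it also appears in the other file. The sketch uses two hypothetical files, a.txt and b.txt, and sort -u to get plain set behaviour when duplicates should be ignored.

```shell
#!/bin/sh
# Sketch: comm matches repeated lines one to one.
# a.txt and b.txt are hypothetical sample files.
LC_ALL=C; export LC_ALL
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

printf '%s\n' Apple Apple Cherry > a.txt    # "Apple" appears twice
printf '%s\n' Apple Banana Cherry > b.txt   # "Apple" appears once

common=$(comm -12 a.txt b.txt)   # one Apple pairs up, plus Cherry
extra=$(comm -23 a.txt b.txt)    # the unpaired second Apple

# sort -u drops duplicates first, giving plain set semantics.
unique_extra=$(sort -u a.txt | comm -23 - b.txt)

printf 'common:\n%s\n' "$common"
printf 'only in a.txt:\n%s\n' "$extra"
printf 'only in a.txt after sort -u: [%s]\n' "$unique_extra"
```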

By using ‘-13’: Suppress the lines unique to the first file and the lines common to both files, displaying only the lines unique to the second file (file2.txt).

$ comm -13 file1.txt file2.txt | tr -d '\t'

Banana
Cherry

By using ‘-23’: This is the opposite of -13. Suppress the lines unique to the second file and the lines common to both files, displaying only the lines unique to the first file (file1.txt).

$ comm -23 file1.txt file2.txt | tr -d '\t'

Basketball
Football
Volleyball
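Combining these options with wc -l gives the size of each group instead of the lines themselves. A sketch with the same sample files; the tr -d ' ' step only removes the padding some wc implementations add.

```shell
#!/bin/sh
# Sketch: count the lines in each comm column instead of listing them.
LC_ALL=C; export LC_ALL
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

printf '%s\n' Apple Basketball Football Volleyball > file1.txt
printf '%s\n' Apple Banana Cherry > file2.txt

# tr -d ' ' strips the padding some wc implementations print.
only1=$(comm -23 file1.txt file2.txt | wc -l | tr -d ' ')
only2=$(comm -13 file1.txt file2.txt | wc -l | tr -d ' ')
both=$(comm -12 file1.txt file2.txt | wc -l | tr -d ' ')

echo "only in file1.txt: $only1"
echo "only in file2.txt: $only2"
echo "in both files:     $both"
```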

A few more examples:

Suppose you have two unsorted files and need to save the lines unique to the first file into a new file:

$ comm -23 <(sort file2.txt) <(sort file3.txt) > file4.txt

Results:

$ cat file4.txt

Banana
Cherry

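Another use case: on a Debian-based system, list the installed packages that depend on python3:

$ comm -12 <(dpkg -l | awk '/^ii/ {sub(/:.*/, "", $2); print $2}' | sort -u) <(apt-cache rdepends python3 | awk 'NR > 2 {sub(/^\|/, "", $1); print $1}' | sort -u)

The first awk keeps only installed packages (status ii) and strips any :arch suffix from the name; the second skips the two header lines printed by apt-cache rdepends and removes the | marker it places before alternative dependencies.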

GNU comm has no option to ignore case (BSD comm provides -i), but you can convert the text in both input files to lowercase or uppercase with the tr command before sorting.

Example:

$ comm <(tr '[:upper:]' '[:lower:]' < file1.txt | sort) <(tr '[:upper:]' '[:lower:]' < file2.txt | sort)

Converting both inputs to lowercase makes the comparison case-insensitive. Note that the output is printed in lowercase as well.
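A self-contained sketch of the lowercase approach, using two hypothetical files with mixed-case entries. It writes temporary files instead of using process substitution, so it also runs in a plain sh; the case-sensitive run is included for contrast.

```shell
#!/bin/sh
# Sketch: case-insensitive comparison by lowercasing both inputs.
# a.txt and b.txt are hypothetical sample files.
LC_ALL=C; export LC_ALL
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

printf '%s\n' Apple banana Cherry > a.txt
printf '%s\n' apple Banana date > b.txt

# Case-sensitive: sort as-is, then compare. Nothing matches exactly.
sort a.txt > a.sorted
sort b.txt > b.sorted
sensitive=$(comm -12 a.sorted b.sorted)

# Case-insensitive: lowercase before sorting, as in the example above.
tr '[:upper:]' '[:lower:]' < a.txt | sort > a.lower
tr '[:upper:]' '[:lower:]' < b.txt | sort > b.lower
insensitive=$(comm -12 a.lower b.lower)

printf 'case-sensitive common lines: [%s]\n' "$sensitive"
printf 'case-insensitive common lines:\n%s\n' "$insensitive"
```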

About The Author

Bobbin Zachariah

Bobbin Zachariah is an experienced Linux engineer who has been supporting infrastructure for many companies. He specializes in Shell scripting, AWS Cloud, JavaScript, and Node.js. He holds a Master’s degree in computer science. He holds Red Hat Certified Engineer (RHCE) certification and RedHat Enable Sysadmin.