Linux uniq command Explained

Last updated: August 30, 2022

Files sometimes contain runs of duplicate lines, and spotting those repeated neighboring lines by eye is difficult. In Linux, the uniq command detects repeated adjacent lines, reports or removes the duplicates, and writes the filtered data to a file or to standard output.

uniq command

The uniq command is a Linux command-line utility that identifies adjacent duplicate lines in its input and writes the filtered lines to standard output or to an output file.

Most importantly, the uniq command can detect duplicate lines only if they are adjacent. If the duplicates are scattered through the file, the content needs to be sorted first, for example with the sort command, and the sorted content can then be piped to uniq.
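As a minimal sketch of why sorting matters, the snippet below runs uniq on a small made-up input, first directly and then after sort. The input words and the variable names are only for illustration.

```shell
# Hypothetical input: the two "apple" lines are not adjacent.
input=$(printf '%s\n' apple banana apple cherry)

# uniq alone keeps both "apple" lines because they are not neighbors.
unsorted=$(printf '%s\n' "$input" | uniq)

# Sorting first makes the duplicates adjacent, so uniq can collapse them.
sorted=$(printf '%s\n' "$input" | sort | uniq)

printf 'uniq only:\n%s\n\nsort | uniq:\n%s\n' "$unsorted" "$sorted"
```

The first result still has four lines, while the second contains apple, banana, and cherry once each.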

Syntax

uniq [option] ... [input_file [output_file]]

uniq command Options

Useful options of uniq command:

Option                    Description
-u, --unique              Print only the lines that are not repeated in the input.
-d, --repeated            Print only the duplicated lines, one line per group of repeats.
-c, --count               Prefix each output line with the number of times it occurred.
-D, --all-repeated        Print every line of each duplicated group and omit the unique lines.
-z, --zero-terminated     Use a NUL (zero) byte instead of a newline as the line delimiter.
-f N, --skip-fields=N     Skip the first N fields of each line when comparing lines.
-s N, --skip-chars=N      Skip the first N characters of each line when comparing lines.
-i, --ignore-case         Ignore case when comparing lines. By default, comparisons are case sensitive.
-w N, --check-chars=N     Compare no more than the first N characters of each line. In effect the opposite of -s N, which skips the first N characters.

uniq options with examples

In the following examples, we will be using a text file called sample.txt with the below content.

Remote working is the new Trend.
Remote working is the new Trend.
Remote working is the new Trend.
Remote working is the new Trend.
No mercy
How are you..
How are you..
How are you.
Super Cars are the future.

-c option

The -c option prefixes each output line with the number of times that line occurs consecutively in the input file.

uniq -c sample.txt

As shown in the output, each group of repeated lines is printed once, preceded by the number of lines in the group. Lines that are not repeated get a count of 1.
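To make this reproducible, here is a small sketch that recreates sample.txt in a temporary file and runs uniq -c on it. The temporary path and variable names are only for illustration, and the width of the padding before each count can differ between uniq implementations.

```shell
# Recreate the article's sample.txt in a temporary file (the path is arbitrary).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Remote working is the new Trend.
Remote working is the new Trend.
Remote working is the new Trend.
Remote working is the new Trend.
No mercy
How are you..
How are you..
How are you.
Super Cars are the future.
EOF

# Prefix each group of adjacent identical lines with its size.
counts=$(uniq -c "$sample")
printf '%s\n' "$counts"
# Expected output (leading padding varies between implementations):
#   4 Remote working is the new Trend.
#   1 No mercy
#   2 How are you..
#   1 How are you.
#   1 Super Cars are the future.

rm -f "$sample"
```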

-d option

The -d option prints only the repeated lines, one line per group of repeats. Non-repeated lines are discarded.

uniq -d sample.txt

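As expected, the following non-repeated lines are left out of the output. Note that 'How are you.' with a single dot is a different line from 'How are you..', so it counts as unique. For each repeated group, the command prints one line rather than every duplicate.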

No mercy
How are you.
Super Cars are the future.
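The sketch below shows the other side of this: the lines that uniq -d does print for sample.txt. It recreates the file in a temporary location, and the path and variable names are only for illustration.

```shell
# Recreate the article's sample.txt in a temporary file (the path is arbitrary).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Remote working is the new Trend.
Remote working is the new Trend.
Remote working is the new Trend.
Remote working is the new Trend.
No mercy
How are you..
How are you..
How are you.
Super Cars are the future.
EOF

# Print one representative line for each group of adjacent duplicates.
repeated=$(uniq -d "$sample")
printf '%s\n' "$repeated"
# Expected output:
#   Remote working is the new Trend.
#   How are you..

rm -f "$sample"
```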

-D option

The -D option prints all the duplicated lines from the input file. Unlike the -d option, it does not collapse each repeated group into a single line. Non-repeated lines are omitted as well.

uniq -D sample.txt

As expected, the command prints every line that belongs to a repeated group in the input file.
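A quick way to see the difference from -d is to count the output lines. The sketch below assumes the uniq from GNU coreutils, since -D is not available in every implementation. The temporary path and variable names are only for illustration.

```shell
# Recreate the article's sample.txt in a temporary file (the path is arbitrary).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Remote working is the new Trend.
Remote working is the new Trend.
Remote working is the new Trend.
Remote working is the new Trend.
No mercy
How are you..
How are you..
How are you.
Super Cars are the future.
EOF

# Print every member of each duplicated group (assumes GNU uniq).
all_repeated=$(uniq -D "$sample")
printf '%s\n' "$all_repeated"
# Expected output: the "Remote working is the new Trend." line four times,
# followed by the "How are you.." line twice.

rm -f "$sample"
```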

-u option

The -u option displays only the non-repeated lines in the given text file. Any line that has an identical neighbor is dropped entirely.

uniq -u sample.txt

The command prints only the lines that occur exactly once in a row.
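For reference, this sketch runs uniq -u against a recreated copy of sample.txt. The temporary path and variable names are only for illustration.

```shell
# Recreate the article's sample.txt in a temporary file (the path is arbitrary).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Remote working is the new Trend.
Remote working is the new Trend.
Remote working is the new Trend.
Remote working is the new Trend.
No mercy
How are you..
How are you..
How are you.
Super Cars are the future.
EOF

# Keep only the lines that have no identical neighbor.
unique_only=$(uniq -u "$sample")
printf '%s\n' "$unique_only"
# Expected output:
#   No mercy
#   How are you.
#   Super Cars are the future.

rm -f "$sample"
```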

-f N option

With the -f option, you can ignore a given number of fields at the start of each line. A field is a run of characters delimited by white space.

When the uniq command compares lines, it skips the given number of fields and compares only the rest of each line. It then prints one line for each repeated group, along with all the unique lines. Let's use the following sample1.txt file as the input.

#1 Remote working is the new Trend.
#2 Remote working is the new Trend.
#3 Remote working is the new Trend.
#4 Remote working is the new Trend.
#5 No mercy
#6 How are you..
#7 How are you..
#8 How are you.
#9 Super Cars are the future.

In the input above, the first field of every line is a label of the form #number. So let's skip the first field when the uniq command compares the lines for uniqueness.

uniq -f 1 sample1.txt
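The sketch below recreates sample1.txt in a temporary file and runs the command, so the effect of skipping the first field is visible. The temporary path and variable names are only for illustration.

```shell
# Recreate the article's sample1.txt in a temporary file (the path is arbitrary).
sample1=$(mktemp)
cat > "$sample1" <<'EOF'
#1 Remote working is the new Trend.
#2 Remote working is the new Trend.
#3 Remote working is the new Trend.
#4 Remote working is the new Trend.
#5 No mercy
#6 How are you..
#7 How are you..
#8 How are you.
#9 Super Cars are the future.
EOF

# Ignore the leading "#N" field when comparing adjacent lines.
skipped_field=$(uniq -f 1 "$sample1")
printf '%s\n' "$skipped_field"
# Expected output (the first line of each group is the one that is kept):
#   #1 Remote working is the new Trend.
#   #5 No mercy
#   #6 How are you..
#   #8 How are you.
#   #9 Super Cars are the future.

rm -f "$sample1"
```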

-s N option

This option is similar to the -f option, except that -s skips a given number of characters, rather than fields, from the start of each line when checking for duplicates.

Let's use the sample2.txt file with the following content.

AbEHOW ARE YOU?
*^#HOW ARE YOU?
089HOW ARE YOU?

$#@NO MERCY
pppNO MERCY
111NO MERCY

uniq -s 3 sample2.txt

As expected, the line 'HOW ARE YOU?' is identified as a duplicated line. Because the first 3 characters of each line are skipped, all the 'HOW ARE YOU?' lines compare as identical.

Similarly, the lines containing the 'NO MERCY' phrase are identified as repeated lines.
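As a sketch, the snippet below recreates sample2.txt, including the empty line between the two groups as shown in the listing, and runs uniq -s 3 on it. The temporary path and variable names are only for illustration.

```shell
# Recreate the article's sample2.txt in a temporary file (the path is arbitrary).
sample2=$(mktemp)
cat > "$sample2" <<'EOF'
AbEHOW ARE YOU?
*^#HOW ARE YOU?
089HOW ARE YOU?

$#@NO MERCY
pppNO MERCY
111NO MERCY
EOF

# Skip the first 3 characters of each line before comparing.
skipped_chars=$(uniq -s 3 "$sample2")
printf '%s\n' "$skipped_chars"
# Expected output: "AbEHOW ARE YOU?", then an empty line, then "$#@NO MERCY".

rm -f "$sample2"
```

The first line of each group is the one that survives, and the empty line is kept because it matches neither of its neighbors.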

-w option

The -w option limits the comparison to a given number of characters at the start of each line. The output contains one line for each repeated group, along with the unique lines.

The following input in the sample3.txt file will be used in this example.

$#1This is one line
$#1This is another line

Unique line

$$$New type line
$$$New type line to check

uniq -w 3 sample3.txt

In the above example, the uniq command only considers the first three characters of each line. So, the first two lines are considered duplicate lines. Similarly, the last two lines become identical as well. In addition, the non-repeated lines are printed too.
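The sketch below recreates sample3.txt, with the empty lines as shown in the listing, and runs uniq -w 3 on it. It assumes the uniq from GNU coreutils, since -w is not part of every implementation. The temporary path and variable names are only for illustration.

```shell
# Recreate the article's sample3.txt in a temporary file (the path is arbitrary).
sample3=$(mktemp)
cat > "$sample3" <<'EOF'
$#1This is one line
$#1This is another line

Unique line

$$$New type line
$$$New type line to check
EOF

# Compare only the first 3 characters of each line (assumes GNU uniq).
first_three=$(uniq -w 3 "$sample3")
printf '%s\n' "$first_three"
# Expected output: "$#1This is one line", an empty line, "Unique line",
# another empty line, and "$$$New type line".

rm -f "$sample3"
```

The two empty lines both remain because they are not adjacent to each other.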

-i option

The -i option makes the comparison case insensitive. Adjacent lines that differ only in letter case are treated as duplicates, so only the first line of each such group is printed, along with the unique lines.

We use the sample4.txt file which holds the following lines.

THIS IS CAPS LINE.
this is caps line.
this IS Caps LINE.

unique line.

uniq -i sample4.txt

The line 'THIS IS CAPS LINE.' is repeated in the next two lines when the case is ignored. So, those two repeated lines are removed from the output and only the first one is kept.
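As a sketch, the snippet below recreates sample4.txt, with the empty line as shown in the listing, and runs uniq -i on it. The temporary path and variable names are only for illustration.

```shell
# Recreate the article's sample4.txt in a temporary file (the path is arbitrary).
sample4=$(mktemp)
cat > "$sample4" <<'EOF'
THIS IS CAPS LINE.
this is caps line.
this IS Caps LINE.

unique line.
EOF

# Treat lines that differ only in letter case as duplicates.
ignore_case=$(uniq -i "$sample4")
printf '%s\n' "$ignore_case"
# Expected output: "THIS IS CAPS LINE.", an empty line, and "unique line."

rm -f "$sample4"
```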

-z option

By default, the uniq command treats a newline as the line delimiter for both its input and its output. With the -z option, lines are delimited by a NUL (zero) byte instead, which is useful when a record may itself contain newlines. The following syntax is used.

uniq -z input_file
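A minimal sketch of -z in a pipeline follows. It assumes the uniq from GNU coreutils, and the three records are made up for illustration. The tr call at the end only converts the NUL bytes back to newlines so the result can be displayed.

```shell
# Hypothetical NUL-delimited records: "alpha" appears twice in a row.
# printf reuses the '%s\0' format for each argument, ending each with a NUL byte.
deduped=$(printf '%s\0' alpha alpha beta | uniq -z | tr '\0' '\n')

printf '%s\n' "$deduped"
# Expected output:
#   alpha
#   beta
```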

Conclusion

To conclude, the uniq command in Linux detects matching lines in a given input file and filters them according to your requirements. Several options, such as -c, -u, and -i, control what ends up in the output. As discussed, the uniq command only recognizes matching lines that are adjacent to each other, so unsorted input usually needs to go through sort first. Overall, the uniq command is very useful when dealing with lengthy content that contains many repeated lines.

About The Author

Nimesha Jinarajadasa


Nimesha has been a full-stack software engineer for more than five years. He loves technology, as technology has the power to solve many of our problems within just a minute. He has been contributing to various projects over the last 5+ years and working with almost all of the so-called "03 tiers" (DB, M-Tier, and Client). Recently, he has started working with DevOps technologies such as Azure administration, Kubernetes, Terraform automation, and Bash scripting.
