Linux uniq command Explained [with Examples]

Written by: Nimesha Jinarajadasa   |   Last updated: August 30, 2022

The uniq command in Linux filters text by removing duplicate lines from sorted input. It is typically used to filter, count, or display unique lines from a text source. Because uniq only compares adjacent lines, the input should be sorted beforehand for it to work effectively.

Syntax

The basic syntax of the uniq command is as follows:

uniq [OPTIONS] [INPUT_FILE] [OUTPUT_FILE]

Options

Here are some commonly used options for the uniq command:

  • -c or --count: Prefix lines with the count of occurrences.
  • -d or --repeated: Display only duplicate lines.
  • -i or --ignore-case: Ignore differences in case when comparing lines.
  • -f N or --skip-fields=N: Ignore the first N fields (columns) in each line.
  • -s N or --skip-chars=N: Ignore the first N characters in each line.
  • -w N or --check-chars=N: Compare only the first N characters of each line.

The uniq command in its basic form counts or removes only adjacent duplicate lines. If you want to identify or remove non-adjacent duplicate lines, you need to ensure that the lines are sorted first so that identical lines are grouped together.
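A quick sketch of this behavior, using printf to generate throwaway input:

```shell
# Without sorting, uniq only collapses adjacent duplicates,
# so the second "apple" survives:
printf 'apple\nbanana\napple\n' | uniq
# apple
# banana
# apple

# Sorting first groups identical lines, so uniq removes them all:
printf 'apple\nbanana\napple\n' | sort | uniq
# apple
# banana
```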

Examples

The examples below use a file named unique_sample.txt with the following content:

Linux and Unix share similarities.
Linux command-line is powerful.
Linux command-line is powerful.
Linux is an open-source operating system.
Linux is known for its stability.
Linux command-line is powerful.
Debian is another Linux distribution.

Removing Duplicate Lines

sort unique_sample.txt | uniq

This removes duplicate lines from the file and displays only the unique lines.

Here the pipe feeds the sorted output into uniq, so identical lines are grouped before duplicates are removed. You can also run the uniq command directly on a file, but then only adjacent duplicates are collapsed.

Output:

Debian is another Linux distribution.
Linux and Unix share similarities.
Linux command-line is powerful.
Linux is an open-source operating system.
Linux is known for its stability.

If you want to write the unique lines to a new file, use:

sort input.txt | uniq > output.txt
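As the syntax section above shows, uniq also accepts an output file as its second argument, so the redirection can be skipped. A sketch using intermediate file names (sorted.txt here is an assumption, not part of the original example):

```shell
# Sort into an intermediate file, then let uniq write the result itself:
sort input.txt -o sorted.txt
uniq sorted.txt output.txt
```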

Displaying Duplicate Lines

sort unique_sample.txt | uniq -d

This displays only the duplicate lines from a sorted text file.

Output:

Linux command-line is powerful.

The -D option prints all duplicate lines, whereas the -d option prints only one instance of each duplicate line.

sort unique_sample.txt | uniq -D

Output:

Linux command-line is powerful.
Linux command-line is powerful.
Linux command-line is powerful.

If you want lines that differ only in case to be treated as duplicates, use the -i option.
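A brief sketch of -i with made-up input:

```shell
# Without -i, the case differences keep these three lines distinct;
# with -i, they all compare equal and collapse into one group:
printf 'Linux\nlinux\nLINUX\n' | sort | uniq -i -c
```

The count is 3 because all three lines match once case is ignored; which spelling is printed depends on the sort order of your locale.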

Counting Duplicate Lines

sort unique_sample.txt | uniq -c

This provides a count of occurrences for each unique line in the sorted input.

  1 Debian is another Linux distribution.
  1 Linux and Unix share similarities.
  3 Linux command-line is powerful.
  1 Linux is an open-source operating system.
  1 Linux is known for its stability.
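A common follow-up is to sort these counts numerically so the most frequent lines appear first:

```shell
# -c prefixes each line with its count; sort -rn then orders the
# result by that count, highest first, so the line that occurs
# 3 times tops the list:
sort unique_sample.txt | uniq -c | sort -rn
```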

Display Unique Lines

uniq -u unique_sample.txt

The -u option shows lines that are not repeated. Note that the file is not sorted here, so uniq only compares adjacent lines; that is why "Linux command-line is powerful." still appears in the output — its third occurrence is not adjacent to the other two. Sort the input first to list only lines that occur exactly once in the whole file.

Output:

Linux and Unix share similarities.
Linux is an open-source operating system.
Linux is known for its stability.
Linux command-line is powerful.
Debian is another Linux distribution.
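To list only the lines that occur exactly once anywhere in the file, sort first:

```shell
sort unique_sample.txt | uniq -u
# Debian is another Linux distribution.
# Linux and Unix share similarities.
# Linux is an open-source operating system.
# Linux is known for its stability.
```

The triple "Linux command-line is powerful." is now excluded entirely.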

Skip Characters

You can use the --skip-chars=N option to skip a specified number of characters at the beginning of each line when comparing lines for uniqueness.

Assuming you have the file data.txt with the content:

12345Line1
23456Line2
34567Line3
12345Line3

Command:

cat data.txt | uniq --skip-chars=5 -c

Output:

      1 12345Line1
      1 23456Line2
      2 34567Line3

From the output:

  • "12345Line1" and "23456Line2" are unique lines because their content after skipping the first 5 characters differs.
  • "34567Line3" and "12345Line3" share the suffix "Line3" once the first 5 characters are skipped, so uniq -c counts them as one line occurring twice and reports the first of the pair.

Let's look into another example where we have a CSV file with the following content:

2023,Software Engineer
2020,Software Engineer
2021,Data Analyst
2000,Data Analyst
2000,System Administrator

Command:

cat cdata.csv | sort -t, -k2 | uniq --skip-chars=4 -c

Here sort -t, -k2 groups the lines by the second column, and --skip-chars=4 makes uniq ignore the 4-character year when comparing lines (the comma that follows is identical on every line, so it does not affect the comparison).

Output:

  2 2000,Data Analyst
  2 2020,Software Engineer
  1 2000,System Administrator

Note: uniq also supports the -w N option, which limits the comparison to the first N characters of each line.
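For example, comparing only the 4-character year prefix of the CSV data above (a sketch; the exact output spacing may differ):

```shell
# -w 4 compares only the first 4 characters (the year),
# so both "2000,..." lines count as duplicates:
sort cdata.csv | uniq -w 4 -c
#   2 2000,Data Analyst
#   1 2020,Software Engineer
#   1 2021,Data Analyst
#   1 2023,Software Engineer
```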

Ignoring Leading Fields

The -f N or --skip-fields=N option allows you to ignore a specified number of fields (columns) at the beginning of each line when comparing lines for uniqueness.

For example, you have the following data in data.csv file:

Year,Designation,Country
2023,Software Engineer,USA
2020,Software Engineer,USA
2021,Data Analyst,Australia
2000,Data Analyst,Ireland
2000,System Administrator,Ireland

Command:

head -n1 data.csv && tail -n+2 data.csv | sort -u -t',' -k3,3

Because uniq -f treats blanks, not commas, as field separators, a comma-separated file like this is easier to de-duplicate with sort -u keyed on a column. This approach sorts the data and removes duplicates based on a specific column, in this case the "Country" column.

  • head -n1 data.csv: This command prints the first line (header) of the CSV file.
  • tail -n+2 data.csv: This command skips the first line (header) and outputs the remaining lines.
  • sort -u -t',' -k3,3: Here's how this part works:
    • -u: Output only the first line for each unique sort key (here, each country).
    • -t',': Set the comma (',') as the field separator.
    • -k3,3: Sort by the third column (Country).

Output:

Year,Designation,Country
2021,Data Analyst,Australia
2000,Data Analyst,Ireland
2023,Software Engineer,USA

This result removes lines with duplicated values in the "Country" column while keeping the header line in the output.
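Since uniq -f counts blank-separated fields, it can be applied directly to whitespace-separated data. A sketch with made-up input (output spacing may differ):

```shell
# -f 1 skips the first (year) field, so the lines are compared on the
# country alone and the two adjacent USA lines collapse into one:
printf '2023 USA\n2020 USA\n2021 Australia\n' | uniq -f 1 -c
#   2 2023 USA
#   1 2021 Australia
```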

About The Author

Nimesha Jinarajadasa

Nimesha has been a full-stack software engineer for more than five years. He loves technology, as it has the power to solve many of our problems within minutes. He has contributed to a variety of projects over the last 5+ years, working across all three of the so-called tiers (DB, M-Tier, and Client). Recently, he has started working with DevOps technologies such as Azure administration, Kubernetes, Terraform automation, and Bash scripting.
