In this guide, we learn how to use the grep command to count the number of matches in a file, in multiple files, or in a directory.
Counting with Grep
One of the useful features of grep is to count the number of lines that match a pattern. This is done using the -c or --count option.
Example:
grep -c Linux samplefile.txt
This command prints a number representing how many lines in "samplefile.txt" contain the word "Linux". For example, an output of 5 means that five lines contain a match.
Counting every individual occurrence
As we saw in the previous section, grep -c counts only the number of matching lines. What about counting every individual occurrence of a match on each line? The grep command does not have a built-in option to count all individual occurrences of a pattern.
The workaround is to use grep with the -o option and wc -l command.
Example:
grep -o Linux samplefile.txt | wc -l
Here, -o tells grep to output only the matching parts, each on its own line, and wc -l counts the number of lines in its input. The command above prints a number representing how many times the word "Linux" appears in "samplefile.txt".
Remember the option -i to tell grep to ignore case and -w for exact word matching - those can be combined based on your requirement.
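The difference between counting lines and counting occurrences can be sketched as follows. This is a minimal illustration, not the article's original sample: the file contents and the variable names (tmpdir, line_count, occurrence_count, word_count) are made up for the example.

```shell
# Hypothetical sample file; the contents are invented for illustration.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'Linux is a kernel; Linux distributions bundle it.' \
  'linux commands are case-sensitive.' \
  'LinuxMint is one distribution.' \
  'No match on this line.' > "$tmpdir/samplefile.txt"

# Lines containing "Linux" (case-sensitive): lines 1 and 3, so 2
line_count=$(grep -c Linux "$tmpdir/samplefile.txt")

# Individual occurrences: two on line 1 and one on line 3, so 3
occurrence_count=$(grep -o Linux "$tmpdir/samplefile.txt" | wc -l)

# Whole-word, case-insensitive occurrences: "Linux" twice and "linux" once.
# "LinuxMint" is skipped because -w requires the match to be a whole word.
word_count=$(grep -o -i -w Linux "$tmpdir/samplefile.txt" | wc -l)

echo "lines=$line_count occurrences=$occurrence_count words=$word_count"
rm -r "$tmpdir"
```

Here -c reports fewer hits than -o piped to wc -l because the first line contains the word twice.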
Count occurrences in a directory with grep
What about counting occurrences in a directory and its subdirectories? Add the -r or -R option to make grep search recursively. Example:
grep -o -r Linux dir1/ | wc -l
To count matches only in specific files inside a directory, run:
grep -o "Linux" ./*.txt | wc -l
This command counts all individual occurrences of the word "Linux" in all .txt files within the current directory.
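The two commands above behave differently with subdirectories, which the following sketch tries to show. The directory layout and file contents are hypothetical, and the variable names are chosen only for this example.

```shell
# Hypothetical directory tree; names and contents are invented.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/dir1/sub"
printf 'Linux and more Linux\n' > "$tmpdir/dir1/a.txt"
printf 'Linux here\nnothing here\n' > "$tmpdir/dir1/sub/b.txt"
printf 'Linux in a log file\n' > "$tmpdir/dir1/c.log"

# Every occurrence under dir1/, including subdirectories and all file types
recursive_total=$(grep -o -r Linux "$tmpdir/dir1/" | wc -l)

# Only .txt files directly inside dir1/ (the shell glob does not descend)
txt_total=$(grep -o Linux "$tmpdir"/dir1/*.txt | wc -l)

echo "recursive=$recursive_total txt_only=$txt_total"
rm -r "$tmpdir"
```

The recursive count covers a.txt, sub/b.txt, and c.log, while the glob version sees only a.txt.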
2. Syntax
The basic syntax for grep counting:
grep -c <pattern> filename
Here, -c stands for count.
Let's use a sample file named test.txt for the examples below.
3. Counting Matching Lines
Let's first check how we can use grep to search for a specific pattern in a given file.
Example:
$ grep "unix" test.txt
This matches the pattern "unix" and prints every line that contains it, highlighting the match when color output is enabled. Note that grep is case-sensitive by default.
To get the number of lines that match this pattern, use the -c option:
$ grep -c "unix" test.txt
An output of 2 indicates that two lines contain the pattern "unix".
4. Counting Multiple Matches in a Line
To count multiple matches per line, use the -o option with grep. The -o option prints each occurrence of the pattern on a separate line:
$ grep -o "unix" test.txt
This will search for the word "unix" in the file test.txt and display each occurrence of "unix" on a separate line.
Now pipe that output to wc -l to count the matches. Alternatively, pipe it to a second grep with the -c option.
$ grep -o "unix" test.txt | wc -l
or
$ grep -o "unix" test.txt | grep -c "unix"
To count how many times each distinct matched string occurs, type:
$ grep -o "unix" test.txt | sort | uniq -c
Let's look at another example using a regular expression.
$ grep -o " u[a-z]*" test.txt
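The regular expression " u[a-z]*" matches a space, then "u", then zero or more lowercase letters. So it matches words such as "unix" and "ubuntu" when they follow a space. For "uTorrent" it matches only " u", because the uppercase "T" is outside the range [a-z].
The following command outputs the total count of occurrences of the pattern " u[a-z]*" in the file test.txt.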
$ grep -o " u[a-z]*" test.txt | wc -l
An output of 2 means that there are two occurrences of the pattern " u[a-z]*" in the file test.txt.
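Because the pattern above begins with a space, a word at the very start of a line is not counted. One possible alternative, sketched here with made-up file contents, is to drop the space and use the -w option so that only whole words match. The variable names are hypothetical.

```shell
# Hypothetical sample file; the contents are invented for illustration.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'unix is old' \
  'ubuntu and unix are popular' \
  'a uTorrent client' > "$tmpdir/test.txt"

# Space-prefixed pattern: finds " unix" on line 2 and " u" on line 3,
# but misses "unix" and "ubuntu" at the start of lines 1 and 2
space_prefixed=$(grep -o " u[a-z]*" "$tmpdir/test.txt" | wc -l)

# Whole-word variant: finds "unix", "ubuntu", "unix";
# "uTorrent" is not a match because the "T" prevents a whole-word match
whole_words=$(grep -o -w "u[a-z]*" "$tmpdir/test.txt" | wc -l)

echo "space_prefixed=$space_prefixed whole_words=$whole_words"
rm -r "$tmpdir"
```

Which variant is appropriate depends on whether partial matches such as the " u" in " uTorrent" should be counted.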
4.1 Counting Matches in Multiple Files
Grep is not limited to a single file; it can also match a pattern across multiple files.
To understand this let's create another file named test2.txt with some sample data.
Let's first search for the word "unix" in all files with a .txt extension in the current directory.
$ grep "unix" *.txt
Now let's display the count of matching lines for each file.
$ grep -c "unix" *.txt
or
$ grep -c "unix" test.txt test2.txt
In this example, test.txt contains 2 lines with the word "unix" and test2.txt contains 1 line.
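With several files, grep -c prints one path:count pair per file rather than a grand total. The sketch below shows one way to add the per-file counts together with awk. The file contents are hypothetical, chosen so the counts resemble the ones described above, and total_lines is a name invented for this example.

```shell
# Hypothetical sample files; the contents are invented for illustration.
tmpdir=$(mktemp -d)
printf 'unix one\nUnix two\nunix and unix\n' > "$tmpdir/test.txt"
printf 'only one unix line\nnothing\n' > "$tmpdir/test2.txt"

# Per-file counts of matching lines, printed as path:count
grep -c "unix" "$tmpdir"/*.txt

# Sum the per-file counts; the count is the last colon-separated field
total_lines=$(grep -c "unix" "$tmpdir"/*.txt | awk -F: '{ sum += $NF } END { print sum }')

echo "total matching lines: $total_lines"
rm -r "$tmpdir"
```

Taking the last field ($NF) rather than the second keeps the sum correct even if a path happens to contain a colon.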
5. Case-Insensitive Matching
By using the -i option, grep treats uppercase and lowercase characters as equivalent during the search, effectively making it case-insensitive.
For example:
$ grep -i "unix" test.txt
It matches unix, Unix, uNix, and so on, irrespective of case. To get the number of matching lines, simply add the -c option.
$ grep -ic "unix" test.txt
To ignore case and count how many times each distinct spelling occurs:
$ grep -oi "unix" test.txt | sort | uniq -c
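As a rough illustration of what the pipeline above produces, the following sketch runs it on a made-up file. The contents and the variable names (insensitive_lines, distinct_spellings) are assumptions for this example only.

```shell
# Hypothetical sample file; the contents are invented for illustration.
tmpdir=$(mktemp -d)
printf 'unix and Unix\nUNIX or unix\nlinux\n' > "$tmpdir/test.txt"

# Lines matching "unix" in any case: lines 1 and 2
insensitive_lines=$(grep -ic "unix" "$tmpdir/test.txt")

# One output line per distinct spelling, each prefixed with its count
grep -oi "unix" "$tmpdir/test.txt" | sort | uniq -c

# Number of distinct spellings found: unix, Unix, UNIX
distinct_spellings=$(grep -oi "unix" "$tmpdir/test.txt" | sort | uniq | wc -l)

echo "lines=$insensitive_lines spellings=$distinct_spellings"
rm -r "$tmpdir"
```

The sort step matters because uniq only merges identical lines that are adjacent.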
6. Grep Count Recursive
The -r option searches for a pattern recursively, in every file of the given directory and all of its subdirectories.
The current directory contains two files (test.txt and test2.txt) and a directory (opsys).
The following command will search for the string "unix" in all files within the current directory and its subdirectories. If a match is found, grep will display the line containing the matched pattern along with the corresponding file name.
$ grep -r "unix" *
To count the number of occurrences of the string "unix" in all files within the current directory and its subdirectories:
$ grep -or "unix" * | wc -l
Output 9 indicates the total count of occurrences of the string "unix" in test.txt, test2.txt, and opsys/text2.txt.
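A recursive count can be sketched on a throwaway directory as follows. The layout loosely mirrors the one described above, but the file contents, the name opsys/notes.txt, and the variable names are hypothetical. The -l option, which lists only the names of files containing a match, is used here to count matching files.

```shell
# Hypothetical directory tree; names and contents are invented.
tmpdir=$(mktemp -d)
mkdir "$tmpdir/opsys"
printf 'unix unix\nunix\n' > "$tmpdir/test.txt"
printf 'unix\n' > "$tmpdir/test2.txt"
printf 'unix and unix\nother\n' > "$tmpdir/opsys/notes.txt"

# Per-file counts of matching lines across the whole tree
grep -rc "unix" "$tmpdir"

# Total individual occurrences across the tree: 3 + 1 + 2
total=$(grep -or "unix" "$tmpdir" | wc -l)

# Number of files that contain at least one match
files_with_matches=$(grep -rl "unix" "$tmpdir" | wc -l)

echo "occurrences=$total files=$files_with_matches"
rm -r "$tmpdir"
```

Passing a directory path instead of * also includes hidden files, which the shell glob would normally skip.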
7. Count Commented / Empty Lines
grep can be a handy tool for counting commented lines and empty lines within a shell script. Here we will use a sample file named test.txt.
7.1. Commented Lines (#)
Use the following grep command to count the number of lines in a file that start with a hash symbol (#).
$ grep -c "^#" test.txt
Output 1 indicates that there is only 1 commented line that begins with a hash symbol.
7.2. Empty Lines
"^$" is a regular expression pattern that matches empty lines. The caret (^) represents the beginning of a line, and the dollar sign ($) represents the end of a line. When both are combined without any characters in between, it represents an empty line.
The following grep command counts the number of blank lines in a file named test.txt.
$ grep -c "^$" test.txt
The output indicates the file contains 4 blank lines.
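The two counts from this section can be tried together on a small script. This is a sketch under stated assumptions: the script contents are made up, and the last two commands go beyond the text above by also counting whitespace-only lines and, with the -v (invert match) option, the remaining code lines.

```shell
# Hypothetical shell script used as input; contents are invented.
tmpdir=$(mktemp -d)
printf '%s\n' \
  '#!/bin/sh' \
  '' \
  '# print a greeting' \
  'echo "hello"  # trailing comment' \
  '   ' \
  '' \
  'exit 0' > "$tmpdir/test.txt"

# Lines starting with "#": the shebang line counts too
comment_lines=$(grep -c '^#' "$tmpdir/test.txt")

# Truly empty lines (nothing between start and end of line)
empty_lines=$(grep -c '^$' "$tmpdir/test.txt")

# Empty lines plus lines that contain only whitespace
blank_lines=$(grep -c '^[[:space:]]*$' "$tmpdir/test.txt")

# Everything else: -v inverts the match, -e supplies each pattern
code_lines=$(grep -c -v -e '^#' -e '^[[:space:]]*$' "$tmpdir/test.txt")

echo "comments=$comment_lines empty=$empty_lines blank=$blank_lines code=$code_lines"
rm -r "$tmpdir"
```

Note that "^#" does not count the trailing comment on the echo line, because that line does not start with a hash symbol.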
8. Conclusion
In conclusion, grep is a powerful tool for efficiently counting matches in text files. It offers a straightforward syntax and a range of options to meet diverse matching needs. Whether it's counting matching lines, multiple matches within a line, or searching recursively in directories, grep provides an effective solution.