Extract Extension from File Path in Bash

Written by: Bobbin Zachariah   |   Last updated: May 27, 2023

1. Introduction

In Bash scripting, extracting the extension from a file path is a common task. The extension usually indicates the file type, so a script can use it to rename files, filter them by extension, or apply operations specific to a file format.

This tutorial shows several Bash techniques for extracting file extensions, along with example scripts for common file-handling tasks.

2. Using parameter expansion

In the previous article, we saw how parameter expansion works and used it to extract the file name, both with and without its extension. The same technique can also extract just the extension.

filepath=/home/ubuntu/sample.txt
echo "${filepath#*.}"

The above example can be generalized as follows:

${variable#pattern}

This removes the shortest prefix of the variable's value that matches pattern and returns the rest. The related form ${variable##pattern} removes the longest matching prefix instead. One limitation of both forms is that they work only on a variable (a parameter), not directly on a literal string.

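In our example, the pattern is *. (a glob pattern, not a regular expression). The * matches any sequence of characters, and the literal dot marks where the extension begins. Because # removes the shortest match, Bash trims everything up to and including the first dot, which leaves txt. This has consequences. For file_backup.tar.gz the result is tar.gz. If a directory in the path contains a dot, as in /home/my.dir/sample.txt, the result is dir/sample.txt. If the path contains no dot at all, nothing matches and the whole path is returned unchanged. To keep only the text after the last dot, use ${filepath##*.} instead.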
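The # and ## forms differ only in whether the shortest or the longest matching prefix is removed, and the difference is easiest to see side by side. The sketch below also wraps both forms in two hypothetical helper functions, ext_all and ext_last, which are not part of the original article. The helpers first strip the directory part with ${1##*/}, so dots in directory names are ignored, and they print nothing when the file name has no dot.

```shell
#!/bin/bash
# Hypothetical helpers (illustration only): strip the directory part first
# so that a dot in a directory name is never taken as the extension start.

# Everything after the FIRST dot of the file name (e.g. tar.gz)
ext_all() {
    local name=${1##*/}              # remove the longest prefix ending in '/'
    [[ $name == *.* ]] || return 0   # no dot in the name: print nothing
    printf '%s\n' "${name#*.}"       # '#' removes the shortest prefix matching '*.'
}

# Everything after the LAST dot of the file name (e.g. gz)
ext_last() {
    local name=${1##*/}
    [[ $name == *.* ]] || return 0
    printf '%s\n' "${name##*.}"      # '##' removes the longest prefix matching '*.'
}

filepath=/home/ubuntu/file_backup.tar.gz
echo "${filepath#*.}"    # tar.gz
echo "${filepath##*.}"   # gz
ext_all  /home/my.dir/file_backup.tar.gz   # tar.gz
ext_last /home/my.dir/file_backup.tar.gz   # gz
```

Note that a hidden file such as .bashrc is reported as having the extension bashrc by both helpers. Add an extra check if that matters for your use case.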

3. Using awk

In the previous article, we used the awk command to extract the file name together with its extension. The awk command can also extract just the extension from a full path.

Example:

filepath=/home/ubuntu/sample.txt
echo "$filepath" | awk -F. '{print $NF}'

In this example, a variable holds the full file path, although you could also pass the string to echo directly. The output of echo is piped into awk, which reads it as its input.

The -F option sets awk's field separator. With -F., awk splits the input into fields at every dot, so /home/ubuntu/sample.txt becomes two fields: /home/ubuntu/sample and txt.

NF is a built-in awk variable that holds the number of fields in the current line. $NF therefore refers to the last field, which is the text after the final dot: the file extension.
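The sketch below shows exactly how -F. splits the input. It wraps the pipeline in a small function, show_fields, which is a hypothetical name used only for illustration, and prints NF along with every field. It relies only on standard awk features: NF, $i, and a for loop. It uses printf instead of echo so that unusual input (for example, a path starting with -n) is passed through unchanged.

```shell
#!/bin/bash
# Hypothetical helper: print how awk -F. splits a path into dot-separated fields.
show_fields() {
    printf '%s\n' "$1" | awk -F. '{
        print "number of fields:", NF
        for (i = 1; i <= NF; i++)
            print "field " i ": " $i
        print "last field ($NF): " $NF
    }'
}

show_fields /home/ubuntu/sample.txt
```

For /home/ubuntu/sample.txt, this prints two fields, /home/ubuntu/sample and txt, and the last field is txt.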

Let us consider another example:

filepath=/home/ubuntu/file_backup.tar.gz
echo "$filepath" | awk -F. '{print $NF}'

This time the output is gz, not the tar.gz we might have expected. Because awk splits the path at every dot, this path has three fields (/home/ubuntu/file_backup, tar, and gz), and $NF returns only the last one, the text after the final dot.

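So be careful with awk when an extension has more than one part, such as tar.gz. The method also misbehaves when the file name has no dot. In that case awk prints the whole path, or, if a directory name contains a dot, everything after that dot (for /home/my.dir/README, it prints dir/README).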
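If you prefer to stay with awk, one way around these pitfalls is to split on / so that only the file name ($NF) is examined. You can then keep everything after its first dot using awk's index() and substr() functions. This is a sketch built around a hypothetical get_ext function. It treats tar.gz as the extension and prints an empty line when the name contains no dot.

```shell
#!/bin/bash
# Hypothetical helper: print everything after the first dot of the file
# name (the last '/'-separated field), or an empty line if there is no dot.
get_ext() {
    printf '%s\n' "$1" | awk -F/ '{
        name = $NF                 # file name without the directories
        dot = index(name, ".")     # position of the first dot, 0 if none
        print (dot ? substr(name, dot + 1) : "")
    }'
}

get_ext /home/ubuntu/file_backup.tar.gz   # tar.gz
get_ext /home/ubuntu/sample.txt           # txt
get_ext /home/my.dir/README               # empty line
```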

4. Example Scripts

4.1 Creating files with a new extension

The script prompts the user for a directory path. After checking that the directory exists, it asks the user for an extension and then creates five sequentially numbered files, 1 through 5, with that extension.

Example:

#!/bin/bash

read -r -p "Enter the directory path: " dpath

# Stop if the input is not an existing directory
if [ ! -d "$dpath" ]
then
	echo "Invalid directory. Please check the input."
else
	read -r -p "Enter the new extension you want: " ext
	# Create 1.<ext> through 5.<ext> inside the chosen directory
	for i in $(seq 1 5)
	do
		touch "${dpath}/${i}.${ext}"
	done
fi

1. The script begins by using the read command to prompt the user for the directory path. The entered value is stored in the variable dpath.

2. The script uses the -d test to check whether dpath names an existing directory. If it does not, the script prints "Invalid directory. Please check the input."

3. If the directory exists, the script prompts the user for the new extension and stores the response in the variable ext.

4. The script runs a for loop 5 times. The loop variable i takes the values 1 through 5.

5. In every iteration, the script creates a file named $i.$ext inside the directory stored in dpath.

6. By the end of the script, five files named 1, 2, 3, 4 and 5 exist in that directory, all with the extension the user entered.
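The script above creates new files but does not rename existing ones. For actual renaming, a minimal sketch is to loop over the matching files and build each new name with ${file%.*}, which removes the shortest suffix starting with a dot (the last extension). The function name change_ext and its argument order are hypothetical choices made for this illustration.

```shell
#!/bin/bash
# Hypothetical helper: rename every *.<old> file in a directory to *.<new>.
# Usage: change_ext DIR OLD_EXT NEW_EXT
change_ext() {
    local dir=$1 old=$2 new=$3 file
    for file in "$dir"/*."$old"; do
        [ -e "$file" ] || continue        # glob matched nothing
        mv -- "$file" "${file%.*}.$new"   # drop the last extension, append the new one
    done
}
```

For example, change_ext /home/ubuntu/notes txt md would turn a.txt into a.md and leave files with other extensions untouched. Because only the last extension is replaced, backup.tar.gz renamed from gz to bz2 becomes backup.tar.bz2.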

4.2 Listing the extension of all the files

The script prompts the user for a directory path. After checking that the directory exists, it lists the extension of every file in it.

Example:

#!/bin/bash

read -r -p "Enter the directory path: " dpath

if [ ! -d "$dpath" ]
then
	echo "Invalid directory. Please check the input."
else
	# Iterate with a glob instead of parsing ls output, so names
	# containing spaces stay intact and the entered directory is used
	for path in "$dpath"/*
	do
		[ -e "$path" ] || continue      # empty directory: glob matched nothing
		content=${path##*/}             # name without the directory part
		if [ -d "$path" ]
		then
			echo "$content is a directory and will not have any extension"
		elif [[ $content == *.* ]]
		then
			ext=${content#*.}           # everything after the first dot
			echo "$content has extension: $ext"
		else
			echo "$content has no extension"
		fi
	done
fi

1. The script starts by prompting the user for the directory path. The read command stores the response in the variable dpath.

2. The script uses the -d test on dpath to verify that the input is an existing directory. If it is not, the script prints "Invalid directory. Please check the input."

3. If the directory exists, the script loops over its entries with the glob "$dpath"/*, and the loop variable path holds the full path of each entry. Unlike splitting the output of ls, the glob keeps names that contain spaces intact. Hidden files (names starting with a dot) are not matched, just as ls omits them by default.

4. If the directory is empty, the glob matches nothing and Bash leaves the pattern unexpanded. The -e test skips that case.

5. In every iteration, ${path##*/} strips the directory part and stores the bare name in content. If the entry is a directory, the script reports that it will not have any extension.

6. If the entry is a file whose name contains a dot, the script extracts the extension with the parameter expansion ${content#*.} and stores the result in ext. A name without a dot is reported as having no extension.

7. For each file, the script prints its name and its extension.
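A natural next step after listing extensions is to summarize how many files share each one. The sketch below uses a hypothetical count_exts function and assumes Bash 4 or later, because it relies on an associative array. It uses the same ${name#*.} expansion as the listing script, so tar.gz is counted as a single extension. Names without a dot are grouped under the label (none).

```shell
#!/bin/bash
# Hypothetical helper: count regular files per extension in a directory.
# Requires Bash 4+ for associative arrays (local -A).
count_exts() {
    local -A counts=()
    local path name ext
    for path in "$1"/*; do
        [ -f "$path" ] || continue       # skip directories and an unmatched glob
        name=${path##*/}                 # file name without the directory part
        if [[ $name == *.* ]]; then
            ext=${name#*.}               # everything after the first dot
        else
            ext="(none)"
        fi
        counts[$ext]=$(( ${counts[$ext]:-0} + 1 ))
    done
    # One "extension count" line per extension, sorted for stable output
    for ext in "${!counts[@]}"; do
        printf '%s %d\n' "$ext" "${counts[$ext]}"
    done | sort
}
```

Running count_exts on a directory prints one line per extension, followed by the number of files that have it.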

About The Author

Bobbin Zachariah

Bobbin Zachariah is an experienced Linux engineer who has supported infrastructure for many companies. He specializes in shell scripting, AWS Cloud, JavaScript, and Node.js. He holds a Master’s degree in computer science. He holds the Red Hat Certified Engineer (RHCE) certification and RedHat Enable Sysadmin.
