Introduction
Binary files, a rich amalgam of compiled data, often conceal textual data essential for debugging, reverse engineering, and forensic analysis. Navigating this complex structure requires specialized techniques to accurately extract and interpret embedded strings.
In this guide, we look at Linux tools and techniques for searching for strings in binary files.
strings command
In Linux and other Unix-like systems, the strings command is one of the preferred and most straightforward tools for extracting human-readable strings from binary files. Its primary purpose is to scan files and display sequences of printable characters, making it particularly useful for examining the contents of non-text files.
Example:
strings /usr/bin/nproc
This command extracts and displays all printable character sequences from the nproc binary, which is used in Linux to show the number of available processing units.
Some options of the strings command that are useful for filtering the search:
- -a - This scans the whole file rather than specific sections in object files.
- -f - This prints the file name before each extracted string.
- -n - This specifies the minimum length of a sequence of characters to be printed.
- -t - This displays the offset of each string in the given radix (o for octal, d for decimal, x for hexadecimal).
Example:
strings -f -a -n 5 /usr/bin/nproc
In this example, the strings command scans the entire file (-a) and lists only sequences of 5 or more characters (-n 5). The file name is displayed before each extracted string (-f), which is useful when searching multiple files.
Example 2: Filter by grep
strings /usr/bin/nproc | grep "help"
This command extracts all printable strings from the nproc executable and then filters the output to display only lines containing the word "help".
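The -t option is handy when you need to know where in the file a string lives. The following is a minimal sketch that runs against a throwaway sample file (created with printf purely for illustration) rather than a real binary, and it assumes GNU strings from binutils:

```shell
# Hypothetical sample file: 4 NUL bytes, "--help", 3 NUL bytes, "--version".
sample=$(mktemp)
printf '\0\0\0\0--help\0\0\0--version\0' > "$sample"

# -t x prefixes every extracted string with its hexadecimal file offset.
offsets=$(strings -a -t x -n 5 "$sample")
printf '%s\n' "$offsets"

rm -f "$sample"
```

In this sample, "--help" starts at byte 4 and "--version" at byte 13, so the offsets are reported as 4 and d.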
Using grep on Binary Data
While grep is traditionally used for text-based searches, it can also be used with binary files, with some limitations and potential pitfalls. The -a (or --binary-files=text) option treats the binary file as text. This can be useful to force grep to process a binary file.
Note: When grep matches a pattern in a binary file, it will try to display the matched line. However, binary files can have non-printable characters that might disrupt the terminal display.
Example:
grep -a "help" /usr/bin/nproc
When you run this command, if the string "help" exists within the readable strings of /usr/bin/nproc, then grep will output the lines containing that string.
If grep finds a match in a binary file without the -a option, it will typically display a message like "Binary file binaryfile matches" instead of attempting to display the matched line.
When dealing with large binary files, memory can become a concern: grep works line by line, and a binary file may contain very few newline characters, so a single "line" can be enormous. One workaround is to feed the file to grep in smaller pieces using dd and fold, for example: dd if=bigfile skip=xxx | fold | grep -b -a string. Here dd starts reading at a chosen block (skip), fold breaks the stream into lines of at most 80 columns, and the -b option makes grep print the byte offset of each matching line. Note that this offset is relative to the output of fold, which inserts extra newlines, and that a string split across a fold boundary will be missed. Older guides also suggest grep --mmap, but that option is obsolete in current versions of GNU grep and should not be relied on.
To obtain the surrounding context of a pattern in binary files, the following command can be used: grep -aPo '.{0,20}pattern.{0,20}' binfile. This command will display up to 20 characters before and after the matched pattern. This can be especially useful to get a quick glance at where and how a particular string or pattern appears in a binary file, without necessarily having to deep dive into the full context or use a full-fledged hex editor.
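To locate a match rather than just confirm that it exists, -b can be combined with -o so that grep reports the byte offset of each match itself instead of the offset of the enclosing line. A small sketch, assuming GNU grep and a sample file created only for illustration; LC_ALL=C makes grep treat every byte as a single character, which avoids surprises with byte sequences that are not valid in the current locale:

```shell
# Hypothetical sample file: 4 NUL bytes, "--help", 3 NUL bytes, "--version".
sample=$(mktemp)
printf '\0\0\0\0--help\0\0\0--version\0' > "$sample"

# -a: treat the binary as text, -b: print byte offsets, -o: print only the match.
matches=$(LC_ALL=C grep -abo -- 'help' "$sample")
printf '%s\n' "$matches"

rm -f "$sample"
```

For this sample the output is 6:help, because the word starts two bytes after the "--" at offset 4.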
Leveraging hexdump
The hexdump command is used to display the contents of a file in hexadecimal, decimal, octal, or ASCII format. The -C option specifically instructs hexdump to display the file contents in canonical hex+ASCII format.
When you use hexdump with the -C option, each output line starts with the file offset, followed by two sections: 1. Hexadecimal representation of 16 bytes 2. ASCII representation of the same bytes, with non-printable characters shown as dots.
Example:
hexdump -C /usr/bin/nproc | grep "help"
This uses the hexdump tool to display the content of the nproc binary in a "canonical" format and then searches for the string "help" within this output. Because each output line covers only 16 bytes, a string that straddles two lines will not be matched.
The advantage of hexdump -C is that you get to see the hexadecimal representation of data alongside its ASCII representation, which can be useful in some binary analysis scenarios.
Let's check how to use hexdump to search for a specific hexadecimal string within a binary file. Example:
hexdump -v -e '/1 "%02X"' /usr/bin/nproc | grep 68656C70
Where:
- hexdump - Displays the file content in hex format.
- -v - Prints every byte; without it, hexdump replaces repeated data with a * line, which would break up the hex stream.
- -e '/1 "%02X"' - A format specification for hexdump.
- /1 - Processes one byte at a time.
- "%02X" - Prints the byte in two-digit uppercase hexadecimal format.
- /usr/bin/nproc - The file you are inspecting.
- grep 68656C70 - Searches for the hexadecimal string "68656C70", which corresponds to the ASCII string "help" (68 = h, 65 = e, 6C = l, 70 = p).
Because the format string contains no newline, the whole file is printed as one long line, so grep prints that entire line when it matches; add -o to print only the matched digits.
Using od
The od command in Linux stands for "octal dump" and is used to dump binary files in various formats, including octal, hexadecimal, and ASCII. It's traditionally been used to display the contents of binary files in human-readable format.
If you are looking to search for strings or sequences in binary files using od, you would typically use od to produce a dump and then pipe that to grep to search for specific strings or sequences.
od -A x -t x1z /usr/bin/nproc | grep '68 65 6c 70'
Here, the -A x option displays addresses in hexadecimal, -t x1z outputs data as hexadecimal bytes followed by their printable ASCII characters, and we're searching for the hexadecimal representation of "help" (68 = h, 65 = e, 6c = l, 70 = p).
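One caveat applies to every dump-then-grep pipeline: the dump is split into fixed-width lines (16 bytes per line for od), so a sequence that straddles two lines is not matched. A possible workaround, sketched here under the assumption of GNU od and grep and using a hypothetical sample file, is to join the dump into a single stream before searching. Each byte then occupies exactly three characters (a space plus two hex digits), which lets the match position be converted back into a byte offset:

```shell
# Hypothetical sample: 14 filler bytes, then "help" crossing the 16-byte line boundary.
sample=$(mktemp)
printf 'AAAAAAAAAAAAAAhelp' > "$sample"

# Line-based search: misses the match because it is split across two lines.
if od -A x -t x1z "$sample" | grep -q '68 65 6c 70'; then
    line_based=found
else
    line_based=missed
fi

# Joined search: -An drops the addresses, -v disables "*" squeezing of repeated
# lines, tr removes the line breaks, and grep -bo reports where each hit starts.
offsets=$(od -An -v -t x1 "$sample" | tr -d '\n' | grep -bo '68 65 6c 70' |
    while IFS=: read -r pos rest; do
        echo $(( (pos - 1) / 3 ))   # 3 characters per byte, 1 leading space
    done)

echo "line-based search: $line_based"
echo "byte offsets from joined search: $offsets"
rm -f "$sample"
```

For this sample the line-based search reports "missed", while the joined search reports byte offset 14.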
xxd command
xxd is a versatile tool in Linux that can be used to create a hex dump of a given file or convert a hex dump back to its binary form. Using the -p option with xxd gives a plain hex dump: a stream of hexadecimal digits without offsets or ASCII representation, wrapped at 30 bytes (60 hex digits) per line. Because of this wrapping, a match that straddles a line break is missed unless the newlines are removed first, for example with tr -d '\n'.
If you want to search for a particular string (e.g., "help"), you first need to convert that string to its hex representation. The string "help" translates to 68656c70 in hexadecimal.
xxd -p /usr/bin/nproc | grep '68656c70'
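Converting the search term to hex by hand is error-prone for anything longer than a few characters. The conversion can be scripted with od, as in the following sketch (the variable names are only illustrative), which produces the three notations used in this guide:

```shell
term='help'

# Continuous lowercase hex, the form matched against `xxd -p` output.
hex_plain=$(printf '%s' "$term" | od -An -v -t x1 | tr -d ' \n')

# Continuous uppercase hex, the form produced by hexdump's "%02X" format.
hex_upper=$(printf '%s' "$hex_plain" | tr 'a-f' 'A-F')

# Space-separated hex bytes, the form matched against `od -t x1` output.
hex_spaced=$(printf '%s' "$hex_plain" | sed 's/../& /g; s/ $//')

echo "$hex_plain"
echo "$hex_upper"
echo "$hex_spaced"
```

For "help" this prints 68656c70, 68656C70, and 68 65 6c 70. The variables can then be dropped into a search pipeline, for example xxd -p /usr/bin/nproc | grep "$hex_plain".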
rabin2
rabin2 is a tool from the Radare2 suite that extracts information from binary files. While it is not designed primarily for string searching like strings, it can extract strings from a binary.
You can install rabin2 by downloading it from git, using snap, or through the package manager of your Linux distribution.
Using snap, you can install rabin2 with the following command: sudo snap install radare2 --classic.
To extract strings from a binary using rabin2, you can use the following command:
rabin2 -z /usr/bin/nproc
The -z option specifically tells rabin2 to extract strings from the binary. This command will list all the strings it finds in the binary. This is similar to the strings command but with more flexibility.
If you want to search for a specific string within those extracted strings, you can then pipe the output to grep:
rabin2 -z /usr/bin/nproc | grep help
Some useful options of rabin2:
- -I - Extract basic information from a binary.
- -S - List all sections of a binary.
- -s - List all symbols (both exported and imported).
- -e - Display all entry points.