How to Search Strings in Binary Files

Written by: Aditya Harsh   |   Last updated: August 30, 2023

Introduction

Binary files such as compiled executables often contain embedded text (error messages, file paths, version strings) that is valuable for debugging, reverse engineering, and forensic analysis. Extracting it reliably requires tools that can pick printable strings out of otherwise non-textual data.

In this guide, we look at Linux tools and techniques for searching for strings in binary files.

strings command

In Linux and other Unix-like systems, the strings command is one of the preferred and most straightforward tools for extracting human-readable strings from binary files. Its primary purpose is to scan files and display sequences of printable characters, making it particularly useful for examining the contents of non-text files.

Example:

strings /usr/bin/nproc

This command extracts and displays all printable character sequences from the nproc binary, which is used in Linux to show the number of available processing units.

Some options of the strings command that are useful for narrowing the search:

  • -a - Scans the whole file rather than only specific sections of object files.
  • -f - Prints the file name in front of each extracted string.
  • -n - Specifies the minimum length of a character sequence to be printed (the default is 4).
  • -t - Displays the offset of each string in the given radix: o (octal), d (decimal), or x (hexadecimal).

Example:

strings -f -a -n 5 /usr/bin/nproc

In this example, the strings command scans the entire file (-a) and lists only sequences of 5 or more characters (-n 5). The file name is displayed before each extracted string (-f), which is useful when searching multiple files.
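The -t option from the list of options is not used in the example, so here is a minimal sketch of it. The sample file and its contents are made up for illustration, and the sketch assumes that strings from GNU binutils is installed.

```shell
# Made-up sample data: two printable runs separated by non-printable bytes.
sample=$(mktemp)
printf '\000\001\002Usage: demo\000\003\004--help\000' > "$sample"

# -t x prints the hexadecimal offset of each string; -n 6 skips shorter runs.
offsets=$(strings -a -t x -n 6 "$sample")
echo "$offsets"

rm -f "$sample"
```

With this sample, the first column should show 3 for "Usage: demo" and 11 (hexadecimal for 17) for "--help", which is where each run starts in the file.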

Example 2: Filter by grep

strings /usr/bin/nproc | grep "help"

This command extracts all printable strings from the nproc executable and then filters the output to display only lines containing the word "help".

Using grep on Binary Data

While grep is traditionally used for text-based searches, it can also be used with binary files, with some limitations and potential pitfalls. The -a (or --binary-files=text) option treats the binary file as text. This can be useful to force grep to process a binary file.

Note: When grep matches a pattern in a binary file, it will try to display the matched line. However, binary files can have non-printable characters that might disrupt the terminal display.

Example:

grep -a "help" /usr/bin/nproc

When you run this command, if the string "help" exists within the readable strings of /usr/bin/nproc, then grep will output the lines containing that string.

If grep finds a match in a binary file without the -a option, it will typically display a message like "Binary file binaryfile matches", instead of attempting to display the matched line.

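When dealing with large binary files, memory can become a concern: a binary file may contain very few newline characters, so grep can end up buffering extremely long "lines". One workaround is to feed the file to grep in smaller pieces using dd and fold, for example:

dd if=bigfile skip=xxx | fold | grep -b -a string

Here skip=xxx is a placeholder for the number of input blocks to skip, and fold breaks the stream into lines of bounded width so that grep never has to hold a huge line in memory. The -b option makes grep print a byte offset before each matching line; add -o to get the offset of the match itself. Note that these offsets refer to the stream grep receives (after dd and fold), not to positions in the original file. Older guides also suggest grep --mmap, but that option is obsolete in current GNU grep releases.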
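As a small illustration of what -b reports, the following sketch runs grep directly on a made-up sample file (no dd or fold involved), so the offsets are positions in the file itself. The file contents and variable names are assumptions made for this example.

```shell
# Made-up sample: control bytes mixed with two lines of text.
sample=$(mktemp)
printf '\000\002\003Usage: nproc\n\004\005--help display this help\n\006' > "$sample"

# -b alone: offset of the start of each matching line.
line_offsets=$(grep -ab 'help' "$sample" | cut -d: -f1)

# -b together with -o: offset of each match itself.
match_offsets=$(grep -abo 'help' "$sample")

echo "matching line starts at: $line_offsets"
echo "$match_offsets"

rm -f "$sample"
```

The matching line starts at byte 16, while the two occurrences of "help" sit at bytes 20 and 38, so -o is the variant to use when the exact position of the string matters.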

To obtain the surrounding context of a pattern in binary files, the following command can be used:

grep -aPo '.{0,20}pattern.{0,20}' binfile

Here -P enables Perl-compatible regular expressions and -o prints only the matched part, so the command displays up to 20 characters before and after the matched pattern. This is a quick way to see where and how a particular string appears in a binary file without opening a full-fledged hex editor.

Leveraging hexdump

The hexdump command is used to display the contents of a file in hexadecimal, decimal, octal, or ASCII format. The -C option (uppercase) instructs hexdump to display the file contents in canonical hex+ASCII format; the lowercase -c option is different and prints one-byte character output.

When you use hexdump with the -C option, each output line consists of the input offset followed by two sections:

  1. Hexadecimal representation of the bytes
  2. ASCII representation of the same bytes, with non-printable characters shown as dots

Example:

hexdump -C /usr/bin/nproc | grep "help"

This leverages the hexdump tool to display the content of the nproc binary in a "canonical" format and then searches for the string "help" within this output.

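The advantage of hexdump -C is that you get to see the hexadecimal representation of data alongside its ASCII representation, which can be useful in some binary analysis scenarios. Keep in mind that grep matches against the ASCII column here, which holds 16 bytes per row, so a string that straddles two rows is not found.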

Let's check how to use hexdump to search for a specific hexadecimal string within a binary file. Example:

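hexdump -v -e '/1 "%02X"' /usr/bin/nproc | grep 68656C70

Where,

  • hexdump: Display the file content in hex format.
  • -v: Print every byte. Without it, hexdump replaces repeated identical groups of output with a single "*", which would corrupt the hex stream.
  • -e '/1 "%02X"': This is a format specification for hexdump.
    • /1: This means process one byte at a time.
    • "%02X": Print the byte in two-digit uppercase hexadecimal format.
  • /usr/bin/nproc: The file you are inspecting.
  • grep 68656C70:
    • This is searching for the hexadecimal string "68656C70".
    • "68656C70" corresponds to the ASCII string "help" (68 = h, 65 = e, 6C = l, 70 = p).

Because this format prints no line breaks, the whole dump is a single line, and grep prints all of it when the pattern is found. Use grep -o to print only the matched text, or grep -q to simply test for presence.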
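Rather than converting the search string to hex by hand, the same hexdump format can produce the pattern. The following is a minimal sketch under a few assumptions: the sample file and its contents are made up, and hexdump (from util-linux or bsdmainutils) is installed.

```shell
# Made-up sample "binary" containing the word "help" twice.
sample=$(mktemp)
printf '\000\001\002Try --help for more\000\003\004help\000' > "$sample"

# Derive the hex pattern from the search string instead of typing it by hand.
pattern=$(printf 'help' | hexdump -v -e '/1 "%02X"')
echo "pattern: $pattern"

# Dump the file as one continuous hex stream; grep -o prints each match
# on its own line, so wc -l counts the occurrences.
count=$(hexdump -v -e '/1 "%02X"' "$sample" | grep -o "$pattern" | wc -l)
echo "occurrences: $count"

rm -f "$sample"
```

One caveat of searching a continuous hex stream is that a match can, in principle, start in the middle of a byte (at an odd digit position), so a hit is a strong hint rather than proof that the byte sequence is present.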

Using od

The od command in Linux stands for "octal dump" and is used to dump binary files in various formats, including octal, hexadecimal, and ASCII. It's traditionally been used to display the contents of binary files in human-readable format.

If you are looking to search for strings or sequences in binary files using od, you would typically use od to produce a dump and then pipe that to grep to search for specific strings or sequences.

od -A x -t x1z /usr/bin/nproc | grep '68 65 6c 70'

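Here, the -A x option displays addresses in hexadecimal, -t x1z outputs data as hexadecimal bytes with the printable characters appended to each row (the z suffix), and we're searching for the hexadecimal representation of "help" (68 = h, 65 = e, 6c = l, 70 = p). Since od prints 16 bytes per row, a sequence that straddles two rows will not match this pattern.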
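The 16-bytes-per-row layout of od output can hide a match that straddles two rows. One possible way around this, sketched below with a made-up sample file, is to turn the dump into a single stream of hex digits and keep only matches that start on a byte boundary. The variable names and file contents are assumptions for this illustration.

```shell
# Made-up sample: 14 filler bytes push "help" across the 16-byte row
# boundary (it occupies byte offsets 14 to 17).
sample=$(mktemp)
printf '%014dhelp\000\001' 0 > "$sample"

# Row-by-row search: the pattern is split between two rows, so the count is 0.
rows=$(od -A x -t x1z "$sample" | grep -c '68 65 6c 70' || true)

# Stream search: -An drops the addresses, -v disables "*" squeezing of
# repeated rows, and tr removes spaces and newlines.
stream=$(od -An -v -t x1 "$sample" | tr -d ' \n')
hits=$(printf '%s\n' "$stream" | grep -ob '68656c70' | while IFS=: read -r pos _; do
    # Each byte is two hex digits, so a real match starts at an even position.
    if [ $((pos % 2)) -eq 0 ]; then echo $((pos / 2)); fi
done)

echo "row matches: $rows"
echo "byte offsets: $hits"

rm -f "$sample"
```

The row-based search reports 0 matches, while the stream-based search reports byte offset 14, which is where "help" actually begins in the sample.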

xxd command

xxd is a versatile tool in Linux that can be used to create a hex dump of a given file or convert a hex dump back to its binary form. Using the -p option with xxd gives a plain hex dump, which is a continuous stream of hexadecimal numbers without any additional formatting, offsets, or ASCII representation.

If you want to search for a particular string (e.g., "help"), you first need to convert that string to its hex representation. The string "help" translates to 68656c70 in hexadecimal.

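xxd -p /usr/bin/nproc | grep '68656c70'

Note that xxd -p wraps its output at 30 bytes (60 hex digits) per line, so a match that straddles a line break is missed unless the newlines are removed first, for example with tr -d '\n'.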
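The hex conversion of the search string can be done with xxd itself. The sketch below also compares a line-by-line search with a search over the joined output; the sample file is made up so that the pattern falls across a line break, and xxd is assumed to be installed (it usually ships with vim).

```shell
# Convert the search string to hex with xxd itself.
pattern=$(printf 'help' | xxd -p)
echo "pattern: $pattern"

# Made-up sample: 29 filler bytes, so "h" is the last byte of the first
# 30-byte output line of xxd -p and "elp" starts the second line.
sample=$(mktemp)
printf '%029dhelp' 0 > "$sample"

# Line-by-line search misses the split pattern; grep -c prints 0 here.
wrapped=$(xxd -p "$sample" | grep -c "$pattern" || true)

# Joining the lines first lets grep see the pattern in one piece.
joined=$(xxd -p "$sample" | tr -d '\n' | grep -c "$pattern" || true)

echo "matching lines without joining: $wrapped"
echo "matching lines after joining: $joined"

rm -f "$sample"
```

On a real binary, most occurrences will not fall exactly on a line break, but joining the lines costs little and removes the blind spot.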

rabin2

rabin2 is a tool from the Radare2 suite primarily used to extract information from binary files. While it's not primarily designed for string searching like strings, it does have the capability to extract strings from a binary.

You can install rabin2 by downloading from git, using snap, or by the appropriate package manager of your Linux Distribution.

Using snap, you can install rabin2 with the following command:

sudo snap install radare2 --classic

To extract strings from a binary using rabin2, you can use the following command:

rabin2 -z /usr/bin/nproc

The -z option tells rabin2 to list the strings it finds in the binary's data sections (the -zz variant scans the whole file instead). This is similar to the strings command, but the output also includes each string's address, length, and section.

If you want to search for a specific string within those extracted strings, you can then pipe the output to grep:

rabin2 -z /usr/bin/nproc | grep help

Some useful options of rabin2:

  • -I - extract basic information from a binary.
  • -S - List all sections of a binary.
  • -s - List all symbols (both exported and imported).
  • -e - Display all entry points.

About The Author

Aditya Harsh

Aditya Harsh graduated from BITS Pilani, India with a Bachelor’s in Computer Science in 2015. Since then he has been working as a Software Developer and specializes in automation, especially in Java, and Bash scripting. Over these years, he has worked on a lot of cutting-edge technologies and enjoys using his skills to contribute to technological advances. He believes in the power of knowledge and takes great joy in sharing what he has learned.
