Process and Thread Concepts of Linux

Written by: Linuxopsys   |   Last updated: February 24, 2023

In Linux, processes and threads are two of the most important concepts to get acquainted with. Although Linux-based systems treat them similarly, each has its own uses and characteristics, so it's important to understand the differences and nuances in order to choose the right approach for your specific task.

Process

A process is basically a program in execution: every time you start a program, a new process is created that holds the program's instructions along with the CPU time, memory, and other resources assigned to it. Additionally, each process has a unique PID number (short for process ID) and is associated with a specific user and group, to keep things organized.

The ps command

The ps command is one of the best-known tools for keeping track of the running processes on your Linux system. It offers many options and output formats to get you the relevant information, which also makes it somewhat complex.

However, let's start with its simplest form to get the running processes in our system:

ps
ps showing PID of a process

As you can see above, the output lists only the processes that belong to the current user and terminal. Specifically, it shows the process ID (PID) of each process, the terminal (TTY) it was spawned from, and the CPU time (TIME) the process has consumed. Moreover, you can always use flags to get more information and know what's going on under the hood.

The -ef option

Two of the most useful options for the ps command are the -e and -f switches, which let you monitor all processes running on the system along with additional informative columns.

Let's execute the command:

ps -ef
ps with -ef showing all running processes

To break it down, the -e option lists all running processes on the system, while the -f option expands the default columns (PID, TTY, and TIME) with the following:

  • UID: The user who launched the process.
  • PPID: The PID number of the parent process that spawned the current process.
  • C: The processor utilization of the process over its lifetime.
  • STIME: The time at which the process was started.
  • CMD: The command that started the process, including its arguments.

Before concluding this section, keep in mind that a process can play two different roles: parent process (as mentioned in PPID) and child process. When you open a terminal in Linux, a new shell process is created. It's the program that accepts your input in the terminal window and runs your commands.

From that shell, you might trigger the creation of a second process through a command or a script. This second process is known as the "child process" because it was spawned by the first process, the "parent process".

How Processes Work in Linux C

In the Linux/GNU C realm, whenever you start a new "child process", it lives under the umbrella of its "parent process". This new process is often referred to as a "subprocess" (or a "subshell" when the parent is a shell), and it starts with the same environment, current directory, and open files as the parent process.

The concept behind the creation of these child processes is the fork system call (forking), which creates a duplicate of the calling process; that duplicate becomes the child process. Since the child is a duplicate of the parent, it starts with a copy of the parent's memory and environment and inherits its open files, but from then on it can execute its own independent instructions.

How to create processes using C programming

You can see a practical example of creating processes by calling the fork system call from a C program. As discussed, the fork() function creates a duplicate of the current process, which means that we end up with two processes (parent and child). Moreover, the fork() function returns:

  • The PID number (a positive value) of the newly spawned child, in the parent process.
  • Zero (0), in the child process.
  • -1, to indicate an error (no child process is created).

Having mentioned that, let's check our program:

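#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid;

    printf("The initial PID %d\n", (int)getpid());
    fflush(stdout); // Flush now so the child does not inherit buffered output

    pid = fork();

    if (pid == 0) {
        // fork() returned 0: this branch runs in the child
        printf("This is the child process, parent pid=%d\n", (int)getppid());
    } else if (pid > 0) {
        // fork() returned the child's PID: this branch runs in the parent
        printf("This is the parent process, child pid=%d\n", (int)pid);
        wait(NULL); // Wait for the child so it is not orphaned before it prints
    } else {
        // fork() returned -1: no child was created
        perror("Failed to create child process");
        return 1;
    }

    return 0;
}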

After running the program, you'll get output similar to this (the PID values will differ):

The initial PID 56609                                                           
This is the parent process, child pid=56610                                     
This is the child process, parent pid=56609

To break it down, we first print the PID number of the current process using the getpid() function, then we call fork() to create a new process. At the heart of the program, an if-else construct on the return value of fork() distinguishes the parent process from the child process, and the child calls getppid() to retrieve its parent's PID, confirming that it is the same as the initial process.

Threads

If a process is a program in execution, then a thread is a unit of execution within that process. A process can contain more than one thread, and each thread can perform a different task at the same time. The threads of a process share the memory and resources of the containing process.

When a process contains only one thread, it's called a single-threaded process, in contrast to a multi-threaded process, which has more than one thread (limited by the available resources). The main difference is that the former runs sequentially, performing one task at a time, which can make the process slower and less responsive. A multi-threaded process, on the other hand, runs its threads concurrently, so it can work on several tasks at once; on a multi-core CPU, those threads can run in parallel for faster performance.

As you might have guessed, both the multi-process and the multi-thread approaches can achieve concurrency and allow for parallel processing. However, since threads share the memory of their process, creating them and exchanging data between them is generally cheaper than doing the same with separate processes. Multi-threading is not without its disadvantages; for instance, one faulty thread can bring down every other thread in the process.

Just as with processes, you can keep track of the running threads in your Linux system by using the ps command. You already know that a process can have multiple threads, which means that all threads of a process share the same PID number.

To see the threads of the running processes, we'll use the previous command coupled with the -L switch:

ps -eLf
ps with -eLf showing the threads of running processes

We'll also filter that output with grep for a specific process (bouhannana) to demonstrate the multi-threaded concept.

The -L option adds the LWP column (short for Light Weight Process), which shows the thread ID, and the NLWP column (short for Number of LWPs), which shows the number of threads in the underlying process.

Even though the second output shows that every entry for the process named bouhannana_z has the same PID number (2638), each entry has its own thread ID (LWP) number (2638, 2643, 2645, and 2645).

In contrast to a multi-threaded process, you may also notice that in single-threaded processes the PID and LWP numbers are always identical (output 1).

We can't conclude this section without discussing the two approaches to implementing threads. The first is called User Level Threads. In this approach, threads are managed by the program itself (typically through a threading library), without any kernel involvement. In fact, the kernel sees such a process as if it were single-threaded. Regarding performance, this type is simple to manage, and creating or switching threads is fast because no system call is needed.

In the Kernel Level Threads approach, threads are managed and supported by the operating system kernel. This enables the kernel to manage and schedule multiple threads from one process or across multiple processes. In contrast to User Level Threads, this type is more complicated to implement, and creating or switching threads is slower because it goes through the kernel.

How Threads Work in Linux C

The Linux kernel actually treats processes and threads the same way, in that a thread is a process that shares a number of resources with the underlying process: the address space, open files, and more. For thread management, Linux provides the pthread library for C (the p stands for POSIX), which offers a set of functions for creating and managing threads.

For instance, the pthread_create() function creates and starts a new thread inside a process, using this syntax:

int pthread_create(pthread_t *thread,
                   const pthread_attr_t *attr,
                   void *(*myfunction)(void *),
                   void *arg);

The pthread_create() function creates a lightweight process (thread) that executes the function named myfunction with the argument arg and the attributes attr. The attributes allow specifying the size of the stack, priority, scheduling policy, etc. They are set through the pthread_attr_*() family of functions, such as pthread_attr_setstacksize(); passing NULL for attr selects the default attributes.

Let's create a program that demonstrates the creation of a thread in Linux/GNU C and prints the PID number seen by both the main thread and the new thread:

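#include <pthread.h>
#include <stdio.h>
#include <unistd.h> // getpid(), sleep()

// This function will get executed by the new thread
void *myThreadFunc(void *arg) {
    (void)arg; // Unused
    printf("PID of new thread = %d\n", (int)getpid()); // Prints the PID seen by the new thread
    while (1)
        sleep(1); // Stay alive so the thread can be inspected

    return NULL;
}

int main(void) {
    pthread_t thread; // Declares a variable to store the thread ID

    printf("PID of main func = %d\n", (int)getpid()); // Prints the PID of the main thread
    if (pthread_create(&thread, NULL, myThreadFunc, NULL) != 0) { // Creates the new thread
        fprintf(stderr, "Failed to create thread\n");
        return 1;
    }
    while (1)
        sleep(1); // Stay alive so the process can be inspected

    return 0;
}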

The program must be compiled with the -lpthread option so that it links against the pthread library:

gcc -o mythreadp mythreadp.c -lpthread

After running the program you'll get:

PID of main func = 3360
PID of new thread = 3360

Since both main() and myThreadFunc() run indefinitely (while(1)), we can closely observe the details of their execution. As expected, the underlying process (main) and the newly created thread have the same PID number. Additionally, pthread_create() returns 0 if the thread was created successfully and an error number otherwise, which is worth noting in case you want to add conditional constructs.

Differences between processes and threads

In the Linux world, both processes and threads are used to accomplish tasks. They are fundamental concepts not just in Linux but in most operating systems and many programming languages. As discussed before, although they share similarities, such as achieving concurrency, there are significant differences between the two that can affect system design and program efficiency.

Let's check the major differences between processes and threads summarized in this table:

Process | Thread
Each process has its own separate memory and resources allocated to it | Each thread shares the same memory and resources allocated with the underlying process
Communication between processes is slow being that it requires mechanisms such as pipes, redirections, and sockets | Communication between threads is fast being that they share the same memory assignment and resources
Processes are heavy-weight so they take more resources to create and manage | Threads are light-weight so they require fewer resources to create and manage
Processes are isolated from each other, so a faulty process will not affect other processes | Threads are not isolated from each other, one faulty thread can affect the underlying process

By and large, processes are better suited for tasks that require isolation and security. In contrast, threads are better suited for tasks that require more efficient and faster communication.

Conclusion

In conclusion, understanding the differences between processes and threads is essential for writing efficient and effective programs. For instance, threads give an efficient way to implement parallelism and concurrency, whereas processes give a suitable way for executing separate tasks independently.
Ultimately, mastering the use of both can help you improve the performance and scalability of a program.
