Threads
Context and brief history
Before threads as we know them today, operating systems already supported concurrency using multiple processes and hardware interrupts. We’re talking roughly about the period from the late 1950s to the 1980s, before the standardization of POSIX Threads (pthreads) in the 1990s.
Early systems like IBM OS/360 (1967), MULTICS (1969), and later UNIX Version 6 (1975) implemented multiprogramming: several independent processes could appear to run at once by time-slicing the CPU. Each process had its own address space and execution context. The kernel saved and restored the CPU state when switching between them. This gave the illusion of parallelism on a single-core CPU.
At the hardware level, CPUs of that era executed one instruction stream per processor core, in program order, but still overlapped work using mechanisms like interrupts and direct memory access (DMA). Interrupts allowed external devices or timers to temporarily suspend execution to handle events, and DMA let peripherals read or write memory without CPU intervention — limited forms of hardware concurrency.
Threads were introduced later (1980s onward, formalized by POSIX in the early 1990s) to provide fine-grained, low-overhead concurrency within a single process. Unlike processes, threads share memory, file descriptors, and other resources directly, making communication and synchronization much cheaper.
Why threads? What threads? Some concepts first
A thread is a single sequence of execution within a process — the smallest schedulable unit of work in an operating system. Each thread has its own register state, program counter, and stack, but shares the process’s address space, open files, and other resources.
The idea behind threads came from a practical need: reduce overhead and improve responsiveness. Before threads, creating or switching between processes required a full context switch — saving and restoring all CPU state and changing the active memory mapping. This was relatively expensive, especially on systems that handled many concurrent tasks (for example, window managers, network servers, or database engines).
Threads make these cases cheaper because they:
- Avoid duplicating address spaces.
- Avoid heavy inter-process communication (IPC) costs — all threads see the same memory.
- Allow better latency hiding, e.g., one thread waiting on I/O while another continues computation.
In short, threads were introduced to provide lightweight concurrency inside a single process, where tasks need to cooperate closely and share data freely. They bridge the gap between full isolation (processes) and pure event-driven or asynchronous code by letting multiple control flows coexist in the same memory context.
At the kernel level, a thread is not magic. On Linux and most modern Unix-like systems, a “thread” is simply a task created with specific sharing flags (clone() with CLONE_VM, CLONE_FILES, and similar). Each thread still has its own kernel stack and scheduling entry, but they all map to the same address space. The kernel scheduler doesn’t treat them differently from processes — only the sharing semantics differ.
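To make that concrete, here is a minimal sketch of creating a thread-like task with clone() directly. This is illustrative only: real pthread_create() (NPTL) passes more flags and manages stacks with mmap().

#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

static int shared = 0;

static int child_fn(void *arg) {
    shared++;   /* the parent sees this: the address space is shared */
    return 0;
}

int main(void) {
    const size_t stack_size = 1024 * 1024;
    char *stack = malloc(stack_size);
    /* CLONE_VM shares the address space, CLONE_FILES the fd table;
       SIGCHLD lets the parent use waitpid() as usual. The stack top is
       passed because the stack grows downward on common architectures. */
    pid_t tid = clone(child_fn, stack + stack_size,
                      CLONE_VM | CLONE_FILES | SIGCHLD, NULL);
    waitpid(tid, NULL, 0);
    printf("shared = %d\n", shared); /* prints 1 */
    free(stack);
    return 0;
}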
From a system design perspective, threads trade safety for performance:
- They are cheaper to create and switch between.
- They allow direct memory sharing.
- But a bug in one thread (e.g., invalid pointer, race condition) can corrupt the entire process.
TL;DR: A thread is a lightweight execution context within a process. It exists to make fine-grained parallelism and responsive design practical without the cost of full process isolation.
Processes vs Threads
In UNIX-like systems, both processes and threads represent independent flows of execution. The difference is what they share and what they isolate.
A process is an instance of a running program with its own:
- Address space (private virtual memory)
- Global and static variables
- Heap and stack
- File descriptor table (copied from the parent at fork(), but independent afterward)
- Signal handlers, PID, and other kernel metadata
A thread, on the other hand, is a separate execution context within a process. It has:
- Its own stack (for function calls, locals, return addresses)
- Its own register state and thread-local storage (TLS)
- But it shares everything else with its sibling threads:
  - Global and static variables
  - Heap memory
  - Open file descriptors
  - Signal dispositions (in most POSIX models)
  - The process ID (getpid() returns the same value for all threads)
Example: Global Variables and Shared Memory
This is one of the key behavioral differences:
- In processes, global variables are copied on fork().
After a fork(), the parent and child each have their own independent copy of every variable. Modifying a global variable in the child does not affect the parent, because they live in separate address spaces.
#include <unistd.h>

int counter = 0;

int main(void) {
    if (fork() == 0)
        counter++; // child changes its own copy
    else
        counter--; // parent changes its own copy
}
- In threads, all globals are shared.
Changing a global variable from one thread is visible to all threads in that process immediately.
#include <pthread.h>
#include <stdio.h>

int counter = 0;

void *worker(void *arg) {
    counter++;
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);
    pthread_join(t, NULL);
    printf("%d\n", counter); // will print 1
    return 0;
}
Because both the main thread and the worker thread share the same address space, counter is common to both.
This is why threads require synchronization primitives like:
- mutexes
- condition variables
- atomic operations
… to prevent race conditions on shared data.
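As a quick sketch of the first of these: extending the earlier counter example to two workers, a mutex makes the increments safe (without it, the final count is unpredictable because counter++ is not atomic):

#include <pthread.h>
#include <stdio.h>

int counter = 0;
pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg) {
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&counter_lock);   /* enter critical section */
        counter++;
        pthread_mutex_unlock(&counter_lock); /* leave critical section */
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("%d\n", counter); /* 200000 with the lock; unpredictable without it */
    return 0;
}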
Deadlocks
A deadlock occurs when two or more threads are waiting indefinitely for each other to release resources. The key property: cyclic waiting. Once a cycle exists, no thread can proceed.
For example:
pthread_mutex_t a = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t b = PTHREAD_MUTEX_INITIALIZER;

void *t1(void *arg) {
    pthread_mutex_lock(&a);
    pthread_mutex_lock(&b); // blocks if t2 already holds b
    /* ... */
    return NULL;
}

void *t2(void *arg) {
    pthread_mutex_lock(&b);
    pthread_mutex_lock(&a); // blocks if t1 already holds a
    /* ... */
    return NULL;
}
Both threads acquire one lock and wait forever for the other.
Avoiding deadlocks usually means enforcing lock ordering: always acquire locks in a globally consistent order (e.g., alphabetically, by address, or by ID).
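One way to enforce this, sketched below with a hypothetical lock_pair() helper, is to order locks by address:

#include <pthread.h>
#include <stdint.h>

/* lock_pair() is a hypothetical helper: it always takes the lower-address
   mutex first, so every thread agrees on a global acquisition order and
   no wait cycle can form. */
static void lock_pair(pthread_mutex_t *x, pthread_mutex_t *y) {
    if ((uintptr_t)x < (uintptr_t)y) {
        pthread_mutex_lock(x);
        pthread_mutex_lock(y);
    } else {
        pthread_mutex_lock(y);
        pthread_mutex_lock(x);
    }
}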
Other mitigations include:
- Use trylock() variants and timeouts.
- Keep critical sections minimal.
- Prefer lock-free algorithms when possible.
Spinlocks
A spinlock is the simplest lock type. Instead of putting a thread to sleep while waiting, it spins — repeatedly checking if the lock is available. This wastes CPU cycles but avoids context-switch overhead, so it’s useful when:
- Critical sections are short (microseconds).
- The lock is used inside the kernel, where sleeping isn’t allowed (e.g., interrupt handlers).
In pthreads: pthread_spin_lock() / pthread_spin_unlock().
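A minimal usage sketch (unlike mutexes, spinlocks have no static initializer and must be initialized at runtime):

#include <pthread.h>

pthread_spinlock_t s;

void enter_short_section(void) {
    pthread_spin_lock(&s);
    /* very short critical section: a few instructions at most */
    pthread_spin_unlock(&s);
}

int main(void) {
    pthread_spin_init(&s, PTHREAD_PROCESS_PRIVATE);
    enter_short_section();
    pthread_spin_destroy(&s);
    return 0;
}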
Typical kernel spinlock implementation pattern:
while (__atomic_test_and_set(&lock, __ATOMIC_ACQUIRE))
    ; // busy-wait until the holder releases the lock
/* ... critical section ... */
__atomic_clear(&lock, __ATOMIC_RELEASE);
TL;DR: Use spinlocks only for very short, non-blocking critical sections; otherwise, they degrade throughput.
Reader/Writer Locks
A reader/writer lock (also called a shared/exclusive lock) allows:
- Multiple readers to hold the lock simultaneously.
- A single writer to hold it exclusively.
Useful when reads are frequent and writes are rare, e.g., configuration tables, routing caches, metadata lookups.
POSIX provides pthread_rwlock_t:
pthread_rwlock_t rw = PTHREAD_RWLOCK_INITIALIZER;
pthread_rwlock_rdlock(&rw);  // shared read access
/* ... read shared data ... */
pthread_rwlock_unlock(&rw);
pthread_rwlock_wrlock(&rw);  // exclusive write access
/* ... modify shared data ... */
pthread_rwlock_unlock(&rw);
Internally, the implementation tracks a reader count and a pending-writer flag. Whether readers or writers get priority varies by implementation; many provide a writer-preference mode to avoid writer starvation.
TL;DR: Use reader/writer locks to improve read-heavy workloads; do not use them if writes are frequent — they’ll serialize anyway.
Locks are coordination primitives that protect shared state. Deadlocks arise from incorrect locking order; spinlocks trade CPU for latency; reader/writer locks trade simplicity for concurrency in read-heavy code. Every real system uses all three at some point, depending on the performance and correctness balance required.
Thread Control
Threads are not abstract entities. They’re concrete execution contexts managed by the kernel scheduler and created by system calls (clone() on Linux). From user space, we control threads primarily through the POSIX Threads (pthreads) API. Let’s cover their full lifecycle and map each function to what the kernel actually does.
Creating threads. The API:
#include <pthread.h>
int pthread_create(pthread_t *restrict thread,
const pthread_attr_t *restrict attr,
void *(*start_routine)(void *),
void *restrict arg);
A thread terminates in one of three standard ways:
- Normal return from the start routine.
- Explicit call to pthread_exit(void* retval).
- Another thread calls pthread_cancel().
When a thread terminates:
- Its return value or exit code is stored until another thread joins it.
- Kernel deallocates the kernel stack and other thread-specific structures.
- If it’s the last thread in the process, the entire process exits.
Joining threads: pthread_join() waits for another thread to finish. It blocks until the target thread terminates, then retrieves its exit status (if any). The joining thread is responsible for cleanup; detached threads (see below) are cleaned up automatically by the runtime.
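A minimal sketch tying creation, termination, and joining together (the worker returns a heap-allocated result; the joiner retrieves and frees it):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

void *worker(void *arg) {
    int *result = malloc(sizeof *result);
    *result = *(int *)arg * 2;
    pthread_exit(result);            /* equivalent to: return result; */
}

int main(void) {
    pthread_t t;
    int input = 21;
    void *ret;
    pthread_create(&t, NULL, worker, &input);
    pthread_join(t, &ret);           /* blocks until worker terminates */
    printf("%d\n", *(int *)ret);     /* prints 42 */
    free(ret);
    return 0;
}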
A detached thread cleans up automatically when it exits — no need for pthread_join().
pthread_detach(tid);
Internally, this sets the thread’s joinable flag to false. When it terminates, the runtime frees all resources immediately. Once detached, you can’t join it later — the handle becomes meaningless after exit.
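A thread can also be created detached from the start via attributes (a short sketch, reusing a trivial worker start routine):

#include <pthread.h>
#include <unistd.h>

void *worker(void *arg) {
    /* background work; resources are reclaimed automatically on return */
    return NULL;
}

int main(void) {
    pthread_attr_t attr;
    pthread_t t;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    pthread_create(&t, &attr, worker, NULL); /* no pthread_join() needed */
    pthread_attr_destroy(&attr);
    sleep(1); /* crude: keep the process alive long enough for the worker */
    return 0;
}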
For cancellation: pthread_cancel() requests that another thread exit. It doesn’t forcibly kill the target; it posts a cancellation request, and the target acts on it only at “cancellation points” — e.g. read(), pthread_testcancel(), sleep().
Rule of thumb: Never rely on cancellation for cleanup. Use explicit signals or condition variables.
Thread IDs come in two forms: a POSIX thread ID (pthread_t), valid only within its process, and a kernel TID (thread ID), visible via gettid() or /proc/self/task/.
getpid() returns the same value for all threads in a process (because it’s the same process).
gettid() returns a unique kernel ID per thread — used by the scheduler and visible in ps -eLf.
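A small demo of the two IDs (gettid() is exposed by glibc 2.30+; on older systems, use syscall(SYS_gettid) instead):

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

void *worker(void *arg) {
    printf("worker: pid=%d tid=%d\n", getpid(), gettid()); /* same pid, new tid */
    return NULL;
}

int main(void) {
    pthread_t t;
    printf("main:   pid=%d tid=%d\n", getpid(), gettid()); /* here tid == pid */
    pthread_create(&t, NULL, worker, NULL);
    pthread_join(t, NULL);
    return 0;
}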
Scheduling and Priorities: Threads are scheduled the same way as processes. The kernel scheduler treats every thread as a task with its own priority and timeslice.
struct sched_param param = { .sched_priority = 10 };
pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
TL;DR: A thread’s lifetime is managed entirely in user space via pthread_* APIs, but its existence and scheduling are handled by the kernel as a normal task. Threads are cheap to create, but not free — each one still has a kernel stack, scheduler entry, and system call overhead.
Linux kernel context
In Linux, there is no such thing as a “thread object” separate from a process.
Everything is represented by the same structure: struct task_struct
The task_struct is the central kernel representation of a schedulable entity — process or thread. Below is a reduced version showing only the fields relevant to user-space threads.
struct task_struct {
/* scheduling / execution context */
pid_t pid; /* unique kernel thread ID */
pid_t tgid; /* thread group ID (process ID) */
struct mm_struct *mm; /* memory descriptor (NULL for kernel threads) */
struct files_struct *files; /* open file table */
struct fs_struct *fs; /* filesystem info: cwd, root */
/* signal handling */
struct signal_struct *signal; /* shared signal handlers */
/* thread group leader and siblings */
struct task_struct *group_leader;
/* kernel stack pointer, architecture context etc */
struct thread_info thread_info;
/* scheduling linkage, priority etc */
/* ... many other members omitted ... */
};
This structure represents a schedulable entity — what the kernel calls a task.
User-space pthread_create() eventually calls the system call: clone(flags, stack, ptid, ctid, tls)
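For reference, the clone() flag set glibc’s NPTL passes for a new thread is roughly the following; the exact combination is version-dependent, so treat it as illustrative:

#define _GNU_SOURCE
#include <sched.h>

/* Roughly the clone() flag set NPTL uses for new threads (version-dependent;
   illustrative, not copied verbatim from glibc): */
const int nptl_like_flags = CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND
                          | CLONE_THREAD | CLONE_SYSVSEM | CLONE_SETTLS
                          | CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID;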
“Everything is a thread”
Linux doesn’t maintain separate code for “process creation” and “thread creation.”
All user tasks are created by clone().
The only thing that changes is the degree of resource sharing.
That’s why the kernel community often says:
“Everything in Linux is a thread; fork() is just a clone() with fewer flags.”
This unification simplifies scheduling, accounting, and namespace management — all tasks are equal citizens in the scheduler, regardless of whether they’re “threads” or “processes.”
Threads in GUI applications
Graphical applications often use threads to keep the user interface responsive. The main thread runs the event loop (handling input, drawing, etc.), while worker threads perform blocking or long-running tasks such as file I/O, network requests, or background computation.
The typical model is this:
- Main/UI thread – owns all GUI state and must never block.
- Worker threads – handle heavy or asynchronous work, then notify the main thread (via signals, message queues, or atomic flags).
Most GUI toolkits — GTK, Qt, Cocoa, Win32 — enforce that all UI operations must occur in the main thread.
This prevents race conditions in windowing subsystems that are not thread-safe.
So, GUI applications use threads for isolation of work, not for parallel rendering. Responsiveness first, throughput second.
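A toolkit-agnostic sketch of that notification pattern (real toolkits wrap this machinery, e.g. GLib’s g_idle_add() or Qt’s queued connections): the worker writes a byte to a pipe that the event loop watches.

#include <poll.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static int notify_pipe[2]; /* [0]: read end (event loop), [1]: write end (worker) */

void *worker(void *arg) {
    /* ... long-running or blocking work ... */
    char done = 1;
    write(notify_pipe[1], &done, 1); /* wake the event loop */
    return NULL;
}

int main(void) {
    pthread_t t;
    pipe(notify_pipe);
    pthread_create(&t, NULL, worker, NULL);

    struct pollfd pfd = { .fd = notify_pipe[0], .events = POLLIN };
    poll(&pfd, 1, -1);               /* event loop: block until an event arrives */
    char msg;
    read(notify_pipe[0], &msg, 1);   /* worker finished: update UI state here */

    pthread_join(t, NULL);
    return 0;
}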
Virtual threads
“Virtual threads” refer to threads implemented mostly in user space, not directly as kernel tasks.
For example:
- Go goroutines
- Java Project Loom
- Rust’s async tasks (using executors)
They multiplex many user-space execution contexts over a smaller pool of kernel threads.
The kernel only sees the physical threads; scheduling between virtual ones happens in user land.
But why?
- Lower creation and context-switch cost.
- Scales to hundreds of thousands of concurrent tasks.
- Fits workloads dominated by I/O wait (not CPU).
So… why not?
- No true parallelism without multiple kernel threads.
- Blocking system calls stall the entire physical thread.
- Complex runtime scheduler and stack management.
Conclusion
It’s all trade-offs — user-space scheduling vs kernel scheduling, scalability vs transparency. There’s no universally perfect model. Choose the best tool for the job based on requirements and objectives.
Resources
[1] W. R. Stevens, S. A. Rago, Advanced Programming in the UNIX Environment, 3rd ed., Addison-Wesley Professional, 2013.
[2] M. Kerrisk, The Linux Programming Interface: A Linux and UNIX System Programming Handbook, No Starch Press, 2010.
[3] R. Love, Linux Kernel Development, 3rd ed., Addison-Wesley Professional, 2010.
[4] Lawrence Livermore National Laboratory, “POSIX Threads Tutorial,” [Online]. Available: https://hpc-tutorials.llnl.gov/posix/