LEVENT KAYA'S WEBSITE



ptrace(2) is Broken: A Case Study in Debugging the Debugger Gone Wrong, BoltDBG Episode #1

· tags: c, linux, low-level, debuggers, and boltdbg

When I’m working on my C, C++ code I’m always running it from the debugger.
John Carmack

*

The Setup

Like John Carmack, I treat the debugger as a key element of my development environment. Whether I’m writing C, Java, or assembly, the first thing I do is run the code under a debugger and step through it line by line. This gives me a much deeper understanding of what I’ve written.

So I decided to build my own debugger, BoltDBG, a modern cross-platform debugger. Like any debugger, it needs to control other processes: set breakpoints, inspect memory, single-step through code. My first target is Linux, and on Linux that means using ptrace(2).

I edit in Emacs. I compile with CMake. Everything should be straightforward.

It wasn’t. Something broke. Not in production. Not when running standalone. Only when debugging the debugger itself.

This sent me down a rabbit hole that exposed fundamental flaws in how ptrace was designed. Let me show you what I mean.

The “One Tracer” Rule

Here’s the core of my code:

pid_t pid = fork();

if (pid == 0) {
    // Child: make myself traceable
    ptrace(PTRACE_TRACEME, 0, NULL, NULL);
    execvp(target_program, args);
}

// Parent: wait for child to stop
waitpid(pid, &status, 0);

This is textbook process tracing. Fork, child declares “I can be traced,” parent waits for the SIGTRAP stop the child hits at exec. Every debugger does this.

But when I debug my debugger (using GDB to debug BoltDBG which itself uses ptrace), the kernel says no:

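A process can only have one tracer at a time. The child that BoltDBG forks is already traced by GDB, so its own PTRACE_TRACEME request is refused.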
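The refusal is easy to reproduce without GDB. Here is a minimal sketch, assuming Linux with glibc: the child below calls PTRACE_TRACEME twice, so the second request arrives when the process already has a tracer, and ptrace(2) documents EPERM for that case. The helper name second_traceme_errno is invented for this illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the errno of a second PTRACE_TRACEME,
 * 0 if it unexpectedly succeeded, or -1 if the setup itself failed
 * (fork error, or the first PTRACE_TRACEME was already refused). */
int second_traceme_errno(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: the first request makes the parent our tracer. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(255);
        /* The second request is made while a tracer is already set. */
        errno = 0;
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(errno & 0x7f);   /* report errno through the exit code */
        _exit(0);
    }

    /* Parent: the child never execs or receives a signal, so it simply exits. */
    int status = 0;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    int code = WEXITSTATUS(status);
    return code == 255 ? -1 : code;
}
```

Calling second_traceme_errno() from a small main and passing the result to strerror() should show the permission error, provided the environment allows ptrace at all.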

Why This Design is Terrible

Problem 1: Silent Inheritance

When a traced process forks and its tracer has enabled PTRACE_O_TRACEFORK (as debuggers such as GDB do), the child starts out traced as well, but ptrace itself gives you no way to query this.

You can’t ask: “Am I currently being traced?”

Well, you can, but not through ptrace. You have to parse /proc/self/status:

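#include <stdbool.h>  // only needed when compiling as C
#include <stdio.h>
#include <string.h>

// Returns true if /proc/self/status reports a nonzero TracerPid.
bool isDebuggerAttached() {
    FILE* f = fopen("/proc/self/status", "r");
    if (f == NULL) {
        return false;  // /proc unavailable: assume no tracer
    }

    char line[256];
    int tracerPid = 0;
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "TracerPid:", 10) == 0) {
            sscanf(line, "TracerPid: %d", &tracerPid);
            break;
        }
    }
    fclose(f);
    return tracerPid != 0;
}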

This is parsing text files to determine kernel state. The information exists in the kernel—ptrace is enforcing it—but there’s no syscall to query it.

Compare this to any well-designed API:

But ptrace? Go parse /proc.

Problem 2: The API Overload

ptrace has one function with 31+ operations (on x86_64). Here’s the signature:

long ptrace(enum __ptrace_request request, pid_t pid, 
            void *addr, void *data);

The meaning of addr and data changes completely based on request:

// Read data from tracee memory
ptrace(PTRACE_PEEKDATA, pid, address, NULL);

// Write data to tracee memory  
ptrace(PTRACE_POKEDATA, pid, address, data);

// Get register state
ptrace(PTRACE_GETREGS, pid, NULL, &regs);

// Set options
ptrace(PTRACE_SETOPTIONS, pid, NULL, PTRACE_O_TRACEFORK);

// Continue execution
ptrace(PTRACE_CONT, pid, NULL, signal);

Same function. Completely different semantics for each parameter. No type safety. No documentation in the types themselves.

This is the systems programming equivalent of:

// Bad API design from the 1970s
int ioctl(int fd, unsigned long request, ...);

Oh wait, that’s also a real syscall. And it’s also terrible.

Problem 3: Implicit State Machines

Here’s what should happen when tracing a fork:

// Set option to stop on fork
ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_TRACEFORK);

// Continue parent
ptrace(PTRACE_CONT, pid, 0, 0);

// Wait for fork event
waitpid(pid, &status, 0);

// Extract child PID from... somewhere?
// (PTRACE_GETEVENTMSG stores the message as an unsigned long)
unsigned long child_pid;
ptrace(PTRACE_GETEVENTMSG, pid, 0, &child_pid);

The state machine is:

  1. Set option
  2. Continue
  3. Wait returns with special status
  4. Call different ptrace command to get child PID

This is not documented in the types. It’s not enforced by the compiler. You have to memorize the state machine by reading man ptrace (all 1,500 lines of it).
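Step 3 is the part that is easiest to get wrong, because the “special status” is a bit pattern rather than a type. Here is a minimal sketch of decoding it, assuming Linux with glibc and using the encoding that ptrace(2) documents for PTRACE_EVENT stops. The function names is_fork_event and fork_event_child are mine, not part of any library.

```c
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* True if a waitpid() status describes a PTRACE_EVENT_FORK stop.
 * Per ptrace(2), the tracee stops with SIGTRAP and
 * status >> 8 == (SIGTRAP | (PTRACE_EVENT_FORK << 8)). */
bool is_fork_event(int status)
{
    return WIFSTOPPED(status) &&
           (status >> 8) == (SIGTRAP | (PTRACE_EVENT_FORK << 8));
}

/* Hypothetical helper: after a fork event on `pid`, fetch the new child's
 * PID. Returns 0 on success, -1 with errno set by ptrace() on failure. */
int fork_event_child(pid_t pid, pid_t *child)
{
    unsigned long msg = 0;
    if (ptrace(PTRACE_GETEVENTMSG, pid, NULL, &msg) == -1)
        return -1;
    *child = (pid_t)msg;
    return 0;
}
```

A tracer loop would call is_fork_event(status) right after waitpid returns and only then ask for the event message. Nothing in the types prevents calling them in the wrong order.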

Problem 4: Error Handling is Broken

ptrace returns -1 on error and sets errno. Sounds reasonable, except:

// Reading memory: valid return value can be -1!
errno = 0;
long data = ptrace(PTRACE_PEEKDATA, pid, addr, NULL);
if (data == -1 && errno != 0) {
    // Error... maybe?
}

You have to clear errno before calling ptrace to distinguish “legitimate -1” from “error -1”.
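In practice I would hide that dance behind a small wrapper so it cannot be forgotten. A sketch, assuming Linux with glibc; peek_word is a hypothetical name, and the only ptrace request used is PTRACE_PEEKDATA:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Hypothetical wrapper: read one word from the tracee at `addr`.
 * Returns 0 and stores the word in *out, or returns -1 with errno set.
 * The value and the error travel on separate channels, so a word whose
 * bits happen to equal -1 is no longer ambiguous. */
int peek_word(pid_t pid, void *addr, long *out)
{
    errno = 0;                       /* must be cleared before the call */
    long word = ptrace(PTRACE_PEEKDATA, pid, addr, NULL);
    if (word == -1 && errno != 0)
        return -1;                   /* real failure, errno says why */
    *out = word;
    return 0;
}
```

The out-parameter costs one extra argument, but callers can now write a plain `if (peek_word(...) == -1)` check.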

And even the errno values are unreliable. From the man page, on accesses to invalid memory (EFAULT):

Unfortunately, under Linux, different variations of this fault will return EIO or EFAULT more or less arbitrarily.

“More or less arbitrarily.” In the official documentation.

How It Should Have Been Designed

Option 1: Separate Syscalls

Instead of one overloaded function:

// Modern, typed API
int trace_attach(pid_t pid, int flags);
int trace_detach(pid_t pid);
int trace_continue(pid_t pid, int signal);
int trace_step(pid_t pid);

ssize_t trace_read_mem(pid_t pid, void *addr, void *buf, size_t len);
ssize_t trace_write_mem(pid_t pid, void *addr, const void *buf, size_t len);

int trace_get_regs(pid_t pid, struct user_regs_struct *regs);
int trace_set_regs(pid_t pid, const struct user_regs_struct *regs);

int trace_get_status(pid_t pid, struct trace_status *status);

Each function does one thing. The types tell you what’s legal. The compiler helps you.
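Some of this can be approximated today in user space. Here is a rough sketch, assuming Linux with glibc, of three names from the wish-list above implemented as thin wrappers over the real ptrace call. The names are still hypothetical; only the ptrace requests themselves (PTRACE_CONT, PTRACE_SINGLESTEP, PTRACE_DETACH) are real.

```c
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Hypothetical typed wrappers; names follow the wish-list above.
 * Each returns 0 on success or -1 with errno set by ptrace(). */

/* Resume the tracee, delivering `sig` to it (0 means no signal). */
int trace_continue(pid_t pid, int sig)
{
    return ptrace(PTRACE_CONT, pid, NULL, (void *)(long)sig) == -1 ? -1 : 0;
}

/* Execute one instruction in the tracee, then stop again. */
int trace_step(pid_t pid)
{
    return ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL) == -1 ? -1 : 0;
}

/* Detach from the tracee and let it run freely. */
int trace_detach(pid_t pid)
{
    return ptrace(PTRACE_DETACH, pid, NULL, NULL) == -1 ? -1 : 0;
}
```

The casts and NULL placeholders now live in one place, and call sites get real parameter types. What wrappers cannot fix is the kernel side: the state still cannot be queried.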

Option 2: File Descriptor-Based

Like FreeBSD’s procfs or Linux’s modern /proc/pid/mem:

int trace_fd = trace_open(pid, O_RDWR);

// Read/write using normal file operations
pread(trace_fd, &regs, sizeof(regs), OFFSET_REGS);
pwrite(trace_fd, &regs, sizeof(regs), OFFSET_REGS);

// Control via ioctl (still not great, but better than ptrace)
ioctl(trace_fd, TRACE_CONT, 0);
ioctl(trace_fd, TRACE_STEP, 0);

// Query state
struct trace_info info;
ioctl(trace_fd, TRACE_GETINFO, &info);
printf("Is traced: %d\n", info.is_traced);

close(trace_fd);
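Part of this idea already exists on Linux. Here is a rough sketch, assuming a 64-bit Linux system with /proc mounted: /proc/<pid>/mem can be read with plain pread, using the target's virtual address as the file offset. proc_mem_read is a hypothetical helper name, and access to another process is still subject to the ptrace permission check described in proc(5).

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read `len` bytes at `addr` in process `pid`
 * through /proc/<pid>/mem. Returns the number of bytes read, or -1
 * with errno set. Assumes addresses fit in off_t (64-bit Linux). */
ssize_t proc_mem_read(pid_t pid, const void *addr, void *buf, size_t len)
{
    char path[64];
    snprintf(path, sizeof(path), "/proc/%ld/mem", (long)pid);

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    /* The file offset is the virtual address in the target process. */
    ssize_t n = pread(fd, buf, len, (off_t)(uintptr_t)addr);
    int saved_errno = errno;   /* close() may overwrite errno */
    close(fd);
    errno = saved_errno;
    return n;
}
```

Reading your own memory this way, for example proc_mem_read(getpid(), &x, &copy, sizeof x), is a quick way to try it without setting up a tracee.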

Now tracing is a resource with a handle. You can:

Option 3: Modern Async API

What if ptrace worked like io_uring?

struct trace_ring *ring = trace_ring_init(128);

// Queue operations
trace_ring_prep_continue(ring, pid);
trace_ring_prep_read_mem(ring, pid, addr, buf, len);
trace_ring_prep_get_regs(ring, pid, &regs);

// Submit all at once
trace_ring_submit(ring);

// Wait for completion
struct trace_completion *comp;
while ((comp = trace_ring_wait(ring)) != NULL) {
    if (comp->result < 0) {
        fprintf(stderr, "Operation failed: %s\n", strerror(-comp->result));
    }
}

Now you can:

The Workaround I Had to Use

Back to my actual problem. I can’t change ptrace. So I detect being traced and adapt:

void ProcessControl::launchProcess(const std::list<std::string>& targetProcess) {
    pid_t pid = fork();
    
    if (pid == 0) {
        // Child process
        bool skipPtrace = isDebuggerAttached();
        
        if (!skipPtrace) {
            // Normal operation: trace the child
            if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1) {
                perror("ptrace");
                _exit(1);
            }
        } else {
            // Debugging the debugger: skip PTRACE_TRACEME
            // The outer debugger (GDB) will handle tracing
        }
        
        execvp(targetProcess.front().c_str(), /* argv */);
        _exit(1);
    }
    
    // Parent: wait for child
    int status = 0;
    waitpid(pid, &status, 0);
}

This works, but look at what I’m doing:

All because ptrace has no PTRACE_QUERY_STATUS operation.

The Broader Pattern

ptrace isn’t unique. Linux has several syscalls with this “one function, many operations” pattern:

These all share problems:

Compare to Plan 9, where everything is a file:

/proc/123/ctl      # write "stop" or "start"
/proc/123/mem      # read/write memory
/proc/123/regs     # read/write registers
/proc/123/status   # read current state

Or to modern Windows debugging APIs:

BOOL ReadProcessMemory(HANDLE process, LPCVOID addr, 
                       LPVOID buffer, SIZE_T size, SIZE_T *bytesRead);

BOOL WriteProcessMemory(HANDLE process, LPVOID addr,
                        LPCVOID buffer, SIZE_T size, SIZE_T *bytesWritten);

BOOL GetThreadContext(HANDLE thread, CONTEXT *context);

Separate functions. Clear semantics. Type-safe.

What We Can Learn

  1. One function should do one thing. Not one function with 30 modes.

  2. Make state queryable. If the kernel enforces a rule, provide a way to check it without parsing text files.

  3. Use types to encode invariants. ptrace’s void* parameters can mean anything. That’s not flexible, it’s dangerous.

  4. Error handling should be unambiguous. “Returns -1 on error, except when -1 is valid, so check errno, except errno might be wrong” is not acceptable.

  5. Design for composition. File descriptors compose with select/poll/epoll. ptrace doesn’t compose with anything.

Debugging in Emacs

For what it’s worth, debugging this in Emacs with gdb-many-windows was instructive. The UI makes the multi-inferior nature visible:

M-x gdb
(gdb) set detach-on-fork off
(gdb) set follow-fork-mode parent
(gdb) info inferiors
  Num  Description       
* 1    process 12345     
  2    process 12346
(gdb) inferior 2

But the underlying problem remains: ptrace’s API makes this harder than it needs to be.

Conclusion

ptrace works. Billions of debugging sessions rely on it daily. But “works” isn’t the same as “well-designed.”

Every time I write ptrace code, I think: this would be so much cleaner with a modern API. Separate functions. Type safety. Queryable state. Composable primitives.

We’re stuck with ptrace for backward compatibility. But for new systems? We can do better.

The real lesson: API design matters. A bad interface can be technically functional but conceptually broken. And programmers will be working around its limitations for decades.


Building BoltDBG has been an education in systems programming APIs—both the good and the ugly. You can follow the project on GitHub or read more about the design decisions on my blog.

Edited in Emacs, naturally.