On 64-bit Windows, the base of the CPU's GS segment register points at the current thread's Thread Environment Block (TEB) whenever the thread runs in user mode. This is not a coincidence: Windows sets up this mapping for every thread so that per-thread data can be reached without an expensive system call. Every thread in every Windows process has its own TEB, and GS always points to the running thread's.
Two offsets into the GS segment are particularly important for shellcode development. GS:[0x30] holds the TEB's own base address (a self-referential pointer), and GS:[0x60] holds a pointer directly to the PEB. Both routes arrive at the same PEB address; the choice between them affects how the access looks to certain EDR products, as covered in Module 9.
Going through GS:[0x30] takes two dereferences (read the TEB base, then read [TEB + 0x60]) instead of one. Module 9 treats it as the more resilient path because some EDR products are reported to flag the direct GS:[0x60] access pattern.
In x86 (32-bit) Windows, the FS segment register was used for this purpose. The FS base was the TEB itself, with the TEB self-pointer at FS:[0x18] and the PEB pointer at FS:[0x30]. When x64 arrived, Microsoft moved this role to GS, keeping FS available for other uses (Linux on x64 uses FS for thread-local storage).
Segment registers in x64 work differently than in x86: the visible selector value means less, and what matters is the hidden base address associated with the segment (for GS, the value held in a model-specific register). The CPU maintains this base transparently. When you write mov rax, [gs:0x60], the CPU adds the hidden GS base to 0x60 to produce the virtual (linear) address to read, which is then translated through paging like any other address.
💡 In WinDbg: !teb to see the TEB, !peb to see the PEB. Or dt nt!_TEB @$teb to inspect the full TEB structure.
Two assembly patterns to reach the PEB:
Direct (one dereference):
mov rax, [gs:0x60] ; RAX = PEB directly
Indirect via TEB (two dereferences):
mov rax, [gs:0x30] ; RAX = TEB base
mov rax, [rax+0x60] ; RAX = PEB
The indirect path is used in Module 9's TEB-walk shellcode specifically because some EDR products are said to treat the GS:[0x60] access pattern as a shellcode detection signal. Going through the TEB self-pointer first is a less monitored access pattern.
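To make the equivalence of the two paths concrete, here is a minimal Python sketch that models memory as a dictionary. The addresses TEB_BASE and PEB_BASE are hypothetical values chosen for illustration, and read_gs is an assumed helper that mimics how the CPU adds the GS base to an offset; only the 0x30 and 0x60 offsets come from the text above.

```python
# Hypothetical addresses, chosen only for illustration.
TEB_BASE = 0x000000007FFDE000
PEB_BASE = 0x000000007FFDF000

# Fake memory: maps an address to the 8-byte value stored there.
memory = {
    TEB_BASE + 0x30: TEB_BASE,  # TEB self-pointer
    TEB_BASE + 0x60: PEB_BASE,  # pointer to the PEB
}


def read_qword(address):
    """Return the 8-byte value stored at a fake address."""
    return memory[address]


def read_gs(offset, gs_base=TEB_BASE):
    """Model [gs:offset]: the GS base is added to the offset."""
    return read_qword(gs_base + offset)


def peb_direct():
    """One dereference: mov rax, [gs:0x60]."""
    return read_gs(0x60)


def peb_via_teb():
    """Two dereferences: mov rax, [gs:0x30] then mov rax, [rax+0x60]."""
    teb = read_gs(0x30)
    return read_qword(teb + 0x60)


print(hex(peb_direct()), hex(peb_via_teb()))
```

Both functions return the same value because the GS base and the TEB self-pointer are the same address.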
The TEB (_TEB in Windows internals notation) is a per-thread data structure that Windows maintains in user-mode memory. Every thread has its own TEB. It begins with an NT_TIB block (the oldest part, going back to Win9x) and extends with dozens of fields — most of which we don't care about for shellcode purposes.
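The one field that matters most is ProcessEnvironmentBlock at offset +0x060 (some write-ups call it PebBaseAddress, which is actually the name of a field in PROCESS_BASIC_INFORMATION). This is a pointer to the process-wide PEB. Every thread in a process holds the same value here, pointing to the shared PEB.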
The PEB (_PEB) is a process-wide structure — unlike the TEB, there is only one per process, shared by all threads. It holds everything about the process: the image base address, heap handles, environment variables, command-line arguments, and crucially for us: a pointer to the loader data that tracks every loaded DLL.
We need Ldr at offset +0x018. This is a pointer to a PEB_LDR_DATA structure that the Windows loader populates when the process starts and updates every time a DLL loads or unloads. It contains three doubly-linked lists of loaded modules. We will use one of them to find kernel32.
PEB.Ldr is a pointer to PEB_LDR_DATA, not the structure itself, so reaching the module lists always costs an extra dereference. Think of it as: PEB tells you where the loader info lives, not the info itself.
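The _PEB_LDR_DATA structure contains three doubly-linked lists that track the loaded modules and differ only in ordering. In the x64 layout they are InLoadOrderModuleList at +0x010, InMemoryOrderModuleList at +0x020, and InInitializationOrderModuleList at +0x030. Many shellcode write-ups label the list they read as InMemoryOrderModuleList, but the offsets used throughout this walk (Ldr + 0x10, then DllBase at +0x30 and BaseDllName at +0x58 from the entry pointer) are those of InLoadOrderModuleList, whose link field sits at the very start of each entry. This is the list used in both the classic walk and the KERN-search approach from Module 9.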
Each list is a LIST_ENTRY struct containing two pointers: Flink (forward link, points to next) and Blink (backward link, points to previous). They form a circular doubly-linked list — the last entry's Flink points back to the LIST_ENTRY head inside PEB_LDR_DATA itself.
When we read [Ldr + 0x10] we get the Flink of the list head at that offset (InLoadOrderModuleList in the x64 layout): a pointer to the first _LDR_DATA_TABLE_ENTRY in the list. The _LDR_DATA_TABLE_ENTRY is the per-module structure that holds the DLL base address, the full path, and the base name, which is everything we need to identify a module.
A Flink always points at the matching link field inside the next entry, NOT necessarily at the start of the _LDR_DATA_TABLE_ENTRY. For the list at Ldr + 0x10 the link field is InLoadOrderLinks at entry offset +0x000, so the pointer and the structure base happen to be the same address here. That is why DllBase is at [Flink + 0x30] and BaseDllName.Buffer is at [Flink + 0x60]. Walking InMemoryOrderModuleList (Ldr + 0x20) instead would land on InMemoryOrderLinks at +0x010, and every field offset would shrink by 0x10.
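One way to sanity-check these offsets without a debugger is to mirror the x64 layout in Python's ctypes. The class definitions below are a hand-written sketch of the first fields of the structure as shown by public symbols, not an official header; pointers are modelled as 64-bit integers and the Padding fields are added explicitly so the layout does not depend on the host platform.

```python
import ctypes


class LIST_ENTRY(ctypes.Structure):
    _fields_ = [("Flink", ctypes.c_uint64), ("Blink", ctypes.c_uint64)]


class UNICODE_STRING(ctypes.Structure):
    _fields_ = [
        ("Length", ctypes.c_uint16),         # byte count
        ("MaximumLength", ctypes.c_uint16),
        ("Padding", ctypes.c_uint32),        # alignment gap before the pointer
        ("Buffer", ctypes.c_uint64),         # pointer to UTF-16LE characters
    ]


class LDR_DATA_TABLE_ENTRY(ctypes.Structure):
    # Only the leading fields needed for the walk are modelled here.
    _fields_ = [
        ("InLoadOrderLinks", LIST_ENTRY),
        ("InMemoryOrderLinks", LIST_ENTRY),
        ("InInitializationOrderLinks", LIST_ENTRY),
        ("DllBase", ctypes.c_uint64),
        ("EntryPoint", ctypes.c_uint64),
        ("SizeOfImage", ctypes.c_uint32),
        ("Padding", ctypes.c_uint32),
        ("FullDllName", UNICODE_STRING),
        ("BaseDllName", UNICODE_STRING),
    ]


# Offset of BaseDllName.Buffer relative to the start of the entry.
buffer_offset = (LDR_DATA_TABLE_ENTRY.BaseDllName.offset
                 + UNICODE_STRING.Buffer.offset)

for name in ("InLoadOrderLinks", "InMemoryOrderLinks", "DllBase", "BaseDllName"):
    print(name, hex(getattr(LDR_DATA_TABLE_ENTRY, name).offset))
print("BaseDllName.Buffer", hex(buffer_offset))
```

The printed offsets are relative to the structure base, which equals the Flink value only when the walk follows InLoadOrderLinks.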
All three lists track the loaded modules. The difference is ordering and the offset used to access the entry.
The list at Ldr+0x10 is the conventional choice in this course because its offsets are well known and stable across x64 Windows versions. In the x64 layout that list is InLoadOrderModuleList; InMemoryOrderModuleList (at Ldr+0x20) is also commonly used. The classic "third entry is kernel32" trick relied on load order being predictable: the executable's own entry comes first, then ntdll.dll, then kernel32.dll. This is why older shellcode counted to the third Flink. The KERN-unicode-search approach from Module 9 doesn't rely on order at all, so it works on any of the three lists once the field offsets are adjusted for that list's link field.
💡 To see all three lists live: attach WinDbg to any process and run dt nt!_PEB_LDR_DATA poi(@$peb+0x18)
The BaseDllName field is a _UNICODE_STRING structure, defined as:
typedef struct _UNICODE_STRING {
    USHORT Length;        // +0x00: byte count, NOT char count
    USHORT MaximumLength; // +0x02
    PWSTR  Buffer;        // +0x08: pointer to UTF-16LE chars
} UNICODE_STRING;
Since BaseDllName is at [entry + 0x58], the Buffer pointer is at [entry + 0x58 + 0x08] = [entry + 0x60]. This pointer points to the actual character bytes of the DLL name in UTF-16LE, where "K" is stored as 0x4B 0x00.
- BaseDllName.Length at [entry+0x58]: byte count of the string
- BaseDllName.Buffer at [entry+0x60]: pointer to first char bytes
- Dereference Buffer to get the first 8 bytes: "KERN" in UTF-16LE = 0x004E00520045004B
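The constant in the last bullet can be reproduced with a few lines of Python. This is only a sketch of the encoding arithmetic; first_qword and byte_length are hypothetical helper names, not part of any Windows API.

```python
import struct


def first_qword(name):
    """Return the first 8 bytes of a UTF-16LE string as a little-endian qword."""
    raw = name.encode("utf-16-le")
    return struct.unpack_from("<Q", raw)[0]


def byte_length(name):
    """Value UNICODE_STRING.Length would hold: bytes, not characters."""
    return len(name.encode("utf-16-le"))


print(hex(first_qword("KERNEL32.DLL")))   # first four characters: K, E, R, N
print(byte_length("KERNEL32.DLL"))        # 12 characters, 2 bytes each
```

Because x64 is little-endian, the first character ends up in the lowest byte of the qword, which is why 0x4B ("K") appears at the right-hand end of the constant.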
The complete PEB walk chains six pointer dereferences together. Each step reads a value from the previous step's address. Missing any one step — or getting an offset wrong — produces a garbage pointer. The interactive stepper below walks through every dereference, showing what register changes and what it now points to.
GS:[0x30] gives the TEB's own base address (a self-referential pointer: the TEB points to itself). Reading GS:[0x60] directly gives the PEB. We use the two-step path for EDR resilience.
The complete PEB walk in NASM assembly, combining every step from sections 1–5. This is the code from the Module 9 calc.asm, isolated here so you can study just the walk without the export table search or WinExec call mixed in. Every single instruction is annotated with the register state it produces.
R8 holds the base address of KERNEL32.DLL in the current process. The next step — parsing the PE Export Address Table to resolve a function like WinExec — is covered in the shellcode course (Module 2). The walk output is the input to the EAT parser.
The register state after each step, summarized:
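mov rax,[gs:0x30] ; RAX = TEB base
mov rax,[rax+0x60] ; RAX = PEB
mov rax,[rax+0x18] ; RAX = Ldr (pointer to PEB_LDR_DATA)
mov rsi,[rax+0x10] ; RSI = first list entry (Flink of the list head)
mov rsi,[rsi] (loop) ; RSI = next list entry
mov rbx,[rsi+0x30] ; RBX = DllBase of the matching entry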
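The chain of dereferences can be traced on paper with a small simulation. The Python sketch below builds a fake address space (a dictionary of qwords) and follows the same offsets as the instructions above. Every address, module base, and helper name in it is a hypothetical value chosen for illustration; only the offsets 0x30, 0x60, 0x18, 0x10, 0x30 and 0x60 come from the walk itself.

```python
import struct

TEB, PEB, LDR = 0x1000, 0x2000, 0x3000     # hypothetical addresses
HEAD = LDR + 0x10                          # list head inside PEB_LDR_DATA
MODULES = [                                # (entry address, DllBase, name)
    (0x4000, 0x7FF600000000, "calc.exe"),
    (0x5000, 0x7FFB10000000, "ntdll.dll"),
    (0x6000, 0x7FFB20000000, "KERNEL32.DLL"),
]


def name_qword(text):
    """First four UTF-16LE characters packed into one little-endian qword."""
    return struct.unpack("<Q", text[:4].encode("utf-16-le"))[0]


KERN = name_qword("KERN")


def build_memory():
    mem = {TEB + 0x30: TEB, TEB + 0x60: PEB, PEB + 0x18: LDR}
    prev = HEAD
    for addr, base, name in MODULES:
        mem[prev] = addr                   # previous Flink -> this entry
        mem[addr + 0x30] = base            # DllBase
        mem[addr + 0x60] = addr + 0x100    # BaseDllName.Buffer
        mem[addr + 0x100] = name_qword(name)
        prev = addr
    mem[prev] = HEAD                       # last Flink -> back to the head
    return mem


def walk(mem, gs_base=TEB):
    trace = []
    rax = mem[gs_base + 0x30]; trace.append(("mov rax,[gs:0x30]", rax))
    rax = mem[rax + 0x60];     trace.append(("mov rax,[rax+0x60]", rax))
    rax = mem[rax + 0x18];     trace.append(("mov rax,[rax+0x18]", rax))
    head = rax + 0x10
    rsi = mem[head];           trace.append(("mov rsi,[rax+0x10]", rsi))
    while mem[mem[rsi + 0x60]] != KERN:    # compare first 8 name bytes
        rsi = mem[rsi]
        if rsi == head:                    # wrapped around: not found
            raise LookupError("module not in list")
        trace.append(("mov rsi,[rsi]", rsi))
    rbx = mem[rsi + 0x30];     trace.append(("mov rbx,[rsi+0x30]", rbx))
    return rbx, trace


base, trace = walk(build_memory())
for instruction, value in trace:
    print(f"{instruction:<22} -> {value:#x}")
```

With the three fake modules above, the loop step runs twice before the name matches, so the trace contains seven entries in total.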
R8 is one of the x64 extended registers (R8–R15) that were added in x64 and are not accessible from 32-bit code. Note that R8 is a volatile (caller-saved) register under the Windows x64 calling convention: it carries the third integer argument, and any API call is free to overwrite it. Only RBX, RBP, RDI, RSI, RSP, and R12–R15 are preserved across function calls. Storing the kernel32 base in R8 is still convenient during the export table parsing loop, which makes no calls, so the loop can freely use RAX, RBX, RCX, RDX, and other registers without clobbering the base address. If the base is needed after an API call, it has to be saved somewhere else first.
In the full calc.asm code from Module 9, you'll see R8 being used as the kernel32 base reference throughout the EAT parsing: every RVA-to-VMA conversion does add rxx, r8.
💡 In NASM x64 shellcode without a calling convention to worry about, the choice of register is more about human readability and avoiding instruction-level conflicts than strict ABI compliance. R8–R15 are popular for "save aside" values because they're less likely to be clobbered by incidental code.
Each module list, including the one walked here, is a circular doubly-linked list. The last real entry's Flink points back to the list head field inside _PEB_LDR_DATA itself. If you loop without a termination check, you'll loop forever (or until you hit a module with bytes that match your search by accident).
In practice, the KERN-unicode-search works because KERNEL32.DLL is loaded in every ordinary Win32 process. The loop will find it. But robust code should add a termination check: if RSI returns to the list head address (saved before the loop starts), break with an error.
The int3 at FunctionNameNotFound in the Module 9 shellcode is exactly this: a breakpoint that fires if the search unexpectedly exhausts the list, which would only happen on extremely unusual system configurations.
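The termination check is easy to reason about in a high-level model. The following Python sketch represents the circular list as a dictionary of forward links; find_module and the addresses are hypothetical and exist only to show the "stop when we are back at the head" rule.

```python
HEAD = 0x3010  # hypothetical address of the list head

# Forward links: the head points to the first entry, the last entry to the head.
flink = {HEAD: 0x4000, 0x4000: 0x5000, 0x5000: 0x6000, 0x6000: HEAD}
names = {0x4000: "calc.exe", 0x5000: "ntdll.dll", 0x6000: "KERNEL32.DLL"}


def find_module(prefix):
    """Return the entry whose name starts with prefix, or None if absent."""
    entry = flink[HEAD]
    visited = 0
    while entry != HEAD:            # termination check: back at the list head
        visited += 1
        if names[entry].upper().startswith(prefix.upper()):
            return entry, visited
        entry = flink[entry]
    return None, visited            # list exhausted without a match


print(find_module("KERN"))          # found on the third entry
print(find_module("MISSING"))       # exhausts the list and stops cleanly
```

Without the comparison against HEAD, the second call would follow the links in a circle indefinitely, which is the failure mode described above.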
The classic PEB walk used a different approach: instead of searching for a Unicode name, it counted list position, because the third entry in load order was traditionally kernel32.dll. This relied on a predictable load order that held true for years but is no longer guaranteed.
The offset differences between the two approaches (x64 values):
- Classic (position counting): commonly walks InMemoryOrderModuleList at Ldr+0x20, where the entry pointer lands at InMemoryOrderLinks (+0x10 into the entry), so DllBase is at [entry+0x20]
- Module 9 (KERN search): walks the list at Ldr+0x10 (InLoadOrderModuleList in the x64 layout), where the entry pointer lands at InLoadOrderLinks (+0x00), so DllBase is at [entry+0x30] and BaseDllName.Buffer is at [entry+0x60]
The DllBase offset differs (+0x20 vs +0x30) because the entry pointer from each list lands at a different field within the _LDR_DATA_TABLE_ENTRY structure, and those fields are at different offsets relative to the structure base.
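The relationship between the list chosen and the offsets used is simple subtraction, sketched below in Python. The tables hold the x64 offsets of the link fields and target fields relative to the structure base; offsets_from_flink is a hypothetical helper name.

```python
# x64 offsets relative to the start of _LDR_DATA_TABLE_ENTRY.
LINK_OFFSETS = {
    "InLoadOrderLinks": 0x00,
    "InMemoryOrderLinks": 0x10,
    "InInitializationOrderLinks": 0x20,
}
FIELD_OFFSETS = {"DllBase": 0x30, "BaseDllName.Buffer": 0x60}


def offsets_from_flink(link_name):
    """Field offsets as seen from a Flink that lands on the given link field."""
    delta = LINK_OFFSETS[link_name]
    return {field: offset - delta for field, offset in FIELD_OFFSETS.items()}


for link in LINK_OFFSETS:
    shown = {field: hex(off) for field, off in offsets_from_flink(link).items()}
    print(link, shown)
```

Whichever list a walk follows, the same two fields are read; only the distance from the landing point changes.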
To verify offsets and trace the walk live on any Windows process:
0:000> !peb ; show PEB address
0:000> dt nt!_PEB @$peb ; dump PEB fields
0:000> dt nt!_PEB_LDR_DATA poi(@$peb+0x18) ; dump Ldr
0:000> !list -x "dt nt!_LDR_DATA_TABLE_ENTRY @$extret" poi(poi(@$peb+0x18)+0x10) ; walk all entries
0:000> lm m kernel32 ; verify kernel32 base address
💡 In x64dbg: run the target, then go to the Memory Map tab. You'll see kernel32.dll listed with its base address, which should match the value your shellcode produces.