Binary Layout

All addresses on this page apply to ptxas v13.0.88 (CUDA 13.0). Other versions will differ.

ptxas v13.0.88 is a 37,741,528-byte stripped x86-64 ELF executable. Its .text section spans 26.2 MB (0x403520--0x1CE2DE2) and contains 40,185 functions. This page maps each region of the binary to the subsystem that owns it, derived from the 30 sweep reports (p1.01--p1.30) that together cover the complete address range.

ELF Section Map

| Section | Address | Size | Notes |
| --- | --- | --- | --- |
| .plt | 0x402C00 | 2,336 B (146 stubs) | Procedure linkage table for libc/libpthread imports |
| .text | 0x403520 | 26,212,546 B (26.2 MB) | All executable code -- 40,185 functions |
| .rodata | 0x1CE2E00 | 7,508,368 B (7.5 MB) | Read-only data: encoding tables, strings, DFA tables |
| .eh_frame_hdr | 0x240BF90 | 358,460 B (350 KB) | Exception handling frame index |
| .eh_frame | 0x2664A60 | 3,751,640 B (3.7 MB) | Unwinding data for 40K functions |
| .gcc_except_table | 0x29F8938 | 940 B | C++ exception filter tables |
| .ctors | 0x29F8CE8 | 104 B (12 entries) | Static constructor table |
| .data.rel.ro | 0x29F8D60 | 4,256 B | Vtable pointers, resolved at load time |
| .got.plt | 0x29FA000 | 1,184 B (148 entries) | Global offset table for PLT |
| .data | 0x29FA4A0 | 14,032 B (13.7 KB) | Initialized globals: function pointers, defaults |
| .bss | 0x29FDB80 | 85,864 B (83.9 KB) | Zero-init globals: knob tables, TLS keys, mutexes |
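
The addresses and sizes in the section table can be spot-checked against each other: where two sections are laid out back to back, the start address plus the size of the first should equal the start address of the second. The sketch below does that arithmetic for a few adjacent pairs, using only values copied from the table. The helper name `gap_after` is made up for this illustration, and reading the 16-byte gap before .bss as alignment padding is an assumption.

```python
# (section, start, size in bytes, next section, next start)
# Every number is copied from the ELF section table above.
ADJACENT = [
    (".plt", 0x402C00, 2_336, ".text", 0x403520),
    (".rodata", 0x1CE2E00, 7_508_368, ".eh_frame_hdr", 0x240BF90),
    (".eh_frame", 0x2664A60, 3_751_640, ".gcc_except_table", 0x29F8938),
    (".got.plt", 0x29FA000, 1_184, ".data", 0x29FA4A0),
    (".data", 0x29FA4A0, 14_032, ".bss", 0x29FDB80),
]


def gap_after(start: int, size: int, next_start: int) -> int:
    """Bytes between the end of one section and the start of the next."""
    return next_start - (start + size)


for name, start, size, nxt, nxt_start in ADJACENT:
    gap = gap_after(start, size, nxt_start)
    print(f"{name} ends at {start + size:#x}, {gap} B before {nxt}")
```

A gap of zero means the two sections are contiguous; a small positive gap is most likely padding inserted to satisfy the alignment of the following section.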

Total file composition:

| Component | Size | Percentage |
| --- | --- | --- |
| .text | 26.2 MB | 69.4% |
| .rodata | 7.5 MB | 19.9% |
| .eh_frame + .eh_frame_hdr | 4.0 MB | 10.7% |
| .data + .bss + other | 0.1 MB | 0.3% |

Program Headers

| Segment | VirtAddr | MemSiz | Flags | Contents |
| --- | --- | --- | --- | --- |
| LOAD 0 | 0x400000 | 32.4 MB | R E | .text + .rodata + headers + .eh_frame_hdr |
| LOAD 1 | 0x2664A60 | 3.7 MB | RW | .eh_frame + .data + .bss + .got |
| GNU_RELRO | 0x2664A60 | 3.6 MB | R | Read-only after relocation (.eh_frame through .data.rel.ro) |
| GNU_EH_FRAME | 0x240BF90 | 350 KB | R | Exception handling index |
| GNU_STACK | 0x0 | 0 | RW | Non-executable stack |

Entry point: 0x42333C (ELF e_entry), which is inside .text (the CRT startup stub _start). The actual main is at 0x409460.
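
For readers who want to confirm the entry point on their own copy, the following is a minimal sketch that reads `e_entry` from a 64-byte ELF64 header using the standard field layout. It assumes a little-endian 64-bit file, which matches an x86-64 executable. The function name `read_elf64_header` and the synthetic `demo` header are illustrative only; the demo bytes are constructed here and are not taken from ptxas.

```python
import struct

# Standard ELF64 header, little-endian, 64 bytes in total:
# e_ident[16], e_type, e_machine, e_version, e_entry, e_phoff, e_shoff,
# e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx
ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")


def read_elf64_header(header: bytes) -> dict:
    """Return the entry point and program header table size of an ELF64 file."""
    fields = ELF64_EHDR.unpack_from(header)
    ident = fields[0]
    # EI_CLASS == 2 means ELFCLASS64, EI_DATA == 1 means little-endian.
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("not a little-endian ELF64 header")
    e_entry, e_phentsize, e_phnum = fields[4], fields[9], fields[10]
    return {"entry": e_entry, "phdr_bytes": e_phentsize * e_phnum}


# Synthetic header for demonstration: ET_EXEC (2), EM_X86_64 (62),
# entry 0x42333C, seven 56-byte program headers.
demo = ELF64_EHDR.pack(
    b"\x7fELF\x02\x01\x01" + b"\x00" * 9,
    2, 62, 1, 0x42333C, 64, 0, 0, 64, 56, 7, 64, 0, 0,
)
info = read_elf64_header(demo)
print(hex(info["entry"]), info["phdr_bytes"])
```

Against a real file, the same function can be fed the first 64 bytes, for example `open(path, "rb").read(64)` with `path` pointing at a local ptxas binary.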

Three Subsystems

The .text section decomposes into three subsystems with distinct coding styles, data structures, and origins:

  .text linear address map (26.2 MB)
  0x403520                 0x67F000        0xC52000                          0x1CE2DE2
  |--- PTX Frontend 2.9 MB ---|-- Ori Optimizer 5.8 MB --|---- SASS Backend 17.6 MB ----|
  |          11%               |          22%             |              67%              |
  |  parsers, validators,      | passes, regalloc,        | encoding handlers, ISel,      |
  |  intrinsics, formatters    | scheduling, CFG analysis  | peephole, codecs, ABI, ELF    |

| Subsystem | Address Range | Size | Functions | Share | Avg Fn Size | Largest Function |
| --- | --- | --- | --- | --- | --- | --- |
| PTX Frontend | 0x403520--0x67F000 | 2.9 MB | ~2,592 | 11% | ~1,170 B | sub_46E000 (93 KB, opcode table builder) |
| Ori Optimizer | 0x67F000--0xC52000 | 5.8 MB | ~11,001 | 22% | ~550 B | sub_926A30 (155 KB decomp, interference graph) |
| SASS Backend | 0xC52000--0x1CE2DE2 | 17.6 MB | ~26,592 | 67% | ~690 B | sub_169B190 (280 KB, master ISel dispatch) |

The backend dominates the binary because SASS instruction encoding is template-generated code: each of the ~4,000 encoding handler functions is a standalone vtable entry, never called directly. The optimizer has the highest function density (many small pass helpers), while the frontend has the largest average function size (complex validators and parsers).
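
Because the three subsystems occupy consecutive address ranges, deciding which one owns a given function address is a sorted-boundary lookup. The sketch below uses the boundaries from the address map above. The function name `subsystem_of` is made up for this example, and treating each range as half-open (start included, end excluded) is an assumption.

```python
from bisect import bisect_right

TEXT_START, TEXT_END = 0x403520, 0x1CE2DE2

# Start address of each subsystem, in ascending order.
BOUNDARIES = [0x403520, 0x67F000, 0xC52000]
NAMES = ["PTX Frontend", "Ori Optimizer", "SASS Backend"]


def subsystem_of(addr: int) -> str:
    """Name the subsystem whose address range contains addr."""
    if not TEXT_START <= addr < TEXT_END:
        raise ValueError(f"{addr:#x} is outside .text")
    # bisect_right counts how many range starts are <= addr.
    return NAMES[bisect_right(BOUNDARIES, addr) - 1]


for addr in (0x46E000, 0x926A30, 0x169B190):
    print(f"sub_{addr:X}: {subsystem_of(addr)}")
```

The three sample addresses are the largest functions listed for each subsystem, so each one should resolve to its own row.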

Complete .text Address Map

The tables below map every address range in the .text section to its subsystem, function count, and key entry points. The data is aggregated from the 30 sweep partitions (p1.01 through p1.30).

PTX Frontend (0x403520--0x67F000, 2.9 MB)

Note on the 0x400000--0x403520 gap. The LOAD segment begins at 0x400000, but the first 13.6 KB before .text contains the ELF header (64 B at 0x400000), program headers (7 entries, 392 B), .interp (28 B, path to ld-linux-x86-64.so.2), .hash / .gnu.hash (symbol hash tables), .dynsym / .dynstr (dynamic symbol table, 146 entries), .gnu.version / .gnu.version_r (symbol versioning), .rela.plt (PLT relocations, 146 entries), and the .plt stub table (2,336 B, 146 stubs at 0x402C00--0x403520). These are standard ELF infrastructure, not ptxas application code. The first ptxas function begins at 0x403520.

| Address Range | Size | Functions | Subsystem | Key Functions |
| --- | --- | --- | --- | --- |
| 0x403520--0x430000 | 178 KB | ~300 | Runtime infrastructure: pool allocator, hash maps, TLS, diagnostics, error reporting, string utilities | sub_424070 (pool alloc, 3809 callers), sub_4280C0 (TLS context, 3928 callers), sub_426150 (hash insert, 2800 callers), sub_42FBA0 (diagnostic emitter, 2350 callers), sub_427630 (MurmurHash3) |
| 0x430000--0x460000 | 200 KB | ~120 | CLI parsing and compilation driver: option registration, argument parser, target configuration, register/resource constraints, Chrome trace JSON parser | sub_446240 (real main, 11 KB), sub_432A00 (option registration, 6 KB), sub_434320 (option parser, 10 KB), sub_43B660 (register constraint calc), sub_439880 (trace JSON parser) |
| 0x460000--0x4D5000 | 470 KB | ~350 | PTX instruction validators: per-opcode semantic checkers for MMA, WMMA, load/store, cvt, atomics, barriers, tensormap, async copy | sub_4B2F20 (general validator, 52 KB), sub_4CE6B0 (Bison parser, 48 KB), sub_4C5FB0 (operand validator, 28 KB), sub_4C2FD0 (WMMA/MMA validator, 12 KB), sub_4A73C0 (tensormap validator, 11 KB) |
| 0x4D5000--0x5AA000 | 872 KB | 581 | PTX instruction text generation: 580 per-opcode formatters that convert internal IR to PTX assembly text, plus a built-in function declaration emitter | sub_5D4190 (formatter dispatch, 13 KB), sub_5FF700 (builtin decl emitter, 34 KB), ~580 formatter functions (avg 1.5 KB each) |
| 0x5AA000--0x67F000 | 874 KB | 628 | Intrinsic infrastructure: 608 CUDA intrinsic handlers, MMA/WMMA/tcgen05 tensor core codegen, SM profile tables (sm_75 through sm_121), special register init, ELF/DWARF finalization, memory space management | sub_5D1660 (608 intrinsics, 46 KB), sub_607DB0 (SM profile hash maps, 14 KB), sub_6765E0 (arch capability constructor, 54 KB), sub_612DE0 (version string) |

Ori Optimizer (0x67F000--0xC52000, 5.8 MB)

| Address Range | Size | Functions | Subsystem | Key Functions |
| --- | --- | --- | --- | --- |
| 0x67F000--0x754000 | 869 KB | ~500 | Mercury SASS backend core: scheduling engine (ReduceReg/DynBatch, 9 reg pressure counters), WAR hazard management, Opex (operand expansion) pipeline, OCG intrinsic lowering, instruction encoding core, Flex DFA scanner, ELF section helpers | sub_688DD0 (scheduler engine, 20 KB), sub_6D9690 (encoding switch, 94 KB), sub_6FC240 (WAR/scoreboard), sub_720F00 (Flex scanner, 64 KB, 552 rules) |
| 0x754000--0x829000 | 872 KB | 1,545 | Knobs infrastructure (1,294 entries) and peephole optimizer class: knob lookup/read/file parsing, PeepholeOptimizer with 7 virtual methods (Init, RunOnFunction, RunOnBB, RunPatterns, SpecialPatterns, ComplexPatterns, SchedulingAwarePatterns), pipeline orchestrator, Mercury operand registration helpers | sub_79B240 (GetKnobIndex), sub_79D070 (ReadKnobsFile), sub_7A5D10 (PeepholeOptimizer), sub_7BD3C0/sub_7BD650/sub_7BE090 (operand registrars), sub_7BD260 (encoding finalize) |
| 0x829000--0x8FE000 | 872 KB | 1,069 | Debug line tables, scheduler core, and HW profiles: ScheduleInstructions pipeline (context setup, priority computation, reverse scheduling, register budget with occupancy optimization), ROT13 SASS mnemonic table, architecture-specific latency/throughput profiles, constant bank naming, peephole/legalization passes, cutlass-aware scheduling heuristics | sub_8BF000--0x8D1600 (ScheduleInstructions), sub_896D50 (ROT13 SASS mnemonics), sub_8F0D00 (HW latency profiles), sub_8F4820 (cutlass heuristics) |
| 0x8FE000--0x9D3000 | 872 KB | 1,090 | Register allocator: fatpoint algorithm core, interference graph builder (155 KB decompiled -- largest non-dispatch function), spill/refill mechanism, live range analysis, retry with reduced register count, memory-to-register promotion, ConvertMemoryToRegisterOrUniform pass | sub_926A30 (interference graph, 155 KB decomp), sub_957160 (fatpoint core), sub_95DC10 (regalloc driver), sub_9714E0 (failure handler + retry), sub_910840 (ConvertMemoryToRegister) |
| 0x9D3000--0xAA8000 | 860 KB | 1,218 | Post-RA pipeline phases: NamedPhases registry (OriPerformLiveDead, OriCopyProp, shuffle, swap1--swap6), DAG/dependency analysis, IR statistics printer (instruction count, reg count, estimated latency, spill bytes, occupancy, throughput), hot/cold split, mbarrier intrinsics, regalloc verification, uninitialized register detection | sub_9F4040 (NamedPhases registry), sub_A3A7E0 (IR stats printer), sub_A0B5E0 (uninitialized reg detector), sub_A9EDB0 (mbarrier/scheduling, 85 KB decomp) |
| 0xAA8000--0xB7D000 | 862 KB | 4,493 | GMMA/WGMMA pipeline optimizer, ISel, and instruction emission: GMMA register allocation, warpgroup sync injection, instruction emission helpers (SASS encoder dispatch), post-scheduling IR statistics, operand legalization, 1,269 tiny vtable dispatchers (~160 bytes each), live range analysis, scheduler-integrated mega-pass | sub_AED3C0 (mega scheduling/ISel pass, 137 KB decomp), sub_AF7DF0/sub_AF7200 (register decode helpers), ~1,269 vtable dispatchers |
| 0xB7D000--0xC52000 | 870 KB | 1,086 | CFG analysis, bitvectors, and IR manipulation: ~390 instruction operand pattern matchers, bitvector dataflow framework (alloc, OR, AND, XOR, clear, iterate), CFG analysis (edge printing, reverse post-order, DOT graph dump), scoreboard and instruction classification, sync analysis | sub_BDC000 (bitvector infra), sub_BDE8B0 (CFG/RPO/DOT), sub_BE2E40 (scoreboard classification), ~390 operand pattern matchers |

SASS Backend (0xC52000--0x1CE2DE2, 17.6 MB)

| Address Range | Size | Functions | Subsystem | Key Functions |
| --- | --- | --- | --- | --- |
| 0xC52000--0xD27000 | 853 KB | 1,053 | PhaseManager (159 phases): phase factory (159-case switch), phase vtable table at off_22BD5C8, default phase ordering table at 0x22BEEA0, 530 encoding table initialization bodies, instruction handler vtable bodies | sub_C60D30 (phase factory), sub_C62720 (PhaseManager constructor), sub_C60D20 (default table pointer), ~530 phase table body functions |
| 0xD27000--0xDFC000 | 853 KB | 592 | SASS encoder table (SM100 Blackwell, set 1): 592 uniform template-generated encoding handlers, each packing operands into a 1,280-bit instruction word at a1+544. Covers 60 opcode classes across 16 format groups. All vtable-dispatched (zero direct callers). | 592 per-variant handlers (avg 1,473 B), sub_7B9B80 (bitfield insert helper) |
| 0xDFC000--0xED1000 | 877 KB | 591 | SASS encoder/decoder (SM100 Blackwell, set 2): 494 encoders translating IR to packed SASS bitfields, plus 97 decoders for the reverse direction (disassembly/validation). All vtable-dispatched. | 494 encoders (0xDFC--0xEB2), 97 decoders (0xEB3--0xED0), sub_E0F370 (largest, 11 KB) |
| 0xED1000--0xFA6000 | 860 KB | 683 | SM100 SASS encoders (set 3): 683 per-variant encoding handlers for 59 SASS opcodes. Each sets opcode ID, loads 128-bit format descriptor via SSE, initializes 10-slot register class map, registers operands, finalizes, extracts bitfields. | 683 template-generated handlers, 128-bit xmmword format descriptors |
| 0xFA6000--0x107B000 | 851 KB | 678 | SM100 SASS encoders (set 4): 587 primary encoders (opcodes 16--372, predicate/comparison/memory/tensor/control flow), plus 91 alternate-form encoders for dual-width or SM-variant instruction encodings. Combined with sets 1--3: 2,544 SM100 encoding handlers total. Six mega dispatch tables. | 587 primary + 91 alternate-form encoders, 6 dispatch tables |
| 0x107B000--0x1150000 | 853 KB | 3,396 | SM100 codec completion: 641 final encoding handlers, 78 object lifecycle and scheduling support functions (FNV-1a hash, instruction construction), 2,095 bitfield accessor functions (machine-generated read/write primitives for the packed encoding format). Seven core extractors handle 1-bit, 2-bit, and multi-bit fields across 192-bit words. | sub_10AFF80 (instruction constructor, 11 KB, 32 params), 2,095 bitfield accessors, 7 core extractors |
| 0x1150000--0x1225000 | 852 KB | 733 | SASS codec (decoders + encoders): both directions of the instruction codec for an older SM target (likely sm_89 Ada Lovelace or sm_90 Hopper). Decoders read 128-bit words and extract fields; encoders pack fields back. Three mega-decoders (29--33 KB each) and two mega-dispatchers (78--104 KB, too large for Hex-Rays). | 3 mega decoders (29--33 KB), 2 mega dispatchers (78--104 KB), 728 of 733 vtable-dispatched |
| 0x1225000--0x12FA000 | 860 KB | 1,552 | Register-pressure scheduling + ISel + encoders: register-pressure-aware instruction scheduling (0x1225--0x1240), instruction selection and emission pipeline (0x1240--0x1254), 982 SASS binary encoders packing operand fields into 128-bit words (0x1254--0x12FA). All encoders vtable-dispatched. | Scheduling at 0x1225--0x1240, ISel at 0x1240--0x1254, 982 encoding handlers |
| 0x12FA000--0x13CF000 | 845 KB | 1,282 | Operand legalization and peephole: 522 per-instruction bit-field encoders (366 KB), 186 peephole pattern matchers (81 KB), 11 operand legalization/materialization functions (40 KB), 38 operand encoding emitters (31 KB), 8 live-range analysis functions (14 KB). | sub_137B790 (operand legalization, 8.5 KB), 186 peephole matchers, 522 encoders |
| 0x13CF000--0x14A4000 | 844 KB | 1,219 | SM120 (RTX 50-series) peephole pipeline: 1,087 instruction pattern matchers (429 KB), one 233 KB master opcode dispatch switch (sub_143C440, 373-case primary switch), 123 instruction encoders (180 KB). Pattern matchers validate opcode, modifiers, and operand types; dispatch rewrites opcode byte and operand mapping. | sub_143C440 (233 KB dispatch, 373-case switch), 1,087 pattern matchers, 123 encoders |
| 0x14A4000--0x1579000 | 852 KB | 606 | Blackwell ISA encode/decode: 332 encoder functions (0x14A4--0x1520) packing SASS bitstreams, 1 dispatcher (vtable router at 0x15209F0), 273 decoder functions (0x1520--0x1578) unpacking bitstreams and validating fields. Encoder state struct is 600+ bytes with 128-bit format descriptor at +8, operand arrays at +24--+143. | 332 encoders, 273 decoders, 1 dispatcher |
| 0x1579000--0x164E000 | 852 KB | 1,324 | SASS encoding + peephole matchers: Zone A has 367 instruction encoders, Zone B has 78 utility/transition functions, Zone C has 469 peephole pattern matchers. All pattern matchers are called from a single 280 KB mega-dispatcher (sub_169B190). | 367 encoders, 469 peephole matchers, 78 utilities |
| 0x164E000--0x1723000 | 873 KB | 899 | ISel pattern matching core: 762 PTX opcode pattern matchers (Zone A), the master dispatch function sub_169B190 at 280 KB / 66K instructions (Zone B -- the single largest function in the binary), 100 encoding table entries, and 36 multi-instruction template expanders. The dispatch tries every matcher, selects the highest-scoring match, and records which SASS expansion template to use. | sub_169B190 (280 KB, 66K insns, 15,870 callees), 762 matchers, 36 template expanders |
| 0x1723000--0x17F8000 | 852 KB | 631 | ISA description database: ~555 SASS instruction format descriptor classes (one per opcode variant), ~316 bitfield layout initializers, ~239 opcode handler vtable entries. Also contains instruction sequence generators (multi-instruction expansions for complex PTX operations), register allocation helpers, and Newton-Raphson approximation templates. 91.8% of functions have zero static callers (vtable-dispatched). | ~555 format descriptor classes, ~316 bitfield initializers, ~239 vtable entries |
| 0x17F8000--0x18CD000 | 852 KB | 1,460 | SASS instruction printer + peephole: Subsystem A (0x17F8--0x181F) implements SASS disassembly rendering via virtual method overrides on a builder/visitor with a 4,080+ byte vtable. Subsystem B (0x1820--0x18CC) is a 231 KB peephole dispatch function (sub_18A2CA0, 54K instructions, 1,330 unique callees). | sub_18189C0 (SASS printer, 45 KB), sub_181B370 (SASS printer, 28 KB), sub_18A2CA0 (231 KB peephole dispatch) |
| 0x18CD000--0x19A2000 | 877 KB | 1,598 | Scheduling + peephole dispatchers: Zone A (275 KB) is the instruction scheduling core (list scheduler, dependency graph, ready queue, register pressure tracking). Zone B (130 KB) contains 318 opcode property/classification tables. Zones C+D (460 KB) contain 888 peephole pattern matchers called from sub_198BCD0 (239 KB, 1,336 unique callees). | sub_198BCD0 (239 KB peephole dispatch), 392 scheduling functions, 318 opcode property tables, 888 pattern matchers |
| 0x19A2000--0x1A77000 | 880 KB | 1,393 | GPU ABI/calling convention + SM89/90 encoders: Zone A (250 KB, 276 functions) implements the NVIDIA GPU calling convention -- parameter register allocation, return address placement, scratch/preserved classification, convergent boundary enforcement, coroutine SUSPEND semantics, uniform register support, per-SM ABI lowering (sm_35 through sm_100+). Zone B (480 KB) has ~1,117 supplementary SASS encoding vtable handlers. | sub_19D1AF0 (master ABI setup, 5.6 KB), 276 ABI functions, ~1,117 encoding handlers |
| 0x1A77000--0x1B4C000 | 829 KB | 1,518 | SASS emission backend (4 SM families): Zone A has 1,083 bit-field packing encoders spanning sm_50 through sm_100+. Zone B has 339 instruction lowering/expansion functions (two SM families: sm_8x and sm_9x/10x). Zone C has 84 Ampere/Ada/Hopper-era encoders. Zone D has 92 Blackwell-era encoders. | sub_1B6B250 (register-class-to-HW mapping, 254 callers), 1,083 emitters, 339 lowering functions |
| 0x1B4C000--0x1C21000 | 876 KB | 1,974 | SASS emission + format descriptors: register-class encoding tables (Zone A), per-SM instruction bit-field encoders (Zone B), instruction emission orchestrators (Zone C), multi-operand dispatch emitters (Zone D), mirrored SM-variant emitters (Zone E), instruction format descriptors (Zone F, 0x1C05--0x1C21). | 487 functions exceed 2 KB decompiled |
| 0x1C21000--0x1CE2DE2 | 776 KB | 1,628 | Library layer: custom ELF emitter (CUBIN output), capsule Mercury ELF (.nv.capmerc debug metadata), section layout and memory allocation (shared/constant/local/global), relocation resolution (branch targets, UFT/UDT, YIELD-to-NOP), call graph analysis (recursion detection, dead function elimination), DWARF debug generation (.debug_info/.debug_line/.debug_frame), option parsing library, thread pool (pthread-based), JSON builder, GNU Make jobserver client, C++ name demangler (Itanium ABI), ELF file writer | sub_1C9F280 (ELF emitter, 97 KB decomp), sub_1CABD60 (section allocator, 67 KB), sub_1CC9800 (EIATTR builder, 90 KB), sub_1CDC780 (demangler, 93 KB), sub_1CB53A0 (ELF world init), sub_1CD48C0 (relocation resolver, 22 KB), sub_1CBB920 (recursion detector), sub_1CB18B0 (thread pool), sub_1CD13A0 (file writer, 11 KB) |
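
Many of the encoder rows above describe handlers that pack operand fields into a 128-bit instruction word with the help of a bitfield insert routine, and decoders that extract the same fields again. The sketch below shows the generic operation on a Python integer. It is not the recovered signature or behavior of sub_7B9B80; the function names and the field offset and width used in the demo are hypothetical.

```python
WORD_BITS = 128  # instruction word width used by most encoder rows above


def insert_field(word: int, offset: int, width: int, value: int) -> int:
    """Return word with bits [offset, offset + width) replaced by value."""
    if offset < 0 or width <= 0 or offset + width > WORD_BITS:
        raise ValueError("field does not fit in the instruction word")
    if value < 0 or value >> width:
        raise ValueError("value does not fit in the field")
    mask = ((1 << width) - 1) << offset
    return (word & ~mask) | (value << offset)


def extract_field(word: int, offset: int, width: int) -> int:
    """Return the unsigned value stored in bits [offset, offset + width)."""
    return (word >> offset) & ((1 << width) - 1)


# Hypothetical layout: a 9-bit opcode at bit 12 and an 8-bit register at bit 64.
word = insert_field(0, 12, 9, 0x123)
word = insert_field(word, 64, 8, 0x2A)
print(hex(extract_field(word, 12, 9)), hex(extract_field(word, 64, 8)))
```

Clearing the field before writing it means a handler can overwrite a default value from the format descriptor without disturbing neighboring fields.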

.rodata Contents (7.5 MB)

The .rodata section at 0x1CE2E00--0x240BF8F accounts for 19.9% of the binary by size. Its dominant consumers:

| Content | Estimated Size | Notes |
| --- | --- | --- |
| SASS encoding format descriptors | ~3.5 MB | 128-bit xmmword constants loaded via SSE by ~4,000 encoding handlers |
| Flex DFA transition tables | ~600 KB | off_203C020, the 552-rule PTX scanner's state machine |
| Bison parser tables | ~400 KB | LALR(1) action/goto tables for the PTX grammar |
| Error/diagnostic format strings | ~300 KB | 30,632 strings extracted from the binary |
| Phase ordering + vtable tables | ~100 KB | Default 159-entry phase table at 0x22BEEA0, vtable table at off_22BD5C8 |
| ROT13-encoded string tables | ~200 KB | PTX opcode names (~900 entries), knob names (~2,000 entries) |
| Architecture capability tables | ~150 KB | Per-SM feature maps (sm_75 through sm_121), HW latency profiles |
| DWARF name tables | ~50 KB | DW_FORM_*, DW_AT_*, DW_OP_* string tables |
| Hash constants + misc | ~2.2 MB | MurmurHash3 mixing constants, lookup tables, padding |

.bss Contents (84 KB)

| Content | Notes |
| --- | --- |
| ROT13 PTX opcode name table | Populated by ctor_003 (0x4095D0, 17 KB) at startup |
| General OCG knob table | Populated by ctor_005 (0x40D860, 80 KB) -- ~2,000 entries |
| Mercury scheduler knob table | Populated by ctor_007 (0x421290, 8 KB) -- 98 entries |
| Thread-local storage keys | pthread_key_t for per-thread context (280-byte struct) |
| Global pool allocator mutex | pthread_mutex_t at pool struct offset 7128 |
| Diagnostic suppression bitmaps | Per-warning-ID suppression flags |
| SM architecture profile objects | Constructed on demand per sub_6765E0 |
| Global error/warning counters | Incremented by sub_42FBA0 |
| Make jobserver state | Atomic state machine (0=init, 5=no MAKEFLAGS, 6=no auth, 7=failed) |

.data Contents (14 KB)

| Content | Notes |
| --- | --- |
| Function pointer tables | Exit wrapper (off_29FA4B0), error handler dispatch |
| Default option values | Populated by sub_432A00 (option registration) |
| Static string table pointers | Version strings, format strings |
| Diagnostic output tables | Severity prefix strings: "error ", "warning ", "info ", "fatal " |

Static Constructors

The .ctors section holds 12 entries executed before main. The four largest are:

| Constructor | Address | Binary Size | Purpose |
| --- | --- | --- | --- |
| ctor_001 | 0x4094C0 | 204 B | Thread infrastructure: pthread_key_create, mutex init, thread priority range |
| ctor_003 | 0x4095D0 | 17,007 B | PTX opcode name table: ~900 ROT13-encoded opcode mnemonics |
| ctor_005 | 0x40D860 | 80,397 B | General OCG knob table: ~2,000 ROT13-encoded knob names + hex defaults |
| ctor_007 | 0x421290 | 7,921 B | Mercury scheduler knob table: 98 ROT13-encoded scheduler knobs |

The remaining 8 constructors handle memory allocator pool initialization, hash map infrastructure setup, diagnostic system initialization, and architecture vtable factory registration (sub_1CCD900).
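
Since the opcode and knob name tables are stored ROT13-encoded, strings pulled from the binary need one decoding step before they are readable. A minimal sketch using Python's built-in `rot_13` codec follows. The sample strings are constructed for illustration and are not quoted from the binary.

```python
import codecs


def rot13(text: str) -> str:
    """Apply ROT13; letters rotate by 13, digits and punctuation pass through."""
    return codecs.decode(text, "rot_13")


# Illustrative inputs only, not strings extracted from ptxas.
for encoded in ("SNQQ", "ZBI.H32"):
    print(encoded, "->", rot13(encoded))
```

ROT13 is its own inverse, so the same function also produces the encoded form needed when searching the binary for a known name.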

Mega-Functions (>50 KB binary)

| Address | Binary Size | Decompiled | Function | Callees |
| --- | --- | --- | --- | --- |
| sub_169B190 | 280 KB | N/A | Master ISel pattern dispatch (66K instructions) | 15,870 |
| sub_198BCD0 | 239 KB | N/A | Peephole dispatch, SM variant 2 | 1,336 |
| sub_143C440 | 233 KB | N/A | SM120 peephole dispatch (373-case switch) | ~1,100 |
| sub_18A2CA0 | 231 KB | N/A | Peephole dispatch, SM variant 1 | 1,330 |
| sub_6D9690 | 94 KB | N/A | Instruction encoding switch | ~500 |
| sub_46E000 | 93 KB | N/A | PTX opcode-to-handler table builder | 1,168 |
| sub_40D860 | 80 KB | N/A | ctor_005: general knob registration | ~2,000 |
| sub_720F00 | 64 KB | N/A | Flex DFA scanner (552 rules) | ~50 |

These eight functions account for 1.2 MB of code (4.8% of .text) but only 0.02% of the function count.
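
The master ISel dispatch (sub_169B190) is described on this page as trying every matcher, keeping the highest-scoring match, and recording which expansion template to use. The sketch below illustrates that selection loop in isolation. Everything in it is hypothetical: the matcher functions, the dictionary used as an instruction, the scores, and the template IDs are invented for the example, and keeping the earliest matcher on a tie is an assumption rather than observed behavior.

```python
from typing import Callable, List, Optional, Tuple

# A matcher returns (score, template_id) on success, or None on mismatch.
Matcher = Callable[[dict], Optional[Tuple[int, int]]]


def match_add_generic(insn: dict) -> Optional[Tuple[int, int]]:
    """Hypothetical low-priority matcher for any add."""
    return (1, 10) if insn.get("opcode") == "add" else None


def match_add_immediate(insn: dict) -> Optional[Tuple[int, int]]:
    """Hypothetical higher-priority matcher for add with an immediate."""
    if insn.get("opcode") == "add" and insn.get("imm"):
        return (5, 11)
    return None


def select_template(insn: dict, matchers: List[Matcher]) -> Optional[int]:
    """Try every matcher and return the template of the best-scoring match."""
    best: Optional[Tuple[int, int]] = None
    for matcher in matchers:
        result = matcher(insn)
        # Strict comparison keeps the earliest matcher when scores tie.
        if result is not None and (best is None or result[0] > best[0]):
            best = result
    return None if best is None else best[1]


MATCHERS = [match_add_generic, match_add_immediate]
print(select_template({"opcode": "add", "imm": True}, MATCHERS))
```

An exhaustive scan like this costs one call per matcher per instruction, which fits the very high callee count reported for the dispatch function.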

Most-Called Functions

| Address | Callers | Identity |
| --- | --- | --- |
| sub_4280C0 | 3,928 | Thread-local context accessor (pthread_getspecific) |
| sub_42BDB0 | 3,825 | Fatal OOM handler (called from every allocation site) |
| sub_424070 | 3,809 | Pool memory allocator (alloc) |
| sub_426150 | 2,800 | Hash map insert/update |
| sub_42FBA0 | 2,350 | Central diagnostic message emitter |
| sub_4248B0 | 1,215 | Pool memory deallocator (free) |
| sub_42CA60 | 298 | Linked list prepend |
| sub_42D850 | 282 | Hash set insert |
| sub_1B6B250 | 254 | Register-class-to-hardware-number lookup (SASS emission) |
| sub_4279D0 | 185 | String prefix match (starts_with) |

The top five functions are all in the runtime infrastructure region (0x403520--0x430000). Together they represent the core allocation, error handling, and data structure layer that the rest of the binary depends on.

Binary Composition by Purpose

Estimated from function classification across 30 sweep reports (p1.01--p1.30). Each function was assigned to a single purpose category based on its dominant behavior; functions straddling categories (e.g., a scheduling pass that also emits SASS) are attributed to the category consuming the larger share of their code.

| Purpose | Estimated Size | Share of .text |
| --- | --- | --- |
| SASS instruction encoding/decoding | ~12 MB | 46% |
| Optimization passes + scheduling | ~5 MB | 19% |
| Peephole pattern matching + dispatch | ~3 MB | 12% |
| Frontend: parsing + validation | ~2 MB | 8% |
| ISel pattern matching + templates | ~1.5 MB | 6% |
| Infrastructure: allocator, hash, ELF, debug | ~1.5 MB | 6% |
| GPU ABI + calling convention | ~0.7 MB | 3% |

The single largest consumer of code space is SASS instruction encoding. Each SM architecture generation requires its own set of per-opcode encoding/decoding handler functions. With support for SM75 through SM121 (six major generations), this yields approximately 4,000 encoding handlers, each a standalone function averaging 1,400 bytes.