Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Binary Layout

This page is the primary orientation map for navigating the nvlink v13.0.88 binary in IDA Pro. It documents every ELF segment, the complete .text address-range breakdown across all 20 sweep regions, subsystem composition, the five mega-hub instruction selectors, and the import table. If you are opening the binary for the first time, start here.

ELF Overview

nvlink is a dynamically linked, fully stripped x86-64 ELF binary. All C++ symbols have been removed. The binary imports only C-level symbols from glibc, pthreads, and libdl -- no std::, no __cxa_, no vtable symbols are visible in the import table. This confirms that the C++ runtime is statically linked (or that all C++ was compiled with hidden visibility and LTO).

PropertyValue
File size~37 MB
Architecturex86-64, little-endian
LinkingDynamic (glibc, pthreads, libdl, libm)
StrippedYes -- all internal symbols removed
Build IDcuda_13.0.r13.0/compiler.36424714_0
Build dateWed Aug 20, 2025
VersionCuda compilation tools, release 13.0, V13.0.88
Tool identityNVIDIA (R) Cuda linker
Functions40,532 total
Imports156 dynamic symbols (C-only, no C++ mangled names)

ELF Segment Map

SegmentStartEndSizePerms
ELF header + LOAD0x4000000x402EE012.0 KBR-X
.init0x402EE00x402EF824 BR-X
.plt0x402F000x4038C02.4 KBR-X
.text0x4038C00x1D3217225.2 MBR-X
.fini0x1D321740x1D3218214 BR-X
.rodata0x1D321A00x2463BB07.2 MBR--
.eh_frame_hdr0x2463BB00x24BC244354 KBR--
.eh_frame0x26BC9680x2A599383.6 MBRW-
.gcc_except_table0x2A599380x2A59CE4940 BRW-
.ctors0x2A59CE80x2A59D50104 BRW-
.dtors0x2A59D500x2A59D6016 BRW-
.data.rel.ro0x2A59D600x2A5AE004.2 KBRW-
.got0x2A5AFE00x2A5AFF824 BRW-
.got.plt0x2A5B0000x2A5B4F01.2 KBRW-
.data0x2A5B5000x2A5F18015.1 KBRW-
.bss0x2A5F1800x2A77DD899.1 KBRW-
.tls0x2A77DE00x2A77DF824 B---
extern (imports)0x2A77DF80x2A782D81.2 KB---

The .text section dominates the binary at 25.2 MB. The .rodata section (7.2 MB) holds instruction encoding constant tables (xmmword templates, operand descriptor arrays, opcode lookup tables), string literals, and read-only data structures.

Binary Composition

The single most important structural fact about nvlink is its size distribution. The binary is a hybrid linker-compiler where the actual linker is a small minority of the code.

ComponentApprox. Size% of .textFunction CountRole
Embedded ptxas backend~15 MB~60%~20,000PTX-to-SASS compiler: ISel, regalloc, scheduling, encoding
Instruction encoding tables~7 MB~28%~8,000Per-SM SASS encoder/decoder/descriptor functions
Linker core~1.2 MB~5%~600ELF merge, symbol resolution, relocation, layout, output
LTO orchestration~1.0 MB~4%~500IR collection, libnvvm dlopen, PTX compilation dispatch
Infrastructure~0.5 MB~2%~400Memory arenas, option parsing, hash tables, compression
Mercury / FNLZR~0.3 MB~1%~200SM100+ capsule mercury post-link transformation

Approximately 95% of the binary is compiler backend. Only about 5% is linker logic. Most functions you encounter during reverse engineering are instruction selection patterns, SASS encoders, or register allocator internals -- not linker code.

Visual Memory Map

The following ASCII diagram shows the complete virtual address layout of nvlink v13.0.88. Each row is proportional to its byte size. The .text section is expanded into all 16 functional zones. The rightmost column shows the dominant subsystem and approximate function count.

Full ELF Virtual Address Layout

 Address         Section / Region                                       Size
 ──────────────────────────────────────────────────────────────────────────────
 0x400000       ┌──────────────────────────────────────────────────┐
                │  ELF Header (64 B) + Program Headers (392 B)    │   12.0 KB
                │  .interp / .hash / .dynsym / .dynstr / .rela    │
 0x402EE0       ├──────────────────────────────────────────────────┤
                │  .init (24 B)                                    │   24 B
 0x402F00       ├──────────────────────────────────────────────────┤
                │  .plt  (156 stubs)                               │   2.4 KB
 0x4038C0       ├══════════════════════════════════════════════════╡
                │                                                  │
                │              .text  (25.2 MB)                    │
                │       40,532 functions -- see breakdown below    │
                │                                                  │
 0x1D32172      ├══════════════════════════════════════════════════╡
                │  .fini (14 B)                                    │   14 B
 0x1D321A0      ├──────────────────────────────────────────────────┤
                │  .rodata  (string tables, encoding templates,    │   7.2 MB
                │   xmmword constants, operand descriptor arrays,  │
                │   ROT13 SASS mnemonics, lookup tables)           │
 0x2463BB0      ├──────────────────────────────────────────────────┤
                │  .eh_frame_hdr                                   │   354 KB
 0x24BC244      ├──────────────────────────────────────────────────┤
                │  (hole -- LOAD segment alignment padding)        │   2.0 MB
 0x26BC968      ├──────────────────────────────────────────────────┤
                │  .eh_frame  (unwinding data for 40K functions)   │   3.6 MB
 0x2A59938      ├──────────────────────────────────────────────────┤
                │  .gcc_except_table                               │   940 B
 0x2A59CE8      ├──────────────────────────────────────────────────┤
                │  .ctors (104 B) + .dtors (16 B)                  │   120 B
 0x2A59D60      ├──────────────────────────────────────────────────┤
                │  .data.rel.ro  (vtable pointers)                 │   4.2 KB
 0x2A5AFE0      ├──────────────────────────────────────────────────┤
                │  .got (24 B) + .got.plt (1.2 KB)                 │   1.2 KB
 0x2A5B500      ├──────────────────────────────────────────────────┤
                │  .data  (initialized globals: option defaults,   │   15.1 KB
                │   function pointers, dispatch tables)            │
 0x2A5F180      ├──────────────────────────────────────────────────┤
                │  .bss   (zero-init: knob tables, TLS keys,       │   99.1 KB
                │   option vars, SM dispatch hash maps, mutexes)   │
 0x2A77DD8      ├──────────────────────────────────────────────────┤
                │  .tls (24 B) + extern/imports (1.2 KB)           │   1.2 KB
 0x2A782D8      └──────────────────────────────────────────────────┘
                   Total mapped: ~43 MB virtual, ~37 MB on disk

.text Section Expanded -- 16 Functional Zones

The 25.2 MB .text section decomposes into 16 zones. The diagram uses one line per major zone, with bar width proportional to byte size. The [====] bars indicate relative size (1 = per ~100 KB).

  0x4038C0                                                             0x1D32172
  |                         .text  (25.2 MB, 40,532 functions)                 |
  |                                                                            |
  Zone  Start      End        Size   Bar                      Identity
  ───── ────────── ────────── ────── ──────────────────────── ────────────────────
   1    0x4038C0   0x470000   430 KB [====]                   Linker core: main,
                                                              CLI, merge, layout
  ───── ────────── ────────── ────── ──────────────────────── ────────────────────
   2    0x470000   0x530000   768 KB [=======]                Finalize, arch compat,
                                                              debug info, LTO entry,
                                                              encoding infra
  ───── ────────── ────────── ────── ──────────────────────── ────────────────────
   3    0x530000   0x620000   960 KB [=========]              IR primitives +
                                                              SM50-7x ISel +
                                                              MercExpand engine
  ───── ────────── ────────── ────── ──────────────────────── ────────────────────
   4    0x620000   0x7A0000   1.5 MB [===============]        SM100+ Blackwell
                                                              SASS encoders (1,068)
  ───── ────────── ────────── ────── ──────────────────────── ────────────────────
   5    0x7A0000   0x920000   1.5 MB [===============]        SM100+ encoders tail
                                                              (469) + InstrDesc
                                                              init table (670)
  ───── ────────── ────────── ────── ──────────────────────── ────────────────────
   6    0x920000   0xA70000   1.3 MB [=============]          InstrDesc table cont.
                                                              (943) + NVInst
                                                              accessors + operand
                                                              dispatch switches
  ───── ────────── ────────── ────── ──────────────────────── ────────────────────
   7    0xA70000   0xB80000   1.1 MB [===========]            Multi-arch codec:
                                                              field queries,
                                                              SM90/100 encoders
  ───── ────────── ────────── ────── ──────────────────────── ────────────────────
   8    0xB80000   0xCA0000   1.1 MB [===========]            SM7x + SM80 codecs,
                                                              Turing/Ampere
                                                              enc/dec tables
  ───── ────────── ────────── ────── ──────────────────────── ────────────────────
   9    0xCA0000   0xDA0000   1.0 MB [==========]             SM80 ISel backend:
                                                              pattern matchers +
                                                              mega-hub (239 KB)
  ───── ────────── ────────── ────── ──────────────────────── ────────────────────
  10    0xDA0000   0xF16000   1.5 MB [===============]        SM100+ Blackwell
                                                              codec 2nd table:
                                                              encoders + decoders
  ───── ────────── ────────── ────── ──────────────────────── ────────────────────
  11    0xF16000   0x100C000  984 KB [=========]              SM75 ISel backend:
                                                              patterns (276) +
                                                              mega-hub (280 KB)
  ───── ────────── ────────── ────── ──────────────────────── ────────────────────
  12    0x100C000  0x11EA000  1.9 MB [===================]    Shared encoders +
                                                              SM89/90 backend +
                                                              ISel mega-hub (231 KB)
  ───── ────────── ────────── ────── ──────────────────────── ────────────────────
  13    0x11EA000  0x1430000  2.3 MB [=======================]PTX frontend + ISel
                                                              helpers + LTO engine
                                                              + ISel pattern clones
  ───── ────────── ────────── ────── ──────────────────────── ────────────────────
  14    0x1430000  0x16E0000  2.8 MB [============================]
                                                              PTX assembler +
                                                              builtins + CUDA
                                                              compilation backend
  ───── ────────── ────────── ────── ──────────────────────── ────────────────────
  15    0x16E0000  0x1A00000  3.2 MB [================================]
                                                              Scheduler + regalloc
                                                              + SASS opcode tables
                                                              + codegen verify
  ───── ────────── ────────── ────── ──────────────────────── ────────────────────
  16    0x1A00000  0x1D32172  3.2 MB [================================]
                                                              SASS backend + ABI +
                                                              ELF builder + DWARF +
                                                              demangler + knobs
  ───── ────────── ────────── ────── ──────────────────────── ────────────────────

Subsystem Heat Map

The following diagram groups the .text section by subsystem identity rather than address, showing where each logical component's code resides. Non-contiguous ranges for the same subsystem are combined.

  .text subsystem view (25.2 MB)
  ┌──────────────────────────────────────────────────────────────────────────────┐
  │ Linker Core        ░░░                                                │1.2MB│
  │ (main, merge,      ░░░                                                │  5% │
  │  layout, reloc)    ░░░                                                │~150 │
  ├────────────────────┬───────────────────────────────────────────────────┤     │
  │ LTO + Infra   ░░░░░│                                                  │1.5MB│
  │ (libnvvm, arena,   │                                                  │  6% │
  │  knobs, compress)  │                                                  │~100 │
  ├────────────────────┴──┬────────────────────────────────────────────────┤     │
  │ ISel Dispatch          ████████████████████████████                    │5.0MB│
  │ (5 mega-hubs +         ████████████████████████████                    │ 20% │
  │  2,000+ pattern        ████████████████████████████                    │~2500│
  │  matchers)             ████████████████████████████                    │     │
  ├────────────────────────┴──┬───────────────────────────────────────────┤     │
  │ SASS Encoding Tables       ████████████████████████████████████████   │7.0MB│
  │ (per-instruction encoders, ████████████████████████████████████████   │ 28% │
  │  decoders, descriptors)    ████████████████████████████████████████   │~8000│
  ├────────────────────────────┴──┬───────────────────────────────────────┤     │
  │ Operand Dispatch + IR         ███████████                             │1.5MB│
  │ (setOperandField,             ███████████                             │  6% │
  │  getOperandField,             ███████████                             │ ~40 │
  │  NVInst accessors)            ███████████                             │     │
  ├───────────────────────────────┴──┬────────────────────────────────────┤     │
  │ PTX Assembler Frontend           ████████████████                     │3.0MB│
  │ (parsers, validators,            ████████████████                     │ 12% │
  │  builtins, handlers)             ████████████████                     │~800 │
  ├──────────────────────────────────┴──┬─────────────────────────────────┤     │
  │ Backend Compiler Core               ██████████████████████████████   │6.0MB│
  │ (scheduler, regalloc, codegen,      ██████████████████████████████   │ 23% │
  │  ABI, ELF builder, DWARF,          ██████████████████████████████   │~800 │
  │  scoreboard, liveness)              ██████████████████████████████   │     │
  └─────────────────────────────────────┴─────────────────────────────────┘
                                   Legend: ░ = linker-specific
                                           █ = embedded ptxas compiler

Cross-reference: The embedded ptxas backend in nvlink is a subset of the standalone ptxas v13.0.88 binary (37.7 MB, 40,185 functions). The standalone ptxas wiki documents the same three-subsystem decomposition (PTX Frontend / Ori Optimizer / SASS Backend) in full detail. The ISel mega-hubs, encoding tables, and register allocator within nvlink correspond to the SASS Backend subsystem of standalone ptxas, while the PTX assembler frontend in nvlink zones 13--14 maps to ptxas's PTX Frontend subsystem. The standalone ptxas includes additional optimization passes (the "Ori Optimizer" at 5.8 MB) that are absent from nvlink's embedded copy, since nvlink relies on libnvvm for IR optimization during LTO.

Per-Zone Statistics

ZoneAddress RangeSizeFunctionsAvg SizeCode DensityDominant Pattern
10x4038C0--0x470000430 KB~1602.7 KBMediumLarge control-flow functions (main=58 KB)
20x470000--0x530000768 KB~4111.9 KBMediumConfig/init/debug infrastructure
30x530000--0x620000960 KB~1,544636 BHighISel patterns (1,293 identical-template)
40x620000--0x7A00001.5 MB~1,0681.4 KBVery high128-bit instruction encoders (template)
50x7A0000--0x9200001.5 MB~1,1391.3 KBVery highEncoders + descriptor initializers
60x920000--0xA700001.3 MB~9541.4 KBVery highDescriptor init + operand dispatch
70xA70000--0xB800001.1 MB~3033.6 KBMediumMulti-arch codec functions
80xB80000--0xCA00001.1 MB~3323.3 KBMediumSM7x/SM80 encode/decode
90xCA0000--0xDA00001.0 MB~4132.5 KBMediumSM80 ISel + mega-hub
100xDA0000--0xF160001.5 MB~1,0881.4 KBVery highBlackwell codec table 2
110xF16000--0x100C000984 KB~3143.1 KBMediumSM75 ISel + mega-hub
120x100C000--0x11EA0001.9 MB~9782.0 KBHighShared encoders + SM89/90
130x11EA000--0x14300002.3 MB~3,310712 BHighISel clones + LTO pipeline
140x1430000--0x16E00002.8 MB~1,0342.8 KBMediumPTX frontend + builtins
150x16E0000--0x1A000003.2 MB~1,1262.9 KBMediumScheduler + regalloc + SASS tables
160x1A00000--0x1D321723.2 MB~1,2392.6 KBMediumBackend + ABI + ELF + DWARF

Code density classification: "Very high" = zones dominated by template-generated functions where every byte is encoding logic with minimal control flow. "High" = many small pattern-matcher or clone functions. "Medium" = mix of large complex functions and small helpers.

Major Function Clusters

The binary contains several distinct function "clusters" -- groups of 50+ functions with identical structure, generated from the same template or macro expansion.

ClusterZone(s)CountTemplate SizeIdentity
SM100+ SASS encoders4, 51,5374--9 KB eachsub_4C28B0-based bitfield insertion, 128-bit instructions
SM100+ SASS decoders106483--8 KB eachMirror of encoder set, binary-to-IR translation
InstrDesc initializers5, 61,6131--3 KB eachOperand count/type/constraint/latency metadata
SM50-7x ISel patterns31,2931--3 KB each(ctx, node, &pattern_id, &priority) template
SM80 ISel patterns92591--3 KB eachSame template, Ampere-specific opcodes
SM75 ISel patterns112761--3 KB eachSame template, Turing-specific opcodes
SM89/90 ISel patterns12~1601--3 KB eachSame template, Ada/Hopper-specific opcodes
ISel parametric clones13~5001--2 KB eachPer-SM-variant copies of shared patterns
Shared encoders12~7501--2 KB eachTemplate-instantiated SASS encoding table entries
Builtin code-template gen14~2503--5 KB eachCUDA builtin lowering functions

These 10 clusters account for approximately 7,286 functions (18% of the total) but consume roughly 15 MB of .text (60% of code bytes). Template-generated code dominates the binary.

Function Size Distribution

Size BucketCount%Notes
< 100 B11,67728.8%IR node accessors, type predicates, identity functions
100 B -- 1 KB20,60150.8%ISel pattern matchers, encoding helpers, small utilities
1 -- 5 KB7,95719.6%Instruction encoders/decoders, operand emitters
5 -- 20 KB2620.65%Pipeline drivers, complex validators, pass entry points
20 -- 100 KB270.07%Main functions, scheduling drivers, mega-dispatchers
100 -- 280 KB80.02%ISel mega-hubs, builtin prototype DB (see below)

The Five ISel Mega-Hubs

Five functions exceed 160 KB each. They are the top-level instruction selector dispatch functions for different SM architecture generations. Each contains a massive jump table that calls hundreds of ISel pattern matchers in sequence, selects the highest-priority match, then dispatches to the corresponding emitter. These functions are too large for Hex-Rays to decompile.

AddressSizeTarget SMSubsystem
sub_FBB810280 KBSM75 (Turing)ISel dispatch -- calls 276+ pattern matchers
sub_126CA30239 KBSM50-7x (Maxwell/Pascal/Volta) sharedISel dispatch for pre-Turing backends
sub_D5FD70239 KBSM80 (Ampere)ISel dispatch for Ampere-class GPUs
sub_119BF40231 KBSM89/90 (Ada/Hopper)ISel dispatch for Ada Lovelace / Hopper
sub_5B1D80204 KBSM50-7x (MercExpand)MercExpand instruction expansion dispatch

Each mega-hub sits at the boundary between ISel pattern matchers (hundreds of small ~3 KB functions) and instruction emitters. The pattern matchers feed (pattern_id, priority) pairs to the hub, which selects the best match and dispatches to an emission function.

Other large functions outside ISel hubs:

AddressSizeIdentity
sub_15B86A0345 KBcuda_builtin_prototype_generator -- 608-case switch on builtin index
sub_147EF50288 KBptx_instruction_semantic_analyzer -- master instruction validator
sub_1487650240 KBptx_statement_processor -- top-level PTX statement handler
sub_146BEC0206 KBptx_load_store_validator -- memory operation validator
sub_15903D0165 KBCUDA builtin code-template generator (SM variant)
sub_1638800118 KBPTX intrinsic lowering dispatch

Complete .text Address Map

The following table maps every region of the .text section to its identified subsystem, based on the 20 sweep reports covering 0x400000 through 0x1D32172. Adjacent regions with identical function are merged where possible.

Linker Core (0x400000 -- 0x530000, ~1.2 MB)

RangeSizeSubsystemFunctionsKey Finding
0x400000--0x4038C014 KBELF header, PLT stubs~156PLT entries for 156 dynamic imports
0x4038C0--0x40980024 KBCRT init, startup~30_start, __libc_start_main trampoline
0x409800--0x427AE0122 KBnvlink main1main(): 58 KB / 1936 lines, file-type dispatch loop
0x427AE0--0x43200041 KBCLI option parsing~10nvlink_parse_options(): 30 KB, ~60 option registrations
0x432000--0x44500076 KBMemory arena, ELF mgmt~40Arena allocator, ELF section/symbol management
0x445000--0x469D60150 KBMerge/layout/relocate~50merge_elf, layout, relocation engine
0x469D60--0x47000025 KBFinalize, output~20Callgraph, dead-code elimination, output write

Finalization + Architecture + JIT Pipeline (0x470000 -- 0x530000, ~768 KB)

RangeSizeSubsystemFunctionsKey Finding
0x470000--0x4717006 KBArchitecture compat~8can_finalize_architecture_check, sm_100/103/110/120/121 maps
0x471700--0x4748F012 KBFinalization orchestrator1nvlink_finalize_object: 78 KB, 460+ locals, key-value config
0x4748F0--0x476FB010 KBLink+finalize entry1Top-level 25-param entry: mercury/capmerc/sass binary kinds
0x476FB0--0x48853093 KBDebug info, line tables~30DWARF-like debug encoding/decoding, file tables
0x488530--0x4BC6F0209 KBKnobs, archive, nvvm~60Knobs infrastructure, .a parsing, libNVVM integration layer
0x4BC6F0--0x4C2A6025 KBLTO + PTX JIT entry~10compile_linked_lto_ir, ptxas_jit_compile entry points
0x4C2A60--0x530000843 KBEncoding infrastructure~40setBitfield, encodeOperand, init helpers -- shared by all SM

IR Node Primitives + SM50-7x ISel (0x530000 -- 0x620000, ~960 KB)

RangeSizeSubsystemFunctionsKey Finding
0x530E80--0x530FD0<1 KBIR node accessors22sub_530FB0 has 31,399 callers -- universal operand accessor
0x530FE0--0x5B1AB0523 KBISel pattern matchers1,293SM50-7x backend: 152 target opcodes, 36 priority levels
0x5B1D80--0x5E4470204 KBMercExpand mega-hub1MercExpand dispatch + CFG analysis + bitvector ops
0x5E4470--0x600260114 KBMercExpand engine~50Node creation, tree walker, register constraint propagation
0x603F60--0x61FA60112 KBSM50 instruction encoders79Per-instruction binary encoding functions

SM100+ (Blackwell) SASS Encoding (0x620000 -- 0x7A0000, ~1.5 MB)

RangeSizeSubsystemFunctionsKey Finding
0x6200A0--0x79FAF01.5 MBBlackwell SASS encoders1,068128-bit instruction encoders for SM100+ ISA

Each function encodes one instruction variant into a 128-bit instruction word. Opcode breakdown: major=1 (ALU/Scalar) 37%, major=2 (Vector/Mem/Ctrl) 63%, across 118 instruction families.

SM100+ Encoding Cont. + Descriptor Table (0x7A0000 -- 0x920000, ~1.5 MB)

RangeSizeSubsystemFunctionsKey Finding
0x7A0000--0x84DD70700 KBSM100+ SASS encoders (tail)469Continuation of Blackwell encoding table
0x84DD70--0x920000750 KBInstrDesc init table670Instruction descriptor initializers -- operand types, latencies

Combined encoding table: 0x620000--0x84DD70 (1,537 encoders). Combined descriptor table starts at 0x84DD70.

Descriptor Table + Operand Dispatch (0x920000 -- 0xA70000, ~1.3 MB)

RangeSizeSubsystemFunctionsKey Finding
0x920240--0xA482901.1 MBInstrDesc init table (cont.)943Covers opcode IDs 15--365+; most common: 18 (IMAD), 56 (FFMA)
0xA49010--0xA4AB104 KBNVInst accessors~30IR instruction class hierarchy, sub_A49150 (30,768 callers)
0xA4AB1011 KBNVInst constructor1Allocates and initializes instruction IR node
0xA4B5E0--0xA4C7C05 KBFNV-1a hash tables4Instruction lookup by hash
0xA5B6B0180 KBsetOperandField dispatch1Giant switch: sets operand fields by opcode class
0xA6222065 KBsetOperandImm dispatch1Giant switch: sets immediate operand values
0xA6590067 KBgetOperandField dispatch1Giant switch: reads operand fields by opcode class
0xA67910141 KBgetDefaultOperandValue dispatch1Giant switch: returns default operand values per opcode

Instruction Codec -- Multi-Arch (0xA70000 -- 0xCA0000, ~2.2 MB)

RangeSizeSubsystemFunctionsKey Finding
0xA709F054 KBField offset query16,491-line switch: (opcode_class, field_id) -> bit_offset
0xA7DE7050 KBField presence query1Mirror of A709F0: returns hasField boolean
0xA87CE0--0xB25D50630 KBPer-opcode encoders~164SM90/100 instruction binary encoders
0xACECF0--0xB77B60700 KBPer-opcode decoders~139Binary-to-IR instruction decoders
0xB80000--0xB9FDE0128 KBMulti-arch utilities~20Register pair validators, operand class computation
0xB9FDE0--0xBC2CC0142 KBSM7x (Volta/Turing) codecs~60Instruction encoders + decoders for SM70/SM75
0xBC3FC0--0xBFEC10236 KBSM75 extended codecs~80Turing-specific instruction encoding variants
0xC00070--0xC2FB60193 KBSM80 (Ampere) codecs~70Ampere instruction encoders
0xC3D540--0xC5097083 KBSM80 decoders~15HMMA tensor core, SHF, memory decoders
0xC7EC90--0xC9EE60131 KBSM86/89 (Ada) codecs~40GA10x / AD10x encoders + decoders

SM80 (Ampere) ISel Backend (0xCA0000 -- 0xDA0000, ~1 MB)

RangeSizeSubsystemFunctionsKey Finding
0xCA0000--0xCDC000240 KBOperand emission + packing137Phase 2: operand emitters for 19 SM80 opcodes
0xCDD5F0--0xCDD690<1 KBSM80 operand predicates15isGPR, isPredicate, isUniform, isImmediate, etc.
0xCE2000--0xD5FD70510 KBISel pattern matchers259SM80 pattern matching functions
0xD5FD70239 KBSM80 ISel mega-hub1Too large for Hex-Rays
0xD9A400--0xDA000023 KBBinary instruction encoding17Phase 3: final SASS binary packing

SM100+ (Blackwell) SASS Codec -- Second Table (0xDA0000 -- 0xF16000, ~1.5 MB)

RangeSizeSubsystemFunctionsKey Finding
0xDA0310--0xE436D0669 KBBlackwell encoders438Format 1: 147, Format 2: 290, Format 3: 1
0xE43C201 KBEncoder dispatch1Routes opcode to specific encoder function
0xE43DC0--0xF15A50847 KBBlackwell decoders648Mirrors encoder set; handles architecture variants
0xEFE6C01 KBDecoder dispatch1Routes opcode to specific decoder function

SM75 (Turing) ISel Backend (0xF16000 -- 0x100C000, ~984 KB)

RangeSizeSubsystemFunctionsKey Finding
0xF16030--0xF160F0<1 KBSM75 operand predicates15Operand type classifiers (register, immediate, etc.)
0xF10080--0xF15A5022 KBSM75 instruction emitters18Phase 2: operand emission for Turing instructions
0xF16150--0xFBB780678 KBISel pattern matchers276SM75 pattern matching -- linear scan architecture
0xFBB810280 KBSM75 ISel mega-hub1Largest function in binary. Too large for Hex-Rays
0xFFFDF0--0x100BBF048 KBPost-ISel emit+encode38Combined match+emit for complex instructions

SM89/90 Backend + Shared Encoders (0x100C000 -- 0x11EA000, ~1.9 MB)

RangeSizeSubsystemFunctionsKey Finding
0x100C000--0x10FFFFF1.0 MBShared instruction encoders~750Template-instantiated SASS encoding table entries
0x1100000--0x1120000128 KBSM89/90 backend driver~30ptxas_main_compilation_driver (65 KB), ELF output gen
0x110495038 KBptxas option parser1~60 option registrations (gpu-name, opt-level, etc.)
0x1112F3065 KBCompilation driver main1Per-module entry: PTX header, codegen callback selection
0x111689060 KBELF output + metadata1CUBIN generation, JSON metadata tree builder
0x1120000--0x119BF40496 KBISel pattern matchers~160SM89/90 instruction selection patterns
0x119BF40231 KBSM89/90 ISel mega-hub1Too large for Hex-Rays
0x11D4680--0x11EA00090 KBScheduler + emission~16Instruction scheduling, register-pressure tracking

PTX Frontend + ISel Helpers (0x11EA000 -- 0x12B0000, ~800 KB)

RangeSizeSubsystemFunctionsKey Finding
0x11EA000--0x126C000520 KBISel pattern-match predicates~160Shared PTX compiler ISel patterns (all SM targets)
0x126CA30239 KBPTX ISel mega-hub1Too large for Hex-Rays
0x12A7000--0x12AD00024 KBPTX type system / operand builders~15Type constructors, operand IR building
0x12AF000--0x12B00004 KBPTX module initializer5Creates module-level PTX compilation context

LTO Compilation Engine (0x12B0000 -- 0x1430000, ~1.5 MB)

RangeSizeSubsystemFunctionsKey Finding
0x12B0000--0x12BA00040 KBPTX operand/type system~20Special registers (%ntid, %laneid, etc.), symbol management
0x12BA000--0x12D000088 KBISel lowering passes~30Instruction lowering for LTO path
0x12D0000--0x12D500020 KBDWARF debug line info gen~5Line table emission for LTO-compiled code
0x12D5000--0x14000001.2 MBISel pattern clones~500Parametric clones per SM variant (sm_5x through sm_10x)
0x1400000--0x1430000192 KBLTO pipeline + ELF emit~20Top-level LTO pipeline, MMA lowering, output

PTX Assembler Frontend (0x1430000 -- 0x15C0000, ~1.6 MB)

RangeSizeSubsystemFunctionsKey Finding
0x1430000--0x144200072 KBPTX version/SM gates~30ptx_version_gate, sm_version_gate helpers
0x1442000--0x146BEC0156 KBInstruction emission handlers~80Per-instruction PTX code generators
0x146BEC0206 KBptx_load_store_validator1Validates ld/st/atom/fence with SM-version checks
0x147EF50288 KBptx_instruction_semantic_analyzer1Master validator: checks all SM version requirements
0x1487650240 KBptx_statement_processor1Top-level PTX statement handler
0x14932E0--0x15B86A0700 KBInstruction handlers + builtins~250Code-template generators for CUDA builtins
0x15B86A0345 KBcuda_builtin_prototype_generator1608-case switch: sm20 through sm10x builtin prototypes

CUDA Compilation Backend (0x15C0000 -- 0x16E0000, ~1.2 MB)

RangeSizeSubsystemFunctionsKey Finding
0x15C0CE015 KBSM dispatch tables1Registers 7 callback pointers per arch (sm_75 through sm_121)
0x15C44D0--0x15CA450348 KBnv.info attribute emitters~10Per-SM EIATTR record generation (largest: 78 KB)
0x15CD6A0--0x15D500030 KBAttribute merging/sorting~5Cross-function attribute merger
0x1610000--0x163FFFF192 KBPTX compilation frontend~40Operand handling, control flow, symbol management
0x1640000--0x165FFFF128 KBCodegen operand lowering~30Atom formatting, offset calculation
0x1660000--0x169FFFF256 KBISel/scheduling + DWARF~40Instruction scheduling, peephole, debug emission
0x16A0000--0x16DFFFF256 KBOCG intrinsic lowering~80builtin_ocg_* handlers, tcmma/tensor operations

PTX Lexer/Parser + Instruction Scheduler + SASS Tables (0x16E0000 -- 0x1850000, ~1.5 MB)

RangeSizeSubsystemFunctionsKey Finding
0x16E0000--0x16E3AB012 KBtcgen05 intrinsic codegen~10SM100 tensor memory address setup, guardrails
0x16E4D60--0x16F600070 KBPTX instruction builder~20Instruction construction, operand insert helpers
0x16F6000--0x1740000296 KBTepid instruction scheduler~50Full instruction scheduling pipeline
0x175D000--0x176800044 KBKnobs/config infrastructure~15Runtime tuning parameters
0x1769000--0x1850000924 KBSASS opcode tables~150SM70-SM120 instruction encoding/emission tables

ROT13-encoded SASS mnemonics: SNQQ=FADD, SZHY=FMUL, VZNQ=IMAD, SRAPR=FENCE, ZREPHEL=MERCURY.

Backend Compiler Core (0x1850000 -- 0x1A00000, ~1.7 MB)

RangeSizeSubsystemFunctionsKey Finding
0x1850000--0x186F000124 KBInstruction scheduling~15ScheduleInstructions (85 KB), ReduceReg, DynBatch, Cutlass
0x1878000--0x189C000144 KBConvertMemoryToRegister~20Shared-memory to register promotion pass
0x189C000--0x18FC000384 KBRegister allocation~40Spilling, SMEM spilling, multi-class regalloc
0x18FC000--0x1920000144 KBsetmaxnreg / CTA-reconfig~20Blackwell+ register budget negotiation
0x1916000--0x1960000296 KBmbarrier + ORI passes~30Instruction lowering, copy propagation, dead-code elim
0x1960000--0x19E0000512 KBCodegen verification~40Uninitialized register detection, rematerialization verify
0x19A0000--0x19C0000128 KBPTX metrics/statistics~15Occupancy estimation, instruction counts, loop analysis
0x19C0000--0x1A00000256 KBScheduling/regalloc guidance~20Guidance output, loop metrics, register validation

SASS Backend + ABI + ELF Builder (0x1A00000 -- 0x1B60000, ~1.4 MB)

RangeSizeSubsystemFunctionsKey Finding
0x1A009C0--0x1A0B1806 KBBug injection framework~5Testing framework for intentional bug injection
0x1A0B180--0x1A2000084 KBInstruction operand analysis~30Operand lowering, constant buffer encoding
0x1A1A000--0x1A2A00064 KBWarp sync / mbarrier~15%%mbarrier_%s_%s instruction generation
0x1A4B000--0x1A6109088 KBWGMMA pipeline analysis~20Warpgroup MMA live ranges, synchronization injection
0x1A61090--0x1A6A48038 KBScoreboard management~10Instruction scheduling scoreboard
0x1A6A480--0x1AA2090352 KBISel/lowering + encoding~80Instruction selection, SASS emission
0x1AA2090--0x1ABF000124 KBRegalloc + ABI~30Register allocation, ABI handling
0x1AEAA90--0x1AEE07014 KBInstruction vtable factory~10SASS instruction vtable construction
0x1AEE070--0x1B0000070 KBMemory pool diagnostics~10Pool tracking, encoding passes
0x1B00000--0x1B20000128 KBRegister liveness~30Interference graph construction
0x1B19750--0x1B40000160 KBMachine scheduling + CFG~40Basic block management, scheduling
0x1B40000--0x1B60000128 KBDependency tracking~30Scoreboard / dependency graph management

Final Segment: ISel + ABI + ELF + Demangler + DWARF (0x1B60000 -- 0x1D32172, ~1.8 MB)

RangeSizeSubsystemFunctionsKey Finding
0x1B60000--0x1B9FFFF256 KBISel + lowering~200PTX-to-SASS ISel, tail-call optimization
0x1BA0000--0x1BFFFFF384 KBABI / calling convention~150Return address mgmt, convergent boundary, coroutine regs
0x1C00000--0x1CDFFFF896 KBELF section builder~120.nv.constant, .nv.shared, .strtab, cubin/fatbin container
0x1CE0000--0x1CEDFFF56 KBC++ name demangler~40Itanium ABI + MSVC demangler, templates, RTTI, lambdas
0x1CF0000--0x1D32172265 KBDWARF + LEB128 + KNOBS~140DW_FORM/TAG/AT generation, SSE-accelerated LEB128, ELF finalize

Import Table

nvlink dynamically imports exactly 156 symbols from glibc, libpthread, and libdl. There are zero C++ mangled symbols in the import table -- the binary is fully stripped with all C++ internals resolved at static link time.

TypeCountExamples
libc (stdio/stdlib/string)112printf, malloc, free, fopen, memcpy, strcmp, getenv
pthread20pthread_create, pthread_mutex_lock, pthread_cond_wait
GCC unwinder12_Unwind_Resume, _Unwind_GetIPInfo, _Unwind_SetGR
math (libm)9sqrt, sin, cos, pow, log, floor, ceil, expf
dl (dynamic loading)3dlopen, dlsym, dlclose

The dlopen/dlsym/dlclose imports are used for loading libnvvm at runtime during LTO compilation. nvlink does not statically link libnvvm -- it discovers and loads it dynamically via the --nvvmpath option.

.rodata Composition

The 7.2 MB .rodata section (0x1D321A0--0x2463BB0) holds:

ContentApprox. SizeDescription
Instruction encoding templates~4 MBxmmword_* constants: 128-bit opcode templates OR'd into instruction words
Operand descriptor tables~1.5 MBPer-instruction operand type/constraint/modifier arrays (10 DWORDs x 3)
String literals~1 MBError messages, option names, ROT13 mnemonics, format strings
Lookup tables~0.5 MBArchitecture dispatch tables, opcode-to-handler mappings
Miscellaneous~0.2 MBAlignment padding, small constant arrays

.bss Layout

The 99 KB .bss section (0x2A5F180--0x2A77DD8) stores runtime state:

  • 0x2A5F2xx--0x2A5F3xx: Global option variables (set by nvlink_parse_options)
  • 0x2A5F330: Input file linked list head
  • 0x2A644B8--0x2A644C0: SM dispatch hash map pointers (7 tables)
  • Remaining: Thread-local storage, compilation state, hash table backing stores

Key Observations for Navigation

  1. Address heuristic: If you are at an address below 0x530000, you are likely in linker-specific code. Above 0x530000, you are almost certainly in the embedded ptxas compiler backend.

  2. Pattern recognition: Functions in the range 0x530FE0--0x5B1AB0, 0x11EA000--0x126C000, and 0xCE2000--0xD5FD70 are ISel pattern matchers. They all follow the identical (ctx, node, &pattern_id, &priority) signature with nested attribute checks.

  3. Encoding tables: The massive contiguous blocks at 0x620000--0x84DD70 and 0xDA0000--0xF15A50 are instruction encoder/decoder tables. Each function is 4--9 KB, follows a rigid template, and differs only in opcode constants and operand layouts. Analyzing one is sufficient to understand all of them.

  4. Four operand dispatch switches: sub_A5B6B0 (180 KB), sub_A62220 (65 KB), sub_A65900 (67 KB), sub_A67910 (141 KB) are the operand read/write dispatch functions. They map (opcode_class, field_id) to specific field accessors. If you see a call to sub_A49150, it routes through these.

  5. The 31K-caller function: sub_530FB0 (get operand by index) is the most-called function in the binary with 31,399 callers. sub_A49150 (get instruction attribute) is second with 30,768 callers. These are the IR accessor layer that every backend function uses.

  6. ROT13 strings: NVIDIA obfuscates SASS instruction mnemonics via ROT13. When you see strings like VZNQ, SZHY, SNQQ, ZREPHEL in .rodata, apply ROT13 to decode: IMAD, FMUL, FADD, MERCURY.