Binary Layout

This page is a visual guide to navigating the cicc v13.0 binary in IDA Pro. It covers the ELF structure, section layout, subsystem address ranges, embedded data payloads, and the statically linked jemalloc allocator. If you are opening this binary for the first time, start here to orient yourself before diving into individual subsystems.

ELF Overview

CICC is a statically linked, stripped x86-64 ELF binary. It has no .symtab or .dynsym symbol table, no DWARF debug info, and no export table; every function name was removed at build time. IDA Pro recovers 80,562 functions; Hex-Rays successfully decompiles 80,281 of them (99.65%).

Property	Value
File size	60,108,328 bytes (57.3 MB)
Architecture	x86-64, little-endian
Linking	Fully static (no `.interp`, no PLT/GOT)
Stripped	Yes, all symbol tables removed
Build ID	`cuda_13.0.r13.0/compiler.36424714_0`
Compiler	Built with GCC (inferred from CRT stubs and `.init_array` layout)
Allocator	jemalloc 5.3.x, statically linked (~400 functions)

Because the binary is statically linked, libc, libpthread, and libm are all embedded. This inflates the raw function count but also means every call target resolves to a concrete address within the binary itself -- there are no external dependencies at runtime beyond the kernel syscall interface.
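You can confirm the static-linkage claims before loading the binary into IDA by reading the ELF program headers. A fully static, non-PIE executable has no PT_INTERP segment and no PT_DYNAMIC segment. The sketch below relies only on the standard ELF64 header layout and Python's struct module. `inspect_elf` is a hypothetical helper name, not part of any toolchain.

```python
import struct

PT_DYNAMIC = 2
PT_INTERP = 3
EM_X86_64 = 62

def inspect_elf(data: bytes) -> dict:
    """Report basic linkage facts from a 64-bit little-endian ELF image."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:  # EI_CLASS == ELFCLASS64, EI_DATA == ELFDATA2LSB
        raise ValueError("expected a 64-bit little-endian ELF")
    # Elf64_Ehdr is 64 bytes: e_ident[16] followed by 13 fixed-size fields.
    (_ident, _e_type, e_machine, _version, e_entry, e_phoff, _shoff, _flags,
     _ehsize, e_phentsize, e_phnum, _shentsize, _shnum, _shstrndx) = struct.unpack_from(
        "<16sHHIQQQIHHHHHH", data, 0)
    # p_type is the first 4-byte field of every Elf64_Phdr entry.
    seg_types = [struct.unpack_from("<I", data, e_phoff + i * e_phentsize)[0]
                 for i in range(e_phnum)]
    return {
        "machine_x86_64": e_machine == EM_X86_64,
        "entry": e_entry,                      # address of _start
        "has_interp": PT_INTERP in seg_types,  # dynamic loader requested?
        "has_dynamic": PT_DYNAMIC in seg_types,
    }
```

For example, `inspect_elf(open("cicc", "rb").read())` should report `has_interp` and `has_dynamic` as False for this binary, and `entry` gives the actual address of `_start`.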

Address Space Map

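The binary's .text section spans roughly 0x400000 to 0x3C00000. Within that ~56 MB range, subsystems occupy largely contiguous regions. The boundaries below are approximate, and a few regions nest inside larger ones; for example, the jemalloc core sits inside the PassManager zone. The map below is the primary orientation tool for IDA Pro navigation. The detailed per-zone map that follows it is finer-grained and takes precedence where the two differ.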

```
0x400000 ┌──────────────────────────────────────────────┐
         │  CRT startup + libc stubs                    │  ~52 KB
0x40D000 ├──────────────────────────────────────────────┤
         │  jemalloc stats / vsnprintf                  │  ~80 KB
0x420000 ├──────────────────────────────────────────────┤
         │  (gap: misc libc, math, string ops)          │  ~64 KB
0x430000 ├──────────────────────────────────────────────┤
         │  Global constructors (cl::opt reg)           │  ~1.6 MB
         │  ~1,689 LLVM command-line option objects     │
0x5D0000 ├──────────────────────────────────────────────┤
         │  EDG 6.6 C++ Frontend                        │  3.2 MB
         │  Parser, constexpr evaluator, IL walker      │
0x8F0000 ├──────────────────────────────────────────────┤
         │  CLI / Real Main / NVVM Bridge               │  520 KB
         │  sub_8F9C90 (real main), dual-path dispatch  │
0x960000 ├──────────────────────────────────────────────┤
         │  Architecture detection, NVVM options        │  576 KB
0x9F0000 ├──────────────────────────────────────────────┤
         │  Bitcode reader (parseFunctionBody)          │  ~1 MB
0xAF0000 ├──────────────────────────────────────────────┤
         │  X86 AutoUpgrade (legacy, 457KB fn)          │  ~1 MB
0xBF0000 ├──────────────────────────────────────────────┤
         │  LLVM IR Verifier                            │  500 KB
0xC00000 ├──────────────────────────────────────────────┤
         │  LLVM Support / ADT library                  │  ~3.2 MB
         │  (see detailed sub-map below)                │
0x12D0000├──────────────────────────────────────────────┤
         │  PassManager / NVVM bridge                   │  4.2 MB
         │  Pipeline assembly (sub_12E54A0)             │
0x12FC000├ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─┤
         │  jemalloc core (~400 functions)              │  ~256 KB
0x1700000├──────────────────────────────────────────────┤
         │  Backend / machine passes                    │  8 MB
         │  RegAlloc, Block Remat, Mem2Reg              │
0x1F00000├──────────────────────────────────────────────┤
         │  SelectionDAG                                │  2 MB
         │  LegalizeTypes (348KB), LegalizeOp           │
0x2100000├──────────────────────────────────────────────┤
         │  NVPTX PTX emission                          │  1 MB
0x2340000├──────────────────────────────────────────────┤
         │  New PM / pass registration                  │  768 KB
         │  2,816-line registrar at sub_2342890         │
0x2A00000├──────────────────────────────────────────────┤
         │  Loop passes                                 │  4 MB
         │  LoopVectorize, SLP, Unroll                  │
0x3000000├──────────────────────────────────────────────┤
         │  NVPTX ISel + lowering                       │  7 MB
         │  343KB intrinsic switch (sub_33B0210)        │
0x3700000├──────────────────────────────────────────────┤
         │  Machine-level passes (tail)                 │  ~3 MB
         │  BlockPlacement, Outliner, StructurizeCFG    │
0x3A00000├──────────────────────────────────────────────┤
         │  (trailing code, CRT finalization)           │
         └──────────────────────────────────────────────┘

DATA SECTIONS:
0x3EA0080   Embedded libdevice bitcode (Path A)    456 KB
0x420FD80   Embedded libdevice bitcode (Path B)    456 KB
0x4F00000+  Global BSS (cl::opt storage, hash tables, state)
```
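For scripted triage, you can transcribe the coarse map into a lookup table. The sketch below is a hypothetical helper (`zone_of` and `parse_sub` are our names, not anything in the binary) that returns the innermost zone containing an address. Two boundaries are assumptions derived from the approximate figures above rather than measurements:

- the jemalloc core is assumed to end about 256 KB after 0x12FC000, at 0x133C000;
- the trailing code is assumed to end at 0x3C00000.

```python
from typing import Optional

# (start, end_exclusive, label), transcribed from the coarse map above.
# ASSUMPTIONS: jemalloc core ends ~256 KB after 0x12FC000 (0x133C000), and
# trailing code ends at 0x3C00000, the approximate end of .text.
ZONES = [
    (0x0400000, 0x040D000, "CRT startup + libc stubs"),
    (0x040D000, 0x0420000, "jemalloc stats / vsnprintf"),
    (0x0420000, 0x0430000, "misc libc, math, string ops"),
    (0x0430000, 0x05D0000, "Global constructors (cl::opt registration)"),
    (0x05D0000, 0x08F0000, "EDG 6.6 C++ frontend"),
    (0x08F0000, 0x0960000, "CLI / Real Main / NVVM Bridge"),
    (0x0960000, 0x09F0000, "Architecture detection, NVVM options"),
    (0x09F0000, 0x0AF0000, "Bitcode reader"),
    (0x0AF0000, 0x0BF0000, "X86 AutoUpgrade"),
    (0x0BF0000, 0x0C00000, "LLVM IR Verifier"),
    (0x0C00000, 0x12D0000, "LLVM Support / ADT library"),
    (0x12D0000, 0x1700000, "PassManager / NVVM bridge"),
    (0x12FC000, 0x133C000, "jemalloc core (nested)"),
    (0x1700000, 0x1F00000, "Backend / machine passes"),
    (0x1F00000, 0x2100000, "SelectionDAG"),
    (0x2100000, 0x2340000, "NVPTX PTX emission"),
    (0x2340000, 0x2A00000, "New PM / pass registration"),
    (0x2A00000, 0x3000000, "Loop passes"),
    (0x3000000, 0x3700000, "NVPTX ISel + lowering"),
    (0x3700000, 0x3A00000, "Machine-level passes (tail)"),
    (0x3A00000, 0x3C00000, "Trailing code, CRT finalization"),
]

def parse_sub(name: str) -> int:
    """Convert an IDA auto-name such as 'sub_8F9C90' into an integer address."""
    if not name.startswith("sub_"):
        raise ValueError(f"not an IDA sub_ name: {name!r}")
    return int(name[4:], 16)

def zone_of(addr: int) -> Optional[str]:
    """Return the label of the innermost (smallest) zone containing addr, or None."""
    best = None
    for start, end, label in ZONES:
        if start <= addr < end and (best is None or end - start < best[1] - best[0]):
            best = (start, end, label)
    return best[2] if best else None
```

For example, `zone_of(parse_sub("sub_12FCDB0"))` resolves to the nested jemalloc zone rather than the surrounding PassManager zone.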

Detailed Subsystem Map at Pass Granularity

The coarse map above partitions the binary into about 20 zones. The following map refines each zone, down to individual passes where those have been identified, and gives the factory address of each identified pass or subsystem entry point. Addresses prefixed with sub_ are IDA function names. Sizes in parentheses are decompiled C output; actual machine code is typically 2-3x smaller.

Zone 1: CRT, libc, jemalloc stats (0x400000 - 0x42FFFF)

0x400000   Image base (ELF headers); _start / CRT entry lies just above, at the ELF entry point (e_entry)
0x40D5CA   sub_40D5CA   vsnprintf (jemalloc stats formatting)
0x420000   libc math/string helpers (memcpy, memset, strlen, etc.)

No LLVM or NVIDIA code lives here. Pure runtime support.

Zone 2: Global constructors (0x430000 - 0x5CFFFF)

~1,689 cl::opt registration constructors execute before main(). Each registers a command-line option string, description, default value, and storage pointer into the global option registry. The .init_array section holds function pointers to these constructors.

Zone 3: EDG 6.6 C++ Frontend (0x5D0000 - 0x8EFFFF)

The complete Edison Design Group C++ frontend, version 6.6. Contains the lexer, parser, constexpr evaluator, template instantiator, overload resolver, IL walker/copier, diagnostic engine, SARIF output, and CUDA-specific extensions (kernel launch grammar, __shared__/__device__ memory space parsing, atomic builtin stubs).

Function	Address	Size
EDG main entry (called from real main)	`sub_5D2A80`
Expression parser core	`sub_610000`-`sub_62FFFF`	128 KB
Declaration processing	`sub_750000`-`sub_76FFFF`	128 KB
Template / constexpr	`sub_840000`-`sub_87FFFF`	256 KB
SARIF, diagnostics, keywords	`sub_880000`-`sub_8EFFFF`	448 KB

Zone 4: CLI / Real Main / Dual-Path Entry (0x8F0000 - 0x9EFFFF)

Function	Address	Size
Real main (after CRT/jemalloc init)	`sub_8F9C90`
Path A CLI parsing (LibNVVM API mode)	`sub_900130`
Path A simple compile entry	`sub_902D10`
Path A multi-stage pipeline	`sub_905EE0`	43 KB
Path A builtin resolution table	`sub_90AEE0`	109 KB
Architecture detection, NVVM option parsing	`sub_960000`-`sub_9EFFFF`	576 KB

Zone 5: Bitcode Reader / X86 AutoUpgrade / Verifier (0x9F0000 - 0xBFFFFF)

Sub-range	Contents
`0x9F0000`-`0xAEFFFF`	Bitcode reader (`sub_A24000` parseFunctionBody ~166KB)
`0xAF0000`-`0xBEFFFF`	X86 AutoUpgrade (`sub_A939D0` 457KB -- legacy intrinsic upgrader)
`0xBF0000`-`0xBFFFFF`	LLVM IR Verifier entry points

Zone 6: LLVM Support Library (0xC00000 - 0xCAFFFF)

1,653 functions. Pure LLVM infrastructure -- no NVIDIA-specific modifications except a single !Flat address space annotation in the sample profile reader at sub_C29E70.

Sub-range	Functions	Contents
`0xC00000`-`0xC0F000`	65	IR Verifier (`sub_C05FA0` visitInstruction 75KB, `sub_C0A940` verify 12KB)
`0xC0D4F0`	1	`sub_C0D4F0` TargetRegistry::lookupTarget (8KB)
`0xC0F6D0`	1	`sub_C0F6D0` IR module linker (48KB)
`0xC10000`-`0xC2FFFF`	~400	InstrProf reader, Sample Profile reader/writer, hashing
`0xC30000`-`0xC3FFFF`	214	ImmutableMap/Set, APInt printing
`0xC40000`-`0xC4FFFF`	197	APInt core arithmetic (div, mul, shift)
`0xC50000`-`0xC5FFFF`	141	CommandLine parser (`cl::opt` infrastructure)
`0xC60000`-`0xC6FFFF`	135	JSON parser, debug counters, error handling
`0xC70000`-`0xC7FFFF`	114	ConstantRange arithmetic
`0xC80000`-`0xC8FFFF`	194	SHA-1 hash, regex, SmallVector, sorting
`0xC90000`-`0xC9FFFF`	139	Timer/profiling, TimeTrace (Chrome trace)
`0xCA0000`-`0xCAFFFF`	186	YAML lexer/parser, TypeSize, VFS

Zone 7: NVVM Container, SCEV, DWARF, MC Layer (0xCB0000 - 0x10CFFFF)

This 4 MB zone contains LLVM mid-level infrastructure and the NVVM container format.

Sub-range	Contents	Key functions
`0xCB0000`-`0xCBFA60`	YAML parser/emitter (libyaml)	`sub_CB9640` main parser (26KB)
`0xCC0130`-`0xCCABA0`	LLVM Triple parsing	`sub_CC0130` Triple_normalize (35KB)
`0xCCBB10`-`0xCDCA30`	NVVM container format	`sub_CDD2D0` serialize, `sub_CD1D80` deserialize, `sub_CCD5F0` version validator (9KB)
`0xCD9990`	NVVM options parser	`sub_CD9990` (calls 60+ parse helpers)
`0xD60000`-`0xD82000`	NV Module Summary / LTO	`sub_D7D4E0` buildModuleSummary (74KB), `sub_D81040` runOnModule (56KB)
`0xD83000`-`0xDFD000`	ScalarEvolution (SCEV)	SCEV framework, AddRecExpr, backedge analysis
`0xE00000`-`0xE0FFFF`	DWARF debug info string/enum tables
`0xE10000`-`0xE2FFFF`	Itanium C++ name demangler	`sub_E18BB0` parseExpr (47KB)
`0xE30000`-`0xEBFFFF`	MC assembler layer	ELF/COFF/MachO section parsers, expression evaluator
`0xEC0000`-`0xED0000`	MC assembler directives	`sub_ECB300` ELF section parser (40KB)
`0xED0000`-`0xEF8000`	InstrProf / MemProf reader	Profiling data infrastructure
`0xEF8000`-`0xF05000`	Bitstream remark serialization
`0xF05000`-`0xF6FFFF`	SelectionDAG infrastructure	DAG node creation, SDValue, EVT/MVT helpers
`0xF70000`-`0xF8FFFF`	Loop vectorization runtime checks	`sub_F77B70` vectorizeLoop (37KB), `sub_F72730` canVectorizeMemory (29KB)
`0xF90000`-`0xFCFFFF`	SimplifyCFG + code sinking	`sub_FB0000` switch table gen, `sub_FA0000` speculative exec
`0xFD0000`-`0xFEFFFF`	AliasSet, register pressure tracking, CFG graphviz
`0xFF0000`-`0x101FFFF`	Block scheduling, RPO traversal, constant folding
`0x1020000`-`0x103FFFF`	Inline ASM + scheduling model	`sub_1035170` CUTLASS kernel detection (41KB)
`0x1040000`-`0x106FFFF`	Divergence analysis, DAG utilities, IR linker
`0x1070000`-`0x10AFFFF`	MC object emission, InstructionSimplify	`sub_10ACA40` visitAdd (94KB)

Zone 8: InstCombine Mega-Region (0x10D0000 - 0x122FFFF)

This zone holds the largest contiguous pass in the binary. NVIDIA's modified InstCombine occupies most of this 1.4 MB region; SimplifyLibCalls and the LLParser share its tail. It handles three NVIDIA-custom opcodes (0x254D, 0x2551, 0x255F) for proprietary intrinsic folding.

Sub-range	Contents	Key functions
`0x10D0000`-`0x10EFFFF`	InstCombine visitors (casts, shifts, memory)	Various visitXxx functions
`0x10EE7A0`	InstCombine main visitor	`sub_10EE7A0` (405KB / 9,258 lines -- one of the largest functions in the binary)
`0x10F0000`-`0x1100000`	Sub-visitors for specific opcodes
`0x1100000`-`0x1170000`	Intrinsic folding, demanded bits	`sub_1169C30` intrinsic folder (87KB)
`0x1180000`-`0x119FFFF`	InstCombine core worklist	`sub_1190310` main dispatch (88KB)
`0x11A0000`-`0x11AFFFF`	ValueTracking / KnownBits	`sub_11A7600` computeKnownBits (127KB), `sub_11AE870` SimplifyDemandedBits
`0x11B0000`-`0x11BFFFF`	InstCombine tail (vector, extract/insert)
`0x11D0000`-`0x11FFFFF`	SimplifyLibCalls	Math function optimization
`0x11FF000`-`0x122FFFF`	LLVM textual IR parser (LLParser)

Zone 9: NVVM Bridge / Builtin System / IR Codegen (0x1230000 - 0x12CFFFF)

This zone is the core NVIDIA bridge between the EDG frontend AST and the LLVM IR optimizer.

Sub-range	Contents	Key functions
`0x1230000`-`0x125FFFF`	LLVM IR codegen from AST	Expression, statement, type codegen
`0x125FB30`	Path B CLI parsing	`sub_125FB30` (standalone/nvcc mode)
`0x1262860`	Path B simple compile	`sub_1262860`
`0x1265970`	Path B multi-stage pipeline	`sub_1265970` (48KB)
`0x126A7B0`	Builtin lookup helper	`sub_126A7B0`
`0x126A910`	Builtin registration table	`sub_126A910` (126KB) -- registers 717 builtins (IDs 1-770)
`0x12B3FD0`	Builtin resolution dispatch	`sub_12B3FD0` (103KB) -- giant switch on builtin ID
`0x12C06E0`	Bitcode linker	`sub_12C06E0` (libdevice linking)

Zone 10: Pipeline Builder / Pass Options (0x12D0000 - 0x12FFFFF)

The pipeline assembler constructs the complete LLVM pass pipeline, inserting passes by calling factory functions whose addresses scatter across the entire binary.

Function	Address	Size
Module split-range helper	`sub_12D3E60`
Pass factory: creates NVIDIA custom pass	`sub_12D4560`	325 B
NVVMPassOptions initializer -- populates 222 pass option slots in a 4,480-byte struct	`sub_12D6300`	125 KB
AddPass -- hash-table-based pass insertion into pipeline	`sub_12DE0B0`	3.5 KB
Tier 0 sub-pipeline builder (full optimization, 40 passes)	`sub_12DE330`	4.8 KB
Tier 1/2/3 sub-pipeline builder (85-pass superset, tier-gated)	`sub_12DE8F0`
Codegen dispatch -- routes to backend machine pass pipeline	`sub_12DFE00`
Master pipeline assembler -- 1,553 lines, two major pipelines (normal + fast)	`sub_12E54A0`	49.8 KB
Machine pass assembly (Pipeline B fast path)	`sub_12EB010`
Machine codegen execution	`sub_12EC4F0`
jemalloc core (~400 functions)	`sub_12FC000`+	~256 KB
`malloc_conf_init` (parses 199 config strings from `MALLOC_CONF`)	`sub_12FCDB0`	129 KB

Zone 11: IR Infrastructure / PassManager (0x1300000 - 0x16FFFFF)

Dense LLVM infrastructure: IR types, constants, instructions, metadata, use-lists, PassManager execution engine, IR linker, bitcode reader, regex, and DataLayout.

Sub-range	Contents	Key functions
`0x1300000`-`0x135FFFF`	IR constants, types, APInt, APFloat
`0x1360000`-`0x13FFFFF`	IR instructions, basic blocks, functions	`sub_1361950` AssumptionCacheTracker
`0x1400000`-`0x14FFFFF`	TargetLibraryInfo, pass scheduling	`sub_149CCE0` TLI wrapper, `sub_14A04B0` TLI creation, `sub_14A3CD0` NVPTX TargetPassConfig
`0x1500000`-`0x15FFFFF`	IR builder, GEP, PHI, branch creation	`sub_15F83E0` conditional branch, `sub_15F9210` load, `sub_15F9650` store
`0x1600000`-`0x160FFFF`	PassManager execution engine	`sub_160FB70` PassManager::run, `sub_1611EE0` PassManagerBuilder init
`0x1610000`-`0x162FFFF`	Pass scheduling, metadata RAUW	`sub_1619140` register target passes, `sub_1619BD0` PassManager::finalize
`0x1630000`-`0x16FFFFF`	IR Linker, bitcode reader, regex	`sub_16786A0` IRLinker::run (61KB), `sub_166A310` parseFunctionBody (60KB)

Zone 12: InstCombine (NewPM) + Sanitizers + PGO (0x1700000 - 0x17FFFFF)

946 functions. Dominated by the New Pass Manager version of InstCombine (~600 functions, ~3.5 MB decompiled), with sanitizer instrumentation (MSan, TSan, coverage) and PGO/GCov infrastructure.

Sub-range	Contents	Key functions
`0x1700000`-`0x17B0000`	InstCombine (NewPM)	`sub_1743DA0` main visitor (168KB), `sub_17A9010` liveness (111KB)
`0x17B0000`-`0x17BFFFF`	GCov instrumentation	`sub_17BF860` coverage notes (53KB)
`0x17C0000`-`0x17CFFFF`	PGO indirect-call promotion	`sub_17C2DB0` (39KB)
`0x17D0000`-`0x17DFFFF`	MemorySanitizer	`sub_17DDCE0` shadow propagation (58KB)
`0x17E0000`-`0x17EFFFF`	PGO instrumentation	`sub_17EEF60` InstrProfiling reader (81KB)
`0x17F0000`-`0x17FFFFF`	ThreadSanitizer, SanitizerCoverage	`sub_17FF260` TSan entry (51KB), `sub_17F91F0` SanCov (44KB)
`0x17060B0`	PrintModulePass (debug dump, inserted ~30x in pipeline)	`sub_17060B0`

Zone 13: GVN + Scalar Passes + NVIDIA Custom IR Passes (0x1800000 - 0x1CFFFFF)

This 5 MB zone contains the bulk of LLVM's scalar optimization passes and all of NVIDIA's custom IR-level passes.

GVN family (0x1900000 - 0x193FFFF):

Function	Address	Size
GVN::runOnFunction (core fixed-point iteration)	`sub_1900BB0`	83 KB
GVN PRE (Partial Redundancy Elimination)	`sub_1906720`	26 KB
NewGVN expression printing	`sub_1930810`	3 KB
NewGVN core value numbering	`sub_1933B40`	43 KB

Standard scalar passes (0x1830000 - 0x1AFFFFF):

Pass (pipeline factory call)	Address
InstructionCombining (Old PM wrapper)	`sub_1832270`
TailCallElim / JumpThreading	`sub_1833EB0`
FunctionAttrs	`sub_1841180`
SCCP (Sparse Conditional Constant Propagation)	`sub_1842BC0`
ConstantMerge / GlobalDCE	`sub_184CD60`
NVVMReflect	`sub_1857160`
IPConstantPropagation / ArgumentPromotion	`sub_185D600`
Sink / MemorySSA	`sub_1869C50`
NVVMPredicateOpt / SelectionOpt	`sub_18A3430`
LoopPass (barrier optimization)	`sub_18B1DE0`
DCE (Dead Code Elimination)	`sub_18DEFF0`
CorrelatedValuePropagation	`sub_18EEA90`
DSE (Dead Store Elimination)	`sub_18F5480`
DeadArgumentElimination	`sub_18FD350`
SimplifyCFG	`sub_190BB10`
LICM / LoopRotate	`sub_195E880`
LoopIndexSplit	`sub_1952F90`
LoopUnroll / LoopVectorize	`sub_197E720`
LoopSimplify / IndVarSimplify	`sub_198DF00`
SROA (Scalar Replacement of Aggregates)	`sub_198E2A0`
InstCombine variant	`sub_19401A0`
SROA variant / LoopUnswitch	`sub_19B73C0`
NVIDIA pass (unknown)	`sub_19CE990`
NVVMRematerialization (IR-level remat)	`sub_1A13320`
NVVMIRVerification	`sub_1A223D0`
LLVM standard pass pipeline (parameterized, called ~8x with different configs)	`sub_1A62BF0`
LoopIdiomRecognize / IndVarSimplify	`sub_1A68E70`
InstructionSimplify / ValueTracking	`sub_1A7A9F0`

Loop unrolling + switch lowering (0x1B00000 - 0x1B7FFFF):

Function	Address	Size
LoopUnroll main driver	`sub_1B01A40`	68 KB
Unroll-and-Jam	`sub_1B07290`	55 KB
Loop peeling	`sub_1B0BF10`	39 KB
Unroll prologue/epilogue generation	`sub_1B12B90`	65 KB
Code sinking (".sink.split")	`sub_1B51110`	51 KB
SimplifyCFG condition combining	`sub_1B5C580`	30 KB
Switch-to-lookup-table transformation	`sub_1B60700`	83 KB

Loop/SLP vectorizer (0x1B80000 - 0x1BFFFFF):

Function	Address	Size
LoopVectorize main driver ("loop-vectorize")	`sub_1BB6740`	43 KB
VPlan builder	`sub_1BAB460`	32 KB
SLP horizontal reduction ("slp-vectorizer")	`sub_1BDDB00`	47 KB
SLP shuffle/reorder engine	`sub_1BD0660`	62 KB

NVVM module validation + configuration (0x1C00000 - 0x1C3FFFF):

Function	Address	Size
NVVM codegen config parser (70+ knobs: AdvancedRemat, CSSACoalescing, DoMMACoalescing, PGO, OCGKnobs)	`sub_1C20170`	33 KB
NVVM compile mode parser (WHOLE_PROGRAM_NOABI/ABI, SEPARATE_ABI, opt level, debug info)	`sub_1C21CE0`	28 KB
Kernel attribute validator (cluster launch, parameter size, Hopper constraints)	`sub_1C32740`	30 KB
NVVM intrinsic lowering (tex/surf/syncwarp/ISBE/MAP/ATTR validation)	`sub_1C36530`	112 KB
NVVM module validator (data layout, target triple, UnifiedNVVMIR)	`sub_1C3BC10`	48 KB

NVIDIA custom IR passes (0x1C40000 - 0x1CFFFFF):

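This 1 MB block contains the majority of NVIDIA's proprietary IR-level optimization passes. Apart from a few entries that correspond to standard LLVM passes (the GVN/LICM and EarlyCSE/GVN variants and ADCE), the passes listed here have no upstream LLVM equivalent.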

Function	Address	Size	Pass name
Dead Synchronization Elimination -- removes redundant `__syncthreads()` barriers via fixed-point R/W dataflow	`sub_1C47810`	63 KB	dead-sync-elim
Alloca cloning / PHI insertion (mem2reg extension)	`sub_1C4D210`	69 KB
NVIDIA pass helper (dead-sync / common-base infrastructure)	`sub_1C585C0`	39 KB
Common Base Elimination -- removes redundant base address computations	`sub_1C5DFC0`	39 KB	common-base-elim
Block-level analysis infrastructure ("Processing", "Block")	`sub_1C5FDC0`	26 KB
Base address bitcast helper ("baseValue", "bitCastEnd")	`sub_1C637F0`	28 KB
Base Address Strength Reduction ("BaseAddressStrengthReduce")	`sub_1C67780`	59 KB	base-addr-sr
MemorySpaceOpt loop index analysis ("phi maxLoopInd")	`sub_1C6A6C0`	54 KB
GVN or LICM variant	`sub_1C6E800`
ADCE (Aggressive DCE)	`sub_1C6FCA0`
MemorySpaceOpt function cloning -- specializes generic pointers to global/shared/local	`sub_1C70910`	75 KB	memspace-opt (core)
LoopIndexSplit -- splits loops on index conditions (three modes: all-but-one, single-iter, range-split)	`sub_1C7B2C0`	84 KB	loop-index-split
Memmove Unrolling -- forward/reverse element copy loops	`sub_1C82A50`	40 KB	lower-aggr-copies
Struct/Aggregate Splitting -- element-wise memcpy decomposition	`sub_1C86CA0`	73 KB	lower-aggr-copies
EarlyCSE / GVN variant	`sub_1C8A4D0`
FP128/I128 Emulation -- replaces 128-bit ops with `__nv_*` library calls	`sub_1C8C170`	26 KB	lower-ops
MemorySpaceOpt entry (pipeline factory address)	`sub_1C8E680`		nvvm-memspace-opt
NVVMLowerBarriers / BarrierLowering	`sub_1C98160`
MemorySpaceOpt address space resolution (warnings for illegal atomics on const/local)	`sub_1CA2920`	32 KB
MemorySpaceOpt secondary resolver	`sub_1CA9E90`	28 KB
Printf Lowering -- lowers `printf` to `vprintf` + local buffer packing	`sub_1CB1E60`	31 KB	printf-lowering
NVVMIntrinsicLowering (the most frequently inserted transformation pass, ~10 occurrences in the pipeline)	`sub_1CB4E40`		nvvm-intrinsic-lower
NVVMBranchDist	`sub_1CB73C0`		branch-dist
RLMCAST transformation (register-level multicast)	`sub_1CBFA40`	75 KB
NVVMSinking2 (NVIDIA enhanced code sinking)	`sub_1CC60B0`		sinking2
IV Demotion -- narrows 64-bit induction variables to 32-bit ("demoteIV", "newBaseIV")	`sub_1CD74B0`	75 KB	iv-demotion
NLO (NVIDIA Live Output) helper ("nloNewAdd", "nloNewBit")	`sub_1CDC1F0`	35 KB
Instruction classification / cost model (NLO/remat)	`sub_1CDE4D0`	80 KB
Simplify Live Output (NLO pass -- "nloNewBit")	`sub_1CE10B0`	48 KB
Rematerialization pull-in cost analysis ("Total pull-in cost")	`sub_1CE3AF0`	56 KB
Rematerialization block executor ("remat_", "uclone_" prefixes)	`sub_1CE67D0`	32 KB
NVVMRematerialization main driver -- live-in/live-out pressure analysis per block	`sub_1CE7DD0`	67 KB	remat
Final NVVM lowering / intrinsic cleanup	`sub_1CEBD10`
Formal parameter space overflow checker	`sub_1CEE970`	27 KB
NVVMPeephole	`sub_1CEF8F0`		nvvm-peephole
Instruction scheduling helper (physical register constraints)	`sub_1CFDD60`	49 KB

Zone 14: SelectionDAG ISel / CodeGenPrepare / Backend (0x1D00000 - 0x1EFFFFF)

Sub-range	Contents	Key functions
`0x1D00000`-`0x1D60000`	SelectionDAG ISel core	`sub_1D4BB00` bytecode interpreter (97KB, 131-case switch), `sub_1D54C20` runOnMachineFunction (72KB, "sdagisel")
`0x1D1B0D0`		`sub_1D1B0D0` computeKnownBits (87KB, 62-case ISD switch)
`0x1D210A0`		`sub_1D210A0` SimplifyDemandedBits (46KB, 118-case switch, calls NVPTX hooks at `sub_1F58D40`)
`0x1D70000`-`0x1D7FFFF`	CodeGenPrepare	`sub_1D73760` address sinking (65KB, "sunkaddr")
`0x1D07BB0`	Pre-RA instruction scheduling	`sub_1D07BB0` (57KB)
`0x1D80000`-`0x1DFFFFF`	Deque worklist, block splitting	`sub_1D7AA30` (74KB, ".unlikely", ".cond.split")
`0x1E00000`-`0x1EFFFFF`	Register allocation infrastructure	Greedy RA, live intervals, spill cost

Zone 15: Backend CodeGen Infrastructure (0x1F00000 - 0x20FFFFF)

Sub-range	Contents	Key functions
`0x1F00000`-`0x1F0C000`	ScheduleDAG infrastructure	`sub_1F0A020` DAG builder/emitter (41KB)
`0x1F0BF50`-`0x1F0EBC0`	Shrink Wrapping	`sub_1F0DCB0` core analysis (27KB, "shrink-wrap")
`0x1F10000`-`0x1F15000`	SlotIndexes + SpillPlacement	`sub_1F10320` "slotindexes", `sub_1F12110` "spill-code-placement"
`0x1F15000`-`0x1F1F000`	LiveInterval utilities	`sub_1F19E60` "Impossible to implement partial COPY"
`0x1F20000`-`0x1F5FFFF`	Register coalescer, VirtRegRewriter
`0x1F58D40`	NVPTX target hook for SimplifyDemandedBits	`sub_1F58D40`
`0x1F60000`-`0x1FFFFFF`	TwoAddressInstruction, stack protection
`0x2000000`-`0x20FFFFF`	LegalizeTypes	`sub_20019C0` (341KB -- one of the largest functions in the binary)

Zone 16: NVPTX Target Backend (0x2100000 - 0x21FFFFF)

Sub-range	Contents	Key functions
`0x2100000`-`0x210FFFF`	Register allocation support	`sub_210BC20` seedLiveRegs ("regalloc"), `sub_210BE60` "ran out of registers"
`0x2110000`-`0x212FFFF`	DAG type legalization/promotion
`0x2130000`-`0x213FFFF`	DAG combiners, ISel patterns
`0x2140000`-`0x214FFFF`	NVPTXAsmPrinter	PTX header/kernel emission
`0x2150000`-`0x215FFFF`	PTX function/param emission	`sub_215D9D0` NVVMAnnotationsProcessor / GenericToNVVM
`0x2160000`-`0x216FFFF`	NVPTXTargetMachine	Pass pipeline, SubtargetInfo
`0x2170000`-`0x218AFFF`	Atomics lowering, rematerialization (machine-level)
`0x21BC000`-`0x21BFFFF`	Alloca hoisting, image opt
`0x21C0000`-`0x21CFFFF`	MemorySpace lowering (machine-level)
`0x21D0000`-`0x21DFFFF`	DAG lowering mega-function, peephole, prolog/epilog
`0x21E0000`-`0x21EFFFF`	MMA/tensor codegen, atomics, special regs, cluster ops
`0x21F0000`-`0x21FFFFF`	Ldg transform, vec split, mem2reg, register pressure

Zone 17: New PM Pass Registration (0x2340000 - 0x23FFFFF)

Function	Address	Size
Master pass registration -- registers all 526 passes (121 module + 174 function + 23 loop + 48 MF + analyses) into StringMap	`sub_2342890`	~2,816 lines
Print available passes (--print-pipeline-passes)	`sub_233C410`
Function pass pipeline text parser	`sub_233F860`
Module pipeline text parser	`sub_2377300`
Inner function/loop pipeline parser	`sub_2368220`
Alias analysis name resolver (globals-aa, basic-aa, scev-aa, tbaa)	`sub_233BD40`
Hash table insertion (pass_name -> constructor)	`sub_E41FB0`
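The text parsers above likely accept upstream LLVM's `-passes=` pipeline syntax, such as `module(function(instcombine,gvn),globaldce)`. That is an assumption: we infer it from the StringMap-based registrar, not from the decompiled parser. The sketch below is a minimal recursive-descent parser for that nesting structure. It is not a reconstruction of sub_2377300 or sub_233F860, and `PipelineElement` and `parse_pipeline` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class PipelineElement:
    name: str  # pass or adaptor name, e.g. "function" or "instcombine"
    inner: list["PipelineElement"] = field(default_factory=list)  # nested pipeline

def parse_pipeline(text: str) -> list[PipelineElement]:
    """Parse 'name(inner,...),name' pipeline strings into a tree."""
    pos = 0

    def parse_list() -> list[PipelineElement]:
        nonlocal pos
        elems = []
        while True:
            start = pos
            while pos < len(text) and text[pos] not in "(),":
                pos += 1
            elem = PipelineElement(text[start:pos].strip())
            if not elem.name:
                raise ValueError(f"empty pass name at offset {start}")
            if pos < len(text) and text[pos] == "(":
                pos += 1  # descend into a nested pipeline
                elem.inner = parse_list()
                if pos >= len(text) or text[pos] != ")":
                    raise ValueError("unbalanced parentheses")
                pos += 1
            elems.append(elem)
            if pos < len(text) and text[pos] == ",":
                pos += 1  # another element at the same nesting level
                continue
            return elems

    result = parse_list()
    if pos != len(text):
        raise ValueError(f"unexpected character at offset {pos}")
    return result
```

The sketch keeps parameterized names such as `loop-unroll<O2>` intact as one element, provided their parameters contain no commas or parentheses.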

Zone 18: IPO / Attributor / OpenMP Optimization (0x2400000 - 0x29FFFFF)

Sub-range	Contents	Key functions
`0x2400000`-`0x25FFFFF`	Attributor framework	`sub_251CD10` runTillFixpoint (53KB)
`0x2590000`-`0x265FFFF`	Sanitizer instrumentation (ASan, HWASan)
`0x266E000`-`0x269FFFF`	OpenMP target offloading	`sub_2686D90` runtime table (215KB, ~160 `__kmpc_*` entries), `sub_26968A0` Generic-to-SPMD transform (61KB, "OMP120")
`0x2678420`	OpenMP state machine for generic kernels	`sub_2678420` (41KB)
`0x2680940`	Parallel region merging	`sub_2680940` (52KB)
`0x26A0000`-`0x29FFFFF`	Coroutine support, LTO infrastructure, PGO lowering

Zone 19: Loop Transforms (0x2A00000 - 0x2CFFFFF)

Function	Address	Size
LoopPeeling ("llvm.loop.peeled.count")	`sub_2A07DE0`	76 KB
LoopRotation (".lr.ph", "h.rot")	`sub_2A0CFD0`	65 KB
UnrollLoop main ("loop-unroll", "UnrollCount")	`sub_2A15A20`	85 KB
UnrollAndJamLoop ("loop-unroll-and-jam")	`sub_2A1CF00`	58 KB
Runtime unrolling (".epil.preheader", ".prol.preheader")	`sub_2A25260`	91 KB
IndVarSimplify IV widening ("iv.rem", ".sext", ".zext")	`sub_2A76A40`	67 KB
WidenIV / IV transformation	`sub_2A79EE0`	82 KB
Dead Synchronization Elimination (island -- the larger copy; see also `sub_1C47810`)	`sub_2C84BA0`	94 KB

Note: sub_2C84BA0 is a second copy of the dead synchronization elimination pass located outside the main NVIDIA custom pass zone. This is the 94KB variant analyzed in depth (p2b.6-01), with the four-category fixed-point R/W dataflow algorithm and red-black tree maps.

Zone 20: Codegen Target Options / SelectionDAG Lowering (0x2D00000 - 0x2FFFFFF)

5,217 functions. Contains LLVM TargetMachine option registration and the core SelectionDAG infrastructure used by the NVPTX backend.

Sub-range	Contents	Key functions
`0x2D00000`-`0x2D8FFFF`	SelectionDAG core	DAG combine, node creation, legalization helpers
`0x2D97F20`	TargetOptions registration (all `cl::opt` for -march/-mcpu/-mattr/relocation/code model)	`sub_2D97F20` (112KB)
`0x2E00000`-`0x2FFFFFF`	SelectionDAG continued	Type legalization, custom lowering, pattern matching

Zone 21: NVPTX ISel + SelectionDAG Lowering (0x3000000 - 0x36FFFFF)

7 MB. The NVPTX instruction selection and target-specific DAG lowering.

Sub-range	Contents	Key functions
`0x3000000`-`0x328FFFF`	DAG node construction, EVT/MVT helpers
`0x3290000`-`0x32FFFFF`	NVPTXTargetLowering	`sub_32E3060` LowerOperation dispatcher (111KB), `sub_32A1EF0` type legalization (109KB), `sub_32D2680` load/store lowering (81KB)
`0x3300000`-`0x33AFFFF`	Intrinsic lowering (DAG level)	`sub_33B0210` intrinsic switch (343KB)
`0x33B0000`-`0x36FFFFF`	ISel pattern helpers, register info

Zone 22: NVPTX Instruction Selector / Machine Tail (0x3700000 - 0x3BFFFFF)

Sub-range	Contents	Key functions
`0x3700000`-`0x37AFFFF`	Table-driven instruction selector	`sub_376DE90` main pattern matcher (138KB -- per-SM opcode legality gating via compressed table at offset 521536)
`0x372FEE0`	DAG operand tree copier (recursive)	`sub_372FEE0` (104KB)
`0x374DD20`	NVPTX custom lowering entry	`sub_374DD20` (67KB)
`0x3900000`-`0x396FFFF`	NVIDIA register pressure / remat (machine-level)	`sub_396A6C0` RP reporting ("Register Pressure: N"), `sub_3964ED0` ".remat" naming
`0x3937240`	ABI Preserve directive emission	`sub_3937240` (14KB)
`0x395CFD0`	GEP Splitting pass	`sub_395CFD0` (11KB)
`0x395DD20`	DAG pattern computation	`sub_395DD20` (66KB)
`0x3970000`-`0x397FFFF`	AsmPrinter / PTX emission	`sub_3979400` emitFunctionBody (62KB), `sub_397DF10` emitInlineAsm (30KB)
`0x3970E40`	BB print + `.pragma "nounroll"`	`sub_3970E40` (18KB)
`0x3980000`-`0x3BFFFFF`	MC layer, DWARF, ELF emission	Object file writers, section management

Pass Factory Address Summary

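The pipeline assembler (sub_12E54A0) calls pass factory functions to construct the pipeline. The pipeline builder calls each factory address below directly, and each address identifies one pass constructor. Where the pass identity is not yet pinned down, both candidates are listed (for example, TailCallElim / JumpThreading).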

Factory address	Pass identity	Type
`sub_1654860`	BreakCriticalEdges	F
`sub_17060B0`	PrintModulePass (debug dump)	M
`sub_1832270`	InstructionCombining	F
`sub_1833EB0`	TailCallElim / JumpThreading	F
`sub_1841180`	FunctionAttrs	M
`sub_1842BC0`	SCCP	F
`sub_184CD60`	ConstantMerge / GlobalDCE	M
`sub_1857160`	NVVMReflect	F
`sub_185D600`	IPConstantPropagation	M
`sub_1869C50`	Sink / MemorySSA	F
`sub_18A3430`	NVVMPredicateOpt	F
`sub_18B1DE0`	LoopPass (barrier opt)	F
`sub_18DEFF0`	DCE	F
`sub_18EEA90`	CorrelatedValuePropagation	F
`sub_18F5480`	DSE	F
`sub_18FD350`	DeadArgumentElimination	M
`sub_190BB10`	SimplifyCFG	F
`sub_195E880`	LICM / LoopRotate	F
`sub_1952F90`	LoopIndexSplit	L
`sub_197E720`	LoopUnroll / LoopVectorize	F
`sub_198DF00`	LoopSimplify / IndVarSimplify	F
`sub_198E2A0`	SROA	F
`sub_19401A0`	InstCombine variant	F
`sub_19B73C0`	SROA variant / LoopUnswitch	F
`sub_19CE990`	NVIDIA pass (unknown)	F
`sub_1A13320`	NVVMRematerialization (IR-level)	F
`sub_1A223D0`	NVVMIRVerification	M
`sub_1A62BF0`	LLVM standard pass pipeline (parameterized)	M
`sub_1A68E70`	LoopIdiomRecognize	F
`sub_1A7A9F0`	InstructionSimplify	F
`sub_1B26330`	MemCpyOpt	F
`sub_1B7FDF0`	Reassociate / Sinking	F
`sub_1C4B6F0`	AlwaysInliner	M
`sub_1C6FCA0`	ADCE	F
`sub_1C8A4D0`	EarlyCSE	F
`sub_1C8E680`	NVVMMemorySpaceOpt	M
`sub_1C98160`	NVVMLowerBarriers	F
`sub_1CB4E40`	NVVMIntrinsicLowering (~10 insertions)	F
`sub_1CB73C0`	NVVMBranchDist	F
`sub_1CC60B0`	NVVMSinking2	F
`sub_1CE7DD0`	NVVMRematerialization (main)	F
`sub_1CEBD10`	Final NVVM lowering	F
`sub_1CEF8F0`	NVVMPeephole	F
`sub_1CB0F50`	ProfileSummaryInfoWrapper / NVVMModulePass	F
`sub_12D4560`	NVVMVerifier / ModuleVerifier	M
`sub_215D9D0`	NVVMAnnotationsProcessor	M
`sub_149CCE0`	TargetLibraryInfoWrapperPass	M
`sub_1BFB520`	TargetTransformInfoWrapperPass	F
`sub_14A7550`	createVerifierPass / BasicAliasAnalysis	M
`sub_1361950`	AssumptionCacheTracker	M

Type: M = ModulePass, F = FunctionPass, L = LoopPass.
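When annotating an IDA database, you can turn the factory table into function names. The sketch below transcribes a few rows and derives a label from each one:

- it strips parenthetical notes;
- it restricts names to letters, digits and underscores, which is a conservative choice;
- it appends the address when two rows collapse to the same label, as the two NVVMRematerialization entries do.

`FACTORIES`, `factory_label`, and `rename_plan` are hypothetical names.

```python
import re

# A few rows transcribed from the factory table above (address -> pass identity).
FACTORIES = {
    0x1833EB0: "TailCallElim / JumpThreading",
    0x1857160: "NVVMReflect",
    0x1A13320: "NVVMRematerialization (IR-level)",
    0x1CB4E40: "NVVMIntrinsicLowering (~10 insertions)",
    0x1CE7DD0: "NVVMRematerialization (main)",
}

def factory_label(identity: str) -> str:
    """Turn a table entry into an identifier such as 'factory_NVVMReflect'."""
    core = re.sub(r"\(.*?\)", "", identity)     # drop parenthetical notes
    core = re.sub(r"[^0-9A-Za-z]+", "_", core)  # keep letters, digits, '_'
    return "factory_" + core.strip("_")

def rename_plan(factories: dict[int, str]) -> list[tuple[int, str]]:
    """Return (address, label) pairs in address order with unique labels."""
    plan, seen = [], set()
    for ea, identity in sorted(factories.items()):
        label = factory_label(identity)
        if label in seen:
            label = f"{label}_{ea:X}"  # disambiguate collapsed duplicates
        seen.add(label)
        plan.append((ea, label))
    return plan
```

In an IDAPython session, you could apply the plan with something like `for ea, name in rename_plan(FACTORIES): idc.set_name(ea, name, idc.SN_NOWARN)`.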

Embedded Data Payloads

Libdevice Bitcode

Two identical copies of NVIDIA's libdevice are embedded directly in the .rodata section as raw LLVM bitcode. Each copy is approximately 456 KB and contains around 400 math intrinsic implementations (__nv_sinf, __nv_expf, __nv_sqrtf, etc.). The duplication supports the dual-path architecture: Path A (LibNVVM API mode) references one copy at 0x3EA0080; Path B (standalone mode) references the other at 0x420FD80. The bitcode is linked into the user's module during the LNK phase via the bitcode linker at sub_12C06E0.
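Because the libdevice copies are stored as raw bitcode, you can locate them by scanning for the LLVM bitcode magic:

- raw bitcode starts with the bytes `'B'`, `'C'`, `0xC0`, `0xDE`;
- wrapped bitcode starts with the little-endian u32 `0x0B17C0DE`.

The sketch below returns file offsets only. Mapping them to virtual addresses such as 0x3EA0080 requires the PT_LOAD program headers. A four-byte match can also occur by chance in code or unrelated data, so treat hits as candidates. `find_bitcode` is a hypothetical helper name.

```python
RAW_BC_MAGIC = b"BC\xc0\xde"         # raw LLVM bitcode: 'B', 'C', 0xC0, 0xDE
WRAPPER_MAGIC = b"\xde\xc0\x17\x0b"  # bitcode wrapper header 0x0B17C0DE, little-endian

def find_bitcode(data: bytes) -> list[tuple[int, str]]:
    """Return (file offset, kind) for every bitcode magic found, in offset order."""
    hits = []
    for magic, kind in ((RAW_BC_MAGIC, "raw"), (WRAPPER_MAGIC, "wrapper")):
        pos = data.find(magic)
        while pos != -1:
            hits.append((pos, kind))
            pos = data.find(magic, pos + 1)
    return sorted(hits)
```

Two raw hits about 3.4 MB apart (0x420FD80 - 0x3EA0080) would be consistent with the Path A and Path B copies.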

String Tables

IDA Pro extracts 188,141 strings from the binary. These fall into several categories:

Category	Approximate count	Example
LLVM `cl::opt` descriptions	~1,689	`"Enable aggressive reassociation"`
LLVM error/diagnostic messages	~5,000	`"Invalid bitcode signature"`
EDG error messages	~2,500	`"expected a declaration"`
LLVM pass names	~440	`"instcombine"`, `"gvn"`, `"nvvm-memspace-opt"`
PTX instruction templates	~800	`"mov.b32 %0, %1;"`
NVVM builtin names	~770	`"__nvvm_atom_cas_gen_i"`
jemalloc config strings	~200	`"background_thread"`, `"dirty_decay_ms"`
NVVM container field names	~144	`"SmMajor"`, `"FastMath.Ftz"`
Miscellaneous (format strings, assertions)	~170,000+	`"%s:%d: assertion failed"`

String cross-referencing is the single most productive technique for identifying functions in a stripped binary. The LLVM pass registration pattern is especially reliable: a string like "nvvm-memspace-opt" appears exactly once, in the constructor of that pass, which IDA locates via xref.
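Outside IDA, you can approximate the same cross-reference lookup with a byte scan. The binary appears to be non-PIE, loaded at 0x400000, so its addresses fit in 32 bits. Code may therefore reference a string in either of two ways:

- through an absolute 32-bit immediate (for example `mov esi, imm32`);
- through a RIP-relative `lea`.

Pointer tables store a 64-bit little-endian value whose low four bytes match the same immediate. The heuristic below checks both encodings. `find_refs` is a hypothetical helper. It misses other addressing forms, can report false positives, and is slow in pure Python on the full .text, so confirm each hit in IDA.

```python
import struct

def find_refs(text: bytes, text_va: int, target_va: int) -> list[tuple[int, str]]:
    """Heuristically find references to target_va (must fit in 32 bits).

    text    -- raw bytes of the section being scanned
    text_va -- virtual address at which `text` is loaded
    Returns (virtual address, kind) pairs sorted by address.
    """
    refs = []

    # (a) Absolute 32-bit value: `mov r32, imm32` operands and the low half of
    # 64-bit pointers. The reported address is where the 4 bytes start.
    needle = struct.pack("<I", target_va)
    pos = text.find(needle)
    while pos != -1:
        refs.append((text_va + pos, "abs32"))
        pos = text.find(needle, pos + 1)

    # (b) RIP-relative `lea r64, [rip + disp32]`: REX.W prefix (0x48-0x4F),
    # opcode 0x8D, ModRM with mod=00 and rm=101. The instruction is 7 bytes
    # long; its target is the address of the next instruction plus disp32.
    for i in range(len(text) - 6):
        rex, opcode, modrm = text[i], text[i + 1], text[i + 2]
        if rex & 0xF8 == 0x48 and opcode == 0x8D and modrm & 0xC7 == 0x05:
            (disp,) = struct.unpack_from("<i", text, i + 3)
            if text_va + i + 7 + disp == target_va:
                refs.append((text_va + i, "rip_lea"))

    return sorted(refs)
```

To use it, first find the string's address, for instance the offset of `b"nvvm-memspace-opt\x00"` in .rodata plus that section's load address. Then scan .text for references to it.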

NVVM Container Format

The binary includes a proprietary container format for wrapping LLVM bitcode with compilation metadata. The container uses a 24-byte binary header with magic 0x7F4E5C7D, followed by delta-encoded tag/value pairs (only fields that differ from defaults are serialized). There are 144 distinct tag IDs spanning core options (tags 1-39), compression metadata (tag 99), extended target options (tags 101-173), blob data (tags 201-218), and structured hardware descriptors (tags 401-402 for TMA/TCGen05 configurations). Serialization and deserialization are handled by sub_CDD2D0 and sub_CD1D80 respectively.
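The tag ranges and the rule of omitting default values can be sketched compactly. The code below classifies a tag by range and shows delta encoding in the abstract: only (tag, value) pairs that differ from a defaults table are kept. Several details are assumptions for illustration:

- the byte order of the magic (a little-endian u32 at offset 0);
- the integer-valued tags;
- the helper names (`tag_category`, `has_container_magic`, `delta_encode`).

The actual header fields and the on-disk tag/value encoding are not reproduced here.

```python
import struct

NVVM_CONTAINER_MAGIC = 0x7F4E5C7D
HEADER_SIZE = 24

# Tag ranges as documented above (inclusive bounds).
TAG_RANGES = [
    (1, 39, "core option"),
    (99, 99, "compression metadata"),
    (101, 173, "extended target option"),
    (201, 218, "blob data"),
    (401, 402, "hardware descriptor (TMA/TCGen05)"),
]

def tag_category(tag: int) -> str:
    for lo, hi, name in TAG_RANGES:
        if lo <= tag <= hi:
            return name
    return "unknown"

def has_container_magic(blob: bytes) -> bool:
    # ASSUMPTION: the magic is stored as a little-endian u32 at offset 0.
    return (len(blob) >= HEADER_SIZE
            and struct.unpack_from("<I", blob, 0)[0] == NVVM_CONTAINER_MAGIC)

def delta_encode(options: dict[int, int], defaults: dict[int, int]) -> list[tuple[int, int]]:
    """Keep only (tag, value) pairs whose value differs from the default, sorted by tag."""
    return sorted((tag, value) for tag, value in options.items()
                  if defaults.get(tag) != value)
```

Omitting defaults keeps typical containers small. The price is that a reader must know the exact default table of the writer's version, which presumably explains why a version validator (sub_CCD5F0) exists.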

jemalloc Integration

NVIDIA statically links jemalloc 5.3.x as the process-wide memory allocator. The jemalloc functions cluster around 0x12FC000 (approximately 400 functions). The configuration initialization function sub_12FCDB0 (129 KB, one of the largest functions in the binary) references 199 configuration-option strings. It matches these against the settings supplied through the MALLOC_CONF environment variable.
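MALLOC_CONF uses jemalloc's comma-separated `name:value` syntax, for example `background_thread:true,dirty_decay_ms:1000`. The sketch below splits such a string. It can optionally check each name against a set of known option names, standing in for the 199 strings inside sub_12FCDB0. This is a simplification for illustration. It does not reproduce jemalloc's value parsing, and it simply raises on malformed input rather than mirroring the real function's diagnostics.

```python
from typing import Dict, Optional, Set


def parse_malloc_conf(conf: str, known: Optional[Set[str]] = None) -> Dict[str, str]:
    """Split a MALLOC_CONF string ("name:value,name:value") into a dict."""
    options: Dict[str, str] = {}
    for item in conf.split(","):
        if not item:
            continue  # tolerate empty input and stray commas
        name, sep, value = item.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in {item!r}")
        if known is not None and name not in known:
            raise ValueError(f"unknown option {name!r}")
        options[name] = value
    return options
```

With stock jemalloc, `MALLOC_CONF=stats_print:true` prints allocator statistics at exit. That makes it a cheap probe for whether the embedded copy reads the variable.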

Key jemalloc entry points visible in the binary:

Function	Address
`malloc_conf_init` (199 config strings)	`0x12FCDB0`
`vsnprintf` (jemalloc stats formatting)	`0x40D5CA`
Core arena management, tcache, extent allocator	`0x12FC000` range

The jemalloc integration is significant for reverse engineering because it means malloc/free calls throughout the binary resolve to jemalloc's arena-based allocator rather than glibc's ptmalloc2. When tracing memory allocation patterns in IDA, look for calls into the 0x12FC000 range.

Global Constructors

The region from 0x430000 to 0x5CFFFF (~1.6 MB) is dominated by global constructors that execute before main(). The primary purpose of these constructors is LLVM cl::opt registration: approximately 1,689 command-line option objects are initialized, each registering a string name, description, default value, and storage location into LLVM's global option registry.

The .init_array section contains function pointers to these constructors. They execute in linker-determined order and populate a global hash table that sub_8F9C90 (the real main) later queries during CLI parsing. In IDA Pro, navigating to any cl::opt constructor reveals the option name string and its associated global variable, which is invaluable for understanding what flag controls what behavior.
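On x86-64, .init_array is an array of 8-byte little-endian function pointers, so listing the constructors takes only a few lines. The sketch below assumes you already have the raw section bytes. One way to get them is `objcopy -O binary --only-section=.init_array cicc init_array.bin`, which works as long as the section headers survived stripping. Pointers that fall outside 0x430000-0x5CFFFF are worth inspecting. The libc and jemalloc initializers are plausible candidates, but that is a guess.

```python
import struct
from typing import List, Tuple

# Inclusive bounds of the global-constructor region from the address map.
CTOR_LO, CTOR_HI = 0x430000, 0x5CFFFF


def decode_init_array(raw: bytes) -> List[int]:
    """Decode .init_array bytes as little-endian 64-bit function pointers."""
    if len(raw) % 8:
        raise ValueError(".init_array length must be a multiple of 8 on x86-64")
    return [ptr for (ptr,) in struct.iter_unpack("<Q", raw)]


def split_by_region(ptrs: List[int]) -> Tuple[List[int], List[int]]:
    """Separate pointers inside the constructor region from everything else."""
    inside = [p for p in ptrs if CTOR_LO <= p <= CTOR_HI]
    outside = [p for p in ptrs if not CTOR_LO <= p <= CTOR_HI]
    return inside, outside
```

Typical use is `split_by_region(decode_init_array(open("init_array.bin", "rb").read()))`. The list order matches the order in which the entries appear in the section.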

Additional global constructors handle:

LLVM pass registration (RegisterPass<T> and PassInfo objects)
LLVM target initialization (NVPTX target machine factory)
jemalloc allocator bootstrapping
EDG frontend static initialization tables

Dual-Path Code Duplication

A distinctive structural feature of the binary is the presence of two near-complete copies of the NVVM bridge and backend entry points. Path A (LibNVVM API mode) lives around 0x90xxxx; Path B (standalone/nvcc mode) lives around 0x126xxxx. Each path has its own:

Component	Path A	Path B
Simple compile entry	`sub_902D10`	`sub_1262860`
Multi-stage pipeline	`sub_905EE0` (43 KB)	`sub_1265970` (48 KB)
CLI parsing	`sub_900130`	`sub_125FB30`
Builtin resolution table	`sub_90AEE0` (109 KB)	`sub_126A910` (123 KB)
Embedded libdevice ref	`unk_3EA0080`	`unk_420FD80`
Version string	`nvvm-latest`	`nvvm70`

In IDA, if you have identified a function in one path, search for a structurally similar function at the corresponding offset in the other path. The code is not byte-identical -- Path B is generally slightly larger due to additional standalone-mode logic -- but the control flow graphs are nearly congruent.
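The "corresponding offset" search can be made concrete by interpolating between the known Path A/Path B pairs from the table. The four anchors appear in the same order in both paths. The sketch below therefore interpolates linearly between the surrounding pair, or extrapolates from the nearest pair, to produce a starting address in Path B. This is a navigation heuristic, not a mapping that cicc defines. Function sizes differ between the paths, so inspect the functions around the estimate rather than expecting an exact hit.

```python
import bisect

# (Path A, Path B) anchor pairs from the table above, sorted by address.
ANCHORS = [
    (0x900130, 0x125FB30),  # CLI parsing
    (0x902D10, 0x1262860),  # simple compile entry
    (0x905EE0, 0x1265970),  # multi-stage pipeline
    (0x90AEE0, 0x126A910),  # builtin resolution table
]
_A_ADDRS = [a for a, _ in ANCHORS]


def estimate_path_b(addr_a: int) -> int:
    """Linearly interpolate a Path B search start from a Path A address.

    Addresses outside the anchor span are extrapolated from the nearest segment.
    """
    i = bisect.bisect_right(_A_ADDRS, addr_a) - 1
    i = min(max(i, 0), len(ANCHORS) - 2)  # clamp to a valid segment
    (a0, b0), (a1, b1) = ANCHORS[i], ANCHORS[i + 1]
    return b0 + (addr_a - a0) * (b1 - b0) // (a1 - a0)
```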

When opening cicc in IDA Pro for the first time, the auto-analysis will take several minutes due to the 60 MB size. The following workflow accelerates orientation:

Start with strings. Open the Strings window (Shift+F12), filter for known LLVM pass names ("instcombine", "gvn", "nvvm-"). Each xref leads directly to a pass constructor or registration site.
Use the address map above. If you are looking at an address in the 0xC00000-0x12CFFFF range, you are in LLVM support libraries and mid-level optimization code (SCEV, SimplifyCFG, InstCombine), which ends with the NVVM bridge. The 0x3000000-0x36FFFFF range is NVPTX instruction selection. The 0x5D0000-0x8EFFFF range is EDG. Context narrows the search space immediately.
Watch for vtable patterns. LLVM passes are C++ classes with virtual methods, and IDA's vtable reconstruction reveals their inheritance hierarchies. Each FunctionPass, ModulePass, and LoopPass subclass places its runOnFunction, runOnModule, or runOnLoop override at the same vtable slot offset as its base class.
Anchor on mega-functions. The largest functions are the easiest to locate and serve as landmarks: sub_A939D0 (457 KB, X86 AutoUpgrade), sub_10EE7A0 (396 KB, InstCombine), sub_20019C0 (341 KB, LegalizeTypes). These anchors partition the address space.
Follow the pipeline. Entry at sub_8F9C90 calls into EDG at sub_5D2A80, pipeline assembly at sub_12E54A0, and PTX emission starting at 0x2100000. Tracing callgraph edges from these known entry points maps out the entire compilation flow.
Mark jemalloc early. Identifying and labeling the jemalloc cluster at 0x12FC000 prevents wasted time reverse-engineering well-known allocator internals. The 199-string malloc_conf_init function is an unmistakable fingerprint.
Locate NVIDIA passes via factory addresses. The Pass Factory Address Summary table above maps every pipeline-inserted pass to its constructor address. In IDA, setting a breakpoint at sub_12DE0B0 (AddPass) and logging the second argument reveals the exact pass insertion order at runtime.
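Tracing from the known entry points is a graph search over caller-to-callee edges, which you can export from IDA's cross-references. The minimal sketch below finds a shortest call chain with breadth-first search. The example graph is illustrative. It treats the entry-point calls listed above as direct edges. The edge from sub_12E54A0 to sub_12DE0B0 (AddPass) is a hypothetical placeholder, not a verified call.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional


def call_path(
    callgraph: Dict[int, Iterable[int]], entry: int, target: int
) -> Optional[List[int]]:
    """Return a shortest caller->callee chain from entry to target, or None."""
    parent: Dict[int, Optional[int]] = {entry: None}
    queue = deque([entry])
    while queue:
        fn = queue.popleft()
        if fn == target:
            path = []
            node: Optional[int] = fn
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for callee in callgraph.get(fn, ()):
            if callee not in parent:  # first visit records the shortest parent
                parent[callee] = fn
                queue.append(callee)
    return None


# Illustrative edges: main -> EDG and pipeline assembly; the last edge is hypothetical.
EXAMPLE_GRAPH = {
    0x8F9C90: [0x5D2A80, 0x12E54A0],
    0x12E54A0: [0x12DE0B0],
}
```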

Master Address-Range Map

This is the definitive quick reference for "what lives at address X?". It lists every major address range in the cicc v13.0 binary, sorted by start address and consolidated from all subsystem pages in this wiki.

.text Section (0x400000 - 0x3BFFFFF)

Start	End	Size	Subsystem	Zone
`0x400000`	`0x40CFFF`	52 KB	CRT startup (`_start`, libc stubs)	1
`0x40D000`	`0x41FFFF`	76 KB	jemalloc stats (`vsnprintf` at `sub_40D5CA`)	1
`0x420000`	`0x42FFFF`	64 KB	libc helpers (memcpy, memset, strlen, math)	1
`0x430000`	`0x5CFFFF`	1.6 MB	Global constructors (~1,689 `cl::opt` registrations, pass/target init)	2
`0x5D0000`	`0x8EFFFF`	3.2 MB	EDG 6.6 C++ Frontend (parser, constexpr, templates, IL walkers, SARIF, preprocessor)	3
`0x8F0000`	`0x8FFFFF`	64 KB	Real main / CLI (`sub_8F9C90` entry, flag mapping, XOR deobfuscator)	4
`0x900000`	`0x92FFFF`	192 KB	Path A entry (LibNVVM API: CLI parse, pipeline driver, builtin tables)	4
`0x930000`	`0x95FFFF`	192 KB	Path A builtins (pre-opt builtin lowering, 770-entry resolution)	4
`0x960000`	`0x9EFFFF`	576 KB	Architecture detection (`-arch` fan-out, NVVM option parsing)	4
`0x9F0000`	`0xAEFFFF`	1 MB	Bitcode reader (`parseFunctionBody` 166KB, metadata reader 121KB)	5
`0xAF0000`	`0xBEFFFF`	1 MB	X86 AutoUpgrade (`sub_A939D0` 457KB -- legacy intrinsic upgrader)	5
`0xBF0000`	`0xBFFFFF`	64 KB	LLVM IR Verifier (entry points, `visitCallInst` 207KB)	5
`0xC00000`	`0xCAFFFF`	704 KB	LLVM Support/ADT (APInt, CommandLine, ConstantRange, JSON, Timer, YAML, VFS)	6
`0xCB0000`	`0xCBFFFF`	64 KB	YAML parser/emitter (libyaml)	7
`0xCC0000`	`0xCCCFFF`	52 KB	LLVM Triple parsing (`Triple_normalize` 35KB)	7
`0xCCD000`	`0xCDFFFF`	76 KB	NVVM container format (serialize `sub_CDD2D0`, deserialize `sub_CD1D80`, 144 tags)	7
`0xCE0000`	`0xD5FFFF`	512 KB	NVVM options (container validators, option parsers)	7
`0xD60000`	`0xD82FFF`	140 KB	NV Module Summary / LTO (`buildModuleSummary` 74KB, `runOnModule` 56KB)	7
`0xD83000`	`0xDFFFFF`	500 KB	ScalarEvolution (SCEV) (AddRecExpr, backedge analysis, trip counts)	7
`0xE00000`	`0xE0FFFF`	64 KB	DWARF debug info (string/enum tables)	7
`0xE10000`	`0xE2FFFF`	128 KB	Itanium name demangler (`parseExpr` 47KB)	7
`0xE30000`	`0xEBFFFF`	576 KB	MC assembler layer (ELF/COFF/MachO section parsers, expression evaluator)	7
`0xEC0000`	`0xECFFFF`	64 KB	MC directives (`sub_ECB300` ELF section parser 40KB)	7
`0xED0000`	`0xEF7FFF`	160 KB	InstrProf / MemProf reader (profiling data infrastructure)	7
`0xEF8000`	`0xF04FFF`	52 KB	Bitstream remark serialization	7
`0xF05000`	`0xF6FFFF`	428 KB	SelectionDAG infrastructure (DAG node creation, SDValue, EVT/MVT helpers)	7
`0xF70000`	`0xF8FFFF`	128 KB	Loop vectorization runtime checks (`vectorizeLoop` 37KB, `canVectorizeMemory` 29KB)	7
`0xF90000`	`0xFCFFFF`	256 KB	SimplifyCFG + code sinking (switch table gen, speculative exec)	7
`0xFD0000`	`0xFEFFFF`	128 KB	AliasSet / register pressure (CFG graphviz)	7
`0xFF0000`	`0x101FFFF`	192 KB	Block scheduling (RPO traversal, constant folding)	7
`0x1020000`	`0x103FFFF`	128 KB	Inline ASM + scheduling model (CUTLASS kernel detection 41KB)	7
`0x1040000`	`0x106FFFF`	192 KB	Divergence analysis (DAG utilities, IR linker)	7
`0x1070000`	`0x10CFFFF`	384 KB	MC object emission + InstructionSimplify (`visitAdd` 94KB)	7
`0x10D0000`	`0x122FFFF`	1.4 MB	InstCombine mega-region (main visitor 396KB, KnownBits 125KB, SimplifyLibCalls, LLParser)	8
`0x1230000`	`0x12CFFFF`	640 KB	NVVM Bridge / IR codegen (AST-to-IR, Path B entry, builtin tables, bitcode linker)	9
`0x12D0000`	`0x12FBFFF`	176 KB	Pipeline builder (NVVMPassOptions 125KB, `AddPass`, tier builders, master assembler 50KB)	10
`0x12FC000`	`0x133FFFF`	272 KB	jemalloc core (~400 functions, `malloc_conf_init` 129KB)	10
`0x1340000`	`0x16FFFFF`	3.8 MB	IR infrastructure / PassManager (IR types, constants, instructions, metadata, execution engine, IR linker)	11
`0x1700000`	`0x17FFFFF`	1 MB	InstCombine (NewPM) + Sanitizers + PGO (MSan, TSan, coverage, GCov)	12
`0x1800000`	`0x18DFFFF`	896 KB	Standard scalar passes (InstructionCombining, TailCallElim, FunctionAttrs, SCCP, Sink, MemorySSA)	13
`0x18E0000`	`0x18FFFFF`	128 KB	DCE / CVP / DSE (Dead Code Elimination, CorrelatedValuePropagation, Dead Store Elimination)	13
`0x1900000`	`0x193FFFF`	256 KB	GVN family (`runOnFunction` 83KB, PRE 26KB, NewGVN 43KB)	13
`0x1940000`	`0x19FFFFF`	768 KB	Scalar passes continued (LICM, LoopRotate, LoopIndexSplit, LoopUnroll, SROA)	13
`0x1A00000`	`0x1AFFFFF`	1 MB	NVVMRematerialization / LLVM standard pipeline / InstructionSimplify	13
`0x1B00000`	`0x1B7FFFF`	512 KB	Loop unrolling + switch lowering (main driver 68KB, Unroll-and-Jam 55KB, peeling 39KB)	13
`0x1B80000`	`0x1BFFFFF`	512 KB	Loop/SLP vectorizer (LoopVectorize 43KB, VPlan 32KB, SLP 47KB+62KB)	13
`0x1C00000`	`0x1C3FFFF`	256 KB	NVVM module validation + config (codegen config 33KB, compile mode 28KB, intrinsic lowering 112KB, module validator 48KB)	13
`0x1C40000`	`0x1CFFFFF`	768 KB	NVIDIA custom IR passes (dead-sync-elim, common-base-elim, base-addr-sr, memspace-opt, loop-index-split, printf-lowering, iv-demotion, remat, peephole, sinking2, NLO)	13
`0x1D00000`	`0x1DFFFFF`	1 MB	SelectionDAG ISel / CodeGenPrepare (bytecode interpreter 97KB, address sinking 65KB)	14
`0x1E00000`	`0x1EFFFFF`	1 MB	Register allocation infrastructure (Greedy RA, live intervals, spill cost)	14
`0x1F00000`	`0x1FFFFFF`	1 MB	Backend codegen infrastructure (ScheduleDAG, ShrinkWrapping, SpillPlacement, register coalescer, TwoAddressInstruction)	15
`0x2000000`	`0x20FFFFF`	1 MB	LegalizeTypes (`sub_20019C0` 341KB -- third largest function)	15
`0x2100000`	`0x21FFFFF`	1 MB	NVPTX target backend (AsmPrinter, PTX emission, MMA/tensor codegen, atomics, TargetMachine)	16
`0x2200000`	`0x233FFFF`	1.25 MB	(gap: misc codegen, late passes)	--
`0x2340000`	`0x23FFFFF`	768 KB	New PM pass registration (master registrar 2,816 lines, 526 passes, pipeline text parser)	17
`0x2400000`	`0x258FFFF`	1.6 MB	Attributor framework (`runTillFixpoint` 53KB)	18
`0x2590000`	`0x265FFFF`	832 KB	Sanitizer instrumentation (ASan, HWASan)	18
`0x2660000`	`0x269FFFF`	256 KB	OpenMP target offloading (194-entry `__kmpc_*` table, Generic-to-SPMD 61KB, state machine 41KB)	18
`0x26A0000`	`0x29FFFFF`	3.5 MB	Coroutines / LTO infrastructure / PGO lowering / EarlyCSE / SROA (NewPM)	18
`0x2A00000`	`0x2CFFFFF`	3 MB	Loop transforms (LoopPeeling, LoopRotation, UnrollLoop, IndVarSimplify, dead-sync-elim island)	19
`0x2D00000`	`0x2FFFFFF`	3 MB	Codegen target options / SelectionDAG lowering (TargetOptions 112KB, DAG combine, type legalization)	20
`0x3000000`	`0x36FFFFF`	7 MB	NVPTX ISel + DAG lowering (NVPTXTargetLowering 111KB, intrinsic switch 343KB, register info)	21
`0x3700000`	`0x37AFFFF`	704 KB	Table-driven instruction selector (main matcher 138KB, per-SM opcode gating)	22
`0x37B0000`	`0x38FFFFF`	1.3 MB	Late machine passes (inliner cost model at `0x38576C0`, pipeline helpers)	22
`0x3900000`	`0x397FFFF`	512 KB	NVIDIA machine-level passes (register pressure, remat, ABI preserve, GEP split, AsmPrinter/PTX emission)	22
`0x3980000`	`0x399FFFF`	128 KB	MC layer / DWARF emission (object file writers, DWARF sections at `0x3990000`-`0x39DF000`)	22
`0x39A0000`	`0x3BFFFFF`	2.4 MB	Trailing codegen (section management, CRT finalization)	22

.rodata / .data Sections (0x3C00000+)

Start	End	Size	Contents
`0x3C00000`	`0x3EAFFFF`	~2.7 MB	Read-only data (strings, jump tables, XOR-encrypted env vars at `0x3C23A7B`)
`0x3EA0080`	`0x3F1FFFF`	456 KB	Embedded libdevice bitcode (Path A)
`0x3F252E0`	`0x3F3E6C0`+	varies	NVPTX tables (constraint type table, constraint word table, MVT tables)
`0x420FD80`	`0x428FFFF`	456 KB	Embedded libdevice bitcode (Path B)
`0x42812C0`	--	varies	Obfuscated version strings (XOR+ROT13 ciphertext)
`0x444C4A0`	`0x4456580`+	varies	MVT tables (operand type, vector element count, scalarized MVT)
`0x4F00000`+	--	large	BSS (`cl::opt` storage, hash tables, global state)

Usage

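Given an IDA address, find the row whose Start <= address <= End, where End is the last address in the range (inclusive). The Subsystem column tells you which component of cicc you are looking at. For pass-level detail within a zone, jump to the corresponding Zone section above. The .rodata/.data rows are coarser and some of them overlap, so treat them as approximate. For example, the read-only data row runs past the start of the Path A libdevice blob.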
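The lookup rule is a binary search over sorted, non-overlapping ranges. The sketch below copies a handful of .text rows from the table, with the subsystem names shortened, and treats End as inclusive. Extending `RANGES` with the remaining rows is mechanical. The helper name `subsystem_at` is illustrative.

```python
import bisect
from typing import List, Optional, Tuple

# (start, end_inclusive, subsystem): a subset of rows from the .text table.
RANGES: List[Tuple[int, int, str]] = [
    (0x400000, 0x40CFFF, "CRT startup"),
    (0x5D0000, 0x8EFFFF, "EDG 6.6 C++ Frontend"),
    (0x8F0000, 0x8FFFFF, "Real main / CLI"),
    (0x10D0000, 0x122FFFF, "InstCombine mega-region"),
    (0x12FC000, 0x133FFFF, "jemalloc core"),
    (0x2000000, 0x20FFFFF, "LegalizeTypes"),
    (0x3000000, 0x36FFFFF, "NVPTX ISel + DAG lowering"),
]
_STARTS = [start for start, _, _ in RANGES]


def subsystem_at(addr: int) -> Optional[str]:
    """Return the subsystem whose inclusive range contains addr, if any."""
    i = bisect.bisect_right(_STARTS, addr) - 1
    if i >= 0 and addr <= RANGES[i][1]:
        return RANGES[i][2]
    return None
```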

Cross-References

Pipeline Overview -- compilation flow from entry to PTX emission
LLVM Pipeline -- 526-pass registration table and tier execution order
Optimizer -- two-phase model, AddPass mechanism, tier system
Pass Inventory -- complete pass catalog with dedicated deep-dive pages
NVVMPassOptions -- 222-slot pass configuration system
Function Map -- address-to-identity lookup table
CLI Flags -- flag-to-pipeline routing


CICC Reverse Engineering Reference