Pipeline & Pass Ordering
CICC v13.0 implements the LLVM New Pass Manager pipeline infrastructure, with NVIDIA injecting 33 custom passes into the registration table alongside approximately 493 standard LLVM passes. The master registration function at sub_2342890 populates a StringMap<PassInfo> hash table with every known pass name at startup, and a text-based pipeline parser allows the full pass ordering to be specified as a parenthesized string (e.g., module(function(instcombine,dse))). This page documents the complete pass inventory, the registration mechanism, the NVIDIA-specific additions, and — critically — the runtime pass execution order for each optimization level including the tier system and pass factory addresses.
| Component | Function / Detail |
|---|---|
| Master registration | sub_2342890 (0x2342890, ~2,816 lines) |
| Hash table insert | sub_E41FB0 (0xE41FB0) -- open-addressing, 48-byte entries |
| String equality | sub_9691B0 (0x9691B0) -- len==len && memcmp==0 |
| AA name resolver | sub_233BD40 (0x233BD40) -- chain of string comparisons |
| AA pipeline parser | sub_233C0C0 (0x233C0C0) -- splits on commas, special-cases "default" |
| Extension callback | sub_233C300 (0x233C300) -- iterates [PassBuilder+2208], stride 32 |
| Option parser | sub_233A120 (0x233A120) -- splits on semicolons, validates tokens |
| Help/listing | sub_233C410 (0x233C410) -- --print-pipeline-passes handler |
| Pipeline assembler | sub_12E54A0 (0x12E54A0, 49.8KB, 1,553 lines) |
| AddPass | sub_12DE0B0 (0x12DE0B0, hash-based pass insertion) |
| Tier 0 sub-pipeline | sub_12DE330 (0x12DE330, ~40 passes) |
| Tier 1/2/3 sub-pipeline | sub_12DE8F0 (0x12DE8F0, phase-conditional) |
| Codegen dispatch | sub_12DFE00 (0x12DFE00, 20.7KB) |
| Total passes | ~526 unique registrations |
| NVIDIA additions | 33 passes (12 module, 20 function, 1 loop) |
Registration Architecture
The pipeline infrastructure follows the standard LLVM New Pass Manager design. At startup, sub_2342890 is called once and inserts every known pass into a StringMap living at [PassBuilder+8]. The insertion function sub_E41FB0 uses open-addressing with linear probing; each entry occupies 48 bytes containing the key pointer, key length, value pointer, value length, and 16 bytes of inline storage for short class names.
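The 48-byte entry described above can be modelled with ctypes as a rough illustration. The field names and their order are assumptions made for this sketch; only the sizes (four 8-byte fields plus 16 bytes of inline storage) come from the description.

```python
import ctypes


class PassEntry(ctypes.Structure):
    """Hypothetical model of one 48-byte registration entry."""
    _fields_ = [
        ("key_ptr", ctypes.c_uint64),        # pointer to the pass name
        ("key_len", ctypes.c_uint64),        # length of the pass name
        ("value_ptr", ctypes.c_uint64),      # pointer to the class name (assumed)
        ("value_len", ctypes.c_uint64),      # length of the class name (assumed)
        ("inline_buf", ctypes.c_char * 16),  # inline storage for short class names
    ]


ENTRY_SIZE = ctypes.sizeof(PassEntry)


def entry_offset(index: int) -> int:
    """Byte offset of the entry at `index` in a flat array of entries."""
    return index * ENTRY_SIZE
```

With this layout the four 8-byte fields occupy the first 32 bytes and the inline buffer fills the remaining 16, which matches the stated entry size.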
Pass lookup during pipeline parsing uses the hash function at sub_C94890 (likely DJB/FNV-family). Parameterized passes are detected by the presence of <...> angle brackets after the pass name; the parameter string is extracted and forwarded to a pass-specific callback. The generic parameter validator sub_233A120 splits option strings on semicolons and compares each token to expected values, emitting "invalid {PassName} pass parameter '{token}'" on mismatch.
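The parameter handling described above can be sketched in a few lines of Python. This is a minimal model, not the decompiled logic: the helper names are hypothetical, the allowed-option set is passed in by the caller, and names that themselves contain angle brackets (such as invalidate<all>) are not special-cased.

```python
from typing import List, Optional, Set, Tuple


def split_pass_params(element: str) -> Tuple[str, Optional[str]]:
    """Split 'name<params>' into (name, params); params is None if absent."""
    if element.endswith(">") and "<" in element:
        name, _, rest = element.partition("<")
        return name, rest[:-1]  # drop the trailing '>'
    return element, None


def validate_params(pass_name: str, params: str, allowed: Set[str]) -> List[str]:
    """Split on semicolons and reject any token outside the allowed set."""
    tokens = [t for t in params.split(";") if t]
    for token in tokens:
        if token not in allowed:
            raise ValueError(f"invalid {pass_name} pass parameter '{token}'")
    return tokens
```

For example, `split_pass_params("instcombine<max-iterations=1>")` yields the name `instcombine` and the parameter string `max-iterations=1`, which would then be handed to the pass-specific callback.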
The alias analysis pipeline has its own parser at sub_233C0C0. It special-cases the string "default" (which calls sub_23A1380 then sub_23038C0 to build the default AA stack), and otherwise splits on commas, resolving each name through sub_233BD40:
| AA Name | Constructor |
|---|---|
| globals-aa | sub_2396EC0 |
| basic-aa | sub_2361CE0 |
| objc-arc-aa | sub_2361F60 |
| scev-aa | sub_2362040 |
| scoped-noalias-aa | sub_2362120 |
| tbaa | sub_2362200 |
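The resolution logic above can be sketched as a lookup table plus a special case. In this model the constructors are represented by their address labels only, and the error message for an unknown name is hypothetical, since the source does not quote one.

```python
from typing import List

# Name -> constructor label, taken from the table above.
AA_CONSTRUCTORS = {
    "globals-aa": "sub_2396EC0",
    "basic-aa": "sub_2361CE0",
    "objc-arc-aa": "sub_2361F60",
    "scev-aa": "sub_2362040",
    "scoped-noalias-aa": "sub_2362120",
    "tbaa": "sub_2362200",
}


def parse_aa_pipeline(text: str) -> List[str]:
    """Return the constructor labels an AA pipeline string would invoke."""
    if text == "default":
        # The default stack is built by these two calls, in this order.
        return ["sub_23A1380", "sub_23038C0"]
    constructors = []
    for name in text.split(","):
        if name not in AA_CONSTRUCTORS:
            raise ValueError(f"unknown alias analysis name '{name}'")  # hypothetical message
        constructors.append(AA_CONSTRUCTORS[name])
    return constructors
```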
Extension callbacks for target-specific pipeline customization are stored at [PassBuilder+2208] with a count at [PassBuilder+2216]. Each entry is 32 bytes with a guard at offset +16 (must be non-null) and the callback function pointer at offset +24. The string "all" in extension context triggers invalidate<all>.
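As an illustration of the entry layout, the following sketch walks a raw byte buffer with a 32-byte stride and reads the guard and callback fields. Skipping entries whose guard is null is an assumption (the real code may treat that as an error instead), the contents of the first 16 bytes are unknown, and the pointer values in the example are made up.

```python
import struct
from typing import List

ENTRY_STRIDE = 32     # bytes per extension entry
GUARD_OFFSET = 16     # guard field, must be non-null
CALLBACK_OFFSET = 24  # callback function pointer


def live_callbacks(table: bytes, count: int) -> List[int]:
    """Return callback pointers of entries whose guard is non-null."""
    callbacks = []
    for i in range(count):
        base = i * ENTRY_STRIDE
        (guard,) = struct.unpack_from("<Q", table, base + GUARD_OFFSET)
        (callback,) = struct.unpack_from("<Q", table, base + CALLBACK_OFFSET)
        if guard != 0:  # assumed handling of the guard check
            callbacks.append(callback)
    return callbacks


# Two made-up entries: the first has a guard set, the second does not.
example_table = (
    struct.pack("<QQQQ", 0, 0, 1, 0x1000)
    + struct.pack("<QQQQ", 0, 0, 0, 0x2000)
)
```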
Pipeline Text Parser
The pipeline text parser accepts a nesting grammar where each level specifies the pass manager scope:
    module(
      function(
        instcombine<max-iterations=1>,
        dse,
        loop(indvars, loop-deletion)
      ),
      globalopt
    )
The parser splits on commas and parentheses, recognizing module(...), cgscc(...), function(...), and loop(...) as scope wrappers. Bare names are looked up in the StringMap built by sub_2342890. For parameterized passes, the <...> suffix is extracted and dispatched to per-pass option parsers. Several NVIDIA-specific parameter parsers are thin wrappers around sub_233A120:
| Parser | Pass | Recognized Options |
|---|---|---|
| sub_233A330 | process-restrict | propagate-only |
| sub_233A370 | lower-struct-args | opt-byval |
| sub_233A3B0 | lower-aggr-copies | lower-aggr-func-args |
More complex passes (GVN, SimplifyCFG, InstCombine) use chained sub_9691B0 string comparisons for multi-option parsing.
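The nesting grammar itself can be sketched as a small recursive parser. This is a simplified model under stated assumptions: it only tracks bracket depth, treats the four scope names listed above as wrappers, and leaves bare or parameterized pass names as plain strings for a later lookup step. The function names are hypothetical.

```python
from typing import List

SCOPES = {"module", "cgscc", "function", "loop"}


def split_top_level(text: str) -> List[str]:
    """Split on commas that are not nested inside (...) or <...>."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch in "(<":
            depth += 1
        elif ch in ")>":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced pipeline text")
        elif ch == "," and depth == 0:
            parts.append(text[start:i].strip())
            start = i + 1
    if depth != 0:
        raise ValueError("unbalanced pipeline text")
    parts.append(text[start:].strip())
    return parts


def parse_pipeline(text: str) -> list:
    """Build a tree: scope wrappers become (scope, children), passes stay strings."""
    tree = []
    for element in split_top_level(text):
        head, paren, rest = element.partition("(")
        if paren and head in SCOPES and rest.endswith(")"):
            tree.append((head, parse_pipeline(rest[:-1])))
        else:
            tree.append(element)
    return tree
```

Applied to the example pipeline, this produces a `module` node containing a `function` node and `globalopt`, with the `loop` scope nested inside `function`.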
The pipeline name strings recognized by the nvopt<> dispatch table are:
| Pipeline Name | CLI Source | Pass Count |
|---|---|---|
| nvopt<O0> | (no -O flag, no -Ofc) | ~5--8 |
| nvopt<O1> | -O1 | ~35 |
| nvopt<O2> | -O2 | ~35+ |
| nvopt<O3> | -O3 | ~35+ |
| nvopt<Ofcmax> | -Ofast-compile=max / -Ofc=max | ~12--15 |
| nvopt<Ofcmid> | -Ofast-compile=mid / -Ofc=mid | ~25--30 |
| nvopt<Ofcmin> | -Ofast-compile=min / -Ofc=min | ~30--35 |
Key addresses for pipeline name dispatch: sub_226C400 selects the pipeline name string, which is passed to sub_2277440 (pipeline text parser). The nvopt prefix is registered in sub_225D540 (new PM) and sub_12C35D0 (legacy PM), both calling into a pipeline builder class at vtable unk_4A08350.
Mutual exclusion: combining -O# with --passes= is an error: "Cannot specify -O#/-Ofast-compile=<min,mid,max> and --passes=/--foo-pass, use -passes='default<O#>,other-pass' or -passes='default<Ofcmax>,other-pass'".
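The name selection and the mutual-exclusion rule can be sketched together. This is a model of the table above rather than the logic of sub_226C400: the function name is hypothetical, and the precedence of fast-compile over -O# when both are given, as well as the handling of an explicit -O0, are assumptions.

```python
from typing import Optional

CONFLICT_MESSAGE = (
    "Cannot specify -O#/-Ofast-compile=<min,mid,max> and --passes=/--foo-pass, "
    "use -passes='default<O#>,other-pass' or -passes='default<Ofcmax>,other-pass'"
)


def resolve_pipeline_text(opt_level: Optional[int] = None,
                          fast_compile: Optional[str] = None,
                          passes: Optional[str] = None) -> str:
    """Return the pipeline string that would be handed to the text parser."""
    if passes is not None:
        if opt_level is not None or fast_compile is not None:
            raise ValueError(CONFLICT_MESSAGE)
        return passes
    if fast_compile is not None:
        if fast_compile not in ("min", "mid", "max"):
            raise ValueError(f"unknown fast-compile level '{fast_compile}'")  # hypothetical message
        return f"nvopt<Ofc{fast_compile}>"  # assumed to take precedence over -O#
    if opt_level in (1, 2, 3):
        return f"nvopt<O{opt_level}>"
    return "nvopt<O0>"  # no -O flag and no -Ofc
```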
Complete Pass Inventory
The following tables list every pass in exact registration order within sub_2342890. NVIDIA-specific passes are marked with bold names. Registration line numbers are from the decompiled output.
Module Analyses (18)
| # | Pass Name | LLVM Class | Reg. Line |
|---|---|---|---|
| 1 | callgraph | CallGraphAnalysis | 514 |
| 2 | collector-metadata | CollectorMetadataAnalysis | — |
| 3 | ctx-prof-analysis | CtxProfAnalysis | — |
| 4 | dxil-metadata | DXILMetadataAnalysis | — |
| 5 | dxil-resource-binding | DXILResourceBindingAnalysis | — |
| 6 | dxil-resource-type | DXILResourceTypeAnalysis | — |
| 7 | inline-advisor | InlineAdvisorAnalysis | — |
| 8 | ir-similarity | IRSimilarityAnalysis | — |
| 9 | last-run-tracking | via sub_2342820 | — |
| 10 | lcg | LazyCallGraphAnalysis | — |
| 11 | module-summary | ModuleSummaryIndexAnalysis | — |
| 12 | no-op-module | NoOpModuleAnalysis | — |
| 13 | pass-instrumentation | via sub_2342830 | — |
| 14 | profile-summary | ProfileSummaryAnalysis | — |
| 15 | reg-usage | PhysicalRegisterUsageAnalysis | — |
| 16 | stack-safety | StackSafetyGlobalAnalysis | — |
| 17 | verify | via sub_2342840 | 596 |
| 18 | globals-aa | GlobalsAA | — |
Module Passes (131)
Registration lines 599--1153 in sub_2342890. Entries 19--121 are standard LLVM; the final 12 are NVIDIA custom passes registered at lines 1096--1153.
Standard LLVM Module Passes (entries 19--121)
| # | Pass Name | LLVM Class |
|---|---|---|
| 19 | always-inline | AlwaysInlinerPass |
| 20 | annotation2metadata | Annotation2MetadataPass |
| 21 | assign-guid | AssignGUIDPass |
| 22 | attributor | AttributorPass |
| 23 | attributor-light | AttributorLightPass |
| 24 | called-value-propagation | CalledValuePropagationPass |
| 25 | canonicalize-aliases | CanonicalizeAliasesPass |
| 26 | check-debugify | NewPMCheckDebugifyPass |
| 27 | constmerge | ConstantMergePass |
| 28 | coro-cleanup | CoroCleanupPass |
| 29 | coro-early | CoroEarlyPass |
| 30 | cross-dso-cfi | CrossDSOCFIPass |
| 31 | ctx-instr-gen | PGOInstrumentationGen |
| 32 | ctx-prof-flatten | PGOCtxProfFlatteningPass |
| 33 | noinline-nonprevailing | NoinlineNonPrevailing |
| 34 | deadargelim | DeadArgumentEliminationPass |
| 35 | debugify | NewPMDebugifyPass |
| 36 | dfsan | DataFlowSanitizerPass |
| 37 | dot-callgraph | CallGraphDOTPrinterPass |
| 38 | dxil-upgrade | DXILUpgradePass |
| 39 | elim-avail-extern | EliminateAvailableExternallyPass |
| 40 | extract-blocks | BlockExtractorPass |
| 41 | expand-variadics | ExpandVariadicsPass |
| 42 | forceattrs | ForceFunctionAttrsPass |
| 43 | function-import | FunctionImportPass |
| 44 | global-merge-func | GlobalMergeFuncPass |
| 45 | globalopt | GlobalOptPass |
| 46 | globalsplit | GlobalSplitPass |
| 47 | hotcoldsplit | HotColdSplittingPass |
| 48 | inferattrs | InferFunctionAttrsPass |
| 49 | inliner-ml-advisor-release | via sub_2342850 (InlinerWrapper) |
| 50 | inliner-wrapper | via sub_2342850 (InlinerWrapper) |
| 51 | inliner-wrapper-no-mandatory-first | via sub_2342850 |
| 52 | insert-gcov-profiling | GCOVProfilerPass |
| 53 | instrorderfile | InstrOrderFilePass |
| 54 | instrprof | InstrProfilingLoweringPass |
| 55 | ctx-instr-lower | PGOCtxProfLoweringPass |
| 56 | print<ctx-prof-analysis> | CtxProfAnalysisPrinterPass |
| 57 | invalidate<all> | via sub_2342860 |
| 58 | iroutliner | IROutlinerPass |
| 59 | jmc-instrumenter | JMCInstrumenterPass |
| 60 | lower-emutls | LowerEmuTLSPass |
| 61 | lower-global-dtors | LowerGlobalDtorsPass |
| 62 | lower-ifunc | LowerIFuncPass |
| 63 | lowertypetests | LowerTypeTestsPass |
| 64 | fatlto-cleanup | FatLtoCleanup |
| 65 | pgo-force-function-attrs | PGOForceFunctionAttrsPass |
| 66 | memprof-context-disambiguation | MemProfContextDisambiguation |
| 67 | memprof-module | ModuleMemProfilerPass |
| 68 | mergefunc | MergeFunctionsPass |
| 69 | metarenamer | MetaRenamerPass |
| 70 | module-inline | ModuleInlinerPass |
| 71 | name-anon-globals | NameAnonGlobalPass |
| 72 | no-op-module | NoOpModulePass |
| 73 | nsan | NumericalStabilitySanitizerPass |
| 74 | objc-arc-apelim | ObjCARCAPElimPass |
| 75 | openmp-opt | OpenMPOptPass |
| 76 | openmp-opt-postlink | OpenMPOptPass |
| 77 | partial-inliner | PartialInlinerPass |
| 78 | pgo-icall-prom | PGOIndirectCallPromotion |
| 79 | pgo-instr-gen | PGOInstrumentationGen |
| 80 | pgo-instr-use | PGOInstrumentationUse |
| 81 | pre-isel-intrinsic-lowering | PreISelIntrinsicLoweringPass |
| 82 | print | PrintModulePass |
| 83 | print-callgraph | CallGraphPrinterPass |
| 84 | print-callgraph-sccs | CallGraphSCCsPrinterPass |
| 85 | print-ir-similarity | IRSimilarityAnalysisPrinterPass |
| 86 | print-lcg | LazyCallGraphPrinterPass |
| 87 | print-lcg-dot | LazyCallGraphDOTPrinterPass |
| 88 | print-must-be-executed-contexts | MustBeExecutedContextPrinterPass |
| 89 | print-profile-summary | ProfileSummaryPrinterPass |
| 90 | print-stack-safety | StackSafetyGlobalPrinterPass |
| 91 | print<dxil-metadata> | DXILMetadataAnalysisPrinterPass |
| 92 | print<dxil-resource-binding> | DXILResourceBindingPrinterPass |
| 93 | print<inline-advisor> | InlineAdvisorAnalysisPrinterPass |
| 94 | print<module-debuginfo> | ModuleDebugInfoPrinterPass |
| 95 | print<reg-usage> | PhysicalRegisterUsageInfoPrinterPass |
| 96 | pseudo-probe | SampleProfileProbePass |
| 97 | pseudo-probe-update | PseudoProbeUpdatePass |
| 98 | recompute-globalsaa | RecomputeGlobalsAAPass |
| 99 | rel-lookup-table-converter | RelLookupTableConverterPass |
| 100 | rewrite-statepoints-for-gc | RewriteStatepointsForGC |
| 101 | rewrite-symbols | RewriteSymbolPass |
| 102 | rpo-function-attrs | ReversePostOrderFunctionAttrsPass |
| 103 | rtsan | RealtimeSanitizerPass |
| 104 | sample-profile | SampleProfileLoaderPass |
| 105 | sancov-module | SanitizerCoveragePass |
| 106 | sanmd-module | SanitizerBinaryMetadataPass |
| 107 | scc-oz-module-inliner | via sub_2342850 (InlinerWrapper) |
| 108 | shadow-stack-gc-lowering | ShadowStackGCLoweringPass |
| 109 | strip | StripSymbolsPass |
| 110 | strip-dead-debug-info | StripDeadDebugInfoPass |
| 111 | strip-dead-prototypes | StripDeadPrototypesPass |
| 112 | strip-debug-declare | StripDebugDeclarePass |
| 113 | strip-nondebug | StripNonDebugSymbolsPass |
| 114 | strip-nonlinetable-debuginfo | StripNonLineTableDebugInfoPass |
| 115 | trigger-crash-module | TriggerCrashModulePass |
| 116 | trigger-verifier-error | TriggerVerifierErrorPass |
| 117 | tsan-module | ModuleThreadSanitizerPass |
| 118 | tysan | TypeSanitizerPass |
| 119 | verify | via sub_2342870 |
| 120 | view-callgraph | CallGraphViewerPass |
| 121 | wholeprogramdevirt | WholeProgramDevirtPass |
NVIDIA Module Passes (entries 122--131)
| # | Pass Name | LLVM Class | Reg. Line | Purpose |
|---|---|---|---|---|
| 122 | check-gep-index | CheckGepIndexPass | 1096 | Validates GEP index bounds |
| 123 | check-kernel-functions | NVPTXSetFunctionLinkagesPass | 1101 | Enforces kernel linkage |
| 124 | cnp-launch-check | CNPLaunchCheckPass | 1106 | Cooperative launch validation |
| 125 | ipmsp | IPMSPPass | 1111 | Inter-procedural memory space propagation |
| 126 | nv-early-inliner | via sub_2342850 | 1114 | NVIDIA early inlining heuristic |
| 127 | nv-inline-must | InlineMustPass | 1119 | Force-inlines __forceinline__ functions |
| 128 | nvvm-pretreat | PretreatPass | 1124 | IR canonicalization before optimization |
| 129 | nvvm-verify | NVVMIRVerifierPass | 1129 | NVVM IR constraint validation |
| 130 | printf-lowering | PrintfLoweringPass | 1134 | Lowers printf to vprintf ABI |
| 131 | select-kernels | SelectKernelsPass | 1139 | Selects kernels for compilation |
Parameterized Module Passes (entries 132--145)
| # | Pass Name | Class | Parameters |
|---|---|---|---|
| 132 | asan | AddressSanitizerPass | kernel |
| 133 | cg-profile | CGProfilePass | in-lto-post-link |
| 134 | global-merge | GlobalMergePass | group-by-use;ignore-single-use;max-offset=N |
| 135 | embed-bitcode | EmbedBitcodePass | thinlto;emit-summary |
| 136 | globaldce | GlobalDCEPass | in-lto-post-link |
| 137 | hwasan | HWAddressSanitizerPass | kernel;recover |
| 138 | internalize | InternalizePass | preserve-gv=GV |
| 139 | ipsccp | IPSCCPPass | no-func-spec;func-spec |
| 140 | loop-extract | LoopExtractorPass | single |
| 141 | memprof-use | MemProfUsePass | profile-filename=S |
| 142 | msan | MemorySanitizerPass | recover;kernel;eager-checks;track-origins=N |
| 143 | print<structural-hash> | StructuralHashPrinterPass | detailed;call-target-ignored |
| 144 | lower-ops | LowerOpsPass | enable-optimization |
| 145 | set-global-array-alignment | SetGlobalArrayAlignmentPass | modify-shared-mem;skip-shared-mem;modify-global-mem;skip-global-mem |
CGSCC Analyses and Passes (entries 146--158)
| # | Pass Name | LLVM Class | Level |
|---|---|---|---|
| 146 | no-op-cgscc | NoOpCGSCCAnalysis | Analysis |
| 147 | fam-proxy | FunctionAnalysisManagerCGSCCProxy | Analysis |
| 148 | pass-instrumentation | via sub_2342830 | Analysis |
| 149 | argpromotion | ArgumentPromotionPass | Pass |
| 150 | attributor-cgscc | AttributorCGSCCPass | Pass |
| 151 | attributor-light-cgscc | AttributorLightCGSCCPass | Pass |
| 152 | invalidate<all> | via sub_2342860 | Pass |
| 153 | no-op-cgscc | NoOpCGSCCPass | Pass |
| 154 | openmp-opt-cgscc | OpenMPOptCGSCCPass | Pass |
| 155 | coro-annotation-elide | CoroAnnotationElidePass | Pass |
| 156 | coro-split | CoroSplitPass | Param: reuse-storage |
| 157 | function-attrs | PostOrderFunctionAttrsPass | Param: skip-non-recursive-function-attrs |
| 158 | inline | InlinerPass | Param: only-mandatory |
Function Analyses (entries 159--201)
Registration lines 1208--1415 in sub_2342890.
| # | Pass Name | LLVM Class |
|---|---|---|
| 159 | aa | AAManager |
| 160 | access-info | LoopAccessAnalysis |
| 161 | assumptions | AssumptionAnalysis |
| 162 | bb-sections-profile-reader | BasicBlockSectionsProfileReaderAnalysis |
| 163 | block-freq | BlockFrequencyAnalysis |
| 164 | branch-prob | BranchProbabilityAnalysis |
| 165 | cycles | CycleAnalysis |
| 166 | da | DependenceAnalysis |
| 167 | debug-ata | DebugAssignmentTrackingAnalysis |
| 168 | demanded-bits | DemandedBitsAnalysis |
| 169 | domfrontier | DominanceFrontierAnalysis |
| 170 | domtree | DominatorTreeAnalysis |
| 171 | func-properties | FunctionPropertiesAnalysis |
| 172 | machine-function-info | MachineFunctionAnalysis |
| 173 | gc-function | GCFunctionAnalysis |
| 174 | inliner-size-estimator | InlineSizeEstimatorAnalysis |
| 175 | last-run-tracking | via sub_2342820 |
| 176 | lazy-value-info | LazyValueAnalysis |
| 177 | loops | LoopAnalysis |
| 178 | memdep | MemoryDependenceAnalysis |
| 179 | memoryssa | MemorySSAAnalysis |
| 180 | no-op-function | NoOpFunctionAnalysis |
| 181 | opt-remark-emit | OptimizationRemarkEmitterAnalysis |
| 182 | pass-instrumentation | via sub_2342830 |
| 183 | phi-values | PhiValuesAnalysis |
| 184 | postdomtree | PostDominatorTreeAnalysis |
| 185 | regions | RegionInfoAnalysis |
| 186 | scalar-evolution | ScalarEvolutionAnalysis |
| 187 | should-not-run-function-passes | ShouldNotRunFunctionPassesAnalysis |
| 188 | should-run-extra-vector-passes | ShouldRunExtraVectorPasses |
| 189 | ssp-layout | SSPLayoutAnalysis |
| 190 | stack-safety-local | StackSafetyAnalysis |
| 191 | target-ir | TargetIRAnalysis |
| 192 | target-lib-info | TargetLibraryAnalysis |
| 193 | uniformity | UniformityInfoAnalysis |
| 194 | verify | via sub_2342840 |
| 195 | rpa | RegisterPressureAnalysis |
| 196 | merge-sets | MergeSetsAnalysis |
Function AA Analyses (entries 197--201)
| # | Pass Name | LLVM Class |
|---|---|---|
| 197 | basic-aa | BasicAA |
| 198 | objc-arc-aa | objcarc::ObjCARCAA |
| 199 | scev-aa | SCEVAA |
| 200 | scoped-noalias-aa | ScopedNoAliasAA |
| 201 | tbaa | TypeBasedAA |
Function Passes (entries 202--419)
Registration lines 1420--2319 in sub_2342890. The first 174 entries (202--375) are standard LLVM; entries 376--392 are NVIDIA-specific; entries 393--419 are parameterized passes (both standard and NVIDIA).
Standard LLVM Function Passes (entries 202--375)
| # | Pass Name | LLVM Class |
|---|---|---|
| 202 | aa-eval | AAEvaluator |
| 203 | adce | ADCEPass |
| 204 | add-discriminators | AddDiscriminatorsPass |
| 205 | aggressive-instcombine | AggressiveInstCombinePass |
| 206 | alignment-from-assumptions | AlignmentFromAssumptionsPass |
| 207 | annotation-remarks | AnnotationRemarksPass |
| 208 | assume-builder | AssumeBuilderPass |
| 209 | assume-simplify | AssumeSimplifyPass |
| 210 | atomic-expand | AtomicExpandPass |
| 211 | bdce | BDCEPass |
| 212 | break-crit-edges | BreakCriticalEdgesPass |
| 213 | callbr-prepare | CallBrPreparePass |
| 214 | callsite-splitting | CallSiteSplittingPass |
| 215 | chr | ControlHeightReductionPass |
| 216 | codegenprepare | CodeGenPreparePass |
| 217 | complex-deinterleaving | ComplexDeinterleavingPass |
| 218 | consthoist | ConstantHoistingPass |
| 219 | constraint-elimination | ConstraintEliminationPass |
| 220 | coro-elide | CoroElidePass |
| 221 | correlated-propagation | CorrelatedValuePropagationPass |
| 222 | count-visits | CountVisitsPass |
| 223 | dce | DCEPass |
| 224 | declare-to-assign | AssignmentTrackingPass |
| 225 | dfa-jump-threading | DFAJumpThreadingPass |
| 226 | div-rem-pairs | DivRemPairsPass |
| 227 | dot-cfg | CFGPrinterPass |
| 228 | dot-cfg-only | CFGOnlyPrinterPass |
| 229 | dot-dom | DOTGraphTraitsPrinter<DominatorTree, false> |
| 230 | dot-dom-only | DOTGraphTraitsPrinter<DominatorTree, true> |
| 231 | dot-post-dom | DOTGraphTraitsPrinter<PostDominatorTree, false> |
| 232 | dot-post-dom-only | DOTGraphTraitsPrinter<PostDominatorTree, true> |
| 233 | dse | DSEPass |
| 234 | dwarf-eh-prepare | DwarfEHPreparePass |
| 235 | expand-large-div-rem | ExpandLargeDivRemPass |
| 236 | expand-large-fp-convert | ExpandLargeFpConvertPass |
| 237 | expand-memcmp | ExpandMemCmpPass |
| 238 | extra-vector-passes | ExtraFunctionPassManager<ShouldRunExtraVectorPasses> |
| 239 | fix-irreducible | FixIrreduciblePass |
| 240 | flatten-cfg | FlattenCFGPass |
| 241 | float2int | Float2IntPass |
| 242 | gc-lowering | GCLoweringPass |
| 243 | guard-widening | via sub_2342880 |
| 244 | gvn-hoist | GVNHoistPass |
| 245 | gvn-sink | GVNSinkPass |
| 246 | helloworld | HelloWorldPass |
| 247 | indirectbr-expand | IndirectBrExpandPass |
| 248 | infer-address-spaces | InferAddressSpacesPass |
| 249 | infer-alignment | InferAlignmentPass |
| 250 | inject-tli-mappings | InjectTLIMappings |
| 251 | instcount | InstCountPass |
| 252 | instnamer | InstructionNamerPass |
| 253 | instsimplify | InstSimplifyPass |
| 254 | interleaved-access | InterleavedAccessPass |
| 255 | interleaved-load-combine | InterleavedLoadCombinePass |
| 256 | invalidate<all> | via sub_2342860 |
| 257 | irce | IRCEPass |
| 258 | jump-threading | JumpThreadingPass |
| 259 | jump-table-to-switch | JumpTableToSwitchPass |
| 260 | kcfi | KCFIPass |
| 261 | kernel-info | KernelInfoPrinter |
| 262 | lcssa | LCSSAPass |
| 263 | libcalls-shrinkwrap | LibCallsShrinkWrapPass |
| 264 | lint | LintPass |
| 265 | load-store-vectorizer | LoadStoreVectorizerPass |
| 266 | loop-data-prefetch | LoopDataPrefetchPass |
| 267 | loop-distribute | LoopDistributePass |
| 268 | loop-fusion | LoopFusePass |
| 269 | loop-load-elim | LoopLoadEliminationPass |
| 270 | loop-simplify | LoopSimplifyPass |
| 271 | loop-sink | LoopSinkPass |
| 272 | loop-versioning | LoopVersioningPass |
| 273 | lower-atomic | LowerAtomicPass |
| 274 | lower-constant-intrinsics | LowerConstantIntrinsicsPass |
| 275 | lower-expect | LowerExpectIntrinsicPass |
| 276 | lower-guard-intrinsic | LowerGuardIntrinsicPass |
| 277 | lower-invoke | LowerInvokePass |
| 278 | lower-widenable-condition | LowerWidenableConditionPass |
| 279 | make-guards-explicit | MakeGuardsExplicitPass |
| 280 | mem2reg | PromotePass |
| 281 | memcpyopt | MemCpyOptPass |
| 282 | memprof | MemProfilerPass |
| 283 | mergeicmps | MergeICmpsPass |
| 284 | mergereturn | UnifyFunctionExitNodesPass |
| 285 | move-auto-init | MoveAutoInitPass |
| 286 | nary-reassociate | NaryReassociatePass |
| 287 | newgvn | NewGVNPass |
| 288 | no-op-function | NoOpFunctionPass |
| 289 | normalize | IRNormalizerPass |
| 290 | objc-arc | ObjCARCOptPass |
| 291 | objc-arc-contract | ObjCARCContractPass |
| 292 | objc-arc-expand | ObjCARCExpandPass |
| 293 | pa-eval | PAEvalPass |
| 294 | partially-inline-libcalls | PartiallyInlineLibCallsPass |
| 295 | pgo-memop-opt | PGOMemOPSizeOpt |
| 296 | place-safepoints | PlaceSafepointsPass |
| 297 | print | PrintFunctionPass |
| 298--338 | print<access-info> ... print-predicateinfo | (41 printer passes) |
| 339 | reassociate | ReassociatePass |
| 340 | redundant-dbg-inst-elim | RedundantDbgInstEliminationPass |
| 341 | reg2mem | RegToMemPass |
| 342 | safe-stack | SafeStackPass |
| 343 | sandbox-vectorizer | SandboxVectorizerPass |
| 344 | scalarize-masked-mem-intrin | ScalarizeMaskedMemIntrinPass |
| 345 | sccp | SCCPPass |
| 346 | select-optimize | SelectOptimizePass |
| 347 | separate-const-offset-from-gep | SeparateConstOffsetFromGEPPass |
| 348 | sink | SinkingPass |
| 349 | sjlj-eh-prepare | SjLjEHPreparePass |
| 350 | slp-vectorizer | SLPVectorizerPass |
| 351 | slsr | StraightLineStrengthReducePass |
| 352 | stack-protector | StackProtectorPass |
| 353 | strip-gc-relocates | StripGCRelocates |
| 354 | tailcallelim | TailCallElimPass |
| 355 | transform-warning | WarnMissedTransformationsPass |
| 356 | trigger-crash-function | TriggerCrashFunctionPass |
| 357 | trigger-verifier-error | TriggerVerifierErrorPass |
| 358 | tsan | ThreadSanitizerPass |
| 359 | unify-loop-exits | UnifyLoopExitsPass |
| 360 | vector-combine | VectorCombinePass |
| 361 | verify | via sub_2342870 |
| 362--368 | verify<cycles> ... verify<scalar-evolution> | (7 verifiers) |
| 369--374 | view-cfg ... view-post-dom-only | (6 viewers) |
| 375 | wasm-eh-prepare | WasmEHPreparePass |
NVIDIA Function Passes (entries 376--392)
Registered at lines 2212--2292 of sub_2342890.
| # | Pass Name | LLVM Class | Reg. Line | Purpose |
|---|---|---|---|---|
| 376 | basic-dbe | BasicDeadBarrierEliminationPass | 2212 | Removes dead bar.sync instructions |
| 377 | branch-dist | BranchDistPass | 2217 | Branch distribution for divergence control |
| 378 | byval-mem2reg | ByValMem2RegPass | 2222 | Promotes byval arguments to registers |
| 379 | bypass-slow-division | BypassSlowDivisionPass | 2227 | Fast-path for small-operand division |
| 380 | normalize-gep | NormalizeGepPass | 2232 | GEP canonicalization for address arithmetic |
| 381 | nvvm-reflect-pp | SimplifyConstantConditionalsPass | 2237 | Folds __nvvm_reflect results (post-processing) |
| 382 | nvvm-peephole-optimizer | NVVMPeepholeOptimizerPass | 2242 | NVVM-specific peephole rewrites |
| 383 | old-load-store-vectorizer | OldLoadStoreVectorizerPass | 2247 | Legacy load/store vectorization |
| 384 | print<merge-sets> | MergeSetsAnalysisPrinterPass | 2252 | Printer for merge-sets analysis |
| 385 | remat | RematerializationPass | 2257 | Register-pressure-aware rematerialization |
| 386 | print<rpa> | RegisterPressurePrinterPass | 2262 | Printer for register pressure analysis |
| 387 | propagate-alignment | PropagateAlignmentPass | 2267 | Propagates alignment through pointer chains |
| 388 | reuse-local-memory | ReuseLocalMemoryPass | 2272 | Shares local memory across kernels |
| 389 | set-local-array-alignment | SetLocalArrayAlignmentPass | 2277 | Aligns stack arrays for coalescing |
| 390 | sinking2 | Sinking2Pass | 2282 | Enhanced instruction sinking |
| 391 | d2ir-scalarizer | ScalarizerPass (NVIDIA alias) | 2287 | NVIDIA-branded scalarization |
| 392 | sink<rp-aware> | SinkingPass (variant) | 2292 | Register-pressure-aware sinking |
Parameterized Function Passes (entries 393--419)
| # | Pass Name | Class | Parameters |
|---|---|---|---|
| 393 | cfguard | CFGuardPass | check;dispatch |
| 394 | early-cse | EarlyCSEPass | memssa |
| 395 | ee-instrument | EntryExitInstrumenterPass | post-inline |
| 396 | function-simplification | (byte_3F871B3) | O1;O2;O3;Os;Oz |
| 397 | gvn | GVNPass | no-pre;pre;no-load-pre;load-pre;... |
| 398 | instcombine | InstCombinePass | no-aggressive-aggregate-splitting;...;max-iterations=N |
| 399 | loop-unroll | LoopUnrollPass | O0;O1;O2;O3;full-unroll-max=N;... |
| 400 | loop-vectorize | LoopVectorizePass | no-interleave-forced-only;... |
| 401 | lower-allow-check | LowerAllowCheckPass | (empty) |
| 402 | lower-matrix-intrinsics | LowerMatrixIntrinsicsPass | minimal |
| 403 | lower-switch | LowerSwitchPass | enable-jump-table |
| 404 | mldst-motion | MergedLoadStoreMotionPass | no-split-footer-bb;split-footer-bb |
| 405 | print<da> | DependenceAnalysisPrinterPass | normalized-results |
| 406 | print<memoryssa> | MemorySSAPrinterPass | no-ensure-optimized-uses |
| 407 | print<stack-lifetime> | StackLifetimePrinterPass | may;must |
| 408 | scalarizer | ScalarizerPass | load-store;no-load-store;variable-insert-extract;... |
| 409 | separate-const-offset-from-gep | SeparateConstOffsetFromGEPPass | lower-gep |
| 410 | simplifycfg | SimplifyCFGPass | simplify-unreachable;...;bonus-inst-threshold=N |
| 411 | speculative-execution | SpeculativeExecutionPass | only-if-divergent-target |
| 412 | sroa | SROAPass | preserve-cfg;modify-cfg |
| 413 | structurizecfg | StructurizeCFG | skip-uniform-regions |
| 414 | win-eh-prepare | WinEHPreparePass | demote-catchswitch-only |
| 415 | bounds-checking | BoundsCheckingPass (modified) | trap |
| 416 | memory-space-opt | MemorySpaceOptPass | first-time;second-time;no-warnings;warnings |
| 417 | lower-aggr-copies | LowerAggrCopiesPass | lower-aggr-func-args |
| 418 | lower-struct-args | LowerStructArgsPass | opt-byval |
| 419 | process-restrict | ProcessRestrictPass | propagate-only |
LoopNest Passes (entries 420--423)
| # | Pass Name | LLVM Class |
|---|---|---|
| 420 | loop-flatten | LoopFlattenPass |
| 421 | loop-interchange | LoopInterchangePass |
| 422 | loop-unroll-and-jam | LoopUnrollAndJamPass |
| 423 | no-op-loopnest | NoOpLoopNestPass |
Loop Analyses (entries 424--428)
| # | Pass Name | LLVM Class |
|---|---|---|
| 424 | ddg | DDGAnalysis |
| 425 | iv-users | IVUsersAnalysis |
| 426 | no-op-loop | NoOpLoopAnalysis |
| 427 | pass-instrumentation | via sub_2342830 |
| 428 | should-run-extra-simple-loop-unswitch | ShouldRunExtraSimpleLoopUnswitch |
Loop Passes (entries 429--455)
| # | Pass Name | LLVM Class |
|---|---|---|
| 429 | canon-freeze | CanonicalizeFreezeInLoopsPass |
| 430 | dot-ddg | DDGDotPrinterPass |
| 431 | guard-widening | via sub_2342880 |
| 432 | extra-simple-loop-unswitch-passes | ExtraLoopPassManager<...> |
| 433 | indvars | IndVarSimplifyPass |
| 434 | invalidate<all> | via sub_2342860 |
| 435 | loop-bound-split | LoopBoundSplitPass |
| 436 | loop-deletion | LoopDeletionPass |
| 437 | loop-idiom | LoopIdiomRecognizePass |
| 438 | loop-idiom-vectorize | LoopIdiomVectorizePass |
| 439 | loop-instsimplify | LoopInstSimplifyPass |
| 440 | loop-predication | LoopPredicationPass |
| 441 | loop-reduce | LoopStrengthReducePass |
| 442 | loop-term-fold | LoopTermFoldPass |
| 443 | loop-simplifycfg | LoopSimplifyCFGPass |
| 444 | loop-unroll-full | LoopFullUnrollPass |
| 445 | loop-versioning-licm | LoopVersioningLICMPass |
| 446 | no-op-loop | NoOpLoopPass |
| 447 | print | PrintLoopPass |
| 448--450 | print<ddg>, print<iv-users>, print<loop-cache-cost>, print<loopnest> | (printers) |
| 451 | loop-index-split | LoopIndexSplitPass |
Parameterized Loop Passes (entries 452--455)
| # | Pass Name | Class | Parameters |
|---|---|---|---|
| 452 | licm | LICMPass | allowspeculation;conservative-calls |
| 453 | lnicm | LNICMPass | allowspeculation |
| 454 | loop-rotate | LoopRotatePass | no-header-duplication;header-duplication;... |
| 455 | simple-loop-unswitch | SimpleLoopUnswitchPass | nontrivial;no-nontrivial;trivial;no-trivial |
Machine Function Analyses (entries 456--475)
| # | Pass Name | LLVM Class |
|---|---|---|
| 456 | edge-bundles | EdgeBundlesAnalysis |
| 457 | livedebugvars | LiveDebugVariablesAnalysis |
| 458 | live-intervals | LiveIntervalsAnalysis |
| 459 | live-reg-matrix | LiveRegMatrixAnalysis |
| 460 | live-stacks | LiveStacksAnalysis |
| 461 | live-vars | LiveVariablesAnalysis |
| 462 | machine-block-freq | MachineBlockFrequencyAnalysis |
| 463 | machine-branch-prob | MachineBranchProbabilityAnalysis |
| 464 | machine-cycles | MachineCycleAnalysis |
| 465 | machine-dom-tree | MachineDominatorTreeAnalysis |
| 466 | machine-loops | MachineLoopAnalysis |
| 467 | machine-opt-remark-emitter | MachineOptimizationRemarkEmitterAnalysis |
| 468 | machine-post-dom-tree | MachinePostDominatorTreeAnalysis |
| 469 | machine-trace-metrics | MachineTraceMetricsAnalysis |
| 470 | pass-instrumentation | via sub_2342830 |
| 471 | regalloc-evict | RegAllocEvictionAdvisorAnalysis |
| 472 | regalloc-priority | RegAllocPriorityAdvisorAnalysis |
| 473 | slot-indexes | SlotIndexesAnalysis |
| 474 | spill-code-placement | SpillPlacementAnalysis |
| 475 | virtregmap | VirtRegMapAnalysis |
Machine Function Passes (entries 476--526)
| # | Pass Name | LLVM Class |
|---|---|---|
| 476 | dead-mi-elimination | DeadMachineInstructionElimPass |
| 477 | detect-dead-lanes | DetectDeadLanesPass |
| 478 | early-ifcvt | EarlyIfConverterPass |
| 479 | early-machinelicm | EarlyMachineLICMPass |
| 480 | early-tailduplication | EarlyTailDuplicatePass |
| 481 | finalize-isel | FinalizeISelPass |
| 482 | fixup-statepoint-caller-saved | FixupStatepointCallerSavedPass |
| 483 | localstackalloc | LocalStackSlotAllocationPass |
| 484 | machine-cp | MachineCopyPropagationPass |
| 485 | machine-cse | MachineCSEPass |
| 486 | machine-latecleanup | MachineLateInstrsCleanupPass |
| 487 | machine-scheduler | MachineSchedulerPass |
| 488 | machinelicm | MachineLICMPass |
| 489 | no-op-machine-function | NoOpMachineFunctionPass |
| 490 | opt-phis | OptimizePHIsPass |
| 491 | patchable-function | PatchableFunctionPass |
| 492 | peephole-opt | PeepholeOptimizerPass |
| 493 | phi-node-elimination | PHIEliminationPass |
| 494 | post-RA-sched | PostRASchedulerPass |
| 495 | postmisched | PostMachineSchedulerPass |
| 496 | post-ra-pseudos | ExpandPostRAPseudosPass |
| 497 | print | PrintMIRPass |
| 498--510 | print<livedebugvars> ... print<virtregmap> | (13 MF printers) |
| 511 | reg-usage-collector | RegUsageInfoCollectorPass |
| 512 | reg-usage-propagation | RegUsageInfoPropagationPass |
| 513 | register-coalescer | RegisterCoalescerPass |
| 514 | rename-independent-subregs | RenameIndependentSubregsPass |
| 515 | remove-redundant-debug-values | RemoveRedundantDebugValuesPass |
| 516 | require-all-machine-function-properties | RequireAllMachineFunctionPropertiesPass |
| 517 | stack-coloring | StackColoringPass |
| 518 | stack-slot-coloring | StackSlotColoringPass |
| 519 | tailduplication | TailDuplicatePass |
| 520 | trigger-verifier-error | TriggerVerifierErrorPass |
| 521 | two-address-instruction | TwoAddressInstructionPass |
| 522 | verify | MachineVerifierPass |
| 523 | verify<machine-trace-metrics> | MachineTraceMetricsVerifierPass |
| 524 | machine-sink | MachineSinkingPass (parameterized) |
| 525 | regallocfast | RegAllocFastPass (parameterized) |
| 526 | greedy | RAGreedyPass (parameterized, LAST registered) |
No NVIDIA-specific machine function passes were identified in the registration table; NVIDIA's machine-level customizations are implemented through target hooks in the NVPTX backend rather than as separately registered passes.
Runtime Pass Execution Order
Registration order (above) describes what is known to the pipeline parser. Runtime execution order is determined by sub_12E54A0 (the pipeline assembler) and controlled by the tier system. The execution order varies dramatically depending on: (1) optimization level, (2) fast-compile mode, (3) language string, and (4) individual pass enable/disable flags in NVVMPassOptions.
The AddPass Mechanism -- sub_12DE0B0
All runtime pass insertion uses sub_12DE0B0 (0x12DE0B0), a hash-table-based function that:
- Hashes the pass pointer: (pass >> 9) ^ (pass >> 4)
- Probes an open-addressed hash table at passMgr+80
- Stores the pass pointer and a flags byte (flags | 2 if barrier set)
- Appends the pass pointer to a dynamic array at passMgr[0]
- Increments the counter at passMgr+8
The third parameter encodes pass type: 0 = ModulePass/AnalysisPass, 1 = FunctionPass. The fourth parameter is a scheduling barrier hint.
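The bookkeeping above can be modelled in a few lines of Python. This is a minimal sketch, not the decompiled routine: the table capacity, the linear probing step, the handling of duplicate pointers, and the use of the third parameter as the base flags value are all assumptions, and `PassManagerModel` is a hypothetical name.

```python
class PassManagerModel:
    """Toy model of the AddPass bookkeeping described for sub_12DE0B0."""

    def __init__(self, capacity=64):
        # Assumption: power-of-two capacity so the hash can be masked.
        self.capacity = capacity
        self.table = [None] * capacity  # stands in for the table at passMgr+80
        self.passes = []                # stands in for the array at passMgr[0]
        self.count = 0                  # stands in for the counter at passMgr+8

    @staticmethod
    def hash_ptr(ptr):
        # Pointer hash quoted in the text.
        return (ptr >> 9) ^ (ptr >> 4)

    def add_pass(self, ptr, pass_type, barrier=False):
        # Assumption: the base flags value is the pass-type parameter.
        flags = pass_type | (2 if barrier else 0)
        mask = self.capacity - 1
        slot = self.hash_ptr(ptr) & mask
        for _ in range(self.capacity):
            entry = self.table[slot]
            if entry is None:
                self.table[slot] = (ptr, flags)
                self.passes.append(ptr)
                self.count += 1
                return slot
            if entry[0] == ptr:
                return slot  # assumption: a duplicate pointer is not re-added
            slot = (slot + 1) & mask  # assumption: linear probing
        raise RuntimeError("hash table full")
```

Calling `add_pass(0x1000, 1, barrier=True)` on a fresh model stores the pair `(0x1000, 3)` and bumps the counter to 1.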
Tier System Architecture
The tier system is NVIDIA's mechanism for interleaving custom passes with standard LLVM passes at precise points. The main optimization loop in sub_12E54A0 iterates over a plugin/extension pass array at opts[4488..4496] (16-byte stride: vtable + phase_id), and fires tier sub-pipelines when the accumulated phase counter exceeds their thresholds:
    // Pseudocode from sub_12E54A0, lines 481-553
    for (entry = opts[4488]; entry < opts[4496]; entry += 16) {
        phase_id = entry[8];
        if (opts[4224] && phase_id > opts[4228]) {   // Tier 0
            sub_12DE330(PM, opts);                   // Full optimization
            opts[4224] = 0;                          // Fire once
        }
        if (opts[3528] && phase_id > opts[3532]) {   // Tier 1
            sub_12DE8F0(PM, 1, opts);
            opts[3528] = 0;
        }
        if (opts[3568] && phase_id > opts[3572]) {   // Tier 2
            sub_12DE8F0(PM, 2, opts);
            opts[3568] = 0;
        }
        if (opts[3608] && phase_id > opts[3612]) {   // Tier 3
            sub_12DE8F0(PM, 3, opts);
            opts[3608] = 0;
        }
        pass = entry->vtable[72]();                  // Plugin pass factory call
        sub_12DE0B0(PM, pass, 1, 0);                 // Insert plugin pass
        if (opts[3904])                              // Debug mode
            insert_verifier_after_each();
    }
    // Remaining unfired tiers fire unconditionally after loop
The tier control fields in the NVVMPassOptions struct:
| Offset | Type | Field |
|---|---|---|
| +3528 | bool | Tier 1 enable |
| +3532 | int | Tier 1 phase threshold |
| +3568 | bool | Tier 2 enable |
| +3572 | int | Tier 2 phase threshold |
| +3608 | bool | Tier 3 enable |
| +3612 | int | Tier 3 phase threshold |
| +4224 | bool | Tier 0 (full optimization) enable |
| +4228 | int | Tier 0 phase threshold |
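The fire-once behaviour of the tier flags is easier to see in a small simulation. The sketch below follows the loop structure of the pseudocode but is only a model: `schedule` is a hypothetical helper, the debug-mode verifier insertion is omitted, and the order in which leftover tiers fire after the loop (0, 1, 2, 3) is an assumption.

```python
def schedule(plugin_phases, tiers):
    """Return the ordered list of insertion events.

    plugin_phases: phase_id of each plugin entry, in array order.
    tiers: dict mapping tier number -> (enabled, threshold).
    """
    events = []
    # Only enabled tiers can fire; each is removed once it has fired.
    pending = {t: thr for t, (enabled, thr) in tiers.items() if enabled}
    for phase_id in plugin_phases:
        for tier in (0, 1, 2, 3):
            if tier in pending and phase_id > pending[tier]:
                events.append(f"tier{tier}")
                del pending[tier]          # mirrors clearing the enable flag
        events.append(f"plugin@{phase_id}")
    # Tiers that never crossed their threshold fire after the loop
    # (assumed order: ascending tier number).
    for tier in (0, 1, 2, 3):
        if tier in pending:
            events.append(f"tier{tier}")
    return events
```

With plugin phases `[1, 5, 10]`, Tier 0 at threshold 0, Tier 1 at threshold 4, Tier 2 disabled, and Tier 3 at threshold 100, Tier 0 lands before the first plugin, Tier 1 before the second, and Tier 3 only after the loop.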
Infrastructure Setup (Always Runs)
These five passes are always inserted first, regardless of optimization level:
| Pos | Factory | Identity | AddPass Flags |
|---|---|---|---|
| 1 | sub_149CCE0 (alloc 368B) | TargetLibraryInfoWrapperPass | (PM, TLI, 0, 0) Module |
| 2 | sub_1BFB520 (alloc 208B) | TargetTransformInfoWrapperPass | (PM, TTI, 1, 0) Function |
| 3 | sub_14A7550 | VerifierPass / BasicAliasAnalysis | (PM, _, 0, 0) Module |
| 4 | sub_1361950 | AssumptionCacheTracker | (PM, _, 0, 0) Module |
| 5 | sub_1CB0F50 | ProfileSummaryInfoWrapperPass | (PM, _, 1, 0) Function |
Tier 0 -- Full Optimization (sub_12DE330)
Called when opts[4224] (optimization enabled) is set and the phase threshold at opts[4228] is exceeded. This is the primary optimization sub-pipeline for O1/O2/O3, adding ~40 passes. Address: 0x12DE330.
Confidence note: Pass identifications are based on diagnostic strings, factory-function signatures, and pipeline ordering. Most identifications are HIGH confidence (confirmed by unique string literals). Entries marked [MEDIUM confidence] are inferred from code structure, argument patterns, or address proximity rather than direct string evidence.
| Pos | Factory Address | Likely Pass | Guard Condition |
|---|---|---|---|
| 1 | sub_1654860(1) | BreakCriticalEdges | always |
| 2 | sub_1A62BF0(1,0,0,1,0,0,1) | LLVM standard pipeline #1 | always |
| 3 | sub_1B26330 | MemCpyOpt | always |
| 4 | sub_185D600 | IPConstantPropagation | always |
| 5 | sub_1C6E800 | GVN | always |
| 6 | sub_1C6E560 | NewGVN/GVNHoist [MEDIUM confidence] | always |
| 7 | sub_1857160 | NVVMReflect | always |
| 8 | sub_1842BC0 | SCCP | always |
| 9 | sub_17060B0(1,0) | PrintModulePass | opts[3160] |
| 10 | sub_12D4560 | NVVMVerifier | always |
| 11 | sub_18A3090 | NVVMPredicateOpt | always |
| 12 | sub_184CD60 | ConstantMerge | always |
| 13 | sub_1869C50(1,0,1) | Sink/MemSSA [MEDIUM confidence] -- three-arg factory matches Sink with MemSSA parameters, but could also be a custom sinking variant | !opts[1040] |
| 14 | sub_1833EB0(3) | TailCallElim/JumpThreading [MEDIUM confidence] -- integer arg=3 could be JumpThreading threshold or TailCallElim mode; no disambiguating string | always |
| 15 | sub_17060B0(1,0) | PrintModulePass | opts[3160] |
| 16 | sub_1952F90(-1) | LoopIndexSplit | always |
| 17 | sub_1A62BF0(1,...) | LLVM standard pipeline #1 | always |
| 18 | sub_1A223D0 | NVVMIRVerification | always |
| 19 | sub_17060B0(1,0) | PrintModulePass | opts[3160] |
| 20 | sub_1A7A9F0 | InstructionSimplify | always |
| 21 | sub_1A62BF0(1,...) | LLVM standard pipeline #1 | always |
| 22 | sub_1A02540 | GenericToNVVM | always |
| 23 | sub_198DF00(-1) | LoopSimplify | always |
| 24 | sub_1C76260 | ADCE | !opts[1320] |
| 25 | sub_195E880(0) | LICM | opts[2880] |
| 26 | sub_19C1680(0,1) | LoopUnroll | !opts[1360] |
| 27 | sub_17060B0(1,0) | PrintModulePass | opts[3160] |
| 28 | sub_19401A0 | InstCombine | always |
| 29 | sub_1968390 | SROA | always |
| 30 | sub_196A2B0 | EarlyCSE | always |
| 31 | sub_19B73C0(2,-1,-1,-1,-1,-1,-1) | LoopUnswitch | always |
| 32 | sub_17060B0(1,0) | PrintModulePass | opts[3160] |
| 33 | sub_190BB10(0,0) | SimplifyCFG | always |
| 34 | sub_1A13320 | NVVMRematerialization | always |
| 35 | sub_17060B0(1,1) | PrintModulePass (verified) | opts[3160] |
| 36 | sub_18F5480 | DSE | always |
| 37 | sub_18DEFF0 | DCE | always |
| 38 | sub_1A62BF0(1,...) | LLVM standard pipeline #1 | always |
| 39 | sub_18B1DE0 | NVVMLoopPass/BarrierOpt [MEDIUM confidence] -- address is in NVVM pass range, but dual name reflects ambiguity between loop optimization and barrier optimization roles | always |
| 40 | sub_1841180 | FunctionAttrs | always |
Tier 1/2/3 -- Phase-Specific Sub-pipeline (sub_12DE8F0)
Called with tier number (1, 2, or 3). Address: 0x12DE8F0. Stores the tier value to qword_4FBB410. When tier==3, sets qword_4FBB370 = 6 if BYTE4 was 0 (enables advanced barrier and memory space optimization features).
The pass sequence is significantly longer than Tier 0 and varies by tier. The following shows the superset of all passes that can be inserted; tier-based guards are annotated.
Confidence note: Same methodology as Tier 0 table above. Most identifications confirmed by diagnostic strings or NVVMPassOptions slot cross-references.
| Pos | Factory Address | Likely Pass | Guard |
|---|---|---|---|
| 1 | sub_1CB4E40(1) | NVVMIntrinsicLowering | !opts[2000] |
| 2 | sub_1A223D0 | NVVMIRVerification | !opts[2600] |
| 3 | sub_1CB4E40(1) | NVVMIntrinsicLowering (barrier) | !opts[2000] |
| 4 | sub_18E4A00 | NVVMBarrierAnalysis | opts[3488] |
| 5 | sub_1C98160(0) | NVVMLowerBarriers | opts[3488] |
| 6 | sub_17060B0(1,0) | PrintModulePass | opts[3160] && !opts[1080] |
| 7 | sub_12D4560 | NVVMVerifier | !opts[600] |
| 8 | sub_185D600 | IPConstPropagation | opts[3200] && !opts[920] |
| 9 | sub_1857160 | NVVMReflect | opts[3200] && !opts[880] |
| 10 | sub_18A3430 | NVVMPredicateOpt | opts[3200] && !opts[1120] |
| 11 | sub_1842BC0 | SCCP | opts[3200] && !opts[720] |
| 12 | sub_17060B0(1,0) | PrintModulePass | !opts[1080] |
| 13 | sub_12D4560 | NVVMVerifier | !opts[600] |
| 14 | sub_18A3090 | NVVMPredicateOpt variant | opts[3200] && !opts[2160] |
| 15 | sub_184CD60 | ConstantMerge | opts[3200] && !opts[1960] |
| 16 | sub_190BB10(1,0) | SimplifyCFG | tier!=1 && !opts[1040] && !opts[1200] |
| 17 | sub_1952F90(-1) | LoopIndexSplit | (same guard) && !opts[1160] |
| 18 | sub_12D4560 | NVVMVerifier | (same guard) && !opts[600] |
| 19 | sub_17060B0(1,0) | PrintModulePass | (same guard) && !opts[1080] |
| 20 | sub_195E880(0) | LICM | opts[3704] && opts[2880] && !opts[1240] |
| 21 | sub_1C8A4D0(v) | EarlyCSE | v=1 if opts[3704] |
| 22 | sub_1869C50(1,0,1) | Sink | tier!=1 && !opts[1040] |
| 23 | sub_1833EB0(3) | TailCallElim | tier==3 && !opts[320] |
| 24 | sub_1CC3990 | NVVMUnreachableBlockElim | !opts[2360] |
| 25 | sub_18EEA90 | CorrelatedValuePropagation | opts[3040] |
| 26 | sub_12D4560 | NVVMVerifier | !opts[600] |
| 27 | sub_1A223D0 | NVVMIRVerification | !opts[2600] |
| 28 | sub_1CB4E40(1) | NVVMIntrinsicLowering | !opts[2000] |
| 29 | sub_1C4B6F0 | Inliner | !opts[440] && !opts[480] |
| 30 | sub_17060B0(1,0) | PrintModulePass | opts[3160] && !opts[1080] |
| 31 | sub_1A7A9F0 | InstructionSimplify | !opts[2720] |
| 32 | sub_12D4560 | NVVMVerifier | !opts[600] |
| 33 | sub_1A02540 | GenericToNVVM | !opts[2200] |
| 34 | sub_198DF00(-1) | LoopSimplify | !opts[1520] |
| 35 | sub_1C76260 | ADCE | !opts[1320] && !opts[1480] |
| 36 | sub_17060B0(1,0) | PrintModulePass | (same guard) |
| 37 | sub_12D4560 | NVVMVerifier | (same guard) |
| 38 | sub_195E880(0) | LICM | opts[2880] && !opts[1240] |
| 39 | sub_1C98160(0/1) | NVVMLowerBarriers | opts[3488] |
| 40 | sub_19C1680(0,1) | LoopUnroll | !opts[1360] |
| 41 | sub_17060B0(1,0) | PrintModulePass | !opts[1080] |
| 42 | sub_19401A0 | InstCombine | !opts[1000] |
| 43 | sub_196A2B0 | EarlyCSE | !opts[1440] |
| 44 | sub_1968390 | SROA | !opts[1400] |
| 45 | sub_19B73C0(tier,...) | LoopUnswitch | tier!=1, SM-arch-dependent params |
| 46 | sub_17060B0(1,0) | PrintModulePass | opts[3160] && !opts[1080] |
| 47 | sub_19B73C0(tier,...) | LoopUnswitch (2nd) | !opts[2760] |
| 48 | sub_1A62BF0(1,...) | LLVM standard pipeline | !opts[600] |
| 49 | sub_1A223D0 | NVVMIRVerification | !opts[2600] |
| 50 | sub_1CB4E40(1) | NVVMIntrinsicLowering | !opts[2000] |
| 51 | sub_17060B0(1,0) | PrintModulePass | !opts[1080] |
| 52 | sub_190BB10(0,0) | SimplifyCFG | !opts[960] |
| 53 | sub_1922F90 | NVIDIA loop pass | opts[3080] |
| 54 | sub_195E880(0) | LICM | opts[2880] && !opts[1240] |
| 55 | sub_1A13320 | NVVMRematerialization | !opts[2320] |
| 56 | sub_1968390 | SROA | !opts[1400] |
| 57 | sub_17060B0(1,0) | PrintModulePass | opts[3160] && !opts[1080] |
| 58 | sub_18EEA90 | CorrelatedValuePropagation | opts[3040] |
| 59 | sub_18F5480 | DSE | !opts[760] |
| 60 | sub_18DEFF0 | DCE | !opts[280] |
| 61 | sub_1A62BF0(1,...) | LLVM standard pipeline | !opts[600] |
| 62 | sub_1AAC510 | NVIDIA-specific pass | !opts[520] && !opts[560] |
| 63 | sub_1A223D0 | NVVMIRVerification | !opts[2600] |
| 64 | sub_1CB4E40(1) | NVVMIntrinsicLowering | !opts[2000] |
| 65 | sub_1C8E680 | MemorySpaceOpt | !opts[2680], param from opts[3120] |
| 66 | sub_1A223D0 | NVVMIRVerification | opts[3120] && !opts[2600] |
| 67 | sub_17060B0(1,0) | PrintModulePass (barrier) | !opts[1080] |
| 68 | sub_1CC71E0 | NVVMGenericAddrOpt | !opts[2560] |
| 69 | sub_1C98270(1,opts[2920]) | NVVMLowerBarriers variant | opts[3488] |
| 70 | sub_17060B0(1,0) | PrintModulePass | opts[3160] && !opts[1080] |
| 71 | sub_1C6FCA0 | ADCE | opts[2840] && !opts[1840] |
| 72 | sub_18B1DE0 | LoopOpt/BarrierOpt | opts[3200] && !opts[2640] |
| 73 | sub_1857160 | NVVMReflect | opts[3200] && tier==3 && !opts[880] |
| 74 | sub_1841180 | FunctionAttrs | opts[3200] && !opts[680] |
| 75 | sub_1C46000 | NVVMLateOpt | tier==3 && !opts[360] |
| 76 | sub_1841180 | FunctionAttrs (2nd) | opts[3200] && !opts[680] |
| 77 | sub_1CBC480 | NVVMLowerAlloca | !opts[2240] && !opts[2280] |
| 78 | sub_1CB73C0 | NVVMBranchDist | !opts[2080] && !opts[2120] |
| 79 | sub_1C7F370(1) | NVVMWarpShuffle | opts[3328] && !opts[1640] |
| 80 | sub_1CC5E00 | NVVMReduction | opts[3328] && !opts[2400] |
| 81 | sub_1CC60B0 | NVVMSinking2 | opts[3328] && !opts[2440] |
| 82 | sub_1CB73C0 | NVVMBranchDist (2nd) | opts[3328] && !opts[2080] && !opts[2120] |
| 83 | sub_17060B0(1,0) | PrintModulePass | opts[3328] && !opts[1080] |
| 84 | sub_1B7FDF0(3) | Reassociate | opts[3328] && !opts[1280] |
| 85 | sub_17060B0(1,0) | PrintModulePass (final) | opts[3160] && !opts[1080] |
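The guard column can be read as a predicate over the tier number and the options struct. The following sketch encodes four rows from the table (positions 11, 16, 23, and 75) as a guard list and filters them. The `ROWS` list and `assemble` helper are hypothetical names, and the options are modelled as a plain offset-to-value mapping rather than the real struct.

```python
from collections import defaultdict

# (likely pass, guard) pairs in table order; guards take (tier, opts).
ROWS = [
    ("SCCP",         lambda tier, o: o[3200] and not o[720]),
    ("SimplifyCFG",  lambda tier, o: tier != 1 and not o[1040] and not o[1200]),
    ("TailCallElim", lambda tier, o: tier == 3 and not o[320]),
    ("NVVMLateOpt",  lambda tier, o: tier == 3 and not o[360]),
]

def assemble(tier, opts):
    """Return the names of the passes whose guard holds for this tier."""
    return [name for name, guard in ROWS if guard(tier, opts)]

# Unset offsets read as 0; only the group-enable flag at +3200 is set.
example_opts = defaultdict(int, {3200: 1})
```

Under these example options, `assemble(1, example_opts)` keeps only SCCP, while `assemble(3, example_opts)` keeps all four rows. Setting offset 360 to 1 drops NVVMLateOpt again.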
Optimization Level Summary
| Pipeline | Sub-pipeline called | lsa-opt | mem-space-opt | Approx. passes |
|---|---|---|---|---|
| nvopt<O0> | (minimal, sub_1C8A4D0(0) only) | off | off | ~5--8 |
| nvopt<Ofcmax> | Sinking2 + common tail only | forced 0 | forced 0 | ~12--15 |
| nvopt<Ofcmid> | mid-level pipeline | normal | enabled | ~25--30 |
| nvopt<Ofcmin> | close to full pipeline | normal | enabled | ~30--35 |
| nvopt<O1> | sub_12DE330 (Tier 0) | normal | enabled | ~35 |
| nvopt<O2> | sub_12DE330 + Tier 1/2 | normal | enabled | ~35+ |
| nvopt<O3> | sub_12DE330 + Tier 1/2/3 | normal | enabled | ~35+ |
O1/O2/O3 all route through the same sub_12DE330 (Tier 0). The difference manifests through the tiered pass inserter sub_12DE8F0: O1 only fires Tier 1, O2 fires Tiers 1--2, O3 fires all three tiers. Within the tiers, passes additionally vary by: loop unroll factor (parameter to sub_1833EB0), vectorizer width (parameters to sub_19B73C0), CGSCC iteration count (first parameter to sub_1A62BF0), and the SM-architecture-dependent late passes gated by opts[3328].
Ofcmax critical behavior: when fast-compile level == 2 (max), the libnvvm pipeline builder forces -lsa-opt=0 and -memory-space-opt=0 even if the user explicitly enables them. This is confirmed in both sub_9624D0 (line 1358) and sub_12CC750 (line 2025).
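The Ofcmax override amounts to a last-writer-wins rule applied after user flags are parsed. A minimal sketch, assuming the flags are held in a simple name-to-value mapping (`apply_fast_compile_overrides` is a hypothetical helper, not a function recovered from the binary):

```python
def apply_fast_compile_overrides(user_flags, fast_compile_level):
    """Return the effective flags after the fast-compile override."""
    flags = dict(user_flags)  # leave the caller's mapping untouched
    if fast_compile_level == 2:  # level 2 is "max" per the text
        # Forced off even when the user explicitly enabled them.
        flags["-lsa-opt"] = 0
        flags["-memory-space-opt"] = 0
    return flags
```

At any other level the user's values pass through unchanged.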
Codegen Dispatch -- sub_12DFE00
After all optimization tiers complete, sub_12DFE00 (0x12DFE00) performs codegen pass scheduling. This is NOT a simple pass adder -- it performs a full dependency graph construction:
- Reads optimization level from opts[200] (0 = minimal, >1 = enable dependency tracking)
- Iterates all passes already in the pass manager
- For each pass, calls vtable+112 (isCodeGenOnly()) to filter
- Calls vtable+16 (getAnalysisUsage()) to extract dependencies
- Builds a secondary hash table of ordering constraints
- Dispatches each pass to the codegen subsystem in topological order via the subtarget hook at vtable+16
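The filter, collect-dependencies, and dispatch-in-topological-order steps can be sketched with Python's standard `graphlib` module. This is an illustration of the general technique only: the pass names are hypothetical, the filter is assumed to keep passes for which the codegen-only check returns true, and dependencies on filtered-out passes are assumed to be ignored.

```python
from graphlib import TopologicalSorter

def dispatch_order(passes):
    """Order codegen passes so that every pass follows its requirements.

    passes: list of (name, is_codegen_only, required_names) tuples,
            in the order they sit in the pass manager.
    """
    # Step 1: filter (assumption: keep passes flagged codegen-only).
    selected = {name: deps for name, codegen_only, deps in passes if codegen_only}
    # Step 2: record ordering constraints between the selected passes.
    sorter = TopologicalSorter()
    for name, deps in selected.items():
        sorter.add(name, *[d for d in deps if d in selected])
    # Step 3: emit passes in dependency order.
    return list(sorter.static_order())
```

For a chain where C requires B and B requires A, the result is A, B, C regardless of the order in which the passes were originally inserted.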
Pass Classification Statistics
| Category | Count |
|---|---|
| Module analyses | 18 |
| Module passes | ~131 |
| CGSCC analyses | 3 |
| CGSCC passes | ~10 |
| Function analyses | ~39 |
| Function AA analyses | 5 |
| Function passes | ~219 |
| LoopNest passes | 4 |
| Loop analyses | 5 |
| Loop passes | ~26 |
| MachineFunction analyses | 20 |
| MachineFunction passes | ~50 |
| Total | ~526 |
| NVIDIA additions | 33 |
| Standard LLVM | ~493 |
Complete Pass Factory Address Map
Every unique pass factory address observed in sub_12E54A0, sub_12DE330, and sub_12DE8F0:
| Likely Pass | Factory | Occurrences |
|---|---|---|
| NVVMVerifier | sub_12D4560 | many (tiers) |
| AssumptionCacheTracker | sub_1361950 | 1 |
| TargetLibraryInfoWrapperPass | sub_149CCE0 | 1 |
| VerifierPass/BasicAA | sub_14A7550 | 1 |
| BreakCriticalEdges | sub_1654860 | 2 |
| PrintModulePass (debug dump) | sub_17060B0 | ~30+ |
| InstructionCombining | sub_1832270 | 2 |
| TailCallElim/JumpThreading | sub_1833EB0 | 3 |
| FunctionAttrs | sub_1841180 | 3 |
| SCCP | sub_1842BC0 | 2 |
| NVVMReflect | sub_1857160 | ~8 |
| IPConstantPropagation | sub_185D600 | 3 |
| Sink (MemorySSA-based) | sub_1869C50 | 3 |
| NVVMPredicateOpt | sub_18A3090 | 2 |
| AggressiveInstCombine | sub_18A3430 | 2 |
| NVVMLoopOpt/BarrierOpt | sub_18B1DE0 | 3 |
| Sinking2Pass (fast-mode) | sub_18B3080 | 1 |
| DCE | sub_18DEFF0 | 4 |
| NVVMBarrierAnalysis | sub_18E4A00 | 1 |
| CorrelatedValuePropagation | sub_18EEA90 | 3 |
| DSE | sub_18F5480 | 2 |
| DeadArgElimination | sub_18FD350 | 5 |
| SimplifyCFG | sub_190BB10 | 4 |
| NVIDIA loop pass | sub_1922F90 | 1 |
| LoopIndexSplit | sub_1952F90 | 3 |
| LICM | sub_195E880 | 4 |
| SROA | sub_1968390 | 2 |
| EarlyCSE | sub_196A2B0 | 2 |
| LoopUnroll/Vectorize | sub_197E720 | 1 |
| LoopSimplify/IndVarSimplify | sub_198DF00 | 3 |
| CorrelatedValuePropagation | sub_198E2A0 | 1 |
| InstCombine | sub_19401A0 | 2 |
| LoopUnswitch | sub_19B73C0 | 3 |
| LoopUnroll | sub_19C1680 | 2 |
| NVIDIA pass (unknown) | sub_19CE990 | 1 |
| GenericToNVVM | sub_1A02540 | 1 |
| NVVMRematerialization | sub_1A13320 | 3 |
| NVVMIRVerification | sub_1A223D0 | 5+ |
| LLVM StandardPassPipeline | sub_1A62BF0 | ~9 |
| LoopIdiomRecognize | sub_1A68E70 | 1 |
| InstructionSimplify | sub_1A7A9F0 | 3 |
| NVIDIA-specific pass | sub_1AAC510 | 1 |
| MemCpyOpt | sub_1B26330 | 4 |
| Reassociate/Sinking | sub_1B7FDF0 | 3 |
| TTIWrapperPass | sub_1BFB520 | 1 |
| NVVMLateOpt | sub_1C46000 | 1 |
| Inliner/AlwaysInline | sub_1C4B6F0 | 2 |
| NewGVN/GVNHoist | sub_1C6E560 | 1 |
| GVN | sub_1C6E800 | 2 |
| ADCE (AggressiveDCE) | sub_1C6FCA0 | 2 |
| ADCE variant | sub_1C76260 | 2 |
| NVVMWarpShuffle | sub_1C7F370 | 1 |
| EarlyCSE/GVN variant | sub_1C8A4D0 | 3 |
| MemorySpaceOpt | sub_1C8E680 | 4 |
| NVVMLowerBarriers | sub_1C98160 | 4 |
| NVVMLowerBarriers variant | sub_1C98270 | 1 |
| ProfileSummaryInfo | sub_1CB0F50 | 1 |
| NVVMIntrinsicLowering | sub_1CB4E40 | ~10 |
| NVVMBranchDist | sub_1CB73C0 | 3 |
| NVVMLowerAlloca | sub_1CBC480 | 1 |
| NVVMUnreachableBlockElim | sub_1CC3990 | 1 |
| NVVMReduction | sub_1CC5E00 | 1 |
| NVVMSinking2 | sub_1CC60B0 | 3 |
| NVVMGenericAddrOpt | sub_1CC71E0 | 1 |
| NVVMFinalLowering | sub_1CEBD10 | 1 |
| NVVMPeephole | sub_1CEF8F0 | 2 |
| NVVMAnnotationsProcessor | sub_215D9D0 | 2 |
Total unique pass factories: ~65.
NVVMPassOptions Offset-to-Pass Guard Map
The NVVMPassOptions struct (4,512 bytes, 221 slots) controls which passes execute. The pipeline assembler reads boolean flags at specific offsets to gate pass insertion. See NVVMPassOptions for the full slot layout. Key offset-to-pass mappings:
| Offset | Slot | Type | Controls |
|---|---|---|---|
| +200 | 9 | int | Optimization level (0/1/2/3) |
| +280 | 15 | bool | DCE disable |
| +320 | 17 | bool | TailCallElim/JumpThreading disable |
| +360 | 19 | bool (default=1) | NVVMLateOpt disable |
| +600 | 31 | bool | NVVMVerifier disable |
| +720 | 37 | bool | SCCP disable |
| +760 | 39 | bool | DSE disable |
| +880 | 45 | bool | NVVMReflect disable |
| +920 | 47 | bool | IPConstantPropagation disable |
| +960 | 49 | bool | SimplifyCFG disable |
| +1000 | 51 | bool | InstCombine disable |
| +1040 | 53 | bool | Sink/MemSSA disable |
| +1080 | 55 | bool | PrintModulePass disable |
| +1160 | 59 | bool | LoopIndexSplit disable |
| +1240 | 63 | bool | LICM disable |
| +1280 | 65 | bool | Reassociate disable |
| +1320 | 67 | bool | ADCE disable |
| +1360 | 69 | bool | LoopUnroll disable |
| +1400 | 71 | bool | SROA disable |
| +1440 | 73 | bool | EarlyCSE disable |
| +1760 | 89 | bool | MemorySpaceOpt disable |
| +2000 | 101 | bool | NVVMIntrinsicLowering disable |
| +2320 | 117 | bool (default=1) | NVVMRematerialization disable |
| +2440 | 123 | bool | NVVMSinking2 disable |
| +2600 | 131 | bool | NVVMIRVerification disable |
| +2840 | 141 | bool (default=1) | ADCE enable (reversed logic) |
| +2880 | 143 | bool (default=1) | LICM enable (reversed logic) |
| +3120 | 155 | bool (default=1) | MemorySpaceOpt (2nd pass) enable |
| +3160 | 157 | bool (default=1) | PrintModulePass/debug dump enable |
| +3200 | 159 | bool (default=1) | Advanced NVIDIA passes group enable |
| +3328 | 165 | bool (default=1) | SM-specific late passes enable |
| +3488 | 175 | bool | Barrier optimization enable |
| +3648 | 181 | ptr | Language string ("ptx"/"mid"/"idn") |
| +3656 | — | int | Language string length |
| +3704 | 185 | bool | Late optimization / address-space flag |
| +4064 | 201 | bool | Concurrent compilation enable |
| +4104 | 203 | int (default=-1) | Thread count |
| +4224 | 211 | bool (default=1) | Master optimization enable |
| +4304 | 213 | bool | Device-code / separate-compilation flag |
| +4384 | 217 | bool | Fast-compile bypass (skip LLVM pipeline) |
| +4464 | 219 | bool (default=1) | Late CFG cleanup guard |
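Reading a guard from the struct is a fixed-offset load. The sketch below models the 4,512-byte struct as a byte buffer and evaluates the Tier 0 condition from it. It assumes one-byte booleans and little-endian 32-bit integers, which the table does not state, and the helper names are hypothetical.

```python
import struct

STRUCT_SIZE = 4512  # size of NVVMPassOptions quoted in the text

def read_bool(buf, offset):
    # Assumption: a bool slot is a single byte, non-zero meaning true.
    return buf[offset] != 0

def read_int(buf, offset):
    # Assumption: int slots are little-endian signed 32-bit values.
    return struct.unpack_from("<i", buf, offset)[0]

def make_example_options():
    """Build a zeroed buffer with a few illustrative values filled in."""
    buf = bytearray(STRUCT_SIZE)
    buf[4224] = 1                           # master optimization enable
    struct.pack_into("<i", buf, 4228, 7)    # Tier 0 threshold (made-up value)
    struct.pack_into("<i", buf, 4104, -1)   # thread count default from the table
    return buf

def tier0_should_fire(buf, phase_id):
    # Mirrors the guard opts[4224] && phase_id > opts[4228].
    return read_bool(buf, 4224) and phase_id > read_int(buf, 4228)
```

With the example buffer, a plugin entry at phase 8 crosses the made-up threshold of 7 and triggers Tier 0, while phase 7 does not.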
Infrastructure Functions
| Address | Function | Role |
|---|---|---|
| 0x2342890 | sub_2342890 | Master pass registration (~2,816 lines) |
| 0xE41FB0 | sub_E41FB0 | StringMap::insert (48-byte entries, open-addressing) |
| 0xE41C70 | sub_E41C70 | StringMap::grow (hash table resize) |
| 0xC94890 | sub_C94890 | String hash function (DJB/FNV-family) |
| 0x9691B0 | sub_9691B0 | String equality (len + memcmp) |
| 0xC931B0 | sub_C931B0 | StringRef::find_first_of (delimiter search) |
| 0x95CB50 | sub_95CB50 | StringRef::consume_front (strip llvm:: prefix) |
| 0x233C410 | sub_233C410 | Help listing (--print-pipeline-passes) |
| 0x233BD40 | sub_233BD40 | AA name resolver (chain of comparisons) |
| 0x233C0C0 | sub_233C0C0 | AA pipeline parser |
| 0x233C300 | sub_233C300 | Extension callback dispatch |
| 0x233A120 | sub_233A120 | Generic parameterized option parser |
| 0x12E54A0 | sub_12E54A0 | Master pipeline assembler (49.8KB) |
| 0x12DE0B0 | sub_12DE0B0 | AddPass (hash-table-based insertion) |
| 0x12DE330 | sub_12DE330 | Tier 0 full optimization sub-pipeline |
| 0x12DE8F0 | sub_12DE8F0 | Tier 1/2/3 phase-specific sub-pipeline |
| 0x12DFE00 | sub_12DFE00 | Codegen dispatch (dependency-ordered) |
| 0x226C400 | sub_226C400 | Pipeline name selector (nvopt<O#>) |
| 0x2277440 | sub_2277440 | Pipeline text parser entry |
| 0x225D540 | sub_225D540 | New PM nvopt registration |
| 0x12C35D0 | sub_12C35D0 | Legacy PM pipeline orchestrator |
| 0x2342820 | sub_2342820 | LastRunTrackingAnalysis factory |
| 0x2342830 | sub_2342830 | PassInstrumentationAnalysis factory |
| 0x2342840 | sub_2342840 | VerifierAnalysis factory |
| 0x2342850 | sub_2342850 | InlinerWrapper factory (shared by 4 inliner variants) |
| 0x2342860 | sub_2342860 | InvalidateAllAnalysesPass factory |
| 0x2342870 | sub_2342870 | VerifierPass factory |
| 0x2342880 | sub_2342880 | GuardWideningPass factory |
| 0x2339850 | sub_2339850 | PassBuilder destructor |
| 0x233B610 | sub_233B610 | PassBuilder::~PassBuilder cleanup |
Cross-References
- Optimizer -- runtime pipeline assembly, two-phase model, concurrent compilation
- NVVMPassOptions -- 221-slot option struct controlling pass enablement
- Optimization Levels -- O0/O1/O2/O3 and Ofcmin/Ofcmid/Ofcmax
- Concurrent Compilation -- Phase I/II, thread pool, GNU Jobserver