Pipeline & Pass Ordering

CICC v13.0 implements the LLVM New Pass Manager pipeline infrastructure, with NVIDIA injecting 33 custom passes into the registration table alongside approximately 493 standard LLVM passes. The master registration function at sub_2342890 populates a StringMap<PassInfo> hash table with every known pass name at startup, and a text-based pipeline parser allows the full pass ordering to be specified as a parenthesized string (e.g., module(function(instcombine,dse))). This page documents the complete pass inventory, the registration mechanism, the NVIDIA-specific additions, and the runtime pass execution order for each optimization level, including the tier system and pass factory addresses.

| Item | Address / value |
| --- | --- |
| Master registration | sub_2342890 (0x2342890, ~2,816 lines) |
| Hash table insert | sub_E41FB0 (0xE41FB0) -- open-addressing, 48-byte entries |
| String equality | sub_9691B0 (0x9691B0) -- len==len && memcmp==0 |
| AA name resolver | sub_233BD40 (0x233BD40) -- chain of string comparisons |
| AA pipeline parser | sub_233C0C0 (0x233C0C0) -- splits on ",", special-cases "default" |
| Extension callback | sub_233C300 (0x233C300) -- iterates [PassBuilder+2208], stride 32 |
| Option parser | sub_233A120 (0x233A120) -- splits on ";", validates tokens |
| Help/listing | sub_233C410 (0x233C410) -- --print-pipeline-passes handler |
| Pipeline assembler | sub_12E54A0 (0x12E54A0, 49.8KB, 1,553 lines) |
| AddPass | sub_12DE0B0 (0x12DE0B0, hash-based pass insertion) |
| Tier 0 sub-pipeline | sub_12DE330 (0x12DE330, ~40 passes) |
| Tier 1/2/3 sub-pipeline | sub_12DE8F0 (0x12DE8F0, phase-conditional) |
| Codegen dispatch | sub_12DFE00 (0x12DFE00, 20.7KB) |
| Total passes | ~526 unique registrations |
| NVIDIA additions | 33 passes (12 module, 20 function, 1 loop) |

Registration Architecture

The pipeline infrastructure follows the standard LLVM New Pass Manager design. At startup, sub_2342890 is called once and inserts every known pass into a StringMap living at [PassBuilder+8]. The insertion function sub_E41FB0 uses open-addressing with linear probing; each entry occupies 48 bytes containing the key pointer, key length, value pointer, value length, and 16 bytes of inline storage for short class names.
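As a rough illustration of the entry layout and probing scheme described above, the following Python sketch packs one 48-byte entry and runs a linear-probing insert and lookup. The field order, the fixed table size, and the FNV-1a hash are assumptions made for the sketch; the text gives only the field list and the entry size, not the exact offsets or the real hash.

```python
import struct

# Assumed field order: key ptr, key len, value ptr, value len, 16 inline bytes.
ENTRY = struct.Struct("<QQQQ16s")  # 4 * 8 + 16 = 48 bytes


def fnv1a(data: bytes) -> int:
    """Stand-in 32-bit FNV-1a hash (the real hash function is not confirmed)."""
    h = 0x811C9DC5
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h


def probe_insert(table, key, value):
    """Insert (key, value) using linear probing; return the slot used."""
    n = len(table)
    start = fnv1a(key.encode()) % n
    for step in range(n):
        slot = (start + step) % n
        if table[slot] is None or table[slot][0] == key:
            table[slot] = (key, value)
            return slot
    raise RuntimeError("table full (the sketch never resizes)")


def probe_lookup(table, key):
    """Return the stored value, or None when the probe hits an empty slot."""
    n = len(table)
    start = fnv1a(key.encode()) % n
    for step in range(n):
        entry = table[(start + step) % n]
        if entry is None:
            return None
        if entry[0] == key:
            return entry[1]
    return None
```

The size check `ENTRY.size == 48` is the one part of the sketch that follows directly from the text: four 8-byte fields plus 16 bytes of inline storage.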

Pass lookup during pipeline parsing uses the hash function at sub_C94890 (likely DJB/FNV-family). Parameterized passes are detected by the presence of <...> angle brackets after the pass name; the parameter string is extracted and forwarded to a pass-specific callback. The generic parameter validator sub_233A120 splits option strings on semicolons and compares each token to expected values, emitting "invalid {PassName} pass parameter '{token}'" on mismatch.
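The parameter handling can be sketched in a few lines of Python. This is a simplified model, not the decompiled logic: `split_pass_token` and `validate_params` are hypothetical names, the class name passed in for the message is illustrative, and names that themselves contain angle brackets (such as the print<...> passes) would need extra handling that is omitted here. Only the split on semicolons and the error message format come from the text.

```python
def split_pass_token(token):
    """Split 'name<params>' into (name, params); params is None when absent."""
    if token.endswith(">") and "<" in token:
        name, _, rest = token.partition("<")
        return name, rest[:-1]
    return token, None


def validate_params(pass_name, params, allowed):
    """Split the option string on ';' and reject any token not in `allowed`."""
    tokens = params.split(";") if params else []
    for token in tokens:
        if token not in allowed:
            raise ValueError(f"invalid {pass_name} pass parameter '{token}'")
    return tokens
```

For example, `split_pass_token("instcombine<max-iterations=1>")` yields the pass name and the raw parameter string, which a per-pass callback would then interpret.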

The alias analysis pipeline has its own parser at sub_233C0C0. It special-cases the string "default" (which calls sub_23A1380 then sub_23038C0 to build the default AA stack), and otherwise splits on commas, resolving each name through sub_233BD40:

| AA Name | Constructor |
| --- | --- |
| globals-aa | sub_2396EC0 |
| basic-aa | sub_2361CE0 |
| objc-arc-aa | sub_2361F60 |
| scev-aa | sub_2362040 |
| scoped-noalias-aa | sub_2362120 |
| tbaa | sub_2362200 |
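The AA pipeline parsing described above amounts to a special case followed by a table lookup. The sketch below models it in Python, using the constructor addresses from the table as opaque values. The function name and the error message for unknown names are hypothetical; the text does not say what sub_233C0C0 reports on failure.

```python
# Name -> constructor address, taken from the table above.
AA_CONSTRUCTORS = {
    "globals-aa": 0x2396EC0,
    "basic-aa": 0x2361CE0,
    "objc-arc-aa": 0x2361F60,
    "scev-aa": 0x2362040,
    "scoped-noalias-aa": 0x2362120,
    "tbaa": 0x2362200,
}

# The two calls made for "default", in the order given in the text.
DEFAULT_AA_CALLS = [0x23A1380, 0x23038C0]


def parse_aa_pipeline(text):
    """Return the list of constructor addresses an AA pipeline string selects."""
    if text == "default":
        return list(DEFAULT_AA_CALLS)
    selected = []
    for name in text.split(","):
        if name not in AA_CONSTRUCTORS:
            # Hypothetical message; the real diagnostic is not documented here.
            raise ValueError(f"unknown alias analysis name '{name}'")
        selected.append(AA_CONSTRUCTORS[name])
    return selected
```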

Extension callbacks for target-specific pipeline customization are stored at [PassBuilder+2208] with a count at [PassBuilder+2216]. Each entry is 32 bytes with a guard at offset +16 (must be non-null) and the callback function pointer at offset +24. The string "all" in extension context triggers invalidate<all>.
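A minimal sketch of walking such a callback array, assuming little-endian 64-bit fields and assuming that entries with a null guard are simply skipped (the text only says the guard must be non-null). The buffer stands in for the memory at [PassBuilder+2208], and `live_callbacks` is a hypothetical helper name.

```python
import struct

ENTRY_STRIDE = 32   # bytes per entry
GUARD_OFFSET = 16   # guard pointer, must be non-null
# The callback pointer follows the guard at offset +24.


def live_callbacks(buf, count):
    """Return callback pointers of all entries whose guard is non-null."""
    callbacks = []
    for index in range(count):
        base = index * ENTRY_STRIDE
        guard, callback = struct.unpack_from("<QQ", buf, base + GUARD_OFFSET)
        if guard != 0:  # assumption: null-guard entries are skipped
            callbacks.append(callback)
    return callbacks
```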

Pipeline Text Parser

The pipeline text parser accepts a nesting grammar where each level specifies the pass manager scope:

module(
  function(
    instcombine<max-iterations=1>,
    dse,
    loop(indvars, loop-deletion)
  ),
  globalopt
)

The parser splits on commas and parentheses, recognizing module(...), cgscc(...), function(...), and loop(...) as scope wrappers. Bare names are looked up in the StringMap built by sub_2342890. For parameterized passes, the <...> suffix is extracted and dispatched to per-pass option parsers. Several NVIDIA-specific parameter parsers are thin wrappers around sub_233A120:

| Parser | Pass | Recognized Options |
| --- | --- | --- |
| sub_233A330 | process-restrict | propagate-only |
| sub_233A370 | lower-struct-args | opt-byval |
| sub_233A3B0 | lower-aggr-copies | lower-aggr-func-args |

More complex passes (GVN, SimplifyCFG, InstCombine) use chained sub_9691B0 string comparisons for multi-option parsing.
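The nesting grammar can be illustrated with a small recursive parser. This is a sketch of the grammar only, not a reconstruction of the decompiled parser: the function names are hypothetical, whitespace tolerance is an assumption, and parameter strings in <...> are kept attached to the pass name rather than dispatched to option parsers.

```python
SCOPES = {"module", "cgscc", "function", "loop"}


def split_top_level(text):
    """Split on commas that are not nested inside (...) or <...>."""
    parts, depth, start = [], 0, 0
    for index, char in enumerate(text):
        if char in "(<":
            depth += 1
        elif char in ")>":
            depth -= 1
        elif char == "," and depth == 0:
            parts.append(text[start:index])
            start = index + 1
    parts.append(text[start:])
    return [part.strip() for part in parts if part.strip()]


def parse_pipeline(text):
    """Return a nested list: scopes become (scope, children), passes stay strings."""
    nodes = []
    for item in split_top_level(text):
        head, paren, rest = item.partition("(")
        if paren and head in SCOPES and item.endswith(")"):
            nodes.append((head, parse_pipeline(rest[:-1])))
        else:
            nodes.append(item)
    return nodes
```

Applied to the example pipeline above, the result is one module node containing a function node (with instcombine, dse, and a nested loop node) followed by globalopt.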

The pipeline name strings recognized by the nvopt<> dispatch table are:

| Pipeline Name | CLI Source | Pass Count |
| --- | --- | --- |
| nvopt<O0> | (no -O flag, no -Ofc) | ~5--8 |
| nvopt<O1> | -O1 | ~35 |
| nvopt<O2> | -O2 | ~35+ |
| nvopt<O3> | -O3 | ~35+ |
| nvopt<Ofcmax> | -Ofast-compile=max / -Ofc=max | ~12--15 |
| nvopt<Ofcmid> | -Ofast-compile=mid / -Ofc=mid | ~25--30 |
| nvopt<Ofcmin> | -Ofast-compile=min / -Ofc=min | ~30--35 |

Key addresses for pipeline name dispatch: sub_226C400 selects the pipeline name string, which is passed to sub_2277440 (pipeline text parser). The nvopt prefix is registered in sub_225D540 (new PM) and sub_12C35D0 (legacy PM), both calling into a pipeline builder class at vtable unk_4A08350.

Mutual exclusion: combining -O# with --passes= is an error: "Cannot specify -O#/-Ofast-compile=<min,mid,max> and --passes=/--foo-pass, use -passes='default<O#>,other-pass' or -passes='default<Ofcmax>,other-pass'".
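The name selection and the mutual-exclusion check can be modeled as follows. The mapping and the error text come from the table and message above; the function name, its signature, and the decision to return a --passes= string unchanged are assumptions for the sketch. How the real driver resolves -O# combined with -Ofc is not covered.

```python
PIPELINE_BY_FLAG = {
    "-O1": "nvopt<O1>",
    "-O2": "nvopt<O2>",
    "-O3": "nvopt<O3>",
    "-Ofast-compile=max": "nvopt<Ofcmax>",
    "-Ofc=max": "nvopt<Ofcmax>",
    "-Ofast-compile=mid": "nvopt<Ofcmid>",
    "-Ofc=mid": "nvopt<Ofcmid>",
    "-Ofast-compile=min": "nvopt<Ofcmin>",
    "-Ofc=min": "nvopt<Ofcmin>",
}

CONFLICT_MESSAGE = (
    "Cannot specify -O#/-Ofast-compile=<min,mid,max> and --passes=/--foo-pass, "
    "use -passes='default<O#>,other-pass' or -passes='default<Ofcmax>,other-pass'"
)


def select_pipeline(level_flag=None, passes=None):
    """Pick the pipeline string for one optional level flag and optional --passes= text."""
    if level_flag is not None and passes is not None:
        raise ValueError(CONFLICT_MESSAGE)
    if passes is not None:
        return passes
    if level_flag is None:
        return "nvopt<O0>"  # no -O flag and no -Ofc
    return PIPELINE_BY_FLAG[level_flag]
```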

Complete Pass Inventory

The following tables list every pass in exact registration order within sub_2342890. NVIDIA-specific passes are marked with bold names. Registration line numbers are from the decompiled output.

Module Analyses (18)

| # | Pass Name | LLVM Class | Reg. Line |
| --- | --- | --- | --- |
| 1 | callgraph | CallGraphAnalysis | 514 |
| 2 | collector-metadata | CollectorMetadataAnalysis | |
| 3 | ctx-prof-analysis | CtxProfAnalysis | |
| 4 | dxil-metadata | DXILMetadataAnalysis | |
| 5 | dxil-resource-binding | DXILResourceBindingAnalysis | |
| 6 | dxil-resource-type | DXILResourceTypeAnalysis | |
| 7 | inline-advisor | InlineAdvisorAnalysis | |
| 8 | ir-similarity | IRSimilarityAnalysis | |
| 9 | last-run-tracking | via sub_2342820 | |
| 10 | lcg | LazyCallGraphAnalysis | |
| 11 | module-summary | ModuleSummaryIndexAnalysis | |
| 12 | no-op-module | NoOpModuleAnalysis | |
| 13 | pass-instrumentation | via sub_2342830 | |
| 14 | profile-summary | ProfileSummaryAnalysis | |
| 15 | reg-usage | PhysicalRegisterUsageAnalysis | |
| 16 | stack-safety | StackSafetyGlobalAnalysis | |
| 17 | verify | via sub_2342840 | 596 |
| 18 | globals-aa | GlobalsAA | |

Module Passes (131)

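Registration lines 599--1153 in sub_2342890. Entries 19--121 (103 passes) are standard LLVM; the NVIDIA custom passes are registered last, at lines 1096--1153. Ten of them appear as entries 122--131; the count of 12 NVIDIA module passes presumably also includes the two NVIDIA parameterized passes, lower-ops and set-global-array-alignment.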

Standard LLVM Module Passes (entries 19--121)

#Pass NameLLVM Class
19always-inlineAlwaysInlinerPass
20annotation2metadataAnnotation2MetadataPass
21assign-guidAssignGUIDPass
22attributorAttributorPass
23attributor-lightAttributorLightPass
24called-value-propagationCalledValuePropagationPass
25canonicalize-aliasesCanonicalizeAliasesPass
26check-debugifyNewPMCheckDebugifyPass
27constmergeConstantMergePass
28coro-cleanupCoroCleanupPass
29coro-earlyCoroEarlyPass
30cross-dso-cfiCrossDSOCFIPass
31ctx-instr-genPGOInstrumentationGen
32ctx-prof-flattenPGOCtxProfFlatteningPass
33noinline-nonprevailingNoinlineNonPrevailing
34deadargelimDeadArgumentEliminationPass
35debugifyNewPMDebugifyPass
36dfsanDataFlowSanitizerPass
37dot-callgraphCallGraphDOTPrinterPass
38dxil-upgradeDXILUpgradePass
39elim-avail-externEliminateAvailableExternallyPass
40extract-blocksBlockExtractorPass
41expand-variadicsExpandVariadicsPass
42forceattrsForceFunctionAttrsPass
43function-importFunctionImportPass
44global-merge-funcGlobalMergeFuncPass
45globaloptGlobalOptPass
46globalsplitGlobalSplitPass
47hotcoldsplitHotColdSplittingPass
48inferattrsInferFunctionAttrsPass
49inliner-ml-advisor-releasevia sub_2342850 (InlinerWrapper)
50inliner-wrappervia sub_2342850 (InlinerWrapper)
51inliner-wrapper-no-mandatory-firstvia sub_2342850
52insert-gcov-profilingGCOVProfilerPass
53instrorderfileInstrOrderFilePass
54instrprofInstrProfilingLoweringPass
55ctx-instr-lowerPGOCtxProfLoweringPass
56print<ctx-prof-analysis>CtxProfAnalysisPrinterPass
57invalidate<all>via sub_2342860
58iroutlinerIROutlinerPass
59jmc-instrumenterJMCInstrumenterPass
60lower-emutlsLowerEmuTLSPass
61lower-global-dtorsLowerGlobalDtorsPass
62lower-ifuncLowerIFuncPass
63lowertypetestsLowerTypeTestsPass
64fatlto-cleanupFatLtoCleanup
65pgo-force-function-attrsPGOForceFunctionAttrsPass
66memprof-context-disambiguationMemProfContextDisambiguation
67memprof-moduleModuleMemProfilerPass
68mergefuncMergeFunctionsPass
69metarenamerMetaRenamerPass
70module-inlineModuleInlinerPass
71name-anon-globalsNameAnonGlobalPass
72no-op-moduleNoOpModulePass
73nsanNumericalStabilitySanitizerPass
74objc-arc-apelimObjCARCAPElimPass
75openmp-optOpenMPOptPass
76openmp-opt-postlinkOpenMPOptPass
77partial-inlinerPartialInlinerPass
78pgo-icall-promPGOIndirectCallPromotion
79pgo-instr-genPGOInstrumentationGen
80pgo-instr-usePGOInstrumentationUse
81pre-isel-intrinsic-loweringPreISelIntrinsicLoweringPass
82printPrintModulePass
83print-callgraphCallGraphPrinterPass
84print-callgraph-sccsCallGraphSCCsPrinterPass
85print-ir-similarityIRSimilarityAnalysisPrinterPass
86print-lcgLazyCallGraphPrinterPass
87print-lcg-dotLazyCallGraphDOTPrinterPass
88print-must-be-executed-contextsMustBeExecutedContextPrinterPass
89print-profile-summaryProfileSummaryPrinterPass
90print-stack-safetyStackSafetyGlobalPrinterPass
91print<dxil-metadata>DXILMetadataAnalysisPrinterPass
92print<dxil-resource-binding>DXILResourceBindingPrinterPass
93print<inline-advisor>InlineAdvisorAnalysisPrinterPass
94print<module-debuginfo>ModuleDebugInfoPrinterPass
95print<reg-usage>PhysicalRegisterUsageInfoPrinterPass
96pseudo-probeSampleProfileProbePass
97pseudo-probe-updatePseudoProbeUpdatePass
98recompute-globalsaaRecomputeGlobalsAAPass
99rel-lookup-table-converterRelLookupTableConverterPass
100rewrite-statepoints-for-gcRewriteStatepointsForGC
101rewrite-symbolsRewriteSymbolPass
102rpo-function-attrsReversePostOrderFunctionAttrsPass
103rtsanRealtimeSanitizerPass
104sample-profileSampleProfileLoaderPass
105sancov-moduleSanitizerCoveragePass
106sanmd-moduleSanitizerBinaryMetadataPass
107scc-oz-module-inlinervia sub_2342850 (InlinerWrapper)
108shadow-stack-gc-loweringShadowStackGCLoweringPass
109stripStripSymbolsPass
110strip-dead-debug-infoStripDeadDebugInfoPass
111strip-dead-prototypesStripDeadPrototypesPass
112strip-debug-declareStripDebugDeclarePass
113strip-nondebugStripNonDebugSymbolsPass
114strip-nonlinetable-debuginfoStripNonLineTableDebugInfoPass
115trigger-crash-moduleTriggerCrashModulePass
116trigger-verifier-errorTriggerVerifierErrorPass
117tsan-moduleModuleThreadSanitizerPass
118tysanTypeSanitizerPass
119verifyvia sub_2342870
120view-callgraphCallGraphViewerPass
121wholeprogramdevirtWholeProgramDevirtPass

NVIDIA Module Passes (entries 122--131)

| # | Pass Name | LLVM Class | Reg. Line | Purpose |
| --- | --- | --- | --- | --- |
| 122 | check-gep-index | CheckGepIndexPass | 1096 | Validates GEP index bounds |
| 123 | check-kernel-functions | NVPTXSetFunctionLinkagesPass | 1101 | Enforces kernel linkage |
| 124 | cnp-launch-check | CNPLaunchCheckPass | 1106 | Cooperative launch validation |
| 125 | ipmsp | IPMSPPass | 1111 | Inter-procedural memory space propagation |
| 126 | nv-early-inliner | via sub_2342850 | 1114 | NVIDIA early inlining heuristic |
| 127 | nv-inline-must | InlineMustPass | 1119 | Force-inlines __forceinline__ functions |
| 128 | nvvm-pretreat | PretreatPass | 1124 | IR canonicalization before optimization |
| 129 | nvvm-verify | NVVMIRVerifierPass | 1129 | NVVM IR constraint validation |
| 130 | printf-lowering | PrintfLoweringPass | 1134 | Lowers printf to vprintf ABI |
| 131 | select-kernels | SelectKernelsPass | 1139 | Selects kernels for compilation |

Parameterized Module Passes (entries 132--145)

#Pass NameClassParameters
132asanAddressSanitizerPasskernel
133cg-profileCGProfilePassin-lto-post-link
134global-mergeGlobalMergePassgroup-by-use;ignore-single-use;max-offset=N
135embed-bitcodeEmbedBitcodePassthinlto;emit-summary
136globaldceGlobalDCEPassin-lto-post-link
137hwasanHWAddressSanitizerPasskernel;recover
138internalizeInternalizePasspreserve-gv=GV
139ipsccpIPSCCPPassno-func-spec;func-spec
140loop-extractLoopExtractorPasssingle
141memprof-useMemProfUsePassprofile-filename=S
142msanMemorySanitizerPassrecover;kernel;eager-checks;track-origins=N
143print<structural-hash>StructuralHashPrinterPassdetailed;call-target-ignored
144lower-opsLowerOpsPassenable-optimization
145set-global-array-alignmentSetGlobalArrayAlignmentPassmodify-shared-mem;skip-shared-mem;modify-global-mem;skip-global-mem

CGSCC Analyses and Passes (entries 146--158)

#Pass NameLLVM ClassLevel
146no-op-cgsccNoOpCGSCCAnalysisAnalysis
147fam-proxyFunctionAnalysisManagerCGSCCProxyAnalysis
148pass-instrumentationvia sub_2342830Analysis
149argpromotionArgumentPromotionPassPass
150attributor-cgsccAttributorCGSCCPassPass
151attributor-light-cgsccAttributorLightCGSCCPassPass
152invalidate<all>via sub_2342860Pass
153no-op-cgsccNoOpCGSCCPassPass
154openmp-opt-cgsccOpenMPOptCGSCCPassPass
155coro-annotation-elideCoroAnnotationElidePassPass
156coro-splitCoroSplitPassParam: reuse-storage
157function-attrsPostOrderFunctionAttrsPassParam: skip-non-recursive-function-attrs
158inlineInlinerPassParam: only-mandatory

Function Analyses (entries 159--201)

Registration lines 1208--1415 in sub_2342890.

#Pass NameLLVM Class
159aaAAManager
160access-infoLoopAccessAnalysis
161assumptionsAssumptionAnalysis
162bb-sections-profile-readerBasicBlockSectionsProfileReaderAnalysis
163block-freqBlockFrequencyAnalysis
164branch-probBranchProbabilityAnalysis
165cyclesCycleAnalysis
166daDependenceAnalysis
167debug-ataDebugAssignmentTrackingAnalysis
168demanded-bitsDemandedBitsAnalysis
169domfrontierDominanceFrontierAnalysis
170domtreeDominatorTreeAnalysis
171func-propertiesFunctionPropertiesAnalysis
172machine-function-infoMachineFunctionAnalysis
173gc-functionGCFunctionAnalysis
174inliner-size-estimatorInlineSizeEstimatorAnalysis
175last-run-trackingvia sub_2342820
176lazy-value-infoLazyValueAnalysis
177loopsLoopAnalysis
178memdepMemoryDependenceAnalysis
179memoryssaMemorySSAAnalysis
180no-op-functionNoOpFunctionAnalysis
181opt-remark-emitOptimizationRemarkEmitterAnalysis
182pass-instrumentationvia sub_2342830
183phi-valuesPhiValuesAnalysis
184postdomtreePostDominatorTreeAnalysis
185regionsRegionInfoAnalysis
186scalar-evolutionScalarEvolutionAnalysis
187should-not-run-function-passesShouldNotRunFunctionPassesAnalysis
188should-run-extra-vector-passesShouldRunExtraVectorPasses
189ssp-layoutSSPLayoutAnalysis
190stack-safety-localStackSafetyAnalysis
191target-irTargetIRAnalysis
192target-lib-infoTargetLibraryAnalysis
193uniformityUniformityInfoAnalysis
194verifyvia sub_2342840
195rpaRegisterPressureAnalysis
196merge-setsMergeSetsAnalysis

Function AA Analyses (entries 197--201)

#Pass NameLLVM Class
197basic-aaBasicAA
198objc-arc-aaobjcarc::ObjCARCAA
199scev-aaSCEVAA
200scoped-noalias-aaScopedNoAliasAA
201tbaaTypeBasedAA

Function Passes (entries 202--419)

Registration lines 1420--2319 in sub_2342890. The first 174 entries (202--375) are standard LLVM; entries 376--392 are NVIDIA-specific; entries 393--419 are parameterized passes (both standard and NVIDIA).

Standard LLVM Function Passes (entries 202--375)

#Pass NameLLVM Class
202aa-evalAAEvaluator
203adceADCEPass
204add-discriminatorsAddDiscriminatorsPass
205aggressive-instcombineAggressiveInstCombinePass
206alignment-from-assumptionsAlignmentFromAssumptionsPass
207annotation-remarksAnnotationRemarksPass
208assume-builderAssumeBuilderPass
209assume-simplifyAssumeSimplifyPass
210atomic-expandAtomicExpandPass
211bdceBDCEPass
212break-crit-edgesBreakCriticalEdgesPass
213callbr-prepareCallBrPreparePass
214callsite-splittingCallSiteSplittingPass
215chrControlHeightReductionPass
216codegenprepareCodeGenPreparePass
217complex-deinterleavingComplexDeinterleavingPass
218consthoistConstantHoistingPass
219constraint-eliminationConstraintEliminationPass
220coro-elideCoroElidePass
221correlated-propagationCorrelatedValuePropagationPass
222count-visitsCountVisitsPass
223dceDCEPass
224declare-to-assignAssignmentTrackingPass
225dfa-jump-threadingDFAJumpThreadingPass
226div-rem-pairsDivRemPairsPass
227dot-cfgCFGPrinterPass
228dot-cfg-onlyCFGOnlyPrinterPass
229dot-domDOTGraphTraitsPrinter<DominatorTree, false>
230dot-dom-onlyDOTGraphTraitsPrinter<DominatorTree, true>
231dot-post-domDOTGraphTraitsPrinter<PostDominatorTree, false>
232dot-post-dom-onlyDOTGraphTraitsPrinter<PostDominatorTree, true>
233dseDSEPass
234dwarf-eh-prepareDwarfEHPreparePass
235expand-large-div-remExpandLargeDivRemPass
236expand-large-fp-convertExpandLargeFpConvertPass
237expand-memcmpExpandMemCmpPass
238extra-vector-passesExtraFunctionPassManager<ShouldRunExtraVectorPasses>
239fix-irreducibleFixIrreduciblePass
240flatten-cfgFlattenCFGPass
241float2intFloat2IntPass
242gc-loweringGCLoweringPass
243guard-wideningvia sub_2342880
244gvn-hoistGVNHoistPass
245gvn-sinkGVNSinkPass
246helloworldHelloWorldPass
247indirectbr-expandIndirectBrExpandPass
248infer-address-spacesInferAddressSpacesPass
249infer-alignmentInferAlignmentPass
250inject-tli-mappingsInjectTLIMappings
251instcountInstCountPass
252instnamerInstructionNamerPass
253instsimplifyInstSimplifyPass
254interleaved-accessInterleavedAccessPass
255interleaved-load-combineInterleavedLoadCombinePass
256invalidate<all>via sub_2342860
257irceIRCEPass
258jump-threadingJumpThreadingPass
259jump-table-to-switchJumpTableToSwitchPass
260kcfiKCFIPass
261kernel-infoKernelInfoPrinter
262lcssaLCSSAPass
263libcalls-shrinkwrapLibCallsShrinkWrapPass
264lintLintPass
265load-store-vectorizerLoadStoreVectorizerPass
266loop-data-prefetchLoopDataPrefetchPass
267loop-distributeLoopDistributePass
268loop-fusionLoopFusePass
269loop-load-elimLoopLoadEliminationPass
270loop-simplifyLoopSimplifyPass
271loop-sinkLoopSinkPass
272loop-versioningLoopVersioningPass
273lower-atomicLowerAtomicPass
274lower-constant-intrinsicsLowerConstantIntrinsicsPass
275lower-expectLowerExpectIntrinsicPass
276lower-guard-intrinsicLowerGuardIntrinsicPass
277lower-invokeLowerInvokePass
278lower-widenable-conditionLowerWidenableConditionPass
279make-guards-explicitMakeGuardsExplicitPass
280mem2regPromotePass
281memcpyoptMemCpyOptPass
282memprofMemProfilerPass
283mergeicmpsMergeICmpsPass
284mergereturnUnifyFunctionExitNodesPass
285move-auto-initMoveAutoInitPass
286nary-reassociateNaryReassociatePass
287newgvnNewGVNPass
288no-op-functionNoOpFunctionPass
289normalizeIRNormalizerPass
290objc-arcObjCARCOptPass
291objc-arc-contractObjCARCContractPass
292objc-arc-expandObjCARCExpandPass
293pa-evalPAEvalPass
294partially-inline-libcallsPartiallyInlineLibCallsPass
295pgo-memop-optPGOMemOPSizeOpt
296place-safepointsPlaceSafepointsPass
297printPrintFunctionPass
298--338print<access-info> ... print-predicateinfo(41 printer passes)
339reassociateReassociatePass
340redundant-dbg-inst-elimRedundantDbgInstEliminationPass
341reg2memRegToMemPass
342safe-stackSafeStackPass
343sandbox-vectorizerSandboxVectorizerPass
344scalarize-masked-mem-intrinScalarizeMaskedMemIntrinPass
345sccpSCCPPass
346select-optimizeSelectOptimizePass
347separate-const-offset-from-gepSeparateConstOffsetFromGEPPass
348sinkSinkingPass
349sjlj-eh-prepareSjLjEHPreparePass
350slp-vectorizerSLPVectorizerPass
351slsrStraightLineStrengthReducePass
352stack-protectorStackProtectorPass
353strip-gc-relocatesStripGCRelocates
354tailcallelimTailCallElimPass
355transform-warningWarnMissedTransformationsPass
356trigger-crash-functionTriggerCrashFunctionPass
357trigger-verifier-errorTriggerVerifierErrorPass
358tsanThreadSanitizerPass
359unify-loop-exitsUnifyLoopExitsPass
360vector-combineVectorCombinePass
361verifyvia sub_2342870
362--368verify<cycles> ... verify<scalar-evolution>(7 verifiers)
369--374view-cfg ... view-post-dom-only(6 viewers)
375wasm-eh-prepareWasmEHPreparePass

NVIDIA Function Passes (entries 376--392)

Registered at lines 2212--2292 of sub_2342890.

| # | Pass Name | LLVM Class | Reg. Line | Purpose |
| --- | --- | --- | --- | --- |
| 376 | basic-dbe | BasicDeadBarrierEliminationPass | 2212 | Removes dead bar.sync instructions |
| 377 | branch-dist | BranchDistPass | 2217 | Branch distribution for divergence control |
| 378 | byval-mem2reg | ByValMem2RegPass | 2222 | Promotes byval arguments to registers |
| 379 | bypass-slow-division | BypassSlowDivisionPass | 2227 | Fast-path for small-operand division |
| 380 | normalize-gep | NormalizeGepPass | 2232 | GEP canonicalization for address arithmetic |
| 381 | nvvm-reflect-pp | SimplifyConstantConditionalsPass | 2237 | Folds __nvvm_reflect results (post-processing) |
| 382 | nvvm-peephole-optimizer | NVVMPeepholeOptimizerPass | 2242 | NVVM-specific peephole rewrites |
| 383 | old-load-store-vectorizer | OldLoadStoreVectorizerPass | 2247 | Legacy load/store vectorization |
| 384 | print<merge-sets> | MergeSetsAnalysisPrinterPass | 2252 | Printer for merge-sets analysis |
| 385 | remat | RematerializationPass | 2257 | Register-pressure-aware rematerialization |
| 386 | print<rpa> | RegisterPressurePrinterPass | 2262 | Printer for register pressure analysis |
| 387 | propagate-alignment | PropagateAlignmentPass | 2267 | Propagates alignment through pointer chains |
| 388 | reuse-local-memory | ReuseLocalMemoryPass | 2272 | Shares local memory across kernels |
| 389 | set-local-array-alignment | SetLocalArrayAlignmentPass | 2277 | Aligns stack arrays for coalescing |
| 390 | sinking2 | Sinking2Pass | 2282 | Enhanced instruction sinking |
| 391 | d2ir-scalarizer | ScalarizerPass (NVIDIA alias) | 2287 | NVIDIA-branded scalarization |
| 392 | sink<rp-aware> | SinkingPass (variant) | 2292 | Register-pressure-aware sinking |

Parameterized Function Passes (entries 393--419)

#Pass NameClassParameters
393cfguardCFGuardPasscheck;dispatch
394early-cseEarlyCSEPassmemssa
395ee-instrumentEntryExitInstrumenterPasspost-inline
396function-simplification(byte_3F871B3)O1;O2;O3;Os;Oz
397gvnGVNPassno-pre;pre;no-load-pre;load-pre;...
398instcombineInstCombinePassno-aggressive-aggregate-splitting;...;max-iterations=N
399loop-unrollLoopUnrollPassO0;O1;O2;O3;full-unroll-max=N;...
400loop-vectorizeLoopVectorizePassno-interleave-forced-only;...
401lower-allow-checkLowerAllowCheckPass(empty)
402lower-matrix-intrinsicsLowerMatrixIntrinsicsPassminimal
403lower-switchLowerSwitchPassenable-jump-table
404mldst-motionMergedLoadStoreMotionPassno-split-footer-bb;split-footer-bb
405print<da>DependenceAnalysisPrinterPassnormalized-results
406print<memoryssa>MemorySSAPrinterPassno-ensure-optimized-uses
407print<stack-lifetime>StackLifetimePrinterPassmay;must
408scalarizerScalarizerPassload-store;no-load-store;variable-insert-extract;...
409separate-const-offset-from-gepSeparateConstOffsetFromGEPPasslower-gep
410simplifycfgSimplifyCFGPasssimplify-unreachable;...;bonus-inst-threshold=N
411speculative-executionSpeculativeExecutionPassonly-if-divergent-target
412sroaSROAPasspreserve-cfg;modify-cfg
413structurizecfgStructurizeCFGskip-uniform-regions
414win-eh-prepareWinEHPreparePassdemote-catchswitch-only
415bounds-checkingBoundsCheckingPass (modified)trap
416memory-space-optMemorySpaceOptPassfirst-time;second-time;no-warnings;warnings
417lower-aggr-copiesLowerAggrCopiesPasslower-aggr-func-args
418lower-struct-argsLowerStructArgsPassopt-byval
419process-restrictProcessRestrictPasspropagate-only

LoopNest Passes (entries 420--423)

#Pass NameLLVM Class
420loop-flattenLoopFlattenPass
421loop-interchangeLoopInterchangePass
422loop-unroll-and-jamLoopUnrollAndJamPass
423no-op-loopnestNoOpLoopNestPass

Loop Analyses (entries 424--428)

#Pass NameLLVM Class
424ddgDDGAnalysis
425iv-usersIVUsersAnalysis
426no-op-loopNoOpLoopAnalysis
427pass-instrumentationvia sub_2342830
428should-run-extra-simple-loop-unswitchShouldRunExtraSimpleLoopUnswitch

Loop Passes (entries 429--455)

#Pass NameLLVM Class
429canon-freezeCanonicalizeFreezeInLoopsPass
430dot-ddgDDGDotPrinterPass
431guard-wideningvia sub_2342880
432extra-simple-loop-unswitch-passesExtraLoopPassManager<...>
433indvarsIndVarSimplifyPass
434invalidate<all>via sub_2342860
435loop-bound-splitLoopBoundSplitPass
436loop-deletionLoopDeletionPass
437loop-idiomLoopIdiomRecognizePass
438loop-idiom-vectorizeLoopIdiomVectorizePass
439loop-instsimplifyLoopInstSimplifyPass
440loop-predicationLoopPredicationPass
441loop-reduceLoopStrengthReducePass
442loop-term-foldLoopTermFoldPass
443loop-simplifycfgLoopSimplifyCFGPass
444loop-unroll-fullLoopFullUnrollPass
445loop-versioning-licmLoopVersioningLICMPass
446no-op-loopNoOpLoopPass
447printPrintLoopPass
448--450print<ddg>, print<iv-users>, print<loop-cache-cost>, print<loopnest>(printers)
451loop-index-splitLoopIndexSplitPass

Parameterized Loop Passes (entries 452--455)

#Pass NameClassParameters
452licmLICMPassallowspeculation;conservative-calls
453lnicmLNICMPassallowspeculation
454loop-rotateLoopRotatePassno-header-duplication;header-duplication;...
455simple-loop-unswitchSimpleLoopUnswitchPassnontrivial;no-nontrivial;trivial;no-trivial

Machine Function Analyses (entries 456--475)

#Pass NameLLVM Class
456edge-bundlesEdgeBundlesAnalysis
457livedebugvarsLiveDebugVariablesAnalysis
458live-intervalsLiveIntervalsAnalysis
459live-reg-matrixLiveRegMatrixAnalysis
460live-stacksLiveStacksAnalysis
461live-varsLiveVariablesAnalysis
462machine-block-freqMachineBlockFrequencyAnalysis
463machine-branch-probMachineBranchProbabilityAnalysis
464machine-cyclesMachineCycleAnalysis
465machine-dom-treeMachineDominatorTreeAnalysis
466machine-loopsMachineLoopAnalysis
467machine-opt-remark-emitterMachineOptimizationRemarkEmitterAnalysis
468machine-post-dom-treeMachinePostDominatorTreeAnalysis
469machine-trace-metricsMachineTraceMetricsAnalysis
470pass-instrumentationvia sub_2342830
471regalloc-evictRegAllocEvictionAdvisorAnalysis
472regalloc-priorityRegAllocPriorityAdvisorAnalysis
473slot-indexesSlotIndexesAnalysis
474spill-code-placementSpillPlacementAnalysis
475virtregmapVirtRegMapAnalysis

Machine Function Passes (entries 476--526)

#Pass NameLLVM Class
476dead-mi-eliminationDeadMachineInstructionElimPass
477detect-dead-lanesDetectDeadLanesPass
478early-ifcvtEarlyIfConverterPass
479early-machinelicmEarlyMachineLICMPass
480early-tailduplicationEarlyTailDuplicatePass
481finalize-iselFinalizeISelPass
482fixup-statepoint-caller-savedFixupStatepointCallerSavedPass
483localstackallocLocalStackSlotAllocationPass
484machine-cpMachineCopyPropagationPass
485machine-cseMachineCSEPass
486machine-latecleanupMachineLateInstrsCleanupPass
487machine-schedulerMachineSchedulerPass
488machinelicmMachineLICMPass
489no-op-machine-functionNoOpMachineFunctionPass
490opt-phisOptimizePHIsPass
491patchable-functionPatchableFunctionPass
492peephole-optPeepholeOptimizerPass
493phi-node-eliminationPHIEliminationPass
494post-RA-schedPostRASchedulerPass
495postmischedPostMachineSchedulerPass
496post-ra-pseudosExpandPostRAPseudosPass
497printPrintMIRPass
498--510print<livedebugvars> ... print<virtregmap>(13 MF printers)
511reg-usage-collectorRegUsageInfoCollectorPass
512reg-usage-propagationRegUsageInfoPropagationPass
513register-coalescerRegisterCoalescerPass
514rename-independent-subregsRenameIndependentSubregsPass
515remove-redundant-debug-valuesRemoveRedundantDebugValuesPass
516require-all-machine-function-propertiesRequireAllMachineFunctionPropertiesPass
517stack-coloringStackColoringPass
518stack-slot-coloringStackSlotColoringPass
519tailduplicationTailDuplicatePass
520trigger-verifier-errorTriggerVerifierErrorPass
521two-address-instructionTwoAddressInstructionPass
522verifyMachineVerifierPass
523verify<machine-trace-metrics>MachineTraceMetricsVerifierPass
524machine-sinkMachineSinkingPass (parameterized)
525regallocfastRegAllocFastPass (parameterized)
526greedyRAGreedyPass (parameterized, LAST registered)

No NVIDIA-specific machine function passes were identified in the registration table; NVIDIA's machine-level customizations are implemented through target hooks in the NVPTX backend rather than as separately registered passes.

Runtime Pass Execution Order

Registration order (above) describes what is known to the pipeline parser. Runtime execution order is determined by sub_12E54A0 (the pipeline assembler) and controlled by the tier system. The execution order varies dramatically depending on: (1) optimization level, (2) fast-compile mode, (3) language string, and (4) individual pass enable/disable flags in NVVMPassOptions.

The AddPass Mechanism -- sub_12DE0B0

All runtime pass insertion uses sub_12DE0B0 (0x12DE0B0), a hash-table-based function that:

  1. Hashes the pass pointer: (pass >> 9) ^ (pass >> 4)
  2. Probes an open-addressed hash table at passMgr+80
  3. Stores the pass pointer and a flags byte (flags | 2 if barrier set)
  4. Appends the pass pointer to a dynamic array at passMgr[0]
  5. Increments the counter at passMgr+8

The third parameter encodes pass type: 0 = ModulePass/AnalysisPass, 1 = FunctionPass. The fourth parameter is a scheduling barrier hint.
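The five steps above can be modeled in Python as follows. Only the hash expression, the barrier bit, the ordered array, and the counter come from the text. The bucket count, linear probing as the collision strategy, using the pass-type parameter as the base flags value, and appending on every call are assumptions made for the sketch.

```python
class PassManagerModel:
    """Toy model of the state touched by sub_12DE0B0."""

    def __init__(self, buckets=64):
        # Bucket count must be a power of two for the mask below; size is assumed.
        self.buckets = [None] * buckets  # models the table at passMgr+80
        self.order = []                  # models the dynamic array at passMgr[0]
        self.count = 0                   # models the counter at passMgr+8

    def add_pass(self, pass_ptr, kind, barrier):
        """Insert a pass pointer; return the bucket index that was used."""
        mask = len(self.buckets) - 1
        slot = ((pass_ptr >> 9) ^ (pass_ptr >> 4)) & mask
        for _ in range(len(self.buckets)):
            entry = self.buckets[slot]
            if entry is None or entry[0] == pass_ptr:
                break
            slot = (slot + 1) & mask  # assumed linear probing
        else:
            raise RuntimeError("table full (the sketch never grows)")
        flags = kind | (2 if barrier else 0)
        self.buckets[slot] = (pass_ptr, flags)
        self.order.append(pass_ptr)
        self.count += 1
        return slot
```

In this model the hash table only tracks which pointers are present and their flags; execution order is carried entirely by the appended array.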

Tier System Architecture

The tier system is NVIDIA's mechanism for interleaving custom passes with standard LLVM passes at precise points. The main optimization loop in sub_12E54A0 iterates over a plugin/extension pass array at opts[4488..4496] (16-byte stride: vtable + phase_id), and fires tier sub-pipelines when the accumulated phase counter exceeds their thresholds:

// Pseudocode from sub_12E54A0, lines 481-553
for (entry = opts[4488]; entry < opts[4496]; entry += 16) {
    phase_id = entry[8];

    if (opts[4224] && phase_id > opts[4228]) {   // Tier 0
        sub_12DE330(PM, opts);                    // Full optimization
        opts[4224] = 0;                           // Fire once
    }
    if (opts[3528] && phase_id > opts[3532]) {    // Tier 1
        sub_12DE8F0(PM, 1, opts);
        opts[3528] = 0;
    }
    if (opts[3568] && phase_id > opts[3572]) {    // Tier 2
        sub_12DE8F0(PM, 2, opts);
        opts[3568] = 0;
    }
    if (opts[3608] && phase_id > opts[3612]) {    // Tier 3
        sub_12DE8F0(PM, 3, opts);
        opts[3608] = 0;
    }

    pass = entry->vtable[72]();                   // Plugin pass factory call
    sub_12DE0B0(PM, pass, 1, 0);                  // Insert plugin pass

    if (opts[3904])                               // Debug mode
        insert_verifier_after_each();
}
// Remaining unfired tiers fire unconditionally after loop

The tier control fields in the NVVMPassOptions struct:

| Offset | Type | Field |
| --- | --- | --- |
| +3528 | bool | Tier 1 enable |
| +3532 | int | Tier 1 phase threshold |
| +3568 | bool | Tier 2 enable |
| +3572 | int | Tier 2 phase threshold |
| +3608 | bool | Tier 3 enable |
| +3612 | int | Tier 3 phase threshold |
| +4224 | bool | Tier 0 (full optimization) enable |
| +4228 | int | Tier 0 phase threshold |
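To make the fire-once behavior concrete, here is a small Python simulation of the loop. It takes the plugin entries as (name, phase_id) pairs and the enabled tiers as a tier-to-threshold mapping, and returns the resulting insertion order. The names are hypothetical, and the order in which leftover tiers fire after the loop (0, 1, 2, 3) is an assumption; the pseudocode only states that they fire unconditionally.

```python
def schedule(entries, tiers):
    """Return insertion order for plugin entries and tier sub-pipelines.

    entries: list of (plugin_name, phase_id)
    tiers:   dict mapping enabled tier number -> phase threshold
    """
    pending = dict(tiers)
    order = []
    for name, phase_id in entries:
        for tier in (0, 1, 2, 3):
            # A tier fires once, the first time a phase id exceeds its threshold.
            if tier in pending and phase_id > pending[tier]:
                order.append(f"tier{tier}")
                del pending[tier]
        order.append(name)
    # Tiers that never fired are emitted after the loop (order assumed).
    for tier in (0, 1, 2, 3):
        if tier in pending:
            order.append(f"tier{tier}")
    return order
```

With thresholds `{0: 0, 2: 3}` and plugins at phases 1 and 5, Tier 0 is inserted before the first plugin and Tier 2 before the second.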

Infrastructure Setup (Always Runs)

These five passes are always inserted first, regardless of optimization level:

| Pos | Factory | Identity | AddPass Flags |
| --- | --- | --- | --- |
| 1 | sub_149CCE0 (alloc 368B) | TargetLibraryInfoWrapperPass | (PM, TLI, 0, 0) Module |
| 2 | sub_1BFB520 (alloc 208B) | TargetTransformInfoWrapperPass | (PM, TTI, 1, 0) Function |
| 3 | sub_14A7550 | VerifierPass / BasicAliasAnalysis | (PM, _, 0, 0) Module |
| 4 | sub_1361950 | AssumptionCacheTracker | (PM, _, 0, 0) Module |
| 5 | sub_1CB0F50 | ProfileSummaryInfoWrapperPass | (PM, _, 1, 0) Function |

Tier 0 -- Full Optimization (sub_12DE330)

Called when opts[4224] (optimization enabled) is set and the phase threshold at opts[4228] is exceeded. This is the primary optimization sub-pipeline for O1/O2/O3, adding ~40 passes. Address: 0x12DE330.

Confidence note: Pass identifications are based on diagnostic strings, factory-function signatures, and pipeline ordering. Most identifications are HIGH confidence (confirmed by unique string literals). Entries marked [MEDIUM confidence] are inferred from code structure, argument patterns, or address proximity rather than direct string evidence.

| Pos | Factory Address | Likely Pass | Guard Condition |
| --- | --- | --- | --- |
| 1 | sub_1654860(1) | BreakCriticalEdges | always |
| 2 | sub_1A62BF0(1,0,0,1,0,0,1) | LLVM standard pipeline #1 | always |
| 3 | sub_1B26330 | MemCpyOpt | always |
| 4 | sub_185D600 | IPConstantPropagation | always |
| 5 | sub_1C6E800 | GVN | always |
| 6 | sub_1C6E560 | NewGVN/GVNHoist [MEDIUM confidence] | always |
| 7 | sub_1857160 | NVVMReflect | always |
| 8 | sub_1842BC0 | SCCP | always |
| 9 | sub_17060B0(1,0) | PrintModulePass | opts[3160] |
| 10 | sub_12D4560 | NVVMVerifier | always |
| 11 | sub_18A3090 | NVVMPredicateOpt | always |
| 12 | sub_184CD60 | ConstantMerge | always |
| 13 | sub_1869C50(1,0,1) | Sink/MemSSA [MEDIUM confidence] -- three-arg factory matches Sink with MemSSA parameters, but could also be a custom sinking variant | !opts[1040] |
| 14 | sub_1833EB0(3) | TailCallElim/JumpThreading [MEDIUM confidence] -- integer arg=3 could be JumpThreading threshold or TailCallElim mode; no disambiguating string | always |
| 15 | sub_17060B0(1,0) | PrintModulePass | opts[3160] |
| 16 | sub_1952F90(-1) | LoopIndexSplit | always |
| 17 | sub_1A62BF0(1,...) | LLVM standard pipeline #1 | always |
| 18 | sub_1A223D0 | NVVMIRVerification | always |
| 19 | sub_17060B0(1,0) | PrintModulePass | opts[3160] |
| 20 | sub_1A7A9F0 | InstructionSimplify | always |
| 21 | sub_1A62BF0(1,...) | LLVM standard pipeline #1 | always |
| 22 | sub_1A02540 | GenericToNVVM | always |
| 23 | sub_198DF00(-1) | LoopSimplify | always |
| 24 | sub_1C76260 | ADCE | !opts[1320] |
| 25 | sub_195E880(0) | LICM | opts[2880] |
| 26 | sub_19C1680(0,1) | LoopUnroll | !opts[1360] |
| 27 | sub_17060B0(1,0) | PrintModulePass | opts[3160] |
| 28 | sub_19401A0 | InstCombine | always |
| 29 | sub_1968390 | SROA | always |
| 30 | sub_196A2B0 | EarlyCSE | always |
| 31 | sub_19B73C0(2,-1,-1,-1,-1,-1,-1) | LoopUnswitch | always |
| 32 | sub_17060B0(1,0) | PrintModulePass | opts[3160] |
| 33 | sub_190BB10(0,0) | SimplifyCFG | always |
| 34 | sub_1A13320 | NVVMRematerialization | always |
| 35 | sub_17060B0(1,1) | PrintModulePass (verified) | opts[3160] |
| 36 | sub_18F5480 | DSE | always |
| 37 | sub_18DEFF0 | DCE | always |
| 38 | sub_1A62BF0(1,...) | LLVM standard pipeline #1 | always |
| 39 | sub_18B1DE0 | NVVMLoopPass/BarrierOpt [MEDIUM confidence] -- address is in NVVM pass range, but dual name reflects ambiguity between loop optimization and barrier optimization roles | always |
| 40 | sub_1841180 | FunctionAttrs | always |
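
The guard column can be read as a predicate over NVVMPassOptions byte offsets. The following is a minimal Python sketch of that reading, not the decompiled logic: it models the options as a dictionary from offset to flag value, and the names `TIER0_SAMPLE`, `flag_set`, `flag_clear`, and `select_passes` are hypothetical. Only a handful of rows from the table are reproduced.

```python
from typing import Callable, Dict, List, Tuple

Opts = Dict[int, int]  # hypothetical model: byte offset -> flag value


def always(_: Opts) -> bool:
    """Guard for rows marked 'always'."""
    return True


def flag_set(offset: int) -> Callable[[Opts], bool]:
    """Guard of the form opts[offset]."""
    return lambda opts: bool(opts.get(offset, 0))


def flag_clear(offset: int) -> Callable[[Opts], bool]:
    """Guard of the form !opts[offset]."""
    return lambda opts: not opts.get(offset, 0)


# (position, likely pass, guard) for a few rows of the Tier 0 table.
TIER0_SAMPLE: List[Tuple[int, str, Callable[[Opts], bool]]] = [
    (1, "BreakCriticalEdges", always),
    (9, "PrintModulePass", flag_set(3160)),
    (13, "Sink/MemSSA", flag_clear(1040)),
    (24, "ADCE", flag_clear(1320)),
    (25, "LICM", flag_set(2880)),
    (26, "LoopUnroll", flag_clear(1360)),
]


def select_passes(opts: Opts) -> List[str]:
    """Return the sampled passes whose guard holds, in table order."""
    return [name for _, name, guard in TIER0_SAMPLE if guard(opts)]
```

With every flag cleared, only the unguarded and `!opts[...]` rows survive; setting opts[3160] and opts[2880] adds the debug dump and LICM rows.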

Tier 1/2/3 -- Phase-Specific Sub-pipeline (sub_12DE8F0)

Called with the tier number (1, 2, or 3). Address: 0x12DE8F0. The function stores the tier value to qword_4FBB410. When tier==3, it sets qword_4FBB370 = 6 if BYTE4 was 0, which enables the advanced barrier and memory space optimization features.

The pass sequence is significantly longer than Tier 0 and varies by tier. The following shows the superset of all passes that can be inserted; tier-based guards are annotated.

Confidence note: Same methodology as the Tier 0 table above. Most identifications are confirmed by diagnostic strings or NVVMPassOptions slot cross-references.

| Pos | Factory Address | Likely Pass | Guard |
| --- | --- | --- | --- |
| 1 | sub_1CB4E40(1) | NVVMIntrinsicLowering | !opts[2000] |
| 2 | sub_1A223D0 | NVVMIRVerification | !opts[2600] |
| 3 | sub_1CB4E40(1) | NVVMIntrinsicLowering (barrier) | !opts[2000] |
| 4 | sub_18E4A00 | NVVMBarrierAnalysis | opts[3488] |
| 5 | sub_1C98160(0) | NVVMLowerBarriers | opts[3488] |
| 6 | sub_17060B0(1,0) | PrintModulePass | opts[3160] && !opts[1080] |
| 7 | sub_12D4560 | NVVMVerifier | !opts[600] |
| 8 | sub_185D600 | IPConstPropagation | opts[3200] && !opts[920] |
| 9 | sub_1857160 | NVVMReflect | opts[3200] && !opts[880] |
| 10 | sub_18A3430 | NVVMPredicateOpt | opts[3200] && !opts[1120] |
| 11 | sub_1842BC0 | SCCP | opts[3200] && !opts[720] |
| 12 | sub_17060B0(1,0) | PrintModulePass | !opts[1080] |
| 13 | sub_12D4560 | NVVMVerifier | !opts[600] |
| 14 | sub_18A3090 | NVVMPredicateOpt variant | opts[3200] && !opts[2160] |
| 15 | sub_184CD60 | ConstantMerge | opts[3200] && !opts[1960] |
| 16 | sub_190BB10(1,0) | SimplifyCFG | tier!=1 && !opts[1040] && !opts[1200] |
| 17 | sub_1952F90(-1) | LoopIndexSplit | (same guard) && !opts[1160] |
| 18 | sub_12D4560 | NVVMVerifier | (same guard) && !opts[600] |
| 19 | sub_17060B0(1,0) | PrintModulePass | (same guard) && !opts[1080] |
| 20 | sub_195E880(0) | LICM | opts[3704] && opts[2880] && !opts[1240] |
| 21 | sub_1C8A4D0(v) | EarlyCSE | v=1 if opts[3704] |
| 22 | sub_1869C50(1,0,1) | Sink | tier!=1 && !opts[1040] |
| 23 | sub_1833EB0(3) | TailCallElim | tier==3 && !opts[320] |
| 24 | sub_1CC3990 | NVVMUnreachableBlockElim | !opts[2360] |
| 25 | sub_18EEA90 | CorrelatedValuePropagation | opts[3040] |
| 26 | sub_12D4560 | NVVMVerifier | !opts[600] |
| 27 | sub_1A223D0 | NVVMIRVerification | !opts[2600] |
| 28 | sub_1CB4E40(1) | NVVMIntrinsicLowering | !opts[2000] |
| 29 | sub_1C4B6F0 | Inliner | !opts[440] && !opts[480] |
| 30 | sub_17060B0(1,0) | PrintModulePass | opts[3160] && !opts[1080] |
| 31 | sub_1A7A9F0 | InstructionSimplify | !opts[2720] |
| 32 | sub_12D4560 | NVVMVerifier | !opts[600] |
| 33 | sub_1A02540 | GenericToNVVM | !opts[2200] |
| 34 | sub_198DF00(-1) | LoopSimplify | !opts[1520] |
| 35 | sub_1C76260 | ADCE | !opts[1320] && !opts[1480] |
| 36 | sub_17060B0(1,0) | PrintModulePass | (same guard) |
| 37 | sub_12D4560 | NVVMVerifier | (same guard) |
| 38 | sub_195E880(0) | LICM | opts[2880] && !opts[1240] |
| 39 | sub_1C98160(0/1) | NVVMLowerBarriers | opts[3488] |
| 40 | sub_19C1680(0,1) | LoopUnroll | !opts[1360] |
| 41 | sub_17060B0(1,0) | PrintModulePass | !opts[1080] |
| 42 | sub_19401A0 | InstCombine | !opts[1000] |
| 43 | sub_196A2B0 | EarlyCSE | !opts[1440] |
| 44 | sub_1968390 | SROA | !opts[1400] |
| 45 | sub_19B73C0(tier,...) | LoopUnswitch | tier!=1, SM-arch-dependent params |
| 46 | sub_17060B0(1,0) | PrintModulePass | opts[3160] && !opts[1080] |
| 47 | sub_19B73C0(tier,...) | LoopUnswitch (2nd) | !opts[2760] |
| 48 | sub_1A62BF0(1,...) | LLVM standard pipeline | !opts[600] |
| 49 | sub_1A223D0 | NVVMIRVerification | !opts[2600] |
| 50 | sub_1CB4E40(1) | NVVMIntrinsicLowering | !opts[2000] |
| 51 | sub_17060B0(1,0) | PrintModulePass | !opts[1080] |
| 52 | sub_190BB10(0,0) | SimplifyCFG | !opts[960] |
| 53 | sub_1922F90 | NVIDIA loop pass | opts[3080] |
| 54 | sub_195E880(0) | LICM | opts[2880] && !opts[1240] |
| 55 | sub_1A13320 | NVVMRematerialization | !opts[2320] |
| 56 | sub_1968390 | SROA | !opts[1400] |
| 57 | sub_17060B0(1,0) | PrintModulePass | opts[3160] && !opts[1080] |
| 58 | sub_18EEA90 | CorrelatedValuePropagation | opts[3040] |
| 59 | sub_18F5480 | DSE | !opts[760] |
| 60 | sub_18DEFF0 | DCE | !opts[280] |
| 61 | sub_1A62BF0(1,...) | LLVM standard pipeline | !opts[600] |
| 62 | sub_1AAC510 | NVIDIA-specific pass | !opts[520] && !opts[560] |
| 63 | sub_1A223D0 | NVVMIRVerification | !opts[2600] |
| 64 | sub_1CB4E40(1) | NVVMIntrinsicLowering | !opts[2000] |
| 65 | sub_1C8E680 | MemorySpaceOpt | !opts[2680], param from opts[3120] |
| 66 | sub_1A223D0 | NVVMIRVerification | opts[3120] && !opts[2600] |
| 67 | sub_17060B0(1,0) | PrintModulePass (barrier) | !opts[1080] |
| 68 | sub_1CC71E0 | NVVMGenericAddrOpt | !opts[2560] |
| 69 | sub_1C98270(1,opts[2920]) | NVVMLowerBarriers variant | opts[3488] |
| 70 | sub_17060B0(1,0) | PrintModulePass | opts[3160] && !opts[1080] |
| 71 | sub_1C6FCA0 | ADCE | opts[2840] && !opts[1840] |
| 72 | sub_18B1DE0 | LoopOpt/BarrierOpt | opts[3200] && !opts[2640] |
| 73 | sub_1857160 | NVVMReflect | opts[3200] && tier==3 && !opts[880] |
| 74 | sub_1841180 | FunctionAttrs | opts[3200] && !opts[680] |
| 75 | sub_1C46000 | NVVMLateOpt | tier==3 && !opts[360] |
| 76 | sub_1841180 | FunctionAttrs (2nd) | opts[3200] && !opts[680] |
| 77 | sub_1CBC480 | NVVMLowerAlloca | !opts[2240] && !opts[2280] |
| 78 | sub_1CB73C0 | NVVMBranchDist | !opts[2080] && !opts[2120] |
| 79 | sub_1C7F370(1) | NVVMWarpShuffle | opts[3328] && !opts[1640] |
| 80 | sub_1CC5E00 | NVVMReduction | opts[3328] && !opts[2400] |
| 81 | sub_1CC60B0 | NVVMSinking2 | opts[3328] && !opts[2440] |
| 82 | sub_1CB73C0 | NVVMBranchDist (2nd) | opts[3328] && !opts[2080] && !opts[2120] |
| 83 | sub_17060B0(1,0) | PrintModulePass | opts[3328] && !opts[1080] |
| 84 | sub_1B7FDF0(3) | Reassociate | opts[3328] && !opts[1280] |
| 85 | sub_17060B0(1,0) | PrintModulePass (final) | opts[3160] && !opts[1080] |
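
Several guards in this table combine the tier number with option flags, and rows 17 to 19 inherit the guard of row 16. The sketch below illustrates one plausible reading for rows 16 to 19 and row 23 only. It assumes that "(same guard)" refers to the row 16 condition; the function name `tier_sample_passes` and the dictionary model of the options are hypothetical.

```python
from typing import Dict, List


def tier_sample_passes(tier: int, opts: Dict[int, int]) -> List[str]:
    """Evaluate the guards of rows 16-19 and 23 for a given tier."""

    def flag(offset: int) -> bool:
        return bool(opts.get(offset, 0))

    passes: List[str] = []

    # Row 16: tier!=1 && !opts[1040] && !opts[1200]
    base = tier != 1 and not flag(1040) and not flag(1200)
    if base:
        passes.append("SimplifyCFG")
        # Rows 17-19: "(same guard)" plus one extra disable flag each.
        if not flag(1160):
            passes.append("LoopIndexSplit")
        if not flag(600):
            passes.append("NVVMVerifier")
        if not flag(1080):
            passes.append("PrintModulePass")

    # Row 23: tier==3 && !opts[320]
    if tier == 3 and not flag(320):
        passes.append("TailCallElim")

    return passes
```

Under this reading, tier 1 contributes none of the sampled rows, tier 2 contributes rows 16 to 19, and tier 3 additionally contributes row 23.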

Optimization Level Summary

| Pipeline | Sub-pipeline called | lsa-opt | mem-space-opt | Approx. passes |
| --- | --- | --- | --- | --- |
| nvopt<O0> | (minimal, sub_1C8A4D0(0) only) | off | off | ~5--8 |
| nvopt<Ofcmax> | Sinking2 + common tail only | forced 0 | forced 0 | ~12--15 |
| nvopt<Ofcmid> | mid-level pipeline | normal | enabled | ~25--30 |
| nvopt<Ofcmin> | close to full pipeline | normal | enabled | ~30--35 |
| nvopt<O1> | sub_12DE330 (Tier 0) | normal | enabled | ~35 |
| nvopt<O2> | sub_12DE330 + Tier 1/2 | normal | enabled | ~35+ |
| nvopt<O3> | sub_12DE330 + Tier 1/2/3 | normal | enabled | ~35+ |

O1/O2/O3 all route through the same sub_12DE330 (Tier 0). The difference manifests through the tiered pass inserter sub_12DE8F0: O1 only fires Tier 1, O2 fires Tiers 1--2, O3 fires all three tiers. Within the tiers, passes additionally vary by: loop unroll factor (parameter to sub_1833EB0), vectorizer width (parameters to sub_19B73C0), CGSCC iteration count (first parameter to sub_1A62BF0), and the SM-architecture-dependent late passes gated by opts[3328].

Ofcmax critical behavior: when fast-compile level == 2 (max), the libnvvm pipeline builder forces -lsa-opt=0 and -memory-space-opt=0 even if the user explicitly enables them. This is confirmed in both sub_9624D0 (line 1358) and sub_12CC750 (line 2025).
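
The Ofcmax override can be pictured as a final step that runs after user flags are collected. This is a hedged Python sketch of that behavior as described above, not a transcription of sub_9624D0 or sub_12CC750; the function name `effective_flags` and the dictionary representation of the flags are hypothetical.

```python
from typing import Dict

FAST_COMPILE_MAX = 2  # "fast-compile level == 2 (max)" per the text


def effective_flags(fast_compile_level: int, user_flags: Dict[str, int]) -> Dict[str, int]:
    """Return the flags after the Ofcmax override has been applied."""
    flags = dict(user_flags)  # leave the caller's dictionary untouched
    if fast_compile_level == FAST_COMPILE_MAX:
        # Forced to 0 even if the user explicitly enabled them.
        flags["lsa-opt"] = 0
        flags["memory-space-opt"] = 0
    return flags
```

At any other fast-compile level the user-supplied values pass through unchanged in this model.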

Codegen Dispatch -- sub_12DFE00

After all optimization tiers complete, sub_12DFE00 (0x12DFE00) performs codegen pass scheduling. This is NOT a simple pass adder -- it performs a full dependency graph construction:

  1. Reads optimization level from opts[200] (0 = minimal, >1 = enable dependency tracking)
  2. Iterates all passes already in the pass manager
  3. For each pass, calls vtable+112 (isCodeGenOnly()) to filter
  4. Calls vtable+16 (getAnalysisUsage()) to extract dependencies
  5. Builds a secondary hash table of ordering constraints
  6. Dispatches each pass to the codegen subsystem in topological order via the subtarget hook at vtable+16
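
The steps above amount to a filter followed by a dependency-ordered walk. The sketch below models that idea with Python's standard `graphlib` module in place of the secondary hash table. It makes several assumptions: the filter keeps passes that report codegen-only, dependencies on passes outside the filtered set are ignored, and the opts[200] check from step 1 is omitted. `PassModel`, `dispatch_order`, and the pass names in the usage note are hypothetical.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import List


@dataclass
class PassModel:
    name: str
    codegen_only: bool  # stands in for the vtable+112 query
    requires: List[str] = field(default_factory=list)  # stands in for the vtable+16 query


def dispatch_order(passes: List[PassModel]) -> List[str]:
    """Return codegen-only pass names so that each follows its dependencies."""
    selected = {p.name: p for p in passes if p.codegen_only}

    sorter: TopologicalSorter = TopologicalSorter()
    for p in selected.values():
        # Keep only dependencies that are themselves scheduled here.
        deps = [d for d in p.requires if d in selected]
        sorter.add(p.name, *deps)

    # static_order() yields every node after all of its predecessors.
    return list(sorter.static_order())
```

For example, given hypothetical passes `legalize`, `isel` (requires `legalize`), and `regalloc` (requires `isel`), the returned order places `legalize` before `isel` and `isel` before `regalloc`, while any pass that is not codegen-only is dropped.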

Pass Classification Statistics

| Category | Count |
| --- | --- |
| Module analyses | 18 |
| Module passes | ~131 |
| CGSCC analyses | 3 |
| CGSCC passes | ~10 |
| Function analyses | ~39 |
| Function AA analyses | 5 |
| Function passes | ~219 |
| LoopNest passes | 4 |
| Loop analyses | 5 |
| Loop passes | ~26 |
| MachineFunction analyses | 20 |
| MachineFunction passes | ~50 |
| Total | ~526 |
| NVIDIA additions | 33 |
| Standard LLVM | ~493 |

Complete Pass Factory Address Map

Every unique pass factory address observed in sub_12E54A0, sub_12DE330, and sub_12DE8F0:

| Function | Address | Size | Role |
| --- | --- | --- | --- |
| NVVMVerifier | sub_12D4560 | many (tiers) | many (tiers) |
| AssumptionCacheTracker | sub_1361950 | 1 | 1 |
| TargetLibraryInfoWrapperPass | sub_149CCE0 | 1 | 1 |
| VerifierPass/BasicAA | sub_14A7550 | 1 | 1 |
| BreakCriticalEdges | sub_1654860 | 2 | 2 |
| PrintModulePass (debug dump) | sub_17060B0 | ~30+ | ~30+ |
| InstructionCombining | sub_1832270 | 2 | 2 |
| TailCallElim/JumpThreading | sub_1833EB0 | 3 | 3 |
| FunctionAttrs | sub_1841180 | 3 | 3 |
| SCCP | sub_1842BC0 | 2 | 2 |
| NVVMReflect | sub_1857160 | ~8 | ~8 |
| IPConstantPropagation | sub_185D600 | 3 | 3 |
| Sink (MemorySSA-based) | sub_1869C50 | 3 | 3 |
| NVVMPredicateOpt | sub_18A3090 | 2 | 2 |
| AggressiveInstCombine | sub_18A3430 | 2 | 2 |
| NVVMLoopOpt/BarrierOpt | sub_18B1DE0 | 3 | 3 |
| Sinking2Pass (fast-mode) | sub_18B3080 | 1 | 1 |
| DCE | sub_18DEFF0 | 4 | 4 |
| NVVMBarrierAnalysis | sub_18E4A00 | 1 | 1 |
| CorrelatedValuePropagation | sub_18EEA90 | 3 | 3 |
| DSE | sub_18F5480 | 2 | 2 |
| DeadArgElimination | sub_18FD350 | 5 | 5 |
| SimplifyCFG | sub_190BB10 | 4 | 4 |
| NVIDIA loop pass | sub_1922F90 | 1 | 1 |
| LoopIndexSplit | sub_1952F90 | 3 | 3 |
| LICM | sub_195E880 | 4 | 4 |
| SROA | sub_1968390 | 2 | 2 |
| EarlyCSE | sub_196A2B0 | 2 | 2 |
| LoopUnroll/Vectorize | sub_197E720 | 1 | 1 |
| LoopSimplify/IndVarSimplify | sub_198DF00 | 3 | 3 |
| CorrelatedValuePropagation | sub_198E2A0 | 1 | 1 |
| InstCombine | sub_19401A0 | 2 | 2 |
| LoopUnswitch | sub_19B73C0 | 3 | 3 |
| LoopUnroll | sub_19C1680 | 2 | 2 |
| NVIDIA pass (unknown) | sub_19CE990 | 1 | 1 |
| GenericToNVVM | sub_1A02540 | 1 | 1 |
| NVVMRematerialization | sub_1A13320 | 3 | 3 |
| NVVMIRVerification | sub_1A223D0 | 5+ | 5+ |
| LLVM StandardPassPipeline | sub_1A62BF0 | ~9 | ~9 |
| LoopIdiomRecognize | sub_1A68E70 | 1 | 1 |
| InstructionSimplify | sub_1A7A9F0 | 3 | 3 |
| NVIDIA-specific pass | sub_1AAC510 | 1 | 1 |
| MemCpyOpt | sub_1B26330 | 4 | 4 |
| Reassociate/Sinking | sub_1B7FDF0 | 3 | 3 |
| TTIWrapperPass | sub_1BFB520 | 1 | 1 |
| NVVMLateOpt | sub_1C46000 | 1 | 1 |
| Inliner/AlwaysInline | sub_1C4B6F0 | 2 | 2 |
| NewGVN/GVNHoist | sub_1C6E560 | 1 | 1 |
| GVN | sub_1C6E800 | 2 | 2 |
| ADCE (AggressiveDCE) | sub_1C6FCA0 | 2 | 2 |
| ADCE variant | sub_1C76260 | 2 | 2 |
| NVVMWarpShuffle | sub_1C7F370 | 1 | 1 |
| EarlyCSE/GVN variant | sub_1C8A4D0 | 3 | 3 |
| MemorySpaceOpt | sub_1C8E680 | 4 | 4 |
| NVVMLowerBarriers | sub_1C98160 | 4 | 4 |
| NVVMLowerBarriers variant | sub_1C98270 | 1 | 1 |
| ProfileSummaryInfo | sub_1CB0F50 | 1 | 1 |
| NVVMIntrinsicLowering | sub_1CB4E40 | ~10 | ~10 |
| NVVMBranchDist | sub_1CB73C0 | 3 | 3 |
| NVVMLowerAlloca | sub_1CBC480 | 1 | 1 |
| NVVMUnreachableBlockElim | sub_1CC3990 | 1 | 1 |
| NVVMReduction | sub_1CC5E00 | 1 | 1 |
| NVVMSinking2 | sub_1CC60B0 | 3 | 3 |
| NVVMGenericAddrOpt | sub_1CC71E0 | 1 | 1 |
| NVVMFinalLowering | sub_1CEBD10 | 1 | 1 |
| NVVMPeephole | sub_1CEF8F0 | 2 | 2 |
| NVVMAnnotationsProcessor | sub_215D9D0 | 2 | 2 |

Total unique pass factories: ~65.

NVVMPassOptions Offset-to-Pass Guard Map

The NVVMPassOptions struct (4,512 bytes, 221 slots) controls which passes execute. The pipeline assembler reads boolean flags at specific offsets to gate pass insertion. See NVVMPassOptions for the full slot layout. Key offset-to-pass mappings:

| Offset | Slot | Type | Controls |
| --- | --- | --- | --- |
| +200 | 9 | int | Optimization level (0/1/2/3) |
| +280 | 15 | bool | DCE disable |
| +320 | 17 | bool | TailCallElim/JumpThreading disable |
| +360 | 19 | bool (default=1) | NVVMLateOpt disable |
| +600 | 31 | bool | NVVMVerifier disable |
| +720 | 37 | bool | SCCP disable |
| +760 | 39 | bool | DSE disable |
| +880 | 45 | bool | NVVMReflect disable |
| +920 | 47 | bool | IPConstantPropagation disable |
| +960 | 49 | bool | SimplifyCFG disable |
| +1000 | 51 | bool | InstCombine disable |
| +1040 | 53 | bool | Sink/MemSSA disable |
| +1080 | 55 | bool | PrintModulePass disable |
| +1160 | 59 | bool | LoopIndexSplit disable |
| +1240 | 63 | bool | LICM disable |
| +1280 | 65 | bool | Reassociate disable |
| +1320 | 67 | bool | ADCE disable |
| +1360 | 69 | bool | LoopUnroll disable |
| +1400 | 71 | bool | SROA disable |
| +1440 | 73 | bool | EarlyCSE disable |
| +1760 | 89 | bool | MemorySpaceOpt disable |
| +2000 | 101 | bool | NVVMIntrinsicLowering disable |
| +2320 | 117 | bool (default=1) | NVVMRematerialization disable |
| +2440 | 123 | bool | NVVMSinking2 disable |
| +2600 | 131 | bool | NVVMIRVerification disable |
| +2840 | 141 | bool (default=1) | ADCE enable (reversed logic) |
| +2880 | 143 | bool (default=1) | LICM enable (reversed logic) |
| +3120 | 155 | bool (default=1) | MemorySpaceOpt (2nd pass) enable |
| +3160 | 157 | bool (default=1) | PrintModulePass/debug dump enable |
| +3200 | 159 | bool (default=1) | Advanced NVIDIA passes group enable |
| +3328 | 165 | bool (default=1) | SM-specific late passes enable |
| +3488 | 175 | bool | Barrier optimization enable |
| +3648 | 181 | ptr | Language string ("ptx"/"mid"/"idn") |
| +3656 |  | int | Language string length |
| +3704 | 185 | bool | Late optimization / address-space flag |
| +4064 | 201 | bool | Concurrent compilation enable |
| +4104 | 203 | int (default=-1) | Thread count |
| +4224 | 211 | bool (default=1) | Master optimization enable |
| +4304 | 213 | bool | Device-code / separate-compilation flag |
| +4384 | 217 | bool | Fast-compile bypass (skip LLVM pipeline) |
| +4464 | 219 | bool (default=1) | Late CFG cleanup guard |
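
To make the "disable" versus "enable (reversed logic)" distinction concrete, the sketch below reads flags out of a flat 4,512-byte buffer at the offsets listed above. It is an illustration only: the one-byte boolean encoding and the 32-bit little-endian integer at +200 are assumptions, and the helper names (`make_options`, `flag`, `opt_level`, `licm_allowed`, `dce_allowed`) are hypothetical.

```python
import struct

STRUCT_SIZE = 4512  # size stated for NVVMPassOptions


def make_options() -> bytearray:
    """Zeroed buffer with a few of the default=1 flags from the table set."""
    buf = bytearray(STRUCT_SIZE)
    for offset in (2880, 3160, 4224):  # LICM enable, debug dump, master enable
        buf[offset] = 1
    return buf


def flag(buf: bytearray, offset: int) -> bool:
    # Assumption: each bool flag occupies a single byte at its offset.
    return buf[offset] != 0


def opt_level(buf: bytearray) -> int:
    # Assumption: 32-bit little-endian integer at +200.
    return struct.unpack_from("<i", buf, 200)[0]


def licm_allowed(buf: bytearray) -> bool:
    # Needs the enable flag (+2880) set and the disable flag (+1240) clear,
    # matching the guard opts[2880] && !opts[1240].
    return flag(buf, 2880) and not flag(buf, 1240)


def dce_allowed(buf: bytearray) -> bool:
    # Plain disable flag: the pass runs unless +280 is set (guard !opts[280]).
    return not flag(buf, 280)
```

In this model LICM runs with the defaults and stops running as soon as either +2880 is cleared or +1240 is set, whereas DCE depends on a single disable flag.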

Infrastructure Functions

| Address | Function | Role |
| --- | --- | --- |
| 0x2342890 | sub_2342890 | Master pass registration (~2,816 lines) |
| 0xE41FB0 | sub_E41FB0 | StringMap::insert (48-byte entries, open-addressing) |
| 0xE41C70 | sub_E41C70 | StringMap::grow (hash table resize) |
| 0xC94890 | sub_C94890 | String hash function (DJB/FNV-family) |
| 0x9691B0 | sub_9691B0 | String equality (len + memcmp) |
| 0xC931B0 | sub_C931B0 | StringRef::find_first_of (delimiter search) |
| 0x95CB50 | sub_95CB50 | StringRef::consume_front (strip llvm:: prefix) |
| 0x233C410 | sub_233C410 | Help listing (--print-pipeline-passes) |
| 0x233BD40 | sub_233BD40 | AA name resolver (chain of comparisons) |
| 0x233C0C0 | sub_233C0C0 | AA pipeline parser |
| 0x233C300 | sub_233C300 | Extension callback dispatch |
| 0x233A120 | sub_233A120 | Generic parameterized option parser |
| 0x12E54A0 | sub_12E54A0 | Master pipeline assembler (49.8KB) |
| 0x12DE0B0 | sub_12DE0B0 | AddPass (hash-table-based insertion) |
| 0x12DE330 | sub_12DE330 | Tier 0 full optimization sub-pipeline |
| 0x12DE8F0 | sub_12DE8F0 | Tier 1/2/3 phase-specific sub-pipeline |
| 0x12DFE00 | sub_12DFE00 | Codegen dispatch (dependency-ordered) |
| 0x226C400 | sub_226C400 | Pipeline name selector (nvopt<O#>) |
| 0x2277440 | sub_2277440 | Pipeline text parser entry |
| 0x225D540 | sub_225D540 | New PM nvopt registration |
| 0x12C35D0 | sub_12C35D0 | Legacy PM pipeline orchestrator |
| 0x2342820 | sub_2342820 | LastRunTrackingAnalysis factory |
| 0x2342830 | sub_2342830 | PassInstrumentationAnalysis factory |
| 0x2342840 | sub_2342840 | VerifierAnalysis factory |
| 0x2342850 | sub_2342850 | InlinerWrapper factory (shared by 4 inliner variants) |
| 0x2342860 | sub_2342860 | InvalidateAllAnalysesPass factory |
| 0x2342870 | sub_2342870 | VerifierPass factory |
| 0x2342880 | sub_2342880 | GuardWideningPass factory |
| 0x2339850 | sub_2339850 | PassBuilder destructor |
| 0x233B610 | sub_233B610 | PassBuilder::~PassBuilder cleanup |
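
The insert, grow, hash, and equality helpers in this table fit the usual shape of an open-addressing string table with linear probing. The Python sketch below illustrates that general technique and is not a reconstruction of sub_E41FB0: the DJB2 hash is a stand-in for the unidentified "DJB/FNV-family" function, the 3/4 load-factor threshold and doubling policy are assumptions, and the class and method names are hypothetical.

```python
from typing import List, Optional, Tuple


def djb2(key: bytes) -> int:
    """Stand-in hash (assumption); the real function is only known by family."""
    h = 5381
    for b in key:
        h = (h * 33 + b) & 0xFFFFFFFF
    return h


def keys_equal(a: bytes, b: bytes) -> bool:
    # Length check first, then a byte comparison (the len + memcmp pattern).
    return len(a) == len(b) and a == b


class PassTable:
    def __init__(self, capacity: int = 8) -> None:
        self.slots: List[Optional[Tuple[bytes, str]]] = [None] * capacity
        self.count = 0

    def _probe(self, key: bytes) -> int:
        """Linear probing: step forward until the key or an empty slot is found."""
        i = djb2(key) % len(self.slots)
        while self.slots[i] is not None and not keys_equal(self.slots[i][0], key):
            i = (i + 1) % len(self.slots)
        return i

    def _grow(self) -> None:
        """Double the table and reinsert every live entry."""
        live = [e for e in self.slots if e is not None]
        self.slots = [None] * (len(self.slots) * 2)
        for k, v in live:
            self.slots[self._probe(k)] = (k, v)

    def insert(self, key: bytes, value: str) -> None:
        # Grow before exceeding a 3/4 load factor so probing always terminates.
        if (self.count + 1) * 4 > len(self.slots) * 3:
            self._grow()
        i = self._probe(key)
        if self.slots[i] is None:
            self.count += 1
        self.slots[i] = (key, value)

    def lookup(self, key: bytes) -> Optional[str]:
        entry = self.slots[self._probe(key)]
        return entry[1] if entry is not None else None
```

Inserting a pass name such as `b"instcombine"` and then looking it up returns the stored value, and inserting the same name again overwrites the entry without changing the count.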

Cross-References