Optimization Levels

cicc v13.0 supports four standard optimization levels (O0 through O3) and three fast-compile tiers (Ofcmin, Ofcmid, Ofcmax). These are mutually exclusive with the custom --passes= interface. The pipeline name is selected in the new-PM driver sub_226C400 and assembled by sub_12E54A0. The full optimization pipeline builder is sub_12DE330, with tier-specific insertion handled by sub_12DE8F0.

Pipeline Name Selection

The new-PM driver at sub_226C400 selects a pipeline name string based on boolean flags in the config struct:

| Config Offset | Flag | Pipeline Name |
|---|---|---|
| byte[888] | O0 | nvopt<O0> |
| byte[928] | O1 | nvopt<O1> |
| byte[968] | O2 | nvopt<O2> |
| byte[1008] | O3 | nvopt<O3> |
| qw[131..132] | fc="max" | nvopt<Ofcmax> |
| qw[131..132] | fc="mid" | nvopt<Ofcmid> |
| qw[131..132] | fc="min" | nvopt<Ofcmin> |

Selection logic in sub_226C400 (lines 828--874):

if (O1_flag)       -> "nvopt<O1>"
else if (O2_flag)  -> "nvopt<O2>"
else if (O3_flag)  -> "nvopt<O3>"
else if (fc_len == 3) {
  if (fc == "max") -> "nvopt<Ofcmax>"
  if (fc == "mid") -> "nvopt<Ofcmid>"
  if (fc == "min") -> "nvopt<Ofcmin>"
}
else               -> "nvopt<O0>"
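
The selection order above can be sketched as a small Python model. This is an illustration, not the decompiled code: `select_pipeline_name` and its parameters are hypothetical stand-ins for the config-struct flags, and returning `None` for a three-character fc string that matches none of the known values is an assumption, since the source does not describe that case.

```python
from typing import Optional


def select_pipeline_name(o1: bool, o2: bool, o3: bool, fc: str = "") -> Optional[str]:
    """Hypothetical model of the selection order described for sub_226C400."""
    # O-level flags are tested first, in the order O1, O2, O3.
    if o1:
        return "nvopt<O1>"
    if o2:
        return "nvopt<O2>"
    if o3:
        return "nvopt<O3>"
    # The fast-compile string is only inspected when its length is 3.
    if len(fc) == 3:
        if fc in ("max", "mid", "min"):
            return "nvopt<Ofc" + fc + ">"
        return None  # assumption: behavior for an unknown 3-char string is not documented
    # No O-level flag and no fast-compile string: fall back to O0.
    return "nvopt<O0>"
```

In this model an O-level flag always wins over a fast-compile string, which mirrors the `if`/`else if` chain shown above.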

Combining -O# with --passes= is an error:

"Cannot specify -O#/-Ofast-compile=<min,mid,max> and --passes=/--foo-pass, use -passes='default<O#>,other-pass'"

The pipeline name is passed to sub_2277440 (new-PM text parser), which constructs the actual PassManager. The nvopt prefix is registered as a pipeline element in sub_225D540 (new PM) and sub_12C35D0 (legacy PM), with vtables at 0x4A08350 / 0x49E6A58.

Fast-Compile Level Encoding

The fast-compile level is stored as an integer at offset 1640 (or 1648 in the clone) of the compilation context:

| Value | CLI Source | Behavior |
|---|---|---|
| 0 | (no flag, or -Ofast-compile=0 after the reset) | Normal O-level pipeline |
| 1 | -Ofast-compile=0 | Forwarded then reset to 0 |
| 2 | -Ofast-compile=max / -Ofc=max | Minimal pipeline, fastest compile |
| 3 | -Ofast-compile=mid / -Ofc=mid | Medium pipeline |
| 4 | -Ofast-compile=min / -Ofc=min | Close to full optimization |

Any other value produces: "libnvvm : error: -Ofast-compile called with unsupported level".

When level=1, the flag is forwarded to the optimizer phase as a pass argument and the level at offset 1640 is then reset to 0, so compilation proceeds with normal O-level optimization. For levels 2, 3, and 4, the corresponding optimizer argument string (-Ofast-compile=max, -Ofast-compile=mid, or -Ofast-compile=min) is appended.
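
The level handling can be summarized in a short Python sketch. `apply_fc_level` is a hypothetical helper; in particular, the exact string forwarded for level 1 is an assumption (the source only says the flag is forwarded), while the error text is the one quoted above.

```python
from typing import List, Tuple

# Optimizer argument appended for each fast-compile level (from the table above).
FC_ARGS = {
    2: "-Ofast-compile=max",
    3: "-Ofast-compile=mid",
    4: "-Ofast-compile=min",
}


def apply_fc_level(level: int) -> Tuple[int, List[str]]:
    """Return (level stored after processing, optimizer args appended)."""
    if level == 0:
        return 0, []
    if level == 1:
        # Forwarded once, then reset so the normal O-level pipeline runs.
        # Assumption: the forwarded text is the original CLI flag.
        return 0, ["-Ofast-compile=0"]
    if level in FC_ARGS:
        return level, [FC_ARGS[level]]
    raise ValueError("libnvvm : error: -Ofast-compile called with unsupported level")
```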

Tier Summary

| Pipeline | Approx Passes | LSA-Opt | MemSpaceOpt | Compile Speed |
|---|---|---|---|---|
| nvopt<O0> | 5--8 | off | off | Fastest (no opt) |
| nvopt<Ofcmax> | 12--15 | forced 0 | forced 0 | Fast |
| nvopt<Ofcmid> | 25--30 | normal | enabled | Medium |
| nvopt<Ofcmin> | 30--35 | normal | enabled | Slower |
| nvopt<O1> | ~40 + tier-1 | normal | enabled | Normal |
| nvopt<O2> | ~40 + tier-1/2 | normal | enabled | Normal |
| nvopt<O3> | ~40 + tier-1/2/3 | normal | enabled | Slowest |

Pipeline Architecture: Tier 0 + Tiers 1/2/3

O1/O2/O3 share a common pipeline construction path. The key insight is that optimization happens in layers:

  1. Tier 0 (sub_12DE330): The full base pipeline of ~40 passes. Fires for ALL of O1, O2, and O3 when opts[4224] (optimization-enabled) is set.
  2. Tier 1 (sub_12DE8F0(PM, 1, opts)): Additional passes gated by opts[3528]. Fires for O1, O2, and O3.
  3. Tier 2 (sub_12DE8F0(PM, 2, opts)): Additional passes gated by opts[3568]. Fires for O2 and O3 only.
  4. Tier 3 (sub_12DE8F0(PM, 3, opts)): Additional passes gated by opts[3608]. Fires for O3 only.

The tier control fields in the 4512-byte NVVMPassOptions struct:

| Offset | Type | Meaning |
|---|---|---|
| 3528 | bool | Tier 1 enable (O1+) |
| 3532 | int | Tier 1 phase threshold |
| 3568 | bool | Tier 2 enable (O2+) |
| 3572 | int | Tier 2 phase threshold |
| 3608 | bool | Tier 3 enable (O3+) |
| 3612 | int | Tier 3 phase threshold |
| 4224 | bool | Tier 0 enable (any O-level) |
| 4228 | int | Tier 0 phase threshold |

The assembler loop in sub_12E54A0 (lines 481--553) iterates over the plugin/external pass list at opts[4488]. Each entry has a phase_id; when the phase_id exceeds a tier's threshold, that tier fires:

for each entry in opts[4488..4496]:
  phase_id = entry[8..12]
  if (opts[4224] && phase_id > opts[4228]):
    sub_12DE330(PM, opts)   // Tier 0
    opts[4224] = 0          // one-shot
  if (opts[3528] && phase_id > opts[3532]):
    sub_12DE8F0(PM, 1, opts) // Tier 1
    opts[3528] = 0
  if (opts[3568] && phase_id > opts[3572]):
    sub_12DE8F0(PM, 2, opts) // Tier 2
    opts[3568] = 0
  if (opts[3608] && phase_id > opts[3612]):
    sub_12DE8F0(PM, 3, opts) // Tier 3
    opts[3608] = 0
  AddPass(PM, entry->createPass())

After the loop, any tier whose enable flag is still set (that is, a tier that has not yet fired) fires unconditionally.
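
To make the one-shot behavior concrete, here is a small Python simulation of the loop. It is a sketch under stated assumptions: `assemble`, the `opts` dictionary layout, the threshold values, and the plugin names are all hypothetical, and the tier builders are replaced by string markers.

```python
from typing import Dict, List, Tuple


def assemble(opts: Dict[int, list], entries: List[Tuple[int, str]]) -> List[str]:
    """Simulate tier firing.

    opts maps tier number -> [enabled, threshold]; enable flags are cleared
    in place when a tier fires (one-shot). entries is a list of
    (phase_id, pass_name) pairs standing in for the plugin pass list.
    """
    emitted: List[str] = []
    for phase_id, name in entries:
        for tier in (0, 1, 2, 3):
            enabled, threshold = opts[tier]
            if enabled and phase_id > threshold:
                emitted.append("tier%d" % tier)  # stands in for sub_12DE330 / sub_12DE8F0
                opts[tier][0] = False            # one-shot
        emitted.append(name)                     # the plugin pass itself
    # Tiers that are still enabled after the loop fire unconditionally.
    for tier in (0, 1, 2, 3):
        if opts[tier][0]:
            emitted.append("tier%d" % tier)
            opts[tier][0] = False
    return emitted


# O2-like configuration with made-up thresholds: tiers 0-2 enabled, tier 3 off.
o2_opts = {0: [True, 0], 1: [True, 10], 2: [True, 20], 3: [False, 0]}
order = assemble(o2_opts, [(5, "pluginA"), (15, "pluginB")])
```

With these hypothetical thresholds, tier 0 is inserted before `pluginA`, tier 1 before `pluginB`, and tier 2 is flushed after the loop because no entry exceeded its threshold.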

Tier 0: Full Base Pipeline (sub_12DE330)

sub_12DE330 at 0x12DE330 is called for all O1/O2/O3 compilations. It constructs the ~40-pass base pipeline:

| # | Factory | Pass | Guard | Notes |
|---|---|---|---|---|
| 1 | sub_1654860(1) | VerifierPass | always | |
| 2 | sub_1A62BF0(1,0,0,1,0,0,1) | CGSCC/Inliner | always | Pipeline EP 1, 1 iteration |
| 3 | sub_1B26330() | NVVMReflect | always | |
| 4 | sub_185D600() | SROA | always | |
| 5 | sub_1C6E800() | NVVMLowerArgs | always | |
| 6 | sub_1C6E560() | NVVMLowerAlloca | always | |
| 7 | sub_1857160() | SimplifyCFG | always | |
| 8 | sub_1842BC0() | InstCombine | always | |
| 9 | sub_17060B0(1,0) | GVN | opts[3160] | Debug-dump enabled |
| 10 | sub_12D4560() | NVVMVerify | always | |
| 11 | sub_18A3090() | LoopRotate | always | |
| 12 | sub_184CD60() | LICM | always | |
| 13 | sub_1869C50(1,0,1) | IndVarSimplify | !opts[1040] | |
| 14 | sub_1833EB0(3) | LoopUnroll | always | Factor = 3 |
| 15 | sub_17060B0(1,0) | GVN | always | |
| 16 | sub_1952F90(-1) | LoopIndexSplit/SCCP | always | Threshold = -1 (unlimited) |
| 17 | sub_1A62BF0(1,0,0,1,0,0,1) | CGSCC/Inliner | always | |
| 18 | sub_1A223D0() | DSE | always | |
| 19 | sub_17060B0(1,0) | GVN | always | |
| 20 | sub_1A7A9F0() | MemCpyOpt | always | |
| 21 | sub_1A62BF0(1,0,0,1,0,0,1) | CGSCC/Inliner | always | |
| 22 | sub_1A02540() | ADCE | always | |
| 23 | sub_198DF00(-1) | JumpThreading/CVP | always | Threshold = -1 |
| 24 | sub_1C76260() | NVVMDivergenceLowering | !opts[1320] | |
| 25 | sub_195E880(0) | Reassociate | opts[2880] | Default on (slot 143) |
| 26 | sub_19C1680(0,1) | SpeculativeExecution | !opts[1360] | |
| 27 | sub_17060B0(1,0) | GVN | opts[3160] | Debug-dump enabled |
| 28 | sub_19401A0() | SCCP | always | |
| 29 | sub_1968390() | GlobalDCE/ConstantProp | always | |
| 30 | sub_196A2B0() | GlobalOpt | always | |
| 31 | sub_19B73C0(2,-1,-1,-1,-1,-1,-1) | LoopVectorize/SLP | always | Width=2, thresholds=-1 |
| 32 | sub_17060B0(1,0) | GVN | always | |
| 33 | sub_190BB10(0,0) | EarlyCSE | always | |
| 34 | sub_1A13320() | TailCallElim | always | |
| 35 | sub_17060B0(1,1) | GVN (verified) | opts[3160] | Verify mode |
| 36 | sub_18F5480() | NewGVN | always | |
| 37 | sub_18DEFF0() | Sink | always | |
| 38 | sub_1A62BF0(1,0,0,1,0,0,1) | CGSCC/Inliner | always | |
| 39 | sub_18B1DE0() | Sinking2 | always | NVIDIA custom |
| 40 | sub_1841180() | LoopSimplify/LCSSA | always | |

After sub_12DE330 returns, opts[4224] is cleared (one-shot).

Tiers 1/2/3: Phase-Specific Sub-Pipeline (sub_12DE8F0)

sub_12DE8F0 at 0x12DE8F0 is a single function called with tier in {1, 2, 3}. The tier value is stored into qword_4FBB410 (phase tracker). When tier==3 and qword_4FBB370 byte4 is 0, the feature flags are set to 6 (enabling advanced barrier opt + memory space opt gates).

The following table lists every pass in sub_12DE8F0 with its tier-dependent guard condition. A pass runs only when ALL conditions in its Guard column are satisfied.

| # | Factory | Pass | Guard | O1 | O2 | O3 |
|---|---|---|---|---|---|---|
| 1 | sub_1CB4E40(1) | NVVMIntrinsicLowering | !opts[2000] | Y | Y | Y |
| 2 | sub_1A223D0() | NVVMIRVerification | !opts[2600] | Y | Y | Y |
| 3 | sub_1CB4E40(1) | NVVMIntrinsicLowering | !opts[2000] | Y | Y | Y |
| 4 | sub_18E4A00() | NVVMBarrierAnalysis | opts[3488] | Y | Y | Y |
| 5 | sub_1C98160(0) | NVVMLowerBarriers | opts[3488] | Y | Y | Y |
| 6 | sub_17060B0(1,0) | PrintModulePass | opts[3160] && !opts[1080] | Y | Y | Y |
| 7 | sub_12D4560() | NVVMVerifier | !opts[600] | Y | Y | Y |
| 8 | sub_185D600() | IPConstPropagation | opts[3200] && !opts[920] | Y | Y | Y |
| 9 | sub_1857160() | NVVMReflect | opts[3200] && !opts[880] | Y | Y | Y |
| 10 | sub_18A3430() | NVVMPredicateOpt | opts[3200] && !opts[1120] | Y | Y | Y |
| 11 | sub_1842BC0() | SCCP | opts[3200] && !opts[720] | Y | Y | Y |
| 12 | sub_17060B0(1,0) | PrintModulePass | !opts[1080] | Y | Y | Y |
| 13 | sub_12D4560() | NVVMVerifier | !opts[600] | Y | Y | Y |
| 14 | sub_18A3090() | NVVMPredicateOpt variant | opts[3200] && !opts[2160] | Y | Y | Y |
| 15 | sub_184CD60() | ConstantMerge | opts[3200] && !opts[1960] | Y | Y | Y |
| 16 | sub_190BB10(1,0) | SimplifyCFG | tier!=1 && !opts[1040] && !opts[1200] | - | Y | Y |
| 17 | sub_1952F90(-1) | LoopIndexSplit | (same as #16) && !opts[1160] | - | Y | Y |
| 18 | sub_12D4560() | NVVMVerifier | (same as #16) && !opts[600] | - | Y | Y |
| 19 | sub_17060B0(1,0) | PrintModulePass | (same as #16) && !opts[1080] | - | Y | Y |
| 20 | sub_195E880(0) | LICM | opts[3704] && opts[2880] && !opts[1240] | Y | Y | Y |
| 21 | sub_1C8A4D0(v12) | EarlyCSE | always; v12=1 if opts[3704] | Y | Y | Y |
| 22 | sub_1869C50(1,0,1) | Sink | tier!=1 && !opts[1040] | - | Y | Y |
| 23 | sub_1833EB0(3) | TailCallElim | tier==3 && !opts[320] | - | - | Y |
| 24 | sub_1CC3990() | NVVMUnreachableBlockElim | !opts[2360] | Y | Y | Y |
| 25 | sub_18EEA90() | CorrelatedValuePropagation | opts[3040] | Y | Y | Y |
| 26 | sub_12D4560() | NVVMVerifier | !opts[600] | Y | Y | Y |
| 27 | sub_1A223D0() | NVVMIRVerification | !opts[2600] | Y | Y | Y |
| 28 | sub_1CB4E40(1) | NVVMIntrinsicLowering | !opts[2000] | Y | Y | Y |
| 29 | sub_1C4B6F0() | Inliner | !opts[440] && !opts[480] | Y | Y | Y |
| 30 | sub_17060B0(1,0) | PrintModulePass | opts[3160] && !opts[1080] | Y | Y | Y |
| 31 | sub_1A7A9F0() | InstructionSimplify | !opts[2720] | Y | Y | Y |
| 32 | sub_12D4560() | NVVMVerifier | !opts[600] | Y | Y | Y |
| 33 | sub_1A02540() | GenericToNVVM | !opts[2200] | Y | Y | Y |
| 34 | sub_198DF00(-1) | LoopSimplify | !opts[1520] | Y | Y | Y |
| 35 | sub_1C76260() | ADCE | !opts[1320] && !opts[1480] | Y | Y | Y |
| 36 | sub_17060B0(1,0) | PrintModulePass | (same as #35) && !opts[1080] | Y | Y | Y |
| 37 | sub_12D4560() | NVVMVerifier | (same as #35) && !opts[600] | Y | Y | Y |
| 38 | sub_195E880(0) | LICM | opts[2880] && !opts[1240] | Y | Y | Y |
| 39 | sub_1C98160(0/1) | NVVMLowerBarriers | opts[3488] | Y | Y | Y |
| 40 | sub_19C1680(0,1) | LoopUnroll | !opts[1360] | Y | Y | Y |
| 41 | sub_17060B0(1,0) | PrintModulePass | !opts[1080] | Y | Y | Y |
| 42 | sub_19401A0() | InstCombine | !opts[1000] | Y | Y | Y |
| 43 | sub_196A2B0() | EarlyCSE | !opts[1440] | Y | Y | Y |
| 44 | sub_1968390() | SROA | !opts[1400] | Y | Y | Y |
| 45 | sub_19B73C0(tier,...) | LoopVectorize/SLP (1st) | tier!=1; params vary by SM | - | Y | Y |
| 46 | sub_17060B0(1,0) | PrintModulePass | opts[3160] && !opts[1080] | Y | Y | Y |
| 47 | sub_19B73C0(tier,...) | LoopVectorize/SLP (2nd) | !opts[2760] | Y | Y | Y |
| 48 | sub_1A62BF0(1,...) | LLVM standard pipeline | !opts[600] | Y | Y | Y |
| 49 | sub_1A223D0() | NVVMIRVerification | !opts[2600] | Y | Y | Y |
| 50 | sub_1CB4E40(1) | NVVMIntrinsicLowering | !opts[2000] | Y | Y | Y |
| 51 | sub_17060B0(1,0) | PrintModulePass | !opts[1080] | Y | Y | Y |
| 52 | sub_190BB10(0,0) | SimplifyCFG | !opts[960] | Y | Y | Y |
| 53 | sub_1922F90() | NVIDIA loop pass | opts[3080] | Y | Y | Y |
| 54 | sub_195E880(0) | LICM | opts[2880] && !opts[1240] | Y | Y | Y |
| 55 | sub_1A13320() | NVVMRematerialization | !opts[2320] | Y | Y | Y |
| 56 | sub_1968390() | SROA | !opts[1400] | Y | Y | Y |
| 57 | sub_17060B0(1,0) | PrintModulePass | opts[3160] && !opts[1080] | Y | Y | Y |
| 58 | sub_18EEA90() | CorrelatedValuePropagation | opts[3040] | Y | Y | Y |
| 59 | sub_18F5480() | DSE | !opts[760] | Y | Y | Y |
| 60 | sub_18DEFF0() | DCE | !opts[280] | Y | Y | Y |
| 61 | sub_1A62BF0(1,...) | LLVM standard pipeline | !opts[600] | Y | Y | Y |
| 62 | sub_1AAC510() | NVIDIA-specific pass | !opts[520] && !opts[560] | Y | Y | Y |
| 63 | sub_1A223D0() | NVVMIRVerification | !opts[2600] | Y | Y | Y |
| 64 | sub_1CB4E40(1) | NVVMIntrinsicLowering | !opts[2000] | Y | Y | Y |
| 65 | sub_1C8E680() | MemorySpaceOpt | !opts[2680]; param from opts[3120] | Y | Y | Y |
| 66 | sub_1A223D0() | NVVMIRVerification | opts[3120] && !opts[2600] | Y | Y | Y |
| 67 | sub_17060B0(1,0) | PrintModulePass | !opts[1080] | Y | Y | Y |
| 68 | sub_1CC71E0() | NVVMGenericAddrOpt | !opts[2560] | Y | Y | Y |
| 69 | sub_1C98270(1,opts[2920]) | NVVMLowerBarriers variant | opts[3488] | Y | Y | Y |
| 70 | sub_17060B0(1,0) | PrintModulePass | opts[3160] && !opts[1080] | Y | Y | Y |
| 71 | sub_1C6FCA0() | ADCE | opts[2840] && !opts[1840] | Y | Y | Y |
| 72 | sub_18B1DE0() | LoopOpt/BarrierOpt | opts[3200] && !opts[2640] | Y | Y | Y |
| 73 | sub_1857160() | NVVMReflect (late) | opts[3200] && tier==3 && !opts[880] | - | - | Y |
| 74 | sub_1841180() | FunctionAttrs | opts[3200] && !opts[680] | Y | Y | Y |
| 75 | sub_1C46000() | NVVMLateOpt | tier==3 && !opts[360] | - | - | Y |
| 76 | sub_1841180() | FunctionAttrs (2nd) | opts[3200] && !opts[680] | Y | Y | Y |
| 77 | sub_1CBC480() | NVVMLowerAlloca | !opts[2240] && !opts[2280] | Y | Y | Y |
| 78 | sub_1CB73C0() | NVVMBranchDist | !opts[2080] && !opts[2120] | Y | Y | Y |
| 79 | sub_1C7F370(1) | NVVMWarpShuffle | opts[3328] && !opts[1640] | Y | Y | Y |
| 80 | sub_1CC5E00() | NVVMReduction | opts[3328] && !opts[2400] | Y | Y | Y |
| 81 | sub_1CC60B0() | NVVMSinking2 | opts[3328] && !opts[2440] | Y | Y | Y |
| 82 | sub_1CB73C0() | NVVMBranchDist (2nd) | opts[3328] && !opts[2080] && !opts[2120] | Y | Y | Y |
| 83 | sub_17060B0(1,0) | PrintModulePass | opts[3328] && !opts[1080] | Y | Y | Y |
| 84 | sub_1B7FDF0(3) | Reassociate | opts[3328] && !opts[1280] | Y | Y | Y |
| 85 | sub_17060B0(1,0) | PrintModulePass (final) | opts[3160] && !opts[1080] | Y | Y | Y |
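
As an illustration of how the Guard column combines the tier value with option bytes, the following Python sketch evaluates four guards transcribed from rows #8, #16, #23, and #75. It assumes each flag is a single byte at the listed offset of a 4512-byte buffer, which is a simplification; `GUARDS` and `enabled_rows` are hypothetical names.

```python
from typing import List

OPTS_SIZE = 4512  # struct size stated in the text

# Guards copied from the table; the key is the row number.
GUARDS = {
    8:  lambda o, tier: bool(o[3200]) and not o[920],                  # IPConstPropagation
    16: lambda o, tier: tier != 1 and not o[1040] and not o[1200],     # SimplifyCFG
    23: lambda o, tier: tier == 3 and not o[320],                      # TailCallElim
    75: lambda o, tier: tier == 3 and not o[360],                      # NVVMLateOpt
}


def enabled_rows(opts: bytearray, tier: int) -> List[int]:
    """Row numbers whose guard is satisfied for this tier and option state."""
    return sorted(n for n, guard in GUARDS.items() if guard(opts, tier))


opts = bytearray(OPTS_SIZE)   # all flags zero
opts[3200] = 1                # "advanced NVIDIA passes" group enabled
rows_tier1 = enabled_rows(opts, 1)
rows_tier3 = enabled_rows(opts, 3)
```

With only `opts[3200]` set, the tier-1 call admits row #8 alone, while the tier-3 call admits all four rows; setting `opts[360]` would remove row #75 again.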

O1 vs O2 vs O3: Complete Diff

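The subsections below cover the five mechanisms examined for O-level differences. The first four distinguish the levels; the fifth (CGSCC iteration count) turns out to be identical across O1/O2/O3 and is listed for completeness. Every pass that is NOT listed here runs identically at all three levels.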

1. Tier guard: tier!=1 (O2/O3 only)

These passes are present in sub_12DE8F0 but skip when tier==1 (O1):

| Pass | Factory | Effect of skipping at O1 |
|---|---|---|
| SimplifyCFG | sub_190BB10(1,0) | No inter-tier CFG cleanup |
| LoopIndexSplit | sub_1952F90(-1) | No inter-tier loop splitting |
| NVVMVerifier (post-split) | sub_12D4560() | No verification after split |
| Sink | sub_1869C50(1,0,1) | No inter-tier instruction sinking |
| LoopVectorize/SLP (1st call) | sub_19B73C0(tier,...) | No aggressive vectorization |

At O1, the base pipeline (Tier 0) already includes one instance of LoopVectorize with sub_19B73C0(2,-1,-1,-1,-1,-1,-1) -- width 2, all thresholds at -1 (unlimited). The tier!=1 guard blocks a SECOND, more aggressive vectorization pass with SM-dependent parameters.

2. Tier guard: tier==3 (O3 only)

These passes run exclusively at O3:

| Pass | Factory | Purpose |
|---|---|---|
| TailCallElim | sub_1833EB0(3) | Additional tail call optimization pass |
| NVVMReflect (late) | sub_1857160() | Second-round __nvvm_reflect resolution |
| NVVMLateOpt | sub_1C46000() | O3-exclusive NVIDIA custom late optimization |

sub_1C46000 (NVVMLateOpt) is the most significant O3-exclusive pass. It runs only when !opts[360] (not disabled) and only at tier==3. This is a dedicated NVIDIA optimization pass that performs additional transformations after the main pipeline is complete.

3. Feature flag qword_4FBB370 escalation

When tier==3 and qword_4FBB370 byte4 is 0, the function sets qword_4FBB370 = 6 (binary 110). This enables two feature gates:

  • Advanced barrier optimization (bit 1)
  • Memory space optimization extensions (bit 2)

These gates affect behavior in downstream passes that read qword_4FBB370, such as sub_12EC4F0 (the machine pass pipeline executor).
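
The value 6 decomposes into exactly the two bits listed above. A minimal Python sketch of that decoding follows; the constant and function names are hypothetical labels for the two gates, not identifiers recovered from the binary.

```python
from typing import Dict

ADVANCED_BARRIER_OPT = 1 << 1   # bit 1
MEMSPACE_OPT_EXT = 1 << 2       # bit 2


def decode_feature_flags(value: int) -> Dict[str, bool]:
    """Split a feature-flag value into the two gates described in the text."""
    return {
        "advanced_barrier_opt": bool(value & ADVANCED_BARRIER_OPT),
        "memspace_opt_ext": bool(value & MEMSPACE_OPT_EXT),
    }


o3_flags = decode_feature_flags(6)  # 6 == 0b110: both gates set
```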

4. LoopVectorize/SLP parameter differences

sub_19B73C0 is called with different parameters depending on context:

| Call site | Parameters | Tier |
|---|---|---|
| Tier 0 (sub_12DE330 #31) | (2, -1, -1, -1, -1, -1, -1) | All O1/O2/O3 |
| Tier 1/2/3, 1st call (#45) | (tier, ...) SM-dependent | O2/O3 only |
| Tier 1/2/3, 2nd call (#47) | (tier, ...) | All tiers |
| Ofcmid language path | (3, -1, -1, 0, 0, -1, 0) | Fast-compile |

The 7 parameters to sub_19B73C0 control:

  • arg1: Vector width factor (2 at Tier 0, tier at higher tiers)
  • arg2..arg7: Thresholds for cost model, trip count, and SLP width. Value -1 means unlimited/auto; value 0 means conservative/disabled.

At O2, sub_19B73C0(2, ...) provides moderate vectorization. At O3, sub_19B73C0(3, ...) increases the vector width factor, enabling wider SIMD exploration. The SM-architecture-dependent parameters are resolved at runtime based on the target GPU.

5. CGSCC iteration count

sub_1A62BF0 is the CGSCC (Call Graph SCC) pass manager factory. The first argument is the pipeline extension point / iteration count:

| Context | Call | Iterations |
|---|---|---|
| Tier 0 (all O-levels) | sub_1A62BF0(1,0,0,1,0,0,1) | 1 |
| Ofcmid path | sub_1A62BF0(5,0,0,1,0,0,1) | 5 |
| Language "mid" path | sub_1A62BF0(8,0,0,1,1,0,1) | 8, with extra opt flag |

O1/O2/O3 all use 1-iteration CGSCC in their shared Tier 0 pipeline. The iteration count differences appear in the fast-compile and language-specific paths, not between O-levels.

Complete O-Level Comparison Matrix

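| Feature | O0 | O1 | O2 | O3 |
|---|---|---|---|---|
| Tier 0 base pipeline (~40 passes) | - | Y | Y | Y |
| Tier 1 sub-pipeline | - | Y | Y | Y |
| Tier 2 sub-pipeline | - | - | Y | Y |
| Tier 3 sub-pipeline | - | - | - | Y |
| LoopVectorize (base, width=2) | - | Y | Y | Y |
| LoopVectorize (tier, SM-dependent) | - | - | Y | Y |
| SimplifyCFG (inter-tier) | - | - | Y | Y |
| LoopIndexSplit (inter-tier) | - | - | Y | Y |
| Sink (inter-tier) | - | - | Y | Y |
| TailCallElim (extra) | - | - | - | Y |
| NVVMReflect (late round) | - | - | - | Y |
| NVVMLateOpt (sub_1C46000) | - | - | - | Y |
| Feature flags escalation (6) | - | - | - | Y |
| NVVMDivergenceLowering | - | Y | Y | Y |
| SpeculativeExecution | - | Y | Y | Y |
| MemorySpaceOpt | - | Y | Y | Y |
| NVVMWarpShuffle | - | Y | Y | Y |
| NVVMReduction | - | Y | Y | Y |
| NVVMRematerialization | - | Y | Y | Y |
| NVVMBranchDist | - | Y | Y | Y |
| LSA optimization | off | on | on | on |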

O0 Pipeline (Minimal)

When no O-level flag is set and no fast-compile level is active, the assembler falls through to LABEL_159 which calls:

sub_1C8A4D0(0)   -- NVVMFinalCleanup or similar minimal pass

Then the common tail at LABEL_84 adds:

  1. MemorySpaceOpt (conditional, skipped at O0 since opts[3488] is typically unset)
  2. sub_1CEBD10() -- NVVMFinal / cleanup
  3. sub_1654860(1) -- VerifierPass
  4. sub_12DFE00() -- Codegen pass setup

The O0 pipeline does NOT call sub_12DE330 or sub_12DE8F0. It runs only the infrastructure passes (TargetLibraryInfo, TargetTransformInfo, BasicAA, AssumptionCacheTracker, ProfileSummaryInfo) plus minimal canonicalization.

Ofcmax Pipeline (Fastest Compile)

Ofcmax bypasses the full pipeline entirely. It forces two optimizer flags:

  • -lsa-opt=0 (disables LSA optimization)
  • -memory-space-opt=0 (disables MemorySpaceOpt pass)

This forcing happens in BOTH sub_9624D0 (lines 1358--1361) and sub_12CC750 (lines 2025--2079). The condition is:

if (!compare(lsa_opt_flag, "0") || fc_level == 2):
  append("-lsa-opt=0")
  append("-memory-space-opt=0")

Additionally, when fc_level == 2 AND lsa_opt is NOT already "0", the libnvvm path also injects -lsa-opt=0, mem2reg, -memory-space-opt=0.
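
The forcing condition can be restated as a small Python function. This sketch assumes `compare` in the pseudocode behaves like `strcmp` (returning 0 on equality), so `!compare(lsa_opt_flag, "0")` means the flag equals "0"; `forced_args` is a hypothetical name, and the extra libnvvm-path injection described in the previous paragraph is not modeled.

```python
from typing import List


def forced_args(lsa_opt_flag: str, fc_level: int) -> List[str]:
    """Optimizer args appended when LSA is already off or Ofcmax (level 2) is active."""
    if lsa_opt_flag == "0" or fc_level == 2:
        return ["-lsa-opt=0", "-memory-space-opt=0"]
    return []
```

Under this reading, Ofcmid (level 3) and Ofcmin (level 4) leave both options untouched unless LSA optimization was already disabled by the user.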

The minimal pass sequence:

| # | Factory | Pass |
|---|---|---|
| 1 | sub_18B3080(1) | Sinking2Pass (fast mode, flag=1) |
| 2 | sub_1857160() | SimplifyCFG |
| 3 | sub_19CE990() | LoopStrengthReduce (if applicable) |
| 4 | sub_1B26330() | NVVMReflect |
| 5 | sub_12D4560() | NVVMVerify |
| 6 | sub_184CD60() | LICM |
| 7 | sub_1C4B6F0() | LowerSwitch |
| 8 | sub_12D4560() | NVVMVerify |

Ofcmid Pipeline (Medium)

Ofcmid runs ~25--30 passes without forcing LSA or MemorySpaceOpt off. The pass sequence from sub_12E54A0 (lines 814--861):

| # | Factory | Pass | Guard |
|---|---|---|---|
| 1 | sub_184CD60() | LICM | !opts[1960] |
| 2 | sub_1CB4E40(0) | AnnotationCleanup | always |
| 3 | sub_1B26330() | NVVMReflect | always |
| 4 | sub_198E2A0() | CorrelatedValuePropagation | always |
| 5 | sub_1CEF8F0() | NVVMPeephole | always |
| 6 | sub_215D9D0() | NVVMPeephole2/TcgenAnnotation | always |
| 7 | sub_17060B0(1,0) | GVN | !opts[1080] |
| 8 | sub_198DF00(-1) | JumpThreading/CVP | always |
| 9 | sub_17060B0(1,0) | GVN | !opts[1080] |
| 10 | sub_1C6E800() | NVVMLowerArgs | always |
| 11 | sub_1832270(1) | LoopSimplify | always |
| 12 | sub_1A62BF0(5,0,0,1,0,0,1) | CGSCC (5 iterations) | always |
| 13 | sub_1CB4E40(0) | AnnotationCleanup | always |
| 14 | sub_18FD350(0) | DCE | always |
| 15 | sub_1841180() | LCSSA | always |
| 16 | sub_18DEFF0() | Sink | always |
| 17 | sub_17060B0(1,0) | GVN | always |
| 18 | sub_184CD60() | LICM | always |
| 19 | sub_195E880(0) | Reassociate | always |
| 20 | sub_190BB10(0,0) | EarlyCSE | always |
| 21 | sub_19B73C0(3,-1,-1,0,0,-1,0) | LoopVectorize (conservative) | always |
| 22 | sub_1A223D0() | DSE | always |
| 23 | sub_1C98160(0) | MemorySpaceOpt | always |
| 24 | sub_1C8E680(0) | MemorySpaceOpt2 | always |
| 25 | sub_1B7FDF0(3) | BranchFolding/CFGSimplify | always |
| 26 | sub_18B1DE0() | Sinking2 | always |

Key differences from the O1+ pipeline: Ofcmid uses 5-iteration CGSCC (vs 1 at O1+), includes NVVMPeephole/Peephole2 early, uses conservative LoopVectorize parameters (3,-1,-1,0,0,-1,0) with some thresholds zeroed, and skips NVVMDivergenceLowering, SpeculativeExecution, NVVMBranchDist, NVVMRematerialization, and the entire tier sub-pipeline.

Ofcmin Pipeline (Closest to Full Optimization)

Ofcmin takes the same path as Ofcmid through LABEL_297 in sub_12E54A0 but with the v238 flag set differently, enabling more aggressive settings. The pipeline is essentially the Ofcmid sequence with:

  • More aggressive loop optimizer thresholds
  • Additional CGSCC framework passes
  • Closer parameter alignment to the O2 full pipeline

Ofcmin does NOT force -lsa-opt=0 or -memory-space-opt=0. Like Ofcmid, it still skips the tier 1/2/3 sub-pipeline entirely, keeping compile time lower than O1.

Post-Optimization Common Tail

Regardless of pipeline tier, sub_12E54A0 always appends at LABEL_84 (lines 640--653):

| # | Factory | Pass | Guard |
|---|---|---|---|
| 1 | sub_1C98160(opts[2920]!=0) | MemorySpaceOpt | !v244 && opts[3488] |
| 2 | sub_1CEBD10() | NVVMFinal / cleanup | always |
| 3 | sub_1654860(1) | VerifierPass | !opts[2800] && !opts[4464] |
| 4 | sub_12DFE00(PM, v253, opts) | Codegen pass dispatch | always |

sub_12DFE00 (codegen dispatch) reads the optimization level from opts[200] to determine codegen aggressiveness. When opts[200] > 1, full dependency tracking is enabled across all codegen passes.

Always-Added Analysis Passes

Before any optimization, the pipeline assembler inserts (lines 396--420):

| # | Factory | Pass |
|---|---|---|
| 1 | sub_149CCE0 (368 bytes alloc) | TargetLibraryInfoWrapperPass |
| 2 | sub_1BFB520 (208 bytes alloc) | TargetTransformInfoWrapperPass |
| 3 | sub_14A7550() | VerifierPass / BasicAliasAnalysis |
| 4 | sub_1361950() | AssumptionCacheTracker |
| 5 | sub_1CB0F50() | ProfileSummaryInfoWrapperPass |

These five passes run at ALL optimization levels including O0.

NVVMPassOptions Offset-to-Guard Map

The table below maps each NVVMPassOptions boolean flag (offsets into the 4512-byte opts struct) to the passes it gates. Slot defaults come from sub_12D6300:

| Offset | Slot | Default | Controls | Used By |
|---|---|---|---|---|
| 280 | 15 | off | DCE disable | Tier 0 #37, Tier 1/2/3 #60 |
| 320 | 17 | off | TailCallElim disable | Tier 1/2/3 #23 (O3 only) |
| 360 | 19 | on | NVVMLateOpt disable | Tier 1/2/3 #75 (O3 only) |
| 440 | 23 | off | Inliner flag A disable | Tier 1/2/3 #29 |
| 480 | 25 | on | Inliner flag B disable | Tier 1/2/3 #29 |
| 600 | 31 | off | NVVMVerifier disable | Tier 1/2/3 #7,13,18,26,32,37 |
| 680 | 35 | off | FunctionAttrs disable | Tier 1/2/3 #74,76 |
| 720 | 37 | off | SCCP disable | Tier 1/2/3 #11 |
| 760 | 39 | off | DSE disable | Tier 1/2/3 #59 |
| 880 | 45 | off | NVVMReflect disable | Tier 1/2/3 #9,73 |
| 920 | 47 | off | IPConstPropagation disable | Tier 1/2/3 #8 |
| 960 | 49 | off | SimplifyCFG disable | Tier 1/2/3 #52 |
| 1000 | 51 | off | InstCombine disable | Tier 1/2/3 #42 |
| 1040 | 53 | off | Sink/SimplifyCFG disable | Tier 0 #13, Tier 1/2/3 #16,22 |
| 1080 | 55 | off | PrintModulePass disable | many |
| 1120 | 57 | off | NVVMPredicateOpt disable | Tier 1/2/3 #10 |
| 1160 | 59 | off | LoopIndexSplit disable | Tier 1/2/3 #17 |
| 1200 | 61 | off | SimplifyCFG tier guard | Tier 1/2/3 #16 |
| 1240 | 63 | off | LICM disable | Tier 1/2/3 #20,38,54 |
| 1280 | 65 | off | Reassociate disable | Tier 1/2/3 #84 |
| 1320 | 65 | off | NVVMDivergenceLow disable | Tier 0 #24, Tier 1/2/3 #35 |
| 1360 | 67 | off | LoopUnroll disable | Tier 0 #26, Tier 1/2/3 #40 |
| 1400 | 69 | off | SROA disable | Tier 1/2/3 #44,56 |
| 1440 | 71 | off | EarlyCSE disable | Tier 1/2/3 #43 |
| 1480 | 73 | off | ADCE extra guard | Tier 1/2/3 #35 |
| 1520 | 75 | off | LoopSimplify disable | Tier 1/2/3 #34 |
| 1640 | 81 | off | NVVMWarpShuffle disable | Tier 1/2/3 #79 |
| 1760 | 87 | off | MemorySpaceOpt disable | Common tail, language paths |
| 1840 | 91 | off | ADCE variant disable | Tier 1/2/3 #71 |
| 1960 | 97 | off | ConstantMerge disable | Tier 1/2/3 #15 |
| 2000 | 101 | off | NVVMIntrinsicLowering disable | Tier 1/2/3 #1,3,28,50,64 |
| 2080 | 103 | off | NVVMBranchDist disable A | Tier 1/2/3 #78,82 |
| 2120 | 105 | off | NVVMBranchDist disable B | Tier 1/2/3 #78,82 |
| 2200 | 109 | off | GenericToNVVM disable | Tier 1/2/3 #33 |
| 2240 | 111 | off | NVVMLowerAlloca A disable | Tier 1/2/3 #77 |
| 2280 | 113 | off | NVVMLowerAlloca B disable | Tier 1/2/3 #77 |
| 2320 | 115 | off | NVVMRematerialization disable | Tier 1/2/3 #55 |
| 2360 | 117 | on | NVVMUnreachableBlockElim disable | Tier 1/2/3 #24 |
| 2400 | 119 | off | NVVMReduction disable | Tier 1/2/3 #80 |
| 2440 | 121 | off | NVVMSinking2 disable | Tier 1/2/3 #81 |
| 2560 | 127 | off | NVVMGenericAddrOpt disable | Tier 1/2/3 #68 |
| 2600 | 129 | off | NVVMIRVerification disable | Tier 1/2/3 #2,27,49,63,66 |
| 2640 | 131 | off | LoopOpt/BarrierOpt disable | Tier 1/2/3 #72 |
| 2680 | 133 | off | MemorySpaceOpt (2nd) disable | Tier 1/2/3 #65 |
| 2720 | 135 | off | InstructionSimplify disable | Tier 1/2/3 #31 |
| 2760 | 137 | off | LoopVectorize 2nd disable | Tier 1/2/3 #47 |
| 2840 | 141 | on | ADCE enable (reversed) | Tier 1/2/3 #71 |
| 2880 | 143 | on | LICM enable (reversed) | Tier 0 #25, Tier 1/2/3 #20,38,54 |
| 2920 | 145 | off | LowerBarriers parameter | Common tail |
| 3000 | 151 | on | Early pass guard | Pre-opt phase |
| 3040 | 153 | off | CorrelatedValueProp enable | Tier 1/2/3 #25,58 |
| 3080 | 155 | on | NVIDIA loop pass enable | Tier 1/2/3 #53 |
| 3120 | 155 | on | MemorySpaceOpt(2nd) enable | Tier 1/2/3 #65,66 |
| 3160 | 157 | on | PrintModulePass enable | Tier 0 #9,27,35; Tier 1/2/3 many |
| 3200 | 159 | on | Advanced NVIDIA passes group | Tier 1/2/3 #8-11,14-15,72-76 |
| 3328 | 165 | on | SM-specific late passes block | Tier 1/2/3 #79-84 |
| 3488 | 173 | off | NVVMBarrierAnalysis enable | Tier 1/2/3 #4,5,39,69 |
| 3528 | 175 | off | Tier 1 enable | Pipeline assembler |
| 3568 | 177 | off | Tier 2 enable | Pipeline assembler |
| 3608 | 179 | off | Tier 3 enable | Pipeline assembler |
| 3648 | 181 | "" | Language/fc-level string ptr | Pipeline name selection |
| 3704 | 183 | off | Late optimization flag | Tier 1/2/3 #20,21; Pipeline B |
| 3904 | 192 | off | Debug/naming mode flag | BB naming loop |
| 4064 | 201 | off | Concurrent compilation flag | Thread count decision |
| 4104 | 203 | -1 | Thread count (integer) | sub_12E7E70 |
| 4224 | 209 | off | Tier 0 enable (opt active) | Pipeline assembler loop |
| 4304 | 213 | off | Device-code / additional opt | Pipeline B; fc dispatch |
| 4384 | 217 | off | Fast-compile bypass flag | Pipeline A vs B branch |
| 4464 | 221 | off | Late CFG cleanup guard | Common tail #3 |

Codegen Optimization Level Propagation

The -optO and -llcO flags propagate the optimization level to the backend code generator. In sub_12E54A0 (lines 1451--1460):

if (lsa_opt == "0" && some_flag == "1"):
  append("-optO<level>")
  append("-llcO2")

The codegen dispatch sub_12DFE00 reads opts[200] (the integer optimization level):

  • opts[200] == 0: Minimal codegen (no dependency tracking)
  • opts[200] >= 1: Standard codegen
  • opts[200] >= 2: Full dependency tracking enabled (v121 = true)
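
The three thresholds above reduce to two comparisons. The following Python sketch restates them; `codegen_mode` and the dictionary keys are hypothetical names, and the sketch covers only the behavior listed in the bullets.

```python
from typing import Dict


def codegen_mode(opt_level: int) -> Dict[str, bool]:
    """Map the integer optimization level (opts[200]) to the codegen behavior listed above."""
    return {
        "standard_codegen": opt_level >= 1,
        "full_dependency_tracking": opt_level >= 2,  # corresponds to v121 = true
    }
```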

Cross-References