Common Base Elimination
The Common Base Elimination pass hoists shared base address expressions to dominating points in the control flow graph, eliminating redundant recomputations of the same base pointer across multiple basic blocks. Where Base Address Strength Reduction targets intra-loop patterns driven by induction variables, Common Base Elimination operates at the inter-block level: it groups memory operations that share the same base pointer regardless of loop structure, finds their common dominator, and creates a single base computation at that dominator. Every original address is then rewritten as (hoisted_base + relative_offset).
This is a strictly GPU-motivated optimization. NVIDIA GPUs have limited integer ALU throughput relative to their floating-point pipelines, so any reduction in address arithmetic directly translates to freed execution slots for other work. On a typical CUDA kernel performing strided accesses across multiple branches (e.g., different cases of a switch over tile indices), the pass can eliminate dozens of redundant GEP chains that independently recompute the same base address.
The two-pass approach -- Common Base Elimination first at the IR level for inter-block redundancies, then Base Address Strength Reduction for intra-loop induction-variable patterns -- ensures comprehensive coverage of GPU address computation overhead.
Key Facts
| Property | Value |
|---|---|
| Pass name | "Common Base Elimination" |
| Entry point | sub_1C5DFC0 |
| Binary offset | 0x1C5DFC0 |
| Binary size | 38 KB (~850 decompiled lines) |
| Scope | Function-level |
| IR level | LLVM IR (pre-codegen) |
| Upstream equivalent | None -- entirely NVIDIA-proprietary |
| Complementary pass | Base Address Strength Reduction (sub_1C67780) |
| Primary knobs | scev-cgp-cross-block-limit -- limits common bases from a single block |
| Required analysis | Dominator tree (a1[23]), DataLayout |
Algorithm
The pass has four major phases: address decomposition, base pointer grouping, dominator-based hoisting, and address rewriting.
Phase 1 -- Address Expression Decomposition
For every memory operation (load, store, GEP-based address) in the function, the pass calls sub_1C53170 to decompose the address into a structured form:
struct AddressExpr {
Value *base_ptr; // The root pointer (alloca, global, argument)
Operand operands[]; // List of (index, constant_offset) pairs
unsigned operand_count; // Number of index terms
};
The result is stored as a (base_ptr, operand_list, operand_count) tuple. The decomposition strips away GEP chains to expose the underlying base pointer and accumulates constant offset terms separately from variable index terms. This is the same decomposition helper used by BASR (sub_1C67780), ensuring both passes reason about addresses in a compatible representation.
Phase 2 -- Base Pointer Grouping
The pass maintains two hash maps for grouping addresses:
Non-pointer-type bases (hash map at v382, keyed by base pointer value):
- Each memory operation whose decomposed base is not a pointer type (type_id != 15) is inserted via
sub_1C50900. - The hash entry accumulates a list of all instructions sharing that base.
- New bases are appended to worklist
v363.
Pointer-to-global bases (hash map at v378, keyed by underlying global variable):
- For pointer-type bases,
sub_1CCDC20extracts the underlying global variable by walking through bitcast and GEP chains. - This allows grouping addresses to the same global even when accessed through different local pointer variables.
- New globals are appended to worklist
v360.
The hash maps use the standard DenseMap growth policy (75% load factor, 12.5% tombstone compaction) with NVVM-layer sentinels (-8 / -16). sub_1C54050 handles both resize and in-place rehash. See Hash Table and Collection Infrastructure for the complete specification.
Phase 3 -- Dominator Walk and Base Hoisting
For each base pointer group containing two or more uses, the pass:
-
Finds the anchor. Among all constant offsets in the group, the operand with the minimum constant offset becomes the anchor. For offsets up to 64 bits, the constant is extracted directly from the GEP operand. For wider offsets (> 64 bits), the pass reads from extended-precision word arrays. Sign-extended comparisons determine the minimum.
-
Computes the common dominator. The pass reads the function's dominator tree from
a1[23]and walks it to find the nearest block that dominates all use sites. This is the standardfindNearestCommonDominatoroperation -- iteratively walk both paths toward the root until they meet. -
Inserts the hoisted base.
sub_13A5B00creates a new base address computation (a GEP or add instruction) at the terminator insertion point of the common dominator block. The hoisted instruction computesbase_ptr + min_offset, which is the anchor's address. -
Rewrites all uses. For each original memory operation in the group,
sub_14806B0rewrites the address as(hoisted_base + (original_offset - min_offset)). Since the anchor's own relative offset is zero, it becomes a direct use of the hoisted base.
In pseudocode:
fn run_common_base_elimination(F: &Function):
let dom_tree = F.dominator_tree // a1[23]
let data_layout = F.module.data_layout
// Phase 1+2: decompose and group
let base_groups: HashMap<Value*, Vec<(Instruction*, ConstantOffset)>> = {}
let global_groups: HashMap<GlobalVariable*, Vec<(Instruction*, ConstantOffset)>> = {}
for bb in F.basic_blocks():
for inst in bb.instructions():
if !is_memory_op(inst): continue
let (base, offsets, count) = sub_1C53170(inst)
if base.type_id != POINTER_TYPE:
base_groups[base].push((inst, offsets))
else:
let gv = sub_1CCDC20(base) // extract global
global_groups[gv].push((inst, offsets))
// Phase 3+4: hoist and rewrite
for (base, uses) in chain(base_groups, global_groups):
if uses.len() < 2: continue
let min_offset = uses.iter().map(|u| u.offset).min()
let anchor_inst = uses.find(|u| u.offset == min_offset).inst
// Find common dominator of all use blocks
let dom_block = uses[0].inst.parent
for use in uses[1..]:
dom_block = dom_tree.find_nearest_common_dominator(
dom_block, use.inst.parent)
// Hoist: create base+min_offset at dominator
let hoisted = sub_13A5B00(dom_block, base, min_offset)
// Rewrite all uses
for (inst, offset) in uses:
let relative = offset - min_offset
sub_14806B0(inst, hoisted, relative)
Phase 4 -- Pointer-to-Global Grouping
The global-variable grouping deserves special attention. Consider two local pointers p and q that both derive from the same global array g:
%p = getelementptr [1024 x float], ptr @g, i64 0, i64 %tid
%q = getelementptr [1024 x float], ptr @g, i64 0, i64 %tid2
Without the global extraction step, these would be in different groups (keyed by %p vs %q). The sub_1CCDC20 helper walks through the pointer chain to find the underlying @g, allowing the pass to recognize that both addresses target the same global and can share a hoisted base.
Cost-Benefit Analysis
The pass trades register pressure at the dominator for reduced address computation at use sites. This trade-off is particularly favorable on GPUs for two reasons:
Benefit -- Reduced integer ALU pressure. Each eliminated GEP chain frees integer ALU slots. On SM architectures, integer instructions compete for the same warp scheduler slots as floating-point instructions. A kernel with N memory operations sharing the same base saves up to (N-1) complete base address recomputations. For a kernel doing 8 loads from the same struct through different control-flow paths, this eliminates 7 redundant address computations.
Cost -- Extended live range at the dominator. The hoisted base must remain live from the dominator block down to every use site. On GPUs, each additional live register reduces occupancy (the number of concurrent warps per SM). The pass implicitly relies on the subsequent rematerialization pass (sub_1CE7DD0) to undo any hoisting decisions that prove too costly for register pressure -- if the hoisted value's live range crosses too many basic blocks, rematerialization will re-derive it closer to the use point.
The SCEV-CGP knob scev-cgp-cross-block-limit provides an explicit limit on how many common bases can be created from a single block, acting as a safety valve against excessive register pressure growth. The related scev-cgp-idom-level-limit constrains how far up the dominator tree the pass is willing to hoist.
Relationship with Base Address Strength Reduction
The two passes operate at different granularities and are intentionally complementary:
| Aspect | Common Base Elimination | Base Address Strength Reduction |
|---|---|---|
| Scope | Inter-block (dominator-based) | Intra-loop (induction-variable-based) |
| Target pattern | Multiple BBs accessing the same base | Loop body with base + stride * iv |
| Mechanism | Hoist to common dominator | Factor out common base, use incremented pointer |
| Key helper | sub_1C53170 (address decomposition) | sub_1C53170 (same decomposition) |
| Offset handling | Minimum-offset anchor | Minimum-offset anchor (same strategy) |
| Pipeline order | Runs first | Runs after CBE |
The shared address decomposition helper (sub_1C53170) and the shared rewriting infrastructure (sub_13A5B00 for creating new base computations, sub_14806B0 for rewriting addresses) confirm that these passes were designed as a coordinated pair. Common Base Elimination runs first to eliminate inter-block redundancies, leaving BASR to focus on the remaining intra-loop stride patterns. Without CBE running first, BASR would encounter more diverse base expressions in loop bodies, reducing its grouping effectiveness.
Both passes share the same 0x1C50000-0x1CCFFFF address range in the binary, and BASR's helper functions (e.g., sub_1C637F0 -- base address bitcast helper, strings "baseValue", "bitCastEnd") are directly adjacent to CBE's entry point.
Configuration
Direct Knobs
No CBE-specific enable/disable knob has been identified in the binary. The pass appears to be unconditionally enabled when the SCEV-CGP subsystem is active.
Related SCEV-CGP Knobs
| Knob | Type | Description |
|---|---|---|
scev-cgp-cross-block-limit | int | Maximum number of common bases that can be created from a single block. Limits the register pressure increase from hoisting. |
scev-cgp-idom-level-limit | int | Maximum dominator tree depth for hoisting. Prevents hoisting too far from use sites. |
do-scev-cgp | bool | Master enable for the SCEV-CGP subsystem. Disabling this may also disable CBE. |
do-base-address-strength-reduce | int | Two levels: 1 = basic, 2 = with conditions. Controls the companion BASR pass. |
do-base-address-strength-reduce-chain | bool | Enables chained strength reduction in BASR. |
base-address-strength-reduce-iv-limit | int | IV limit for BASR. |
base-address-strength-reduce-max-iv | int | Maximum IVs considered by BASR. |
BASR Aggressiveness Knob
The global dword_4FBCAE0 controls aggressiveness for negative-offset handling in the BASR companion pass. When dword_4FBCAE0 > 1, BASR also considers base groups where the maximum offset has a negative sign bit, checking via sub_1C51340 whether the base is loop-invariant before creating a separate common base via sub_1C55CE0. This knob does not directly affect CBE but influences how much address redundancy remains for CBE to handle.
Diagnostic Strings
"Common Base Elimination"
The pass registers a single diagnostic string (its name). No additional debug/dump strings have been identified. The pass does not appear to have a dedicated dump knob analogous to dump-base-address-strength-reduce for BASR.
Function Map
| Function | Address | Size | Role |
|---|---|---|---|
CommonBaseElimination::run | sub_1C5DFC0 | 38 KB | Main entry point -- orchestrates all four phases |
decomposeAddress | sub_1C53170 | -- | Decomposes a memory address into (base, offset_list, count) tuple. Shared with BASR. |
hashMapGrowOrRehash | sub_1C54050 | -- | Hash map resize/rehash with load-factor policy |
hashMapInsertOrLookup | sub_1C50900 | -- | Insert into or look up in the base-pointer hash map |
extractGlobalFromPointerChain | sub_1CCDC20 | -- | Walks bitcast/GEP chains to find the underlying GlobalVariable |
createCommonBaseForNegativeOffsets | sub_1C55CE0 | -- | Creates a separate common base when the max offset is negative. Used by BASR, available to CBE. |
isBaseLoopInvariant | sub_1C51340 | -- | Checks whether a base address is loop-invariant |
classifyAddressExpression | sub_1C57390 | -- | Classifies an instruction's address expression type |
createNewBaseInstruction | sub_13A5B00 | -- | Creates a new base address computation at the insertion point |
rewriteAddressAsBaseOffset | sub_14806B0 | -- | Rewrites an address as (new_base + relative_offset) |
extractBasePointer (SCEV helper) | sub_1456040 | -- | Extracts the base pointer from an address expression (SCEV getStart/getOperand(0)) |
Cross-References
- Base Address Strength Reduction -- the companion intra-loop pass
- SCEV-CGP knobs -- knobs controlling cross-block limits and IDOM depth
- NVIDIA Custom Passes Overview -- pass inventory and registration
- Rematerialization -- downstream pass that can undo costly hoisting by re-deriving values closer to use sites
- Other NVIDIA Passes -- summary entries for CBE and BASR
- LLVM Optimizer -- two-phase pipeline where CBE runs