ipmsp -- Inter-Procedural Memory Space Propagation

The IPMSP pass resolves generic (address space 0) pointer arguments to concrete NVIDIA address spaces by analyzing call sites across the entire module. When all callers of a function agree that a pointer argument points to a specific memory space (global, shared, local, constant), the pass either specializes the function in place or clones it with narrowed pointer types. This enables downstream passes to emit space-specific load/store instructions (e.g., ld.shared instead of generic ld) and eliminates addrspacecast overhead.

Disabling this pass (-disable-MemorySpaceOptPass) causes 2--20x performance regressions on real workloads. The pass is automatically disabled in OptiX IR mode (--emit-optix-ir routes -do-ip-msp=0).

| Property | Value |
| --- | --- |
| Pass name | ipmsp |
| Class | llvm::IPMSPPass |
| Scope | Module pass |
| Registration | New PM slot 125, line 1111 in sub_2342890 |
| Main function | sub_2CBBE90 (71 KB) -- MemorySpaceCloning worklist driver |
| LIBNVVM variant | sub_1C6A6C0 (54 KB) |
| Inference engine | sub_2CE96D0 -> sub_2CE8530 |
| Cloning engine | sub_F4BFF0 (CloneFunction) |
| Callee matching | sub_2CE7410 |
| Propagation | sub_2CF5840 -> sub_2CF51E0 |
| Pipeline control | do-ip-msp NVVMPassOption (default: enabled) |

NVPTX Address Spaces

The pass resolves generic (AS 0) pointers to specific address spaces: global (AS 1), shared (AS 3), constant (AS 4), local (AS 5), or param (AS 101). Generic pointers require a runtime address space check on every access; resolving them statically eliminates this overhead. See Address Spaces for the complete table with hardware mapping, pointer widths, aliasing rules, and the MemorySpaceOpt bitmask encoding.
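As a quick illustration of why the resolved space matters, the sketch below maps the address space numbers listed above to the PTX load opcode a backend could then select. The `AddressSpace` enum and `load_mnemonic` helper are hypothetical names for this example only; the suffixes are the standard PTX state-space names.

```python
from enum import IntEnum


class AddressSpace(IntEnum):
    """NVPTX address space numbers as listed in this section."""
    GENERIC = 0
    GLOBAL = 1
    SHARED = 3
    CONSTANT = 4
    LOCAL = 5
    PARAM = 101


# PTX state-space suffixes; a generic access carries no suffix.
PTX_SUFFIX = {
    AddressSpace.GLOBAL: ".global",
    AddressSpace.SHARED: ".shared",
    AddressSpace.CONSTANT: ".const",
    AddressSpace.LOCAL: ".local",
    AddressSpace.PARAM: ".param",
}


def load_mnemonic(space: int) -> str:
    """Return the PTX load opcode for a pointer in the given address space."""
    return "ld" + PTX_SUFFIX.get(AddressSpace(space), "")
```

A pointer left in AS 0 yields the plain `ld`, while one resolved to AS 3 yields `ld.shared`.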

Algorithm Overview

The pass operates as a worklist-driven inter-procedural fixed-point analysis. The top-level loop:

function IPMSP_Run(Module M):
    worklist = deque<Function*>{}
    argSpaceMap = map<Value*, int>{}        // formal arg -> resolved AS
    returnSpaceMap = map<Function*, int>{}  // function -> return AS
    calleeInfoMap = map<Function*, set<Function*>>{}  // reverse call graph

    // Phase 1: seed
    for each F in M.functions():
        if shouldProcess(F):
            worklist.push_back(F)
        for each caller of F:
            calleeInfoMap[F].insert(caller)

    debug("Initial work list size : %d", worklist.size())

    // Phase 2: fixed-point iteration
    while worklist not empty:
        F = worklist.pop_back()

        // Analyze and specialize F's callee arguments
        changed = analyzeAndSpecialize(F, argSpaceMap, calleeInfoMap)

        if changed:
            // Propagate to F's callees
            propagateSpacesToCallees(F, argSpaceMap)
            for each callee C of F in calleeInfoMap:
                if shouldProcess(C):
                    worklist.push_back(C)
            debug("%d callees are affected")

        // Check return space
        if resolveReturnSpace(F, returnSpaceMap):
            debug("%s : return memory space is resolved : %d")
            // propagate to callers and push them onto worklist

Phase 1: Build Worklist

The pass iterates all functions in the module. A function enters the worklist if sub_2CBA650 returns true, meaning:

  • The function is not a declaration or available_externally
  • Its linkage is not extern_weak or common
  • It is not an intrinsic (sub_B2DDD0 filter)
  • It has at least one formal argument that is a generic pointer not yet in the resolved-space map

Specifically, sub_2CBA650 checks:

function shouldProcess(this, F):
    if F has no users (F[16] == 0): return false

    linkage = F.linkage & 0xF
    if (linkage + 14) & 0xF <= 3: return false   // available_externally, appending
    if (linkage + 7) & 0xF <= 1: return false     // common, extern_weak

    if isIntrinsic(F): return false

    retType = F.getReturnType()
    if retType is pointer with AS 0 and not in returnSpaceMap:
        return true

    return hasUnresolvedPointerArgs(this, F)

sub_2CBA520 (hasUnresolvedPointerArgs) walks the formal arg list (stride 40 bytes) and returns true if any arg has type byte 14 (pointer) and is not already in the arg-space map.
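The masked-add linkage tests are easier to read once they are expanded into the concrete values they accept. The sketch below does that by brute force. It assumes the binary uses upstream LLVM's `GlobalValue::LinkageTypes` numbering (0 = external through 10 = common), which is an assumption of this example; `LINKAGE_NAMES` and `values_matching` are illustrative names.

```python
# Upstream LLVM linkage numbering (assumed to apply to this binary).
LINKAGE_NAMES = {
    0: "external", 1: "available_externally", 2: "linkonce", 3: "linkonce_odr",
    4: "weak", 5: "weak_odr", 6: "appending", 7: "internal", 8: "private",
    9: "extern_weak", 10: "common",
}


def values_matching(offset: int, limit: int) -> list:
    """Linkage values v for which ((v + offset) & 0xF) <= limit holds."""
    return [v for v in LINKAGE_NAMES if ((v + offset) & 0xF) <= limit]


first_filter = values_matching(14, 3)   # test on the (linkage + 14) line
second_filter = values_matching(7, 1)   # test on the (linkage + 7) line

# The in-place test "(linkage & 0xF) - 7 <= 1" read as an unsigned 32-bit compare.
in_place = [v for v in LINKAGE_NAMES if ((v - 7) & 0xFFFFFFFF) <= 1]
```

Under that numbering the first test selects values 2 through 5, the second selects 9 and 10, and the in-place test selects 7 and 8. If the binary's enum differs from upstream, the mapping from values to names changes but the arithmetic does not.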

A reverse call graph is also constructed: for each callee, the pass records which callers invoke it.

Debug output (when dump-ip-msp is enabled): "Initial work list size : N"

Phase 2: Per-Function Analysis

For each function popped from the worklist:

  1. Classify arguments: allocate a per-arg array initialized to 1000 ("unresolved"). Non-pointer args and already-resolved args are marked 2000 ("skip").

  2. Walk call sites: for each call instruction, examine each actual argument:

    • If the actual's address space is non-zero (already specific), record it.
    • If the actual is generic (AS 0), first check the callee-space map for a cached result. If not found, invoke the dataflow inference engine sub_2CE96D0 to trace the pointer's provenance.
    • If this is the first call site for this arg, record the space. If a subsequent call site disagrees, mark 2000 ("conflicting -- give up").
  3. Count resolved arguments: any arg where all call sites agree on a single address space is a candidate for specialization.

function analyzeArgSpaces(F, argSpaceMap, calleeSpaceMap):
    numArgs = F.arg_size()
    spaces[numArgs] = {1000, ...}     // 1000 = unresolved

    for i in 0..numArgs:
        arg = F.getArg(i)
        if arg.type != pointer:
            spaces[i] = 2000          // not a pointer, skip
        else if arg in argSpaceMap:
            spaces[i] = 2000          // already resolved

    for each CallInst CI using F:
        calledFn = CI.getCalledFunction()
        for i in 0..numArgs:
            if spaces[i] == 2000: continue
            actual = CI.getOperand(i)
            if actual == F.getArg(i): continue  // passthrough

            as = actual.type.addrspace
            if as == 0:
                // Check cache first
                if actual in calleeSpaceMap:
                    as = calleeSpaceMap[actual]
                else:
                    ok = inferAddressSpace(calledFn, actual, &as, ...)
                    if !ok:
                        spaces[i] = 2000
                        continue

            if spaces[i] == 1000:
                spaces[i] = as         // first call site
            else if spaces[i] != as:
                spaces[i] = 2000       // conflict

    return count(s for s in spaces if s != 1000 and s != 2000)

Debug output: "funcname : changed in argument memory space (N arguments)"
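The 1000/2000 sentinel scheme can be exercised on a toy model. In this sketch each call site is a list of already-inferred address spaces, one per formal argument, with `None` standing for a failed inference. The function name and data shapes are assumptions made for illustration and do not mirror the binary's layout.

```python
UNRESOLVED = 1000  # no call site has been seen for this argument yet
SKIP = 2000        # non-pointer, already resolved, or conflicting


def analyze_arg_spaces(is_pointer, call_sites):
    """Merge per-call-site address spaces into one verdict per argument."""
    spaces = [UNRESOLVED if p else SKIP for p in is_pointer]
    for actuals in call_sites:
        for i, space in enumerate(actuals):
            if spaces[i] == SKIP:
                continue
            if space is None:               # inference failed at this site
                spaces[i] = SKIP
            elif spaces[i] == UNRESOLVED:   # first call site fixes the space
                spaces[i] = space
            elif spaces[i] != space:        # a later call site disagrees
                spaces[i] = SKIP
    return spaces


def count_resolved(spaces):
    """Number of arguments that ended with a concrete address space."""
    return sum(1 for s in spaces if s not in (UNRESOLVED, SKIP))
```

With two call sites passing `(shared, global)` and `(shared, local)`, the first argument resolves to AS 3 and the second falls to 2000 because the callers disagree.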

Phase 3: Specialization Decision

The pass chooses between two strategies based on linkage:

| Linkage | Strategy | Mechanism |
| --- | --- | --- |
| Internal / Private (7, 8) | In-place specialization | Modify the function's arg types directly. No clone needed since all callers are visible. |
| External / Linkonce / Weak | Clone | Create a new function with specialized arg types and internal linkage. Rewrite matching call sites to target the clone. Keep the original for external callers. |

The decision at line 1114 in sub_2CBBE90:

if (F.linkage & 0xF) - 7 <= 1:              // unsigned compare: true only for linkage 7 or 8
    // Internal/Private: specialize in place
    for each resolved arg:
        argSpaceMap[arg] = resolvedAS
else:
    // External: must clone
    if resultsTree is empty:
        debug("avoid cloning of %s")
    else:
        createClone(F, resolvedArgs)

The clone is created by sub_F4BFF0 (CloneFunction):

  • Builds a new FunctionType with specific-space pointer arg types
  • Allocates a new Function object (136 bytes via sub_BD2DA0)
  • Copies the body via a ValueMap-based cloner (sub_F4BB00)
  • For each specialized arg, inserts an addrspacecast from specific back to generic at the clone's entry (these fold away in later optimization)
  • Sets clone linkage to internal (0x4007)

Debug output: "funcname is cloned"

Phase 4: Transitive Propagation

After specializing a function, the pass propagates resolved spaces to its callees via sub_2CF5840. This function:

  1. Creates an analysis context similar to sub_2CE96D0
  2. Calls sub_2CF51E0 which walks F's body
  3. For each call instruction in F that targets a known function, determines if the called function's args now have resolved spaces
  4. Updates the arg-space map accordingly

Affected callees are pushed back onto the worklist. This enables top-down resolution through call chains: if A -> B -> C, specializing A's args may resolve B's args, which in turn may resolve C's args.

Debug output: "N callees are affected"
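The chain behaviour described above can be sketched with a reduced model in which every function has a single pointer argument that it forwards unchanged to its callees. The `propagate` function and the dictionary shapes are hypothetical; the real pass tracks individual arguments and call instructions.

```python
from collections import deque

GENERIC = 0


def propagate(forwards, seeds):
    """Push resolved spaces from callers to callees until nothing changes.

    forwards: caller name -> list of callees receiving the caller's pointer arg.
    seeds:    function name -> address space already known for its arg.
    """
    callers = {}
    for caller, callees in forwards.items():
        for callee in callees:
            callers.setdefault(callee, []).append(caller)

    resolved = dict(seeds)
    worklist = deque(seeds)
    while worklist:
        func = worklist.pop()
        for callee in forwards.get(func, []):
            if callee in resolved:
                continue
            spaces = {resolved.get(c, GENERIC) for c in callers[callee]}
            # Specialize only when every caller agrees on one concrete space.
            if len(spaces) == 1 and GENERIC not in spaces:
                resolved[callee] = spaces.pop()
                worklist.append(callee)
    return resolved
```

Seeding `A` with shared memory resolves `B` and then `C`, while a callee whose two callers pass different spaces stays generic.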

Phase 5: Return Space Resolution

After argument processing, the pass checks return values:

  • If the function returns a generic pointer, walk all ret instructions.
  • Follow the def chain through GEPs to the base pointer.
  • If all returns agree on a single address space, record it in the return-space map and propagate to callers.

Debug output: "funcname : return memory space is resolved : N"
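A minimal sketch of the return-space check, assuming a toy IR in which each value is either a `("gep", base_name)` node or a `("base", address_space)` leaf. Both helper names are illustrative, and only GEP chains are followed, as in the description above.

```python
GENERIC = 0


def base_space(value, defs):
    """Follow GEP nodes back to the base pointer and return its address space."""
    kind, payload = defs[value]
    while kind == "gep":
        kind, payload = defs[payload]
    return payload


def resolve_return_space(returned_values, defs):
    """Return the single concrete space shared by all returns, else None."""
    spaces = {base_space(v, defs) for v in returned_values}
    if len(spaces) != 1:
        return None
    space = spaces.pop()
    return None if space == GENERIC else space
```

Two returns that both derive from a shared-memory base resolve to AS 3; mixing a shared and a global base leaves the return space unresolved.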

The Dataflow Inference Engine

The inference engine is the core analysis that determines what address space a generic pointer actually points to. It is invoked when a call-site argument has address space 0 (generic) and the pass needs to determine the concrete space.

Entry Point: sub_2CE96D0

function inferAddressSpace(calledFn, actualArg, &result, module, symtab, argSpaceMap):
    as = actualArg.type.addrspace
    if as != 0:
        *result = as
        return true                    // trivially resolved

    // Generic pointer: need full analysis
    context = alloca(608)              // 608-byte stack context
    // Initialize 6 tracking sets:
    //   [0]  visited set (bitset for cycle detection in PHI chains)
    //   [1]  user-list collector
    //   [2]  callee mapping
    //   [3]  load tracking (when track-indir-load)
    //   [4]  inttoptr tracking (when track-int2ptr)
    //   [5]  alloca tracking

    return coreDataflowWalker(context, calledFn, actualArg,
                              &loadsVec, &callsVec, result)

The 608-byte context is allocated on the stack and contains all working state for the backward dataflow walk.

Core Backward Dataflow Walker: sub_2CE8530

The walker traces the pointer's provenance backward through the SSA def chain. It uses a worklist plus visited-set to handle cycles (primarily PHI nodes).

IR nodes handled:

| IR node | Action |
| --- | --- |
| getelementptr | Transparent: follow the base pointer operand |
| bitcast | Transparent: follow the source operand |
| addrspacecast | Extract target address space, record it |
| phi | Add all incoming values to the worklist |
| select | Add both arms to the worklist (result = OR of both) |
| call / invoke | Look up callee in return-space map; if found, use that |
| load | If track-indir-load enabled: follow the loaded pointer; otherwise opaque |
| inttoptr | If track-int2ptr enabled: follow the integer source; otherwise opaque |
| alloca | If process-alloca-always: immediately resolve to AS 5 (local) |
| argument | If in arg-space map: use the recorded space |

Inference rules (lattice):

The engine collects candidate address spaces from all reachable definitions. The resolution follows these rules:

// All sources agree:     resolved to that space
// Sources disagree:      unresolvable (return false)
// param bit set + param-always-point-to-global:  resolve to global (AS 1)
// alloca found + process-alloca-always:  resolve to local (AS 5)
// __builtin_assume(__isGlobal(p)) + process-builtin-assume:  resolve to global

The walker collects three separate vectors during traversal:

  • loads: pointers loaded from memory (indirect provenance)
  • GEPs: getelementptr instructions encountered along the chain
  • calls: function calls whose return values contribute to the pointer

Per-Callee Space Propagation: sub_2CE8CB0

This function is the heavy-weight driver called from the worklist loop for each function. It processes a function's call graph entries and determines concrete address spaces for callees by examining actual arguments at all call sites.

Architecture:

  1. A global limit at qword_3CE3528 caps maximum analysis depth to prevent explosion on large call graphs.

  2. The function iterates the BB instruction list (offset +328, linked list). For each callee encountered:

    • Check visited set. The set has two representations:
      • Small set: flat array at object offsets +32..+52 (checked when flag at +52 is set)
      • Large set: hash-based DenseSet at offset +24 (checked via sub_18363E0)
    • If callee has no body (*(_DWORD *)(callee + 120) == 0): collect it as a leaf and record its argument address spaces via sub_2CE80A0
    • Otherwise: skip (will be processed when popped from worklist)
  3. For each collected callee, a DenseMap cache at offset +160 is checked:

    • Hash function: (ptr >> 9) ^ (ptr >> 4), linear probing
    • Empty sentinel: -4096 (0xFFFFFFFFFFFFF000)
    • If found in cache: skip re-analysis (use cached result)
  4. After collecting all callees: invoke sub_2CE88B0 for merge/commit.

  5. For single-entry results (exactly 1 callee entry in the vector): special fast path via sub_2CE2F10 that commits directly through a vtable dispatch.

function perCalleePropagate(this, F):
    if this.firstVisit:
        // Reset tracking vectors
        clearUserVectors()

    // Walk BB instruction list
    for each BB in F.body():
        if BB in visitedSet: continue
        if BB.isDeclaration(): continue

        collectCalleeInfo(BB)       // -> sub_2CE80A0
        addToVisitedSet(BB)

    // Check depth limit
    if userVector.size() > depthLimit:
        return false

    // Merge phase
    if userVector.size() > 1:
        return mergeAndCommit(this, F)    // sub_2CE88B0
    else if userVector.size() == 1:
        commitSingleResult(this)          // fast path
    return false

Callee Matching Engine: sub_2CE7410

When multiple call instructions target the same callee, this function determines the best pair to use for space inference. This is critical for correctness -- the pass must ensure that the inferred space is valid for all uses.

Algorithm:

  1. Parallel operand walk: for each pair of call instructions to the same callee, walk their operand use-chains in parallel. Compare the instructions at each position via the instruction equivalence DenseMap at offset +80.

  2. Coverage scoring: count the number of matching operands (variable v95). Higher coverage means more confidence in the match.

  3. Dominance check: call sub_2403DE0(A, B) to test if BB A dominates BB B. Both directions are checked:

    • If A dominates B and B dominates A (same BB or trivial loop): strong match.
    • If only one direction: check if the non-dominating one is the entry BB's first instruction.
  4. Loop membership gate: sub_24B89F0 checks whether both call instructions are in the same loop. If both are in the same loop and the coverage score > 1, the match is accepted even without strict dominance (loops create natural fixed-point convergence).

  5. Attribute check: for each matched pair, sub_245A9B0 verifies metadata flags (at instruction offset +44) to ensure the transformation is legal.

  6. Output: the best-scoring pair is written into the results vector for subsequent instruction rewriting.

Post-Inference Merge: sub_2CE88B0

After the per-callee analysis produces a list of (instruction, resolved_space) entries:

function mergeAndCommit(this, F):
    entries = this.resultVector
    if entries.size() > 1:
        qsort(entries, comparator=sub_2CE2BD0)  // sort by callee ID

    changed = false
    while entries.size() > 1:
        entry = entries.back()
        calleeId = entry.calleeId

        // Find best match for this callee
        matchScore = sub_2CE7410(this, calleeId, ...)

        if matchScore > 0:
            // Commit via instruction specialization
            sub_2CE4830(this, matchedCallee)     // edge weight
            sub_2CE3B60(this, bestMatchIdx)      // commit space

            // Propagate to other entries sharing this callee
            for each other entry with same callee:
                if other != bestMatch:
                    sub_2CE3780(this, other.users, matchedCallee)

            // Compact the entries vector
            changed = true
        else:
            // No match: fallback propagation
            sub_2CE3A70(this, calleeId, ...)

    return changed

Instruction Specialization: sub_2CE8120

Once a callee's address space is determined, this function creates a specialized copy of the instruction:

  1. Legality check: vtable dispatch at offset +408 (sub_25AE460 default). Returns false if the instruction cannot be legally specialized (e.g., volatile operations, intrinsics with fixed types).

  2. Create specialized instruction: sub_244CA00 creates a new instruction with the modified pointer type (generic -> specific address space).

  3. Insert into BB: sub_24056C0 places the new instruction in the basic block's instruction list.

  4. Rewrite use chain: all uses of the old instruction are updated to reference the new specialized version.

  5. Update DenseMap caches:

    • Instruction-to-space map at offset +80: insert mapping from new instruction to resolved space
    • Edge count at offset +72: update via sub_24D8EE0
    • If nested clone tracking (offset +131 flag): update debug info via sub_2D2DBE0

Handling Recursion and Clone Limits

  • Transitive: clones are pushed back onto the worklist, so chains A->B->C are handled iteratively.
  • Mutual recursion: already-resolved args are detected via the map (marked 2000), preventing infinite re-processing.
  • Self-recursion: after the first pass resolves args, re-processing finds agreement and applies specialization.
  • Clone limit: do-clone-for-ip-msp (default -1 = unlimited) caps the total number of clones. Each clone increments a counter at this[200]. When the limit is exceeded, cloning stops but in-place specialization continues for internal functions.
  • Analysis depth limit: qword_3CE3528 limits the per-function callee analysis depth to prevent explosion on large modules.
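The interaction between the clone limit and the linkage-based strategy can be sketched as follows. The `CloneBudget` class and `choose_strategy` function are hypothetical, and the exact comparison used against the counter (here: stop once the count reaches the limit) is an assumption of this example.

```python
INTERNAL, PRIVATE = 7, 8


class CloneBudget:
    """Counts clones against a do-clone-for-ip-msp style limit (-1 = unlimited)."""

    def __init__(self, limit=-1):
        self.limit = limit
        self.count = 0

    def try_clone(self):
        """Consume one clone from the budget; return False once it is exhausted."""
        if self.limit >= 0 and self.count >= self.limit:
            return False
        self.count += 1
        return True


def choose_strategy(linkage, budget):
    """In-place for internal/private functions, otherwise clone if the budget allows."""
    if linkage in (INTERNAL, PRIVATE):
        return "in-place"
    return "clone" if budget.try_clone() else "skip"
```

With a limit of 1, the first external function is cloned, the second is skipped, and internal functions are still specialized in place.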

The LIBNVVM Variant

A second implementation at sub_1C6A6C0 (54 KB) serves the LIBNVVM/module-pass path. Key differences:

  • Uses DenseMap-style hash tables (empty sentinel = -8, tombstone = -16, 16-byte entries)
  • Includes loop-induction analysis via sub_1BF8310 with maxLoopInd tracking (debug: "phi maxLoopInd = N: Function name")
  • Three processing phases controlled by globals:
    • Phase A (dword_4FBD1E0, default=4): call-site collection, threshold dword_4FBC300 = 500
    • Phase B (dword_4FBD2C0, default=2): address space resolution. If dword_4FBCAE0 (special mode), picks the callee with the smallest constant value (minimum address space ID).
    • Phase C (dword_4FBCD80, default=2): WMMA-specific sub-pass via sub_1C5FDC0, called with wmma_mode=1 first (WMMA-specific), then wmma_mode=0
  • Threshold: v302 > 5 triggers sub_1C67780 for deeper analysis
  • Pre/post analysis toggle: byte_4FBC840 controls calls to sub_1C5A4D0

Interaction with memory-space-opt

The ipmsp and memory-space-opt passes are complementary:

  • ipmsp is inter-procedural: it analyzes call graphs, infers address spaces across function boundaries, and specializes function signatures via cloning.
  • memory-space-opt is intra-procedural: it resolves generic pointers within a single function body using backward dataflow analysis and bitmask accumulation.

The typical pipeline flow:

  1. ipmsp runs first (module pass) to propagate address spaces across function boundaries
  2. memory-space-opt runs with first-time mode to resolve obvious intra-procedural cases
  3. Further optimization passes run (may create new generic pointers via inlining, SROA, etc.)
  4. memory-space-opt runs with second-time mode to clean up remaining generic pointers, fold isspacep intrinsics to constants

Both passes share the same set of knobs (with ias- prefixed mirrors for the IAS variant). The inference engine sub_2CE96D0 is shared between IPMSP and the alternate algorithm selected by mem-space-alg.

Knobs

IPMSP-Specific Knobs

| Knob | Default | Storage | Description |
| --- | --- | --- | --- |
| dump-ip-msp | 0 | qword_5013548 | Enable debug tracing |
| do-clone-for-ip-msp | -1 (unlimited) | qword_5013468 | Max clones allowed |
| do-ip-msp | 1 (enabled) | NVVMPassOption | Enable/disable the entire pass |

Shared Inference Knobs (MemorySpaceOpt variant)

| Knob | Default | Storage | Description |
| --- | --- | --- | --- |
| param-always-point-to-global | true | unk_4FBE1ED | Parameter pointers always resolve to global (AS 1) |
| strong-global-assumptions | true | (adjacent) | Assume constant buffer pointers always point to globals |
| process-alloca-always | true | unk_4FBE4A0 | Treat alloca-derived pointers as local (AS 5) unconditionally |
| wmma-memory-space-opt | true | unk_4FBE3C0 | Specialize WMMA call args to shared memory (AS 3) |
| track-indir-load | true | byte_4FBDE40 | Track indirect loads during inference |
| track-int2ptr | true | byte_4FBDC80 | Track inttoptr in inference |
| mem-space-alg | 2 | dword_4FBDD60 | Algorithm selection for address space optimization |
| process-builtin-assume | -- | (ctor_531_0) | Process __builtin_assume(__is*(p)) for space deduction |

IAS Variant Knobs (IPMSPPass path, ctor_610)

Each shared knob has an ias- prefixed mirror that controls the InferAddressSpaces-based code path (sub_2CBBE90):

| Knob | Mirrors |
| --- | --- |
| ias-param-always-point-to-global | param-always-point-to-global |
| ias-strong-global-assumptions | strong-global-assumptions |
| ias-wmma-memory-space-opt | wmma-memory-space-opt |
| ias-track-indir-load | track-indir-load |
| ias-track-int2ptr | track-int2ptr |

The unprefixed versions control the LIBNVVM variant (sub_1C6A6C0). The ias- prefixed versions control the New PM / IAS variant (sub_2CBBE90).

LIBNVVM Variant Globals

| Global | Default | Description |
| --- | --- | --- |
| dword_4FBD1E0 | 4 | Phase A call-site collection level |
| dword_4FBD2C0 | 2 | Phase B resolution level |
| dword_4FBCD80 | 2 | Phase C WMMA sub-pass level |
| dword_4FBC300 | 500 | Max analysis depth threshold |
| dword_4FBCAE0 | -- | Special minimum-selection mode |
| byte_4FBC840 | -- | Pre/post analysis toggle |
| dword_4FBD020 | -- | Debug: maxLoopInd dump |

Debug Dump Knobs

| Knob | Description |
| --- | --- |
| dump-ir-before-memory-space-opt | Dump IR before MemorySpaceOpt runs |
| dump-ir-after-memory-space-opt | Dump IR after MemorySpaceOpt completes |
| dump-process-builtin-assume | Dump __builtin_assume processing |
| msp-for-wmma | Enable Memory Space Optimization for WMMA (tensor core) |

Data Structures

Worklist

The worklist is a std::deque<Function*> with 512-byte pages (64 pointers per page). Push-back via sub_2CBB610 (extends the deque when the current page is full). Pop-back from the last page.
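The page arithmetic follows directly from the sizes quoted above. This small sketch assumes 8-byte pointers and, for simplicity, that element 0 sits in slot 0 of its page; `locate` and `pages_needed` are illustrative helpers, not functions from the binary.

```python
PAGE_BYTES = 512
POINTER_BYTES = 8                             # assumes a 64-bit target
SLOTS_PER_PAGE = PAGE_BYTES // POINTER_BYTES  # 64 Function* entries per page


def locate(index):
    """Map a logical worklist index to (page number, slot within that page)."""
    return divmod(index, SLOTS_PER_PAGE)


def pages_needed(num_entries):
    """Pages required to hold num_entries pointers (ceiling division)."""
    return -(-num_entries // SLOTS_PER_PAGE)
```

The 65th push is therefore the first one that lands on a second page, which is the point where the push-back helper has to extend the deque.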

Red-Black Tree Maps

The cloning engine uses red-black trees (std::map) for four separate maps:

| Map | Key | Value | Purpose |
| --- | --- | --- | --- |
| Return-space | Function* | Resolved AS | Return value address space |
| Arg-space | Value* | Resolved AS | Per-argument address space |
| Callee-space | Value* | Resolved AS | Callee pointer spaces (cached inference results) |
| Callee-info | Function* | Sub-tree | Reverse call graph (which callers invoke this callee) |

Red-black tree nodes are 0x58 bytes with the standard {left, right, parent, color, key} layout at offsets 16, 24, 8, 0, 32.
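The quoted offsets can be checked against a struct definition. The sketch below models the node with `ctypes`, using explicit padding after the color field and 64-bit integers in place of pointers so the layout does not depend on the host. The 48-byte `payload` field is an assumption chosen so the total comes to 0x58; the source only gives the node size and the offsets of the first five fields.

```python
import ctypes


class RbNode(ctypes.Structure):
    """Toy model of a 0x58-byte red-black tree node."""
    _fields_ = [
        ("color", ctypes.c_uint32),         # offset 0
        ("_pad", ctypes.c_uint32),          # explicit padding up to offset 8
        ("parent", ctypes.c_uint64),        # offset 8
        ("left", ctypes.c_uint64),          # offset 16
        ("right", ctypes.c_uint64),         # offset 24
        ("key", ctypes.c_uint64),           # offset 32
        ("payload", ctypes.c_ubyte * 48),   # assumed value storage
    ]


offsets = {name: getattr(RbNode, name).offset
           for name in ("left", "right", "parent", "color", "key")}
node_size = ctypes.sizeof(RbNode)
```

The resulting `offsets` dictionary reproduces the {left, right, parent, color, key} positions of 16, 24, 8, 0, and 32.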

DenseMap Caches

The inference engine and per-callee propagation use DenseMap hash tables with LLVM-layer sentinels (-4096 / -8192) and 16-byte entries (key + value). Growth is handled by sub_240C8E0. See Hash Table and Collection Infrastructure for the hash function, probing, and growth policy.

Three independent DenseMaps are used:

  1. Offset +80: instruction -> resolved space (per-function analysis cache)
  2. Offset +160: callee -> inference result (cross-function cache)
  3. Offset +232: edge weight tracking (call graph weights for profitability)
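To make the sentinel and hashing conventions concrete, here is a small fixed-capacity table using the pointer hash `(ptr >> 9) ^ (ptr >> 4)`, linear probing as described for the offset +160 cache, and the -4096 / -8192 sentinels. `PtrMap` is a hypothetical class for this sketch and does not model growth.

```python
EMPTY = -4096       # 0xFFFFFFFFFFFFF000 when viewed as an unsigned 64-bit value
TOMBSTONE = -8192


def ptr_hash(ptr):
    """Pointer hash described in this document, truncated to 32 bits."""
    return ((ptr >> 9) ^ (ptr >> 4)) & 0xFFFFFFFF


class PtrMap:
    def __init__(self, num_buckets=8):      # bucket count must be a power of two
        self.keys = [EMPTY] * num_buckets
        self.values = [None] * num_buckets

    def _probe(self, key):
        """Return (bucket index, found); reuse the first tombstone on a miss."""
        mask = len(self.keys) - 1
        index = ptr_hash(key) & mask
        reusable = None
        for _ in range(len(self.keys)):
            slot = self.keys[index]
            if slot == key:
                return index, True
            if slot == EMPTY:
                return (index if reusable is None else reusable), False
            if slot == TOMBSTONE and reusable is None:
                reusable = index
            index = (index + 1) & mask      # linear probing
        return reusable, False

    def insert(self, key, value):
        index, _found = self._probe(key)
        if index is None:
            raise RuntimeError("table full; growth is not modelled")
        self.keys[index] = key
        self.values[index] = value

    def lookup(self, key):
        index, found = self._probe(key)
        return self.values[index] if found else None

    def erase(self, key):
        index, found = self._probe(key)
        if found:
            self.keys[index] = TOMBSTONE
            self.values[index] = None
```

Erasing writes a tombstone rather than an empty marker so that later lookups keep probing past the deleted slot.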

Visited Sets

Two representations depending on set size:

  • Small set (flag at offset +52): flat array at offsets +32..+44, capacity at +40, count at +44. Linear scan for membership test.
  • Large set (default): hash-based DenseSet at offset +24; sub_18363E0 is listed for both the insert and the membership test.

Inference Context

The 608-byte stack-allocated context for sub_2CE8530 contains:

| Offset range | Content |
| --- | --- |
| 0--23 | Result vector (pointer, size, capacity) |
| 24--47 | Loads vector (indirect pointer sources) |
| 48--71 | GEPs vector (getelementptr chains) |
| 72--95 | Calls vector (call instructions returning pointers) |
| 96--127 | Worklist for PHI traversal |
| 128--607 | Visited bitset, callee tracking, metadata |
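The offset ranges can be sanity-checked for gaps and total size with a few lines. `REGIONS` simply restates the table above and `region_sizes` is an illustrative helper.

```python
# (first byte, last byte, content) for each region of the 608-byte context.
REGIONS = [
    (0, 23, "result vector"),
    (24, 47, "loads vector"),
    (48, 71, "GEPs vector"),
    (72, 95, "calls vector"),
    (96, 127, "PHI worklist"),
    (128, 607, "visited bitset, callee tracking, metadata"),
]


def region_sizes(regions):
    """Return each region's size in bytes, raising if the regions leave a gap."""
    sizes = []
    expected_start = 0
    for start, end, name in regions:
        if start != expected_start:
            raise ValueError(f"gap or overlap before {name!r}")
        sizes.append(end - start + 1)
        expected_start = end + 1
    return sizes
```

Each of the four vectors occupies 24 bytes, which matches a (pointer, size, capacity) triple of 8-byte fields, and the regions sum to 608.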

Sentinel Values

| Value | Meaning | Used in |
| --- | --- | --- |
| 1000 | Unresolved pointer argument (not yet seen at any call site) | Per-arg analysis array |
| 2000 | Non-pointer, already resolved, or conflicting (skip) | Per-arg analysis array |
| -4096 | DenseMap empty slot | All DenseMap caches |
| -8192 | DenseMap tombstone (deleted entry) | All DenseMap caches |

Diagnostic Messages

| Message | Source | Condition |
| --- | --- | --- |
| "Initial work list size : %d" | sub_2CBBE90 | Always (when dump-ip-msp) |
| "funcname : changed in argument memory space (N arguments)" | sub_2CBBE90 | Args resolved |
| "funcname is cloned" | sub_2CBBE90 | Clone created |
| "avoid cloning of funcname" | sub_2CBBE90 | External linkage, empty results |
| "N callees are affected" | sub_2CBBE90 | After propagation |
| "funcname : return memory space is resolved : N" | sub_2CBBE90 | Return space resolved |
| "phi maxLoopInd = N: Function name" | sub_1C6A6C0 | LIBNVVM loop-ind analysis |

Function Map

| Function | Address | Size | Role |
| --- | --- | --- | --- |
| MemorySpaceCloning | sub_2CBBE90 | 71 KB | Worklist driver (New PM variant) |
| IPMSPPass | sub_1C6A6C0 | 54 KB | LIBNVVM variant |
| inferAddressSpace | sub_2CE96D0 | -- | Inference entry point |
| coreDataflowWalker | sub_2CE8530 | -- | Backward dataflow analysis |
| perCalleePropagate | sub_2CE8CB0 | -- | Per-callee space propagation |
| mergeAndCommit | sub_2CE88B0 | -- | Post-inference merge (qsort) |
| rewriteCalleePair | sub_2CE85D0 | -- | Instruction rewriting for matched pairs |
| calleeMatchingEngine | sub_2CE7410 | -- | Dominance + coverage scoring |
| pushInferenceResult | sub_2CE80A0 | -- | Append to result vector |
| vectorRealloc | sub_2CE7E60 | -- | Grow inference result vector |
| computeEdgeWeight | sub_2CE4830 | -- | Call graph edge weight |
| commitSpace | sub_2CE3B60 | -- | Commit resolved space to callee |
| fallbackPropagate | sub_2CE3A70 | -- | Propagate unmatched entries |
| propagateToAlternate | sub_2CE3780 | -- | Propagate to alternate callee users |
| commitSingleCallee | sub_2CE2F10 | -- | Single-callee commit via vtable |
| singlePredecessorCheck | sub_2CE2DE0 | -- | Check single-predecessor property |
| qsortComparator | sub_2CE2BD0 | -- | Compare callee entries for sorting |
| mergeSmallVectors | sub_2CE2A70 | -- | Merge small vector pairs |
| extractAddressSpace | sub_2CE27A0 | -- | Extract AS from Value's type |
| cloneInstruction | sub_2CE8120 | -- | Clone instruction + DenseMap update |
| populateUserSet | sub_2CE97F0 | -- | Build per-arg user list |
| propagateSpacesToCallees | sub_2CF5840 | -- | Post-specialization propagation |
| bodyWalker | sub_2CF51E0 | -- | Walk function body for propagation |
| shouldProcessFunction | sub_2CBA650 | -- | Worklist eligibility predicate |
| hasUnresolvedPointerArgs | sub_2CBA520 | -- | Check for unresolved generic ptr args |
| CloneFunction | sub_F4BFF0 | -- | Full function clone with arg rewriting |
| ValueMapCloner | sub_F4BB00 | -- | ValueMap-based body cloner |
| replaceAllUsesWith | sub_BD84D0 | -- | Redirect call sites to clone |
| mapInsertOrFind | sub_2CBB230 | -- | Red-black tree insert |
| mapLookup | sub_2CBB490 | -- | Red-black tree search |
| dequeGrow | sub_2CBB610 | -- | Worklist deque push_back |
| checkAttributeBundle | sub_245A9B0 | -- | Attribute flag membership test |
| instructionEquivalence | sub_245AA10 | -- | Test instruction equivalence |
| bbDominates | sub_2403DE0 | -- | BasicBlock dominance test |
| loopMembership | sub_24B89F0 | -- | Check if two instructions share a loop |
| createSpecializedInst | sub_244CA00 | -- | Create instruction with modified types |
| insertIntoBlock | sub_24056C0 | -- | Insert instruction into BB |
| updateDebugInfo | sub_2D2DBE0 | -- | Debug info update for cloned inst |

Cross-References