Minor NVIDIA Passes

This page indexes NVIDIA-proprietary passes that are too small, or too incompletely decompiled, to warrant dedicated pages. The ten passes that were previously documented here and now have full pages are listed in the table below.

Passes with Dedicated Pages

| Pass | Page |
|---|---|
| NVVM IR Verifier | nvvm-verify (Deep Dive) |
| NVVM Intrinsic Lowering | nvvm-intrinsic-lowering |
| Dead Synchronization Elimination | dead-sync-elimination |
| IV Demotion | iv-demotion |
| Struct/Aggregate Splitting | struct-splitting |
| Base Address Strength Reduction | base-address-sr |
| Common Base Elimination | common-base-elim |
| CSSA (Conventional SSA) | cssa |
| FP128/I128 Emulation | fp128-emulation |
| Memmove Unrolling | memmove-unroll |

alloca-hoisting -- Entry Block Alloca Consolidation

| Field | Value |
|---|---|
| Pass ID | alloca-hoisting |
| Entry point | sub_21BC7D0 |
| Scope | Machine-level pass |

PTX requires all stack allocations to reside in the entry block. This pass moves alloca instructions inserted by inlining or loop transforms into the entry block, preserving order and alignment. Without it, non-entry-block allocas produce invalid PTX.
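The hoisting step can be illustrated on a toy IR. This is a minimal sketch, not the decompiled logic of sub_21BC7D0: the dictionary-of-strings IR, the `hoist_allocas` name, and the choice to place hoisted allocas after the entry block's existing allocas are all assumptions made for illustration. It also assumes fixed-size allocas, whose operands do not depend on values defined outside the entry block.

```python
from typing import Dict, List


def is_alloca(inst: str) -> bool:
    """Toy check: an instruction string that defines a value via alloca."""
    return " = alloca " in inst


def hoist_allocas(blocks: Dict[str, List[str]], entry: str = "entry") -> Dict[str, List[str]]:
    """Return a copy of `blocks` with every non-entry alloca moved to `entry`.

    Hoisted allocas keep their relative order, and their text (including
    any `align N` suffix) is left unchanged.
    """
    hoisted: List[str] = []
    out: Dict[str, List[str]] = {}
    for label, insts in blocks.items():
        if label == entry:
            out[label] = list(insts)
            continue
        hoisted.extend(i for i in insts if is_alloca(i))
        out[label] = [i for i in insts if not is_alloca(i)]

    # Insert right after the last alloca already present in the entry block
    # (or at the top of the block if it has none).
    body = out[entry]
    pos = 0
    for idx, inst in enumerate(body):
        if is_alloca(inst):
            pos = idx + 1
    out[entry] = body[:pos] + hoisted + body[pos:]
    return out


func = {
    "entry": ["%a = alloca i32, align 4", "br label %loop"],
    "loop": ["%b = alloca [4 x float], align 16", "store i32 0, ptr %a", "br label %exit"],
    "exit": ["%c = alloca i64, align 8", "ret void"],
}
hoisted = hoist_allocas(func)
for label, insts in hoisted.items():
    print(label, insts)
```

After the call, the `loop` and `exit` blocks contain no allocas, and the entry block holds `%a`, `%b`, `%c` in their original order ahead of its terminator.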

image-optimizer -- Texture/Surface Access Optimization

| Field | Value |
|---|---|
| Pass ID | nvptx-image-optimizer |
| Entry point | sub_21BCF10 |
| Scope | Machine-level pass (pre-emission) |

Groups related texture loads to improve cache utilization and merges redundant surface operations. Works in coordination with Replace Image Handles (below). See also Machine-Level Passes.

nvptx-peephole -- Machine-Level Peephole

| Field | Value |
|---|---|
| Pass ID | nvptx-peephole |
| Entry point | sub_21DB090 |
| Scope | Machine-level pass (pre-RA) |
| Knob | enable-nvvm-peephole (default: on) |

PTX-specific peephole that folds redundant cvta address space conversions, optimizes predicate patterns, and simplifies other short instruction sequences. Distinct from the IR-level NVVM Peephole. See Machine-Level Passes for pipeline position.

proxy-reg-erasure -- Redundant cvta.to.local Removal

| Field | Value |
|---|---|
| Pass ID | nvptx-proxy-reg-erasure |
| Entry point | sub_21DA810 |
| Scope | Machine-level pass (late post-RA) |

Removes redundant cvta.to.local instructions left by address space lowering. Runs late in the pipeline after register allocation. See Machine-Level Passes.

valid-global-names -- PTX Identifier Sanitization

| Field | Value |
|---|---|
| Pass ID | nvptx-assign-valid-global-names |
| Entry point | sub_21BCD80 |
| Scope | Machine-level pass (pre-emission) |

Rewrites global symbol names to comply with PTX naming rules, so that they contain no characters that are illegal in PTX identifiers (@, ., etc.). Note that $ is itself a legal PTX identifier character. Runs immediately before PTX emission.
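The renaming can be sketched as a character filter. This is a hedged illustration only: the function name `sanitize_ptx_name` is hypothetical, the allowed character set (letters, digits, `_`, `$`) is taken from the PTX identifier grammar, and the `_$_` replacement string is an assumption modeled on the open-source LLVM NVPTX backend, not something confirmed for sub_21BCD80. The sketch does not handle the separate rule for the first character of an identifier.

```python
import re

# Characters permitted after the first position of a PTX identifier.
_ILLEGAL = re.compile(r"[^A-Za-z0-9_$]")


def sanitize_ptx_name(name: str, replacement: str = "_$_") -> str:
    """Replace each character that PTX does not accept with `replacement`.

    The default replacement is an assumption for illustration.
    """
    # A callable replacement avoids any escape processing of `replacement`.
    return _ILLEGAL.sub(lambda _match: replacement, name)


for original in ["foo.bar@1", "kernel_$ok", "_ZN3app4mainE.llvm.123"]:
    print(original, "->", sanitize_ptx_name(original))
```

Names that are already valid pass through unchanged, so the rewrite is idempotent.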

replace-image-handles -- Texture/Surface Handle Substitution

| Field | Value |
|---|---|
| Pass ID | nvptx-replace-image-handles |
| Entry point | sub_21DBEA0 |
| Scope | Machine-level pass (pre-emission) |

Replaces IR-level texture/surface handle references with PTX-level .tex / .surf declarations. Paired with image-optimizer above. See Machine-Level Passes.

extra-mi-printer -- Register Pressure Diagnostics

| Field | Value |
|---|---|
| Pass ID | extra-machineinstr-printer |
| Entry point | sub_21E9E80 |
| Scope | Diagnostic (debug-only) |

Prints per-function register pressure statistics. Used for tuning pressure heuristics during development. Not active in release builds.

nvvm-intr-range -- Intrinsic Range Metadata

| Field | Value |
|---|---|
| Pass ID | nvvm-intr-range |
| Entry point | sub_216F4B0 |
| Scope | Function pass (IR level) |
| Knob | nvvm-intr-range-sm (ctor_359) |

Attaches !range metadata to NVVM intrinsics that return hardware-bounded values (threadIdx.x, blockIdx.x, etc.), enabling downstream known-bits analysis and range-based dead code elimination. Tightens ranges when __launch_bounds__ metadata is present. Documented in detail in KnownBits & DemandedBits.
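To make the range computation concrete, the sketch below derives half-open `[lo, hi)` intervals of the kind `!range` metadata encodes. It is an illustration under stated assumptions: the per-dimension limits are the commonly documented CUDA limits for recent compute capabilities, the `intrinsic_ranges` function and its `max_threads_per_block` parameter are hypothetical, and the tightening rule (no block dimension can exceed the declared maximum thread count) is a conservative guess at what the pass does with launch bounds.

```python
from typing import Dict, Optional, Tuple

# Assumed hardware limits per dimension (not read from the binary).
MAX_NTID = {"x": 1024, "y": 1024, "z": 64}
MAX_NCTAID = {"x": 2**31 - 1, "y": 65535, "z": 65535}


def intrinsic_ranges(max_threads_per_block: Optional[int] = None) -> Dict[str, Tuple[int, int]]:
    """Return half-open [lo, hi) ranges for the special-register intrinsics."""
    ranges: Dict[str, Tuple[int, int]] = {}
    for dim in "xyz":
        ntid_max = MAX_NTID[dim]
        if max_threads_per_block is not None:
            # A block of at most N threads cannot be wider than N in any dimension.
            ntid_max = min(ntid_max, max_threads_per_block)
        ranges[f"tid.{dim}"] = (0, ntid_max)                 # threadIdx
        ranges[f"ntid.{dim}"] = (1, ntid_max + 1)            # blockDim
        ranges[f"ctaid.{dim}"] = (0, MAX_NCTAID[dim])        # blockIdx
        ranges[f"nctaid.{dim}"] = (1, MAX_NCTAID[dim] + 1)   # gridDim
    return ranges


default = intrinsic_ranges()
bounded = intrinsic_ranges(max_threads_per_block=256)
print(default["tid.x"], bounded["tid.x"], bounded["tid.z"])
```

With a 256-thread bound, the x range shrinks from 1024 to 256 values, which is what lets known-bits analysis prove that the upper bits of `threadIdx.x` are zero.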

GenericToNVVM -- Global Address Space Migration

| Field | Value |
|---|---|
| Pass ID | generic-to-nvvm |
| Entry point | sub_215DC20 |
| Size | 36 KB |

Moves global variables from generic address space (AS 0) to global address space (AS 1), inserting addrspacecast at use sites. Required because PTX globals must reside in .global memory. Documented in detail in PTX Emission.
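The effect of the migration can be shown on a toy module. This is a minimal sketch, assuming a simplified representation in which globals are a name-to-address-space map and instructions are strings; `migrate_globals` is a hypothetical name, and the real pass works on LLVM IR objects rather than text.

```python
import re
from typing import Dict, List, Tuple


def migrate_globals(
    global_as: Dict[str, int], insts: List[str]
) -> Tuple[Dict[str, int], List[str]]:
    """Move AS 0 globals to AS 1 and wrap their uses in an addrspacecast."""
    moved = [name for name, space in global_as.items() if space == 0]
    new_as = {name: (1 if space == 0 else space) for name, space in global_as.items()}

    new_insts: List[str] = []
    for inst in insts:
        for name in moved:
            cast = f"addrspacecast (ptr addrspace(1) @{name} to ptr)"
            # Match @name only as a whole symbol, not as a prefix of a longer one.
            pattern = rf"@{re.escape(name)}(?![\w.$])"
            inst = re.sub(pattern, lambda _m, c=cast: c, inst)
        new_insts.append(inst)
    return new_as, new_insts


spaces = {"table": 0, "smem": 3}
code = [
    "%v = load i32, ptr @table",
    "store i32 %v, ptr addrspace(3) @smem",
]
new_spaces, new_code = migrate_globals(spaces, code)
print(new_spaces)
print("\n".join(new_code))
```

Uses still see a generic pointer after the cast, so existing loads and stores type-check unchanged while the global itself now lives in AS 1.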


Other Passes Documented Elsewhere

These passes appear in the NVPTX backend but have primary documentation on other pages:

| Pass | Entry | Primary Page |
|---|---|---|
| nvvm-pretreat | PretreatPass (New PM slot 128) | Optimizer Pipeline |
| NLO (Simplify Live Output) | sub_1CE10B0, sub_1CDC1F0 | Rematerialization |
| Prolog/Epilog | sub_21DB5F0 | Machine-Level Passes, PrologEpilogInserter |
| LDG Transform | sub_21F2780 (ldgxform) | Machine-Level Passes, Code Generation |
| Machine Mem2Reg | sub_21F9920 (nvptx-mem2reg) | Machine-Level Passes, Code Generation |