NVPTX Register Classes

This page is the single authoritative reference for the nine NVPTX register classes used throughout cicc v13.0. Register class tables previously duplicated in Register Allocation, Register Coalescing, PTX Emission, and AsmPrinter are consolidated here. When those pages reference register classes, they should cross-reference this page rather than maintaining inline copies.

| Register encoding | sub_21583D0 (4.6KB) |
| PTX type suffix map | sub_2163730 (1.7KB) |
| PTX prefix map | sub_21638D0 (1.6KB) |
| Copy opcode dispatch | sub_2162350 (3.0KB) |
| Register info init (legacy) | sub_2163AB0 / sub_2149CD0 |
| Register info init (new PM) | sub_30590F0 / sub_301F0C0 |
| Register decl emission | sub_2158E80 (17KB) |
| Internal-only class vtable | off_4A026E0 |

The Nine Register Classes

NVPTX defines nine register classes that participate in PTX code generation. Each class is identified at runtime by its vtable pointer, which sub_2163730 and sub_21638D0 use as a switch key to produce the PTX type suffix and register prefix respectively. The encoding function sub_21583D0 maps each class to a 4-bit tag that occupies bits [31:28] of the 32-bit encoded register ID.

| Tag | Vtable | Class Name | PTX Type | Prefix | Encoded ID | Width (bits) | Description |
|---|---|---|---|---|---|---|---|
| 1 | off_4A027A0 | Int1Regs | .pred | %p | 0x10000000 | 1 | Predicate (boolean) |
| 2 | off_4A02720 | Int16Regs | .b16 | %rs | 0x20000000 | 16 | Short integer |
| 3 | off_4A025A0 | Int32Regs | .b32 | %r | 0x30000000 | 32 | General-purpose integer |
| 4 | off_4A024A0 | Int64Regs | .b64 | %rd | 0x40000000 | 64 | Double-width integer |
| 5 | off_4A02620 | Float32Regs | .f32 | %f | 0x50000000 | 32 | Single-precision float |
| 6 | off_4A02520 | Float64Regs | .f64 | %fd | 0x60000000 | 64 | Double-precision float |
| 7 | off_4A02760 | Int16HalfRegs | .b16 | %h | 0x70000000 | 16 | Half-precision float (f16, bf16) |
| 8 | off_4A026A0 | Int32HalfRegs | .b32 | %hh | 0x80000000 | 32 | Packed pair (v2f16, v2bf16, v2i16, v4i8) |
| 9 | off_4A02460 | Int128Regs | .b128 | %rq | 0x90000000 | 128 | 128-bit wide (tensor core) |

Naming Discrepancy

Two naming conventions exist in the codebase, depending on whether the name was recovered from the emission functions or from the register allocator context:

| Vtable | Emission name (sub_2163730/sub_21638D0) | RA-context name (sub_2162350) | Resolution |
|---|---|---|---|
| off_4A02760 | Int16HalfRegs | Float16Regs | Same class. The emission functions use the TableGen-derived name Int16HalfRegs; the RA raw report uses the semantic alias Float16Regs. Both refer to off_4A02760. |
| off_4A026A0 | Int32HalfRegs | Float16x2Regs | Same class. Int32HalfRegs is the TableGen name; Float16x2Regs is the semantic alias. Both refer to off_4A026A0. |
| off_4A02460 | Int128Regs | SpecialRegs | Different raw reports assigned different names to off_4A02460. The emission report identifies it as Int128Regs (based on .b128 type and %rq prefix). The earlier RA sweep report labeled it SpecialRegs. The emission-derived name Int128Regs is more accurate: .b128 / %rq is used for 128-bit tensor-core values (i128 on SM 70+), not for special/environment registers. |

The tenth vtable off_4A026E0 is present in the binary but returns "!Special!" from both sub_2163730 and sub_21638D0. It is never assigned an encoded ID and never participates in register declaration emission. It is an internal-only sentinel class used within NVPTXRegisterInfo initialization (string "ENVREG10" at register info offset +72).

Throughout this wiki, the emission-derived names (Int16HalfRegs, Int32HalfRegs, Int128Regs) are canonical. Pages written before this consolidation may use the RA-context aliases.

Register Encoding Scheme -- sub_21583D0

Every virtual register in the NVPTX backend is encoded as a 32-bit value that packs the register class and a per-class index into a single integer. The encoding function at sub_21583D0 (4.6KB) implements this:

encoded_register = class_tag | (register_index & 0x0FFFFFFF)

The bit layout:

 31  28 27                             0
+------+-------------------------------+
| class|       register index          |
| tag  |       (28 bits)               |
+------+-------------------------------+
  • Bits [31:28] -- 4-bit class tag, values 0x1 through 0x9 as listed in the table above.
  • Bits [27:0] -- 28-bit register index within that class, supporting up to 268 million registers per class.
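
A minimal Python sketch of the layout above, splitting an encoded ID back into its tag and index. The helper names `decode_register` and `ptx_name` are hypothetical, and rendering a register as prefix plus decimal index is an assumption based on the `%prefix<N>` declaration syntax; the tag and prefix values come from the class table on this page.

```python
TAG_SHIFT = 28
INDEX_MASK = 0x0FFFFFFF

# Tag -> PTX prefix, taken from the register class table on this page.
PREFIX_BY_TAG = {
    0x1: "%p", 0x2: "%rs", 0x3: "%r", 0x4: "%rd", 0x5: "%f",
    0x6: "%fd", 0x7: "%h", 0x8: "%hh", 0x9: "%rq",
}

def decode_register(encoded):
    """Split a 32-bit encoded register ID into (class_tag, register_index)."""
    tag = (encoded >> TAG_SHIFT) & 0xF
    index = encoded & INDEX_MASK
    return tag, index

def ptx_name(encoded):
    """Render an encoded ID as prefix + index (assumed naming convention)."""
    tag, index = decode_register(encoded)
    if tag not in PREFIX_BY_TAG:
        raise ValueError(f"tag {tag:#x} is not one of the nine class tags")
    return f"{PREFIX_BY_TAG[tag]}{index}"

print(ptx_name(0x30000005))  # %r5
print(INDEX_MASK + 1)        # 268435456 possible indices per class
```

The second print shows where the "268 million registers per class" figure comes from: it is 2^28.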

The function operates in two modes:

  1. Physical register (register_id >= 0): Returns the raw index directly (low 28 bits). Physical registers on NVPTX are a vestigial concept -- the target has no fixed register file -- but LLVM's infrastructure requires them for reserved registers like %SP and %SPL.

  2. Virtual register (register_id < 0, i.e., bit 31 set in LLVM's internal convention): Looks up the register class from the MachineRegisterInfo register map, matches the class vtable against the nine known vtable addresses, and returns class_encoded_id | (register_index & 0x0FFFFFFF).

If the vtable does not match any of the nine known classes, the function triggers a fatal error:

"Bad register class"

This is a hard abort, not a recoverable diagnostic. It indicates that either a new register class was added without updating the encoding function, or memory corruption produced an invalid vtable pointer.
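
The two modes and the error path can be sketched in Python as follows. This is an illustration under stated assumptions, not the decompiled logic: `encode_register`, `class_of`, and `index_of` are hypothetical stand-ins for the MachineRegisterInfo lookup, classes are keyed by name rather than by vtable address, and the hard abort is modeled as an exception.

```python
INDEX_MASK = 0x0FFFFFFF

# Class name -> encoded ID base (tag << 28), from the class table on this page.
CLASS_ENCODED_ID = {
    "Int1Regs": 0x10000000, "Int16Regs": 0x20000000,
    "Int32Regs": 0x30000000, "Int64Regs": 0x40000000,
    "Float32Regs": 0x50000000, "Float64Regs": 0x60000000,
    "Int16HalfRegs": 0x70000000, "Int32HalfRegs": 0x80000000,
    "Int128Regs": 0x90000000,
}

class BadRegisterClass(RuntimeError):
    """Stands in for the fatal "Bad register class" abort."""

def encode_register(register_id, class_of, index_of):
    """register_id is signed here; a negative value models a virtual register."""
    if register_id >= 0:
        # Physical register: return the raw index (low 28 bits).
        return register_id & INDEX_MASK
    class_name = class_of[register_id]
    if class_name not in CLASS_ENCODED_ID:
        raise BadRegisterClass("Bad register class")
    return CLASS_ENCODED_ID[class_name] | (index_of[register_id] & INDEX_MASK)

# Hypothetical virtual register -7 with per-class index 12 in Float32Regs.
print(hex(encode_register(-7, {-7: "Float32Regs"}, {-7: 12})))  # 0x5000000c
```

Raising an exception keeps the sketch testable; the real function does not return to its caller on this path.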

Why Bits [31:28] and Not Bits [31:29]

LLVM's standard convention uses bit 31 (0x80000000) to distinguish physical from virtual registers internally. The NVPTX encoding reclaims this bit as part of the class tag because, after encoding, the distinction between physical and virtual is no longer meaningful: all registers in emitted PTX are virtual. Tag values 0x8 (Int32HalfRegs) and 0x9 (Int128Regs) both have bit 31 set, so their encoded IDs would collide with LLVM's virtual-register marker if they were ever passed back to LLVM's register APIs. The scheme is safe because the encoding is applied only during emission, after register allocation is complete.
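
Which tags overlap bit 31 can be checked with a few lines of Python. The constant name `VIRTUAL_MARKER` is illustrative; the tag range 0x1 through 0x9 comes from the class table.

```python
VIRTUAL_MARKER = 0x80000000  # bit 31

# Tags whose encoded ID base (tag << 28) has bit 31 set.
overlapping = [tag for tag in range(0x1, 0xA) if (tag << 28) & VIRTUAL_MARKER]
print([hex(t) for t in overlapping])  # ['0x8', '0x9']
```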

Complete Class Separation

The nine register classes are completely disjoint. There is no cross-class interference: an Int32Regs register (%r) never conflicts with a Float32Regs register (%f) even though both are 32 bits wide. This is a fundamental consequence of PTX's typed register model. In PTX, .reg .b32 %r0 and .reg .f32 %f0 are distinct storage locations from ptxas's perspective. Two implications follow:

  1. No cross-class coalescing. The register coalescer at sub_34AF4A0 enforces a same-class check on every coalescing candidate. Cross-class copies (e.g., a bitcast from i32 to f32) must survive as explicit mov instructions in the emitted PTX.

  2. Per-class pressure accounting. The greedy register allocator at sub_2F5A640 tracks register pressure per class independently. The -maxreg limit bounds total live registers across all classes combined, but interference within any single class never spills over to another.

This is unlike CPU targets (x86, AArch64) where integer and floating-point registers can alias through sub-register relationships, or where a single physical register appears in multiple register classes.

Copy Opcodes -- sub_2162350

The function sub_2162350 (3.0KB, "Copy one register into another with a different width") dispatches copy instruction emission based on the source and destination register classes. Each class has two opcodes: one for same-class copies (e.g., mov.b32 %r1, %r0) and one for cross-class copies (e.g., bitcasting between Int32Regs and Float32Regs):

| Class | Same-Class Opcode | Cross-Class Opcode | Notes |
|---|---|---|---|
| Int1Regs | 39424 | 39424 | No distinct cross-class path |
| Int16Regs | 39296 | 39296 | No distinct cross-class path |
| Int32Regs | 39552 | 10816 | Cross = mov.b32 bitcast to float |
| Int64Regs | 39680 | 11008 | Cross = mov.b64 bitcast to double |
| Float32Regs | 30656 | 10880 | Cross = mov.b32 bitcast to integer |
| Float64Regs | 30784 | 11072 | Cross = mov.b64 bitcast to integer |
| Int16HalfRegs | 30528 | 10688 | Cross = mov.b16 half-to-short |
| Int32HalfRegs | 39552 | 39552 | Uses same opcode as Int32Regs same-class |
| Int128Regs | 39168 | 39168 | No distinct cross-class path |

Classes where both opcodes are identical (Int1Regs, Int16Regs, Int32HalfRegs, Int128Regs) have no meaningful cross-class copy path. For predicates (Int1Regs), this is because there is no other 1-bit type. For 128-bit registers, tensor-core values have no peer class to bitcast into. The Int32HalfRegs class shares its same-class opcode (39552) with Int32Regs because both emit .b32 copies -- the packed v2f16 value is simply treated as a 32-bit bitpattern for copying.

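The five classes with distinct cross-class opcodes (Int32Regs, Int64Regs, Float32Regs, Float64Regs, Int16HalfRegs) are those that bitcast between integer and floating-point interpretations of the same bit width. The 16-bit pairing is asymmetric in the table above: Int16HalfRegs has a distinct cross-class opcode (10688) for copies to Int16Regs, while Int16Regs itself reports the same opcode (39296) for both paths.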
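
The opcode table can be modeled as a simple lookup. The sketch below is a Python illustration only: the function names and the `cross_class` flag are hypothetical, and how sub_2162350 actually decides that a copy is cross-class is not modeled. The opcode values are copied from the table above.

```python
# Class -> (same_class_opcode, cross_class_opcode), from the table above.
COPY_OPCODES = {
    "Int1Regs": (39424, 39424),
    "Int16Regs": (39296, 39296),
    "Int32Regs": (39552, 10816),
    "Int64Regs": (39680, 11008),
    "Float32Regs": (30656, 10880),
    "Float64Regs": (30784, 11072),
    "Int16HalfRegs": (30528, 10688),
    "Int32HalfRegs": (39552, 39552),
    "Int128Regs": (39168, 39168),
}

def copy_opcode(reg_class, cross_class=False):
    """Pick the same-class or cross-class opcode for a register class."""
    same, cross = COPY_OPCODES[reg_class]
    return cross if cross_class else same

def has_distinct_cross_class(reg_class):
    """True when the class has a cross-class opcode different from same-class."""
    same, cross = COPY_OPCODES[reg_class]
    return same != cross

distinct = [c for c in COPY_OPCODES if has_distinct_cross_class(c)]
print(distinct)
# ['Int32Regs', 'Int64Regs', 'Float32Regs', 'Float64Regs', 'Int16HalfRegs']
```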

Register Declaration Emission -- sub_2158E80

During function body emission, sub_2158E80 (17KB) emits .reg declarations for every register class used by the function. The process:

  1. Iterate the register map at this+800 in the AsmPrinter state.
  2. Deduplicate classes using a hash table at this+808..832.
  3. Track the maximum index per class across all virtual registers.
  4. Emit one declaration per class in the format:
.reg .pred  %p<5>;       // 5 predicate registers (indices 0..4)
.reg .b16   %rs<12>;     // 12 short integer registers
.reg .b32   %r<47>;      // 47 general-purpose 32-bit
.reg .b64   %rd<8>;      // 8 double-width integer
.reg .f32   %f<20>;      // 20 single-precision float
.reg .f64   %fd<3>;      // 3 double-precision float
.reg .b16   %h<4>;       // 4 half-precision float
.reg .b32   %hh<2>;      // 2 packed-pair registers
.reg .b128  %rq<1>;      // 1 tensor-core 128-bit register

The count for each class is max_register_index + 1. The PTX declaration syntax %prefix<N> declares registers %prefix0 through %prefix(N-1).
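
The counting rule can be sketched in Python. This is a simplified model with several assumptions: it works from already-encoded register IDs rather than the register map at this+800, it emits classes in tag order (the real emission order is not documented here), and it separates fields with a single space instead of the aligned columns shown above. The name `emit_reg_decls` is hypothetical.

```python
INDEX_MASK = 0x0FFFFFFF

# Tag -> (PTX type suffix, prefix), from the class table on this page.
CLASS_INFO = {
    0x1: (".pred", "%p"), 0x2: (".b16", "%rs"), 0x3: (".b32", "%r"),
    0x4: (".b64", "%rd"), 0x5: (".f32", "%f"), 0x6: (".f64", "%fd"),
    0x7: (".b16", "%h"), 0x8: (".b32", "%hh"), 0x9: (".b128", "%rq"),
}

def emit_reg_decls(encoded_regs):
    """Return one .reg line per class that appears in encoded_regs."""
    max_index = {}
    for enc in encoded_regs:
        tag = (enc >> 28) & 0xF
        if tag not in CLASS_INFO:
            continue  # tag 0 (physical registers) is skipped in this sketch
        index = enc & INDEX_MASK
        max_index[tag] = max(max_index.get(tag, -1), index)
    # Count per class is max_register_index + 1.
    return [
        f".reg {CLASS_INFO[tag][0]} {CLASS_INFO[tag][1]}<{max_index[tag] + 1}>;"
        for tag in sorted(max_index)
    ]

for line in emit_reg_decls([0x10000000, 0x10000004, 0x3000002E]):
    print(line)
# .reg .pred %p<5>;
# .reg .b32 %r<47>;
```

Because the count is derived from the maximum index rather than from the number of distinct registers, a class with gaps in its index range still declares the full range.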

Note that Int16HalfRegs and Int16Regs share the same PTX type suffix (.b16) but have different prefixes (%h vs %rs). Similarly, Int32HalfRegs and Int32Regs share .b32 but use %hh vs %r. The PTX assembler ptxas treats these as completely separate register namespaces -- the prefix, not the type, determines the namespace.

Stack pointer registers (%SP, %SPL) are emitted before the class declarations when the function has a non-zero local frame. These use .b64 in 64-bit mode or .b32 in 32-bit mode.

Per-Class Detail

Int1Regs -- Predicates

| Property | Value |
|---|---|
| Vtable | off_4A027A0 |
| PTX type | .pred |
| Prefix | %p |
| Tag | 0x1 |
| Width | 1 bit |
| Legal MVTs | i1 |
| Same-class copy | 39424 |

Predicate registers hold boolean values used for conditional branches (@%p1 bra target), select instructions (selp), and set-predicate results (setp). They are the only 1-bit registers in PTX. There is no cross-class copy path because no other class holds 1-bit values. The coalescer excludes predicates from cross-class analysis entirely.

Int16Regs -- Short Integers

| Property | Value |
|---|---|
| Vtable | off_4A02720 |
| PTX type | .b16 |
| Prefix | %rs |
| Tag | 0x2 |
| Width | 16 bits |
| Legal MVTs | i16 |
| Same-class copy | 39296 |

Short integer registers hold 16-bit integer values. PTX .param space widens all scalars below 32 bits to .b32, so %rs registers appear primarily in computation, not in function signatures. The prefix %rs (register-short) distinguishes these from %h (Int16HalfRegs) even though both declare as .b16.

Int32Regs -- General-Purpose 32-bit

| Property | Value |
|---|---|
| Vtable | off_4A025A0 |
| PTX type | .b32 |
| Prefix | %r |
| Tag | 0x3 |
| Width | 32 bits |
| Legal MVTs | i32 |
| Same-class copy | 39552 |
| Cross-class copy | 10816 |

The workhorse register class. Holds 32-bit integers, addresses in 32-bit mode, loop indices, and general computation results. Cross-class copy opcode 10816 handles bitcast to Float32Regs (%f).

Int64Regs -- Double-Width Integer

| Property | Value |
|---|---|
| Vtable | off_4A024A0 |
| PTX type | .b64 |
| Prefix | %rd |
| Tag | 0x4 |
| Width | 64 bits |
| Legal MVTs | i64 |
| Same-class copy | 39680 |
| Cross-class copy | 11008 |

Holds 64-bit integers and device pointers in 64-bit mode (the common case). Cross-class copy opcode 11008 handles bitcast to Float64Regs (%fd).

Float32Regs -- Single-Precision Float

| Property | Value |
|---|---|
| Vtable | off_4A02620 |
| PTX type | .f32 |
| Prefix | %f |
| Tag | 0x5 |
| Width | 32 bits |
| Legal MVTs | f32 |
| Same-class copy | 30656 |
| Cross-class copy | 10880 |

Holds IEEE 754 single-precision floats. Note the .f32 type suffix rather than .b32 -- PTX distinguishes float from bitwise register types even at the same width. Cross-class copy opcode 10880 handles bitcast to Int32Regs (%r).

Float64Regs -- Double-Precision Float

| Property | Value |
|---|---|
| Vtable | off_4A02520 |
| PTX type | .f64 |
| Prefix | %fd |
| Tag | 0x6 |
| Width | 64 bits |
| Legal MVTs | f64 |
| Same-class copy | 30784 |
| Cross-class copy | 11072 |

Holds IEEE 754 double-precision floats. Cross-class copy opcode 11072 handles bitcast to Int64Regs (%rd).

Int16HalfRegs -- Half-Precision Float

| Property | Value |
|---|---|
| Vtable | off_4A02760 |
| PTX type | .b16 |
| Prefix | %h |
| Tag | 0x7 |
| Width | 16 bits |
| Legal MVTs | f16, bf16 |
| Same-class copy | 30528 |
| Cross-class copy | 10688 |

Despite the Int16 in the TableGen-derived name, this class holds half-precision floating-point values (f16 and bf16). It is declared with the bitwise .b16 PTX type rather than .f16, which is the same type suffix Int16Regs uses, so the type alone does not separate the two classes. The %h prefix is what distinguishes these registers from %rs (Int16Regs). Cross-class copy opcode 10688 handles copies to Int16Regs (the mov.b16 half-to-short path in the copy opcode table).

The semantic alias Float16Regs appears in some wiki pages and is equally valid.

Int32HalfRegs -- Packed Half-Precision Pairs

| Property | Value |
|---|---|
| Vtable | off_4A026A0 |
| PTX type | .b32 |
| Prefix | %hh |
| Tag | 0x8 |
| Width | 32 bits |
| Legal MVTs | v2f16, v2bf16, v2i16, v4i8 |
| Same-class copy | 39552 |
| Cross-class copy | 39552 |

This is the only register class for vector types on NVPTX. It holds exactly 32 bits of packed data: two f16 values, two bf16 values, two i16 values, or four i8 values. The %hh prefix distinguishes it from %r (Int32Regs). Both same-class and cross-class copy opcodes are 39552 (identical to Int32Regs same-class), because copies of packed values are simple 32-bit bitwise moves.

All vector types wider than 32 bits (v4f32, v2f64, v8i32, etc.) are illegal on NVPTX and must be split or scalarized during type legalization. See the vector legalization documentation for the split/scalarize dispatch.

The semantic alias Float16x2Regs appears in some wiki pages.

Int128Regs -- 128-bit Tensor Core Values

| Property | Value |
|---|---|
| Vtable | off_4A02460 |
| PTX type | .b128 |
| Prefix | %rq |
| Tag | 0x9 |
| Width | 128 bits |
| Legal MVTs | i128 (SM 70+) |
| Same-class copy | 39168 |
| Cross-class copy | 39168 |

The widest register class, introduced for tensor core operations on Volta (SM 70) and later architectures. Holds 128-bit values used as operands and accumulators in mma and wmma instructions. The %rq prefix stands for "register quad" (4x32 bits). There is no cross-class copy path because no other class holds 128-bit values.

During register coalescing, 128-bit values are tracked as wide register pairs (two 64-bit halves). The coalescer at sub_3497B40 handles paired-register decomposition: when coalescing the low half, the high half inherits corresponding constraints.

An earlier raw report (p2c.5-01-register-alloc.txt) labeled off_4A02460 as SpecialRegs. This was an error in that report's identification. The vtable off_4A02460 emits .b128 / %rq, which is the 128-bit class for tensor core values, not a class for special/environment registers.

The Internal-Only Class -- off_4A026E0

| Property | Value |
|---|---|
| Vtable | off_4A026E0 |
| PTX type | "!Special!" |
| Prefix | "!Special!" |
| Encoded ID | None |

A tenth vtable address appears in the register info initialization path (sub_2163AB0). Both sub_2163730 and sub_21638D0 return the sentinel string "!Special!" for this vtable. It has no encoded ID, no PTX declaration, and never produces emitted registers. The string "ENVREG10" at register info offset +72 (alongside "Int1Regs" at offset +80) suggests this class is associated with environment registers -- hardware-defined read-only registers like %tid, %ctaid, %ntid, etc. These are emitted by dedicated special-register emission functions (sub_21E86B0, sub_21E9060) rather than through the register class encoding path.

Register Info Initialization

NVPTXRegisterInfo objects are created by two factory functions corresponding to the two pass manager generations:

| | Legacy PM | New PM |
|---|---|---|
| Factory | sub_2149CD0 | sub_301F0C0 |
| Init | sub_2163AB0 | sub_30590F0 |
| Object size | 224 bytes | 248 bytes |

Both call sub_1F4A910 (TargetRegisterInfo::InitMCRegisterInfo) with the register descriptor table at off_49D26D0 and register unit data at unk_4327AF0. Key fields in the initialized structure:

| Offset | Content |
|---|---|
| +44 | NumRegs (total register count) |
| +72 | "ENVREG10" (environment register class name) |
| +80 | "Int1Regs" (first register class name) |
| +96 | numRegClasses (initially 1, expanded during init) |

Coalescing Constraints

The register coalescer imposes these constraints based on register class:

| Class | Coalesceable | Constraint Flag (offset +3, mask 0x10) |
|---|---|---|
| Int1Regs | Same class only | Set |
| Int16Regs | Same class only | Set |
| Int32Regs | Same class only | Set (type code 12) |
| Int64Regs | Same class only | Set (type code 13) |
| Float32Regs | Same class only | Set (type code 15) |
| Float64Regs | Same class only | Set |
| Int16HalfRegs | Same class only | Set |
| Int32HalfRegs | Same class only | Set |
| Int128Regs | Never coalesced | Cleared |

Int128Regs (the class at off_4A02460, previously mislabeled SpecialRegs in the coalescing page) has its constraint flag cleared, excluding it from the coalescing worklist entirely. This makes sense: tensor-core 128-bit values have specific register-pair relationships that the coalescer must not disturb.

Cross-class copies between Int32Regs/Float32Regs and between Int64Regs/Float64Regs are bitcasts that the coalescer never eliminates -- they must survive as explicit PTX mov instructions because the source and destination live in different register namespaces.
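
The policy in this section reduces to a two-part predicate, sketched here in Python. The names `can_coalesce` and `NEVER_COALESCED` are hypothetical; the real coalescer tests the constraint flag at offset +3 (mask 0x10) rather than comparing class names, and that mechanism is not modeled.

```python
# Classes whose constraint flag is cleared, per the table in this section.
NEVER_COALESCED = {"Int128Regs"}

def can_coalesce(src_class, dst_class):
    """Same-class-only policy, with Int128Regs excluded entirely."""
    if src_class != dst_class:
        return False  # cross-class copies survive as explicit mov instructions
    return src_class not in NEVER_COALESCED

print(can_coalesce("Int32Regs", "Int32Regs"))    # True
print(can_coalesce("Int32Regs", "Float32Regs"))  # False
print(can_coalesce("Int128Regs", "Int128Regs"))  # False
```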

Differences from Upstream LLVM NVPTX

The upstream LLVM NVPTX backend (as of LLVM 20.0.0) defines these register classes in NVPTXRegisterInfo.td:

  • Int1Regs, Int16Regs, Int32Regs, Int64Regs -- identical.
  • Float16Regs, Float16x2Regs -- upstream names for cicc's Int16HalfRegs / Int32HalfRegs. The rename reflects NVIDIA's preference for the TableGen-derived integer-typed names.
  • Float32Regs, Float64Regs -- identical.
  • Int128Regs -- present in upstream, matches cicc.
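  • SpecialRegs -- upstream does define a SpecialRegs class (it contains the ENVREG registers together with the frame and depot registers), and upstream's register-class naming helpers return "!Special!" for it.
  • off_4A026E0 therefore most likely corresponds to upstream SpecialRegs rather than to a cicc-only class: the "!Special!" sentinel and the "ENVREG10" string both match.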

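The encoding scheme (4-bit tag in [31:28], 28-bit index in [27:0]) and the fatal "Bad register class" error path do not appear to be unique to cicc: upstream's NVPTXAsmPrinter::encodeVirtualRegister also places the register class in the upper 4 bits, masks the register number with 0x0FFFFFFF, and reports "Bad register class" for an unrecognized class. sub_21583D0 is most likely cicc's build of that function with cicc's own class list.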

Function Map

| Function | Address | Size | Role |
|---|---|---|---|
| Register class encoding (class tag OR index) | sub_21583D0 | 4.6KB | -- |
| Register class -> PTX type suffix (.pred, .b32, .f32, ...) | sub_2163730 | 1.7KB | -- |
| Register class -> PTX prefix (%p, %r, %f, ...) | sub_21638D0 | 1.6KB | -- |
| Copy opcode dispatch by register class | sub_2162350 | 3.0KB | -- |
| Stack frame + register declaration emission | sub_2158E80 | 17KB | -- |
| NVPTXRegisterInfo init (legacy PM) | sub_2163AB0 | 1.1KB | -- |
| NVPTXRegisterInfo factory (legacy PM) | sub_2149CD0 | -- | -- |
| NVPTXRegisterInfo init (new PM) | sub_30590F0 | -- | -- |
| NVPTXRegisterInfo factory (new PM) | sub_301F0C0 | -- | -- |
| TargetRegisterInfo::InitMCRegisterInfo | sub_1F4A910 | -- | -- |
| Special register emission (%tid, %ctaid, %ntid, %nctaid) | sub_21E86B0 | -- | -- |
| Cluster register emission (SM 90+) | sub_21E9060 | -- | -- |

Cross-References

  • Register Allocation -- greedy RA that operates on these classes; pressure tracking and -maxreg constraint
  • Register Coalescing -- same-class-only coalescing policy, copy opcode classification
  • PTX Emission -- function header orchestrator that calls the register declaration emitter
  • AsmPrinter -- per-instruction emission that calls the encoding function
  • Type Legalization -- vector type legalization driven by the Int32HalfRegs-only vector model
  • NVPTX Target Infrastructure -- NVPTXTargetMachine that owns the register info objects