NVPTX Register Classes
This page is the single authoritative reference for the nine NVPTX register classes used throughout cicc v13.0. Register class tables previously duplicated in Register Allocation, Register Coalescing, PTX Emission, and AsmPrinter are consolidated here. When those pages reference register classes, they should cross-reference this page rather than maintaining inline copies.
| Component | Function |
|---|---|
| Register encoding | sub_21583D0 (4.6KB) |
| PTX type suffix map | sub_2163730 (1.7KB) |
| PTX prefix map | sub_21638D0 (1.6KB) |
| Copy opcode dispatch | sub_2162350 (3.0KB) |
| Register info init (legacy) | sub_2163AB0 / sub_2149CD0 |
| Register info init (new PM) | sub_30590F0 / sub_301F0C0 |
| Register decl emission | sub_2158E80 (17KB) |
| Internal-only class vtable | off_4A026E0 |
The Nine Register Classes
NVPTX defines nine register classes that participate in PTX code generation. Each class is identified at runtime by its vtable pointer, which sub_2163730 and sub_21638D0 use as a switch key to produce the PTX type suffix and register prefix respectively. The encoding function sub_21583D0 maps each class to a 4-bit tag that occupies bits [31:28] of the 32-bit encoded register ID.
| Tag | Vtable | Class Name | PTX Type | Prefix | Encoded ID | Width | Description |
|---|---|---|---|---|---|---|---|
| 1 | off_4A027A0 | Int1Regs | .pred | %p | 0x10000000 | 1 | Predicate (boolean) |
| 2 | off_4A02720 | Int16Regs | .b16 | %rs | 0x20000000 | 16 | Short integer |
| 3 | off_4A025A0 | Int32Regs | .b32 | %r | 0x30000000 | 32 | General-purpose integer |
| 4 | off_4A024A0 | Int64Regs | .b64 | %rd | 0x40000000 | 64 | Double-width integer |
| 5 | off_4A02620 | Float32Regs | .f32 | %f | 0x50000000 | 32 | Single-precision float |
| 6 | off_4A02520 | Float64Regs | .f64 | %fd | 0x60000000 | 64 | Double-precision float |
| 7 | off_4A02760 | Int16HalfRegs | .b16 | %h | 0x70000000 | 16 | Half-precision float (f16, bf16) |
| 8 | off_4A026A0 | Int32HalfRegs | .b32 | %hh | 0x80000000 | 32 | Packed pair (v2f16, v2bf16, v2i16, v4i8) |
| 9 | off_4A02460 | Int128Regs | .b128 | %rq | 0x90000000 | 128 | 128-bit wide (tensor core) |
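As a reading aid, the table can be restated as a small lookup structure. This is only a sketch: the vtable addresses are used as plain string keys to stand in for the pointer comparison, and the names RegClass, REG_CLASSES, encoded_id, ptx_type_suffix, and ptx_prefix are hypothetical, not symbols recovered from the binary.

```python
from typing import NamedTuple


class RegClass(NamedTuple):
    tag: int        # 4-bit class tag
    name: str       # emission-derived class name
    ptx_type: str   # PTX type suffix
    prefix: str     # PTX register prefix
    width: int      # width in bits


# Keys are the vtable addresses from the table, used here as opaque strings.
REG_CLASSES = {
    "off_4A027A0": RegClass(1, "Int1Regs", ".pred", "%p", 1),
    "off_4A02720": RegClass(2, "Int16Regs", ".b16", "%rs", 16),
    "off_4A025A0": RegClass(3, "Int32Regs", ".b32", "%r", 32),
    "off_4A024A0": RegClass(4, "Int64Regs", ".b64", "%rd", 64),
    "off_4A02620": RegClass(5, "Float32Regs", ".f32", "%f", 32),
    "off_4A02520": RegClass(6, "Float64Regs", ".f64", "%fd", 64),
    "off_4A02760": RegClass(7, "Int16HalfRegs", ".b16", "%h", 16),
    "off_4A026A0": RegClass(8, "Int32HalfRegs", ".b32", "%hh", 32),
    "off_4A02460": RegClass(9, "Int128Regs", ".b128", "%rq", 128),
}


def encoded_id(vtable: str) -> int:
    """Place the 4-bit tag in bits [31:28], giving the 'Encoded ID' column."""
    return REG_CLASSES[vtable].tag << 28


def ptx_type_suffix(vtable: str) -> str:
    """Stand-in for the type-suffix map (sub_2163730 in the text)."""
    return REG_CLASSES[vtable].ptx_type


def ptx_prefix(vtable: str) -> str:
    """Stand-in for the prefix map (sub_21638D0 in the text)."""
    return REG_CLASSES[vtable].prefix
```

The Encoded ID column is derivable from the Tag column, so the sketch stores only the tag and shifts it on demand.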
Naming Discrepancy
Two naming conventions exist in the codebase, depending on whether the name was recovered from the emission functions or from the register allocator context:
| Vtable | Emission name (sub_2163730/sub_21638D0) | RA-context name (sub_2162350) | Resolution |
|---|---|---|---|
| off_4A02760 | Int16HalfRegs | Float16Regs | Same class. The emission functions use the TableGen-derived name Int16HalfRegs; the RA raw report uses the semantic alias Float16Regs. Both refer to off_4A02760. |
| off_4A026A0 | Int32HalfRegs | Float16x2Regs | Same class. Int32HalfRegs is the TableGen name; Float16x2Regs is the semantic alias. Both refer to off_4A026A0. |
| off_4A02460 | Int128Regs | SpecialRegs | Different raw reports assigned different names to off_4A02460. The emission report identifies it as Int128Regs (based on .b128 type and %rq prefix). The earlier RA sweep report labeled it SpecialRegs. The emission-derived name Int128Regs is more accurate: .b128 / %rq is used for 128-bit tensor-core values (i128 on SM 70+), not for special/environment registers. |
The tenth vtable off_4A026E0 is present in the binary but returns "!Special!" from both sub_2163730 and sub_21638D0. It is never assigned an encoded ID and never participates in register declaration emission. It is an internal-only sentinel class used within NVPTXRegisterInfo initialization (string "ENVREG10" at register info offset +72).
Throughout this wiki, the emission-derived names (Int16HalfRegs, Int32HalfRegs, Int128Regs) are canonical. Pages written before this consolidation may use the RA-context aliases.
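When reading older pages, the aliases can be normalized mechanically. The mapping below is a sketch of that convention only; ALIASES and canonical_class_name are hypothetical helper names, and the SpecialRegs entry applies solely to the earlier RA sweep report's use of that label.

```python
# Alias -> canonical (emission-derived) class name, as described above.
ALIASES = {
    "Float16Regs": "Int16HalfRegs",
    "Float16x2Regs": "Int32HalfRegs",
    # Only valid for the earlier RA sweep report's labeling of off_4A02460.
    "SpecialRegs": "Int128Regs",
}


def canonical_class_name(name: str) -> str:
    """Return the canonical name; names that are already canonical pass through."""
    return ALIASES.get(name, name)
```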
Register Encoding Scheme -- sub_21583D0
Every virtual register in the NVPTX backend is encoded as a 32-bit value that packs the register class and a per-class index into a single integer. The encoding function at sub_21583D0 (4.6KB) implements this:
encoded_register = (class_tag << 28) | (register_index & 0x0FFFFFFF)
The bit layout:
31 28 27 0
+------+-------------------------------+
| class| register index |
| tag | (28 bits) |
+------+-------------------------------+
- Bits [31:28] -- 4-bit class tag, values 0x1 through 0x9 as listed in the table above.
- Bits [27:0] -- 28-bit register index within that class, supporting up to 2^28 (about 268 million) registers per class.
The function operates in two modes:
- Physical register (register_id >= 0): Returns the raw index directly (low 28 bits). Physical registers on NVPTX are a vestigial concept -- the target has no fixed register file -- but LLVM's infrastructure requires them for reserved registers like %SP and %SPL.
- Virtual register (register_id < 0, i.e., bit 31 set in LLVM's internal convention): Looks up the register class from the MachineRegisterInfo register map, matches the class vtable against the nine known vtable addresses, and returns class_encoded_id | (register_index & 0x0FFFFFFF).
If the vtable does not match any of the nine known classes, the function triggers a fatal error:
"Bad register class"
This is a hard abort, not a recoverable diagnostic. It indicates that either a new register class was added without updating the encoding function, or memory corruption produced an invalid vtable pointer.
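The two modes and the error path can be sketched as follows. This is an illustration under stated assumptions, not the decompiled function: the register map is modeled as a plain dict from a negative register ID to a (class name, per-class index) pair, and CLASS_TAGS, INDEX_MASK, and encode_register are hypothetical names.

```python
INDEX_MASK = 0x0FFFFFFF  # low 28 bits hold the per-class index

# Class name -> 4-bit tag, from the class table on this page.
CLASS_TAGS = {
    "Int1Regs": 1, "Int16Regs": 2, "Int32Regs": 3, "Int64Regs": 4,
    "Float32Regs": 5, "Float64Regs": 6, "Int16HalfRegs": 7,
    "Int32HalfRegs": 8, "Int128Regs": 9,
}


def encode_register(register_id: int, vreg_map: dict) -> int:
    """Encode a register ID given as a signed 32-bit value.

    vreg_map is a hypothetical stand-in for the register map lookup:
    it maps a virtual register ID to (class_name, index_within_class).
    """
    if register_id >= 0:
        # Physical register: return the raw index, no class tag.
        return register_id & INDEX_MASK
    class_name, index = vreg_map[register_id]
    tag = CLASS_TAGS.get(class_name)
    if tag is None:
        # The real function aborts; an exception models that here.
        raise RuntimeError("Bad register class")
    return (tag << 28) | (index & INDEX_MASK)
```

For example, a virtual register mapped to ("Int32Regs", 5) encodes to 0x30000005, while a physical register ID of 7 passes through as 7.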
Why Bits [31:28] and Not Bits [31:29]
LLVM's standard convention uses bit 31 (0x80000000) to distinguish physical from virtual registers internally. The NVPTX encoding reclaims this bit as part of the class tag because after encoding, the distinction between physical and virtual is no longer meaningful -- all registers in emitted PTX are virtual. Tag values 0x8 (Int32HalfRegs) and 0x9 (Int128Regs) both have bit 31 set, which would collide with LLVM's virtual-register marker if an encoded ID were ever passed back to LLVM's register APIs. That never happens: the encoding is applied only during emission, after register allocation is complete and the physical/virtual distinction is irrelevant.
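A quick check shows which tags reach into bit 31. The helper names below are hypothetical and exist only for this illustration.

```python
def tag_sets_bit31(tag: int) -> bool:
    """True if the encoded ID for this 4-bit tag has bit 31 set."""
    return bool((tag << 28) & 0x80000000)


def as_signed32(value: int) -> int:
    """Reinterpret a 32-bit pattern as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


# Tags 1..9 are the nine class tags; only the ones >= 8 touch bit 31.
tags_with_bit31 = [t for t in range(1, 10) if tag_sets_bit31(t)]
```

Viewed as a signed 32-bit value, an encoded ID with one of those tags is negative, which is exactly the pattern LLVM would otherwise read as a virtual register.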
Complete Class Separation
The nine register classes are completely disjoint. There is no cross-class interference: an Int32Regs register (%r) never conflicts with a Float32Regs register (%f) even though both are 32 bits wide. This is a fundamental consequence of PTX's typed register model. In PTX, .reg .b32 %r0 and .reg .f32 %f0 are distinct storage locations from ptxas's perspective. Two implications follow:
- No cross-class coalescing. The register coalescer at sub_34AF4A0 enforces a same-class check on every coalescing candidate. Cross-class copies (e.g., a bitcast from i32 to f32) must survive as explicit mov instructions in the emitted PTX.
- Per-class pressure accounting. The greedy register allocator at sub_2F5A640 tracks register pressure per class independently. The -maxreg limit bounds total live registers across all classes combined, but interference within any single class never spills over to another.
This is unlike CPU targets (x86, AArch64) where integer and floating-point registers can alias through sub-register relationships, or where a single physical register appears in multiple register classes.
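The two implications can be modeled in a few lines. This is a simplified sketch, not the allocator's or coalescer's actual logic: registers are (class name, index) pairs, pressure is counted in registers rather than in any hardware unit, and all function names are hypothetical.

```python
from collections import Counter


def can_coalesce(src_class: str, dst_class: str) -> bool:
    """Same-class check: cross-class copies are never coalescing candidates."""
    return src_class == dst_class


def pressure_by_class(live_registers) -> Counter:
    """Count live registers per class at one program point."""
    return Counter(cls for cls, _index in live_registers)


def total_pressure(live_registers) -> int:
    """Combined count across classes, the quantity a global limit would bound."""
    return sum(pressure_by_class(live_registers).values())
```

With two live %r registers and one live %f register, the per-class counts are 2 and 1, and only their sum of 3 is compared against a combined limit.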
Copy Opcodes -- sub_2162350
The function sub_2162350 (3.0KB, "Copy one register into another with a different width") dispatches copy instruction emission based on the source and destination register classes. Each class has two opcodes: one for same-class copies (e.g., mov.b32 %r1, %r0) and one for cross-class copies (e.g., bitcasting between Int32Regs and Float32Regs):
| Class | Same-Class Opcode | Cross-Class Opcode | Notes |
|---|---|---|---|
| Int1Regs | 39424 | 39424 | No distinct cross-class path |
| Int16Regs | 39296 | 39296 | No distinct cross-class path |
| Int32Regs | 39552 | 10816 | Cross = mov.b32 bitcast to float |
| Int64Regs | 39680 | 11008 | Cross = mov.b64 bitcast to double |
| Float32Regs | 30656 | 10880 | Cross = mov.b32 bitcast to integer |
| Float64Regs | 30784 | 11072 | Cross = mov.b64 bitcast to integer |
| Int16HalfRegs | 30528 | 10688 | Cross = mov.b16 half-to-short |
| Int32HalfRegs | 39552 | 39552 | Uses same opcode as Int32Regs same-class |
| Int128Regs | 39168 | 39168 | No distinct cross-class path |
Classes where both opcodes are identical (Int1Regs, Int16Regs, Int32HalfRegs, Int128Regs) have no meaningful cross-class copy path. For predicates (Int1Regs), this is because there is no other 1-bit type. For 128-bit registers, tensor-core values have no peer class to bitcast into. The Int32HalfRegs class shares its same-class opcode (39552) with Int32Regs because both emit .b32 copies -- the packed v2f16 value is simply treated as a 32-bit bitpattern for copying.
The five classes with distinct cross-class opcodes (Int32Regs, Int64Regs, Float32Regs, Float64Regs, Int16HalfRegs) are exactly those that participate in bitcast operations between integer and floating-point interpretations of the same bit width.
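The opcode table lends itself to a direct lookup. The sketch below deliberately takes a boolean for "cross-class" rather than a source/destination pair, since which operand's class keys the real dispatch is not established here; COPY_OPCODES, copy_opcode, and has_distinct_cross_path are hypothetical names.

```python
# Class name -> (same-class opcode, cross-class opcode), from the table above.
COPY_OPCODES = {
    "Int1Regs": (39424, 39424),
    "Int16Regs": (39296, 39296),
    "Int32Regs": (39552, 10816),
    "Int64Regs": (39680, 11008),
    "Float32Regs": (30656, 10880),
    "Float64Regs": (30784, 11072),
    "Int16HalfRegs": (30528, 10688),
    "Int32HalfRegs": (39552, 39552),
    "Int128Regs": (39168, 39168),
}


def copy_opcode(class_name: str, cross_class: bool) -> int:
    """Pick the same-class or cross-class opcode for a register class."""
    same, cross = COPY_OPCODES[class_name]
    return cross if cross_class else same


def has_distinct_cross_path(class_name: str) -> bool:
    """True when the class has a cross-class opcode different from its own."""
    same, cross = COPY_OPCODES[class_name]
    return same != cross
```

Filtering the table with has_distinct_cross_path yields the five classes named in the paragraph above.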
Register Declaration Emission -- sub_2158E80
During function body emission, sub_2158E80 (17KB) emits .reg declarations for every register class used by the function. The process:
- Iterate the register map at this+800 in the AsmPrinter state.
- Deduplicate classes using a hash table at this+808..832.
- Track the maximum index per class across all virtual registers.
- Emit one declaration per class in the format:
.reg .pred %p<5>; // 5 predicate registers (indices 0..4)
.reg .b16 %rs<12>; // 12 short integer registers
.reg .b32 %r<47>; // 47 general-purpose 32-bit
.reg .b64 %rd<8>; // 8 double-width integer
.reg .f32 %f<20>; // 20 single-precision float
.reg .f64 %fd<3>; // 3 double-precision float
.reg .b16 %h<4>; // 4 half-precision float
.reg .b32 %hh<2>; // 2 packed-pair registers
.reg .b128 %rq<1>; // 1 tensor-core 128-bit register
The count for each class is max_register_index + 1. The PTX declaration syntax %prefix<N> declares registers %prefix0 through %prefix(N-1).
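The max-index bookkeeping can be sketched as below. It assumes the input is a list of (class name, index) pairs and that declarations come out in class-table order; neither detail is confirmed for the real emitter, and CLASS_DECL and emit_register_declarations are hypothetical names.

```python
# Class name -> (PTX type, prefix), in class-table order.
CLASS_DECL = {
    "Int1Regs": (".pred", "%p"),
    "Int16Regs": (".b16", "%rs"),
    "Int32Regs": (".b32", "%r"),
    "Int64Regs": (".b64", "%rd"),
    "Float32Regs": (".f32", "%f"),
    "Float64Regs": (".f64", "%fd"),
    "Int16HalfRegs": (".b16", "%h"),
    "Int32HalfRegs": (".b32", "%hh"),
    "Int128Regs": (".b128", "%rq"),
}


def emit_register_declarations(registers):
    """Return one '.reg' line per class that appears in `registers`."""
    max_index = {}
    for class_name, index in registers:
        # Track the highest index seen for each class.
        max_index[class_name] = max(max_index.get(class_name, -1), index)
    lines = []
    for class_name, (ptx_type, prefix) in CLASS_DECL.items():
        if class_name in max_index:
            count = max_index[class_name] + 1  # indices 0..max inclusive
            lines.append(f".reg {ptx_type} {prefix}<{count}>;")
    return lines
```

A function whose highest %r index is 46 and highest %p index is 4 would therefore get %r<47> and %p<5>, regardless of how many distinct indices were actually used.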
Note that Int16HalfRegs and Int16Regs share the same PTX type suffix (.b16) but have different prefixes (%h vs %rs). Similarly, Int32HalfRegs and Int32Regs share .b32 but use %hh vs %r. The PTX assembler ptxas treats these as completely separate register namespaces -- the prefix, not the type, determines the namespace.
Stack pointer registers (%SP, %SPL) are emitted before the class declarations when the function has a non-zero local frame. These use .b64 in 64-bit mode or .b32 in 32-bit mode.
Per-Class Detail
Int1Regs -- Predicates
| Property | Value |
|---|---|
| Vtable | off_4A027A0 |
| PTX type | .pred |
| Prefix | %p |
| Tag | 0x1 |
| Width | 1 bit |
| Legal MVTs | i1 |
| Same-class copy | 39424 |
Predicate registers hold boolean values used for conditional branches (@%p1 bra target), select instructions (selp), and set-predicate results (setp). They are the only 1-bit registers in PTX. There is no cross-class copy path because no other class holds 1-bit values. The coalescer excludes predicates from cross-class analysis entirely.
Int16Regs -- Short Integers
| Property | Value |
|---|---|
| Vtable | off_4A02720 |
| PTX type | .b16 |
| Prefix | %rs |
| Tag | 0x2 |
| Width | 16 bits |
| Legal MVTs | i16 |
| Same-class copy | 39296 |
Short integer registers hold 16-bit integer values. PTX .param space widens all scalars below 32 bits to .b32, so %rs registers appear primarily in computation, not in function signatures. The prefix %rs (register-short) distinguishes these from %h (Int16HalfRegs) even though both declare as .b16.
Int32Regs -- General-Purpose 32-bit
| Property | Value |
|---|---|
| Vtable | off_4A025A0 |
| PTX type | .b32 |
| Prefix | %r |
| Tag | 0x3 |
| Width | 32 bits |
| Legal MVTs | i32 |
| Same-class copy | 39552 |
| Cross-class copy | 10816 |
The workhorse register class. Holds 32-bit integers, addresses in 32-bit mode, loop indices, and general computation results. Cross-class copy opcode 10816 handles bitcast to Float32Regs (%f).
Int64Regs -- Double-Width Integer
| Property | Value |
|---|---|
| Vtable | off_4A024A0 |
| PTX type | .b64 |
| Prefix | %rd |
| Tag | 0x4 |
| Width | 64 bits |
| Legal MVTs | i64 |
| Same-class copy | 39680 |
| Cross-class copy | 11008 |
Holds 64-bit integers and device pointers in 64-bit mode (the common case). Cross-class copy opcode 11008 handles bitcast to Float64Regs (%fd).
Float32Regs -- Single-Precision Float
| Property | Value |
|---|---|
| Vtable | off_4A02620 |
| PTX type | .f32 |
| Prefix | %f |
| Tag | 0x5 |
| Width | 32 bits |
| Legal MVTs | f32 |
| Same-class copy | 30656 |
| Cross-class copy | 10880 |
Holds IEEE 754 single-precision floats. Note the .f32 type suffix rather than .b32 -- PTX distinguishes float from bitwise register types even at the same width. Cross-class copy opcode 10880 handles bitcast to Int32Regs (%r).
Float64Regs -- Double-Precision Float
| Property | Value |
|---|---|
| Vtable | off_4A02520 |
| PTX type | .f64 |
| Prefix | %fd |
| Tag | 0x6 |
| Width | 64 bits |
| Legal MVTs | f64 |
| Same-class copy | 30784 |
| Cross-class copy | 11072 |
Holds IEEE 754 double-precision floats. Cross-class copy opcode 11072 handles bitcast to Int64Regs (%rd).
Int16HalfRegs -- Half-Precision Float
| Property | Value |
|---|---|
| Vtable | off_4A02760 |
| PTX type | .b16 |
| Prefix | %h |
| Tag | 0x7 |
| Width | 16 bits |
| Legal MVTs | f16, bf16 |
| Same-class copy | 30528 |
| Cross-class copy | 10688 |
Despite the Int16 in the TableGen-derived name, this class holds half-precision floating-point values (f16 and bf16). The .b16 PTX type (bitwise 16-bit) is used rather than a hypothetical .f16 because cicc declares every 16-bit register, integer or floating-point, as untyped .b16; Int16Regs uses the same type suffix. Only the %h prefix distinguishes these registers from %rs (Int16Regs). Cross-class copy opcode 10688 handles conversion to Int16Regs.
The semantic alias Float16Regs appears in some wiki pages and is equally valid.
Int32HalfRegs -- Packed Half-Precision Pairs
| Property | Value |
|---|---|
| Vtable | off_4A026A0 |
| PTX type | .b32 |
| Prefix | %hh |
| Tag | 0x8 |
| Width | 32 bits |
| Legal MVTs | v2f16, v2bf16, v2i16, v4i8 |
| Same-class copy | 39552 |
| Cross-class copy | 39552 |
This is the only register class for vector types on NVPTX. It holds exactly 32 bits of packed data: two f16 values, two bf16 values, two i16 values, or four i8 values. The %hh prefix distinguishes it from %r (Int32Regs). Both same-class and cross-class copy opcodes are 39552 (identical to Int32Regs same-class), because copies of packed values are simple 32-bit bitwise moves.
All vector types wider than 32 bits (v4f32, v2f64, v8i32, etc.) are illegal on NVPTX and must be split or scalarized during type legalization. See the vector legalization documentation for the split/scalarize dispatch.
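The legality rule amounts to membership in a four-entry set. The sketch below encodes only the types listed on this page; ELEMENT_BITS, LEGAL_VECTORS, is_legal_vector, and vector_bits are hypothetical names, and the real legalizer's decision logic is not reproduced.

```python
# Element type -> width in bits, for the types mentioned on this page.
ELEMENT_BITS = {"i8": 8, "i16": 16, "f16": 16, "bf16": 16,
                "i32": 32, "f32": 32, "f64": 64}

# (element type, lane count) pairs that Int32HalfRegs can hold.
LEGAL_VECTORS = {("f16", 2), ("bf16", 2), ("i16", 2), ("i8", 4)}


def vector_bits(element: str, lanes: int) -> int:
    """Total width of a vector type in bits."""
    return ELEMENT_BITS[element] * lanes


def is_legal_vector(element: str, lanes: int) -> bool:
    """True only for the packed 32-bit types listed as legal MVTs."""
    return (element, lanes) in LEGAL_VECTORS
```

Every legal entry is exactly 32 bits wide, while v4f32 (128 bits) and v2f64 (128 bits) fall outside the set and would be split or scalarized.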
The semantic alias Float16x2Regs appears in some wiki pages.
Int128Regs -- 128-bit Tensor Core Values
| Property | Value |
|---|---|
| Vtable | off_4A02460 |
| PTX type | .b128 |
| Prefix | %rq |
| Tag | 0x9 |
| Width | 128 bits |
| Legal MVTs | i128 (SM 70+) |
| Same-class copy | 39168 |
| Cross-class copy | 39168 |
The widest register class, introduced for tensor core operations on Volta (SM 70) and later architectures. Holds 128-bit values used as operands and accumulators in mma and wmma instructions. The %rq prefix stands for "register quad" (4x32 bits). There is no cross-class copy path because no other class holds 128-bit values.
During register coalescing, 128-bit values are tracked as wide register pairs (two 64-bit halves). The coalescer at sub_3497B40 handles paired-register decomposition: when coalescing the low half, the high half inherits corresponding constraints.
An earlier raw report (p2c.5-01-register-alloc.txt) labeled off_4A02460 as SpecialRegs. This was an error in that report's identification. The vtable off_4A02460 emits .b128 / %rq, which is the 128-bit class for tensor core values, not a class for special/environment registers.
The Internal-Only Class -- off_4A026E0
| Property | Value |
|---|---|
| Vtable | off_4A026E0 |
| PTX type | "!Special!" |
| Prefix | "!Special!" |
| Encoded ID | None |
A tenth vtable address appears in the register info initialization path (sub_2163AB0). Both sub_2163730 and sub_21638D0 return the sentinel string "!Special!" for this vtable. It has no encoded ID, no PTX declaration, and never produces emitted registers. The string "ENVREG10" at register info offset +72 (alongside "Int1Regs" at offset +80) suggests this class is associated with environment registers -- hardware-defined read-only registers like %tid, %ctaid, %ntid, etc. These are emitted by dedicated special-register emission functions (sub_21E86B0, sub_21E9060) rather than through the register class encoding path.
Register Info Initialization
NVPTXRegisterInfo objects are created by two factory functions corresponding to the two pass manager generations:
| | Legacy PM | New PM |
|---|---|---|
| Factory | sub_2149CD0 | sub_301F0C0 |
| Init | sub_2163AB0 | sub_30590F0 |
| Object size | 224 bytes | 248 bytes |
Both call sub_1F4A910 (TargetRegisterInfo::InitMCRegisterInfo) with the register descriptor table at off_49D26D0 and register unit data at unk_4327AF0. Key fields in the initialized structure:
| Offset | Content |
|---|---|
| +44 | NumRegs (total register count) |
| +72 | "ENVREG10" (environment register class name) |
| +80 | "Int1Regs" (first register class name) |
| +96 | numRegClasses (initially 1, expanded during init) |
Coalescing Constraints
The register coalescer imposes these constraints based on register class:
| Class | Coalesceable | Constraint Flag (offset +3, mask 0x10) |
|---|---|---|
| Int1Regs | Same class only | Set |
| Int16Regs | Same class only | Set |
| Int32Regs | Same class only | Set (type code 12) |
| Int64Regs | Same class only | Set (type code 13) |
| Float32Regs | Same class only | Set (type code 15) |
| Float64Regs | Same class only | Set |
| Int16HalfRegs | Same class only | Set |
| Int32HalfRegs | Same class only | Set |
| Int128Regs | Never coalesced | Cleared |
Int128Regs (the class at off_4A02460, previously mislabeled SpecialRegs in the coalescing page) has its constraint flag cleared, excluding it from the coalescing worklist entirely. This makes sense: tensor-core 128-bit values have specific register-pair relationships that the coalescer must not disturb.
Cross-class copies between Int32Regs/Float32Regs and between Int64Regs/Float64Regs are bitcasts that the coalescer never eliminates -- they must survive as explicit PTX mov instructions because the source and destination live in different register namespaces.
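The constraint flag test is a single masked byte read. The sketch assumes a descriptor laid out as raw bytes with the flag byte at offset +3, as the table header states; what structure that offset belongs to is not established here, and is_coalescing_candidate is a hypothetical name.

```python
COALESCE_FLAG_OFFSET = 3     # byte offset of the flag field (from the table header)
COALESCE_FLAG_MASK = 0x10    # bit tested by the coalescer


def is_coalescing_candidate(descriptor: bytes) -> bool:
    """True when the constraint flag is set, i.e. same-class coalescing is allowed."""
    return bool(descriptor[COALESCE_FLAG_OFFSET] & COALESCE_FLAG_MASK)
```

A descriptor with the bit cleared, as described for Int128Regs, would be skipped before it ever reaches the worklist.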
Differences from Upstream LLVM NVPTX
The upstream LLVM NVPTX backend (as of LLVM 20.0.0) defines these register classes in NVPTXRegisterInfo.td:
- Int1Regs, Int16Regs, Int32Regs, Int64Regs -- identical.
- Float16Regs, Float16x2Regs -- upstream names for cicc's Int16HalfRegs / Int32HalfRegs. The rename reflects NVIDIA's preference for the TableGen-derived integer-typed names.
- Float32Regs, Float64Regs -- identical.
- Int128Regs -- present in upstream, matches cicc.
- SpecialRegs -- upstream does define a class of this name in NVPTXRegisterInfo.td. It holds the ENVREG registers and the frame registers, and upstream's class-name helpers return "!Special!" for it. That matches the behavior of the internal-only class at off_4A026E0, so that vtable is most plausibly cicc's copy of SpecialRegs rather than an NVIDIA-specific addition.
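The encoding scheme (4-bit tag in [31:28], 28-bit index in [27:0]) and the fatal "Bad register class" error path are not NVIDIA additions: upstream's NVPTXAsmPrinter::encodeVirtualRegister packs a class tag into the upper 4 bits in the same way and reports the same fatal error for an unrecognized class. Any differences are confined to which classes receive which tag.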
Function Map
| Function | Address | Size | Role |
|---|---|---|---|
| Register class encoding (class tag OR index) | sub_21583D0 | 4.6KB | -- |
| Register class -> PTX type suffix (.pred, .b32, .f32, ...) | sub_2163730 | 1.7KB | -- |
| Register class -> PTX prefix (%p, %r, %f, ...) | sub_21638D0 | 1.6KB | -- |
| Copy opcode dispatch by register class | sub_2162350 | 3.0KB | -- |
| Stack frame + register declaration emission | sub_2158E80 | 17KB | -- |
| NVPTXRegisterInfo init (legacy PM) | sub_2163AB0 | 1.1KB | -- |
| NVPTXRegisterInfo factory (legacy PM) | sub_2149CD0 | -- | -- |
| NVPTXRegisterInfo init (new PM) | sub_30590F0 | -- | -- |
| NVPTXRegisterInfo factory (new PM) | sub_301F0C0 | -- | -- |
| TargetRegisterInfo::InitMCRegisterInfo | sub_1F4A910 | -- | -- |
| Special register emission (%tid, %ctaid, %ntid, %nctaid) | sub_21E86B0 | -- | -- |
| Cluster register emission (SM 90+) | sub_21E9060 | -- | -- |
Cross-References
- Register Allocation -- greedy RA that operates on these classes; pressure tracking and -maxreg constraint
- Register Coalescing -- same-class-only coalescing policy, copy opcode classification
- PTX Emission -- function header orchestrator that calls the register declaration emitter
- AsmPrinter -- per-instruction emission that calls the encoding function
- Type Legalization -- vector type legalization driven by the Int32HalfRegs-only vector model
- NVPTX Target Infrastructure -- NVPTXTargetMachine that owns the register info objects