R_CUDA Relocations
nvlink defines 119 CUDA-specific ELF relocation types for the EM_CUDA (190) machine type. These types are stored in .rela.* sections of device ELF (cubin) files and are consumed by the relocation engine during the link phase. Each type encodes how a resolved symbol address is patched into the instruction stream or data section: the bit-field width, the bit-field position within the 64- or 128-bit instruction word, and the computation to perform (absolute, PC-relative, hi/lo split, etc.).
The types are organized into two global descriptor tables baked into the nvlink binary's .rodata segment. A validation/dispatch function at sub_42F6C0 selects between them based on whether the relocation is a standard code/data relocation or a section-attribute relocation. The application engine at sub_468760 reads 64-byte per-type descriptors from these tables and performs up to three sequential bit-field patching actions per relocation.
Key Facts
| Property | Value |
|---|---|
| Machine type | EM_CUDA (190) |
| Total unique type names | 119 |
| Standard relocation table | off_1D37600 (117 entries, index 0--116) |
| Attribute relocation table | off_1D371E0 (65 entries, index 0--64) |
| Attribute type offset | 0x10000 (attribute type = standard type + 65536) |
| Descriptor size | 64 bytes per type (12-byte header + 3 actions x 16 bytes + 4-byte sentinel) |
| Validation function | sub_42F6C0 at 0x42F6C0 |
| Architecture class function | sub_42F8C0 at 0x42F8C0 |
| Max-type-for-class function | sub_42F690 at 0x42F690 |
| Application engine | sub_468760 at 0x468760 (14,322 bytes) |
| CUDA descriptor table | off_1D3DBE0 (used by relocation engine) |
| Mercury descriptor table | off_1D3CBE0 (Mercury types are CUDA types + 0x10000) |
Naming Convention
Every R_CUDA type name follows a systematic pattern:
R_CUDA_<category><bits>_<bitposition>
The components are:
- Category: the semantic class of the relocation (ABS, G, FUNC_DESC, TEX, etc.)
- Bits: the width of the relocated value in bits (8, 16, 19, 20, 21, 22, 24, 32, 47, 55, 56, 64, 128)
- Bit position: the starting bit offset within the instruction word where the value is inserted
For example, R_CUDA_ABS32_20 means: patch bits [20:52) of the instruction word with a 32-bit absolute address. R_CUDA_PCREL_IMM24_23 means: compute a PC-relative offset, take 24 bits, and insert starting at bit 23.
Some types use a compound suffix with _HI or _LO to indicate which half of a 32-bit value is being patched (high 16 bits or low 16 bits).
Dual Descriptor Tables
Standard Table (off_1D37600)
The standard relocation table at off_1D37600 contains 117 entries (indices 0 through 116, validated against limit 0x75 = 117). Each entry is a pointer pair in the table: the first pointer is the type name string (e.g., "R_CUDA_ABS32_20"), and additional fields encode the relocation class and architecture compatibility.
The validation function sub_42F6C0 checks:
// Standard relocation path
if (!is_attribute) {
if (type_index >= 117) // limit 0x75
error("unknown attribute");
entry = &off_1D37600[2 * type_index];
if (entry->arch_class > target_class)
warning("Relocation %s not supported on %s", entry->name, class_name);
}
Attribute Table (off_1D371E0)
The attribute relocation table at off_1D371E0 contains 65 entries (indices 0 through 64, validated against limit 0x41 = 65). Attribute relocations are identified by having their type encoded with the 0x10000 offset -- when the relocation engine encounters a type >= 0x10000, it subtracts 0x10000 and uses this table instead.
// Attribute relocation path (type >= 0x10000)
type_index -= 0x10000;
if (type_index >= 65) // limit 0x41
error("unknown attribute");
entry = &off_1D371E0[2 * type_index];
Attribute relocations apply to .nv.info.* attribute sections rather than to instruction streams. A separate validation function at sub_42F760 handles attribute-specific compatibility checking with a three-way dispatch based on the attribute usage field (dword_1D37D68[4 * type + 1]): value 0 = warning, value 1 = error, value 2 = silent ignore.
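The two-table split described above can be sketched as a small classifier. This is a minimal illustration rather than the decompiled routine: the enum, struct, and function names are hypothetical, and only the 0x10000 offset and the limits 117 and 65 are taken from the text.

```c
#include <stdbool.h>
#include <stdint.h>

#define ATTR_TYPE_OFFSET 0x10000u  /* attribute types start here */
#define STD_TYPE_LIMIT   117u      /* 0x75, standard table size */
#define ATTR_TYPE_LIMIT  65u       /* 0x41, attribute table size */

/* Hypothetical result type; the real function works on table pointers. */
typedef enum { RELOC_TABLE_STANDARD, RELOC_TABLE_ATTRIBUTE } reloc_table_kind;

typedef struct {
    reloc_table_kind table;  /* which descriptor table to consult */
    uint32_t index;          /* index within that table */
    bool valid;              /* false when the index is out of range */
} reloc_lookup;

/* Pick the table from the raw type and normalize the index. */
reloc_lookup classify_reloc_type(uint32_t type)
{
    reloc_lookup r;
    if (type >= ATTR_TYPE_OFFSET) {
        r.table = RELOC_TABLE_ATTRIBUTE;
        r.index = type - ATTR_TYPE_OFFSET;
        r.valid = r.index < ATTR_TYPE_LIMIT;
    } else {
        r.table = RELOC_TABLE_STANDARD;
        r.index = type;
        r.valid = r.index < STD_TYPE_LIMIT;
    }
    return r;
}
```

A caller would report the "unknown attribute" error when `valid` is false and otherwise index the selected table with `index`.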
Architecture Class System
The function at sub_42F8C0 maps an SM version number to an architecture class used for relocation compatibility checking:
int reloc_arch_class(int sm_version) {
if (sm_version == 0) return 0; // invalid / unset
if (sm_version <= 70) return 1; // Kepler through Volta (sm_30--sm_70)
if (sm_version <= 72) return 2; // Volta extended (sm_72)
if (sm_version >= 76) return 5; // Ampere+ (sm_80--sm_90+)
return 3; // Turing (sm_75)
}
Each descriptor entry stores a minimum architecture class. The validation function compares the entry's class against the target to ensure the relocation type is supported on the architecture being linked. The five class names are stored in a string pointer array at off_1D371A0 (indexed 0--4), used in error/warning messages.
The maximum valid relocation index varies by architecture class. The function sub_42F690 scans backward from index 115 (the last non-special standard type) through the descriptor table, returning the first index whose architecture class is not 5 (the highest). This determines which types are valid for a given target.
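The backward scan attributed to sub_42F690 might look roughly like the following. The entry struct is a hypothetical stand-in for the real table layout, and the -1 fallback for a table where every entry has the top class is an assumption, since the text does not say what happens in that case.

```c
#include <stdint.h>

/* Hypothetical stand-in for one standard-table entry. */
typedef struct {
    const char *name;
    uint32_t arch_class;
} reloc_type_entry;

/* Walk down from `start` and return the first index whose class is not
 * `top_class`. Returns -1 (assumed) if every entry down to 0 matches. */
int last_index_below_class(const reloc_type_entry *table, int start,
                           uint32_t top_class)
{
    for (int i = start; i >= 0; --i) {
        if (table[i].arch_class != top_class)
            return i;
    }
    return -1;
}
```

With `start = 115` and `top_class = 5`, this mirrors the description: trailing class-5 entries are skipped and the highest index usable by older targets is returned.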
Descriptor Format
Each relocation type has a 64-byte descriptor in the application engine's table (off_1D3DBE0 for CUDA, off_1D3CBE0 for Mercury). The descriptor is divided into a 12-byte header followed by three 16-byte action slots and a 4-byte sentinel:
Descriptor (64 bytes total):
+0 Header (12 bytes)
+0 uint32_t field_0; // Used by resolved-rela emitter (sub_46ADC0)
+4 uint32_t field_1; // Used by resolved-rela emitter
+8 uint32_t field_2; // Used by resolved-rela emitter
+12 Action 0 (16 bytes)
+12 uint32_t bit_offset; // Starting bit position in instruction word
+16 uint32_t bit_width; // Number of bits to patch
+20 uint32_t action_type; // Operation code (see table below)
+24 uint32_t reserved; // Padding / flags
+28 Action 1 (16 bytes)
+28 uint32_t bit_offset;
+32 uint32_t bit_width;
+36 uint32_t action_type;
+40 uint32_t reserved;
+44 Action 2 (16 bytes)
+44 uint32_t bit_offset;
+48 uint32_t bit_width;
+52 uint32_t action_type;
+56 uint32_t reserved;
+60 Sentinel (4 bytes, marks end of action array)
The application engine (sub_468760) iterates the three action slots sequentially. Action type 0x00 marks an unused slot: it does not stop the loop by itself; the engine skips to the next slot and terminates only when the slot pointer reaches the sentinel. The engine indexes into the descriptor table and positions its action pointer and sentinel:
descriptor_base = table + (type_index << 6); // type_index * 64
action_ptr = descriptor_base + 12; // first action at byte offset +12
end_ptr = descriptor_base + 60; // sentinel at byte offset +60
The header fields at offsets +0, +4, +8 are not used by the application engine itself. They are consumed by the resolved-rela emitter (sub_46ADC0) during --preserve-relocs processing. The emitter also reads the action slots as up to three (bit_offset, bit_width, present-flag) triples for extracting the already-patched instruction fields back into addend records. The mapping is: descriptor uint32 indices [3,4,5] = action 0's extraction spec, [7,8,9] = action 1's, [11,12,13] = action 2's, where the third element of each triple (the action_type word) is the "present" flag gating whether extraction occurs.
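The 64-byte layout can be written down as C structs with compile-time checks on the offsets. This is a sketch based only on the layout listed above; the type and function names are hypothetical and do not come from the binary.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t bit_offset;   /* starting bit in the instruction word */
    uint32_t bit_width;    /* number of bits to patch */
    uint32_t action_type;  /* 0x00 means the slot is unused */
    uint32_t reserved;
} reloc_action;

typedef struct {
    uint32_t header[3];       /* +0, +4, +8 */
    reloc_action actions[3];  /* +12, +28, +44 */
    uint32_t sentinel;        /* +60 */
} reloc_descriptor;

_Static_assert(sizeof(reloc_descriptor) == 64, "descriptor must be 64 bytes");
_Static_assert(offsetof(reloc_descriptor, actions) == 12, "actions at +12");
_Static_assert(offsetof(reloc_descriptor, sentinel) == 60, "sentinel at +60");

/* Same arithmetic as the engine: table + (type_index << 6). */
const reloc_descriptor *descriptor_for(const void *table, uint32_t type_index)
{
    return (const reloc_descriptor *)((const uint8_t *)table +
                                      ((size_t)type_index << 6));
}

/* Count the slots whose action_type is not 0x00. */
int count_active_actions(const reloc_descriptor *d)
{
    int n = 0;
    for (int i = 0; i < 3; ++i)
        if (d->actions[i].action_type != 0)
            ++n;
    return n;
}
```

Because every member is a `uint32_t`, the compiler inserts no padding, so indexing an array of `reloc_descriptor` is equivalent to the shift-by-6 address computation.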
Action Slot Processing Pseudocode
The core loop in sub_468760 processes each action slot in order. The following pseudocode captures the complete dispatch:
int reloc_apply_engine(
void* table, // descriptor table base (off_1D3DBE0 or off_1D3CBE0)
uint32_t type_index, // normalized relocation type index
bool is_absolute, // true if symbol has absolute address
uint64_t* patch_ptr, // pointer into section data (instruction words)
int64_t extra_offset, // reloc record extra field
int section_offset, // section base / PC address
uint64_t symbol_value, // resolved symbol address (S)
uint32_t symbol_size, // symbol st_size
uint32_t section_type, // section_type - 0x6FFFFF84
int64_t* output_value // receives extracted original value
) {
// Compute initial relocation value
uint64_t value = symbol_value;
if (is_absolute)
value = symbol_value + extra_offset; // S + A
uint8_t* desc = table + (type_index << 6);
uint32_t* action = (uint32_t*)(desc + 12); // first action slot
uint32_t* end = (uint32_t*)(desc + 60); // sentinel
*output_value = 0;
while (action != end) {
uint32_t bit_off = action[0];
uint32_t bit_wid = action[1];
uint32_t act_type = action[2];
switch (act_type) {
case 0x00: // END -- skip this slot, continue to next
action += 4;
break;
case 0x01: // ABS_FULL
case 0x12: // ABS_FULL (alias)
case 0x2E: // ABS_FULL (alias)
// Fast path: full 64-bit word write
if (bit_off == 0 && bit_wid == 64) {
if (!is_absolute) {
*output_value = *patch_ptr;
value += *patch_ptr;
}
*patch_ptr = value;
action += 4;
break;
}
// Narrow field: extract old, add, write back
if (!is_absolute) {
int64_t old = bitfield_extract(patch_ptr, bit_off, bit_wid);
value += old;
*output_value = old;
}
action += 4;
bitfield_write(patch_ptr, value, bit_off, bit_wid);
break;
case 0x06: // ABS_LO -- low bits of (S + A)
case 0x37: // ABS_LO (alias)
if (!is_absolute) {
int64_t old = bitfield_extract(patch_ptr, bit_off, bit_wid);
*output_value = old;
value += old;
}
// Write low 64 bits of value
bitfield_write(patch_ptr, (uint64_t)value, bit_off, bit_wid);
action += 4;
break;
case 0x07: // ABS_HI -- high 32 bits of (S + A)
case 0x38: // ABS_HI (alias)
if (!is_absolute) {
int64_t old = bitfield_extract(patch_ptr, bit_off, bit_wid);
*output_value = old;
value = (value >> 32) + old;
} else {
value = value >> 32;
}
bitfield_write(patch_ptr, value, bit_off, bit_wid);
action += 4;
break;
case 0x08: // ABS_SIZE -- S + A + symbol_size
if (!is_absolute) {
int64_t old = bitfield_extract(patch_ptr, bit_off, bit_wid);
value = old + symbol_size;
*output_value = old;
} else {
value = extra_offset + symbol_size;
}
bitfield_write(patch_ptr, value, bit_off, bit_wid);
action += 4;
break;
case 0x09: // SHIFTED_2 -- (S + A) >> 2
value >>= 2;
if (!is_absolute) {
int64_t old = bitfield_extract(patch_ptr, bit_off, bit_wid);
value += old;
*output_value = old;
}
bitfield_write(patch_ptr, value, bit_off, bit_wid);
action += 4;
break;
case 0x0A: // SEC_TYPE_LO -- low nybble of section type delta
value = section_type & (uint64_t)(0xFF >> (8 - bit_wid));
if (!is_absolute)
value += bitfield_extract(patch_ptr, bit_off, bit_wid);
bitfield_write(patch_ptr, value, bit_off, bit_wid);
action += 4;
break;
case 0x0B: // SEC_TYPE_HI -- high nybble of section type delta
value = (section_type >> 4) & (uint64_t)(0xFF >> (8 - bit_wid));
if (!is_absolute)
value += bitfield_extract(patch_ptr, bit_off, bit_wid);
bitfield_write(patch_ptr, value, bit_off, bit_wid);
action += 4;
break;
case 0x10: // PC_REL -- (int32_t)(S + A) - PC
if (!is_absolute) {
int64_t old = bitfield_extract(patch_ptr, bit_off, bit_wid);
value += old;
*output_value = old;
}
value = (int32_t)value - section_offset;
bitfield_write(patch_ptr, value, bit_off, bit_wid);
action += 4;
break;
case 0x13: // CLEAR -- zero the bit-field
case 0x14: // CLEAR (alias)
bitfield_write(patch_ptr, 0, bit_off, bit_wid);
action += 4;
break;
case 0x16: case 0x17: case 0x18: case 0x19: // MASKED_SHIFT 0-3
case 0x1A: case 0x1B: case 0x1C: case 0x1D: // MASKED_SHIFT 4-7
case 0x2F: case 0x30: case 0x31: case 0x32: // MASKED_SHIFT 8-11
case 0x33: case 0x34: case 0x35: case 0x36: // MASKED_SHIFT 12-15
{
int idx = act_type - 22;
if (!is_absolute) {
int64_t old = bitfield_extract(patch_ptr, bit_off, bit_wid);
value += old;
*output_value = old;
}
value = (value & mask_table[idx]) >> shift_table[idx];
bitfield_write(patch_ptr, value, bit_off, bit_wid);
action += 4;
break;
}
default:
return 0; // Unknown action -- caller emits "unexpected NVRS"
}
if (action == end)
return 1;
}
return 1;
}
Action Types
| Code | Name | Computation |
|---|---|---|
| 0x00 | end | Skip slot; terminate if at sentinel |
| 0x01 | abs_full | S + A -- full absolute, store all bits |
| 0x06 | abs_lo | (S + A) & mask -- low bits of absolute |
| 0x07 | abs_hi | ((S + A) >> 32) & mask -- high 32 bits of absolute |
| 0x08 | abs_size | S + A + symbol_size -- absolute plus symbol size addend |
| 0x09 | abs_shifted | (S + A) >> 2 -- absolute right-shifted by 2 (4-byte aligned) |
| 0x0A | sec_type_lo | section_type & (0xFF >> (8 - width)) -- low nybble of section type |
| 0x0B | sec_type_hi | (section_type >> 4) & (0xFF >> (8 - width)) -- high nybble of section type |
| 0x10 | pc_rel | (int32_t)(S + A) - PC -- PC-relative offset |
| 0x12 | abs_full | Alias of 0x01 (same behavior) |
| 0x13 | clear | Zero the bit-field (write 0) |
| 0x14 | clear | Alias of 0x13 (same behavior) |
| 0x16--0x1D | masked_shift_0..7 | ((S + A) & mask_table[n]) >> shift_table[n] |
| 0x2E | abs_full | Alias of 0x01 (same behavior, different encoding) |
| 0x2F--0x36 | masked_shift_8..15 | ((S + A) & mask_table[n]) >> shift_table[n] |
| 0x37 | abs_lo | Alias of 0x06 |
| 0x38 | abs_hi | Alias of 0x07 |
The masked-shift actions (codes 0x16--0x1D and 0x2F--0x36) use a pair of SSE constant vectors loaded from xmmword_1D3F8E0 through xmmword_1D3F930. These contain mask and shift values indexed by (action_type - 22), enabling a single code path to handle 16 different extract-and-place patterns for multi-field instruction encodings.
Bit-field Patching
The engine uses two helper functions for the actual bit manipulation:
- sub_468670 (bitfield_extract): extracts the current value of a bit-field from the instruction word
- sub_4685B0 (bitfield_write): splices a new value into a bit-field
Both handle fields that span 64-bit word boundaries. The instruction data is treated as an array of uint64_t words (little-endian). The general algorithms:
Extraction (sub_468670): Given a starting bit position and width, extract the field value:
int64_t bitfield_extract(uint64_t* words, int bit_offset, int bit_width) {
// Normalize: advance pointer past full 64-bit words
if (bit_offset >= 64) {
words += (bit_offset / 64);
bit_offset = bit_offset % 64;
}
int end_bit = bit_offset + bit_width;
// Single-word case: field fits within one uint64_t
if (end_bit <= 64)
return *words << (64 - end_bit) >> (64 - bit_width);
// Multi-word case: recursive split across 64-bit boundary
int64_t low = bitfield_extract(words, bit_offset, 64 - bit_offset);
int64_t high;
if (end_bit - 64 > 64) {
// Three-word span (up to 192 bits, theoretical max)
int64_t mid = bitfield_extract(words + 1, 0, 64);
high = bitfield_extract(words + 2, 0, end_bit - 128);
} else {
// Two-word span (common case for 128-bit instructions)
high = words[1] << (128 - end_bit) >> (64 - (end_bit - 64));
}
return low | (high << (64 - bit_offset));
}
Insertion (sub_4685B0): Given a value, starting bit position, and width, splice the value into the instruction words:
void bitfield_write(uint64_t* words, uint64_t value, int bit_offset, int bit_width) {
// Normalize: advance pointer past full 64-bit words
if (bit_offset >= 64) {
words += (bit_offset / 64);
bit_offset = bit_offset % 64;
}
int end_bit = bit_offset + bit_width;
// Multi-word case: iterate through intermediate words
if (end_bit > 64) {
uint64_t* end_word = words + ((end_bit - 65) / 64) + 1;
int off = bit_offset;
while (words != end_word) {
// Preserve bits below bit_offset, fill above with value
*words = (*words & ~(-1ULL << off)) | (value << off);
value >>= (64 - off);
off = 0;
words++;
}
end_bit = end_bit - (((end_bit - 65) / 64) * 64) - 64;
bit_width = end_bit;
}
// Final (or only) word: read-modify-write with constructed mask
// mask = bit_width ones positioned at [end_bit - bit_width, end_bit)
uint64_t mask = (-1ULL << (64 - bit_width)) >> (64 - end_bit);
*words = (*words & ~mask) | ((value << (64 - bit_width)) >> (64 - end_bit));
}
The mask formula (-1ULL << (64 - W)) >> (64 - E) where E = offset + W creates a window of W ones starting at bit position E - W. The value is aligned to the same position using an identical pair of shifts. This is a standard bit-field insertion idiom that avoids branching.
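The two-shift idiom can be checked in isolation for the single-word case. The sketch below is not the decompiled helper; `field_mask` and `field_insert` are hypothetical names, and the asserts encode the assumption that the field lies entirely within one 64-bit word.

```c
#include <assert.h>
#include <stdint.h>

/* `width` ones occupying bits [offset, offset + width) of a 64-bit word. */
uint64_t field_mask(unsigned offset, unsigned width)
{
    assert(width >= 1 && offset + width <= 64);
    unsigned end = offset + width;
    return (~0ULL << (64 - width)) >> (64 - end);
}

/* Replace the field in `word` with the low `width` bits of `value`. */
uint64_t field_insert(uint64_t word, uint64_t value,
                      unsigned offset, unsigned width)
{
    uint64_t mask = field_mask(offset, width);
    unsigned end = offset + width;
    /* Left shift drops bits above `width`; right shift aligns to `offset`. */
    uint64_t placed = (value << (64 - width)) >> (64 - end);
    return (word & ~mask) | placed;
}
```

Both shift counts stay in the range 0 to 63 under the stated precondition, so the expression avoids undefined behavior even for a full 64-bit field at offset 0.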
Worked Example: R_CUDA_ABS32_26
R_CUDA_ABS32_26 (standard table index 5) is a common relocation type that patches a 32-bit absolute address into a SASS instruction starting at bit 26. This section traces the complete path from relocation record to patched instruction.
Scenario: A MOV instruction at offset 0x100 in .text references a global symbol _Z10my_kernelPi resolved to address 0x0000_0000_DEAD_BEEF. The .rela.text section contains:
Elf64_Rela {
r_offset = 0x100, // instruction offset within .text
r_info = (sym << 32) | 5, // type = 5 (R_CUDA_ABS32_26)
r_addend = 0 // no addend
}
Step 1: Descriptor lookup. The relocation engine selects the CUDA descriptor table at off_1D3DBE0 and computes the descriptor address:
descriptor = off_1D3DBE0 + (5 << 6) = off_1D3DBE0 + 320
The 64-byte descriptor for type 5 contains (reconstructed from the type semantics):
Offset Bytes (hex) Interpretation
------ ----------- --------------
+0 xx xx xx xx xx xx xx xx xx xx xx xx Header (12 bytes, used by sub_46ADC0)
+12 1A 00 00 00 action[0].bit_offset = 26
+16 20 00 00 00 action[0].bit_width = 32
+20 01 00 00 00 action[0].action_type = 0x01 (ABS_FULL)
+24 00 00 00 00 action[0].reserved = 0
+28 00 00 00 00 action[1].bit_offset = 0
+32 00 00 00 00 action[1].bit_width = 0
+36 00 00 00 00 action[1].action_type = 0x00 (END)
+40 00 00 00 00 action[1].reserved = 0
+44 00 00 00 00 action[2].bit_offset = 0
+48 00 00 00 00 action[2].bit_width = 0
+52 00 00 00 00 action[2].action_type = 0x00 (END)
+56 00 00 00 00 action[2].reserved = 0
+60 xx xx xx xx Sentinel (4 bytes)
The engine positions its pointers: action_ptr = descriptor + 12, end_ptr = descriptor + 60.
Step 2: Value computation. The relocation is not absolute (is_absolute = false), so value = symbol_value = 0xDEAD_BEEF.
Step 3: Action dispatch. The engine reads action slot 0:
- bit_offset = 26
- bit_width = 32
- action_type = 0x01 (ABS_FULL)
This is not the fast path (bit_offset != 0 || bit_width != 64), so the engine takes the narrow-field path.
Step 4: Extract old value. Since is_absolute == false, the engine extracts the existing 32-bit field from the instruction word. Assume the instruction word at patch_ptr is 0x0000_0000_0000_0000 (field pre-initialized to zero):
old = bitfield_extract(patch_ptr, 26, 32)
With end_bit = 26 + 32 = 58 <= 64, this is the single-word case:
old = *patch_ptr << (64 - 58) >> (64 - 32)
= 0 << 6 >> 32
= 0
The engine computes value = value + old = 0xDEAD_BEEF + 0 = 0xDEAD_BEEF and stores *output_value = 0.
Step 5: Write new value. The engine constructs a mask and writes the value into the instruction word. With bit_offset = 26, bit_width = 32, end_bit = 58:
mask = (-1ULL << (64 - 32)) >> (64 - 58)
= 0xFFFF_FFFF_0000_0000 >> 6
= 0x03FF_FFFF_FC00_0000
value_positioned = (0xDEAD_BEEF << (64 - 32)) >> (64 - 58)
= (0xDEAD_BEEF_0000_0000) >> 6
= 0x037A_B6FB_BC00_0000
*patch_ptr = (*patch_ptr & ~mask) | value_positioned
= (0 & 0xFC00_0000_03FF_FFFF) | 0x037A_B6FB_BC00_0000
= 0x037A_B6FB_BC00_0000
The 32 bits of 0xDEAD_BEEF now occupy bits [26:58) of the instruction word:
Bit layout of patched instruction word:
bits [63:58] = 0b000000 (unchanged)
bits [57:26] = 0xDEAD_BEEF (patched value)
bits [25:0] = 0x0000000 (unchanged)
Step 6: Advance and terminate. The action pointer advances by 16 bytes to action slot 1, which has action_type = 0x00 (END). The engine skips to the next slot. The action pointer advances to offset +44 (action slot 2), which is also END. After advancing once more, action_ptr == end_ptr (offset +60), so the engine returns 1 (success).
Multi-word variant: If the relocation were R_CUDA_ABS47_34 (47-bit field at bit offset 34), the field would span two 64-bit words: bits [34:64) in word 0 (30 bits) and bits [0:17) in word 1 (17 bits). The engine would call bitfield_write which iterates: first writing the low 30 bits into word 0 at offset 34, then shifting value right by 30 and writing the remaining 17 bits into word 1 at offset 0.
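The two-word split can be illustrated with a simplified pair of helpers that handle only this case: a field that starts in one word and ends in the next. This is a sketch under that assumption, with hypothetical function names; it is not a transcription of sub_4685B0 or sub_468670.

```c
#include <assert.h>
#include <stdint.h>

/* Write a field that starts in words[0] and ends in words[1]. */
void write_split_field(uint64_t words[2], uint64_t value,
                       unsigned offset, unsigned width)
{
    assert(offset < 64 && width <= 64 && offset + width > 64);
    unsigned low_bits = 64 - offset;        /* bits that fit in words[0] */
    unsigned high_bits = width - low_bits;  /* bits spilling into words[1] */
    uint64_t low_keep = (1ULL << offset) - 1;
    uint64_t high_mask = (1ULL << high_bits) - 1;

    /* Keep bits below the field, place the low part of value above them. */
    words[0] = (words[0] & low_keep) | (value << offset);
    /* Keep bits above the field, place the remaining part at bit 0. */
    words[1] = (words[1] & ~high_mask) | ((value >> low_bits) & high_mask);
}

/* Read the same field back as a single integer. */
uint64_t read_split_field(const uint64_t words[2],
                          unsigned offset, unsigned width)
{
    assert(offset < 64 && width <= 64 && offset + width > 64);
    unsigned low_bits = 64 - offset;
    unsigned high_bits = width - low_bits;
    uint64_t high_mask = (1ULL << high_bits) - 1;
    return (words[0] >> offset) | ((words[1] & high_mask) << low_bits);
}
```

For a 47-bit field at offset 34, `low_bits` is 30 and `high_bits` is 17, matching the split described above.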
Relocation Categories
The 119 types fall into 15 semantic categories based on their name prefix and purpose.
No-op and Sentinel
| Index | Name | Description |
|---|---|---|
| 0 | R_CUDA_NONE | No relocation (placeholder / deleted) |
| 116 | R_CUDA_NONE_LAST | Sentinel marking end of valid type range |
Full-Width Data Relocations
These apply to data sections (.nv.global, .nv.constant*, etc.) rather than instructions.
| Index | Name | Bits | Description |
|---|---|---|---|
| 1 | R_CUDA_32 | 32 | 32-bit absolute address |
| 2 | R_CUDA_32_HI | 16 | Upper 16 bits of 32-bit address |
| 3 | R_CUDA_32_LO | 16 | Lower 16 bits of 32-bit address |
| 4 | R_CUDA_64 | 64 | 64-bit absolute address |
Byte-Level Relocations (R_CUDA_8_*)
Byte-granularity relocations for patching individual bytes within data structures, typically in descriptor tables or attribute sections.
| Index | Name | Byte offset | Description |
|---|---|---|---|
| 5 | R_CUDA_8_0 | 0 | Byte at offset 0 |
| 6 | R_CUDA_8_8 | 1 | Byte at offset 8 bits |
| 7 | R_CUDA_8_16 | 2 | Byte at offset 16 bits |
| 8 | R_CUDA_8_24 | 3 | Byte at offset 24 bits |
| 9 | R_CUDA_8_32 | 4 | Byte at offset 32 bits |
| 10 | R_CUDA_8_40 | 5 | Byte at offset 40 bits |
| 11 | R_CUDA_8_48 | 6 | Byte at offset 48 bits |
| 12 | R_CUDA_8_56 | 7 | Byte at offset 56 bits |
Global Address Relocations (R_CUDA_G*)
Used for global memory address references. These are the primary relocations for .nv.global section symbols.
| Index | Name | Bits | Description |
|---|---|---|---|
| 13 | R_CUDA_G32 | 32 | 32-bit global address |
| 14 | R_CUDA_G64 | 64 | 64-bit global address |
| 15 | R_CUDA_G8_0 | 8 | Global byte at offset 0 |
| 16 | R_CUDA_G8_8 | 8 | Global byte at offset 8 |
| 17 | R_CUDA_G8_16 | 8 | Global byte at offset 16 |
| 18 | R_CUDA_G8_24 | 8 | Global byte at offset 24 |
| 19 | R_CUDA_G8_32 | 8 | Global byte at offset 32 |
| 20 | R_CUDA_G8_40 | 8 | Global byte at offset 40 |
| 21 | R_CUDA_G8_48 | 8 | Global byte at offset 48 |
| 22 | R_CUDA_G8_56 | 8 | Global byte at offset 56 |
Absolute Instruction Relocations (R_CUDA_ABS*)
These patch absolute addresses into SASS instruction bit-fields. The first number is the bit-width of the value; the second is the starting bit position within the instruction word.
| Index | Name | Width | Bit pos | Description |
|---|---|---|---|---|
| 23 | R_CUDA_ABS16_20 | 16 | 20 | 16-bit absolute at bit 20 |
| 24 | R_CUDA_ABS16_23 | 16 | 23 | 16-bit absolute at bit 23 |
| 25 | R_CUDA_ABS16_26 | 16 | 26 | 16-bit absolute at bit 26 |
| 26 | R_CUDA_ABS16_32 | 16 | 32 | 16-bit absolute at bit 32 |
| 27 | R_CUDA_ABS20_44 | 20 | 44 | 20-bit absolute at bit 44 |
| 28 | R_CUDA_ABS24_20 | 24 | 20 | 24-bit absolute at bit 20 |
| 29 | R_CUDA_ABS24_23 | 24 | 23 | 24-bit absolute at bit 23 |
| 30 | R_CUDA_ABS24_26 | 24 | 26 | 24-bit absolute at bit 26 |
| 31 | R_CUDA_ABS24_32 | 24 | 32 | 24-bit absolute at bit 32 |
| 32 | R_CUDA_ABS24_40 | 24 | 40 | 24-bit absolute at bit 40 |
| 33 | R_CUDA_ABS32_20 | 32 | 20 | 32-bit absolute at bit 20 |
| 34 | R_CUDA_ABS32_23 | 32 | 23 | 32-bit absolute at bit 23 |
| 35 | R_CUDA_ABS32_26 | 32 | 26 | 32-bit absolute at bit 26 |
| 36 | R_CUDA_ABS32_32 | 32 | 32 | 32-bit absolute at bit 32 |
| 37 | R_CUDA_ABS32_HI_20 | 16 | 20 | High 16 bits of 32-bit absolute at bit 20 |
| 38 | R_CUDA_ABS32_HI_23 | 16 | 23 | High 16 bits of 32-bit absolute at bit 23 |
| 39 | R_CUDA_ABS32_HI_26 | 16 | 26 | High 16 bits of 32-bit absolute at bit 26 |
| 40 | R_CUDA_ABS32_HI_32 | 16 | 32 | High 16 bits of 32-bit absolute at bit 32 |
| 41 | R_CUDA_ABS32_LO_20 | 16 | 20 | Low 16 bits of 32-bit absolute at bit 20 |
| 42 | R_CUDA_ABS32_LO_23 | 16 | 23 | Low 16 bits of 32-bit absolute at bit 23 |
| 43 | R_CUDA_ABS32_LO_26 | 16 | 26 | Low 16 bits of 32-bit absolute at bit 26 |
| 44 | R_CUDA_ABS32_LO_32 | 16 | 32 | Low 16 bits of 32-bit absolute at bit 32 |
| 45 | R_CUDA_ABS47_34 | 47 | 34 | 47-bit absolute at bit 34 (sm_75+ wide immediate) |
| 46 | R_CUDA_ABS55_16_34 | 55 | 34 | 55-bit absolute at bit 34 (16-bit aligned) |
| 47 | R_CUDA_ABS56_16_34 | 56 | 34 | 56-bit absolute at bit 34 (16-bit aligned) |
The HI/LO variants are used in instruction pairs where a 32-bit address is split across two instructions: one loads the upper 16 bits (MOV32I or LUI-like) and the other loads the lower 16 bits.
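The arithmetic behind a HI/LO pair is just a 16/16 split of the 32-bit address. The helpers below are hypothetical names and show only that split and its inverse, not how the action table encodes it.

```c
#include <stdint.h>

/* Upper half, as carried by a *_HI_* relocation. */
uint16_t addr_hi16(uint32_t addr) { return (uint16_t)(addr >> 16); }

/* Lower half, as carried by a *_LO_* relocation. */
uint16_t addr_lo16(uint32_t addr) { return (uint16_t)(addr & 0xFFFFu); }

/* What the instruction pair reassembles at run time. */
uint32_t addr_join(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | lo;
}
```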
The ABS47, ABS55, and ABS56 types are newer additions (sm_75+/sm_90+) that exploit wider immediate fields in Turing and later ISA encodings.
PC-Relative Relocations (R_CUDA_PCREL_*)
| Index | Name | Width | Bit pos | Description |
|---|---|---|---|---|
| 48 | R_CUDA_PCREL_IMM24_23 | 24 | 23 | PC-relative 24-bit immediate at bit 23 |
| 49 | R_CUDA_PCREL_IMM24_26 | 24 | 26 | PC-relative 24-bit immediate at bit 26 |
PC-relative relocations compute (S + A) - PC where PC is the address of the instruction being patched. These are used for branch instructions (BRA, BRX, CALL). The 24-bit width limits the branch offset to +/- 8M instructions (each instruction being 8 or 16 bytes depending on encoding).
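The computation and the range limit implied by a 24-bit field can be sketched as follows. The function names are hypothetical, and whether the engine performs an explicit range check is not stated in this document; the check here is only an illustration of what "fits in 24 bits" means for a signed value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Signed distance from the patched instruction to the target: (S + A) - PC. */
int64_t pcrel_offset(uint64_t symbol, int64_t addend, uint64_t pc)
{
    return (int64_t)(symbol + (uint64_t)addend - pc);
}

/* True when `v` is representable as a two's-complement field of `width` bits. */
bool fits_signed(int64_t v, unsigned width)
{
    assert(width >= 1 && width <= 63);
    int64_t lim = (int64_t)1 << (width - 1);
    return v >= -lim && v < lim;
}
```

For `width = 24` the accepted range is -0x800000 through 0x7FFFFF inclusive.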
Function Descriptor Relocations (R_CUDA_FUNC_DESC_*)
These relocate references to function descriptor entries, used for indirect calls, virtual function tables, and device-side function pointers.
| Index | Name | Bits | Description |
|---|---|---|---|
| 50 | R_CUDA_FUNC_DESC_32 | 32 | 32-bit function descriptor reference |
| 51 | R_CUDA_FUNC_DESC_64 | 64 | 64-bit function descriptor reference |
| 52 | R_CUDA_FUNC_DESC_8_0 | 8 | Descriptor byte at offset 0 |
| 53 | R_CUDA_FUNC_DESC_8_8 | 8 | Descriptor byte at offset 8 |
| 54 | R_CUDA_FUNC_DESC_8_16 | 8 | Descriptor byte at offset 16 |
| 55 | R_CUDA_FUNC_DESC_8_24 | 8 | Descriptor byte at offset 24 |
| 56 | R_CUDA_FUNC_DESC_8_32 | 8 | Descriptor byte at offset 32 |
| 57 | R_CUDA_FUNC_DESC_8_40 | 8 | Descriptor byte at offset 40 |
| 58 | R_CUDA_FUNC_DESC_8_48 | 8 | Descriptor byte at offset 48 |
| 59 | R_CUDA_FUNC_DESC_8_56 | 8 | Descriptor byte at offset 56 |
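| 60 | R_CUDA_FUNC_DESC32_20 | 32 | 32-bit function descriptor at bit 20 |
| 61 | R_CUDA_FUNC_DESC32_23 | 32 | 32-bit function descriptor at bit 23 |
| 62 | R_CUDA_FUNC_DESC32_32 | 32 | 32-bit function descriptor at bit 32 |
| 63 | R_CUDA_FUNC_DESC32_HI_20 | 16 | High 16 bits of 32-bit function descriptor at bit 20 |
| 64 | R_CUDA_FUNC_DESC32_HI_23 | 16 | High 16 bits of 32-bit function descriptor at bit 23 |
| 65 | R_CUDA_FUNC_DESC32_HI_32 | 16 | High 16 bits of 32-bit function descriptor at bit 32 |
| 66 | R_CUDA_FUNC_DESC32_LO_20 | 16 | Low 16 bits of 32-bit function descriptor at bit 20 |
| 67 | R_CUDA_FUNC_DESC32_LO_23 | 16 | Low 16 bits of 32-bit function descriptor at bit 23 |
| 68 | R_CUDA_FUNC_DESC32_LO_32 | 16 | Low 16 bits of 32-bit function descriptor at bit 32 |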
The byte-level variants (FUNC_DESC_8_*) are used for patching function descriptors in data sections rather than instruction immediate fields. The instruction variants (FUNC_DESC32_*) patch call instructions that embed the function descriptor index.
Texture, Sampler, and Surface Relocations
These handle bindable resource references in SASS instructions.
| Index | Name | Description |
|---|---|---|
| 69 | R_CUDA_TEX_HEADER_INDEX | Texture header index (binds to .nv.tex.header) |
| 70 | R_CUDA_TEX_SLOT | Texture slot number |
| 71 | R_CUDA_SAMP_HEADER_INDEX | Sampler header index |
| 72 | R_CUDA_SAMP_SLOT | Sampler slot number |
| 73 | R_CUDA_SAMP_HEADER_INDEX_0 | Sampler header index (variant 0) |
| 74 | R_CUDA_SURF_HEADER_INDEX | Surface header index |
| 75 | R_CUDA_SURF_SLOT | Surface slot number |
| 76 | R_CUDA_SURF_HW_DESC | Surface hardware descriptor |
| 77 | R_CUDA_SURF_HW_SW_DESC | Surface combined hardware/software descriptor |
These are resolved during linking by looking up the resource in the merged texture/sampler/surface header tables. The HEADER_INDEX types reference the global .nv.tex.header / .nv.samp.header / .nv.surf.header sections, while SLOT types reference the logical binding slot number.
Constant Bank Relocations (R_CUDA_CONST_FIELD*)
Relocations for references into constant memory banks (.nv.constant0, .nv.constant2, etc.).
| Index | Name | Width | Bit pos | Description |
|---|---|---|---|---|
| 78 | R_CUDA_CONST_FIELD19_20 | 19 | 20 | 19-bit constant offset at bit 20 |
| 79 | R_CUDA_CONST_FIELD19_23 | 19 | 23 | 19-bit constant offset at bit 23 |
| 80 | R_CUDA_CONST_FIELD19_26 | 19 | 26 | 19-bit constant offset at bit 26 |
| 81 | R_CUDA_CONST_FIELD19_28 | 19 | 28 | 19-bit constant offset at bit 28 |
| 82 | R_CUDA_CONST_FIELD19_40 | 19 | 40 | 19-bit constant offset at bit 40 |
| 83 | R_CUDA_CONST_FIELD21_20 | 21 | 20 | 21-bit constant offset at bit 20 |
| 84 | R_CUDA_CONST_FIELD21_23 | 21 | 23 | 21-bit constant offset at bit 23 |
| 85 | R_CUDA_CONST_FIELD21_26 | 21 | 26 | 21-bit constant offset at bit 26 |
| 86 | R_CUDA_CONST_FIELD21_38 | 21 | 38 | 21-bit constant offset at bit 38 |
| 87 | R_CUDA_CONST_FIELD22_37 | 22 | 37 | 22-bit constant offset at bit 37 |
A 19-bit field encodes 2^19 = 524,288 distinct offsets. Read as a byte offset, that covers 512 KB of constant memory; read as a 4-byte-aligned (dword) index, it covers 512 K dwords, or 2 MB. The 21-bit and 22-bit variants (sm_75+) expand the addressable range for larger constant banks.
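The capacity arithmetic for these field widths is small enough to verify directly. The helper names are hypothetical, and the `scale` parameter is an assumption: this document does not establish whether the hardware treats the field as a byte offset (scale 1) or a dword index (scale 4).

```c
#include <stdint.h>

/* Number of distinct values an unsigned field of `width` bits can hold. */
uint64_t field_slots(unsigned width)
{
    return 1ULL << width;
}

/* Bytes reachable if each field value addresses `scale` bytes. */
uint64_t field_span_bytes(unsigned width, unsigned scale)
{
    return field_slots(width) * scale;
}
```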
Bindless Texture Relocations (R_CUDA_TEX_BINDLESSOFF* / R_CUDA_BINDLESSOFF*)
| Index | Name | Width | Bit pos | Description |
|---|---|---|---|---|
| 88 | R_CUDA_TEX_BINDLESSOFF13_32 | 13 | 32 | Texture bindless offset at bit 32 |
| 89 | R_CUDA_TEX_BINDLESSOFF13_41 | 13 | 41 | Texture bindless offset at bit 41 |
| 90 | R_CUDA_TEX_BINDLESSOFF13_45 | 13 | 45 | Texture bindless offset at bit 45 |
| 91 | R_CUDA_TEX_BINDLESSOFF13_47 | 13 | 47 | Texture bindless offset at bit 47 |
| 92 | R_CUDA_TEX_SLOT9_49 | 9 | 49 | Texture slot 9-bit at bit 49 |
| 93 | R_CUDA_BINDLESSOFF13_36 | 13 | 36 | Generic bindless offset at bit 36 |
| 94 | R_CUDA_BINDLESSOFF14_40 | 14 | 40 | Generic bindless 14-bit offset at bit 40 |
Bindless texture relocations patch the bindless resource offset into texture sampling instructions. The 13-bit width supports up to 8192 unique textures per kernel launch. See Bindless Relocations for the full resolution pipeline.
Unified Table Relocations (R_CUDA_UNIFIED_*)
Relocations for the Unified Descriptor Table (UDT) and Unified Function Table (UFT). These are the primary relocation types for CUDA Dynamic Parallelism and indirect function calls through the unified tables.
| Index | Name | Bits | Description |
|---|---|---|---|
| 95 | R_CUDA_UNIFIED | special | Unified table reference (generic) |
| 96 | R_CUDA_UNIFIED_32 | 32 | 32-bit unified table offset |
| 97 | R_CUDA_UNIFIED32_HI_32 | 16 | High 16 bits of unified table offset at bit 32 |
| 98 | R_CUDA_UNIFIED32_LO_32 | 16 | Low 16 bits of unified table offset at bit 32 |
| 99 | R_CUDA_UNIFIED_8_0 | 8 | Unified byte at offset 0 |
| 100 | R_CUDA_UNIFIED_8_8 | 8 | Unified byte at offset 8 |
| 101 | R_CUDA_UNIFIED_8_16 | 8 | Unified byte at offset 16 |
| 102 | R_CUDA_UNIFIED_8_24 | 8 | Unified byte at offset 24 |
| 103 | R_CUDA_UNIFIED_8_32 | 8 | Unified byte at offset 32 |
| 104 | R_CUDA_UNIFIED_8_40 | 8 | Unified byte at offset 40 |
| 105 | R_CUDA_UNIFIED_8_48 | 8 | Unified byte at offset 48 |
| 106 | R_CUDA_UNIFIED_8_56 | 8 | Unified byte at offset 56 |
During the relocation phase, unified relocations (types 102--113 in the internal remapping) are translated to their base equivalents. Relocations targeting synthetic symbols (__UFT_OFFSET, __UDT_OFFSET, __UFT_CANONICAL, __UDT_CANONICAL, __UDT, __UFT, __UFT_END, __UDT_END) are resolved to type 0 (no-op) because the unified table manager has already computed the final offsets.
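The synthetic-symbol rule can be sketched as a name lookup followed by a type override. The symbol names are the ones listed above; the function names are hypothetical, and matching by string comparison is an assumption, since nvlink may identify these symbols by index or flag instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True for the synthetic unified-table symbols named in the text. */
bool is_unified_synthetic_symbol(const char *name)
{
    static const char *const names[] = {
        "__UFT_OFFSET", "__UDT_OFFSET", "__UFT_CANONICAL", "__UDT_CANONICAL",
        "__UDT", "__UFT", "__UFT_END", "__UDT_END",
    };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; ++i)
        if (strcmp(name, names[i]) == 0)
            return true;
    return false;
}

/* Returns 0 (R_CUDA_NONE) for synthetic targets, else the type unchanged. */
unsigned effective_unified_type(unsigned type, const char *symbol_name)
{
    return is_unified_synthetic_symbol(symbol_name) ? 0u : type;
}
```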
Instruction-Level Relocations (R_CUDA_INSTRUCTION*)
| Index | Name | Width | Description |
|---|---|---|---|
| 107 | R_CUDA_INSTRUCTION64 | 64 | Full 64-bit instruction replacement |
| 108 | R_CUDA_INSTRUCTION128 | 128 | Full 128-bit instruction replacement |
These are whole-instruction relocations that replace the entire instruction word. Used by the instruction-level optimization pass (peephole) and for instruction encoding conversions where the entire instruction must be rewritten (e.g., converting a 64-bit instruction to a 128-bit encoding or vice versa).
Yield Relocations (R_CUDA_YIELD_*)
| Index | Name | Description |
|---|---|---|
| 109 | R_CUDA_YIELD_OPCODE9_0 | Patch 9-bit opcode field at bit 0 for YIELD |
| 110 | R_CUDA_YIELD_CLEAR_PRED4_87 | Clear 4-bit predicate field at bit 87 for YIELD |
YIELD relocations are used to convert YIELD instructions to NOP when forward-progress guarantees are required. The relocation engine checks the forward-progress-required flag (ctx+94) and suppresses the conversion if active, with the trace message: "Ignoring the reloc to convert YIELD to NOP due to forward progress requirement."
Unused-Clear Relocations
| Index | Name | Width | Description |
|---|---|---|---|
| 111 | R_CUDA_UNUSED_CLEAR32 | 32 | Clear 32 bits (write zeros) |
| 112 | R_CUDA_UNUSED_CLEAR64 | 64 | Clear 64 bits (write zeros) |
These zero out fields in sections that are no longer needed after linking, such as placeholder entries in merged data sections.
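As an illustration of what a clear relocation amounts to, the sketch below zeroes a 4- or 8-byte field in a section buffer. The helper name `apply_unused_clear` and its bounds-checking behavior are assumptions made for the example, not taken from the binary.

```c
#include <stddef.h>
#include <string.h>

/* Type indices taken from the table above. */
enum { R_CUDA_UNUSED_CLEAR32 = 111, R_CUDA_UNUSED_CLEAR64 = 112 };

/* Hypothetical helper: zero the field at `offset`. Returns the number of
 * bytes cleared, or 0 if the type is not a clear relocation or the field
 * does not fit inside the section. */
static size_t apply_unused_clear(unsigned char *section, size_t size,
                                 size_t offset, unsigned type)
{
    size_t width = (type == R_CUDA_UNUSED_CLEAR32) ? 4
                 : (type == R_CUDA_UNUSED_CLEAR64) ? 8 : 0;
    if (width == 0 || offset > size || size - offset < width)
        return 0;
    memset(section + offset, 0, width);
    return width;
}
```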
Miscellaneous Types
| Index | Name | Description |
|---|---|---|
| 113 | R_CUDA_QUERY_DESC21_37 | 21-bit query descriptor offset at bit 37 |
| 114 | R_CUDA_6_31 | 6-bit value at bit 31 |
| 115 | R_CUDA_2_47 | 2-bit value at bit 47 |
The QUERY_DESC type is used for CUDA's query descriptor mechanism. The 6-bit and 2-bit types are narrow-field relocations for specific instruction encoding fields in newer architectures.
Per-Architecture Vtable
In addition to the descriptor-table-driven application engine, nvlink maintains a per-architecture relocation vtable created by sub_459640 (16,109 bytes). This 632-byte table (79 function pointer slots, 8 bytes each) contains architecture-specific handler functions for relocation types that require different patching behavior across GPU generations.
The vtable is populated based on the target SM range:
| SM Range | Architecture | Notes |
|---|---|---|
| 30--39 | Kepler | Shared "legacy" handler set |
| 50--59 | Maxwell | Adds additional instruction-field handlers |
| 60--69 | Pascal | Adds 60+ series handlers, wider field support |
| 70--72, 73--79 | Volta/Turing | New instruction format, result[33]/result[34] populated |
| 80--88, 89 | Ampere/Ada | Adds bindless handlers, new field variants |
| 90--99 | Hopper | Major differences in slots 10/11/28/50--53, new desc types |
| 100--103, 110--121 | Mercury (Blackwell+) | Distinct handler for slot 13, new constant field sizes |
The vtable is allocated via sub_4307C0 (arena allocator), and the first 78 of its 79 slots are populated. Slots that are not explicitly set remain NULL (zero). The relocation engine skips NULL handlers, while a NULL entry for a type that requires a handler triggers an error; this is how unsupported relocation types are detected at runtime.
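The NULL-slot convention can be sketched as follows. Everything here is illustrative: the handler signature, the names `reloc_vtable_new` and `reloc_vtable_dispatch`, and the sample handler are hypothetical, and `calloc` merely stands in for the arena allocator.

```c
#include <stddef.h>
#include <stdlib.h>

#define RELOC_VTABLE_SLOTS 79   /* 632 bytes / 8 bytes per pointer */

/* Hypothetical handler signature; the real one is not documented here. */
typedef int (*reloc_handler_fn)(void *insn, unsigned long long value);

struct reloc_vtable {
    reloc_handler_fn slot[RELOC_VTABLE_SLOTS];
};

/* calloc stands in for the arena allocator. Zero-filled function pointers
 * compare equal to NULL on mainstream platforms, which this sketch assumes. */
static struct reloc_vtable *reloc_vtable_new(void)
{
    return calloc(1, sizeof(struct reloc_vtable));
}

/* Returns the handler's result, or -1 when the slot is out of range or was
 * never populated (the "unsupported type" case). */
static int reloc_vtable_dispatch(const struct reloc_vtable *vt, unsigned slot,
                                 void *insn, unsigned long long value)
{
    if (slot >= RELOC_VTABLE_SLOTS || vt->slot[slot] == NULL)
        return -1;
    return vt->slot[slot](insn, value);
}

/* Hypothetical sample handler: overwrite a 64-bit word with the value. */
static int store_value_handler(void *insn, unsigned long long value)
{
    *(unsigned long long *)insn = value;
    return 0;
}
```

A caller would treat the -1 result as either "skip" or "error" depending on whether the relocation type is required for the target architecture.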
Mercury vs CUDA Type Mapping
Mercury (sm >= 100) uses a parallel set of relocation types offset by 0x10000. When the linker context's ELF class byte (offset +7) is 'A' (0x41, indicating Mercury), the relocation engine subtracts 0x10000 from the type and uses the Mercury descriptor table (off_1D3CBE0) instead of the CUDA table (off_1D3DBE0).
if (ctx->elf_class == 'A') {            // Mercury context
    if (reloc_type <= 0x10000)          // valid Mercury types are > 0x10000
        fatal("unexpected reloc");
    reloc_type -= 0x10000;              // rebase to a Mercury table index
    descriptor_table = off_1D3CBE0;     // Mercury table
} else {                                // CUDA context
    descriptor_table = off_1D3DBE0;     // CUDA table
}
Both tables have the same structure (64 bytes per entry) but different action encodings reflecting the different instruction formats between pre-Mercury (64-bit SASS) and Mercury (128-bit SASS) architectures.
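Combining the table selection with the 64-byte descriptor stride (the `type_index << 6` indexing noted in the confidence table), a descriptor lookup might look like the sketch below. The struct and function names are hypothetical, the tables are passed in as plain byte arrays, and no upper-bound check is modeled.

```c
#include <stddef.h>
#include <stdint.h>

#define MERCURY_TYPE_BASE 0x10000u

/* Hypothetical holder for the two descriptor tables (64 bytes per entry). */
struct reloc_tables {
    const uint8_t *cuda;     /* stands in for off_1D3DBE0 */
    const uint8_t *mercury;  /* stands in for off_1D3CBE0 */
};

/* Returns a pointer to the 64-byte descriptor for `reloc_type`, or NULL in
 * the "unexpected reloc" case (Mercury context, type <= 0x10000). */
static const uint8_t *select_descriptor(const struct reloc_tables *t,
                                        char elf_class, uint32_t reloc_type)
{
    const uint8_t *table = t->cuda;
    if (elf_class == 'A') {                 /* Mercury context */
        if (reloc_type <= MERCURY_TYPE_BASE)
            return NULL;
        reloc_type -= MERCURY_TYPE_BASE;    /* rebase to a table index */
        table = t->mercury;
    }
    return table + ((size_t)reloc_type << 6);  /* index * 64 bytes */
}
```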
Validation Error Messages
The validation infrastructure produces these diagnostic messages:
| Source function | Message | Condition |
|---|---|---|
| sub_42F6C0 | "unknown attribute" | Type index exceeds table bounds |
| sub_42F6C0 | "Relocation %s not supported on %s" | Architecture class mismatch |
| sub_42F760 | "unknown attribute" | Attribute type index > 96 |
| sub_42F760 | "Attribute %s not supported on %s" | Attribute arch class mismatch |
| sub_42F760 | "unknown usage" | Usage field has unrecognized value |
| sub_42F850 | "STO_CUDA_OBSCURE" | Symbol with obscure storage class |
| sub_469D60 | "unexpected reloc" | Relocation type <= 0x10000 encountered in a Mercury context |
The error descriptors at unk_2A5BAB0 (warning) and unk_2A5BAC0 (error) control whether these diagnostics are warnings or fatal errors.
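The bounds checks that lead to "unknown attribute" in sub_42F6C0 can be approximated as below. The limits (0x75 and 0x41) and the 0x10000 offset come from this page; the enum, the function name, and the assumption that any type at or above 0x10000 takes the attribute path are illustrative only.

```c
#include <stdint.h>

enum reloc_table_kind { TABLE_INVALID = 0, TABLE_STANDARD, TABLE_ATTRIBUTE };

#define ATTR_TYPE_OFFSET 0x10000u
#define STANDARD_LIMIT   0x75u   /* 117 entries, index 0..116 */
#define ATTRIBUTE_LIMIT  0x41u   /* 65 entries, index 0..64 */

/* Hypothetical classifier: picks the table and writes the table index to
 * *index_out. TABLE_INVALID corresponds to the "unknown attribute" path,
 * in which case *index_out is left untouched. */
static enum reloc_table_kind classify_reloc_type(uint32_t type,
                                                 uint32_t *index_out)
{
    if (type >= ATTR_TYPE_OFFSET) {          /* assumed attribute path */
        uint32_t idx = type - ATTR_TYPE_OFFSET;
        if (idx >= ATTRIBUTE_LIMIT)
            return TABLE_INVALID;
        *index_out = idx;
        return TABLE_ATTRIBUTE;
    }
    if (type >= STANDARD_LIMIT)
        return TABLE_INVALID;
    *index_out = type;
    return TABLE_STANDARD;
}
```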
Full Type Catalog
The complete list of 119 R_CUDA relocation types extracted from nvlink v13.0.88, sorted by name:
| # | Name | Type |
|---|---|---|
| 1 | R_CUDA_2_47 | misc |
| 2 | R_CUDA_32 | data |
| 3 | R_CUDA_32_HI | data |
| 4 | R_CUDA_32_LO | data |
| 5 | R_CUDA_6_31 | misc |
| 6 | R_CUDA_64 | data |
| 7 | R_CUDA_8_0 | byte |
| 8 | R_CUDA_8_16 | byte |
| 9 | R_CUDA_8_24 | byte |
| 10 | R_CUDA_8_32 | byte |
| 11 | R_CUDA_8_40 | byte |
| 12 | R_CUDA_8_48 | byte |
| 13 | R_CUDA_8_56 | byte |
| 14 | R_CUDA_8_8 | byte |
| 15 | R_CUDA_ABS16_20 | abs-instr |
| 16 | R_CUDA_ABS16_23 | abs-instr |
| 17 | R_CUDA_ABS16_26 | abs-instr |
| 18 | R_CUDA_ABS16_32 | abs-instr |
| 19 | R_CUDA_ABS20_44 | abs-instr |
| 20 | R_CUDA_ABS24_20 | abs-instr |
| 21 | R_CUDA_ABS24_23 | abs-instr |
| 22 | R_CUDA_ABS24_26 | abs-instr |
| 23 | R_CUDA_ABS24_32 | abs-instr |
| 24 | R_CUDA_ABS24_40 | abs-instr |
| 25 | R_CUDA_ABS32_20 | abs-instr |
| 26 | R_CUDA_ABS32_23 | abs-instr |
| 27 | R_CUDA_ABS32_26 | abs-instr |
| 28 | R_CUDA_ABS32_32 | abs-instr |
| 29 | R_CUDA_ABS32_HI_20 | abs-instr |
| 30 | R_CUDA_ABS32_HI_23 | abs-instr |
| 31 | R_CUDA_ABS32_HI_26 | abs-instr |
| 32 | R_CUDA_ABS32_HI_32 | abs-instr |
| 33 | R_CUDA_ABS32_LO_20 | abs-instr |
| 34 | R_CUDA_ABS32_LO_23 | abs-instr |
| 35 | R_CUDA_ABS32_LO_26 | abs-instr |
| 36 | R_CUDA_ABS32_LO_32 | abs-instr |
| 37 | R_CUDA_ABS47_34 | abs-instr |
| 38 | R_CUDA_ABS55_16_34 | abs-instr |
| 39 | R_CUDA_ABS56_16_34 | abs-instr |
| 40 | R_CUDA_BINDLESSOFF13_36 | bindless |
| 41 | R_CUDA_BINDLESSOFF14_40 | bindless |
| 42 | R_CUDA_CONST_FIELD19_20 | const |
| 43 | R_CUDA_CONST_FIELD19_23 | const |
| 44 | R_CUDA_CONST_FIELD19_26 | const |
| 45 | R_CUDA_CONST_FIELD19_28 | const |
| 46 | R_CUDA_CONST_FIELD19_40 | const |
| 47 | R_CUDA_CONST_FIELD21_20 | const |
| 48 | R_CUDA_CONST_FIELD21_23 | const |
| 49 | R_CUDA_CONST_FIELD21_26 | const |
| 50 | R_CUDA_CONST_FIELD21_38 | const |
| 51 | R_CUDA_CONST_FIELD22_37 | const |
| 52 | R_CUDA_FUNC_DESC_32 | func-desc |
| 53 | R_CUDA_FUNC_DESC32_20 | func-desc |
| 54 | R_CUDA_FUNC_DESC32_23 | func-desc |
| 55 | R_CUDA_FUNC_DESC32_32 | func-desc |
| 56 | R_CUDA_FUNC_DESC32_HI_20 | func-desc |
| 57 | R_CUDA_FUNC_DESC32_HI_23 | func-desc |
| 58 | R_CUDA_FUNC_DESC32_HI_32 | func-desc |
| 59 | R_CUDA_FUNC_DESC32_LO_20 | func-desc |
| 60 | R_CUDA_FUNC_DESC32_LO_23 | func-desc |
| 61 | R_CUDA_FUNC_DESC32_LO_32 | func-desc |
| 62 | R_CUDA_FUNC_DESC_64 | func-desc |
| 63 | R_CUDA_FUNC_DESC_8_0 | func-desc |
| 64 | R_CUDA_FUNC_DESC_8_16 | func-desc |
| 65 | R_CUDA_FUNC_DESC_8_24 | func-desc |
| 66 | R_CUDA_FUNC_DESC_8_32 | func-desc |
| 67 | R_CUDA_FUNC_DESC_8_40 | func-desc |
| 68 | R_CUDA_FUNC_DESC_8_48 | func-desc |
| 69 | R_CUDA_FUNC_DESC_8_56 | func-desc |
| 70 | R_CUDA_FUNC_DESC_8_8 | func-desc |
| 71 | R_CUDA_G32 | global |
| 72 | R_CUDA_G64 | global |
| 73 | R_CUDA_G8_0 | global |
| 74 | R_CUDA_G8_16 | global |
| 75 | R_CUDA_G8_24 | global |
| 76 | R_CUDA_G8_32 | global |
| 77 | R_CUDA_G8_40 | global |
| 78 | R_CUDA_G8_48 | global |
| 79 | R_CUDA_G8_56 | global |
| 80 | R_CUDA_G8_8 | global |
| 81 | R_CUDA_INSTRUCTION128 | instr |
| 82 | R_CUDA_INSTRUCTION64 | instr |
| 83 | R_CUDA_NONE | sentinel |
| 84 | R_CUDA_NONE_LAST | sentinel |
| 85 | R_CUDA_PCREL_IMM24_23 | pc-rel |
| 86 | R_CUDA_PCREL_IMM24_26 | pc-rel |
| 87 | R_CUDA_QUERY_DESC21_37 | misc |
| 88 | R_CUDA_SAMP_HEADER_INDEX | sampler |
| 89 | R_CUDA_SAMP_HEADER_INDEX_0 | sampler |
| 90 | R_CUDA_SAMP_SLOT | sampler |
| 91 | R_CUDA_SURF_HEADER_INDEX | surface |
| 92 | R_CUDA_SURF_HW_DESC | surface |
| 93 | R_CUDA_SURF_HW_SW_DESC | surface |
| 94 | R_CUDA_SURF_SLOT | surface |
| 95 | R_CUDA_TEX_BINDLESSOFF13_32 | bindless |
| 96 | R_CUDA_TEX_BINDLESSOFF13_41 | bindless |
| 97 | R_CUDA_TEX_BINDLESSOFF13_45 | bindless |
| 98 | R_CUDA_TEX_BINDLESSOFF13_47 | bindless |
| 99 | R_CUDA_TEX_HEADER_INDEX | texture |
| 100 | R_CUDA_TEX_SLOT | texture |
| 101 | R_CUDA_TEX_SLOT9_49 | texture |
| 102 | R_CUDA_UNIFIED | unified |
| 103 | R_CUDA_UNIFIED_32 | unified |
| 104 | R_CUDA_UNIFIED32_HI_32 | unified |
| 105 | R_CUDA_UNIFIED32_LO_32 | unified |
| 106 | R_CUDA_UNIFIED_8_0 | unified |
| 107 | R_CUDA_UNIFIED_8_16 | unified |
| 108 | R_CUDA_UNIFIED_8_24 | unified |
| 109 | R_CUDA_UNIFIED_8_32 | unified |
| 110 | R_CUDA_UNIFIED_8_40 | unified |
| 111 | R_CUDA_UNIFIED_8_48 | unified |
| 112 | R_CUDA_UNIFIED_8_56 | unified |
| 113 | R_CUDA_UNIFIED_8_8 | unified |
| 114 | R_CUDA_UNUSED_CLEAR32 | clear |
| 115 | R_CUDA_UNUSED_CLEAR64 | clear |
| 116 | R_CUDA_YIELD_CLEAR_PRED4_87 | yield |
| 117 | R_CUDA_YIELD_OPCODE9_0 | yield |
| 118 | R_CUDA_INSTRUCTION64 | instr |
| 119 | R_CUDA_INSTRUCTION128 | instr |
Note: The catalog contains 119 unique name strings as extracted from the binary. Some types appear in both the standard and attribute tables with the same name but different table indices and different descriptor actions. The total across both tables is 117 + 65 = 182 table slots, but many attribute slots share names with their standard counterparts.
Cross-References
- Relocation Phase -- the pipeline stage that consumes these types
- Finalization Phase -- second-pass relocation application
- Relocation Application Engine -- the bit-field patching engine
- Bindless Relocations -- bindless texture/surface resolution
- Symbol Resolution -- symbol resolution that feeds resolved addresses to relocation application
- Section Merging -- merge phase that collects .rela.* sections from input objects
- Binary Layout -- locations of descriptor tables in the nvlink binary
Sibling Wiki
- ptxas wiki: Relocations & Symbols -- how ptxas generates R_CUDA and R_MERCURY relocation entries during code emission (the producer side of what nvlink consumes)
Confidence Assessment
| Claim | Confidence | Evidence |
|---|---|---|
| sub_42F6C0 validates standard table at off_1D37600 (117 entries, limit 0x75) | HIGH | Decompiled sub_42F6C0_0x42f6c0.c line 26: a1 >= 0x75; line 25: &off_1D37600 |
| sub_42F6C0 validates attribute table at off_1D371E0 (65 entries, limit 0x41) | HIGH | Decompiled line 17: &off_1D371E0; line 18: a1 < 0x41 |
| Attribute type offset is 0x10000 | HIGH | Decompiled line 15: a1 -= 0x10000 |
| sub_42F8C0 arch class mapping: sm<=70 class 1, sm<=72 class 2, sm>=76 class 5, sm 73-75 class 3 | HIGH | Decompiled sub_42F8C0_0x42f8c0.c: 2*(a1>=76)+3 formula confirmed |
| sub_42F6C0 emits "unknown attribute" on bounds violation | HIGH | Decompiled line 21: exact string literal |
| Five architecture class names at off_1D371A0 | HIGH | Decompiled line 36: &off_1D371A0 + v10 |
| 64-byte descriptor entries at off_1D3DBE0 (CUDA) and off_1D3CBE0 (Mercury) | HIGH | Decompiled sub_468760: type_index << 6 indexing confirmed |
| Application engine sub_468760 at 0x468760 with 10-parameter signature | HIGH | Decompiled file exists with matching signature |
| 119 unique R_CUDA type name strings extracted from binary | HIGH | All names extracted from nvlink_strings.json; complete catalog verified |
| "STO_CUDA_OBSCURE" emitted by sub_42F850 | HIGH | String confirmed in nvlink_strings.json |
| Action types and aliases (0x01/0x12/0x2E, 0x06/0x37, 0x07/0x38) | MEDIUM | Reconstructed from decompiled sub_468760 switch; full per-action verification not performed |
| Per-architecture vtable with 79 slots from sub_459640 | MEDIUM | Function exists; slot count inferred from 632-byte allocation (79 * 8) |
| YIELD relocation forward-progress check at ctx+94 | MEDIUM | Reconstructed from decompiled relocation phase analysis |
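The architecture class mapping attributed to sub_42F8C0 in the table above can be written out as a short function. This is a sketch reconstructed from the thresholds and the `2*(a1>=76)+3` formula quoted there; the function name `arch_class_for_sm` is hypothetical.

```c
/* Hypothetical reconstruction of the SM-to-class mapping:
 *   sm <= 70 -> 1, sm 71..72 -> 2, sm 73..75 -> 3, sm >= 76 -> 5. */
static int arch_class_for_sm(unsigned sm)
{
    if (sm <= 70)
        return 1;
    if (sm <= 72)
        return 2;
    return 2 * (sm >= 76) + 3;   /* 3 below sm 76, 5 from sm 76 upward */
}
```

Note that class 4 is never produced by this formula, even though five class names are listed at off_1D371A0.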