Relocation Application Engine

The relocation application engine is the bit-level instruction patching core of nvlink's relocation pipeline. It comprises three tightly coupled layers: the main relocation loop (sub_469D60), the descriptor-driven action dispatcher (sub_468760), and two bit-field helper functions (sub_468670 for extraction and sub_4685B0 for insertion). Together they transform unresolved relocation records into patched instruction words and data, encoding resolved symbol addresses into the non-byte-aligned bit fields that GPU instructions use for immediates, offsets, and function descriptors.

This page documents all three layers in reimplementation-grade detail. For the surrounding pipeline context -- how the relocation phase fits between layout and finalization, the 10-step resolution algorithm, alias chain resolution, dead function filtering, and unified relocation remapping -- see Relocation Phase. For the catalog of relocation type names and their descriptor table entries, see R_CUDA Relocations.

Key Facts

Property                  Value
------------------------  ------------------------------------------------------------------
Main relocation loop      sub_469D60 at 0x469D60 (26,578 bytes, 985 lines)
Action dispatcher         sub_468760 at 0x468760 (14,322 bytes, 582 lines)
Bit-field extractor       sub_468670 at 0x468670 (~240 bytes)
Bit-field writer          sub_4685B0 at 0x4685B0 (~240 bytes)
Resolved-rela emitter     sub_46ADC0 at 0x46ADC0 (11,515 bytes)
CUDA descriptor table     off_1D3DBE0 -- indexed by raw R_CUDA type
Mercury descriptor table  off_1D3CBE0 -- indexed by R_MERCURY type minus 0x10000
Descriptor entry size     64 bytes (12-byte header + 3 x 16-byte action slots + 4-byte sentinel)
Maximum bit-field width   128 bits (spans up to three 64-bit words)
SSE optimization          _mm_loadu_si128 for 128-bit relocation record loading

Architecture Overview

sub_469D60 (apply_relocations)                       MAIN LOOP
 |
 |  for each reloc_node in linked list at ctx+376:
 |    1. Load 32-byte relocation record via _mm_loadu_si128 (2x 128-bit)
 |    2. Resolve addend symbol (sub_440590)
 |    3. Select descriptor table (Mercury vs CUDA)
 |    4. Resolve target symbol, get section record
 |    5. Handle special cases (aliases, dead funcs, unified remapping)
 |    6. Walk section data chunks to find patch_ptr
 |    7. Call sub_468760 (action dispatcher)
 |    8. Unlink node from list, optionally preserve for resolved-rela
 |
 +---> sub_468760 (action dispatcher)                ACTION LOOP
        |
        |  descriptor = table + (type << 6)
        |  for each action in descriptor[+12..+60]:
        |    switch (action_type):
        |      case 1/0x12/0x2E: ABS_FULL
        |      case 6/0x37:      ABS_LO
        |      case 7/0x38:      ABS_HI
        |      case 8:           PC_REL_SIZE
        |      case 9:           SHIFTED_2
        |      case 0xA:         SEC_TYPE_LO
        |      case 0xB:         SEC_TYPE_HI
        |      case 0x10:        PC_REL
        |      case 0x13/0x14:   CLEAR
        |      case 0x16-0x1D,0x2F-0x36: MASKED_SHIFT
        |
        +---> sub_468670 (extract old value)         BIT-FIELD READ
        +---> sub_4685B0 (write new value)           BIT-FIELD WRITE

Main Relocation Loop: sub_469D60

Signature

char apply_relocations(
    void*               ctx,          // a1: linker context object
    pthread_mutexattr_t* mutex_attr   // a2: mutex attributes (passed through)
);

Relocation Record Format

Each relocation is stored as a 32-byte record accessed through a singly-linked list at ctx+376. Each list node is a 16-byte pair: [next_ptr, record_ptr]. The 32-byte record is loaded as two 128-bit SSE values via _mm_loadu_si128:

// At line 236-237 of sub_469D60:
v162 = _mm_loadu_si128(v5);        // bytes [0:16]  of reloc record
v163 = _mm_loadu_si128(v5 + 1);    // bytes [16:32] of reloc record

The record layout:

Offset  Type    Field             Access in decompiled code
------  ------  ----------------  ---------------------------------
+0      int64   addend            v5->m128i_i64[0]
+8      int32   reloc_type        (uint32_t)(v5->m128i_i64[1])
+12     int32   symbol_index      SHIDWORD(v5->m128i_i64[1])  (v5->m128i_i32[3])
+16     uint32  section_idx       v5[1].m128i_u32[2]
+20     uint32  sym_addend_idx    v5[1].m128i_i32[3]
+24     int64   extra             v5[1].m128i_i64[0]

The SSE _mm_loadu_si128 loads are an optimization: rather than reading individual fields with scalar loads, the entire 32-byte record is loaded in two unaligned 128-bit operations, which the CPU can handle efficiently. The decompiler shows the record stored in __m128i typed variables (v5 is an __m128i*), with fields accessed through the union members .m128i_i32[], .m128i_u32[], and .m128i_i64[].

Loop Structure Pseudocode

void apply_relocations(linker_ctx* ctx, pthread_mutexattr_t* mutex_attr) {
    // Initialize output buffers
    int64_t  output_value = 0;
    __m128i  record_lo, record_hi;
    uint16_t link_type = *(uint16_t*)(ctx + 16);

    if (!*(uint8_t*)(ctx + 81))
        sub_44DB00(ctx);              // pre-pass initialization

    reloc_node* node = *(reloc_node**)(ctx + 376);
    reloc_node* prev = NULL;

    while (node != NULL) {
        reloc_record* rec = node->record;

        // ---- Step 1: Addend symbol resolution ----
        uint32_t sym_addend_idx = rec->sym_addend_idx;   // offset +20
        if (sym_addend_idx != 0) {
            symbol_rec* sym = sub_440590(ctx, sym_addend_idx);
            rec->addend += *(int64_t*)(sym + 8);
        }

        // ---- Step 2: Architecture-dependent table selection ----
        uint32_t reloc_type = (uint32_t)rec->reloc_info;
        uint32_t flags_mask = (*(uint8_t*)(ctx + 7) == 'A') ? 1 : 0x80000000;
        uint32_t adjusted_type;
        void** descriptor_table;

        if (flags_mask & *(uint32_t*)(ctx + 48)) {
            // Mercury path
            descriptor_table = &off_1D3CBE0;
            if (reloc_type != 0) {
                if (reloc_type <= 0x10000)
                    error("unexpected reloc");
                adjusted_type = reloc_type - 0x10000;
            } else {
                adjusted_type = 0;
            }
        } else {
            // CUDA path
            descriptor_table = &off_1D3DBE0;
            adjusted_type = reloc_type;
        }

        // ---- Step 3: Symbol resolution ----
        symbol_rec* target_sym = sub_440590(ctx, rec->symbol_index);
        int sym_section = sub_440350(ctx, target_sym);

        // ---- Step 4: Section lookup ----
        section_rec* sec = sub_442270(ctx, rec->section_idx);
        section_rec* parent = sub_442270(ctx, sec->parent_idx);  // +44

        // ---- Step 5: Special handling ----
        // (UFT/UDT magic section 0x6FFFFF0E, alias chains, dead func filtering,
        //  unified relocation remapping -- see pipeline/relocate.md for full detail)

        // ---- Step 6: Descriptor compatibility check ----
        // Validate PC-relative branch is within same section:
        if (descriptor_table[5 * adjusted_type] == 16 &&
            rec->section_idx != parent->output_idx)
            error("PC relative branch address should be in the same section");

        // ---- Step 7: Locate patch address in section data ----
        chunk_node* chunk = *(chunk_node**)(parent + 72);
        uint64_t target_offset = rec->addend;
        uint64_t* patch_ptr = NULL;

        while (chunk) {
            chunk_data* data = chunk->data;
            if (target_offset >= data->base) {
                uint64_t delta = target_offset - data->base;
                if (delta < data->size) {
                    patch_ptr = data->buffer + delta;
                    break;
                }
            }
            chunk = chunk->next;
        }
        if (!patch_ptr)
            error("reloc address not found");

        // ---- Step 8: Verbose trace ----
        if (*(uint8_t*)(ctx + 64) & 4)
            fprintf(stderr, "resolve reloc %d for sym=%d+%lld at "
                    "<section=%d,offset=%llx>\n",
                    reloc_type, rec->symbol_index,
                    rec->extra, rec->section_idx, rec->addend);

        // ---- Step 9: Apply via action dispatcher ----
        bool is_absolute = (sec->sh_type == 4);  // SHT_RELA
        int success = sub_468760(
            descriptor_table,           // a1: table base
            adjusted_type,              // a2: type index
            is_absolute,                // a3: absolute flag
            patch_ptr,                  // a4: instruction word pointer
            rec->extra,                 // a5: extra offset field
            rec->addend,                // a6: addend / section offset
            *(uint64_t*)(target_sym + 8),  // a7: resolved symbol value
            *(uint32_t*)(target_sym + 28), // a8: symbol st_size
            parent->sh_type - 0x6FFFFF84,  // a9: section type delta
            &output_value               // a10: receives extracted old value
        );

        if (!success)
            error("unexpected NVRS");

        // ---- Step 10: Unlink node and optionally preserve ----
        reloc_node* next = node->next;
        if (prev)
            prev->next = next;
        else
            *(reloc_node**)(ctx + 376) = next;

        // Preserve-relocs path (--preserve-relocs, byte at ctx+85)
        if (*(uint8_t*)(ctx + 85) &&
            ((target_sym->st_bind & 3) != 1 ||     // not STB_LOCAL
             (sym_section != 0 && parent_has_data)))
        {
            if (sec->sh_type != 4)     // not SHT_RELA
                rec->extra = output_value;
            sub_4644C0(rec, ctx + 384);  // append to preserve-relocs list
        } else {
            sub_431000(node->record);    // free record
        }
        sub_431000(node);                // free list node

        node = prev ? *prev->next_field : *(reloc_node**)(ctx + 376);
    }

    // ---- Post-loop: .nv.rel.action emission (non-Mercury, link_type == 2) ----
    // ... (see section below)
}

SSE Relocation Record Loading

The use of _mm_loadu_si128 at lines 236-237 of the decompiled code is a deliberate optimization. Instead of eight separate 32-bit loads (or four separate 64-bit loads) to access the 32-byte relocation record, the compiler (or original author) uses two 128-bit unaligned loads:

v162 = _mm_loadu_si128(v5);        // load bytes [0:15]  into xmm register
v163 = _mm_loadu_si128(v5 + 1);    // load bytes [16:31] into xmm register

The _mm_loadu_si128 intrinsic generates a single MOVDQU instruction on x86-64, which loads 16 bytes from an unaligned address into an XMM register. This is cache-friendly (two cache-line-width loads vs. many scattered accesses) and avoids potential store-forwarding stalls. The individual fields are then extracted from the __m128i values using union member access (.m128i_i32[n], .m128i_i64[n]).

Symbol Resolution During Relocation

Symbol resolution in sub_469D60 is a multi-step process:

  1. Addend symbol resolution (sym_addend_idx at record offset +20): If nonzero, sub_440590(ctx, sym_addend_idx) returns the symbol record, and *(int64_t*)(sym + 8) (the resolved symbol value) is added to the record's addend at offset +0. This implements the S + A (symbol + addend) pattern.

  2. Target symbol resolution (symbol_index at record offset +12): sub_440590(ctx, SHIDWORD(rec->reloc_info)) returns the target symbol record. The symbol's resolved address at offset +8 becomes the a7 (symbol_value) argument to sub_468760.

  3. Section-relative vs absolute: The is_absolute flag (parameter a3 to the action dispatcher) is derived from whether the target section type equals SHT_RELA (type 4). When is_absolute == true, the engine computes value = symbol_value + extra_offset and writes it directly. When false (the common case), the engine first extracts the existing bit-field value from the instruction word, adds it to the symbol value, and writes back the sum -- implementing addend-based relocation.

  4. Alias chain traversal: When a target symbol is a weak function (STT_FUNC = 2) with an unresolved value (offset +8 is zero), the function follows the alias chain via sub_440350 to find the canonical definition. The verbose trace "change alias reloc %s to %s\n" is emitted when this occurs.

Section Data Chunk Walk

Section data in nvlink is not stored as a flat buffer. Instead, each section record has a linked list of data chunks at offset +72. Each chunk node has the structure:

chunk_node:
    [0]  next pointer     (chunk_node* or NULL)
    [1]  data descriptor  (chunk_data*)

chunk_data:
    [0]  buffer pointer   (void*)
    [1]  base offset      (uint64_t) -- starting offset within section
    [3]  size             (uint64_t) -- number of bytes in this chunk

The relocation loop walks this list linearly to find the chunk containing target_offset:

while (chunk) {
    chunk_data* data = chunk->data;
    if (target_offset >= data->base) {
        uint64_t delta = target_offset - data->base;
        if (delta < data->size) {
            patch_ptr = (uint64_t*)(data->buffer + delta);
            break;
        }
    }
    chunk = chunk->next;
}

The patch_ptr is then passed directly to the action dispatcher, which reads and modifies the instruction word(s) at that address.

Descriptor Table Lookup

Table Selection

The descriptor table is selected based on the architecture flag at ctx+7:

Architecture        Flag byte   Descriptor table   Relocation type normalization
------------------  ----------  -----------------  ------------------------------------------------
Mercury (SM100+)    'A' (0x41)  off_1D3CBE0        type -= 0x10000 (Mercury types are CUDA + 65536)
CUDA (pre-Mercury)  other       off_1D3DBE0        type used directly

Descriptor Entry Structure

Each descriptor entry is located at table_base + (reloc_type_index << 6), yielding a 64-byte record. The << 6 shift is equivalent to multiplying by 64 (the entry size). The 64-byte layout is:

Byte offset  Size    Field
-----------  ------  -----------------------------------------------
+0           12 B    Header (3 x uint32: flags, mode, reserved)
                     Used by sub_46ADC0 for preserve-relocs extraction;
                     field at +5 (in uint32 units, i.e. +20 bytes) holds
                     the descriptor mode (16 = PC-relative)
+12          16 B    action[0]  {bit_offset, bit_width, action_type, reserved}
+28          16 B    action[1]  {bit_offset, bit_width, action_type, reserved}
+44          16 B    action[2]  {bit_offset, bit_width, action_type, reserved}
+60          4 B     Sentinel   (address stored in v100, marks end of action array)

Each action slot is 16 bytes = 4 x uint32_t:

struct reloc_action {
    uint32_t bit_offset;    // starting bit position in instruction word
    uint32_t bit_width;     // number of bits to patch
    uint32_t action_type;   // operation code (0 = END/skip)
    uint32_t reserved;      // unused / flags
};

Descriptor Header and Preserve-Relocs Extraction

The 12-byte header at the start of each descriptor entry is not used by the action dispatcher (sub_468760), but it is consumed by the resolved-rela emitter (sub_46ADC0). During preserve-relocs processing, sub_46ADC0 reads three field specifications from the header at uint32 offsets (+3,+4,+5), (+7,+8,+9), and (+11,+12,+13), each encoding a bit-field extraction recipe for recovering the instruction-encoded portions of the relocation value after patching.

Action Dispatcher: sub_468760

Signature

int reloc_apply_engine(
    void*       descriptor_table,    // a1: off_1D3CBE0 or off_1D3DBE0
    uint32_t    reloc_type_index,    // a2: normalized type index into table
    bool        is_absolute,         // a3: 1 if symbol has absolute address
    uint64_t*   patch_ptr,           // a4: pointer into section data
    int64_t     extra_offset,        // a5: reloc_record->extra field
    int         section_offset,      // a6: addend / section base offset
    uint64_t    symbol_value,        // a7: resolved symbol address
    uint32_t    symbol_size,         // a8: symbol st_size
    uint32_t    section_type_delta,  // a9: section_type - 0x6FFFFF84
    int64_t*    output_value         // a10: receives extracted old value
);
// Returns 1 on success, 0 on unrecognized action type.

Initialization and Value Computation

Before the action loop begins, the engine performs three setup operations:

// Line 122-128 of decompiled sub_468760:
uint64_t value = symbol_value;           // a7 -> v10
if (is_absolute)
    value = symbol_value + extra_offset; // a7 + a5
*output_value = 0;                       // zero the output

// Pre-load SSE constants for masked-shift actions:
__m128i mask0 = _mm_load_si128(&xmmword_1D3F8E0);  // mask table [0:1]
__m128i mask1 = _mm_load_si128(&xmmword_1D3F8F0);  // mask table [2:3]
__m128i mask2 = _mm_load_si128(&xmmword_1D3F900);  // mask table [4:5]
__m128i mask3 = _mm_load_si128(&xmmword_1D3F910);  // mask table [6:7]

// Compute descriptor entry pointer and action boundaries:
uint64_t desc_entry = table_base + ((uint64_t)reloc_type_index << 6);
uint32_t* action_ptr = (uint32_t*)(desc_entry + 12);   // first action
uint32_t* sentinel   = (uint32_t*)(desc_entry + 60);   // end marker

The four SSE constant loads (_mm_load_si128) happen once at function entry. They load 64 bytes of mask data into local storage, avoiding repeated memory access during the masked-shift action cases. The shift table (xmmword_1D3F920 and xmmword_1D3F930) is loaded on-demand inside the masked-shift case body.

Action Loop Pseudocode

while (true) {
    uint32_t action_type = action_ptr[2];    // v15[2] = action_type

    switch (action_type) {
    case 0:  // END
        action_ptr += 4;                     // advance to next 16-byte slot
        if (action_ptr == sentinel) return 1;
        continue;

    case 1: case 0x12: case 0x2E: {          // ABS_FULL
        uint32_t bit_off = action_ptr[0];
        uint32_t bit_wid = action_ptr[1];

        // Fast path: full 64-bit word write
        if (bit_off == 0 && bit_wid == 64) {
            if (!is_absolute) {
                *output_value = *patch_ptr;
                value += *patch_ptr;
            }
            *patch_ptr = value;
            action_ptr += 4;
            if (action_ptr == sentinel) return 1;
            continue;
        }

        // General path: bit-field write
        if (!is_absolute) {
            int64_t old = bitfield_extract(patch_ptr, bit_off, bit_wid);
            value += old;
            *output_value = old;
        }
        action_ptr += 4;
        bitfield_write(patch_ptr, value, bit_off, bit_wid);
        if (action_ptr == sentinel) return 1;
        continue;
    }

    case 6: case 0x37: {                     // ABS_LO (low 32 bits)
        uint32_t bit_off = action_ptr[0];
        uint32_t bit_wid = action_ptr[1];
        uint32_t lo = (uint32_t)value;       // truncate to low 32 bits

        if (!is_absolute) {
            int64_t old = bitfield_extract(patch_ptr, bit_off, bit_wid);
            *output_value = old;
            lo = (uint32_t)value + old;
        }
        write_bitfield_inline(patch_ptr, lo, bit_off, bit_wid);
        action_ptr += 4;
        if (action_ptr == sentinel) return 1;
        continue;
    }

    case 7: case 0x38: {                     // ABS_HI (high 32 bits)
        uint32_t bit_off = action_ptr[0];
        uint32_t bit_wid = action_ptr[1];
        uint32_t hi = (uint32_t)(value >> 32);  // HIDWORD

        if (!is_absolute) {
            int64_t old = bitfield_extract(patch_ptr, bit_off, bit_wid);
            *output_value = old;
            hi = (uint32_t)(value >> 32) + old;
        }
        write_bitfield_inline(patch_ptr, hi, bit_off, bit_wid);
        action_ptr += 4;
        if (action_ptr == sentinel) return 1;
        continue;
    }

    case 8: {                                // PC_REL_SIZE
        uint32_t bit_off = action_ptr[0];
        uint32_t bit_wid = action_ptr[1];

        if (is_absolute) {
            value = extra_offset + symbol_size;
        } else {
            int64_t old = bitfield_extract(patch_ptr, bit_off, bit_wid);
            value = old + symbol_size;
            *output_value = old;
        }
        write_bitfield_inline(patch_ptr, value, bit_off, bit_wid);
        action_ptr += 4;
        if (action_ptr == sentinel) return 1;
        continue;
    }

    case 9: {                                // SHIFTED_2 (>> 2)
        value >>= 2;                        // byte offset to DWORD offset
        uint32_t bit_off = action_ptr[0];
        uint32_t bit_wid = action_ptr[1];

        if (!is_absolute) {
            int64_t old = bitfield_extract(patch_ptr, bit_off, bit_wid);
            value += old;
            *output_value = old;
        }
        write_bitfield_inline(patch_ptr, value, bit_off, bit_wid);
        action_ptr += 4;
        if (action_ptr == sentinel) return 1;
        continue;
    }

    case 0xA: {                              // SEC_TYPE_LO
        uint32_t bit_off = action_ptr[0];
        uint32_t bit_wid = action_ptr[1];
        value = section_type_delta & (uint64_t)(255 >> (8 - bit_wid));

        if (!is_absolute)
            value += bitfield_extract(patch_ptr, bit_off, bit_wid);
        write_bitfield_inline(patch_ptr, value, bit_off, bit_wid);
        action_ptr += 4;
        if (action_ptr == sentinel) return 1;
        continue;
    }

    case 0xB: {                              // SEC_TYPE_HI
        uint32_t bit_off = action_ptr[0];
        uint32_t bit_wid = action_ptr[1];
        value = ((uint64_t)section_type_delta >> 4) & (255 >> (8 - bit_wid));

        if (!is_absolute)
            value += bitfield_extract(patch_ptr, bit_off, bit_wid);
        write_bitfield_inline(patch_ptr, value, bit_off, bit_wid);
        action_ptr += 4;
        if (action_ptr == sentinel) return 1;
        continue;
    }

    case 0x10: {                             // PC_REL
        uint32_t bit_off = action_ptr[0];
        uint32_t bit_wid = action_ptr[1];

        if (!is_absolute) {
            int64_t old = bitfield_extract(patch_ptr, bit_off, bit_wid);
            value += old;
            *output_value = old;
        }
        uint64_t pc_value = (int32_t)value - section_offset;
        write_bitfield_inline(patch_ptr, pc_value, bit_off, bit_wid);
        action_ptr += 4;
        if (action_ptr == sentinel) return 1;
        continue;
    }

    case 0x13: case 0x14: {                  // CLEAR
        uint32_t bit_off = action_ptr[0];
        uint32_t bit_wid = action_ptr[1];
        // Write all zeros -- no extraction, no value computation
        write_bitfield_zero(patch_ptr, bit_off, bit_wid);
        action_ptr += 4;
        if (action_ptr == sentinel) return 1;
        continue;
    }

    case 0x16: case 0x17: case 0x18: case 0x19:
    case 0x1A: case 0x1B: case 0x1C: case 0x1D:
    case 0x2F: case 0x30: case 0x31: case 0x32:
    case 0x33: case 0x34: case 0x35: case 0x36: {  // MASKED_SHIFT
        uint32_t idx = action_type - 22;
        uint32_t bit_off = action_ptr[0];
        uint32_t bit_wid = action_ptr[1];

        // Load shift table on demand
        uint32_t shift_table[8];
        _mm_store_si128(&shift_table[0], _mm_load_si128(&xmmword_1D3F920));
        _mm_store_si128(&shift_table[4], _mm_load_si128(&xmmword_1D3F930));

        // mask_table was pre-loaded at function entry
        uint64_t mask  = mask_table[idx];
        uint32_t shift = shift_table[idx];

        if (!is_absolute) {
            int64_t old = bitfield_extract(patch_ptr, bit_off, bit_wid);
            value += old;
            *output_value = old;
        }
        value = (mask & value) >> shift;
        write_bitfield_inline(patch_ptr, value, bit_off, bit_wid);
        action_ptr += 4;
        if (action_ptr == sentinel) return 1;
        continue;
    }

    default:
        return 0;  // unrecognized action type -> caller emits "unexpected NVRS"
    }
}

Complete Action Code Table

The following table enumerates every action code handled by the switch statement in sub_468760:

Code  Hex   Name             Value computation and notes
----  ----  ---------------  ------------------------------------------------------
0     0x00  END              None. Advance to the next slot; terminate at sentinel.
1     0x01  ABS_FULL         value (unchanged). Standard absolute write; fast path
                             when bit_offset == 0 && bit_width == 64 does a direct
                             64-bit word write, bypassing all bit-field logic.
6     0x06  ABS_LO           (uint32_t)value -- low 32 bits of the relocation value.
7     0x07  ABS_HI           (uint32_t)(value >> 32) -- high 32 bits of the value.
8     0x08  PC_REL_SIZE      extracted_old + symbol_size; when is_absolute:
                             extra_offset + symbol_size. Encodes a PC-relative
                             offset plus the symbol size.
9     0x09  SHIFTED_2        value >> 2 -- right-shift by 2 for 4-byte-aligned
                             (DWORD) addresses; converts byte offsets to DWORD
                             indices.
10    0x0A  SEC_TYPE_LO      section_type_delta & (255 >> (8 - bit_width)) -- low
                             bits of the section type offset, masked to bit_width.
11    0x0B  SEC_TYPE_HI      (section_type_delta >> 4) & (255 >> (8 - bit_width))
                             -- high bits, shifted right by 4 and then masked.
16    0x10  PC_REL           (int32_t)value - section_offset -- sign-extends value
                             to 32 bits and subtracts the section offset (a6).
18    0x12  ABS_FULL_ALT1    Same as code 1; shares an identical case body.
19    0x13  CLEAR            Zero -- clears the target bit field to all-zeros.
20    0x14  CLEAR_ALT        Zero; shares the case body of 0x13.
22    0x16  MASKED_SHIFT_0   (value & mask_table[0]) >> shift_table[0] -- index 0
23    0x17  MASKED_SHIFT_1   (value & mask_table[1]) >> shift_table[1] -- index 1
24    0x18  MASKED_SHIFT_2   (value & mask_table[2]) >> shift_table[2] -- index 2
25    0x19  MASKED_SHIFT_3   (value & mask_table[3]) >> shift_table[3] -- index 3
26    0x1A  MASKED_SHIFT_4   (value & mask_table[4]) >> shift_table[4] -- index 4
27    0x1B  MASKED_SHIFT_5   (value & mask_table[5]) >> shift_table[5] -- index 5
28    0x1C  MASKED_SHIFT_6   (value & mask_table[6]) >> shift_table[6] -- index 6
29    0x1D  MASKED_SHIFT_7   (value & mask_table[7]) >> shift_table[7] -- index 7
46    0x2E  ABS_FULL_ALT2    Same as code 1; second alternate ABS_FULL code.
47    0x2F  MASKED_SHIFT_25  (value & mask_table[25]) >> shift_table[25] -- index 25
48    0x30  MASKED_SHIFT_26  (value & mask_table[26]) >> shift_table[26] -- index 26
49    0x31  MASKED_SHIFT_27  (value & mask_table[27]) >> shift_table[27] -- index 27
50    0x32  MASKED_SHIFT_28  (value & mask_table[28]) >> shift_table[28] -- index 28
51    0x33  MASKED_SHIFT_29  (value & mask_table[29]) >> shift_table[29] -- index 29
52    0x34  MASKED_SHIFT_30  (value & mask_table[30]) >> shift_table[30] -- index 30
53    0x35  MASKED_SHIFT_31  (value & mask_table[31]) >> shift_table[31] -- index 31
54    0x36  MASKED_SHIFT_32  (value & mask_table[32]) >> shift_table[32] -- index 32
55    0x37  ABS_LO_ALT       Same as code 6; shares case body.
56    0x38  ABS_HI_ALT       Same as code 7; shares case body.

Action codes not in this table (0x02-0x05, 0x0C-0x0F, 0x11, 0x15, 0x1E-0x2D, 0x39+) fall to the default case and return 0 (failure).

The 3-Slot Action Model

Each 64-byte descriptor entry supports up to 3 active action slots (at offsets +12, +28, +44), bounded by a 4-byte sentinel at offset +60. This is a critical design choice: a single R_CUDA relocation type can perform up to 3 sequential bit-field modifications to the same instruction word. This enables complex multi-field relocations like:

  • HI/LO pairs within a single instruction: Action[0] writes the low 16 bits at one bit position, action[1] writes the high 16 bits at another position, all in one relocation record.
  • Section type encoding: Action[0] writes SEC_TYPE_LO, action[1] writes SEC_TYPE_HI to different bit positions.
  • Mixed clear-and-write: Action[0] clears one field, action[1] writes the value to another field.

The iteration logic:

// v15 starts at desc_entry + 12 (first action)
// v100 = desc_entry + 60 (sentinel)
// Each iteration: v15 += 4 (advance by 4 uint32s = 16 bytes)
// Loop terminates when v15 == v100 OR action_type == 0 (END)

Most CUDA relocation types use only action[0] with action[1].action_type == 0 (END). Multi-action types include certain 128-bit instruction relocations and the section-type encoding pairs.

Actions 0x16--0x36: Table-Driven Masked Shift

These 16 action codes share a single code path that uses two SSE-loaded lookup tables:

  • Mask table (v119[]): 8 uint64_t values (64 bytes total) from xmmword_1D3F8E0 through xmmword_1D3F910. These are AND-masks applied to the relocation value before shifting.
  • Shift table (v118[]): 8 uint32_t values (32 bytes total) from xmmword_1D3F920 and xmmword_1D3F930. These are right-shift amounts.

The index into both tables is action_type - 22. The computation:

uint32_t idx = action_type - 22;
uint64_t mask  = mask_table[idx];     // from v119
uint32_t shift = shift_table[idx];    // from v118
value = (value & mask) >> shift;

The 16 action codes map to table indices as follows:

Action code rangeDecimal rangeTable index range
0x16 -- 0x1D22 -- 290 -- 7
0x2F -- 0x3647 -- 5425 -- 32

The gap between indices 8 and 24 corresponds to the action codes 0x1E through 0x2E. Action code 0x2E is handled by the ABS_FULL case, and codes 0x1E through 0x2D fall to the default error path. The mask table has 8 uint64_t entries (indices 0--7) covering the first group; the second group (indices 25--32) accesses into the same 64-byte memory region at a higher offset, reading from the pre-loaded SSE vectors.

This table-driven approach eliminates the need for 16 separate switch cases, each with a hardcoded mask and shift. It supports extraction of arbitrary byte lanes, half-words, and sub-fields from the relocation value -- for example, extracting bits [16:24) of a 64-bit address and placing them into an 8-bit instruction field.

Action 1/0x12/0x2E: Fast Path for Full-Width Writes

The most common action type has an optimized fast path. When bit_offset == 0 and bit_width == 64, the engine bypasses all bit-field logic and writes value directly:

if (bit_offset == 0 && bit_width == 64) {
    if (!is_absolute) {
        *output_value = *patch_ptr;   // save old value for preserve-relocs
        value += *patch_ptr;          // S + A pattern
    }
    *patch_ptr = value;
    // advance and return
}

This handles 64-bit absolute relocations targeting full data pointers (e.g., R_CUDA_ABS64_0 patching a function pointer in a .nv.global data section). No masking, no shifting, just a direct load-add-store.

For narrower bit fields (the far more common case for instruction patching), the engine follows the general extraction/insertion path using the bit-field helpers.

Inlined Bit-Field Write Pattern

For all action types except the full-width fast path, the engine performs the bit-field write inline rather than always calling sub_4685B0. The pattern, visible in every case body, is:

// 1. Normalize bit_offset to find the target word
uint64_t* words = patch_ptr;
int local_offset = bit_offset;
if (bit_offset > 63) {
    local_offset = (bit_offset - 64) & 0x3F;
    words = &patch_ptr[((bit_offset - 64) >> 6) + 1];
}

// 2. Check if field spans multiple 64-bit words
int total_bits = local_offset + bit_width;
if (total_bits <= 64) {
    // Single-word case: inline read-modify-write
    uint64_t mask = (-1ULL << (64 - bit_width)) >> (64 - total_bits);
    *words = (*words & ~mask) | (value << (64 - bit_width) >> (64 - total_bits));
} else {
    // Multi-word case: loop through intermediate words using sub_4685B0
    int num_intermediate = ((total_bits - 65) >> 6);
    uint64_t* end_word = &words[num_intermediate + 1];
    int off = local_offset;
    do {
        sub_4685B0(words, value, off, 64 - off);
        value >>= (64 - off);
        off = 0;
        words++;
    } while (words != end_word);
    // Adjust total_bits and bit_width for final word
    total_bits = total_bits - (num_intermediate << 6) - 64;
    bit_width = total_bits;
    // Final word: inline read-modify-write as above
    uint64_t mask = (-1ULL << (64 - bit_width)) >> (64 - total_bits);
    *words = (*words & ~mask) | (value << (64 - bit_width) >> (64 - total_bits));
}

sub_4685B0 is called only for intermediate words in multi-word spans. The final word is always patched inline.
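The single-word branch of this pattern can be lifted into a standalone helper for testing. insert_single_word is an editorial name; the sketch assumes 1 <= bit_width and local_offset + bit_width <= 64:

```c
#include <stdint.h>

// Standalone version of the single-word read-modify-write above.
// Assumes 1 <= bit_width and local_offset + bit_width <= 64.
void insert_single_word(uint64_t *word, uint64_t value,
                        int local_offset, int bit_width) {
    int total_bits = local_offset + bit_width;
    uint64_t mask = (~0ULL << (64 - bit_width)) >> (64 - total_bits);
    *word = (*word & ~mask) | ((value << (64 - bit_width)) >> (64 - total_bits));
}
```

Note how the double shift on value truncates any bits above bit_width before placement, so callers need not pre-mask.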

Bit-Field Extractor: sub_468670

Signature

int64_t bitfield_extract(
    uint64_t*  words,       // a1: pointer to instruction data
    int        bit_offset,  // a2: starting bit position
    int        bit_width    // a3: number of bits to extract
);

Algorithm

int64_t bitfield_extract(uint64_t* words, int bit_offset, int bit_width) {
    // 1. Word normalization: advance pointer if offset >= 64
    if (bit_offset > 63) {
        uint32_t excess = bit_offset - 64;
        bit_offset = excess & 0x3F;       // reduce modulo 64
        words += (excess >> 6) + 1;       // advance by (excess/64 + 1) words
    }

    int total = bit_width + bit_offset;

    // 2. Single-word case
    if (total <= 64) {
        return *words << (64 - total) >> (64 - bit_width);
    }

    // 3. Multi-word case (recursive)
    int64_t low_part = bitfield_extract(words, bit_offset, 64 - bit_offset);

    int64_t mid_or_high;
    if (total - 64 > 64) {
        // Spans 3 words (very rare: field > 64 bits crossing two boundaries)
        mid_or_high = bitfield_extract(words + 1, 0, 64);
        bitfield_extract(words + 2, 0, total - 128);  // return value unused
    } else {
        // Spans 2 words (common: 128-bit instruction with field crossing boundary)
        mid_or_high = words[1] << (128 - total) >> (64 - (total - 64));
    }

    return low_part | (mid_or_high << (64 - bit_offset));
}

The single-word extraction formula *words << (64 - total) >> (64 - bit_width) works by:

  1. Left-shifting to push the bits above the field off the top of the 64-bit register
  2. Right-shifting to align the field to bit 0

The recursion depth is bounded at 2 (for the three-word case). In practice, GPU instructions are at most 128 bits wide, so the two-word path (field straddling one 64-bit boundary) is the only multi-word case encountered.

Bit-Field Writer: sub_4685B0

Signature

void bitfield_write(
    uint64_t*  words,       // a1: pointer to instruction data
    uint64_t   value,       // a2: value to write
    int        bit_offset,  // a3: starting bit position
    int        bit_width    // a4: number of bits to write
);

Algorithm

void bitfield_write(uint64_t* words, uint64_t value, int bit_offset, int bit_width) {
    // 1. Word normalization
    if (bit_offset > 63) {
        uint32_t excess = bit_offset - 64;
        bit_offset = excess & 0x3F;
        words += (excess >> 6) + 1;
    }

    uint32_t total = bit_width + bit_offset;

    // 2. Multi-word loop (intermediate words)
    if (total > 64) {
        uint64_t* end_word = &words[((total - 65) >> 6) + 1];
        do {
            // Clear bits above bit_offset, OR in value shifted to position
            uint64_t keep_mask = ~(-1LL << bit_offset);
            *words = (*words & keep_mask) | (value << bit_offset);
            value >>= (64 - bit_offset);
            bit_offset = 0;
            words++;
        } while (words != end_word);
        // Adjust for remaining bits
        total = total - (((total - 65) >> 6) << 6) - 64;
        bit_width = total;
    }

    // 3. Final (or only) word: masked read-modify-write
    uint64_t mask = (-1ULL << (64 - bit_width)) >> (64 - total);
    *words = (*words & ~mask) | (value << (64 - bit_width) >> (64 - total));
}

Mask Construction Detail

The mask formula (-1ULL << (64 - W)) >> (64 - (O + W)) constructs a window of W ones starting at bit position O:

Step 1: (-1ULL << (64 - W))   -> W ones in the top bits
        Example (W=8): 0xFF00_0000_0000_0000

Step 2: >> (64 - (O + W))     -> shift the window down to start at bit O
        Example (O=20, W=8):   0x0000_0000_0FF0_0000
        (8 ones at positions 20..27)

The value insertion (value << (64 - W)) >> (64 - (O + W)) performs the same alignment:

Step 1: (value << (64 - W))   -> left-align value in the register
Step 2: >> (64 - (O + W))     -> shift down to the target position

This two-step approach ensures that only the lower W bits of value are placed, and they land exactly at the position defined by the mask.
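Both formulas can be verified numerically with a pair of one-line helpers (field_mask and field_place are editorial names):

```c
#include <stdint.h>

// Window of W ones starting at bit O, per the writer's mask formula.
uint64_t field_mask(int O, int W) {
    return (~0ULL << (64 - W)) >> (64 - (O + W));
}

// Low W bits of value aligned to bit O by the same two shifts.
uint64_t field_place(uint64_t value, int O, int W) {
    return (value << (64 - W)) >> (64 - (O + W));
}
```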

.nv.rel.action Section Emission

At the end of sub_469D60 (lines 884--985 in the decompiled output), after all relocations have been applied, the function generates a .nv.rel.action section for non-Mercury relocatable links (link_type == 2). This section encodes the relocation descriptor actions in a compact format that downstream tools can use to re-apply relocations.

The emission logic:

if (link_type == 2 && !mercury_flag) {
    int max_type = sub_42F690();        // get max relocation type for this arch
    if (max_type != 116) {              // emit only when below the 116-type maximum
        // Create header record: 8-byte entry containing the starting type index
        header = sub_4307C0(section, 8);
        header->type_index = max_type;
        sub_4644C0(header, ctx + 480);

        // Create action records: 8 bytes per (116 - max_type) entries
        int count = 116 - max_type;
        action_buf = sub_4307C0(section, 8 * count);
        memset(action_buf, 0, 8 * count);
        sub_4644C0(action_buf, ctx + 480);

        // Emit section
        section_idx = sub_441AC0(ctx, ".nv.rel.action", 0x7000000B, 0, 0, 0, 8, 8);
        sub_4343C0(ctx, section_idx, 0, header, 0, 8, 8);

        // Iterate descriptor table entries, convert to compact format
        uint8_t* out = action_buf;
        uint32_t* desc = (uint32_t*)(max_type * 64 + 30661612);  // off_1D3DBE0 + 12: first action slot of entry max_type
        int skipped = 0;

        while (true) {
            int action_type = desc[2];      // action_type at +8 in action slot
            if (action_type == 0) {
                skipped++;
                goto next;
            }
            if (action_type == 21 || action_type == 9) {
                out[0] = 0;
                out[1] = 2 * (action_type == 9);
    } else if (action_type == 1 || (unsigned)(action_type - 22) <= 7) {
        out[0] = (action_type == 1) ? 0 : (action_type - 22);
        out[1] = 0;                 // no shift flag
    } else if ((unsigned)(action_type - 30) <= 7) {
        out[0] = 1;                 // group 1
        out[1] = 0;
    } else if ((unsigned)(action_type - 38) <= 7) {
        out[0] = 2;                 // group 2
        out[1] = 0;
    } else if ((unsigned)(action_type - 46) <= 10) {
        out[0] = 9;                 // group 9
        out[1] = 0;
            } else if (action_type == 3) {
                out[0] = 3;
                out[1] = 0;
            } else {
                out[0] = 0;
                out[1] = 0;
            }
            // Copy remaining descriptor fields
            out[2] = desc[3];               // reserved field
            out[3] = desc[1];               // bit_width
            out[4] = desc[0];               // bit_offset
            out[5] = desc[7];               // action[1].reserved
            out[6] = desc[5];               // action[1].action_type
            out[7] = desc[4];               // action[1].bit_offset
        next:
            out += 8;
            desc += 16;                     // advance by 64 bytes (16 uint32s)
            if (out == (uint8_t*)action_buf + 8 * count)   // end of action buffer
                break;
        }
        // Write compacted data, excluding skipped entries
        sub_4343C0(ctx, section_idx, 0, action_buf, 8, 8, 8 * (count - skipped));
    }
}

The .nv.rel.action section uses sh_type = 0x7000000B (SHT_CUDA_RELOCINFO). It provides a compact 8-byte-per-type representation of the relocation actions, allowing the CUDA runtime or downstream linker to interpret and re-apply relocations without access to the full 64-byte descriptor table.
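The byte assignments above suggest the following record layout. The struct and its field names are editorial inferences from the pointer arithmetic, not definitions from the binary:

```c
#include <stdint.h>

// Hypothetical layout of one 8-byte .nv.rel.action record, inferred
// from the out[0..7] assignments above; field names are editorial.
typedef struct {
    uint8_t action_group;     // out[0]: compacted action code / group
    uint8_t shift_flag;       // out[1]: 2 when action_type == 9, else 0
    uint8_t reserved;         // out[2]: desc[3]
    uint8_t bit_width;        // out[3]: desc[1]
    uint8_t bit_offset;       // out[4]: desc[0]
    uint8_t action2_reserved; // out[5]: action[1].reserved
    uint8_t action2_type;     // out[6]: action[1].action_type
    uint8_t action2_offset;   // out[7]: action[1].bit_offset
} nv_rel_action_record;
```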

Resolved-Rela Emitter: sub_46ADC0

Signature

void emit_resolved_relocations(
    void*  linker_ctx,   // a1: linker context
    void*  a2,           // a2: unused / mutex attrs
    void*  a3,           // a3: passed through to sub_442270
    int    a4,           // a4: passed through
    int    a5,           // a5: passed through
    int    a6            // a6: passed through
);

Overview

This function walks two linked lists and writes relocation records into output .nv.resolvedrela sections. It is called from the finalization phase when --preserve-relocs is active (byte at ctx+85 nonzero), producing relocations that a downstream linker or the CUDA runtime can re-apply at load time.

Primary List: ctx+376

The first loop walks the relocation list at ctx+376, processing entries that were applied during the relocation phase but retained for output. For each entry:

  1. ELF class check: Reads ctx+4 (ELF class byte) and ctx+16 (link type). Class 1 selects 12-byte REL-style output records; class 2 selects 24-byte RELA-style records (see step 9).

  2. Symbol addend resolution: If the entry's symbol addend index (at record offset +28) is nonzero, calls sub_444720 to remap the symbol index and sub_440590 to look up the symbol record. Validates that the resolved value at symbol offset +8 is not -1 (fatal: "symbol never allocated"). Adds the resolved value to the record's addend.

  3. Section lookup: Calls sub_442270 for the target section and its parent section (at section offset +44).

  4. Offset validation: The parent section's data size (at offset +32) must be nonzero and must exceed the relocation's target offset (fatal: "relocation is past end of offset").

  5. Descriptor-driven bit-field extraction: When ctx+89 is set and the section type is not 4, the function selects the appropriate descriptor table and performs up to three rounds of bit-field extraction using sub_468670. For each of three field specifications at descriptor offsets (+3,+4,+5), (+7,+8,+9), and (+11,+12,+13) in uint32 units: if the "present" flag is nonzero, extracts a bit field and accumulates it into the record's extra addend at offset +16.

  6. Section data location: Same chunk-list walk as the main engine. The section record at offset +72 holds a linked list of data chunks. On failure: "reloc address not found".

  7. Symbol index remapping: Calls sub_444720 to remap the relocation's symbol index from internal to output ELF .symtab numbering.

  8. Rela section creation: Calls sub_442760 to find or create the .rela output section. On failure: "rela section never allocated".

  9. Output record writing: For RELA (class == 2): 24-byte record {offset, info, addend}. For REL (class != 2): packs info = (sym_index << 8) + (type & 0xFF) and writes 12 bytes.
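The two output encodings in step 9 can be sketched as follows. rela64 and pack_rel32_info are editorial names, and only the 12-byte info packing is stated explicitly by the decompiled code; the 24-byte layout assumes conventional struct packing:

```c
#include <stdint.h>

// 24-byte RELA-style record written when class == 2 (editorial layout).
typedef struct {
    uint64_t r_offset;
    uint64_t r_info;
    int64_t  r_addend;
} rela64;

// info packing for the 12-byte REL-style path (class != 2),
// as described in step 9: (sym_index << 8) + (type & 0xFF).
uint32_t pack_rel32_info(uint32_t sym_index, uint32_t type) {
    return (sym_index << 8) + (type & 0xFF);
}
```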

Secondary List: ctx+384

A second loop processes ctx+384. Selection criteria:

  • Parent section has nonzero data size
  • Architecture flag check passes
  • Symbol type == 13 (STT_CUDA_TEXTURE/SURFACE-related) and (binding field & 0xE0) == 0x40

For qualifying entries, the function constructs the output section name by prepending ".nv.resolvedrela" to the parent section name, calls sub_4411D0 to find or create that section, and writes the relocation record. Section names are cached: if consecutive relocations target the same section, the lookup is skipped.

Error Conditions

| Error string | Severity | Source function | Condition |
|---|---|---|---|
| "unexpected reloc" | Fatal | sub_469D60 | Relocation type nonzero but <= 0x10000 in Mercury mode |
| "reloc address not found" | Fatal | sub_469D60, sub_46ADC0 | Target offset not in any section data chunk |
| "unexpected NVRS" | Fatal | sub_469D60 | sub_468760 returned 0 (unrecognized action type) |
| "PC relative branch address should be in the same section" | Fatal | sub_469D60 | PC-relative relocation crosses section boundary |
| "symbol never allocated" | Fatal | sub_46ADC0 | Symbol value is -1 during resolved-rela emission |
| "relocation is past end of offset" | Fatal | sub_46ADC0 | Relocation offset exceeds section data size |
| "rela section never allocated" | Fatal | sub_46ADC0 | Could not find or create .nv.resolvedrela section |

Diagnostic Traces

All traces are gated by (ctx->verbose_flags & 4) != 0 (bit 2 of the debug flags at ctx+64):

| Trace string | When emitted |
|---|---|
| "resolve reloc %d for sym=%d+%lld at <section=%d,offset=%llx>\n" | Per-relocation resolution trace |
| "change alias reloc %s to %s\n" | Weak alias chain followed to canonical symbol |
| "ignore reloc on dead func %s\n" | Relocation dropped because target function is dead |
| "replace unified reloc %d with %d\n" | Unified table relocation type remapped to base |
| "ignore reloc on UFT_OFFSET\n" | UFT_OFFSET relocation dropped when UDT mode inactive |
| "Ignoring the reloc to convert YIELD to NOP due to forward progress requirement.\n" | YIELD conversion suppressed |

Function Map

| Address | Size | Identity | Role |
|---|---|---|---|
| 0x469D60 | 26,578 B | apply_relocations | Main relocation phase; iterates linked list, calls engine |
| 0x468760 | 14,322 B | reloc_apply_engine | Descriptor-driven action dispatcher; 30 action codes |
| 0x468670 | ~240 B | bitfield_extract | Extracts arbitrary bit field from instruction word(s) |
| 0x4685B0 | ~240 B | bitfield_write | Writes value into arbitrary bit field in instruction word(s) |
| 0x46ADC0 | 11,515 B | emit_resolved_rela | Writes .nv.resolvedrela sections for preserve-relocs |
| 0x445000 | 55,681 B | finalize_elf | Finalization phase; second relocation pass using vtable |
| 0x459640 | 16,109 B | reloc_vtable_create | Per-arch relocation handler vtable (used by finalization) |
| 0x42F690 | ~256 B | reloc_max_type | Returns maximum relocation type for current architecture |
| 0x42F6C0 | ~512 B | reloc_validate | Validates relocation type and selects descriptor table |
| 0x444720 | ~2 KB | sym_remap_index | Remaps symbol index for output ELF numbering |
| 0x440590 | ~2 KB | sym_idx_to_record | Symbol index to record pointer accessor |
| 0x440350 | ~2 KB | sym_get_section | Gets section index containing a symbol |
| 0x442270 | ~2 KB | sec_idx_to_record | Section index to record pointer accessor |
| 0x442760 | ~2 KB | sec_find_or_create_rela | Finds or creates .rela section for target |
| 0x4336B0 | ~2 KB | section_write_data | Writes bytes into section data buffer |
| 0x4411D0 | ~2 KB | section_find_by_name | Finds section by name string |
| 0x441AC0 | ~2 KB | section_create | Creates a new section with given attributes |
| 0x4343C0 | ~2 KB | section_append_data | Appends data to section buffer at offset |
| 0x463660 | ~2 KB | uft_get_offset | UFT/UDT offset resolver |
| 0x4644C0 | ~1 KB | list_append | Appends node to singly-linked list |
| 0x431000 | ~1 KB | arena_free | Frees an arena-allocated block |
| 0x467460 | ~2 KB | error_emit | Variadic error emission |

Cross-References

  • Relocation Phase -- Pipeline context: the 10-step resolution algorithm, alias chains, dead function filtering, unified remapping, and 3 worked examples showing complete before/after hex dumps
  • Finalization Phase -- Second relocation pass using per-arch vtable dispatch
  • R_CUDA Relocations -- CUDA-specific relocation type catalog with all 119 type names and their descriptor entries
  • Unified Function Tables -- UFT/UDT structures referenced by unified relocations
  • Symbol Resolution -- How symbols are resolved before relocation
  • Bindless Relocations -- Bindless texture/surface relocation handling using the .nv.rel.action section

Confidence Assessment

| Claim | Confidence | Evidence |
|---|---|---|
| sub_468760 at 0x468760, 10-parameter signature | HIGH | Decompiled sub_468760_0x468760.c confirms address and 10-parameter function signature |
| sub_469D60 at 0x469D60, 26,578 bytes | HIGH | Decompiled sub_469D60_0x469d60.c confirms address, 985 lines |
| sub_468670 (bitfield_extract) at 0x468670 with recursive multi-word support | HIGH | Decompiled sub_468670_0x468670.c: recursive calls at lines 24, 29-30; shift pattern << (64 - v5) >> (64 - a3) confirmed |
| sub_4685B0 (bitfield_write) uses mask (-1ULL << (64-W)) >> (64-(O+W)) | HIGH | Decompiled sub_4685B0_0x4685b0.c line 35-36: exact mask formula confirmed |
| Relocation record loaded via _mm_loadu_si128 (2x 128-bit) | HIGH | Decompiled sub_469D60 lines 236-237: v162 = _mm_loadu_si128(v5), v163 = _mm_loadu_si128(v5 + 1) |
| Descriptor table indexed by type_index << 6 (64 bytes per entry) | HIGH | Decompiled sub_468760 line 126: v12 = a1 + ((unsigned __int64)a2 << 6) |
| Action pointer starts at descriptor+12, sentinel at +60 | HIGH | Decompiled sub_468760 lines 130-132: v15 = (unsigned int *)(v12 + 12), v100 = (unsigned int *)(v12 + 60) |
| Each action is 16 bytes (4 x uint32), advance by v15 += 4 | HIGH | All case bodies in sub_468760 advance with v15 += 4 |
| Fast path: bit_offset==0 && bit_width==64 for direct word write | HIGH | Decompiled line 145: `if ( *v15 |
| CUDA table at off_1D3DBE0, Mercury table at off_1D3CBE0 | HIGH | Decompiled sub_469D60 lines 202, 214: both addresses confirmed |
| Mercury type normalization: type -= 0x10000 | HIGH | Decompiled sub_469D60 line 203: v148 = v9 - 0x10000 |
| 30 distinct action codes in switch statement | HIGH | Exhaustive enumeration from decompiled sub_468760 switch cases |
| Masked-shift tables at xmmword_1D3F8E0--xmmword_1D3F930 | HIGH | Decompiled sub_468760 lines 123-131, 518-526: all six SSE vector addresses confirmed |
| .nv.rel.action section with sh_type 0x7000000B | HIGH | Decompiled sub_469D60 line 913: sub_441AC0(v2, ".nv.rel.action", 1879048203, ...) where 1879048203 = 0x7000000B |
| Chunk-list walk for section data at section record offset +72 | HIGH | Decompiled sub_469D60 line 522: v107 = *(_QWORD **)(v53 + 72) |
| "unexpected NVRS" on engine failure | HIGH | Decompiled sub_469D60 line 436: sub_467460(dword_2A5B990, "unexpected NVRS") |
| "reloc address not found" error | HIGH | Decompiled sub_469D60 line 547: sub_467460(dword_2A5B990, "reloc address not found") |
| "PC relative branch address should be in the same section" | HIGH | Decompiled sub_469D60 line 410 |
| Preserve-relocs list at ctx+384 | MEDIUM | Decompiled sub_469D60 line 454: sub_4644C0((__int64)v5, (pthread_mutexattr_t *)(v2 + 384)) |
| Resolved-rela emitter dual-list processing (ctx+376, ctx+384) | MEDIUM | Inferred from sub_46ADC0 decompiled analysis; offsets consistent with linker context |
| Action type names (ABS_FULL, ABS_LO, etc.) | MEDIUM | Names are editorial labels based on observed behavior; the binary does not contain symbolic names for action codes |
| .nv.rel.action compact format (8 bytes per type) | MEDIUM | Reconstructed from decompiled sub_469D60 lines 913--979; field assignments inferred from pointer arithmetic |