Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

.nv.info Metadata

The .nv.info section is NVIDIA's proprietary ELF metadata format that encodes per-kernel resource requirements and compilation artifacts. Every CUDA kernel carries a .nv.info section (or a per-function variant) that tells the GPU driver how many registers to allocate, how much shared memory to reserve, what barriers the kernel uses, and dozens of other resource descriptors. Without this metadata, the driver cannot launch the kernel -- it would have no way to know the kernel's hardware resource footprint.

nvlink both reads and writes .nv.info sections. During the merge phase it parses incoming .nv.info records to extract register counts for weak symbol resolution. During finalization it encodes computed properties (propagated register counts, barrier counts, stack sizes) back into .nv.info records in the output cubin. The embedded ptxas compiler produces .nv.info through a massive emission subsystem spanning ~190 functions across 1 MB of code.

Key Facts

PropertyValue
ELF section typeSHT_CUDA_INFO = 0x70000000 (1,879,048,192)
Section name (global).nv.info
Section name (per-function).nv.info.<function_name>
Record formatType-Length-Value (TLV), 4-byte aligned
Known attribute count97 EIATTR codes: 0--96 (v13.0.88)
Attribute validationsub_42F760: rejects codes > 0x60 (96) as "unknown attribute"
Section creationsub_4504B0 (global: line 46, per-function: line 63)
Record node creation (indexed)sub_450B70 (create_eiattr_indexed_record)
Record node creation (any format)sub_4508F0 (create_eiattr_node)
Encoder functionsub_468760 (nvinfo_encode, 14,322 bytes)
Master emission functionsub_15C58F0 (78,811 bytes -- largest nv.info emitter)
Emission function count~190 functions at 0x15CF070--0x160FFFF
Name table97 x 16-byte entries at VA 0x1D37D60 (8-byte string ptr + 8-byte metadata)
Diagnostic format string"nvinfo <fmt=%d,attr=%d,size=%d>, secidx=%d"
Validation error"Invalid section type in .nv.info section header"
Global update error"error while updating global nvinfo section"

Section Variants

A cubin contains two kinds of .nv.info sections, distinguished by name:

Global .nv.info

A single section named .nv.info with sh_link = 0 (no associated symbol). This contains attributes that apply to the entire compilation unit -- CUDA API version, compatibility flags, and shared metadata that is not specific to any one kernel.

sub_4504B0 creates this section when called with a2 = 0 (no symbol index). It calls sub_4411D0 to find an existing .nv.info section; if none exists, it creates one via sub_441AC0 with type 0x70000000, alignment 4, and sh_flags = 0.

Per-Function .nv.info.<name>

One section per kernel or device function, named .nv.info.<function_name> with sh_link pointing to the symbol table entry for that function. These sections carry per-kernel resource descriptors: register count, barrier count, stack sizes, parameter bank layout, and instruction-offset tables for various runtime patching needs.

sub_4504B0 creates per-function sections when called with a2 != 0. It constructs the name via sprintf(buf, "%s.%s", ".nv.info", func_name) where func_name is looked up from the symbol at offset +32. The section is created with sh_flags = 0x40 and sh_link pointing to the owning symbol. After creation, sub_4426D0 links the section to the function's symbol entry.

During the merge phase (sub_45E7D0), nvlink identifies .nv.info sections by checking sh_type == 0x70000000. The sh_link field determines whether a record is global (link=0) or per-function (link = symbol index). The merge function translates symbol indices from input-local to output-global using its mapping tables.

TLV Record Format

Each .nv.info section contains a flat sequence of 4-byte-aligned TLV (Type-Length-Value) records. There is no section header or record count -- the parser walks from byte 0 to sh_size, consuming records sequentially.

On-Disk Record Layout

Offset  Size  Field
------  ----  -----
0x00    1     format      Format byte (determines payload structure)
0x01    1     attr_code   EIATTR type code (identifies the attribute)
0x02    2     size        Payload size in bytes (little-endian uint16)
0x04    var   payload     Attribute-specific data (size bytes)

Total record size = 4 + size, padded to 4-byte alignment. The maximum theoretical payload is 65,535 bytes (16-bit size field), though in practice payloads rarely exceed a few hundred bytes.

In-Memory Node Layout (Linker Internal)

nvlink stores parsed TLV records as 16-byte linked-list nodes in an arena-allocated chain at elfw+392. Each node has:

Offset  Size  Field
------  ----  -----
0x00    1     format       Format byte (0x01--0x04)
0x01    1     attr_code    EIATTR type code
0x02    2     size         Payload size in bytes
0x04    4     secidx       Output section index (from sub_4504B0)
0x08    8     payload_ptr  Pointer to arena-allocated payload data

Nodes are linked via a separate linked-list structure managed by sub_4644C0 (list-append). The diagnostic function at sub_4478F0 iterates this list and prints each record as: "nvinfo <fmt=%d,attr=%d,size=%d>, secidx=%d" using fields at bytes 0, 1, 2, and 4 respectively.

Format Byte

The format byte at offset 0 controls how the payload is interpreted:

FormatNamePayload structureUsage
0x01Free formatRaw bytes, attribute-specific layoutInstruction offset tables, parameter info, WAR patches
0x02Value formatSingle 32-bit value (no symbol index)Module-wide flags (HAS_PRE_V10_OBJECT, NUM_BARRIERS in some paths)
0x03Sized format16-bit value in the size fieldMAXREG_COUNT, CBANK_PARAM_SIZE, MERCURY_ISA_VERSION
0x04Indexed format[sym_index:4] [value:4] -- per-symbol attributeMost per-function resource attributes

Format 0x04 (indexed) is the most common for per-function attributes. The 4-byte symbol index at payload offset 0 identifies which function the attribute applies to. The linker uses this index for symbol remapping during merge and for extracting per-function properties during finalization.

Format 0x03 (sized) encodes the value directly in the 16-bit size field of the TLV header, with no additional payload bytes. This is used for small integer attributes like MAXREG_COUNT and CBANK_PARAM_SIZE.

Format 0x01 (free) carries variable-length data. The size field gives the byte count. Offset tables use arrays of 4-byte instruction offsets; KPARAM_INFO uses structured records; EXTERNS uses arrays of symbol indices.

Parsing Pseudocode

From the decompiled merge path in sub_45E7D0 (lines 1900--2052):

uint8_t *ptr = section_data;
uint8_t *end = section_data + section_size;

while (ptr < end) {
    uint8_t  format    = ptr[0];
    uint8_t  attr_code = ptr[1];
    uint16_t size      = *(uint16_t *)(ptr + 2);

    if (format == 0x04) {
        /* Indexed format: first 4 bytes of payload = symbol index */
        uint32_t *payload = arena_alloc_copy(ptr + 4, size);

        /* Symbol index remapping per attr_code */
        switch (attr_code) {
        case 2: case 6: case 7: case 8: case 9:      /* texture/image */
        case 18: case 19: case 20: case 23: case 38:  /* param/cache */
        case 69:                                       /* kparam_info_v2 */
            payload[0] = sym_map[payload[0]];          /* remap first sym_idx */
            break;
        case 10:  /* PARAM_CBANK -- remap with lazy symbol creation */
            payload[0] = sym_map_or_create(payload[0]);
            break;
        case 15:  /* EXTERNS -- remap all uint32 entries */
            for (i = 0; i < size/4; i++)
                payload[i] = sym_map[payload[i]];
            break;
        case 17: case 35: case 47: case 59:  /* resource attrs */
            /* Check if symbol was deleted (weak-replaced) */
            if (is_deleted[payload[0]]) { ptr += 4 + size; continue; }
            payload[0] = sym_map[payload[0]];
            break;
        case 55:  /* CUDA_API_VERSION -- version compatibility check */
            validate_cuda_api_version(payload[0]);
            break;
        case 79:  /* AT_ENTRY_FRAGMENTS -- fragment type analysis */
            analyze_fragment_types(payload, size/4);
            break;
        default:
            break;
        }

        node = create_nvinfo_node(format, attr_code, nv_info_secidx);
        node->payload = payload;
        node->size = size;
    } else {
        /* Non-indexed: just create node with header, no payload copy */
        node = create_nvinfo_node(format, attr_code, nv_info_secidx);
        node->size = size;
        ptr += 4;  /* advance past header only, size already in node */
    }
    ptr = ptr + 4 + ALIGN_UP(size, 4);
}

EIATTR Attribute Catalog

nvlink v13.0.88 defines 97 EIATTR (ELF Info ATTRibute) codes, numbered 0 through 96. The name-to-code mapping was extracted directly from the pointer table at VA 0x1D37D60 in the nvlink binary (16-byte entries: 8-byte string pointer + 8-byte metadata, indexed by code). The string names reside at 0x1D36819--0x1D37170. Codes were verified against cubin TLV records produced by ptxas/nvcc v13.1 and cross-checked against the compute_entry_properties (sub_451D80) dispatch table. The validation function sub_42F760 confirms 97 codes by rejecting any attr_code > 0x60.

Name Table Structure

The EIATTR name table is a contiguous array of 97 x 16-byte entries at VA 0x1D37D60. Each entry:

Offset  Size  Field
------  ----  -----
0x00    8     name_ptr          Pointer to null-terminated string (e.g., "EIATTR_REGCOUNT")
0x08    4     min_toolkit_ver   Minimum CUDA toolkit version that supports this attribute
0x0C    4     usage_policy      0=warn, 1=error, 2=silently drop when version too old

The table is indexed by EIATTR code number. sub_42F760 reads dword_1D37D68[4 * attr_code] (the min_toolkit_ver field at offset +8) to check whether the current toolkit version (elfw+624) supports the attribute. If the toolkit version is too old, the usage_policy determines whether to emit a warning, error, or silently omit the record.

Complete Code Table (Sequential)

All 97 codes in numeric order. Use this as the authoritative reference when parsing .nv.info TLV records.

CodeHexNameFormatType
00x00EIATTR_ERROR--Sentinel
10x01EIATTR_PAD--Sentinel
20x02EIATTR_IMAGE_SLOTIndexedTexture
30x03EIATTR_JUMPTABLE_RELOCSFreeMetadata
40x04EIATTR_CTAIDZ_USEDIndexedMetadata
50x05EIATTR_MAX_THREADSIndexedResource
60x06EIATTR_IMAGE_OFFSETIndexedTexture
70x07EIATTR_IMAGE_SIZEIndexedTexture
80x08EIATTR_TEXTURE_NORMALIZEDIndexedTexture
90x09EIATTR_SAMPLER_INITIndexedTexture
100x0AEIATTR_PARAM_CBANKIndexedParam
110x0BEIATTR_SMEM_PARAM_OFFSETSFreeParam
120x0CEIATTR_CBANK_PARAM_OFFSETSFreeParam
130x0DEIATTR_SYNC_STACKIndexedMetadata
140x0EEIATTR_TEXID_SAMPID_MAPFreeTexture
150x0FEIATTR_EXTERNSFreeMetadata
160x10EIATTR_REQNTIDIndexedResource
170x11EIATTR_FRAME_SIZEIndexedResource
180x12EIATTR_MIN_STACK_SIZEIndexedResource
190x13EIATTR_SAMPLER_FORCE_UNNORMALIZEDIndexedTexture
200x14EIATTR_BINDLESS_IMAGE_OFFSETSFreeTexture
210x15EIATTR_BINDLESS_TEXTURE_BANKIndexedTexture
220x16EIATTR_BINDLESS_SURFACE_BANKIndexedTexture
230x17EIATTR_KPARAM_INFOFreeParam
240x18EIATTR_SMEM_PARAM_SIZEIndexedParam
250x19EIATTR_CBANK_PARAM_SIZESizedParam
260x1AEIATTR_QUERY_NUMATTRIBIndexedMetadata
270x1BEIATTR_MAXREG_COUNTSizedResource
280x1CEIATTR_EXIT_INSTR_OFFSETSFreeOffsets
290x1DEIATTR_S2RCTAID_INSTR_OFFSETSFreeOffsets
300x1EEIATTR_CRS_STACK_SIZEIndexedResource
310x1FEIATTR_NEED_CNP_WRAPPERIndexedMetadata
320x20EIATTR_NEED_CNP_PATCHIndexedMetadata
330x21EIATTR_EXPLICIT_CACHINGIndexedMetadata
340x22EIATTR_ISTYPEP_USEDIndexedMetadata
350x23EIATTR_MAX_STACK_SIZEIndexedResource
360x24EIATTR_SUQ_USEDIndexedMetadata
370x25EIATTR_LD_CACHEMOD_INSTR_OFFSETSFreeOffsets
380x26EIATTR_LOAD_CACHE_REQUESTIndexedMetadata
390x27EIATTR_ATOM_SYS_INSTR_OFFSETSFreeOffsets
400x28EIATTR_COOP_GROUP_INSTR_OFFSETSFreeOffsets
410x29EIATTR_COOP_GROUP_MASK_REGIDSIndexedCluster
420x2AEIATTR_SW1850030_WARFreeWAR
430x2BEIATTR_WMMA_USEDIndexedMetadata
440x2CEIATTR_HAS_PRE_V10_OBJECTValueMetadata
450x2DEIATTR_ATOMF16_EMUL_INSTR_OFFSETSFreeOffsets
460x2EEIATTR_ATOM16_EMUL_INSTR_REG_MAPFreeOffsets
470x2FEIATTR_REGCOUNTIndexedResource
480x30EIATTR_SW2393858_WARFreeWAR
490x31EIATTR_INT_WARP_WIDE_INSTR_OFFSETSFreeOffsets
500x32EIATTR_SHARED_SCRATCHIndexedShared
510x33EIATTR_STATISTICSFreeMetadata
520x34EIATTR_INDIRECT_BRANCH_TARGETSFreeOffsets
530x35EIATTR_SW2861232_WARFreeWAR
540x36EIATTR_SW_WARFreeWAR
550x37EIATTR_CUDA_API_VERSIONIndexedMetadata
560x38EIATTR_NUM_MBARRIERSIndexedResource
570x39EIATTR_MBARRIER_INSTR_OFFSETSFreeOffsets
580x3AEIATTR_COROUTINE_RESUME_OFFSETSFreeOffsets
590x3BEIATTR_SAM_REGION_STACK_SIZEIndexedResource
600x3CEIATTR_PER_REG_TARGET_PERF_STATSFreeMetadata
610x3DEIATTR_CTA_PER_CLUSTERIndexedCluster
620x3EEIATTR_EXPLICIT_CLUSTERIndexedCluster
630x3FEIATTR_MAX_CLUSTER_RANKIndexedCluster
640x40EIATTR_INSTR_REG_MAPFreeMetadata
650x41EIATTR_RESERVED_SMEM_USEDIndexedShared
660x42EIATTR_RESERVED_SMEM_0_SIZEIndexedShared
670x43EIATTR_UCODE_SECTION_DATAFreeMetadata
680x44EIATTR_UNUSED_LOAD_BYTE_OFFSETFreeOffsets
690x45EIATTR_KPARAM_INFO_V2FreeParam
700x46EIATTR_SYSCALL_OFFSETSFreeOffsets
710x47EIATTR_SW_WAR_MEMBAR_SYS_INSTR_OFFSETSFreeWAR
720x48EIATTR_GRAPHICS_GLOBAL_CBANKIndexedGraphics
730x49EIATTR_SHADER_TYPEIndexedGraphics
740x4AEIATTR_VRC_CTA_INIT_COUNTIndexedGraphics
750x4BEIATTR_TOOLS_PATCH_FUNCIndexedMetadata
760x4CEIATTR_NUM_BARRIERSIndexedResource
770x4DEIATTR_TEXMODE_INDEPENDENTIndexedTexture
780x4EEIATTR_PERF_STATISTICSFreeMetadata
790x4FEIATTR_AT_ENTRY_FRAGEMENTSFreeBlackwell
800x50EIATTR_SPARSE_MMA_MASKFreeBlackwell
810x51EIATTR_TCGEN05_1CTA_USEDIndexedBlackwell
820x52EIATTR_TCGEN05_2CTA_USEDIndexedBlackwell
830x53EIATTR_GEN_ERRBAR_AT_EXITIndexedBlackwell
840x54EIATTR_REG_RECONFIGIndexedBlackwell
850x55EIATTR_ANNOTATIONSFreeMetadata
860x56EIATTR_UNKNOWN--Sentinel
870x57EIATTR_STACK_CANARY_TRAP_OFFSETSFreeOffsets
880x58EIATTR_STUB_FUNCTION_KINDIndexedMetadata
890x59EIATTR_LOCAL_CTA_ASYNC_STORE_OFFSETSFreeOffsets
900x5AEIATTR_MERCURY_FINALIZER_OPTIONSFreeMercury
910x5BEIATTR_BLOCKS_ARE_CLUSTERSIndexedCluster
920x5CEIATTR_SANITIZEIndexedBlackwell
930x5DEIATTR_SYSCALLS_FALLBACKFreeMetadata
940x5EEIATTR_CUDA_REQFreeMetadata
950x5FEIATTR_MERCURY_ISA_VERSIONSizedMercury
960x60EIATTR_ERROR_LAST--Sentinel

Most Important EIATTR Entries

This section provides detailed format information for the EIATTR entries that are most critical to GPU kernel launch, driver resource allocation, and the linker's own processing.

EIATTR_REGCOUNT (0x2F) -- Register Count

The single most important occupancy-determining attribute. The GPU driver computes max_warps_per_SM = total_registers / (regcount * warp_size) to determine how many warps can execute concurrently.

Format: 0x04 (Indexed)
On-disk layout (12 bytes total):
  Byte 0:     0x04    (indexed format)
  Byte 1:     0x2F    (EIATTR_REGCOUNT)
  Bytes 2-3:  0x0008  (size = 8 bytes)
  Bytes 4-7:  sym_idx (uint32, function symbol index)
  Bytes 8-11: regcount (uint32, physical register count per thread)

nvlink creates REGCOUNT records via sub_450B70(elfw, 0x2F, 8, payload_ptr, sym_idx, 0). The payload is an 8-byte pair: [sym_idx:4][regcount:4].

Linker operations on REGCOUNT:

  1. Weak resolution (sub_45D180): When competing weak definitions exist, REGCOUNT is extracted to choose the winner. The definition with fewer registers wins, maximizing occupancy.

  2. Weak stripping (sub_45D180): Bitmask 0x800800020000 marks REGCOUNT (bit 47), FRAME_SIZE (bit 17), and MAX_STACK_SIZE (bit 35) as resource attributes. When a weak symbol is replaced, records matching these codes for the discarded definition are zeroed (*(_BYTE *)(record + 1) = 0, overwriting attr_code to ERROR). The debug message "remove weak nvinfo" is printed when zeroing the entire nv.info section, and "remove weak frame_size" when zeroing individual resource records.

  3. Propagation (sub_450ED0): propagate_register_counts walks the callgraph and propagates the maximum register count from callees to each entry kernel. If a callee uses more registers than the entry kernel, the entry kernel's REGCOUNT is raised to match. The verbose trace "regcount %d for %s propagated to entry %s" logs each propagation.

  4. Validation: If no REGCOUNT is found for an entry function, the linker emits "no regcount?" via the error system. If a max-regcount-limited entry function calls a callee with a higher register count, it prints: "entry function '%s' with max regcount of %d calls function '%s' with regcount of %d".

EIATTR_NUM_BARRIERS (0x4C) -- Named Barrier Count

Controls how many named barrier slots the CTA hardware allocates. Most architectures support up to 16 barriers per CTA.

Format: 0x04 (Indexed) when read from input cubins
        0x02 (Value) when synthesized by the linker during finalization
On-disk layout (indexed, 12 bytes):
  Byte 0:     0x04    (indexed format)
  Byte 1:     0x4C    (EIATTR_NUM_BARRIERS)
  Bytes 2-3:  size
  Bytes 4-7:  sym_idx
  Bytes 8-11: barrier_count
Linker-synthesized layout (internal node):
  Byte 0:     0x02    (value format)
  Byte 1:     0x4C    (EIATTR_NUM_BARRIERS)
  Byte 2:     barrier_count (stored in the size field low byte)
  Byte 3:     0x00
  Bytes 4-7:  secidx  (output section index)

Barrier migration from section flags: The barrier count is also encoded in the section flags of .text sections as bits 26:20 (7 bits, mask 0x07F00000). During finalization, if a kernel has no EIATTR_NUM_BARRIERS record but its section flags carry a non-zero barrier count, the linker synthesizes one. The verbose message is:

"Creating new EIATTR_NUM_BARRIERS and moving barcount %d
 from section flags of %s to nvinfo for entry symbol %s"

After creating the nv.info record, the section flags are cleared: sh_flags &= 0xF80FFFFF.

Barrier propagation: If a callee requires more barriers than the entry kernel's current count, the linker propagates the higher count: "Propagating higher barcount %d to the section flags of %s of entry symbol %s".

EIATTR_FRAME_SIZE (0x11) -- Per-Thread Local Memory Frame

Format: 0x04 (Indexed)
Payload: [sym_idx:4][frame_size:4]
frame_size: bytes of local memory per thread (register spills + local arrays)

One of the three resource attributes stripped during weak symbol replacement (bitmask bit 17). The frame size determines the .nv.local.<funcname> section size visible to the driver.

EIATTR_MAX_STACK_SIZE (0x23) -- Maximum Stack Size

Format: 0x04 (Indexed)
Payload: [sym_idx:4][max_stack_bytes:4]

The worst-case per-thread stack allocation, computed by propagating CRS (Call-Return Stack) sizes through the callgraph. If the driver allocates less than this, the kernel will corrupt memory. If it allocates more, occupancy drops unnecessarily.

One of the three resource attributes stripped during weak replacement (bitmask bit 35).

EIATTR_MIN_STACK_SIZE (0x12) -- Minimum Stack Size

Format: 0x04 (Indexed)
Payload: [sym_idx:4][min_stack_bytes:4]

The non-recursive stack size minimum. Used when the callgraph has no recursion and the exact stack depth is statically known. For recursive kernels, the linker emits "Stack size for entry function '%s' cannot be statically determined".

EIATTR_CRS_STACK_SIZE (0x1E) -- Call-Return Stack Size

Format: 0x04 (Indexed)
Payload: [sym_idx:4][crs_bytes:4]

Size of the call-return stack for nested function calls. Propagated through the callgraph to compute per-entry worst-case stack requirements.

EIATTR_SAM_REGION_STACK_SIZE (0x3B) -- SAM Region Stack

Format: 0x04 (Indexed)
Payload: [sym_idx:4][sam_stack_bytes:4]

SAM (Streaming Asynchronous Memory) region stack size. Processed by sub_44C880 during compute_entry_properties (case 0x3B at line 1322 in sub_451D80). One of the resource attributes subject to symbol index remapping during merge (same case group as 17, 35, 47).

EIATTR_MAX_THREADS (0x05) -- Maximum Threads Per Block

Format: 0x04 (Indexed)
Payload: [sym_idx:4][max_threads:4]
max_threads: maximum threads per block (from .maxntid PTX directive)

EIATTR_REQNTID (0x10) -- Required Thread Count

Format: 0x04 (Indexed)
Payload: [sym_idx:4][reqntid:4]
reqntid: required thread count per dimension (from .reqntid PTX directive)

EIATTR_MAXREG_COUNT (0x1B) -- Maximum Register Hint

Format: 0x03 (Sized)
On-disk layout (4 bytes total, no payload):
  Byte 0:     0x03    (sized format)
  Byte 1:     0x1B    (EIATTR_MAXREG_COUNT)
  Bytes 2-3:  maxreg  (uint16, from --maxrregcount or .maxnreg)

This is a compiler hint, not an absolute limit. The value comes from --maxrregcount or the .maxnreg PTX directive. Uses sized format (0x03) -- the value is encoded directly in the 16-bit size field with no additional payload.

EIATTR_NUM_MBARRIERS (0x38) -- Memory Barrier Count

Format: 0x04 (Indexed)
Payload: [sym_idx:4][num_mbarriers:4]

Number of mbarrier objects used by the kernel. Processed in compute_entry_properties at case 0x38.

EIATTR_PARAM_CBANK (0x0A) -- Parameter Constant Bank

Format: 0x04 (Indexed)
Payload: [sym_idx:4][bank_offset:4]
bank_offset: packed {bank_number:16, offset:16}

Identifies which constant bank and at what offset kernel parameters begin. During merge, the symbol index is remapped with lazy symbol creation (case 10 in the merge switch). The linker may need to create a new symbol table entry if the constant bank symbol doesn't yet exist in the output.

EIATTR_CBANK_PARAM_SIZE (0x19) -- Constant Bank Parameter Size

Format: 0x03 (Sized)
Value: uint16 in the size field = parameter constant bank size in bytes

EIATTR_KPARAM_INFO (0x17) -- Kernel Parameter Info

Format: 0x01 (Free)
Payload: variable-length array of parameter descriptors
  Each descriptor: [ordinal:2][offset:2][size:2][?:2] (8 bytes per param)

Describes the type, size, and alignment of each kernel parameter. The v2 variant (EIATTR_KPARAM_INFO_V2, code 69/0x45) carries additional fields.

EIATTR_EXTERNS (0x0F) -- External Symbol References

Format: 0x01 (Free)
Payload: array of uint32 symbol indices
  Each 4-byte entry is one external symbol reference

During merge, every 4-byte entry in the payload is remapped through the symbol index translation table (case 15 in the merge switch). During finalization, compute_entry_properties creates new EXTERNS records via sub_450B70(elfw, 0x0F, 4*count, payload, root_kernel_sym, ...).

EIATTR_CUDA_API_VERSION (0x37) -- CUDA API Version

Format: 0x04 (Indexed)
Payload: [sym_idx:4][api_version:4]
api_version: encoded CUDA version (e.g., 0x83 = CUDA 13.1)

During merge, this attribute triggers a version compatibility check (case 55). If the input's CUDA API version exceeds the linker's current maximum (elfw+628), the linker emits an error. This prevents linking objects compiled for a newer CUDA version than the linker supports.

EIATTR_AT_ENTRY_FRAGMENTS (0x4F) -- Entry Fragments (Blackwell)

Format: 0x01 (Free)
Payload: array of uint32 fragment type descriptors

During merge, each 4-byte entry is analyzed for fragment types: values 4--5 indicate type 1, values 6--7 indicate type 2. The detected fragment type is stored at elfw+664. If conflicting fragment types are found across inputs, an error is emitted. During finalization, EXTERNS-style record creation via sub_450B70(elfw, 0x4F, size, payload, root_sym, ...).

EIATTR_RESERVED_SMEM_USED (0x41) -- Reserved Shared Memory

Format: 0x04 (Indexed)
Payload: [sym_idx:4][flags:4]

Processed in compute_entry_properties at case 0x41. The linker resolves the owning section through sub_442270 and checks if the referenced .nv.reservedSmem.* section has content. Related sections in the binary: .nv.reservedSmem.begin, .nv.reservedSmem.cap, .nv.reservedSmem.offset0, .nv.reservedSmem.offset1, .nv.reservedSmem.end.

EIATTR_SHARED_SCRATCH (0x32) -- Shared Scratch Space

Format: 0x04 (Indexed)
Payload: [sym_idx:4][scratch_size:4]

Shared memory scratch space used for register spilling when the register file is exhausted.

Attributes by Category

Resource Allocation (GPU Driver Critical)

These attributes directly control how the GPU driver allocates hardware resources for kernel launch. Incorrect values cause silent performance degradation or launch failure.

CodeHexNameFormatDescription
470x2FEIATTR_REGCOUNTIndexedPhysical register count per thread. The GPU driver computes max_warps_per_SM = total_registers / (regcount * warp_size). This is the single most important occupancy-determining attribute.
50x05EIATTR_MAX_THREADSIndexedMaximum threads per block (from .maxntid PTX directive).
160x10EIATTR_REQNTIDIndexedRequired thread count per dimension (from .reqntid).
170x11EIATTR_FRAME_SIZEIndexedPer-thread local memory frame size in bytes.
180x12EIATTR_MIN_STACK_SIZEIndexedMinimum stack size per thread (non-recursive case).
350x23EIATTR_MAX_STACK_SIZEIndexedMaximum stack size per thread (recursive case).
300x1EEIATTR_CRS_STACK_SIZEIndexedCall-Return-Stack size for nested function calls.
590x3BEIATTR_SAM_REGION_STACK_SIZEIndexedSAM (Streaming Asynchronous Memory) region stack size.
760x4CEIATTR_NUM_BARRIERSIndexedNumber of named barriers used (max 16 on most architectures).
560x38EIATTR_NUM_MBARRIERSIndexedNumber of memory barriers (mbarrier objects) used.
270x1BEIATTR_MAXREG_COUNTSizedMaximum register count hint (from --maxrregcount or .maxnreg).

Parameter Bank Layout

These describe how kernel parameters are laid out in constant memory bank 0 (c[0x0]).

CodeHexNameFormatDescription
100x0AEIATTR_PARAM_CBANKIndexedConstant bank number and offset for kernel parameters. Merge uses lazy symbol creation for remapping.
250x19EIATTR_CBANK_PARAM_SIZESizedSize of the parameter constant bank in bytes.
240x18EIATTR_SMEM_PARAM_SIZEIndexedSize of shared memory parameter region.
110x0BEIATTR_SMEM_PARAM_OFFSETSFreeOffsets of parameters within shared memory.
120x0CEIATTR_CBANK_PARAM_OFFSETSFreeOffsets of parameters within constant bank.
230x17EIATTR_KPARAM_INFOFreeKernel parameter metadata (types, sizes, alignments).
690x45EIATTR_KPARAM_INFO_V2FreeExtended kernel parameter info (v2 format with additional fields).

Instruction Offset Tables

These record byte offsets of specific instruction types within the kernel's .text section, enabling the driver and tools to locate and patch instructions at load time.

CodeHexNameFormatDescription
280x1CEIATTR_EXIT_INSTR_OFFSETSFreeByte offsets of all EXIT instructions.
290x1DEIATTR_S2RCTAID_INSTR_OFFSETSFreeOffsets of S2R instructions reading SR_CTAID (CTA ID).
370x25EIATTR_LD_CACHEMOD_INSTR_OFFSETSFreeOffsets of load instructions with cache modifier.
390x27EIATTR_ATOM_SYS_INSTR_OFFSETSFreeOffsets of atomic instructions with .sys scope.
400x28EIATTR_COOP_GROUP_INSTR_OFFSETSFreeOffsets of cooperative group instructions.
450x2DEIATTR_ATOMF16_EMUL_INSTR_OFFSETSFreeOffsets of emulated FP16 atomic instructions.
460x2EEIATTR_ATOM16_EMUL_INSTR_REG_MAPFreeRegister map for 16-bit atomic emulation.
490x31EIATTR_INT_WARP_WIDE_INSTR_OFFSETSFreeOffsets of integer warp-wide instructions.
520x34EIATTR_INDIRECT_BRANCH_TARGETSFreeValid targets of indirect branches (for CFI).
570x39EIATTR_MBARRIER_INSTR_OFFSETSFreeOffsets of MBAR (memory barrier) instructions.
580x3AEIATTR_COROUTINE_RESUME_OFFSETSFreeResume point offsets for device-side coroutines.
680x44EIATTR_UNUSED_LOAD_BYTE_OFFSETFreeByte offset of unused load instruction.
700x46EIATTR_SYSCALL_OFFSETSFreeOffsets of __cuda_syscall invocations.
870x57EIATTR_STACK_CANARY_TRAP_OFFSETSFreeOffsets of stack canary trap instructions (stack protector).
890x59EIATTR_LOCAL_CTA_ASYNC_STORE_OFFSETSFreeOffsets of CTA-local async store instructions.

Texture and Surface Binding

CodeHexNameFormatDescription
20x02EIATTR_IMAGE_SLOTIndexedTexture/surface image slot assignment.
60x06EIATTR_IMAGE_OFFSETIndexedOffset within the image descriptor table.
70x07EIATTR_IMAGE_SIZEIndexedSize of the image descriptor.
80x08EIATTR_TEXTURE_NORMALIZEDIndexedWhether texture coordinates are normalized.
90x09EIATTR_SAMPLER_INITIndexedSampler initialization parameters.
140x0EEIATTR_TEXID_SAMPID_MAPFreeTexture ID to sampler ID mapping table.
190x13EIATTR_SAMPLER_FORCE_UNNORMALIZEDIndexedForce unnormalized sampler coordinates.
200x14EIATTR_BINDLESS_IMAGE_OFFSETSFreeOffsets for bindless image references.
210x15EIATTR_BINDLESS_TEXTURE_BANKIndexedConstant bank used for bindless texture descriptors.
220x16EIATTR_BINDLESS_SURFACE_BANKIndexedConstant bank used for bindless surface descriptors.
770x4DEIATTR_TEXMODE_INDEPENDENTIndexedIndependent texture mode flag.

Cluster and Cooperative Launch (sm_90+)

CodeHexNameFormatDescription
410x29EIATTR_COOP_GROUP_MASK_REGIDSIndexedRegister IDs used for cooperative group masks.
610x3DEIATTR_CTA_PER_CLUSTERIndexedNumber of CTAs per cluster (Hopper cluster launch).
620x3EEIATTR_EXPLICIT_CLUSTERIndexedWhether kernel uses explicit cluster dimensions.
630x3FEIATTR_MAX_CLUSTER_RANKIndexedMaximum cluster rank for scheduling.
910x5BEIATTR_BLOCKS_ARE_CLUSTERSIndexedCTA blocks are clusters flag.

Shared Memory and Reserved Resources

CodeHexNameFormatDescription
500x32EIATTR_SHARED_SCRATCHIndexedShared memory scratch space for register spilling.
650x41EIATTR_RESERVED_SMEM_USEDIndexedWhether reserved shared memory is used.
660x42EIATTR_RESERVED_SMEM_0_SIZEIndexedSize of reserved shared memory partition 0.

Software Workarounds

Hardware errata requiring instruction-level patching by the driver.

CodeHexNameFormatDescription
420x2AEIATTR_SW1850030_WARFreeWorkaround for HW bug 1850030.
480x30EIATTR_SW2393858_WARFreeWorkaround for HW bug 2393858.
530x35EIATTR_SW2861232_WARFreeWorkaround for HW bug 2861232.
540x36EIATTR_SW_WARFreeGeneric software workaround container.
710x47EIATTR_SW_WAR_MEMBAR_SYS_INSTR_OFFSETSFreeOffsets of MEMBAR.SYS instructions needing software workaround.

Compilation Metadata

CodeHexNameFormatDescription
30x03EIATTR_JUMPTABLE_RELOCSFreeJump table relocation entries.
40x04EIATTR_CTAIDZ_USEDIndexedWhether kernel uses %ctaid.z (3D grid).
130x0DEIATTR_SYNC_STACKIndexedSynchronization stack depth.
150x0FEIATTR_EXTERNSFreeExternal symbol references list.
260x1AEIATTR_QUERY_NUMATTRIBIndexedNumber of queryable attributes.
310x1FEIATTR_NEED_CNP_WRAPPERIndexedKernel needs CUDA Nested Parallelism wrapper.
320x20EIATTR_NEED_CNP_PATCHIndexedKernel needs CNP patching at load time.
330x21EIATTR_EXPLICIT_CACHINGIndexedExplicit cache control directives present.
340x22EIATTR_ISTYPEP_USEDIndexedisspacep instruction used.
360x24EIATTR_SUQ_USEDIndexedSurface query instruction used.
380x26EIATTR_LOAD_CACHE_REQUESTIndexedLoad cache request configuration.
430x2BEIATTR_WMMA_USEDIndexedWarp Matrix Multiply-Accumulate instructions used.
440x2CEIATTR_HAS_PRE_V10_OBJECTValueObject contains pre-CUDA 10 compiled code.
510x33EIATTR_STATISTICSFreeCompilation statistics (instruction counts, etc.).
550x37EIATTR_CUDA_API_VERSIONIndexedCUDA API version the kernel was compiled for. Value 0x83 = CUDA 13.1.
600x3CEIATTR_PER_REG_TARGET_PERF_STATSFreePer-register-target performance statistics.
640x40EIATTR_INSTR_REG_MAPFreeInstruction-to-register mapping for profiling.
670x43EIATTR_UCODE_SECTION_DATAFreeMicrocode section data (internal).
750x4BEIATTR_TOOLS_PATCH_FUNCIndexedFunction patching descriptor for CUDA tools.
780x4EEIATTR_PERF_STATISTICSFreePerformance statistics for the profiler.
850x55EIATTR_ANNOTATIONSFreeGeneral-purpose annotation data.
880x58EIATTR_STUB_FUNCTION_KINDIndexedStub function classification.
930x5DEIATTR_SYSCALLS_FALLBACKFreeSyscall fallback mechanism offsets.
940x5EEIATTR_CUDA_REQFreeCUDA requirements descriptor.

Graphics-Specific

CodeHexNameFormatDescription
720x48EIATTR_GRAPHICS_GLOBAL_CBANKIndexedGlobal constant bank for graphics shaders.
730x49EIATTR_SHADER_TYPEIndexedShader type (vertex, fragment, compute, etc.).
740x4AEIATTR_VRC_CTA_INIT_COUNTIndexedVirtual Register Count CTA init count. Processed in compute_entry_properties (case 0x4A) -- resolves owning section and tracks max value per entry function.

Blackwell+ Features (sm_100+)

CodeHexNameFormatDescription
790x4FEIATTR_AT_ENTRY_FRAGEMENTSFreeFragment descriptors at function entry (note: "FRAGEMENTS" is a typo in the binary; corrected variant EIATTR_AT_ENTRY_FRAGMENTS at 0x245E8D9).
800x50EIATTR_SPARSE_MMA_MASKFreeSparsity mask for structured-sparse MMA operations.
810x51EIATTR_TCGEN05_1CTA_USEDIndexedtcgen05 (5th-gen tensor core) single-CTA mode used.
820x52EIATTR_TCGEN05_2CTA_USEDIndexedtcgen05 two-CTA mode used.
830x53EIATTR_GEN_ERRBAR_AT_EXITIndexedGenerate error barrier at kernel exit.
840x54EIATTR_REG_RECONFIGIndexedDynamic register reconfiguration (setmaxnreg).
920x5CEIATTR_SANITIZEIndexedAddress sanitizer instrumentation present.

Mercury-Specific

CodeHexNameFormatDescription
900x5AEIATTR_MERCURY_FINALIZER_OPTIONSFreeOptions for the Mercury FNLZR post-link pass.
950x5FEIATTR_MERCURY_ISA_VERSIONSizedMercury ISA version for the shader binary.

Sentinel and Error

CodeHexNameFormatDescription
00x00EIATTR_ERROR--Invalid/error sentinel. Used to "delete" records: the merge sets attr_code = 0 to suppress them.
10x01EIATTR_PAD--Padding record (ignored by parser).
860x56EIATTR_UNKNOWN--Unknown attribute placeholder.
960x60EIATTR_ERROR_LAST--Upper bound sentinel for the main enum range.

Phase 1: Merge (Input Processing)

When merge_elf (sub_45E7D0) encounters a section with sh_type == 0x70000000 (line 1854 of decompiled code), it enters the .nv.info processing path. This path:

  1. Resolves the target section: If sh_link != 0, the symbol is looked up via the input symbol table, mapped to the output symbol index, and the per-function .nv.info.<name> section is created or found via sub_4504B0.

  2. Walks TLV records sequentially: The parser runs while (ptr < end) where end = section_data + sh_size. For each record, it reads format = ptr[0], attr_code = ptr[1], size = *(uint16*)(ptr+2).

  3. Dispatches on format byte: If format == 0x04 (indexed), the payload is copied to arena memory and symbol indices within the payload are remapped. If format != 0x04, the record header is stored directly (no payload copy needed for non-indexed records).

  4. Per-attribute symbol remapping (format 0x04 only): A switch on attr_code determines how symbol indices within the payload are translated:

Switch casesAttribute codesRemapping behavior
2,6,7,8,9,18,19,20,23,38,69IMAGE_SLOT through KPARAM_INFO_V2Remap first uint32 (simple sym_map[payload[0]])
10PARAM_CBANKRemap with lazy symbol creation (creates unnamed symbol if needed)
15EXTERNSRemap every uint32 in the payload array
17,35,47,59FRAME_SIZE, MAX_STACK_SIZE, REGCOUNT, SAM_REGION_STACK_SIZERemap sym_idx, but skip record entirely if source symbol was deleted
55CUDA_API_VERSIONVersion compatibility validation (no remap, but error if version mismatch)
79AT_ENTRY_FRAGMENTSFragment type analysis (scan payload for type 1/2 markers)
defaultAll othersPass through without remapping
  1. Weak symbol handling: For already-processed weak symbols, the entire .nv.info section is skipped with verbose message "weak %s already processed".

Phase 2: Weak Symbol Resolution

merge_weak_function (sub_45D180) strips resource attributes from the discarded weak definition. It walks the output's nv.info linked list at elfw+392 and for each node:

int64_t bitmask = 0x800800020000LL;  // bits 17, 35, 47
for (node = list_head; node != NULL; node = node->next) {
    if (node->secidx == target_nv_info_secidx) {
        // Section matches the discarded weak -- zero the entire record
        if (verbose) fprintf(stderr, "remove weak nvinfo\n");
    } else {
        uint8_t code = node->attr_code;
        if (code <= 0x2F && _bittest64(&bitmask, code)
            && payload[0] == target_sym_idx) {
            // Resource attribute for the discarded symbol -- zero it
            if (verbose) fprintf(stderr, "remove weak frame_size\n");
        } else {
            continue;  // keep this record
        }
    }
    node->attr_code = 0;  // overwrite to EIATTR_ERROR (suppressed)
}

The three codes matching bitmask 0x800800020000:

  • Bit 17 = EIATTR_FRAME_SIZE (0x11)
  • Bit 35 = EIATTR_MAX_STACK_SIZE (0x23)
  • Bit 47 = EIATTR_REGCOUNT (0x2F)

The rationale: when a weak function is replaced, the replacement's resource descriptors are the ones that matter. The discarded definition's REGCOUNT, FRAME_SIZE, and MAX_STACK_SIZE would be incorrect for the replacement and must not contaminate the output.

Phase 3: Pre-Finalization (Callgraph Processing)

sub_44C030 (the EIATTR serialization builder, called from sub_44DB00 in the pre-finalization fixup) processes the callgraph to prepare nv.info data:

  1. Callgraph traversal: Iterates all callgraph entries starting at elfw+408. For each function pair (caller, callee), resolves the callee's callgraph node and links it to the caller's callee list.

  2. Recursion detection: Uses a two-flag approach (bytes at offsets +48 and +49 in callgraph nodes). If recursion is detected, the function is marked as recursive ("recursion at function %d"). Recursive functions cannot have statically-determined stack sizes.

  3. Callee list propagation: For each entry function, the transitive closure of callees is computed. Each callee's entry is added to the entry function's "reachable" list via sub_4644C0.

Phase 4: Finalization (Output Generation)

compute_entry_properties (sub_451D80, 97,969 bytes -- the largest function in the linker) runs during the finalization phase. It operates on all nv.info records and computes derived properties for each kernel entry point:

4a. Symbol index fixup: For indexed records with codes 2, 6--10, 17--20, 23, 35, 38, 47, 59, 69 (the same set as in merge remapping), the function re-resolves symbol indices through the positive/negative symbol mapping tables (elfw+456/elfw+464). If a symbol was deleted but the record still references it, attr_code is set to 0 (suppressed).

4b. Register count propagation (sub_450ED0): Walks the callgraph and propagates the maximum register count from callees to each entry kernel.

for each entry_function in callgraph:
    regcount = entry_function.regcount;
    for each callee in transitive_callees(entry_function):
        if callee.regcount > regcount:
            /* Raise entry's count to match */
            entry_function.regcount = callee.regcount;
            if (verbose)
                fprintf(stderr, "regcount %d for %s propagated to entry %s\n",
                    callee.regcount, callee.name, entry_function.name);

If no REGCOUNT record exists for an entry function (e.g., it was a leaf function with the count stored in section flags), the linker creates one via sub_450B70(elfw, 0x2F, 8, payload, 0, 0). The payload is [sym_idx:4][regcount:4].

4c. Barrier count creation and propagation: When a kernel's section flags contain a barrier count (bits 26:20 of sh_flags) but no EIATTR_NUM_BARRIERS record exists, the function creates one:

"Creating new EIATTR_NUM_BARRIERS and moving barcount %d
 from section flags of %s to nvinfo for entry symbol %s"

The linker then clears the barrier bits from section flags (sh_flags &= 0xF80FFFFF) because the nv.info record is now the authoritative source.

If a callee's barrier count exceeds the entry kernel's:

"Propagating higher barcount %d to the section flags of %s of entry symbol %s"
"Propagating higher barcount %d to the section %s of entry symbol %s"

4d. EXTERNS and AT_ENTRY_FRAGMENTS creation: For entry functions, the finalization phase creates EIATTR_EXTERNS (0x0F) and EIATTR_AT_ENTRY_FRAGMENTS (0x4F) records for the transitive closure of external references and fragment descriptors.

4e. Encoding: nvinfo_encode (sub_468760, 14,322 bytes) serializes the in-memory linked-list nodes into on-disk TLV records using SSE2/AVX intrinsics for efficient bitfield packing. The encoder processes a descriptor array of 4 x uint32 entries per field, with the third uint32 serving as a type code that dispatches to one of ~15 encoding strategies (cases 0 through 0x12 in the encoder's switch).

Emission Subsystem (Embedded ptxas)

When nvlink performs LTO compilation, the embedded ptxas compiler generates .nv.info attributes through a dedicated emission subsystem at 0x15C5000--0x160FFFF. This subsystem is the single largest code region dedicated to .nv.info processing in the entire binary.

Architecture

The emission pipeline has three layers:

Layer 1: SM dispatch (sub_15C0CE0). A singleton initialization function registers per-SM callback tables for 12 architecture families (sm_75 through sm_121). Each SM gets an nv.info emitter callback looked up through map A8 via sub_15C3DB0. The callback creates a ~1,936-byte codegen state with architecture-specific constants at offsets 344 and 348 (compute capability encoding).

Layer 2: Master emitters (4 functions, 78--55 KB each). These are the top-level attribute-lowering functions that read compilation state and dispatch to per-attribute-type handlers:

AddressSizeIdentitySpecialty
sub_15C4A7023,547 Bemit_nv_info_section_type1Core attributes
sub_15C58F078,811 Bemit_nv_info_section_type2Comprehensive lowering (largest)
sub_15C8A8040,921 Bemit_nv_info_section_type3Texture/surface references
sub_15CA45054,675 Bemit_nv_info_section_extendedExtended attributes (sm_90+)

The master emitters use an FNV-1a hash table (offset basis 0x811C9DC5, prime 16,777,619) at object+488 for O(1) function-ID-to-attribute lookup using 24-byte entries.

Layer 3: Per-attribute handlers (~190 functions at 0x15CF070--0x160FFFF). Each function is 4--8 KB and handles exactly one EIATTR type. They follow a uniform template:

1. Read attribute descriptor pointer from a2
2. Read sub-attribute fields from known offsets (m128i-based, 32-byte descriptors)
3. Call sub_A4CBB0 to create attribute IR node
4. Call sub_A49120 to set EIATTR type code
5. Call sub_A49190/sub_A49140 for type validation
6. Write output via sub_4A3D60 (operand builder)
7. Return constructed attribute list

The uniformity across ~190 functions suggests they are generated from a data-driven table or macro expansion in NVIDIA's source code, one handler per EIATTR type.

Supporting Functions

AddressSizeIdentityRole
sub_15CCEE012,668 Bsort_and_merge_attribute_listsOrders nv.info entries before emission
sub_15CD6A025,822 Bmerge_attribute_sectionsMerges nv.info from multiple compilation units
sub_15CE65014,296 Bvalidate_and_emit_attributesValidates attributes before final emission
sub_163135044,830 Bprocess_kernel_attributesMaster kernel attribute processor (shared memory, regs, stack)
sub_16312F0smallemit_reserved_smem_attributesEmits .nv.reservedSmem.* attribute records
sub_1655A6030,332 Blower_nv_info_to_codegenConverts nv.info symbol references to codegen operands

How .nv.info Drives GPU Resource Allocation

The .nv.info section is not just metadata for tools -- it is the primary input to the GPU driver's kernel launch resource allocator. The relationship is:

  1. Register allocation: EIATTR_REGCOUNT tells the driver how many registers each thread needs. The driver computes: max_warps_per_SM = total_registers / (regcount * warp_size). This is the single most important occupancy-determining attribute.

  2. Shared memory reservation: EIATTR_SMEM_PARAM_SIZE and EIATTR_RESERVED_SMEM_0_SIZE determine how much shared memory to carve out before the kernel's dynamic shared memory allocation.

  3. Stack allocation: EIATTR_CRS_STACK_SIZE and EIATTR_MAX_STACK_SIZE determine per-thread stack allocation. If the driver gets this wrong (too small), the kernel will corrupt memory; if too large, occupancy drops.

  4. Barrier reservation: EIATTR_NUM_BARRIERS reserves named barrier slots. On most architectures the hardware supports 16 barriers per CTA. The driver must configure the barrier hardware before launch.

  5. Instruction patching: The offset tables (EIATTR_EXIT_INSTR_OFFSETS, EIATTR_S2RCTAID_INSTR_OFFSETS, EIATTR_SW*_WAR) tell the driver which instruction words to patch. This enables hardware workarounds and CTA-ID remapping for cluster launch without recompilation.

  6. Cluster configuration: EIATTR_CTA_PER_CLUSTER and EIATTR_EXPLICIT_CLUSTER (sm_90+) control the cluster launch hardware, determining how many CTAs share distributed shared memory.

Binary Artifacts

Typos Preserved in the Binary

String in binaryCorrect spellingLocation
EIATTR_AT_ENTRY_FRAGEMENTSEIATTR_AT_ENTRY_FRAGMENTS0x1D36E0F
"Invalid section type in .nv.info section header"(correct)0x2460218

A corrected variant EIATTR_AT_ENTRY_FRAGMENTS also exists at 0x245E8D9, suggesting awareness of the typo but preservation of the original for backward compatibility. Similarly, EIATTR_COROUTINE_RESUME_ID_OFFSETS at 0x245F010 is an alternate name for code 58 (EIATTR_COROUTINE_RESUME_OFFSETS), used in the embedded ptxas compiler.

Diagnostic Strings

"nvinfo <fmt=%d,attr=%d,size=%d>, secidx=%d"            (sub_4478F0 debug dump)
"no new register count found for %s, checking .nv.info"  (0x1D3BA68)
"no original register count found for %s, checking .nv.info" (0x1D3BAA0)
"regcount %d for %s propagated to entry %s"              (0x1D3B070)
"no regcount?"                                           (0x1D3AF6C)
"entry function '%s' with max regcount of %d calls function
 '%s' with regcount of %d"                               (0x1D39930)
"Creating new EIATTR_NUM_BARRIERS and moving barcount %d
 from section flags of %s to nvinfo for entry symbol %s" (0x1D3AF88)
"Propagating higher barcount %d to the section flags of
 %s of entry symbol %s"                                  (sub_450ED0)
"remove weak nvinfo"                                     (sub_45D180 debug)
"remove weak frame_size"                                 (sub_45D180 debug)
"weak %s already processed"                              (sub_45E7D0 merge)
"error while updating global nvinfo section"             (embedded ptxas)
"Stack size for entry function '%s' cannot be
 statically determined"                                  (sub_451D80)

Cross-References

Sibling wikis (ptxas):

Confidence Assessment

ClaimConfidenceEvidence
SHT_CUDA_INFO = 0x70000000HIGHVerified in merge_elf dispatch (line 1854) and sub_4504B0 calls
97 EIATTR codes (0-96)HIGHsub_42F760 rejects attr_code > 0x60 (96); all 97 names in nvlink_strings.json
Name table at VA 0x1D37D60 (16-byte entries)HIGHsub_42F760 accesses dword_1D37D68[4*a1] and off_1D37D60[2*a1] confirming 16-byte stride
Name table metadata: [min_version:4, usage_policy:4]HIGHsub_42F760 tests metadata[0] > toolkit_version and dispatches on metadata[1] (0=warn, 1=error, 2=drop)
TLV format: [format:1][attr:1][size:2][payload:var]HIGHConfirmed by diagnostic format string "nvinfo <fmt=%d,attr=%d,size=%d>, secidx=%d" and merge parser at sub_45E7D0:1900
Internal node: [fmt:1][attr:1][size:2][secidx:4][payload_ptr:8]HIGHConfirmed by sub_4508F0 which sets all fields, and sub_4478F0 which prints them
Format bytes 0x01--0x04 meaningHIGHVerified in merge parser: format==4 triggers payload copy + remap; format!=4 skips payload
Weak symbol bitmask 0x800800020000 = bits 17,35,47HIGHVerified: _bittest64(&0x800800020000, n) matches FRAME_SIZE(17), MAX_STACK_SIZE(35), REGCOUNT(47)
Merge per-attribute remap switch casesHIGHCases 2,6,7,8,9,10,15,17,18,19,20,23,35,38,47,55,59,69,79 verified in sub_45E7D0:1917-1997
sub_4504B0 creates .nv.info sectionsHIGHDecompiled: global path at line 46 (sh_flags=0), per-function at line 63 (sh_flags=0x40)
sub_450B70 creates indexed EIATTR nodesHIGHDecompiled: sets format=4, attr=a2, size=a3, calls sub_4504B0 and sub_4644C0
sub_4508F0 creates any-format nodesHIGHDecompiled: sets format=a2, attr=a3, secidx=a4, 16-byte arena allocation
EIATTR_NUM_BARRIERS synthesized as format=0x02HIGH*(_WORD *)v86 = 19458 (0x4C02) = format 0x02, attr 0x4C confirmed in sub_450ED0:207
Barrier count from section flags bits 26:20HIGH(*(_DWORD *)(v8 + 8) >> 20) & 0x7F extracts 7-bit barrier count; cleared with 0xF80FFFFF mask
Register count propagation via callgraphHIGH"regcount %d for %s propagated to entry %s" at sub_450ED0:417
sub_42F760 validates attr_code <= 0x60HIGHDecompiled: if (a1 > 0x60u) { error("unknown attribute"); return 0; }
Parser sub_44E8B0 is nv.info parserLOWsub_44E8B0 is actually a string tokenizer (handles quotes/brackets/escapes); the page's original claim was incorrect. The actual TLV parser is inline in sub_45E7D0 (merge) and sub_451D80 (finalize).
Encoder sub_468760 (14,322 bytes)HIGHDecompiled file exists; SSE2 constants and bitfield packing confirmed
Master emission sub_15C58F0 (78,811 bytes)HIGHDecompiled file exists
Four master emitter functionsHIGHAll four decompiled files exist
~190 per-attribute handlers at 0x15CF070MEDIUMRange confirmed by decompiled file enumeration; exact count is approximate
FNV-1a hash (offset basis 0x811C9DC5) in emitterMEDIUMHash constant seen in decompiled code; specific usage context inferred from pattern matching
sub_44C030 is callgraph-based nv.info builderHIGHDecompiled: accesses callgraph at elfw+408, handles recursion detection, builds callee lists
CUDA API version 0x83 = CUDA 13.1MEDIUMValue observed in cubin output; mapping to CUDA version is conventional
EIATTR_AT_ENTRY_FRAGEMENTS typoHIGHString at 0x1D36E0F confirmed in nvlink_strings.json with typo preserved