.nv.info Metadata
The .nv.info section is NVIDIA's proprietary ELF metadata format that encodes per-kernel resource requirements and compilation artifacts. Every CUDA kernel carries a .nv.info section (or a per-function variant) that tells the GPU driver how many registers to allocate, how much shared memory to reserve, what barriers the kernel uses, and dozens of other resource descriptors. Without this metadata, the driver cannot launch the kernel -- it would have no way to know the kernel's hardware resource footprint.
nvlink both reads and writes .nv.info sections. During the merge phase it parses incoming .nv.info records to extract register counts for weak symbol resolution. During finalization it encodes computed properties (propagated register counts, barrier counts, stack sizes) back into .nv.info records in the output cubin. The embedded ptxas compiler produces .nv.info through a massive emission subsystem spanning ~190 functions across 1 MB of code.
Key Facts
| Property | Value |
|---|---|
| ELF section type | SHT_CUDA_INFO = 0x70000000 (1,879,048,192) |
| Section name (global) | .nv.info |
| Section name (per-function) | .nv.info.<function_name> |
| Record format | Type-Length-Value (TLV), 4-byte aligned |
| Known attribute count | 97 EIATTR codes: 0--96 (v13.0.88) |
| Attribute validation | sub_42F760: rejects codes > 0x60 (96) as "unknown attribute" |
| Section creation | sub_4504B0 (global: line 46, per-function: line 63) |
| Record node creation (indexed) | sub_450B70 (create_eiattr_indexed_record) |
| Record node creation (any format) | sub_4508F0 (create_eiattr_node) |
| Encoder function | sub_468760 (nvinfo_encode, 14,322 bytes) |
| Master emission function | sub_15C58F0 (78,811 bytes -- largest nv.info emitter) |
| Emission function count | ~190 functions at 0x15CF070--0x160FFFF |
| Name table | 97 x 16-byte entries at VA 0x1D37D60 (8-byte string ptr + 8-byte metadata) |
| Diagnostic format string | "nvinfo <fmt=%d,attr=%d,size=%d>, secidx=%d" |
| Validation error | "Invalid section type in .nv.info section header" |
| Global update error | "error while updating global nvinfo section" |
Section Variants
A cubin contains two kinds of .nv.info sections, distinguished by name:
Global .nv.info
A single section named .nv.info with sh_link = 0 (no associated symbol). This contains attributes that apply to the entire compilation unit -- CUDA API version, compatibility flags, and shared metadata that is not specific to any one kernel.
sub_4504B0 creates this section when called with a2 = 0 (no symbol index). It calls sub_4411D0 to find an existing .nv.info section; if none exists, it creates one via sub_441AC0 with type 0x70000000, alignment 4, and sh_flags = 0.
Per-Function .nv.info.<name>
One section per kernel or device function, named .nv.info.<function_name> with sh_link pointing to the symbol table entry for that function. These sections carry per-kernel resource descriptors: register count, barrier count, stack sizes, parameter bank layout, and instruction-offset tables for various runtime patching needs.
sub_4504B0 creates per-function sections when called with a2 != 0. It constructs the name via sprintf(buf, "%s.%s", ".nv.info", func_name) where func_name is looked up from the symbol at offset +32. The section is created with sh_flags = 0x40 and sh_link pointing to the owning symbol. After creation, sub_4426D0 links the section to the function's symbol entry.
During the merge phase (sub_45E7D0), nvlink identifies .nv.info sections by checking sh_type == 0x70000000. The sh_link field determines whether a record is global (link=0) or per-function (link = symbol index). The merge function translates symbol indices from input-local to output-global using its mapping tables.
TLV Record Format
Each .nv.info section contains a flat sequence of 4-byte-aligned TLV (Type-Length-Value) records. There is no section header or record count -- the parser walks from byte 0 to sh_size, consuming records sequentially.
On-Disk Record Layout
Offset Size Field
------ ---- -----
0x00 1 format Format byte (determines payload structure)
0x01 1 attr_code EIATTR type code (identifies the attribute)
0x02 2 size Payload size in bytes (little-endian uint16)
0x04 var payload Attribute-specific data (size bytes)
Total record size = 4 + size, padded to 4-byte alignment. The maximum theoretical payload is 65,535 bytes (16-bit size field), though in practice payloads rarely exceed a few hundred bytes.
In-Memory Node Layout (Linker Internal)
nvlink stores parsed TLV records as 16-byte linked-list nodes in an arena-allocated chain at elfw+392. Each node has:
Offset Size Field
------ ---- -----
0x00 1 format Format byte (0x01--0x04)
0x01 1 attr_code EIATTR type code
0x02 2 size Payload size in bytes
0x04 4 secidx Output section index (from sub_4504B0)
0x08 8 payload_ptr Pointer to arena-allocated payload data
Nodes are linked via a separate linked-list structure managed by sub_4644C0 (list-append). The diagnostic function at sub_4478F0 iterates this list and prints each record as: "nvinfo <fmt=%d,attr=%d,size=%d>, secidx=%d" using fields at bytes 0, 1, 2, and 4 respectively.
Format Byte
The format byte at offset 0 controls how the payload is interpreted:
| Format | Name | Payload structure | Usage |
|---|---|---|---|
0x01 | Free format | Raw bytes, attribute-specific layout | Instruction offset tables, parameter info, WAR patches |
0x02 | Value format | Single 32-bit value (no symbol index) | Module-wide flags (HAS_PRE_V10_OBJECT, NUM_BARRIERS in some paths) |
0x03 | Sized format | 16-bit value in the size field | MAXREG_COUNT, CBANK_PARAM_SIZE, MERCURY_ISA_VERSION |
0x04 | Indexed format | [sym_index:4] [value:4] -- per-symbol attribute | Most per-function resource attributes |
Format 0x04 (indexed) is the most common for per-function attributes. The 4-byte symbol index at payload offset 0 identifies which function the attribute applies to. The linker uses this index for symbol remapping during merge and for extracting per-function properties during finalization.
Format 0x03 (sized) encodes the value directly in the 16-bit size field of the TLV header, with no additional payload bytes. This is used for small integer attributes like MAXREG_COUNT and CBANK_PARAM_SIZE.
Format 0x01 (free) carries variable-length data. The size field gives the byte count. Offset tables use arrays of 4-byte instruction offsets; KPARAM_INFO uses structured records; EXTERNS uses arrays of symbol indices.
Parsing Pseudocode
From the decompiled merge path in sub_45E7D0 (lines 1900--2052):
uint8_t *ptr = section_data;
uint8_t *end = section_data + section_size;
while (ptr < end) {
uint8_t format = ptr[0];
uint8_t attr_code = ptr[1];
uint16_t size = *(uint16_t *)(ptr + 2);
if (format == 0x04) {
/* Indexed format: first 4 bytes of payload = symbol index */
uint32_t *payload = arena_alloc_copy(ptr + 4, size);
/* Symbol index remapping per attr_code */
switch (attr_code) {
case 2: case 6: case 7: case 8: case 9: /* texture/image */
case 18: case 19: case 20: case 23: case 38: /* param/cache */
case 69: /* kparam_info_v2 */
payload[0] = sym_map[payload[0]]; /* remap first sym_idx */
break;
case 10: /* PARAM_CBANK -- remap with lazy symbol creation */
payload[0] = sym_map_or_create(payload[0]);
break;
case 15: /* EXTERNS -- remap all uint32 entries */
for (i = 0; i < size/4; i++)
payload[i] = sym_map[payload[i]];
break;
case 17: case 35: case 47: case 59: /* resource attrs */
/* Check if symbol was deleted (weak-replaced) */
if (is_deleted[payload[0]]) { ptr += 4 + size; continue; }
payload[0] = sym_map[payload[0]];
break;
case 55: /* CUDA_API_VERSION -- version compatibility check */
validate_cuda_api_version(payload[0]);
break;
case 79: /* AT_ENTRY_FRAGMENTS -- fragment type analysis */
analyze_fragment_types(payload, size/4);
break;
default:
break;
}
node = create_nvinfo_node(format, attr_code, nv_info_secidx);
node->payload = payload;
node->size = size;
} else {
/* Non-indexed: just create node with header, no payload copy */
node = create_nvinfo_node(format, attr_code, nv_info_secidx);
node->size = size;
ptr += 4; /* advance past header only, size already in node */
}
ptr = ptr + 4 + ALIGN_UP(size, 4);
}
EIATTR Attribute Catalog
nvlink v13.0.88 defines 97 EIATTR (ELF Info ATTRibute) codes, numbered 0 through 96. The name-to-code mapping was extracted directly from the pointer table at VA 0x1D37D60 in the nvlink binary (16-byte entries: 8-byte string pointer + 8-byte metadata, indexed by code). The string names reside at 0x1D36819--0x1D37170. Codes were verified against cubin TLV records produced by ptxas/nvcc v13.1 and cross-checked against the compute_entry_properties (sub_451D80) dispatch table. The validation function sub_42F760 confirms 97 codes by rejecting any attr_code > 0x60.
Name Table Structure
The EIATTR name table is a contiguous array of 97 x 16-byte entries at VA 0x1D37D60. Each entry:
Offset Size Field
------ ---- -----
0x00 8 name_ptr Pointer to null-terminated string (e.g., "EIATTR_REGCOUNT")
0x08 4 min_toolkit_ver Minimum CUDA toolkit version that supports this attribute
0x0C 4 usage_policy 0=warn, 1=error, 2=silently drop when version too old
The table is indexed by EIATTR code number. sub_42F760 reads dword_1D37D68[4 * attr_code] (the min_toolkit_ver field at offset +8) to check whether the current toolkit version (elfw+624) supports the attribute. If the toolkit version is too old, the usage_policy determines whether to emit a warning, error, or silently omit the record.
Complete Code Table (Sequential)
All 97 codes in numeric order. Use this as the authoritative reference when parsing .nv.info TLV records.
| Code | Hex | Name | Format | Type |
|---|---|---|---|---|
| 0 | 0x00 | EIATTR_ERROR | -- | Sentinel |
| 1 | 0x01 | EIATTR_PAD | -- | Sentinel |
| 2 | 0x02 | EIATTR_IMAGE_SLOT | Indexed | Texture |
| 3 | 0x03 | EIATTR_JUMPTABLE_RELOCS | Free | Metadata |
| 4 | 0x04 | EIATTR_CTAIDZ_USED | Indexed | Metadata |
| 5 | 0x05 | EIATTR_MAX_THREADS | Indexed | Resource |
| 6 | 0x06 | EIATTR_IMAGE_OFFSET | Indexed | Texture |
| 7 | 0x07 | EIATTR_IMAGE_SIZE | Indexed | Texture |
| 8 | 0x08 | EIATTR_TEXTURE_NORMALIZED | Indexed | Texture |
| 9 | 0x09 | EIATTR_SAMPLER_INIT | Indexed | Texture |
| 10 | 0x0A | EIATTR_PARAM_CBANK | Indexed | Param |
| 11 | 0x0B | EIATTR_SMEM_PARAM_OFFSETS | Free | Param |
| 12 | 0x0C | EIATTR_CBANK_PARAM_OFFSETS | Free | Param |
| 13 | 0x0D | EIATTR_SYNC_STACK | Indexed | Metadata |
| 14 | 0x0E | EIATTR_TEXID_SAMPID_MAP | Free | Texture |
| 15 | 0x0F | EIATTR_EXTERNS | Free | Metadata |
| 16 | 0x10 | EIATTR_REQNTID | Indexed | Resource |
| 17 | 0x11 | EIATTR_FRAME_SIZE | Indexed | Resource |
| 18 | 0x12 | EIATTR_MIN_STACK_SIZE | Indexed | Resource |
| 19 | 0x13 | EIATTR_SAMPLER_FORCE_UNNORMALIZED | Indexed | Texture |
| 20 | 0x14 | EIATTR_BINDLESS_IMAGE_OFFSETS | Free | Texture |
| 21 | 0x15 | EIATTR_BINDLESS_TEXTURE_BANK | Indexed | Texture |
| 22 | 0x16 | EIATTR_BINDLESS_SURFACE_BANK | Indexed | Texture |
| 23 | 0x17 | EIATTR_KPARAM_INFO | Free | Param |
| 24 | 0x18 | EIATTR_SMEM_PARAM_SIZE | Indexed | Param |
| 25 | 0x19 | EIATTR_CBANK_PARAM_SIZE | Sized | Param |
| 26 | 0x1A | EIATTR_QUERY_NUMATTRIB | Indexed | Metadata |
| 27 | 0x1B | EIATTR_MAXREG_COUNT | Sized | Resource |
| 28 | 0x1C | EIATTR_EXIT_INSTR_OFFSETS | Free | Offsets |
| 29 | 0x1D | EIATTR_S2RCTAID_INSTR_OFFSETS | Free | Offsets |
| 30 | 0x1E | EIATTR_CRS_STACK_SIZE | Indexed | Resource |
| 31 | 0x1F | EIATTR_NEED_CNP_WRAPPER | Indexed | Metadata |
| 32 | 0x20 | EIATTR_NEED_CNP_PATCH | Indexed | Metadata |
| 33 | 0x21 | EIATTR_EXPLICIT_CACHING | Indexed | Metadata |
| 34 | 0x22 | EIATTR_ISTYPEP_USED | Indexed | Metadata |
| 35 | 0x23 | EIATTR_MAX_STACK_SIZE | Indexed | Resource |
| 36 | 0x24 | EIATTR_SUQ_USED | Indexed | Metadata |
| 37 | 0x25 | EIATTR_LD_CACHEMOD_INSTR_OFFSETS | Free | Offsets |
| 38 | 0x26 | EIATTR_LOAD_CACHE_REQUEST | Indexed | Metadata |
| 39 | 0x27 | EIATTR_ATOM_SYS_INSTR_OFFSETS | Free | Offsets |
| 40 | 0x28 | EIATTR_COOP_GROUP_INSTR_OFFSETS | Free | Offsets |
| 41 | 0x29 | EIATTR_COOP_GROUP_MASK_REGIDS | Indexed | Cluster |
| 42 | 0x2A | EIATTR_SW1850030_WAR | Free | WAR |
| 43 | 0x2B | EIATTR_WMMA_USED | Indexed | Metadata |
| 44 | 0x2C | EIATTR_HAS_PRE_V10_OBJECT | Value | Metadata |
| 45 | 0x2D | EIATTR_ATOMF16_EMUL_INSTR_OFFSETS | Free | Offsets |
| 46 | 0x2E | EIATTR_ATOM16_EMUL_INSTR_REG_MAP | Free | Offsets |
| 47 | 0x2F | EIATTR_REGCOUNT | Indexed | Resource |
| 48 | 0x30 | EIATTR_SW2393858_WAR | Free | WAR |
| 49 | 0x31 | EIATTR_INT_WARP_WIDE_INSTR_OFFSETS | Free | Offsets |
| 50 | 0x32 | EIATTR_SHARED_SCRATCH | Indexed | Shared |
| 51 | 0x33 | EIATTR_STATISTICS | Free | Metadata |
| 52 | 0x34 | EIATTR_INDIRECT_BRANCH_TARGETS | Free | Offsets |
| 53 | 0x35 | EIATTR_SW2861232_WAR | Free | WAR |
| 54 | 0x36 | EIATTR_SW_WAR | Free | WAR |
| 55 | 0x37 | EIATTR_CUDA_API_VERSION | Indexed | Metadata |
| 56 | 0x38 | EIATTR_NUM_MBARRIERS | Indexed | Resource |
| 57 | 0x39 | EIATTR_MBARRIER_INSTR_OFFSETS | Free | Offsets |
| 58 | 0x3A | EIATTR_COROUTINE_RESUME_OFFSETS | Free | Offsets |
| 59 | 0x3B | EIATTR_SAM_REGION_STACK_SIZE | Indexed | Resource |
| 60 | 0x3C | EIATTR_PER_REG_TARGET_PERF_STATS | Free | Metadata |
| 61 | 0x3D | EIATTR_CTA_PER_CLUSTER | Indexed | Cluster |
| 62 | 0x3E | EIATTR_EXPLICIT_CLUSTER | Indexed | Cluster |
| 63 | 0x3F | EIATTR_MAX_CLUSTER_RANK | Indexed | Cluster |
| 64 | 0x40 | EIATTR_INSTR_REG_MAP | Free | Metadata |
| 65 | 0x41 | EIATTR_RESERVED_SMEM_USED | Indexed | Shared |
| 66 | 0x42 | EIATTR_RESERVED_SMEM_0_SIZE | Indexed | Shared |
| 67 | 0x43 | EIATTR_UCODE_SECTION_DATA | Free | Metadata |
| 68 | 0x44 | EIATTR_UNUSED_LOAD_BYTE_OFFSET | Free | Offsets |
| 69 | 0x45 | EIATTR_KPARAM_INFO_V2 | Free | Param |
| 70 | 0x46 | EIATTR_SYSCALL_OFFSETS | Free | Offsets |
| 71 | 0x47 | EIATTR_SW_WAR_MEMBAR_SYS_INSTR_OFFSETS | Free | WAR |
| 72 | 0x48 | EIATTR_GRAPHICS_GLOBAL_CBANK | Indexed | Graphics |
| 73 | 0x49 | EIATTR_SHADER_TYPE | Indexed | Graphics |
| 74 | 0x4A | EIATTR_VRC_CTA_INIT_COUNT | Indexed | Graphics |
| 75 | 0x4B | EIATTR_TOOLS_PATCH_FUNC | Indexed | Metadata |
| 76 | 0x4C | EIATTR_NUM_BARRIERS | Indexed | Resource |
| 77 | 0x4D | EIATTR_TEXMODE_INDEPENDENT | Indexed | Texture |
| 78 | 0x4E | EIATTR_PERF_STATISTICS | Free | Metadata |
| 79 | 0x4F | EIATTR_AT_ENTRY_FRAGEMENTS | Free | Blackwell |
| 80 | 0x50 | EIATTR_SPARSE_MMA_MASK | Free | Blackwell |
| 81 | 0x51 | EIATTR_TCGEN05_1CTA_USED | Indexed | Blackwell |
| 82 | 0x52 | EIATTR_TCGEN05_2CTA_USED | Indexed | Blackwell |
| 83 | 0x53 | EIATTR_GEN_ERRBAR_AT_EXIT | Indexed | Blackwell |
| 84 | 0x54 | EIATTR_REG_RECONFIG | Indexed | Blackwell |
| 85 | 0x55 | EIATTR_ANNOTATIONS | Free | Metadata |
| 86 | 0x56 | EIATTR_UNKNOWN | -- | Sentinel |
| 87 | 0x57 | EIATTR_STACK_CANARY_TRAP_OFFSETS | Free | Offsets |
| 88 | 0x58 | EIATTR_STUB_FUNCTION_KIND | Indexed | Metadata |
| 89 | 0x59 | EIATTR_LOCAL_CTA_ASYNC_STORE_OFFSETS | Free | Offsets |
| 90 | 0x5A | EIATTR_MERCURY_FINALIZER_OPTIONS | Free | Mercury |
| 91 | 0x5B | EIATTR_BLOCKS_ARE_CLUSTERS | Indexed | Cluster |
| 92 | 0x5C | EIATTR_SANITIZE | Indexed | Blackwell |
| 93 | 0x5D | EIATTR_SYSCALLS_FALLBACK | Free | Metadata |
| 94 | 0x5E | EIATTR_CUDA_REQ | Free | Metadata |
| 95 | 0x5F | EIATTR_MERCURY_ISA_VERSION | Sized | Mercury |
| 96 | 0x60 | EIATTR_ERROR_LAST | -- | Sentinel |
Most Important EIATTR Entries
This section provides detailed format information for the EIATTR entries that are most critical to GPU kernel launch, driver resource allocation, and the linker's own processing.
EIATTR_REGCOUNT (0x2F) -- Register Count
The single most important occupancy-determining attribute. The GPU driver computes max_warps_per_SM = total_registers / (regcount * warp_size) to determine how many warps can execute concurrently.
Format: 0x04 (Indexed)
On-disk layout (12 bytes total):
Byte 0: 0x04 (indexed format)
Byte 1: 0x2F (EIATTR_REGCOUNT)
Bytes 2-3: 0x0008 (size = 8 bytes)
Bytes 4-7: sym_idx (uint32, function symbol index)
Bytes 8-11: regcount (uint32, physical register count per thread)
nvlink creates REGCOUNT records via sub_450B70(elfw, 0x2F, 8, payload_ptr, sym_idx, 0). The payload is an 8-byte pair: [sym_idx:4][regcount:4].
Linker operations on REGCOUNT:
-
Weak resolution (
sub_45D180): When competing weak definitions exist, REGCOUNT is extracted to choose the winner. The definition with fewer registers wins, maximizing occupancy. -
Weak stripping (
sub_45D180): Bitmask0x800800020000marks REGCOUNT (bit 47), FRAME_SIZE (bit 17), and MAX_STACK_SIZE (bit 35) as resource attributes. When a weak symbol is replaced, records matching these codes for the discarded definition are zeroed (*(_BYTE *)(record + 1) = 0, overwriting attr_code to ERROR). The debug message"remove weak nvinfo"is printed when zeroing the entire nv.info section, and"remove weak frame_size"when zeroing individual resource records. -
Propagation (
sub_450ED0):propagate_register_countswalks the callgraph and propagates the maximum register count from callees to each entry kernel. If a callee uses more registers than the entry kernel, the entry kernel's REGCOUNT is raised to match. The verbose trace"regcount %d for %s propagated to entry %s"logs each propagation. -
Validation: If no REGCOUNT is found for an entry function, the linker emits
"no regcount?"via the error system. If a max-regcount-limited entry function calls a callee with a higher register count, it prints:"entry function '%s' with max regcount of %d calls function '%s' with regcount of %d".
EIATTR_NUM_BARRIERS (0x4C) -- Named Barrier Count
Controls how many named barrier slots the CTA hardware allocates. Most architectures support up to 16 barriers per CTA.
Format: 0x04 (Indexed) when read from input cubins
0x02 (Value) when synthesized by the linker during finalization
On-disk layout (indexed, 12 bytes):
Byte 0: 0x04 (indexed format)
Byte 1: 0x4C (EIATTR_NUM_BARRIERS)
Bytes 2-3: size
Bytes 4-7: sym_idx
Bytes 8-11: barrier_count
Linker-synthesized layout (internal node):
Byte 0: 0x02 (value format)
Byte 1: 0x4C (EIATTR_NUM_BARRIERS)
Byte 2: barrier_count (stored in the size field low byte)
Byte 3: 0x00
Bytes 4-7: secidx (output section index)
Barrier migration from section flags: The barrier count is also encoded in the section flags of .text sections as bits 26:20 (7 bits, mask 0x07F00000). During finalization, if a kernel has no EIATTR_NUM_BARRIERS record but its section flags carry a non-zero barrier count, the linker synthesizes one. The verbose message is:
"Creating new EIATTR_NUM_BARRIERS and moving barcount %d
from section flags of %s to nvinfo for entry symbol %s"
After creating the nv.info record, the section flags are cleared: sh_flags &= 0xF80FFFFF.
Barrier propagation: If a callee requires more barriers than the entry kernel's current count, the linker propagates the higher count: "Propagating higher barcount %d to the section flags of %s of entry symbol %s".
EIATTR_FRAME_SIZE (0x11) -- Per-Thread Local Memory Frame
Format: 0x04 (Indexed)
Payload: [sym_idx:4][frame_size:4]
frame_size: bytes of local memory per thread (register spills + local arrays)
One of the three resource attributes stripped during weak symbol replacement (bitmask bit 17). The frame size determines the .nv.local.<funcname> section size visible to the driver.
EIATTR_MAX_STACK_SIZE (0x23) -- Maximum Stack Size
Format: 0x04 (Indexed)
Payload: [sym_idx:4][max_stack_bytes:4]
The worst-case per-thread stack allocation, computed by propagating CRS (Call-Return Stack) sizes through the callgraph. If the driver allocates less than this, the kernel will corrupt memory. If it allocates more, occupancy drops unnecessarily.
One of the three resource attributes stripped during weak replacement (bitmask bit 35).
EIATTR_MIN_STACK_SIZE (0x12) -- Minimum Stack Size
Format: 0x04 (Indexed)
Payload: [sym_idx:4][min_stack_bytes:4]
The non-recursive stack size minimum. Used when the callgraph has no recursion and the exact stack depth is statically known. For recursive kernels, the linker emits "Stack size for entry function '%s' cannot be statically determined".
EIATTR_CRS_STACK_SIZE (0x1E) -- Call-Return Stack Size
Format: 0x04 (Indexed)
Payload: [sym_idx:4][crs_bytes:4]
Size of the call-return stack for nested function calls. Propagated through the callgraph to compute per-entry worst-case stack requirements.
EIATTR_SAM_REGION_STACK_SIZE (0x3B) -- SAM Region Stack
Format: 0x04 (Indexed)
Payload: [sym_idx:4][sam_stack_bytes:4]
SAM (Streaming Asynchronous Memory) region stack size. Processed by sub_44C880 during compute_entry_properties (case 0x3B at line 1322 in sub_451D80). One of the resource attributes subject to symbol index remapping during merge (same case group as 17, 35, 47).
EIATTR_MAX_THREADS (0x05) -- Maximum Threads Per Block
Format: 0x04 (Indexed)
Payload: [sym_idx:4][max_threads:4]
max_threads: maximum threads per block (from .maxntid PTX directive)
EIATTR_REQNTID (0x10) -- Required Thread Count
Format: 0x04 (Indexed)
Payload: [sym_idx:4][reqntid:4]
reqntid: required thread count per dimension (from .reqntid PTX directive)
EIATTR_MAXREG_COUNT (0x1B) -- Maximum Register Hint
Format: 0x03 (Sized)
On-disk layout (4 bytes total, no payload):
Byte 0: 0x03 (sized format)
Byte 1: 0x1B (EIATTR_MAXREG_COUNT)
Bytes 2-3: maxreg (uint16, from --maxrregcount or .maxnreg)
This is a compiler hint, not an absolute limit. The value comes from --maxrregcount or the .maxnreg PTX directive. Uses sized format (0x03) -- the value is encoded directly in the 16-bit size field with no additional payload.
EIATTR_NUM_MBARRIERS (0x38) -- Memory Barrier Count
Format: 0x04 (Indexed)
Payload: [sym_idx:4][num_mbarriers:4]
Number of mbarrier objects used by the kernel. Processed in compute_entry_properties at case 0x38.
EIATTR_PARAM_CBANK (0x0A) -- Parameter Constant Bank
Format: 0x04 (Indexed)
Payload: [sym_idx:4][bank_offset:4]
bank_offset: packed {bank_number:16, offset:16}
Identifies which constant bank and at what offset kernel parameters begin. During merge, the symbol index is remapped with lazy symbol creation (case 10 in the merge switch). The linker may need to create a new symbol table entry if the constant bank symbol doesn't yet exist in the output.
EIATTR_CBANK_PARAM_SIZE (0x19) -- Constant Bank Parameter Size
Format: 0x03 (Sized)
Value: uint16 in the size field = parameter constant bank size in bytes
EIATTR_KPARAM_INFO (0x17) -- Kernel Parameter Info
Format: 0x01 (Free)
Payload: variable-length array of parameter descriptors
Each descriptor: [ordinal:2][offset:2][size:2][?:2] (8 bytes per param)
Describes the type, size, and alignment of each kernel parameter. The v2 variant (EIATTR_KPARAM_INFO_V2, code 69/0x45) carries additional fields.
EIATTR_EXTERNS (0x0F) -- External Symbol References
Format: 0x01 (Free)
Payload: array of uint32 symbol indices
Each 4-byte entry is one external symbol reference
During merge, every 4-byte entry in the payload is remapped through the symbol index translation table (case 15 in the merge switch). During finalization, compute_entry_properties creates new EXTERNS records via sub_450B70(elfw, 0x0F, 4*count, payload, root_kernel_sym, ...).
EIATTR_CUDA_API_VERSION (0x37) -- CUDA API Version
Format: 0x04 (Indexed)
Payload: [sym_idx:4][api_version:4]
api_version: encoded CUDA version (e.g., 0x83 = CUDA 13.1)
During merge, this attribute triggers a version compatibility check (case 55). If the input's CUDA API version exceeds the linker's current maximum (elfw+628), the linker emits an error. This prevents linking objects compiled for a newer CUDA version than the linker supports.
EIATTR_AT_ENTRY_FRAGMENTS (0x4F) -- Entry Fragments (Blackwell)
Format: 0x01 (Free)
Payload: array of uint32 fragment type descriptors
During merge, each 4-byte entry is analyzed for fragment types: values 4--5 indicate type 1, values 6--7 indicate type 2. The detected fragment type is stored at elfw+664. If conflicting fragment types are found across inputs, an error is emitted. During finalization, EXTERNS-style record creation via sub_450B70(elfw, 0x4F, size, payload, root_sym, ...).
EIATTR_RESERVED_SMEM_USED (0x41) -- Reserved Shared Memory
Format: 0x04 (Indexed)
Payload: [sym_idx:4][flags:4]
Processed in compute_entry_properties at case 0x41. The linker resolves the owning section through sub_442270 and checks if the referenced .nv.reservedSmem.* section has content. Related sections in the binary: .nv.reservedSmem.begin, .nv.reservedSmem.cap, .nv.reservedSmem.offset0, .nv.reservedSmem.offset1, .nv.reservedSmem.end.
EIATTR_SHARED_SCRATCH (0x32) -- Shared Scratch Space
Format: 0x04 (Indexed)
Payload: [sym_idx:4][scratch_size:4]
Shared memory scratch space used for register spilling when the register file is exhausted.
Attributes by Category
Resource Allocation (GPU Driver Critical)
These attributes directly control how the GPU driver allocates hardware resources for kernel launch. Incorrect values cause silent performance degradation or launch failure.
| Code | Hex | Name | Format | Description |
|---|---|---|---|---|
| 47 | 0x2F | EIATTR_REGCOUNT | Indexed | Physical register count per thread. The GPU driver computes max_warps_per_SM = total_registers / (regcount * warp_size). This is the single most important occupancy-determining attribute. |
| 5 | 0x05 | EIATTR_MAX_THREADS | Indexed | Maximum threads per block (from .maxntid PTX directive). |
| 16 | 0x10 | EIATTR_REQNTID | Indexed | Required thread count per dimension (from .reqntid). |
| 17 | 0x11 | EIATTR_FRAME_SIZE | Indexed | Per-thread local memory frame size in bytes. |
| 18 | 0x12 | EIATTR_MIN_STACK_SIZE | Indexed | Minimum stack size per thread (non-recursive case). |
| 35 | 0x23 | EIATTR_MAX_STACK_SIZE | Indexed | Maximum stack size per thread (recursive case). |
| 30 | 0x1E | EIATTR_CRS_STACK_SIZE | Indexed | Call-Return-Stack size for nested function calls. |
| 59 | 0x3B | EIATTR_SAM_REGION_STACK_SIZE | Indexed | SAM (Streaming Asynchronous Memory) region stack size. |
| 76 | 0x4C | EIATTR_NUM_BARRIERS | Indexed | Number of named barriers used (max 16 on most architectures). |
| 56 | 0x38 | EIATTR_NUM_MBARRIERS | Indexed | Number of memory barriers (mbarrier objects) used. |
| 27 | 0x1B | EIATTR_MAXREG_COUNT | Sized | Maximum register count hint (from --maxrregcount or .maxnreg). |
Parameter Bank Layout
These describe how kernel parameters are laid out in constant memory bank 0 (c[0x0]).
| Code | Hex | Name | Format | Description |
|---|---|---|---|---|
| 10 | 0x0A | EIATTR_PARAM_CBANK | Indexed | Constant bank number and offset for kernel parameters. Merge uses lazy symbol creation for remapping. |
| 25 | 0x19 | EIATTR_CBANK_PARAM_SIZE | Sized | Size of the parameter constant bank in bytes. |
| 24 | 0x18 | EIATTR_SMEM_PARAM_SIZE | Indexed | Size of shared memory parameter region. |
| 11 | 0x0B | EIATTR_SMEM_PARAM_OFFSETS | Free | Offsets of parameters within shared memory. |
| 12 | 0x0C | EIATTR_CBANK_PARAM_OFFSETS | Free | Offsets of parameters within constant bank. |
| 23 | 0x17 | EIATTR_KPARAM_INFO | Free | Kernel parameter metadata (types, sizes, alignments). |
| 69 | 0x45 | EIATTR_KPARAM_INFO_V2 | Free | Extended kernel parameter info (v2 format with additional fields). |
Instruction Offset Tables
These record byte offsets of specific instruction types within the kernel's .text section, enabling the driver and tools to locate and patch instructions at load time.
| Code | Hex | Name | Format | Description |
|---|---|---|---|---|
| 28 | 0x1C | EIATTR_EXIT_INSTR_OFFSETS | Free | Byte offsets of all EXIT instructions. |
| 29 | 0x1D | EIATTR_S2RCTAID_INSTR_OFFSETS | Free | Offsets of S2R instructions reading SR_CTAID (CTA ID). |
| 37 | 0x25 | EIATTR_LD_CACHEMOD_INSTR_OFFSETS | Free | Offsets of load instructions with cache modifier. |
| 39 | 0x27 | EIATTR_ATOM_SYS_INSTR_OFFSETS | Free | Offsets of atomic instructions with .sys scope. |
| 40 | 0x28 | EIATTR_COOP_GROUP_INSTR_OFFSETS | Free | Offsets of cooperative group instructions. |
| 45 | 0x2D | EIATTR_ATOMF16_EMUL_INSTR_OFFSETS | Free | Offsets of emulated FP16 atomic instructions. |
| 46 | 0x2E | EIATTR_ATOM16_EMUL_INSTR_REG_MAP | Free | Register map for 16-bit atomic emulation. |
| 49 | 0x31 | EIATTR_INT_WARP_WIDE_INSTR_OFFSETS | Free | Offsets of integer warp-wide instructions. |
| 52 | 0x34 | EIATTR_INDIRECT_BRANCH_TARGETS | Free | Valid targets of indirect branches (for CFI). |
| 57 | 0x39 | EIATTR_MBARRIER_INSTR_OFFSETS | Free | Offsets of MBAR (memory barrier) instructions. |
| 58 | 0x3A | EIATTR_COROUTINE_RESUME_OFFSETS | Free | Resume point offsets for device-side coroutines. |
| 68 | 0x44 | EIATTR_UNUSED_LOAD_BYTE_OFFSET | Free | Byte offset of unused load instruction. |
| 70 | 0x46 | EIATTR_SYSCALL_OFFSETS | Free | Offsets of __cuda_syscall invocations. |
| 87 | 0x57 | EIATTR_STACK_CANARY_TRAP_OFFSETS | Free | Offsets of stack canary trap instructions (stack protector). |
| 89 | 0x59 | EIATTR_LOCAL_CTA_ASYNC_STORE_OFFSETS | Free | Offsets of CTA-local async store instructions. |
Texture and Surface Binding
| Code | Hex | Name | Format | Description |
|---|---|---|---|---|
| 2 | 0x02 | EIATTR_IMAGE_SLOT | Indexed | Texture/surface image slot assignment. |
| 6 | 0x06 | EIATTR_IMAGE_OFFSET | Indexed | Offset within the image descriptor table. |
| 7 | 0x07 | EIATTR_IMAGE_SIZE | Indexed | Size of the image descriptor. |
| 8 | 0x08 | EIATTR_TEXTURE_NORMALIZED | Indexed | Whether texture coordinates are normalized. |
| 9 | 0x09 | EIATTR_SAMPLER_INIT | Indexed | Sampler initialization parameters. |
| 14 | 0x0E | EIATTR_TEXID_SAMPID_MAP | Free | Texture ID to sampler ID mapping table. |
| 19 | 0x13 | EIATTR_SAMPLER_FORCE_UNNORMALIZED | Indexed | Force unnormalized sampler coordinates. |
| 20 | 0x14 | EIATTR_BINDLESS_IMAGE_OFFSETS | Free | Offsets for bindless image references. |
| 21 | 0x15 | EIATTR_BINDLESS_TEXTURE_BANK | Indexed | Constant bank used for bindless texture descriptors. |
| 22 | 0x16 | EIATTR_BINDLESS_SURFACE_BANK | Indexed | Constant bank used for bindless surface descriptors. |
| 77 | 0x4D | EIATTR_TEXMODE_INDEPENDENT | Indexed | Independent texture mode flag. |
Cluster and Cooperative Launch (sm_90+)
| Code | Hex | Name | Format | Description |
|---|---|---|---|---|
| 41 | 0x29 | EIATTR_COOP_GROUP_MASK_REGIDS | Indexed | Register IDs used for cooperative group masks. |
| 61 | 0x3D | EIATTR_CTA_PER_CLUSTER | Indexed | Number of CTAs per cluster (Hopper cluster launch). |
| 62 | 0x3E | EIATTR_EXPLICIT_CLUSTER | Indexed | Whether kernel uses explicit cluster dimensions. |
| 63 | 0x3F | EIATTR_MAX_CLUSTER_RANK | Indexed | Maximum cluster rank for scheduling. |
| 91 | 0x5B | EIATTR_BLOCKS_ARE_CLUSTERS | Indexed | CTA blocks are clusters flag. |
Shared Memory and Reserved Resources
| Code | Hex | Name | Format | Description |
|---|---|---|---|---|
| 50 | 0x32 | EIATTR_SHARED_SCRATCH | Indexed | Shared memory scratch space for register spilling. |
| 65 | 0x41 | EIATTR_RESERVED_SMEM_USED | Indexed | Whether reserved shared memory is used. |
| 66 | 0x42 | EIATTR_RESERVED_SMEM_0_SIZE | Indexed | Size of reserved shared memory partition 0. |
Software Workarounds
Hardware errata requiring instruction-level patching by the driver.
| Code | Hex | Name | Format | Description |
|---|---|---|---|---|
| 42 | 0x2A | EIATTR_SW1850030_WAR | Free | Workaround for HW bug 1850030. |
| 48 | 0x30 | EIATTR_SW2393858_WAR | Free | Workaround for HW bug 2393858. |
| 53 | 0x35 | EIATTR_SW2861232_WAR | Free | Workaround for HW bug 2861232. |
| 54 | 0x36 | EIATTR_SW_WAR | Free | Generic software workaround container. |
| 71 | 0x47 | EIATTR_SW_WAR_MEMBAR_SYS_INSTR_OFFSETS | Free | Offsets of MEMBAR.SYS instructions needing software workaround. |
Compilation Metadata
| Code | Hex | Name | Format | Description |
|---|---|---|---|---|
| 3 | 0x03 | EIATTR_JUMPTABLE_RELOCS | Free | Jump table relocation entries. |
| 4 | 0x04 | EIATTR_CTAIDZ_USED | Indexed | Whether kernel uses %ctaid.z (3D grid). |
| 13 | 0x0D | EIATTR_SYNC_STACK | Indexed | Synchronization stack depth. |
| 15 | 0x0F | EIATTR_EXTERNS | Free | External symbol references list. |
| 26 | 0x1A | EIATTR_QUERY_NUMATTRIB | Indexed | Number of queryable attributes. |
| 31 | 0x1F | EIATTR_NEED_CNP_WRAPPER | Indexed | Kernel needs CUDA Nested Parallelism wrapper. |
| 32 | 0x20 | EIATTR_NEED_CNP_PATCH | Indexed | Kernel needs CNP patching at load time. |
| 33 | 0x21 | EIATTR_EXPLICIT_CACHING | Indexed | Explicit cache control directives present. |
| 34 | 0x22 | EIATTR_ISTYPEP_USED | Indexed | isspacep instruction used. |
| 36 | 0x24 | EIATTR_SUQ_USED | Indexed | Surface query instruction used. |
| 38 | 0x26 | EIATTR_LOAD_CACHE_REQUEST | Indexed | Load cache request configuration. |
| 43 | 0x2B | EIATTR_WMMA_USED | Indexed | Warp Matrix Multiply-Accumulate instructions used. |
| 44 | 0x2C | EIATTR_HAS_PRE_V10_OBJECT | Value | Object contains pre-CUDA 10 compiled code. |
| 51 | 0x33 | EIATTR_STATISTICS | Free | Compilation statistics (instruction counts, etc.). |
| 55 | 0x37 | EIATTR_CUDA_API_VERSION | Indexed | CUDA API version the kernel was compiled for. Value 0x83 = CUDA 13.1. |
| 60 | 0x3C | EIATTR_PER_REG_TARGET_PERF_STATS | Free | Per-register-target performance statistics. |
| 64 | 0x40 | EIATTR_INSTR_REG_MAP | Free | Instruction-to-register mapping for profiling. |
| 67 | 0x43 | EIATTR_UCODE_SECTION_DATA | Free | Microcode section data (internal). |
| 75 | 0x4B | EIATTR_TOOLS_PATCH_FUNC | Indexed | Function patching descriptor for CUDA tools. |
| 78 | 0x4E | EIATTR_PERF_STATISTICS | Free | Performance statistics for the profiler. |
| 85 | 0x55 | EIATTR_ANNOTATIONS | Free | General-purpose annotation data. |
| 88 | 0x58 | EIATTR_STUB_FUNCTION_KIND | Indexed | Stub function classification. |
| 93 | 0x5D | EIATTR_SYSCALLS_FALLBACK | Free | Syscall fallback mechanism offsets. |
| 94 | 0x5E | EIATTR_CUDA_REQ | Free | CUDA requirements descriptor. |
Graphics-Specific
| Code | Hex | Name | Format | Description |
|---|---|---|---|---|
| 72 | 0x48 | EIATTR_GRAPHICS_GLOBAL_CBANK | Indexed | Global constant bank for graphics shaders. |
| 73 | 0x49 | EIATTR_SHADER_TYPE | Indexed | Shader type (vertex, fragment, compute, etc.). |
| 74 | 0x4A | EIATTR_VRC_CTA_INIT_COUNT | Indexed | Virtual Register Count CTA init count. Processed in compute_entry_properties (case 0x4A) -- resolves owning section and tracks max value per entry function. |
Blackwell+ Features (sm_100+)
| Code | Hex | Name | Format | Description |
|---|---|---|---|---|
| 79 | 0x4F | EIATTR_AT_ENTRY_FRAGEMENTS | Free | Fragment descriptors at function entry (note: "FRAGEMENTS" is a typo in the binary; corrected variant EIATTR_AT_ENTRY_FRAGMENTS at 0x245E8D9). |
| 80 | 0x50 | EIATTR_SPARSE_MMA_MASK | Free | Sparsity mask for structured-sparse MMA operations. |
| 81 | 0x51 | EIATTR_TCGEN05_1CTA_USED | Indexed | tcgen05 (5th-gen tensor core) single-CTA mode used. |
| 82 | 0x52 | EIATTR_TCGEN05_2CTA_USED | Indexed | tcgen05 two-CTA mode used. |
| 83 | 0x53 | EIATTR_GEN_ERRBAR_AT_EXIT | Indexed | Generate error barrier at kernel exit. |
| 84 | 0x54 | EIATTR_REG_RECONFIG | Indexed | Dynamic register reconfiguration (setmaxnreg). |
| 92 | 0x5C | EIATTR_SANITIZE | Indexed | Address sanitizer instrumentation present. |
Mercury-Specific
| Code | Hex | Name | Format | Description |
|---|---|---|---|---|
| 90 | 0x5A | EIATTR_MERCURY_FINALIZER_OPTIONS | Free | Options for the Mercury FNLZR post-link pass. |
| 95 | 0x5F | EIATTR_MERCURY_ISA_VERSION | Sized | Mercury ISA version for the shader binary. |
Sentinel and Error
| Code | Hex | Name | Format | Description |
|---|---|---|---|---|
| 0 | 0x00 | EIATTR_ERROR | -- | Invalid/error sentinel. Used to "delete" records: the merge sets attr_code = 0 to suppress them. |
| 1 | 0x01 | EIATTR_PAD | -- | Padding record (ignored by parser). |
| 86 | 0x56 | EIATTR_UNKNOWN | -- | Unknown attribute placeholder. |
| 96 | 0x60 | EIATTR_ERROR_LAST | -- | Upper bound sentinel for the main enum range. |
How nvlink Processes .nv.info
Phase 1: Merge (Input Processing)
When merge_elf (sub_45E7D0) encounters a section with sh_type == 0x70000000 (line 1854 of decompiled code), it enters the .nv.info processing path. This path:
-
Resolves the target section: If
sh_link != 0, the symbol is looked up via the input symbol table, mapped to the output symbol index, and the per-function.nv.info.<name>section is created or found viasub_4504B0. -
Walks TLV records sequentially: The parser runs
while (ptr < end)whereend = section_data + sh_size. For each record, it readsformat = ptr[0],attr_code = ptr[1],size = *(uint16*)(ptr+2). -
Dispatches on format byte: If
format == 0x04(indexed), the payload is copied to arena memory and symbol indices within the payload are remapped. Ifformat != 0x04, the record header is stored directly (no payload copy needed for non-indexed records). -
Per-attribute symbol remapping (format 0x04 only): A switch on
attr_codedetermines how symbol indices within the payload are translated:
| Switch cases | Attribute codes | Remapping behavior |
|---|---|---|
| 2,6,7,8,9,18,19,20,23,38,69 | IMAGE_SLOT through KPARAM_INFO_V2 | Remap first uint32 (simple sym_map[payload[0]]) |
| 10 | PARAM_CBANK | Remap with lazy symbol creation (creates unnamed symbol if needed) |
| 15 | EXTERNS | Remap every uint32 in the payload array |
| 17,35,47,59 | FRAME_SIZE, MAX_STACK_SIZE, REGCOUNT, SAM_REGION_STACK_SIZE | Remap sym_idx, but skip record entirely if source symbol was deleted |
| 55 | CUDA_API_VERSION | Version compatibility validation (no remap, but error if version mismatch) |
| 79 | AT_ENTRY_FRAGMENTS | Fragment type analysis (scan payload for type 1/2 markers) |
| default | All others | Pass through without remapping |
- Weak symbol handling: For already-processed weak symbols, the entire
.nv.infosection is skipped with verbose message"weak %s already processed".
Phase 2: Weak Symbol Resolution
merge_weak_function (sub_45D180) strips resource attributes from the discarded weak definition. It walks the output's nv.info linked list at elfw+392 and for each node:
int64_t bitmask = 0x800800020000LL; // bits 17, 35, 47
for (node = list_head; node != NULL; node = node->next) {
if (node->secidx == target_nv_info_secidx) {
// Section matches the discarded weak -- zero the entire record
if (verbose) fprintf(stderr, "remove weak nvinfo\n");
} else {
uint8_t code = node->attr_code;
if (code <= 0x2F && _bittest64(&bitmask, code)
&& payload[0] == target_sym_idx) {
// Resource attribute for the discarded symbol -- zero it
if (verbose) fprintf(stderr, "remove weak frame_size\n");
} else {
continue; // keep this record
}
}
node->attr_code = 0; // overwrite to EIATTR_ERROR (suppressed)
}
The three codes matching bitmask 0x800800020000:
- Bit 17 =
EIATTR_FRAME_SIZE(0x11) - Bit 35 =
EIATTR_MAX_STACK_SIZE(0x23) - Bit 47 =
EIATTR_REGCOUNT(0x2F)
The rationale: when a weak function is replaced, the replacement's resource descriptors are the ones that matter. The discarded definition's REGCOUNT, FRAME_SIZE, and MAX_STACK_SIZE would be incorrect for the replacement and must not contaminate the output.
Phase 3: Pre-Finalization (Callgraph Processing)
sub_44C030 (the EIATTR serialization builder, called from sub_44DB00 in the pre-finalization fixup) processes the callgraph to prepare nv.info data:
-
Callgraph traversal: Iterates all callgraph entries starting at
elfw+408. For each function pair (caller, callee), resolves the callee's callgraph node and links it to the caller's callee list. -
Recursion detection: Uses a two-flag approach (bytes at offsets +48 and +49 in callgraph nodes). If recursion is detected, the function is marked as recursive (
"recursion at function %d"). Recursive functions cannot have statically-determined stack sizes. -
Callee list propagation: For each entry function, the transitive closure of callees is computed. Each callee's entry is added to the entry function's "reachable" list via
sub_4644C0.
Phase 4: Finalization (Output Generation)
compute_entry_properties (sub_451D80, 97,969 bytes -- the largest function in the linker) runs during the finalization phase. It operates on all nv.info records and computes derived properties for each kernel entry point:
4a. Symbol index fixup: For indexed records with codes 2, 6--10, 17--20, 23, 35, 38, 47, 59, 69 (the same set as in merge remapping), the function re-resolves symbol indices through the positive/negative symbol mapping tables (elfw+456/elfw+464). If a symbol was deleted but the record still references it, attr_code is set to 0 (suppressed).
4b. Register count propagation (sub_450ED0): Walks the callgraph and propagates the maximum register count from callees to each entry kernel.
for each entry_function in callgraph:
regcount = entry_function.regcount;
for each callee in transitive_callees(entry_function):
if callee.regcount > regcount:
/* Raise entry's count to match */
entry_function.regcount = callee.regcount;
if (verbose)
fprintf(stderr, "regcount %d for %s propagated to entry %s\n",
callee.regcount, callee.name, entry_function.name);
If no REGCOUNT record exists for an entry function (e.g., it was a leaf function with the count stored in section flags), the linker creates one via sub_450B70(elfw, 0x2F, 8, payload, 0, 0). The payload is [sym_idx:4][regcount:4].
4c. Barrier count creation and propagation: When a kernel's section flags contain a barrier count (bits 26:20 of sh_flags) but no EIATTR_NUM_BARRIERS record exists, the function creates one:
"Creating new EIATTR_NUM_BARRIERS and moving barcount %d
from section flags of %s to nvinfo for entry symbol %s"
The linker then clears the barrier bits from section flags (sh_flags &= 0xF80FFFFF) because the nv.info record is now the authoritative source.
If a callee's barrier count exceeds the entry kernel's:
"Propagating higher barcount %d to the section flags of %s of entry symbol %s"
"Propagating higher barcount %d to the section %s of entry symbol %s"
4d. EXTERNS and AT_ENTRY_FRAGMENTS creation: For entry functions, the finalization phase creates EIATTR_EXTERNS (0x0F) and EIATTR_AT_ENTRY_FRAGMENTS (0x4F) records for the transitive closure of external references and fragment descriptors.
4e. Encoding: nvinfo_encode (sub_468760, 14,322 bytes) serializes the in-memory linked-list nodes into on-disk TLV records using SSE2/AVX intrinsics for efficient bitfield packing. The encoder processes a descriptor array of 4 x uint32 entries per field, with the third uint32 serving as a type code that dispatches to one of ~15 encoding strategies (cases 0 through 0x12 in the encoder's switch).
Emission Subsystem (Embedded ptxas)
When nvlink performs LTO compilation, the embedded ptxas compiler generates .nv.info attributes through a dedicated emission subsystem at 0x15C5000--0x160FFFF. This subsystem is the single largest code region dedicated to .nv.info processing in the entire binary.
Architecture
The emission pipeline has three layers:
Layer 1: SM dispatch (sub_15C0CE0). A singleton initialization function registers per-SM callback tables for 12 architecture families (sm_75 through sm_121). Each SM gets an nv.info emitter callback looked up through map A8 via sub_15C3DB0. The callback creates a ~1,936-byte codegen state with architecture-specific constants at offsets 344 and 348 (compute capability encoding).
Layer 2: Master emitters (4 functions, 78--55 KB each). These are the top-level attribute-lowering functions that read compilation state and dispatch to per-attribute-type handlers:
| Address | Size | Identity | Specialty |
|---|---|---|---|
sub_15C4A70 | 23,547 B | emit_nv_info_section_type1 | Core attributes |
sub_15C58F0 | 78,811 B | emit_nv_info_section_type2 | Comprehensive lowering (largest) |
sub_15C8A80 | 40,921 B | emit_nv_info_section_type3 | Texture/surface references |
sub_15CA450 | 54,675 B | emit_nv_info_section_extended | Extended attributes (sm_90+) |
The master emitters use an FNV-1a hash table (offset basis 0x811C9DC5, prime 16,777,619) at object+488 for O(1) function-ID-to-attribute lookup using 24-byte entries.
Layer 3: Per-attribute handlers (~190 functions at 0x15CF070--0x160FFFF). Each function is 4--8 KB and handles exactly one EIATTR type. They follow a uniform template:
1. Read attribute descriptor pointer from a2
2. Read sub-attribute fields from known offsets (m128i-based, 32-byte descriptors)
3. Call sub_A4CBB0 to create attribute IR node
4. Call sub_A49120 to set EIATTR type code
5. Call sub_A49190/sub_A49140 for type validation
6. Write output via sub_4A3D60 (operand builder)
7. Return constructed attribute list
The uniformity across ~190 functions suggests they are generated from a data-driven table or macro expansion in NVIDIA's source code, one handler per EIATTR type.
Supporting Functions
| Address | Size | Identity | Role |
|---|---|---|---|
sub_15CCEE0 | 12,668 B | sort_and_merge_attribute_lists | Orders nv.info entries before emission |
sub_15CD6A0 | 25,822 B | merge_attribute_sections | Merges nv.info from multiple compilation units |
sub_15CE650 | 14,296 B | validate_and_emit_attributes | Validates attributes before final emission |
sub_1631350 | 44,830 B | process_kernel_attributes | Master kernel attribute processor (shared memory, regs, stack) |
sub_16312F0 | small | emit_reserved_smem_attributes | Emits .nv.reservedSmem.* attribute records |
sub_1655A60 | 30,332 B | lower_nv_info_to_codegen | Converts nv.info symbol references to codegen operands |
How .nv.info Drives GPU Resource Allocation
The .nv.info section is not just metadata for tools -- it is the primary input to the GPU driver's kernel launch resource allocator. The relationship is:
-
Register allocation:
EIATTR_REGCOUNTtells the driver how many registers each thread needs. The driver computes:max_warps_per_SM = total_registers / (regcount * warp_size). This is the single most important occupancy-determining attribute. -
Shared memory reservation:
EIATTR_SMEM_PARAM_SIZEandEIATTR_RESERVED_SMEM_0_SIZEdetermine how much shared memory to carve out before the kernel's dynamic shared memory allocation. -
Stack allocation:
EIATTR_CRS_STACK_SIZEandEIATTR_MAX_STACK_SIZEdetermine per-thread stack allocation. If the driver gets this wrong (too small), the kernel will corrupt memory; if too large, occupancy drops. -
Barrier reservation:
EIATTR_NUM_BARRIERSreserves named barrier slots. On most architectures the hardware supports 16 barriers per CTA. The driver must configure the barrier hardware before launch. -
Instruction patching: The offset tables (
EIATTR_EXIT_INSTR_OFFSETS,EIATTR_S2RCTAID_INSTR_OFFSETS,EIATTR_SW*_WAR) tell the driver which instruction words to patch. This enables hardware workarounds and CTA-ID remapping for cluster launch without recompilation. -
Cluster configuration:
EIATTR_CTA_PER_CLUSTERandEIATTR_EXPLICIT_CLUSTER(sm_90+) control the cluster launch hardware, determining how many CTAs share distributed shared memory.
Binary Artifacts
Typos Preserved in the Binary
| String in binary | Correct spelling | Location |
|---|---|---|
EIATTR_AT_ENTRY_FRAGEMENTS | EIATTR_AT_ENTRY_FRAGMENTS | 0x1D36E0F |
"Invalid section type in .nv.info section header" | (correct) | 0x2460218 |
A corrected variant EIATTR_AT_ENTRY_FRAGMENTS also exists at 0x245E8D9, suggesting awareness of the typo but preservation of the original for backward compatibility. Similarly, EIATTR_COROUTINE_RESUME_ID_OFFSETS at 0x245F010 is an alternate name for code 58 (EIATTR_COROUTINE_RESUME_OFFSETS), used in the embedded ptxas compiler.
Diagnostic Strings
"nvinfo <fmt=%d,attr=%d,size=%d>, secidx=%d" (sub_4478F0 debug dump)
"no new register count found for %s, checking .nv.info" (0x1D3BA68)
"no original register count found for %s, checking .nv.info" (0x1D3BAA0)
"regcount %d for %s propagated to entry %s" (0x1D3B070)
"no regcount?" (0x1D3AF6C)
"entry function '%s' with max regcount of %d calls function
'%s' with regcount of %d" (0x1D39930)
"Creating new EIATTR_NUM_BARRIERS and moving barcount %d
from section flags of %s to nvinfo for entry symbol %s" (0x1D3AF88)
"Propagating higher barcount %d to the section flags of
%s of entry symbol %s" (sub_450ED0)
"remove weak nvinfo" (sub_45D180 debug)
"remove weak frame_size" (sub_45D180 debug)
"weak %s already processed" (sub_45E7D0 merge)
"error while updating global nvinfo section" (embedded ptxas)
"Stack size for entry function '%s' cannot be
statically determined" (sub_451D80)
Cross-References
- NVIDIA Section Types --
SHT_CUDA_INFOdefinition and section flag encoding - Section Catalog --
.nv.infoand.nv.info.<funcname>catalog entries - Weak Symbol Handling -- Register count extraction from
.nv.infoduring weak resolution - Section Merging --
.nv.infosection merge with symbol index remapping - Finalization Phase --
compute_entry_propertiesandpropagate_register_counts - Architecture Dispatch -- Per-SM nv.info emitter callback registration
- Constant Banks --
EIATTR_PARAM_CBANKandEIATTR_CBANK_PARAM_SIZEinteraction
Sibling wikis (ptxas):
- ptxas: Section Catalog & EIATTR -- ptxas-side EIATTR emission pipeline and section type constants
- ptxas: EIATTR Reference -- EIATTR attribute code reference from the ptxas perspective
Confidence Assessment
| Claim | Confidence | Evidence |
|---|---|---|
| SHT_CUDA_INFO = 0x70000000 | HIGH | Verified in merge_elf dispatch (line 1854) and sub_4504B0 calls |
| 97 EIATTR codes (0-96) | HIGH | sub_42F760 rejects attr_code > 0x60 (96); all 97 names in nvlink_strings.json |
| Name table at VA 0x1D37D60 (16-byte entries) | HIGH | sub_42F760 accesses dword_1D37D68[4*a1] and off_1D37D60[2*a1] confirming 16-byte stride |
| Name table metadata: [min_version:4, usage_policy:4] | HIGH | sub_42F760 tests metadata[0] > toolkit_version and dispatches on metadata[1] (0=warn, 1=error, 2=drop) |
| TLV format: [format:1][attr:1][size:2][payload:var] | HIGH | Confirmed by diagnostic format string "nvinfo <fmt=%d,attr=%d,size=%d>, secidx=%d" and merge parser at sub_45E7D0:1900 |
| Internal node: [fmt:1][attr:1][size:2][secidx:4][payload_ptr:8] | HIGH | Confirmed by sub_4508F0 which sets all fields, and sub_4478F0 which prints them |
| Format bytes 0x01--0x04 meaning | HIGH | Verified in merge parser: format==4 triggers payload copy + remap; format!=4 skips payload |
| Weak symbol bitmask 0x800800020000 = bits 17,35,47 | HIGH | Verified: _bittest64(&0x800800020000, n) matches FRAME_SIZE(17), MAX_STACK_SIZE(35), REGCOUNT(47) |
| Merge per-attribute remap switch cases | HIGH | Cases 2,6,7,8,9,10,15,17,18,19,20,23,35,38,47,55,59,69,79 verified in sub_45E7D0:1917-1997 |
| sub_4504B0 creates .nv.info sections | HIGH | Decompiled: global path at line 46 (sh_flags=0), per-function at line 63 (sh_flags=0x40) |
| sub_450B70 creates indexed EIATTR nodes | HIGH | Decompiled: sets format=4, attr=a2, size=a3, calls sub_4504B0 and sub_4644C0 |
| sub_4508F0 creates any-format nodes | HIGH | Decompiled: sets format=a2, attr=a3, secidx=a4, 16-byte arena allocation |
| EIATTR_NUM_BARRIERS synthesized as format=0x02 | HIGH | *(_WORD *)v86 = 19458 (0x4C02) = format 0x02, attr 0x4C confirmed in sub_450ED0:207 |
| Barrier count from section flags bits 26:20 | HIGH | (*(_DWORD *)(v8 + 8) >> 20) & 0x7F extracts 7-bit barrier count; cleared with 0xF80FFFFF mask |
| Register count propagation via callgraph | HIGH | "regcount %d for %s propagated to entry %s" at sub_450ED0:417 |
| sub_42F760 validates attr_code <= 0x60 | HIGH | Decompiled: if (a1 > 0x60u) { error("unknown attribute"); return 0; } |
| Parser sub_44E8B0 is nv.info parser | LOW | sub_44E8B0 is actually a string tokenizer (handles quotes/brackets/escapes); the page's original claim was incorrect. The actual TLV parser is inline in sub_45E7D0 (merge) and sub_451D80 (finalize). |
| Encoder sub_468760 (14,322 bytes) | HIGH | Decompiled file exists; SSE2 constants and bitfield packing confirmed |
| Master emission sub_15C58F0 (78,811 bytes) | HIGH | Decompiled file exists |
| Four master emitter functions | HIGH | All four decompiled files exist |
| ~190 per-attribute handlers at 0x15CF070 | MEDIUM | Range confirmed by decompiled file enumeration; exact count is approximate |
| FNV-1a hash (offset basis 0x811C9DC5) in emitter | MEDIUM | Hash constant seen in decompiled code; specific usage context inferred from pattern matching |
| sub_44C030 is callgraph-based nv.info builder | HIGH | Decompiled: accesses callgraph at elfw+408, handles recursion detection, builds callee lists |
| CUDA API version 0x83 = CUDA 13.1 | MEDIUM | Value observed in cubin output; mapping to CUDA version is conventional |
| EIATTR_AT_ENTRY_FRAGEMENTS typo | HIGH | String at 0x1D36E0F confirmed in nvlink_strings.json with typo preserved |