Section Merging
Section merging is the process by which nvlink combines identically-named sections from multiple input cubin ELF objects into single output sections. This spans two pipeline phases: during the merge phase (sub_45E7D0), input sections are classified, mapped to output sections, and their data appended; during the layout phase (sub_439830), the merged sections receive final addresses and sizes via the section layout engine. This page covers the section-level mechanics -- how sections are found, created, mapped, populated, and how overlapping data is validated.
Overview
A conventional linker merges .text from N objects into one .text. A GPU device linker does something fundamentally different: CUDA sections represent distinct GPU memory spaces (global, shared, local, constant) with hardware-specific alignment and overlap rules. nvlink must:
- Classify each input section by its CUDA memory space, not just its ELF
sh_type. - Create output sections on demand, indexed by both name and type.
- Accumulate data contributions into linked lists (not flat buffers), deferring final layout.
- Validate that overlapping data regions are identical (multiple TUs may define the same global).
- Handle per-kernel sections that must be split by entry point (constant banks, local data, shared memory).
The section merging infrastructure is built on four core primitives -- find_section_by_name, section_create, section_data_copy, and section_layout_engine -- plus five specialized overlap-merge functions for each memory space.
Section Classification
During the merge phase, input sections are reclassified from standard ELF types into NVIDIA CUDA section types. The dispatch logic in merge_elf (sub_45E7D0) uses both the sh_type field and the section name prefix:
Input sh_type Section name prefix Output type
--------------------------------------------------------------
SHT_NOBITS (8) .nv.global SHT_CUDA_GLOBAL (0x70000007)
SHT_NOBITS (8) .nv.shared. SHT_CUDA_SHARED (0x7000000A)
SHT_NOBITS (8) .nv.shared.reserved. SHT_CUDA_SHARED_RESERVED (0x70000015)
SHT_NOBITS (8) .nv.local. SHT_CUDA_LOCAL (0x70000009)
SHT_PROGBITS (1) .nv.constant<N> SHT_CUDA_CONSTANT0 + N (0x70000064 + N)
SHT_PROGBITS (1) .nv.global.init SHT_CUDA_GLOBAL_INIT (0x70000008)
SHT_PROGBITS (1) (other) original sh_type
The constant bank number is parsed from the section name suffix: strtol(name + 12, NULL, 10). This means .nv.constant0 becomes type 0x70000064, .nv.constant2 becomes 0x70000066, and so on up to .nv.constant17 (0x70000075).
The reclassification is critical because the ELF types alone are ambiguous. A SHT_NOBITS section could be global data, shared memory, or local data -- only the name distinguishes them. After reclassification, the type code drives all subsequent dispatch.
Section Registry
nvlink maintains two hash tables on the elfw (ELF wrapper) object for section lookup:
| Offset | Hash table | Keyed by | Maps to |
|---|---|---|---|
+288 | Symbol name hash table | Symbol/section name string | Symbol index (positive for local, negative for global) |
+296 | Name-to-section table | Section name string | Section index in the section vector |
Both tables use the same hash/compare infrastructure (sub_4489C0 / sub_449A80 / sub_448E70). The name-to-section table is the primary lookup mechanism for section merging.
find_section_by_name (sub_4411D0)
// sub_4411D0 -- look up section index by name
// Returns: section index, or 0 if not found
uint32_t find_section_by_name(elfw *ctx, const char *name) {
uint32_t *entry = hash_lookup(ctx->section_name_table, name);
if (entry)
return *entry;
return 0;
}
This is a thin wrapper over sub_449A80 (hash table lookup). It returns the section index stored at elfw+296, or 0 for "not found." Section index 0 is reserved (the ELF null section), so 0 is an unambiguous sentinel.
get_section_header (sub_442270)
// sub_442270 -- retrieve section header record by index
// Returns: pointer to 104-byte section record
section_record *get_section_header(elfw *ctx, uint32_t section_idx) {
if (ctx->has_virtual_sections) { // ctx+82
uint32_t mapped = ctx->virtual_map[section_idx]; // ctx+472
if (mapped != 0) {
assert(ctx->reverse_map[mapped] == section_idx,
"secidx not virtual");
}
}
return vector_get(ctx->all_sections, section_idx); // ctx+360
}
The virtual section mapping (ctx+472) supports section index indirection for advanced linking modes. In normal operation, it is a direct index into the section vector at ctx+360.
Section Creation (sub_441AC0)
When find_section_by_name returns 0, a new output section must be created. This is handled by sub_441AC0, which allocates a 104-byte section record, registers it in the section vector and name table, and optionally creates an associated relocation section.
// sub_441AC0 -- create a new section in the output ELF
// a1: elfw context
// a2: section name (e.g., ".nv.global")
// a3: section type (e.g., 0x70000007 for SHT_CUDA_GLOBAL)
// a4: section flags (e.g., SHF_ALLOC | SHF_WRITE = 3)
// a5: sh_info value (typically 0)
// a6: sh_link value (parent section index for reloc sections)
// a7: sh_addralign
// a8: sh_entsize
// Returns: assigned section index
uint32_t section_create(elfw *ctx, const char *name, uint32_t sh_type,
uint64_t sh_flags, uint32_t sh_info,
uint32_t sh_link, uint64_t sh_addralign,
uint64_t sh_entsize);
Section Record Layout (104 bytes)
The 104-byte section record stores the output ELF section header fields plus internal linked list pointers:
| Offset | Size | Field |
|---|---|---|
| 0 | 4 | sh_type |
| 4 | 4 | sh_type_ext (extended type for dispatch) |
| 8 | 8 | sh_flags |
| 16 | 8 | reserved |
| 24 | 8 | sh_offset (assigned during write) |
| 32 | 8 | sh_size (total section size, updated during layout) |
| 40 | 4 | sh_info |
| 44 | 4 | sh_link |
| 48 | 8 | sh_addralign (maximum alignment across all contributions) |
| 56 | 8 | sh_entsize |
| 64 | 4 | section_index (assigned by creation order) |
| 68 | 4 | padding |
| 72 | 8 | symbol_list_head (linked list of data contributions) |
| 80 | 8 | symbol_list_tail (for O(1) append) |
| 88 | 8 | reserved |
| 96 | 8 | name_ptr (pointer to section name string) |
Creation Algorithm
-
Look up existing section by name in
ctx+296. If found, retrieve the existing section record and reuse its type/flags metadata. -
Allocate section record:
arena_alloc(104), zero-initialize viamemset. -
Guard check: If the callgraph has already been built (
ctx+81set) and the new section has executable flags (flags & 4), emit a fatal error:"adding function section after callgraph completed". This prevents section creation after DCE has already pruned the callgraph. -
Initialize fields: Copy
sh_type,sh_flags,sh_info,sh_link,sh_addralign,sh_entsizefrom the arguments. -
Assign section index: Increment the global section counter at
ctx+584and store it in the record at offset 64. -
Register in section vector: Call
sub_464C30to append the record to the all-sections vector atctx+360. -
Register in name table: If this is a new name (not found in
ctx+296), allocate a 12-byte name-index node viaarena_alloc(12), copy the name string into arena memory, and insert into the hash table atctx+296viasub_448E70. -
Create relocation section (conditional): If the link mode is not relocatable (
ctx+16 != 1), the DCE flag (ctx+83) is set, and the section type is in the constant bank range (0x70000064..0x7000007Eor0x70000006), the function recursively calls itself to create a.rela<name>or.rel<name>section of typeSHT_RELA(4) orSHT_REL(9), linked back to the parent section. -
Register in name-to-symbol table: Insert the section's assigned symbol index into the hash table at
ctx+288via the samesub_448E70call.
Section Index Overflow Handling
Section indices are stored as 16-bit values in ELF symbol table entries (st_shndx). When the section count exceeds 0xFEFF (65279), nvlink switches to extended section index tables:
if (section_index > 0xFEFF && section_index != 65522) {
record->st_shndx = SHN_XINDEX; // 0xFFFF
if (ctx->xindex_table == NULL) {
ctx->xindex_table = vector_create(0x10000); // ctx+592
ctx->xindex_reverse = vector_create(0x10000); // ctx+600
}
vector_set(ctx->xindex_table, sym_index, section_index);
}
This follows the ELF SHN_XINDEX convention, storing the real index in a separate .symtab_shndx-like vector.
Data Copy Primitive (sub_433760)
Once an output section exists, input data is appended via sub_433760. This function does not copy bytes into a flat buffer; instead, it creates a 40-byte data node that records the source pointer, alignment, offset, and size. The actual byte-copy happens later during ELF serialization.
// sub_433760 -- append data contribution to a section
// a1: elfw context
// a2: section index (output)
// a3: source symbol index (in input, for tracking provenance)
// a4: alignment requirement
// a5: data size (bytes)
void section_data_copy(elfw *ctx, uint32_t section_idx,
uint32_t source_sym, uint32_t alignment,
uint64_t data_size);
Data Node Layout (40 bytes)
| Offset | Size | Field |
|---|---|---|
| 0 | 8 | source_sym -- source symbol index for provenance tracking |
| 8 | 8 | offset -- byte offset within the merged section |
| 16 | 8 | alignment -- alignment this contribution requires |
| 24 | 8 | data_ptr -- pointer to source data (from input ELF) |
| 32 | 8 | reserved (zeroed) |
Copy Algorithm
section = get_section_header(ctx, section_idx)
if section is NULL: return // silently skip unmapped sections
if alignment > section.sh_addralign:
section.sh_addralign = alignment // track maximum alignment
node = arena_alloc(40)
node.source_sym = source_sym
node.offset = 0
node.alignment = alignment
node.data_ptr = data_ptr
// Compute aligned insertion point
current_size = section.sh_size // offset +32 in section record
remainder = current_size % alignment
if remainder != 0:
current_size = current_size + alignment - remainder
node.offset = current_size
section.sh_size = current_size + data_size
// Append to section's linked list via tail pointer
if section.symbol_list_head == NULL:
list_prepend(node, §ion.symbol_list_head)
section.symbol_list_tail = section.symbol_list_head
else:
assert(section.symbol_list_tail != NULL,
"tail data node not found")
list_insert_after(node, section.symbol_list_tail)
section.symbol_list_tail = node
The linked-list approach means that during the merge phase, the section accumulates an ordered list of data contributions without ever copying bytes. Each node records where the data comes from and at what offset it will land. This is efficient for the common case where hundreds of input objects contribute small fragments to the same section.
Section Layout Engine (sub_4325A0)
After all input objects have been merged, the layout phase calls the section layout engine to assign final offsets to every data contribution in a section. This function sorts the contributions by alignment and computes a packed layout.
// sub_4325A0 -- lay out all symbols within a section
// a1: elfw context
// a2: pointer to section header record
// a3: initial offset (usually 0, but can be nonzero for shared memory)
// Returns: total section size
uint32_t section_layout_engine(elfw *ctx, section_record *section,
uint32_t initial_offset);
Layout Algorithm
assert(section != NULL, "section not found")
// Sort symbol list by alignment (descending) unless in extended-smem mode
if not ctx->extended_smem_mode or not arch_supports(section.sh_type):
list_sort(section.symbol_list, alignment_comparator)
current = initial_offset
for each symbol_node in section.symbol_list:
sym_record = get_sym_record(ctx, symbol_node.sym_index)
alignment = symbol_node.alignment
if alignment > 0:
// Explicit alignment: round up to alignment boundary
if current % alignment != 0:
current = current + alignment - (current % alignment)
elif symbol_node.size > 0:
// No explicit alignment: use natural alignment, capped at 8
natural_align = min(symbol_node.size, 8)
if current % natural_align != 0:
current = current + natural_align - (current % natural_align)
else:
// Zero alignment AND zero size: only valid in no-opt mode
assert(ctx->no_opt_mode,
"should only reach here with no opt")
// Assign offset to both the symbol record and the section-local record
sym_record.value = current
symbol_node.value = current
// verbose: "variable %s at offset %d"
current += symbol_node.size
section.total_size = current
return current
The sorting step is important: by placing the highest-alignment items first, the function minimizes internal fragmentation from alignment padding. The sort is performed by sub_4647D0 with comparator sub_432440.
The extended_smem_mode flag (ctx+100) disables sorting for shared memory sections when the architecture supports extended shared memory. In that mode, the order from the input objects is preserved.
Overlapping Data Merge
A GPU linker must handle a case that rarely appears in CPU linking: multiple translation units can define the same global variable with the same initializer data. This is common in CUDA because __device__ variables at file scope have external linkage by default.
nvlink handles this through five specialized overlap-merge functions, one per memory space:
| Address | Function | Memory space | Size |
|---|---|---|---|
0x432B10 | merge_overlapping_global | .nv.global | 11,683 B |
0x437E20 | merge_overlapping_local | .nv.local.* | 11,580 B |
0x4343C0 | merge_overlapping_constant | .nv.constant* | 11,838 B |
0x434BC0 | merge_overlapping_variant1 | (per-entry constant) | 11,147 B |
0x435390 | merge_overlapping_variant2 | (per-entry data) | 11,156 B |
All five follow the same algorithm, differing only in the section record fields they read and the error messages they emit:
Overlap Merge Algorithm (sub_432B10)
// sub_432B10 -- validate and merge overlapping data
// a1: elfw context
// a2: output section index
// a3: input symbol index
// a4: source data pointer (from input ELF)
// a5: insertion offset (-1 means "auto-compute")
// a6: alignment
// n: data size
int merge_overlapping_global(elfw *ctx, uint32_t section_idx,
uint32_t sym_idx, void *data,
uint64_t offset, uint32_t alignment,
size_t size);
The algorithm walks the section's existing data-node linked list and checks whether the new contribution overlaps with any existing one:
for each existing_node in section.symbol_list:
if new_range overlaps existing_range:
overlap_start = max(new_offset, existing_offset)
overlap_end = min(new_offset + new_size, existing_offset + existing_size)
overlap_size = overlap_end - overlap_start
// Validate: overlapping bytes must be identical
if memcmp(new_data + (overlap_start - new_offset),
existing_data + (overlap_start - existing_offset),
overlap_size) != 0:
fatal("overlapping non-identical data")
if overlap spans entire new contribution:
return // nothing new to add, existing data subsumes it
if overlap spans too much:
fatal("overlapping data spans too much")
The key invariant is that overlapping regions must contain byte-for-byte identical data. If they do not, the linker emits a fatal error via sub_467460. This catches the case where two TUs define the same __device__ variable with different initializers -- a programming error that would cause silent data corruption on the GPU.
Diagnostic strings from this family of functions:
"offset %lld goes past section %d size"-- bounds check failure"offset %lld (sym %d) overlaps in section %d"-- overlap detected (info, not error)"overlapping non-identical data"-- fatal: data mismatch in overlap region"overlapping data spans too much"-- fatal: overlap exceeds expected bounds"local data should have offset"-- (local variant only) missing offset attribute
Constant Bank Merge (sub_438640)
Constant bank sections (.nv.constant0 through .nv.constant17) have special merge logic because they can be either global (shared across all kernels) or per-entry (specific to one kernel entry point). The function sub_438640 handles both cases.
// sub_438640 -- merge data into a constant bank section
// a1: elfw context
// a2: source section pointer
// a3: symbol binding (1=GLOBAL, other=per-entry)
// a4: symbol index
// a5: data offset within source section
// a6: alignment
// n: data size
// s: source data pointer
// a9: constant bank type (0x70000064 + bank_number)
// a10: entry function section index (0 for global constants)
For per-entry constants (a10 != 0), the function constructs a per-entry section name:
sprintf(name_buf, "%s.%s", bank_type_name, entry_function_name);
// e.g., ".nv.constant0.my_kernel"
It then calls find_section_by_name on the composite name. If the section does not exist, it creates it via section_create and registers it in the per-entry constant list at ctx+272. Data is then merged via the overlap merge function.
The function validates:
- Per-entry data must not have GLOBAL binding:
"entry data cannot be GLOBAL" - Per-entry data must have an explicit offset:
"entry data should have offset" - The section type must be a valid constant bank:
"bank SHT not CUDA_CONSTANT_?"
Symbol Offset Updates
When data is merged into an output section, the input symbol's value (address) must be updated to reflect its position in the merged section. This happens through sub_440430, which updates the symbol-to-section mapping:
// sub_440430 -- update symbol's section association
// a1: elfw context
// a2: symbol record pointer
// a3: new section index
void update_symbol_section(elfw *ctx, symbol_record *sym,
uint32_t new_section_idx);
If the section index exceeds 0xFEFF, the function uses the extended section index table at ctx+592/ctx+600. Otherwise, it writes the section index directly into the symbol's st_shndx field.
For deleted symbols (from dead code elimination or weak resolution), the function checks the symbol mapping tables at ctx+456/ctx+464 and emits "reference to deleted symbol" if a relocation still points to a removed symbol.
Global Data Pending-Merge List
Global variables (.nv.global) have a special accumulation path. During the merge phase, globals are not immediately placed into the output section. Instead, they are appended to a pending-merge list at ctx+448. This defers the actual merge until the layout phase, which processes them in Phase 1:
// Layout Phase 1 (in sub_439830):
if ctx->pending_globals is not NULL:
section = find_or_create(".nv.global", SHT_CUDA_GLOBAL)
// walk to tail of existing symbol list
tail = section.symbol_list_head
while tail.next: tail = tail.next
section.symbol_list_tail = tail
// drain pending list
for each pending in ctx->pending_globals:
sym_record = get_sym_record(ctx, pending.sym_index)
update_symbol_section(ctx, sym_record, section_index)
section_data_copy(ctx, section_index, 0,
sym_record.alignment, sym_record.data_size)
This deferred approach allows the linker to collect all global variable definitions before computing their final layout, which is necessary for overlap validation.
Per-Entry Section Naming
CUDA kernels have per-kernel sections for shared memory, local data, and constant banks. These follow a naming convention:
<base_section_name>.<entry_function_name>
Examples:
.nv.shared.my_kernel-- shared memory for kernelmy_kernel.nv.local.my_kernel-- local memory for kernelmy_kernel.nv.constant0.my_kernel-- constant bank 0 for kernelmy_kernel
The section name is constructed via sprintf("%s.%s", base_name, entry_name). Both sub_438640 (constant bank merge) and the layout phase (sub_439830, Phase 9a) use this convention. The per-entry sections are tracked in linked lists on the elfw object:
| Offset | List | Contents |
|---|---|---|
+256 | Per-entry shared sections | .nv.shared.<kernel> section indices |
+264 | Extern shared sections | __shared__ extern placeholder section indices |
+272 | Per-entry constant sections | .nv.constant<N>.<kernel> section indices |
+280 | Per-entry local sections | .nv.local.<kernel> section indices |
Duplicate Detection
The merge phase tracks duplicate definitions through two mechanisms:
-
Symbol-level duplicates: When
merge_elfencounters a global symbol that already exists in the output, it checks binding and size. Strong-strong conflicts produce a multiple-definition error. Common symbols (section indexSHN_COMMON,0xFFF2) resolve to the larger size. The verbose trace prints"increase size of common %s"or"global.init replaces common for %s". -
Data-level duplicates: The overlap merge functions (
sub_432B10family) detect when two input objects contribute data to the same byte range within a section. Identical data is silently merged (no diagnostic); non-identical data is a fatal error.
The constant deduplication engine (sub_4339A0, called during layout Phase 9) goes further: it finds identical 32-bit and 64-bit constant values across different symbols and aliases them to save space. Verbose output: "found duplicate value 0x%x, alias %s to %s" and "found duplicate 64bit value 0x%llx, alias %s to %s".
Section Types Reference
Complete list of NVIDIA CUDA section types used during merging:
| Type (hex) | Type (decimal) | Name | Memory space |
|---|---|---|---|
0x70000000 | 1879048192 | SHT_CUDA_INFO | Metadata (.nv.info) |
0x70000001 | 1879048193 | SHT_CUDA_CALLGRAPH | Callgraph edges |
0x70000002 | 1879048194 | SHT_CUDA_CALLGRAPH_INFO | Callgraph per-function info |
0x70000004 | 1879048196 | SHT_CUDA_FUNCDATA | Debug function data |
0x70000006 | 1879048198 | SHT_CUDA_CONSTANT | Constant (base type) |
0x70000007 | 1879048199 | SHT_CUDA_GLOBAL | Global data (.nv.global) |
0x70000008 | 1879048200 | SHT_CUDA_GLOBAL_INIT | Global init data (.nv.global.init) |
0x70000009 | 1879048201 | SHT_CUDA_LOCAL | Local data (.nv.local.*) |
0x7000000A | 1879048202 | SHT_CUDA_SHARED | Shared memory (.nv.shared.*) |
0x7000000B | 1879048203 | SHT_CUDA_RELOCINFO | Relocation action table (.nv.rel.action) |
0x7000000E | 1879048206 | SHT_CUDA_UFT | Unified Function Table (.nv.uft) |
0x70000011 | 1879048209 | SHT_CUDA_UFT_ENTRY | UFT entry metadata (.nv.uft.entry) |
0x70000012 | 1879048210 | SHT_CUDA_UDT | Unified Descriptor Table (.nv.udt) |
0x70000014 | 1879048212 | SHT_CUDA_UDT_ENTRY | UDT entry metadata (.nv.udt.entry) |
0x70000015 | 1879048213 | SHT_CUDA_SHARED_RESERVED | Reserved shared memory |
0x70000064 | 1879048292 | SHT_CUDA_CONSTANT0 | Constant bank 0 |
0x70000065-0x70000075 | 1879048293-1879048309 | SHT_CUDA_CONSTANT1-17 | Constant banks 1-17 |
0x70000086 | 1879048326 | SHT_CUDA_COMPAT | Compatibility attributes |
Section Type Dispatch Table (sub_45E3C0)
The section type classifier (sub_45E3C0, called from merge_elf at sub_45E7D0) is the central dispatch point that maps every input section to its output NVIDIA CUDA type. The function takes the input section's sh_type, section name, and flags, then reclassifies it for the output ELF. The full dispatch logic, reconstructed from decompilation:
Name-Based Reclassification (NOBITS Sections)
When the input sh_type is SHT_NOBITS (8), the section carries no data in the ELF file but reserves address space. The classifier checks the section name to determine which GPU memory space it belongs to:
Input sh_type = SHT_NOBITS (8):
if name starts with ".nv.global" -> SHT_CUDA_GLOBAL (0x70000007)
else if name starts with ".nv.shared." -> SHT_CUDA_SHARED (0x7000000A)
else if name starts with ".nv.shared.reserved."
-> SHT_CUDA_SHARED_RESERVED (0x70000015)
else if name starts with ".nv.local." -> SHT_CUDA_LOCAL (0x70000009)
else -> remains SHT_NOBITS (8)
The comparison lengths are exact: .nv.global uses memcmp(name, ".nv.global", 10), .nv.shared. uses 11 bytes (note trailing dot), .nv.shared.reserved. uses 20 bytes, and .nv.local. uses 10 bytes. The ordering matters: .nv.shared.reserved. is checked after .nv.shared., so a .nv.shared.reserved.foo section first matches .nv.shared. and gets type 0x7000000A, not 0x70000015. However, this is the order in both the classifier function and in merge_elf itself -- the .nv.shared.reserved. check only fires if the .nv.shared. check fails, meaning a section named exactly .nv.shared.reserved.foo would match .nv.shared. first (11-byte prefix match). The reserved variant only matches section names that do NOT begin with .nv.shared. at the 11-byte level but DO begin with .nv.shared.reserved. at the 20-byte level -- which is impossible. In merge_elf, the same sequence appears at lines 975-993 of the decompilation, confirming this is the actual production logic in the binary. In practice, .nv.shared.reserved. names are constructed by ptxas to always match the longer prefix in contexts where the 11-byte check is skipped or where the section already has type 0x70000015 from the input cubin.
Name-Based Reclassification (PROGBITS Sections)
When the input sh_type is SHT_PROGBITS (1), the section contains initialized data. Two name prefixes trigger reclassification:
Input sh_type = SHT_PROGBITS (1):
if name starts with ".nv.constant" -> SHT_CUDA_CONSTANT0 + strtol(name+12, NULL, 10)
(0x70000064 + bank_number)
else if name starts with ".nv.global.init" -> SHT_CUDA_GLOBAL_INIT (0x70000008)
else -> remains SHT_PROGBITS (1)
The constant bank number extraction strtol(name + 12, NULL, 10) parses the decimal digit(s) immediately after .nv.constant. This produces:
| Section name | Bank number | Output type |
|---|---|---|
.nv.constant0 | 0 | 0x70000064 |
.nv.constant1 | 1 | 0x70000065 |
.nv.constant2 | 2 | 0x70000066 |
| ... | ... | ... |
.nv.constant10 | 10 | 0x7000006E |
.nv.constant17 | 17 | 0x70000075 |
Per-entry constant sections like .nv.constant0.my_kernel also match -- strtol("0.my_kernel", NULL, 10) correctly returns 0 because strtol stops at the first non-numeric character.
Pre-classified CUDA Types
When the input sh_type is already SHT_CUDA_CONSTANT (0x70000006), the section is a constant bank whose bank number must be extracted from the name. This path shares the same strtol(name + 12, ...) logic:
Input sh_type = SHT_CUDA_CONSTANT (0x70000006):
-> SHT_CUDA_CONSTANT0 + strtol(name+12, NULL, 10)
(0x70000064 + bank_number)
This handles the case where an input cubin already uses the generic SHT_CUDA_CONSTANT type rather than a specific bank type.
SHT_NOTE Pass-Through
When the input sh_type is SHT_NOTE (7) and the section flags include 0x1000000 (a CUDA-specific flag indicating a parameter bank), the classifier handles it as a special case. It does not reclassify the type; instead, it updates the parameter bank size field on the note section's associated entry point:
Input sh_type = SHT_NOTE (7) with flags & 0x1000000:
entry_section = get_section_header(ctx, ctx->param_bank_section) // ctx+208
sym_record = get_sym_record(ctx, entry_section)
sym_record.param_size = ctx->param_bank_size // ctx+134
return ctx->param_bank_section
This path exits early without creating a new output section. The parameter bank note is consumed during merge and its size is recorded on the entry point symbol, not emitted as a separate section.
All Other Types
Any sh_type value that does not match the above cases passes through unchanged. This covers:
SHT_PROGBITS(1) sections without.nv.constantor.nv.global.initprefix (e.g.,.text.*function sections)- Standard ELF types like
SHT_SYMTAB(2),SHT_STRTAB(3) - Already-classified CUDA types in the
0x70000064-0x7000007Erange
Complete Dispatch Summary
sub_45E3C0 dispatch table:
+-------------------------------+----------------------------+---------------------------+
| Input sh_type | Section name prefix | Output sh_type |
+-------------------------------+----------------------------+---------------------------+
| SHT_NOBITS (8) | .nv.global | 0x70000007 (CUDA_GLOBAL) |
| SHT_NOBITS (8) | .nv.shared. | 0x7000000A (CUDA_SHARED) |
| SHT_NOBITS (8) | .nv.shared.reserved. | 0x70000015 (CUDA_SH_RSVD) |
| SHT_NOBITS (8) | .nv.local. | 0x70000009 (CUDA_LOCAL) |
| SHT_NOBITS (8) | (other) | 8 (unchanged) |
| SHT_PROGBITS (1) | .nv.constant<N> | 0x70000064+N (CONSTANT_N) |
| SHT_PROGBITS (1) | .nv.global.init | 0x70000008 (GLOBAL_INIT) |
| SHT_PROGBITS (1) | (other) | 1 (unchanged) |
| SHT_CUDA_CONSTANT (0x70000006)| .nv.constant<N> | 0x70000064+N (CONSTANT_N) |
| SHT_NOTE (7) | (flags & 0x1000000) | (early return, no output) |
| (any other) | (any) | (unchanged) |
+-------------------------------+----------------------------+---------------------------+
Section Type Filter Bitmasks
After reclassification, merge_elf uses bitmask-based filters to decide which sections enter each processing pass. Two bitmasks appear in the decompiled code:
Second-Pass Section Filter (0x5D05)
The second pass of merge_elf (the section-header loop at sub_45E7D0 lines 1547-1670) iterates over section headers and decides which ones require output section creation via sub_45E3C0. The filter at line 1600-1602:
LOBYTE(v148) = 0;
if ((unsigned int)(sh_type - 0x70000006) <= 0xE) // types 0x70000006..0x70000014
v148 = (0x5D05uLL >> ((uint8_t)sh_type - 6)) & 1;
if ((unsigned int)(sh_type - 0x70000064) > 0x1A && !v148)
goto skip; // section not processed in this pass
The bitmask 0x5D05 = 0101 1101 0000 0101 in binary, indexed by (low_byte(sh_type) - 6). Decoding each bit position:
| Bit | sh_type | Name | In bitmask? | Meaning |
|---|---|---|---|---|
| 0 | 0x70000006 | SHT_CUDA_CONSTANT | 1 (accepted) | Generic constant: needs bank resolution |
| 1 | 0x70000007 | SHT_CUDA_GLOBAL | 0 (skipped) | Globals handled via pending-merge list |
| 2 | 0x70000008 | SHT_CUDA_GLOBAL_INIT | 1 (accepted) | Initialized globals: overlap-merge path |
| 3 | 0x70000009 | SHT_CUDA_LOCAL | 0 (skipped) | Locals handled in first pass |
| 4 | 0x7000000A | SHT_CUDA_SHARED | 0 (skipped) | Shared memory handled separately |
| 5 | 0x7000000B | (unused) | 0 | — |
| 6 | 0x7000000C | (unused) | 0 | — |
| 7 | 0x7000000D | (unused) | 0 | — |
| 8 | 0x7000000E | SHT_CUDA_UFT | 1 (accepted) | Unified Function Table (.nv.uft) |
| 9 | 0x7000000F | (unused) | 0 | — |
| 10 | 0x70000010 | (unused in v13.0) | 1 (accepted) | Bit set but no known section type at this value |
| 11 | 0x70000011 | SHT_CUDA_UFT_ENTRY | 1 (accepted) | UFT entry metadata (.nv.uft.entry) |
| 12 | 0x70000012 | SHT_CUDA_UDT | 1 (accepted) | Unified Descriptor Table (.nv.udt) |
| 13 | 0x70000013 | (unused) | 0 | — |
| 14 | 0x70000014 | SHT_CUDA_UDT_ENTRY | 1 (accepted) | UDT entry metadata (.nv.udt.entry) |
Additionally, any type in the constant bank range 0x70000064 through 0x7000007E (26 values) is unconditionally accepted regardless of the bitmask. This covers SHT_CUDA_CONSTANT0 through SHT_CUDA_CONSTANT17 plus 8 reserved constant bank slots.
Sections that pass this filter enter the sub_45E3C0 call that creates output sections and maps input-to-output section indices.
The types explicitly skipped by this filter (SHT_CUDA_GLOBAL, SHT_CUDA_LOCAL, SHT_CUDA_SHARED) are not ignored -- they are handled by separate dedicated code paths in the first symbol-iteration pass and the post-merge layout phases.
NOBITS Data Suppression Mask (0x400D)
Inside sub_45E3C0, after a section is reclassified and looked up/created, the function decides whether the section should carry data into the overlap-merge function. The bitmask 0x400D at line 150-151 controls this:
LOBYTE(v28) = (sh_type != 8); // start with "has data" = not NOBITS
if ((unsigned int)(sh_type - 0x70000007) <= 0xE)
v28 &= ~(0x400DuLL >> ((uint8_t)sh_type - 7));
0x400D = 0100 0000 0000 1101 in binary, indexed by (low_byte(sh_type) - 7):
| Bit | sh_type | Name | In bitmask? | Effect |
|---|---|---|---|---|
| 0 | 0x70000007 | SHT_CUDA_GLOBAL | 1 | Data pointer suppressed (NOBITS semantics) |
| 1 | 0x70000008 | SHT_CUDA_GLOBAL_INIT | 0 | Data pointer preserved (PROGBITS) |
| 2 | 0x70000009 | SHT_CUDA_LOCAL | 1 | Data pointer suppressed (NOBITS) |
| 3 | 0x7000000A | SHT_CUDA_SHARED | 1 | Data pointer suppressed (NOBITS) |
| 4-13 | 0x7000000B-0x70000014 | (various) | 0 | Data pointer preserved |
| 14 | 0x70000015 | SHT_CUDA_SHARED_RESERVED | 1 | Data pointer suppressed (NOBITS) |
When a bit is set, the data pointer passed to merge_overlapping_global (sub_432B10) is forced to NULL. This means the section participates in the overlap-merge algorithm for offset/size tracking but no byte-level data comparison occurs. This is correct because these section types represent uninitialized GPU memory reservations -- their ELF representation has no data payload.
The four NOBITS-semantics types (GLOBAL, LOCAL, SHARED, SHARED_RESERVED) all originated from SHT_NOBITS (8) input sections. After reclassification, they carry a CUDA-specific type code, but the bitmask preserves their original NOBITS behavior during merge.
Per-Type Merge Handling
Each section type follows a different merge path through the two passes of merge_elf. The following table documents the complete handling for every type:
Pass 1: Symbol-Level Merge (Symbol Iteration)
The first pass iterates over symbols from the input object and dispatches based on the reclassified section type:
| Reclassified type | Merge handler | Data path | Notes |
|---|---|---|---|
SHT_PROGBITS (1) | sub_45E3C0 -> sub_432B10 | Overlap merge with data | Text/code sections; creates output via find_section_by_name + section_create |
SHT_NOBITS (8, unreclassified) | sub_45E3C0 -> sub_432B10 | Overlap merge, no data | Rare: NOBITS sections that didn't match any .nv.* prefix |
SHT_CUDA_GLOBAL (0x70000007) | sub_45E3C0 -> sub_432B10 | Overlap merge, NULL data | Size reservation only; actual data deferred to pending-merge list |
SHT_CUDA_GLOBAL_INIT (0x70000008) | sub_45E3C0 -> sub_432B10 | Overlap merge with data | Initialized global data has a real data pointer |
SHT_CUDA_LOCAL (0x70000009) | sub_45E3C0 -> sub_432B10 | Overlap merge, NULL data | Per-entry local: size reservation |
SHT_CUDA_SHARED (0x7000000A) | sub_45E3C0 -> sub_432B10 | Overlap merge, NULL data | Per-entry shared memory reservation |
SHT_CUDA_SHARED_RESERVED (0x70000015) | sub_45E3C0 -> sub_432B10 | Overlap merge, NULL data | tcgen05/reserved shared memory reservation |
SHT_CUDA_CONSTANT0-17 (0x70000064-0x75) | sub_438640 (merge_constant_bank) | Per-entry or global constant merge | Dispatches to per-entry (sub_434BC0/sub_435390) or global (sub_4343C0) overlap merge |
SHT_CUDA_CONSTANT (0x70000006) | Reclassified to CONSTANT_N, then as above | — | Never seen as a final type in pass 1 |
Weak function (STB_WEAK, type matches param bank) | sub_438640 | Per-entry constant merge | Weak entries use the constant bank path with entry function section index |
Common symbol (SHN_COMMON = 0xFFF2) | sub_440740 -> pending list at ctx+448 | Deferred to layout | Common symbols accumulate on the pending-globals list |
Defined global (STB_GLOBAL, section != 0) | sub_45CD30 -> sub_440740 | Section creation + symbol mapping | Creates output section, maps symbol, records offset |
Shared memory with entry (STB_LOCAL, .nv.shared.) | sub_437BB0 or sub_4379A0 | Per-entry shared memory merge | Dispatches based on *(v19+57) (extended smem mode flag) |
Pass 2: Section-Header Merge (Section Iteration)
The second pass iterates over section headers from the input object. Only section types passing the 0x5D05 bitmask filter or in the constant bank range are processed:
| sh_type | Handler | Processing |
|---|---|---|
SHT_CUDA_UFT_ENTRY (0x70000011) | sub_45CF00 | UFT entry metadata merge; creates output section of type 0x70000011, passes is_rela=1 flag |
SHT_CUDA_UDT_ENTRY (0x70000014) | sub_45CF00 | UDT entry metadata merge; creates output of type 0x70000014, passes is_rela=0 flag |
SHT_CUDA_CONSTANT0-17 (0x70000064-0x7E) | sub_45E3C0 | Constant bank sections not yet mapped; creates output section and registers mapping |
SHT_CUDA_CONSTANT (0x70000006) | sub_45E3C0 | Generic constant: reclassified to specific bank via strtol |
SHT_CUDA_GLOBAL_INIT (0x70000008) | sub_45E3C0 | Initialized global data sections not yet mapped |
SHT_CUDA_UFT (0x7000000E) | sub_45E3C0 | Unified Function Table (.nv.uft) |
| (0x70000010) | sub_45E3C0 | Bitmask-accepted but no known section type at this value in v13.0 |
SHT_CUDA_UDT (0x70000012) | sub_45E3C0 | Unified Descriptor Table (.nv.udt) |
SHT_CUDA_SHARED (0x7000000A) | Direct sub_45E3C0 | Only when extended smem mode active (*(v19+57)) AND not in relocatable mode (!*(ctx+80)) |
SHT_NOTE (7) | Direct sub_45E3C0 | Only when link mode is not relocatable AND DCE flag (*(ctx+101)) is set; output type forced to 0x70000000 (CUDA_INFO) |
Pass 3: Post-Section Merge (Special Types)
After both symbol and section passes, three special section types are processed inline in merge_elf:
| sh_type | Handler | Processing |
|---|---|---|
SHT_CUDA_INFO (0x70000000) | Inline in merge_elf | Per-entry .nv.info sections: EIATTR records are parsed, symbol indices remapped, and data nodes copied to per-function info lists at ctx+480. Per-entry sections require resolving the entry function's section mapping. Global .nv.info (sh_info=0) is processed directly. |
SHT_CUDA_CALLGRAPH (0x70000001) | Inline in merge_elf | Callgraph edge table: entries are 8-byte (caller_sym, callee_sym) pairs. Each entry's symbol indices are remapped through the symbol mapping table. Dead (deleted) symbols are skipped. Three edge types are dispatched: type -1 (sub_44B9F0), type -2 (sub_44BA60), type -3 (sub_44BAA0), type -4 (sub_44BF90). |
SHT_CUDA_CALLGRAPH_INFO (0x70000002) | Inline in merge_elf | Per-function callgraph metadata: symbol indices are remapped and validated. A mismatch between the function's declared info and the current input triggers a fatal error via sub_467460. |
SHT_CUDA_FUNCDATA (0x70000004) | Inline in merge_elf | Debug function data records: variable-length records containing (sym_index, name_offset, count, data[count*2]). Each record is copied into an arena allocation and appended to the funcdata list at ctx+480. Attribute type 3 entries have their name offsets resolved via sub_43D690. |
SHT_CUDA_COMPAT (0x70000086) | Inline in merge_elf | Compatibility attribute section: a byte stream of (type, data) pairs. Parsed sequentially with a switch on attribute type (0-7). Dispatches to sub_451920 for ISA class (type 2), address size (type 3), and other compatibility checks. Validates architecture compatibility. Enforces that .nv.compat data is present via fatal error if missing. |
Relocation Section Merge
Relocation sections (SHT_RELA=4 and SHT_REL=9) are handled through a dedicated path in the second pass. When merge_elf encounters a section with sh_type 4 or 9, it:
- Reads the relocation section's
sh_infofield to find the target section it applies to - Loads the target section's header to determine the target's type
- Validates that the target type is in the expected range using the same
0x5D05bitmask and constant bank range check. If the target type is unexpected, it emits"unexpected reloc section" - Maps the target section through
sub_45E3C0if not already mapped - For each relocation entry, remaps the symbol index through the input-to-output symbol mapping table
- Skips relocations targeting deleted (dead) symbols, with special handling for
.debug_line,.nv_debug_line_sass, and.debug_framesections (these are silently skipped for weak symbols already processed)
The relocation merge also detects weak-entry-related relocations targeting .nv.constant sections and removes them when the weak entry has been resolved: "remove weak reloc for %s".
Reserved Shared Memory Symbols
Sections reclassified to SHT_CUDA_SHARED with STB_LOCAL binding and specific symbol names trigger reserved shared memory tracking:
| Symbol name | SMEM partition type |
|---|---|
__nv_reservedSMEM_tcgen05_partition | 2 (tcgen05) |
__nv_reservedSMEM_allocation_phase | 1 (general) |
__nv_reservedSMEM_allocation_mask | 1 (general) |
__nv_reservedSMEM_tmem_allocation_pipeline_mbarrier | 1 (general) |
__nv_reservedSMEM_tmem_allocation_pipeline_mbarrier_parity | 1 (general) |
The partition type is stored at ctx+664. If different input objects disagree on the partition type, merge_elf emits a fatal error (diagnostic descriptor at unk_2A5B8C0). These symbols are Blackwell-era (sm_100+) reserved shared memory regions for tensor core generation 05 operations.
Function Map
| Address | Name | Size | Role |
|---|---|---|---|
0x4411D0 | find_section_by_name | ~200 B | Hash table lookup: name -> section index |
0x4411B0 | find_symbol_by_name | ~200 B | Hash table lookup: name -> symbol index |
0x441AC0 | section_create | 9,523 B | Allocate section record, register in tables, optionally create reloc section |
0x433760 | section_data_copy | ~600 B | Create 40-byte data node, compute aligned offset, append to section list |
0x4325A0 | section_layout_engine | ~1,400 B | Sort symbols by alignment, assign offsets with padding |
0x440430 | update_symbol_section | ~500 B | Update symbol's section index (with xindex overflow handling) |
0x442270 | get_section_header | ~300 B | Retrieve section record by index (with virtual mapping) |
0x440590 | get_sym_record | ~300 B | Retrieve symbol record by index |
0x440350 | get_section_of_sym | ~200 B | Get section index that a symbol belongs to |
0x432B10 | merge_overlapping_global | 11,683 B | Overlap-validate and merge data in .nv.global |
0x437E20 | merge_overlapping_local | 11,580 B | Overlap-validate and merge data in .nv.local.* |
0x4343C0 | merge_overlapping_constant | 11,838 B | Overlap-validate and merge data in .nv.constant* |
0x434BC0 | merge_overlapping_variant1 | 11,147 B | Overlap merge for per-entry constant data |
0x435390 | merge_overlapping_variant2 | 11,156 B | Overlap merge for per-entry data variant |
0x435B60 | merge_overlapping_host | 11,014 B | Overlap merge with .nv.host reference |
0x438640 | merge_constant_bank | 4,043 B | Merge data into constant bank (global or per-entry) |
0x4339A0 | constant_dedup | 13,199 B | Find duplicate 32/64-bit constants, alias symbols |
0x45E3C0 | section_type_classify | ~700 B | Reclassify input sh_type by name prefix, create/lookup output section |
0x45CF00 | merge_reloc_section | ~1,500 B | Merge relocation section (CUDA_RELOCINFO / RELOCINFO_EXT) |
0x437BB0 | merge_shared_entry | ~700 B | Per-entry shared memory merge (extended smem mode) |
0x4379A0 | merge_shared_local | ~800 B | Per-entry shared/local memory merge (standard mode) |
0x440740 | add_global_symbol | ~1,200 B | Create global symbol record, add to pending list or section |
0x45CD30 | create_entry_section | ~600 B | Create output section for a defined entry-point symbol |
0x449A80 | hash_lookup | ~500 B | Generic hash table lookup |
0x448E70 | hash_insert | ~500 B | Generic hash table insert |
Error Conditions
| Error string | Function | Condition |
|---|---|---|
"section not found" | sub_4325A0 | Section pointer is NULL when layout is attempted |
"should only reach here with no opt" | sub_4325A0 | Zero-alignment zero-size symbol in optimized mode |
"tail data node not found" | sub_433760 | Section has no tail pointer but has a non-empty list |
"adding function section after callgraph completed" | sub_441AC0 | Executable section created after DCE |
"secidx not virtual" | sub_442270 | Virtual section mapping inconsistency |
"reference to deleted symbol" | sub_440430 | Symbol was removed but relocation still targets it |
"offset %lld goes past section %d size" | sub_432B10 | Data contribution exceeds section bounds |
"overlapping non-identical data" | sub_432B10 | Two overlapping regions have different byte content |
"overlapping data spans too much" | sub_432B10 | Overlap exceeds the contribution size |
"entry data cannot be GLOBAL" | sub_438640 | Per-entry constant has GLOBAL binding |
"entry data should have offset" | sub_438640 | Per-entry constant missing explicit offset |
"bank SHT not CUDA_CONSTANT_?" | sub_438640 | Section type not in constant bank range |
"unexpected reloc section" | sub_45E7D0 | Relocation targets a section type not in valid range |
"section not mapped" | sub_45E7D0 | .nv.info references a section with no output mapping |
"duplicate param bank on weak entry %s" | sub_45E3C0 | Weak entry's param bank already exists (verbose only) |
"weak %s already processed" | sub_45E7D0 | Weak entry/section already merged from prior object (verbose only) |
"skip mercury section %i" | sub_45E7D0 | Mercury section skipped during merge (verbose only) |
"remove weak reloc for %s" | sub_45E7D0 | Weak relocation to constant section removed (verbose only) |
"unknown .nv.compat attribute (%x) encoutered." | sub_45E7D0 | Unrecognized compat attribute type (verbose warning) |
Confidence Assessment
| Claim | Confidence | Evidence |
|---|---|---|
sub_4411D0 at 0x4411D0 looks up section name at elfw+296 | HIGH | Decompiled: sub_449A80(*(_QWORD *)(a1 + 296), a2) returns *v2 or 0 |
sub_442270 at 0x442270 retrieves section by index from elfw+360 | HIGH | Decompiled file exists; virtual mapping check at ctx+472 confirmed in decompiled sub_440350 pattern |
sub_4325A0 at 0x4325A0 asserts "section not found" when a2 is NULL | HIGH | Decompiled line 26: if (!a2) sub_467460(&unk_2A5B990, "section not found", ...) |
sub_4325A0 sorts symbol list via sub_4647D0 with comparator sub_432440 | HIGH | Decompiled line 30: sub_4647D0(a2 + 72, sub_432440) |
sub_4325A0 uses natural alignment capped at 8 when explicit alignment is 0 | HIGH | Decompiled lines 63-68: v24 = 8; if (v23 <= 8) v24 = *(_QWORD *)(v10 + 24); |
sub_4325A0 asserts "should only reach here with no opt" via elfw+90 check | HIGH | Decompiled line 76-77: if (!*(_BYTE *)(a1 + 90)) then "should only reach here with no opt" string; string at addr 0x1d38758 in nvlink_strings.json |
sub_4325A0 emits "variable %s at offset %d" when verbose | HIGH | Decompiled line 55: fprintf(stderr, "variable %s at offset %d\n", ...) with (*(_BYTE *)(a1 + 64) & 2) gate; string at addr 0x1d38739 in nvlink_strings.json |
sub_4325A0 extended_smem_mode flag at elfw+100 | HIGH | Decompiled line 27: if (!*(_BYTE *)(a1 + 100) || ...) controls sort bypass |
sub_433760 at 0x433760 allocates 40-byte data node | HIGH | Decompiled: sub_4307C0(v10, 40) confirmed |
sub_433760 updates section size at section+32 | HIGH | Decompiled line 43: v9[4] = v16 + a5; (QWORD index 4 = offset 32) |
sub_433760 asserts "tail data node not found" | HIGH | Decompiled line 53: string literal confirmed in code and at addr 0x1d38839 |
sub_45E3C0 at 0x45E3C0 classifies NOBITS by name prefix | HIGH | Decompiled: memcmp(v18, ".nv.global", 0xA), memcmp(v18, ".nv.shared.", 0xB), memcmp(v18, ".nv.shared.reserved.", 0x14), memcmp(v18, ".nv.local.", 0xA) with hex types 1879048199/202/213/201 |
sub_45E3C0 constant bank: strtol(name+12, 0, 10) + 1879048292 | HIGH | Decompiled line 102: strtol(v18 + 12, 0, 10) + 1879048292 (1879048292 = 0x70000064) |
sub_45E3C0 PROGBITS .nv.global.init -> type 1879048200 (0x70000008) | HIGH | Decompiled line 93-94: memcmp(v18, ".nv.global.init", 0xF) then v13 = 1879048200 |
sub_45E3C0 SHT_NOTE (7) with flags & 0x1000000 early return | HIGH | Decompiled line 105: if (v13 == 7 && (v49 & 0x1000000) != 0) |
Section type filter bitmask 0x5D05 in sub_45E7D0 | HIGH | Decompiled: (0x5D05uLL >> ((unsigned __int8)v145 - 6)) & 1 at line 1601 |
NOBITS data suppression bitmask 0x400D in sub_45E3C0 | HIGH | Decompiled line 151: v28 &= ~(0x400DuLL >> ((unsigned __int8)v13 - 7)); |
sub_432B10 at 0x432B10 (merge_overlapping_global) 11,683 bytes | HIGH | Decompiled file sub_432B10_0x432b10.c exists at correct address |
sub_4343C0 at 0x4343C0 (merge_overlapping_constant) exists | HIGH | Decompiled file confirms function at address; 7 parameters matching wiki signature |
sub_438640 at 0x438640 (merge_constant_bank) exists | HIGH | Decompiled file confirms function at address with 10 parameters |
"overlapping non-identical data" string | HIGH | String at addr 0x1d387d8 in nvlink_strings.json |
"overlapping data spans too much" string | HIGH | String at addr 0x1d387f8 in nvlink_strings.json |
"offset %lld goes past section %d size" string | HIGH | String at addr 0x1d38780 in nvlink_strings.json |
"adding function section after callgraph completed" string | HIGH | String at addr 0x1d39ec8 in nvlink_strings.json |
"entry data cannot be GLOBAL" string | HIGH | String at addr 0x1d38a9f in nvlink_strings.json |
"bank SHT not CUDA_CONSTANT_?" string | HIGH | String at addr 0x1d38950 in nvlink_strings.json |
"unexpected reloc section" string | HIGH | String at addr 0x1d3bcd0 in nvlink_strings.json |
"local data should have offset" string | HIGH | String at addr 0x1d38ade in nvlink_strings.json |
"unknown .nv.compat attribute (%x) encoutered" string | HIGH | String at addr 0x1d3b1b8 in nvlink_strings.json (note: "encoutered" typo is in the original binary) |
__nv_reservedSMEM_tcgen05_partition string | HIGH | String at addr 0x1d3bd08 in nvlink_strings.json |
__nv_reservedSMEM_allocation_phase string | HIGH | String at addr 0x1d3bd30 in nvlink_strings.json |
| Section record 104 bytes with fields at documented offsets | MEDIUM | Offset reads at +32 (size), +48 (addralign), +72 (symbol_list) confirmed across sub_4325A0 and sub_433760; complete layout reconstructed from multiple functions |
| Data node 40 bytes with data_ptr at +0, alignment at +16, size at +24, sym_index at +32 | MEDIUM | Offsets confirmed in sub_433870 and sub_4325A0 decompiled code; +16 (alignment), +24 (data_size), +32 (sym_index) verified |
| Per-entry section lists at elfw+256/+264/+272/+280 | MEDIUM | Inferred from sub_438640 and sub_439830 parameter passing patterns; not all offsets individually verified |
| Five overlap-merge functions follow identical algorithm with different error messages | MEDIUM | Two confirmed (sub_432B10, sub_4343C0) share same structure; remaining three (sub_437E20, sub_434BC0, sub_435390) inferred from file size similarity and address range |
Cross-References
- Merge Phase -- the per-object merge loop that drives section merging
- Layout Phase -- post-merge address assignment using the section layout engine
- Symbol Resolution -- how global/weak/local symbols are resolved during merge
- Weak Symbols -- weak function selection policy (register count, PTX version)
- Data Layout Optimization -- constant deduplication and overlap optimization
- Dead Code Elimination -- callgraph-based section removal before layout