NVIDIA Debug Extensions

nvlink recognizes six proprietary debug sections that extend beyond the standard DWARF sections carried in CUDA device ELF objects. These sections provide SASS-level line information, register allocation debug data, register type annotations, embedded PTX source text, PTX-level debug information, and shared memory debug metadata. The linker processes these sections through a dedicated classification chain (sub_1CED7C0 at 0x1CED7C0), a section-to-offset dispatcher (sub_1CEDD50 at 0x1CEDD50), and three concatenation-based output writers (sub_181B050, sub_181B160, sub_181B270). During Mercury linking, each NVIDIA debug section acquires a .nv.merc. prefix and is dispatched through a parallel code path (sub_1CF1690 at 0x1CF1690).

For standard DWARF section processing, see DWARF Processing. For line table merging details, see Line Table Merging. For Mercury-specific debug sections, see Mercury Debug Sections.

Key Facts

| Property | Value |
| --- | --- |
| Section classifier | sub_1CED7C0 at 0x1CED7C0 (315 lines) |
| Section-to-offset mapper | sub_1CEDD50 at 0x1CEDD50 (148 lines) |
| .debug_frame writer | sub_181B050 at 0x181B050 (60 lines) |
| .nv_debug_info_reg_sass writer | sub_181B160 at 0x181B160 (60 lines) |
| .nv_debug_info_reg_type writer | sub_181B270 at 0x181B270 (60 lines) |
| Mercury section dispatcher | sub_1CF1690 at 0x1CF1690 (545 lines) |
| Section filter (skip predicate) | sub_1CECBB0 at 0x1CECBB0 (131 lines) |
| Debug prefix matcher | sub_1672F50 at 0x1672F50 (uses .nv_debug_ and .debug_ prefixes) |
| PTX-level debug info parser | sub_1D1D2F0 at 0x1D1D2F0 (handles .nv_debug_info_ptx) |
| Total NVIDIA-specific sections | 6 |

NVIDIA Debug Section Catalog

| Section Name | Description | Writer Function | Mercury Output Name |
| --- | --- | --- | --- |
| .nv_debug_line_sass | SASS-level line number mappings | (line table pipeline) | .nv.merc.nv_debug_line_sass |
| .nv_debug_info_reg_sass | Register allocation debug data | sub_181B160 | .nv.merc.nv_debug_info_reg_sass |
| .nv_debug_info_reg_type | Register type annotations | sub_181B270 | .nv.merc.nv_debug_info_reg_type |
| .nv_debug_ptx_txt | Embedded PTX source text | (prefix-matched, opaque passthrough) | .nv.merc.nv_debug_ptx_txt |
| .nv_debug_info_ptx | PTX-level debug information | (DWARF parser at sub_1D1D2F0) | -- |
| .nv_debug.shared | Shared memory debug metadata | (filter predicate only) | -- |

Section Descriptions

.nv_debug_line_sass

SASS-level line number mappings that associate machine instruction addresses with source locations. Unlike the standard .debug_line section which maps PTX-level source positions, this section records line information at the SASS (native GPU ISA) level. The line table pipeline (documented in Line Table Merging) produces this section when the builder index a3 > 0, using the same DWARF line program encoding as .debug_line but with SASS instruction addresses as the program counter values.

In the section-to-offset mapper (sub_1CEDD50), .nv_debug_line_sass maps to context offset +88, adjacent to .debug_line at +80. The Mercury dispatcher (sub_1CF1690) recognizes both the bare name and the .nv.merc.nv_debug_line_sass prefixed variant, assigning both to the same slot at +88.

.nv_debug_info_reg_sass

Register allocation debug information that records which hardware registers are assigned to which variables or temporaries at each program point. This section is emitted by the SASS-level code generator (ptxas) and carried through the linker as an opaque data blob. The linker concatenates per-CU fragments from multiple input objects via sub_181B160.

In the section-to-offset mapper, .nv_debug_info_reg_sass maps to context offset +96. The writer reads from a linked list at struct offset +408, accumulates total size from +424, and writes the concatenated result to the buffer at +416.

.nv_debug_info_reg_type

Register type debug information that annotates each register with its data type (integer, float, predicate, etc.) and bit width. This section complements .nv_debug_info_reg_sass by providing type classification rather than allocation location. The linker concatenates per-CU fragments via sub_181B270.

In the section-to-offset mapper, .nv_debug_info_reg_type maps to context offset +104. The writer reads from a linked list at struct offset +432, accumulates total size from +448, and writes the concatenated result to the buffer at +440.

.nv_debug_ptx_txt

Embedded PTX source text carried verbatim through the linker. This section contains the raw PTX assembly text of the compilation unit, enabling debuggers to display PTX source alongside SASS disassembly. The section classifier (sub_1CED7C0) uses the prefix-matching function sub_44E3A0 rather than exact strcmp/memcmp to recognize this section, testing whether the section name starts with .nv_debug_ptx_txt. This is the only NVIDIA debug section matched by prefix rather than exact name in the classifier.

The section is passed through as opaque data -- there is no dedicated writer or parser for the content. The Mercury variant .nv.merc.nv_debug_ptx_txt is similarly prefix-matched in sub_1CED0E0 during output emission.

.nv_debug_info_ptx

PTX-level debug information encoded in a DWARF-like compilation unit format. Unlike the opaque passthrough sections, this section is actively parsed by the DWARF subsystem at sub_1D1D2F0. The parser processes .nv_debug_info_ptx through the same compilation unit loop as .debug_info, reading DWARF headers (length, version, abbreviation offset, pointer size) and dispatching to sub_1D1BE80 for attribute processing.

At sub_1D1D2F0 line 348--362, the parser checks:

if (memcmp(section_name, ".debug_info", 12) == 0)
    goto process_cu;
if (memcmp(section_name, ".nv_debug_info_ptx", 19) == 0)
    goto process_cu;

This means .nv_debug_info_ptx uses the same DWARF abbreviation/attribute/form encoding as standard .debug_info, but contains PTX-level scope, variable, and type information rather than source-level debug data.

This section has a single xref at 0x1D1D6B3 in sub_1D1D2F0 and does not appear in the Mercury namespace -- it is consumed during linking and its information is folded into the standard debug sections.

.nv_debug.shared

Shared memory debug metadata, used during section filtering to exclude shared-memory sections from certain link phases. The section filter predicate sub_1CECBB0 at 0x1CECBB0 checks for this name (via exact strcmp at line 72) and returns 0 (skip) when encountered. This prevents shared memory debug sections from being treated as relocatable content during the merge phase.

The string .nv_debug.shared is referenced from three functions: sub_4377B0 (at 0x437946), sub_437BB0 (at 0x437D76), and sub_1CECBB0 (at 0x1CECC3E). The first two are in the ELF section classifier subsystem, while the third is the section filter.

Section Content Format

This section documents the on-disk byte layout, producer, and consumer for each of the six NVIDIA debug sections. The information is derived from the decompiled classifier (sub_1CED7C0), the section-to-offset mapper (sub_1CEDD50), the three concatenation writers (sub_181B050 / sub_181B160 / sub_181B270), and the DWARF CU parser (sub_1D1D2F0). All producer information refers to the upstream toolchain (ptxas); nvlink is always the consumer.

Note on scope: the six sections catalogued below are the complete set of NVIDIA-proprietary debug sections that nvlink v13.0.88 recognises by name. There are no .debug_cuda_line, .debug_cuda_ranges, or .nv.debug.cuda_version sections in this binary — those names do not appear in nvlink_strings.json and are not referenced by any decompiled function. The .nv.uft family (at strings 0x12422, 0x12449, 0x12595) is the Unified Function Table mechanism for cross-module function pointer resolution, not a debug section, and is documented separately in UFT: Unified Function Table.

.nv_debug_line_sass

| Property | Value |
| --- | --- |
| Section name | .nv_debug_line_sass |
| Purpose | SASS-level line number program mapping native GPU ISA PCs to source locations |
| Section type | SHT_PROGBITS (1) |
| Binary layout | DWARF-2/3 .debug_line line-number program: header (unit_length, version, header_length, minimum_instruction_length, default_is_stmt, line_base, line_range, opcode_base, standard_opcode_lengths, include_directories, file_names) followed by opcode stream. Uses NVIDIA-extended opcodes documented in Line Table Merging; the address register holds a SASS PC instead of a host address |
| Record granularity | Variable-length opcodes: DW_LNS_* (standard, 1-byte opcode + LEB128 operands), DW_LNE_* (extended, 0 + LEB128 length + opcode byte + operands), NVIDIA-specific extended opcodes for function index, inline stack, and correlation ID |
| Producer | ptxas debug info emitter; emitted only when PTX was compiled with --generate-line-info and a SASS-level builder index a3 > 0 is active during merge |
| Consumer | nvlink line-table merge pipeline: recognised by classifier sub_1CED7C0 (strcmp at decompiled line 296), slot-mapped by sub_1CEDD50 to context offset +88 (decompiled line 78, memcmp(..., ".nv_debug_line_sass", 0x14u)), then merged by the same DWARF line-program merger used for .debug_line |
| Concatenation writer | None; line tables are merged (re-encoded), not concatenated. See Line Table Merging for the merger function addresses |
| Mercury variant | .nv.merc.nv_debug_line_sass at string 0x2458431, dispatched to the same slot +88 by sub_1CF1690 at decompiled line 490 |
| DWARF relationship | Parallel to .debug_line: identical opcode encoding but distinct address space (SASS PCs instead of source PCs). Both occupy adjacent slots (+80 vs +88) in the context structure |
| String table | .nv.merc.nv_debug_line_sass at 0x2458431; the bare name is a suffix of this string, matched by memcmp(..., 20) in the section-to-offset mapper (length 20 = strlen + NUL) |
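
The fixed part of the header listed under "Binary layout" can be decoded with a few lines of Python. This is a minimal sketch, assuming a little-endian, 32-bit DWARF-2/3 line table; the function name `parse_line_header` is hypothetical, and the sketch stops before include_directories, file_names, and the NVIDIA-specific extended opcodes, none of which it attempts to decode.

```python
import struct

def parse_line_header(data: bytes) -> dict:
    """Decode the fixed fields of a 32-bit DWARF-2/3 line-program header."""
    # unit_length (u32), version (u16), header_length (u32): 10 bytes total.
    unit_length, version, header_length = struct.unpack_from("<IHI", data, 0)
    if version not in (2, 3):
        raise ValueError(f"unsupported line table version {version}")
    # minimum_instruction_length, default_is_stmt, line_base (signed),
    # line_range, opcode_base: one byte each, starting at offset 10.
    (min_inst_len, default_is_stmt, line_base,
     line_range, opcode_base) = struct.unpack_from("<BBbBB", data, 10)
    # One operand-count byte per standard opcode 1 .. opcode_base - 1.
    std_lengths = list(data[15:15 + opcode_base - 1])
    return {
        "unit_length": unit_length,
        "version": version,
        "header_length": header_length,
        "minimum_instruction_length": min_inst_len,
        "default_is_stmt": bool(default_is_stmt),
        "line_base": line_base,
        "line_range": line_range,
        "opcode_base": opcode_base,
        "standard_opcode_lengths": std_lengths,
        # header_length counts the bytes after its own field, so the
        # opcode stream starts 10 + header_length bytes into the unit.
        "program_offset": 10 + header_length,
    }
```

For a SASS line table the address register driven by the opcode stream at `program_offset` would hold SASS PCs, as described in the table above.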

.nv_debug_info_reg_sass

| Property | Value |
| --- | --- |
| Section name | .nv_debug_info_reg_sass |
| Purpose | Register allocation map: records which SASS hardware register holds which PTX virtual register or source variable at each program point |
| Section type | SHT_PROGBITS (1) |
| Binary layout | Opaque concatenation-friendly byte stream. nvlink does not interpret the contents; it only concatenates per-CU fragments byte-for-byte. ptxas-internal format, typically a sequence of (reg_id, pc_begin, pc_end, location_expr) records, but nvlink never parses individual records |
| Record granularity | Not known to nvlink: the entire per-CU blob is treated as a flat byte array with a single 32-bit length field at offset +8 of the fragment's inner data record |
| Producer | ptxas register allocator after final assignment pass; one blob per compilation unit carried through the cubin into nvlink |
| Consumer | nvlink concatenation writer sub_181B160 at 0x181B160. Recognised by classifier sub_1CED7C0 via strcmp at decompiled line 246 (if (!strcmp(v18, ".nv_debug_info_reg_sass")) return 1;). Slot-mapped by sub_1CEDD50 to context offset +96 via strcmp at decompiled line 134 |
| Concatenation writer | sub_181B160 at 0x181B160 (60 lines). Reads linked-list head at context offset +408, total size at +424, writes output buffer pointer to +416, emits section name literal ".nv_debug_info_reg_sass" to sub_434BC0 (decompiled line 58) |
| Mercury variant | .nv.merc.nv_debug_info_reg_sass at 0x2458450, dispatched to slot +96 by sub_1CF1690 at decompiled line 441 |
| DWARF relationship | Orthogonal to standard DWARF: no DW_AT attribute or DIE references into this section. Consumers (debuggers) cross-correlate via SASS PC ranges that also appear in .debug_info's DW_AT_low_pc / DW_AT_high_pc attributes |
| String table | .nv_debug_info_reg_sass at 0x241282C |

.nv_debug_info_reg_type

| Property | Value |
| --- | --- |
| Section name | .nv_debug_info_reg_type |
| Purpose | Register type annotations: records the data type (integer, float, predicate) and bit width (8/16/32/64/128) for each SASS hardware register at each program point |
| Section type | SHT_PROGBITS (1) |
| Binary layout | Opaque concatenation-friendly byte stream, same treatment as .nv_debug_info_reg_sass. ptxas-internal record format, likely (reg_id, type_code, bit_width, pc_begin, pc_end) tuples, but nvlink does not parse individual records |
| Record granularity | Not known to nvlink; treated as a flat byte array per CU |
| Producer | ptxas register allocator; emitted in parallel with .nv_debug_info_reg_sass, with matching record counts |
| Consumer | nvlink concatenation writer sub_181B270 at 0x181B270. Recognised by classifier sub_1CED7C0 via strcmp at decompiled line 254. Slot-mapped by sub_1CEDD50 to context offset +104 via strcmp at decompiled line 145 |
| Concatenation writer | sub_181B270 at 0x181B270 (60 lines, byte-for-byte structural clone of sub_181B160). Reads linked-list head at context offset +432, total size at +448, writes output buffer pointer to +440, emits section name literal ".nv_debug_info_reg_type" to sub_434BC0 (decompiled line 58) |
| Mercury variant | .nv.merc.nv_debug_info_reg_type at 0x2458470, dispatched to slot +104 by sub_1CF1690 at decompiled line 420 |
| DWARF relationship | Orthogonal to standard DWARF. Paired with .nv_debug_info_reg_sass: one provides the where (which hardware register), the other the what (type signature) |
| String table | .nv_debug_info_reg_type at 0x2412844 |

.nv_debug_ptx_txt

| Property | Value |
| --- | --- |
| Section name | .nv_debug_ptx_txt (prefix-matched; accepts any section whose name begins with this string) |
| Purpose | Embedded PTX source text carried verbatim, enabling debuggers to display PTX assembly alongside SASS disassembly |
| Section type | SHT_PROGBITS (1) |
| Binary layout | Raw, uncompressed 7-bit ASCII PTX source code with Unix-style (LF) line endings. No framing, no length prefix, no NUL terminator: the section size equals the PTX text length in bytes |
| Record granularity | Byte stream (not record-oriented). Each .nv_debug_ptx_txt* section contains exactly one compilation unit's PTX text |
| Producer | ptxas front-end; copies the PTX input stream into the output cubin as an opaque blob before parsing |
| Consumer | nvlink classifier sub_1CED7C0 recognises this section via sub_44E3A0(".nv_debug_ptx_txt", section_name) at decompiled line 279, the only prefix match in the entire classifier. This means section names like .nv_debug_ptx_txt.main, .nv_debug_ptx_txt_v2, or .nv_debug_ptx_txt.CU42 all classify as debug sections |
| Concatenation writer | None. Passed through as opaque data by the generic section emitter: there is no dedicated slot in the context structure, no entry in the section-to-offset mapper (sub_1CEDD50), and no NVIDIA-specific merge logic |
| Mercury variant | .nv.merc.nv_debug_ptx_txt at 0x2458403, recognised as a prefix match by sub_1CED0E0 during Mercury output emission |
| DWARF relationship | Parallel to .debug_str: both carry pure text data that debuggers reference by offset. Unlike .debug_str, .nv_debug_ptx_txt is not pointed to by any DWARF form value; debuggers locate it by section name |
| String table | Literal .nv_debug_ptx_txt is embedded as a suffix inside .nv.merc.nv_debug_ptx_txt at 0x2458403; the classifier references the string through an offset computation rather than a standalone entry |
| Prefix rationale | The prefix match allows ptxas to emit multiple .nv_debug_ptx_txt.<suffix> variants (e.g., per-entry-function PTX) without nvlink requiring an update. All variants share the same passthrough behaviour |

.nv_debug_info_ptx

| Property | Value |
| --- | --- |
| Section name | .nv_debug_info_ptx |
| Purpose | PTX-level debug information encoded as a DWARF compilation unit. Complements .debug_info (which targets source-level debugging) with PTX-level scopes, virtual registers, and types |
| Section type | SHT_PROGBITS (1) |
| Binary layout | DWARF-2/3 compilation unit header + DIE tree: 4-byte unit_length, 2-byte version, 4-byte debug_abbrev_offset (indexes into .debug_abbrev), 1-byte address_size, followed by a tree of Debugging Information Entries encoded via the referenced abbreviation table. Identical encoding to .debug_info; only the semantic content (PTX-scoped rather than source-scoped) differs |
| Record granularity | DWARF compilation unit header (11 bytes for DWARF-3) followed by a DIE tree terminated by a null DIE |
| Producer | ptxas debug info emitter; emitted only when the PTX input included .file/.loc directives and debug mode is enabled. One CU per input PTX compilation unit |
| Consumer | nvlink DWARF CU parser sub_1D1D2F0 at 0x1D1D2F0. The parser dispatches on section name at decompiled line 348 (memcmp(a4, ".debug_info", 0xCu) == 0) and line 352--361 (12-byte unrolled comparison against ".nv_debug_info_ptx", length 19). Both branches fall into LABEL_63 which invokes sub_1D1BE80 for DIE processing. This is the only NVIDIA debug section that nvlink actively parses (rather than concatenates or passes through) |
| CU header processing | sub_1D1D2F0 reads the 4-byte unit length at offset +192/+200, the 2-byte version at +204, the 4-byte abbreviation offset at +212, the 1-byte pointer size at +208, and stores the CU base pointer at context +168. The abbreviation offset is matched against a cached abbreviation table list at context +128 (each entry is 32 bytes) to locate the appropriate abbreviation decoder |
| Concatenation writer | None; parsed content is folded into the output's standard debug sections during DIE emission |
| Mercury variant | None; this section is consumed during linking. No .nv.merc.nv_debug_info_ptx string exists in nvlink_strings.json |
| DWARF relationship | Parasitic on .debug_abbrev: its debug_abbrev_offset field indexes into the same .debug_abbrev section used by .debug_info. ptxas emits abbreviations that serve both CU types. Missing .debug_abbrev causes the linker to skip parsing (diagnostic at string 0x292878: "skipping .debug_info section due to missing .debug_abbrev section") |
| String table | .nv_debug_info_ptx at 0x245E6D4, single xref at 0x1D1D6B3 inside sub_1D1D2F0 |
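
The 11-byte compilation unit header from the "Binary layout" row can be walked as follows. This is an illustrative sketch, assuming little-endian, 32-bit DWARF-2/3 units; `parse_cu_header` and `iter_cus` are hypothetical names and do not correspond to anything inside sub_1D1D2F0. DIE decoding through the abbreviation table is left out.

```python
import struct

# unit_length (u32), version (u16), debug_abbrev_offset (u32), address_size (u8)
CU_HEADER = struct.Struct("<IHIB")

def parse_cu_header(section: bytes, offset: int = 0) -> dict:
    """Decode one CU header located at `offset` inside the section bytes."""
    unit_length, version, abbrev_offset, address_size = \
        CU_HEADER.unpack_from(section, offset)
    return {
        "unit_length": unit_length,
        "version": version,
        "abbrev_offset": abbrev_offset,
        "address_size": address_size,
        "die_offset": offset + CU_HEADER.size,   # first DIE follows the header
        # unit_length excludes its own 4 bytes, so the next CU starts here.
        "next_cu": offset + 4 + unit_length,
    }

def iter_cus(section: bytes):
    """Yield every CU header in a .debug_info-style section."""
    offset = 0
    while offset + CU_HEADER.size <= len(section):
        cu = parse_cu_header(section, offset)
        yield cu
        offset = cu["next_cu"]
```

Because .nv_debug_info_ptx shares the .debug_info encoding, the same walker would apply to either section's bytes.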

.nv_debug.shared

| Property | Value |
| --- | --- |
| Section name | .nv_debug.shared |
| Purpose | Shared memory debug metadata: marker section that informs the filter/classifier subsystem about shared-memory debug layout. Carries no linkable content from nvlink's perspective |
| Section type | SHT_NOBITS (8), per the v4 == 8 gate in sub_1CECBB0 at decompiled line 35 |
| Binary layout | Not parsed by nvlink. The section is recognised by name but always excluded from the output. Its content is presumed to be a ptxas-internal record of per-entry-function shared memory layout (offsets, sizes, variable names), intended for debuggers that read the input cubin directly rather than the nvlink output |
| Record granularity | Not known to nvlink (excluded before content inspection) |
| Producer | ptxas shared-memory layout pass; emitted for entry functions that declare .shared variables when debug info is enabled |
| Consumer | Exclusion, not consumption. sub_1CECBB0 at decompiled line 72 explicitly matches strcmp(name, ".nv_debug.shared") == 0 and returns 0 (skip this section). The section is therefore stripped from the output unconditionally |
| Concatenation writer | None; the section never reaches the writer phase |
| Mercury variant | None; no .nv.merc.nv_debug.shared string exists |
| DWARF relationship | Orthogonal. No DWARF form value references this section. It is a pure metadata marker that is filtered out during link |
| String table | .nv_debug.shared at 0x1D38995, with three xrefs: sub_4377B0 (0x437946), sub_437BB0 (0x437D76), sub_1CECBB0 (0x1CECC3E) |
| Filtering evidence | Decompiled sub_1CECBB0 line 72: if (!strcmp((const char *)sub_448590(v7, (unsigned int *)v6), ".nv_debug.shared")) return 0;. The return 0 path bypasses both the debug-section path (line 114--124) and the generic content path |

Summary Matrix

| Section | Purpose | Layout | Producer (ptxas role) | Consumer in nvlink | Writer addr |
| --- | --- | --- | --- | --- | --- |
| .nv_debug_line_sass | SASS line program | DWARF line-number opcodes | debug info emitter | line table merger | (merged, not written) |
| .nv_debug_info_reg_sass | Register allocation | Opaque per-CU blob | register allocator | sub_181B160 | 0x181B160 |
| .nv_debug_info_reg_type | Register type annotations | Opaque per-CU blob | register allocator | sub_181B270 | 0x181B270 |
| .nv_debug_ptx_txt | Embedded PTX text | Raw ASCII | front-end passthrough | generic emitter (opaque) | (passthrough) |
| .nv_debug_info_ptx | PTX-level DWARF DIE tree | DWARF-3 CU + DIEs | debug info emitter | sub_1D1D2F0 (DWARF parser) | (parsed into .debug_info) |
| .nv_debug.shared | Shared memory metadata | Not parsed | shared-mem layout pass | sub_1CECBB0 (excluded) | (stripped) |

Writer Context-Offset Layout

The three active concatenation writers (sub_181B050 for .debug_frame, plus the two NVIDIA-specific writers) share a common 24-byte stride in the context structure. Each writer slot contains three fields:

slot_base +  0:  QWORD  list_head       // linked list of fragment nodes
slot_base +  8:  QWORD  output_buffer   // allocated during writer run
slot_base + 16:  DWORD  total_size      // sum of fragment sizes
slot_base + 20:  DWORD  (padding or reserved)

Concrete slot bases in the linker context:

| Writer | List head | Output buffer | Total size | Delta from prev |
| --- | --- | --- | --- | --- |
| sub_181B050 (.debug_frame) | +384 | +392 | +400 | -- |
| sub_181B160 (.nv_debug_info_reg_sass) | +408 | +416 | +424 | +24 |
| sub_181B270 (.nv_debug_info_reg_type) | +432 | +440 | +448 | +24 |

This uniform stride is consistent with the writer slots being three consecutive struct { list_head *head; void *buffer; uint32_t size; } members (24 bytes each once alignment padding is included) rather than ad hoc allocations, and it suggests that a header shared by ptxas and nvlink defines these debug buffers as a single aggregate type.
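
The slot arithmetic can be checked with a small ctypes model. This is a sketch under stated assumptions: `WriterSlot`, `FIRST_SLOT_BASE`, and `slot_offsets` are hypothetical names, and the pointer fields are modelled as fixed 64-bit integers so the result does not depend on the host platform.

```python
import ctypes

class WriterSlot(ctypes.Structure):
    """Model of one 24-byte writer slot (field names are illustrative)."""
    _fields_ = [
        ("list_head", ctypes.c_uint64),      # +0: linked list of fragment nodes
        ("output_buffer", ctypes.c_uint64),  # +8: allocated during writer run
        ("total_size", ctypes.c_uint32),     # +16: sum of fragment sizes
        ("reserved", ctypes.c_uint32),       # +20: padding or reserved
    ]

FIRST_SLOT_BASE = 384  # list head of the .debug_frame writer slot

def slot_offsets(index: int) -> dict:
    """Return the context offsets of slot `index` (0 = .debug_frame)."""
    base = FIRST_SLOT_BASE + index * ctypes.sizeof(WriterSlot)
    return {
        "list_head": base + WriterSlot.list_head.offset,
        "output_buffer": base + WriterSlot.output_buffer.offset,
        "total_size": base + WriterSlot.total_size.offset,
    }
```

Calling `slot_offsets(1)` and `slot_offsets(2)` reproduces the +408/+416/+424 and +432/+440/+448 rows of the table above.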

Section-to-Offset Mapper Offsets

| Section | Context slot | Lookup order | Lookup method |
| --- | --- | --- | --- |
| .debug_line | +80 | 1 | memcmp(..., 0xC) (12 bytes) |
| .debug_frame | +72 | 2 | memcmp(..., 0xD) (13 bytes) |
| .nv_debug_line_sass | +88 | 3 | memcmp(..., 0x14) (20 bytes) |
| .debug_info | +112 | 4 | memcmp(..., 0xC) (12 bytes) |
| .debug_loc | +120 | 5 | memcmp(..., 0xB) (11 bytes) |
| .nv_debug_info_reg_sass | +96 | 6 | strcmp |
| .nv_debug_info_reg_type | +104 | 7 | strcmp |

Sections not in the table above fall through to sub_464DB0(*(QWORD **)(a1 + 8), a3), a generic hash-map lookup on the linker context's section map. The memcmp-checked names are tested first and the strcmp-checked names last.
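
Each memcmp length in the table above is the name's length plus one, so the comparison covers the terminating NUL and accepts only the exact name. The sketch below mimics that behaviour on Python byte strings; `memcmp_equal` is a hypothetical helper written for illustration, not a function found in the binary.

```python
def memcmp_equal(name: str, literal: str, n: int) -> bool:
    """Mimic `memcmp(name, literal, n) == 0` on NUL-terminated C strings."""
    if n > len(literal) + 1:
        raise ValueError("n would read past the literal's terminating NUL")
    a = name.encode("ascii") + b"\0"
    b = literal.encode("ascii") + b"\0"
    # Pad the shorter name with NULs so both slices have n bytes; the
    # first mismatching byte decides the result, as it does in C.
    return a.ljust(n, b"\0")[:n] == b[:n]

# Names and byte counts copied from the lookup table.
MEMCMP_CHECKS = [
    (".debug_line", 0xC),
    (".debug_frame", 0xD),
    (".nv_debug_line_sass", 0x14),
    (".debug_info", 0xC),
    (".debug_loc", 0xB),
]
```

With n = 12, `.debug_info` matches and `.debug_info.dwo` does not; dropping the NUL (n = 11) would turn the same check into a prefix test.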

Section Classifier: sub_1CED7C0

The section name classifier is a chain of memcmp/strcmp calls that identifies whether a given section is a recognized debug section. It accepts a linker context pointer (a1) and a section header record pointer (a2), resolves the section name via sub_448590, and returns 1 if the section is any recognized debug section, or 0 otherwise.

Recognition Order

The classifier checks section names in this fixed order:

  1. .debug_abbrev (memcmp, 14 bytes)
  2. .debug_aranges (memcmp, 15 bytes)
  3. .debug_frame (memcmp, 13 bytes)
  4. .debug_info (memcmp, 12 bytes)
  5. .debug_loc (memcmp, 11 bytes)
  6. .debug_macinfo (memcmp, 15 bytes)
  7. .debug_pubnames (memcmp, 16 bytes)
  8. .debug_pubtypes (strcmp)
  9. .debug_ranges (strcmp)
  10. .debug_str (strcmp)
  11. .nv_debug_info_reg_sass (strcmp)
  12. .nv_debug_info_reg_type (strcmp)
  13. .nv_debug_ptx_txt (prefix match via sub_44E3A0)
  14. .debug_line (strcmp)
  15. .nv_debug_line_sass (strcmp)

The first seven checks use memcmp with an explicit length. Each length equals the name's length plus one, so the comparison includes the terminating NUL and behaves as an exact match: a name with extra trailing characters, such as .debug_info.dwo, does not match the .debug_info check. The remaining checks use strcmp, which is also an exact match, except for .nv_debug_ptx_txt, which uses the prefix-matching helper.

Section Type Preprocessing

Before each name comparison, the function inspects the section header's sh_type field (a2[1]). The magic constants 1879048198 through 1879048292 (0x70000006 through 0x70000064) correspond to NVIDIA-specific ELF section types in the processor-specific range that starts at SHT_LOPROC (0x70000000). A bitmask 0x5D05 is used to quickly classify section types into "possibly debug" or "definitely not debug" categories, avoiding expensive string comparisons for clearly non-debug sections.
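
The decimal constants are easier to read as offsets from SHT_LOPROC (0x70000000 in the ELF specification). The helpers below are a hypothetical sketch for inspecting them. How the binary indexes the 0x5D05 mask (which value is subtracted before the shift) is not stated here, so the mapping from bit positions to section types is deliberately left open.

```python
SHT_LOPROC = 0x70000000  # start of the processor-specific sh_type range

def loproc_offset(sh_type: int) -> int:
    """Distance of a section type from SHT_LOPROC."""
    return sh_type - SHT_LOPROC

def set_bits(mask: int) -> list:
    """List the bit positions that are set in a classification mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]

# Constants quoted in the text, shown in hex for comparison.
LOW, HIGH = 1879048198, 1879048292
RANGE_HEX = (hex(LOW), hex(HIGH))
```

`set_bits(0x5D05)` yields the seven bit positions the pre-check treats as "possibly debug", whatever base the binary uses for bit 0.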

Section-to-Offset Mapper: sub_1CEDD50

The section-to-offset mapper accepts a context object (a1), a section header record (a2), and a section index (a3), and returns the pointer stored at the context slot corresponding to the given section name. This function serves as the lookup mechanism that converts a section name into the accumulated data pointer for that section's concatenation buffer.

Offset Map

| Section Name | Context Offset | Content |
| --- | --- | --- |
| .debug_line | +80 | Standard DWARF line program data |
| .debug_frame | +72 | Standard DWARF frame unwind data |
| .nv_debug_line_sass | +88 | SASS line number data |
| .nv_debug_info_reg_sass | +96 | Register allocation data |
| .nv_debug_info_reg_type | +104 | Register type data |
| .debug_info | +112 | Standard DWARF compilation unit data |
| .debug_loc | +120 | Standard DWARF location list data |

For any section name not in this table, the function falls through to sub_464DB0 which performs a generic hash-map lookup on the linker context's section map at a1 + 8.
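
Functionally, the mapper behaves like a fixed table with a fallback. The sketch below models that behaviour only: `SLOT_OFFSETS` and `lookup_slot` are hypothetical names, a Python dict stands in for the comparison chain (so the lookup order is not modelled), and a plain dict stands in for the hash map consulted by sub_464DB0.

```python
# Context-slot offsets copied from the offset map.
SLOT_OFFSETS = {
    ".debug_line": 80,
    ".debug_frame": 72,
    ".nv_debug_line_sass": 88,
    ".nv_debug_info_reg_sass": 96,
    ".nv_debug_info_reg_type": 104,
    ".debug_info": 112,
    ".debug_loc": 120,
}

def lookup_slot(name: str, section_map: dict):
    """Return where the data for `name` would be found.

    Known debug sections resolve to a fixed context offset; anything
    else falls back to the generic section map.
    """
    offset = SLOT_OFFSETS.get(name)
    if offset is not None:
        return ("context_slot", offset)
    return ("section_map", section_map.get(name))
```

For example, `lookup_slot(".nv_debug_line_sass", {})` resolves to the fixed slot at +88, while an unknown name is answered from the section map.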

Lookup Order

The mapper checks names in this order: .debug_line, .debug_frame, .nv_debug_line_sass, .debug_info, .debug_loc, .nv_debug_info_reg_sass, .nv_debug_info_reg_type. The ordering differs from the classifier's, presumably because this function favors the sections that are looked up most often during concatenation.

Concatenation Writers

Three functions with identical structure handle the final emission of concatenated debug sections into the output ELF. Each function walks a linked list of per-CU data fragments collected during the input parsing phase, allocates a single contiguous buffer, copies all fragments into it via memcpy, frees the fragment nodes, and registers the result as an output ELF section via sub_434BC0/sub_434290.

Common Algorithm

writer(context, elf_writer):
    list_head = context[linked_list_offset]
    total_size = context[size_offset]
    
    // Flatten the linked list
    flat_list = flatten(list_head)          // sub_4649E0
    
    // Allocate output buffer
    alloc_ctx = get_allocator(list_head, elf_writer)  // sub_44F410
    buffer = allocate(alloc_ctx, total_size)           // sub_4307C0
    if (!buffer) fatal_error(alloc_ctx, total_size)    // sub_45CAC0
    context[buffer_offset] = buffer
    
    // Concatenate fragments (each flat_list entry is an inner data record)
    write_pos = 0
    for rec in flat_list:
        data = rec->data     // +0: pointer to fragment bytes
        size = rec->size     // +8: fragment byte count (uint32)
        memcpy(buffer + write_pos, data, size)
        write_pos += size
        free(rec->data)      // sub_431000
        free(rec)            // sub_431000
    free_list(flat_list)     // sub_464520
    
    // Register output section
    section_id = create_section(elf_writer, section_name, 0, 1, 0)
    emit_section(elf_writer, section_id, buffer, 1, total_size)

Writer Instance Table

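| Function | Address | Section Name | List Offset | Size Offset | Buffer Offset |
| --- | --- | --- | --- | --- | --- |
| sub_181B050 | 0x181B050 | .debug_frame | +384 | +400 | +392 |
| sub_181B160 | 0x181B160 | .nv_debug_info_reg_sass | +408 | +424 | +416 |
| sub_181B270 | 0x181B270 | .nv_debug_info_reg_type | +432 | +448 | +440 |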

The three functions are byte-for-byte identical in structure, differing only in the struct field offsets and the section name string passed to sub_434BC0. The stride between consecutive instances is exactly 24 bytes in the context structure (+384 to +408 to +432 for the list head; +400 to +424 to +448 for the size; +392 to +416 to +440 for the buffer).

Fragment Node Layout

Each node in the linked list has this structure:

| Offset | Size | Field |
| --- | --- | --- |
| +0 | 8 | next pointer (NULL for tail) |
| +8 | 8 | Pointer to inner data record |

The inner data record at the pointer from +8:

| Offset | Size | Field |
| --- | --- | --- |
| +0 | 8 | Pointer to raw section bytes |
| +8 | 4 | Byte count of this fragment |

After concatenation, both the inner data record's byte pointer and the inner record itself are freed via sub_431000. The flattened list returned by sub_4649E0 is freed via sub_464520.
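
The node and record layout above, combined with the common algorithm, can be modelled as follows. This is a behavioural sketch only: the class and function names are hypothetical, Python objects replace raw pointers, the list is assumed to be walked head to tail, and the final size check is an addition of the sketch, not something observed in the writers.

```python
class Fragment:
    """Inner data record: +0 pointer to bytes, +8 uint32 byte count."""
    def __init__(self, data: bytes):
        self.data = data
        self.size = len(data)

class Node:
    """List node: +0 next pointer, +8 pointer to the inner data record."""
    def __init__(self, record: Fragment, next_node=None):
        self.next = next_node
        self.record = record

def flatten(head):
    """Collect the records of a linked list in traversal order."""
    records = []
    while head is not None:
        records.append(head.record)
        head = head.next
    return records

def concatenate(head, total_size: int) -> bytes:
    """Copy every fragment into one contiguous buffer of total_size bytes."""
    buffer = bytearray(total_size)
    write_pos = 0
    for rec in flatten(head):
        buffer[write_pos:write_pos + rec.size] = rec.data[:rec.size]
        write_pos += rec.size
    if write_pos != total_size:  # sanity check added for the sketch
        raise ValueError("fragment sizes do not add up to total_size")
    return bytes(buffer)
```

The allocation, freeing, and ELF registration steps (sub_4307C0, sub_431000, sub_434BC0/sub_434290) have no counterpart here because Python manages memory itself.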

Mercury Section Dispatcher: sub_1CF1690

The Mercury section dispatcher (sub_1CF1690 at 0x1CF1690, 545 lines) handles section recognition and slot assignment for debug sections in Mercury-format ELF objects. It recognizes both bare section names (e.g., .debug_frame) and Mercury-prefixed names (e.g., .nv.merc.debug_frame), dispatching each to the same context slot. A flag byte at context offset +432 controls whether Mercury-prefixed sections are accepted.

Dispatch Table

For each section, the dispatcher first checks the bare name. If the bare name does not match but the section has the Mercury attribute flag (byte +11 bit 4 set in the section header), the dispatcher tries the .nv.merc. prefixed name.

| Bare Name | Mercury Name | Context Slot |
| --- | --- | --- |
| .debug_frame | .nv.merc.debug_frame | +72 |
| .debug_line | .nv.merc.debug_line | +80 |
| .nv_debug_line_sass | .nv.merc.nv_debug_line_sass | +88 |
| .nv_debug_info_reg_sass | .nv.merc.nv_debug_info_reg_sass | +96 |
| .nv_debug_info_reg_type | .nv.merc.nv_debug_info_reg_type | +104 |
| .debug_info | .nv.merc.debug_info | +112 |
| .debug_loc | .nv.merc.debug_loc | +120 |

When a section matches, the dispatcher stores the per-CU data record pointer (v17, a 64-byte zero-initialized record allocated via sub_4307C0) into the context slot. The function returns 0 on successful dispatch and 2 when the section was already known (duplicate).

Mercury Acceptance Gate

The flag at context offset +432 acts as a gate for Mercury-prefixed sections. When this byte is zero, the dispatcher skips the Mercury-prefixed name checks and only recognizes bare names. When non-zero, both bare and prefixed names are accepted. This allows the linker to selectively enable Mercury debug section handling based on the link mode (standard CUDA vs. Mercury/capmerc).
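
The interaction between the bare-name check, the per-section Mercury attribute, and the acceptance gate can be sketched as below. All names are hypothetical, and requiring both the gate byte and the section flag before trying the prefixed name is this sketch's reading of the description, not a confirmed detail of sub_1CF1690.

```python
MERC_PREFIX = ".nv.merc."

# Bare names and context slots copied from the dispatch table.
SLOTS = {
    ".debug_frame": 72,
    ".debug_line": 80,
    ".nv_debug_line_sass": 88,
    ".nv_debug_info_reg_sass": 96,
    ".nv_debug_info_reg_type": 104,
    ".debug_info": 112,
    ".debug_loc": 120,
}

def mercury_name(bare: str) -> str:
    """Build the Mercury name: the leading dot is replaced by the prefix."""
    return MERC_PREFIX + bare[1:]

def dispatch_slot(name: str, gate_enabled: bool, has_merc_flag: bool):
    """Return the context slot for `name`, or None if it is not dispatched."""
    if name in SLOTS:                      # bare names are always tried first
        return SLOTS[name]
    if gate_enabled and has_merc_flag:     # prefixed names only behind the gate
        for bare, slot in SLOTS.items():
            if name == mercury_name(bare):
                return slot
    return None
```

With the gate byte cleared, `.nv.merc.debug_line` is not dispatched, while the bare `.debug_line` still resolves to slot +80.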

Section Filter Predicate: sub_1CECBB0

The section filter at sub_1CECBB0 determines whether a section should be included in the output. For debug-related sections, it applies special rules:

  • .nv_debug.shared is always excluded (returns 0). The function checks this name via exact strcmp when the section type is SHT_NOBITS (type 8) and the name did not match the .nv.shared., .nv.local., or .nv.global prefixes.
  • Standard string table sections (.strtab, .shstrtab) are excluded.
  • Sections with NVIDIA-specific types in the SHT_LOPROC range (1879048193--1879048326, i.e. 0x70000001--0x70000086) are selectively included based on a bitmask filter (0x34B).
  • Debug sections (type 1 or types in the 1879048198--1879048292 range) are included only if the section header's flag byte at +8 has bit 2 set.

Prefix Detection: sub_1672F50

The function sub_1672F50 at 0x1672F50 uses the string constants .nv_debug_ (10 characters) and .debug_ (7 characters) as prefix tests to identify whether a section belongs to the debug namespace. This is used during the output section naming phase to decide whether a section name needs the Mercury .nv.merc. prefix transformation. The prefix strings at 0x226B814 and 0x226B81F respectively have single xrefs into this function.
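
A prefix test of this kind can be written in a couple of lines. The sketch below is an assumption-labelled model, with `is_debug_namespace` as a hypothetical name; it only shows which names the two prefixes would accept and does not reproduce the internals of sub_1672F50.

```python
# The two prefix strings and the addresses quoted above.
DEBUG_PREFIXES = (".nv_debug_", ".debug_")
PREFIX_ADDRS = (0x226B814, 0x226B81F)

def is_debug_namespace(name: str) -> bool:
    """True when the section name starts with either debug prefix."""
    return name.startswith(DEBUG_PREFIXES)
```

Note that `.nv_debug.shared` starts with `.nv_debug.` and not `.nv_debug_`, so a test built on these two prefixes would not accept it. The 11-byte gap between the two string addresses equals the 10 characters of `.nv_debug_` plus its terminating NUL.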

Integration with DWARF Subsystem

The NVIDIA debug extensions integrate with the standard DWARF processing pipeline at several points:

  1. Input parsing: The section classifier (sub_1CED7C0) identifies NVIDIA debug sections alongside standard DWARF sections, allowing the input parser to collect them into the appropriate linked lists.

  2. Section merging: The section-to-offset mapper (sub_1CEDD50) provides fast context-slot lookups during the merge phase. Both standard and NVIDIA sections are accessed through the same dispatch pattern, with NVIDIA sections occupying slots +88 through +104 between the standard .debug_line slot (+80) and .debug_info slot (+112).

  3. Output emission: The three concatenation writers run after all input objects have been processed. They flatten per-CU fragment lists into contiguous buffers and register them as output ELF sections. The .debug_frame writer (sub_181B050) uses the same algorithm as the NVIDIA-specific writers, confirming that .debug_frame is treated as a concatenation-based section rather than a DWARF-parsed section.

  4. Mercury output: The ELF emitter (sub_1CED0E0) handles the mapping from bare names to .nv.merc. prefixed names when producing Mercury-format output. Four of the six NVIDIA debug sections have Mercury variants (.nv_debug_info_ptx and .nv_debug.shared do not), as do the eleven standard DWARF sections.

Cross-References

Internal (nvlink wiki) — debug subsystem:

  • DWARF Processing -- Core DWARF-2/3 parsing pipeline, abbreviation table cache, vendor-specific attributes (DW_AT_NV_*), and form value decoding. Used by .nv_debug_info_ptx via sub_1D1D2F0 and shared with standard .debug_info parsing
  • Line Table Merging -- DWARF line program merging, NVIDIA extended opcodes, and the .nv_debug_line_sass SASS-level line-table pipeline that produces the merged output
  • Mercury Debug Sections -- Mercury-specific debug section handling and the .nv.merc.* namespace; cross-reference for the Mercury dispatcher (sub_1CF1690) and acceptance gate at context offset +432
  • Debug Options -- CLI flags (-g, -lineinfo, --strip-debug) that control which NVIDIA debug sections are emitted by upstream ptxas and which survive nvlink processing

Internal (nvlink wiki) — ELF and section management:

  • NVIDIA Section Types -- Section type constants for .nv_debug_* sections in the CUDA section catalog (the 1879048198--1879048292, i.e. 0x70000006--0x70000064, SHT_LOPROC-based ranges that the classifier and filter use as a fast pre-check)
  • Section Catalog -- Alphabetical index entries #62--#66 covering all NVIDIA debug sections, including their full sh_type hex values and creation phase
  • Mercury ELF Sections -- The Mercury container and DWARF mirror sections that carry debug data for sm100+ targets. Documents the parallel .nv.merc.* namespace
  • Section Merging -- How debug sections are classified by merge_elf name-prefix dispatch and routed to the writers documented above
  • UFT: Unified Function Table -- The .nv.uft / .nv.uft.rel / .nv.uft.entry family is unrelated to debug info despite naming similarity; it implements function-pointer indirection for cross-module calls

Sibling wikis:

  • ptxas: Debug Info -- ptxas-side generation of .nv_debug_info_reg_sass, .nv_debug_info_reg_type, and SASS line tables
  • ptxas: ELF Emitter -- How ptxas emits debug sections into the cubin that nvlink later processes
  • cicc: Debug Info Pipeline -- cicc's four-stage debug metadata lifecycle; cicc does not directly produce NVIDIA-specific sections but generates the LLVM debug metadata that ptxas converts to .nv_debug_* formats

Confidence Assessment

| Claim | Confidence | Evidence |
| --- | --- | --- |
| Section classifier sub_1CED7C0 recognizes 15 section names in documented order | HIGH | Decompiled file confirms sequential memcmp/strcmp chain: .debug_abbrev (memcmp 14 bytes) at line 50, .debug_aranges (memcmp 15 bytes) at line 76, with 0x5D05 bitmask at line 45 |
| Bitmask 0x5D05 for section type pre-filtering | HIGH | Decompiled: (0x5D05uLL >> ((unsigned __int8)v4 - 6)) & 1 at line 45; magic constants 1879048198 (0x70000006) and 1879048292 (0x70000064) confirm SHT_LOPROC-based ranges |
| Section-to-offset mapper sub_1CEDD50 -- all 7 offset values | HIGH | Decompiled: .debug_line -> *(a1+80), .debug_frame -> *(a1+72), .nv_debug_line_sass -> *(a1+88), .debug_info -> *(a1+112), .debug_loc -> *(a1+120), .nv_debug_info_reg_sass -> *(a1+96), .nv_debug_info_reg_type -> *(a1+104) -- all exact |
| Three writers structurally identical, differing only in offsets and names | HIGH | Decompiled sub_181B050/sub_181B160/sub_181B270 are byte-for-byte identical in call sequence; offsets +384/+408/+432, +400/+424/+448, +392/+416/+440 confirmed; section name strings .debug_frame, .nv_debug_info_reg_sass, .nv_debug_info_reg_type at sub_434BC0 call |
| 6 NVIDIA-specific section names | HIGH | All confirmed in nvlink_strings.json: .nv_debug_line_sass, .nv_debug_info_reg_sass (0x241282C), .nv_debug_info_reg_type (0x2412844), .nv_debug_ptx_txt, .nv_debug_info_ptx (0x245E6D4), .nv_debug.shared (0x1D38995) |
| .nv_debug_ptx_txt matched by prefix via sub_44E3A0 | MEDIUM | sub_44E3A0 is a starts-with predicate confirmed by its use in sub_1CECBB0 (decompiled line 38); its use for .nv_debug_ptx_txt in the classifier is documented but not individually verified in the dense sub_1CED7C0 decompilation |
| .nv_debug.shared excluded by sub_1CECBB0 (returns 0) | HIGH | Decompiled: strcmp(..., ".nv_debug.shared") at line 72, returns 0 when matched |
| Fragment node layout (next at +0, inner ptr at +8, data at inner+0, size at inner+8) | HIGH | Decompiled writers: v15[1] (inner at +8), *(const void **)v17 (data at +0), *(unsigned int *)(v17 + 8) (size at +8), *v15 (next at +0) |
| Mercury dispatcher sub_1CF1690 handles 7 section slots | HIGH | Decompiled file present (16,049 bytes); slot assignments consistent with offset map |
| Prefix detection sub_1672F50 uses .nv_debug_ and .debug_ prefixes | HIGH | String .nv_debug_ at 0x226B814 confirmed; function takes 11 parameters, matching the wiki's DWARF emitter description |
| Fallback to sub_464DB0 hash-map lookup | HIGH | Decompiled sub_1CEDD50: return sub_464DB0(*(_QWORD **)(a1 + 8), a3) at end of function |
| .nv_debug_info_ptx parsed by DWARF CU parser at sub_1D1D2F0 | HIGH | String .nv_debug_info_ptx at 0x245E6D4; decompiled sub_1D1D2F0 has memcmp checks for both .debug_info and .nv_debug_info_ptx |
| .nv_debug_info_ptx shares CU header format with .debug_info (4+2+4+1 = 11 bytes for DWARF-3) | HIGH | Decompiled sub_1D1D2F0 reads length at +192/+200, version at +204, abbrev offset at +212, pointer size at +208, base ptr stored at +168; both branches converge at LABEL_63 calling sub_1D1BE80 |
| Three writer slots form a 24-byte stride aggregate (list/buffer/size pattern) | HIGH | Offsets +384/+392/+400, +408/+416/+424, +432/+440/+448 confirmed across all three writers; deltas exactly 24 bytes; size-field stored as 32-bit DWORD per *(_DWORD *)(a1 + 400) access pattern |
| .nv_debug.shared is excluded (not consumed) by sub_1CECBB0 | HIGH | Decompiled line 72 unconditionally returns 0 on strcmp(..., ".nv_debug.shared") match, before any content inspection |
| .nv_debug_ptx_txt is the only prefix-matched section in the classifier | HIGH | All other classifier branches use memcmp with explicit length or strcmp; only line 279 of sub_1CED7C0 uses sub_44E3A0 (the starts-with predicate confirmed by reading its 22-line decompilation) |
| sub_44E3A0 is a starts-with predicate, not a generic substring search | HIGH | 22-line decompilation: walks both strings in lockstep, returns a2 at first NUL of a1 (success) or 0 on first mismatch; classic strncmp-style prefix test |
| No .debug_cuda_* or .nv.debug.cuda_version sections exist in nvlink v13.0 | HIGH | nvlink_strings.json grep for debug_cuda and cuda_version returns no matches; only the 6 sections catalogued above are referenced |
| .nv.uft / .nv.uft.rel / .nv.uft.entry are NOT debug sections | HIGH | Strings 0x12422, 0x12449, 0x12595 are referenced from UFT processing functions, not from any debug subsystem function. UFT is documented in elf/uft.md |
| Section-to-offset mapper uses memcmp for first 5 names and strcmp for last 2 | HIGH | Decompiled sub_1CEDD50: memcmp at lines 51, 67, 78, 100, 111 for .debug_line, .debug_frame, .nv_debug_line_sass, .debug_info, .debug_loc; strcmp at lines 134, 145 for .nv_debug_info_reg_sass, .nv_debug_info_reg_type |
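The seven name-to-slot mappings verified above for sub_1CEDD50 can be restated as a small lookup table. This is a sketch rather than the decompiled logic: the names slot_entry, SLOT_TABLE, and debug_slot_offset are hypothetical, every comparison uses strcmp (the binary uses memcmp for the first five names and strcmp for the last two), and the -1 return merely stands in for the point where the real mapper falls through to the sub_464DB0 hash-map lookup.

```c
#include <stddef.h>
#include <string.h>

/* Name-to-slot pairs: byte offsets into the linker context, listed in the
 * order the decompiled comparison chain tests them. */
struct slot_entry {
    const char *name;
    int         ctx_offset;
};

static const struct slot_entry SLOT_TABLE[] = {
    { ".debug_line",              80 },
    { ".debug_frame",             72 },
    { ".nv_debug_line_sass",      88 },
    { ".debug_info",             112 },
    { ".debug_loc",              120 },
    { ".nv_debug_info_reg_sass",  96 },
    { ".nv_debug_info_reg_type", 104 },
};

/* Hypothetical helper: returns the context offset for `name`, or -1 when
 * the name is not one of the seven fixed slots. */
static int debug_slot_offset(const char *name)
{
    for (size_t i = 0; i < sizeof SLOT_TABLE / sizeof SLOT_TABLE[0]; i++) {
        if (strcmp(name, SLOT_TABLE[i].name) == 0)
            return SLOT_TABLE[i].ctx_offset;
    }
    return -1;
}
```

Laid out this way, the three NVIDIA slots (+88, +96, +104) visibly sit between the .debug_line slot (+80) and the .debug_info slot (+112).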