
DWARF Processing

nvlink contains a complete DWARF-2/3 debug information processing subsystem in the address range 0x1D10000--0x1D20570. This subsystem parses, validates, and re-emits the standard DWARF sections carried in CUDA device ELF objects. It handles abbreviation tables, compilation units, type information, location expressions, and line number programs. The implementation supports both 32-bit and 64-bit address sizes. It recognizes six vendor-specific attributes from four vendors, plus the DW_AT_hi_user sentinel:

  • NVIDIA: DW_AT_NV_general_flags
  • MIPS/GCC: DW_AT_MIPS_linkage_name
  • GNU: DW_AT_GNU_pubnames
  • PGI: DW_AT_PGI_lbase, DW_AT_PGI_soffset, DW_AT_PGI_lstride

Of these, only DW_AT_MIPS_linkage_name and the PGI triplet receive dedicated processing. The others are recognized for display and otherwise passed through opaquely. Variable-length DWARF integers (ULEB128 and SLEB128) are decoded through a shared LEB128 codec subsystem. That subsystem includes SSE-accelerated variants and is located mostly at 0x1D00000--0x1D0FFF0.

This page documents the core parsing functions. For the NVIDIA-specific extensions and Mercury debug section variants, see NVIDIA Debug Extensions. For line-table merging during the link phase, see Line Table Merging.

Key Facts

Property                     Value
--------                     -----
DWARF subsystem range        0x1D10000--0x1D20570 (~140 functions; 0x10570 = 66,928-byte address span)
LEB128 codec range           0x1D00000--0x1D0FFF0 (~20 functions; 0xFFF0 = 65,520-byte address span)
DWARF versions supported     2 and 3
Address sizes                4 bytes (32-bit) and 8 bytes (64-bit)
Abbreviation buffer          2048 bytes initial, doubles when full
Abbreviation entry size      32 bytes per slot
Maximum attributes per DIE   256 (hard limit; exceeding it is a fatal error)
Vendor-specific attributes   7 codes: NVIDIA (1), MIPS/GCC (1), GNU (1), PGI (3), DW_AT_hi_user sentinel (1)
Section type classifier      sub_12D4370 at 0x12D4370
Top-level entry point        sub_1D166F0 at 0x1D166F0 (allocates a 95,968-byte context)

Standard DWARF Sections

nvlink processes the following standard DWARF sections from input ELF objects. The section type classifier at sub_12D4370 assigns numeric IDs used internally to dispatch processing:

Section Name        Type ID   Description
------------        -------   -----------
.debug_info         1         Compilation units and DIE trees
.debug_loc          2         Location lists for variables
.debug_abbrev       3         Abbreviation tables
.nv_debug_ptx_txt   4         NVIDIA PTX source text
.debug_line         5         Line number programs
.debug_str          6         String table for DW_FORM_strp
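The name-to-ID mapping in the table above can be sketched as a simple lookup. This is a hypothetical illustration, not a reconstruction of sub_12D4370. The function name classify_debug_section is invented here, and returning 0 for sections without an ID is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: map a section name to the type IDs documented above.
 * Returns 0 (an assumed "no classifier ID") for anything else. */
int classify_debug_section(const char *name)
{
    static const struct { const char *name; int id; } kTable[] = {
        { ".debug_info",       1 },
        { ".debug_loc",        2 },
        { ".debug_abbrev",     3 },
        { ".nv_debug_ptx_txt", 4 },
        { ".debug_line",       5 },
        { ".debug_str",        6 },
    };
    for (size_t i = 0; i < sizeof kTable / sizeof kTable[0]; i++)
        if (strcmp(name, kTable[i].name) == 0)
            return kTable[i].id;
    return 0; /* e.g. .debug_frame is carried through without an ID */
}
```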

Additional sections processed by the ELF emitter (sub_1CED0E0 at 0x1CED0E0) but not given classifier IDs include .debug_frame, .debug_ranges, .debug_aranges, .debug_pubnames, .debug_pubtypes, and .debug_macinfo. These are carried through as opaque blobs during linking and re-emitted in the output under the Mercury namespace prefix .nv.merc.:

Output Section                    Source Section / Contents
--------------                    -------------------------
.nv.merc.debug_info               .debug_info
.nv.merc.debug_abbrev             .debug_abbrev
.nv.merc.debug_line               .debug_line
.nv.merc.debug_str                .debug_str
.nv.merc.debug_frame              .debug_frame
.nv.merc.debug_loc                .debug_loc
.nv.merc.debug_ranges             .debug_ranges
.nv.merc.debug_aranges            .debug_aranges
.nv.merc.debug_pubnames           .debug_pubnames
.nv.merc.debug_pubtypes           .debug_pubtypes
.nv.merc.debug_macinfo            .debug_macinfo
.nv.merc.nv_debug_info_reg_sass   (SASS register debug info)
.nv.merc.nv_debug_line_sass       (SASS line numbers)
.nv.merc.nv_debug_ptx_txt         (PTX source text)
.nv.merc.nv_debug_info_reg_type   (register type info)
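Every row in the mapping above is consistent with a plain prefix rule: ".nv.merc" is prepended to an input name that already starts with a dot. The following sketch assumes exactly that rule. The helper name mercury_section_name is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Sketch, assuming the Mercury name is ".nv.merc" + original section name
 * (which already begins with '.'). Returns 0 on success, -1 if out is too small. */
int mercury_section_name(const char *src, char *out, size_t out_size)
{
    int n = snprintf(out, out_size, ".nv.merc%s", src);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```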

Top-Level Entry Point: sub_1D166F0

The top-level DWARF processing entry point allocates a 95,968-byte (0x176E0) context structure and dispatches to the section-level parser.

Signature

int64_t dwarf_process_sections(
    void*    section_data,     // a1: raw section bytes
    size_t   section_size,     // a2: byte count
    char*    section_name,     // a3: e.g. ".debug_info"
    uint64_t flags             // a4: processing flags
);
// Returns byte count processed, or -64 on allocation failure.

Context Structure Layout

The function allocates a context via malloc(0x176E0) and initializes it before calling sub_1D15E00 (the section-level dispatcher). Key offsets within this context:

Offset            Type       Field
------            ----       -----
+128   (0x80)     void*      Abbreviation table pointer
+136   (0x88)     uint64_t   Abbreviation table capacity (bytes)
+144   (0x90)     uint8_t    Abbreviation table valid flag
+152   (0x98)     uint64_t   Current abbreviation index (1-based)
+164   (0xA4)     int32_t    Nesting depth counter
+168   (0xA8)     void*      Current CU data pointer
+176   (0xB0)     uint64_t   Current CU data size
+184   (0xB8)     uint8_t    Current CU data valid flag
+192   (0xC0)     int32_t    CU total length
+196   (0xC4)     int32_t    CU header size (11 for DWARF-2/3)
+200   (0xC8)     int32_t    CU content length
+204   (0xCC)     int32_t    DWARF version number
+208   (0xD0)     int32_t    Address (pointer) size
+212   (0xD4)     int32_t    Abbreviation section offset
+216   (0xD8)     int32_t    Matched abbreviation base index
+224   (0xE0)     uint64_t   Magic/state marker (38110068)
+30104 (0x7598)   int32_t    End-of-data flag
+30176 (0x75E0)   int32_t    Format flags
+30204 (0x75FC)   int32_t    Stream position flag
+30228 (0x7614)   int32_t    Extended format flag

The magic value 38110068 stored at offset +224 acts as a state sentinel: it is set at the start of each compilation unit's processing and cleared to zero when parsing is complete.
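The documented prefix of the context can be written as a C struct. Its offsets can then be checked mechanically against the table above. This sketch assumes an LP64 target (8-byte pointers). The type and field names are hypothetical. Bytes the table does not describe are modeled as opaque padding.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the first 0xE8 bytes of the 0x176E0-byte context (LP64). */
typedef struct dwarf_ctx_prefix {
    uint8_t  opaque_000[0x80];     /* not documented */
    void    *abbrev_table;         /* +0x80 */
    uint64_t abbrev_capacity;      /* +0x88, in bytes */
    uint8_t  abbrev_valid;         /* +0x90 */
    uint8_t  pad_091[7];
    uint64_t abbrev_index;         /* +0x98, 1-based */
    uint8_t  opaque_0a0[4];
    int32_t  nesting_depth;        /* +0xA4 */
    void    *cu_data;              /* +0xA8 */
    uint64_t cu_size;              /* +0xB0 */
    uint8_t  cu_valid;             /* +0xB8 */
    uint8_t  pad_0b9[7];
    int32_t  cu_total_length;      /* +0xC0 */
    int32_t  cu_header_size;       /* +0xC4, 11 for DWARF-2/3 */
    int32_t  cu_content_length;    /* +0xC8 */
    int32_t  cu_version;           /* +0xCC */
    int32_t  cu_address_size;      /* +0xD0 */
    int32_t  cu_abbrev_offset;     /* +0xD4 */
    int32_t  matched_abbrev_base;  /* +0xD8 */
    uint8_t  pad_0dc[4];
    uint64_t state_magic;          /* +0xE0, 38110068 while a CU is in progress */
} dwarf_ctx_prefix;

enum {
    DWARF_CTX_SIZE      = 0x176E0,   /* 95,968 bytes */
    DWARF_CTX_MAGIC     = 38110068,
    OFF_END_OF_DATA     = 30104,     /* 0x7598 */
    OFF_FORMAT_FLAGS    = 30176,     /* 0x75E0 */
    OFF_STREAM_POS_FLAG = 30204,     /* 0x75FC */
    OFF_EXT_FORMAT_FLAG = 30228      /* 0x7614 */
};

_Static_assert(offsetof(dwarf_ctx_prefix, abbrev_table) == 0x80, "abbrev table at +128");
_Static_assert(offsetof(dwarf_ctx_prefix, state_magic) == 0xE0, "magic at +224");
```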

Abbreviation Table Parser: sub_1D17C90

The abbreviation table parser (sub_1D17C90 at 0x1D17C90, 18,519 bytes, 675 decompiled lines) reads .debug_abbrev section data and builds an in-memory lookup table. Each abbreviation entry maps an abbreviation number to a DW_TAG code, a has-children flag, and a list of (attribute, form) pairs.

Signature

int dwarf_parse_abbrev_table(
    dwarf_context* ctx,         // a1: context with abbrev table storage
    uint64_t       section_base, // a2: start of .debug_abbrev data
    int            section_size, // a3: byte count
    int            verbose       // a4: 1 = print parsed entries to stdout
);

Abbreviation Table Storage

The parser stores entries in a contiguous buffer at ctx+128, with an initial allocation of 2048 bytes. The buffer is full when the entry count reaches capacity >> 5, which is capacity / 32, the number of 32-byte slots. At that point the parser:

  1. Doubles the capacity.
  2. Allocates a new buffer via sub_1D16C20.
  3. Copies the existing entries with memcpy.
  4. Frees the old buffer and swaps in the new one.

Each abbreviation entry is a 32-byte record:

Offset   Size   Field
------   ----   -----
+0       4      DW_TAG code (from first ULEB128 of attr/form pair loop)
+4       4      DW_FORM code (companion to the attribute)
+8       1      has_children flag (1 = DW_CHILDREN_yes)
+12      4      Number of attribute/form pairs for this abbreviation
+16      4      Byte offset within the .debug_abbrev section
+24      8      Pointer to heap-allocated (attr, form) pair array

The pair array at +24 stores each attribute/form pair as an 8-byte record: 4 bytes for DW_AT_* code, 4 bytes for DW_FORM_* code. The maximum number of pairs per abbreviation is 256; exceeding this triggers a fatal error:

unexpectedly too many dwarf attributes for any DW_TAG entry!
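The 32-byte slot and the doubling policy described above can be combined into a small sketch. It assumes an LP64 layout. The field and function names are hypothetical, and the +4 field is kept as "form" exactly as documented, even though its role is not fully clear. The sketch uses a 0-based count. It mirrors the allocate-copy-free sequence rather than calling realloc.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reconstruction of one 32-byte abbreviation slot (LP64). */
typedef struct abbrev_entry {
    uint32_t  tag;             /* +0  DW_TAG code */
    uint32_t  form;            /* +4  DW_FORM code, as documented */
    uint8_t   has_children;    /* +8  1 = DW_CHILDREN_yes */
    uint8_t   pad_09[3];
    uint32_t  attr_count;      /* +12 number of (attr, form) pairs, <= 256 */
    uint32_t  section_offset;  /* +16 offset within .debug_abbrev */
    uint32_t  pad_14;
    uint32_t *pairs;           /* +24 attr_count x {DW_AT, DW_FORM}, 8 bytes each */
} abbrev_entry;

_Static_assert(sizeof(abbrev_entry) == 32, "32-byte slot");

typedef struct abbrev_table {
    abbrev_entry *slots;
    size_t capacity_bytes;     /* starts at 2048 */
    size_t count;
} abbrev_table;

/* Append one entry, doubling the byte capacity when count == capacity >> 5.
 * Returns 0 on success, -1 on allocation failure. */
int abbrev_table_push(abbrev_table *t, const abbrev_entry *e)
{
    if (t->slots == NULL) {
        t->capacity_bytes = 2048;
        t->slots = malloc(t->capacity_bytes);
        if (!t->slots) return -1;
    }
    if (t->count == (t->capacity_bytes >> 5)) {
        size_t new_cap = t->capacity_bytes * 2;
        abbrev_entry *grown = malloc(new_cap);
        if (!grown) return -1;
        memcpy(grown, t->slots, t->count * sizeof *grown);
        free(t->slots);
        t->slots = grown;
        t->capacity_bytes = new_cap;
    }
    t->slots[t->count++] = *e;
    return 0;
}
```

With 2048 bytes the table holds 64 slots. The 65th push therefore grows it to 4096 bytes (128 slots).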

Parse Loop

The parser is a while(1) loop that reads ULEB128-encoded abbreviation numbers from the byte stream. For each non-zero abbreviation number:

  1. Read the DW_TAG code (ULEB128) via sub_1D1FAD0.
  2. Read the has-children byte (single byte after the tag).
  3. Read attribute/form pairs in a loop until both attribute and form are zero (the null terminator).
  4. Allocate heap storage for the pair array, copy from a 256-entry stack buffer (v151, 2048 bytes on stack).
  5. Store the completed entry into the abbreviation table at the current index.
  6. Increment the abbreviation index at ctx+152.
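The steps above can be sketched as a function that parses one abbreviation declaration from a raw .debug_abbrev byte buffer. This is an illustrative reimplementation, not nvlink's code. The names read_uleb, abbrev_decl, and parse_abbrev_decl are hypothetical. Returning error codes replaces nvlink's fatal error on more than 256 pairs.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal bounds-checked ULEB128 reader; returns 0 on success, -1 if truncated. */
static int read_uleb(const uint8_t *p, size_t n, size_t *pos, uint64_t *out)
{
    uint64_t v = 0;
    unsigned shift = 0;
    while (*pos < n) {
        uint8_t b = p[(*pos)++];
        if (shift < 64) v |= (uint64_t)(b & 0x7f) << shift;
        shift += 7;
        if (!(b & 0x80)) { *out = v; return 0; }
    }
    return -1;
}

typedef struct {
    uint64_t code, tag;
    uint8_t  has_children;
    unsigned npairs;
    uint32_t pairs[256][2];    /* like the 256-entry stack buffer (v151) */
} abbrev_decl;

/* Returns 1 if a declaration was read, 0 at the terminating 0 code,
 * -1 on truncated input, -2 if more than 256 pairs are present. */
int parse_abbrev_decl(const uint8_t *p, size_t n, size_t *pos, abbrev_decl *d)
{
    if (read_uleb(p, n, pos, &d->code)) return -1;
    if (d->code == 0) return 0;                  /* end of this abbreviation table */
    if (read_uleb(p, n, pos, &d->tag)) return -1;
    if (*pos >= n) return -1;
    d->has_children = p[(*pos)++];               /* single DW_CHILDREN_* byte */
    d->npairs = 0;
    for (;;) {
        uint64_t at, form;
        if (read_uleb(p, n, pos, &at) || read_uleb(p, n, pos, &form)) return -1;
        if (at == 0 && form == 0) return 1;      /* null terminator */
        if (d->npairs == 256) return -2;         /* nvlink treats this as fatal */
        d->pairs[d->npairs][0] = (uint32_t)at;
        d->pairs[d->npairs][1] = (uint32_t)form;
        d->npairs++;
    }
}
```

For example, the bytes 01 11 01 25 0e 13 0b 00 00 decode as abbreviation 1: DW_TAG_compile_unit with children, and the pairs (DW_AT_producer, DW_FORM_strp) and (DW_AT_language, DW_FORM_data1). This matches the verbose listing shown below.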

When verbose (a4) is non-zero, the parser prints each entry to stdout:

Contents of the .debug_abbrev section:

  Number  TAG
   1      0x11 DW_TAG_compile_unit      [has children]
   DW_AT_producer(0x25)          DW_FORM_strp(0xe)
   DW_AT_language(0x13)          DW_FORM_data1(0xb)
   ...

The DW_TAG code-to-name lookup uses the string table at off_245F080, which contains 67 entries (indices 0 through 0x42). Tag codes beyond this range produce <unknown>.

DW_FORM Name Lookup: sub_1D16C60

A simple switch-based lookup (sub_1D16C60 at 0x1D16C60, 80 lines) that maps DWARF form codes to their string names. The complete mapping:

Code   Name                Encoding
----   ----                --------
1      DW_FORM_addr        Target address (4 or 8 bytes, per CU pointer size)
3      DW_FORM_block2      2-byte length + data block
4      DW_FORM_block4      4-byte length + data block
5      DW_FORM_data2       2-byte unsigned integer
6      DW_FORM_data4       4-byte unsigned integer
7      DW_FORM_data8       8-byte unsigned integer
8      DW_FORM_string      Null-terminated inline string
9      DW_FORM_block       ULEB128 length + data block
10     DW_FORM_block1      1-byte length + data block
11     DW_FORM_data1       1-byte unsigned integer
12     DW_FORM_flag        1-byte boolean
13     DW_FORM_sdata       Signed LEB128
14     DW_FORM_strp        4-byte offset into .debug_str
15     DW_FORM_udata       Unsigned LEB128
16     DW_FORM_ref_addr    Address-sized offset into .debug_info
17     DW_FORM_ref1        1-byte CU-relative reference
18     DW_FORM_ref2        2-byte CU-relative reference
19     DW_FORM_ref4        4-byte CU-relative reference
20     DW_FORM_ref8        8-byte CU-relative reference
21     DW_FORM_ref_udata   ULEB128 CU-relative reference
22     DW_FORM_indirect    ULEB128 form code followed by the actual value

This covers every form defined in DWARF-2 and DWARF-3. Form code 2 is not handled. It is the gap between DW_FORM_addr (1) and DW_FORM_block2 (3), which the standard leaves unassigned. Unknown form codes produce a diagnostic on stderr:

Unknown FORM value %d
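A switch-based lookup equivalent to the table above might look like the following sketch. The function name dw_form_name is hypothetical, and returning NULL for unknown codes stands in for nvlink's stderr diagnostic.

```c
#include <stddef.h>

/* Sketch: DW_FORM code -> name for DWARF-2/3 (codes 1 and 3..22); NULL otherwise. */
const char *dw_form_name(unsigned form)
{
    switch (form) {
    case 1:  return "DW_FORM_addr";
    case 3:  return "DW_FORM_block2";
    case 4:  return "DW_FORM_block4";
    case 5:  return "DW_FORM_data2";
    case 6:  return "DW_FORM_data4";
    case 7:  return "DW_FORM_data8";
    case 8:  return "DW_FORM_string";
    case 9:  return "DW_FORM_block";
    case 10: return "DW_FORM_block1";
    case 11: return "DW_FORM_data1";
    case 12: return "DW_FORM_flag";
    case 13: return "DW_FORM_sdata";
    case 14: return "DW_FORM_strp";
    case 15: return "DW_FORM_udata";
    case 16: return "DW_FORM_ref_addr";
    case 17: return "DW_FORM_ref1";
    case 18: return "DW_FORM_ref2";
    case 19: return "DW_FORM_ref4";
    case 20: return "DW_FORM_ref8";
    case 21: return "DW_FORM_ref_udata";
    case 22: return "DW_FORM_indirect";
    default: return NULL;  /* includes the unassigned code 2 */
    }
}
```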

DW_AT Attribute Name Lookup: sub_1D16DF0

The attribute name lookup (sub_1D16DF0 at 0x1D16DF0, 330 lines) is a deeply nested if/else tree (not a switch) that maps DWARF attribute codes to string names. It covers the full DWARF-2/3 standard attribute set plus several vendor extensions.

Standard Attributes (Codes 1--78)

Code   Name                         Code   Name
----   ----                         ----   ----
1      DW_AT_sibling                46     DW_AT_stride_size
2      DW_AT_location               47     DW_AT_upper_bound
3      DW_AT_name                   49     DW_AT_abstract_origin
9      DW_AT_ordering               50     DW_AT_accessibility
10     DW_AT_subscr_data            51     DW_AT_address_class
11     DW_AT_byte_size              52     DW_AT_artificial
12     DW_AT_bit_offset             53     DW_AT_base_types
13     DW_AT_bit_size               54     DW_AT_calling_convention
15     DW_AT_element_list           55     DW_AT_count
16     DW_AT_stmt_list              56     DW_AT_data_member_location
17     DW_AT_low_pc                 57     DW_AT_decl_column
18     DW_AT_high_pc                58     DW_AT_decl_file
19     DW_AT_language               59     DW_AT_decl_line
20     DW_AT_member                 60     DW_AT_declaration
21     DW_AT_discr                  61     DW_AT_discr_list
22     DW_AT_discr_value            63     DW_AT_external
23     DW_AT_visibility             64     DW_AT_encoding
24     DW_AT_import                 65     DW_AT_frame_base
25     DW_AT_string_length          66     DW_AT_friend
26     DW_AT_common_reference       67     DW_AT_identifier_case
27     DW_AT_comp_dir               68     DW_AT_macro_info
28     DW_AT_const_value            69     DW_AT_namelist_item
29     DW_AT_containing_type        70     DW_AT_priority
30     DW_AT_default_value          71     DW_AT_segment
32     DW_AT_inline                 72     DW_AT_specification
33     DW_AT_is_optional            73     DW_AT_static_link
34     DW_AT_lower_bound            74     DW_AT_type
37     DW_AT_producer               75     DW_AT_use_location
39     DW_AT_prototyped             76     DW_AT_variable_parameter
42     DW_AT_return_addr            77     DW_AT_virtuality
44     DW_AT_start_scope            78     DW_AT_vtable_elem_location

DWARF-3 Attributes (Codes 79--91)

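Code   Name
----   ----
79     DW_AT_associated
80     DW_AT_allocated
81     DW_AT_data_location
82     DW_AT_stride
83     DW_AT_entry_pc
84     DW_AT_extension
85     DW_AT_use_UTF8
86     DW_AT_ranges
87     DW_AT_trampoline
88     DW_AT_call_column
89     DW_AT_call_file
90     DW_AT_call_line
91     DW_AT_description

Several of the assignments in these two tables differ from the DWARF-2/3 specification from code 62 upward. The specification defines:

  • DW_AT_encoding = 62 (0x3E)
  • DW_AT_frame_base = 64 (0x40)
  • DW_AT_type = 73 (0x49)
  • DW_AT_allocated = 78 (0x4E)
  • DW_AT_data_location = 80 (0x50)
  • DW_AT_description = 90 (0x5A)

It has not been verified whether the numbering above reflects nvlink's own lookup tree or a transcription error.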

Vendor Extensions

Code (dec)   Code (hex)   Name                      Vendor     Unique Handling
----------   ----------   ----                      ------     ---------------
8199         0x2007       DW_AT_MIPS_linkage_name   MIPS/GCC   Yes -- name priority, pubnames/pubtypes
8500         0x2134       DW_AT_GNU_pubnames        GNU        No -- name lookup only
9987         0x2703       DW_AT_NV_general_flags    NVIDIA     No -- name lookup only
14848        0x3A00       DW_AT_PGI_lbase           PGI        Yes -- DW_OP expression decoding
14849        0x3A01       DW_AT_PGI_soffset         PGI        Yes -- DW_OP expression decoding
14850        0x3A02       DW_AT_PGI_lstride         PGI        Yes -- DW_OP expression decoding
16383        0x3FFF       DW_AT_hi_user             Standard   No -- sentinel value

Unknown attribute codes produce a diagnostic to stderr:

Unknown Attribute value %d

The if/else tree structure in sub_1D16DF0 (rather than a compiler-generated switch table) suggests the lookup was hand-written or produced by a legacy code generator. All of the vendor attribute codes fall inside the DWARF user-defined range, from DW_AT_lo_user (0x2000) to DW_AT_hi_user (0x3FFF). The MIPS, GNU, and NVIDIA codes sit in the 0x2000--0x2FFF part of that range. The PGI codes occupy 0x3A00--0x3A02, and the sentinel marks its upper end.

DW_AT_MIPS_linkage_name (0x2007) -- Linkage Name Priority

DW_AT_MIPS_linkage_name is the most extensively handled vendor attribute in the DWARF subsystem. SGI originally defined it for the MIPS ABI. GCC and Clang then adopted it as the de facto way to encode the mangled (C++ linkage) name of a symbol, until DWARF-4 standardized DW_AT_linkage_name (code 0x6E). The CUDA toolchain emits it in device ELF objects for mangled kernel names.

nvlink gives DW_AT_MIPS_linkage_name priority over DW_AT_name when extracting the canonical name of a DIE. This affects three functions:

DIE tree walker (sub_1D1BE80): When processing DW_TAG_subprogram (tag 46), the walker has a special check at the attribute dispatch level. If the current attribute is DW_AT_name (3) and the previous attribute stored in the DIE context (offset +48) was DW_AT_MIPS_linkage_name (8199), the walker skips the DW_AT_name extraction entirely -- the linkage name already captured is the canonical identifier. Conversely, if the attribute is DW_AT_MIPS_linkage_name, the walker proceeds directly to the name extraction path. The pseudocode for the relevant check:

// Inside DIE tree walker, attribute dispatch for DW_TAG_subprogram (46):
if (current_attr == DW_AT_name) {
    if (die_ctx->prev_attr == DW_AT_MIPS_linkage_name)
        goto skip;   // linkage name already captured, ignore DW_AT_name
}
if (current_attr == DW_AT_MIPS_linkage_name) {
    goto extract_name;  // treat as the canonical name
}

Pubnames emitter (sub_1D19900) and pubtypes emitter (sub_1D193E0): Both use an identical priority pattern when building the .debug_pubnames / .debug_pubtypes name index. For each abbreviation entry's attribute list, they check:

if (attr == DW_AT_MIPS_linkage_name ||
    (attr == DW_AT_name && prev_captured_name != DW_AT_MIPS_linkage_name))
{
    // Extract name string from the stream, allocate arena copy
    prev_captured_name = attr;
}

This means: if a DIE has both DW_AT_name and DW_AT_MIPS_linkage_name, the mangled linkage name always wins. The DW_AT_name is only used as a fallback when no linkage name is present. After capturing via DW_AT_MIPS_linkage_name, encountering DW_AT_name has no effect on the stored name. This guarantees that pubnames/pubtypes entries use the mangled C++ name when available, which matches what host-side linkers and debuggers expect.
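The capture rule above can be turned into a small runnable function. The function applies the rule to a DIE's string attributes in abbreviation order. The types and the name canonical_die_name are hypothetical. Only the attribute codes (DW_AT_name = 3, DW_AT_MIPS_linkage_name = 0x2007) come from this page.

```c
#include <stddef.h>

enum { DW_AT_name = 0x03, DW_AT_MIPS_linkage_name = 0x2007 };

typedef struct { unsigned attr; const char *str; } str_attr;

/* Apply the pubnames/pubtypes capture rule; returns the canonical name or NULL. */
const char *canonical_die_name(const str_attr *attrs, size_t n)
{
    const char *captured = NULL;
    unsigned prev_captured = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned a = attrs[i].attr;
        if (a == DW_AT_MIPS_linkage_name ||
            (a == DW_AT_name && prev_captured != DW_AT_MIPS_linkage_name)) {
            captured = attrs[i].str;   /* linkage name overrides; name is a fallback */
            prev_captured = a;
        }
    }
    return captured;
}
```

Because a linkage name always overwrites and a later DW_AT_name is ignored once a linkage name has been captured, the result does not depend on attribute order.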

DW_AT_GNU_pubnames (0x2134) -- GCC .debug_gnu_pubnames

DW_AT_GNU_pubnames is a boolean attribute added to DW_TAG_compile_unit DIEs by GCC when the .debug_gnu_pubnames section is present. This is the GCC extension for accelerated name lookup (later standardized as .debug_names in DWARF-5). nvlink recognizes the attribute name for display in verbose mode but does not perform any special processing on its value -- the attribute is decoded generically through the form value reader like any other boolean or constant. The .debug_pubnames section itself is carried through as an opaque blob in the Mercury output (.nv.merc.debug_pubnames).

DW_AT_NV_general_flags (0x2703) -- NVIDIA GPU Function Properties

DW_AT_NV_general_flags at code 9987 (0x2703) is NVIDIA's sole custom DWARF attribute in the vendor extension range. It is used by the CUDA toolchain (cicc/ptxas) to annotate DW_TAG_subprogram DIEs with GPU-specific function properties in device ELF .debug_info sections.

Despite being the only NVIDIA-proprietary attribute, DW_AT_NV_general_flags has no special handling in the nvlink DWARF subsystem beyond the name lookup in sub_1D16DF0. The attribute value is:

  • Decoded generically by the form value reader (sub_1D1B540) according to whichever DW_FORM_* the abbreviation table specifies (typically DW_FORM_data4 for a 32-bit flags word, or DW_FORM_data2)
  • Not examined, filtered, or modified by the DIE tree walker (sub_1D1BE80)
  • Not referenced by the pubnames or pubtypes emitters
  • Passed through opaquely to the Mercury output

The exact bit layout of the flags value was not determined from decompilation of nvlink alone -- the flags are produced by cicc and consumed by cuda-gdb and other NVIDIA debug tools. The attribute code 0x2703 falls in the 0x2000--0x3FFF user-defined range (specifically in the 0x2700--0x27FF sub-range that appears to be reserved for NVIDIA).

PGI Extensions (0x3A00--0x3A02) -- Fortran Array Descriptors

The three PGI attributes reflect nvlink's lineage from the PGI (Portland Group / NVIDIA HPC SDK) compiler toolchain. They encode Fortran array descriptor components:

Attribute           Code     Description
---------           ----     -----------
DW_AT_PGI_lbase     0x3A00   Lower bound base address of the array descriptor
DW_AT_PGI_soffset   0x3A01   Section (stride) offset within the descriptor
DW_AT_PGI_lstride   0x3A02   Element stride (distance between consecutive elements)

These are typically encoded as DW_FORM_block1 values containing DWARF location expressions (DW_OP_* sequences). The form value reader (sub_1D1B540) explicitly includes all three PGI codes in its location-expression decode check:

// In sub_1D1B540, DW_FORM_block1 handler:
if (attr == DW_AT_location       ||   // 2
    attr == DW_AT_data_location   ||   // 81
    (attr - 14848) <= 2u          ||   // DW_AT_PGI_lbase/soffset/lstride
    attr == DW_AT_stride_size)         // 46
{
    // Invoke DW_OP expression decoder (sub_1D1A920) on block contents
}

This means the PGI array descriptor attributes are treated as first-class location expressions by the DWARF subsystem -- their block values are decoded through the full DW_OP_* interpreter (sub_1D1A920), producing human-readable location descriptions in verbose mode. This is the same treatment given to standard location attributes like DW_AT_location and DW_AT_data_location.
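The attribute test from the DW_FORM_block1 handler is written out below as a standalone predicate. The term (attr - 14848) <= 2u is a single-comparison range check. Unsigned subtraction wraps codes below 14848 around to very large values, so only 14848, 14849, and 14850 pass. The function name is hypothetical, and the attribute numbering follows this page.

```c
#include <stdbool.h>

/* Sketch of the location-expression check described above (numbering as on this page). */
bool block1_is_location_expr(unsigned attr)
{
    return attr == 2              /* DW_AT_location */
        || attr == 81             /* DW_AT_data_location */
        || attr - 14848u <= 2u    /* DW_AT_PGI_lbase / soffset / lstride */
        || attr == 46;            /* DW_AT_stride_size */
}
```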

DW_OP Expression Decoder: sub_1D1A920

The DW_OP expression decoder (sub_1D1A920 at 0x1D1A920, 15,580 bytes, 616 lines) parses DWARF location expressions and prints them into a string buffer. It handles the subset of DWARF-2/3 expression opcodes needed for GPU debug information, listed below, plus one DWARF-4 opcode (DW_OP_stack_value).

Signature

uint64_t dwarf_decode_dw_op(
    uint32_t*   addr_size_ctx,   // a1: points to CU address size
    char**      section_name,    // a2: section name pointer (for debug_frame detection)
    int         expr_length,     // a3: byte count of expression data
    string_buf* output,          // a4: output string buffer
    int64_t     reserved1,       // a5
    int64_t     reserved2,       // a6
    void*       expr_data,       // a7: expression byte stream
    uint64_t    expr_capacity    // a8: bounds-checking limit
);

Supported Opcodes

Opcode(s)    Name                        Description
---------    ----                        -----------
0x03         DW_OP_addr                  Push address constant (4 or 8 bytes, per CU address size)
0x0C         DW_OP_const4u               Push 4-byte unsigned constant
0x10         DW_OP_constu                Push ULEB128 unsigned constant
0x18         DW_OP_xderef                Extended dereference
0x22         DW_OP_plus                  Addition
0x23         DW_OP_plus_uconst           Add ULEB128 constant
0x30--0x4F   DW_OP_lit0--DW_OP_lit31     Push literal 0--31 (opcode minus 0x30)
0x50--0x6F   DW_OP_reg0--DW_OP_reg31     Name register 0--31 (opcode minus 0x50)
0x70--0x8F   DW_OP_breg0--DW_OP_breg31   Register 0--31 plus signed LEB128 offset
0x90         DW_OP_regx                  ULEB128 register number
0x91         DW_OP_fbreg                 Frame base plus signed LEB128 offset
0x92         DW_OP_bregx                 ULEB128 register + signed LEB128 offset
0x94         DW_OP_deref_size            Dereference with explicit byte size
0x96         DW_OP_nop                   No operation
0x9F         DW_OP_stack_value           DWARF-4 stack value (marks TOS as the value)

The DW_OP_addr handler dispatches on the CU address size: 4-byte addresses use format "DW_OP_addr: 0x%x", while 8-byte addresses use "DW_OP_addr: 0x%llx".

For DW_OP_bregx (opcode 0x92), the decoder has a special code path for .debug_frame sections. When the section name matches "debug_frame" (compared against the tail of the string ".nv.merc.debug_frame"), it decodes the register number through sub_1D17460. That function maps register numbers to NVIDIA-specific register names using a 24-bit mask (& 0xFFFFFF). For other sections, the decoder uses the generic ULEB128 decoder.

Multiple DW_OP operations within a single expression are separated by "; " in the output string.
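A compact decoder for the opcode subset in the table above follows. It is an illustrative sketch, not sub_1D1A920.

  • Only the "DW_OP_addr: 0x%x" / "0x%llx" formats and the "; " separator come from this page. The other output spellings are assumptions.
  • All function names are hypothetical.
  • The .debug_frame register-name path is omitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* LEB128 reader; when is_signed, sign-extends from bit 6 of the last byte. */
static int leb(const uint8_t *p, size_t n, size_t *i, int is_signed, uint64_t *v)
{
    uint64_t r = 0; unsigned shift = 0; uint8_t b;
    do {
        if (*i >= n) return -1;
        b = p[(*i)++];
        if (shift < 64) r |= (uint64_t)(b & 0x7f) << shift;
        shift += 7;
    } while (b & 0x80);
    if (is_signed && shift < 64 && (b & 0x40)) r |= ~(uint64_t)0 << shift;
    *v = r;
    return 0;
}

static uint64_t read_le(const uint8_t *p, unsigned size)
{
    uint64_t v = 0;
    for (unsigned k = 0; k < size; k++) v |= (uint64_t)p[k] << (8 * k);
    return v;
}

/* Append one operation, inserting "; " between operations; truncates safely. */
static void append(char *out, size_t cap, size_t *len, const char *op)
{
    int w = snprintf(out + *len, cap - *len, "%s%s", *len ? "; " : "", op);
    if (w > 0) *len += ((size_t)w < cap - *len) ? (size_t)w : cap - *len - 1;
}

/* Returns 0 on success, -1 on truncated input or an opcode outside the subset. */
int decode_dw_op(const uint8_t *p, size_t n, unsigned addr_size, char *out, size_t cap)
{
    size_t i = 0, len = 0;
    char op[64];
    if (cap == 0) return -1;
    out[0] = '\0';
    while (i < n) {
        uint8_t c = p[i++];
        uint64_t u, reg;
        if (c == 0x03) {                                     /* DW_OP_addr */
            if ((addr_size != 4 && addr_size != 8) || n - i < addr_size) return -1;
            u = read_le(p + i, addr_size);
            i += addr_size;
            if (addr_size == 4) snprintf(op, sizeof op, "DW_OP_addr: 0x%x", (unsigned)u);
            else snprintf(op, sizeof op, "DW_OP_addr: 0x%llx", (unsigned long long)u);
        } else if (c == 0x0c) {                              /* DW_OP_const4u */
            if (n - i < 4) return -1;
            u = read_le(p + i, 4);
            i += 4;
            snprintf(op, sizeof op, "DW_OP_const4u: %llu", (unsigned long long)u);
        } else if (c == 0x10 || c == 0x23 || c == 0x90) {   /* constu, plus_uconst, regx */
            if (leb(p, n, &i, 0, &u)) return -1;
            snprintf(op, sizeof op, "%s: %llu",
                     c == 0x10 ? "DW_OP_constu" : c == 0x23 ? "DW_OP_plus_uconst" : "DW_OP_regx",
                     (unsigned long long)u);
        } else if (c >= 0x30 && c <= 0x4f) {
            snprintf(op, sizeof op, "DW_OP_lit%d", c - 0x30);
        } else if (c >= 0x50 && c <= 0x6f) {
            snprintf(op, sizeof op, "DW_OP_reg%d", c - 0x50);
        } else if ((c >= 0x70 && c <= 0x8f) || c == 0x91) { /* bregN, fbreg */
            if (leb(p, n, &i, 1, &u)) return -1;
            if (c == 0x91) snprintf(op, sizeof op, "DW_OP_fbreg: %lld", (long long)(int64_t)u);
            else snprintf(op, sizeof op, "DW_OP_breg%d: %lld", c - 0x70, (long long)(int64_t)u);
        } else if (c == 0x92) {                              /* DW_OP_bregx */
            if (leb(p, n, &i, 0, &reg) || leb(p, n, &i, 1, &u)) return -1;
            snprintf(op, sizeof op, "DW_OP_bregx: %llu %lld",
                     (unsigned long long)reg, (long long)(int64_t)u);
        } else if (c == 0x94) {                              /* DW_OP_deref_size */
            if (i >= n) return -1;
            snprintf(op, sizeof op, "DW_OP_deref_size: %u", (unsigned)p[i++]);
        } else if (c == 0x18) strcpy(op, "DW_OP_xderef");
        else if (c == 0x22) strcpy(op, "DW_OP_plus");
        else if (c == 0x96) strcpy(op, "DW_OP_nop");
        else if (c == 0x9f) strcpy(op, "DW_OP_stack_value");
        else return -1;
        append(out, cap, &len, op);
    }
    return 0;
}
```

For example, the two bytes 91 7c decode as a frame-base offset of -4. The SLEB128 byte 0x7c has bit 6 set, so it sign-extends to -4.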

Form Value Reader: sub_1D1B540

The form value reader (sub_1D1B540 at 0x1D1B540, 9,243 bytes, 353 lines) reads and formats a single DWARF attribute value based on its DW_FORM code. This is the central dispatch for all attribute value decoding.

Signature

int64_t dwarf_read_form_value(
    dwarf_context* ctx,       // a1: parsing context (offset +52 = addr size, +56 = section name)
    void*          allocator, // a2: memory allocator context
    uint16_t       form,      // a3: DW_FORM_* code
    string_buf*    output,    // a4: output buffer for formatted value
    int64_t        reserved1, // a5
    int64_t        reserved2, // a6
    void*          data,      // a7: raw byte stream
    uint64_t       data_size, // a8: remaining bytes
    int64_t        slice_ctx  // a9: slice/validation context
);
// Returns number of bytes consumed from the data stream.

Form Dispatch Table

Form                                          Bytes Consumed          Reader                                          Output Format
----                                          --------------          ------                                          -------------
DW_FORM_addr (1) / DW_FORM_ref_addr (16)      4 or 8 (address size)   sub_1D17560 (4-byte) or sub_1D192F0 (8-byte)    %x or %llx
DW_FORM_block2 (3)                            2 + N                   sub_1D18B20 (read uint16 length), then N bytes  %5d byte block: %2x %2x ...
DW_FORM_block4 (4)                            4 + N                   sub_1D17560 (read uint32 length), then N bytes  %10d byte block: %2x %2x ...
DW_FORM_data2 (5)                             2                       sub_1D18B20                                     0x%llx
DW_FORM_data4 (6)                             4                       sub_1D17560                                     0x%llx
DW_FORM_data8 (7)                             8                       sub_1D192F0                                     0x%llx
DW_FORM_string (8) / DW_FORM_strp (14)        strlen+1                sub_1D18B80 + sub_1D175B0                       %s
DW_FORM_block (9)                             ULEB128 + N             sub_1D229C0 (read ULEB128 length), then N bytes %20lld byte block: %2x ...
DW_FORM_block1 (10)                           1 + N                   sub_1D17510 (read uint8 length), then N bytes   %3d byte block: %2x ...
DW_FORM_data1 (11)                            1                       sub_1D17510                                     0x%llx
DW_FORM_flag (12)                             1                       sub_1D19350                                     %d
DW_FORM_sdata (13)                            LEB128                  sub_1D22B50 (signed LEB128)                     %lld
DW_FORM_udata (15) / DW_FORM_ref_udata (21)   ULEB128                 sub_1D229C0 (unsigned LEB128)                   %llu
DW_FORM_ref1 (17)                             1                       sub_1D17510                                     <%x>
DW_FORM_ref2 (18)                             2                       sub_1D18B20                                     <%x>
DW_FORM_ref4 (19)                             4                       sub_1D17560                                     <%x>
DW_FORM_ref8 (20)                             8                       sub_1D192F0                                     <%llx>
DW_FORM_indirect (22)                         --                      --                                              Fatal: exit(1)

For block forms (DW_FORM_block1, DW_FORM_block4, DW_FORM_block), after printing the hex dump the reader also invokes the DW_OP expression decoder (sub_1D1A920) to produce a human-readable interpretation. The decoded expression is appended in parentheses: (%s).

The DW_FORM_block1 reader has an additional dispatch based on the attribute code: for location-related attributes (DW_AT_location = 2, DW_AT_data_member_location = 56, DW_AT_stride_size = 46, DW_AT_address_class = 51, and PGI attributes 14848--14850), it invokes sub_1D1A920 with the block contents. For DW_AT_data_member_location specifically, it passes the data through a different slice path to handle the member offset encoding.

Encountering DW_FORM_indirect triggers a fatal error with exit(1) and the message:

Warning: we should not get here! - DW_FORM_indirect

Any unrecognized form code triggers:

Error in get_form_value default
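The "Bytes Consumed" column can be expressed as a size function. A DIE walker can use such a function to skip values it does not need to format. The sketch below is hypothetical (form_value_size is an invented name) and differs from the table in two respects:

  • It follows the DWARF-2 rule for DW_FORM_ref_addr (address-sized).
  • It follows the 32-bit DWARF rule for DW_FORM_strp, a 4-byte .debug_str offset, rather than the strlen+1 grouping shown in the table.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Length of a ULEB128/SLEB128 field (both end at the first byte without 0x80). */
static int uleb_len(const uint8_t *p, size_t n, size_t *len, uint64_t *val)
{
    uint64_t v = 0; unsigned shift = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t b = p[i];
        if (shift < 64) v |= (uint64_t)(b & 0x7f) << shift;
        shift += 7;
        if (!(b & 0x80)) { *len = i + 1; *val = v; return 0; }
    }
    return -1;
}

/* Bytes occupied by one value of `form`, or -1 for DW_FORM_indirect,
 * unknown forms, or data that would overrun the n available bytes. */
long form_value_size(unsigned form, unsigned addr_size, const uint8_t *p, size_t n)
{
    size_t hdr, len;
    uint64_t v;
    long fixed;
    switch (form) {
    case 1: case 16:           fixed = (long)addr_size; break; /* addr, ref_addr */
    case 11: case 12: case 17: fixed = 1; break;  /* data1, flag, ref1 */
    case 5: case 18:           fixed = 2; break;  /* data2, ref2 */
    case 6: case 19: case 14:  fixed = 4; break;  /* data4, ref4, strp (32-bit DWARF) */
    case 7: case 20:           fixed = 8; break;  /* data8, ref8 */
    case 8: {                                     /* inline string including NUL */
        const uint8_t *z = n ? memchr(p, 0, n) : NULL;
        return z ? (long)(z - p) + 1 : -1;
    }
    case 13: case 15: case 21:                    /* sdata, udata, ref_udata */
        return uleb_len(p, n, &len, &v) ? -1 : (long)len;
    case 10: case 3: case 4:                      /* block1, block2, block4 */
        hdr = form == 10 ? 1 : form == 3 ? 2 : 4;
        if (n < hdr) return -1;
        v = 0;
        for (size_t k = 0; k < hdr; k++) v |= (uint64_t)p[k] << (8 * k);
        goto block;
    case 9:                                       /* block: ULEB128 length */
        if (uleb_len(p, n, &hdr, &v)) return -1;
        goto block;
    default:
        return -1;                                /* 22 (indirect) is fatal in nvlink */
    }
    return (size_t)fixed <= n ? fixed : -1;
block:
    return v <= n - hdr ? (long)(hdr + v) : -1;   /* length prefix + block bytes */
}
```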

Compilation Unit Parser: sub_1D1D2F0

The .debug_info section parser (sub_1D1D2F0 at 0x1D1D2F0, 9,191 bytes, 397 lines) iterates over compilation units (CUs) and dispatches to the DIE tree walker. This is called through a thin wrapper sub_1D1DAE0 which normalizes the parameter order.

Signature

int dwarf_parse_debug_info(
    dwarf_context*  ctx,           // a1: DWARF context
    uint64_t        section_size,  // a2: total .debug_info size
    int64_t         string_table,  // a3: .debug_str base address
    const char*     section_name,  // a4: ".debug_info" or ".nv_debug_info_ptx"
    void*           allocator,     // a5: memory allocator
    uint8_t         alloc_flags,   // a6
    void*           data,          // a7: raw .debug_info bytes
    uint64_t        data_size,     // a8: byte count
    uint8_t         data_valid,    // a9: computed from a7 != NULL && a8 != 0
    uint8_t         verbose        // a10: 1 = print compilation unit headers
);

Compilation Unit Header

Each compilation unit starts with an 11-byte header (DWARF-2/3 32-bit format):

Offset   Size   Field
------   ----   -----
+0       4      unit_length: total bytes after this field (excludes the 4-byte length itself)
+4       2      version: DWARF version (2 or 3)
+6       4      debug_abbrev_offset: byte offset into .debug_abbrev for this CU's table
+10      1      address_size: pointer size (4 or 8)

The parser reads these fields and stores them in the context:

ctx->cu_total_length    = unit_length;       // +192
ctx->cu_header_size     = 11;                // +196 (constant for DWARF-2/3)
ctx->cu_content_length  = unit_length;       // +200
ctx->cu_version         = version;           // +204
ctx->cu_address_size    = address_size;      // +208
ctx->cu_abbrev_offset   = abbrev_offset;     // +212

When verbose mode is active (a10 != 0), the parser prints:

 Compilation Unit @ offset 0x%zx:
  Length:           %d
  Version:          %d
  Abbrev Offset:    %d
  Pointer Size:     %d

Abbreviation Table Matching

After reading the CU header, the parser scans the abbreviation table (stored at ctx+128) to find the first entry whose byte offset matches the CU's debug_abbrev_offset. This establishes the base index for abbreviation lookups within this CU:

for (int i = 1; i <= ctx->abbrev_count; i++) {
    abbrev_entry* entry = &ctx->abbrev_table[i];
    if (entry->section_offset == abbrev_offset) {
        ctx->matched_abbrev_base = i - 1;  // +216
        break;
    }
}

DIE Tree Dispatch

After CU header parsing, the function reads the first ULEB128 from the CU content (the root DIE's abbreviation number) and allocates a 48-byte record for the CU's data pointers, then dispatches to sub_1D1BE80 (the DIE tree walker) if the section name matches either ".debug_info" or ".nv_debug_info_ptx".

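After processing one CU, the parser advances the data pointer by unit_length - 7 bytes and loops to the next CU, continuing until all data is consumed. The 11-byte header has already been read, and unit_length excludes its own 4-byte field. The advance therefore lands exactly on the next CU header: 11 + (unit_length - 7) = 4 + unit_length bytes from the start of the current CU.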
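The header read and the advance arithmetic can be sketched as a CU iterator over a raw .debug_info buffer. The sketch assumes the 32-bit DWARF-2/3 format only; a 0xFFFFFFFF 64-bit marker is simply rejected as malformed. The names cu_header and next_cu are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t unit_length;    /* bytes after the 4-byte length field */
    uint16_t version;
    uint32_t abbrev_offset;
    uint8_t  address_size;
    size_t   offset;         /* offset of this CU header in .debug_info */
} cu_header;

static uint32_t le32(const uint8_t *p)
{
    return p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Read the CU header at *pos and move *pos to the next CU.
 * Returns 1 if a header was read, 0 at end of section, -1 if malformed. */
int next_cu(const uint8_t *p, size_t n, size_t *pos, cu_header *h)
{
    size_t start = *pos;
    if (start == n) return 0;
    if (n - start < 11) return -1;
    h->offset        = start;
    h->unit_length   = le32(p + start);
    h->version       = (uint16_t)(p[start + 4] | p[start + 5] << 8);
    h->abbrev_offset = le32(p + start + 6);
    h->address_size  = p[start + 10];
    if (h->unit_length < 7 || h->unit_length > n - start - 4) return -1;
    /* Skip the remaining unit_length - 7 bytes after the 11-byte header. */
    *pos = start + 11 + (h->unit_length - 7);
    return 1;
}
```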

DIE Tree Walker: sub_1D1BE80

The DIE tree walker (sub_1D1BE80 at 0x1D1BE80, 27,583 bytes, 1,059 lines) recursively processes all Debug Information Entries within a compilation unit. For each DIE it:

  1. Reads the abbreviation number (ULEB128).
  2. Looks up the abbreviation entry to determine the DW_TAG, has-children flag, and attribute list.
  3. For each attribute, calls the form value reader (sub_1D1B540) to consume and format the value.
  4. If the DIE has children, recurses to process child DIEs.
  5. A zero abbreviation number signals the end of a sibling chain (returns to parent scope).

In verbose mode, each DIE is printed as:

 <%d><%x>:  Abbrev Number: %d   (0x%02x %s)

where the fields are nesting depth, byte offset from CU start, abbreviation number, DW_TAG code, and DW_TAG name.

The walker recognizes DW_TAG values 5 (DW_TAG_formal_parameter) and 52 (DW_TAG_variable) as special cases for tracking function parameter and variable debug information through a separate codepath.
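The walk described above can be sketched in a few dozen lines. For brevity this version tracks depth with a counter instead of recursing as sub_1D1BE80 does. The two approaches visit DIEs in the same order.

  • The abbreviation record is simplified to a list of DW_FORM codes.
  • Only a few fixed forms, DW_FORM_string, and DW_FORM_udata are supported.
  • All names are hypothetical.
  • Zero codes at depth 0 are treated as padding, which is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t code, tag;
    uint8_t  has_children;
    size_t   nforms;
    const uint8_t *forms;    /* DW_FORM code of each attribute, in order */
} abbrev;

typedef void (*die_visitor)(int depth, size_t offset, uint32_t code,
                            uint32_t tag, void *user);

static int uleb(const uint8_t *p, size_t n, size_t *pos, uint64_t *out)
{
    uint64_t v = 0;
    unsigned shift = 0;
    while (*pos < n) {
        uint8_t b = p[(*pos)++];
        if (shift < 64) v |= (uint64_t)(b & 0x7f) << shift;
        shift += 7;
        if (!(b & 0x80)) { *out = v; return 0; }
    }
    return -1;
}

/* Skip one attribute value; returns 0, or -1 if unsupported or truncated. */
static int skip_value(uint8_t form, const uint8_t *p, size_t n, size_t *pos)
{
    size_t size;
    uint64_t ignored;
    switch (form) {
    case 0x0b: case 0x0c: size = 1; break;     /* data1, flag */
    case 0x05:            size = 2; break;     /* data2 */
    case 0x06: case 0x0e: size = 4; break;     /* data4, strp */
    case 0x07:            size = 8; break;     /* data8 */
    case 0x08: {                               /* inline NUL-terminated string */
        const uint8_t *z = *pos < n ? memchr(p + *pos, 0, n - *pos) : NULL;
        if (!z) return -1;
        *pos = (size_t)(z - p) + 1;
        return 0;
    }
    case 0x0f: return uleb(p, n, pos, &ignored);   /* udata */
    default:   return -1;
    }
    if (size > n - *pos) return -1;
    *pos += size;
    return 0;
}

/* Returns the number of DIEs visited, or -1 on malformed input. */
long walk_dies(const uint8_t *p, size_t n, const abbrev *tab, size_t ntab,
               die_visitor visit, void *user)
{
    size_t pos = 0;
    int depth = 0;
    long count = 0;
    while (pos < n) {
        size_t die_offset = pos;
        uint64_t code;
        if (uleb(p, n, &pos, &code)) return -1;
        if (code == 0) {                       /* end of a sibling chain */
            if (depth > 0) depth--;
            continue;
        }
        const abbrev *a = NULL;
        for (size_t i = 0; i < ntab; i++)
            if (tab[i].code == code) { a = &tab[i]; break; }
        if (!a) return -1;                     /* unknown abbreviation number */
        if (visit) visit(depth, die_offset, a->code, a->tag, user);
        count++;
        for (size_t k = 0; k < a->nforms; k++)
            if (skip_value(a->forms[k], p, n, &pos)) return -1;
        if (a->has_children) depth++;          /* following DIEs are children */
    }
    return count;
}
```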

LEB128 Codec Subsystem

The LEB128 codec provides the variable-length integer encoding and decoding used throughout the DWARF subsystem. Most of its functions lie in 0x1D00000--0x1D0FFF0. The table below also includes a few outside that range: sub_1CFEDC0, sub_1D10120, sub_1D13C80, and sub_1D238D0. The codec has four implementation tiers:

Function      Address     Size       Description
--------      -------     ----       -----------
sub_1CFEDC0   0x1CFEDC0   55,417 B   LEB128 encoder, 32-bit ELF target
sub_1D00790   0x1D00790   54,711 B   LEB128 encoder, 64-bit ELF target
sub_1D02320   0x1D02320   25,838 B   LEB128 decoder, simple variant
sub_1D03090   0x1D03090   27,414 B   LEB128 decoder, with validation
sub_1D05880   0x1D05880   53,217 B   ULEB128 encoder
sub_1D07900   0x1D07900   28,383 B   ULEB128 decoder
sub_1D08D90   0x1D08D90   53,282 B   SSE-accelerated LEB128 encoder
sub_1D0DFD0   0x1D0DFD0   69,653 B   SSE-accelerated LEB128 decoder
sub_1D10120   0x1D10120   69,928 B   SSE-accelerated signed LEB128 decoder
sub_1D13C80   0x1D13C80   48,315 B   SSE bulk LEB128 encoder
sub_1D238D0   0x1D238D0   31,937 B   Multi-pass LEB128 encoder
sub_1D0AF40   0x1D0AF40   17,016 B   LEB128 lookup table initializer
sub_1D0B9A0   0x1D0B9A0   16,630 B   Compact LEB128 encoder for small values

The Size figures exceed the spacing between consecutive start addresses. For instance, sub_1D00790 and sub_1D02320 are only 0x1B90 = 7,056 bytes apart. These figures therefore most likely measure decompiled output rather than machine code.

Inline Decoders Used by DWARF

The DWARF parser calls three specific ULEB128/SLEB128 decoders for individual values:

  • sub_1D229C0 -- ULEB128 decoder. Returns the decoded unsigned value and stores the number of bytes consumed in an output parameter. Used for abbreviation numbers, form lengths, and unsigned data.
  • sub_1D22B50 -- SLEB128 (signed LEB128) decoder. Returns the decoded signed value. Used for DW_FORM_sdata and DW_OP_fbreg/DW_OP_breg* offsets.
  • sub_1D1FAD0 -- Another ULEB128 decoder variant, used in the abbreviation parser. It returns the decoded value and stores the consumed byte count through an out-parameter that the decompiler typed as pthread_mutexattr_t*. The type is most likely an artifact of a reused struct, not a real mutex attribute.
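For reference, the standard ULEB128 and SLEB128 decoding algorithms are shown below in the shape these helpers are described as having: a return value plus a consumed-byte count. This is a hypothetical sketch, not decompiled code. The names are invented, and reporting truncation as consumed == 0 is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode ULEB128 from at most n bytes; *consumed = bytes used (0 if truncated). */
uint64_t uleb128_decode(const uint8_t *p, size_t n, size_t *consumed)
{
    uint64_t result = 0;
    unsigned shift = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t b = p[i];
        if (shift < 64) result |= (uint64_t)(b & 0x7f) << shift;  /* 7 payload bits */
        shift += 7;
        if (!(b & 0x80)) { *consumed = i + 1; return result; }   /* no continuation */
    }
    *consumed = 0;
    return 0;
}

/* Decode SLEB128: same groups, then sign-extend from bit 6 of the last byte. */
int64_t sleb128_decode(const uint8_t *p, size_t n, size_t *consumed)
{
    uint64_t result = 0;
    unsigned shift = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t b = p[i];
        if (shift < 64) result |= (uint64_t)(b & 0x7f) << shift;
        shift += 7;
        if (!(b & 0x80)) {
            if (shift < 64 && (b & 0x40))
                result |= ~(uint64_t)0 << shift;
            *consumed = i + 1;
            return (int64_t)result;
        }
    }
    *consumed = 0;
    return 0;
}
```

The textbook encodings E5 8E 26 (ULEB128 for 624485) and C0 BB 78 (SLEB128 for -123456) each decode in 3 bytes.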

SSE Acceleration

The SSE-accelerated encoders and decoders process 16 bytes at a time using SIMD intrinsics (_mm_load_si128, _mm_shuffle_epi8, _mm_and_si128, _mm_or_si128, _mm_srli_epi64). All of these are SSE2 except _mm_shuffle_epi8 (PSHUFB), which requires SSSE3. The routines extract continuation bits in parallel across 16 LEB128 bytes, determine group boundaries, and decode or encode all values in a single pass. They are used for bulk operations on large DWARF sections, not for decoding individual values.

The signed SSE decoder (sub_1D10120, 69,928 bytes -- the largest function in the LEB128 subsystem) additionally handles sign extension for negative values, which requires detecting the sign bit position within each variable-length group.
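The "extract continuation bits in parallel" step can be illustrated with a single SSE2 instruction. _mm_movemask_epi8 gathers the high bit of all 16 bytes into a mask. The intrinsic is chosen here for illustration and is not confirmed to appear in nvlink. Bytes whose bit is clear terminate a value, so counting clear bits gives the number of values ending in the window. A scalar fallback keeps the sketch portable. Both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Bit i of the result is the continuation bit (0x80) of p[i], for 16 bytes. */
uint32_t continuation_mask16(const uint8_t *p)
{
#if defined(__SSE2__)
    __m128i v = _mm_loadu_si128((const __m128i *)p);
    return (uint32_t)_mm_movemask_epi8(v);
#else
    uint32_t m = 0;
    for (int i = 0; i < 16; i++) m |= (uint32_t)(p[i] >> 7) << i;
    return m;
#endif
}

/* Number of LEB128 values whose final byte lies in this 16-byte window. */
unsigned values_ending_in_block(const uint8_t *p)
{
    uint32_t terminators = ~continuation_mask16(p) & 0xFFFFu;
    unsigned count = 0;
    while (terminators) { terminators &= terminators - 1; count++; }
    return count;
}
```

A full decoder would go on to use the terminator positions to split and shuffle the 7-bit groups. That is where instructions such as _mm_shuffle_epi8 come in.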

Helper Functions

Function      Address     Size       Description
--------      -------     ----       -----------
sub_1D16C20   0x1D16C20   ~200 B     Arena allocator wrapper (allocates via context arena)
sub_1D17510   0x1D17510   ~80 B      Read 1 byte (uint8_t) from stream, advance pointer
sub_1D17560   0x1D17560   ~80 B      Read 4 bytes (uint32_t) from stream, advance pointer
sub_1D175B0   0x1D175B0   ~100 B     Copy N bytes from stream to buffer
sub_1D17C10   0x1D17C10   ~120 B     Look up abbreviation entry by index from table
sub_1D18B20   0x1D18B20   ~80 B      Read 2 bytes (uint16_t) from stream, advance pointer
sub_1D18B80   0x1D18B80   ~100 B     Compute string length (strlen on stream)
sub_1D192F0   0x1D192F0   ~100 B     Read 8 bytes (uint64_t) from stream, advance pointer
sub_1D19350   0x1D19350   ~80 B      Read 1 byte as signed, advance pointer
sub_1D193A0   0x1D193A0   ~60 B      Bounds validation helper
sub_1D17460   0x1D17460   ~180 B     NVIDIA register name lookup (for DW_OP_bregx in .debug_frame)
sub_1D17630   0x1D17630   ~3,752 B   LEB128 decoder with 512-byte working buffer
sub_1D1FAD0   0x1D1FAD0   ~200 B     ULEB128 decoder for abbreviation parsing

Pubnames and Pubtypes Emitters

Two functions emit the .debug_pubnames and .debug_pubtypes lookup sections:

  • sub_1D18EA0 (0x1D18EA0, 5,152 bytes) -- .debug_pubnames emitter. Walks the abbreviation table, and for each DIE with a DW_AT_name attribute, emits an entry mapping the name to the DIE offset within .debug_info.

  • sub_1D193E0 (0x1D193E0, 6,101 bytes) -- .debug_pubtypes emitter. Similar structure but emits entries for type DIEs (those with DW_TAG_base_type, DW_TAG_typedef, etc.).

Both follow the DWARF-2/3 pubnames/pubtypes section format. Each set begins with a header containing the unit length, version, offset of the CU within .debug_info, and CU size. The header is followed by (DIE offset, name) pairs, terminated by a zero offset.
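One 32-bit DWARF-2/3 pubnames (or pubtypes) set can be serialized as follows. The layout is:

  • unit_length (4 bytes)
  • version = 2 (2 bytes)
  • debug_info_offset (4 bytes)
  • debug_info_length (4 bytes)
  • (DIE offset, NUL-terminated name) pairs
  • a 4-byte zero terminator

This is a hypothetical sketch of the format, not of sub_1D18EA0 or sub_1D193E0. The names pub_entry and write_pub_set are invented.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint32_t die_offset; const char *name; } pub_entry;

static void put_le(uint8_t *p, uint64_t v, int size)
{
    for (int i = 0; i < size; i++) p[i] = (uint8_t)(v >> (8 * i));
}

/* Returns bytes written, or 0 if out is too small. Little-endian output. */
size_t write_pub_set(uint8_t *out, size_t cap, uint32_t cu_offset, uint32_t cu_size,
                     const pub_entry *e, size_t count)
{
    size_t total = 4 + 2 + 4 + 4 + 4;            /* header + terminator */
    for (size_t i = 0; i < count; i++) total += 4 + strlen(e[i].name) + 1;
    if (total > cap) return 0;
    size_t pos = 0;
    put_le(out + pos, total - 4, 4); pos += 4;   /* unit_length excludes itself */
    put_le(out + pos, 2, 2);         pos += 2;   /* version */
    put_le(out + pos, cu_offset, 4); pos += 4;   /* CU offset in .debug_info */
    put_le(out + pos, cu_size, 4);   pos += 4;   /* CU size */
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(e[i].name) + 1;
        put_le(out + pos, e[i].die_offset, 4); pos += 4;
        memcpy(out + pos, e[i].name, len);     pos += len;
    }
    put_le(out + pos, 0, 4); pos += 4;           /* terminating zero offset */
    return pos;
}
```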

Bounds Checking

The DWARF parser performs pervasive bounds checking through a consistent pattern. Each data access is guarded by three assertions on the context's data triple (pointer, capacity, valid_flag):

  1. Null check: if (!pointer) fatal(ASSERT_NOT_NULL);
  2. Valid flag: if (!valid_flag) fatal(ASSERT_VALID);
  3. Bounds check: if (required_offset > capacity) fatal(ASSERT_BOUNDS);

These correspond to the three error codes referenced as dword_2A5F0D0 (null pointer), dword_2A5F0B0 (invalid state), and dword_2A5F0A0 (out of bounds). The assertions are implemented as calls to sub_467460 which is the global diagnostic/assertion handler. This pattern appears on virtually every byte read throughout the DWARF subsystem, giving strong protection against malformed input but contributing significantly to code size.
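The three-step guard can be sketched as a checked read that returns a status instead of calling a fatal handler. The enum values, type names, and function name are hypothetical stand-ins for dword_2A5F0D0, dword_2A5F0B0, and dword_2A5F0A0. The bounds test checks the end of the read in an overflow-safe way.

```c
#include <stddef.h>
#include <stdint.h>

enum read_status { READ_OK = 0, READ_NULL, READ_INVALID, READ_BOUNDS };

typedef struct { const uint8_t *ptr; size_t capacity; uint8_t valid; } data_triple;

/* Read `size` (<= 8) little-endian bytes at `offset`, checking null, valid, bounds. */
enum read_status checked_read_le(const data_triple *d, size_t offset, unsigned size,
                                 uint64_t *out)
{
    if (!d->ptr) return READ_NULL;                     /* 1. null check */
    if (!d->valid) return READ_INVALID;                /* 2. valid flag */
    if (size > 8 || offset > d->capacity || size > d->capacity - offset)
        return READ_BOUNDS;                            /* 3. bounds check */
    uint64_t v = 0;
    for (unsigned i = 0; i < size; i++) v |= (uint64_t)d->ptr[offset + i] << (8 * i);
    *out = v;
    return READ_OK;
}
```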

Cross-References

Internal (nvlink wiki):

  • NVIDIA Debug Extensions -- Six proprietary debug sections (.nv_debug_*) processed alongside standard DWARF sections
  • Line Table Merging -- DWARF line program merging during linking, including NVIDIA extended opcodes
  • Mercury Debug Sections -- Mercury-format debug sections with .nv.merc.* prefix and the Mercury section dispatcher
  • Debug Options -- CLI flags controlling debug section emission (-g, --no-debug, debug section output matrix)
  • Mercury ELF Sections -- The 11 standard DWARF mirrors under the .nv.merc.* namespace
  • Error Reporting -- sub_467460 diagnostic handler used by DWARF bounds-check assertions
  • Section Merging -- How debug sections are classified and routed during merge_elf

Sibling wikis:

The debug information lifecycle spans three toolchain components: cicc, ptxas, and nvlink. The upstream generation stages are documented in:

  • ptxas: Debug Info -- DWARF line table generation (PTX-level and SASS-level), .nv_debug_info_reg_sass/.nv_debug_info_reg_type emission, Mercury debug section classifiers, and the --device-debug/--lineinfo flag semantics within ptxas
  • cicc: Debug Info Pipeline -- Four-stage debug metadata lifecycle from CUDA source through the LLVM optimizer to PTX .loc/.file directives. Covers the three compilation modes (-g, -generate-line-info, neither), the five stripping passes, and the NVVM container DebugInfo enum (NONE/LINE_INFO/DWARF)

nvlink's DWARF processing subsystem consumes, directly or indirectly, the output of both upstream stages. cicc produces PTX with @@DWARF directives and .loc/.file metadata. ptxas compiles this to SASS and emits the standard and NVIDIA-proprietary debug sections. nvlink then merges and re-emits these sections during linking.

Confidence Assessment

| Claim | Confidence | Evidence |
|---|---|---|
| DWARF subsystem at 0x1D10000--0x1D20570 | HIGH | All key functions (sub_1D166F0, sub_1D17C90, sub_1D16C60, sub_1D16DF0, sub_1D1A920, sub_1D1B540, sub_1D1BE80, sub_1D1D2F0) confirmed present in decompiled/ at exact addresses |
| Top-level entry sub_1D166F0 allocates 95,968-byte context | HIGH | Decompiled code: malloc(0x176E0u) at line 31, and 0x176E0 = 95,968 decimal |
| Magic value 38110068 at context offset +224 | HIGH | Decompiled sub_1D1D2F0: *(_QWORD *)(a1 + 224) = 38110068 at line 69 |
| CU header size = 11 bytes (DWARF-2/3) | HIGH | Decompiled sub_1D1D2F0: *(_DWORD *)(v14 + 196) = 11 at line 267 |
| Context offsets +192 through +216 for CU fields | HIGH | Decompiled code stores to +192, +196, +200, +204, +208, +212, +216 exactly as documented |
| Abbreviation table 2048 bytes initial, 32 bytes per entry | HIGH | Decompiled sub_1D17C90 exists at exact address; string "unexpectedly too many dwarf attributes for any DW_TAG entry!" confirmed in strings at 0x245DD70 |
| DW_FORM name lookup sub_1D16C60 -- 22 forms | HIGH | Decompiled file present; string "Unknown FORM value %d" at 0x245D5B4 |
| DW_AT vendor extensions (MIPS, GNU, NV, PGI) | HIGH | Attribute name strings from all four vendors confirmed: DW_AT_MIPS_linkage_name at 0x245DB8A, DW_AT_GNU_pubnames at 0x245DBA2, DW_AT_NV_general_flags at 0x245DBF7, DW_AT_PGI_lbase/soffset/lstride at 0x245DBB5--0x245DBD7 |
| DW_AT_MIPS_linkage_name priority over DW_AT_name | MEDIUM | String evidence confirms the attribute exists; the priority logic is inferred from decompiled sub_1D1BE80 (a 1,059-line function too complex to verify fully, but its attribute dispatch structure is consistent) |
| DW_OP expression decoder sub_1D1A920 opcodes | HIGH | All DW_OP format strings confirmed in strings: DW_OP_addr, DW_OP_constu, DW_OP_const4u, DW_OP_xderef, DW_OP_breg%d, DW_OP_fbreg, DW_OP_deref_size, DW_OP_lit%u, DW_OP_reg%d, DW_OP_stack_value, DW_OP_plus_uconst at addresses 0x245DEE0--0x245DFAC |
| .nv_debug_info_ptx processed by CU parser | HIGH | String .nv_debug_info_ptx at 0x245E6D4 with xref into sub_1D1D2F0 |
| Section type classifier sub_12D4370 assigns IDs 1--6 | HIGH | Decompiled file present at exact address |
| Bounds checking pattern with three error codes | HIGH | Decompiled sub_1D1D2F0 calls sub_467460(dword_2A5F0D0), sub_467460(dword_2A5F0B0), sub_467460(dword_2A5F0A0) exactly as documented |
| LEB128 codec subsystem with SSE acceleration | MEDIUM | Function addresses confirmed in decompiled/; the SSE claim rests on function sizes (50--70 KB), which are consistent with SIMD loop unrolling; individual SSE instructions were not verified in the decompiled output |
| DWARF versions 2 and 3 only (no 4 or 5) | HIGH | String "Dwarf version %d is not supported" at 0x1DFC8C8 confirms version validation; DWARF-4/5 forms (DW_FORM_sec_offset, DW_FORM_exprloc) are absent from the form table |
| DW_FORM_indirect triggers exit(1) | MEDIUM | String "Warning: we should not get here! - DW_FORM_indirect" was not found by a strings search, and confirming the claim would require a full reading of decompiled sub_1D1B540; the claim is plausible given the function's error-handling pattern |
| Verbose mode printing format strings | HIGH | Format strings "Compilation Unit @ offset 0x%zx:" at 0x245E6E8, "Abbrev Offset: %d" at 0x245E6A4, and "Contents of the .debug_abbrev section:" at 0x245DD48 all confirmed |
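The 11-byte CU header size recorded for sub_1D1D2F0 follows from the 32-bit DWARF-2/3 header layout. That layout is a 4-byte unit_length, a 2-byte version, a 4-byte .debug_abbrev offset, and a 1-byte address size: 4 + 2 + 4 + 1 = 11. The minimal sketch below parses that layout. The names (struct cu_header, CU_HEADER_SIZE, parse_cu_header) are hypothetical. Its version and address-size checks reflect the supported versions (2 and 3) and address sizes (4 and 8) described on this page, not nvlink's exact control flow.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit DWARF-2/3 compilation unit header. */
struct cu_header {
    uint32_t unit_length;    /* bytes that follow this field */
    uint16_t version;        /* 2 or 3 */
    uint32_t abbrev_offset;  /* offset into .debug_abbrev */
    uint8_t  address_size;   /* 4 or 8 */
};

#define CU_HEADER_SIZE (4 + 2 + 4 + 1)   /* = 11 bytes */

static uint32_t rd32le(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Hypothetical parser: returns 0 on success, -1 on a short buffer, a
 * reserved unit_length, an unsupported version, or an unsupported
 * address size. */
static int parse_cu_header(const uint8_t *buf, size_t len, struct cu_header *h)
{
    if (len < CU_HEADER_SIZE)
        return -1;
    h->unit_length = rd32le(buf);
    if (h->unit_length >= 0xfffffff0u)
        return -1;   /* reserved range; 0xffffffff would introduce 64-bit DWARF */
    h->version = (uint16_t)(buf[4] | buf[5] << 8);
    if (h->version != 2 && h->version != 3)
        return -1;   /* cf. "Dwarf version %d is not supported" */
    h->abbrev_offset = rd32le(buf + 6);
    h->address_size = buf[10];
    if (h->address_size != 4 && h->address_size != 8)
        return -1;
    return 0;
}
```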