Capsule Mercury Format

Capsule Mercury (capmerc) is the default binary output format for SM100+ targets in nvlink v13.0.88. It replaces the traditional cubin (CUDA binary) format with a two-layer ELF structure: the outer ELF is a standard CUDA device ELF with e_machine = 190 (EM_CUDA), but it carries an inner set of .nv.merc.* sections that encode Mercury-native instruction streams, Mercury-specific relocations, debug information, and ISA metadata. The CUDA driver reconstitutes finalized SASS from these Mercury sections at load time (or the linker can do it ahead of time via --self-check), enabling the driver to apply architecture-specific fixups and optimizations that were impossible with the flat SASS cubin model.

The name "Capsule Mercury" comes from the encapsulation metaphor: Mercury-format data is encapsulated inside a CUDA ELF container. The "capmerc" abbreviation appears consistently in CLI flags, output filenames, and internal string references.

Key Facts

| Property | Value |
| --- | --- |
| CLI selection | --binary-kind=capmerc (default for sm100+) |
| Other valid kinds | mercury (legacy), sass (flat SASS) |
| Option parser | sub_4AC380 at 0x4AC380 (9,967 bytes, 429 lines) |
| ELF type | 0xFF00 (ET_LOPROC -- processor-specific, distinct from ET_EXEC=2) |
| Default output name | capmerc.cubin (vs sass.cubin for SASS kind) |
| Activation flag | byte_2A5F222 = 1 when arch > sm_99 |
| capmerc-specific flag | byte_2A5F225 = 1 |
| Self-check flag | --self-check via sub_4AC380 |
| FNLZR entry | sub_4275C0 (post-link) / sub_4748F0 (engine) |
| SASS reconstitution | sub_5207A0 (18,673 bytes, 784 lines) |
| Fatbin member type | 16 (distinct from cubin=2, PTX=1, NVVM=8) |

Binary Kind Selection

The --binary-kind flag is registered in sub_4AC380 with the help text:

Specify the type of target ELF binary kind. Default on sm100+ is capmerc

Three values are accepted: mercury, capmerc, and sass. The capmerc value is the default when byte_2A5F222 is set (architecture > sm_99). The mercury value produces a legacy Mercury ELF without the capsule wrapping. The sass value produces a traditional flat SASS cubin identical to pre-Blackwell output.

Internally, option parsing in sub_427AE0 sets byte_2A5F222 = 1 and byte_2A5F225 = 1 for any architecture with SM > 99. These globals gate every Mercury-specific code path in the linker. When byte_2A5F225 is set, the output path in main() serializes to memory (via sub_45C950) rather than directly to file (via sub_45C920), because the FNLZR post-link transform requires a complete in-memory ELF image.
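The default-selection rule described above can be sketched as follows. This is a minimal illustration, not nvlink's code: the function name and the fallback to "sass" for architectures at or below sm_99 are assumptions made for the example.

```python
from typing import Optional

VALID_KINDS = ("mercury", "capmerc", "sass")

def resolve_binary_kind(sm: int, requested: Optional[str] = None) -> str:
    """Pick the output kind (hypothetical helper, not an nvlink symbol)."""
    if requested is not None:
        if requested not in VALID_KINDS:
            raise ValueError(f"unknown binary kind: {requested}")
        return requested
    # Documented default: capmerc when the architecture is above sm_99.
    # Returning "sass" otherwise is an assumption of this sketch.
    return "capmerc" if sm > 99 else "sass"
```

An explicit --binary-kind value wins over the architecture-derived default in this sketch, which matches the way the three accepted values are described.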

| Option | Help text | Function |
| --- | --- | --- |
| --binary-kind | Target ELF binary kind | Selects mercury/capmerc/sass |
| --cap-merc | Generate Capsule Mercury | Boolean, same effect as --binary-kind=capmerc |
| --self-check | Self check for capsule mercury (capmerc) | Validates round-trip reconstitution |
| --out-sass | Output reconstituted SASS | Writes SASS from capmerc only via --self-check |
| --fastpath-off | Turns off the fast-path finalization optimization | Disables cross-family fast finalization |
| --opportunistic-finalization-lvl | Opportunistic finalization level (0-3) | Controls finalization scope |
| --asatentrypatch | Compile patch as at entry fragment | Entry-patch compilation mode |

Capmerc vs Traditional Cubin

A traditional cubin (SASS kind) is a fully resolved ELF: every .text.* section contains final machine code with all relocations applied, all instruction encodings fixed, and all control-flow addresses resolved. The driver loads it and maps it directly to GPU memory.

A capmerc binary defers the final instruction encoding step. Instead of storing finalized SASS, each function's .text.* section is accompanied by a set of .nv.merc.* sections that carry:

  1. Mercury intermediate code -- instruction streams in Mercury's scheduling-friendly representation, where instructions carry symbolic operand references and control-flow metadata rather than fully resolved bit-level encodings.

  2. Mercury relocations -- R_MERCURY_* relocation types (see R_MERCURY Relocations) that the driver's finalizer resolves at load time. These are distinct from R_CUDA_* relocations and use a dedicated relocation table (.nv.merc.rela).

  3. Mercury debug sections -- DWARF-like debug data under the .nv.merc.* namespace, including line tables, abbreviation tables, and register-level debug info.

  4. Mercury ISA metadata -- EIATTR_MERCURY_ISA_VERSION and EIATTR_MERCURY_FINALIZER_OPTIONS attributes that tell the driver which Mercury ISA version the code targets and what finalizer options were used at compile time.

  5. Compatibility attributes -- EICOMPAT_ATTR_MERCURY_ISA_MAJOR_MINOR_VERSION, EICOMPAT_ATTR_MERCURY_ISA_PATCH_VERSION, and EICOMPAT_ATTR_CAN_FASTPATH_FINALIZE for driver-side version matching.

The deferred encoding enables the driver to:

  • Apply architecture-variant-specific instruction scheduling (sm_100 vs sm_100a vs sm_100f).
  • Insert or remove NOPs based on final instruction addresses (control-flow-dependent padding).
  • Apply warp-level optimizations that require knowledge of the final binary layout.
  • Support forward compatibility via the "f" architecture variants (sm_100f, sm_103f, etc.).

Structural Comparison

| Aspect | Traditional cubin (sass) | Capsule Mercury (capmerc) |
| --- | --- | --- |
| ELF type | ET_EXEC (2) | 0xFF00 |
| Text sections | Final SASS bytes | Mercury intermediate + SASS stub |
| Relocations | R_CUDA_* (resolved) | R_CUDA_* (resolved) + R_MERCURY_* (deferred) |
| Debug sections | .debug_* | .debug_* + .nv.merc.debug_* |
| Output filename | sass.cubin | capmerc.cubin |
| Driver finalization | None | Required (reconstitutes SASS) |
| Forward compatible | No | Yes (via "f" variants) |
| Fatbin member type | default (cubin) | 16 (mercury/capmerc) |

Mercury Sections (.nv.merc.*)

The inner Mercury payload is distributed across multiple ELF sections with the .nv.merc prefix. These sections are created by the compiler backend (ptxas/cicc) and preserved through linking by nvlink's merge phase. During merge (sub_45E7D0), Mercury sections are identified and either copied or skipped depending on the linking mode -- the string "skip mercury section %i" appears when a Mercury section is not relevant to the current merge pass.

All .nv.merc.* sections carry the SHF_NV_MERC flag: bit 28 of sh_flags (0x10000000). This NVIDIA extension flag serves two purposes: (1) fast O(1) rejection of non-merc sections during classification, and (2) namespace separation during section index remapping (sub_1C99BB0). The finalizer uses this flag to identify which sections require relocation patching during off-target finalization.
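The O(1) rejection mentioned above amounts to a single bit test on sh_flags. A minimal sketch, with a hypothetical helper name:

```python
SHF_NV_MERC = 0x10000000  # bit 28 of sh_flags, as documented above

def is_merc_section(sh_flags: int) -> bool:
    """Return True when the NVIDIA Mercury flag is set (illustrative helper)."""
    return (sh_flags & SHF_NV_MERC) != 0
```

Because the test does not look at the section name, it can run before any string comparison is attempted.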

Section Catalog

| Section name | Description |
| --- | --- |
| .nv.merc | Mercury code payload (per-function) |
| .nv.merc.rela | Mercury-specific relocation table |
| .nv.merc.symtab_shndx | Extended symbol section indices for Mercury |
| .nv.merc.debug_abbrev | DWARF abbreviation table (Mercury) |
| .nv.merc.debug_aranges | DWARF address ranges (Mercury) |
| .nv.merc.debug_frame | DWARF call frame info (Mercury) |
| .nv.merc.debug_info | DWARF debug info entries (Mercury) |
| .nv.merc.debug_loc | DWARF location lists (Mercury) |
| .nv.merc.debug_macinfo | DWARF macro info (Mercury) |
| .nv.merc.debug_pubnames | DWARF public names (Mercury) |
| .nv.merc.debug_pubtypes | DWARF public types (Mercury) |
| .nv.merc.debug_ranges | DWARF ranges (Mercury) |
| .nv.merc.debug_str | DWARF string table (Mercury) |
| .nv.merc.debug_line | DWARF line number table (Mercury) |
| .nv.merc.nv_debug_line_sass | NVIDIA SASS-level line debug |
| .nv.merc.nv_debug_info_reg_sass | NVIDIA SASS register debug info |
| .nv.merc.nv_debug_info_reg_type | NVIDIA register type debug info |
| .nv.merc.nv_debug_ptx_txt | PTX source text for debug |
| .nv.merc.nv.shared.reserved. | Reserved shared memory (Mercury) |

The Mercury code sections use a naming convention of .nv.merc. followed by a function-relative suffix. The section .nv.merc without further qualification is the prefix used during section lookup in the finalization pipeline (sub_4748F0 references .nv.merc. at four distinct call sites).

Section Name Construction

Mercury section names are constructed by two helper functions:

  • sub_1CEC4C0 -- generic name constructor. Prepends ".nv.merc" to the original section name: sprintf(buf, "%s%s", ".nv.merc", original_name). Allocates strlen(name) + 9 bytes (8 for prefix + NUL).

  • sub_1CEC660 -- constant bank name constructor. Maps .nv.constant* sections to Mercury equivalents with bank-type suffixes. Extracts the bank character from offset 12 of the section name, computes the bank type as char + 0x70000034 (SHT_LOPROC+52), then matches against six known bank types:

| Bank type | Suffix | vtable offset |
| --- | --- | --- |
| Entry image header indices | .entry_image_header_indices | +72, +304 |
| Driver | .driver | +144 |
| Optimizer | .optimizer | +136 |
| User | .user | +192 |
| PIC | .pic | +168 |
| Tools data | .tools_data | +152 |

The composite name is built by sub_1CEC570: e.g., ".nv.merc" + ".nv.constant" + ".user" + bank_name.

Relocation section names are constructed in sub_1CF72E0 as: sprintf(buf, "%s%s%s", ".nv.merc", ".rela", name+8) -- the +8 strips the ".nv.merc" prefix so that ".nv.merc.debug_info" becomes ".nv.merc.rela.debug_info".
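The string arithmetic in these constructors is easy to check by hand. The sketch below reproduces it in Python; the function names are hypothetical stand-ins for sub_1CEC4C0, sub_1CF72E0, and the bank-type computation in sub_1CEC660, and it does not model the mapping from bank type to suffix.

```python
MERC_PREFIX = ".nv.merc"  # 8 characters

def merc_name(original: str) -> str:
    # Equivalent of sprintf(buf, "%s%s", ".nv.merc", original_name)
    return MERC_PREFIX + original

def merc_alloc_size(original: str) -> int:
    # 8 bytes of prefix plus the terminating NUL
    return len(original) + 9

def merc_rela_name(merc_section: str) -> str:
    # name+8 skips the ".nv.merc" prefix before ".rela" is spliced in
    return MERC_PREFIX + ".rela" + merc_section[8:]

def constant_bank_type(section: str) -> int:
    # The bank character sits at offset 12, right after ".nv.constant"
    return ord(section[12]) + 0x70000034
```

For bank character '0' (0x30) the computed type is 0x70000064, which is SHT_LOPROC+100, the first value of the memory-space range used by the classifier.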

Section Classifier -- sub_1CED0E0

The 9,262-byte classifier at 0x1CED0E0 identifies .nv.merc.* sections using a two-stage guard-then-waterfall algorithm identical to the ptxas classifier at sub_1C98C60 (see ptxas: Capsule Mercury & Finalization for the full algorithm description).

Stage 1: sh_type range check with bitmask 0x5D05. The section's sh_type is tested against two processor-specific ranges:

| Range | sh_type span | Qualifying types |
| --- | --- | --- |
| A | 0x70000006..0x70000014 | Filtered by bitmask 0x5D05 (7 specific types) |
| B | 0x70000064..0x7000007E | All accepted (memory-space data) |
| Special | 1 (SHT_PROGBITS) | Accepted (capmerc descriptors, string tables) |

Bitmask 0x5D05 = binary 0101_1101_0000_0101 selects: SHT_LOPROC+6 (bit 0), +8 (bit 2), +14 (bit 8), +16 (bit 10), +17 (bit 11), +18 (bit 12), +20 (bit 14).
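A short sketch of the Stage 1 guard, assuming the bitmask is indexed by sh_type minus 0x70000006 as the bit positions above imply. It models only the three cases in the range table; the names are illustrative and not taken from the binary.

```python
SHT_PROGBITS = 1
RANGE_A_FIRST, RANGE_A_LAST = 0x70000006, 0x70000014
RANGE_B_FIRST, RANGE_B_LAST = 0x70000064, 0x7000007E
TYPE_MASK = 0x5D05

def passes_stage1(sh_type: int) -> bool:
    """Guard check only; name-based disambiguation is Stage 2."""
    if sh_type == SHT_PROGBITS:
        return True
    if RANGE_A_FIRST <= sh_type <= RANGE_A_LAST:
        # Bit n of the mask corresponds to sh_type RANGE_A_FIRST + n.
        return bool((TYPE_MASK >> (sh_type - RANGE_A_FIRST)) & 1)
    return RANGE_B_FIRST <= sh_type <= RANGE_B_LAST

def range_a_offsets() -> list:
    """SHT_LOPROC offsets selected by the mask within range A."""
    return [bit + 6 for bit in range(15) if (TYPE_MASK >> bit) & 1]
```

Calling range_a_offsets() yields the seven offsets listed above, which is a quick way to confirm the binary expansion of the mask.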

Stage 2: Name-based disambiguation. When SHF_NV_MERC (0x10000000) is set in sh_flags, the classifier performs sequential strcmp() against 15+ section names, returning 1 on first match. The check order is: .nv.merc.debug_abbrev, .nv.merc.debug_aranges, .nv.merc.debug_frame, .nv.merc.debug_info, .nv.merc.debug_loc, .nv.merc.debug_macinfo, .nv.merc.debug_pubnames, .nv.merc.debug_pubtypes, .nv.merc.debug_ranges, .nv.merc.debug_str, .nv.merc.nv_debug_info_reg_type, and more.

A companion classifier sub_1CED7C0 (6,757 bytes) performs the same algorithm for non-merc debug sections (.debug_*), using the same 0x5D05 bitmask.

sh_type Map for Mercury Sections

| sh_type | Hex | Section types |
| --- | --- | --- |
| 1 | 0x00000001 | .nv.capmerc<func>, .nv.merc.debug_abbrev, .nv.merc.debug_str, .nv.merc.nv_debug_ptx_txt |
| 4 | 0x00000004 | .nv.merc.rela* (SHT_RELA) |
| 18 | 0x00000012 | .nv.merc.symtab_shndx (SHT_SYMTAB_SHNDX) |
| SHT_LOPROC+6 | 0x70000006 | .nv.merc.<memory-space> clones |
| SHT_LOPROC+8 | 0x70000008 | .nv.merc.nv.shared.reserved |
| SHT_LOPROC+12 | 0x7000000C | Mercury data sections (during output serialization) |
| SHT_LOPROC+13 | 0x7000000D | Mercury debug sections (during output serialization) |
| SHT_LOPROC+14 | 0x7000000E | .nv.merc.debug_line |
| SHT_LOPROC+16 | 0x70000010 | .nv.merc.debug_frame |
| SHT_LOPROC+17 | 0x70000011 | .nv.merc.debug_info |
| SHT_LOPROC+18 | 0x70000012 | .nv.merc.nv_debug_line_sass |
| SHT_LOPROC+20 | 0x70000014 | .nv.merc.debug_loc, .nv.merc.debug_ranges, .nv.merc.nv_debug_info_reg_* |
| SHT_LOPROC+100..+126 | 0x70000064..0x7000007E | Memory-space variant sections (constant banks, shared, local, global) |

The .nv.merc.* debug sections reuse the same sh_type values as their non-merc counterparts. The SHF_NV_MERC flag (0x10000000) in sh_flags is the distinguishing marker.

Capsule Descriptor Layout

The per-function .nv.capmerc<funcname> section contains a 328-byte capsule descriptor. For the full byte-level layout including all 7 field groups (Identity, SASS Data, Relocation Infrastructure, Function Metadata, Code Generation Parameters, Constant Bank Info, KNOBS Embedding), marker stream TLV format, and sub-byte relocation design, see ptxas: Capsule Mercury & Finalization which documents the descriptor format at the compiler-output level. nvlink reads and preserves these descriptors through the linking pipeline without modification.

Production Pipeline

The capmerc output pipeline in main() follows a distinct path from traditional cubin serialization.

Step 1: ELF Construction and Finalization

The standard linking pipeline runs identically for both sass and capmerc output: input parsing, merge (sub_45E7D0), shared memory layout (sub_439830), relocation application (sub_469D60), and ELF finalization (sub_445000). The finalization phase handles the 0xFF00 ELF type with special-case logic for virtual-to-physical section index remapping and Mercury-specific symbol binding rules (see Finalization Phase).

Step 2: Serialize to Memory (sub_45C950)

Instead of writing directly to a file, the Mercury path:

  1. Computes total ELF size via sub_45C980.
  2. Allocates a contiguous memory buffer from the arena.
  3. Calls sub_45C950 which uses the mode-4 (memcpy) polymorphic writer to serialize the complete ELF into the buffer.

This produces an "in-memory-ELF-image" -- the string used as the FNLZR input identifier.

Step 3: FNLZR Post-Link Transform (sub_4275C0 -> sub_4748F0)

The FNLZR (Finalizer) operates on the serialized ELF byte buffer. sub_4275C0 is a 3,989-byte dispatch function that:

  1. Builds a 160-byte configuration struct from global flags:

    • byte_2A5F222 (Mercury mode)
    • byte_2A5F225 (capmerc mode)
    • byte_2A5F310 (shared flag, controls config word at offset +24: value 4 or 5)
    • byte_2A5F210 (secondary flag)
    • byte_2A5F224, byte_2A5F223 (additional binary properties)
    • byte_2A5F2A9 (mode selector)
  2. Logs to stderr when verbose: "FNLZR: Input ELF: %s", "FNLZR: Post-Link Mode", "FNLZR: Flags [ %u | %u ]", "FNLZR: Starting %s", "FNLZR: Ending %s".

  3. Calls sub_4748F0 (48,730 bytes, 1,830 lines), the top-level link-and-finalize engine with 25 parameters. This function:

    • Parses the binary-kind specification to determine mercury, capmerc, or sass output.
    • Processes finalization options: "cap-merc", "self-check", "out-sass", "fastpath-off", "opportunistic-finalization-lvl".
    • Calls sub_471700 (78,516 bytes), the main finalization orchestrator that compiles Mercury intermediate code to final SASS instruction encodings.
    • Manages Hash Relocation sections (.nvHRKE, .nvHRKI, .nvHRCE, .nvHRCI, .nvHRDE, .nvHRDI) for incremental linking support.

Two FNLZR modes exist:

| Mode | Applied to | When |
| --- | --- | --- |
| Pre-Link (a5=0) | Individual input objects | During input processing |
| Post-Link (a5=1) | The final linked output | After ELF serialization |

FNLZR Configuration Struct (160 bytes)

sub_4275C0 builds a 160-byte configuration struct (v28[0..19], with bytes 8-159 zeroed via memset(&v28[1], 0, 0x98)) before calling sub_4748F0:

| Offset | Size | Field | Source |
| --- | --- | --- | --- |
| +0 | 8 | output_elf_ptr | Set by sub_4748F0 on return |
| +24 | 4 | mode_selector | 4 + (byte_2A5F310 != 0) or 5 when byte_2A5F310 && !byte_2A5F2A9 |
| +28 | 1 | shared_flag | byte_2A5F310 != 0 |
| +31 | 1 | secondary_flag | byte_2A5F210 != 0 |
| +64 | 4 | (unknown) | Set to 3 in some paths |
| +104 | 1 | mercury_mode | 1 when Mercury active (not capmerc) |
| +105 | 1 | capmerc_mode | 1 when byte_2A5F225 set |
| +106 | 1 | always_1 | Always set to 1 |
| +107 | 1 | sm_gt_72 | byte_2A5F224 != 0 |
| +108 | 1 | sm_gt_99_variant | byte_2A5F223 != 0 |

The mode_selector at +24 controls finalization behavior: 4 = standard finalization, 5 = shared-mode finalization (when byte_2A5F310 is active). The FNLZR logs the two flag bytes as "FNLZR: Flags [ %u | %u ]" where the first value is the mercury_mode flag and the second is the capmerc_mode flag.
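To make the layout concrete, the sketch below fills a zeroed 160-byte buffer at the documented offsets. It is an illustration under stated assumptions: little-endian packing is assumed for the 4-byte mode selector, the byte_2A5F2A9 interaction is left out, and the function and parameter names are hypothetical.

```python
import struct

CONFIG_SIZE = 160  # 8 bytes for v28[0] plus the 0x98 bytes cleared by memset

def build_fnlzr_config(shared=False, secondary=False, mercury=False,
                       capmerc=False, sm_gt_72=False, sm_gt_99_variant=False):
    """Return a bytearray laid out like the table above (sketch only)."""
    cfg = bytearray(CONFIG_SIZE)
    # +24: 4 = standard finalization, 5 = shared-mode finalization.
    struct.pack_into("<I", cfg, 24, 4 + int(shared))
    cfg[28] = int(shared)
    cfg[31] = int(secondary)
    cfg[104] = int(mercury)
    cfg[105] = int(capmerc)
    cfg[106] = 1  # documented as always set
    cfg[107] = int(sm_gt_72)
    cfg[108] = int(sm_gt_99_variant)
    return cfg
```

The output pointer at +0 and the unknown word at +64 stay zero here, since the first is filled in on return and the second is only set on some paths.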

The post-link mode is the one that produces the final capmerc binary. Pre-link mode runs earlier in the pipeline on individual cubin inputs that need Mercury-level transformation before merging.

Step 4: Write Finalized Binary

After FNLZR returns, main() writes the transformed buffer to the output file via fwrite().

Mercury Section Emission Order

During ELF output serialization (sub_1CEE030), Mercury sections are emitted in five passes:

  1. Data sections -- .nv.merc.* memory-space clones (constant banks, shared, local, global). Written with sh_type = 0x7000000C (SHT_LOPROC+12) and sh_flags = 0x10000000 (SHF_NV_MERC). Section headers are copied via SSE-optimized _mm_loadu_si128 operations.

  2. Debug sections -- .nv.merc.debug_* and .nv.merc.nv_debug_*. Written with sh_type = 0x7000000D (SHT_LOPROC+13) and sh_flags = 0x10000000. Output offset is aligned to each section's sh_addralign value.

  3. Relocation sections -- .nv.merc.rela*. Written with sh_type = 0x70000064 (SHT_LOPROC+100) in Mercury mode or 1 (SHT_PROGBITS) in some variants. sh_flags = 0x42.

  4. Remaining sections -- any additional Mercury sections from the 280-entry section list.

  5. Extended section index -- .nv.merc.symtab_shndx (sh_type = 18, SHT_SYMTAB_SHNDX). Created only when section indices exceed 0xFF00. When a symbol references a section with index > 0xFF00, the symbol table entry stores 0xFFFF and the actual index is recorded in this extended section index table.
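The escape mechanism in pass 5 can be sketched as below. The comparison follows the threshold stated above (index greater than 0xFF00); the helper name and the use of 0 for unused extended entries are assumptions of this example.

```python
SHN_XINDEX = 0xFFFF        # escape value stored in the symbol table entry
INDEX_THRESHOLD = 0xFF00   # threshold as stated in the text above

def encode_symbol_section(index: int):
    """Return (st_shndx, extended_entry) for one symbol (illustrative)."""
    if index > INDEX_THRESHOLD:
        # Real index goes into .nv.merc.symtab_shndx.
        return SHN_XINDEX, index
    # Small indices fit directly; 0 as the unused extended entry is assumed.
    return index, 0
```

A reader of the symbol table reverses this by consulting the extended table whenever it sees 0xFFFF in the symbol entry.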

Output Filename

The output filename is selected in main() based on architecture and binary-kind flags:

| Condition | Filename |
| --- | --- |
| arch <= 0x63 | cubin (standard) |
| arch > 0x63, SASS mode | sass.cubin |
| arch > 0x63, capmerc mode (byte_2A5F225) | capmerc.cubin |
| arch > 0x63, mercury mode (byte_2A5F222) | merc.cubin (computed as "capmerc.cubin" + 3) |
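The "+ 3" in the last row is pointer arithmetic on the string literal: skipping the three characters of "cap" leaves "merc.cubin". A small sketch of the selection for architectures above 0x63, keyed by binary kind rather than by the global flags (the function name is hypothetical):

```python
CAPMERC_NAME = "capmerc.cubin"

def default_output_name(kind: str) -> str:
    """Default filename for arch > 0x63 (illustrative helper)."""
    names = {
        "sass": "sass.cubin",
        "capmerc": CAPMERC_NAME,
        # Same storage, offset by 3: the C expression "capmerc.cubin" + 3
        "mercury": CAPMERC_NAME[3:],
    }
    return names[kind]
```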

Fastpath Optimization

The finalizer supports a fastpath for "off-target" finalization, logged as:

[Finalizer] fastpath optimization applied for off-target %u -> %u finalization

This occurs when the driver's target architecture differs from the compilation architecture but belongs to the same decade family (e.g., sm_100 -> sm_103, since 100/10 == 103/10). The fastpath avoids a full re-finalization by applying only the delta between the two ISA variants. The --fastpath-off flag disables this optimization.
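The eligibility condition can be written as a one-line integer-division test. The sketch below is an assumption-laden illustration: it covers only the decade rule and the --fastpath-off switch, not the additional compatibility checks, and the function names are invented for the example.

```python
def same_decade_family(arch_a: int, arch_b: int) -> bool:
    # Integer division by 10 groups sm_100..sm_109 together, and so on.
    return arch_a // 10 == arch_b // 10

def fastpath_candidate(compiled: int, target: int, fastpath_off: bool = False) -> bool:
    """True when an off-target finalization could take the fastpath (sketch)."""
    if fastpath_off:
        return False
    return compiled != target and same_decade_family(compiled, target)
```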

Opportunistic Finalization

The --opportunistic-finalization-lvl option controls when off-target finalization is attempted:

| Level | Behavior |
| --- | --- |
| 0 | Default (driver decides) |
| 1 | No opportunistic finalization |
| 2 | Intra-family finalization only |
| 3 | Intra and inter family finalization |
| 4 | (Accepted by parser; behavior undocumented, possibly maximum permissiveness) |

The option parser (sub_4AC380) rejects only values greater than 4 (the range check is > 4, not > 3), so level 4 is accepted even though the help text advertises 0-3. The attribute EICOMPAT_ATTR_ENABLE_OPPORTUNISTIC_FINALIZATION is emitted into the ELF to communicate this level to the driver.
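A minimal sketch of that range check, with the level descriptions copied from the table. Only the upper bound is documented, so the sketch does not add a lower-bound check; the function name and error text are hypothetical.

```python
LEVELS = {
    0: "default (driver decides)",
    1: "no opportunistic finalization",
    2: "intra-family finalization only",
    3: "intra and inter family finalization",
    4: "accepted by parser; behavior undocumented",
}

def parse_opportunistic_level(text: str) -> int:
    """Mirror the documented '> 4' rejection (illustrative only)."""
    level = int(text)
    if level > 4:
        raise ValueError(f"opportunistic finalization level out of range: {level}")
    return level
```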

Architecture Compatibility

The finalization compatibility checking functions (sub_4709E0, sub_470DA0) determine whether a capmerc binary can be finalized for a target architecture. The checks use:

  1. Internal architecture remapping: 104 -> 120, 130 -> 107, 101 -> 110 (mapping internal variant codes to canonical family representatives).

  2. Decade-family matching: Two architectures are in the same family if arch1/10 == arch2/10. For example, sm_100 and sm_103 are both in the "10x" decade.

  3. Capability bitmask matching: Each architecture has a capability bitmask:

    • sm_100 (code 'd'/100) = bit 0 (value 1)
    • sm_103 (code 'g'/103) = bit 3 (value 8)
    • sm_110 (code 'n'/110) = bit 1 (value 2)
    • sm_121 (code 'y'/121) = bit 6 (value 64)
  4. Version bounds: Finalization version > 0x101 returns error code 25 (version too high). The finalization class (byte at offset +3 of the arch info struct) has values 0-4, dispatched through dword_1D40660[].

  5. Environment override: The CAN_FINALIZE_DEBUG environment variable enables debug logging of compatibility decisions.
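The remapping and capability tables from items 1 and 3 can be expressed as two small lookups. This is a sketch under assumptions: applying the remap before the capability lookup is a guess about ordering, architectures not listed above return None, and all names are hypothetical.

```python
# Internal variant code -> canonical family representative (from item 1)
ARCH_REMAP = {104: 120, 130: 107, 101: 110}

# Architecture code -> capability bit position (from item 3)
CAPABILITY_BIT = {100: 0, 103: 3, 110: 1, 121: 6}

def canonical_arch(arch: int) -> int:
    return ARCH_REMAP.get(arch, arch)

def capability_value(arch: int):
    """Capability mask value, or None when the text lists no bit for it."""
    bit = CAPABILITY_BIT.get(canonical_arch(arch))
    return None if bit is None else 1 << bit
```

The single-character codes in the list are simply the ASCII characters for the numeric architecture codes, for example chr(100) is 'd'.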

Architecture Compatibility Return Codes

sub_4709E0 returns a numeric error code indicating the compatibility result:

| Return | Meaning |
| --- | --- |
| 0 | Compatible -- finalization allowed |
| 24 | NULL architecture profile (a1 == NULL) |
| 25 | ISA version too high (> 0x101) |
| 26 | Incompatible finalization class (general) |
| 27 | Finalization class 4 restriction |
| 28 | Finalization class 3 restriction (only allows sm_100 -> sm_102/sm_103 with specific bit checks, or sm_120 -> sm_121) |
| 29 | Finalization class 2 restriction (same-decade only, source must be < target) |
| 30 | Unknown finalization class (byte not in range 0-4) |

The finalization class at offset +3 of the arch info struct controls the rules. Class 0 is the most restrictive (no cross-arch if special flag set, no sm_110 involvement). Class 4 is the most permissive (allows cross-family including sm_110 and sm_121). Classes 1-3 require the source SM to be less than the target SM. The special flag at offset +4 further restricts class 1 (returns error 26 if set) and relaxes classes 2-3 (returns error 26 for classes 0-1 if special flag is set).
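For tooling that consumes these codes, an enumeration keeps the numbers readable. The sketch below also models the three checks that do not depend on per-class rules; the member names, the function, and the order in which the checks run are assumptions, not facts recovered from sub_4709E0.

```python
from enum import IntEnum

class CanFinalize(IntEnum):
    COMPATIBLE = 0
    NULL_PROFILE = 24
    ISA_VERSION_TOO_HIGH = 25
    INCOMPATIBLE_CLASS = 26
    CLASS_4_RESTRICTION = 27
    CLASS_3_RESTRICTION = 28
    CLASS_2_RESTRICTION = 29
    UNKNOWN_CLASS = 30

MAX_FINALIZATION_VERSION = 0x101

def precheck(profile_present: bool, version: int, fin_class: int) -> CanFinalize:
    """Class-independent checks only; per-class rules are not modeled."""
    if not profile_present:
        return CanFinalize.NULL_PROFILE
    if version > MAX_FINALIZATION_VERSION:
        return CanFinalize.ISA_VERSION_TOO_HIGH
    if fin_class not in range(5):
        return CanFinalize.UNKNOWN_CLASS
    return CanFinalize.COMPATIBLE
```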

Self-Check Mechanism

The --self-check flag triggers a validation pass where the linker reconstitutes SASS from the capmerc binary and compares it against expected output. Three sections are validated independently:

| Check | Error string |
| --- | --- |
| Text section | "Self check for capsule mercury text section failed" |
| Debug section | "Self check for capsule mercury debug section failed" |
| Relocation section | "Self check for capsule mercury relocation section failed" |

A more detailed failure message references the internal Jira page:

Failure of '%s' section in self-check for capsule mercury.
See the Jira confluence page 'MERCSW-125' for more information
that includes some debugging steps.

The --out-sass option works only through self-check mode. Its help text states:

Generate output of capmerc based reconstituted sass only through -self-check

The reconstitution itself is performed by sub_5207A0 (18,673 bytes, 784 lines), an instruction opcode dispatch table that routes opcode case IDs (1..49+) to the encoding handler sub_A49120 with opcode dispatch IDs (827..875+). This function is part of the reconstitution pipeline that decodes Mercury intermediate sections and re-encodes them as flat SASS instruction bytes using the instruction encoding engine at sub_4C7D10.

Mercury Uplift

The error string "Invalid elf provided for mercury uplift." reveals a conversion mechanism called "mercury uplift" -- the process of converting a traditional SASS cubin into a capmerc binary. When an input cubin is detected as SASS-only (fatbin member type is default cubin rather than 16), the linker can "uplift" it by wrapping the SASS content in Mercury sections. This operation validates that the input ELF is structurally sound before attempting the conversion.

The MercGenerateSassUCode string at 0x2443D02 names the internal pass that generates Mercury microcode from SASS input, forming the core of the uplift pipeline.
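The exact structural checks performed before uplift are not documented here. As a rough illustration of header-level validation, the sketch below applies the two checks implied by the "cubin not an elf?" and "cubin not a device elf?" messages listed under Error Conditions: ELF magic and e_machine = 190. Little-endian encoding and the function name are assumptions of the example.

```python
import struct

ELF_MAGIC = b"\x7fELF"
EM_CUDA = 190
ET_CAPMERC = 0xFF00  # ELF type reported for capmerc output

def validate_device_elf(image: bytes) -> int:
    """Return e_type after checking magic and e_machine (sketch only)."""
    if len(image) < 20 or image[:4] != ELF_MAGIC:
        raise ValueError("cubin not an elf?")
    # e_type and e_machine are the two 16-bit fields after the 16-byte e_ident.
    e_type, e_machine = struct.unpack_from("<HH", image, 16)
    if e_machine != EM_CUDA:
        raise ValueError("cubin not a device elf?")
    return e_type
```

Comparing the returned e_type against 0xFF00 then distinguishes a capmerc image from a traditional ET_EXEC cubin.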

FNLZR Internals

The FNLZR finalization orchestrator (sub_471700, 78,516 bytes) is the largest function in the finalization subsystem. It:

  1. Allocates a 656-byte compilation unit descriptor with a vtable at off_1D49C58.
  2. Copies a 256-byte architecture profile via SSE-accelerated _mm_loadu_si128 operations.
  3. Parses key-value options from the configuration: "deviceDebug", "lineInfo", "optLevel", "IsCompute" (True/False), "IsPIC" (True/False).
  4. Builds compiler flags and concatenates with existing flags at v4+48.
  5. Sets up the compilation unit: arch version at v4+28, PIC flag at v4+32, debug flags at v4+36/+40, optimization level at v4+104.
  6. Allocates a 648-byte section builder (offsets 448+) and initializes it via sub_4F5880.
  7. Creates sections including .nv.merc., .nvFatBinSegment, __nv_relfatbin, .nv_fatbin, .note.nv.tkinfo, .symtab.
  8. Uses "Final memory space" as the arena name for finalization allocations.
  9. Outputs the finalized ELF with version string "Cuda compilation tools, release 13.0, V13.0.88".

JIT Finalization Path

A separate JIT finalization entry exists at sub_52E060 (47,095 bytes, called finalizer_jit_entry). This handles the CUDA driver's JIT compilation path with its own logging:

FNLZR: JIT Path
FNLZR: preLink Mode
FNLZR: postLink Mode
FNLZR: Flags [ %u | %u ]
FNLZR: Starting JIT
FNLZR: Ending JIT

The JIT path uses setjmp for error handling across its multiple compilation phases. It shares the underlying finalization orchestrator (sub_471700) with the ahead-of-time path but wraps it in a JIT-specific framework that handles on-the-fly architecture selection and runtime option injection.

Error Conditions

| Error string | Trigger |
| --- | --- |
| "Invalid elf provided for mercury uplift." | Input ELF is malformed for uplift conversion |
| "Self check for capsule mercury text section failed" | Self-check text mismatch |
| "Self check for capsule mercury debug section failed" | Self-check debug mismatch |
| "Self check for capsule mercury relocation section failed" | Self-check relocation mismatch |
| "SASS generation failed" | Reconstitution engine failure |
| "the elf arch is not compatible with finalizer arch" | Architecture mismatch during finalization |
| "conflicting options provided for finalizer" | Contradictory FNLZR options |
| "Failed to create finalizer thread" | Thread creation failure in parallel finalization |
| "Param struct passed to finalizer is Nil" | NULL parameter to finalizer entry |
| "Internal FNLZR error '%s'" | Generic finalizer error with detail string |
| "Cannot target %s when input '%s' is SASS" | Attempting capmerc output from SASS-only input |
| "skip mercury section %i" | Verbose message during merge skipping Mercury sections |
| "cubin not an elf?" | Input cubin fails ELF magic validation |
| "cubin not a device elf?" | Input ELF has wrong e_machine |

Function Map

| Address | Size | Identity | Role |
| --- | --- | --- | --- |
| sub_4AC380 | 9,967 B | nvlink_options_define_table | Binary-kind option registration |
| sub_4AD420 | 12,265 B | nvlink_options_postprocess | Option validation and defaults |
| sub_4275C0 | 3,989 B | fnlzr_post_link | FNLZR dispatch (pre-link / post-link) |
| sub_4748F0 | 48,730 B | nvlink_link_and_finalize_entry | Top-level FNLZR engine (25 params) |
| sub_471700 | 78,516 B | nvlink_finalize_object | Main finalization orchestrator |
| sub_4709E0 | 2,609 B | can_finalize_architecture_check | Architecture compatibility check |
| sub_470DA0 | 2,074 B | can_finalize_with_capability_mask | Capability bitmask check |
| sub_5207A0 | 18,673 B | capmerc_reconstitute_sass | SASS reconstitution from Mercury |
| sub_52E060 | 47,095 B | finalizer_jit_entry | JIT finalization entry point |
| sub_45C950 | ~1 KB | write_elf_to_memory | Serialize ELF to buffer (Mercury path) |
| sub_45C980 | ~1 KB | compute_elf_size | Compute serialized ELF byte count |
| sub_45BF00 | 13,258 B | serialize_elf | Core ELF serialization engine |
| sub_42AF40 | 11,143 B | extract_and_process_fatbin_member | Fatbin extraction (type 16 = capmerc) |
| sub_1CED0E0 | 9,262 B | ELF_EmitDebugSections | Mercury debug section emitter |
| sub_1CED7C0 | 6,757 B | ELF_EmitSASSDebugSections | Mercury SASS debug section emitter |
| sub_1CEC390 | ~500 B | classify_shared_reservation | Identify .nv.shared.reserved and .nv.merc.nv.shared.reserved sections |
| sub_1CEC4C0 | ~200 B | merc_section_name_construct | Prepend ".nv.merc" to section names |
| sub_1CEC570 | ~250 B | merc_composite_name_construct | Build composite .nv.merc.* names with multiple parts |
| sub_1CEC660 | ~400 B | merc_constant_bank_section_map | Map .nv.constant* to .nv.merc equivalents with bank suffixes |
| sub_1CEF5B0 | 22,867 B | ELF_ProcessRelocations | Mercury relocation processing |
| sub_1CF1690 | 16,049 B | ELF_EmitRelocationTable | Mercury relocation table emitter |
| sub_1CF72E0 | ~3 KB | emit_merc_rela_sections | Construct and emit .nv.merc.rela* sections |
| sub_1CF3720 | ~10 KB | process_merc_symtab_shndx | Handle .nv.merc.symtab_shndx mapping |
| sub_1CF7F30 | ~5 KB | emit_merc_rela_companion | Emit companion relocation sections |

Global Variables

| Address | Type | Name | Description |
| --- | --- | --- | --- |
| byte_2A5F222 | byte | mercury_mode | Set when arch > sm_99 |
| byte_2A5F225 | byte | capmerc_mode | Set alongside mercury_mode |
| byte_2A5F224 | byte | sm_gt_72 | Set when arch > sm_72 |
| byte_2A5F229 | byte | ewp_detected | Mercury/EWP input detected (e_type == 0xFF00) |
| dword_2A5F314 | dword | arch_code | Target architecture numeric code |
| dword_2A5F308 | dword | fnlzr_verbose | FNLZR verbose output flags |
| byte_2A5F310 | byte | shared_flag | Controls FNLZR config word (+24: 4 or 5) |

Cross-References

Sibling Wikis

  • ptxas: Capsule Mercury & Finalization -- standalone ptxas capmerc format (Mercury section binary layouts, 328-byte capsule descriptor layout, sh_type map, classifier algorithm with 0x5D05 bitmask, marker stream TLV format, rela entry format, sub-byte relocation design, finalization levels)
  • ptxas: Mercury Encoder Pipeline -- standalone ptxas Mercury encode/decode pipeline (phases 113--122)

Confidence Assessment

| Claim | Rating | Evidence |
| --- | --- | --- |
| --binary-kind=capmerc is default for sm100+ | HIGH | String "mercury,capmerc,sass" at 0x1D41D03 verified. Option parser sub_4AC380 decompiled (9,967 bytes, 429 lines). |
| ELF type 0xFF00 for capmerc | HIGH | Decompiled from sub_4275C0 and sub_4748F0: ELF subtype check elf_subtype == 0xFF00. |
| sub_4AC380 option parser (9,967 bytes, 429 lines) | HIGH | Decompiled file sub_4AC380_0x4ac380.c exists and confirms size/line count. |
| byte_2A5F222 = Mercury mode, byte_2A5F225 = capmerc mode | HIGH | Both globals referenced in decompiled sub_4AC380 and sub_4275C0. |
| sub_45C950 serialize-to-memory path | HIGH | Function called at main line 1462. "in-memory-ELF-image" string at 0x1D3236D verified. sub_45C980 computes size, buffer allocated, then sub_45C950(buffer, elf) serializes. |
| FNLZR dispatch sub_4275C0 (3,989 bytes) | HIGH | Decompiled file exists. Size and parameter count verified. All call sites confirmed. |
| FNLZR engine sub_4748F0 (48,730 bytes, 1,830 lines, 25 params) | HIGH | Decompiled file exists. Size, line count, and parameter count verified. |
| Finalization orchestrator sub_471700 (78,516 bytes) | HIGH | Decompiled file exists. vtable off_1D49C58, 256-byte profile, 656-byte CU confirmed. |
| sub_5207A0 SASS reconstitution (18,673 bytes) | MEDIUM | Function exists at stated address. Size from function bounds. Role inferred from call context and string proximity. |
| Fatbin member type 16 = capmerc | MEDIUM | Inferred from decompiled sub_42AF40 fatbin extraction logic. No direct string evidence for the numeric value 16. |
| Self-check validates text/debug/relocation independently | HIGH | Three distinct error strings verified at 0x2458F38, 0x2458F70, 0x2458FA8. MERCSW-125 reference at 0x1F44288. |
| --self-check, --out-sass, --fastpath-off CLI options | HIGH | All option strings verified in nvlink_strings.json. Help text strings confirmed. |
| Opportunistic finalization levels 0--4 | MEDIUM | EICOMPAT_ATTR_ENABLE_OPPORTUNISTIC_FINALIZATION verified at 0x245EED8. Parser accepts 0--4 (rejects > 4). Level 4 semantics undocumented. Levels 0--3 semantics partially inferred from code paths. |
| Decade-family matching (arch1/10 == arch2/10) | HIGH | Integer division comparison verified in decompiled sub_4709E0. |
| Version ceiling > 0x101 returns error 25 | HIGH | Verified from decompiled sub_4748F0 Phase 2. |
| "Failed to create finalizer thread" at 0x2458EC0 | HIGH | Verified in nvlink_strings.json. Confirms thread-based finalization. |
| JIT entry sub_52E060 (47,095 bytes) | HIGH | Function exists. JIT diagnostic strings verified: "FNLZR: JIT Path", "FNLZR: preLink Mode", "FNLZR: postLink Mode", "FNLZR: Ending JIT", "FNLZR: Starting JIT" all confirmed in nvlink_strings.json. |
| Section catalog: 19 .nv.merc.* names | HIGH | All 19 section name strings verified at addresses 0x24582E8--0x2458D00. Xrefs to emitter functions confirmed. |
| "skip mercury section %i" at 0x1D3BCB7 | HIGH | String verified at exact address with xref to 0x45F624. |
| Hash Relocation sections .nvHRKE/.nvHRKI/.nvHRCE/.nvHRCI/.nvHRDE/.nvHRDI | MEDIUM | Section names inferred from decompiled code. Not individually verified in string scan. |
| "SASS generation failed" error string | MEDIUM | Not individually verified in nvlink_strings.json scan. May exist at an unchecked address. |
| SHF_NV_MERC = 0x10000000 (bit 28 of sh_flags) | HIGH | Verified in 10+ decompiled locations: sub_45E7D0 line 1583 (v140 & 0x10000000), sub_1CED0E0 line 47 (*((_QWORD *)a2 + 1) & 0x10000000), sub_1CEE030 line 322/370 (0x10000000 written to sh_flags). |
| Bitmask 0x5D05 for sh_type classification | HIGH | Constant 23813 (0x5D05) verified in sub_1CED0E0, sub_1CED7C0, sub_1CEF5B0, and sub_1CF1690. Same bitmask used by ptxas classifier. |
| Section name constructors sub_1CEC4C0, sub_1CEC660 | HIGH | Decompiled files verified. sub_1CEC4C0 line 31: sprintf(v10, "%s%s", ".nv.merc", v15). sub_1CEC660 line 52: sub_1CEC570(".nv.merc", ...). |
| Architecture compatibility return codes 0/24--30 | HIGH | All return paths verified in decompiled sub_4709E0 (149 lines). Error code 25 at line 50-52, 26 at line 57, 28 at line 132, 29 at line 92, 30 at line 102/130. |
| FNLZR config struct 160-byte layout | HIGH | Decompiled sub_4275C0 line 88: memset(&v28[1], 0, 0x98) (152 bytes + 8 for v28[0] = 160 total). All field assignments verified at lines 89-121. |
| Emission sh_types 0x7000000C and 0x7000000D | HIGH | Verified in sub_1CEE030 line 318: v15->m128i_i32[1] = 1879048204 (= 0x7000000C) and line 364: *(_DWORD *)(v33 + 4) = 1879048205 (= 0x7000000D). |
| .nv.merc.rela name construction (name+8 stripping) | HIGH | Verified in sub_1CF72E0 line 358: sprintf(buf, "%s%s%s", ".nv.merc", ".rela", (const char *)(v60 + 8)). |