Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Compatibility Checking

nvlink enforces architecture compatibility at three distinct points: during input file validation, during finalization dispatch, and during capability mask evaluation. These three checks form a layered system -- the input validator decides whether a cubin can enter the link, the finalization checker decides whether a finalized object can be re-finalized for a different target, and the capability mask checker gates individual feature support within a finalization context. All three share a common architecture remapping table and a common "same-decade" rule for family grouping.

Key Functions

AddressNameSizeRole
sub_4878A0arch_string_match328 BCore compatibility checker: parses two arch strings, compares profiles
sub_4876A0arch_compat_check2,115 BCompanion checker: virtual-vs-native and a-variant exact-match logic
sub_4709E0can_finalize_arch_check2,609 BFinalization compatibility: 5-level dispatch table, error codes 0/24--30
sub_470DA0can_finalize_capability_mask2,074 BCapability bitmask compatibility within a finalization context
sub_426570validate_arch_and_add7,427 BInput file validation: arch match, toolkit version constraints, mode selection
sub_486FF0arch_parse_name_to_number2,665 BParses sm_%d%c / compute_%d%c / sass_%d%c into structured record
sub_487420arch_canonicalize~600 BCanonicalizes arch string for profile hash lookup
sub_465450profile_family_match~300 BCompares two profiles via family linked list

Architecture Record Layout

sub_486FF0 parses an architecture name string into a structured record. The format strings it recognizes are sm_%2d%s, compute_%2d%s, and sass_%2d%s. The parsed record contains:

OffsetSizeFieldMeaning
+04 bytesarch_numberSM version number (e.g. 100 for sm_100)
+41 bytehas_suffixNonzero if an a or f suffix was present
+51 byte(padding)
+61 byteis_virtual1 if prefix was compute_
+71 byteis_native1 if prefix was sm_ or sass_
+81 bytesame_decade_flagControls whether same-decade rule applies

The arch_number for multi-digit architectures maps directly to the SM value: sm_75 = 75, sm_100 = 100, sm_121 = 121.

Core Compatibility Check (sub_4878A0)

sub_4878A0 is the primary compatibility checker invoked during cubin loading. It takes two architecture name strings (the input cubin's architecture and the link target --arch value) and returns 1 (compatible) or 0 (incompatible).

Algorithm

arch_string_match(input_arch_str, target_arch_str):
    record_a = arch_parse_name_to_number(input_arch_str)
    record_b = arch_parse_name_to_number(target_arch_str)
    if either parse fails:
        return 0

    // Both virtual: reject (cannot link two virtual targets)
    if record_a.is_virtual AND record_b.is_virtual:
        return 0

    // Canonicalize both arch strings and look up profiles in the hash map
    profile_a = profile_lookup(canonicalize(record_a))
    profile_b = profile_lookup(canonicalize(record_b))

    // Case 1: No suffix (base arch, e.g. sm_100 vs sm_100)
    if !record_a.has_suffix:
        return profile_family_match(profile_a.family_list, profile_b)

    // Case 2: Native 'a' variant -- requires exact arch match
    if record_a.is_native:
        return record_a.arch_number == record_b.arch_number

    // Case 3: Same-decade rule
    if !record_a.same_decade_flag:
        return record_b.arch_number >= record_a.arch_number

    // Case 4: Same-decade with cross-family exceptions
    // Special handling for SM 101 and SM 110
    if arch_a == 101 or arch_b == 101 or arch_a == 110 or arch_b == 110:
        if arch_b in {101, 110}:
            return arch_a != 101 AND arch_a != 110  // inverted logic
        return 0  // arch_a in {101, 110} but arch_b is not

    // General same-decade: arch_b >= arch_a AND same family
    if arch_b >= arch_a:
        return arch_a / 10 == arch_b / 10
    return 0

The Same-Decade Rule

The same-decade rule is the fundamental family-grouping mechanism: two architectures are in the same family if their SM numbers, divided by 10, are equal. This produces the following families:

SM / 10FamilyMembers
7Turingsm_75
8Amperesm_80, sm_86, sm_87, sm_88, sm_89
9Hoppersm_90, sm_90a
10Blackwell (100-series)sm_100, sm_103
11Blackwell (110-series)sm_110
12Blackwell (120-series)sm_120, sm_121

The intent is that code compiled for a lower SM within a family can run on a higher SM in the same family (e.g. sm_80 code on sm_89 hardware), while code cannot cross family boundaries (e.g. sm_89 code on sm_90 hardware). The division is integer division, so sm_100 and sm_103 share decade 10, while sm_110 is in decade 11.

SM 101 / SM 110 Cross-Mapping Exception

SM 101 and SM 110 receive special handling. In the decompiled code, when either the source or target is 101 or 110, the normal same-decade comparison is bypassed. If the target is 101 or 110, the check returns true only if the source is the other member of the pair (110 or 101 respectively). This implements a bidirectional compatibility bridge between these two architectures that would otherwise be in different decades. SM 101 is an internal/alternate designation that maps to the sm_10x family from the finalization perspective (see the 101->110 remapping in sub_4709E0).

Virtual vs. Native Comparison

When neither record has the has_suffix flag set and the record is not is_native, the check falls through to profile_family_match (sub_465450). This function traverses the family linked list stored at profile+56 in the profile database, checking whether the target profile appears in the source profile's family chain. This handles compute_XX to sm_XX compatibility -- a compute_75 target is compatible with any sm_XX where XX is in the same family as 75.

The a Variant: Exact Match

When the source record has is_native set and the a flag is detected (e.g. sm_90a, sm_100a), the check requires an exact numeric match: arch_a == arch_b. The a suffix denotes architecture-specific features that are not guaranteed to be forward-compatible within the same decade. An sm_90a cubin can only link against an sm_90a or sm_90 target, not against sm_91 (hypothetical) or any other SM in the same decade.

Companion Check (sub_4876A0)

sub_4876A0 is a closely related function that implements the same logic with slightly different entry conditions. It is used from code paths that already have parsed architecture records rather than raw strings. The function checks:

  1. Both inputs must be non-null and the target must not have is_virtual set (byte at a2+offset6).
  2. If the source is virtual (a1[6] set), a prefix-based check applies with different rules for suffix and native flags.
  3. If the source is not virtual, the same arch_number / 10 family check applies.
  4. The SM 101/110 cross-mapping exception is replicated identically.

Finalization Compatibility (sub_4709E0)

sub_4709E0 determines whether a previously finalized object can be re-finalized for a different target architecture. This is used by the JIT / opportunistic finalization pipeline when nvlink receives a finalized cubin and needs to decide if it can produce output for the requested --arch. The function returns an integer error code rather than a boolean.

Architecture Remapping

Before any comparison, both the source and target architecture numbers are remapped through a fixed table:

InputOutputMeaning
104120SM 104 (internal) maps to SM 120 family
130107SM 130 (internal) maps to SM 107 (i.e. SM 100 family, decade 10)
101110SM 101 maps to SM 110

This remapping collapses internal/experimental architecture numbers into their canonical families before the compatibility check runs. After remapping, the check proceeds with the remapped values.

Return Codes

CodeMeaning
0Compatible -- finalization can proceed
24Null input -- the compatibility record pointer a1 is NULL
25Version too high -- the record's version field (a1 word at offset +6) exceeds 0x101
26Incompatible architecture -- family mismatch or disallowed cross-family link
27Type 4 error -- finalization class 4 incompatibility
28Type 3 error -- finalization class 3 incompatibility
29Type 2 error -- finalization class 2, source >= target in same decade
30Unknown type -- finalization class byte not in range 0--4

Finalization Class Dispatch

The byte at a1[3] encodes the "finalization class" (0--4). This byte indexes into the dispatch table at dword_1D40660, which returns one of 5 compatibility levels:

a1[3]dword_1D40660[a1[3]]Compatibility LevelSemantics
00ExactNo variant allowed, no cross-family, no SM 110 bridging
11Same-decade, same directionSource < target within same decade
22Same-decade, bidirectionalSource and target in same decade (class 2 error if source >= target)
33Cross-family allowedSM 110/121 cross-decade bridging permitted
44BroadestSM 110 bridging plus additional cross-family tolerance

The a1[4] byte is the "variant flag" (corresponding to the a suffix). When the variant flag is set:

  • Level 0: The variant flag is incompatible, returns 26.
  • Level 1: The variant flag is incompatible with same-decade matching, returns 26.
  • Levels 2--3: Variant flag permitted only if level >= 2, returns 0 if so, else returns 26 (0x1A).

SM 110 Special Handling

When either the remapped source or target is 110, and the finalization class is not level 4, the check returns 26 (incompatible). Only level 4 permits SM 110 as a finalization target or source. Similarly, SM 121 receives special treatment at level 3+, allowing cross-decade links between 120 and 121.

Class 3 Specific Rules: SM 100 / SM 120 / SM 121

At finalization class 3 (type byte *a1 == 3, return code 28), the variant flag and level interact:

  • sm_100 (remapped d = 100) with target sm_102 or sm_103: allowed only if a specific bit pattern in a1[2] matches (bits [3:2] == 1 and bits [1:0] == 1).
  • sm_120 with target sm_121: allowed unconditionally at level 2+.
  • All other class-3 combinations return 28.

Capability Mask Check (sub_470DA0)

sub_470DA0 is the third layer of compatibility checking. Where sub_4709E0 returns an error code, this function returns a boolean and additionally gates on a per-architecture capability bitmask.

Architecture Remapping

The same 104->120, 130->107, 101->110 remapping applies. Both source and target are remapped before comparison.

Algorithm

can_finalize_capability_mask(record, source_arch, target_arch, flag):
    // Remap both architectures
    source = remap(source_arch)
    target = remap(target_arch)

    // Direct match: source == target (post-remap)
    result = flag & (source == target)
    if result: return 1

    // Capability mask check
    if record+24 (version word) == 0: return 0
    mask_ptr = *(record + 16)
    if mask_ptr == NULL: return 0

    // Map target arch to bitmask value
    switch (target):
        case 100 ('d'): bit = 1
        case 103 ('g'): bit = 8
        case 110 ('n'): bit = 2
        case 121 ('y'): bit = 64
        default: return 0

    // Check if the source's capability mask includes the target's bit
    if (bit & *mask_ptr) != bit: return 0
    return result

Capability Bitmask Values

The capability bitmask encodes which target architectures a given source can be finalized for:

Target ArchASCIIBitmask ValueBinary
sm_100'd' (100)100000001
sm_110'n' (110)200000010
sm_103'g' (103)800001000
sm_121'y' (121)6401000000

A source architecture's capability mask at *(record + 16) is a bitfield indicating which target architectures it supports. For example, a mask of 0x09 (bits 0 and 3) would indicate compatibility with sm_100 and sm_103.

Debug Environment Variable

Both sub_4709E0 and sub_470DA0 check for the environment variable CAN_FINALIZE_DEBUG. When set, its value is parsed as an integer via strtol. The exact effect is not fully reversed, but the presence of this variable enables diagnostic output from the finalization compatibility pipeline, printing the remapped architectures and compatibility decisions to stderr (inferred from the strtol parse and the diagnostic string references in the surrounding code).

Input File Validation (sub_426570)

sub_426570 is the top-level validation function called from main() for every cubin that enters the link. It orchestrates architecture matching, toolkit version enforcement, and link mode selection.

Validation Sequence

  1. Word size check: The cubin's ELF class (32-bit vs 64-bit) must match the --machine setting (dword_2A5F30C, either 32 or 64).

  2. ELF type rejection: ET_DYN (shared library, e_type == 2) cubins are rejected unconditionally.

  3. ELF class byte check: For 32-bit links, the ELF class byte at ehdr+8 must be 7 (legacy CUDA format) or 8 (sm > 72). Other values trigger an error.

  4. SM version extraction: The SM version number is extracted from e_flags in a class-dependent manner:

    • OSABI != 0x41 (legacy): sm = e_flags & 0xFF (low byte)
    • OSABI == 0x41 (Mercury): sm = (e_flags >> 8) & 0xFF (second byte)
  5. Arch string construction: The SM number and the ABI suffix flag (sub_43E6F0) are formatted into a 12-byte buffer:

    snprintf(buf, 12, "sm_%d%c", sm_version, has_abi ? 'a' : '\0')
    

    Or, if the cubin is virtual (byte_2A5F2C1 set):

    snprintf(buf, 12, "compute_%d%c", sm_version, has_abi ? 'a' : '\0')
    
  6. Primary arch match: sub_4878A0 compares the constructed string against qword_2A5F318 (the --arch value). If it returns 1, the cubin is accepted.

  7. Fallback: .nv.compat match: If the primary match fails and byte_2A5F221 (SASS mode flag) is set, the fallback path reads the .nv.compat section via sub_43E610 and calls sub_4709E0 to test finalization compatibility. If sub_4709E0 returns 0 (compatible), the cubin is accepted.

  8. Final failure: If both checks fail, the error "SM Arch ('%s') not found in '%s'" is emitted.

Toolkit Version Constraints

After architecture matching, sub_426570 enforces toolkit version requirements:

ConditionConstraintError
GeneralCurrent toolkit version (sub_468560()) must be >= cubin's toolkit versionToolkit too old for input cubin
SM 50Cubin toolkit version must be > 64 (0x40)SM 50 requires CUDA 6.5+ cubins
SM 90Cubin toolkit version must be > 119 (0x77)SM 90 requires CUDA 12.0+ cubins

The SM 50 and SM 90 checks enforce minimum CUDA toolkit versions for specific architectures. SM 50 (Maxwell) requires cubins compiled with at least CUDA 6.5 (toolkit version 65), and SM 90 (Hopper) requires CUDA 12.0 (toolkit version 120). These prevent linking objects compiled with toolkit versions that predate the architecture's introduction.

tcgen05 Cross-Version Incompatibility

A separate cross-version check at address 0x1D39330 enforces that objects using tcgen05 tensor core instructions (Blackwell-era) compiled with CUDA 12.x cannot be linked with objects from CUDA 13.0+. The tcgen05 instruction encoding changed between the 12.x preview support and the 13.0 production release, making the two incompatible at the binary level. This check is part of the broader version validation pipeline and fires before the architecture compatibility check runs.

Mercury / EWP Mode Detection

When the cubin's e_type is 0xFF00 (Mercury executable with payload, "EWP"), the global flag byte_2A5F229 is set. Once set, all subsequent cubins must be from the same toolkit version -- the error "linking with -ewp objects requires using current toolkit" fires if the toolkit version of any subsequent input does not match sub_468560().

When e_type is not 0xFF00 and byte_2A5F229 has not been set yet, the return value depends on the error list state -- the function returns true (continue) only if no errors have been recorded.

Compatibility Matrix

The following matrix summarizes the compatibility rules for SM 100-series architectures:

Sourcesm_100sm_103sm_110sm_120sm_121
sm_100yesyes (decade 10)specialnono
sm_103noyesspecialnono
sm_110specialspecialyesnono
sm_120nononoyesyes (decade 12)
sm_121nonononoyes

"special" = requires finalization class 4 or explicit 101/110 cross-mapping.

For the a variants (sm_100a, sm_120a, etc.), all entries become exact-match only: sm_100a is compatible only with sm_100 or sm_100a.

Error Messages

ErrorTriggerSource
"SM Arch ('%s') not found in '%s'"Cubin arch does not match --arch and .nv.compat fallback failssub_426570
"linking with -ewp objects requires using current toolkit"EWP cubin from different toolkit versionsub_426570
"specified arch exceeds buffer length"compute_%d%c format exceeds 12-byte buffersub_426570
Return code 24NULL compatibility recordsub_4709E0
Return code 25Record version > 0x101sub_4709E0
Return code 26Architecture family mismatchsub_4709E0
Return code 27Finalization class 4 incompatibilitysub_4709E0
Return code 28Finalization class 3 incompatibilitysub_4709E0
Return code 29Finalization class 2 incompatibilitysub_4709E0
Return code 30Unknown finalization classsub_4709E0

Globals

AddressNameTypeRole
qword_2A5F318g_target_archchar *The --arch value (e.g. "sm_100")
dword_2A5F314g_sm_versionintNumeric SM version (e.g. 100)
dword_2A5F30Cg_machineintWord size: 32 or 64
byte_2A5F221g_sass_modeboolSet when any SASS cubin enters the link
byte_2A5F229g_ewp_modeboolSet when an EWP (Mercury executable) cubin enters
byte_2A5F2C1g_virtual_modeboolSet when linking virtual architectures
qword_2A5F8D8g_profile_hashmapvoid *Hash map of architecture profiles (name -> profile struct)
dword_1D40660g_finalize_dispatchint[5]Finalization class -> compatibility level mapping

Confidence Assessment

ClaimConfidenceVerification
sub_4878A0 core compatibility check, 328 BCONFIRMEDDecompiled file exists; logic matches: parses two arch strings, compares profiles, same-decade rule at line 68 (v19 / 0xA == v20 / 0xA)
SM 101/110 cross-mapping exceptionCONFIRMEDDecompiled sub_4878A0 line 55: if ( v19 == 101 || v20 == 101 || v19 == 110 || v20 == 110 ) exactly matches wiki description
sub_4709E0 finalization check, remapping 104->120, 130->107, 101->110CONFIRMEDDecompiled lines 22-31: case 104: v4 = 120; case 130: v4 = 107; case 101: v4 = 110; exactly matches wiki
Return codes 0/24/25/26/27-30CONFIRMEDDecompiled code returns 24 (null), 25 (version > 0x101), 26 (incompatible), 29 (class 2 error); matches wiki table
dword_1D40660 finalization dispatch tableCONFIRMEDDecompiled line 59: v9 = dword_1D40660[v8]; with v8 = a1[3] byte
CAN_FINALIZE_DEBUG env varCONFIRMEDDecompiled lines 17-19 in both sub_4709E0 and sub_470DA0: getenv("CAN_FINALIZE_DEBUG"); string at 0x1d40080
sub_470DA0 capability bitmask: d=1, g=8, n=2, y=64CONFIRMEDDecompiled lines 95-106: case 'd': v12 = 1; case 'g': v12 = 8; case 'n': v12 = 2; case 'y': v12 = 64; exactly matches
Same-decade rule (arch / 10) for family groupingCONFIRMEDDecompiled sub_4878A0 line 68: v19 / 0xA == v20 / 0xA
a variant requires exact matchCONFIRMEDDecompiled sub_4878A0 line 47: if (v6[7]) { LOBYTE(v13) = v20 == v19; }
sub_426570 validates arch and adds inputsHIGHFunction exists at stated address; 7,427 B size claimed
Version check a1[3] > 0x101CONFIRMEDDecompiled sub_4709E0 line 50: *((_WORD *)a1 + 3) > 0x101u
Compatibility matrix (sm_100/103/110/120/121)HIGHDerived from confirmed remapping and decade rules; special cases for 110 at decompiled line 70-76
tcgen05 cross-version incompatibilityMEDIUMAddress 0x1D39330 claimed; not independently verified in decompiled code

For general architecture compatibility concepts, see the ptxas wiki targets.

Cross-References

Sibling Wikis