Architecture Profile

The ArchProfile struct is a 136-byte heap-allocated descriptor that encodes everything nvlink needs to know about a single GPU architecture target. Each recognized architecture (e.g. sm_100) produces three profile instances -- a real profile (sm_), a virtual profile (compute_), and an LTO profile (lto_) -- all stored in a global hash map keyed by name string. The struct is created by sub_484DB0 and consumed throughout the linking, finalization, and output pipelines.

This page documents the byte-level layout derived from the constructor (sub_484DB0), the database initializer (sub_484F50), and the two finalization compatibility checkers (sub_4709E0, sub_470DA0).

Constructor: sub_484DB0

Prototype (reconstructed):
    ArchProfile* ArchProfile::create(
        uint8_t  is_virtual,       // a1: 0=real (sm_), 1=virtual (compute_/lto_)
        uint8_t  is_lto,           // a2: 0=not LTO, 1=LTO variant
        char*    arch_name,        // a3: "sm_100", "compute_100", "lto_100"
        char*    display_name,     // a4: display name (same as arch_name for base archs;
                                   //     the compute_ name for lto_ variants)
        char*    isa_class_name,   // a5: "Turing", "Ampere", "Ada", "Hopper", "Blackwell", or
                                   //     "(profile_sm_NNN)->isaClass" for suffix variants
                                   //     NULL for LTO variants
        char*    cuda_arch_define, // a6: "-D__CUDA_ARCH__=1000"
        char*    canonical_name    // a7: identity name; same as arch_name for sm_,
                                   //     compute_, and lto_ profiles
    )

Address: 0x484DB0
Size: ~400 bytes

The constructor allocates 136 bytes via sub_4307C0, zeros the allocation (a few explicit stores plus a memset whose start address is rounded down to an 8-byte boundary), then writes the seven arguments into their respective offsets. It also creates three list-head objects at offsets 48, 56, and 64 via sub_465020, and registers the profile name in the ordered list qword_2A5F8E0 and, for non-virtual profiles, in qword_2A5F8E8 as well.

Allocation and Zeroing

profile = alloc(allocator, 136);
if (!profile) oom_handler(allocator, 136);

// Zero entire struct (explicit stores plus an 8-byte-aligned memset)
*(uint16_t*)((char*)profile + 2) = 0;         // bytes 2-3
((uint64_t*)profile)[16] = 0;                 // qword at offset 128
memset_aligned((char*)profile + 10, 0, ...);  // start is (profile + 10) & ~7, i.e. offset 8

The explicit stores overlap with the memset, but together they guarantee that no field holds stale data before explicit assignment.

Field Assignment

profile->byte[0]   = is_virtual;        // a1
profile->byte[1]   = is_lto;            // a2
profile->qword[1]  = arch_name;         // a3 -> offset 8
profile->qword[2]  = display_name;      // a4 -> offset 16
profile->qword[3]  = isa_class_name;    // a5 -> offset 24
profile->qword[4]  = canonical_name;    // a7 -> offset 32
profile->qword[5]  = cuda_arch_define;  // a6 -> offset 40

// Create three linked list heads for compatibility tracking
profile->qword[6]  = list_create(str_hash, str_equal, 8);  // offset 48
profile->qword[7]  = list_create(str_hash, str_equal, 8);  // offset 56
profile->qword[8]  = list_create(str_hash, str_equal, 8);  // offset 64

// Post-assignment clearing
profile->word[1]    = 0;    // bytes 2-3 (re-zeroed)
profile->byte[128]  = 0;    // byte 128
profile->word[2]    = 0;    // bytes 4-5

Note the argument ordering anomaly: canonical_name (a7) goes to offset 32 while cuda_arch_define (a6) goes to offset 40. This means offset 32 holds the "identity" name (what this profile "is"), while offset 40 holds the compiler define string.

Ordered List Registration

After construction, the profile is registered into global ordered lists:

// Register into qword_2A5F8E0 (all real+virtual profiles, 128-capacity)
if (list_needs_resize(qword_2A5F8E0))
    list_resize(qword_2A5F8E0, 0x2C);  // grow by 44
list_insert(qword_2A5F8E0, arch_name);

// For non-virtual profiles, also register into qword_2A5F8E8
if (!is_virtual) {
    if (list_needs_resize(qword_2A5F8E8))
        list_resize(qword_2A5F8E8, 0x2C);
    list_insert(qword_2A5F8E8, arch_name);
}

Struct Layout

ArchProfile (136 bytes, 8-byte aligned, heap-allocated)
==========================================================================
Offset  Size  Type      Field                 Description
--------------------------------------------------------------------------
  0      1    uint8     is_virtual            0 = real (sm_), 1 = virtual
                                              (compute_ or lto_)
  1      1    uint8     is_lto                0 = not LTO, 1 = LTO variant
  2      1    uint8     feature_byte_a        Finalization compatibility
                                              bitmask. Bits [1:0] and
                                              [3:2] checked by sub_4709E0
                                              for sm_100/sm_102/sm_103
                                              cross-finalization
  3      1    uint8     finalization_class     0-4. Indexes into
                                              dword_1D40660[5] lookup
                                              table. Set to 1 for sm_89
                                              (Ada tessellation flag).
                                              Controls finalization
                                              compatibility rules
  4      1    uint8     suffix_a              1 if 'a' variant (sm_90a,
                                              sm_100a, ...). Set on all
                                              three profiles (sm/compute/
                                              lto) for 'a' architectures
  5      1    uint8     suffix_f              1 if 'f' variant (sm_100f,
                                              sm_103f, ...). Set on all
                                              three profiles for 'f'
                                              architectures
  6      2    uint16    version_limit         Checked in sub_4709E0:
                                              if value > 0x101, returns
                                              error 25 (version too high).
                                              Always 0 for CUDA 13.0
                                              profiles
  8      8    char*     arch_name             "sm_100", "compute_100a",
                                              "lto_100f", etc.
 16      8    char*     display_name          Display/UI name. Same as
                                              arch_name for base archs.
                                              For LTO: points to the
                                              compute_ name string
 24      8    char*     isa_class_name        ISA family: "Turing",
                                              "Ampere", "Hopper", "Ada",
                                              "Blackwell". For suffix
                                              variants: literal string
                                              "(profile_sm_NNN)->isaClass".
                                              NULL for LTO profiles
 32      8    char*     canonical_name        Identity name. For sm_ and
                                              compute_: same as arch_name.
                                              For lto_: same as arch_name
                                              (the "lto_NNN" string)
 40      8    char*     cuda_arch_define      Preprocessor define passed
                                              to cicc/ptxas:
                                              "-D__CUDA_ARCH__=750",
                                              "-D__CUDA_ARCH__=100a0",
                                              etc.
 48      8    List*     compat_list_0         Linked list: cross-variant
                                              compatibility. Links real
                                              <-> virtual profiles and
                                              suffix variants to their
                                              base arch
 56      8    List*     compat_list_1         Linked list: same-generation
                                              family. Links all archs in
                                              the same generation (e.g.
                                              sm_80 links to sm_86/87/88)
 64      8    List*     compat_list_2         Linked list: additional
                                              cross-references. For
                                              compute_ profiles: links
                                              to corresponding real arch.
                                              For real profiles: links
                                              to compute_ arch
 72      8    ArchProfile*  virtual_ptr       For real (sm_) profiles:
                                              pointer to the compute_
                                              profile. For compute_
                                              profiles: self-pointer.
                                              For lto_ profiles: pointer
                                              to the compute_ profile
 80     16    xmm128    capability_vec_0      Generation base capabilities.
                                              Loaded from xmmword_1D40F10
                                              for all current archs
 96     16    xmm128    capability_vec_1      Extended feature set. Varies
                                              by architecture. Determines
                                              cross-arch finalization
                                              compatibility
112     16    xmm128    capability_vec_2      Architecture-specific
                                              features. Two distinct
                                              values: xmmword_1D40F30
                                              (pre-Blackwell) or
                                              xmmword_1D40F70 (Blackwell+)
128      1    uint8     reserved              Always 0 in CUDA 13.0
129      7    --        padding               Zero
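
The layout above can be transcribed into a C declaration as a cross-check of the offsets. This is a minimal sketch, assuming an LP64 target (8-byte pointers). The type names ArchProfileSketch and CompatList are illustrative and were not recovered from the binary, and each 128-bit vector is represented here as two uint64_t words purely for convenience.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Opaque stand-in for the object returned by sub_465020 (hypothetical name). */
typedef struct CompatList CompatList;

typedef struct ArchProfileSketch {
    uint8_t  is_virtual;                   /* +0   */
    uint8_t  is_lto;                       /* +1   */
    uint8_t  feature_byte_a;               /* +2   */
    uint8_t  finalization_class;           /* +3   */
    uint8_t  suffix_a;                     /* +4   */
    uint8_t  suffix_f;                     /* +5   */
    uint16_t version_limit;                /* +6   */
    const char *arch_name;                 /* +8   */
    const char *display_name;              /* +16  */
    const char *isa_class_name;            /* +24  */
    const char *canonical_name;            /* +32  */
    const char *cuda_arch_define;          /* +40  */
    CompatList *compat_list_0;             /* +48  */
    CompatList *compat_list_1;             /* +56  */
    CompatList *compat_list_2;             /* +64  */
    struct ArchProfileSketch *virtual_ptr; /* +72  */
    uint64_t capability_vec_0[2];          /* +80  */
    uint64_t capability_vec_1[2];          /* +96  */
    uint64_t capability_vec_2[2];          /* +112 */
    uint8_t  reserved;                     /* +128, then 7 bytes of padding */
} ArchProfileSketch;

/* Compile-time checks against the documented offsets (LP64 assumed). */
static_assert(offsetof(ArchProfileSketch, version_limit) == 6, "version_limit");
static_assert(offsetof(ArchProfileSketch, arch_name) == 8, "arch_name");
static_assert(offsetof(ArchProfileSketch, canonical_name) == 32, "canonical_name");
static_assert(offsetof(ArchProfileSketch, cuda_arch_define) == 40, "cuda_arch_define");
static_assert(offsetof(ArchProfileSketch, compat_list_2) == 64, "compat_list_2");
static_assert(offsetof(ArchProfileSketch, virtual_ptr) == 72, "virtual_ptr");
static_assert(offsetof(ArchProfileSketch, capability_vec_0) == 80, "vec 0");
static_assert(offsetof(ArchProfileSketch, capability_vec_2) == 112, "vec 2");
static_assert(offsetof(ArchProfileSketch, reserved) == 128, "reserved");
static_assert(sizeof(ArchProfileSketch) == 136, "total size");
```

If any documented offset were inconsistent with natural C alignment, the translation unit would fail to compile, which makes the declaration a cheap sanity check on the table.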

Capability Vectors (Offsets 80-127)

The three 128-bit vectors at offsets +80, +96, and +112 encode hardware capabilities as bitmasks. They are loaded from read-only data constants during sub_484F50 initialization. Suffix variants ('a', 'f') inherit vectors by SSE copy (_mm_loadu_si128) from their base arch rather than loading from rodata independently.

Vector Assignment by Architecture

Architecture	Vec 0 (+80)	Vec 1 (+96)	Vec 2 (+112)	Notes
sm_75	`1D40F10`	`1D40F20`	`1D40F30`	Turing: unique vec 1
sm_80	`1D40F10`	`1D40F40`	`1D40F30`	Ampere base
sm_86	`1D40F10`	`1D40F50`	`1D40F30`	Ampere: different vec 1
sm_87, sm_88	`1D40F10`	`1D40F50`	`1D40F30`	Inherit sm_86 pattern
sm_89	`1D40F10`	`1D40F60`	`1D40F30`	Ada: distinct vec 1
sm_90	`1D40F10`	`1D40F40`	`1D40F30`	Hopper: shares sm_80 vec 1
sm_100	`1D40F10`	`1D40F40`	`1D40F70`	Blackwell: new vec 2
sm_103	`1D40F10`	`1D40F40`	`1D40F70`	Shares sm_100 pattern
sm_110	`1D40F10`	`1D40F60`	`1D40F70`	Ada vec 1 + Blackwell vec 2
sm_120	`1D40F10`	`1D40F60`	`1D40F70`	Same as sm_110
sm_121	`1D40F10`	`1D40F60`	`1D40F70`	Same as sm_120

Key observations:

Vec 0 is identical for all architectures -- a universal base capability set.
Vec 1 has four distinct values, grouping architectures by instruction set similarity: Turing alone (F20), Ampere-base/Hopper/sm_100/sm_103 (F40), Ampere-extended (F50), and Ada/sm_110/sm_120/sm_121 (F60).
Vec 2 has two values: 1D40F30 for pre-Blackwell (sm_75 through sm_90a) and 1D40F70 for Blackwell-generation (sm_100+).
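
The per-column groupings can be checked mechanically by transcribing the assignment table and counting distinct addresses in each column. The sketch below does only that; the names VecRow, k_vec_table, and count_distinct are hypothetical, and the table is a copy of the rows listed above rather than data read from the binary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int      sm;      /* numeric SM version */
    uint32_t vec[3];  /* rodata addresses assigned to +80, +96, +112 */
} VecRow;

/* Transcribed from the "Vector Assignment by Architecture" table. */
static const VecRow k_vec_table[] = {
    {  75, { 0x1D40F10, 0x1D40F20, 0x1D40F30 } },
    {  80, { 0x1D40F10, 0x1D40F40, 0x1D40F30 } },
    {  86, { 0x1D40F10, 0x1D40F50, 0x1D40F30 } },
    {  87, { 0x1D40F10, 0x1D40F50, 0x1D40F30 } },
    {  88, { 0x1D40F10, 0x1D40F50, 0x1D40F30 } },
    {  89, { 0x1D40F10, 0x1D40F60, 0x1D40F30 } },
    {  90, { 0x1D40F10, 0x1D40F40, 0x1D40F30 } },
    { 100, { 0x1D40F10, 0x1D40F40, 0x1D40F70 } },
    { 103, { 0x1D40F10, 0x1D40F40, 0x1D40F70 } },
    { 110, { 0x1D40F10, 0x1D40F60, 0x1D40F70 } },
    { 120, { 0x1D40F10, 0x1D40F60, 0x1D40F70 } },
    { 121, { 0x1D40F10, 0x1D40F60, 0x1D40F70 } },
};
#define VEC_ROWS (sizeof k_vec_table / sizeof k_vec_table[0])

/* Count how many distinct rodata addresses appear in one column (0..2). */
size_t count_distinct(int column)
{
    assert(column >= 0 && column < 3);
    size_t distinct = 0;
    for (size_t i = 0; i < VEC_ROWS; i++) {
        int seen = 0;
        for (size_t j = 0; j < i; j++) {
            if (k_vec_table[j].vec[column] == k_vec_table[i].vec[column]) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            distinct++;
    }
    return distinct;
}
```

For the rows as transcribed, count_distinct returns 1, 4, and 2 for columns 0, 1, and 2 respectively.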

Capability Vector Usage

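The finalization function sub_470DA0 (can_finalize_with_caps) reads capability data through a pointer at offset +16 of its first argument. That argument is not an ArchProfile: callers pass local __m128i buffers, so the +16 offset belongs to a different structure and should not be read against the layout above. The function maps architecture family codes to bitmask values: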

switch (target_arch_code) {
    case 100: mask = 1;  break;  // 'd': sm_100 (datacenter Blackwell)
    case 103: mask = 8;  break;  // 'g': sm_103 (Blackwell Ultra)
    case 110: mask = 2;  break;  // 'n': sm_110 (Jetson Thor)
    case 121: mask = 64; break;  // 'y': sm_121 (DGX Spark)
    default:  return 0;          // not capable
}
if ((mask & *capability_ptr) != mask)
    return 0;  // target capabilities not satisfied
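
The mask selection and subset test can be written as a small standalone C function. This is a sketch under stated assumptions: the function names are hypothetical, the capability word is modeled as a plain uint32_t passed by value (the decompiled code reads it through a _DWORD pointer), and only the four cases listed above are covered.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* The decompiler prints the case labels as characters; they are ASCII codes. */
static_assert('d' == 100 && 'g' == 103 && 'n' == 110 && 'y' == 121,
              "character constants equal the numeric arch codes");

/* Map a target arch code to its required capability bit; 0 means unsupported. */
uint32_t capability_mask_for_arch(int target_arch_code)
{
    switch (target_arch_code) {
    case 100: return 1;   /* sm_100 */
    case 103: return 8;   /* sm_103 */
    case 110: return 2;   /* sm_110 */
    case 121: return 64;  /* sm_121 */
    default:  return 0;   /* not capable */
    }
}

/* Return 1 when every bit required by the target is present in caps. */
int caps_satisfied(int target_arch_code, uint32_t caps)
{
    uint32_t mask = capability_mask_for_arch(target_arch_code);
    if (mask == 0)
        return 0;
    return (mask & caps) == mask;
}
```

With this sketch, caps_satisfied(103, 0x9) succeeds because bit 3 is set, while caps_satisfied(103, 0x1) fails, and any arch code outside the four listed cases is rejected regardless of the capability word.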

Finalization Class Field (Byte 3)

The finalization_class byte at offset 3 indexes into the dword_1D40660[5] lookup table; index values above 4 are rejected with error 26. The sub_4709E0 (can_finalize_arch_check) function interprets the looked-up value as follows:

dword_1D40660 value	Meaning	Behavior
0	Default	suffix_a must be 0; sm_110 cross-arch not allowed
1	Base-only	suffix_a blocks finalization (if class=1 and suffix_a set, error 26)
2	Family-compatible	Same-decade rule: `target/10 == source/10` required
3	Cross-family	Allows cross-family within certain conditions (sm_110/sm_121 special cases)
4	Full-compat	Broadest compatibility; handles sm_110 cross-arch

The only architecture that explicitly sets byte 3 during initialization is sm_89 (Ada), where profile->byte[3] = 1 is assigned after the compatibility lists are built. All other architectures leave byte 3 at its zero-initialized value.

Feature Byte A (Byte 2)

Byte 2 at offset +2 is checked in sub_4709E0 with a specific bit-field test:

if (((profile->byte[2] >> 2) & 3) == 1 && (profile->byte[2] & 3) == 1)
    return 0;  // compatible
return 28;     // error

This extracts two 2-bit fields from byte 2:

Bits [1:0]: low field, must equal 1
Bits [3:2]: high field, must equal 1

The combined value 0x05 (binary 0b00000101) passes the check. This test appears only in the sm_100/sm_102/sm_103 cross-finalization path (source=100, target in range 102-103). In CUDA 13.0, byte 2 is zero-initialized for all profiles and no initialization code sets it, suggesting this field is either set dynamically at runtime or reserved for future use.
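
The two-field test is easy to exercise in isolation. The following sketch reproduces only the bit extraction and the 0/28 return values described above; the function and enum names are hypothetical.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

enum { FINALIZE_OK = 0, FINALIZE_ERR_FEATURE = 28 };

/* 0x05 has both 2-bit fields equal to 1. */
static_assert((0x05 & 3) == 1 && ((0x05 >> 2) & 3) == 1, "0x05 passes the gate");

/* Apply the byte-2 gate: both 2-bit fields must equal 1. */
int feature_byte_a_gate(uint8_t feature_byte_a)
{
    unsigned low  = feature_byte_a & 3u;         /* bits [1:0] */
    unsigned high = (feature_byte_a >> 2) & 3u;  /* bits [3:2] */
    return (high == 1 && low == 1) ? FINALIZE_OK : FINALIZE_ERR_FEATURE;
}
```

Because only bits [3:0] are inspected, the upper nibble does not affect the result: 0xF5 passes just as 0x05 does, while the zero-initialized value 0x00 yields error 28.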

Version Limit Field (Bytes 6-7)

The 16-bit word at offset 6 (*((_WORD *)profile + 3)) is checked early in sub_4709E0:

if (profile->version_limit > 0x101)
    return 25;  // error: version too high

All profiles are zero-initialized, so this check always passes in CUDA 13.0. The 0x101 threshold (257 decimal) suggests this was designed as a forward-compatibility guard -- if a profile's version exceeds the linker's known maximum, finalization is rejected.
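
The early exits of sub_4709E0 that this page documents (version guard, same-arch shortcut, and the bound check on the finalization class index) can be combined into one sketch. It deliberately omits the architecture aliasing step and every later rule; the function name, parameter names, and the CHECK_CONTINUE sentinel are hypothetical and exist only to mark where the remaining checks would begin.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

enum {
    CHECK_OK          = 0,
    CHECK_ERR_VERSION = 25,
    CHECK_ERR_CLASS   = 26,
    CHECK_CONTINUE    = -1   /* hypothetical: fall through to later rules */
};

static_assert(0x101 == 257, "threshold in decimal");

/* Model the first three decisions, in the order they appear in the decompilation. */
int early_finalize_checks(uint16_t version_limit,
                          uint8_t  finalization_class,
                          int      arch_a,
                          int      arch_b)
{
    if (version_limit > 0x101)
        return CHECK_ERR_VERSION;   /* version too high */
    if (arch_a == arch_b)
        return CHECK_OK;            /* same architecture: always compatible */
    if (finalization_class > 4)
        return CHECK_ERR_CLASS;     /* index outside dword_1D40660[5] */
    return CHECK_CONTINUE;
}
```

Note that a value of exactly 0x101 still passes the guard, since the comparison is strictly greater-than.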

Linked List Heads (Offsets 48-64)

Each of the three linked list pointers at offsets 48, 56, and 64 is a full hash-set object created by sub_465020 with string hashing and comparison functions. They are not simple singly-linked lists but hash-based sets that support O(1) membership testing. The sub_465720 (list_append) function used to populate them is the same hash-set insertion function used throughout nvlink.

compat_list_0 (Offset 48): Cross-Variant Links

For base architectures, this list connects the real profile to its virtual counterpart and self:

sm_100.compat_list_0 -> { compute_100, sm_100 }
compute_100.compat_list_0 -> { sm_100 }

For suffix variants, the base arch is also linked:

sm_100a.compat_list_0 -> { compute_100a, sm_100a, sm_100 }
sm_100f.compat_list_0 -> { compute_100f, sm_100f, sm_100 }

compat_list_1 (Offset 56): Same-Generation Family

Links all architectures within the same generation. For Ampere:

sm_80.compat_list_1 -> { sm_80, sm_86, sm_87, sm_88, sm_89 }

The sm_89 (Ada) profile is appended to sm_80's family list despite being classified as "Ada" rather than "Ampere" -- this reflects hardware backward compatibility.

For Blackwell, both intra-family and cross-family links exist:

sm_120.compat_list_1 -> { sm_120, sm_121, sm_121a }
sm_121.compat_list_1 -> { sm_121, sm_120 }

compat_list_2 (Offset 64): Compute-to-Real Mapping

For compute_ profiles, this list links to the corresponding real (sm_) profile. For real profiles, it links to the compute_ profile. This provides bidirectional real<->virtual navigation.

Virtual Pointer (Offset 72)

The virtual_ptr field at offset 72 establishes the primary profile cross-reference:

Profile Type	virtual_ptr Value
Real (sm_)	Pointer to corresponding compute_ profile
Virtual (compute_)	Self-pointer (points to itself)
LTO (lto_)	Pointer to corresponding compute_ profile

The self-pointer for compute_ profiles allows code that follows profile->virtual_ptr to always reach a compute_ profile regardless of the input profile type. This simplifies the finalization pipeline, which needs the compute_ profile's cuda_arch_define string.
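
A reduced model shows why the self-pointer removes a special case. The sketch below keeps only two fields and wires one sm_/compute_/lto_ triple the way the table describes; the Profile type and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Profile {
    const char     *arch_name;    /* offset 8 in the real struct */
    struct Profile *virtual_ptr;  /* offset 72 in the real struct */
} Profile;

/* Wire up one triple as described: real and LTO point at the compute_
 * profile, and the compute_ profile points at itself. */
void link_triple(Profile *real, Profile *virt, Profile *lto)
{
    assert(real != NULL && virt != NULL && lto != NULL);
    real->virtual_ptr = virt;
    virt->virtual_ptr = virt;
    lto->virtual_ptr  = virt;
}

/* One dereference reaches the compute_ profile for any profile kind. */
const Profile *compute_profile_of(const Profile *p)
{
    return p ? p->virtual_ptr : NULL;
}
```

Callers never need to branch on is_virtual or is_lto before following the pointer, which is the simplification the paragraph above refers to.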

Destructor: sub_484D00

void ArchProfile::destroy(ArchProfile* profile) {
    char* arch_name = profile->arch_name;  // offset 8

    // Remove from global hash map
    LinkerHash::remove(qword_2A5F8D8, arch_name);

    // Destroy three linked list heads
    list_destroy(profile->compat_list_0, arch_name);   // offset 48
    list_destroy(profile->compat_list_1, arch_name);   // offset 56
    list_destroy(profile->compat_list_2, arch_name);   // offset 64

    // Free the profile allocation itself
    free(profile, arch_name);
}

The destructor is called indirectly through sub_484D40 (the database teardown function registered via atexit). Teardown walks the hash map calling destroy on each entry, then destroys the hash map itself and both ordered lists.

Database Teardown: sub_484D40

void ArchProfileDB::teardown() {
    if (!byte_2A5F8D0) return;  // not initialized
    byte_2A5F8D0 = 0;

    // Walk hash map, call destroy on each value, then destroy map
    LinkerHash::for_each(qword_2A5F8D8, ArchProfile::destroy, 0);
    LinkerHash::destroy(qword_2A5F8D8, ArchProfile::destroy);
    qword_2A5F8D8 = 0;

    // Destroy ordered lists
    OrderedList::destroy(qword_2A5F8E0, ArchProfile::destroy);
    OrderedList::destroy(qword_2A5F8E8, ArchProfile::destroy);
}

Profile-to-ParseResult: sub_486DC0

Given a profile pointer (obtained from the hash map), sub_486DC0 constructs a 12-byte ArchParseResult:

ArchParseResult* profile_to_parse_result(ArchProfile* profile) {
    if (!profile) return NULL;

    ArchParseResult* result = alloc(allocator, 12);
    memset(result, 0, 12);

    result->is_compute_or_lto = profile->is_virtual;       // byte[4] <- byte[0]

    char* name = profile->arch_name;                        // offset 8
    uint32_t sm_num = arch_extract_sm_number(name);

    bool is_sass_capable;
    if (arch_is_virtual(name)) {
        is_sass_capable = false;
    } else if (sm_num >= dword_2A5F8C8) {
        is_sass_capable = (memcmp(name, "sass_", 5) != 0);
    } else {
        is_sass_capable = false;
    }
    result->is_sass_capable = is_sass_capable;              // byte[5]
    result->sm_number = arch_extract_sm_number(name);       // dword[0]
    result->has_suffix_a = arch_has_suffix_a(name);         // byte[7]
    result->has_suffix_f = arch_has_suffix_f(name);         // byte[8]

    return result;
}

The ArchParseResult layout:

ArchParseResult (12 bytes)
==========================================================================
Offset  Size  Type      Field               Description
--------------------------------------------------------------------------
  0      4    uint32    sm_number           Numeric SM (75, 80, 100, ...)
  4      1    uint8     is_compute_or_lto   1 if virtual profile
  5      1    uint8     is_sass_capable     1 if real + sm >= 100 + not "sass_"
  6      1    uint8     (unused)            Always 0 from this path
  7      1    uint8     has_suffix_a        1 if name ends with 'a'
  8      1    uint8     has_suffix_f        1 if name ends with 'f'
  9-11   3    --        padding             Zero
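
The 12-byte layout and the name-derived fields can be illustrated with a small parser. This is a sketch only: the real helpers (arch_extract_sm_number, arch_has_suffix_a, arch_has_suffix_f) were not decompiled for this page, so parse_arch_name below is a hypothetical stand-in that takes the digits after the last underscore and an optional trailing 'a' or 'f'. It also infers the virtual flag from the name prefix, whereas sub_486DC0 copies it from the profile, and it leaves is_sass_capable at zero because that depends on the dword_2A5F8C8 threshold.

```c
#include <assert.h>   /* static_assert (C11), assert */
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint32_t sm_number;          /* +0 */
    uint8_t  is_compute_or_lto;  /* +4 */
    uint8_t  is_sass_capable;    /* +5 (not modeled here) */
    uint8_t  unused;             /* +6 */
    uint8_t  has_suffix_a;       /* +7 */
    uint8_t  has_suffix_f;       /* +8 */
    uint8_t  padding[3];         /* +9..11 */
} ArchParseResultSketch;

static_assert(sizeof(ArchParseResultSketch) == 12, "documented size");

/* Hypothetical parser: returns 0 on success, -1 if no number is found. */
int parse_arch_name(const char *name, ArchParseResultSketch *out)
{
    assert(name != NULL && out != NULL);
    const char *underscore = strrchr(name, '_');
    if (underscore == NULL || !isdigit((unsigned char)underscore[1]))
        return -1;

    char *end = NULL;
    unsigned long sm = strtoul(underscore + 1, &end, 10);

    memset(out, 0, sizeof *out);
    out->sm_number = (uint32_t)sm;
    out->is_compute_or_lto = (strncmp(name, "compute_", 8) == 0 ||
                              strncmp(name, "lto_", 4) == 0);
    out->has_suffix_a = (end[0] == 'a' && end[1] == '\0');
    out->has_suffix_f = (end[0] == 'f' && end[1] == '\0');
    return 0;
}
```

Under these assumptions, "sm_100a" parses to SM 100 with the 'a' flag set, and "compute_103f" parses to SM 103 with the virtual and 'f' flags set.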

Key Functions

Address	Size	Name	Role
`sub_484DB0`	400 B	`ArchProfile::create`	Constructor: allocates 136 bytes, fills fields, creates list heads
`sub_484D00`	56 B	`ArchProfile::destroy`	Destructor: removes from hash map, destroys lists, frees
`sub_484D40`	112 B	`ArchProfileDB::teardown`	atexit handler: destroys all profiles and global state
`sub_484F50`	53,974 B	`ArchProfileDB::init`	Lazy singleton initializer: registers all 22+ architectures
`sub_486DC0`	528 B	`profile_to_parse_result`	Extracts a 12-byte parse result from a profile pointer
`sub_4709E0`	2,609 B	`can_finalize_arch_check`	Checks arch compatibility for finalization (reads bytes 2-4, word 6)
`sub_470DA0`	2,074 B	`can_finalize_with_caps`	Checks capability bitmask compatibility (reads offset +16)

Cross-References

Architecture Profiles (overview) -- database initialization sequence, complete architecture table, name parsing
Compatibility -- finalization compatibility rules
Finalize -- how profiles flow through the finalization pipeline
CLI Options -- --arch option triggers profile lookup

Confidence Assessment

Verified against decompiled sub_484DB0_0x484db0.c (constructor), sub_484F50_0x484f50.c (database init, lines 240-1280), sub_4709E0_0x4709e0.c (finalization check), sub_470DA0_0x470da0.c (capability mask check), sub_484D00_0x484d00.c (destructor), sub_484D40_0x484d40.c (teardown), and nvlink_strings.json.

Struct Size and Allocation

Claim	Confidence	Evidence
ArchProfile struct size = 136 bytes	HIGH	`sub_484DB0` line 24: `v14 = sub_4307C0(v11, 136);` and OOM fallback `sub_45CAC0(v11, 136, ...)`
Heap/arena allocation via `sub_4307C0`	HIGH	`sub_484DB0` line 23-24: `v11 = *((_QWORD *)sub_44F410(a1, a2) + 3); v14 = sub_4307C0(v11, 136);`
Constructor at `0x484DB0`	HIGH	`sub_484DB0_0x484db0.c` header: `// Address: 0x484db0`
Constructor takes 7 arguments	HIGH	`sub_484DB0_0x484db0.c` line 4-11: `(__int64 a1, pthread_mutexattr_t *a2, pthread_mutexattr_t *a3, __int64 a4, __int64 a5, __int64 a6, __int64 a7)`
8-byte-aligned memset-to-zero for body	HIGH	`sub_484DB0` line 33-36: `memset(((unsigned __int64)v14 + 10) & 0xFFFFFFFFFFFFFFF8LL, 0, 8LL * (...))`

Byte Fields (offsets 0-7)

Claim	Confidence	Evidence
Byte 0: `is_virtual` flag stored from a1	HIGH	`sub_484DB0` line 38: `*(_BYTE *)v14 = v8;` where `v8 = a1`. Initialization in `sub_484F50` always passes 0 (for `sm_`) or 1 (for `compute_`/`lto_`)
Byte 1: `is_lto` flag stored from a2	HIGH	`sub_484DB0` line 39: `*((_BYTE *)v14 + 1) = (_BYTE)a2;`. `sub_484F50` passes 0 for `sm_`/`compute_` and 1 for `lto_`
Byte 2: `feature_byte_a` with two 2-bit fields, zero-init in CUDA 13.0	HIGH	`sub_4709E0` line 143: `if ((((unsigned __int8)a1[2] >> 2) & 3) == 1 && (a1[2] & 3) == 1) return 0;`. No write to byte 2 observed in `sub_484F50`
Byte 3: `finalization_class` index into `dword_1D40660[5]`	HIGH	`sub_4709E0` lines 56-60: `v8 = (unsigned __int8)a1[3]; result = 26; if ((unsigned __int8)v8 > 4u) return result; v9 = dword_1D40660[v8];`
sm_89 is the ONLY arch setting byte 3 = 1	HIGH	`sub_484F50` line 511: `v47->m128i_i8[3] = 1;` (only occurrence in entire init; `v47` is sm_89 real profile)
Byte 4: `suffix_a` flag	HIGH	`sub_484F50` line 682: `v79->m128i_i8[4] = 1;` (sm_100a); line 683: `*(_BYTE *)(v82 + 4) = 1;` (compute_100a). `sub_4709E0` line 65: `v10 = a1[4]; if (v10) { if (v9 == 1) return result; }`
Byte 5: `suffix_f` flag	HIGH	`sub_484F50` line 732: `v88->m128i_i8[5] = 1;` (sm_100f); line 733: `*(_BYTE *)(v91 + 5) = 1;` (compute_100f); line 734: `v94[5] = 1;` (lto_100f)
Version limit at word offset 3 (bytes 6-7)	HIGH	`sub_4709E0` line 51: `if ( *((_WORD *)a1 + 3) > 0x101u ) return result;` with `result = 25`
Error code 25 for version overflow	HIGH	`sub_4709E0` line 50-52
Word at offset 2 re-zeroed after `v14[4] = a7`	HIGH	`sub_484DB0` line 47: `*((_WORD *)v14 + 1) = 0;`
Word at offset 4 re-zeroed at end	HIGH	`sub_484DB0` line 59: `*((_WORD *)v14 + 2) = 0;`

Pointer Fields (offsets 8-40)

Claim	Confidence	Evidence
`arch_name` at offset 8 (qword, from a3)	HIGH	`sub_484DB0` line 40: `v14[1] = a3;`
`display_name` at offset 16 (qword, from a4)	HIGH	`sub_484DB0` line 41: `v14[2] = a4;`
`isa_class_name` at offset 24 (qword, from a5)	HIGH	`sub_484DB0` line 37: `v14[3] = v16;` where `v16 = a5` (line 25)
`canonical_name` at offset 32 (qword, from a7)	HIGH	`sub_484DB0` line 48: `v14[4] = a7;`
`cuda_arch_define` at offset 40 (qword, from a6)	HIGH	`sub_484DB0` line 42: `v14[5] = a6;`
Argument ordering anomaly (a6 to offset 40, a7 to offset 32)	HIGH	Constructor literally writes `v14[4] = a7; v14[5] = a6;` in that order
isa_class strings "Turing", "Ampere", "Hopper", "Blackwell" present in binary	HIGH	`nvlink_strings.json` lines 17340 (Turing), 17620 (Ampere), 18410 (Hopper), 18850 (Blackwell). Passed directly as a5 in `sub_484F50` lines 251, 293, 517, 609
"Ada" ISA class string	MEDIUM	`sub_484F50` lines 468, 476 pass literal `(__int64)"Ada"` to `sub_484DB0`, but the string does not appear as a standalone entry in `nvlink_strings.json` (likely merged with another string in the extraction)
Suffix-variant isa_class = `"(profile_sm_NNN)->isaClass"`	HIGH	`sub_484F50` lines 650, 658, 699, 707, 792, 800, etc. pass literal `"(profile_sm_NNN)->isaClass"` strings
LTO variants pass NULL isa_class	HIGH	`sub_484F50` lines 273, 315, 489, 539, 630, 672, 722, etc. all pass `0` as the a5 argument for `lto_*` calls
LTO variants pass `compute_*` as display_name (a4)	HIGH	`sub_484F50` line 272: `(__int64)"compute_75"` for `lto_75`; line 314 for `lto_80`; line 488 for `lto_89`; etc.

List Heads (offsets 48-64)

Claim	Confidence	Evidence
`compat_list_0` at offset 48 (v14[6])	HIGH	`sub_484DB0` line 43: `v14[6] = sub_465020(sub_44E1C0, sub_44E1E0, 8u);`. Destructor `sub_484D00` line 10: `sub_4650A0(a1[6], v1);`
`compat_list_1` at offset 56 (v14[7])	HIGH	`sub_484DB0` line 44: `v14[7] = sub_465020(...);`. Destructor line 11: `sub_4650A0(a1[7], v1);`
`compat_list_2` at offset 64 (v14[8])	HIGH	`sub_484DB0` line 45: `v14[8] = sub_465020(...);`. Destructor line 12: `sub_4650A0(a1[8], v1);`
Three list heads use string hash/compare (sub_44E1C0/sub_44E1E0)	HIGH	Exact function pointers `sub_44E1C0`, `sub_44E1E0` in all three `sub_465020` calls
Initial list capacity = 8	HIGH	Third arg `8u` in all three calls
compat_list_2 (offset 64) used for compute-to-real linking	HIGH	`sub_484F50` line 279: `sub_465720(*(_QWORD *)(v9 + 64), (unsigned __int64)v6);` where `v9` = compute_75 and `v6` = sm_75

Virtual Pointer (offset 72)

Claim	Confidence	Evidence
`virtual_ptr` at offset 72 (index [9] in `_QWORD*`)	HIGH	`sub_484F50` line 264: `v6[4].m128i_i64[1] = (__int64)v7;` (offset 72 on `sm_75`, points to `compute_75`)
Compute profile's virtual_ptr is self	HIGH	`sub_484F50` line 265: `v7[9] = v7;` where `v7` is `compute_75` (offset 72 = self)
LTO profile's virtual_ptr points to compute profile	HIGH	`sub_484F50` line 277: `*((_QWORD *)v10 + 9) = v9;` where `v10` is `lto_75` and `v9` is `compute_75`
Offset 72 is NOT written by the constructor	HIGH	`sub_484DB0` explicitly writes only offsets 0, 1, 2-5 (zeroing), 8, 16, 24, 32, 40, 48, 56, 64, and 128; offset 72 is left at the zero written by the memset. Field populated by `sub_484F50` after construction

Capability Vectors (offsets 80-127)

Claim	Confidence	Evidence
Three XMM128 vectors at offsets 80, 96, 112	HIGH	`sub_484F50` line 285: `v6[5] = _mm_load_si128((const __m128i *)&xmmword_1D40F10);` (offset 80); line 286: `v6[6] = si128;` (offset 96); line 287: `v6[7] = v13;` (offset 112)
sm_75 vec addresses: F10, F20, F30	HIGH	`sub_484F50` lines 283-287: F20 -> `v6[6]`, F30 -> `v6[7]`, F10 -> `v6[5]`
sm_80 vec addresses: F10, F40, F30	HIGH	`sub_484F50` lines 325-331: F30 -> `v14[7]`, F10 -> `v14[5]`, F40 -> `v14[6]` (via v212)
sm_86 vec addresses: F10, F50, F30	HIGH	`sub_484F50` lines 370-372: F50 -> `v22[6]`, F10 -> `v22[5]`, F30 -> `v22[7]`
sm_89 vec addresses: F10, F60, F30	HIGH	`sub_484F50` lines 499-505: F60 -> `v47[6]`, F30 -> `v47[7]`, F10 -> `v47[5]`
sm_90 vec addresses: F10, F40, F30	HIGH	`sub_484F50` lines 549-554: F30 -> `v56[7]`, F10 -> `v56[5]`, F40 (via `v212`) -> `v56[6]`
sm_100 vec addresses: F10, F40, F70	HIGH	`sub_484F50` lines 640-644: F10 -> `v73[5]`, `v208` (which holds F40) -> `v73[6]`, F70 -> `v73[7]`
sm_103 vec addresses: F10, F40, F70	HIGH	`sub_484F50` lines 922-926: F10 -> `v128`, F40 -> `v123[6]`, F70 -> `v205` (then into vec 2)
sm_120/sm_121 vec addresses: F10, F60, F70	HIGH	`sub_484F50` line 1064: F60 -> `v153` (loaded into sm_120 vec 1)
Suffix variants inherit vecs via `_mm_loadu_si128` copy	HIGH	`sub_484F50` line 595-599: `v71 = _mm_loadu_si128(v56 + 6); ... v64[5] = _mm_loadu_si128(v56 + 5); v64[6] = v71; v64[7] = v72;` (sm_90a inherits from sm_90)
`sub_470DA0` reads capability via pointer at +16 in its argument buffer	MEDIUM	`sub_470DA0` line 91: `v11 = (_DWORD *)(a1 + 16);` -- but this `a1` is NOT an ArchProfile (callers pass local `__m128i` buffers like `v204`, `v205`, `v387`). The claim "profile offset +16" in the wiki body is misleading: this is a different struct
Capability mask values (d=1, g=8, n=2, y=64)	HIGH	`sub_470DA0` lines 96-108: `case 'd': v12 = 1; case 'g': v12 = 8; case 'n': v12 = 2; case 'y': v12 = 64;`

Registration & Globals

Claim	Confidence	Evidence
Byte 128 zero-init	HIGH	`sub_484DB0` line 58: `*((_BYTE *)v14 + 128) = 0;`
Global hash map at `qword_2A5F8D8`	HIGH	`sub_484F50` line 240: `qword_2A5F8D8 = (__int64)sub_4489C0(...)`; also `sub_448E70(qword_2A5F8D8, "sm_75", ...)` line 267
Ordered list `qword_2A5F8E0` (real+virtual, 128-cap)	HIGH	`sub_484DB0` line 46: `v17 = qword_2A5F8E0;` and line 51: `sub_44FE60(qword_2A5F8E0, a3);`. Created at `sub_484F50` line 244: `sub_44FB20(128, ...)`
Ordered list `qword_2A5F8E8` (non-virtual only)	HIGH	`sub_484DB0` lines 52-56: `if (!v8) { ... sub_44FE60(qword_2A5F8E8, a3); }`. Created at `sub_484F50` line 245
Resize check via `sub_4504A0` + `sub_44FF90`	HIGH	`sub_484DB0` line 49-50: `if (sub_4504A0(v17)) sub_44FF90(qword_2A5F8E0, (pthread_mutexattr_t *)0x2C);`
Resize step = 0x2C (44)	HIGH	Literal `0x2C` passed to `sub_44FF90` lines 50 and 55

Destructor & Teardown

Claim	Confidence	Evidence
Destructor at `0x484D00`, reads arch_name from offset 8	HIGH	`sub_484D00` line 8: `v1 = a1[1];` (offset 8 = arch_name)
Destructor removes from hash map qword_2A5F8D8	HIGH	`sub_484D00` line 9: `sub_449860(qword_2A5F8D8, v1);`
Destructor destroys all three list heads	HIGH	`sub_484D00` lines 10-12: `sub_4650A0(a1[6], v1); sub_4650A0(a1[7], v1); sub_4650A0(a1[8], v1);`
Destructor frees profile allocation	HIGH	`sub_484D00` line 13: `sub_431000((unsigned __int64)a1, v1);`
Teardown at `0x484D40` guarded by `byte_2A5F8D0`	HIGH	`sub_484D40` lines 8-10: `if (byte_2A5F8D0) { byte_2A5F8D0 = 0; ... }`
Teardown destroys hash map then both ordered lists	HIGH	`sub_484D40` lines 11-15: `sub_448DA0(qword_2A5F8D8, sub_484D00, 0); sub_448A40(qword_2A5F8D8, sub_484D00); sub_44FB90(qword_2A5F8E0, sub_484D00); sub_44FB90(qword_2A5F8E8, sub_484D00);`

Finalization Check Semantics (sub_4709E0)

Claim	Confidence	Evidence
`can_finalize_arch_check` at `sub_4709E0`	HIGH	Decompiled file exists with matching address header
Error codes 24 through 30 (26 also written as 0x1A)	HIGH	All return values visible in `sub_4709E0` branches
Architecture aliasing: 104->120, 130->107, 101->110	HIGH	`sub_4709E0` lines 23-46: two identical switches on `a2` and `a3`
Same-arch self-check returns 0	HIGH	`sub_4709E0` line 54: `if (v4 == a3) return result;` with `result = 0`
Error path for cross-arch sm_110	HIGH	`sub_4709E0` lines 71-77: `v11 = v4 == 110 || a3 == 110; if (v11) { if (v9 != 4) return 26; }`
Same-decade rule `a3/10 == v4/10`	HIGH	`sub_4709E0` line 95: `if (a3 / 10 != v4 / 10) return 29;` and line 106
sm_100/sm_103 feature_byte_a gate	HIGH	`sub_4709E0` lines 135-145: `if (v4 != 100) {...} if ((unsigned int)(a3 - 102) <= 1) { if ((((unsigned __int8)a1[2] >> 2) & 3) == 1 && (a1[2] & 3) == 1) return 0; return 28; }`
`CAN_FINALIZE_DEBUG` env var	HIGH	`sub_4709E0` line 18: `v6 = getenv("CAN_FINALIZE_DEBUG");`. Also `sub_470DA0` line 16

Discrepancies With Other Wiki Pages

Finding	Confidence	Evidence
`targets/arch-profiles.md` has offsets 32/40 swapped (says 32=cuda_arch_define, 40=canonical_name)	HIGH	Decompiled constructor definitively shows `v14[4] = a7 (canonical_name)` and `v14[5] = a6 (cuda_arch_define)`. THIS PAGE is correct; the targets page needs fixing
`targets/arch-profiles.md` says offset 64 = virtual_profile_ptr, offset 72 = lto_profile_ptr	HIGH	Constructor and `sub_484F50` show offset 64 = list head (compat_list_2) and offset 72 = virtual_ptr. THIS PAGE is correct; the targets page needs fixing

Unverified / Approximate Claims

Claim	Confidence	Evidence
Constructor size claim "400 B"	MEDIUM	Approximate; exact byte size not re-measured in this pass
Database init at `0x484F50` (53,974 B)	MEDIUM	File exists and is 1,330 decompiled lines; exact binary size not re-measured
`profile_to_parse_result` at `sub_486DC0`	MEDIUM	File `sub_486DC0_0x486dc0.c` exists but not decompiled in this pass
ArchParseResult size = 12 bytes	MEDIUM	Derived from `sub_486DC0` allocation; not re-verified here
`dword_1D40660[5]` lookup table values (0-4 meanings)	MEDIUM	Table exists (referenced by `sub_4709E0`); rodata content not re-extracted in this pass
Per-vector address table mapping to F10-F70 rodata constants	HIGH	All vector assignments directly observed in `sub_484F50` as `_mm_load_si128` loads from `xmmword_1D40F10` through `xmmword_1D40F70`

nvlink Reverse Engineering Reference