SM103 / SM110 / SM120 / SM121
nvlink v13.0.88 registers four additional Blackwell-family architectures beyond the base SM100 datacenter target. All four share the "Blackwell" family name string at 0x1D40B6E, all use the 128-bit SASS instruction encoding defined by the SM100 ISA, and all reuse SM100 infrastructure for encoding, decoding, and descriptor initialization. The differences between them are confined to three areas: the architecture profile metadata (SM number, __CUDA_ARCH__ define, variant suffixes), the finalization compatibility remapping table, and the capability bitmask used to gate feature subsets during JIT re-finalization.
This page documents the profile registration, dispatch table sharing, finalization remapping, and capability bitmask system for sm_103, sm_110, sm_120, and sm_121 as observed in the binary.
Architecture Identity Matrix
| Architecture | Product Line | __CUDA_ARCH__ | Family String | ISA Class String | Same-Decade Group |
|---|---|---|---|---|---|
| sm_100 | Datacenter Blackwell (B200/B100) | 1000 | "Blackwell" | "(profile_sm_100)->isaClass" | 10 |
| sm_103 | Blackwell Ultra (GB300) | 1030 | "Blackwell" | "(profile_sm_103)->isaClass" | 10 |
| sm_110 | Jetson Thor | 1100 | "Blackwell" | "(profile_sm_110)->isaClass" | 11 |
| sm_120 | Consumer RTX 50xx / Enterprise Pro | 1200 | "Blackwell" | "(profile_sm_120)->isaClass" | 12 |
| sm_121 | DGX Spark | 1210 | "Blackwell" | "(profile_sm_121)->isaClass" | 12 |
Every architecture stores its ISA class as a (profile_sm_NNN)->isaClass string rather than a hardcoded human-readable name like "Hopper" or "Turing". This suggests that the ISA class is resolved at runtime through the profile object's field rather than being a compile-time constant. The family name "Blackwell" is shared across all five 1xx architectures: it is a single rodata string with xrefs from all five registration blocks in sub_484F50.
Sub-variant Registration
Each of the four architectures registers three sub-variants (base, a, f) through the profile database initializer sub_484F50. The a suffix enables the full accelerated feature set; the f suffix marks the forward-compatible subset. For each base architecture, the database creates nine profile objects: three real (sm_NNN, sm_NNNa, sm_NNNf), three virtual (compute_NNN, compute_NNNa, compute_NNNf), and three LTO (lto_NNN, lto_NNNa, lto_NNNf).
| Architecture | Base Profiles | a Profiles | f Profiles |
|---|---|---|---|
| SM103 | sm_103, compute_103, lto_103 | sm_103a, compute_103a, lto_103a | sm_103f, compute_103f, lto_103f |
| SM110 | sm_110, compute_110, lto_110 | sm_110a, compute_110a, lto_110a | sm_110f, compute_110f, lto_110f |
| SM120 | sm_120, compute_120, lto_120 | sm_120a, compute_120a, lto_120a | sm_120f, compute_120f, lto_120f |
| SM121 | sm_121, compute_121, lto_121 | sm_121a, compute_121a, lto_121a | sm_121f, compute_121f, lto_121f |
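The naming scheme in the table above is regular enough to generate. The following is a minimal sketch of that scheme, not a reconstruction of sub_484F50; the helper names `profile_names` and `cuda_arch_define` are hypothetical, and the "SM number times ten" rule is inferred from the four `-D__CUDA_ARCH__` values listed in the identity matrix.

```python
from itertools import product

# Prefixes for the real, virtual, and LTO profile kinds named in the text.
KINDS = ("sm", "compute", "lto")
# Sub-variant suffixes: base (empty), accelerated "a", forward-compatible "f".
SUFFIXES = ("", "a", "f")


def profile_names(arch: int) -> list[str]:
    """Return the nine profile names registered for one base architecture."""
    return [f"{kind}_{arch}{suffix}" for kind, suffix in product(KINDS, SUFFIXES)]


def cuda_arch_define(arch: int) -> str:
    """Build the -D__CUDA_ARCH__ flag (assumed rule: SM number times ten)."""
    return f"-D__CUDA_ARCH__={arch * 10}"


if __name__ == "__main__":
    for arch in (103, 110, 120, 121):
        print(cuda_arch_define(arch), profile_names(arch))
```

Running the script prints one define and nine names per architecture, which can be compared against the string tables on this page.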
Registration Order
The architectures are registered in sub_484F50 in the following order within the Blackwell block, determined by the address order of their string references:
1. sm_100 / sm_100a / sm_100f (xrefs at 0x485A..)
2. sm_110 / sm_110a / sm_110f (xrefs at 0x485E..)
3. sm_103 / sm_103a / sm_103f (xrefs at 0x4861..)
4. sm_120 / sm_120a / sm_120f (xrefs at 0x4865..)
5. sm_121 / sm_121a / sm_121f (xrefs at 0x4869..)
The ordering is notable: sm_103 is registered after sm_110, not after sm_100 as the numbering might suggest. The xref address ordering in sub_484F50 confirms the registration order itself. It probably also reflects the chronological order in which these targets were added to the compiler toolchain, with sm_110 (Jetson Thor) defined before sm_103 (GB300 Blackwell Ultra), although the binary alone cannot prove that.
Rodata String Addresses
| String | Address | Registering Function |
|---|---|---|
| "sm_103" | 0x1D40CDE | sub_484F50 |
| "sm_103a" | 0x1D40D14 | sub_484F50 |
| "sm_103f" | 0x1D40D3A | sub_484F50 |
| "(profile_sm_103)->isaClass" | 0x1D40CF9 | sub_484F50 |
| "-D__CUDA_ARCH__=1030" | 0x1D40CC9 | sub_484F50 |
| "sm_110" | 0x1D40C2B | sub_484F50 |
| "sm_110a" | 0x1D40C61 | sub_484F50 |
| "sm_110f" | 0x1D40C87 | sub_484F50 |
| "(profile_sm_110)->isaClass" | 0x1D40C46 | sub_484F50 |
| "-D__CUDA_ARCH__=1100" | 0x1D40C16 | sub_484F50 |
| "sm_120" | 0x1D40D91 | sub_484F50 |
| "sm_120a" | 0x1D40DBC | sub_484F50 |
| "sm_120f" | 0x1D40DED | sub_484F50 |
| "(profile_sm_120)->isaClass" | 0x1D40DAC | sub_484F50 |
| "-D__CUDA_ARCH__=1200" | 0x1D40D7C | sub_484F50 |
| "sm_121" | 0x1D40E44 | sub_484F50 |
| "sm_121a" | 0x1D40E6F | sub_484F50 |
| "sm_121f" | 0x1D40EA0 | sub_484F50 |
| "(profile_sm_121)->isaClass" | 0x1D40E5F | sub_484F50 |
| "-D__CUDA_ARCH__=1210" | 0x1D40E2F | sub_484F50 |
Dispatch Table Sharing
The SM dispatch table initializer sub_15C0CE0 registers seven callback function pointers per architecture into hash maps (qword_2A644B8 through qword_2A64488). The callbacks serve these roles:
| Slot | Hash Map | Role |
|---|---|---|
| 0 | qword_2A644B8 | cpf_optx (control-program-flow optimization) |
| 1 | (implicit) | nv.info attribute emitter |
| 2 | (implicit) | Resource usage table |
| 3 | qword_2A644A8 | Instruction encoding table |
| 4 | qword_2A644A0 | Compute capability byte array |
| 5 | qword_2A64490 | Perf-stats handler |
| 6 | qword_2A64488 | Codegen option handler |
Within each architecture family, all sub-variants share identical function pointers:
| Architecture | Sharing | Encoding Table Function |
|---|---|---|
| sm_100, sm_100a, sm_100f | All 7 slots identical | sub_15C3840 |
| sm_103, sm_103a, sm_103f | All 7 slots identical | sub_15C3630 |
| sm_110, sm_110a, sm_110f | All 7 slots identical | sub_15C3950 |
| sm_120, sm_120a, sm_120f | All 7 slots identical | sub_15C1D20 |
| sm_121, sm_121a, sm_121f | All 7 slots identical | sub_15C3410 |
The encoding table accessor functions are small stubs (~100 bytes each) that return architecture-specific instruction encoding parameters. Each architecture gets its own accessor despite sharing the same underlying 128-bit instruction format. The differences between accessors appear to encode per-architecture feature availability flags (for example, which MMA variants are supported or which memory ordering modes exist); the parameter layouts have not been fully decoded.
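The sharing pattern for slot 3 can be modeled as a name-keyed map in which every sub-variant of an architecture resolves to the same accessor. This is an illustrative sketch only: the function `build_encoding_map` is hypothetical, accessors are represented by their symbol names as strings, and the real registration goes through hash-map inserts in sub_15C0CE0.

```python
# Encoding-table accessor per base architecture, as listed in the table above.
ENCODING_ACCESSOR = {
    "sm_100": "sub_15C3840",
    "sm_103": "sub_15C3630",
    "sm_110": "sub_15C3950",
    "sm_120": "sub_15C1D20",
    "sm_121": "sub_15C3410",
}


def build_encoding_map() -> dict[str, str]:
    """Register the base, 'a', and 'f' names against the same accessor."""
    table = {}
    for base, accessor in ENCODING_ACCESSOR.items():
        for suffix in ("", "a", "f"):
            table[base + suffix] = accessor
    return table


if __name__ == "__main__":
    table = build_encoding_map()
    # Fifteen registered names, but only five distinct accessors.
    print(len(table), len(set(table.values())))
    print(table["sm_120"], table["sm_120a"], table["sm_120f"])
```

A lookup by any of the three spellings of an architecture returns the same accessor, which is the property the table describes.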
Complete Encoding Table Map
| Function | Architecture | Slot |
|---|---|---|
| sub_15C3210 | sm_75 | 3 |
| sub_15C3310 | sm_80 | 3 |
| sub_15C3B60 | sm_86 | 3 |
| sub_15C3C60 | sm_87 | 3 |
| sub_15C3A60 | sm_88 | 3 |
| sub_15C3740 | sm_89 | 3 |
| sub_15C3520 | sm_90 | 3 |
| sub_15C3840 | sm_100 | 3 |
| sub_15C3630 | sm_103 | 3 |
| sub_15C3950 | sm_110 | 3 |
| sub_15C1D20 | sm_120 | 3 |
| sub_15C3410 | sm_121 | 3 |
Finalization Architecture Remapping
The finalization compatibility checker sub_4709E0 (2,609 bytes) and its companion sub_470DA0 (2,074 bytes) apply an internal architecture remapping table before performing compatibility comparisons. This remapping collapses certain architecture numbers into canonical equivalents:
| Input Arch | Remapped To | Interpretation |
|---|---|---|
| 104 | 120 | Internal designation 104 maps to sm_120 (consumer Blackwell) |
| 130 | 107 | Internal designation 130 maps to 107 (within sm_100 family, decade 10) |
| 101 | 110 | Internal designation 101 maps to sm_110 (Jetson Thor) |
The decompiled code expresses these arch numbers as ASCII character literals whose code points equal the numbers: 'd' (100), 'g' (103), 'h' (104, remapped to 120), 'n' (110), 'y' (121). After remapping, the standard same-decade rule (arch / 10, integer division) determines family membership.
Remapping Semantics
The 104->120 remapping means that a cubin tagged with internal arch 104 is treated as sm_120-compatible for finalization purposes. Similarly, 130->107 places internal arch 130 into decade 10 (the sm_100 family), and 101->110 bridges internal arch 101 to the sm_110 family. These internal designations (101, 104, 130) never appear in user-facing --arch flags; they exist only within the finalization pipeline's compatibility-checking logic and most likely represent early or experimental architecture IDs that were subsequently renumbered.
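The remapping step followed by the decade computation can be sketched as below. This is a minimal model of the behavior described above, not the decompiled logic; the names `REMAP`, `canonical_arch`, and `decade` are hypothetical. The last lines also show why character literals such as 'd' and 'y' stand for arch numbers: their ASCII code points are those numbers.

```python
# Internal arch numbers and their canonical equivalents, from the remapping table.
REMAP = {104: 120, 130: 107, 101: 110}


def canonical_arch(arch: int) -> int:
    """Apply the finalization remapping; unlisted numbers pass through unchanged."""
    return REMAP.get(arch, arch)


def decade(arch: int) -> int:
    """Same-decade key, computed after remapping (integer division by ten)."""
    return canonical_arch(arch) // 10


if __name__ == "__main__":
    for arch in (100, 101, 103, 104, 110, 120, 121, 130):
        print(arch, "->", canonical_arch(arch), "decade", decade(arch))
    # ASCII code points match the arch numbers used as character literals.
    print(ord("d"), ord("g"), ord("h"), ord("n"), ord("y"))
```

Note that without the remapping, 101 would fall into decade 10 and 130 into decade 13, so the table changes family membership for both.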
Special-Case Handling
sub_4709E0 contains explicit special-case logic for three architectures:
- sm_110: Direct match check outside the decade rule, because decade 11 contains only sm_110
- sm_121: Direct match check, because sm_121 shares decade 12 with sm_120 but has distinct finalization semantics
- sm_100: Family-head check for the entire 100-decade (sm_100, sm_103, sm_107)
The function returns a 5-way error code: 0 = compatible, 24 = null input, 25 = version too high (>0x101), 26 = incompatible architecture, 27-30 = type-specific incompatibility. The a1[3] byte selects among finalization class types 0-4 through the lookup table dword_1D40660[].
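For callers that need to interpret the return value, the five result classes listed above can be wrapped in a small helper. This is a sketch under the assumption that the codes are exactly those listed; the function name `classify_result` and the label strings are hypothetical, and the meaning of each individual code in the 27-30 range is not broken down here because the text does not give it.

```python
def classify_result(code: int) -> str:
    """Map a finalization-check return code to the class described in the text."""
    if code == 0:
        return "compatible"
    if code == 24:
        return "null input"
    if code == 25:
        return "version too high"
    if code == 26:
        return "incompatible architecture"
    if 27 <= code <= 30:
        return "type-specific incompatibility"
    # Any other value is outside the documented set.
    raise ValueError(f"undocumented result code: {code}")


if __name__ == "__main__":
    for code in (0, 24, 25, 26, 27, 30):
        print(code, classify_result(code))
```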
Capability Bitmasks
sub_470DA0 (can_finalize_with_capability_mask) extends the architecture check with a per-architecture capability bitmask. This function reads a mask pointer from a1+16, computes a target bitmask value based on the architecture number, and returns whether the required capabilities are satisfied.
Bitmask Assignment
| Architecture | Char Code | Bitmask Value | Binary |
|---|---|---|---|
| sm_100 | 'd' (100) | 1 | 0b00000001 |
| sm_110 | 'n' (110) | 2 | 0b00000010 |
| sm_103 | 'g' (103) | 8 | 0b00001000 |
| sm_121 | 'y' (121) | 64 | 0b01000000 |
The check evaluates (v12 & *v11) == v12 where v12 is the required bitmask for the target architecture and *v11 is the capability mask stored in the compilation unit. A compilation unit built with capability 1 (sm_100 only) cannot be re-finalized for sm_103 (requires bit 3) or sm_121 (requires bit 6). This mechanism controls which Blackwell sub-architectures a given compiled artifact is forward-compatible with.
Note that sm_120 does not appear in the bitmask table. Its finalization compatibility is handled entirely through the architecture remapping (104->120) and the same-decade rule (decade 12 includes both sm_120 and sm_121).
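The subset test `(v12 & *v11) == v12` can be tried out directly. The sketch below is an assumption-labeled model: `REQUIRED_BIT` and `capability_satisfied` are hypothetical names, and returning `None` for an architecture with no table entry (such as sm_120) is a modeling choice meaning "not decided by this check", not observed behavior of sub_470DA0.

```python
# Required capability bit per target architecture, from the bitmask table.
REQUIRED_BIT = {100: 1, 110: 2, 103: 8, 121: 64}


def capability_satisfied(target_arch: int, unit_mask: int):
    """Return True/False for listed targets, None when the target has no entry."""
    required = REQUIRED_BIT.get(target_arch)
    if required is None:
        # Assumption: unlisted targets are handled by other rules.
        return None
    # Every required bit must be present in the compilation unit's mask.
    return (required & unit_mask) == required


if __name__ == "__main__":
    print(capability_satisfied(100, 0b00000001))  # sm_100-only unit, sm_100 target
    print(capability_satisfied(103, 0b00000001))  # bit 3 missing
    print(capability_satisfied(121, 0b01000001))  # bit 6 present
    print(capability_satisfied(120, 0b11111111))  # no table entry
```

A unit whose mask is 1 passes for sm_100 and fails for sm_103 and sm_121, matching the example in the text.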
Capability Data in Profile Structs
Each profile struct stores three 128-bit capability vectors at offsets +80, +96, and +112 (loaded from xmmword_1D40F10--xmmword_1D40F70 via SSE instructions during sub_484F50 initialization). These vectors encode generation-specific feature bitmasks used by the finalization pipeline. The can_finalize_with_capability_mask function dereferences through the profile's capability pointer at a1+16 to reach these vectors.
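The three vector offsets can be pinned down with a `ctypes` layout. This is only a sketch of the offsets stated above: the type name `ProfileSketch` and all field names are hypothetical, the first 80 bytes are left opaque, and the real struct may well extend past offset 128.

```python
import ctypes


class ProfileSketch(ctypes.Structure):
    """Partial layout: only the three 128-bit capability vectors are modeled."""

    _fields_ = [
        ("head", ctypes.c_ubyte * 80),      # bytes 0..79, contents not modeled
        ("cap_vec0", ctypes.c_ubyte * 16),  # 128-bit vector at +80
        ("cap_vec1", ctypes.c_ubyte * 16),  # 128-bit vector at +96
        ("cap_vec2", ctypes.c_ubyte * 16),  # 128-bit vector at +112
    ]


if __name__ == "__main__":
    for name in ("cap_vec0", "cap_vec1", "cap_vec2"):
        print(name, getattr(ProfileSketch, name).offset)
```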
Same-Decade Compatibility Groups
The same-decade rule (arch_number / 10, integer division) produces three distinct Blackwell compatibility groups:
| Decade | Architectures | Compatibility |
|---|---|---|
| 10 | sm_100, sm_103 | sm_100 code runs on sm_103; sm_103 code does not run on sm_100 |
| 11 | sm_110 | Sole member; no cross-compatibility within decade |
| 12 | sm_120, sm_121 | sm_120 code runs on sm_121; sm_121 code does not run on sm_120 |
Despite sharing the "Blackwell" family name string and the same ISA encoding infrastructure, code compiled for sm_100 cannot run on sm_120 -- the decade boundary is a hard compatibility wall. The family name is informational only; actual compatibility is governed by the decade rule and the finalization remapping table.
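The grouping and the one-way compatibility in the table can be expressed in a few lines. This sketch assumes, based only on the table rows above, that within a decade code runs on targets with an equal or higher SM number; `same_decade` and `can_run` are hypothetical names, and remapped internal numbers are not handled here.

```python
def same_decade(a: int, b: int) -> bool:
    """Two architectures share a compatibility group when arch // 10 matches."""
    return a // 10 == b // 10


def can_run(compiled_for: int, target: int) -> bool:
    """Assumed rule: same decade, and the target is not older than the build arch."""
    return same_decade(compiled_for, target) and compiled_for <= target


if __name__ == "__main__":
    pairs = [(100, 103), (103, 100), (120, 121), (121, 120), (100, 120), (110, 110)]
    for compiled_for, target in pairs:
        print(f"sm_{compiled_for} on sm_{target}:", can_run(compiled_for, target))
```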
SM101/SM110 Cross-Mapping Bridge
The compatibility checker sub_4878A0 contains a special bidirectional bridge between SM101 and SM110. When either the source or target architecture is 101 or 110, the normal same-decade comparison is bypassed. SM101 is an internal designation that maps to the sm_110 family (confirmed by the 101->110 remapping in sub_4709E0). This bridge allows artifacts tagged with the internal arch 101 to finalize for sm_110 and vice versa.
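The bypass condition can be sketched as follows. The trigger (either side equals 101 or 110) comes from the decompiled condition; what the bridged path then decides is an assumption made here for illustration, namely that both sides must belong to the 101/110 pair. The names `BRIDGED` and `compatible` are hypothetical.

```python
# The two architecture numbers joined by the bridge.
BRIDGED = {101, 110}


def compatible(src: int, dst: int) -> bool:
    """Decade comparison with the 101/110 bridge taking priority."""
    if src in BRIDGED or dst in BRIDGED:
        # Assumed outcome of the bypass: only 101 and 110 match each other.
        return src in BRIDGED and dst in BRIDGED
    return src // 10 == dst // 10


if __name__ == "__main__":
    print(compatible(101, 110), compatible(110, 101))  # bridged in both directions
    print(compatible(101, 100))  # plain decade rule would have said True (10 == 10)
    print(compatible(100, 103))  # ordinary same-decade match
```

The third call shows why the bridge matters: 101 // 10 is 10, so without the special case arch 101 would be grouped with sm_100 rather than sm_110.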
Instruction Encoding Sharing
All five Blackwell-family architectures (sm_100 through sm_121) share the same 128-bit SASS instruction encoding infrastructure documented on the SM100 Blackwell page. The 4,236 template-instantiated encoder/decoder/descriptor functions at 0x620000--0xF15A50 are common to all 1xx targets. The per-architecture encoding table accessors (slot 3 in the dispatch table) return architecture-specific parameters that modify which instruction families are available, but the encoding format itself is identical.
Shared Components
| Component | Address Range | Size | Shared By |
|---|---|---|---|
| SASS encoders (table 1) | 0x620000--0x84DD70 | 2.2 MB | All sm_1xx |
| InstrDesc initializers | 0x84DD70--0xA48290 | 1.7 MB | All sm_1xx |
| SASS encoders (table 2) | 0xDA0000--0xE436D0 | 660 KB | All sm_1xx |
| SASS decoders | 0xE43DC0--0xF15A50 | 840 KB | All sm_1xx |
| Encoder dispatch | sub_E43C20 | 92 lines | All sm_1xx |
| Decoder dispatch | sub_EFE6C0 | 93 lines | All sm_1xx |
| Opcode table constructor | sub_1782540 | 111 KB | All sm_1xx |
| Master instruction encoder | sub_17F2670 | 157 KB | All sm_1xx |
Per-Architecture Differences
The encoding table accessor functions (slot 3) return different parameter sets that control feature gating at the instruction level. While the exact parameter layouts have not been fully decoded, the pattern is consistent: each accessor populates a small structure (8-32 bytes, exact size not determined) that tells the encoder/decoder which instruction families and sub-opcodes are valid for the target architecture.
This means sm_103 (GB300) may support additional MMA instruction variants compared to sm_100, and sm_120 (consumer) may lack certain datacenter-specific instructions present in sm_100. The instruction format is the same; only the set of valid opcodes within that format varies.
Compiler Backend Sharing
The SM-specific compiler backend functions (instruction selector, peephole optimizer, legalization passes) are selected through the dispatch table rather than duplicated per architecture. The backend at 0x1782540--0x17B9300 is shared across all 1xx targets, with per-architecture behavior controlled through the dispatch table callbacks and the encoding table parameters.
Key shared backend functions:
| Address | Size | Function | Shared By |
|---|---|---|---|
| sub_1782540 | 111,076 B | Opcode table constructor | All sm_1xx |
| sub_17884A0 | 44,713 B | Instruction property initializer | All sm_1xx |
| sub_178AA00 | 35,422 B | Scheduling table initializer | All sm_1xx |
| sub_179BD10 | 16,544 B | Peephole optimizer | All sm_1xx |
| sub_17A2130 | 33,823 B | Instruction legalization | All sm_1xx |
| sub_17AB9D0 | 36,177 B | Instruction selection | All sm_1xx |
| sub_17F2670 | 156,611 B | Master instruction encoder | All sm_1xx |
SM-Specific Codegen Options
The per-SM codegen option handler sub_15C2E90 processes SM-specific compilation flags via string comparison dispatch. These options are common to all Blackwell targets:
| Option | Values | Description |
|---|---|---|
| lds128convert | always / nonconst / never | 128-bit shared memory load conversion policy |
| stress-maxrregcount | Integer | Maximum register count override for stress testing |
| stress-noglobalregalloc | Boolean | Disable global register allocation |
| legacy-cvtf64 | Boolean | Enable legacy FP64 conversion behavior |
| perf-per-watt-opt-level | 0 / 1 / 2 | Performance-per-watt optimization level |
| stress-no-crp | Boolean | Disable constant register propagation |
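A string-keyed dispatch over these options might look like the sketch below. It models only the option names and value domains from the table; `OPTION_PARSERS`, `parse_option`, and the helper names are hypothetical, and the spelling of boolean values ("true" / "false") is an assumption, since the accepted spellings in sub_15C2E90 are not documented on this page.

```python
def _choice(*allowed):
    """Build a parser that accepts only the listed literal values."""
    def parse(value: str) -> str:
        if value not in allowed:
            raise ValueError(f"expected one of {allowed}, got {value!r}")
        return value
    return parse


def _boolean(value: str) -> bool:
    # Assumption: booleans are spelled "true" / "false".
    if value not in ("true", "false"):
        raise ValueError(f"expected 'true' or 'false', got {value!r}")
    return value == "true"


def _level(value: str) -> int:
    """perf-per-watt-opt-level accepts 0, 1, or 2."""
    level = int(value)
    if level not in (0, 1, 2):
        raise ValueError(f"level out of range: {level}")
    return level


OPTION_PARSERS = {
    "lds128convert": _choice("always", "nonconst", "never"),
    "stress-maxrregcount": int,
    "stress-noglobalregalloc": _boolean,
    "legacy-cvtf64": _boolean,
    "perf-per-watt-opt-level": _level,
    "stress-no-crp": _boolean,
}


def parse_option(name: str, value: str):
    """Dispatch on the option name, then validate and convert the value."""
    parser = OPTION_PARSERS.get(name)
    if parser is None:
        raise KeyError(f"unknown codegen option: {name}")
    return parser(value)


if __name__ == "__main__":
    print(parse_option("lds128convert", "nonconst"))
    print(parse_option("perf-per-watt-opt-level", "2"))
```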
Key Functions
| Address | Name | Size | Role |
|---|---|---|---|
| sub_484F50 | ArchProfileDB::init | 53,974 B | Registers all GPU architectures including sm_103/110/120/121 |
| sub_15C0CE0 | init_sm_dispatch_tables | 14,517 B | Registers 7 dispatch callbacks per architecture |
| sub_4709E0 | can_finalize_arch_check | 2,609 B | Finalization compatibility with arch remapping |
| sub_470DA0 | can_finalize_with_capability_mask | 2,074 B | Capability bitmask check for finalization |
| sub_15C3630 | sm_103 encoding table accessor | ~100 B | Returns sm_103-specific encoding parameters |
| sub_15C3950 | sm_110 encoding table accessor | ~100 B | Returns sm_110-specific encoding parameters |
| sub_15C1D20 | sm_120 encoding table accessor | ~100 B | Returns sm_120-specific encoding parameters |
| sub_15C3410 | sm_121 encoding table accessor | ~100 B | Returns sm_121-specific encoding parameters |
| sub_15C2E90 | process_sm_codegen_option | ~70 lines | SM-specific codegen option string dispatch |
| sub_4878A0 | arch_string_match | 328 B | Core compatibility checker with SM101/110 bridge |
Observations
- No separate ISA definition -- sm_103, sm_110, sm_120, and sm_121 do not define their own instruction encoding/decoding tables. They reuse the SM100 infrastructure entirely. The per-architecture differentiation happens through small encoding-table accessor stubs and capability bitmask checks, not through separate code.
- Heterogeneous numbering -- The five Blackwell architectures span three decades (10, 11, 12), creating three distinct compatibility groups despite sharing a single ISA. The three decades correspond to substantially different silicon configurations (datacenter, automotive/embedded, consumer) even though the instruction set is common.
- Internal designations -- The remapping table reveals three internal architecture numbers (101, 104, 130) that do not correspond to any user-visible sm_ target. These are development or pre-release designations that were renumbered before public release.
- Asymmetric bitmasks -- The capability bitmask values are non-sequential (1, 2, 8, 64) with gaps at 4, 16, and 32. This leaves room for future architectures to be inserted at those positions without renumbering existing masks.
- Registration order anomaly -- sm_103 is registered after sm_110 in the database initializer, suggesting sm_110 (Jetson Thor) was added to the compiler toolchain before sm_103 (Blackwell Ultra GB300) despite sm_103 having a lower SM number.
- sm_120 bitmask absence -- sm_120 has no entry in the capability bitmask table at sub_470DA0. Its finalization compatibility is handled solely through the remapping rule (104->120) and the same-decade rule. This may indicate that sm_120 was designed from the start to be the "base" consumer architecture within decade 12, with sm_121 being the enhanced derivative.
Confidence Assessment
| Claim | Confidence | Verification |
|---|---|---|
| All four use "Blackwell" family string at 0x1D40B6E | CONFIRMED | Decompiled sub_484F50 lines 751/891/1032/1175: "Blackwell" for sm_110/sm_103/sm_120/sm_121 base profiles; string at 0x1d40b6e |
| ISA class strings (profile_sm_NNN)->isaClass | CONFIRMED | Strings confirmed: 0x1d40cf9 (sm_103), 0x1d40c46 (sm_110), 0x1d40dac (sm_120), 0x1d40e5f (sm_121) |
| __CUDA_ARCH__ values: 1030, 1100, 1200, 1210 | CONFIRMED | Strings at 0x1d40cc9, 0x1d40c16, 0x1d40d7c, 0x1d40e2f |
| All sub-variant strings (sm_NNNa, sm_NNNf, compute_, lto_) | CONFIRMED | All strings found in nvlink_strings.json at documented addresses (e.g., 0x1d40d14=sm_103a, 0x1d40c61=sm_110a, etc.) |
| Registration order: sm_100 -> sm_110 -> sm_103 -> sm_120 -> sm_121 | CONFIRMED | Decompiled sub_484F50: sm_110 at ~line 751, sm_103 at ~line 891; string address ordering 0x1d40c2b (sm_110) < 0x1d40cde (sm_103) confirms sm_110 registered before sm_103 |
| Dispatch table: sm_103 encoding = sub_15C3630 | CONFIRMED | Decompiled sub_15C0CE0 line 182: sub_448E70(qword_2A644A8, "sm_103f", sub_15C3630) |
| Dispatch table: sm_110 encoding = sub_15C3950 | CONFIRMED | Decompiled sub_15C0CE0 shows sm_110 with sub_15C3950 at A8 slot |
| Dispatch table: sm_120 encoding = sub_15C1D20 | CONFIRMED | Decompiled sub_15C0CE0 line 189: sub_448E70(qword_2A644A8, "sm_120", sub_15C1D20) |
| Dispatch table: sm_121 encoding = sub_15C3410 | CONFIRMED | Decompiled sub_15C0CE0 line 210: sub_448E70(qword_2A644A8, "sm_121", sub_15C3410) |
| Sub-variants share function pointers (sm_120/120a/120f identical) | CONFIRMED | Decompiled sub_15C0CE0 lines 187-207: sm_120, sm_120a, sm_120f all use sub_15C1D20 for encoding table |
| Finalization remapping: 104->120, 130->107, 101->110 | CONFIRMED | Decompiled sub_4709E0 lines 22-31 and sub_470DA0 lines 20-31 |
| Capability bitmask: d=1, n=2, g=8, y=64 | CONFIRMED | Decompiled sub_470DA0 lines 95-106 |
| sm_120 absent from capability bitmask table | CONFIRMED | Decompiled sub_470DA0 switch: only cases d/g/n/y; no case for 120 |
| Same-decade groups: 10={100,103}, 11={110}, 12={120,121} | HIGH | Derived from integer division rule confirmed in sub_4878A0 |
| SM101/110 bidirectional bridge | CONFIRMED | Decompiled sub_4878A0 line 55: v19 == 101 || v20 == 101 || v19 == 110 || v20 == 110 |
| All 4 share SM100 encoding infrastructure | HIGH | Dispatch table shows distinct accessor stubs but same underlying encoder/decoder regions |
For general Blackwell architecture details, see the ptxas wiki: Blackwell. For SM120 consumer target specifics, see cicc wiki: SM120.
Cross-References
nvlink Internal
- SM100 Blackwell -- base Blackwell ISA and encoding shared by all four architectures
- Architecture Profiles -- profile registration for sm_103/110/120/121
- Compatibility -- finalization compatibility and same-decade rules
- Architecture Dispatch -- per-architecture vtable entries (all four share SM100 codegen)
- Mercury Overview -- Mercury encoding default for all Blackwell-family targets
Sibling Wikis
- ptxas: Blackwell -- standalone ptxas SM100+ target support (includes SM103/110/120/121)
- cicc: SM100 Blackwell -- cicc compiler Blackwell target
- cicc: SM120 -- cicc compiler SM120 consumer target