Volta through Ada Lovelace (sm_70 – sm_89)

The sm_70 through sm_89 range spans four GPU generations — Volta, Turing, Ampere, and Ada Lovelace — and represents the most mature feature tier in cicc v13.0. Turing (sm_75) serves as the compiler's default architecture. Volta (sm_70/72) is no longer directly targetable: no compute_70 or compute_72 entry exists in the CLI parser, though the sm_70 feature boundary is still checked at 23 locations throughout the binary.

Supported Compute Capabilities

The architecture registration table at sub_95EB40 maps CLI strings to internal flags. Only the following are accepted for this generation range:

Compute Capability	Internal Target	`__CUDA_ARCH`	PTX Version	Generation
`compute_75`	`sm_75`	750	5	Turing
`compute_80`	`sm_80`	800	5	Ampere
`compute_86`	`sm_86`	860	5	Ampere
`compute_87`	`sm_87`	870	5	Ampere (Jetson Orin)
`compute_88`	`sm_88`	880	5	Ada Lovelace
`compute_89`	`sm_89`	890	5	Ada Lovelace

There is no compute_70, compute_72, compute_73, or compute_82. The sm_73, sm_82, and sm_88 targets exist only as internal processor table entries — they have no publicly documented differentiation and no unique feature gates in the compiler.

SM Version Encoding

Two parallel version tracking systems coexist in the binary:

qword_4F077A8 — Encodes SM_MAJOR * 10000 + SM_MINOR * 100. Used in approximately 309 decompiled files, primarily in the NVVM frontend and optimizer. Boundary thresholds use the XX99 pattern (e.g., 69999 for pre-Volta, 79999 for pre-Ampere, 89999 for pre-Hopper).
unk_4D045E8 — Stores the raw SM number as a decimal (e.g., 75 for sm_75, 89 for sm_89). Used in approximately 12 decompiled files, primarily in the builtin checker and atomic lowering logic.

Feature Configuration Call Order

The compiler configures feature flags through a strict four-function call sequence. Each subsequent function can override or augment the previous one's settings:

CLI parser — Sets byte_4CF8* override flags from user-specified options. These prevent any subsequent auto-configuration from touching the guarded flag.
sub_60DFC0 — Basic initialization. Sets unk_4D041B8 for sm_80+ (C++20 __VA_OPT__ support).
sub_60D650(opt_level) — Optimization-level-based flag configuration. Sets approximately 109 flags based on the -O level. Many of the same unk_4D04* flags set by SM gates are also set here under C++17/C++20 language-version conditions.
sub_60E7C0 — Master SM architecture feature configurator. Reads qword_4F077A8 and sets approximately 60 backend flags through threshold comparisons. Also calls sub_60E530 (tertiary cascade) for supplementary flags.
sub_982C80 — NVPTX subtarget feature table initialization (224-byte bitfield for the LLVM backend). This is a separate path from the EDG flags above.

Override priority: CLI flag > SM version > Optimization level > C++ standard version > CUDA mode > Virtual arch flag.

Feature Gates by Generation

Volta (sm_70+) — Threshold `qword_4F077A8 > 69999`

Volta introduced the first tensor core generation and independent thread scheduling. Although not directly targetable in this compiler version, the sm_70 boundary enables:

HMMA tensor core intrinsics — Builtin IDs 678–707 registered in sub_90AEE0. Three shape variants (m16n16k16, m32n8k16, m8n32k16) with load, store, and MMA operations across f16/f32 accumulator combinations.
Convergent branch intrinsic — llvm.nvvm.branch.if.all.convergent (builtin 3755/8282) requires sm_70+. Error: "not supported on pre-Volta Architectures" (checked in sub_1C36530 and sub_2C7B6A0).
Proper atomic memory ordering — At sm_70+, atomics use acquire/release/relaxed semantics instead of falling back to volatile qualification. The gate is unk_4D045E8 > 69.
128-bit atomic operations — Enabled at sm_70+. Below this threshold, diagnostic 3758 is emitted: "16-byte atomics only supported on sm_70+".
Optimizer feature flags — unk_4D041DC, unk_4D04858, unk_4D041EC are set by sub_60E7C0. The tertiary cascade sub_60E530 additionally sets unk_4D0428C (extended float suffix support for C++23 std::float*_t / std::bfloat16_t). Multiple SelectionDAG patterns in sub_706250 activate for sm_70+ codegen.
Variant-flag-gated features — When dword_4F077BC (SM variant flag, the a/f suffix) is set and sm_70+ is active, unk_4D043C4 is enabled. When compiling for a virtual architecture with effective SM > 69999, unk_4D04740 is set for multi-arch optimization.
WMMA memory space optimization — The wmma-memory-space-opt pass (registered at ctor_267, ctor_531) optimizes memory access patterns for tensor core operations.

Turing (sm_75) — Default Architecture

sm_75 is the baseline for cicc v13.0. The default is hardcoded in sub_900130 and sub_125FB30 via strcpy("compute_75"), and in sub_95EB40 as "-arch=compute_75".

No explicit sm_75-specific feature gates exist beyond the sm_70 tier. All Volta-era features are available. The key behavioral distinction is that sm_75 passes all pre-Volta gates cleanly — no diagnostic 3703 (sub_5C68F0), no volatile atomic fallback, no 128-bit atomic restrictions.

Ampere (sm_80+) — Threshold `qword_4F077A8 > 79999`

C++20 __VA_OPT__ support — unk_4D041B8 set at sub_60DFC0 line 132–133. This is the only flag set exclusively by sub_60DFC0 at the sm_80 threshold. It enables __VA_OPT__ recognition in the EDG macro expander (sub_A03 line 1010), variadic trailing argument elision (line 1584), and diagnostic 2939 for misuse.
Additional convergent branch — llvm.nvvm.branch.if.convergent (builtin 3754/8283) requires sm_80+. Error: "not supported on pre-Ampere Architectures". Note the distinction: branch.if.all.convergent requires only sm_70+, while branch.if.convergent requires sm_80+.
L2 cache hint atomics — The L2::cache_hint suffix on atomic operations, emitted from sub_21E6DD0 when bit 0x400 is set in instruction encoding flags. Supported operations: exch, add, and, or, xor, max, min, cas, and floating-point add. These are PTX 7.3+ features. Emission logic lives in sub_21E6420.
cp.async.bulk patterns — String matching for cp.async.bulk.tensor.g2s. and cp.async.bulk. in inline assembly validation at sub_A8E250.

Important correction: The master SM feature configurator sub_60E7C0 does NOT set any new flags at the sm_80 boundary (> 79999). The Ampere-specific unk_4D041B8 is set by the secondary configurator sub_60DFC0. The next threshold in sub_60E7C0 after sm_70+ (> 69999) is sm_90+ (> 89999). This means sm_80 through sm_89 share the same sub_60E7C0 flag profile as sm_75.

Ada Lovelace and Ampere Variants (sm_86 – sm_89)

All of sm_86, sm_87, sm_88, and sm_89 share identical feature gates within cicc. They occupy unk_4D045E8 values 86–89 and qword_4F077A8 range 86000–89999, all below the 89999 Hopper boundary.

The primary gate at this tier is unk_4D045E8 <= 89, which delineates pre-Hopper from Hopper+:

Location	Feature	Behavior at sm_89 and below
`sub_5D1A60`	`__block_size__` attribute	Diagnostic 3790; only 4 args parsed (5th cluster arg is sm_90+)
`sub_5D1FE0`	`__cluster_dims__` attribute	Diagnostic 3687 emitted (cluster dimensions are Hopper-only)
`sub_5D2430`	`__launch_bounds__` 3rd param	Diagnostic 3704 emitted (cluster launch bounds)
`sub_6BBC40`	Atomic scope "cluster"	Falls through to "gpu" scope; diagnostic 3763/3759
`sub_6BBC40`	16-byte extended atomics	Diagnostic 3764 for certain scope+type combinations
`sub_9502D0` / `sub_12AE930`	Atomic scope emission	"gpu" used instead of "cluster"
`sub_214DA90`	Cluster PTX directives	Skipped entirely (`arch_id <= 89`)

No code path differentiates sm_89 from sm_86/87/88. Hardware differences between these sub-architectures (e.g., Ada Lovelace RTX 4090 at sm_89 vs. Jetson Orin at sm_87) are resolved at the ptxas assembler level, not in cicc.

Atomic Lowering Detail

The atomic builtin lowering (sub_12AE930 / sub_9502D0) follows two paths split at the sm_70 boundary:

Pre-sm_70 path (unk_4D045E8 <= 69): Atomics are emitted with a volatile qualifier instead of memory ordering. Scope (cta/gpu/sys) is parsed but ordering is forced to volatile. 128-bit atomics emit diagnostic 3758.

sm_70+ path (unk_4D045E8 > 69): Full memory ordering support — relaxed, acquire, release, acq_rel. Scope resolution: cta (scope 0–1), gpu (scope 3), sys (scope 4). Cluster scope (scope 2) is only available at sm_90+; on sm_70–89, scope 2 falls through to "gpu".

Operations: ld, st, atom.add, atom.and, atom.or, atom.xor, atom.max, atom.min, atom.exch, atom.cas. Type suffixes via lookup table: b (bitwise), u (unsigned), s (signed), f (float).

Hopper-Gated Intrinsics Rejected on sm_70–89

Multiple intrinsics emit "this intrinsic is only supported for Hopper+" when the SM version field is non-zero and <= 899:

Builtin ID	Description
0x10B3 (4275)	Hopper+ intrinsic requiring i1 or i32 return
0xFD5 (4053)	Hopper+ intrinsic
0xEB7 (3767)	Memory ordering/fence intrinsic with operation modes
0xEB9–0xEBA (3769–3770)	Pointer-size-dependent intrinsics (>= 64-bit)

Complete sub_60E7C0 Flag Table

The master feature configurator sub_60E7C0 (address 0x60E7C0, 12,466 bytes, 56 qword_4F077A8 comparisons) is the primary SM-architecture-to-feature-flag mapper. Every flag assignment follows a guarded pattern: if the corresponding byte_4CF8* override byte is nonzero (set by a CLI flag), the auto-configuration is skipped and the user's explicit value is preserved.

Unconditional Assignments

These flags are set regardless of SM version with no user override check:

Flag	Value	Notes
`unk_4D047C0`	1	Always enabled
`unk_4D047B4`	1	Always enabled
`unk_4F07584`	0	Always cleared
`unk_4D0423C`	1	Always enabled
`unk_4D04208`	0	Always cleared
`unk_4D04218`	1	Always enabled
`unk_4D04214`	1	Always enabled
`unk_4F06970`	0	Always cleared
`unk_4F06964`	0	Always cleared
`unk_4F06904`	0	Always cleared

SM-Dependent Unconditional Flags

These depend on SM version but have no user override check:

Flag	Condition	Value	Notes
`unk_4D047BC`	SM <= 119999	1	Disabled only for sm_120+
`unk_4D04758`	SM <= 30300	1	sm_32 and below only
`unk_4D04764`	SM <= 30399	1	sm_32 and below only
`unk_4D044B8`	SM <= 40299	1	Pre-Maxwell only

Guarded Flags (byte_4CF8* Override Bypass)

Each flag is set only when its guard byte is zero (user has not overridden via CLI):

Guard	Flag	Default Value (guard=0)
`byte_4CF807B`	`dword_4D048B8`	= 1
`byte_4CF810C`	`dword_4D04824`	= 1
`byte_4CF80F0`	`unk_4D04388`	= 1
`byte_4CF8108`	`unk_4D04338`	= (`dword_4F077BC` && !`dword_4F077B4` && SM <= 30399) ? 1 : 0
`byte_4CF8123`	`unk_4D047C8`	= 1 (only if SM > 30399, sm_35+)
`byte_4CF8125`	`dword_4D047B0`	= 1 (only if SM > 30399, sm_35+)
`byte_4CF8139`	`unk_4D04314`	= (SM <= 40299), pre-Maxwell
`byte_4CF814D`	`unk_4D047C4`	= 0
`byte_4CF8119`	`unk_4D047D0`	= (SM <= 40000)
`byte_4CF8119`	`unk_4D047CC`	= (SM <= 40099)
`byte_4CF810F`	`unk_4D047EC`	= 1
`byte_4CF8107`	`unk_4D04340`	= 0
`byte_4CF8116`	`unk_4D047E0`	= 1
`byte_4CF815F`	`unk_4F0771C`	= 1
`byte_4CF813E`	`unk_4D044B0`	= (SM > 40299), Maxwell+
`byte_4CF8149`	`unk_4D04470`	Complex Maxwell+ gate
`byte_4CF8172`	`dword_4D041AC`	= 0 (when SM <= 109999)
`byte_4CF8159`	`dword_4D048B0`	= (`dword_4D048B8` && `dword_4D048B4` && SM > 40799)
`byte_4CF811C`	`unk_4D04790`	= 0 (when virtual arch flag set)
`byte_4CF813C`	`dword_4D047AC`	sm_35+ feature gate
`byte_4CF8156`	`unk_4D04408`	CUDA C++ feature gate
`byte_4CF815D`	`unk_4D048A0`	= 1 (when SM > 40699)

Total: 21 override bytes controlling approximately 25 feature flags.

Feature Escalation by SM Version

The cumulative flag-setting cascade. Each tier inherits all flags from lower tiers. Only the tiers relevant to sm_70–89 plus their immediate predecessors and successors are shown.

SM > 59999 (sm_60+, Pascal+):

Flag	Identified Meaning
`unk_4D043CC`	EDG C++17 feature gate (also set by C++17 language block)
`unk_4D04404`	EDG extended feature gate (also set by C++17 language block)
`unk_4D043D8`	EDG C++17 feature gate (also set by C++17 language block)
`unk_4D043D4`	EDG feature gate (also set via virtual arch > 30599)
`dword_4F07760`	PTX generation mode flag
`unk_4D04870`	EDG C++20 feature gate (also set by C++20 language block)

SM > 69999 (sm_70+, Volta+):

Flag	Identified Meaning
`unk_4D041DC`	EDG C++17 feature gate (also set by C++17 language block)
`unk_4D04858`	EDG C++17 feature gate (also set by C++17 language block)
`unk_4D041EC`	EDG C++17/Pascal virtual arch feature gate

SM > 89999 (sm_90+, Hopper+) — NOT active for sm_70–89:

Flag	Identified Meaning
`unk_4D043D0`	EDG C++20 feature gate (also set by C++20 language block)
`unk_4D041B0`	EDG C++20 feature gate (also set by C++20 language block)
`unk_4D04814`	EDG C++20 feature gate (also set by C++20 language block)
`unk_4D0486C`	(with additional C++ version check)

sub_60E530 Tertiary Cascade

This supplementary function provides additional progressive unlocks. For the sm_70–89 range:

Threshold	Hex	Flags Set
> 40599	`0x9E97`	`unk_4F07764`
> 40699	`0x9EFB`	`unk_4D043F0`, `unk_4D043F4`
> 40899	`0x9FC3`	`unk_4D04220`, `unk_4D044D0`
> 59999	`0xEA5F`	`unk_4D043CC` (duplicates `sub_60E7C0`)
> 69999	`0x1116F`	`unk_4D0428C` (extended float suffixes: C++23 `std::float*_t` / `std::bfloat16_t`)
> 89999	`0x15F8F`	`dword_4F07760` (duplicates `sub_60E7C0`)
> 99999	`0x1869F`	`dword_4D043F8`, `dword_4D041E8`

Note: unk_4D0428C is set at > 69999 (sm_70+) by the cascade but at > 119999 (sm_120+) by sub_60E7C0. The cascade runs as part of sub_60E7C0, so the sm_70+ activation wins for all practical SM versions. This flag gates C++23 extended float suffixes (std::float16_t, std::float32_t, std::float64_t, std::bfloat16_t) in the EDG numeric parser at sub_A02 line 1612.

sub_60DFC0 SM-Gated Flags

The secondary configurator adds one flag at the sm_80 boundary:

Threshold	Flag	Identified Meaning
> 79999 (sm_80+)	`unk_4D041B8`	C++20 `__VA_OPT__` support in EDG macro expander. Enables `__VA_OPT__` recognition, variadic trailing argument elision, and diagnostic 2939.

Virtual Architecture Downgrade Path

When compiling for a virtual architecture (dword_4F077B4 = 1), sub_60E7C0 uses unk_4F077A0 (the effective/real SM) for a secondary tier of feature decisions:

Effective SM >	Flags Set
29999	`unk_4D043E4`
30099	`unk_4D044D0`
30199	`unk_4D043F0`
30299	`unk_4D04220`
30599	`unk_4D043D4`
59999	`unk_4D041EC`, `unk_4D043D8`, `unk_4D04404`
69999	`unk_4D04740`
79999	`unk_4D043D0`
89999	`unk_4D043D0` (redundant — already set at > 79999)
129999	`unk_4D04184`

Note: In the virtual arch path, unk_4D043D0 is set at > 79999 (sm_80+), while in the primary path it requires > 89999 (sm_90+). Virtual arch compilation is more conservative, enabling features the real target supports even if the virtual arch normally gates them.

unk_4D045E8 Frontend Gates

These gates use the raw SM number and control frontend semantic checks rather than backend flags:

Gate	Locations	Effect
<= 69	`sub_12AE930` ln 241, `sub_9502D0` ln 294	Atomic volatile fallback
<= 69	`sub_6BBC40` ln 763	128-bit atomic error 3758
<= 69	`sub_5C68F0`	Diagnostic 3703
<= 51	`sub_691790` ln 126	Surface builtin warning
<= 59	`sub_6BBC40` ln 639	Atomic scope restriction
60–69	`sub_6BBC40` ln 814	Diagnostic 3762
<= 79	`sub_5C6950` ln 15	Diagnostic 3660
<= 89	`sub_5D1A60` ln 35	`__block_size__` 5th arg blocked
<= 89	`sub_5D1FE0` ln 19	`__cluster_dims__` diagnostic 3687
<= 89	`sub_5D2430` ln 33	`__launch_bounds__` 3rd param diagnostic 3704
<= 89	`sub_6BBC40` ln 684	Atomic scope diagnostic 3763/3759
<= 89	`sub_6BBC40` ln 805, 827	16-byte atomic diagnostic 3764
<= 89	`sub_9502D0` ln 424, `sub_12AE930` ln 255	Cluster scope falls through to "gpu"
<= 89	`sub_214DA90` ln 66	Cluster PTX directives skipped

Cumulative Flag Profile per SM Version

This table shows the net flag state for each SM version in the range, combining all three configurators (sub_60E7C0 + sub_60E530 + sub_60DFC0). Only flags that differ across the sm_70–89 range are shown.

Flag	sm_75	sm_80	sm_86–89	Set By	Identified Role
`unk_4D041DC`	1	1	1	`sub_60E7C0` > 69999	EDG C++17 feature gate
`unk_4D04858`	1	1	1	`sub_60E7C0` > 69999	EDG C++17 feature gate
`unk_4D041EC`	1	1	1	`sub_60E7C0` > 69999	EDG C++17 / virtual arch feature gate
`unk_4D0428C`	1	1	1	`sub_60E530` > 69999	Extended float suffixes (C++23)
`unk_4D041B8`	0	1	1	`sub_60DFC0` > 79999	C++20 `__VA_OPT__` support
`unk_4D043D0`	0	0	0	`sub_60E7C0` > 89999	(sm_90+ only)
`unk_4D041B0`	0	0	0	`sub_60E7C0` > 89999	(sm_90+ only)
`unk_4D04814`	0	0	0	`sub_60E7C0` > 89999	(sm_90+ only)
`unk_4D0486C`	0	0	0	`sub_60E7C0` > 89999	(sm_90+ only)

The sole differentiator between sm_75 and sm_80+ within sub_60E7C0/sub_60DFC0 is unk_4D041B8. All flags set at > 69999 are shared by all sm_70–89 targets. All flags set at > 89999 are absent from all sm_70–89 targets. There is no per-flag difference between sm_86, sm_87, sm_88, and sm_89.

Identified Flag Semantics

Where flag consumers have been positively identified in the decompiled binary:

Flag	Set At	Consumer	Meaning
`unk_4D041B8`	sm_80+ (`sub_60DFC0`)	EDG macro expander (`sub_A03` ln 1010)	C++20 `__VA_OPT__` support: recognition, variadic trailing argument elision, diagnostic 2939
`unk_4D0428C`	sm_70+ (`sub_60E530`), sm_120+ (`sub_60E7C0`)	EDG numeric parser (`sub_A02` ln 1612)	Extended float suffixes: C++23 `std::float16_t`, `std::float32_t`, `std::float64_t`, `std::bfloat16_t`
`dword_4F07760`	sm_60+ (`sub_60E7C0`, `sub_60E530`)	PTX generation path	PTX emission mode flag
`unk_4D047C8`	sm_35+ (`sub_60E7C0`)	Backend	Dynamic parallelism optimization
`dword_4D047B0`	sm_35+ (`sub_60E7C0`)	Backend	Dynamic parallelism support
`unk_4D04780`	always	EDG macro expander	GNU `##__VA_ARGS__` comma-deletion extension

The remaining approximately 50 flags feed into the EDG frontend and NVVM IR generation pipeline. Based on the pattern that sub_60D650 (optimization level) and sub_60E7C0 (SM version) set the same flags with overlapping conditions, most are language feature gates (C++17/20/23 features that are also SM-gated) or optimization pass enables that depend on target capability.

Key Binary Locations

Function	Address	Size	Role
`sub_60E7C0`	`0x60E7C0`	Master SM feature flag initialization (12,466 bytes, 56 comparisons)	Master SM feature flag initialization (12,466 bytes, 56 comparisons)
`sub_60DFC0`	`0x60DFC0`	Secondary feature flag initialization (`unk_4D041B8` at sm_80+)	Secondary feature flag initialization (`unk_4D041B8` at sm_80+)
`sub_60E530`	`0x60E530`	Tertiary feature cascade (`unk_4D0428C` at sm_70+)	Tertiary feature cascade (`unk_4D0428C` at sm_70+)
`sub_60D650`	`0x60D650`	Optimization-level flag configurator (~109 flags)	Optimization-level flag configurator (~109 flags)
`sub_982C80`	`0x982C80`	NVPTX subtarget 224-byte feature bitfield	NVPTX subtarget 224-byte feature bitfield
`sub_617BD0`	`0x617BD0`	CLI parser; sets `unk_4D045E8` per `compute_XX`	CLI parser; sets `unk_4D045E8` per `compute_XX`
`sub_12AE930`	`0x12AE930`	Atomic builtin lowering (volatile vs. ordering)	Atomic builtin lowering (volatile vs. ordering)
`sub_9502D0`	`0x9502D0`	Duplicate atomic lowering (standalone pipeline)	Duplicate atomic lowering (standalone pipeline)
`sub_6BBC40`	`0x6BBC40`	Builtin semantic checker (atomics, scope validation)	Builtin semantic checker (atomics, scope validation)
`sub_90AEE0`	`0x90AEE0`	Builtin registration table (HMMA builtins 678–707)	Builtin registration table (HMMA builtins 678–707)
`sub_95EB40`	`0x95EB40`	Architecture registration (`compute_XX` to `sm_XX`)	Architecture registration (`compute_XX` to `sm_XX`)
`sub_1C36530`	`0x1C36530`	NVVM verifier (convergent intrinsic SM gates)	NVVM verifier (convergent intrinsic SM gates)
`sub_2C7B6A0`	`0x2C7B6A0`	NVVM lowering (convergent intrinsic SM gates)	NVVM lowering (convergent intrinsic SM gates)
`sub_21E6DD0`	`0x21E6DD0`	PTX emission (volatile / L2::cache_hint / .unified)	PTX emission (volatile / L2::cache_hint / .unified)
`sub_21E6420`	`0x21E6420`	Atomic L2 cache hint PTX emission	Atomic L2 cache hint PTX emission
`sub_214DA90`	`0x214DA90`	Kernel attribute PTX emitter (cluster directives gated at arch_id > 89)	Kernel attribute PTX emitter (cluster directives gated at arch_id > 89)
`sub_5D1A60`	`0x5D1A60`	`__block_size__` attribute (cluster dims at sm_90+)	`__block_size__` attribute (cluster dims at sm_90+)
`sub_5D1FE0`	`0x5D1FE0`	`__cluster_dims__` attribute (sm_90+ feature)	`__cluster_dims__` attribute (sm_90+ feature)
`sub_5D2430`	`0x5D2430`	`__launch_bounds__` 3rd param (sm_90+ cluster)	`__launch_bounds__` 3rd param (sm_90+ cluster)
`sub_5C68F0`	`0x5C68F0`	Pre-sm_70 diagnostic 3703	Pre-sm_70 diagnostic 3703
`sub_5C6950`	`0x5C6950`	Pre-sm_80 diagnostic 3660	Pre-sm_80 diagnostic 3660

Keyboard shortcuts

CICC Reverse Engineering Reference