SM75 Turing

The SM75 (Turing, compute capability 7.5) instruction selection backend occupies 984 KB at 0xF16000--0x100C000 and is the largest single-architecture ISel backend in the nvlink v13.0.88 binary. It contains 1,737 functions organized into four functional layers -- operand predicates, instruction emitters, pattern matchers, and post-ISel emit+encode functions -- plus a 280 KB mega-hub dispatch function (sub_FBB810) that is the largest function in the entire binary at 65,999 instructions.

Turing is architecturally significant as the first SM generation to introduce the Uniform Register File (URF), which manifests throughout this backend as operand kind 10 (UREG). The ISel uses a priority-based linear scan: for each IR instruction, all 276 pattern matchers run in sequence, and the highest-priority match wins.

Key Facts

Property	Value
Address range	`0xF16000`--`0x100C000` (984 KB)
Total functions	~1,737
Mega-hub dispatch	`sub_FBB810` at `0xFBB810` (280 KB, 65,999 instructions, 1,733 callees)
Operand predicates	15 functions at `0xF16030`--`0xF16110`
Instruction emitters	18 functions at `0xF10080`--`0xF15A50`
Pattern matchers	276 functions at `0xF16150`--`0xFBB780`
Emit+encode functions	38 functions at `0xFFFDF0`--`0x100BBF0`
Encoding context size	576+ bytes
ISel architecture	Priority-based linear scan (not tree-pattern or DAG)

Address Map

Range	Size	Subsystem	Count
`0xF16030`--`0xF16110`	<1 KB	Operand predicate functions	15
`0xF10080`--`0xF15A50`	22 KB	Instruction emitters	18
`0xF16150`--`0xFBB780`	678 KB	ISel pattern matchers	276
`0xFBB810`	280 KB	Mega-hub dispatch (`sub_FBB810`)	1
`0xFFFDF0`--`0x100BBF0`	48 KB	Post-ISel emit+encode	38

Instruction Selection Flow

For each NVVM IR instruction to be lowered to SM75 machine code, sub_FBB810 executes the following protocol:

1. sub_FBB810 iterates through all 276 pattern matchers
2. Each matcher calls sub_A49150(ctx, node, field_id) to read instruction attributes
3. Each matcher calls sub_530FD0(node) to check explicit operand count
4. Each matcher calls sub_530FB0(node, idx) to retrieve operand at index
5. Each matcher calls sub_530FC0(node) to check implicit operand count
6. Operand type checked via sub_F16040/F16070/etc predicates (kind tag)
7. Register class validated via sub_F16030: value 1023 = "any" (wildcard)
8. If all checks pass: *priority_out = priority, *pattern_id_out = id
9. After all matchers run, mega-hub picks highest-priority match
10. Corresponding emitter called to generate 128-bit encoding

The priority mechanism ensures specific patterns override general ones. Higher values win. If the current best priority already exceeds a matcher's threshold, that matcher early-outs (optimization to avoid redundant checks). Priority ranges across the 276 matchers:

Priority Range	Meaning	Example
2--4	Fallback/default patterns (minimal constraints)	`sub_FBB780` (pattern 1, priority 2): matches any instruction with 0 explicit ops and 2 implicit ops, the first a uniform register (R32)
7--10	Simple patterns (few attribute checks)	NOP/barrier variants, basic shifts
14--19	Standard patterns (moderate constraints)	IADD3, I2I, MUFU, ISETP, texture fetch, surface load
22--24	Complex patterns (many attribute + operand checks)	Memory indexed 3-op, branch with predication
33--36	Very specific patterns (maximum constraints)	SHFL/VOTE with 8 mixed operands, STG with 7 uniform-reg operands
39	Most specific (HMMA widest variants)	`sub_F77140` (9 implicit operands, R128 tensor core ops)
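
The protocol and priority rule above can be sketched as a small loop. This is a minimal illustration, not the decompiled logic of sub_FBB810: `IselCtx`, `IrNode`, `propose`, `select_pattern`, and the two toy matchers are hypothetical names, pattern ID 0 is assumed to mean "no match", and each matcher's threshold is assumed to sit just below its own priority.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the ISel context and the IR instruction node. */
typedef struct { int unused; } IselCtx;
typedef struct { int opcode_hint; } IrNode;

/* Matcher shape as documented: writes *pattern_id and raises *priority on a match. */
typedef char (*MatcherFn)(IselCtx *ctx, IrNode *node,
                          uint32_t *pattern_id, uint32_t *priority);

/* Record a match only if it beats the running best (assumed threshold rule). */
static char propose(uint32_t *pattern_id, uint32_t *priority,
                    uint32_t id, uint32_t prio) {
    if (*priority >= prio)
        return 0;               /* early-out: an equal or better match exists */
    *priority = prio;
    *pattern_id = id;
    return 1;
}

/* Toy fallback: no constraints, lowest priority (pattern 1, priority 2). */
char match_fallback(IselCtx *ctx, IrNode *node,
                    uint32_t *pattern_id, uint32_t *priority) {
    (void)ctx; (void)node;
    return propose(pattern_id, priority, 1, 2);
}

/* Toy specific pattern: one constraint, high priority (pattern 8, priority 39). */
char match_specific(IselCtx *ctx, IrNode *node,
                    uint32_t *pattern_id, uint32_t *priority) {
    (void)ctx;
    if (node->opcode_hint != 7)
        return 0;
    return propose(pattern_id, priority, 8, 39);
}

/* Linear scan: run every matcher, keep the highest-priority pattern ID. */
uint32_t select_pattern(IselCtx *ctx, IrNode *node,
                        const MatcherFn *matchers, size_t count,
                        uint32_t *best_priority) {
    uint32_t best_id = 0;       /* assumed sentinel for "no match" */
    *best_priority = 0;
    for (size_t i = 0; i < count; ++i)
        matchers[i](ctx, node, &best_id, best_priority);
    return best_id;
}
```

Because the running best is threaded through every call, the order of the matchers does not change the winner in this sketch; it only changes how many matchers can early-out.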

Operand Predicates

Fifteen trivial functions at 0xF16030--0xF16110 classify operands by the single-byte kind tag stored at operand offset +0. The nested checks of every pattern matcher bottom out in calls to these predicates.

Address	Function	Test	Operand Kind	Confidence
`0xF16030`	`sm75_get_regclass_id`	`return a1`	Identity (passthrough)	HIGH
`0xF16040`	`sm75_is_register_operand`	`a1 == 2`	REG -- general register	HIGH
`0xF16050`	`sm75_is_immediate_operand`	`a1 == 1`	IMM -- immediate/literal value	HIGH
`0xF16060`	`sm75_is_memory_operand`	`a1 == 6`	Memory/address operand	MEDIUM
`0xF16070`	`sm75_is_uniform_register`	`a1 == 10`	UREG -- uniform register (Turing+)	MEDIUM
`0xF16080`	`sm75_is_predicate_operand`	`a1 == 9`	PRED -- predicate register	MEDIUM
`0xF16090`	`sm75_is_cbuf_operand`	`a1 == 5`	Constant buffer reference	LOW
`0xF160A0`	`sm75_is_texture_operand`	`a1 == 4`	Texture/sampler reference	LOW
`0xF160B0`	`sm75_is_true_predicate`	`a1 == 3`	PT -- always-true guard	MEDIUM
`0xF160C0`	`sm75_is_false_predicate`	`a1 == 15`	PN -- always-false guard	MEDIUM
`0xF160D0`	`sm75_is_kind_13`	`a1 == 13`	Unknown	LOW
`0xF160E0`	`sm75_is_kind_14`	`a1 == 14`	Unknown	LOW
`0xF160F0`	`sm75_is_kind_16`	`a1 == 16`	Unknown	LOW
`0xF16100`	`sm75_is_kind_7`	`a1 == 7`	Unknown	LOW
`0xF16110`	`sm75_is_kind_11`	`a1 == 11`	Unknown	LOW

The identity function at 0xF16030 is used both as a register-class accessor (its return value is compared against 1023/1/2/4/5) and as a generic field-value passthrough. A second identity function at 0xF16130 presumably has a different type signature in the original source: the two compile to identical machine code but occupy distinct vtable slots.

The predicate pair sub_F160B0 / sub_F160C0 is always called as sub_F160B0(v) || sub_F160C0(v) -- accepting either PT (always true, kind 3) or PN (always false, kind 15), matching the SASS convention where a predicate guard can be either polarity.

Operand Kind Tag Summary

Kind	Symbol	Meaning	Predicate Address
1	IMM	Immediate / constant value	`0xF16050`
2	REG	General register operand	`0xF16040`
3	PT	Predicate true (always-true guard)	`0xF160B0`
4	--	Texture / sampler reference	`0xF160A0`
5	--	Constant buffer reference	`0xF16090`
6	--	Memory / address operand	`0xF16060`
9	PRED	Predicate register operand	`0xF16080`
10	UREG	Uniform register (Turing+)	`0xF16070`
15	PN	Predicate false (always-false guard)	`0xF160C0`

Kind 10 (UREG) is the defining Turing addition. It reflects the Uniform Register File introduced in SM75, which provides scalar registers shared across all threads in a warp. This operand kind appears pervasively in the pattern matchers, often alongside kind 2 (REG) as alternatives in the same operand slot.
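
As a rough illustration of how these leaf predicates compose, the sketch below restates the documented kind tags as an enum and builds the two combinations described above: a slot that accepts either register file, and the PT/PN guard check. The `sm75_*` predicate names follow the table; `OperandKind`, `accepts_reg_or_ureg`, and `is_fixed_guard` are hypothetical names introduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Kind tags from the summary table; kinds 7, 11, 13, 14, 16 are unidentified. */
enum OperandKind {
    KIND_IMM  = 1,   /* immediate / constant value */
    KIND_REG  = 2,   /* general register */
    KIND_PT   = 3,   /* always-true predicate guard */
    KIND_TEX  = 4,   /* texture / sampler reference */
    KIND_CBUF = 5,   /* constant buffer reference */
    KIND_MEM  = 6,   /* memory / address operand */
    KIND_PRED = 9,   /* predicate register */
    KIND_UREG = 10,  /* uniform register (Turing+) */
    KIND_PN   = 15   /* always-false predicate guard */
};

/* Single-comparison predicates, mirroring the `a1 == N` tests in the table. */
bool sm75_is_register_operand(uint8_t kind) { return kind == KIND_REG; }
bool sm75_is_uniform_register(uint8_t kind) { return kind == KIND_UREG; }
bool sm75_is_true_predicate(uint8_t kind)   { return kind == KIND_PT; }
bool sm75_is_false_predicate(uint8_t kind)  { return kind == KIND_PN; }

/* Operand slot that may come from either the general or the uniform file. */
bool accepts_reg_or_ureg(uint8_t kind) {
    return sm75_is_register_operand(kind) || sm75_is_uniform_register(kind);
}

/* Guard check in the PT || PN form used by the matchers. */
bool is_fixed_guard(uint8_t kind) {
    return sm75_is_true_predicate(kind) || sm75_is_false_predicate(kind);
}
```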

Register Class IDs

ID	Symbol	Width	Usage
1	R32 / GPR32	32-bit	General purpose register
2	R64 / GPR64	64-bit	General purpose register pair
4	R128 / GPR128	128-bit	Register quad (HMMA tensor core)
5	P	1-bit	Predicate register
1023 (0x3FF)	ANY	wildcard	Matches any register class ("don't care")
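
A register-class check with the wildcard might look like the sketch below. It assumes that 1023 appearing as the operand's class satisfies any expected class; the exact comparison used by the matchers is not shown in this document, and `RegClass`, `regclass_matches`, and `regclass_width_bits` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Register class IDs from the table above. */
enum RegClass {
    RC_R32  = 1,
    RC_R64  = 2,
    RC_R128 = 4,
    RC_P    = 5,
    RC_ANY  = 1023   /* 0x3FF wildcard */
};

/* Assumption: a wildcard class on the operand matches every expectation. */
bool regclass_matches(uint32_t actual, uint32_t expected) {
    return actual == RC_ANY || actual == expected;
}

/* Width in bits per the table; 0 for the wildcard or an unknown ID. */
unsigned regclass_width_bits(uint32_t rc) {
    switch (rc) {
    case RC_R32:  return 32;
    case RC_R64:  return 64;
    case RC_R128: return 128;
    case RC_P:    return 1;
    default:      return 0;
    }
}
```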

Instruction Emitters

Eighteen functions at 0xF10080--0xF15A50 implement the "emit" phase of instruction selection. Each takes an emitter context (a1, 576+ bytes) and an instruction node (a2) and produces the SM75 128-bit instruction encoding. All share a common structure:

Phase 1: Set instruction opcode at a2+12 (e.g., 126, 18, 104)
Phase 2: Load register class descriptor from rodata into a1+8 (SSE load)
Phase 3: Populate 10-slot operand descriptor arrays at a1+24..a1+140
         (register IDs, types, flags -- using SSE memcpy for speed)
Phase 4: Set explicit operand count at a1+144
Phase 5: Bind operands via sub_4C6380/sub_4C60F0/sub_4C6DC0
Phase 6: Encode bitfields into encoding words at a1+544 and a1+552
Phase 7: Set instruction class tag at a1+276
Phase 8: Write branch target / relocation info to instruction node

Identified Emitters

Address	Size	Identity	Opcode	Operand Bindings	Instruction Class
`0xF10080`	4,975 B	`sm75_emit_memop_5src`	126 (memory)	5: src reg, gen x3, pred	`0xE000000004` (load)
`0xF10620`	4,969 B	`sm75_emit_memop_6src`	126 (memory)	6: as above + extra src	`0xE000000003` (store)
`0xF10BE0`	4,857 B	`sm75_emit_alu_2src_uniform`	18 (int ALU)	2: gen(rc=10) + pred	`0x7000000001`
`0xF11090`	~4.8 KB	`sm75_emit_alu_2src_uniform_B`	18	2	`0x7000000001`
`0xF11540`	~4.8 KB	`sm75_emit_alu_2src_uniform_C`	18	2	`0x7000000001`
`0xF119F0`	~4.8 KB	`sm75_emit_alu_variant_D`	18	--	--
`0xF11EE0`	~4.8 KB	`sm75_emit_alu_variant_E`	18	--	--
`0xF123D0`	~4.8 KB	`sm75_emit_alu_variant_F`	18	--	--
`0xF128E0`	~4.8 KB	`sm75_emit_memop_variant_G`	126	--	--
`0xF12DF0`	~4.8 KB	`sm75_emit_memop_variant_H`	126	--	--
`0xF13310`	~4.8 KB	`sm75_emit_variant_I`	--	--	--
`0xF13830`	~4.8 KB	`sm75_emit_variant_J`	--	--	--
`0xF13D50`	~4.8 KB	`sm75_emit_variant_K`	--	--	--
`0xF14310`	~4.8 KB	`sm75_emit_variant_L`	--	--	--
`0xF148D0`	~4.8 KB	`sm75_emit_variant_M`	--	--	--
`0xF14E90`	~4.8 KB	`sm75_emit_variant_N`	--	--	--
`0xF15470`	~4.8 KB	`sm75_emit_variant_O`	--	--	--
`0xF15A50`	~4.8 KB	`sm75_emit_variant_P`	--	--	--

Opcode Families

Three instruction opcode numbers are confirmed from a2+12 assignments:

Opcode	Family	SASS Instructions
18	Integer ALU	IADD, IMAD, ISCADD, LEA, SHF, BFE, BFI, LOP3, PRMT
104	FP32 operations	FADD, FMUL, FMAD, FFMA
126	Memory / load-store	LDG, STG, LDS, STS, LDL, STL

Instruction Class Tags

The 5-byte (40-bit) value written at context offset +276 encodes both the instruction family (top nibble: 0x7 or 0xE in the tags below) and the operand configuration (low word):

Tag	Meaning
`0x7000000001`	Integer ALU, 2-operand form
`0xE000000003`	Memory store (opcode 126, 6 sources)
`0xE000000004`	Memory load (opcode 126, 5 sources)
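
The split between family and configuration can be illustrated with a few helpers. This is a sketch under two assumptions: the family is the top nibble of the 40-bit value (bits 36--39), and "low word" means the low 32 bits. The function names are hypothetical.

```c
#include <stdint.h>

/* Top nibble of the 40-bit tag: 0x7 for integer ALU, 0xE for memory. */
uint32_t class_tag_family(uint64_t tag) {
    return (uint32_t)((tag >> 36) & 0xF);
}

/* Operand configuration; assumed here to be the low 32 bits. */
uint32_t class_tag_config(uint64_t tag) {
    return (uint32_t)(tag & 0xFFFFFFFFu);
}

/* Rebuild a tag from its two parts. */
uint64_t make_class_tag(uint32_t family, uint32_t config) {
    return ((uint64_t)(family & 0xF) << 36) | (uint64_t)config;
}
```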

Pattern Matchers

All 276 pattern matchers at 0xF16150--0xFBB780 share the same signature:

char __fastcall sm75_match_XXX(
    void*     ctx,          // a1: ISel context
    void*     instr_node,   // a2: IR instruction node
    uint32_t* pattern_id,   // a3: output -- matched pattern ID
    uint32_t* priority      // a4: in/out -- current best priority
);

Each performs a deeply-nested sequence of checks:

Check instruction attributes via sub_A49150(ctx, node, field_id) -- see field ID dictionary below
Check explicit operand count via sub_530FD0(node)
For each explicit operand: validate kind tag and register class
Check implicit operand count via sub_530FC0(node)
For each implicit operand: validate kind tag and register class
If all checks pass and *a4 <= threshold: set *a4 = new_priority, *a3 = pattern_id

Pattern Matcher Categories

The 276 matchers group into 13 functional categories by the instruction families they match. Counts are approximate, and parenthesized pairs give (pattern ID, priority):

Category	Address Range	Count	Representative Patterns
NOP / barrier	`0xF16150`--`0xF163A0`	~5	NOP variant (pattern 33, priority 4), control flow simple (42, 8)
HMMA (tensor core)	`0xF1C3F0`--`0xF20D10`	~10	HMMA f16/f32 (4, 15), HMMA f64 (13, 15), HMMA with UR (9, 15)
ALU / arithmetic	`0xF20D10`--`0xF2B2A0`	~30	IADD3 reg+imm (2, 17), ALU 2-op variants
Memory / load-store	`0xF307E0`--`0xF36A20`	~15	Memory indexed 3-op (12, 24), STG indexed 6-op (25, 36)
Conversion / cast	`0xF3C0F0`--`0xF437C0`	~20	I2I 3-op (57, 17), MUFU/F2F (82, 19), TEX 3-op (121, 19)
Predicated ops	`0xF4AA30`--`0xF4FB70`	~10	ISETP 3-op (209, 19), texture fetch 3-op (218, 19)
Store variants	`0xF58BB0`--`0xF5C120`	~10	STG with predicate (10, 19), store predicated variants
Surface / texture	`0xF6DC60`--`0xF71B60`	~15	SULD 4-op predicated (5, 19) with side-effect check
Complex HMMA	`0xF76170`--`0xF77DF0`	~8	HMMA wide R128 (7, 34), HMMA widest 9-op (8, 39) -- largest matchers
ALU extended	`0xF82CF0`--`0xF96B40`	~50	IMAD predicated 6-op (1, 19), IADD/IMUL/SHF/BFE/BFI/LOP3/PRMT
Comparison / SETP	`0xF97CE0`--`0xF9CD30`	~15	DSETP 8-op (22, 34), DSETP 9-op+pred (23, 36)
Branch / call	`0xFA0310`--`0xFAA4E0`	~20	BRA complex predicated (1, 24), call/return variants
Final / fallback	`0xFB7A90`--`0xFBB780`	~5	SHF 3-op imm (2, 10), fallback simplest (1, 2)

Complex HMMA Matchers (Largest)

The most complex matchers target Half-precision Matrix Multiply-Accumulate (HMMA) instructions for Turing's tensor cores. These are the largest individual matcher functions (6--8 KB each) because HMMA has the most operands and encoding options:

sub_F77140 -- HMMA widest variant A (8,408 bytes, 179 lines):

Checks field 0x216 == 2717 (HMMA opcode variant)
Additional checks: fields 0xA1 == 700, 0xA2 in range 702--703
1 explicit operand: register R128
9 implicit operands: R128 x3, predicate, R64, R32, UREG R32, PT/PN check
Sets pattern_id=8, priority=39 (maximum observed)

sub_F77DF0 -- HMMA widest variant B (8,401 bytes, 179 lines):

Checks field 0x21A == 2729 (different HMMA subtype)
Same 9-implicit-operand structure
Sets pattern_id=12, priority=39

sub_F76DD0 -- HMMA wide operand A (7,226 bytes, 164 lines):

Checks field 0x216 == 2716
1 explicit R128 + 8 implicit (R128 x3, predicate, R32, UREG x2)
Sets pattern_id=7, priority=34
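
The attribute checks listed for sub_F77140 amount to a short list of field-range constraints. The sketch below expresses them as a table and evaluates it against a mock node. It is illustrative only: `FieldRange`, `FieldReader`, `attributes_match`, `HmmaMockNode`, and `mock_read` are hypothetical, and the reader merely stands in for sub_A49150(ctx, node, field_id), whose real behavior is not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One constraint: the field value must lie in [lo, hi]. */
typedef struct {
    uint16_t field_id;
    uint32_t lo;
    uint32_t hi;
} FieldRange;

/* Hypothetical attribute reader standing in for sub_A49150. */
typedef uint32_t (*FieldReader)(const void *node, uint16_t field_id);

/* True only if every constraint in the table holds. */
bool attributes_match(const void *node, FieldReader read,
                      const FieldRange *checks, size_t count) {
    for (size_t i = 0; i < count; ++i) {
        uint32_t v = read(node, checks[i].field_id);
        if (v < checks[i].lo || v > checks[i].hi)
            return false;
    }
    return true;
}

/* Attribute checks documented for sub_F77140. */
const FieldRange hmma_widest_a_checks[] = {
    { 0x216, 2717, 2717 },   /* HMMA opcode variant */
    { 0xA1,  700,  700  },
    { 0xA2,  702,  703  },
};

/* Mock node holding only the three fields the table queries. */
typedef struct { uint32_t f216, fA1, fA2; } HmmaMockNode;

uint32_t mock_read(const void *node, uint16_t field_id) {
    const HmmaMockNode *n = (const HmmaMockNode *)node;
    switch (field_id) {
    case 0x216: return n->f216;
    case 0xA1:  return n->fA1;
    case 0xA2:  return n->fA2;
    default:    return 0;
    }
}
```

The operand-count and operand-kind checks that follow the attribute checks in the real matcher are left out of this sketch.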

Fallback Matcher

sub_FBB780 (1,108 bytes, 34 lines) is the fallback pattern that matches when nothing else does:

Zero instruction attribute checks
Requires: explicit operand count == 0, implicit count == 2, first implicit = uniform register R32
Sets pattern_id=1, priority=2 (lowest observed)
Because priority 2 is the lowest observed, any other matching pattern overrides it
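
A minimal sketch of a matcher with the fallback's shape is given below. The node and operand structs are mocks (`FallbackNode` and `FallbackOperand` are hypothetical), the wildcard class 1023 is assumed to be acceptable wherever R32 is, and the priority test assumes the matcher only fires when the running best is below 2.

```c
#include <stdint.h>

enum { FB_KIND_UREG = 10, FB_RC_R32 = 1, FB_RC_ANY = 1023 };

/* Mock operand: kind tag plus register class ID. */
typedef struct {
    uint8_t  kind;
    uint32_t regclass;
} FallbackOperand;

/* Mock IR node exposing only what the fallback inspects. */
typedef struct {
    uint32_t        explicit_count;
    uint32_t        implicit_count;
    FallbackOperand implicit_ops[2];
} FallbackNode;

/* No attribute checks: only operand counts and the first implicit operand. */
char match_fallback_sketch(void *ctx, FallbackNode *node,
                           uint32_t *pattern_id, uint32_t *priority) {
    (void)ctx;
    if (node->explicit_count != 0) return 0;
    if (node->implicit_count != 2) return 0;

    const FallbackOperand *op = &node->implicit_ops[0];
    if (op->kind != FB_KIND_UREG) return 0;
    if (op->regclass != FB_RC_ANY && op->regclass != FB_RC_R32) return 0;

    if (*priority >= 2) return 0;   /* assumed threshold: a better match exists */
    *priority = 2;
    *pattern_id = 1;
    return 1;
}
```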

Post-ISel Emit+Encode Functions

Thirty-eight functions at 0xFFFDF0--0x100BBF0 combine pattern matching with instruction encoding for complex instructions requiring immediate bitfield packing. They share the emitter signature __int64 (ctx, instr_node) and use sub_4C28B0(ctx, bit_offset, width, value) extensively to pack individual fields into the 128-bit SASS instruction word.

Encoding Protocol

1. Pack opcode bits:     sub_4C28B0(a1, 0, 4, 2)   -- 4 bits at offset 0
2. Pack sub-opcode:      sub_4C28B0(a1, 4, 3, 0)   -- 3 bits at offset 4
3. Pack encoding fields from rodata tables
4. Initialize operand binding via sub_4C2A60/sub_4C2A90
5. Extract instruction-specific modifiers via sub_A551C0..sub_A55470
6. Encode modifiers into bit positions via sub_A4xxxx/sub_A50xxx
7. Pack into encoding words at ctx+544 and ctx+552 (shift+OR)
8. Set relocation metadata at ctx+148..ctx+160
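
The shift+OR packing in steps 1, 2, and 7 can be sketched with a small helper over two 64-bit words. This is an illustration of the technique, not a reconstruction of sub_4C28B0: `Encoding128` and `pack_bits` are hypothetical, fields are assumed to be at most 64 bits wide, and bits are only ORed in (nothing is cleared first).

```c
#include <stdint.h>

/* Two 64-bit words, mirroring the encoding words at ctx+544 and ctx+552. */
typedef struct {
    uint64_t word[2];
} Encoding128;

/* OR `width` bits of `value` into the 128-bit word at `bit_offset`. */
void pack_bits(Encoding128 *enc, unsigned bit_offset, unsigned width,
               uint64_t value) {
    if (width == 0 || width > 64 || bit_offset + width > 128)
        return;                                   /* reject out-of-range fields */

    uint64_t mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);
    value &= mask;

    unsigned idx   = bit_offset / 64;
    unsigned shift = bit_offset % 64;
    enc->word[idx] |= value << shift;

    /* A field crossing bit 64 spills its upper bits into word 1. */
    if (shift + width > 64)
        enc->word[1] |= value >> (64 - shift);
}
```

With this helper, steps 1 and 2 of the protocol correspond to `pack_bits(&enc, 0, 4, 2)` followed by `pack_bits(&enc, 4, 3, 0)`.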

Identified Emit+Encode Functions

Address	Size	Identity	Opcode	Sources
`0xFFFDF0`	6,810 B	`sm75_emit_encode_memop_complex_4src`	126	4 src via `sub_4C4D60` + `sub_4C5C30`
`0x1000460`	6,959 B	`sm75_emit_encode_memop_complex_6src`	126	6 src, two relocation entries
`0x1000CD0`	5,492 B	`sm75_emit_encode_alu_3src_A`	18	3 src
`0x10012B0`	5,493 B	`sm75_emit_encode_alu_3src_B`	18	3 src
`0x1001890`	5,407 B	`sm75_emit_encode_alu_3src_C`	18	3 src
`0x1001E10`	5,460 B	`sm75_emit_encode_alu_3src_D`	18	3 src
`0x1002340`	5,687 B	`sm75_emit_encode_alu_4src_A`	18	4 src
`0x10028F0`	5,688 B	`sm75_emit_encode_alu_4src_B`	18	4 src
`0x10088D0`	5,190 B	`sm75_emit_encode_fp32_4op`	104	4 src, `sub_A51DD0(node) == 1875` check
`0x1008DD0`	5,329 B	`sm75_emit_encode_fp32_4op_B`	104	4 src
`0x10092F0`	5,191 B	`sm75_emit_encode_fp32_4op_C`	104	4 src
`0x10097F0`	5,330 B	`sm75_emit_encode_fp32_4op_D`	104	4 src
`0x1009D10`	5,138 B	`sm75_emit_encode_fp32_4op_E`	104	4 src
`0x100A210`	5,296 B	`sm75_emit_encode_fp32_4op_F`	104	4 src
`0x100A730`	5,435 B	`sm75_emit_encode_fp32_4op_G`	104	4 src
`0x100AC70`	5,297 B	`sm75_emit_encode_fp32_4op_H`	104	4 src
`0x100B190`	5,436 B	`sm75_emit_encode_fp32_4op_I`	104	4 src
`0x100B6D0`	5,297 B	`sm75_emit_encode_fp32_4op_J`	104	4 src
`0x100BBF0`	5,296 B	`sm75_emit_encode_fp32_4op_K`	104	4 src

The remaining 19 emit+encode functions (0x1002EA0--0x10083A0) follow the same structure with varying field encoding positions and are labeled as generic variants (I through AA).

Opcode distribution among emit+encode functions: 11 for FP32 (opcode 104), 8 for integer ALU (opcode 18), 2 for memory (opcode 126), 17 for undetermined variants.

Encoding Context Structure

The 576-byte emitter context is the central data structure threading through all emitter and emit+encode functions. It accumulates the operand bindings and bitfield encodings for one SM75 SASS instruction.

Offset  Size   Field
+0      8      Reserved / vtable pointer
+8      16     XMM register class descriptor (SSE-loaded from rodata)
+12     2      Instruction opcode number (18, 104, or 126)
+16     4      Base bit position for predicate encoding
+24     40     Operand register numbers: 10 x 4-byte slots (indices 0--9)
+64     40     Operand types / constraints: 10 x 4-byte slots
+104    40     Operand flags: 10 x 4-byte slots (0=def, 1-5=use, -1=unused)
+144    4      Explicit operand count
+148    4      Relocation type (first)
+152    4      Relocation bit offset (first)
+156    4      Relocation type (second)
+160    4      Relocation bit offset (second)
+276    8      Instruction class tag (e.g., 0xE000000004)
+404    32     Match/emit dispatch table pointer
+536    8      Pointer to instruction descriptor table
+544    8      Encoding word 0 (64-bit bitfield, low half of 128-bit)
+552    8      Encoding word 1 (64-bit bitfield, high half of 128-bit)
+558    2      Immediate value (16-bit)
+572    4      Branch/offset target

The operand descriptor arrays at offsets +24, +64, and +104 are populated with optimized SIMD memcpy (aligned SSE loads/stores copying 4 elements at a time from rodata descriptor tables).
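
The layout above can be written as a C struct with compile-time offset checks. This is a sketch with hypothetical field names. The rows at +12, +16, and +558 are omitted because, as listed, they overlap the descriptor at +8 and encoding word 1 at +552; unlisted byte ranges are modeled as opaque padding.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint64_t reserved_or_vtable;                    /* +0   */
    uint8_t  regclass_descriptor[16];               /* +8   */
    uint32_t operand_regs[10];                      /* +24  */
    uint32_t operand_types[10];                     /* +64  */
    int32_t  operand_flags[10];                     /* +104: 0=def, 1-5=use, -1=unused */
    uint32_t explicit_count;                        /* +144 */
    struct { uint32_t type, bit_offset; } reloc[2]; /* +148..+163 */
    uint8_t  unknown_164[276 - 164];
    uint8_t  class_tag[8];                          /* +276 (not 8-byte aligned) */
    uint8_t  unknown_284[404 - 284];
    uint8_t  dispatch_table[32];                    /* +404 */
    uint8_t  unknown_436[536 - 436];
    uint64_t descriptor_table_ptr;                  /* +536 */
    uint64_t encoding_word[2];                      /* +544, +552 */
    uint8_t  unknown_560[572 - 560];
    uint32_t branch_target;                         /* +572 */
} Sm75EmitCtxSketch;

/* Compile-time checks that the sketch agrees with the documented offsets. */
_Static_assert(offsetof(Sm75EmitCtxSketch, operand_regs) == 24, "regs at +24");
_Static_assert(offsetof(Sm75EmitCtxSketch, operand_flags) == 104, "flags at +104");
_Static_assert(offsetof(Sm75EmitCtxSketch, explicit_count) == 144, "count at +144");
_Static_assert(offsetof(Sm75EmitCtxSketch, class_tag) == 276, "tag at +276");
_Static_assert(offsetof(Sm75EmitCtxSketch, encoding_word) == 544, "words at +544");
_Static_assert(offsetof(Sm75EmitCtxSketch, branch_target) == 572, "target at +572");
_Static_assert(sizeof(Sm75EmitCtxSketch) == 576, "576-byte context");

/* Zero the context and mark all ten operand slots unused (-1). */
void ctx_reset(Sm75EmitCtxSketch *ctx) {
    memset(ctx, 0, sizeof *ctx);
    for (int i = 0; i < 10; ++i)
        ctx->operand_flags[i] = -1;
}
```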

Rodata Register Class Descriptors

Each instruction family has a 16-byte register class descriptor loaded from rodata into context offset +8 via SSE:

Rodata Address	Instruction Family	Used By
`xmmword_1F46E28`	Memory operations (opcode 126)	`sub_F10080`, `sub_F10620`, `sub_FFFDF0`, `sub_1000460`
`xmmword_1F466B8`	Integer ALU (opcode 18)	`sub_F10BE0`, `sub_F11090`, `sub_F11540`
`xmmword_1F46630`	FP32 operations (opcode 104)	`sub_10088D0`--`sub_100BBF0`
`xmmword_1F47268`	Complex memory (emit+encode)	Post-ISel emit+encode functions

Each 16-byte descriptor is followed in rodata by three parallel arrays of 10 DWORDs defining per-slot operand register IDs, type/constraint descriptors, and flag words. For memory operations, the descriptor at xmmword_1F46E28 is followed by dword_1F46E38[0..9] (register IDs), dword_1F46E60[0..9] (types), and dword_1F46E88[0..9] (flags).
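
The addresses in the memory-operation example are consistent with the three arrays sitting back to back immediately after the 16-byte descriptor. The helpers below reproduce that arithmetic. The function names are hypothetical, and whether the same packing holds for the other instruction families is an assumption.

```c
#include <stdint.h>

enum {
    DESC_BYTES  = 16,   /* XMM-sized register class descriptor */
    SLOT_COUNT  = 10,   /* operand slots per array */
    SLOT_BYTES  = 4,    /* one DWORD per slot */
    ARRAY_BYTES = SLOT_COUNT * SLOT_BYTES
};

/* Register-ID array: directly after the descriptor. */
uint32_t regs_array_addr(uint32_t desc_addr) {
    return desc_addr + DESC_BYTES;
}

/* Type/constraint array: after the register-ID array. */
uint32_t types_array_addr(uint32_t desc_addr) {
    return desc_addr + DESC_BYTES + ARRAY_BYTES;
}

/* Flag array: after the type/constraint array. */
uint32_t flags_array_addr(uint32_t desc_addr) {
    return desc_addr + DESC_BYTES + 2 * ARRAY_BYTES;
}
```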

External Dependencies

The SM75 backend relies on shared infrastructure functions used across all ISel backends:

IR Node Accessors

Function	Signature	Description	Callers
`sub_A49150`	`(ctx, node, field_id) -> value`	Read instruction attribute by field ID	30,768 (binary-wide)
`sub_530FD0`	`(node) -> count`	Get explicit operand count	Universal
`sub_530FB0`	`(node, idx) -> operand*`	Get operand at index	31,399 (binary-wide)
`sub_530FC0`	`(node) -> count`	Get implicit operand count	Universal
`sub_A49720`	`(node) -> bool`	Check instruction has side effects	Surface load matchers
`sub_A51DD0`	`(node) -> class`	Get instruction class / post-condition	FP32 emit+encode

Operand Binding Functions

Function	Description
`sub_4C6380(ctx, node, op, off, rc)`	Bind source register operand
`sub_4C60F0(ctx, node, op, off, rc)`	Bind general register operand
`sub_4C6DC0(ctx, node, op, off, rc)`	Bind predicate register operand
`sub_4C5F90(ctx, node)`	Finalize operand binding
`sub_4C28B0(ctx, bit, width, val)`	Pack value into encoding bitfield
`sub_4C2A60(ctx)`	Initialize encoding
`sub_4C2A90(ctx, node, flag)`	Bind primary result
`sub_4C4D60(ctx, node, op, off)`	Bind source operand (complex)
`sub_4C5C30(ctx, node, op, off)`	Bind special operand

Modifier Extraction and Encoding

Modifier fields are extracted from the IR node via sub_A55xxx functions and encoded into bit positions via sub_A4xxxx/sub_A50xxx functions:

Extractor	Field	Encoder	Width
`sub_A551C0`	Modifier 1	`sub_A4F970`	3-bit
`sub_A55220`	Modifier 2	`sub_A4D940`	2-bit
`sub_A55280`	Modifier 3	`sub_A4DC60`	2-bit (alt)
`sub_A55320`	Modifier 4	`sub_A4FDE0`	4-bit
`sub_A55340`	Modifier 5	`sub_A50260`	2-bit
`sub_A55400`	Modifier 6	`sub_A500E0`	2-bit
`sub_A55450`	Modifier 7	`sub_A500F0`	5-bit
`sub_A55470`	Modifier 8	`sub_A4FBC0`	4-bit

Additional encoding functions handle specific operand attributes: sub_509D90 (register source A), sub_509DB0 (register source B), sub_509F20 (comparison mode), sub_509160 (data type/precision), sub_509290 (rounding mode), sub_509890 (saturation/clamp), sub_50AC80 (source negate/abs), sub_50ACD0 (source modifier composite), sub_509800 (address mode), sub_509930 (thread scope), sub_509A90 (memory order), sub_50C820 (cache policy), sub_50B570 (texture mode).

Field ID Dictionary

Field IDs passed to sub_A49150 to query instruction attributes. These are the keys used by every pattern matcher to classify instructions:

Field ID	Hex	Semantic Name	Known Values
5	`0x05`	Instruction major class	12 = memory/special
28	`0x1C`	Branch/jump type subfield	123--124
46	`0x2E`	Integer comparison mode	213
59	`0x3B`	Warp operation mode	273--274
88	`0x58`	Data type / precision code	406--408
89	`0x59`	Store type	410--416
91	`0x5B`	Address space qualifier	425--427
92	`0x5C`	Memory ordering	429--430
105	`0x69`	ALU function select	477
116	`0x74`	Texture/surface function	512--513
123	`0x7B`	Special function unit selector	536 = texture/surface
126	`0x7E`	Cache coherence / eviction policy	547--548
136	`0x88`	Source negate/absolute modifier	598--599
161	`0xA1`	HMMA input precision A	700
162	`0xA2`	HMMA input precision B	702--703
190	`0xBE`	NOP/barrier subtype	815
201	`0xC9`	Control flow subtype	1109
203	`0xCB`	Integer multiply mode	1113--1119
207	`0xCF`	Integer multiply variant	1150--1158
211	`0xD3`	Conversion subtype	1182
220	`0xDC`	Load/store address mode	1206
226	`0xE2`	Matrix layout	1229
229	`0xE5`	Special instruction code	1238
242	`0xF2`	Addressing mode detail	1281--1282
253	`0xFD`	MUFU function select	1321
254	`0xFE`	I2I conversion mode	1324
255	`0xFF`	Texture fetch type	1327--1328
265	`0x109`	Texture opcode variant	1363/1366
281	`0x119`	Warp shuffle type	1435--1440
285	`0x11D`	Warp shuffle mode	1454--1457
287	`0x11F`	Special indexed operation	1464
294	`0x126`	Memory bank selector	1493
295	`0x127`	Surface load type	1495
302	`0x12E`	Set-predicate class	1525
329	`0x149`	Integer addressing mode	1833--1837
338	`0x152`	Source predication mode A	1871/1873--1874
339	`0x153`	HMMA accumulator type	1877
341	`0x155`	Source predication mode B	1881--1882
345	`0x159`	Memory scope / synchronization	1899--1903
348	`0x15C`	Execution model qualifier	1912--1915
355	`0x163`	Data size / vector width	1943/1947
356	`0x164`	Texture data type	1949
359	`0x167`	Surface data type	1960
376	`0x178`	Memory persistence	2035
377	`0x179`	Memory eviction priority	2037--2041
379	`0x17B`	Branch condition type	2046
380	`0x17C`	Branch target type A	2048--2049
381	`0x17D`	Branch target type B	2052--2053
382	`0x17E`	Branch modifier	2055--2060
394	`0x18A`	Convert source type	2107--2108
397	`0x18D`	Destination predication	2115
399	`0x18F`	HMMA sub-operation	2121
404	`0x194`	Comparison extension	2140--2141
406	`0x196`	HMMA configuration	2146
407	`0x197`	Comparison precision	2148--2151
409	`0x199`	Set-predicate comparison	2155
413	`0x19D`	Memory segment	2167--2168
423	`0x1A7`	Source data type	bitmask test
424	`0x1A8`	Function lookup	bitmask test (739)
429	`0x1AD`	Memory ordering qualifier	2253--2257
430	`0x1AE`	Source A comparison	2259--2260
431	`0x1AF`	Source A comparison ext	2262--2263
465	`0x1D1`	Source B comparison	2420--2421
466	`0x1D2`	Source B comparison ext	2423--2424
468	`0x1D4`	HMMA step select	2429--2430
480	`0x1E0`	Matrix multiply type	2480/2482/2485
484	`0x1E4`	Set-predicate subclass	2502
492	`0x1EC`	Comparison boolean combine	2524--2525
494	`0x1EE`	Branch target form	2529--2530
505	`0x1F9`	HMMA operand layout A	2569
506	`0x1FA`	HMMA operand layout B	2571
508	`0x1FC`	Shift/funnel type	2576--2577
524	`0x20C`	Branch distance	2678--2679
534	`0x216`	HMMA opcode variant A	2716--2717
535	`0x217`	HMMA source C layout	2719--2720
536	`0x218`	HMMA source D layout	2722--2723
538	`0x21A`	HMMA opcode variant B	2729
539	`0x21B`	HMMA mode X	2731--2736
540	`0x21C`	HMMA mode Y	2738--2743
547	`0x223`	HMMA step A	2767--2768
548	`0x224`	HMMA step B	2770
549	`0x225`	HMMA step C	2772
569	`0x239`	Integer set-predicate type	2850--2851
575	`0x23F`	Memory base addressing	2870
576	`0x240`	Memory indexed addressing	2872
583	`0x247`	Conversion class	2892
595	`0x253`	Store qualifier	2937--2938

Turing-Specific Design Observations

Uniform Register File (URF). SM75 introduced the uniform register file -- scalar registers whose value is identical across all threads in a warp. This eliminates redundant per-lane computation for warp-uniform values. In the ISel backend, UREG (kind 10) appears as a first-class operand type alongside REG (kind 2). Many pattern matchers accept either kind in the same operand slot, reflecting that SASS instructions can take operands from either the general or uniform register file.

HMMA complexity. The tensor core (HMMA) instruction family drives the most complex patterns in this backend. The R128 register class (ID 4) exists specifically for HMMA, representing four consecutive 32-bit registers that hold matrix fragments. The highest-priority matchers (priority 39) are all HMMA variants, and the largest individual matcher functions (8+ KB) target HMMA.

Linear scan architecture. Unlike tree-pattern matchers (as used in LLVM's TableGen-generated ISel), this backend evaluates all patterns sequentially. The 280 KB mega-hub calls each of the 276 matchers in order, collects the highest-priority match, then dispatches to the winning emitter. This is computationally expensive (O(patterns) per instruction) but simple to extend: adding a new pattern requires only inserting a new matcher function into the sequence.

128-bit instruction encoding. SM75 uses 128-bit SASS instructions (two 64-bit words at context offsets +544 and +552). The sub_4C28B0 primitive packs arbitrary-width bit fields at arbitrary positions within these two words. Modifier fields are extracted from the IR node and encoded at precise bit positions, with different emit+encode variants differing only in which bits they set within the 128-bit word.

Confidence Assessment

Claim	Confidence	Verification
ISA class string "Turing" for sm_75	CONFIRMED	Decompiled `sub_484F50` line 251: `"Turing"`; string in `nvlink_strings.json` at `0x1d409dc`
SM75 backend at `0xF16000`--`0x100C000` (984 KB)	HIGH	Address range consistent with decompiled function addresses in the catalog; mega-hub at `sub_FBB810` falls within range
Mega-hub `sub_FBB810` at 280 KB, 65,999 instructions	HIGH	Size derived from binary analysis; the function is too large for Hex-Rays decompilation, consistent with the other mega-hubs
276 pattern matchers at `0xF16150`--`0xFBB780`	HIGH	Pattern addresses verified against decompiled function catalog; representative patterns like `sub_FBB780` (fallback) confirmed
15 operand predicates at `0xF16030`--`0xF16110`	HIGH	Address range and trivial predicate structure consistent with ISel infrastructure
Operand kind tags: 1=IMM, 2=REG, 10=UREG, etc.	HIGH	Consistent with shared infrastructure used across SM75/80/89/90 backends
Priority-based linear scan ISel architecture	HIGH	Protocol described matches mega-hub structure: iterate all matchers, pick highest priority
18 emitter functions at `0xF10080`--`0xF15A50`	HIGH	Addresses consistent with function catalog
Opcode families: 18 (int ALU), 104 (FP32), 126 (memory)	HIGH	Opcode numbers from decompiled emit+encode functions at `*(a2+12)` assignments
Register class 1023 = wildcard	HIGH	Consistent across all ISel backends; sentinel value used in operand matching
Dispatch table: sm_75 encoding table = `sub_15C3210`	CONFIRMED	Decompiled `sub_15C0CE0` shows sm_75 registration (earlier in file)
`__CUDA_ARCH__=750`	CONFIRMED	String at `0x1d409c8`; decompiled `sub_484F50` line 252
128-bit instruction encoding at ctx+544/+552	HIGH	Consistent across all SM75+ backends; encoding word offsets documented
Field ID dictionary (87 entries, IDs 5--595)	MEDIUM	Field IDs from pattern matcher analysis; individual values not independently verified but consistent with `sub_A49150` usage

For general SM75 architecture details, see the ptxas wiki: Turing/Ampere and cicc wiki: SM70-89.

Cross-References

nvlink Internal

Embedded ptxas: Architecture Overview -- full address map including SM75 backend position
Instruction Selection Hubs -- the five mega-hub dispatch functions
IR Nodes -- IR node structure and accessor functions
Architecture Dispatch -- SM75 vtable registration and callbacks
Architecture Profiles -- SM75 profile in the linker database
SM80 Ampere -- successor ISel backend

Sibling Wikis

ptxas: Turing/Ampere -- standalone ptxas SM75/SM80 target documentation
ptxas: ISel -- standalone ptxas instruction selection
cicc: SM70-89 -- cicc compiler SM75 through SM89 targets

nvlink Reverse Engineering Reference