
Peephole Optimization

All addresses in this page apply to ptxas v13.0.88 (CUDA 13.0). Other versions will differ.

The peephole optimization pass in ptxas is the single largest subsystem by code volume in the entire binary. Three monolithic dispatch functions -- totaling approximately 750 KB of machine code -- implement a brute-force pattern-match-and-rewrite engine that recognizes instruction idioms in the internal IR and replaces them with more efficient SASS instruction forms. Each dispatch function serves a different compilation context (generic, SM120-specific, and post-scheduling), but all three share the same architecture: a giant opcode-based switch dispatches to hundreds of pattern matchers; the highest-priority match wins; the winning rewrite modifies the instruction in-place.

None of the three mega-dispatchers can be decompiled by Hex-Rays due to their extreme size (233--280 KB each). All analysis in this page derives from disassembly, call graphs, and the 3,185 pattern-matcher functions that they invoke.

Scale Summary

| Dispatch function | Binary size | Instructions | Pattern matchers | Total call sites | Entry trampoline | Context |
| --- | --- | --- | --- | --- | --- | --- |
| sub_169B190 | 280 KB | 65,999 | 762 | 15,870 | sub_B12930 | Generic (all SM) |
| sub_143C440 | 233 KB | ~56,241 | 1,087 | 1,971 | sub_B12940 | SM120-specific |
| sub_198BCD0 | 233 KB | 54,043 | 1,336 | 13,391 | sub_B12960 | Post-scheduling |

All three entry trampolines (sub_B12930, sub_B12940, sub_B12960) are 11-byte thunks that strip or forward one argument and tail-call the corresponding giant.

Pipeline Position

 IR instruction stream
       |
       v
 sub_B12930 -----> sub_169B190   (generic peephole)
       |
       v
 sub_B12940 -----> sub_143C440   (SM120 peephole, RTX 50-series / Pro)
       |
       v
 [instruction scheduling]
       |
       v
 sub_B12960 -----> sub_198BCD0   (post-schedule peephole)
       |
       v
 [instruction encoding via vtable]

The generic and SM120 dispatchers run before scheduling; the post-scheduling dispatcher runs after. The SM120 dispatcher (sub_143C440) appears to be architecture-gated -- it is called only when compiling for SM 120 targets (consumer RTX 50-series, enterprise Pro GPUs).

Dispatch Architecture

All three mega-dispatchers follow the same algorithm.

Entry and primary switch

push callee-saves
sub  rsp, 10h
mov  rbp, rdi            ; ctx
mov  rbx, rsi            ; instruction node
mov  [rsp+var_2C], -1    ; best_template_id = NONE
mov  [rsp+var_30], -1    ; best_priority    = NONE
movzx edi, word [rsi+0Ch] ; read opcode field
call sub_13B9DC0          ; identity / normalization (returns opcode)
cmp  ax, 174h             ; 373 cases (opcodes 0..372)
ja   default
jmp  [jump_table + rax*8] ; PRIMARY SWITCH on opcode

The 16-bit opcode at instruction node offset +0x0C selects a primary case. All three dispatchers use 373-case primary switches.

Per-case pattern matching

Within each primary case, the dispatcher:

  1. Calls a sequence of pattern-matcher functions, passing pointers to best_template_id and best_priority as out-parameters.
  2. Each matcher may update these if it finds a match with higher priority than the current best.
  3. After all matchers for the opcode have run, the dispatcher checks best_template_id. If it is no longer -1, a secondary switch on the template ID selects the rewrite action.

The secondary switches are embedded inside the giant function. sub_143C440 alone contains 85 secondary jump tables (sizes 7--190 cases), totaling 1,971 switch cases.
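The per-case flow can be modeled in C as below. This is a minimal sketch, not recovered code: the Instr struct, the MatcherList table, and the two toy matchers are hypothetical stand-ins. As far as the disassembly shows, the real dispatchers inline each opcode's matcher calls directly into the switch arms rather than looping over a table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the instruction node: only the opcode matters here. */
typedef struct { uint16_t opcode; } Instr;

/* Matcher prototype as described on this page: returns a char, reports via out-params. */
typedef char (*Matcher)(void *ctx, Instr *instr, int32_t *template_id, int32_t *priority);

/* Hypothetical per-opcode matcher list (the real code inlines these calls per case). */
typedef struct { const Matcher *matchers; size_t count; } MatcherList;

enum { NUM_OPCODES = 373 };   /* primary switch covers opcodes 0..372 */

/* Runs every matcher registered for the opcode; returns the winning template ID or -1. */
static int32_t dispatch(void *ctx, Instr *instr, const MatcherList table[NUM_OPCODES]) {
    int32_t best_template_id = -1;   /* NONE */
    int32_t best_priority = -1;      /* NONE */
    if (instr->opcode >= NUM_OPCODES)
        return -1;                   /* the "ja default" path */
    const MatcherList *list = &table[instr->opcode];
    for (size_t i = 0; i < list->count; i++)
        list->matchers[i](ctx, instr, &best_template_id, &best_priority);
    /* A real dispatcher would now switch on best_template_id to pick the rewrite. */
    return best_template_id;
}

/* Toy matchers that always match and differ only in priority and template ID. */
static char toy_match_low(void *ctx, Instr *instr, int32_t *template_id, int32_t *priority) {
    (void)ctx; (void)instr;
    if (*priority > 4) return 0;
    *priority = 5;
    *template_id = 10;
    return 1;
}

static char toy_match_high(void *ctx, Instr *instr, int32_t *template_id, int32_t *priority) {
    (void)ctx; (void)instr;
    if (*priority > 14) return 0;
    *priority = 15;
    *template_id = 28;
    return 1;
}
```

With both toy matchers registered for one opcode, the priority-15 matcher wins whichever order they are called in, and an opcode with no registered matchers returns -1 (no rewrite).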

Rewrite action

When a rewrite is selected, the action block performs four operations:

setRewrittenOpcode(instr, new_opcode);     // sub_B28F10: writes byte at instr+14
setRewrittenModifier(instr, new_modifier); // sub_B28F20: writes byte at instr+15
setOperandMapping(instr, slot, value);     // sub_BA9CF0: writes instr+72+4*slot
markRewritten(instr);                      // sub_BA9C30 or sub_BA9CB0

sub_BA9C30 (markRewrittenSimple) sets bit 0 of the flags word at instr+140:

*(uint32_t*)(instr + 140) |= 1;

sub_BA9CB0 (markRewrittenComplex) applies priority-aware flag logic that respects rewrites already made by earlier passes: when a higher-priority rewrite exists, it sets flag bit 0x8 ("superseded").

The call frequencies in sub_143C440 are consistent with this: setRewrittenOpcode and setRewrittenModifier are each called exactly 1,759 times, which indicates that every rewrite sets both the opcode byte and the modifier byte.
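As a minimal sketch, the commit steps can be modeled against a raw byte buffer using the offsets given above. The snake_case helper names mirror this page's descriptive labels, not recovered symbols. The priority-aware markRewrittenComplex is omitted because its logic is only partly known, and memcpy is used so the sketch does not depend on the buffer's alignment.

```c
#include <stdint.h>
#include <string.h>

/* Byte offsets taken from the rewrite-helper descriptions on this page. */
enum {
    OFF_NEW_OPCODE   = 14,   /* sub_B28F10 */
    OFF_NEW_MODIFIER = 15,   /* sub_B28F20 */
    OFF_OPERAND_MAP  = 72,   /* sub_BA9CF0: 4 bytes per slot */
    OFF_FLAGS        = 140   /* sub_BA9C30: bit 0 = rewritten */
};

static void set_rewritten_opcode(uint8_t *instr, uint8_t op)  { instr[OFF_NEW_OPCODE] = op; }
static void set_rewritten_modifier(uint8_t *instr, uint8_t m) { instr[OFF_NEW_MODIFIER] = m; }

/* Writes one 32-bit entry of the operand mapping table. */
static void set_operand_mapping(uint8_t *instr, int slot, uint32_t value) {
    memcpy(instr + OFF_OPERAND_MAP + 4 * slot, &value, sizeof value);
}

/* Sets bit 0 of the 32-bit flags word (read-modify-write). */
static void mark_rewritten_simple(uint8_t *instr) {
    uint32_t flags;
    memcpy(&flags, instr + OFF_FLAGS, sizeof flags);
    flags |= 1u;
    memcpy(instr + OFF_FLAGS, &flags, sizeof flags);
}
```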

Pattern Matcher Signature

Every one of the 3,185 pattern matchers shares the same prototype:

char __fastcall match(
    int64_t ctx,           // a1: peephole optimization context
    int64_t instr,         // a2: instruction node being examined
    int32_t *template_id,  // a3: output -- combined opcode / template ID
    int32_t *priority      // a4: input/output -- current best priority
);

The function returns a char (the last comparison result, used for early-exit optimization in the caller), but the meaningful outputs are *template_id and *priority.

Matching algorithm

Every matcher performs a deeply-nested chain of checks:

Step 1 -- Modifier/property checks. Call queryModifier(ctx, instr, slot) (sub_10AE5C0) repeatedly. Each call returns an enumerated value for a specific instruction property:

if (queryModifier(ctx, instr, 0xDC) != 1206) return 0;  // data type != .f32
if (queryModifier(ctx, instr, 0x163) != 1943) return 0;  // rounding != .rn
if ((uint32_t)(queryModifier(ctx, instr, 0x7E) - 547) > 1) return 0; // saturation not in {547, 548}

The slot indices (0x05, 0x7B, 0x7E, 0x88, 0x90, 0xA1, 0xBE, 0xD2, 0xD3, 0xDC, 0xF2, 0x101, 0x119, 0x126, 0x127, 0x142, 0x152, 0x155, 0x159, 0x15C, 0x163, 0x167, 0x178, 0x179, 0x18A, 0x18D, 0x196, 0x197, 0x199, 0x19D, 0x1A8, 0x1AD, 0x1AE, 0x1AF, 0x1B2, 0x1D1, 0x1D2, 0x1E0, 0x1E4, 0x1EC, 0x216, 0x253, etc.) index into a per-instruction property table covering type, rounding mode, saturation, negate, comparison type, and architecture-specific modifiers.
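The "- 547 > 1" form in Step 1 is the usual compiler lowering of a small range test: subtracting the lower bound and comparing unsigned folds "v < 547 || v > 548" into a single branch. A minimal illustration follows; the helper name in_range is ours, not a ptxas symbol.

```c
#include <stdbool.h>
#include <stdint.h>

/* True iff lo <= v <= lo + n. Values below lo wrap around to huge unsigned
   numbers and fail the test, as does the 0xFFFFFFFF "not present" sentinel. */
static bool in_range(uint32_t v, uint32_t lo, uint32_t n) {
    return (uint32_t)(v - lo) <= n;
}
```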

Step 2 -- Operand count. Check the number of explicit/fixed operands and the total operand slot count:

int fixed = getExplicitOperandCount(instr);  // sub_B28F50: returns *(instr+92)
int total = getTotalOperandSlots(instr);     // sub_B28F40: returns *(instr+40)+1 - *(instr+92)

Step 3 -- Operand type and register class validation. For each operand slot, retrieve the operand pointer and check its kind:

void *op = getOperand(instr, idx);   // sub_B28F30: returns *(instr+32) + 32*idx
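byte kind = *(byte*)op;
// Each slot is tested against the kind it must hold, e.g. for a register slot:
if (!isRegister(kind))   return 0;   // sub_13B9CD0: kind == 2
// ...or, for a slot that must hold an immediate:
if (!isImmediate(kind))  return 0;   // sub_13B9CE0: kind == 1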

Register class is checked against expected values:

int regclass = getRegisterClass(*(uint32_t*)(op + 4)); // sub_13B9CC0
if (regclass != 1023 && regclass != 1) return 0;       // 1023 = wildcard
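A sketch of Steps 2 and 3 over a raw node buffer follows. It assumes an x86-64 host (an 8-byte operand-array pointer at +32) and uses the formulas quoted above. getRegisterClass is not modeled, because the page does not describe how sub_13B9CC0 derives the class from the field, so the hypothetical regclass_ok takes an already-extracted class.

```c
#include <stdint.h>
#include <string.h>

static int32_t read_i32(const uint8_t *p) { int32_t v; memcpy(&v, p, sizeof v); return v; }

/* sub_B28F50: explicit/fixed operand count at +92 */
static int32_t get_explicit_operand_count(const uint8_t *instr) { return read_i32(instr + 92); }

/* sub_B28F40: *(instr+40) + 1 - *(instr+92) */
static int32_t get_total_operand_slots(const uint8_t *instr) {
    return read_i32(instr + 40) + 1 - read_i32(instr + 92);
}

/* sub_B28F30: operand records are 32 bytes each, array base pointer at +32 */
static const uint8_t *get_operand(const uint8_t *instr, int idx) {
    const uint8_t *base;
    memcpy(&base, instr + 32, sizeof base);
    return base + 32 * idx;
}

static int is_register(uint8_t kind)  { return kind == 2; }   /* sub_13B9CD0 */
static int is_immediate(uint8_t kind) { return kind == 1; }   /* sub_13B9CE0 */

/* Register-class gate used by many matchers: 1023 is the wildcard. */
static int regclass_ok(int32_t regclass) { return regclass == 1023 || regclass == 1; }
```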

Step 4 -- Priority gate. If all checks pass and the current priority allows it:

if (*priority <= threshold) {
    *priority = threshold + 1;
    *template_id = combined_opcode_id;
}

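Since matchers are called sequentially and each compares against the running maximum, the highest-priority match always wins. Ties go to the earlier matcher: after a claim the stored priority is threshold + 1, so a later matcher with the same threshold fails the *priority <= threshold test.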
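The Step 4 gate in isolation, as a small C function. The name priority_gate and the sample thresholds are illustrative. The test walks a low-priority claim, a higher one, a lower one arriving late, and a tie.

```c
#include <stdint.h>

/* Claims the result slot only if the best priority so far is <= threshold.
   Returns 1 on a claim, 0 otherwise. */
static int priority_gate(int32_t *template_id, int32_t *priority,
                         int32_t threshold, int32_t tid) {
    if (*priority <= threshold) {
        *priority = threshold + 1;   /* stored priority is threshold + 1 */
        *template_id = tid;
        return 1;
    }
    return 0;
}
```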

Operand Type Discriminators

Three families of trivial single-instruction functions serve as operand type predicates, one family per dispatch context:

SM120 matchers (Zone A of sub_143C440)

| Function | Test | Semantic |
| --- | --- | --- |
| sub_13B9CD0 | kind == 2 | isRegister |
| sub_13B9CE0 | kind == 1 | isImmediate |
| sub_13B9D00 | kind == 2 or kind == 1 | isRegOrImm |
| sub_13B9D10 | kind == ? | isConstantBuffer |
| sub_13B9D40 | kind == ? | isPredicate |
| sub_13B9D50 | kind == ? | isUniformRegister |
| sub_13B9CC0 | extracts class | getRegisterClass (1023 = wildcard) |

Generic matchers (Zone A of sub_169B190)

| Function | Test | Semantic |
| --- | --- | --- |
| sub_15F59C0 | a1 == 2 | isRegister |
| sub_15F59D0 | a1 == 1 | isImmediate |
| sub_15F59E0 | a1 == 0 | isNone |
| sub_15F59F0 | a1 == 10 | isConstantMemory |
| sub_15F5A00 | a1 == 9 | isTexRef |
| sub_15F5A30 | a1 == 3 | isPredicate / isConstImm |
| sub_15F5A40 | a1 == 15 | isUniformRegister / isTrueConst |
| sub_15F5A80 | a1 == 6 | isLabel |
| sub_15F5A90 | a1 == 11 | isTexture |
| sub_15F5AB0 | identity | getOperandValue |

Post-schedule matchers (Zone A of sub_198BCD0)

| Function | Test | Semantic | Call count |
| --- | --- | --- | --- |
| sub_1820170 | identity | getOpcodeRaw | 9,278 |
| sub_1820180 | a1 == 2 | isRegOperand | 2,743 |
| sub_1820190 | a1 == 1 | isImmOperand | 677 |
| sub_18201A0 | a1 == 8 | isUniform | 7 |
| sub_18201B0 | a1 == 10 | isPredicateReg | 1,228 |
| sub_18201C0 | a1 == 9 | isTexRef | 211 |
| sub_18201D0 | a1 == 5 | isConstBuf | 14 |
| sub_18201E0 | a1 == 4 | isAddress | 9 |
| sub_18201F0 | a1 == 3 | isConstImm | 1,044 |
| sub_1820200 | a1 == 15 | isTrueConst | 1,044 |
| sub_1820210 | a1 == 7 | isBarrier | 9 |
| sub_1820220 | a1 == 12 | isSurface | 12 |
| sub_1820230 | a1 == 11 | isTexture | 12 |
| sub_1820240 | a1 == 6 | isLabel | 2 |
| sub_1820250 | a1 == 14 | isSpecialReg | 2 |
| sub_1820260 | a1 == 13 | isUnknown | 6 |

Priority System

Matchers use a strict numeric priority to resolve conflicts when multiple patterns match the same instruction. Higher priority means more specific and/or more profitable transformation.

| Priority range | Description | Example |
| --- | --- | --- |
| 1--2 | Trivial matches (simple mov, basic arithmetic) | Single-operand passthrough |
| 5--11 | Common 2--3 operand combining patterns | Standard FMA combines |
| 14--20 | Complex 4-operand patterns with constraints | Multi-source ALU combines |
| 22--31 | Highly specific multi-operand patterns | Wide register + predicated ops |
| 33--36 | Maximum specificity (8--9 operands + all modifiers) | Full tensor instruction forms |

Pattern IDs range from 1 to approximately 244 in the generic and SM120 dispatchers. Multiple matchers can target the same pattern ID with different priorities, creating a priority cascade.

Instruction Node Layout

The peephole subsystem reveals the following fields of the instruction IR node:

| Offset | Size | Field | Accessor |
| --- | --- | --- | --- |
| +0x00 | 1 B | Operand type tag | isRegister, isImmediate, etc. |
| +0x04 | 4 B | Primary value (register number / immediate) | getRegisterClass / getOperandValue |
| +0x0C | 2 B | Opcode number (16-bit) | Direct read in dispatch entry |
| +0x0E | 1 B | Rewritten opcode | sub_B28F10 (setRewrittenOpcode) |
| +0x0F | 1 B | Rewritten modifier | sub_B28F20 (setRewrittenModifier) |
| +0x14 | 4 B | Secondary register field | Direct read |
| +0x20 | 8 B | Operand array base pointer | sub_B28F30 base address |
| +0x28 | 4 B | Total operand count | Part of sub_B28F40 computation |
| +0x48 | var | Operand mapping table (4 B per slot) | sub_BA9CF0 writes here |
| +0x5C | 4 B | Explicit operand count | sub_B28F50 returns this |
| +0x8C | 4 B | Flags word | Bit 0 = rewritten (set by sub_BA9C30) |

Each operand is a 32-byte record at base + 32 * index:

| Operand offset | Size | Content |
| --- | --- | --- |
| +0 | 1 B | Type tag (1=imm, 2=reg, 3=constImm, 10=pred, 15=trueConst, ...) |
| +4 | 4 B | Primary value (register ID; 1023 = wildcard / any-reg) |
| +20 | 4 B | Secondary value (modifier / sub-register) |
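The offsets above can be pinned down as C structs with compile-time checks. This is a partial model under stated assumptions: an LP64 host (8-byte pointers), unknown bytes treated as padding, and only the node fields that this page's accessors and rewrite helpers touch. The +0x00, +0x04, and +0x14 node rows are left out because they appear to mirror the operand record's tag, primary, and secondary fields. The five-entry operand_map length is also an assumption: five 4-byte slots are what fit between +0x48 and +0x5C.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-byte operand record. */
typedef struct {
    uint8_t  kind;        /* +0  type tag */
    uint8_t  pad0[3];
    uint32_t value;       /* +4  register ID or immediate; 1023 = wildcard */
    uint8_t  pad1[12];
    uint32_t secondary;   /* +20 modifier / sub-register */
    uint8_t  pad2[8];
} Operand;

/* Partial instruction node; unk_* arrays are bytes of unknown meaning. */
typedef struct {
    uint8_t  unk_00[0x0C];
    uint16_t opcode;          /* +0x0C */
    uint8_t  new_opcode;      /* +0x0E, sub_B28F10 */
    uint8_t  new_modifier;    /* +0x0F, sub_B28F20 */
    uint8_t  unk_10[0x10];
    Operand *operands;        /* +0x20, sub_B28F30 base */
    uint32_t operand_field;   /* +0x28, feeds sub_B28F40 */
    uint8_t  unk_2C[0x1C];
    uint32_t operand_map[5];  /* +0x48, 4 B per slot (length assumed) */
    uint32_t explicit_count;  /* +0x5C, sub_B28F50 */
    uint8_t  unk_60[0x2C];
    uint32_t flags;           /* +0x8C, bit 0 = rewritten */
} InstrNode;

_Static_assert(sizeof(Operand) == 32, "operand record is 32 bytes");
_Static_assert(offsetof(Operand, secondary) == 20, "secondary value at +20");
_Static_assert(offsetof(InstrNode, opcode) == 0x0C, "opcode at +0x0C");
_Static_assert(offsetof(InstrNode, operands) == 0x20, "operand array at +0x20");
_Static_assert(offsetof(InstrNode, operand_map) == 0x48, "mapping table at +0x48");
_Static_assert(offsetof(InstrNode, explicit_count) == 0x5C, "explicit count at +0x5C");
_Static_assert(offsetof(InstrNode, flags) == 0x8C, "flags at +0x8C");
```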

Code Duplication

The pattern matchers exhibit extreme structural duplication. Groups of 2--10 functions are near-identical clones differing only in numeric constants (the specific opcode/modifier values they check, the template ID they assign, and the priority level).

Observed clone clusters in sub_169B190's matchers:

| Cluster size | Count | Byte size each | Address range example |
| --- | --- | --- | --- |
| ~5,560 B | 5 functions | 5,560 | 0x167CBB0--0x16E7D20 |
| ~5,282 B | 10 functions | 5,282 | 0x167E3A0--0x16807E0 |
| ~5,298 B | 4 functions | 5,298 | 0x16EA5F0--0x16ECA30 |
| ~5,846 B | 3 functions | 5,846 | 0x16EDC00--0x16EE8B0 |
| ~2,718 B | 7 functions | 2,718 | 0x166F260--0x1692B60 |
| ~2,604 B | 6 functions | 2,604 | 0x166AC30--0x166E170 |

Similarly, in sub_198BCD0's matchers, eight functions of exactly 5,282 bytes each (sub_1982810, sub_1982AE0, sub_1982DB0, sub_1983080, sub_1984B40, sub_1984E10, sub_19850E0, sub_19853B0) share identical structure, varying only in the opcode/modifier constants passed to sub_10AE5C0.

This strongly suggests compiler-generated code from C++ templates or macros that instantiate one matcher function per instruction variant from ISA specification tables -- a pattern consistent with NVIDIA's internal build tooling.
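One way such clones could be stamped out is an X-macro over a table of constants, sketched below. This is speculative: it illustrates the hypothesis above, not recovered source. FakeInstr, query, and both table rows are hypothetical; each row borrows only the first two slot/value checks of a matcher described under Representative Matcher Examples (sub_13CF0C0 at priority 15, sub_1615980 at priority 36), whereas the real functions check many more properties.

```c
#include <stdint.h>

/* Hypothetical property store standing in for queryModifier (sub_10AE5C0). */
typedef struct { uint32_t slot, value; } Prop;
typedef struct { const Prop *props; int n; } FakeInstr;

static int32_t query(const FakeInstr *in, uint32_t slot) {
    for (int i = 0; i < in->n; i++)
        if (in->props[i].slot == slot) return (int32_t)in->props[i].value;
    return -1;   /* property not present */
}

/* Each row: name, slot A, value A, slot B, value B, threshold, template ID. */
#define MATCHER_TABLE(X)                                 \
    X(toy_match_28, 0xD3, 1181, 0xD2, 1177, 14, 28)      \
    X(toy_match_25, 0xDC, 1206, 0x163, 1943, 35, 25)

/* One template body; every row yields a near-identical clone. */
#define DEFINE_MATCHER(name, slot_a, val_a, slot_b, val_b, thr, tid)   \
    static char name(const FakeInstr *in, int32_t *template_id,       \
                     int32_t *priority) {                             \
        if (query(in, (slot_a)) != (val_a)) return 0;                  \
        if (query(in, (slot_b)) != (val_b)) return 0;                  \
        if (*priority <= (thr)) {                                      \
            *priority = (thr) + 1;                                     \
            *template_id = (tid);                                      \
        }                                                              \
        return 1;                                                      \
    }

MATCHER_TABLE(DEFINE_MATCHER)
```

Under this scheme, clones differ only in the constants baked into each row, which is the kind of near-identical structure described above.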

Size Distribution of Matchers

SM120 matchers (1,087 functions, 429 KB)

| Size range | Count | Description |
| --- | --- | --- |
| < 200 B | 37 | Simple 1--2 modifier checks |
| 200--400 B | 520 | Typical 4--8 modifier checks |
| 400--600 B | 455 | 6--12 modifier checks + operand validation |
| 600--800 B | 66 | Complex multi-operand patterns |
| > 800 B | 9 | Deepest nesting, most constrained patterns |

Generic matchers (762 functions, ~310 KB)

| Size range | Count | Description |
| --- | --- | --- |
| ~2,200 B | most common | 2--4 instruction field checks |
| ~2,800 B | moderate | Patterns with operand constraints |
| ~3,500--4,000 B | fewer | Complex multi-operand patterns |
| ~5,500--8,500 B | rare | 12+ modifier checks, 8--9 operands |

Post-schedule matchers (~1,336 functions)

| Size range | Count | Description |
| --- | --- | --- |
| ~2,200 B | most common | Simple 2-instruction patterns |
| ~2,500 B | common | 3-instruction patterns |
| ~3,100 B | moderate | Patterns with predicate checks |
| ~5,300 B | few | Multi-instruction sequences (8+ operands) |
| ~6,800 B | 1 | Largest matcher (sub_1980D10) |

Representative Matcher Examples

Simplest: sub_143C3B0 (132 bytes, priority 2, template 1)

Checks: no explicit operands, 2 total slots, first operand is register-or-immediate with register class 1023 or 1. Matches a trivial mov-type instruction for passthrough combining.

Moderate: sub_13CF0C0 (426 bytes, priority 15, template 28)

Checks 5 modifiers: slot 0xD3 == 1181, slot 0xD2 == 1177, slot 0x0C == 59, slot 0xB3 == 772, slot 0xC8 == 1107. Then validates 1 explicit register operand plus 4 additional operands (register, register, immediate, predicate).

Complex: sub_1615980 (priority 36, template 25 -- highest observed priority)

Checks 12 modifier slots: 0x05 == 12, 0xDC == 1206, 0x253 in {2937,2938}, 0x126 == 1493, 0xF2 in {1281,1282}, 0x163 == 1943, 0x178 == 2035, 0x179 in {2037..2041}, 0x1AD in {2253..2257}, 0x7E in {547,548}, 0x19D in {2167,2168}, 0x18D == 2115. No fixed operands, 7 variable operands, each of type 10 (constant memory) with register class 1023 or specific flag constraints. This is the most constrained pattern observed -- likely a fully specified tensor instruction variant.

Post-schedule: sub_1834600 (pattern 17, priority 16)

Checks modifier slots 0xD3 == 1181, 0xD2 == 1177, 0x0C in {60,61}, 0xB3 == 772, 0xC8 == 1107. Then: first operand offset == 1, that operand is immediate, total operand count == 5, followed by register pattern checks.

Infrastructure Helper Functions

Core accessor (sub_10AE5C0, 60 bytes)

One of the most heavily used functions in the peephole subsystem (30,768 callers across the full binary). It queries a property of an instruction node by slot ID:

int queryModifier(int64_t ctx, int64_t instr, int slot) {
    if (hasProperty(instr, slot))             // sub_10E32E0
        return getPropertyValue(instr, slot); // sub_10D5E60
    return -1;                                // 0xFFFFFFFF: property not present
}

Node accessors

| Function | Size | Semantics | Call frequency |
| --- | --- | --- | --- |
| sub_B28F30 | 12 B | getOperand(instr, idx) -- returns *(instr+32) + 32*idx | 31,399 |
| sub_B28F40 | 10 B | getTotalOperandSlots(instr) -- returns *(instr+40)+1 - *(instr+92) | ~2,500 |
| sub_B28F50 | 4 B | getExplicitOperandCount(instr) -- returns *(instr+92) | ~2,100 |

Rewrite helpers

| Function | Semantics | Call frequency in sub_143C440 |
| --- | --- | --- |
| sub_B28F10 | setRewrittenOpcode(instr, byte) -- writes instr[14] | 1,759 |
| sub_B28F20 | setRewrittenModifier(instr, byte) -- writes instr[15] | 1,759 |
| sub_BA9CF0 | setOperandMapping(instr, slot, val) -- writes instr[72+4*slot] | 993 |
| sub_BA9C30 | markRewrittenSimple(instr) -- sets bit 0 of instr[140] | 1,222 |
| sub_BA9CB0 | markRewrittenComplex(instr) -- priority-aware flag update | 361 |

The ratio of markRewrittenSimple (1,222 call sites) to markRewrittenComplex (361) suggests that about 77% of rewrite actions are straightforward replacements, while about 23% involve priority negotiation with competing rewrites. These are static call-site counts, not runtime frequencies.

Call Frequency in sub_169B190 (Generic Dispatcher)

| Callee | Count | Role |
| --- | --- | --- |
| sub_B28F10 (setRewrittenOpcode) | 2,142 | Write new opcode byte |
| sub_B28F20 (setRewrittenModifier) | 2,142 | Write new modifier byte |
| sub_15F59B0 (getOperandValue) | 1,736 | Extract register number |
| sub_10AE5C0 (queryModifier) | 1,303 | Read instruction property |
| sub_B28F30 (getOperand) | 1,281 | Get operand pointer |
| sub_BA9C30 (markRewrittenSimple) | 1,261 | Simple rewrite commit |
| sub_BA9CF0 (setOperandMapping) | 855 | Map operand slots |
| sub_BA9CB0 (markRewrittenComplex) | 589 | Priority-aware commit |

Relationship to Instruction Encoding

Each dispatch function's address range is adjacent to a zone of SASS instruction encoders that consume the rewritten instructions:

  • sub_143C440 (SM120) sits before 123 SM120 encoders at 0x14771E0--0x14A3C80 (180 KB), covering 82 unique SASS opcodes with up to 42 encoding variants per opcode.
  • sub_169B190 (generic) sits before 100 encoding table entries at 0x16DF750--0x16FFFF0 and 36 template expanders at 0x1700000--0x1722D60.
  • sub_198BCD0 (post-schedule) operates on already-scheduled instructions, performing strength reduction and idiom recognition on the final instruction stream.

The encoders are called via vtable dispatch, not directly from the peephole functions. Each encoder packs a 128-bit SASS instruction word using sub_7B9B80(state, bit_offset, bit_width, value) for bit-field insertion.
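A sketch of the bit-field insertion that sub_7B9B80(state, bit_offset, bit_width, value) performs, under assumptions: the state is modeled as a bare 128-bit word split into two 64-bit halves, bit 0 is the least significant bit of the first half, and existing bits in the field are overwritten. The layout of the real state object is not described on this page, and the helper name insert_bits is hypothetical.

```c
#include <stdint.h>

/* 128-bit instruction word: bits 0..63 in w[0], bits 64..127 in w[1]. */
typedef struct { uint64_t w[2]; } Sass128;

/* Inserts the low `width` bits of `value` at `bit_offset`.
   Requires 1 <= width <= 64 and bit_offset + width <= 128.
   Bit-at-a-time, so a field may straddle the 64-bit boundary. */
static void insert_bits(Sass128 *s, unsigned bit_offset, unsigned width, uint64_t value) {
    for (unsigned i = 0; i < width; i++) {
        unsigned pos = bit_offset + i;
        uint64_t mask = (uint64_t)1 << (pos & 63);
        if ((value >> i) & 1u)
            s->w[pos >> 6] |= mask;    /* set the bit */
        else
            s->w[pos >> 6] &= ~mask;   /* clear the bit */
    }
}
```

A production encoder would use shift-and-mask per half rather than a per-bit loop; the loop keeps the straddling case obvious.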

Function Map

| Address | Size | Identity | Confidence |
| --- | --- | --- | --- |
| sub_B12930 | 11 B | Entry trampoline for generic peephole | CERTAIN |
| sub_B12940 | 11 B | Entry trampoline for SM120 peephole | CERTAIN |
| sub_B12960 | 11 B | Entry trampoline for post-schedule peephole | CERTAIN |
| sub_169B190 | 280 KB | Generic peephole mega-dispatcher | HIGH |
| sub_143C440 | 233 KB | SM120 peephole mega-dispatcher | HIGH |
| sub_198BCD0 | 233 KB | Post-schedule peephole mega-dispatcher | HIGH |
| sub_10AE5C0 | 60 B | queryModifier(ctx, instr, slot) | HIGH |
| sub_B28F10 | small | setRewrittenOpcode(instr, byte) | HIGH |
| sub_B28F20 | small | setRewrittenModifier(instr, byte) | HIGH |
| sub_B28F30 | 12 B | getOperand(instr, idx) | CERTAIN |
| sub_B28F40 | 10 B | getTotalOperandSlots(instr) | CERTAIN |
| sub_B28F50 | 4 B | getExplicitOperandCount(instr) | CERTAIN |
| sub_BA9C30 | small | markRewrittenSimple(instr) | HIGH |
| sub_BA9CB0 | small | markRewrittenComplex(instr) | HIGH |
| sub_BA9CF0 | small | setOperandMapping(instr, slot, value) | HIGH |
| sub_13B9CC0 | small | getRegisterClass(field) | HIGH |
| sub_13B9CD0 | small | isRegister(byte) | HIGH |
| sub_13B9CE0 | small | isImmediate(byte) | HIGH |
| sub_13B9D00 | small | isRegisterOrImmediate(byte) | HIGH |
| sub_13B9D10 | small | isConstantBuffer(byte) | HIGH |
| sub_13B9D40 | small | isPredicate(byte) | HIGH |
| sub_13B9D50 | small | isUniformRegister(byte) | HIGH |
| sub_13B9DC0 | small | opcodeIdentity(uint) -- passthrough | CERTAIN |
| sub_1909030 | small | opcodePassthrough (post-schedule context) | HIGH |

Macro Instruction Expansion (sub_8127C0)

Separate from the three pattern-match-and-rewrite mega-dispatchers, ptxas contains a dedicated macro instruction expansion pass at sub_8127C0 (10,720 bytes). This pass resolves register-file constraints for composite instructions -- cases where source or destination operands span register files or where multi-word results need splitting into narrower instruction sequences.

It is called from the master lowering dispatcher sub_8380A0 and runs before instruction scheduling.

Two-phase algorithm

Phase 1 -- Operand scanning and constraint annotation. The pass iterates every instruction in the function's linked list (traversing via instr+8). For each instruction, it reads the opcode at instr+72 and dispatches through a 15-family if-else cascade. For each opcode, it calls sub_812550 (getOperandConstraint) on each source operand to determine register-file affinity:

| Return value | Meaning |
| --- | --- |
| 0 | Unconstrained |
| -2 | Constrained to register file B (e.g., even-aligned pair) |
| -3 | Constrained to register file A (e.g., odd-aligned pair) |
| -1 | Conflict / unresolvable |

The pass annotates register descriptor entries (indexed via ctx+88) at reg+76 (constraint code) and reg+80 (target width code), and builds a linked list of instructions requiring expansion (linked via instr+56). Registers consumed by expansion are marked dead (reg+64 = 5).

Phase 2 -- Instruction rewriting. If any instruction requires expansion, the pass iterates the worklist and performs actual rewrites: replacing composite instructions with equivalent sequences, inserting new instructions via the sub_930040 / sub_92FF10 / sub_92E720 emitters, and deleting originals via sub_9253C0. Register-file mapping uses two lookup tables: dword_21D5EE0[26] (for constraint -2) and dword_21D5F60[16] (for constraint -3).

Between phases, a cleanup loop removes worklist entries with conflicting constraints (both operands invalid), resetting reg+76 = -1.
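The inter-phase cleanup can be sketched as an in-place filter over a singly linked worklist. The types are hypothetical: Reg.constraint models reg+76, Work.next models the instr+56 link, and "both operands invalid" is read here as both source constraints equal to -1 (conflict), which is an interpretation rather than a confirmed detail.

```c
#include <stddef.h>

/* Constraint codes from the getOperandConstraint table. */
enum { UNCONSTRAINED = 0, CONFLICT = -1, FILE_B = -2, FILE_A = -3 };

typedef struct Reg { int constraint; } Reg;   /* models reg+76 */

typedef struct Work {
    struct Work *next;       /* models the instr+56 worklist link */
    Reg *dst;                /* register annotated in phase 1 (hypothetical field) */
    int src_constraint[2];   /* per-source constraint codes */
} Work;

/* Unlinks entries whose two sources are both CONFLICT and resets their
   register's constraint to -1. Returns the (possibly new) list head. */
static Work *cleanup_worklist(Work *head) {
    Work **link = &head;
    while (*link != NULL) {
        Work *w = *link;
        if (w->src_constraint[0] == CONFLICT && w->src_constraint[1] == CONFLICT) {
            w->dst->constraint = CONFLICT;   /* reg+76 = -1 */
            *link = w->next;                 /* drop from the worklist */
        } else {
            link = &w->next;                 /* keep; advance */
        }
    }
    return head;
}
```

Walking a pointer-to-link lets the same code remove the head and interior entries without a special case.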

Opcodes handled

| Opcode | Mnemonic | Expansion pattern |
| --- | --- | --- |
| 10 | SHF | Three-source constraint check; emits I2IP (36) + new SHF when sources span register files |
| 18 | FSETP | Predicate operand finalization when operand count == 6 and modifier bits match |
| 29 | PMTRIG | Last-operand extraction and finalization |
| 36 | I2IP | Destination register marking and two-source constraint checking |
| 60 | LEPC | Store/load legalization: validates flags, checks register file == 6, recursive chain validation via sub_812480 |
| 62, 78, 79 | BAR_INDEXED, RTT, BSYNC | Same legalization path as LEPC |
| 95, 96 | STS, LDG | Last-operand extraction for stores; two-source vector-width constraint checking for loads |
| 97 | STG | Source registration for expansion tracking |
| 130 | HSET2 | Validates single-def destination, recursive source constraint chains; inserts HSET2 rewrites or converts to opcode-201 stores |
| 137 | SM73_FIRST | Same path as HSET2 |
| 149 | UFLO | Two-source validation; marks destination with width code 20; vectorization combining |
| 151 | UIMAD | Shared three-source path with SHF |
| 190 | LDGDEPBAR | Shared last-operand path with PMTRIG |
| 201, 202 | QMMA_16816, QMMA_16832 | Full multi-operand legalization; inserts barrier instructions for QMMA |
| 283 | UVIADD | Penultimate operand extraction and type resolution |
| 290 | MOV (sm_104) | Same constraint path as SHF/UIMAD |
| bit 12 set | (arch-specific) | Last-operand extraction for architecture-extended instructions |

sub_812550 -- getOperandConstraint

The most-called helper in this pass (32 call sites), this 40-byte function reads the constraint code from the register descriptor for a given operand reference:

int getOperandConstraint(int64_t ctx, uint32_t *operand_ref) {
    int modifier_bits = operand_ref[1];
    int constraint = reg_array[*operand_ref & 0xFFFFFF].constraint;  // reg+76
    if ((modifier_bits & 0xFE000000) == 0)
        return constraint;      // no sub-register modifier => raw value
    // Apply modifier-aware transformations:
    //   constraint -2 + certain modifier combos => -3 or -1
    //   constraint -3 + modifier bit 0x3C000000 => -1; + sign bit => -2
    ...
}

sub_812480 -- validateOperandChain

Recursively walks use-def chains through HSET2 (130) and SM73_FIRST (137) instructions to verify that an entire operand chain is compatible with a target register file. Uses sub_A9BD00 to resolve the register file for a width code, then checks reg+76 and reg+80 agreement.
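A sketch of the recursive walk follows. All types are hypothetical, the register-file resolver (the role sub_A9BD00 plays) is passed in as a function pointer, and toy_resolver with its width-code mapping is invented purely for the test. The depth cap is an added safeguard against cyclic chains, not something the page attributes to sub_812480.

```c
#include <stdbool.h>
#include <stddef.h>

enum { OP_HSET2 = 130, OP_SM73_FIRST = 137 };

typedef struct Def Def;

/* Hypothetical register descriptor. */
typedef struct {
    int constraint;   /* models reg+76 */
    int width_code;   /* models reg+80 */
    const Def *def;   /* defining instruction, or NULL */
} Reg;

/* Hypothetical defining instruction with up to four sources. */
struct Def {
    int opcode;
    int nsrc;
    const Reg *src[4];
};

/* Maps a width code to a register-file constraint code. */
typedef int (*FileResolver)(int width_code);

static bool validate_chain(const Reg *r, int target, FileResolver resolve, int depth) {
    if (depth > 64)
        return false;                          /* safety cap (assumption) */
    if (r->constraint != target || resolve(r->width_code) != target)
        return false;                          /* reg+76 and reg+80 must agree */
    const Def *d = r->def;
    if (d != NULL && (d->opcode == OP_HSET2 || d->opcode == OP_SM73_FIRST)) {
        for (int i = 0; i < d->nsrc; i++)      /* recurse only through 130/137 */
            if (!validate_chain(d->src[i], target, resolve, depth + 1))
                return false;
    }
    return true;
}

/* Invented mapping for the test only: width code 20 -> -2, anything else -> -3. */
static int toy_resolver(int width_code) { return width_code == 20 ? -2 : -3; }
```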

Knob gate

Option 183 (target profile offset 13176) controls the expansion distance threshold. When enabled, a secondary value at profile+13184 sets the maximum distance between a register definition and its use before the constraint is considered violated. Default threshold: 7.
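A sketch of the knob gate, assuming a byte-sized enable flag at profile+13176, a 32-bit threshold at profile+13184, and that the default of 7 applies when the option is off. Those field widths, that reading of "default", and the strict distance > threshold comparison are all assumptions.

```c
#include <stdint.h>
#include <string.h>

enum {
    KNOB183_FLAG_OFF     = 13176,  /* assumed: nonzero byte = option enabled */
    KNOB183_VALUE_OFF    = 13184,  /* assumed: int32 maximum def-use distance */
    DEFAULT_MAX_DISTANCE = 7
};

/* Returns the effective maximum def-to-use distance. */
static int32_t max_def_use_distance(const uint8_t *profile) {
    if (profile[KNOB183_FLAG_OFF] != 0) {
        int32_t v;
        memcpy(&v, profile + KNOB183_VALUE_OFF, sizeof v);
        return v;
    }
    return DEFAULT_MAX_DISTANCE;
}

/* A def-use pair farther apart than the threshold is treated as a violation. */
static int distance_violates(const uint8_t *profile, int32_t distance) {
    return distance > max_def_use_distance(profile);
}
```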

Function map

| Address | Size | Identity | Confidence |
| --- | --- | --- | --- |
| sub_8127C0 | 10,720 B | ExpandMacroInstructions (main pass) | HIGH |
| sub_812550 | 40 B | getOperandConstraint | HIGH |
| sub_812480 | ~170 B | validateOperandChain | HIGH |
| sub_8125E0 | ~450 B | canExpandStoreChain | MEDIUM |
| sub_800470 | small | isLegalizable | MEDIUM |
| sub_800360 | small | resolveOperandType | MEDIUM |
| sub_800400 | small | finalizeOperand | MEDIUM |


Evidence Index

| Claim | Source |
| --- | --- |
| sub_143C440 structure, 1,087 matchers, 373-case switch | p1.20-sweep-0x13CF000-0x14A4000.txt lines 1--486 |
| SM120 encoder zone (123 functions, 180 KB) | p1.20 lines 269--329 |
| sub_169B190 structure, 762 matchers, 280 KB | p1.22 lines 1--460, p1.23 lines 1--588 |
| Generic operand discriminators (sub_15F59C0 family) | p1.22 lines 181--201 |
| Clone clusters in generic matchers | p1.23 lines 156--174 |
| Post-schedule discriminators (sub_1820170 family) | p1.25 lines 271--289 |
| sub_198BCD0 structure, 1,336 callees, 373-case switch | p1.26 lines 355--398 |
| Post-schedule 5,282-byte clone group | p1.26 lines 401--424 |
| Rewrite helper call frequencies | p1.20 lines 216--227, p1.23 lines 228--237 |
| Priority 36 as highest observed | p1.22 lines 316--327 |
| Instruction node layout | p1.20 lines 406--420, p1.22 lines 367--409 |