Expression & Constant Codegen

The central expression emitter sub_128D0F0 (56 KB, 1751 decompiled lines) is the single function responsible for translating every C/C++ expression in the EDG AST into LLVM IR. It is a large recursive two-level switch: the outer switch classifies the expression node kind (operation, literal, member access, call, etc.), and the inner switch dispatches across 40+ C operators to emit the corresponding LLVM IR instruction sequences. Every named temporary in the output (%arraydecay, %land.ext, %sub.ptr.div, %cond, etc.) originates from explicit SetValueName calls within this function, closely mirroring Clang's IRGen naming conventions.

Two companion subsystems handle specialized expression domains: bitfield codegen (sub_1282050 store, sub_1284570 load) lowers C bitfield accesses to shift/mask/or sequences, and constant expression codegen (sub_127D8B0, 1273 lines) produces llvm::Constant* values for compile-time evaluable expressions. Cast codegen (sub_128A450, 669 lines) maps every C cast category to the appropriate LLVM cast opcode.

Role | Address | Name | Notes
Master dispatcher | sub_128D0F0 | EmitExpr | 56 KB, address 0x128D0F0
Bitfield store | sub_1282050 | EmitBitfieldStore | 15 args, R-M-W sequence
Bitfield load | sub_1284570 | EmitBitfieldLoad | 12 args, extract sequence
Constant expressions | sub_127D8B0 | EmitConstExpr | 1273 lines, recursive
Cast/conversion | sub_128A450 | EmitCast | 669 lines, 11 LLVM opcodes
Bool conversion | sub_127FEC0 | EmitBoolExpr | expr to i1
Literal emission | sub_127F650 | EmitLiteral | numeric/string constants

Master Expression Dispatcher

Reconstructed signature

// sub_128D0F0
llvm::Value *EmitExpr(CodeGenState **ctx, EDGExprNode *expr,
                      llvm::Type *destTy, unsigned flags, unsigned flags2);

The ctx parameter is a pointer-to-pointer hierarchy:

Offset | Field
*ctx | IRBuilder state (current function, insert point)
ctx[1] | Debug info context: [0] = debug scope, [1] = current BB, [2] = insertion sentinel
ctx[2] | LLVM module/context handle

EDG expression node layout

Every expression node passed as expr has a fixed layout:

Offset | Size | Field
+0x00 | 8 | Type pointer (EDG type node)
+0x18 | 1 | Outer opcode (expression kind byte)
+0x19 | 1 | Flags byte
+0x24 | 12 | Source location info
+0x38 | 1 | Inner opcode (operator sub-kind, for kind = 1)
+0x48 | 8 | Child/operand pointer

Type nodes carry a tag at offset +140: 12 = typedef alias (follow +160 to unwrap), 1 = void. The typedef-stripping idiom appears 15+ times throughout the function:

// Type unwrapping — strips typedef aliases to canonical type
for (Type *t = expr->type; *(uint8_t*)(t + 140) == 12; t = *(Type**)(t + 160))
    ;

Outer switch — expression categories

The byte at expr+0x18 selects the top-level expression category:

Kind | Category | Handler
0x01 | Operation expression | Inner switch on expr+0x38 (40+ C operators)
0x02 | Literal constant | EmitLiteral (sub_127F650)
0x03 | Member/field access | EmitAddressOf + EmitLoadFromAddress
0x11 | Call expression | EmitCall (sub_1296570)
0x13 | Init expression | EmitInitExpr (sub_1281220)
0x14 | Declaration reference | EmitAddressOf + EmitLoadFromAddress
default | - | Fatal: "unsupported expression!"

Inner switch — complete opcode reference

When the outer kind is 0x01 (operation), the byte at expr+0x38 selects which C operator to emit. The complete dispatch table follows; any opcode not listed falls through to the default fatal diagnostic.

Opcode | C operator | Handler / delegate | LLVM pattern
0x00 | Constant subexpr | sub_72B0F0 (evaluate) + sub_1286D80 (load) | Constant materialization
0x03 | Compound special A | EmitCompoundAssign (sub_1287ED0) | Read-modify-write
0x05 | Dereference (*p) | Elide if child is &: IsAddressOfExpr (sub_127B420); otherwise recursive EmitExpr + EmitLoad (sub_128B370) | %val = load T, ptr %p
0x06 | Compound special B | EmitCompoundAssign (sub_1287ED0) | Read-modify-write
0x08 | Compound special C | EmitCompoundAssign (sub_1287ED0) | Read-modify-write
0x15 | Array decay | See Array decay | %arraydecay = getelementptr inbounds ...
0x19 | Parenthesized (x) | Tail-call optimization: a2 = child, restart loop | (no IR emitted)
0x1A | sizeof / alignof | EmitSizeofAlignof (sub_128FDE0) | Constant integer
0x1C | Bitwise NOT (~x) | sub_15FB630 (xor with -1) | %not = xor i32 %x, -1
0x1D | Logical NOT (!x) | Two-phase: EmitBoolExpr + zext | %lnot = icmp eq ..., 0 / %lnot.ext = zext i1 ... to i32
0x1E | Type-level const | ConstantFromType (sub_127D2C0) | Compile-time constant
0x1F | Type-level const | ConstantFromType (sub_127D2C0) | Compile-time constant
0x23 | Pre-increment ++x | EmitIncDec (sub_128C390): prefix=1, inc=1 | %inc = add ... / %ptrincdec = getelementptr ...
0x24 | Pre-decrement --x | EmitIncDec (sub_128C390): prefix=1, inc=0 | %dec = sub ... / %ptrincdec = getelementptr ...
0x25 | Post-increment x++ | EmitIncDec (sub_128C390): prefix=0, inc=1 | Returns old value; %inc = add ...
0x26 | Post-decrement x-- | EmitIncDec (sub_128C390): prefix=0, inc=0 | Returns old value; %dec = sub ...
0x27-0x2B | +, -, *, /, % | EmitBinaryArithCmp (sub_128F9F0) | add/sub/mul/sdiv/srem (or u/f variants)
0x32 | Comma (a, b) | Emit both sides; return RHS | (LHS discarded)
0x33 | Subscript a[i] | EmitSubscriptOp (sub_128B750): GEP + load | %arrayidx = getelementptr ... + load
0x34 | Pointer subtraction | See Pointer subtraction | %sub.ptr.div = sdiv exact ...
0x35-0x39 | ==, !=, <, >, <=, >= | EmitBinaryArithCmp (sub_128F9F0) | icmp eq/ne/slt/sgt/sle/sge (or u/f variants)
0x3A | << | EmitShiftOrBitwise (sub_128F580): triple (1, 32, 32) | shl
0x3B | >> | EmitShiftOrBitwise (sub_128F580): triple (14, 33, 33) | ashr (signed) / lshr (unsigned)
0x3C | & | EmitShiftOrBitwise (sub_128F580): triple (2, 38, 34) | and
0x3D | ^ | EmitShiftOrBitwise (sub_128F580): triple (4, 40, 36) | xor
0x3E | bitwise OR | EmitShiftOrBitwise (sub_128F580): triple (3, 39, 35) | or
0x3F | Rotate | EmitShiftOrBitwise (sub_128F580): triple (5, 41, 37) | llvm.fshl / llvm.fshr
0x41-0x46 | Type-level consts | ConstantFromType (sub_127D2C0) | Compile-time constant
0x49 | Member access . / -> | See Member access | getelementptr + load (or bitfield path)
0x4A | += | EmitCompoundAssignWrapper (sub_12901D0) + sub_1288F60 | Load + add + store
0x4B | -= | EmitCompoundAssignWrapper (sub_12901D0) + sub_1288370 | Load + sub + store
0x4C | *= | EmitCompoundAssignWrapper (sub_12901D0) + sub_1288770 | Load + mul + store
0x4D | /= | EmitCompoundAssignWrapper (sub_12901D0) + sub_1289D20 | Load + div + store
0x4E | %= | EmitCompoundAssignWrapper (sub_12901D0) + sub_1288DC0 | Load + rem + store
0x4F | &= | EmitCompoundAssignWrapper (sub_12901D0) + sub_1288B70 | Load + and + store
0x50 | OR-assign | EmitCompoundAssignWrapper (sub_12901D0) + sub_1289360 | Load + or + store
0x51 | <<= | EmitCompoundAssignWrapper (sub_12901D0) + sub_1288090 | Load + shl + store
0x52 | >>= | EmitCompoundAssignWrapper (sub_12901D0) + sub_1287F30 | Load + ashr/lshr + store
0x53 | ^= | EmitCompoundAssignWrapper (sub_12901D0) + sub_1288230 | Load + xor + store
0x54 | ,= (rare) | EmitCompoundAssignWrapper (sub_12901D0) + sub_128BE50 | Comma-compound
0x55 | []= (subscript compound) | EmitCompoundAssignWrapper (sub_12901D0) + sub_128B750 | GEP + R-M-W
0x56 | Bitfield assign | See Bitfield Codegen | R-M-W sequence
0x57 | Logical AND && | See Logical AND | land.rhs/land.end + PHI
0x58 | Logical OR | See Logical OR | lor.rhs/lor.end + PHI
0x59, 0x5A, 0x5D | Type-level consts | ConstantFromType (sub_127D2C0) | Compile-time constant
0x5B | Statement expression ({...}) | EmitStmtExpr (sub_127FF60); create empty BB if (*a1)[7] == 0 | Body emission
0x5C, 0x5E, 0x5F | Compound special | EmitCompoundAssign (sub_1287ED0) | Read-modify-write
0x67 | Ternary ?: | See Ternary operator | cond.true/cond.false/cond.end + PHI
0x68 | Type-level const | ConstantFromType (sub_127D2C0) | Compile-time constant
0x69 | Special const | EmitSpecialConst (sub_1281200) | Constant materialization
0x6F | Label address &&label | GCC extension: sub_12A4D00 (lookup) + sub_1285E30(builder, label, 1) | blockaddress(@fn, %label)
0x70 | Label value | sub_12A4D00 + sub_12812E0(builder, label, type) | Indirect goto target
0x71 | Computed goto goto *p | sub_12A4D00 + sub_1285E30(builder, label, 0) | indirectbr
0x72 | va_arg | sub_12A4D00 on va_list child + sub_1286000 | va_arg lowering
default | - | FatalDiag (sub_127B550) | "unsupported operation expression!"

Shift and bitwise triple encoding

The EmitShiftOrBitwise (sub_128F580) triple (signedOp, intOp, fpOp) encodes three things: signedOp controls signed-vs-unsigned selection for right shift (14 selects ashr for signed, lshr for unsigned), intOp is the LLVM integer opcode number, and fpOp is the floating-point variant (unused for shift/bitwise but present for uniformity).

Increment / decrement detail

EmitIncDec (sub_128C390, 16 KB) handles integer, floating-point, and pointer types. It reads the expression type to select the arithmetic operation:

  • Integer path: add/sub nsw i32 %x, 1 with name "inc" or "dec". For prefix variants, the incremented value is returned; for postfix, the original value is returned and the increment is stored.
  • Floating-point path: fadd/fsub float %x, 1.0 with the same return-value semantics.
  • Pointer path: getelementptr inbounds T, ptr %p, i64 1 (or i64 -1 for decrement) with name "ptrincdec". Element type comes from the pointed-to type.

All paths load the current value, compute the new value, store back, and return either old or new depending on prefix/postfix.
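The load-compute-store-return contract above can be modeled in a few lines of plain C. This is a sketch of the semantics only; EmitIncDecModel and its flag names are illustrative, not taken from the binary.

```c
#include <assert.h>

/* Model of the inc/dec contract: load the current value, store the
 * updated value, return old or new depending on the prefix flag. */
static int EmitIncDecModel(int *slot, int is_prefix, int is_increment) {
    int old_val = *slot;                              /* load            */
    int new_val = old_val + (is_increment ? 1 : -1);  /* add/sub ..., 1  */
    *slot = new_val;                                  /* store back      */
    return is_prefix ? new_val : old_val;             /* prefix -> new,
                                                         postfix -> old  */
}
```

With x = 5, the prefix form returns 6 and leaves 6 in the slot, while the postfix form returns the pre-update value.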

Compound assignment wrapper mechanics

EmitCompoundAssignWrapper (sub_12901D0) implements the common load-compute-store pattern for all compound assignment operators (+=, -=, etc.):

// sub_12901D0 pseudocode
Value *EmitCompoundAssignWrapper(ctx, expr, impl_fn, flags) {
    Value *addr = EmitAddressOf(ctx, expr->lhs);     // sub_1286D80
    Value *old_val = EmitLoadFromAddress(ctx, addr);  // sub_1287CD0
    Value *rhs_val = EmitExpr(ctx, expr->rhs);        // sub_128D0F0 (recursive)
    Value *new_val = impl_fn(ctx, old_val, rhs_val);  // per-operator function
    EmitStore(ctx, new_val, addr);                     // store back
    return new_val;
}

Each impl_fn is a small function (typically 200-400 lines) that handles integer/float type dispatch and signedness. For example, sub_1288F60 (AddAssign) selects between add, fadd, and pointer-GEP addition.
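The wrapper/impl_fn split can be sketched in C with a function pointer standing in for the per-operator implementation. All names here are illustrative models of the pattern, not symbols from the binary.

```c
#include <assert.h>

/* Per-operator implementations: one small function per compound operator. */
typedef int (*impl_fn_t)(int old_val, int rhs_val);

static int add_impl(int a, int b) { return a + b; }   /* models the += impl  */
static int shl_impl(int a, int b) { return a << b; }  /* models the <<= impl */

/* The wrapper owns the load/store; the impl_fn owns the arithmetic. */
static int CompoundAssignModel(int *lhs_addr, int rhs_val, impl_fn_t impl) {
    int old_val = *lhs_addr;               /* EmitLoadFromAddress */
    int new_val = impl(old_val, rhs_val);  /* per-operator compute */
    *lhs_addr = new_val;                   /* EmitStore            */
    return new_val;                        /* compound assignment yields new value */
}
```

This factoring is why each impl_fn stays small: type dispatch and signedness live in the impl, while address computation and the store are written once.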

Member access multi-path handler

Opcode 0x49 handles struct field access (. and ->) through a multi-path dispatcher:

  1. Simple scalar field (field count == 1): Computes field address via EmitAddressOf (sub_1286D80), checks the volatile bit (v349 & 1), copies 12 DWORDs of field descriptor into the local frame, then loads via EmitLoadFromAddress (sub_1287CD0).

  2. Bitfield field: If the field descriptor indicates a bitfield, routes to EmitBitfieldAccess (sub_1282050) which emits the shift/mask extraction sequence.

  3. Nested/union access (field count > 1): Calls ComputeCompositeMemberAddr (sub_1289860) for multi-level GEP computation, then EmitComplexMemberLoad (sub_12843D0).

  4. Write-only context: If the assignment bit (a2+25, bit 2) is set, returns null -- the caller only needs the address, not the loaded value.

Statement expression, label address, and va_arg

Statement expression (0x5B): Emits the compound statement body via EmitStmtExpr (sub_127FF60). If no return basic block exists yet ((*a1)[7] == 0), creates an anonymous empty BB via CreateBasicBlock + SetInsertPoint to serve as the fall-through target. The value of the last expression in the block is the statement expression's result.

Label address (0x6F): Implements the GCC &&label extension. Looks up the label via LookupLabel (sub_12A4D00), then creates a blockaddress(@current_fn, %label) constant via sub_1285E30(builder, label, 1). The second argument 1 distinguishes "take address" from "goto to".

Computed goto (0x71): The goto *ptr extension. Same LookupLabel call, but sub_1285E30(builder, label, 0) with flag 0 emits an indirectbr instruction targeting the resolved label.

va_arg (0x72): Extracts the va_list child node at +72, its sub-child at +16, resolves both via sub_12A4D00, then calls EmitVaArg (sub_1286000) which lowers to a va_arg LLVM instruction with the appropriate type.

Constant vs. instruction dispatch

Throughout all operator emission, a consistent pattern selects between constant folding and IR instruction creation. The byte at Value+16 encodes the LLVM Value subclass kind: values <= 0x10 are constants (ConstantInt, ConstantFP, etc.) and values > 0x10 are instructions. This check appears 20+ times throughout the function, always with the same structure:

// Constant-fold or emit IR? Decision pattern (appears 20+ times)
if (*(uint8_t*)(value + 16) > 0x10) {
    // Runtime value -- create a real IR instruction via the builder
    result = CreateCast(opcode, value, destTy, &out, 0);    // sub_15FDBD0 (cast case)
    // ...or, for a binary operator:
    // result = CreateBinOp(opcode, lhs, rhs, &out, 0);     // sub_15FB440
} else {
    // Compile-time constant -- fold at the llvm::ConstantExpr level
    result = ConstantExprCast(opcode, value, destTy, 0);    // sub_15A46C0 (cast case)
    // result = ConstantFoldBinOp(lhs, rhs, 0, 0);          // sub_15A2B60
}

The dispatch table for the constant-fold vs IR-instruction paths:

Operation | IR path (Value > 0x10) | Constant path (Value <= 0x10)
Binary op | CreateBinOp (sub_15FB440) | ConstantFoldBinOp (sub_15A2B60)
Unary NOT | CreateUnaryOp (sub_15FB630) | ConstantFoldUnary (sub_15A2B00)
Cast | CreateCast (sub_15FDBD0) | ConstantExprCast (sub_15A46C0)
Int compare | sub_15FEC10(op=51, pred) | sub_15A37B0(pred, lhs, rhs)
Float compare | sub_15FEC10(op=52, pred) | sub_15A37B0(pred, lhs, rhs)
Sub (constant) | CreateBinOp(13=Sub) | ConstantFoldSub (sub_15A2B60)
SDiv exact | CreateBinOp(18=SDiv) + SetExactFlag | ConstantFoldSDiv (sub_15A2C90)

When the constant path is taken, no LLVM instruction is created and no BB insertion occurs -- the result is a pure llvm::Constant* that can be used directly. This is critical for expressions like sizeof(int) + 4 where no runtime code should be emitted.
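The two-path decision can be modeled with a tagged value: kinds at or below 0x10 fold immediately, anything above stands for a runtime value that forces an instruction. The kind constants, the Val struct, and the instruction counter here are illustrative, not from the binary.

```c
#include <assert.h>

enum { KIND_CONST = 0x08, KIND_INSTR = 0x20 };  /* illustrative kind bytes */

typedef struct { unsigned char kind; long v; } Val;

static int g_instrs_emitted;  /* stands in for IRBuilder insertion */

/* Fold when both operands are constants (kind <= 0x10); otherwise
 * "emit" an instruction. A model of the decision, not the binary. */
static Val emit_add(Val a, Val b) {
    if (a.kind <= 0x10 && b.kind <= 0x10)
        return (Val){ KIND_CONST, a.v + b.v };  /* pure constant, no BB insertion */
    g_instrs_emitted++;                          /* CreateBinOp path */
    return (Val){ KIND_INSTR, a.v + b.v };
}
```

Adding two constants produces a folded constant and emits nothing, which is exactly why sizeof(int) + 4 generates no runtime code; mixing in a runtime operand emits one instruction.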

Key Expression Patterns

Array decay

Opcode 0x15. Converts an array lvalue to a pointer to its first element.

When IsArrayType (sub_8D23B0) confirms the source is an array type, the emitter creates an inbounds GEP with two zero indices. The GEP instruction is constructed manually: allocate 72 bytes for 3 operands via AllocateInstruction, compute the result element type, propagate address space qualifiers from the source, then fill operands (base, i64 0, i64 0) and mark inbounds:

%arraydecay = getelementptr inbounds [N x T], ptr %arr, i64 0, i64 0

If the source is already a pointer type (not an array), the function either passes through directly or inserts a ptrtoint / zext if the types differ.

Pointer subtraction

Opcode 0x34. The classic 5-step Clang pattern for (p1 - p2):

%sub.ptr.lhs.cast = ptrtoint ptr %p1 to i64
%sub.ptr.rhs.cast = ptrtoint ptr %p2 to i64
%sub.ptr.sub      = sub i64 %sub.ptr.lhs.cast, %sub.ptr.rhs.cast
%sub.ptr.div      = sdiv exact i64 %sub.ptr.sub, 4    ; element_size=4 for int*

Step 5 (the sdiv exact) is skipped entirely when the element size is 1 (i.e., char* arithmetic), since division by 1 is a no-op. The element size comes from the pointed-to type at offset +128. The exact flag on sdiv tells the optimizer that the division is known to produce no remainder -- a critical optimization hint.
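The same 5-step arithmetic can be reproduced with integer casts in C, which makes the division-by-element-size step concrete. ptr_sub_model is an illustrative name; the result matches native pointer subtraction.

```c
#include <assert.h>
#include <stdint.h>

/* Reproduce the lowering: cast both pointers to integers, subtract,
 * divide the byte difference exactly by the element size. */
static long ptr_sub_model(const int *p1, const int *p2) {
    uintptr_t lhs_cast = (uintptr_t)p1;            /* sub.ptr.lhs.cast */
    uintptr_t rhs_cast = (uintptr_t)p2;            /* sub.ptr.rhs.cast */
    long byte_diff = (long)(lhs_cast - rhs_cast);  /* sub.ptr.sub      */
    return byte_diff / (long)sizeof(int);          /* sub.ptr.div: sdiv exact */
}
```

For char* the final division is by 1, which is why the emitter skips that step entirely.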

Logical AND (short-circuit)

Opcode 0x57. Creates two basic blocks and a PHI node for C's short-circuit && evaluation:

entry:
    %lhs = icmp ne i32 %a, 0
    br i1 %lhs, label %land.rhs, label %land.end

land.rhs:
    %rhs = icmp ne i32 %b, 0
    br label %land.end

land.end:
    %0 = phi i1 [ false, %entry ], [ %rhs, %land.rhs ]
    %land.ext = zext i1 %0 to i32

The construction sequence:

  1. Create blocks land.end and land.rhs via CreateBasicBlock (sub_12A4D50).
  2. Emit LHS as boolean via EmitBoolExpr (sub_127FEC0).
  3. Conditional branch: br i1 %lhs, label %land.rhs, label %land.end.
  4. Switch insertion point to %land.rhs.
  5. Emit RHS as boolean.
  6. Unconditional branch to %land.end.
  7. Switch to %land.end, construct PHI with 2 incoming edges.
  8. Zero-extend the i1 PHI result to the expression's declared type (i32 typically) with name land.ext.

The PHI node is allocated as 64 bytes via AllocatePHI (sub_1648B60), initialized with opcode 53 (PHI), and given a capacity of 2. Incoming values are stored in a compact layout: [val0, val1, ..., bb0, bb1, ...] where each value slot occupies 24 bytes (value pointer + use-list doubly-linked-list pointers), and basic block pointers form a parallel array after all value slots.

Logical OR (short-circuit)

Opcode 0x58. Identical structure to logical AND but with inverted branch sense: the TRUE outcome of the LHS branches to lor.end (short-circuits to true), and FALSE falls through to evaluate the RHS:

entry:
    %lhs = icmp ne i32 %a, 0
    br i1 %lhs, label %lor.end, label %lor.rhs

lor.rhs:
    %rhs = icmp ne i32 %b, 0
    br label %lor.end

lor.end:
    %0 = phi i1 [ true, %entry ], [ %rhs, %lor.rhs ]
    %lor.ext = zext i1 %0 to i32

Internally, the AND and OR paths share a common tail (merging at a single code point with a variable holding either "lor.ext" or "land.ext").

Ternary / conditional operator

Opcode 0x67. Constructs a full three-block diamond with PHI merge for a ? b : c:

entry:
    %cond.bool = icmp ne i32 %test, 0
    br i1 %cond.bool, label %cond.true, label %cond.false

cond.true:
    %v1 = <emit true expr>
    br label %cond.end

cond.false:
    %v2 = <emit false expr>
    br label %cond.end

cond.end:
    %cond = phi i32 [ %v1, %cond.true ], [ %v2, %cond.false ]

The function creates three blocks (cond.true, cond.false, cond.end), records which basic block each arm finishes in (since the true/false expression emission might create additional blocks), and builds the PHI from those recorded blocks. When one arm is void, the PHI is omitted and whichever arm produced a value is returned directly.

Logical NOT and bitwise NOT

Logical NOT (opcode 0x1D) is a two-phase emit:

%lnot     = icmp eq i32 %x, 0         ; Phase 1: convert to bool
%lnot.ext = zext i1 %lnot to i32      ; Phase 2: extend back to declared type

Phase 1 calls EmitBoolExpr which produces the icmp eq ... 0 comparison. Phase 2 zero-extends the i1 back to the expression's target type. If the value is already a compile-time constant, the constant folder handles it directly.

Bitwise NOT (opcode 0x1C) produces xor with all-ones:

%not = xor i32 %x, -1

Created via CreateUnaryOp (sub_15FB630) which synthesizes xor with -1 (all bits set). Optional zext follows if the result needs widening.

Dereference with address-of elision

Opcode 0x05. Before emitting a load for unary *, the function checks if the child is an address-of expression via IsAddressOfExpr (sub_127B420). If so, the dereference and address-of cancel out -- no IR is emitted, only a debug annotation is attached. This handles the common pattern *&x becoming just x.

Bitfield Codegen

Bitfield loads and stores are lowered to shift/mask/or sequences by two dedicated functions. A path selector CanUseFastBitfieldPath (sub_127F680) determines whether the bitfield fits within a single naturally-aligned container element (fast path) or must be processed byte-by-byte (general path).

EDG bitfield descriptor

The bitfield metadata object carries:

Offset | Type | Field
+120 | qword | Container type node
+128 | qword | Byte offset within struct
+136 | byte | Bit offset within containing byte
+137 | byte | Bit width of the field
+140 | byte | Type tag (12 = array wrapper, walk chain)
+144 | byte | Flags (bit 3 = signed bitfield)
+160 | qword | Next/inner type pointer

Fast path (single-container load)

When the bitfield plus its bit range fits within one container element, the fast path loads the entire container and extracts the field with a single shift and mask:

// Example: struct { unsigned a:3; unsigned b:5; } s;
// s.b: byte_offset=0, bit_offset=3, bit_width=5, container=i8

Load s.b (fast path):

%container  = load i8, ptr %s
%shifted    = lshr i8 %container, 3            ; "highclear" -- position field at bit 0
%result     = and i8 %shifted, 31              ; "zeroext" -- mask to 5 bits (0x1F)

The shift amount is computed as 8 * elem_size - bit_width - bit_offset - 8 * (byte_offset % elem_size). When this evaluates to zero, the lshr is constant-folded away.
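The lshr + and pair maps directly onto C shift/mask arithmetic. This sketch models the fast-path load for the example field s.b (bit_offset = 3, bit_width = 5, i8 container); bf_load_fast is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

/* Fast-path extraction: shift the field to bit 0, then mask to width. */
static uint8_t bf_load_fast(uint8_t container, unsigned bit_offset,
                            unsigned bit_width) {
    uint8_t shifted = (uint8_t)(container >> bit_offset);   /* "highclear" */
    uint8_t mask = (uint8_t)((1u << bit_width) - 1);        /* 0x1F for width 5 */
    return shifted & mask;                                   /* "zeroext"   */
}
```

With a = 5 and b = 21 packed as 5 | (21 << 3) = 173, extracting b recovers 21.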

For signed bitfields, the zero-extend is replaced with an arithmetic sign extension: the container is shifted left so the field's top bit lands in the container's sign bit, then arithmetic-shifted right, which both positions the field at bit 0 and propagates the sign:

%shifted = shl i8 %container, 0               ; field already occupies the top 5 bits
%signext = ashr i8 %shifted, 3                ; "signext" -- propagates sign bit

Store s.b = val (fast path read-modify-write):

%container     = load i8, ptr %s
%bf.value      = and i8 %val, 31              ; mask to 5 bits
%cleared       = and i8 %container, 7         ; "bf.prev.cleared" -- clear bits [3:7]
%positioned    = shl i8 %bf.value, 3          ; "bf.newval.positioned"
%merged        = or  i8 %cleared, %positioned ; "bf.finalcontainerval"
store i8 %merged, ptr %s

The clear mask is ~(((1 << bit_width) - 1) << bit_position). For containers wider than 64 bits, both the clear mask and the value mask are computed via APInt operations (sub_16A5260 to set bit range, sub_16A8F40 to invert).
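The clear/position/merge store sequence is plain shift/mask arithmetic and can be checked in C. This sketch mirrors the fast-path read-modify-write for the same example field; bf_store_fast is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

/* Fast-path R-M-W store: mask the new value to width, clear the old
 * field bits in the container, position the new bits, merge. */
static uint8_t bf_store_fast(uint8_t container, uint8_t val,
                             unsigned bit_offset, unsigned bit_width) {
    uint8_t width_mask = (uint8_t)((1u << bit_width) - 1);
    uint8_t bf_value = val & width_mask;                  /* "bf.value"            */
    uint8_t cleared = container &
        (uint8_t)~(width_mask << bit_offset);             /* "bf.prev.cleared"     */
    uint8_t positioned = (uint8_t)(bf_value << bit_offset); /* "bf.newval.positioned" */
    return cleared | positioned;                          /* "bf.finalcontainerval" */
}
```

Storing b = 9 into the container 173 from the running example yields 77: the low field a (5) survives and the stored field reads back as 9.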

Byte-by-byte path (spanning load)

When the bitfield spans multiple container elements, it is processed one byte at a time. Each iteration loads a byte, extracts the relevant bits, zero-extends to the accumulator width, shifts into position, and ORs into the running accumulator.

For example, a 20-bit field starting at byte 0, bit 0:

; Byte 0: bits [0:7]
%bf.base.i8ptr = bitcast ptr %s to ptr         ; pointer cast
%byte0.ptr     = getelementptr i8, ptr %bf.base.i8ptr, i64 0
%bf.curbyte.0  = load i8, ptr %byte0.ptr
%bf.byte_zext.0 = zext i8 %bf.curbyte.0 to i32
; accumulator = %bf.byte_zext.0 (shift=0 for first byte)

; Byte 1: bits [8:15]
%byte1.ptr     = getelementptr i8, ptr %bf.base.i8ptr, i64 1
%bf.curbyte.1  = load i8, ptr %byte1.ptr
%bf.byte_zext.1 = zext i8 %bf.curbyte.1 to i32
%bf.position.1  = shl i32 %bf.byte_zext.1, 8   ; "bf.position"
%bf.merge.1     = or  i32 %bf.byte_zext.0, %bf.position.1  ; "bf.merge"

; Byte 2: only 4 bits remain (20 - 16 = 4)
%byte2.ptr         = getelementptr i8, ptr %bf.base.i8ptr, i64 2
%bf.curbyte.2      = load i8, ptr %byte2.ptr
%bf.end.highclear  = lshr i8 %bf.curbyte.2, 4  ; "bf.end.highclear" -- clear top 4 bits
%bf.byte_zext.2    = zext i8 %bf.end.highclear to i32
%bf.position.2     = shl i32 %bf.byte_zext.2, 16
%bf.merge.2        = or  i32 %bf.merge.1, %bf.position.2

The byte-by-byte store path mirrors this in reverse: for boundary bytes (first and last), it loads the existing byte, masks out the target bits with AND, positions the new bits with SHL, and merges with OR. Middle bytes that are entirely overwritten skip the read-modify-write and store directly.
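The accumulation the spanning-load sequence computes can be modeled compactly in C. This sketch assembles the 20-bit field from the example (byte 0, bit 0), assuming the last byte contributes only its low nibble; it models the arithmetic, not the exact IR, and bf_load_spanning20 is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

/* Spanning load model: mask each byte to its contribution, widen,
 * shift into position, OR into the accumulator. */
static uint32_t bf_load_spanning20(const uint8_t *base) {
    uint32_t acc = base[0];                    /* bits [0:7], shift = 0        */
    acc |= (uint32_t)base[1] << 8;             /* "bf.position" + "bf.merge"   */
    acc |= (uint32_t)(base[2] & 0x0F) << 16;   /* last byte: only 4 bits remain */
    return acc;
}
```

With bytes {0xDE, 0xBC, 0xFA} (the high nibble of the last byte belonging to a neighboring field), the field value is 0xABCDE.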

The bf.* naming vocabulary

All bitfield IR values use a consistent naming scheme:

Name | Path | Meaning
bf.base.i8ptr | Both | Pointer cast to i8*
bf.curbyte | Load | Current byte in iteration loop
bf.end.highclear | Load | lshr to clear unused high bits in last byte
bf.byte_zext | Load | zext of byte to accumulator width
bf.position | Both | shl to position byte/value within accumulator/container
bf.merge | Load | or to merge byte into accumulator
bf.highclear | Load | lshr before sign extension
bf.finalval | Load | ashr for sign extension
highclear | Load (fast) | Fast-path lshr to clear high bits
zeroext | Load (fast) | Fast-path zero-extend result
signext | Load (fast) | Fast-path ashr sign extension
bf.value | Store | and(input, width_mask) -- isolated field bits
bf.prev.cleared | Store (fast) | Container with old field bits cleared
bf.newval.positioned | Store (fast) | New value shifted to field position
bf.finalcontainerval | Store (fast) | or(cleared, positioned) -- final container
bf.reload.val | Store | Truncated value for compound assignment reload
bf.reload.sext | Store | Sign-extended reload via shift pair
bassign.tmp | Store | Alloca for temporary during bitfield assignment

Wide bitfield support (> 64 bits)

Both load and store functions handle bitfields wider than 64 bits through APInt operations. The threshold check width > 0x40 (64) appears throughout: values <= 64 bits use inline uint64_t masks computed as 0xFFFFFFFFFFFFFFFF >> (64 - width), while wider values allocate heap-backed APInt word arrays. Every code path carefully frees heap APInts after use. This supports __int128 bitfields in CUDA.
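The inline mask formula for the narrow case is worth pinning down, since the 64-bit edge works only because the shift amount becomes zero. bf_width_mask is an illustrative name for the computation described above; width must be in [1, 64].

```c
#include <assert.h>
#include <stdint.h>

/* Inline mask for widths <= 64 bits, as described:
 * 0xFFFFFFFFFFFFFFFF >> (64 - width). Valid for width in [1, 64];
 * width == 64 shifts by 0 and yields the all-ones mask. */
static uint64_t bf_width_mask(unsigned width) {
    return 0xFFFFFFFFFFFFFFFFull >> (64u - width);
}
```

Anything wider than 64 bits cannot use this trick (the shift amount would be negative), which is exactly the width > 0x40 threshold that diverts to heap-backed APInt word arrays.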

Volatile and alignment

Volatile detection uses a global flag at unk_4D0463C. When set, sub_126A420 queries whether the GEP target address is in volatile memory, propagating the volatile bit to load/store instructions. The alignment parameter for bitfield container loads must be 1; the function asserts on other values with "error generating code for loading from bitfield!".

Duplicate implementations

Two additional copies exist at sub_923780 (store) and sub_925930 (load) -- identical algorithms with the same string names, same opcodes, same control flow. These likely correspond to different template instantiations or address-space variants in the original NVIDIA source. The 0x92xxxx copies are in the main NVVM frontend region while the 0x128xxxx copies are in the codegen helper region.

Constant Expression Codegen

EmitConstExpr (sub_127D8B0) converts EDG constant expression AST nodes into llvm::Constant* values. It is recursive: aggregate initializers call it for each element.

// sub_127D8B0
llvm::Constant *EmitConstExpr(CodeGenState *ctx, EDGConstExprNode *expr,
                               llvm::Type *arrayElemTyOverride);

The constant kind byte at expr[10].byte[13] is the primary dispatch:

Kind | Category | Output type
1 | Integer constant | ConstantInt
2 | String literal | ConstantDataArray
3 | Floating-point constant | ConstantFP
6 | Address-of constant | GlobalVariable*, Function*, or string global
0xA | Aggregate initializer | ConstantStruct, ConstantArray, or ConstantAggregateZero
0xE | Null/empty | Returns 0 (no constant)
default | - | Fatal: "unsupported constant variant!"

Integer constants

For normal integers (up to 64 bits), the value is extracted via edg::GetSignedIntValue or edg::GetUnsignedIntValue depending on signedness, masked to the actual bit width, and passed to ConstantInt::get(context, APInt).

For __int128 (type size == 16 bytes), the EDG IL stores the value as a decimal string. The path is: edg::GetIntConstAsString(expr) returns the decimal text, then APInt::fromString(128, str, len, radix=10) parses it into a 128-bit APInt. This string-based transfer suggests the EDG IL uses text encoding for portability of wide integers.

APInt memory management follows the standard pattern: values > 64 bits use heap-allocated word arrays (checked via width > 0x40). Every path frees heap APInts after consumption.

When the target LLVM type is a pointer (tag 15), the integer constant is first created, then ConstantExpr::getIntToPtr converts it.

String literals

The character width is determined from a lookup table qword_4F06B40 indexed by the encoding enum at expr[10].byte[8] & 7:

Index | Width | C type
0 | 1 byte | char / UTF-8
1 | platform | wchar_t
2 | 1 byte | char8_t
3 | from global | platform-dependent
4 | from global | platform-dependent

The raw byte buffer is built by copying byte_count bytes from the EDG node, reading each character through edg::ReadIntFromBuffer(src, width) -- an endian-aware read function (the EDG IL may store string data in a platform-independent byte order). The buffer is then passed to ConstantDataArray::getRaw(data, byte_count) to create the LLVM constant.

For each character width, the LLVM element type is selected: i8 for 1-byte, i16 for 2-byte, i32 for 4-byte, i64 for 8-byte. Empty strings create zero-element arrays. If the array type override (the arrayElemTyOverride parameter) specifies a larger size than the literal, the remaining bytes are zero-filled.

Floating-point constants

Raw bit patterns are extracted via edg::ExtractFloatBits(kind, data_ptr), then reinterpreted into native float or double values:

EDG kind | C type | Conversion path
2 | float | BitsToFloat -> APFloat(float) -> IEEEsingle semantics
4 | double | BitsToDouble -> APFloat(double) -> IEEEdouble semantics
6 | long double | Truncated to double (with warning 0xE51)
7 | __float80 | Truncated to double (with warning 0xE51)
8, 13 | __float128 | Truncated to double (with warning 0xE51)

All extended-precision types (long double, __float80, __float128) are silently lowered through the double path. NVPTX has no hardware support for 80-bit or 128-bit floats, so CICC truncates them to 64-bit IEEE 754. When the compilation context has the appropriate flag (bit 4 at offset +198), a diagnostic warning is emitted identifying the specific type being truncated.

Address-of constants

Sub-dispatched by a byte at expr[11].byte[0]:

  • Byte 0 -- Variable/global reference: Calls GetOrCreateGlobalVariable (sub_1276020), returning a GlobalVariable* as a constant pointer. Debug info is optionally attached.
  • Byte 1 -- Function reference: Calls GetOrCreateFunction (sub_1277140). For static-linkage functions, resolves through LookupFunctionStaticVar.
  • Byte 2 -- String literal reference (&"..."): Validates the node kind is 2 (string), then calls CreateStringGlobalConstant (sub_126A1B0).

Post-processing applies a constant GEP offset if expr[12].qword[0] is nonzero, and performs pointer type cast if the produced type differs from the expected type. Same-address-space mismatches use ConstantExpr::getBitCast; cross-address-space mismatches use ConstantExpr::getAddrSpaceCast. Pointer-to-integer mismatches use ConstantExpr::getPtrToInt with address-space normalization to addrspace(0) first.

Aggregate initializers

The largest case (630+ lines). After stripping typedefs, dispatches on the canonical type tag at +140:

Tag | Type | Output
10 | Struct | ConstantStruct or ConstantAggregateZero
11 | Union | Anonymous {member_type, [N x i8]}
8 | Array | ConstantArray
12 | Typedef | Strip and re-dispatch
other | - | Fatal: "unsupported aggregate constant!"

Struct (tag 10): Walks the EDG field list and initializer list in parallel. The field chain is traversed via +112 pointers; the initializer list via +120 next pointers.

  • Padding/zero-width fields are skipped (flag byte at +146, bit 3).
  • For each non-bitfield field, GetFieldIndex (sub_1277B60) returns the LLVM struct element index. If gaps exist between the previous and current index, intermediate slots are filled with Constant::getNullValue (sub_15A06D0).
  • Each field's initializer is processed by recursive EmitConstExpr call.
  • Packed struct fields (flag at +145, bit 4) have their sub-elements extracted individually via ConstantExpr::extractvalue (sub_15A0A60).
  • Missing trailing fields are padded with null values.
  • If the struct has no fields and the initializer list is empty, returns ConstantAggregateZero::get (sub_1598F00) as a shortcut.
  • Final assembly: ConstantStruct::get (sub_159F090) with type compatibility check via Type::isLayoutIdentical (sub_1643C60). If packed, StructType::get(elts, n, true) (sub_15943F0).

Struct bitfield packing (post-processing)

When any bitfield field is detected during the main walk (flag bit 2, &4 at +144), the function re-enters a post-processing phase after the main field loop. This packs bitfield constant values byte-by-byte into the struct's byte array:

// Bitfield packing pseudocode — sub_127D8B0, case 0xA post-processing
StructLayout *layout = DataLayout::getStructLayout(structTy);  // sub_15A9930

for (each bitfield field where flag &4 at +144 && name at +8 is non-null) {
    uint32_t byte_offset = field->byte_offset;
    uint32_t elem_idx = StructLayout::getElementContainingOffset(layout, byte_offset);
                                                                // sub_15A8020
    // Validate the target byte is zero
    assert(elements[elem_idx] == ConstantInt::get(i8, 0),
           "unexpected error while initializing bitfield!");

    // Evaluate bitfield initializer
    Constant *val = EmitConstExpr(ctx, init_expr, 0);          // recursive
    assert(val != NULL, "bit-field constant must have a known value at compile time!");

    APInt bits = extractAPInt(val);  // at constant+24, width at constant+32
    uint8_t bit_width = field->bit_width;    // at +137
    if (bits.width > bit_width)
        bits = APInt::trunc(bits, bit_width);                  // sub_16A5A50

    // Pack into struct bytes, one byte at a time
    uint8_t bit_offset = field->bit_offset;  // at +136 (within first byte)
    while (remaining_bits > 0) {
        uint8_t available = (first_byte ? 8 - bit_offset : 8);
        uint8_t take = min(remaining_bits, available);

        APInt slice = bits;
        if (slice.width > take)
            slice = APInt::trunc(slice, take);                 // sub_16A5A50
        if (take < 8)
            slice = APInt::zext(slice, 8);                     // sub_16A5C50
        slice = slice << bit_offset;                           // shl
        existing_byte |= slice;                                // sub_16A89F0

        elements[byte_index] = ConstantInt::get(ctx, existing_byte);
        bits = bits >> take;                                   // sub_16A7DC0
        remaining_bits -= take;
        bit_offset = 0;       // subsequent bytes start at bit 0
        byte_index++;
    }
}

This implements the target ABI's bitfield packing model (the C standard leaves bitfield layout implementation-defined): bits are inserted starting at the field's bit_offset within its containing byte, potentially spanning multiple bytes. Values wider than 64 bits use heap-backed APInt word arrays.

Union (tag 11): Finds the initialized member via two paths:

  1. Designated initializer (kind 13): *(init+184) is the designated field, *(init+120) is the actual value expression.
  2. Implicit: Walk the field chain (type+160) looking for the first non-skip, non-bitfield field. Named bitfield members are explicitly rejected: "initialization of bit-field in union not supported!". If no field is found: "cannot find initialized union member!".

The member value is emitted recursively. Padding to the full union byte size is added as [N x i8] zeroinitializer. The result is an anonymous {member_type, [N x i8]} struct via ConstantStruct::getAnon (sub_159F090).

Array (tag 8): Resolves element type via GetArrayElementType (sub_8D4050), walks the initializer linked list via +120 next pointers, calls EmitConstExpr recursively for each element. Designated initializers (kind 11) are supported: *(node+176) gives the designated element index, *(node+184) gives the range count. Type mismatches are handled by sub_127D000 (resize constant to target type).

When the declared dimension exceeds the initializer count, remaining elements are filled with Constant::getNullValue. The result uses ConstantArray::get (sub_159DFD0) when all elements have the same LLVM type (the common case), or falls back to an anonymous struct via StructType::get + ConstantStruct::get for heterogeneous cases (which should not occur in well-formed C but is handled defensively).

Cast / Conversion Codegen

EmitCast (sub_128A450) handles every C-level cast category. The function first checks for early exits (skip flag, identity cast where source type equals destination type), then dispatches by source and destination type tags.

// sub_128A450
llvm::Value *EmitCast(CodeGenState **ctx, EDGCastNode *expr,
                      uint8_t is_unsigned, llvm::Type *destTy,
                      uint8_t is_unsigned2, char skip_flag,
                      DiagContext *diag);

Type classification

Type tags at *(type+8):

| Tag | Type |
|-----|------|
| 1-6 | Floating-point (1=half, 2=float, 3=double, 4=fp80, 5=fp128, 6=bf16) |
| 11 | Integer (bit-width encoded in upper bits) |
| 15 | Pointer |
| 16 | Vector/aggregate |

The test (tag - 1) > 5 is an unsigned range check meaning "NOT a float": it is false exactly for tags 1-6 (the float types), and tag 0 wraps around to UINT_MAX.

Tobool patterns

When the destination type is i1 (bool), the codegen produces comparison-against-zero:

Integer/float source (tags 1-6, 11):

%tobool = icmp ne i32 %val, 0          ; integer source
%tobool = fcmp une float %val, 0.0     ; float source

Float-to-bool uses fcmp une (unordered not-equal), which returns true for any non-zero value including NaN. Integer-to-bool uses icmp ne with a zero constant of matching type.

Pointer source (tag 15):

%tobool = icmp ne ptr %val, null

A shortcut exists: if the source expression is already a comparison result (opcode 61) and the source is already the bool type, the comparison result is returned directly without creating a new instruction.

Integer-to-integer (trunc / zext / sext)

The helper sub_15FE0A0 internally selects the operation based on relative widths:

  • dest_width < src_width -> trunc
  • dest_width > src_width AND unsigned -> zext
  • dest_width > src_width AND signed -> sext

All produce a value named "conv".

Pointer casts

Pointer-to-pointer: In LLVM opaque-pointer mode (which CICC v13 uses for modern SMs), same-address-space casts hit the identity return path and produce no IR. Cross-address-space casts use addrspacecast (opcode 47).

Pointer-to-integer: ptrtoint (opcode 45). Asserts that the destination is actually an integer type.

Integer-to-pointer: A two-step process. First, the integer is widened or narrowed to the pointer bit-width (32 or 64, obtained via sub_127B390). Then inttoptr (opcode 46) converts the properly-sized integer to a pointer:

%conv1 = zext i32 %val to i64          ; step 1: widen to pointer width
%conv  = inttoptr i64 %conv1 to ptr    ; step 2: int -> ptr

Float-to-integer and integer-to-float

Two paths exist for these conversions:

Standard path: Uses LLVM's native cast opcodes. Triggered when the global flag unk_4D04630 is set (relaxed rounding mode), or when the destination is 128-bit, or when the source is fp128:

| Direction | Signed opcode | Unsigned opcode |
|-----------|---------------|-----------------|
| int -> float | sitofp (39) | uitofp (40) |
| float -> int | fptosi (41) | fptoui (42) |

NVIDIA intrinsic path: For SM targets that require round-to-zero semantics on float-int conversions. Constructs an intrinsic function name dynamically and emits it as a plain function call:

// Name construction pseudocode
char buf[64];
if (src_is_double)  strcpy(buf, "__nv_double");
else                strcpy(buf, "__nv_float");

strcat(buf, is_unsigned ? "2u" : "2");

if (dest_bits == 64) strcat(buf, "ll_rz");
else                 strcat(buf, "int_rz");

Producing names like:

| Intrinsic | Conversion |
|-----------|------------|
| __nv_float2int_rz | f32 -> i32, signed, round-to-zero |
| __nv_float2uint_rz | f32 -> u32, unsigned, round-to-zero |
| __nv_double2ll_rz | f64 -> i64, signed, round-to-zero |
| __nv_double2ull_rz | f64 -> u64, unsigned, round-to-zero |
| __nv_float2ll_rz | f32 -> i64, signed, round-to-zero |

These are emitted as plain LLVM function calls (call i32 @__nv_float2int_rz(float %val)), not as LLVM intrinsics. The NVIDIA PTX backend later pattern-matches these __nv_ calls to cvt.rz.* PTX instructions. The intrinsic call is created by sub_128A3C0, which builds a function type, looks up or creates the declaration in the module, and emits a CallInst with one argument.

If the source integer is 32-bit but the target needs 64-bit conversion, the function first converts i32 to i64, then recursively calls itself to convert i64 to the target float type.

Float-to-float (fptrunc / fpext)

The source and destination type tags are compared directly. If the destination tag is larger (wider float), opcode 44 (fpext) is used. If smaller, opcode 43 (fptrunc).

%conv = fpext float %val to double       ; float -> double
%conv = fptrunc double %val to float     ; double -> float

Cast control flow summary

EmitCast(ctx, expr, is_unsigned, destTy, is_unsigned2, skip, diag)
  |
  +-- skip_flag set          --> return 0
  +-- destTy == BoolType?
  |     +-- src is float       --> fcmp une %val, 0.0    "tobool"
  |     +-- src is ptr/int     --> icmp ne %val, null/0  "tobool"
  +-- srcTy == destTy          --> return expr (identity)
  +-- ptr -> ptr               --> bitcast(47)           "conv"
  +-- ptr -> int               --> ptrtoint(45)          "conv"
  +-- int -> ptr               --> resize + inttoptr(46) "conv"
  +-- int -> int               --> trunc/zext/sext       "conv"
  +-- int -> float
  |     +-- standard           --> sitofp(39)/uitofp(40) "conv"
  |     +-- nvidia             --> __nv_*2*_rz call      "call"
  +-- float -> int
  |     +-- standard           --> fptosi(41)/fptoui(42) "conv"
  |     +-- nvidia             --> __nv_*2*_rz call      "call"
  +-- float -> float
        +-- wider              --> fpext(44)             "conv"
        +-- narrower           --> fptrunc(43)           "conv"

IR Instruction Infrastructure

BB insertion linked list

After creating any LLVM instruction, it must be inserted into the current basic block. This appears ~30 times across the expression codegen functions as a doubly-linked intrusive list manipulation. The low 3 bits of list pointers carry tag/flag bits (alignment guarantees valid pointers have zero in those positions):

// Repeated BB insertion pattern
Value *tail = ctx[1][1];           // current BB's instruction list tail
if (tail) {
    Value *sentinel = ctx[1][2];   // sentinel node
    InsertIntoBB(tail + 40, inst); // sub_157E9D0
    // Linked list fixup (doubly-linked with 3-bit tag):
    inst->prev = (*sentinel & ~7) | (inst->prev & 7);   // preserve tag bits
    inst->parent = sentinel;
    *((*sentinel & ~7) + 8) = inst + 24;   // old_tail.next = inst
    *sentinel = (*sentinel & 7) | (inst + 24);  // sentinel.head = inst
}

Instruction offsets: +24 = prev pointer, +32 = parent block, +48 = debug location metadata slot.

Debug metadata attachment

After every BB insertion, debug location metadata is cloned and attached:

SetValueName(inst, &name);                    // sub_164B780: e.g. "lnot.ext"
Value *debugLoc = *ctx_debug;
if (debugLoc) {
    Value *cloned = CloneDebugLoc(debugLoc, 2);  // sub_1623A60
    if (inst->debugLoc)
        ReleaseDebugLoc(inst + 48);              // sub_161E7C0: free old
    inst->debugLoc = cloned;
    if (cloned)
        RegisterDebugLoc(cloned, inst + 48);     // sub_1623210
}

Global flags

| Address | Purpose |
|---------|---------|
| dword_4D04720 + dword_4D04658 | Debug info emission control. When both zero, source location is forwarded before dispatch |
| dword_4D04810 | Bitfield optimization flag. When set, enables bassign.tmp alloca path for bitfield assignments |
| unk_4D04630 | When set, forces standard LLVM casts (sitofp/fptosi) instead of __nv_*_rz intrinsics |
| unk_4D04700 | When set, marks tobool results as "potentially inexact" via flag bit |
| unk_4D0463C | Volatile detection flag. When set, queries address volatility |

Helper Function Reference

| Address | Recovered name | Role |
|---------|----------------|------|
| sub_128D0F0 | EmitExpr | Master expression dispatcher (this page) |
| sub_128A450 | EmitCast | All C-level casts |
| sub_127D8B0 | EmitConstExpr | Compile-time constant expressions |
| sub_1282050 | EmitBitfieldStore | Bitfield write (R-M-W) |
| sub_1284570 | EmitBitfieldLoad | Bitfield read (extract) |
| sub_127FEC0 | EmitBoolExpr | Expression to i1 conversion |
| sub_127F650 | EmitLiteral | Numeric/string literal emission |
| sub_1286D80 | EmitAddressOf | Compute pointer to lvalue |
| sub_1287CD0 | EmitLoadFromAddress | Load via computed address |
| sub_1287ED0 | EmitCompoundAssign | Generic compound assignment |
| sub_128C390 | EmitIncDec | Pre/post increment/decrement |
| sub_128F9F0 | EmitBinaryArithCmp | Binary arithmetic and comparison |
| sub_128F580 | EmitShiftOrBitwise | Shift and bitwise operators |
| sub_128B750 | EmitSubscriptOp | Array subscript (GEP + load) |
| sub_128FDE0 | EmitSizeofAlignof | sizeof and alignof operators |
| sub_12901D0 | EmitCompoundAssignWrapper | Wrapper dispatching to per-operator impl |
| sub_1296570 | EmitCall | Function call emission |
| sub_12897E0 | EmitBitfieldStore (inner) | Actual bitfield store logic |
| sub_127A030 | GetLLVMType | EDG type to LLVM type translation |
| sub_127F680 | CanUseFastBitfieldPath | Bitfield path selector |
| sub_128A3C0 | EmitIntrinsicConvCall | __nv_*_rz intrinsic call helper |
| sub_12A4D50 | CreateBasicBlock | Create named BB |
| sub_12A4DB0 | EmitCondBranch | Conditional branch emission |
| sub_12909B0 | EmitUnconditionalBranch | Unconditional branch emission |
| sub_1290AF0 | SetInsertPoint | Switch current BB |
| sub_15FB440 | CreateBinOp | Binary instruction creation |
| sub_15FDBD0 | CreateCast | Cast instruction creation (IR path) |
| sub_15A46C0 | ConstantExprCast | Cast (constant-fold path) |
| sub_15A0680 | ConstantInt::get | Integer constant creation |
| sub_159C0E0 | ConstantInt::get (APInt) | Wide integer constant creation |
| sub_159CCF0 | ConstantFP::get | Float constant creation |
| sub_128B370 | EmitLoad | Load with volatile/type/srcloc |
| sub_128BE50 | EmitCommaOp | Comma operator RHS extraction |
| sub_1289860 | ComputeCompositeMemberAddr | Multi-level GEP for nested fields |
| sub_12843D0 | EmitComplexMemberLoad | Nested struct/union field load |
| sub_127FF60 | EmitStmtExpr | Statement expression body emission |
| sub_1281200 | EmitSpecialConst | Special constant materialization |
| sub_1281220 | EmitInitExpr | Init expression emission |
| sub_1285E30 | EmitBlockAddress | blockaddress / indirect branch |
| sub_1286000 | EmitVaArg | va_arg lowering |
| sub_127FC40 | CreateAlloca | Alloca with name and alignment |
| sub_127B420 | IsAddressOfExpr | Check if child is & (for elision) |
| sub_127B3A0 | IsVolatile | Volatile type query |
| sub_127B390 | GetSMVersion | Returns current SM target |
| sub_127B460 | IsPacked | Packed struct type query |
| sub_127B550 | FatalDiag | Fatal diagnostic (never returns) |
| sub_127C5E0 | AttachDebugLoc | Debug location attachment |
| sub_127D2C0 | ConstantFromType | Type-level constant (sizeof, etc.) |
| sub_12A4D00 | LookupLabel | Label resolution for goto/address |
| sub_1648A60 | AllocateInstruction | Raw instruction memory allocation |
| sub_1648B60 | AllocatePHI | PHI node memory allocation |
| sub_164B780 | SetValueName | Assigns %name to IR value |
| sub_157E9D0 | InsertIntoBasicBlock | BB instruction list insertion |
| sub_1623A60 | CloneDebugLoc | Debug location cloning |
| sub_1623210 | RegisterDebugLoc | Debug location list registration |
| sub_161E7C0 | ReleaseDebugLoc | Debug location list removal |
| sub_15F1EA0 | InitInstruction | Instruction field initialization |
| sub_15F1F50 | InitPHINode | PHI node initialization (opcode 53) |
| sub_15F2350 | SetExactFlag | Mark sdiv/udiv as exact |
| sub_15F55D0 | GrowOperandList | Realloc PHI operand array |
| sub_15FEC10 | CreateCmpInst | ICmp/FCmp instruction creation |
| sub_15FE0A0 | CreateIntResize | Trunc/zext/sext helper |
| sub_15FB630 | CreateUnaryOp | Unary NOT (xor -1) |
| sub_15F9CE0 | SetGEPOperands | GEP operand filling |
| sub_15FA2E0 | SetInBoundsFlag | Mark GEP as inbounds |
| sub_8D23B0 | IsArrayType | Array type check |
| sub_72B0F0 | EvaluateConstantExpr | EDG constant evaluation |
| sub_731770 | NeedsBitfieldTemp | Bitfield temp alloca check |

Constant expression helper functions

| Address | Recovered name | Role |
|---------|----------------|------|
| sub_127D8B0 | EmitConstExpr | Master constant expression emitter |
| sub_127D000 | ResizeConstant | Resize constant to target type |
| sub_127D120 | DestroyAPFloatElement | APFloat cleanup in aggregate loop |
| sub_127D2E0 | PushElementBulk | Bulk push to element vector |
| sub_127D5D0 | PushElement | Single push to element vector |
| sub_1277B60 | GetFieldIndex | Struct field index query |
| sub_1276020 | GetOrCreateGlobalVar | Global variable creation/lookup |
| sub_1277140 | GetOrCreateFunction | Function creation/lookup |
| sub_1280350 | LookupFunctionStaticVar | Static local variable resolution |
| sub_126A1B0 | CreateStringGlobalConst | Global string constant creation |
| sub_1598F00 | ConstantAggregateZero::get | Zero-initialized aggregate |
| sub_15991C0 | ConstantDataArray::getRaw | Raw byte array constant |
| sub_159DFD0 | ConstantArray::get | Typed array constant |
| sub_159F090 | ConstantStruct::get | Struct constant |
| sub_15943F0 | StructType::get | Anonymous struct type |
| sub_15A06D0 | Constant::getNullValue | Zero constant for any type |
| sub_15A0A60 | ConstantExpr::extractvalue | Sub-element extraction |
| sub_15A2E80 | ConstantExpr::getGEP | Constant GEP expression |
| sub_15A4510 | ConstantExpr::getBitCast | Constant bitcast |
| sub_15A4A70 | ConstantExpr::getAddrSpaceCast | Constant addrspacecast |
| sub_15A4180 | ConstantExpr::getPtrToInt | Constant ptrtoint |
| sub_15A8020 | StructLayout::getElemContainingOffset | Bitfield byte lookup |
| sub_15A9930 | DataLayout::getStructLayout | Struct layout query |
| sub_620E90 | edg::IsSignedIntConst | Signedness query |
| sub_620FA0 | edg::GetSignedIntValue | Signed integer extraction |
| sub_620FD0 | edg::GetUnsignedIntValue | Unsigned integer extraction |
| sub_622850 | edg::GetIntConstAsString | __int128 decimal string extraction |
| sub_622920 | edg::ExtractFieldOffset | Field offset extraction |
| sub_709B30 | edg::ExtractFloatBits | Float raw bits extraction |
| sub_722AB0 | edg::ReadIntFromBuffer | Endian-aware integer read |
| sub_8D4050 | edg::GetArrayElementType | Array element type query |
| sub_8D4490 | edg::GetArrayElementCount | Array dimension query |

LLVM Opcode Constants

Numeric opcode constants used in CreateBinOp, CreateCast, and instruction creation calls throughout the expression codegen:

| Number | LLVM instruction | Used by |
|--------|------------------|---------|
| 13 | sub | Pointer subtraction step 4 |
| 18 | sdiv | Pointer subtraction step 5 (with exact flag) |
| 32 | shl | Left shift (<<) |
| 33 | ashr / lshr | Right shift (>>, signedness-dependent) |
| 34 | and (FP variant) | Bitwise AND |
| 35 | or (FP variant) | Bitwise OR |
| 36 | xor (FP variant) | Bitwise XOR |
| 37 | zext | Zero-extend (bool-to-int, lnot.ext, land.ext) |
| 38 | and | Bitwise AND (integer) |
| 39 | sitofp / or | Signed int-to-float / bitwise OR (integer) |
| 40 | uitofp / xor | Unsigned int-to-float / bitwise XOR (integer) |
| 41 | fptosi / funnel shift | Signed float-to-int / rotate |
| 42 | fptoui | Unsigned float-to-int |
| 43 | fptrunc | Float-to-float truncation |
| 44 | fpext | Float-to-float extension |
| 45 | ptrtoint | Pointer-to-integer cast |
| 46 | inttoptr | Integer-to-pointer cast |
| 47 | bitcast / addrspacecast | Pointer casts |
| 51 | ICmp instruction kind | Integer comparison creation |
| 52 | FCmp instruction kind | Float comparison creation |
| 53 | PHI node kind | PHI creation for &&, \|\|, ?: |

PHI Node Construction Detail

PHI nodes are used by three expression types: logical AND (0x57), logical OR (0x58), and ternary (0x67). The construction sequence is identical across all three:

  1. Allocate: AllocatePHI (sub_1648B60) with 64 bytes.
  2. Initialize: InitPHINode (sub_15F1F50) with opcode 53 (PHI), type, and zero for parent/count/incoming.
  3. Set capacity: *(phi+56) = 2 -- two incoming edges.
  4. Set name: SetValueName (sub_164B780) with "land.ext", "lor.ext", or "cond".
  5. Reserve slots: sub_1648880(phi, 2, 1) -- reserve 2 incoming at initial capacity 1.

Adding each incoming value:

count = *(phi+20) & 0xFFFFFFF;           // current operand count
if (count == *(phi+56))                   // capacity full?
    GrowOperandList(phi);                 // sub_15F55D0: realloc

new_idx = (count + 1) & 0xFFFFFFF;
*(phi+20) = new_idx | (*(phi+20) & 0xF0000000);  // update count, preserve flags

// Large-mode flag at *(phi+23) & 0x40 selects operand array location:
base = (*(phi+23) & 0x40) ? *(phi-8) : phi_alloc_base - 24*new_idx;

// Value slot: base + 24*(new_idx-1) — 24 bytes per slot (value ptr + use-list pointers)
slot = base + 24*(new_idx - 1);
*slot = value;                           // incoming value
slot[1] = value.use_next;               // link into value's use-list
slot[2] = &value.use_head | (slot[2] & 3);
value.use_head = slot;

// Basic block slot: stored after all value slots as parallel array
bb_offset = base + 8*(new_idx-1) + 24*num_incoming + 8;
*bb_offset = incoming_bb;

The PHI operand layout is [val0, val1, ..., bb0, bb1, ...] where each value slot occupies 24 bytes (value pointer + doubly-linked use-list pointers), and basic block pointers form a parallel 8-byte array after all value slots.

Duplicate Implementations

Two additional copies of the bitfield codegen exist at sub_923780 (store) and sub_925930 (load) -- identical algorithms with the same string names, same opcodes, same control flow. These are in the 0x92xxxx range (NVVM frontend region) while the primary copies are in the 0x128xxxx range (codegen helper region). They likely correspond to different template instantiations or address-space variants in the original NVIDIA source code.

Diagnostic String Index

| String | Origin function | Trigger |
|--------|-----------------|---------|
| "unsupported expression!" | EmitExpr (sub_128D0F0) | Default case in outer switch |
| "unsupported operation expression!" | EmitExpr (sub_128D0F0) | Default case in inner switch |
| "constant expressions are not supported!" | EmitConstExpr (sub_127D8B0) | Unsupported context kind (sub_6E9180 returns true) |
| "unsupported constant variant!" | EmitConstExpr (sub_127D8B0) | Unknown constant kind in main switch; also byte != 0/1/2 in address-of |
| "unsupported float variant!" | EmitConstExpr (sub_127D8B0) | Float kind 5, or kind < 2 |
| "long double" / "__float80" / "__float128" | EmitConstExpr (sub_127D8B0) | Warning 0xE51: extended precision truncated to double on CUDA target |
| "failed to lookup function static variable" | EmitConstExpr (sub_127D8B0) | Function static address with type tag > 0x10 |
| "taking address of non-string constant is not supported!" | EmitConstExpr (sub_127D8B0) | &literal where literal kind != 2 (non-string) |
| "unsupported cast from address constant!" | EmitConstExpr (sub_127D8B0) | Type mismatch that is not ptr-to-ptr or ptr-to-int |
| "unsupported aggregate constant!" | EmitConstExpr (sub_127D8B0) | Type tag not in {8, 10, 11, 12} for aggregate case |
| "initialization of bit-field in union not supported!" | EmitConstExpr (sub_127D8B0) | Union initializer targeting a named bitfield |
| "cannot find initialized union member!" | EmitConstExpr (sub_127D8B0) | Union field chain exhausted without finding target |
| "bit-field constant must have a known value at compile time!" | EmitConstExpr (sub_127D8B0) | Bitfield initializer evaluates to NULL |
| "unexpected error while initializing bitfield!" | EmitConstExpr (sub_127D8B0) | Pre-existing byte in struct is not zero when packing |
| "unexpected non-integer type for cast from pointer type!" | EmitCast (sub_128A450) | ptrtoint destination is not integer |
| "unexpected destination type for cast from pointer type" | EmitCast (sub_128A450) | inttoptr source is not integer |
| "error generating code for loading from bitfield!" | EmitBitfieldLoad (sub_1284570) | Alignment assertion failure |
| "expected result type of bassign to be void!" | EmitExpr (sub_128D0F0) | Bitfield assign result type validation |

Cross-References