Math Function Builtins

Math builtins cover floating-point rounding, transcendental approximations, reciprocal/square-root operations, type conversions, and precise arithmetic with explicit rounding modes. They span IDs 21--184 and 276--403, totaling over 230 entries. Unlike most other builtin categories, many math builtins fall through the dispatch switch entirely and resolve via the generic LLVM intrinsic path.

Bit Manipulation (IDs 21--26)

These integer utility operations map directly to hardware instructions available on all SM targets.

| ID | Builtin | Operation |
|---|---|---|
| 21--22 | __nvvm_clz_{i,ll} | Count leading zeros (32/64-bit) |
| 23--24 | __nvvm_popc_{i,ll} | Population count (32/64-bit) |
| 25--26 | __nvvm_brev_{i,ll} | Bit reverse (32/64-bit) |
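
As a reference for what these three operations compute, here is a minimal Python sketch. The function names and the `bits` parameter are illustrative only and are not taken from the compiler; the sketch models the arithmetic result, not how the builtins are lowered.

```python
def clz(x, bits=32):
    """Count leading zero bits of x viewed as an unsigned `bits`-wide value."""
    x &= (1 << bits) - 1
    return bits - x.bit_length()


def popc(x, bits=32):
    """Count the set bits of x viewed as an unsigned `bits`-wide value."""
    return bin(x & ((1 << bits) - 1)).count("1")


def brev(x, bits=32):
    """Reverse the bit order of x within a `bits`-wide word."""
    x &= (1 << bits) - 1
    return int(format(x, f"0{bits}b")[::-1], 2)
```

Passing `bits=64` models the `ll` variants; for example, `clz(1, 64)` is 63 and `brev(1)` is `0x80000000`.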

Rounding and Absolute Value (IDs 27--46)

Float rounding and absolute value operations exist in three type variants: flush-to-zero single (ftz_f), IEEE single (f), and double (d).

| ID Range | Builtin | Operation |
|---|---|---|
| 27--29 | __nvvm_floor_{ftz_f,f,d} | Floor |
| 30--32 | __nvvm_ceil_{ftz_f,f,d} | Ceiling |
| 33--35 | __nvvm_abs_{ftz_f,f,d} | Absolute value (integer-style) |
| 36--38 | __nvvm_fabs_{ftz_f,f,d} | Absolute value (float) |
| 39--41 | __nvvm_round_{ftz_f,f,d} | Round to nearest |
| 42--44 | __nvvm_trunc_{ftz_f,f,d} | Truncate toward zero |
| 45--46 | __nvvm_saturate_{ftz_f,f} | Clamp to [0.0, 1.0] |
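
The brace notation used in these tables can be expanded mechanically, which is a convenient way to check that a pattern accounts for every ID in its range. The sketch below does this for the rounding table. The helper names `expand` and `assign_ids` are hypothetical, and the assumption that IDs are assigned consecutively in the order the variants are listed is a guess, not something the tables confirm.

```python
import itertools
import re


def expand(pattern):
    """Expand every {a,b,c} group in a builtin name pattern."""
    parts = re.split(r"\{([^{}]*)\}", pattern)
    # re.split with a capture group alternates literal text and group bodies.
    choices = [[p] if i % 2 == 0 else p.split(",") for i, p in enumerate(parts)]
    return ["".join(combo) for combo in itertools.product(*choices)]


def assign_ids(rows):
    """Map IDs to names, ASSUMING consecutive IDs in listed variant order."""
    ids = {}
    for first_id, pattern in rows:
        for offset, name in enumerate(expand(pattern)):
            ids[first_id + offset] = name
    return ids


ROUNDING_ROWS = [
    (27, "__nvvm_floor_{ftz_f,f,d}"),
    (30, "__nvvm_ceil_{ftz_f,f,d}"),
    (33, "__nvvm_abs_{ftz_f,f,d}"),
    (36, "__nvvm_fabs_{ftz_f,f,d}"),
    (39, "__nvvm_round_{ftz_f,f,d}"),
    (42, "__nvvm_trunc_{ftz_f,f,d}"),
    (45, "__nvvm_saturate_{ftz_f,f}"),
]

ROUNDING_IDS = assign_ids(ROUNDING_ROWS)
```

Under that ordering assumption the seven rows yield exactly 20 names covering IDs 27 through 46 with no gaps.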

Transcendental Approximations (IDs 47--56)

Hardware-accelerated approximations for transcendental functions. These use the GPU's special function units (SFU) and are not IEEE-compliant.

| ID Range | Builtin | Operation |
|---|---|---|
| 47--49 | __nvvm_ex2_approx_{ftz_f,f,d} | Base-2 exponential |
| 50--52 | __nvvm_lg2_approx_{ftz_f,f,d} | Base-2 logarithm |
| 53--55 | __nvvm_sin_approx_{ftz_f,f,d} | Sine |
| 56 | __nvvm_cos_approx_ftz_f | Cosine (FTZ only registered) |

Reciprocal (IDs 57--69)

Full-precision reciprocal with all four IEEE rounding modes and three type variants.

| ID Range | Builtin | Rounding Modes |
|---|---|---|
| 57--69 | __nvvm_rcp_{rn,rz,rm,rp}_{ftz_f,f,d} | RN (nearest), RZ (zero), RM (minus), RP (plus) |

The 13 entries comprise 4 rounding modes x 3 types (12 entries) plus one additional FTZ single-precision entry.

Square Root and Reciprocal Square Root (IDs 70--87)

| ID Range | Builtin | Description |
|---|---|---|
| 70--84 | __nvvm_sqrt_{f,rn,rz,rm,rp}_{ftz_f,f,d} | Square root (5 modes x 3 types) |
| 85--87 | __nvvm_rsqrt_approx_{ftz_f,f,d} | Reciprocal square root (SFU approximation) |

The sqrt_f variant (without rounding qualifier) uses the default hardware rounding. The rsqrt_approx variants use the SFU fast path.

Type Conversions (IDs 88--184)

This is the largest math subcategory, with 97 entries covering the registered combinations of source type, destination type, rounding mode, and FTZ flag.

Double-to-Float (IDs 88--95)

__nvvm_d2f_{rn,rz,rm,rp} with an optional _ftz suffix -- 4 rounding modes x 2 FTZ variants, 8 entries.

Integer/Float Cross-Conversions (IDs 96--177)

82 entries covering conversions between integer and floating-point types, drawn from:

  • Source types: d (double), f (float), i (int32), ui (uint32), ll (int64), ull (uint64)
  • Destination types: same set
  • Rounding modes: rn, rz, rm, rp

Pattern: __nvvm_{src}2{dst}_{rounding} (e.g., __nvvm_d2i_rn, __nvvm_f2ull_rz).
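
To make the four rounding suffixes concrete, the following sketch models a double-to-int32 conversion in each mode. It assumes the behavior of the PTX cvt instruction for float-to-integer conversions (results clamped to the destination range, NaN converted to 0); the function name `d2i` and the string mode argument are illustrative, not the builtin's actual signature.

```python
import math

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

# Python's round() rounds ties to even, matching round-to-nearest-even.
_ROUNDERS = {
    "rn": round,       # nearest, ties to even
    "rz": math.trunc,  # toward zero
    "rm": math.floor,  # toward minus infinity
    "rp": math.ceil,   # toward plus infinity
}


def d2i(x, mode):
    """Model of double -> int32 conversion with an explicit rounding mode."""
    if math.isnan(x):
        return 0  # assumed NaN handling
    if math.isinf(x):
        return INT32_MAX if x > 0 else INT32_MIN
    # Clamp to the int32 range (assumed saturating behavior).
    return max(INT32_MIN, min(INT32_MAX, _ROUNDERS[mode](x)))
```

For the input -2.5 the four modes give -2 (rn), -2 (rz), -3 (rm), and -2 (rp), which shows why the mode has to be part of the builtin name rather than an implicit default.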

Half Precision (IDs 178--180)

| ID | Builtin | Description |
|---|---|---|
| 178 | __nvvm_f2h_rn_ftz | Float to half (FTZ, round nearest) |
| 179 | __nvvm_f2h_rn | Float to half (round nearest) |
| 180 | __nvvm_h2f | Half to float |

Bitcast (IDs 181--184)

Reinterpret-cast between integer and float types without value conversion. Lowered via sub_12A7DA0 which emits opcode 0x31 (49, bitcast).

| ID | Builtin | Direction |
|---|---|---|
| 181 | __nvvm_bitcast_f2i | float -> int32 |
| 182 | __nvvm_bitcast_i2f | int32 -> float |
| 183 | __nvvm_bitcast_ll2d | int64 -> double |
| 184 | __nvvm_bitcast_d2ll | double -> int64 |
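
The difference between a bitcast and a value conversion is easy to see in a small model. The sketch below reproduces the four directions with Python's `struct` module; the function names mirror the builtins for readability, but this is only a host-side illustration of the bit-level behavior.

```python
import struct


def bitcast_f2i(x):
    """Reinterpret the 32 bits of a float as a signed int32."""
    return struct.unpack("<i", struct.pack("<f", x))[0]


def bitcast_i2f(x):
    """Reinterpret a signed int32 bit pattern as a float."""
    return struct.unpack("<f", struct.pack("<i", x))[0]


def bitcast_ll2d(x):
    """Reinterpret a signed int64 bit pattern as a double."""
    return struct.unpack("<d", struct.pack("<q", x))[0]


def bitcast_d2ll(x):
    """Reinterpret the 64 bits of a double as a signed int64."""
    return struct.unpack("<q", struct.pack("<d", x))[0]
```

For example, `bitcast_f2i(1.0)` is `0x3F800000`, whereas a value conversion of 1.0 to an integer would give 1.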

Min/Max and Multiply-High (IDs 276--293)

| ID Range | Builtin | Types |
|---|---|---|
| 276--279 | __nvvm_{min,max}_{i,ui} | 32-bit signed/unsigned |
| 280--283 | __nvvm_{min,max}_{ll,ull} | 64-bit signed/unsigned |
| 284--289 | __nvvm_f{min,max}_{f,ftz_f,d} | Float min/max (with FTZ) |
| 290--293 | __nvvm_mulhi_{i,ui,ll,ull} | Upper half of multiplication |
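
"Upper half of multiplication" means the high word of the double-width product. A minimal sketch of that arithmetic follows; the function names and the `bits` parameter are illustrative and not part of the builtin interface.

```python
def _to_signed(x, bits):
    """Interpret the low `bits` bits of x as a two's-complement value."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def mulhi_signed(a, b, bits=32):
    """High `bits` bits of the signed 2*bits-wide product."""
    product = _to_signed(a, bits) * _to_signed(b, bits)
    return _to_signed(product >> bits, bits)


def mulhi_unsigned(a, b, bits=32):
    """High `bits` bits of the unsigned 2*bits-wide product."""
    mask = (1 << bits) - 1
    return ((a & mask) * (b & mask)) >> bits
```

The signed and unsigned variants differ for the same bit patterns: `mulhi_unsigned(0xFFFFFFFF, 0xFFFFFFFF)` is `0xFFFFFFFE`, while `mulhi_signed(-1, -1)` is 0.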

Precise Float Arithmetic (IDs 294--349)

These builtins provide IEEE-compliant arithmetic with explicit rounding mode control. Each operation exists in all four rounding modes and up to five type variants (ftz_f, f, ftz_f2, f2, d).

| ID Range | Builtin | Entries |
|---|---|---|
| 294--313 | __nvvm_mul_{rn,rz,rm,rp}_{ftz_f,f,ftz_f2,f2,d} | 20 |
| 314--333 | __nvvm_add_{rn,rz,rm,rp}_{ftz_f,f,ftz_f2,f2,d} | 20 |
| 334--349 | __nvvm_div_{rn,rz,rm,rp}_{ftz_f,f,d} | 16 |
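
The effect of an explicit rounding mode can be modeled on the host by computing the exact sum with rationals and then picking the neighboring double in the required direction. The sketch below does this for double-precision addition only; `add_rounded` is a hypothetical name, it requires Python 3.9+ for `math.nextafter`, and it assumes finite inputs and a finite result.

```python
import math
from fractions import Fraction


def add_rounded(a, b, mode):
    """Double-precision a + b rounded per mode: 'rn', 'rz', 'rm' or 'rp'."""
    if mode not in ("rn", "rz", "rm", "rp"):
        raise ValueError(f"unknown rounding mode: {mode}")
    exact = Fraction(a) + Fraction(b)  # exact rational sum
    nearest = a + b                    # host add rounds to nearest even
    if mode == "rn" or Fraction(nearest) == exact:
        return nearest
    if mode == "rz":
        # Toward zero is 'rm' for positive sums and 'rp' for negative ones.
        mode = "rm" if exact > 0 else "rp"
    if mode == "rm":
        if Fraction(nearest) < exact:
            return nearest
        return math.nextafter(nearest, -math.inf)
    if Fraction(nearest) > exact:
        return nearest
    return math.nextafter(nearest, math.inf)
```

With `a = 1.0` and `b = 2.0**-60`, the rn, rz, and rm results are all 1.0, while rp gives `1.0 + 2.0**-52`, the next double above 1.0.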

FMA (IDs 383--402)

Fused multiply-add with all rounding/type combinations:

| ID Range | Builtin | Entries |
|---|---|---|
| 383--402 | __nvvm_fma_{rn,rz,rm,rp}_{ftz_f,f,d,ftz_f2,f2} | 20 |
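
What distinguishes a fused multiply-add from a separate multiply and add is that the product is not rounded before the addition. The sketch below models the round-to-nearest, double-precision case with exact rationals. `fma_rn` is an illustrative name, finite inputs are assumed, and the sign of a zero result is not modeled.

```python
from fractions import Fraction


def fma_rn(a, b, c):
    """a * b + c computed exactly, then rounded once to the nearest double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Inputs where the single rounding is visible.
A = 1.0 + 2.0**-30
B = 1.0 - 2.0**-30
C = -1.0

UNFUSED = A * B + C      # product rounds to 1.0 first, so the sum is 0.0
FUSED = fma_rn(A, B, C)  # exact product is 1 - 2**-60, so the sum is -2**-60
```

Here the unfused expression loses the result entirely, while the fused form returns `-2.0**-60`.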

Miscellaneous (IDs 350, 380--382, 403)

| ID | Builtin | Description |
|---|---|---|
| 350 | __nvvm_lohi_i2d | Compose double from two 32-bit halves |
| 380 | __nvvm_prmt | Byte permute (PRMT instruction) |
| 381--382 | __nvvm_sad_{i,ui} | Sum of absolute differences |
| 403 | __nvvm_fns | Find Nth set bit |
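
Three of these operations are compact enough to model directly. The sketch below assumes the generic (no-mode) form of the PTX prmt instruction, a `(lo, hi)` argument order for the double composition, and a three-operand accumulate form for sad; these are assumptions about the semantics, and the function names are illustrative.

```python
import struct

MASK32 = 0xFFFFFFFF


def lohi_i2d(lo, hi):
    """Build a double from its low and high 32-bit halves (assumed order)."""
    return struct.unpack("<d", struct.pack("<II", lo & MASK32, hi & MASK32))[0]


def prmt(a, b, selector):
    """Generic-mode byte permute: each selector nibble picks one of 8 bytes."""
    src = struct.pack("<II", a & MASK32, b & MASK32)  # bytes 0-3 = a, 4-7 = b
    out = 0
    for i in range(4):
        nibble = (selector >> (4 * i)) & 0xF
        byte = src[nibble & 0x7]
        if nibble & 0x8:
            # High nibble bit set: replicate the byte's sign bit instead.
            byte = 0xFF if byte & 0x80 else 0x00
        out |= byte << (8 * i)
    return out


def sad(a, b, c):
    """c + |a - b|, wrapped to 32 bits and returned as an unsigned value."""
    return (c + abs(a - b)) & MASK32
```

For instance, `prmt(0x33221100, 0x77665544, 0x0123)` reverses the bytes of the first operand and yields `0x00112233`.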

Table-Based Lowering for Precise Arithmetic

The precise arithmetic builtins (mul, add, div, fma with rounding modes) are lowered through sub_12B3540 (address 0x12B3540, 10KB), which uses two lazily-initialized red-black trees (std::map<int, triple>) to map builtin IDs to IR opcode triples.

Tree 1 serves three-operand builtins (FMA). It maps ID ranges to opcode 0xF59, with variant codes encoding the rounding mode and type.

Tree 2 serves two-operand builtins (mul, add, div). It maps them to opcodes 0xE3A, 0xE3B, 0x105E, and 0x1061, depending on the operation.

The lookup procedure:

  1. Extract up to 4 operand arguments from the call expression
  2. Find the builtin ID in the appropriate tree to obtain (opcode, variant)
  3. Look up the IR function via sub_126A190
  4. Emit the call instruction via sub_1285290
  5. Generate the inline asm fragment via sub_12A8F50
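
The table lookup in step 2 can be sketched as two lazily built dictionaries keyed by builtin ID. This is a simplified model, not the decompiled logic: the variant code is taken to be the offset within the ID range, which is a hypothetical encoding, and the two-operand table stores the operation name in place of an opcode because the pairing of the four opcodes with the three operations is not established here.

```python
from functools import lru_cache

FMA_OPCODE = 0xF59  # the one opcode tied to a specific operation


@lru_cache(maxsize=None)
def three_operand_table():
    """Built on first use, like the lazily initialized tree for FMA."""
    # Variant = offset within the range (HYPOTHETICAL encoding).
    return {bid: (FMA_OPCODE, bid - 383) for bid in range(383, 403)}


@lru_cache(maxsize=None)
def two_operand_table():
    """Built on first use; operation names stand in for the opcodes."""
    table = {}
    for name, first, last in (("mul", 294, 313), ("add", 314, 333), ("div", 334, 349)):
        for bid in range(first, last + 1):
            table[bid] = (name, bid - first)
    return table


def lookup(builtin_id, num_operands):
    """Pick the table by operand count; return (op, variant) or None."""
    table = three_operand_table() if num_operands == 3 else two_operand_table()
    return table.get(builtin_id)
```

A dictionary gives the same ID-to-entry mapping as the red-black tree; the ordered-tree structure matters only if the original code relies on range queries, which this sketch does not model.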

LLVM Intrinsic Fallback Path

Many standard math builtins (floor, ceil, sin, cos, sqrt, fma, exp, log) are not handled by the switch cases at all. When the builtin table lookup returns ID 0 (name not found), the dispatcher falls through to the generic LLVM intrinsic path at LABEL_4 in sub_955A70. This path:

  1. Checks if the name starts with "llvm." (the first four bytes are compared against the constant 0x6D766C6C, which is "llvm" in little-endian ASCII)
  2. Looks up the intrinsic via sub_B6ACB0 (LLVM intrinsic name-to-ID)
  3. Lowers all arguments with type-cast insertion where needed
  4. Emits a standard LLVM call via sub_921880
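
The prefix test in step 1 amounts to a single 32-bit comparison. The sketch below shows why 0x6D766C6C corresponds to the characters "llvm" when the first four bytes of the name are loaded as a little-endian word. The separate check of the fifth byte for "." is an assumption about how the full "llvm." prefix is verified, and the function name is illustrative.

```python
LLVM_PREFIX_WORD = 0x6D766C6C  # bytes 6C 6C 76 6D = "llvm", little-endian


def is_llvm_intrinsic_name(name):
    """Return True if name begins with 'llvm.' using a word-sized compare."""
    raw = name.encode("ascii", errors="replace")
    return (
        len(raw) >= 5
        and int.from_bytes(raw[:4], "little") == LLVM_PREFIX_WORD
        and raw[4:5] == b"."  # ASSUMED: the dot is checked separately
    )
```

A name such as `llvm.floor.f32` passes the check, while `__nvvm_floor_f` does not and would have been resolved through the builtin table instead.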

This means functions like llvm.floor.f32, llvm.cos.f64, and llvm.fma.f32 bypass the builtin ID system entirely and map directly to LLVM's intrinsic infrastructure.

Float Compatibility Wrappers (IDs 643--646)

Four C runtime float functions are registered as builtins for compatibility:

| ID | Builtin | Maps To |
|---|---|---|
| 643 | __ceilf | __nvvm_ceil_f equivalent |
| 644 | __floorf | __nvvm_floor_f equivalent |
| 645 | __roundf | __nvvm_round_f equivalent |
| 646 | __truncf | __nvvm_trunc_f equivalent |