Math Function Builtins

Math builtins cover floating-point rounding, transcendental approximations, reciprocal/square-root operations, type conversions, and precise arithmetic with explicit rounding modes. They span IDs 21--184 and 276--403, totaling over 230 entries. Unlike most other builtin categories, many math builtins fall through the dispatch switch entirely and resolve via the generic LLVM intrinsic path.

Bit Manipulation (IDs 21--26)

These integer utility operations map directly to hardware instructions available on all SM targets.

ID	Builtin	Operation
21--22	`__nvvm_clz_{i,ll}`	Count leading zeros (32/64-bit)
23--24	`__nvvm_popc_{i,ll}`	Population count (32/64-bit)
25--26	`__nvvm_brev_{i,ll}`	Bit reverse (32/64-bit)
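
The semantics of the three operations can be written out as a host-side reference model. This is only a sketch for checking expected values, not the compiler's lowering; the function names `clz32`, `popc32`, and `brev32` are made up for illustration, and only the 32-bit forms are shown.

```cpp
#include <cstdint>

// Count leading zeros of a 32-bit value; yields 32 when x == 0.
inline int clz32(std::uint32_t x) {
    int n = 0;
    for (std::uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1)
        ++n;
    return n;
}

// Population count: number of set bits.
inline int popc32(std::uint32_t x) {
    int n = 0;
    for (; x != 0; x &= x - 1)  // each step clears the lowest set bit
        ++n;
    return n;
}

// Bit reverse: bit i of the input becomes bit 31 - i of the result.
inline std::uint32_t brev32(std::uint32_t x) {
    std::uint32_t r = 0;
    for (int i = 0; i < 32; ++i) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}
```

The 64-bit variants follow the same pattern with a wider type and loop bound.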

Rounding and Absolute Value (IDs 27--46)

Float rounding and absolute value operations exist in up to three type variants: flush-to-zero single (ftz_f), IEEE single (f), and double (d). Saturate is registered for the two single-precision variants only.

ID Range	Builtin	Operation
27--29	`__nvvm_floor_{ftz_f,f,d}`	Floor
30--32	`__nvvm_ceil_{ftz_f,f,d}`	Ceiling
33--35	`__nvvm_abs_{ftz_f,f,d}`	Absolute value (integer-style)
36--38	`__nvvm_fabs_{ftz_f,f,d}`	Absolute value (float)
39--41	`__nvvm_round_{ftz_f,f,d}`	Round to nearest
42--44	`__nvvm_trunc_{ftz_f,f,d}`	Truncate toward zero
45--46	`__nvvm_saturate_{ftz_f,f}`	Clamp to [0.0, 1.0]
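
To make the ftz_f and saturate behavior concrete, here is a small host-side model. It is a sketch under assumptions: `flush_to_zero` and `saturate` are illustrative names, and mapping NaN to +0.0 follows the PTX `.sat` convention rather than anything recovered from the binary.

```cpp
#include <cmath>

// Model of the FTZ input rule: a subnormal becomes a zero of the same sign.
inline float flush_to_zero(float x) {
    return std::fpclassify(x) == FP_SUBNORMAL ? std::copysign(0.0f, x) : x;
}

// Clamp to [0.0, 1.0]. NaN is mapped to +0.0 (assumption: PTX .sat semantics).
inline float saturate(float x) {
    if (std::isnan(x)) return 0.0f;
    if (x < 0.0f) return 0.0f;
    if (x > 1.0f) return 1.0f;
    return x;
}
```

An FTZ saturate would, under this model, be `saturate(flush_to_zero(x))`.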

Transcendental Approximations (IDs 47--56)

Hardware-accelerated approximations for transcendental functions. These use the GPU's special function units (SFU) and are not IEEE-compliant.

ID Range	Builtin	Operation
47--49	`__nvvm_ex2_approx_{ftz_f,f,d}`	Base-2 exponential
50--52	`__nvvm_lg2_approx_{ftz_f,f,d}`	Base-2 logarithm
53--55	`__nvvm_sin_approx_{ftz_f,f,d}`	Sine
56	`__nvvm_cos_approx_ftz_f`	Cosine (only the FTZ variant is registered)
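
Only base-2 exponential and logarithm appear in the table; other bases can be derived from them with one multiply. The sketch below shows that identity on the host. `ex2_model` and `lg2_model` are hypothetical stand-ins built on `std::exp2` and `std::log2`, which are more accurate than an SFU approximation, so this shows the algebra and not the hardware error.

```cpp
#include <cmath>

// Host stand-ins for the base-2 approximations (assumed names, not CICC symbols).
inline float ex2_model(float x) { return std::exp2(x); }
inline float lg2_model(float x) { return std::log2(x); }

// e^x = 2^(x * log2(e))
inline float exp_via_ex2(float x) {
    return ex2_model(x * 1.4426950408889634f);  // log2(e)
}

// ln(x) = log2(x) * ln(2)
inline float log_via_lg2(float x) {
    return lg2_model(x) * 0.6931471805599453f;  // ln(2)
}
```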

Reciprocal (IDs 57--69)

Full-precision reciprocal with all four IEEE rounding modes and three type variants.

ID Range	Builtin	Rounding Modes
57--69	`__nvvm_rcp_{rn,rz,rm,rp}_{ftz_f,f,d}`	RN (to nearest, ties to even), RZ (toward zero), RM (toward minus infinity), RP (toward plus infinity)

Four rounding modes x 3 types account for 12 of the 13 entries; the remaining entry is one additional FTZ single-precision variant.
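
The effect of the four rounding modes can be observed on the host with `<cfenv>`. This is a sketch, assuming the host honors `std::fesetround` for float division; `rcp_with_mode` is an illustrative helper, and the `volatile` temporaries are there so the compiler does not fold the division at compile time.

```cpp
#include <cfenv>

// Compute 1.0f / x under a host rounding mode (FE_TONEAREST, FE_TOWARDZERO,
// FE_DOWNWARD, or FE_UPWARD), then restore the previous mode.
inline float rcp_with_mode(float x, int mode) {
    const int saved = std::fegetround();
    std::fesetround(mode);
    volatile float one = 1.0f;   // volatile: force a runtime division
    volatile float den = x;
    volatile float result = one / den;
    std::fesetround(saved);
    return result;
}
```

For an inexact quotient such as 1/3, `FE_DOWNWARD` (RM) and `FE_UPWARD` (RP) give adjacent floats, while an exact quotient such as 1/2 is the same in every mode.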

Square Root and Reciprocal Square Root (IDs 70--87)

ID Range	Builtin	Description
70--84	`__nvvm_sqrt_{f,rn,rz,rm,rp}_{ftz_f,f,d}`	Square root (5 modes x 3 types)
85--87	`__nvvm_rsqrt_approx_{ftz_f,f,d}`	Reciprocal square root (SFU approximation)

The sqrt_f variant (without rounding qualifier) uses the default hardware rounding. The rsqrt_approx variants use the SFU fast path.

Type Conversions (IDs 88--184)

The largest math subcategory, with 97 entries spanning combinations of source type, destination type, rounding mode, and FTZ flag.

Double-to-Float (IDs 88--95)

__nvvm_d2f_{rn,rz,rm,rp} with an optional _ftz suffix -- 4 rounding modes x 2 variants (with and without FTZ).

Integer/Float Cross-Conversions (IDs 96--177)

82 entries built from combinations of:

Source types: d (double), f (float), i (int32), ui (uint32), ll (int64), ull (uint64)
Destination types: same set
Rounding modes: rn, rz, rm, rp

Pattern: __nvvm_{src}2{dst}_{rounding} (e.g., __nvvm_d2i_rn, __nvvm_f2ull_rz).
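
As an illustration of what the rounding suffix means for a float-to-integer conversion, the following host-side sketch models `__nvvm_d2i_{rn,rz,rm,rp}`. The `Round` enum and `d2i` helper are assumptions for this example; NaN and out-of-range inputs are deliberately not modeled.

```cpp
#include <cmath>
#include <cstdint>

enum class Round { RN, RZ, RM, RP };

// Model of double -> int32 conversion with an explicit rounding mode.
// Assumes x is finite and within int32 range.
inline std::int32_t d2i(double x, Round r) {
    double y = x;
    switch (r) {
        case Round::RN: y = std::nearbyint(x); break;  // ties to even under the default FE_TONEAREST
        case Round::RZ: y = std::trunc(x); break;      // toward zero
        case Round::RM: y = std::floor(x); break;      // toward minus infinity
        case Round::RP: y = std::ceil(x); break;       // toward plus infinity
    }
    return static_cast<std::int32_t>(y);
}
```

The difference between modes shows up on halfway and negative inputs: 2.5 goes to 2 under RN, and -2.7 goes to -2 under RZ but -3 under RM.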

Half Precision (IDs 178--180)

ID	Builtin	Description
178	`__nvvm_f2h_rn_ftz`	Float to half (FTZ, round nearest)
179	`__nvvm_f2h_rn`	Float to half (round nearest)
180	`__nvvm_h2f`	Half to float

Bitcast (IDs 181--184)

These builtins reinterpret the bits of a value between integer and floating-point types without converting the value. They are lowered via sub_12A7DA0, which emits opcode 0x31 (49, bitcast).

ID	Builtin	Direction
181	`__nvvm_bitcast_f2i`	float -> int32
182	`__nvvm_bitcast_i2f`	int32 -> float
183	`__nvvm_bitcast_ll2d`	int64 -> double
184	`__nvvm_bitcast_d2ll`	double -> int64
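
In portable C++ the same bit-level reinterpretation is usually written with `std::memcpy`. The sketch below mirrors the four directions in the table; the function names are chosen to echo the builtins but are ordinary host functions, not the compiler's implementation.

```cpp
#include <cstdint>
#include <cstring>

// Each helper copies the object representation unchanged into the other type.
inline std::int32_t bitcast_f2i(float f) {
    std::int32_t i;
    std::memcpy(&i, &f, sizeof i);
    return i;
}
inline float bitcast_i2f(std::int32_t i) {
    float f;
    std::memcpy(&f, &i, sizeof f);
    return f;
}
inline double bitcast_ll2d(std::int64_t ll) {
    double d;
    std::memcpy(&d, &ll, sizeof d);
    return d;
}
inline std::int64_t bitcast_d2ll(double d) {
    std::int64_t ll;
    std::memcpy(&ll, &d, sizeof ll);
    return ll;
}
```

For example, `bitcast_f2i(1.0f)` yields `0x3F800000`, the IEEE 754 single-precision encoding of 1.0.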

Min/Max and Multiply-High (IDs 276--293)

ID Range	Builtin	Types
276--279	`__nvvm_{min,max}_{i,ui}`	32-bit signed/unsigned
280--283	`__nvvm_{min,max}_{ll,ull}`	64-bit signed/unsigned
284--289	`__nvvm_f{min,max}_{f,ftz_f,d}`	Float min/max (with FTZ)
290--293	`__nvvm_mulhi_{i,ui,ll,ull}`	Upper half of multiplication
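
"Upper half of multiplication" means the high word of the double-width product. A host-side sketch of the 32-bit variants, with illustrative function names, widens to 64 bits and shifts:

```cpp
#include <cstdint>

// Signed 32x32 -> 64 multiply, keep bits 63..32.
inline std::int32_t mulhi_i(std::int32_t a, std::int32_t b) {
    const std::int64_t product = static_cast<std::int64_t>(a) * b;
    const std::uint32_t high =
        static_cast<std::uint32_t>(static_cast<std::uint64_t>(product) >> 32);
    return static_cast<std::int32_t>(high);
}

// Unsigned 32x32 -> 64 multiply, keep bits 63..32.
inline std::uint32_t mulhi_ui(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint32_t>((static_cast<std::uint64_t>(a) * b) >> 32);
}
```

The 64-bit variants need a 128-bit intermediate, which standard C++ does not provide as a built-in type, so they are omitted here.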

Precise Float Arithmetic (IDs 294--349)

These builtins provide IEEE-compliant arithmetic with explicit rounding mode control. Each operation exists in all four rounding modes and up to five type variants (ftz_f, f, ftz_f2, f2, d).

ID Range	Builtin	Entries
294--313	`__nvvm_mul_{rn,rz,rm,rp}_{ftz_f,f,ftz_f2,f2,d}`	20
314--333	`__nvvm_add_{rn,rz,rm,rp}_{ftz_f,f,ftz_f2,f2,d}`	20
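334--349	`__nvvm_div_{rn,rz,rm,rp}_{ftz_f,f,d}`	16

The div range holds 16 IDs, while the pattern as written (4 rounding modes x 3 types) enumerates only 12 names; the remaining four entries are not identified here.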

FMA (IDs 383--402)

Fused multiply-add with all rounding/type combinations:

ID Range	Builtin	Entries
383--402	`__nvvm_fma_{rn,rz,rm,rp}_{ftz_f,f,d,ftz_f2,f2}`	20

Miscellaneous (IDs 350, 380--382, 403)

ID	Builtin	Description
350	`__nvvm_lohi_i2d`	Compose double from two 32-bit halves
380	`__nvvm_prmt`	Byte permute (PRMT instruction)
381--382	`__nvvm_sad_{i,ui}`	Sum of absolute differences
403	`__nvvm_fns`	Find Nth set bit
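
Two of these are simple enough to model on the host. The sketch below is an assumption-laden illustration: the (lo, hi) argument order of `lohi_i2d` is read off the builtin's name, and `sad_ui` follows the PTX `sad` definition of |a - b| + c with a third accumulator operand.

```cpp
#include <cstdint>
#include <cstring>

// Compose a double from two 32-bit halves.
// Assumption: the first argument is the low word, the second the high word.
inline double lohi_i2d(std::uint32_t lo, std::uint32_t hi) {
    const std::uint64_t bits = (static_cast<std::uint64_t>(hi) << 32) | lo;
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// Unsigned sum of absolute differences: |a - b| + c.
inline std::uint32_t sad_ui(std::uint32_t a, std::uint32_t b, std::uint32_t c) {
    return (a > b ? a - b : b - a) + c;
}
```

With this model, a high word of `0x3FF00000` and a low word of 0 compose to 1.0.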

Table-Based Lowering for Precise Arithmetic

The precise arithmetic builtins (mul, add, div, fma with rounding modes) are lowered through sub_12B3540 (address 0x12B3540, 10KB), which uses two lazily-initialized red-black trees (std::map<int, triple>) to map builtin IDs to IR opcode triples.

Tree 1 serves the three-operand builtins (FMA). It maps ID ranges to opcode 0xF59, with variant codes encoding the rounding mode and type.

Tree 2 serves the two-operand builtins (mul, add, div). It maps to opcodes 0xE3A, 0xE3B, 0x105E, or 0x1061, depending on the operation.

The lookup procedure:

Extract up to 4 operand arguments from the call expression
Find the builtin ID in the appropriate tree to obtain (opcode, variant)
Look up the IR function via sub_126A190
Emit the call instruction via sub_1285290
Generate the inline asm fragment via sub_12A8F50
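
The lazily-initialized map lookup in the second step can be sketched as follows. This is a minimal illustration, not the decompiled code: the `OpcodeTriple` layout, the `extra` field, and the variant numbering (`id - 383`) are placeholders invented for the example. Only the ID range 383--402 and opcode 0xF59 come from the description above.

```cpp
#include <map>

// Hypothetical record: the text only says each ID maps to an opcode triple
// from which (opcode, variant) is read. The third field is unknown here.
struct OpcodeTriple {
    unsigned opcode;
    unsigned variant;
    unsigned extra;
};

// Table for the three-operand (FMA) builtins, built on first use.
// A function-local static gives thread-safe lazy initialization in C++11 and later.
inline const std::map<int, OpcodeTriple>& fma_table() {
    static const std::map<int, OpcodeTriple> table = [] {
        std::map<int, OpcodeTriple> m;
        for (int id = 383; id <= 402; ++id) {
            // Placeholder variant code; the real encoding is not reproduced here.
            m[id] = OpcodeTriple{0xF59u, static_cast<unsigned>(id - 383), 0u};
        }
        return m;
    }();
    return table;
}

// Returns nullptr when the ID is not a three-operand precise builtin.
inline const OpcodeTriple* lookup_fma(int builtin_id) {
    const std::map<int, OpcodeTriple>& t = fma_table();
    const auto it = t.find(builtin_id);
    return it == t.end() ? nullptr : &it->second;
}
```

A second table of the same shape would serve the two-operand builtins, with the caller choosing the tree by operand count before the lookup.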

LLVM Intrinsic Fallback Path

Many standard math functions (floor, ceil, sin, cos, sqrt, fma, exp, log) are not handled by the switch cases at all. When they appear under names that the builtin table does not contain, the lookup returns ID 0 (name not found) and the dispatcher falls through to the generic LLVM intrinsic path at LABEL_4 in sub_955A70. This path:

Checks whether the name starts with "llvm." (the 32-bit constant 0x6D766C6C is the ASCII bytes "llvm" in little-endian order)
Looks up the intrinsic via sub_B6ACB0 (LLVM intrinsic name-to-ID)
Lowers all arguments with type-cast insertion where needed
Emits a standard LLVM call via sub_921880
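
The prefix test in the first step can be reproduced with a single 32-bit load. The sketch below assumes a little-endian host, where the bytes 'l', 'l', 'v', 'm' read as 0x6D766C6C; `has_llvm_prefix` is an illustrative name, and checking the '.' as a separate byte is an assumption about how the fifth character is handled.

```cpp
#include <cstdint>
#include <cstring>

// True if `name` begins with "llvm.".
// The first four bytes are compared as one 32-bit value against 0x6D766C6C
// (little-endian "llvm"); the '.' is compared separately.
inline bool has_llvm_prefix(const char* name) {
    if (std::strlen(name) < 5) return false;
    std::uint32_t head;
    std::memcpy(&head, name, sizeof head);
    return head == 0x6D766C6Cu && name[4] == '.';
}
```

Names such as `llvm.floor.f32` pass the check, while `__nvvm_floor_f` does not and would be resolved through the builtin ID table instead.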

This means functions like llvm.floor.f32, llvm.cos.f64, and llvm.fma.f32 bypass the builtin ID system entirely and map directly to LLVM's intrinsic infrastructure.

Float Compatibility Wrappers (IDs 643--646)

Four C runtime float functions are registered as builtins for compatibility:

ID	Builtin	Maps To
643	`__ceilf`	`__nvvm_ceil_f` equivalent
644	`__floorf`	`__nvvm_floor_f` equivalent
645	`__roundf`	`__nvvm_round_f` equivalent
646	`__truncf`	`__nvvm_trunc_f` equivalent

CICC Reverse Engineering Reference