Hash Table and Collection Infrastructure

Every associative container in cicc v13.0 is built from the same handful of primitives: a pointer-hash DenseMap/DenseSet with quadratic probing, a wyhash-v4-family string hasher, and a SmallVector with inline buffer optimization. Before this page existed, the same hash table description was duplicated across 30+ wiki pages. This is the single source of truth. If you are reimplementing cicc's data structures, start here.

There are no NVIDIA-specific modifications to the DenseMap hashing or probing logic -- cicc links the LLVM 20.0.0 implementation unmodified. The only NVIDIA-original hash infrastructure is the wyhash-v4 string hasher used for the builtin name table.

DenseMap Layout

Two variants exist, distinguished by bucket stride. Both share the same 28-byte inline header, the same hash function, the same probing sequence, the same sentinel values, and the same growth policy. The header is always embedded directly inside a larger structure (context object, analysis result, pass state) -- never heap-allocated on its own.

Variant A -- DenseSet (8 bytes/bucket)

Offset	Size	Type	Field
+0	8	`uint64_t`	`NumEntries`
+8	8	`ptr`	`Buckets` (heap-allocated array)
+16	4	`uint32_t`	`NumItems` (live entries)
+20	4	`uint32_t`	`NumTombstones`
+24	4	`uint32_t`	`NumBuckets` (always power of 2)

Bucket array size: NumBuckets * 8 bytes. Each bucket holds either a valid pointer, an empty sentinel, or a tombstone sentinel.

Variant B -- DenseMap (16 bytes/bucket)

Same 28-byte header. Each bucket holds a key-value pair at a 16-byte stride:

v30 = (_QWORD *)(buckets + 16LL * slot);   // sub_163D530 line 561
*v30 = key;                                  // +0: key
v30[1] = value;                              // +8: value

Variant B is used by the SelectionDAG builder (context offsets +120 and +152), the NVVM IR node uniquing tables, and any subsystem that maps pointers to pointers.

Where the Variants Appear

Subsystem	Variant	Context offset	Purpose
NVVM IR uniquing (`sub_162D4F0`)	B (16B)	context qw[130..178]	Node deduplication per opcode
SelectionDAG builder (`sub_163D530`)	B (16B)	+120, +152	Node mapping
SelectionDAG builder (`sub_163D530`)	A (8B)	+184	Worklist set
Per-node analysis structures	A (8B)	+72 inside v381	Visited set
CSSA PHI map (`sub_3720740`)	B (16B)	r15+0x60	PHI-to-ID mapping
Coroutine spill tracking	B (16B)	+0x18 inline	Spill/reload tracking
Builtin name table	custom (12B stride)	context+480	Name-to-ID with hash cache

Pointer Hash Function

Every DenseMap/DenseSet instance in cicc that uses pointer keys employs the same hash:

hash(ptr) = (ptr >> 9) ^ (ptr >> 4)

This is LLVM's DenseMapInfo<void*>::getHashValue, unchanged. The right-shift by 4 discards the low bits that are always zero due to 8- or 16-byte alignment. The right-shift by 9 mixes in higher-order address bits to break up the stride patterns that arise from slab allocation (where consecutive objects are separated by a fixed power-of-two). The XOR combines these two views of the pointer into a single hash value that distributes well for both heap-allocated and slab-allocated objects.

Representative decompiled evidence (appears identically in dozens of functions):

v9 = (v12 - 1) & (((unsigned int)v11 >> 9) ^ ((unsigned int)v11 >> 4));

Integer-Key Hash Variant

A separate hash function is used for DenseMap<unsigned, T> instances (integer keys rather than pointers):

hash(key) = key * 37

This is LLVM's DenseMapInfo<unsigned>::getHashValue. It appears in the instruction emitter (sub_2E29BA0), the two-address pass (sub_1F4E3A0), the vector legalization tables, and the SelectionDAG instruction selection cost table (sub_3090F90). Integer-key maps use a different sentinel pair: 0xFFFFFFFF (empty) and 0xFFFFFFFE (tombstone).

wyhash v4 String Hasher -- `sub_CBF760`

The NVVM builtin name table uses a separate, NVIDIA-original hash function for string keys. sub_C92610 is a thin wrapper that tail-calls sub_CBF760. The function dispatches on input length into six code paths, each using different constant sets and mixing strategies:

Length Dispatch Table

Length	Strategy	Constants
0	Return constant	`0x2D06800538D394C2`
1--3	3-byte read + XOR + multiply	seed `0x87275A9B`, mul `0xC2B2AE3D27D4EB4F`, avalanche `0x165667B19E3779F9`
4--8	2x uint32 + combine + rotate	XOR `0xC73AB174C5ECD5A2`, mul `0x9FB21C651E98DF25`
9--16	2x uint64 + 128-bit multiply	XOR `0x6782737BEA4239B9` / `0xAF56BC3B0996523A`, avalanche `0x165667919E3779F9`
17--128	Paired 16B reads from both ends	Per-pair constants, 128-bit multiplies, length mixed with `0x61C8864E7A143579`
129--240	Extended mixing	Delegates to `sub_CBF370`
240+	Bulk processing	Delegates to `sub_CBF100`

Pseudocode (length 1--3, the most common case for short builtins)

fn wyhash_short(data: &[u8], len: usize) -> u32 {
    let a = data[0] as u64;
    let b = data[len / 2] as u64;
    let c = data[len - 1] as u64;
    let combined = a | (b << 8) | (c << 16) | (len as u64) << 24;
    let mixed = combined ^ 0x87275A9B;
    let wide = mixed.wrapping_mul(0xC2B2AE3D27D4EB4F);
    let folded = wide ^ (wide >> 32);
    let result = folded.wrapping_mul(0x165667B19E3779F9);
    (result ^ (result >> 32)) as u32
}

Pseudocode (length 17--128, covering most `__nvvm_*` names)

fn wyhash_medium(data: &[u8], len: usize) -> u32 {
    let pairs = [
        (0x1CAD21F72C81017C, 0xBE4BA423396CFEB8),  // pair 0
        (0x1F67B3B7A4A44072, 0xDB979083E96DD4DE),  // pair 1
        (0x2172FFCC7DD05A82, 0x78E5C0CC4EE679CB),  // pair 2
        // ... additional pairs for 64/96/128 thresholds
    ];
    let (mut v8, mut v10) = (0u64, 0u64);
    // read 16 bytes from front, 16 from back, mix with pair constants
    for i in 0..((len + 15) / 32) {
        let front = read_u128(&data[i * 16..]);
        let back  = read_u128(&data[len - (i + 1) * 16..]);
        (v8, v10) = mix_128(v8, v10, front, back, pairs[i]);
    }
    let combined = v8 ^ v10 ^ (len as u64 ^ 0x61C8864E7A143579);
    let result = 0x165667919E3779F9u64.wrapping_mul(combined ^ (combined >> 37));
    (result ^ (result >> 32)) as u32
}

The final return value is always a uint32 -- the high dword of the 64-bit result XORed with the low dword. Most NVVM builtin names are 8--35 bytes, hitting the optimal 4--8 and 9--16 and 17--128 paths.

Probing Strategy

All DenseMap instances use quadratic probing with triangular-number increments:

slot = hash & (capacity - 1)      // initial probe
step = 1
loop:
    if bucket[slot] == key   -> found
    if bucket[slot] == EMPTY -> not found (insert here)
    if bucket[slot] == TOMBSTONE -> record for reuse
    slot = (slot + step) & (capacity - 1)
    step++

The probe sequence for initial position h visits:

h, h+1, h+3, h+6, h+10, h+15, h+21, ...
h + T(k) where T(k) = k*(k+1)/2   (triangular numbers)

This guarantees that for a power-of-2 table size n, all n slots are visited before any index repeats. The proof relies on the fact that the differences T(k+1) - T(k) = k+1 produce all residues modulo n when n is a power of 2.

Comparison Guard (Builtin Table)

The builtin name hash table (sub_C92740, sub_C92860) adds a triple comparison guard before performing the expensive memcmp:

Cached hash equality: hash_cache[slot] == search_hash
Length equality: entry->length == search_length
Content equality: memcmp(search_data, entry->string_data, length) == 0

The hash cache is stored in a separate array immediately after the bucket array and the end-of-table sentinel. This layout avoids polluting bucket cache lines with hash values that are only needed on collision.

Probing Label: "Linear" vs "Quadratic"

Some analysis reports describe the probing as "linear" because the step variable increments by 1 each iteration. The actual probe position advances quadratically (by accumulating triangular numbers). Both descriptions refer to the same code. This page uses the technically precise term: quadratic probing with triangular numbers.

Growth Policy

Load Factor Threshold -- 75%

After every successful insertion, the map checks whether to grow:

if (4 * (NumItems + 1) >= 3 * NumBuckets)
    // load factor > 75% -> double capacity
    new_capacity = 2 * NumBuckets

Tombstone Compaction -- 12.5%

If the load factor is acceptable but tombstones have accumulated:

elif (NumBuckets - NumTombstones - NumItems <= NumBuckets >> 3)
    // fewer than 12.5% of slots are truly empty
    // rehash at same capacity to clear tombstones
    new_capacity = NumBuckets

Rehash Procedure -- `sub_C929D0`

calloc(new_capacity + 1, bucket_stride) for the new array.
Write the end-of-table sentinel at position new_capacity.
For each live (non-empty, non-tombstone) entry in the old table, reinsert into the new table using quadratic probing.
Copy the cached hash (if the table has a hash cache).
Track the new position of a "current slot" pointer so the caller can continue using the entry it just inserted.
Free the old array.
Reset NumTombstones to 0.
Update NumBuckets to new_capacity.
Return the new position of the tracked slot.

Capacity Constraints

Power of 2: always. Enforced by the bit-smearing pattern: x |= x>>1; x |= x>>2; x |= x>>4; ...; x += 1.
Minimum: 64 buckets for standard DenseMap instances. The builtin name table starts at 16 and grows through 16 -> 32 -> 64 -> 128 -> 256 -> 512 -> 1024 as its 770 entries are inserted.
Allocation: sub_22077B0 (operator new[]), freed via j___libc_free_0.

Sentinel Values

Two sentinel families exist, distinguished by magnitude. Both are chosen to be impossible values for aligned pointers.

NVVM-Layer Sentinels (small magnitude)

Used by the NVVM IR uniquing tables, the SelectionDAG builder maps, and the builtin name table:

Role	Value	Hex	Why safe
Empty	-8	`0xFFFFFFFFFFFFFFF8`	Low 3 bits = `0b000` after masking, but no 8-byte-aligned pointer is this close to `(uint64_t)-1`
Tombstone	-16	`0xFFFFFFFFFFFFFFF0`	Same reasoning, distinct from -8

The builtin name table also uses a value of 2 as an end-of-table sentinel placed at bucket_array[capacity].

LLVM-Layer Sentinels (large magnitude)

Used by the majority of LLVM pass infrastructure -- SCEV, register coalescing, block placement, SLP vectorizer, StructurizeCFG, machine pipeliner, prolog-epilog, and others:

Role	Value	Hex	Decimal
Empty	`0xFFFFFFFFFFFFF000`	`-4096`	-4096
Tombstone	`0xFFFFFFFFFFFFE000`	`-8192`	-8192

Integer-Key Sentinels

Used by DenseMap<unsigned, T> instances (instruction emitter, two-address pass):

Role	Value	Hex
Empty	`0xFFFFFFFF`	32-bit all-ones
Tombstone	`0xFFFFFFFE`	32-bit all-ones minus 1

Which Sentinel Set to Expect

Subsystem	Sentinel pair
NVVM IR uniquing, SelectionDAG builder	-8 / -16
Builtin name table	-8 (tombstone), 0 (empty), 2 (end marker)
SCEV, block placement, SLP vectorizer	-4096 / -8192
Register coalescing, machine pipeliner	-4096 / -8192
StructurizeCFG, prolog-epilog	-4096 / -8192
Instruction emitter, two-address	0xFFFFFFFF / 0xFFFFFFFE
Coroutine spill tracking	0xFFFFFFFFF000 / 0xFFFFFFFFE000
CSSA PHI map	0xFFFFFFFFF000 / 0xFFFFFFFFE000
Debug verify	0xFFFFFFFFF000 / 0xFFFFFFFFE000
LazyCallGraph	0xFFFFFFFFF000 / 0xFFFFFFFFE000

The -8/-16 pair appears exclusively in NVVM-layer (NVIDIA-original) code. The -4096/-8192 pair is the standard LLVM DenseMapInfo<void*> sentinel set. The difference is cosmetic -- both pairs are safe for the same reasons -- but it reveals code provenance: if you see -8/-16, the code was written or heavily modified by NVIDIA; if you see -4096/-8192, it is stock LLVM.

SmallVector Pattern

SmallVector is the universal dynamic array throughout cicc, with two growth implementations:

Layout

[BeginPtr, Size:Count:Capacity, InlineData...]

Offset	Size	Field
+0	8	`data_ptr` (points to inline buffer initially, heap after growth)
+8	4	`size` (live element count)
+12	4	`capacity` (allocated slots)
+16	N	Inline buffer (N = `InlineCapacity * element_size`)

When size == capacity on insertion, the vector grows.

Growth Functions

Function	Address	Description
`SmallVector::grow`	`sub_C8D5F0`	Generic growth -- copies elements, used for non-POD types
`SmallVectorBase::grow_pod`	`sub_C8D7D0`	POD-optimized growth -- uses `realloc` when buffer is heap-allocated
`SmallVector::grow` (MIR)	`sub_16CD150`	Second copy in the MachineIR address range, identical logic
`SmallVector::grow` (extended)	`sub_C8E1E0`	Larger variant (11KB), handles edge cases

Growth Policy

The standard LLVM SmallVector growth: double the current capacity, with a minimum of 1. If the current buffer is the inline buffer, malloc a new heap buffer and memcpy the contents. If the buffer is already on the heap, realloc it (for POD types) or malloc + copy + free (for non-POD types).

new_capacity = max(2 * old_capacity, required_capacity)
if (data_ptr == &inline_buffer)
    heap_buf = malloc(new_capacity * elem_size)
    memcpy(heap_buf, inline_buffer, size * elem_size)
else
    // POD: heap_buf = realloc(data_ptr, new_capacity * elem_size)
    // non-POD: heap_buf = malloc(...); copy; free(old)
data_ptr = heap_buf
capacity = new_capacity

Common Inline Capacities

Observed across the codebase:

Inline capacity	Element size	Total inline bytes	Typical use
2	8	16	SCEV delinearization terms
4	8	32	LazyCallGraph SCC lists, basic block worklists
8	8	64	NVVMReflect call collection, PHI operand lists
16	8	128	AA evaluation pointer sets
22	8	176	Printf argument arrays (stack-allocated)
8	56	448	SROA slice descriptors

Builtin Name Table -- Specialized Hash Table

The builtin name table at context+480 is a specialized variant that does not use the standard DenseMap layout. It stores string entries rather than pointers, includes a parallel hash cache, and uses the wyhash function instead of the pointer hash.

Table Structure (20 bytes)

Offset	Size	Field
+0	8	`bucket_array_ptr`
+8	4	`capacity` (power of 2)
+12	4	`count` (live entries)
+16	4	`tombstone_count`

Memory Layout

[0 .. 8*cap-1]                    bucket_array: cap QWORD pointers
[8*cap .. 8*cap+7]                sentinel: value 2 (end-of-table)
[8*cap+8 .. 8*cap+8+4*cap-1]     hash_cache: uint32 per slot

String Entry (heap-allocated via `sub_C7D670`)

Offset	Size	Field
+0	8	`string_length`
+8	4	`builtin_id` (set after insertion)
+16	N+1	Null-terminated string data

Total allocation: length + 17 bytes, 8-byte aligned. The string data offset (16) is stored at hashtable+20 for use during comparison.

See Builtins for the complete 770-entry builtin ID inventory.

Usage Across the Compiler

Subsystems Using DenseMap (pointer hash, -8/-16 sentinels)

NVVM IR uniquing (sub_162D4F0): 8+ DenseMap instances in the NVVM context object, one per opcode range (0x04--0x1F). Tables at fixed qword-indexed offsets, spaced 32 bytes apart.
SelectionDAG builder (sub_163D530): Three maps at context offsets +120, +152, +184. Map A and B are 16-byte-stride (key-value), Set C is 8-byte-stride (keys only).
Per-node analysis structures: Embedded DenseSet at +72 within analysis objects created during DAG construction.
Memory space optimization (sub_1C6A6C0): DenseMap-style tables for address space tracking.

Subsystems Using DenseMap (pointer hash, -4096/-8192 sentinels)

SCEV (sub_F03CD0 and family): Expression caching, range computation, back-edge taken count.
Register coalescing (sub_1F2F8F0): Already-coalesced set, equivalence class map.
Block placement (sub_2E3B720): Chain membership, tail-merge candidates.
SLP vectorizer (sub_1ACCE50): AllOps and Scalars hash tables (32-byte entries).
StructurizeCFG (sub_1B66CF0): Flow-block mapping, region membership.
Machine pipeliner (sub_20C40D0): Schedule stage tracking.
CSSA (sub_3720740): PHI-to-ID mapping.
Debug/verify (sub_265D050): Instruction validation tables.
LazyCallGraph (sub_D1A040): Edge membership, SCC identity.

Subsystems Using DenseMap (integer hash `key * 37`)

Instruction emitter (sub_2E1F350): Opcode-to-constraint mapping. Sentinels: 0xFFFFFFFF / 0xFFFFFFFE.
Two-address pass (sub_1F4BFE0): TiedOperandMap (56-byte entries, 4 inline). EqClassMap.
Vector legalization (sub_3302A00): Type-split record mapping.
SelectionDAG isel (sub_3090F90): Argument cost table.

Subsystems Using wyhash (string keys)

Builtin name table (sub_90AEE0): 770 NVVM/CUDA builtin names. Uses the specialized 20-byte table header with hash cache.
This is the only known use of sub_CBF760 in cicc.

Key Functions

Function	Address	Size	Role
DenseMap pointer hash	inline	--	`(ptr >> 9) ^ (ptr >> 4)` -- always inlined
DenseMap integer hash	inline	--	`key * 37` -- always inlined
wyhash v4	`sub_CBF760`	~4 KB	String hash, length-dispatched
wyhash wrapper	`sub_C92610`	tiny	Tail-calls `sub_CBF760`
Builtin insert-or-find	`sub_C92740`	~2 KB	Quadratic probe with hash cache
Builtin find-only	`sub_C92860`	~1 KB	Read-only variant of `sub_C92740`
Builtin rehash	`sub_C929D0`	~1 KB	75% load factor, tombstone compaction
Builtin table init	`sub_C92620`	tiny	Creates 16-bucket initial table
SmallVector::grow	`sub_C8D5F0`	~2 KB	Generic element growth
SmallVectorBase::grow_pod	`sub_C8D7D0`	~5 KB	POD-optimized realloc growth
SmallVector::grow (MIR)	`sub_16CD150`	~2 KB	Duplicate in MachineIR range
SmallPtrSet::insertOrFind	`sub_C9A3C0`	~16 KB	Small pointer set with growth
DenseMap grow (LLVM passes)	varies per pass	--	Each pass has its own inlined or outlined rehash

Cross-References

Builtins -- Hash Table and ID Inventory -- complete 770-entry builtin table with wyhash usage
DenseMap and Symbol Table Structures -- original page (now a subset of this one, kept for EDG node layout)
NVVM IR Node -- NVVM context object with DenseMap uniquing tables
CSSA -- PHI hash map with -4096/-8192 sentinels
Register Coalescing -- integer-key and pointer-key hash map variants
SLP Vectorizer -- 32-byte-entry DenseMap with -4096/-8192 sentinels
SCEV -- SCEV expression caching with -4096/-8192 sentinels
Instruction Emitter -- integer-key hash with key * 37

Keyboard shortcuts

CICC Reverse Engineering Reference