Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

ROT13-Encoded Names (Mercury Obfuscation)

NVIDIA uses ROT13 as a lightweight obfuscation layer for a specific subset of internal identifiers in nvlink v13.0.88. The encoding is concentrated on Mercury-related content (SM100+ Blackwell codename MERCURY -> ZREPHEL), SASS instruction mnemonics held inside per-architecture opcode tables, configuration knob/option names, and a small set of ELF section-name suffixes used to annotate SASS-level metadata. The decoder function at sub_1A40AC0 is SIMD-vectorized, processing 16 bytes at a time via SSE _mm_load_si128 intrinsics, and is invoked at table-initialization time to convert an encoded string-pool entry into its plaintext form in a fresh heap allocation. This page catalogs every confirmed category of ROT13-encoded content with spot-verified decode tables keyed by string-pool address.

Correction notice (P050c-3 / P083). A prior version of this page claimed the 151 compiler pass names in the master phase table at 0x2443000--0x2445000 were ROT13-encoded, using BevYbbcHaebyyvat -> OriLoopUnrolling as an example. Direct verification against nvlink_strings.json shows that these pass names are stored as plaintext, not ROT13. The token BevYbbcHaebyyvat does not exist anywhere in the binary; only the plaintext OriLoopUnrolling is present, and it lives at 0x24434b0 -- inside the exact address range that had been labelled "ROT13 phase table". The pipeline order and pass list itself remain accurate, but the encoding-format claim was wrong and has been removed. See the Compiler Pass Names (Plaintext, Not ROT13) section below for the corrected treatment.

ROT13 is not a cryptographic cipher -- it is a self-inverse Caesar shift that any reader can undo by hand. Its role here is to prevent casual strings(1) dumps from trivially revealing Mercury internals: grepping a raw nvlink binary for MERCURY or WGMMA returns nothing, but grepping for ZREPHEL or JTZZN reveals the encoded form. The obfuscation is applied to exactly those identifiers NVIDIA wanted to keep out of trivial keyword searches -- most prominently Mercury (Blackwell) builtin templates, Mercury compiler-pass option names, SASS opcode mnemonics inside packed per-arch tables, and knob/option strings. Truly public-facing identifiers (R_CUDA relocation names, EIATTR constants, elfLink error messages, compiler pass diagnostic names in "After <PassName>" output) are stored as plaintext because they appear in linker error output, nvdisasm disassembly headers, or documented tool interfaces.

Overview

CategoryEncoded?CountAddress RegionPrefix/Example
Mercury builtin templates (ZREPHEL_*)ROT13644scattered in string poolZREPHEL_ (= MERCURY_)
Mercury compiler passes (via ctor_007)ROT13220x23F2CB0+ (string-pool), registered at 0x425A40--0x426080Zrephel... (= Mercury...)
SASS opcode mnemonics (qualified forms)ROT13~320+per-arch opcode tablesVZNQ.JVQR (= IMAD.WIDE), JTZZN (= WGMMA)
Configuration knob/option namesROT13~1,2870x23F0000--0x2460000ErtNyybpFcvyyKOybpx2 (= RegAllocSpillXBlock2)
ELF section-name annotationsROT134 confirmed meaningful0x2272970--0x22729bd.npp::s16 (= .acc::f16), .fc::2gb4 (= .sp::2to4)
Compiler pass names (master phase table)plaintext1510x24433f7--0x2443dc1OriLoopUnrolling, GeneralOptimizeEarly, MercEncodeAndDecode
EIATTR / EICOMPAT constantsplaintext111scatteredEIATTR_*, EICOMPAT_*
R_CUDA / R_MERCURY relocation namesplaintext186scatteredR_CUDA_*, R_MERCURY_*
elfLink error messagesplaintext140x1D489E0 lookup table"elfLink: unexpected error" etc.

The total ROT13-encoded count is approximately 30,000+ entries, dominated by the 644 ZREPHEL_* templates, ~1,287 knob names, and the SASS opcode tables. Bulk counts for individual subsets are verified in the Confidence Assessment section at the end of this page; only the 644 ZREPHEL_* figure and the 22 Mercury passes are exhaustively counted -- other subsets are plausible estimates.

ROT13 Decoder Function

Address: sub_1A40AC0 (15,629 bytes, 449 decompiled lines)

The decoder implements a classic ROT13 substitution cipher with SIMD acceleration:

  1. Scalar preamble -- processes unaligned head bytes one at a time: A-M maps to N-Z (+13), N-Z maps to A-M (-13), same for lowercase
  2. SIMD loop -- loads 16 bytes via _mm_load_si128, applies vectorized ROT13 using packed byte comparisons and conditional adds/subtracts
  3. Scalar epilogue -- handles remaining tail bytes after the last aligned 16-byte boundary

The input is copied to a fresh heap allocation (capacity rounded to next power of 2), decoded in-place, then returned. All SASS opcode mnemonic lookups flow through this function during table initialization.

SASS Opcode Mnemonics

Three per-architecture opcode table constructors populate ROT13-encoded instruction mnemonic tables. Each entry is a (name_ptr, name_length) pair starting at offset +4184 within the table object.

ConstructorAddressArchitectureEntriesSize
sm70_opcode_table_constructorsub_1769B50SM70/SM75 (Volta/Turing)~13024,230 bytes
sm100_opcode_table_constructorsub_1782540SM100 (Blackwell)~400111,076 bytes
sm120_opcode_table_constructorsub_1848F70SM120 (RTX 50xx)~400+89,621 bytes
sass_opcode_table_initializersub_1A85E40Emission pass table~32023,753 bytes

Core Arithmetic

ROT13DecodedDescription
VZNQIMADInteger multiply-add
VZNQ_JVQRIMAD_WIDEInteger multiply-add wide
VNQQ3IADD33-input integer add
VNQQ32VIADD32IInteger add with 32-bit immediate
VZHY32VIMUL32IInteger multiply with 32-bit immediate
VZAZKIMNMXInteger min/max
VNOFIABSInteger absolute value
VFRGCISETPInteger set predicate
SNQQFADDFP32 add
SNQQ32VFADD32IFP32 add with 32-bit immediate
SZHYFMULFP32 multiply
SZHY32VFMUL32IFP32 multiply with 32-bit immediate
SSZNFFMAFP32 fused multiply-add
SSZN32VFFMA32IFP32 FMA with 32-bit immediate
SZAZKFMNMXFP32 min/max
SZAZK3FMNMX3FP32 3-input min/max
SFRGCFSETPFP32 set predicate
SFJMNQQFSWZADDFP32 swizzled add
QFRGCDSETPFP64 set predicate

FP16/BF16 (Tensor-path)

ROT13DecodedDescription
SNQQ2FADD2Packed FP16x2 add
SZHY2FMUL2Packed FP16x2 multiply
SSZN2FFMA2Packed FP16x2 FMA
UNQQ2HADD2Packed FP16x2 add (half)
UNQQ2_32VHADD2_32IHADD2 with 32-bit immediate
UZHY2HMUL2Packed FP16x2 multiply (half)
UZHY2_32VHMUL2_32IHMUL2 with 32-bit immediate
USZN2HFMA2Packed FP16x2 FMA (half)
USZN2_32VHFMA2_32IHFMA2 with 32-bit immediate
USZN2_ZZNHFMA2_MMAHFMA2 for matrix multiply
UZAZK2HMNMX2Packed FP16x2 min/max
UFRGC2HSETP2Packed FP16x2 set predicate
UFRG2HSET2Packed FP16x2 set
SUNQQFHADDFP16 add (scalar half)
SUNQQ2FHADD2Packed FP16x2 add (float-half)
SUSZNFHFMAFP16 FMA (scalar half)
SUSZN2FHFMA2Packed FP16x2 FMA (float-half)
SUZHY2FHMUL2Packed FP16x2 multiply (float-half)
DNQQ4QADD4Packed int8x4/FP8x4 quad add
DSZN4QFMA4Packed quad FMA
DZHY4QMUL4Packed quad multiply

Uniform Register (SM75+)

ROT13DecodedDescription
HVNQQ3UIADD3Uniform integer 3-input add
HVZNQUIMADUniform integer multiply-add
HVZAZKUIMNMXUniform integer min/max
HVNOFUIABSUniform integer absolute value
HVFRGCUISETPUniform integer set predicate
HSNQQUFADDUniform FP32 add
HSZHYUFMULUniform FP32 multiply
HSSZNUFFMAUniform FP32 FMA
HSZAZKUFMNMXUniform FP32 min/max
HSFRGCUFSETPUniform FP32 set predicate
HSFRYUFSELUniform FP32 select
HSUNQQUFHADDUniform FP16 add
HSUSZNUFHFMAUniform FP16 FMA
HSEAQUFRNDUniform FP round
HS2SCUF2FPUniform float-to-FP convert
HS2VCUF2IPUniform float-to-integer convert
HV2SCUI2FPUniform integer-to-FP convert
HV2VCUI2IPUniform integer-to-integer convert
HYBC3ULOP3Uniform 3-input logic op
HYBC32VULOP32IUniform logic op with 32-bit immediate
HCYBC3UPLOP3Uniform predicate 3-input logic op
HCEZGUPRMTUniform byte permute
HCFRGCUPSETPUniform predicate set predicate
HFTKGUSGXTUniform sign-extend
HOZFXUBMSKUniform bit mask
HOERIUBREVUniform bit reverse
HC2HEUP2URUniform predicate to uniform register
HE2HCUR2UPUniform register to uniform predicate
HFRGZNKERTUSETMAXREGUniform set max registers
HFRGFUZFMUSETSHMSZUniform set shared memory size
PF2HECS2URControl status to uniform register

Bitwise/Logic

ROT13DecodedDescription
YBC3LOP33-input logic operation (LUT-based)
YBC32VLOP32ILogic op with 32-bit immediate
CYBC3PLOP3Predicate 3-input logic op
OZFXBMSKBit mask generate
FTKGSGXTSign-extend
CEZGPRMTByte permute
FUSSHFFunnel shift
YRNLEALoad effective address
C2EP2RPredicate to register
E2CR2PRegister to predicate
OZBI_OBMOV_BBarrier register move (barrier)
OZBI_EBMOV_RBarrier register move (register)
PF2E_32CS2R_32Control/status to register 32-bit
PF2E_64CS2R_64Control/status to register 64-bit

Memory Operations

ROT13DecodedDescription
ZBIMOVMove (register)
ZBI32VMOV32IMove 32-bit immediate
ZBI64VHEMOV64IURMove 64-bit immediate to uniform register
FRYSELSelect (conditional move)
YQTFGFLDGSTSLoad global, store shared (async copy)
YQTQRCONELDGDEPBARLoad global with dependency barrier
YQTZPLDGMCLoad global multicast
YQGENZLDTRAMLoad texture RAM
ZRZONEMEMBARMemory barrier
ZRZFRGMEMSETMemory set
NGBZTATOMGAtomic (global)
NGBZFATOMSAtomic (shared)
FHNGBZSUATOMSurface atomic
FHERQSUREDSurface reduction
FHDHRELSUQUERYSurface query
FPNGGRESCATTERScatter store
TNGUREGATHERGather load
SBBGCEVAGFOOTPRINTTexture footprint query

Control Flow

ROT13DecodedDescription
OEN_VZZBRA_IMMBranch (immediate offset)
WZC_VZZJMP_IMMJump (immediate)
OERNXBREAKBreak from loop
OFLAPBSYNCBarrier sync (convergence)
SRAPR_TFENCE_GFence (global)
SRAPR_FFENCE_SFence (shared)
SRAPR_GFENCE_TFence (texture)
CERRKVGPREEXITPre-exit annotation
REEONEERRBARError barrier / NOP padding
QRCONEDEPBARDependency barrier
LVRYQYIELDYield execution
ABCNOPNo operation
IBGRVOTEWarp vote
ZNGPUMATCHWarp match
ERQHKREDUXWarp reduction
RYRPGELECTWarp elect (leader selection)
JNECFLAPWARPSYNCWarp synchronization
ANABFYRRCNANOSLEEPNanosecond sleep
ANABGENCNANOTRAPNano trap (debug)
NEEVIRFARRIVESArrive signal

Warp Synchronization (SM90+ Mercury)

ROT13DecodedDescription
JNECTEBHCWARPGROUPWarpgroup operation
JNECTEBHCFRGWARPGROUPSETWarpgroup set
RAQPBYYRPGVIRENDCOLLECTIVEEnd collective operation
FLAPFSYNCSSync with scoreboard
NPDOYXACQBLKAcquire block
NPDOHYXACQBULKAcquire bulk
NPDFUZVAVGACQSHMINITAcquire shared memory init
PPGYCCTLCache control
PPGYYCCTLLCache control L1
PPGYGCCTLTCache control texture
HPPGYUCCTLUniform cache control

Matrix Multiply (Tensor Core)

ROT13DecodedDescription
UZZN_16HMMA_16Half-precision MMA 16-wide
UZZN_16816HMMA_16816HMMA 16x8x16
UZZN_1688HMMA_1688HMMA 16x8x8
UZZN_32HMMA_32HMMA 32-wide
UZZN_FC_1688HMMA_SP_1688Sparse HMMA 16x8x8
VZZN_16816IMMA_16816Integer MMA 16x8x16
VZZN_16832IMMA_16832Integer MMA 16x8x32
VZZN_88IMMA_88Integer MMA 8x8
VZZN_FC_16832IMMA_SP_16832Sparse integer MMA
VZZN_FC_88IMMA_SP_88Sparse integer MMA 8x8
JTZZNWGMMAWarpgroup MMA (SM90+)

SM100+ Blackwell Matrix Ops

ROT13DecodedDescription
DZZN_16816QMMA_16816Quantized MMA 16x8x16
DZZN_16832QMMA_16832Quantized MMA 16x8x32
DZZN_FS_16832QMMA_SF_16832QMMA with scale factor
DZZN_FS_FC_16864QMMA_SF_SP_16864QMMA with scale + sparsity
DZZN_FC_12864QMMA_SP_12864Sparse QMMA 128x64
DZZN_FC_16832QMMA_SP_16832Sparse QMMA 16x8x32
OZZN_168128BMMA_168128Binary MMA 168x128
OZZN_168256BMMA_168256Binary MMA 168x256
OZZN_88128BMMA_88128Binary MMA 88x128
BZZN_16864OMMA_16864Output MMA 168x64
BZZN_FC_168128OMMA_SP_168128Sparse output MMA
ZKDZZNMXQMMAMixed-precision quantized MMA
ZKDZZN_FS_16832MXQMMA_SF_16832MXQMMA with scale factor
OTZZNBGMMABlackwell group MMA
OTZZN_TFOBGMMA_GSBBGMMA with group scoreboard
QTZZNDGMMADouble-precision group MMA
QTZZN_TFODGMMA_GSBDGMMA with group scoreboard
VTZZNIGMMAInteger group MMA
VTZZN_TFOIGMMA_GSBIGMMA with group scoreboard
UTZZNHGMMAHalf-precision group MMA
UTZZN_TFOHGMMA_GSBHGMMA with group scoreboard

SM100+ Unified Tensor Core (UTC)

ROT13DecodedDescription
HGPONE_1PGNUTCBAR_1CTAUTC barrier (1 CTA)
HGPONE_2PGNUTCBAR_2CTAUTC barrier (2 CTA)
HGPPC_1PGNUTCCP_1CTAUTC copy (1 CTA)
HGPPC_2PGNUTCCP_2CTAUTC copy (2 CTA)
HGPZZN_1PGNUTCMMA_1CTAUTC MMA (1 CTA)
HGPZZN_2PGNUTCMMA_2CTAUTC MMA (2 CTA)
HGPFUVSG_1PGNUTCSHIFT_1CTAUTC shift (1 CTA)
HGPFUVSG_2PGNUTCSHIFT_2CTAUTC shift (2 CTA)
HGPNGBZFJFUTCATOMSWSUTC atomic (SWS)
HGPYQFJFUTCLDSWSUTC load (SWS)
HGPFGFJFUTCSTSWSUTC store (SWS)
GPTRA05TCGEN05Tensor core generation 5
HGZNPPGYUTMACCTLUTC macro cache control
HGZNY2PPGYUTMAL2CCTLUTC MAL L2 cache control
HGZNYQTUTMALDGUTC MAL load global
HGZNYFGUTMALSTUTC MAL store
HGZNCSUTMAPFUTC MAP (future)
HGZNFGTUTMASTGUTC MA store global
HGZNERQTUTMAREDGUTC MA reduction global
HGZERQTUTMREDGUTC M reduction global

Barrier/MBarrier

ROT13DecodedDescription
ZONEEVRE_NEEVIRMBARRIER_ARRIVEMBarrier arrive
ZONEEVRE_NEEVIR_QEBCMBARRIER_ARRIVE_DROPMBarrier arrive + drop
ZONEEVRE_PC_NFLAP_NEEVIRMBARRIER_CP_ASYNC_ARRIVEMBarrier cp-async arrive
ZONEEVRE_VAVGMBARRIER_INITMBarrier init
ZONEEVRE_VAINYMBARRIER_INVALMBarrier invalidate
ZONEEVRE_GEL_JNVGMBARRIER_TRY_WAITMBarrier try-wait
ZONEEVRE_GEL_JNVG_CNEVGLMBARRIER_TRY_WAIT_PARITYMBarrier try-wait with parity
ZONEEVRE_GRFG_JNVGMBARRIER_TEST_WAITMBarrier test-wait
ZONEEVRE_GRFG_JNVG_CNEVGLMBARRIER_TEST_WAIT_PARITYMBarrier test-wait with parity
ONE_VAQRKRQBAR_INDEXEDBarrier (indexed)

Texture/Surface/Sampling

ROT13DecodedDescription
GRKGHERTEXTURETexture operation
GRYCITEYPLTexture eyeply?
CVKUFPIXHSPixel half-sample
CVKYQPIXLDPixel load
INOFQVSSVABSDIFFVector absolute difference
INOFQVSS4VABSDIFF4Vector absolute diff (4-wide)
PERQHKCREDUXPredicate reduction
CZGEVTPMTRIGPerformance monitor trigger
PFZGRFGCSMTESTCSM test

Special/Miscellaneous

ROT13DecodedDescription
TRAZRGNQNGNGENMETADATAGenerate metadata
TRGYZRZONFRGETLMEMBASEGet local memory base
FRGYZRZONFRSETLMEMBASESet local memory base
FRGPGNVQSETCTAIDSet CTA ID
FRGZNKERTSETMAXREGSet max registers
FRGFZRZFVMRSETSMEMSIZESet shared memory size
FCNEFVSLSPARSIFYSparsify operation
FCZRGNQNGNSPMETADATASparsity metadata
QRPBZCERFFDECOMPRESSDecompress
EPCZBIRPCMOVRPC move
HGENPRRIRAGUTRACEEVENTTrace event (GPU profiling)
HIVEGPBHAGUVIRTCOUNTUniform virtual count
HTRGARKGJBEXVQUGETNEXTWORKIDUniform get next work ID

Data Conversion

ROT13DecodedDescription
S2S_KF2F_XFloat-to-float convert (extended)
S2V_KF2I_XFloat-to-integer convert (extended)
V2S_KI2F_XInteger-to-float convert (extended)
SEAQ_KFRND_XFloat round (extended)
E2HE_UR2UR_HRegister to uniform register (half)

TTU (Thread Tracing Unit / Graphics)

ROT13DecodedDescription
GGHPPGYTTUCCTLTTU cache control
GGHPYBFRTTUCLOSETTU close
GGHTBTTUGOTTU go
GGHYQTTULDTTU load
GGHYQ_PYBFRTTULD_CLOSETTU load + close
GGHZNPEBTTUMACROTTU macro
GGHZNPEBSHFRTTUMACROFUSETTU macro fuse
GGHBCRATTUOPENTTU open
GGHFGTTUSTTTU store

UDP (Unified Data Path / SM100+)

ROT13DecodedDescription
HOYXPCUBLKCPUnified block copy
HOYXY2PPGYUBLKL2CCTLUnified block L2 cache control
HOYXCSUBLKPFUnified block prefetch
HOYXERQUBLKREDUnified block reduction
HQYPONEUDLCBARUDL barrier
HQYPPCUDLCCPUDL copy
HQYPUZZNUDLCHMMAUDL HMMA
HQYPVZZNUDLCIMMAUDL IMMA
HQYPDZZNUDLCQMMAUDL QMMA
HQCPOYXPCUDPCBLKCPUDPC block copy
HQCPOYXY2PPGYUDPCBLKL2CCTLUDPC block L2 cache control
HQCPOYXERQUDPCBLKREDUDPC block reduction
HQCPGZNPPGYUDPCTMACCTLUDPC TMA cache control
HQCPGZNY2PPGYUDPCTMAL2CCTLUDPC TMA L2 cache control
HQCPGZNYQTUDPCTMALDGUDPC TMA load global
HQCPGZNERQTUDPCTMAREDGUDPC TMA reduction global
HQCPGZNFGTUDPCTMASTGUDPC TMA store global

Mercury Passes (22 ROT13 Boolean Options)

All 22 Mercury-specific passes are registered in ctor_007 at addresses 0x425A40--0x426080. Each is a boolean enable/disable flag stored at a bit offset within the global options structure. The prefix ZREPHEL decodes to MERCURY -- the Blackwell (SM100+) codename.

ROT13 NameDecoded NameBit OffsetReg. AddressType
ZrephelNffhzrCGKCbegnovyvglMercuryAssumePTXPortability0x3D400x425A40assume
ZrephelPbzcnpgrqNffhzrfMercuryCompactedAssumes0x3D500x425A90assume
ZrephelPbafhzrNffhzrfMercuryConsumeAssumes0x3D600x425AE0assume
ZrephelPbairegreFgngfMercuryConverterStats0x3D700x425B30diagnostics
ZrephelQrcFgntrCersreAbaYvirvaCFOMercuryDepStagePreferNonLiveinPSB0x3D780x425B80scoreboard
ZrephelQvfnoyrYrtnyvmngvbaBsGrkGbHEObhaqMercuryDisableLegalizationOfTexToURBound0x3D800x425BD0legalization
ZrephelQhzcVafgfNfOvanelMercuryDumpInstsAsBinary0x3D900x425C20diagnostics
ZrephelRapbqrQrpbqrMercuryEncodeDecode0x3DA00x425C70encoding
ZrephelRapbqrArjJbexreSvyrfMercuryEncodeNewWorkerFiles0x3DB00x425CC0encoding
ZrephelSbeprVFNPynffMercuryForceISAClass0x3DB80x425D10ISA/target
ZrephelSbeprHaxabjaGptra05NggeMercuryForceUnknownTcgen05Attr0x3DB90x425D60ISA/target
ZrephelTraFnffHPbqrMercuryGenSassUCode0x3DC00x425DB0codegen
ZrephelVafregNffhzrfMercuryInsertAssumes0x3DD00x425E00assume
ZrephelVafregOnpxrqtrQrconeMercuryInsertBackedgeDepbar0x3DE00x425E50scoreboard
ZrephelVafregKoybpxJnvgMercuryInsertXblockWait0x3DF00x425EA0scoreboard
ZrephelVffhrQrynlJOFgnyyFrysYbbcMercuryIssueDelayWBStallSelfLoop0x3E000x425EF0scoreboard
ZrephelZretrCebybthrOybpxfMercuryMergePrologueBlocks0x3E100x425F40codegen
ZrephelCerfhzrKoybpxJnvgOrarsvpvnyMercuryPresumeXblockWaitBeneficial0x3E180x425F90scoreboard
ZrephelGrcvqNjnerFoMercuryTepidAwareSb0x3E200x425FE0scheduling
ZrephelGenpxZhygvErnqfJneYngraplMercuryTrackMultiReadsWarLatency0x3E300x426030scheduling
ZrephelHfrNpgvirGuernqPbyyrpgvirVafgfMercuryUseActiveThreadCollectiveInsts0x3E400x426080ISA/target
NqinaprqFOPebffOybpxZrephelNffhzrAdvancedSBCrossBlockMercuryAssume0x5B00x4129E0scoreboard

Related global switches (also ROT13-encoded):

ROT13DecodedReg. Address
HfrZrepFrznagvpfUseMercSemantics0x424BE0
HfrZrepErfbheprfUseMercResources0x424B90
QhzcZrepBcPbhagfDumpMercOpCounts0x410F30

Mercury Builtin Instructions (644 Templates)

The 644 ZREPHEL_* strings are SASS instruction templates for SM100+ (Blackwell) hardware intrinsic operations. Each template encodes a specific operand pattern (source register types, destination types, synchronization modes). Organized by operation family:

FamilyTemplate CountExample Decoded
mbarrier124MERCURY_mbarrier_arrive_drop_shared_cluster_wcoopr
barrier86MERCURY_barrier_cta_red_popc_sync_unaligned
warpgroup40MERCURY_warpgroup_mma_sp_fp16_sync_srcs_r4_ur4_0
atom36MERCURY_atom_global_fp_acq_rel_dests_p_r
fence32MERCURY_fence_tensormap_generic_release_scope_cluster_cta_gpu_sys
redux32MERCURY_redux_f32_sync_unaligned_srcs_r_0
addmin24MERCURY_addmin
max24MERCURY_max_dests_p
elect20MERCURY_elect_sync_unaligned
min20MERCURY_min_dests_p
max318MERCURY_max3_fp
match16MERCURY_match_all_sync_unaligned
vabsdiff414MERCURY_vabsdiff4_srcs_ur_r_0
mov14MERCURY_mov_b32_dests_ur_srcs_sr_0
createpolicy12MERCURY_createpolicy_block
mapa12MERCURY_mapa_copy_generic_dests_r2
vote12MERCURY_vote_sync_unaligned_srcs_r_0
addmax10MERCURY_addmax
cvt10MERCURY_cvt_f16x8_u4x8
cvta10MERCURY_cvta_generic_shared_cluster_dests_ur2
fma8MERCURY_fma_f32x2
red8MERCURY_red_global_fp_release_policy
shfl8MERCURY_shfl_sync_unaligned
st8MERCURY_st_shared_cta_release
cp6MERCURY_cp_async_bulk
ld6MERCURY_ld_shared_cta_acquire
min36MERCURY_min3_int
sad6MERCURY_sad
add4MERCURY_add_in16x2_dests_r
multimem4MERCURY_multimem_red_release_fp
predict4MERCURY_predict_merge_1
tcgen054MERCURY_tcgen05_ld_16dp32bitx2_0
griddepcontrol2MERCURY_griddepcontrol
selmov2MERCURY_selmov

Compiler Pass Names (Plaintext, Not ROT13)

The master phase table at 0x2443000--0x2445000 contains 151 plaintext pass name strings defining the full compilation pipeline order, cross-referenced from an adjacent pointer table starting around 0x2443ff0. These are the same names emitted in "After <PassName>" diagnostic output. They are stored as plaintext, without any ROT13 encoding -- a prior version of this page incorrectly listed them as ROT13.

Evidence of plaintext storage (spot-checks from nvlink_strings.json, all type=0 = ASCII string pool entry):

String-pool addrPlaintext valuePointer-table xref
0x24433f7OriSanitize0x2443fe8
0x2443403GeneralOptimizeEarly0x2443ff0
0x2443418DoSwitchOptFirst0x2444000
0x2443429OriBranchOpt0x2444008
0x2443436OriPerformLiveDeadFirst0x2444010
0x244344eOptimizeBindlessHeaderLoads0x2444018
0x244346aOriLoopSimplification--
0x2443480OriSplitLiveRanges--
0x24434b0OriLoopUnrolling0x2444048
0x24434c1GenerateMovPhi0x2444050
0x24439baOriPerformLiveDeadFourth--
0x2443c07PostSchedule--
0x2443c2bPlaceBlocksInSourceOrder--
0x2443ca2MercEncodeAndDecode--
0x2443d02MercGenerateSassUCode--
0x2443dc1DebuggerBreak--

The string-pool region for these pass names spans 0x24433f7 (OriSanitize, first entry) to 0x2443dc1 (DebuggerBreak, last entry). The previous wiki version listed BevYbbcHaebyyvat as the ROT13 form of OriLoopUnrolling; direct grep against nvlink_strings.json returns zero matches for that encoded token -- it does not exist anywhere in the binary. Only the plaintext OriLoopUnrolling (decoder-output form) is present, and it sits at 0x24434b0 -- inside the exact address range the old "ROT13 phase table" claim named. The root cause of the original error was noticing the address range and assuming the uniform CamelCase naming scheme implied ROT13, without verifying that the stored bytes were actually encoded.

Why plaintext and not ROT13? These names are emitted verbatim into "After <PassName>" diagnostic messages and --verbose/timing output. Obfuscating them would require either a decode step on every diagnostic print (wasteful) or a dedicated plaintext mirror table (redundant). Mercury-specific options (the 22 ctor_007-registered booleans with the Zrephel... prefix) are ROT13-encoded because their names are only consulted at option-parsing time, but the master phase-table names are consulted on every timing dump and diagnostic emission.

The 151 plaintext pass names in pipeline order (grouped by phase) are catalogued in ptxas/peephole.md (for the Ori* family) and ptxas/overview.md (for the full master phase table). Only the encoding classification is corrected here; the pipeline order, pass count, and pass-name list themselves remain accurate in those pages.

ROT13-Encoded ELF Section Name Annotations

Four ELF section-name suffix annotations are stored ROT13-encoded in the binary as isolated string-pool entries. These annotate SASS-level metadata for memory ordering and data format constraints. All four are verbatim confirmed in nvlink_strings.json:

AddrROT13 in BinaryDecoded NameDescription
0x2272970.flap_erfgevpg::funerq::ernq::zzn::n.sync_restrict::shared::read::mma::aMemory sync restriction for shared MMA reads
0x2272995.npp::s16.acc::f16Accumulator section for FP16 data
0x22729ad.fc::2gb4.sp::2to4Sparsity annotation for 2:4 structured sparsity
0x22729bd.eryrnfr::beqrerq.release::orderedMemory ordering (release-ordered)

Mercury debug section name prefix (.nv.merc -> .ai.zrep)

A related but partially runtime-assembled family of 22 Mercury debug section names uses the ROT13 prefix .ai.zrep (= .nv.merc) plus a DWARF section suffix. Only one representative fragment -- .ai.erfreirqFzrz.bssfrg at 0x1f245d7 (= .nv.reservedSmem.offset) -- exists as an isolated string-pool entry. The 22 individual debug section names (.nv.merc.debug_info, .nv.merc.debug_line, etc.) are not present as standalone strings in the string pool; they are built at runtime via concatenation of the ROT13 prefix with a DWARF suffix and ROT13-decoded on demand.

The mapping for reference (prefix is stored and decoded at runtime):

Decoded runtime nameComposition (prefix + suffix)
.nv.merc.ai.zrep (ROT13 prefix, stored)
.nv.merc.debug_info.ai.zrep + .qroht_vasb
.nv.merc.debug_line.ai.zrep + .qroht_yvar
.nv.merc.debug_abbrev.ai.zrep + .qroht_nooeri
.nv.merc.debug_aranges.ai.zrep + .qroht_nenatrf
.nv.merc.debug_frame.ai.zrep + .qroht_senzr
.nv.merc.debug_loc.ai.zrep + .qroht_ybp
.nv.merc.debug_macinfo.ai.zrep + .qroht_znpvasb
.nv.merc.debug_pubnames.ai.zrep + .qroht_choanzrf
.nv.merc.debug_pubtypes.ai.zrep + .qroht_choglcrf
.nv.merc.debug_ranges.ai.zrep + .qroht_enatrf
.nv.merc.debug_str.ai.zrep + .qroht_fge
.nv.merc.nv_debug_ptx_txt.ai.zrep + .ai_qroht_cgk_gkg
.nv.merc.nv_debug_line_sass.ai.zrep + .ai_qroht_yvar_fnff
.nv.merc.nv_debug_info_reg_sass.ai.zrep + .ai_qroht_vasb_ert_fnff
.nv.merc.nv_debug_info_reg_type.ai.zrep + .ai_qroht_vasb_ert_glcr
.nv.merc.symtab_shndx.ai.zrep + .flzgno_fuaqk
.nv.merc.rela.ai.zrep + .eryn
.nv.merc.nv.shared.reserved..ai.zrep + .ai.funerq.erfreirq.
.nv.reservedSmem.ai.erfreirqFzrz (fragment confirmed at 0x1f245d7)
.entry_image_header_indices.ragel_vzntr_urnqre_vaqvprf (runtime-assembled)

Confidence for this runtime-assembled family is LOW -- the ROT13 mapping is mechanically correct (trivially inverse-checkable), but the individual encoded forms are not verifiable via string-pool dumps alone and require decompiler confirmation of the concatenation site.

Selected Knob/Option Names

Over 1,287 ROT13-encoded configuration knob names control the compiler's behavior. Listed here organized by subsystem with selected highlights:

Register Allocation Knobs

ROT13Decoded
ErtNyybpHfreFzrzOlgrfCrePGNRegAllocUserSmemBytesPerCTA
ErtNyybpGuerfubyqSbeQvfpneqPbasyvpgfRegAllocThresholdForDiscardConflicts
ErtNyybpFcvyyKOybpx2RegAllocSpillXBlock2
ranoyr_fzrz_fcvyyvatenable_smem_spilling

Scheduling Knobs

ROT13Decoded
FpurqFlapfCunfrpuxYngraplSchedSyncsPhasechkLatency
FpurqRfgvzngrqYbbcVgrengvbafSchedEstimatedLoopIterations
FpurqYQFYngraplSchedLDSLatency
FpurqYQTOngpuQrynlOvnfSchedLDGBatchDelayBias
FpurqPebffOybpxVafgfGbFcrphyngrSchedCrossBlockInstsToSpeculate
FpurqErfOhflZnpuvarBcpbqrSchedResBusyMachineOpcode

Code Sinking Knobs

ROT13Decoded
FvaxGrkErnqVafgEngvbSinkTexReadInstRatio
FvaxGrkZnkErtGnetrgFpnyrSinkTexMaxRegTargetScale
FvaxGrkVafgfGbVPnpurEngvbSinkTexInstsToICacheRatio
FvaxErzngRanoyrSinkRematEnable
FvaxErzngOhqtrgSinkRematBudget
FvaxPbqrVagbFcyvgOybpxSinkCodeIntoSplitBlock

Loop Optimization Knobs

ROT13Decoded
HaebyyFznyyYbbcYvzvgUnrollSmallLoopLimit
HaebyyZhygvOybpxYbbcfUnrollMultiBlockLoops
HaebyyVafgYvzvgUnrollInstLimit
HaebyyShyyVafgYvzvgUnrollFullInstLimit
HaebyyHaxabjaVafgYvzvgUnrollUnknownInstLimit
FgntrNaqSraprZnkYbbcfStageAndFenceMaxLoops

Texture/Speculation Knobs

ROT13Decoded
FcrphyngvirylUbvfgGrkZnkVafgfSpeculativelyHoistTexMaxInsts
FcrphyngvirylUbvfgGrkZnkAhzGrkVafgfVaFbheprSpeculativelyHoistTexMaxNumTexInstsInSource
FcrphyngvirylUbvfgGrkZnkAhzGrkVafgfVaGnetrgSpeculativelyHoistTexMaxNumTexInstsInTarget
FcrphyngvirylUbvfgGrkZnkAhzGrkVafgfVaOngpuSpeculativelyHoistTexMaxNumTexInstsInBatch
GrkGbVafgEngvbTexToInstRatio

AdvancedSB (Scoreboard) Knobs

ROT13Decoded
NqinaprqFOPebffOybpxAdvancedSBCrossBlock
NqinaprqFOPebffOybpxOhqtrgAdvancedSBCrossBlockBudget
NqinaprqFOQrconeOnpxrqtrAdvancedSBDepbarBackedge
NqinaprqFOQrconeQvfgnaprVaGvzrAdvancedSBDepbarDistanceInTime
NqinaprqFOErfreirq1AdvancedSBReserved1
NqinaprqFOErfreirqUZZNAdvancedSBReservedHMMA
NqinaprqFOFgnyyYvzvgAdvancedSBStallLimit
NqinaprqFOHfrYbbcUrnqreUrhevfgvpAdvancedSBUseLoopHeaderHeuristic

Disable Flags (64 identified)

Selected flags that disable specific optimizations:

ROT13Decoded
QvfnoyrQrnqYbbcRyvzvangvbaDisableDeadLoopElimination
QvfnoyrQrnqFgberRyvzvangvbaDisableDeadStoreElimination
QvfnoyrRneylRkgenpgOPBDisableEarlyExtractBCO
QvfnoyrReeoneNsgreZrzoneDisableErrbarAfterMembar
QvfnoyrUZZNErtNyybpJneDisableHMMARegAllocWar
QvfnoyrSnfgirpRaunaprzragDisableFastvecEnhancement
QvfnoyrSbejneqCebterffJne1842954DisableForwardProgressWar1842954

Cross-References

Confidence Assessment

ROT13 is a self-inverse substitution cipher (A<->N, B<->O, ..., M<->Z). Mapping correctness for any given (encoded, decoded) pair is mechanically checkable, so HIGH confidence requires both: (1) the encoded form found verbatim in nvlink_strings.json, and (2) the listed plaintext being the correct ROT13 inverse. This page was revised under P083 to remove an incorrect claim that the 151 compiler pass names at 0x2443000--0x2445000 were ROT13-encoded; they are stored as plaintext.

Spot-check verification (14 entries from nvlink_strings.json):

ROT13 EncodedDecodedAddr in strings JSONROT13 Correct?Confidence
JTZZNWGMMA0x1f24874 (line 59353)yesHIGH
ZrephelTraFnffHPbqrMercuryGenSassUCode0x23f2cb0 (line 240957)yesHIGH
ZrephelNffhzrCGKCbegnovyvglMercuryAssumePTXPortability0x23f2e80 (line 241221)yesHIGH
ZrephelRapbqrQrpbqrMercuryEncodeDecodeline 241053yesHIGH
ErtNyybpFcvyyKOybpx2RegAllocSpillXBlock20x23f5c30 (line 248301)yesHIGH
QvfnoyrQrnqYbbcRyvzvangvbaDisableDeadLoopElimination0x23fa950 (line 260073)yesHIGH
NqinaprqFOPebffOybpxAdvancedSBCrossBlockline 264597yesHIGH
HGPONE_1PGNUTCBAR_1CTAline 265867yesHIGH
VZNQ_JVQR (qualified)IMAD_WIDEline 269277yesHIGH
.flap_erfgevpg::funerq::ernq::zzn::n.sync_restrict::shared::read::mma::a0x2272970 (line 232924)yesHIGH
.npp::s16.acc::f160x2272995 (line 232956)yesHIGH
.fc::2gb4.sp::2to40x22729ad (line 232988)yesHIGH
.eryrnfr::beqrerq.release::ordered0x22729bd (line 233052)yesHIGH
.ai.erfreirqFzrz.bssfrg.nv.reservedSmem.offset0x1f245d7 (line 58921)yesHIGH

All 14 ROT13 spot-checks verified at HIGH confidence. The former BevYbbcHaebyyvat / OriLoopUnrolling entry has been removed from the spot-check table; the token does not exist in the binary (see the Compiler Pass Names section for the plaintext treatment at 0x24434b0).

Plaintext verification (strings previously claimed to be ROT13):

Plaintext StringAddr in strings JSONClaim in old wikiReality
OriSanitize0x24433f7ROT13 BevFnavgvmrPlaintext only; encoded form absent
GeneralOptimizeEarly0x2443403ROT13 TrarenyBcgvzvmrRneylPlaintext only; encoded form absent
DoSwitchOptFirst0x2443418ROT13 QbFjvgpuBcgSvefgPlaintext only; encoded form absent
OriBranchOpt0x2443429ROT13 BevOenapuBcgPlaintext only; encoded form absent
OriPerformLiveDeadFirst0x2443436ROT13 BevCresbezYvirQrnqSvefgPlaintext only; encoded form absent
OptimizeBindlessHeaderLoads0x244344eROT13 BcgvzvmrOvaqyrffUrnqreYbnqfPlaintext only; encoded form absent
OriLoopUnrolling0x24434b0ROT13 BevYbbcHaebyyvatPlaintext only; encoded form absent
OriPerformLiveDeadFourth0x24439baROT13 BevCresbezYvirQrnqSbheguPlaintext only; encoded form absent
PostSchedule0x2443c07ROT13 CbfgFpurqhyrPlaintext only; encoded form absent
PlaceBlocksInSourceOrder0x2443c2bROT13 CynprOybpxfVaFbheprBeqrePlaintext only; encoded form absent
MercEncodeAndDecode0x2443ca2ROT13 ZrepRapbqrNaqQrpbqrPlaintext only; encoded form absent
MercGenerateSassUCode0x2443d02ROT13 ZrepTrarengrFnffHPbqrPlaintext only; encoded form absent
DebuggerBreak0x2443dc1ROT13 QrohttreOernxPlaintext only; encoded form absent

Every spot-checked pass name inside 0x24433f7--0x2443dc1 is present in nvlink_strings.json as plaintext, and none of the previously-claimed ROT13 forms exist anywhere in the binary. The encoding-format claim for this region is definitively refuted.

Bulk verification:

AspectConfidenceBasis
ROT13 cipher mapping (A<->N etc.)HIGHSelf-inverse alphabet substitution; trivially checkable for any pair; the prefixes ZREPHEL -> MERCURY and Zrephel -> Mercury confirmed mechanically on every spot-checked entry
ROT13 decoder function (sub_1A40AC0)MEDIUMFunction exists at address 0x1a40ac0 (decompiled file present at decompiled/sub_1A40AC0_0x1a40ac0.c); the "SIMD-vectorized _mm_load_si128 16-byte chunks" claim is plausible given function size but not re-verified in this pass
Mercury builtin templates count (644 ZREPHEL_*)HIGHgrep -c '"ZREPHEL_' nvlink_strings.json returns exactly 644; matches the wiki claim
Mercury pass names (22 entries via ctor_007)HIGHAll 22 use the ROT13 prefix Zrephel (= Mercury); spot-checked entries (ZrephelNffhzrCGKCbegnovyvgl, ZrephelRapbqrQrpbqr, ZrephelTraFnffHPbqr) all present in strings; ctor_007 registration address range 0x425A40--0x426080 matches bit-offset progression
Knob/option names (~1,287 ROT13)HIGHSpot-checked knobs (ErtNyybpFcvyyKOybpx2, NqinaprqFOPebffOybpx, QvfnoyrQrnqYbbcRyvzvangvba) all confirmed at stated addresses; aggregate count of ~1,287 not exhaustively recounted
SASS opcode mnemonics (qualified forms)HIGHQualified forms VZNQ.JVQR, VZNQ.UV, HVZNQ, SSZN2, SSZN32V, JTZZN all present in strings
SASS opcode mnemonics (bare forms)MEDIUMBare VZNQ and SSZN are NOT present as isolated string-pool entries -- only suffix-qualified variants appear as standalone strings. Bare base mnemonics live inside per-arch packed opcode tables (sub_1769B50 SM70, sub_1782540 SM100, sub_1848F70 SM120). ROT13 mapping itself is trivially correct
Compiler pass names at 0x2443000--0x2445000HIGH (plaintext)All 151 pass names are stored as plaintext, confirmed by 13 spot-check addresses spanning 0x24433f7 (OriSanitize) to 0x2443dc1 (DebuggerBreak). The prior wiki claim that this region held ROT13 strings is refuted; the correction is made inline above. Pipeline order and pass-name list remain accurate
ELF section-name annotations (4 meaningful)HIGH.flap_erfgevpg::funerq::ernq::zzn::n (0x2272970), .npp::s16 (0x2272995), .fc::2gb4 (0x22729ad), .eryrnfr::beqrerq (0x22729bd) all confirmed in strings at consecutive addresses; ROT13 decodes verified
Mercury debug section names (22 .ai.zrep.* entries)LOWNone of the 22 listed .ai.zrep.qroht_* ROT13 forms appear as isolated strings in the binary string pool. Only one .ai.* ROT13 fragment (.ai.erfreirqFzrz.bssfrg = .nv.reservedSmem.offset) is present at 0x1f245d7. The 22 debug section names are constructed at runtime via concatenation (.ai.zrep prefix + DWARF section suffix, both ROT13), decoded after concatenation. ROT13 mapping itself is correct
Total ROT13-encoded string count (~30,000+)LOWAggregate figure is plausible but not verified by direct counting; only specific subsets (644 ZREPHEL_*, 22 Mercury passes, ~20 spot-checked knobs/opcodes/sections) were exactly counted
Architecture / opcode counts per arch (SM70 ~130, SM100 ~400, SM120 ~400+)MEDIUMConstructor function sizes match the wiki listing; per-arch entry counts not individually re-verified in this pass

Revision history

  • P050c-3 -- added initial Confidence Assessment; detected and flagged that compiler pass names at 0x2443000--0x2445000 are plaintext, not ROT13.
  • P083 -- full page rewrite to remove the incorrect "ROT13 compiler phase table" section and replace it with an explicit plaintext catalogue with spot-verified addresses. Purpose section added. Overview table restructured to distinguish ROT13 from plaintext categories. Mercury debug section names relabelled as "runtime-assembled from ROT13 prefix + suffix" to reflect that the 22 individual forms are not present as isolated strings.