Embedded ptxas Options

nvlink v13.0.88 embeds a complete copy of the ptxas assembler/compiler backend. When that backend is invoked (for LTO compilation, PTX JIT, or Mercury finalization), it accepts its own set of command-line options, independent of nvlink's CLI flags. Two functions define this option set: sub_1103030 builds the option definition table (long/short names, types, defaults, help text), and sub_1104950 parses argv against that table and populates a compiler-state structure at known offsets.

Option definition builder: sub_1103030 at 0x1103030 (29,803 bytes / 1,249 lines)
Option parser / extractor: sub_1104950 at 0x1104950 (37,578 bytes / 1,208 lines)
Compilation driver (consumer): sub_1112F30 at 0x1112F30 (65,018 bytes / 2,164 lines)
Feature flag configurator: sub_1100E50 at 0x1100E50 (13,759 bytes / 451 lines)
Shared parser infrastructure: sub_42F130 (register), sub_42E390 (extract), sub_42E580 (was-specified)
Version string: "Cuda compilation tools, release 13.0, V13.0.88"

How Options Reach the Embedded ptxas

The embedded ptxas backend never reads directly from the user's command line. Options flow through two paths:

  1. -Xptxas passthrough. nvlink's own CLI parser (sub_427AE0) accumulates -Xptxas values into a linked list at qword_2A5F238. The LTO pipeline's sub_429BA0 serializes this list into a space-separated string that becomes part of the argv vector passed to the embedded ptxas entry point.

  2. Programmatic forwarding. sub_429BA0 also appends options derived from nvlink's own state: -arch=sm_N from the target architecture, -maxreg=N from --maxrregcount, -Ofast-compile=LEVEL, -g from --debug, -generate-line-info, and several others. See Option Forwarding to cicc for the complete forwarding logic.

The combined argv vector is passed to sub_1104950, which calls sub_1103030 to build the option table and parse the vector with the shared sub_42E5A0 parser. Parsed values are then extracted with sub_42E390 into a compiler-state structure (base pointer a3), where each option writes to a specific byte offset.
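
The two paths can be pictured as a small argv builder. This is only a sketch: the function name, its parameters, and the ordering of forwarded versus passthrough options are assumptions made for illustration, not something recovered from sub_429BA0. Only the option spellings come from the text above.

```python
import shlex
from typing import List, Optional


def build_ptxas_argv(
    xptxas_values: List[str],
    sm: int,
    maxrregcount: Optional[int] = None,
    debug: bool = False,
    lineinfo: bool = False,
    ofast_compile: Optional[str] = None,
) -> List[str]:
    """Hypothetical model of how nvlink assembles the embedded-ptxas argv."""
    argv: List[str] = []

    # Path 2: options derived from nvlink's own state.
    argv.append(f"-arch=sm_{sm}")
    if maxrregcount is not None:
        argv.append(f"-maxreg={maxrregcount}")
    if ofast_compile is not None:
        argv.append(f"-Ofast-compile={ofast_compile}")
    if debug:
        argv.append("-g")
    if lineinfo:
        argv.append("-generate-line-info")

    # Path 1: -Xptxas values are joined into one space-separated string,
    # then split back into individual arguments (ordering is an assumption).
    passthrough = " ".join(xptxas_values)
    argv.extend(shlex.split(passthrough))
    return argv
```

For example, `build_ptxas_argv(["-v"], 90, debug=True)` yields `["-arch=sm_90", "-g", "-v"]` under these assumptions.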

Option Definition Table (sub_1103030)

sub_1103030 creates a fresh option parser instance via sub_42DFE0(0) and registers all options via sub_42F130. Each sub_42F130 call passes:

sub_42F130(parser, long_name, short_name, type, multiplicity, flags, context,
           allowed_keywords, reserved, default_value, reserved2,
           value_placeholder, help_text)

Option types: 0 = file-list, 1 = bool, 2 = string, 4 = integer, 5 = int64, 7 = special. Multiplicity: 0 = flag-only (bool), 1 = single value, 2 = multi-value (accumulates), 3 = multi-value with keyword validation.
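
The numeric codes above map naturally onto enumerations. The sketch below models one registration record; the class and field names are hypothetical (chosen to mirror the sub_42F130 parameter list), and the multiplicity assigned to opt-level is an assumption, since the catalog does not list multiplicities per option.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class OptType(IntEnum):
    FILE_LIST = 0
    BOOL = 1
    STRING = 2
    INTEGER = 4
    INT64 = 5
    SPECIAL = 7


class Multiplicity(IntEnum):
    FLAG_ONLY = 0       # bool flag, takes no value
    SINGLE = 1          # exactly one value
    MULTI = 2           # values accumulate
    MULTI_KEYWORD = 3   # values accumulate and are checked against keywords


@dataclass(frozen=True)
class OptionDef:
    """Hypothetical mirror of the arguments passed to sub_42F130."""
    long_name: str
    short_name: str
    type: OptType
    multiplicity: Multiplicity
    default_value: Optional[str] = None
    value_placeholder: Optional[str] = None
    help_text: str = ""


# Example record for opt-level (multiplicity SINGLE is assumed).
OPT_LEVEL = OptionDef(
    "opt-level", "O", OptType.INTEGER, Multiplicity.SINGLE,
    default_value="3", value_placeholder="<N>",
    help_text="Optimization level (0-3)",
)
```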

After registering all options, sub_1103030 invokes sub_42E5A0 to parse the passed argc/argv against the table, then checks for --trap-into-debugger (calls sub_42FA60 to install signal handlers), --tool-name (updates the internal program name), --help (prints help and exits), and --version (prints version banner and exits). The version banner reads:

ptxas: NVIDIA (R) Ptx optimizing assembler
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Wed_Aug_20_01:55:12_PM_PDT_2025
Cuda compilation tools, release 13.0, V13.0.88
Build cuda_13.0.r13.0/compiler.36424714_0

Complete Option Catalog

The following catalog lists every option registered by sub_1103030, in registration order. The "State Offset" column shows where sub_1104950 stores the parsed value within the compiler-state structure (base pointer a3). The "Size" column indicates the sub_42E390 copy size.

Debug and Diagnostic Options

| Long Name | Short | Type | Default | Placeholder | State Offset | Size | Description |
|---|---|---|---|---|---|---|---|
| device-debug | g | bool | "" | "" | a3+288 | 1 | Generate debug information for device code |
| suppress-debug-info | suppress-debug-info | bool | "" | "" | (via ID) | 1 | Do not generate debug info sections in final output. Ignored without --device-debug or --generate-line-info |
| generate-line-info | lineinfo | bool | "" | "" | a3+170 | 1 | Generate line-number information for device code |
| sp-bounds-check | sp-bounds-check | bool | -- | -- | a3+292 | 1 | Generate stack-pointer bounds-checking code. Enabled automatically with -g or -O0 |
| device-stack-protector | device-stack-protector | bool | false | <true\|false> | a3+345 | 1 | Enable stack canaries. Compiler uses heuristics to assess per-function risk |
| device-stack-protector-frame-size-threshold | device-stack-protector-size | int | 16 | <N> | a3+348 | 4 | Stack frame size threshold for canary insertion. 0 = instrument all frames |
| debug-info | debug-info | string | "" | <String> | (via ID) | 8 | File path for DWARF information output |
| link-info | link-info | string | "" | <String> | a3+176 | 8 | File path for imported/exported symbol names |
| verbose-tkinfo | verbose-tkinfo | bool | false | <true\|false> | a3+612 | 1 | Emit object name and command-line in tkinfo section. Auto-enabled with -g |
| g-tensor-memory-access-check | g-tmem-access-check | bool | -- | -- | a3+645 | 1 | Enable tensor memory access checks for tcgen05 ops. Default with -g |
| gno-tensor-memory-access-check | gno-tmem-access-check | bool | -- | -- | a3+646 | 1 | Disable tensor memory access checks. Overrides g-tensor-memory-access-check |
| compiler-annotations | annotate | bool | -- | -- | a3+647 | 1 | Annotate compiler-internal information in binary output |

Optimization Options

| Long Name | Short | Type | Default | Placeholder | State Offset | Size | Description |
|---|---|---|---|---|---|---|---|
| opt-level | O | int | 3 | <N> | a3+148 | 4 | Optimization level (0-3) |
| Ofast-compile | Ofc | string | 0 | <0\|min\|mid\|max> | a3+152 | 8 | Fast-compile level. max = fastest compile, mid = balanced, min = minimal impact, 0 = disabled |
| register-usage-level | regUsageLevel | int | 5 | <0..10> | a3+160 | 4 | Aggressiveness of register-usage optimizations. Higher = more regs for better code |
| fast-compile | fc | bool | -- | -- | a3+477 | 1 | EXPERIMENTAL: optimize compilation time at runtime performance cost |
| allow-expensive-optimizations | allow-expensive-optimizations | bool | -- | <true\|false> | a3+408 | 1 | Allow compiler to use maximum resources. Default: enabled for -O2 and above |
| fmad | fmad | bool | true | <true\|false> | (via ID) | 1 | Enable contraction of FP multiply+add into FMAD/FFMA/DFMA |
| fastimul | fastimul | bool | "" | "" | a3+168 | 1 | Enable 24-bit integer multiplication |
| limit-fold-fp | limit-fold-fp | bool | false | <true\|false> | a3+340 | 1 | Enable/disable constant folding of float operations |
| optimize-float-atomics | opt-fp-atomics | bool | -- | -- | a3+341 | 1 | Enable FP atomic optimizations that may affect precision |
| opt-pointers | Op | bool | -- | -- | a3+322 | 1 | Optimize 64-bit pointers by truncating to 32-bit |
| cloning | cloning | string | yes | <yes\|no> | (via v138) | 8 | Enable/disable cloning of device functions |
| noFwdPrg | noFwdPrg | bool | -- | -- | a3+568 | 1 | Disable forward progress optimization (internal) |
| cimm | cimm | bool | -- | -- | a3+232 | 1 | Use immediate values for literal constants |
| disable-optimizer-constants | disable-optimizer-consts | bool | -- | -- | a3+232 | 1 | Disable use of optimizer constant bank. Shares offset with cimm |
| no-fastreg | no-fastreg | bool | -- | -- | a3+233 | 1 | Disable fast register allocation |

Register and Launch Configuration

| Long Name | Short | Type | Default | Placeholder | State Offset | Size | Description |
|---|---|---|---|---|---|---|---|
| maxrregcount | maxrregcount | string | -- | <archmax/archmin/N> | (via nptr) | 8 | Maximum registers per GPU function. Accepts archmax, archmin, or integer N. Values below ABI minimum are bumped |
| device-function-maxrregcount | func-maxrregcount | string | -- | <archmax/archmin/N> | (via v140) | 8 | Per-device-function register limit. Only effective with --compile-only. Overrides maxrregcount for device functions |
| minnctapersm | minnctapersm | int | -- | <N> | a3+492 | 4 | Minimum CTAs per SM. Ignored if --maxrregcount is used |
| maxntid | maxntid | string | -- | <Comma separated list> | (via v141) | 8 | Maximum thread-block dimensions (up to 3 comma-separated values). Ignored if --maxrregcount is used |
| override-directive-values | override-directive-values | bool | -- | -- | a3+496 | 1 | CLI values override PTX directives for minnctapersm, maxntid, maxrregcount |

Compilation Mode Options

| Long Name | Short | Type | Default | Placeholder | State Offset | Size | Description |
|---|---|---|---|---|---|---|---|
| compile-only | c | bool | -- | -- | a3+342 | 1 | Generate relocatable object (separate compilation) |
| compile-as-tools-patch | astoolspatch | bool | -- | -- | a3+343 | 1 | Compile patch code for CUDA tools. Forces maxrregcount to ABI minimum |
| extensible-whole-program | ewp | bool | -- | -- | a3+505 | 1 | Extensible whole-program compilation mode |
| compile-functions | f | string | -- | <Comma separated list> | (via list) | 8 | Compile only the named function(s) |
| entry | e | string | -- | <entry function> | (via list) | 8 | Entry function name |
| slr | slr | bool | -- | -- | a3+353 | 1 | (Internal flag) |
| abi-compile | abi | string | yes | <yes> | (via v137) | 8 | Enable ABI-compliant function compilation |

Target Architecture

| Long Name | Short | Type | Default | Placeholder | State Offset | Size | Description |
|---|---|---|---|---|---|---|---|
| gpu-name | arch | string | (dynamic) | <gpu name> | a3+192 | 8 | Target GPU. .target sm_XY compiles to sm_MN where MN >= XY. Also accepts virtual architectures (parsing only) |
| machine | m | int | 64 | <bits> | a3+280 | 4 | Host architecture bits. Only 64-bit supported |

Cache Policy Options

| Long Name | Short | Type | Default | Placeholder | State Offset | Size | Description |
|---|---|---|---|---|---|---|---|
| def-load-cache | dlcm | string | "" | -- | a3+304 (processed) | 4 | Default cache modifier on global/generic load |
| force-load-cache | flcm | string | "" | -- | a3+312 (processed) | 4 | Force cache modifier on global/generic load |
| def-store-cache | dscm | string | "" | -- | a3+308 (processed) | 4 | Default cache modifier on global/generic store |
| force-store-cache | fscm | string | "" | -- | a3+316 (processed) | 4 | Force cache modifier on global/generic store |

Cache string values are converted to integer codes by sub_1102260. Specifying both force-load-cache and def-load-cache draws a warning rather than a hard error: force takes precedence and def is zeroed. The same applies to the store pair.
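
The precedence rule can be sketched as follows. The function works on the modifier strings rather than on the integer codes produced by sub_1102260, because that mapping is not documented here; the function name and the diagnostic wording are hypothetical.

```python
from typing import List, Optional, Tuple


def resolve_cache_pair(
    def_mod: Optional[str], force_mod: Optional[str], kind: str
) -> Tuple[Optional[str], Optional[str], List[str]]:
    """Return (def, force, diagnostics) after applying the precedence rule.

    kind is "load" or "store"; it only affects the diagnostic text.
    """
    diagnostics: List[str] = []
    if def_mod and force_mod:
        # Both were given: force wins, the default slot is cleared.
        diagnostics.append(
            f"warning: force-{kind}-cache overrides def-{kind}-cache"
        )
        def_mod = None  # models the zeroed def-* slot
    return def_mod, force_mod, diagnostics
```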

Warning and Diagnostic Options

| Long Name | Short | Type | Default | Placeholder | State Offset | Size | Description |
|---|---|---|---|---|---|---|---|
| warn-on-local-memory-usage | warn-lmem-usage | bool | -- | -- | a3+472 | 1 | Warn if local memory is used |
| warn-on-spills | warn-spills | bool | -- | -- | a3+473 | 1 | Warn if registers are spilled to local memory |
| warn-on-double-precision-use | warn-double-usage | bool | -- | -- | a3+474 | 1 | Warn if double-precision instructions are used |
| suppress-double-demote-warning | suppress-double-demote-warning | bool | -- | -- | a3+321 | 1 | Suppress warning when double-precision appears on non-DP-capable SM |
| suppress-stack-size-warning | suppress-stack-size-warning | bool | -- | -- | a3+504 | 1 | Suppress warning when stack size cannot be determined |
| suppress-async-bulk-multicast-advisory-warning | (same) | bool | -- | -- | a3+614 | 1 | Suppress advisory for .multicast::cluster |
| suppress-sparse-mma-advisory-info | (same) | bool | -- | -- | a3+615 | 1 | Suppress advisory info for mma.sp |
| warning-as-error | Werror | bool | -- | -- | a3+323 | 1 | Treat all warnings as errors |
| disable-warnings | w | bool | -- | -- | a3+324 | 1 | Inhibit all warning messages |

Profiling and Statistics

| Long Name | Short | Type | Default | Placeholder | State Offset | Size | Description |
|---|---|---|---|---|---|---|---|
| compiler-stats | compilerStats | string | "" | <String> | a3+72 | 8 | Print compiler statistics. Values: time/t, memory/m, phase-wise/p, detailed/d |
| compiler-stats-file | compilerStatsFile | string | -- | <String> | a3+80 | 8 | File for --compiler-stats output. Requires --compiler-stats |
| fdevice-time-trace | timeTraceFile | string | -- | <String> | a3+88 | 8 | Input trace JSON file for Chrome trace format output |
| use-trace-pid | use-trace-pid | int64 | -- | <N> | a3+96 | 8 | PID for flamechart generation. Requires --fdevice-time-trace |
| ftrace-phase-after | ftracePhaseAfter | string | -- | <String> | a3+104 | 8 | Phase name for ftrace when ptxas invoked as library |
| verbose | v | bool | -- | -- | a3+18 | 1 | Print code generation statistics |
| profile-options | po | string | "" | "" | a3+296 | 8 | Profile-specific options |

Code Generation Options

| Long Name | Short | Type | Default | Placeholder | State Offset | Size | Description |
|---|---|---|---|---|---|---|---|
| dont-merge-basicblocks | no-bb-merge | bool | -- | -- | a3+16 | 1 | Prevent basic block merging. Useful for debuggable code at slight performance cost |
| return-at-end | ret-end | bool | -- | -- | a3+17 | 1 | Preserve final return instruction. Needed for breakpoints at function end |
| disable-smem-reservation | disable-smem-reservation | bool | false | <true\|false> | a3+234 | 1 | Disable shared memory reservation. Arch-gated: rejected for SM < sm_100 |
| force-rela | force-rela | bool | -- | -- | a3+569 | 1 | Force RELA relocations instead of REL |
| position-independent-code | pic | bool | false | <true\|false> | a3+606 | 1 | Generate PIC. Enabled by default for whole-program compilation |
| preserve-relocs | preserve-relocs | bool | -- | -- | a3+464 | 1 | Generate relocatable variable references and preserve relocations in linked executable |
| force-externals | fext | bool | -- | -- | (via ID) | 1 | Generate device shadow variables as externals instead of statics (debug flow) |
| make-errors-visible-at-exit | (same) | bool | -- | -- | a3+344 | 1 | Generate instructions at exit to make memory faults visible |
| set-texmode-raw | set-texmode-raw | bool | false | <true\|false> | a3+599 | 1 | Set texture mode to raw (internal) |
| assume-extern-functions-do-not-sync | (same) | bool | true | <true\|false> | a3+576 | 1 | Assume extern functions do not synchronize. Rejected for SM < sm_90 |

Synchronization and Warp Options

| Long Name | Short | Type | Default | Placeholder | State Offset | Size | Description |
|---|---|---|---|---|---|---|---|
| legacy-bar-warp-wide-behavior | (same) | bool | -- | -- | a3+402 | 1 | Retain legacy behavior where any thread executing bar counts as whole warp. Ignored for sm_70+. Deprecated |
| no-membermask-overlap | (same) | bool | false | <true\|false> | a3+642 | 1 | Assert no sync instruction uses different overlapping masks |
| membermask-overlap | (same) | bool | true | <true\|false> | a3+643 | 1 | Assert sync instructions may use overlapping masks |
| print-potentially-overlapping-membermasks | (same) | bool | -- | -- | a3+644 | 1 | Print locations of sync instructions with potentially overlapping masks. SM70-75 only |
| blocks-are-clusters | (same) | bool | false | <true\|false> | a3+665 | 1 | Treat thread blocks as clusters. Rejected for SM < sm_100 |

Sanitizer Options

| Long Name | Short | Type | Default | Placeholder | State Offset | Size | Description |
|---|---|---|---|---|---|---|---|
| sanitize | sanitize | string | -- | <string> | a3+648 | 8 | Instrumentation sanitizer. Allowed: memcheck, threadsteer. Incompatible with --compile-as-tools-patch |

Concurrency Options

| Long Name | Short | Type | Default | Placeholder | State Offset | Size | Description |
|---|---|---|---|---|---|---|---|
| split-compile | (same) | int | -- | <N> | a3+284 | 4 | Max concurrent threads for compiler optimizations. 0 = CPU count, 1 = ignored |
| jobserver | jobserver | bool | -- | -- | a3+609 | 1 | Enable GNU Jobserver support |

PTX Input Options

| Long Name | Short | Type | Default | Placeholder | State Offset | Size | Description |
|---|---|---|---|---|---|---|---|
| input-as-string | ias | string | -- | <ptx string> | (via list) | 8 | Pass PTX as string instead of file. For runtime support or avoiding filesystem |
| okey | ok | special | -- | -- | a3+528 | 4 | Deobfuscation key for obfuscated PTX input |
| ptx-length | ptxlen | special | -- | -- | a3+536 | 4 | Length of obfuscated PTX string. Requires --okey and vice versa |
| key | k | string | key | <string> | (via list) | 8 | Hash value representing compiled device code |

Output Options

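| Long Name | Short | Type | Default | Placeholder | State Offset | Size | Description |
|---|---|---|---|---|---|---|---|
| output-file | o | string | elf.o | <file> | a3+48 | 8 | Output file path |
| list-version | version-ls | bool | -- | -- | -- | -- | Print supported PTX ISA versions and exit |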

Video and Memory Options

| Long Name | Short | Type | Default | Placeholder | State Offset | Size | Description |
|---|---|---|---|---|---|---|---|
| disable-fast-video-emulation | (same) | bool | false | <true\|false> | a3+613 | 1 | Disable fast video emulation |
| enable-extended-smem | (same) | bool | false | <true\|false> | a3+666 | 1 | Enable extended shared memory (internal) |
| reserve-null-pointer | (same) | bool | -- | -- | (via v126) | 1 | Reserve address 0 as nil pointer |
| dont-reserve-null-pointer | (same) | bool | -- | -- | (via v127) | 1 | Do not reserve address 0. Overrides reserve-null-pointer regardless of order |

CUDA API Options

| Long Name | Short | Type | Default | Placeholder | State Offset | Size | Description |
|---|---|---|---|---|---|---|---|
| cuda-api-version | (same) | string | -- | <major>.<minor> | a3+580, a3+584 | 4+4 | CUDA API version. Parsed as %u.%u, stored as major/minor at separate offsets |
| nv-host | (same) | string | -- | <file> | a3+672 | 8 | Path to nv.host file. Incompatible with --extensible-whole-program and --compile-only |

Workaround Flags (Bug-Fix Switches)

These are numbered software-workaround flags corresponding to internal NVIDIA bug IDs. They allow targeted behavior changes without modifying the broader optimization pipeline.

| Long Name | Short | Default | State Offset | Size | Arch Gate |
|---|---|---|---|---|---|
| sw2614554 | sw2614554 | false | a3+235 | 1 | -- |
| sw2837879 | sw2837879 | false | a3+236 | 1 | -- |
| sw1729687 | sw1729687 | false | a3+498 | 1 | SM 87-89 only (v23 in range 14-16) |
| sw200428197 | sw200428197 | false | a3+500 | 1 | SM > 75 only (v23 > 18) |
| sw200387803 | sw200387803 | false | a3+501 | 1 | -- (info-level diagnostic) |
| sw200764156 | sw200764156 | true | a3+502 | 1 | SM 89 only (v23 == 24) |
| sw4575628 | sw4575628 | false | a3+237 | 1 | SM >= sm_100 only (v23 > 26) |
| sw4915215 | sw4915215 | false | a3+664 | 1 | SM family 100 only. Rejected for virtual archs |
| sw4936628 | sw4936628 | false | a3+503 | 1 | -- |
| cuda32f3056bbb | cudasw32f3056bbb | false | a3+667 | 1 | -- |

Utility Options

| Long Name | Short | Type | Default | Description |
|---|---|---|---|---|
| help | h | bool | -- | Print help and exit |
| version | V | bool | -- | Print version and exit |
| options-file | optf | file-list | -- | Include options from specified file |
| tool-name | tool-name | string | -- | Change internal tool name |
| trap-into-debugger | _trap_ | bool | -- | Install signal handlers for assertion/crash traps |
| uumn | uumn | bool | -- | (Internal undocumented flag) |
| fdcmpt | fdcmpt | bool | -- | (Internal compatibility flag, gated by SM and uumn) |

Option Extraction and Validation (sub_1104950)

After sub_1103030 parses the argv, sub_1104950 calls sub_42E390 for each option to copy the parsed value into the compiler-state structure at a3. The extraction follows a strict order and includes validation logic:

Dependency Validation

sub_1104950 enforces mutual exclusion and dependency rules between options. These checks use sub_42E580 (was-specified) to distinguish "explicitly set" from "has default value":

| Rule | Diagnostic |
|---|---|
| --compiler-stats-file without --compiler-stats | Warning; a3+80 zeroed |
| --use-trace-pid without --fdevice-time-trace | Warning; a3+96 zeroed |
| --ftrace-phase-after without --fdevice-time-trace | Warning; a3+104 zeroed |
| --register-usage-level > 10 | Warning; reset to 5 |
| --register-usage-level with -O0 | Warning; reset to 5 |
| --compile-only + --fast-compile | Warning; fast-compile disabled |
| --compile-only + --extensible-whole-program | Warning; ewp disabled |
| --nv-host + --extensible-whole-program or --compile-only | Warning; nv-host zeroed |
| --device-debug + --generate-line-info | Warning; lineinfo disabled, sw2614554 cleared |
| force-load-cache + def-load-cache | Warning; def-load-cache zeroed |
| force-store-cache + def-store-cache | Warning; def-store-cache zeroed |
| --compile-as-tools-patch + --extensible-whole-program | Warning; ewp disabled |
| --compile-as-tools-patch + --compile-only | Warning; compile-only disabled |
| --compile-as-tools-patch + --fast-compile | Warning; fast-compile disabled |
| --sanitize + --compile-as-tools-patch | Warning; sanitize zeroed |
| device-function-maxrregcount in whole-program mode | Ignored with warning |
| --okey without --ptx-length | Error |
| --ptx-length without --okey | Error (unused flag warning) |
| --blocks-are-clusters for SM < sm_100 | Warning; flag zeroed |
| --disable-smem-reservation for SM < sm_100 | Warning; flag zeroed |
| --assume-extern-functions-do-not-sync for SM < sm_90 | Warning; flag zeroed |
| --print-potentially-overlapping-membermasks for SM > sm_75 | Warning; flag zeroed |
| --preserve-relocs for SM >= sm_100 with smem-reservation | Info diagnostic; flag zeroed |
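
A handful of these rules can be modeled as below. The sketch assumes a dictionary keyed by long option name in place of the a3 structure and a set of explicitly specified names in place of sub_42E580; the function name and the diagnostic strings are hypothetical, and only four of the rules are covered.

```python
from typing import Any, Dict, List, Set


def validate_options(state: Dict[str, Any], specified: Set[str]) -> List[str]:
    """Apply a few dependency rules; mutates state and returns diagnostics."""
    diags: List[str] = []

    # --compiler-stats-file without --compiler-stats: warn and zero the path.
    if "compiler-stats-file" in specified and "compiler-stats" not in specified:
        diags.append("warning: --compiler-stats-file requires --compiler-stats")
        state["compiler-stats-file"] = None

    # --register-usage-level above 10, or combined with -O0: reset to 5.
    if "register-usage-level" in specified:
        level = state.get("register-usage-level", 5)
        if level > 10 or state.get("opt-level") == 0:
            diags.append("warning: --register-usage-level reset to 5")
            state["register-usage-level"] = 5

    # --compile-as-tools-patch together with --compile-only: drop compile-only.
    if state.get("compile-as-tools-patch") and state.get("compile-only"):
        diags.append("warning: --compile-only disabled by --compile-as-tools-patch")
        state["compile-only"] = False

    # --okey and --ptx-length must be given together.
    if ("okey" in specified) != ("ptx-length" in specified):
        diags.append("error: --okey and --ptx-length must be given together")

    return diags
```

Keeping the "was specified" set separate from the values is what lets a rule tell an explicit `--register-usage-level=5` apart from the default of 5.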

Ofast-Compile Interaction

The Ofast-compile level has cascading effects on other options. sub_1104950 validates the level string (max, mid, min, 0) and enforces:

| Level | Effect on -O | Effect on cloning | Effect on sw2614554 | Other |
|---|---|---|---|---|
| max | Forces -O0 | Forced off | Forced on | -- |
| mid | Forces -O1 | Forced off | Forced on if no split-compile | a3+572 = 1 |
| min | Forces -O1 | Forced off | Forced on if no split-compile | a3+572 = 1; allow-expensive-optimizations forced on |
| 0 | No change | No change | No change | -- |
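
The cascade reads naturally as a small function over the option state. This is a sketch under stated assumptions: the state is a plain dictionary, the unnamed byte at a3+572 is kept under the literal key "a3+572", and the function name is hypothetical.

```python
from typing import Any, Dict


def apply_ofast_compile(
    level: str, state: Dict[str, Any], split_compile_specified: bool
) -> Dict[str, Any]:
    """Return a copy of state with the Ofast-compile cascade applied."""
    if level not in ("max", "mid", "min", "0"):
        raise ValueError(f"invalid Ofast-compile level: {level!r}")

    result = dict(state)
    if level == "0":
        return result  # disabled: nothing changes

    result["cloning"] = False
    if level == "max":
        result["opt-level"] = 0
        result["sw2614554"] = True
    else:  # "mid" or "min"
        result["opt-level"] = 1
        if not split_compile_specified:
            result["sw2614554"] = True
        result["a3+572"] = 1
        if level == "min":
            result["allow-expensive-optimizations"] = True
    return result
```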

maxrregcount Resolution

The maxrregcount value goes through a multi-step resolution in sub_1104950:

  1. If the string is "archmax", use *(arch_profile + 96) -- the architecture's maximum register count.
  2. If the string is "archmin", use *(arch_profile + 100) -- the architecture's minimum register count.
  3. Otherwise, parse as integer via strtoll.
  4. If --compile-as-tools-patch is active and the value exceeds the tools-patch minimum (8 * (SM >= 17) + 16), clamp and warn.
  5. If the integer exceeds the architecture maximum, clamp to *(arch_profile + 96) and warn.
  6. If the integer is below the ABI minimum for the profile, bump up and warn.
  7. Store the final value at a3+136.

The same resolution applies to device-function-maxrregcount, stored at a3+140.
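
The resolution steps can be sketched as follows. The ArchProfile fields stand in for the values read at arch_profile + 96 and + 100; the abi_min_regs field is hypothetical (the text does not say where the ABI minimum comes from), sm_index is taken to be the internal version number compared against 17 in step 4, and Python's int() is stricter than strtoll on malformed input.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ArchProfile:
    max_regs: int      # *(arch_profile + 96)
    min_regs: int      # *(arch_profile + 100)
    abi_min_regs: int  # hypothetical: source of the ABI minimum is not documented


def resolve_maxrregcount(
    text: str, profile: ArchProfile, sm_index: int, tools_patch: bool = False
) -> Tuple[int, List[str]]:
    """Return (final register limit, warnings) for a maxrregcount string."""
    warnings: List[str] = []

    # Steps 1-3: keyword or integer.
    if text == "archmax":
        value = profile.max_regs
    elif text == "archmin":
        value = profile.min_regs
    else:
        value = int(text, 10)

    # Step 4: tools-patch mode clamps to 8 * (SM >= 17) + 16.
    if tools_patch:
        limit = 8 * int(sm_index >= 17) + 16
        if value > limit:
            warnings.append(f"clamped to tools-patch limit {limit}")
            value = limit

    # Step 5: never exceed the architecture maximum.
    if value > profile.max_regs:
        warnings.append(f"clamped to architecture maximum {profile.max_regs}")
        value = profile.max_regs

    # Step 6: never go below the ABI minimum.
    if value < profile.abi_min_regs:
        warnings.append(f"bumped to ABI minimum {profile.abi_min_regs}")
        value = profile.abi_min_regs

    # Step 7: the caller would store this at a3+136 (or a3+140).
    return value, warnings
```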

SM Version Derivation

sub_1104950 calls sub_15C3DD0 to convert the gpu-name string (e.g. "sm_90") to an internal numeric version (v23). This integer is used throughout for architecture-gated validation:

| v23 Range | Architecture | Notes |
|---|---|---|
| 7-10 | SM 50-70 (Maxwell-Volta) | Limited feature support |
| 11-16 | SM 75-89 (Turing-Ada) | sw1729687 range: 14-16 |
| 17-20 | SM 90-90a (Hopper) | ABI mode threshold |
| 21-26 | SM 100-103 (Blackwell) | smem-reservation enabled |
| 27+ | SM 110+ (future) | Full feature set |

The family lookup table at dword_1EED2E0 maps (v23 - 7) to an architecture family code (e.g. 100 for Blackwell). This family code gates sw4915215 (only family 100) and disable-smem-reservation behavior.
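
The range table translates directly into a lookup. The sketch below only restates the ranges listed above, which are themselves inferred from conditional branches; the contents of dword_1EED2E0 are not reproduced, and the function names are hypothetical.

```python
from typing import Optional

# (low, high, label) triples copied from the v23 range table.
V23_RANGES = [
    (7, 10, "SM 50-70 (Maxwell-Volta)"),
    (11, 16, "SM 75-89 (Turing-Ada)"),
    (17, 20, "SM 90-90a (Hopper)"),
    (21, 26, "SM 100-103 (Blackwell)"),
]


def classify_v23(v23: int) -> Optional[str]:
    """Map an internal version number to its architecture band, if any."""
    for low, high, label in V23_RANGES:
        if low <= v23 <= high:
            return label
    if v23 >= 27:
        return "SM 110+ (future)"
    return None  # below 7: not covered by the table


def sw1729687_in_range(v23: int) -> bool:
    """Gate noted in the table: sw1729687 applies only for v23 in 14-16."""
    return 14 <= v23 <= 16
```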

Compiler-State Structure Layout

The compiler-state structure populated by sub_1104950 is at least 680 bytes, accessed through base pointer a3. Key regions:

| Offset Range | Contents |
|---|---|
| 0-8 | Parsed argv (post-processing) |
| 16-17 | dont-merge-basicblocks, return-at-end |
| 18-19 | verbose, virtual-arch flag |
| 24-40 | Input file lists (PTX files, string inputs) |
| 48-56 | Output file path |
| 72-112 | Profiling: compiler-stats, time-trace, trace-pid, ftrace-phase |
| 136-144 | maxrregcount (entry), maxrregcount (device-function) |
| 148-168 | opt-level, Ofast-compile string, register-usage-level, machine, fastimul |
| 170-176 | lineinfo, link-info |
| 192-200 | gpu-name string pointer |
| 232-237 | cimm/disable-opt-consts, no-fastreg, disable-smem-reservation, sw2614554/2837879/4575628 |
| 280-296 | machine bits, split-compile, device-debug, lineinfo, sp-bounds-check, profile-options |
| 304-324 | Cache policies (load/store def/force), suppress-double-demote, opt-pointers, warning-as-error, disable-warnings |
| 326-358 | ABI mode flags, compile-only, compile-as-tools-patch, make-errors-visible, device-stack-protector, slr, various mode flags |
| 340-344 | limit-fold-fp, optimize-float-atomics, compile-only, compile-as-tools-patch, make-errors-visible-at-exit |
| 402-409 | legacy-bar, allow-expensive-optimizations, reserve-null-pointer |
| 440-504 | Profile path, entry list, compile-functions list, preserve-relocs, warn-lmem/spills/double, fast-compile, maxntid dims, minnctapersm, override-directive-values, sw1729687-sw200764156, sw4936628, suppress-stack-size-warning, extensible-whole-program |
| 505-576 | extensible-whole-program, uumn, noFwdPrg, force-rela, assume-extern-functions-do-not-sync |
| 580-612 | CUDA API version major/minor, smem-reservation flags, PIC, disable-fast-video-emulation, suppress-async/sparse warnings, query-controls, apply-controls, verbose-tkinfo |
| 613-680 | disable-fast-video-emulation, advisory suppressions, query/apply-controls, membermask flags, g/gno-tensor-memory-access-check, compiler-annotations, sanitize, blocks-are-clusters, enable-extended-smem, cuda32f3056bbb, sw4915215, nv-host |
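
To make the offsets concrete, the sketch below writes three fields into a 680-byte buffer at the offsets given in the catalog (opt-level at 148, device-debug at 288, cuda-api-version at 580/584). Little-endian byte order is an assumption based on an x86-64 host, and the helper names are hypothetical.

```python
import re
import struct

STATE_SIZE = 680  # minimum structure size stated above


def store_u8(state: bytearray, offset: int, value: int) -> None:
    """Write a 1-byte field (bool options)."""
    struct.pack_into("<B", state, offset, int(value))


def store_i32(state: bytearray, offset: int, value: int) -> None:
    """Write a 4-byte signed field (int options)."""
    struct.pack_into("<i", state, offset, value)


def store_cuda_api_version(state: bytearray, text: str) -> None:
    """Parse "<major>.<minor>" (the %u.%u form) into offsets 580 and 584."""
    match = re.fullmatch(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"expected <major>.<minor>, got {text!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    struct.pack_into("<II", state, 580, major, minor)


state = bytearray(STATE_SIZE)
store_i32(state, 148, 3)        # opt-level
store_u8(state, 288, True)      # device-debug
store_cuda_api_version(state, "13.0")
```

Reading a field back is the mirror operation, for example `struct.unpack_from("<II", state, 580)` for the version pair.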

Interaction with the Compilation Driver

sub_1112F30 (the main compilation driver) reads options from the compiler-state structure populated by sub_1104950. Key interactions:

  • Cache policies at offsets 304-316 configure the codegen's default and forced load/store cache modifiers. These flow into sub_110AA30 (per-function codegen init) which reads them to set up instruction-level cache annotations.
  • compile-as-tools-patch (offset 343) triggers a special mode where textures, surfaces, samplers, and constants are handled differently. maxrregcount is clamped to ABI minimum.
  • position-independent-code (offset 606) enables PIC code generation. The driver reads this along with compile-only (342) and extensible-whole-program (505) to determine the compilation model.
  • device-debug (offset 288) causes sub_1102070 to be called, which configures debug-mode codegen. When device-debug is set, verbose-tkinfo is auto-enabled.
  • opt-level 0 at offset 148 forces sp-bounds-check on and sw2614554 off, overriding explicit settings.
  • split-compile (offset 284) configures the thread pool created by sub_464AE0 for concurrent per-function compilation.

Feature Flag Mapping (sub_1100E50)

After option parsing, sub_1100E50 translates selected options into a feature-flag bitfield consumed by the codegen backend. It reads the SM version and option bytes from the compiler-state structure, then calls sub_16E3AA0(feature_table, feature_id, bool_value) for approximately 30 feature flags:

| Feature ID | Source | Condition |
|---|---|---|
| 3 | SM version | SM > 7 (sm_60+) |
| 4 | Option flag 989 | Explicit option |
| 5 | SM version | SM > 10 (sm_75+) |
| 7 | Option flag 739 | Explicit option |
| 8 | Option flag 944 | Explicit option |
| 9 | extensible-whole-program (889) | Explicit option |
| 10 | Option flag 784 | Explicit option |
| 18 | Option flag 997 | Explicit option |
| 19 | compile-as-tools-patch (727) | Explicit option |
| 20 | Option flag 1026 | Explicit option |
| 21 | Option flag 1027 | Explicit option |
| 33 | SM version + debug | SM == 29 or 30 (sm_89/90) and not flag 618 |

Feature names are retrieved through sub_12B5EF0 and stored as "true"/"false" key-value pairs via sub_448E70 for diagnostic purposes.
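
A reduced model of this mapping, covering only the four rows whose source is named in the table (features 3, 5, 9, and 19), might look like the sketch below. The function names and the dictionary-based option state are hypothetical, and sm_index denotes the internal version number used in the conditions.

```python
from typing import Any, Dict


def derive_feature_flags(sm_index: int, options: Dict[str, Any]) -> Dict[int, bool]:
    """Compute a subset of feature flags from SM version and option bytes."""
    return {
        3: sm_index > 7,    # SM-version gated
        5: sm_index > 10,   # SM-version gated
        9: bool(options.get("extensible-whole-program")),
        19: bool(options.get("compile-as-tools-patch")),
    }


def as_diagnostic_pairs(flags: Dict[int, bool]) -> Dict[int, str]:
    """Render flags as "true"/"false" strings, as done for diagnostics."""
    return {fid: "true" if value else "false" for fid, value in flags.items()}
```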

Confidence Assessment

Aspect-Level Confidence

| Aspect | Confidence | Basis |
|---|---|---|
| Builder function address (0x1103030) | HIGH | File decompiled/sub_1103030_0x1103030.c exists (1,249 lines); 108 sub_42F130 registration calls enumerated |
| Parser function address (0x1104950) | HIGH | File decompiled/sub_1104950_0x1104950.c exists (1,208 lines); 138 sub_42E390/sub_42E580 extraction calls verified |
| Compilation driver (0x1112F30) | HIGH | File decompiled/sub_1112F30_0x1112f30.c exists |
| Feature flag configurator (0x1100E50) | HIGH | File decompiled/sub_1100E50_0x1100e50.c exists |
| State offsets (a3+N) | HIGH | Each offset read directly from sub_42E390 calls in decompiled sub_1104950; byte-level accuracy |
| Type codes, multiplicity, defaults | HIGH | Read from sub_42F130 positional parameters in decompiled sub_1103030 |
| Option count (108 options) | HIGH | Exhaustive enumeration of sub_42F130 calls in sub_1103030 (excluding options registered only in PTX-frontend context) |
| Dependency validation rules | HIGH | Each rule corresponds to a specific if-block in sub_1104950 with sub_42E580 (was-specified) checks |
| Ofast-compile cascading effects | HIGH | Control flow in sub_1104950 explicitly forces -O0/-O1 and cloning off based on string comparison |
| SM version derivation table | MEDIUM | v23 range-to-architecture mapping inferred from conditional branches; exact SM-to-v23 mapping not exhaustively enumerated |
| Feature flag mapping (sub_1100E50) | MEDIUM | Feature IDs and source conditions read from decompiled code; feature names not directly visible (retrieved via sub_12B5EF0 at runtime) |

Per-Option Verification

Each row below records a direct sub_42F130(..., "<long-name>", ...) call in decompiled/sub_1103030_0x1103030.c. The "Evidence" column identifies where the long-name string lives in nvlink_strings.json or, when the name only appears as a fragment of a longer help/format string, indicates substring. Three options (uumn, cimm, slr) are direct literals in the decompiler output but have no visible standalone entry in the string-table dump -- the backing storage is deduped/overlapped in .rodata but the C-string MUST exist to satisfy the call. All 108 options are additionally verified via their extraction sites in sub_1104950 (sub_42E390 calls writing to a3+N). The "Shared" column marks options that also appear in the ptxas CLI Options page (the standalone ptxas tool), indicating the option surface is shared between the embedded and standalone ptxas binaries.

| Option | Confidence | Evidence | Shared w/ ptxas |
|---|---|---|---|
| suppress-stack-size-warning | HIGH | string at 0x1d32450 | yes |
| key | HIGH | substring in help/format strings | no |
| okey | HIGH | substring in help/format strings | no |
| ptx-length | HIGH | substring in help/format strings | yes |
| entry | HIGH | substring in help/format strings | yes |
| compile-functions | HIGH | string at 0x1ee929f | yes |
| input-as-string | HIGH | substring in help/format strings | yes |
| verbose | HIGH | string at 0x1d3256e | yes |
| list-version | HIGH | string at 0x1ee92c9 | yes |
| uumn | HIGH | direct literal in sub_1103030 line 130; no standalone string entry | no |
| warn-on-local-memory-usage | HIGH | string at 0x1ee92e6 | yes |
| warn-on-spills | HIGH | string at 0x1ee930d | yes |
| warn-on-double-precision-use | HIGH | string at 0x1ee932e | yes |
| compiler-stats | HIGH | string at 0x1ee9359 | yes |
| compiler-stats-file | HIGH | string at 0x1ee9383 | yes |
| fdevice-time-trace | HIGH | string at 0x1ee93a5 | yes |
| use-trace-pid | HIGH | string at 0x1ee93b8 | yes |
| ftrace-phase-after | HIGH | string at 0x1ee93d7 | yes |
| dont-merge-basicblocks | HIGH | string at 0x1ee93f6 | yes |
| return-at-end | HIGH | string at 0x1ee9415 | yes |
| cimm | HIGH | direct literal in sub_1103030 line 277; no standalone string entry | no |
| disable-optimizer-constants | HIGH | string at 0x1ee9441 | yes |
| no-fastreg | HIGH | string at 0x1ee945d | yes |
| disable-smem-reservation | HIGH | string at 0x1d3259d | yes |
| maxrregcount | HIGH | substring in help/format strings (embedded in -maxrregcount=%d at 0x1d33ec6 and func-maxrregcount at 0x1ee9496) | yes |
| minnctapersm | HIGH | substring in help/format strings | yes |
| maxntid | HIGH | substring in help/format strings | yes |
| override-directive-values | HIGH | string at 0x1ee947c | yes |
| device-function-maxrregcount | HIGH | string at 0x1ee94a8 | yes |
| register-usage-level | HIGH | string at 0x1ee912b | yes |
| device-debug | HIGH | substring in help/format strings | yes |
| suppress-debug-info | HIGH | substring in help/format strings | yes |
| generate-line-info | HIGH | substring in help/format strings | yes |
| sp-bounds-check | HIGH | string at 0x1ee94d3 | yes |
| device-stack-protector | HIGH | string at 0x1d32891 | yes |
| device-stack-protector-frame-size-threshold | HIGH | string at 0x1d33b68 | yes |
| debug-info | HIGH | substring in help/format strings | no |
| link-info | HIGH | string at 0x1ee94ff | yes |
| opt-level | HIGH | substring in help/format strings | yes |
| Ofast-compile | HIGH | substring in help/format strings | yes |
| fastimul | HIGH | string at 0x1ee9509 | no |
| output-file | HIGH | string at 0x1d32482 | yes |
| gpu-name | HIGH | string at 0x1ee9534 | yes |
| suppress-double-demote-warning | HIGH | string at 0x1eeb108 | yes |
| force-externals | HIGH | string at 0x1ee954d | yes |
| profile-options | HIGH | string at 0x1ee955d | yes |
| abi-compile | HIGH | string at 0x1ee958a | yes |
| def-load-cache | HIGH | string at 0x1ee95a1 | yes |
| def-store-cache | HIGH | string at 0x1ee95b5 | yes |
| force-load-cache | HIGH | string at 0x1ee95ca | yes |
| force-store-cache | HIGH | string at 0x1ee95e0 | yes |
| machine | HIGH | string at 0x1d324b2 | yes |
| opt-pointers | HIGH | string at 0x1ee95f5 | yes |
| warning-as-error | HIGH | string at 0x1d32625 | yes |
| disable-warnings | HIGH | string at 0x1d325f0 | yes |
| cloning | HIGH | string at 0x1ee917b | yes |
| compile-only | HIGH | substring in help/format strings | yes |
| compile-as-tools-patch | HIGH | substring in help/format strings | yes |
| slr | HIGH | direct literal in sub_1103030 line 734; no standalone string entry | no |
| optimize-float-atomics | HIGH | string at 0x1ee962b | yes |
| preserve-relocs | HIGH | substring in help/format strings | yes |
| make-errors-visible-at-exit | HIGH | string at 0x1ee9642 | yes |
| reserve-null-pointer | HIGH | substring in help/format strings | no |
| dont-reserve-null-pointer | HIGH | string at 0x1d32583 | yes |
| fast-compile | HIGH | substring in help/format strings | yes |
| sw2614554 | HIGH | substring in help/format strings | yes |
| sw2837879 | HIGH | substring in help/format strings | yes |
| sw1729687 | HIGH | substring in help/format strings | yes |
| sw200428197 | HIGH | substring in help/format strings | yes |
| sw200387803 | HIGH | substring in help/format strings | yes |
| sw200764156 | HIGH | substring in help/format strings | yes |
| sw4575628 | HIGH | substring in help/format strings | yes |
| set-texmode-raw | HIGH | string at 0x1ee96e3 | yes |
| fdcmpt | HIGH | substring in help/format strings | yes |
| cuda-api-version | HIGH | substring in help/format strings | yes |
| noFwdPrg | HIGH | substring in help/format strings | no |
| assume-extern-functions-do-not-sync | HIGH | string at 0x1eeb6e8 | yes |
| legacy-bar-warp-wide-behavior | HIGH | string at 0x1ee96f3 | yes |
| disable-fast-video-emulation | HIGH | string at 0x1ee9711 | yes |
| suppress-async-bulk-multicast-advisory-warning | HIGH | string at 0x1eeb8c0 | yes |
| suppress-sparse-mma-advisory-info | HIGH | string at 0x1eeb928 | yes |
| limit-fold-fp | HIGH | string at 0x1ee974b | yes |
| sanitize | HIGH | string at 0x1ee9759 | yes |
| split-compile | HIGH | substring in help/format strings (embedded in -split-compile=%d at 0x1d3229b) | yes |
| jobserver | HIGH | string at 0x1ee9777 | yes |
| fmad | HIGH | substring in help/format strings | yes |
| allow-expensive-optimizations | HIGH | string at 0x1ee979f | yes |
| extensible-whole-program | HIGH | substring in help/format strings | yes |
| force-rela | HIGH | string at 0x1d326c2 | yes |
| position-independent-code | HIGH | string at 0x1ee97c1 | yes |
| verbose-tkinfo | HIGH | string at 0x1d32744 | yes |
| no-membermask-overlap | HIGH | substring in help/format strings | yes |
| membermask-overlap | HIGH | substring in help/format strings | yes |
| print-potentially-overlapping-membermasks | HIGH | string at 0x1eebe68 | yes |
| g-tensor-memory-access-check | HIGH | string at 0x1ee97ef | yes |
| gno-tensor-memory-access-check | HIGH | string at 0x1eec010 | yes |
| compiler-annotations | HIGH | string at 0x1ee982b | yes |
| sw4915215 | HIGH | substring in help/format strings | yes |
| sw4936628 | HIGH | string at 0x1ee9851 | yes |
| blocks-are-clusters | HIGH | substring in help/format strings | yes |
| enable-extended-smem | HIGH | string at 0x1d328e8 | yes |
| cuda32f3056bbb | HIGH | string at 0x1ee986c | no |
| nv-host | HIGH | substring in help/format strings (embedded in --nv-host at 0x1eec1bd) | no |
| tool-name | HIGH | string at 0x1d3291b | yes |
| help | HIGH | substring in help/format strings | yes |
| version | HIGH | substring in help/format strings | yes |
| options-file | HIGH | string at 0x1d3293b | yes |
| trap-into-debugger | HIGH | string at 0x1d3294f | yes |

Summary: 108 options enumerated. 105/108 have direct string evidence in nvlink_strings.json (either exact or embedded as a substring in a longer help/format string). 3/108 (uumn, cimm, slr) appear only as direct string literals in the decompiled sub_1103030 calls -- these are short 3-4 character names whose storage is deduped or tail-overlapped with longer strings, invisible to a naive standalone-string dumper but provably present at the call site. 97/108 options (90%) are shared with the standalone ptxas tool documented in the ptxas CLI Options page, confirming that the embedded ptxas in nvlink is substantially the same option-parsing codebase as the separate binary. The table marks the remaining 11 options as not shared: key and okey from the PTX-input group, plus uumn, cimm, fastimul, slr, reserve-null-pointer, debug-info, nv-host, noFwdPrg, and cuda32f3056bbb, which are either obsolete debugging flags or NVIDIA-internal undocumented behavior switches.