Linker Script Generation
nvlink can generate GNU ld linker scripts that instruct the host linker to preserve CUDA-specific ELF sections during host linking. When nvcc compiles a CUDA program, device code is embedded in the host object files inside special sections. Without a linker script that names these sections, the host linker silently discards them. The -ghls (--gen-host-linker-script) option activates this code path, which bypasses the entire device linking pipeline and instead produces a linker script fragment (or a complete augmented script) and exits.
This feature exists because nvcc's driver needs a linker script at host link time. Rather than shipping a static script, nvcc invokes nvlink -ghls to generate one dynamically, accounting for the host toolchain's default script and architecture-specific flags.
| Property | Value |
|---|---|
| CLI option | -ghls / --gen-host-linker-script with values lcs-aug or lcs-abs |
| Mode variable | dword_2A77DC0 at 0x2A77DC0 (values 1 or 2) |
| Parsed value storage | qword_2A5F1D0 (pointer to the option string) |
| Default value | lcs-abs (when -ghls is given without an argument) |
| Implementation | main() at 0x409800, lines 1743-1936 of the decompiled output |
| Shell execution | sub_42FA70 at 0x42FA70 (a system() wrapper) |
| Template size | 130 bytes (0x82) |
| Template rodata address | 0x1d34450 (shared by all three fwrite call sites) |
| Host compiler | ::src global (populated from --host-ccbin, default "gcc") |
| Shared flag | byte_2A5F1D8 (--shared was seen) |
| Relocatable flag | byte_2A5F1E8 (-r was seen) |
| Machine width | dword_2A5F30C (32 or 64, or 0 if unset) |
| Verbose flag | byte_2A5F2D8 (--verbose enables #$ trace) |
| Xlinker list head | qword_2A5F2E8 (linked list of pre-composed ld flags) |
The SECTIONS Template
All code paths share a single hardcoded 130-byte string that defines three CUDA-specific host ELF sections:
SECTIONS
{
.nvFatBinSegment : { *(.nvFatBinSegment) }
__nv_relfatbin : { *(__nv_relfatbin) }
.nv_fatbin : { *(.nv_fatbin) }
}
The fwrite call uses size 1 and count 0x82 (130), which is the exact byte length of this string excluding the null terminator. The string is referenced from three separate fwrite calls in the binary, all pointing to the same data address.
Note the trailing space after *(__nv_relfatbin) } on the second section line -- this is present in the binary and is written verbatim.
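As a quick check of the 130-byte figure, the shell sketch below reproduces the template with printf and counts its bytes. The layout (a tab before each section line, one trailing space after the second) follows the string quoted in the Confidence Assessment table; the helper name cuda_sections_template is illustrative and is not an nvlink symbol.

```shell
# Emit the SECTIONS template byte-for-byte as quoted from rodata.
# cuda_sections_template is a hypothetical helper name.
cuda_sections_template() {
    printf 'SECTIONS\n{\n'
    printf '\t.nvFatBinSegment : { *(.nvFatBinSegment) }\n'
    printf '\t__nv_relfatbin : { *(__nv_relfatbin) } \n'   # trailing space kept
    printf '\t.nv_fatbin : { *(.nv_fatbin) }\n'
    printf '}\n'
}

# Count the bytes; tr strips the padding that some wc implementations add.
cuda_sections_template | wc -c | tr -d ' '   # prints 130
```

The count comes out at 130 only when the trailing space on the second section line is included, which is consistent with the 0x82 count passed to fwrite.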
The Three Sections
| Section name | ELF convention | Description |
|---|---|---|
| .nvFatBinSegment | Standard dotted name | Contains the embedded fatbin blob -- the concatenation of device code compiled for all target GPU architectures. This is the primary container that the CUDA runtime locates at program startup. |
| __nv_relfatbin | Non-dotted (double underscore prefix) | Contains a relocatable reference to the fatbin data. The CUDA runtime's registration mechanism (__cudaRegisterFatBinary) uses this section to locate the fatbin at load time. The section data begins with the fatbin magic 0xBA55ED50 followed by a size field. |
| .nv_fatbin | Standard dotted name | Alternative fatbin container used in certain linking configurations (e.g., relocatable linking with -r). Provides a secondary location for fatbin data when the primary .nvFatBinSegment is not suitable. |
Without these linker script entries, GNU ld treats these as unknown sections and either discards them or merges them incorrectly during host linking. The script ensures they appear as distinct, named output sections in the host executable.
The consumer-side function sub_476D90 at 0x476D90 validates that a host ELF contains these sections by calling sub_476EC0 (a section-name predicate) for each of .nvFatBinSegment, __nv_relfatbin, and .nv_fatbin. It then extracts the __nv_relfatbin data and verifies the fatbin magic at offset 0.
Mode 1: Standalone Fragment (lcs-aug)
Mode 1 is triggered by -ghls=lcs-aug and produces the SECTIONS template as a standalone fragment. The mode variable dword_2A77DC0 is set to 1.
Behavior
When -o is specified, the script is written to the output file in truncate mode ("w"):
// main() line 1830-1848
if (dword_2A77DC0 == 1) {
    if (filename) {
        FILE *f = fopen(filename, "w");
        if (!f)
            fatal_error(&unk_2A5B710, filename, ...);
        fwrite(SECTIONS_TEMPLATE, 1, 0x82, f);
        fclose(f);
        exit(0);
    }
    // fall through to stdout path
}
When -o is not specified, execution falls through to the common stdout path at line 1925, which writes the same template to stdout and exits.
The lcs-aug name stands for "linker-script augmentation." The output is a fragment meant to be appended to an existing linker script by the caller (nvcc), not used as a complete script on its own.
Mode 2: Full Augmented Script (lcs-abs)
Mode 2 is triggered by -ghls=lcs-abs (or -ghls with no argument, since lcs-abs is the default). The mode variable dword_2A77DC0 is set to 2. This mode extracts the host linker's built-in default script, appends the CUDA SECTIONS block, and validates the result. It is a five-step pipeline that shells out to the host compiler (gcc by default), ld, grep, sed, and tr.
Step 1: Build the Host Compiler Verbose Command
Before the mode dispatch, all linker script modes share a command-construction path. The base compiler comes from --host-ccbin (stored in ::src), defaulting to "gcc":
char *cmd = host_ccbin;
if (!host_ccbin)
    cmd = "gcc";
The string " -v --verbose" is appended via a 16-byte SSE store (xmmword_1D34770). Then, depending on the link flags:
- If --shared is active (byte_2A5F1D8): appends " -shared "
- If -r is active (byte_2A5F1E8): appends " -r "
- If --machine=64 (dword_2A5F30C == 64): appends " -m64 "
- If --machine=32 (dword_2A5F30C == 32): appends " -m32 "
The --shared and -r flags are mutually exclusive. The -shared check takes priority (tested first).
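The command construction can be mirrored in a few lines of shell. This is a sketch of the described logic, not the decompiled code: the function name build_verbose_cmd and its positional arguments are hypothetical, and the exact spacing simply follows the string fragments quoted above.

```shell
# Sketch of the Step 1 command builder. Arguments (all hypothetical):
#   $1 = host compiler ("" falls back to gcc)
#   $2 = 1 if --shared was seen, $3 = 1 if -r was seen
#   $4 = machine width (32, 64, or 0 when unset)
build_verbose_cmd() {
    cmd=${1:-gcc}
    cmd="$cmd -v --verbose"
    if [ "$2" = 1 ]; then
        cmd="$cmd -shared "      # tested first, so it wins over -r
    elif [ "$3" = 1 ]; then
        cmd="$cmd -r "
    fi
    if [ "$4" = 64 ]; then
        cmd="$cmd -m64 "
    elif [ "$4" = 32 ]; then
        cmd="$cmd -m32 "
    fi
    printf '%s\n' "$cmd"
}

# Both link flags set: only -shared is emitted.
build_verbose_cmd "" 1 1 64
```

Each appended fragment carries its own leading and trailing space, so consecutive fragments produce a double space in the result; the shell treats that as ordinary word separation.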
Step 2: Build the collect2 Detection Pipeline
The code appends a shell pipeline that extracts linker flags from the compiler's verbose output:
<compiler> -v --verbose [-shared|-r] [-m64|-m32] \
2>&1 | grep collect2 \
| grep -wo -e -pie \
-e "-z [^[:space:]]*" \
-e "-m [^[:space:]]*" \
-e -r \
-e -shared \
| tr "\n" " "
The decompiled string constant (line 1818-1820):
" 2>&1 | grep collect2 | grep -wo -e -pie -e \"-z [^[:space:]]*\" "
"-e \"-m [^[:space:]]*\" -e -r -e -shared | tr \"\\n\" \" \" "
This pipeline works because:
1. gcc -v --verbose prints the complete compiler invocation sequence to stderr, including the internal call to collect2 (GCC's wrapper around ld).
2. grep collect2 isolates the line containing the actual linker invocation.
3. The second grep -wo extracts only the architecture-significant flags: -pie, -z <arg> (e.g., -z relro, -z now), -m <arg> (e.g., -m elf_x86_64), -r, and -shared.
4. tr "\n" " " joins the extracted flags into a single space-separated string.
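To see the filter in isolation, the sketch below feeds it a canned, heavily shortened stand-in for the compiler's verbose output instead of running gcc. The sample text and the function names are made up for illustration; only the grep and tr stages are taken from the stored string.

```shell
# A canned stand-in for `gcc -v --verbose` output; the real text is much longer.
sample_verbose_output() {
    cat <<'EOF'
COLLECT_GCC=gcc
 /usr/lib/gcc/collect2 --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld.so -pie -z now -z relro crt1.o
EOF
}

# The filter stages as stored in the binary, reading from stdin.
extract_link_flags() {
    grep collect2 | grep -wo -e -pie -e "-z [^[:space:]]*" \
        -e "-m [^[:space:]]*" -e -r -e -shared | tr "\n" " "
}

sample_verbose_output | extract_link_flags
echo
```

The flags come out in the order they appear on the collect2 line, joined by spaces, while unrelated options such as --eh-frame-hdr and -dynamic-linker are dropped.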
The entire pipeline is wrapped in $(...) for shell command substitution, so the extracted flags become arguments to the subsequent ld --verbose call:
// Line 1821-1828: wrap in $(...) for substitution
strcpy(wrapper, "$(");
strcat(wrapper, pipeline);
// Append closing ')' -- written as *(_WORD*) = 41 (ASCII ')')
Step 3: Extract the Host Linker Default Script
The command substitution built in step 2 is appended to an ld --verbose invocation, so the extracted flags become its arguments:
ld --verbose $(extracted_flags) \
| grep -Fvx -e "$(ld -V)" \
| sed '1,2d;$d' \
> <output_file>
The decompiled construction (lines 1858-1878):
// Build: "ld --verbose " + collect2_flags
strcpy(buf, "ld --verbose ");
strcat(buf, collect2_flags);
// Append filter pipeline
strcat(buf, " | grep -Fvx -e \"$(ld -V)\" | sed '1,2d;$d' > ");
// Append output destination
if (filename)
    strcat(buf, filename);
else
    strcat(buf, "/dev/stdout"); // hex: 0x6474732F7665642F + "out"
The pipeline steps:
- ld --verbose $(flags) -- When invoked with --verbose, ld prints its built-in default linker script between two === banner lines. The $(flags) substitution passes the architecture flags extracted in step 2, ensuring ld selects the correct default script for the target configuration (e.g., 64-bit, PIE, shared).
- grep -Fvx -e "$(ld -V)" -- Removes the version banner that ld -V outputs (the version line plus the supported-emulations list). The -F flag treats each line of the pattern as a fixed string, -v inverts the match (remove matching lines), and -x requires the entire line to match. This strips ld's version identification from the output.
- sed '1,2d;$d' -- Deletes the first two remaining lines (the "using internal linker script:" label and the opening === banner) and the last line (the closing === banner), leaving just the script body.
- Output -- Written to the file specified by -o, or to /dev/stdout if -o is not given. The /dev/stdout path is constructed via two hex-encoded memory stores: 0x6474732F7665642F decodes to /dev/std (little-endian) and byte_74756F contributes out.
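The filter stages can be exercised without a real linker. In the sketch below, fake_ld_version and fake_ld_verbose are hypothetical stand-ins that print shortened imitations of ld -V and ld --verbose output; the grep and sed invocations are the ones from the stored command, minus the final redirect.

```shell
# Canned stand-ins for `ld -V` and `ld --verbose`; real output is far longer.
fake_ld_version() {
    cat <<'EOF'
GNU ld (GNU Binutils) 2.38
  Supported emulations:
   elf_x86_64
   elf_i386
EOF
}

fake_ld_verbose() {
    fake_ld_version
    cat <<'EOF'
using internal linker script:
==================================================
OUTPUT_FORMAT("elf64-x86-64")
SECTIONS { .text : { *(.text) } }
==================================================
EOF
}

# Same filter stages as the nvlink command, without the "> file" redirect.
extract_script_body() {
    fake_ld_verbose | grep -Fvx -e "$(fake_ld_version)" | sed '1,2d;$d'
}

extract_script_body
```

With this input only the two script lines survive: grep removes the four banner lines by content, and sed then removes the label, the opening separator, and the closing separator by position.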
The command is executed via sub_42FA70 (the system() wrapper at 0x42FA70). If --verbose is enabled, the command string is printed to stderr as #$ <command> before execution.
After execution, the intermediate buffers are freed via sub_431000 (arena free).
Step 4: Append the CUDA Sections
If step 3 succeeded (return code 0) and -o was specified, the output file is reopened in append mode and the SECTIONS template is appended:
// main() line 1892-1907
if (filename) {
    FILE *f = fopen(filename, "a"); // append mode
    if (!f)
        fatal_error(&unk_2A5B710, filename, ...);
    fwrite(SECTIONS_TEMPLATE, 1, 0x82, f);
    fclose(f);
    // remainder of the branch elided
}
The result is a complete linker script: the host linker's default script (all the standard section definitions, entry point, memory layout) followed by the three CUDA-specific section definitions. This augmented script can be passed to ld -T to replace its built-in script entirely.
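The append step amounts to a single redirected write. The sketch below imitates it on a temporary file that stands in for the Step 3 output; append_cuda_sections is a hypothetical helper name, and the one-line "default script" is a placeholder rather than real ld output.

```shell
# Append the CUDA block to an already-extracted script, like fopen(..., "a").
# append_cuda_sections is a hypothetical helper; $1 is the script path.
append_cuda_sections() {
    printf 'SECTIONS\n{\n\t.nvFatBinSegment : { *(.nvFatBinSegment) }\n\t__nv_relfatbin : { *(__nv_relfatbin) } \n\t.nv_fatbin : { *(.nv_fatbin) }\n}\n' >> "$1"
}

# Demo: a placeholder body followed by the appended block.
script=$(mktemp)
printf 'SECTIONS { .text : { *(.text) } }\n' > "$script"
append_cuda_sections "$script"
cat "$script"
rm -f "$script"
```

Because the write is an append, whatever Step 3 produced is left untouched and the CUDA block always ends up last in the file.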
Step 5: Validate with ld -T
After appending, the generated script is validated by invoking ld with the -T flag:
ld -T <output_file> 2>&1 | grep 'no input files' > /dev/null
The decompiled construction (lines 1909-1919):
strcpy(buf, "ld -T ");
strcat(buf, filename);
strcat(buf, " 2>&1 | grep 'no input files' > /dev/null");
The validation logic is inverted: since no object files are provided, a syntactically valid script will cause ld to emit the error "no input files". The grep succeeds (exit 0), and sub_42FA70 returns 0 -- indicating the script is well-formed. If the script has syntax errors, ld emits a different error message, grep fails (exit 1), and the validation fails.
On validation success, the linker proceeds to exit with code 0. On failure, execution jumps to LABEL_23, which calls sub_467460(&unk_2A5B750, ...) to emit a fatal error.
If --verbose is enabled, the validation command is also printed to stderr.
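The inverted check is easy to misread, so here is a minimal sketch of it with the linker replaced by two hypothetical shell functions that only print the relevant diagnostics. The exact wording of the syntax-error message is invented; only the "no input files" text comes from the stored command.

```shell
# Hypothetical stand-ins for `ld -T <script>` run with no object files.
# (A real ld would also exit non-zero in both cases; only its message matters.)
fake_ld_good_script() { echo "ld: no input files" >&2; }
fake_ld_bad_script()  { echo "ld: script.ld:3: syntax error" >&2; }

# Same shape as the stored command: the pipeline's status is grep's status.
# $1 names the stand-in to run in place of "ld -T <output_file>".
validate_script() {
    "$1" 2>&1 | grep 'no input files' > /dev/null
}

validate_script fake_ld_good_script && echo "script accepted"
validate_script fake_ld_bad_script  || echo "script rejected"
```

The check passes only when the expected complaint is present, which means any other failure mode of ld, not just a syntax error, is also reported as a malformed script.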
The ld --verbose Pipeline
Mode 2 (lcs-abs) is implemented as two sequential shell commands whose combined effect is to produce a syntactically valid, architecture-correct, CUDA-augmented linker script. The rest of this section reconstructs the full pipeline from the decompiled main() at 0x409800 (lines 1743-1923) and the string constants at 0x1d3415a ("ld --verbose "), 0x1d3416f ("ld -T "), 0x1d343d8 (collect2 filter pipeline), 0x1d34450 (SECTIONS template), 0x1d34508 (grep+sed tail), and 0x1d34770 (the " -v --verbose" xmmword loaded via _mm_load_si128).
Data Flow Diagram
+---------------------------------------------------+
| main() mode dispatch (dword_2A77DC0 == 2) |
+---------------------------------------------------+
|
v
+---------+ |
| ::src |---- "gcc" (default) ------->|
| (ccbin) | |
+---------+ |
v
+---------------------------------------------------+
| Step 1: Compose compiler verbose command |
| (sub_426AA0 arena allocations) |
| |
| <ccbin> + " -v --verbose" (xmmword @1D34770) |
| if byte_2A5F1D8: append " -shared " |
| elif byte_2A5F1E8: append " -r " |
| if dword_2A5F30C==64: append " -m64 " |
| elif dword_2A5F30C==32: append " -m32 " |
+---------------------------------------------------+
|
v
+---------------------------------------------------+
| Step 2: Append collect2 filter pipeline |
| (string @1D343D8, 119 bytes) |
| |
| " 2>&1 | grep collect2 | grep -wo |
| -e -pie -e \"-z [^[:space:]]*\" |
| -e \"-m [^[:space:]]*\" -e -r -e -shared |
| | tr \"\\n\" \" \" " |
| |
| wrap with "$(" ... ")" |
+---------------------------------------------------+
|
v (call 1: extraction)
+---------------------------------------------------+
| Step 3: Build ld --verbose command |
| (string @1D3415A = "ld --verbose ") |
| |
| "ld --verbose " + $(<compiler filter pipeline>)|
| + " | grep -Fvx -e \"$(ld -V)\" |
| | sed '1,2d;$d' > " |
| + (::filename OR "/dev/stdout") |
+---------------------------------------------------+
|
v
sub_42FA70 -> system() [returns 0 on ok]
|
v
+---------------------------------------------------+
| Step 4: Append CUDA SECTIONS (130 bytes @1D34450) |
| |
| fopen(::filename, "a") |
| fwrite(SECTIONS_TEMPLATE, 1, 0x82, f) |
| fclose(f) |
+---------------------------------------------------+
|
v (call 2: validation)
+---------------------------------------------------+
| Step 5: Validate with ld -T |
| (string @1D3416F = "ld -T ") |
| |
| "ld -T " + ::filename |
| + " 2>&1 | grep 'no input files' > /dev/null" |
+---------------------------------------------------+
|
v
sub_42FA70 -> system() [returns 0 on ok]
|
v
exit(0)
Exact Command Strings
The binary stores the shell commands as literal fragments that, when assembled, form the two commands Mode 2 executes. The following table gives each fragment's rodata address, byte length, and the main() xref that reads it, together with the SECTIONS template and the verbose trace format.
| Rodata addr | Length | Value (as stored) | xref site |
|---|---|---|---|
| 0x1D34770 | 16 | " -v --verbose" (xmmword, _mm_load_si128) | main+... (line 1784) |
| 0x1D343D8 | 119 | " 2>&1 \| grep collect2 \| grep -wo -e -pie -e \"-z [^[:space:]]*\" -e \"-m [^[:space:]]*\" -e -r -e -shared \| tr \"\\n\" \" \" " | main+... (line 1818-1820) |
| 0x1D3415A | 13 | "ld --verbose " | main+0x199 (0x409999) |
| 0x1D34508 | 46 | " \| grep -Fvx -e \"$(ld -V)\" \| sed '1,2d;$d' > " | main+... (line 1864) |
| 0x1D3416F | 6 | "ld -T " | main+0x2E8 (0x409AE8) |
| 0x1D34450 | 130 | The SECTIONS { ... } template (shared by three fwrite calls) | three sites: lines 1838, 1898, 1926 |
| 0x1D34168 | 6 | "#$ %s\n" (verbose trace prefix) | main+... and sub_42FA70+0xBE |
The two composed commands that actually reach system() are reproduced below. Placeholders in angle brackets are the runtime-substituted values from the option parser.
Extraction command (arena buffer chain v18 -> v19 -> v20 -> v21 -> v24|v319):
ld --verbose $( <ccbin> -v --verbose [-shared|-r] [-m64|-m32] \
2>&1 | grep collect2 \
| grep -wo -e -pie \
-e "-z [^[:space:]]*" \
-e "-m [^[:space:]]*" \
-e -r \
-e -shared \
| tr "\n" " " ) \
| grep -Fvx -e "$(ld -V)" \
| sed '1,2d;$d' \
> <output_file_or_/dev/stdout>
Validation command (arena buffer chain v38 -> v39 -> v40 -> v41):
ld -T <output_file> 2>&1 | grep 'no input files' > /dev/null
Parser "State Machine"
nvlink does not parse the ld --verbose output itself -- all parsing is delegated to grep, sed, and tr. The effective state machine that the shell pipeline implements is a three-stage line filter. Each stage reads lines from stdin and writes accepted lines to stdout; a line that survives all three stages lands in the output file.
stdin (ld --verbose output)
|
| S0 (line 1): "GNU ld (GNU Binutils for <distro>) <version>"
| S1 (line 2): " Supported emulations: ..."
| S2 (line 3): "using internal linker script:"
| S3 (line 4): "=================================================="
| S4 (lines 5..N-1): <script body, possibly preceded by blank>
| S5 (line N): "=================================================="
| S6 (line N+1): NUL (EOF)
|
v
+-----------------------------------------------------------+
| Stage A: grep -Fvx -e "$(ld -V)" |
| |
| Drops every line whose entire content equals any line of |
| the multi-line "ld -V" version banner. `-F` = fixed |
| string, `-v` = invert, `-x` = whole line. Typical "ld -V" |
| output spans 4-5 lines (version, copyright, supported |
| emulations), which are therefore erased from the main |
| "--verbose" output. Effect: suppresses the redundant |
| version preamble that otherwise leaks into the script. |
+-----------------------------------------------------------+
|
v
+-----------------------------------------------------------+
| Stage B: sed '1,2d;$d' |
| |
| - '1,2d' -- delete lines 1 and 2 (the "using internal |
| linker script:" header and the opening |
| "===" banner). |
| - '$d' -- delete the final line (the closing "==="). |
| |
| After stage A has already removed the version banner, |
| stage B strips the three remaining decorative lines, |
| leaving only the verbatim body of ld's built-in script. |
+-----------------------------------------------------------+
|
v
+-----------------------------------------------------------+
| Stage C: shell redirect ">" |
| |
| Truncating write to <output_file> or /dev/stdout. No |
| filtering; just byte copy. |
+-----------------------------------------------------------+
|
v
stdout (script body, terminated by trailing newline from ld)
The decorative lines that stages A and B together eliminate are exactly:
1. GNU ld (GNU Binutils for <distro>) <version> (banner line 1)
2. The blank or Supported emulations: ... line (banner line 2)
3. using internal linker script: (the label that introduces the script)
4. ================================================== (opening separator)
5. ================================================== (closing separator)

Stage A removes items 1 and 2, along with any further lines of the ld -V banner. That leaves the label and the opening separator as lines 1 and 2 of the stream and the closing separator as its last line, which are exactly the lines sed '1,2d;$d' deletes. The two filters are complementary rather than redundant: ld -V emits a single line on some distros and several on others, and the count differs by binutils version, so the banner is matched by content (grep -Fvx) while the fixed-position decoration is removed by line address (sed). If the banner fails to match, sed's addresses land on the wrong lines, and the Step 5 validation is what catches the damage.
Template Transformation Rules
The pipeline applies a purely additive transformation. The extracted script body is preserved verbatim; only a fixed 130-byte SECTIONS block is appended. No token rewriting, no macro substitution, no path editing occurs. The rules are:
| Rule | Applied to | Effect |
|---|---|---|
| R1: Strip version preamble | ld --verbose output | Stage A + B of the filter pipeline delete banner lines |
| R2: Preserve body verbatim | ld --verbose output | No other edits to lines that survive filtering |
| R3: Append SECTIONS block | Filtered output file | 130-byte template concatenated via fopen(file, "a") |
| R4: No OUTPUT_FORMAT rewrite | Filtered output file | The OUTPUT_FORMAT(...) and OUTPUT_ARCH(...) directives from the default script are kept unchanged; architecture consistency is already guaranteed by the collect2 flag extraction in step 2 |
| R5: No INSERT directive | Appended template | The CUDA SECTIONS block is concatenated at the end of the file; it relies on the fact that a standalone SECTIONS { ... } block in GNU ld augments (does not replace) the implicit output sections |
Because rule R5 is subtle, it is worth restating: when GNU ld sees a linker script containing a standalone SECTIONS { ... } block in addition to a full default script body, it processes the two as consecutive SECTIONS commands. Output sections from the first block are placed first, then the second block's sections are appended. This is the mechanism that allows .nvFatBinSegment, __nv_relfatbin, and .nv_fatbin to land in the output image without colliding with the default .text, .data, and .bss placement.
Because rule R1 strips the version banner, the output file is syntactically a pure linker-script fragment -- it is legal as a -T argument. The validation step in Step 5 relies on this: if the filter failed to strip the banner (e.g., because ld -V returned unexpected output), the extra non-script text would trigger a syntax error and ld -T would not print the expected no input files message, causing the grep to return non-zero and failing validation.
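The failure case can be reproduced with canned text. In this sketch, fake_ld_verbose_short is a hypothetical stand-in for ld --verbose, and filter_with_banner takes as its argument the text that "$(ld -V)" would have expanded to; both names and the sample output are made up for illustration.

```shell
# Canned `ld --verbose` output (hypothetical, heavily shortened).
fake_ld_verbose_short() {
    cat <<'EOF'
GNU ld (GNU Binutils) 2.38
  Supported emulations:
   elf_x86_64
using internal linker script:
==================================================
SECTIONS { .text : { *(.text) } }
==================================================
EOF
}

# $1 is the text that "$(ld -V)" expanded to in the shell command.
filter_with_banner() {
    fake_ld_verbose_short | grep -Fvx -e "$1" | sed '1,2d;$d'
}

# A banner that does not match line-for-line: grep removes nothing, so sed
# deletes the wrong two lines and non-script text survives into the output.
filter_with_banner "GNU ld version 2.38 (mismatched banner)"
```

The output here still contains the emulation name, the "using internal linker script:" label, and the opening separator, none of which is valid linker-script syntax, so a subsequent ld -T run would be expected to report a syntax error rather than "no input files".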
Where the Result Goes
The pipeline never pipes the generated script into a child process via stdin. It always materializes the script to a file (or /dev/stdout), and it is the calling driver (nvcc) that later passes the file path as an argument to the real host linker invocation.
The two possible destinations are:
| ::filename value | Redirect target | Subsequent consumer |
|---|---|---|
| non-NULL (-o <file> was given) | the literal filename | nvcc passes -Wl,-T,<file> (or equivalently -T <file> to collect2) when it performs the host link |
| NULL | /dev/stdout (0x6474732F7665642F + byte_74756F) | Whoever invoked nvlink -ghls captures stdout -- typically a pipe or here-doc inside the nvcc driver |
The /dev/stdout path is constructed via two immediate stores: the 8-byte immediate 0x6474732F7665642F decodes to "/dev/std" (little-endian), and the 4-byte value that the decompiler renders as byte_74756F (the immediate 0x74756F) contributes "out\0". The path therefore never appears as a standalone /dev/stdout literal in rodata. This is most likely the compiler's inline expansion of a short constant string copy rather than a deliberate design choice.
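The decoding of the two stores can be checked with a small shell sketch. decode_le is a hypothetical helper that prints the low bytes of an integer in memory order on a little-endian machine; it relies on the shell's arithmetic being at least 64 bits wide, which holds for common bash and dash builds.

```shell
# Print the low $2 bytes of integer $1 in little-endian (memory) order,
# skipping NUL bytes, which a shell string cannot hold.
decode_le() {
    value=$(($1))
    count=$2
    while [ "$count" -gt 0 ]; do
        byte=$((value & 0xFF))
        if [ "$byte" -ne 0 ]; then
            printf "\\$(printf '%03o' "$byte")"   # emit the byte via an octal escape
        fi
        value=$((value >> 8))
        count=$((count - 1))
    done
}

decode_le 0x6474732F7665642F 8   # the 8-byte immediate
decode_le 0x74756F 4             # the 4-byte tail
echo                             # prints /dev/stdout
```

The lowest byte of the first immediate is 0x2F ("/"), which is why the hex constant reads backwards relative to the string it encodes.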
Note the asymmetry between Mode 1 and Mode 2:
- In Mode 1, the SECTIONS template is written directly to stdout (via fwrite(..., stdout) at line 1925) when no output file is given. No shell command is invoked.
- In Mode 2, the ld --verbose pipeline's redirect points at /dev/stdout (not the C stdio stream), and the subsequent SECTIONS append occurs through fopen(::filename, "a"). When ::filename is NULL in Mode 2, the append-step's fopen path is skipped -- the only output the user sees is the (already-redirected) ld --verbose body from Step 3, without the CUDA SECTIONS block. This is a latent inconsistency in the decompiled code: Mode 2 with no -o produces an incomplete script that lacks the CUDA sections. In practice nvcc always supplies -o when invoking nvlink -ghls=lcs-abs, so the buggy branch is never exercised.
Who Consumes the Script
The generated script is consumed by the host linker driver, not directly by ld. The chain is:
1. nvcc invokes nvlink -ghls=lcs-abs -o /tmp/<stem>.ld --host-ccbin <cc> [link-flags]
2. nvlink executes the pipeline described above, producing /tmp/<stem>.ld
3. nvcc then invokes the host compiler as a linker driver: <cc> -Wl,-T,/tmp/<stem>.ld host1.o host2.o ... -lstdc++ -lcudart ...
4. The host compiler's internal collect2 forwards the -T to ld
5. ld reads /tmp/<stem>.ld in place of its built-in default script (the -T flag fully replaces the default script)

The last step of this chain is why nvlink goes to the trouble of extracting ld's built-in default script: a -T script must be complete on its own, not a fragment, and the default script is architecture-dependent (32 vs 64 bit, PIE vs non-PIE, shared vs executable). Mode 1 (lcs-aug) skips extraction because its fragment is meant to augment a script rather than replace it. In GNU ld that is done by passing the fragment as an implicit linker script (an ordinary input file that ld does not recognize as an object) or by splicing it into a larger script manually. Note that -dT (--default-script) is not an augment form: it also replaces the default script and only defers its processing until the rest of the command line has been read.
The --Xlinker / --host-linker-options Path
When --host-linker-options (short form --Xlinker) is specified, the command construction in step 1-2 takes an alternative path. Instead of building the gcc -v --verbose + collect2 pipeline, the code iterates through the linked list of --Xlinker values (qword_2A5F2E8) and concatenates them into the command string directly:
// main() lines 1746-1776
if (qword_2A5F2E8) {
    // Iterate linked list: each node has [next_ptr, value_string]
    node = *(qword **)qword_2A5F2E8;
    result = *(char **)(qword_2A5F2E8 + 8);
    while (node) {
        option = (char *)node[1];
        // Allocate and concatenate
        buf = arena_alloc(strlen(result) + strlen(option) + 1);
        strcpy(buf, result);
        result = strcat(buf, option);
        node = (qword *)*node;
    }
    // result now contains all Xlinker options concatenated
}
This path bypasses the collect2 detection entirely. The -Xlinker values are treated as pre-composed ld flags, and the mode 2 pipeline uses them directly in the ld --verbose invocation. The option help text describes this as "Specify options directly to the host linker (ignored by nvlink)" -- the options are not used during device linking, only during linker script generation.
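One consequence of the strcpy/strcat loop is that values are joined with no separator. The sketch below imitates that behavior; join_xlinker is a hypothetical name, each argument stands for one list node, and whether the real nodes already carry their own spacing is an assumption that the decompiled code shown here does not settle.

```shell
# Concatenate option strings with no separator, like the strcpy/strcat loop.
# join_xlinker is a hypothetical name; each argument stands for one list node.
join_xlinker() {
    result=""
    for option in "$@"; do
        result="$result$option"
    done
    printf '%s\n' "$result"
}

# Values that carry their own trailing space stay separated...
join_xlinker "-m elf_x86_64 " "-pie " "-z now "
# ...values that do not would run together.
join_xlinker "-pie" "-shared"
```

If the stored values lacked separators, the second form would yield a single unusable token, which suggests the option parser pre-composes each value with its spacing before it reaches this loop.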
Error Handling
Three error conditions are handled:
| Condition | Error source | Behavior |
|---|---|---|
| Cannot open output file | fopen returns NULL | sub_467460(&unk_2A5B710, filename, ...) -- fatal error with filename |
| Shell command fails | sub_42FA70 returns nonzero | sub_467460(&unk_2A5B750, ...) -- fatal error for invalid script generation |
| Validation fails | ld -T grep returns nonzero | Same error via LABEL_23 -- the generated script is malformed |
All errors route through the standard error reporting system (sub_467460). The error at unk_2A5B750 is specific to linker script generation failure. The error at unk_2A5B710 is the generic "cannot open file" error shared with other output paths.
An unexpected mode value (anything other than 1 or 2 when the linker script path is entered) triggers sub_467460(&unk_2A5B750, ...) as a defensive check. This is unreachable in practice since the mode is only set to 1 or 2 by the option parser.
Verbose Trace
When --verbose (byte_2A5F2D8) is enabled, each shell command executed during mode 2 is printed to stderr with the prefix #$ :
if (byte_2A5F2D8)
    fprintf(stderr, "#$ %s\n", command);
This affects two commands: the ld --verbose extraction pipeline (line 1881) and the ld -T validation command (line 1916). Mode 1 does not execute any shell commands, so verbose has no effect there.
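The trace-then-run pattern is simple enough to imitate directly. In this sketch run_traced is a hypothetical stand-in for the system() wrapper: its first argument plays the role of the verbose flag and its second is the command string handed to the shell.

```shell
# Hypothetical stand-in for the system() wrapper: optionally trace, then run.
# $1 = 1 to enable the trace, $2 = the command string.
run_traced() {
    if [ "$1" = 1 ]; then
        printf '#$ %s\n' "$2" >&2      # trace goes to stderr, not stdout
    fi
    sh -c "$2"
}

run_traced 1 "echo hello | tr a-z A-Z"
```

Because the trace goes to stderr, it does not contaminate a script that Mode 2 writes to /dev/stdout when no -o is given.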
Mutual Exclusion with Input Files
If -ghls is specified alongside input files (qword_2A5F330 != NULL), the option parser emits a fatal error via sub_467460(&unk_2A5B760, ...). Linker script generation is a standalone operation and cannot be combined with device linking. This check is performed in nvlink_parse_options at 0x427AE0 immediately after setting the mode variable.
When nvcc Uses This
The linker script generation feature is invoked by nvcc's driver during host linking of CUDA programs. The typical sequence is:
1. nvcc compiles device code to fatbins and embeds them in host .o files
2. Before host linking, nvcc invokes nvlink -ghls=lcs-abs -o /tmp/script.ld --host-ccbin <compiler> [--shared] [-m64|-m32]
3. nvlink generates the augmented script and exits
4. nvcc passes -T /tmp/script.ld to the host linker (ld or collect2)
5. The host linker preserves .nvFatBinSegment, __nv_relfatbin, and .nv_fatbin sections in the output executable
6. At runtime, __cudaRegisterFatBinary locates the fatbin data via these sections
The lcs-aug mode is available for cases where nvcc wants only the CUDA fragment (to manually splice into an existing script), but the default lcs-abs mode is what nvcc typically uses for standard compilation flows.
Function Cross-Reference
| Function | Address | Role in linker script generation |
|---|---|---|
| main | 0x409800 | Contains the entire linker script generation logic (lines 1743-1936) |
| nvlink_parse_options | 0x427AE0 | Parses -ghls, sets dword_2A77DC0, validates mutual exclusion |
| sub_42FA70 | 0x42FA70 | system() wrapper -- executes the shell pipelines |
| sub_426AA0 | 0x426AA0 | Arena allocator for command string buffers |
| sub_431000 | 0x431000 | Arena free -- releases intermediate buffers |
| sub_467460 | 0x467460 | Fatal error emission |
| sub_476D90 | 0x476D90 | Consumer side -- validates host ELF contains the three CUDA sections |
| sub_476D80 | 0x476D80 | Predicate -- checks for .nvFatBinSegment section existence |
| sub_476EC0 | 0x476EC0 | Section name lookup predicate used by the above |
Cross-References
Internal (nvlink wiki):
- CLI Flags -- -ghls / --gen-host-linker-script option and its lcs-aug / lcs-abs argument values
- Environment Variables -- --host-ccbin setting that determines the host compiler used for script generation
- Pipeline Entry -- main() lines 1743--1936 where the linker script generation logic resides
- Output Phase -- "Host Linker Script Output" sub-section documents this path from the pipeline's perspective (Mode 2 is one of the non-ELF output routes)
- NVIDIA Section Types -- Fatbin sections (.nvFatBinSegment, __nv_relfatbin, .nv_fatbin) referenced in the SECTIONS template
- Host ELF Input -- Host ELF processing that validates the presence of the three CUDA sections
- Fatbin Extraction -- How embedded fatbins in host objects are located and extracted
- Error Reporting -- sub_467460 fatal error emission on script generation or validation failure
- Memory Arenas -- Arena allocator (sub_426AA0, sub_431000) for command string buffer management
- Library Search -- Another subsystem that composes shell-like arguments from CLI options and environment variables (-l<name>, -L<dir>, LIBRARY_PATH), sharing the same arena-allocated buffer-chain idiom used by the linker script command builder. Both subsystems funnel through infrastructure wrappers: library search invokes sub_42A2D0 (archive probing with direct open/read syscalls), while linker script generation invokes sub_42FA70 (the system() wrapper that shells out to gcc, ld, grep, and sed)
Confidence Assessment
| Claim | Confidence | Evidence |
|---|---|---|
| SECTIONS template string with 3 CUDA sections | HIGH | Exact string at 0x1d34450 in strings JSON: "SECTIONS\n{\n\t.nvFatBinSegment : { *(.nvFatBinSegment) }\n\t__nv_relfatbin : { *(__nv_relfatbin) } \n\t.nv_fatbin : { *(.nv_fatbin) }\n}\n" |
| Template size is 130 bytes (0x82) | MEDIUM | String length matches approximately; the fwrite size was inferred from decompiled main() |
| -ghls / --gen-host-linker-script option | HIGH | Strings "gen-host-linker-script" at 0x1d327fc and "Input files are not allowed with -ghls option" at 0x1d34e80 |
| lcs-aug and lcs-abs mode values | HIGH | String "lcs-aug,lcs-abs" at 0x1d3282d; "lcs-aug" standalone at 0x1d329bd |
| system() wrapper at sub_42FA70 | HIGH | Decompiled: calls system(v9) after optional fprintf(stream, "#$ %s\n", a6) for verbose trace |
| Verbose trace prefix "#$ " | HIGH | sub_42FA70 decompiled: fprintf(stream, "#$ %s\n", a6) -- exact format |
| collect2 detection pipeline string | HIGH | Exact string at 0x1d343d8: ' 2>&1 \| grep collect2 \| grep -wo -e -pie -e "-z [^[:space:]]*" -e "-m [^[:space:]]*" -e -r -e -shared \| tr "\\n" " "' |
| ld --verbose extraction command | HIGH | String "ld --verbose " at 0x1d3415a in strings JSON |
| ld -T validation command | HIGH | String "ld -T " at 0x1d3416f |
| Validation uses grep 'no input files' | HIGH | Exact string " 2>&1 \| grep 'no input files' > /dev/null" at 0x1d34508 |
| --host-ccbin option for host compiler | HIGH | String "host-ccbin" at 0x1d3283d |
| --shared and -r linker flags | HIGH | String "Percolate the nvcc -shared option" at 0x1d33ad0; -shared and -r in collect2 grep pipeline |
| Section names .nvFatBinSegment, __nv_relfatbin, .nv_fatbin | HIGH | Individual strings at 0x1d40770, 0x1d40781, 0x1d40790 in strings JSON |
| Consumer function sub_476D90 validates host ELF sections | HIGH | Decompiled file exists; calls sub_476EC0 for section-name lookup |
| Mode variable dword_2A77DC0 values 1 and 2 | MEDIUM | Inferred from main() decompiled conditional branching; not directly visible as named constant |
| Signal handler in sub_42FA70 checks v10 & 0x7F for tool termination | HIGH | Decompiled: if (__OFSUB__((v10 & 0x7F) + 1, 1)) followed by sub_467460(&unk_2A5BB00, ...) for signal and sub_467460(&unk_2A5BB40, ...) for core dump |
| " -v --verbose" is appended via a single 16-byte SSE store from xmmword_1D34770 | HIGH | Decompiled main() line 1784: *v173 = _mm_load_si128((const __m128i *)&xmmword_1D34770); -- 16 bytes exactly matches " -v --verbose\0\0" |
| Mode 2 with no -o produces a script without the CUDA SECTIONS block | MEDIUM | Decompiled main() line 1892 gates the append-step behind if (::filename); the else-branch at line 1925 writes only the bare template to stdout, which is dead code for Mode 2 because Step 3 already redirected to /dev/stdout |
| ld -T replaces (does not augment) the default script | HIGH | GNU ld documented behavior: -T scriptfile replaces the default script (-dT does too, with deferred processing); a script passed as an implicit input file, or one using INSERT, augments it. Mode 2 generates a script that is self-contained precisely so that -T suffices |
| Pipeline consumer chain (nvcc -> host cc -> collect2 -> ld) | MEDIUM | nvcc behavior documented in CUDA toolkit reference; the Mode 2 pipeline's collect2-aware flag extraction step (Step 2) is direct evidence that the eventual consumer of the script is a collect2-invoked ld, not bare ld |
| /dev/stdout literal built from two hex stores (0x6474732F7665642F + byte_74756F) | HIGH | Decompiled main() lines 1876-1878: *(_QWORD *)v320 = 0x6474732F7665642FLL; *((_DWORD *)v320 + 2) = (_DWORD)&byte_74756F; -- bytes 2F 64 65 76 2F 73 74 64 decode to /dev/std in little-endian, followed by "out\0" |