Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Memory Spaces

Every CUDA variable that resides in GPU memory belongs to one of four memory spaces: __device__ (global memory), __shared__ (per-block scratchpad), __constant__ (read-only broadcast memory), or __managed__ (unified memory). cudafe++ encodes memory space as a two-byte bitfield at offsets +148 and +149 of the variable entity node. These two bytes are the variable-side analog of the execution space byte at +182 used for functions -- the two systems are complementary but independent.

The memory space bitfield passes through three processing stages. First, attribute handlers in attribute.c set the appropriate bits and enforce mutual exclusion constraints (no __shared__ + __constant__, no thread_local, no grid_constant conflict). Second, declaration processing in decls.c applies additional validation: VLA restrictions for __shared__, constexpr and external-linkage restrictions for __constant__/__device__, and structured binding constraints for all spaces. Third, symbol reference recording in symbol_ref.c checks whether host code illegally accesses device-side variables at reference time.

Memory spaces apply exclusively to variables (entity kind 7). __shared__ and __constant__ have no function-side meaning -- only __device__ (kind 'W', 87) doubles as a function execution space attribute.

Key Facts

PropertyValue
Memory space offsetEntity node byte +148 (3-bit bitfield)
Extended space offsetEntity node byte +149 (1 bit for __managed__)
__device__ handlersub_40EB80 (apply_nv_device_attr, 100 lines, attribute.c)
__managed__ handlersub_40E0D0 (apply_nv_managed_attr, 47 lines, attribute.c:10523)
__shared__ handlerKind 'Z' (90), not individually decompiled; sets +148 |= 0x02
__constant__ handlerKind '[' (91), not individually decompiled; sets +148 |= 0x04
Declaration processorsub_4DEC90 (variable_declaration, 1098 lines, decls.c)
Variable declarationsub_4CA6C0 (decl_variable, 1090 lines, decls.c:7730)
Variable fixupsub_4CC150 (cuda_variable_fixup, 120 lines, decls.c)
Defined-variable checksub_4DC200 (mark_defined_variable, 26 lines, decls.c)
Cross-space reference checkersub_72A650 / sub_72B510 (record_symbol_reference_full, symbol_ref.c)
Device-var-in-host checkersub_6BCF10 (nv_check_device_variable_in_host, nv_transforms.c)
Post-validationsub_6BC890 (nv_validate_cuda_attributes, 161 lines, nv_transforms.c)
Attribute kind codes'W'=87 (__device__), 'Z'=90 (__shared__), '['=91 (__constant__), 'f'=102 (__managed__)

The Memory Space Bitfield (Entity +148 / +149)

Byte +148: Primary Memory Space

Byte at entity+148:

  bit 0  (0x01)   __device__       Variable in device global memory
  bit 1  (0x02)   __shared__       Variable in per-block shared memory
  bit 2  (0x04)   __constant__     Variable in constant memory
  bit 3  (0x08)   type_member      Set when variable inherits space from type context
  bit 4  (0x10)   device_at_file   __device__ at file scope (no enclosing function)
  bit 7  (0x80)   weak_odr         Set by apply_nv_weak_odr_attr (sub_40AD80)

Bits 3, 4, and 7 are set by decl_variable (sub_4CA6C0) during declaration processing, not by the attribute handlers. Bit 3 is set via *(_BYTE *)(v33 + 148) |= 8u when the variable inherits its memory space from a type context (such as a static member of a class with a device annotation). Bit 4 is set via *(_BYTE *)(v43 + 148) = v73 | 0x10 when a __device__ variable is declared at file scope (dword_126C5D8 == -1, meaning no enclosing function).

Byte +149: Extended Memory Space

Byte at entity+149:

  bit 0  (0x01)   __managed__    Unified memory (host + device accessible)
  bits 1-7        (reserved)

Word-Level Access

Some validation code reads bytes +148 and +149 together as a 16-bit word. The __grid_constant__ conflict check in apply_nv_managed_attr tests:

// sub_40E0D0, line 26 (apply_nv_managed_attr)
if ( (a2[164] & 4) != 0 && (*((_WORD *)a2 + 74) & 0x102) != 0 )

Here (_WORD *)(a2 + 148) (offset 74 in 16-bit units) is tested against 0x0102. In little-endian layout, 0x0102 means byte +148 bit 1 (__shared__) OR byte +149 bit 0 (__managed__). This catches the case where a __grid_constant__ parameter also carries __shared__ or __managed__.

Mutual Exclusion

In valid CUDA programs, at most one of __device__, __shared__, and __constant__ should be set. However, __managed__ always implies __device__ -- the handler sets both +149 bit 0 and +148 bit 0. The validation logic permits __device__ + __managed__ but rejects combinations like __shared__ + __constant__.

The mutual exclusion check appears identically in both apply_nv_managed_attr and apply_nv_device_attr:

// From sub_40EB80 (apply_nv_device_attr), variable path:
v9 = *(_BYTE *)(a2 + 148) | 1;     // set __device__ bit
*(_BYTE *)(a2 + 148) = v9;
if ( ((v9 & 2) != 0) + ((v9 & 4) != 0) == 2 )
    sub_4F81B0(3481, a1 + 56);      // error: conflicting spaces

The expression ((v9 & 2) != 0) + ((v9 & 4) != 0) == 2 is true only when both __shared__ (bit 1) and __constant__ (bit 2) are set simultaneously. This means:

  • __device__ + __shared__ is allowed (the bits coexist)
  • __device__ + __constant__ is allowed
  • __shared__ + __constant__ triggers error 3481

Attribute Handlers

apply_nv_managed_attr -- sub_40E0D0

The __managed__ handler is the simplest and most thoroughly documented. It demonstrates the full validation pattern that all memory space handlers share.

Entry point: Called from apply_one_attribute (sub_413240) when attribute kind is 'f' (102).

Decompiled logic (47 lines, attribute.c:10523):

// sub_40E0D0 -- apply_nv_managed_attr
// a1: attribute node, a2: entity node, a3: entity kind

// Gate: only applies to variables
if ( a3 != 7 )
    internal_error("attribute.c", 10523, "apply_nv_managed_attr");

// Step 1: Set managed flag AND device flag
v3 = a2[148];           // save old memory space byte
a2[149] |= 1;           // set __managed__ bit
a2[148] = v3 | 1;       // set __device__ bit (managed implies device)

// Step 2: Mutual exclusion check
if ( ((v3 & 2) != 0) + ((v3 & 4) != 0) == 2 )
    error(3481, ...);    // __shared__ + __constant__ conflict

// Step 3: Thread-local check
if ( (char)a2[161] < 0 )
    error(3482, ...);    // __managed__ on thread_local

// Step 4: Local variable check
if ( (a2[81] & 4) != 0 )
    error(3485, ...);    // __managed__ on local variable

// Step 5: __grid_constant__ conflict
if ( (a2[164] & 4) != 0 && (*(WORD*)(a2 + 148) & 0x102) != 0 )
{
    // Determine which space string to display
    v4 = a2[148];
    v5 = "__constant__";
    if ( (v4 & 4) == 0 ) {
        v5 = "__managed__";
        if ( (a2[149] & 1) == 0 ) {
            v5 = "__shared__";
            if ( (v4 & 2) == 0 ) {
                v5 = "__device__";
                if ( (v4 & 1) == 0 )
                    v5 = "";
            }
        }
    }
    error(3577, ..., v5);   // incompatible with __grid_constant__
}

The space-name selection cascade (__constant__ > __managed__ > __shared__ > __device__ > empty) is used in error messages to show which memory space conflicts with __grid_constant__. The cascade tests bits in priority order, matching the most "restrictive" space first.

apply_nv_device_attr -- sub_40EB80

The __device__ handler is dual-purpose: it handles both variables (a3 == 7) and functions (a3 == 11).

Entry point: Called from apply_one_attribute when attribute kind is 'W' (87).

Variable path (entity kind 7):

// sub_40EB80, variable branch
*(_BYTE *)(a2 + 148) |= 1;          // set __device__ bit

// Validation (identical to __managed__):
// 1. Error 3481 if __shared__ + __constant__ both set
// 2. Error 3482 if thread_local (byte +161 bit 7)
// 3. Error 3485 if local variable (byte +81 bit 2)
// 4. Error 3577 if __grid_constant__ conflict

Function path (entity kind 11):

// sub_40EB80, function branch
// Check: not an implicitly-deleted function
if ( (*(_QWORD *)(a2 + 184) & 0x800001000000LL) != 0x800000000000LL
     || (*(_BYTE *)(a2 + 176) & 2) != 0 )
{
    // Conflict with __global__
    if ( !dword_106BFF0 && (*(_BYTE *)(a2 + 182) & 0x40) != 0 )
        error(3481, ...);

    *(_BYTE *)(a2 + 182) |= 0x23;    // set device execution space

    // Local function with __global__ conflict
    if ( (*(_BYTE *)(a2 + 81) & 4) != 0 && (*(_BYTE *)(a2 + 182) & 0x40) != 0 )
        error(3688, ...);

    // __device__ on main()
    if ( a2 == qword_126EB70 && (*(_BYTE *)(a2 + 182) & 0x20) != 0 )
        warning(3538, ...);
}
else
{
    // Implicitly-deleted function: just warn
    v14 = get_entity_display_name(a2);
    error(3469, ..., "__device__", v14);
}

// Check function parameters for missing default initializers
// (error 3669 for parameters without defaults in device context)

The function path is documented in Execution Spaces -- here we focus on the variable path.

shared and constant Handlers

The __shared__ and __constant__ attribute handlers are dispatched through apply_one_attribute (sub_413240) when attribute kind codes 'Z' (90) and '[' (91) are encountered. Their variable-path logic mirrors __device__ and __managed__:

Step__shared__ ('Z')__constant__ ('[')
Set memory space bitbyte +148 |= 0x02byte +148 |= 0x04
Mutual exclusion (3481)Check __constant__ bit (bit 2)Check __shared__ bit (bit 1)
Thread-local check (3482)YesYes
Local variable check (3485)YesYes
__grid_constant__ conflict (3577)YesYes

The __shared__ and __constant__ keywords apply only to variables (kind 7). Unlike __device__, they do not have a function-path branch -- there is no __shared__ or __constant__ function execution space.

Variable Declaration Processing

sub_4DEC90 -- variable_declaration

The top-level declaration processor (decls.c) performs additional CUDA-specific validation after attribute handlers have set the memory space bits. This function is 1098 lines and handles both normal variable declarations and static data member definitions.

CUDA-specific checks in variable_declaration:

ErrorConditionDescription
149Memory space attribute at illegal scopeCUDA storage class at namespace scope (specific scenarios)
892auto with __constant__auto-typed __constant__ variable
893auto with CUDA attributeauto-typed variable with other CUDA memory space
3510__shared__ with VLA__shared__ variable with variable-length array type
3566__constant__ + constexpr + auto__constant__ constexpr with auto deduction
3567CUDA variable with VLACUDA memory-space variable with VLA type
3568__constant__ + constexpr__constant__ combined with constexpr
3578CUDA attribute in discarded branchCUDA attribute on variable in constexpr-if discarded branch
3579CUDA attribute + structured bindingCUDA attribute at namespace scope with structured binding
3580CUDA attribute on VLACUDA attribute on variable-length array

Memory space string selection (used in error messages):

// sub_4DEC90, line ~357: selecting display name for the memory space
v50 = "__constant__";
if ( (v49 & 4) == 0 ) {
    v50 = "__managed__";
    if ( (*(_BYTE *)(v15 + 149) & 1) == 0 ) {
        v50 = "__host__ __device__" + 9;   // pointer arithmetic: = "__device__"
        if ( (v49 & 2) != 0 )
            v50 = "__shared__";
    }
}

The string "__device__" is produced by taking the string "__host__ __device__" and advancing by 9 bytes, skipping past "__host__ ". This is a binary-level optimization -- the compiler shares string storage between the combined "__host__ __device__" literal and the standalone "__device__" reference.

sub_4CA6C0 -- decl_variable

The core variable declaration function (1090 lines, decls.c:7730) handles CUDA memory space propagation during symbol table entry creation. Key behaviors:

Storage class mapping: When declaration state byte at offset +269 equals 5, it indicates a CUDA memory space storage class. The function performs a scope walk to determine the correct namespace scope for the variable. If a prior declaration exists at the same scope (dword_126C5DC == dword_126C5B4), the CUDA storage class is reset to allow redeclaration.

Scope walk: Traverses the scope chain (784-byte scope entries at qword_126C5E8, indexed by dword_126C5E4) upward through class scopes (scope_kind 4) and template scopes (bit 0x20 at scope entry +9), until reaching a non-class, non-template scope. This determines whether the variable is at namespace scope, class scope, or block scope.

Error 3483 -- memory space in non-device function: When a variable with a device memory space bit (+148 bit 0 set) is declared inside a function body, and the enclosing routine is NOT device-only (+182 & 0x30 != 0x20), the function emits error 3483 with the storage kind and space name:

// From sub_4CA6C0, ~line 886-910
if (!at_namespace_scope) {
    char space = entity->byte_148;
    if (storage_class != 1 && (space & 0x01)) {
        routine_descriptor = qword_126C5D0;
        if (routine_descriptor) {
            entity_ptr = *(routine_descriptor + 32);
            if (entity_ptr && (entity_ptr[182] & 0x30) != 0x20) {
                const char *name = get_space_name(entity);  // priority cascade
                const char *kind = (storage_class == 2) ? "a static" : "an automatic";
                error(3483, source_loc, kind, name);
            }
        }
    }
}

File-scope device flag: When a __device__ variable is at file scope (dword_126C5D8 == -1), the function sets bit 4 of +148:

if ((entity->byte_148 & 0x01) && dword_126C5D8 == -1)
    entity->byte_148 |= 0x10;   // bit 4: device_at_file_scope

Redeclaration checking: When a variable is redeclared, the function compares memory space encoding at offset +136 (the attribute byte) between the existing and new entity. Error 1306 is emitted for mismatched CUDA memory spaces.

Memory space propagation: Calls sub_4C4750 (set_variable_attributes) for final attribute propagation, and sub_4CA480 (check_variable_redeclaration) for prior-declaration compatibility.

sub_4DC200 -- mark_defined_variable

Post-declaration validation for device-memory variables with external linkage (26 lines):

// sub_4DC200 -- mark_defined_variable (decompiled)
void mark_defined_variable(entity_t *a1, int a2) {
    if (a1[164] & 0x10) {   // already marked as defined
        if (!dword_106BFD0                    // cross-space checking not overridden
            && (a1[148] & 3) == 1             // __device__ set, __shared__ NOT set
            && !is_compiler_generated(a1)     // not compiler-generated
            && (a1[80] & 0x70) != 0x10)       // not anonymous
        {
            warning(3648, a1 + 64);           // external linkage warning
        }
    } else if (!a2 && (*(byte*)(*(qword*)a1 + 81) & 2)) {
        error(1655, ...);   // tentative definition of constexpr
    } else {
        // Same 3648 check on first definition
        if (!dword_106BFD0 && (a1[148] & 3) == 1 && ...)
            warning(3648, a1 + 64);
        a1[164] |= 0x10;   // mark as defined
    }
}

The condition (a1[148] & 3) == 1 tests that bit 0 (__device__) is set AND bit 1 (__shared__) is NOT set. This catches __device__ variables (including __device__ __constant__ and __device__ __managed__, since those have bit 0 set) but excludes __shared__ variables (which have bit 1 set). The check is NOT about __constant__ alone -- a pure __constant__ variable (only bit 2 set, value 0x04) would yield (0x04 & 3) == 0, failing the test. The p1.06 report's characterization of error 3648 as "constant with external linkage" is misleading; the actual condition is "device-accessible (non-shared) variable with external linkage."

sub_4CC150 -- cuda_variable_fixup

Called from variable_declaration after CUDA constexpr-if detection. This function:

  • Manipulates variable entity fields at offset +148 (memory space) and +162 (visibility flags)
  • Adjusts scope chains using the 784-byte scope entry array
  • Creates new type entries for CUDA-specific variable rewriting

Bit Assignment Resolution

Two sweep reports provided conflicting bit assignments for byte +148:

Sourcebit 0bit 1bit 2
p1.01 (attribute.c handlers)__device____shared____constant__
p1.06 (decls.c)__constant____shared____managed__

The decompiled code resolves this definitively in favor of the p1.01 assignment. Two independent functions confirm it:

  1. sub_40E0D0 (apply_nv_managed_attr) sets a2[149] |= 1 (managed at +149) and a2[148] = v3 | 1 (device at +148 bit 0). The subsequent conflict check tests (v3 & 2) for __shared__ and (v3 & 4) for __constant__.

  2. sub_40EB80 (apply_nv_device_attr) sets *(_BYTE *)(a2 + 148) | 1 (device at +148 bit 0), then uses the identical conflict test ((v9 & 2) != 0) + ((v9 & 4) != 0) == 2.

The canonical encoding is:

Byte +148:  bit 0 = __device__,  bit 1 = __shared__,  bit 2 = __constant__
Byte +149:  bit 0 = __managed__

The p1.06 report's alternative encoding is an analysis error, caused by mark_defined_variable (sub_4DC200) testing +148 & 3 == 1 in the context of error 3648. That test checks for __device__ set (bit 0) without __shared__ (bit 1) -- not for __constant__ at bit 0. The error was then characterized as "constant with external linkage" based on the error message text rather than the actual bit test.

Validation Constraints

managed Constraints

__managed__ has the strictest requirements among memory space annotations. All five checks occur in apply_nv_managed_attr (sub_40E0D0):

ConstraintBinary testErrorDescription
Variables onlya3 != 7internal_error__managed__ can only apply to variables, not functions or types
No shared+constant((old & 2) != 0) + ((old & 4) != 0) == 23481Both __shared__ and __constant__ already set
Not thread-local(signed char)byte+161 < 03482Bit 7 of +161 = thread_local storage
Not reference/localbyte+81 & 43485Bit 2 of +81 = reference type or local variable
Not grid_constantbyte+164 & 4 and word +148 & 0x01023577__grid_constant__ parameter with managed or shared space

The __managed__ keyword requires compute capability >= 3.0. This is verified at compilation time via version threshold comparisons (qword_126EF90 > 0x78B3, where 0x78B3 = 30899 in the CUDA version encoding scheme). The specific error code for architecture-too-low is not captured in the decompiled attribute handler.

shared Constraints

__shared__ variables have restrictions enforced across multiple functions:

ConstraintWhereErrorDescription
No VLA typesub_4DEC903510__shared__ variable cannot have variable-length array type
No VLA (general)sub_4DEC903580CUDA memory-space attribute on variable-length array
Not thread-localAttribute handler3482__shared__ on thread_local variable
Not local (non-block)Attribute handler3485Cannot appear on local variables outside device function scope
No grid_constantAttribute handler3577Incompatible with __grid_constant__ parameter

constant Constraints

__constant__ carries additional restrictions related to constexpr and type:

ConstraintWhereErrorDescription
No constexprsub_4DEC903568__constant__ combined with constexpr (when managed+device bits also set)
No constexpr+autosub_4DEC903566Constexpr with const-qualified type
No VLA typesub_4DEC903567CUDA memory-space variable with VLA type
Not thread-localAttribute handler3482__constant__ on thread_local variable
Not localAttribute handler3485Cannot appear on local variables
No grid_constantAttribute handler3577Incompatible with __grid_constant__ parameter

Note: Error 3648 (external linkage warning) is emitted by sub_4DC200 but the condition tests (byte+148 & 3) == 1, which checks for __device__ set without __shared__ -- not specifically __constant__. The check applies to any device-accessible non-shared variable, including __device__, __device__ __constant__, and __device__ __managed__.

Cross-Space Variable Access Checking

When host code references a device-side variable, the symbol reference recorder emits diagnostics. This checking occurs in record_symbol_reference_full (sub_72A650 / sub_72B510, symbol_ref.c) and is gated by global flags dword_106BFD0 and dword_106BFCC.

Gate Logic

1. Is cross-space checking enabled?
   → dword_106BFD0 != 0 OR dword_106BFCC != 0

2. Is the referenced entity a variable (kind == 7)?
   → Yes: proceed to nv_check_device_var_ref_in_host
   → No (kind 10/11/20 -- function): check nv_check_host_var_ref_in_device

3. Get current routine from scope stack (dword_126C5D8)
4. Check routine execution space at +182 (0x30 mask):
   → 0x00 or 0x10 (host): emit device-var-in-host errors
   → 0x20 (device): emit host-var-in-device errors

Device Variable Referenced from Host Code

The nv_check_device_var_ref_in_host path (assert string at symbol_ref.c:2347) checks memory space bits and produces specific errors based on which space the variable occupies:

ErrorConditionDescription
3548Variable has __shared__ or __constant__ (byte+148 bits 1-2)Reference to __shared__ / __constant__ variable from host code
3549Variable has __constant__ and reference is in initializer context (ref_kind bit 4)Initializer referencing device memory variable from host
3550Variable has __shared__ and reference is a write (ref_kind bit 1)Write to __shared__ variable from host code
3486Via sub_6BCF10 -- complex linkage check (+176 & 0x200000000002000, +166 == 5, +168 in [1,4])Illegal device variable reference from host (operator function context)

Host Variable Referenced from Device Code

The nv_check_host_var_ref_in_device path (assert string at symbol_ref.c:2390) handles the reverse direction:

ErrorConditionDescription
3623Device-only function referenced outside device contextUse of __device__-only function outside the bodies of device functions

The error 3623 has two context strings:

  • "outside the bodies of device functions" -- general case
  • "from a constexpr or consteval __device__ function" -- constexpr context

Relaxation: dword_106BF40

When dword_106BF40 is set (corresponding to --expt-relaxed-constexpr), and the current routine at +182 has the device annotation pattern (& 0x30 == 0x20) with +177 bit 1 set (explicit __device__), cross-space variable access checks are suppressed. This allows constexpr device functions to reference host variables during constant evaluation.

Host Reference Arrays

When the backend emits host-side code, variables marked with __device__, __shared__, or __constant__ are registered in ELF section arrays so the CUDA runtime can discover them at load time. The emission function sub_6BCF80 (nv_emit_host_reference_array) writes entries into six separate sections:

SectionArray NameMemory SpaceLinkage
.nvHRDEhostRefDeviceArrayExternalLinkage__device__External
.nvHRDIhostRefDeviceArrayInternalLinkage__device__Internal
.nvHRCEhostRefConstantArrayExternalLinkage__constant__External
.nvHRCIhostRefConstantArrayInternalLinkage__constant__Internal
.nvHRKEhostRefKernelArrayExternalLinkage__global__ (kernel)External
.nvHRKIhostRefKernelArrayInternalLinkage__global__ (kernel)Internal

Each array entry contains the mangled name of the device symbol as a byte array:

extern "C" {
    extern __attribute__((section(".nvHRDE")))
    __attribute__((weak))
    const unsigned char hostRefDeviceArrayExternalLinkage[] = {
        /* mangled name bytes */ 0x0
    };
}

Six global lists (at addresses unk_1286780 through unk_12868C0) accumulate symbols during compilation, one per section type. Note that __shared__ variables do NOT get host reference arrays -- they have no host-visible address.

Redeclaration Compatibility

When a variable is redeclared, decl_variable (sub_4CA6C0) compares the memory space bits between the prior declaration and the new one. Error 1306 is emitted for mismatched CUDA memory spaces:

Error 1306: CUDA memory space mismatch on redeclaration

The comparison tests byte +148 of both the existing entity and the new declaration's computed attributes. The CUDA memory space acts as an implicit storage class -- storage class value 5 in the declaration state (offset 269) indicates a CUDA-specific storage class that requires special scope-walking behavior.

String Table Usage

The memory space keywords appear in the binary's string table and are referenced by error message formatting code:

StringUsage
"__constant__"Error messages for __constant__ constraints, space name display
"__managed__"Error messages for __managed__ constraints
"__device__"Obtained via "__host__ __device__" + 9 (pointer arithmetic), or direct literal
"__shared__"Error messages for __shared__ constraints
"__host__ __device__"Combined string; +9 yields "__device__"

The pointer-arithmetic trick for "__device__" appears in both sub_4DEC90 (variable_declaration) and error message formatting throughout the attribute handlers. It saves binary space by reusing the combined "__host__ __device__" string constant.

Error Code Summary

Attribute Application Errors

ErrorSeverityDescription
3481ErrorConflicting CUDA memory spaces (__shared__ + __constant__ simultaneously)
3482ErrorCUDA memory space attribute on thread_local variable
3485ErrorCUDA memory space attribute on local variable
3577ErrorMemory space incompatible with __grid_constant__ parameter

Declaration Processing Errors

ErrorSeverityDescription
149ErrorIllegal CUDA storage class at namespace scope
892Errorauto type with __constant__ variable
893Errorauto type with CUDA memory space variable
1306ErrorCUDA memory space mismatch on redeclaration
3483ErrorMemory space qualifier on automatic/static variable in non-device function
3510Error__shared__ variable with variable-length array
3566Error__constant__ with constexpr and auto deduction
3567ErrorCUDA variable with VLA type
3568Error__constant__ combined with constexpr
3578ErrorCUDA attribute in constexpr-if discarded branch
3579ErrorCUDA attribute at namespace scope with structured binding
3580ErrorCUDA attribute on variable-length array
3648WarningDevice-accessible (non-shared) variable with external linkage

Cross-Space Reference Errors

ErrorSeverityDescription
3486ErrorIllegal device variable reference from host (operator function context)
3548ErrorReference to __shared__ / __constant__ variable from host code
3549ErrorInitializer referencing device memory variable from host
3550ErrorWrite to __shared__ variable from host code
3623ErrorUse of __device__-only function outside device context

Global State Variables

VariableTypeDescription
dword_126EFA8intCUDA mode flag (nonzero when compiling CUDA)
dword_126EFB4intCUDA dialect (2 = CUDA C++)
dword_126EFACintExtended CUDA features flag
dword_126EFA4intCUDA version-check control
qword_126EF98int64CUDA version threshold (hex: 0x9E97 = 40599, 0x9D6C, etc.)
qword_126EF90int64CUDA version threshold (hex: 0x78B3 = 30899 for compute_30)
dword_106BFD0intEnable cross-space reference checking (primary)
dword_106BFCCintEnable cross-space reference checking (secondary)
dword_106BF40intAllow __device__ function refs in host (--expt-relaxed-constexpr)
dword_106BFF0intRelaxed execution space mode (permits otherwise-illegal combos)
qword_126EB70ptrEntity pointer for main() (prevents __device__ on main)
qword_126C5E8ptrScope stack base pointer (784-byte entries)
dword_126C5E4intCurrent scope stack top index
dword_126C5D8intCurrent function scope index (-1 if none)

Function Map

AddressIdentitySizeSource
sub_40AD80apply_nv_weak_odr_attr0.2 KBattribute.c:10497
sub_40E0D0apply_nv_managed_attr0.4 KBattribute.c:10523
sub_40E1F0apply_nv_global_attr (variant 1)0.9 KBattribute.c
sub_40E7F0apply_nv_global_attr (variant 2)0.9 KBattribute.c
sub_40EB80apply_nv_device_attr1.0 KBattribute.c
sub_4108E0apply_nv_host_attr0.3 KBattribute.c
sub_413240apply_one_attribute (dispatch)5.9 KBattribute.c
sub_413ED0apply_attributes_to_entity4.9 KBattribute.c
sub_40A310attribute_display_name0.6 KBattribute.c:1307
sub_4CA6C0decl_variable11 KBdecls.c:7730
sub_4CC150cuda_variable_fixup1.2 KBdecls.c:20654
sub_4DC200mark_defined_variable0.3 KBdecls.c
sub_4DEC90variable_declaration11 KBdecls.c:12956
sub_6BC890nv_validate_cuda_attributes1.6 KBnv_transforms.c
sub_6BCF10nv_check_device_variable_in_host0.2 KBnv_transforms.c
sub_6BCF80nv_emit_host_reference_array0.8 KBnv_transforms.c
sub_72A650record_symbol_reference_full (6-arg)6.6 KBsymbol_ref.c
sub_72B510record_symbol_reference_full (4-arg)7.3 KBsymbol_ref.c

See Also