global Function Constraints
The __global__ attribute designates a CUDA kernel -- a function that executes on the GPU and is callable from host code via the <<<...>>> launch syntax. Of all CUDA execution space attributes, __global__ imposes the most constraints. cudafe++ enforces these constraints across three separate validation passes: attribute application (when __global__ is first applied to an entity), post-declaration validation (after all attributes on a declaration are resolved), and semantic analysis (during template instantiation, redeclaration merging, and lambda processing). This page documents all constraint checks, their implementation in the binary, the entity node fields they inspect, and the diagnostics they emit.
Key Facts
| Property | Value |
|---|---|
| Source files | attribute.c (apply handler), nv_transforms.c (post-validation), class_decl.c (redeclaration, lambda), decls.c (template packs) |
| Apply handler (variant 1) | sub_40E1F0 (89 lines) |
| Apply handler (variant 2) | sub_40E7F0 (86 lines) |
| Post-validation | sub_6BC890 (nv_validate_cuda_attributes, 161 lines) |
| Attribute kind byte | 0x58 = 'X' |
| OR mask applied | entity+182 |= 0x61 (bits 0 + 5 + 6) |
| HD combined flag | entity+182 |= 0x80 (set when __global__ applied to function already marked __host__) |
| Total constraint checks | 37 distinct error conditions |
| Entity fields read | +81, +144, +148, +152, +166, +176, +179, +182, +183, +184, +191 |
| Relaxed mode flag | dword_106BFF0 (suppresses certain conflict checks) |
| main() entity pointer | qword_126EB70 (compared to detect __global__ main) |
Two Variants of apply_nv_global_attr
Two nearly identical functions implement the __global__ application logic. Both perform the same 11 validation checks and apply the same 0x61 bitmask. The difference is purely structural: sub_40E1F0 uses a for loop with a null-terminated break for the parameter default-init iteration, while sub_40E7F0 uses a do-while loop with an explicit null check and early return. Both exist because EDG's attribute subsystem may route through different call paths depending on whether the attribute appears on a declaration or a definition.
// Pseudocode for apply_nv_global_attr (sub_40E1F0 / sub_40E7F0)
// a1: attribute node, a2: entity node, a3: target kind
entity_t* apply_nv_global_attr(attr_node_t* a1, entity_t* a2, uint8_t a3) {
// Gate: only applies to functions (kind 11)
if (a3 != 11)
return a2;
// ---- Phase 1: Linkage / constexpr lambda check ----
// Bits 47 and 24 of the 48-bit field at +184
if ((a2->qword_184 & 0x800001000000) == 0x800000000000) {
// Constexpr lambda with internal linkage but no local flag
char* name = get_entity_display_name(a2, 0); // sub_6BC6B0
emit_error(3469, a1->src_loc, "__global__", name);
return a2; // bail out, do not apply __global__
}
// ---- Phase 2: Structural constraints ----
// 2a. Static member function check
if ((signed char)a2->byte_176 < 0 && !(a2->byte_81 & 0x04))
emit_warning(3507, a1->src_loc, "__global__"); // severity 5
// 2b. operator() check
if (a2->byte_166 == 5)
emit_error(3644, a1->src_loc); // severity 7
// 2c. Exception specification check (uses return type chain)
type_t* ret = a2->type_chain; // entity+144
while (ret->kind == 12) // skip cv-qualifier wrappers
ret = ret->referenced; // type+144
if (ret->prototype->exception_spec) // proto+152 -> +56
emit_error(3647, a1->src_loc); // auto/decltype(auto) return
// 2d. Execution space conflict
uint8_t es = a2->byte_182;
if (!relaxed_mode && (es & 0x60) == 0x20) // already __device__ only
emit_error(3481, a1->src_loc);
if (es & 0x10) // already __host__ explicit
emit_error(3481, a1->src_loc);
// 2e. Return type must be void
if (!(a2->byte_179 & 0x10)) { // not constexpr
if (a2->byte_191 & 0x01) // is lambda
emit_error(3506, a1->src_loc);
else {
type_t* base = skip_typedefs(a2->type_chain); // sub_7A68F0
if (!is_void_type(base->referenced)) // sub_7A6E90
emit_error(3505, a1->src_loc);
}
}
// 2f. Variadic (ellipsis) check
type_t* proto_type = a2->type_chain; // +144
while (proto_type->kind == 12)
proto_type = proto_type->referenced;
if (proto_type->prototype->flags_16 & 0x01) // bit 0 of proto+16
emit_error(3503, a1->src_loc);
// ---- Phase 3: Apply the bitmask ----
a2->byte_182 |= 0x61; // device_capable + device_annotation + global_kernel
// ---- Phase 4: Additional checks (after bitmask set) ----
// 4a. Local function (constexpr local)
if (a2->byte_81 & 0x04)
emit_error(3688, a1->src_loc);
// 4b. main() function check
if (a2 == main_entity && (a2->byte_182 & 0x20))
emit_error(3538, a1->src_loc);
// ---- Phase 5: Parameter iteration (__grid_constant__ warning) ----
if (a1->flags & 0x01) { // attr_node+11 bit 0: applies to parameters
// Walk parameter list from prototype
proto_type = a2->type_chain;
while (proto_type->kind == 12)
proto_type = proto_type->referenced;
param_t* param = *proto_type->prototype->param_list; // deref +152
source_loc_t loc = a1->src_loc; // +56
for (; param != NULL; param = param->next) {
// Peel cv-qualifier wrappers
type_t* ptype = param->type; // param[1]
while (ptype->kind == 12)
ptype = ptype->referenced;
// Check: is type a __grid_constant__ candidate?
if (!has_grid_constant_flag(ptype) && scope_index == -1) {
// sub_7A6B60: checks byte+133 bit 5 (0x20)
int64_t scope = scope_table_base + 784 * scope_table_index;
if ((scope->flags_6 & 0x06) == 0 && scope->kind_4 != 12) {
type_t* ptype2 = param->type;
while (ptype2->kind == 12)
ptype2 = ptype2->referenced;
if (!ptype2->default_init) // type+120 == NULL
emit_error(3669, &loc);
}
}
}
}
// ---- Phase 6: HD combined flag ----
if (a2->byte_182 & 0x40) // __global__ now set
a2->byte_182 |= 0x80; // mark as combined HD
return a2;
}
Execution Order Detail
The 0x61 bitmask is applied before the local-function (3688) and main() (3538) checks but after all structural checks (3507, 3644, 3647, 3481, 3505/3506, 3503). This means the bitmask is set even when errors are emitted -- cudafe++ continues processing after errors to collect as many diagnostics as possible in a single compilation pass.
The constexpr-lambda check at the top (error 3469) is the only check that causes an early return. If the function is a constexpr lambda with wrong linkage, the bitmask is NOT set and no further validation is performed.
Validation Error Catalog
The 37 validation errors are organized by the phase in which they are checked and by semantic category. Error codes below are cudafe++ internal diagnostic numbers; severity values match the sub_4F41C0 severity parameter (5 = warning, 7 = error, 8 = hard error).
Category 1: Return Type
| Error | Severity | Check | Message |
|---|---|---|---|
| 3505 | 7 | !is_void_type(skip_typedefs(entity+144)->referenced) | a __global__ function must have a void return type |
| 3506 | 7 | entity+191 & 0x01 (lambda) and non-void | a __global__ function must not have a deduced return type |
| 3647 | 7 | entity+152 -> +56 != NULL (exception spec present on return proto) | auto/decltype(auto) deduced return type |
Error 3505 and 3506 are mutually exclusive paths guarded by the byte+179 & 0x10 constexpr flag. When the function is not constexpr, the handler checks whether it is a lambda (3506 path, which checks byte+191 bit 0) or a regular function (3505 path, which resolves through skip_typedefs via sub_7A68F0 and tests is_void_type via sub_7A6E90). The skip_typedefs function follows the type chain while type->kind == 12 (cv-qualifier wrapper) and type->byte_161 & 0x7F == 0 (no qualifier flags). The is_void_type function follows the same chain and returns kind == 1 (void).
Error 3647 is checked independently of 3505/3506. The check examines the exception specification pointer at prototype offset +56. In EDG's type system, auto and decltype(auto) return types are represented with a non-null exception specification node on the return type's prototype -- this is a repurposed field that indicates the return type is deduced.
Category 2: Parameters
| Error | Severity | Check | Message |
|---|---|---|---|
| 3503 | 8 | proto+16 & 0x01 (has ellipsis) | a __global__ function cannot have ellipsis |
| 3702 | 7 | param_flags & 0x02 (rvalue ref) | a __global__ function cannot have a parameter with rvalue reference type |
| -- | 7 | Parameter with __restrict__ on reference type | a __global__ function cannot have a parameter with __restrict__ qualified reference type |
| -- | 7 | Parameter of type va_list | A __global__ function or function template cannot have a parameter with va_list type |
| -- | 7 | Parameter of type std::initializer_list | a __global__ function or function template cannot have a parameter with type std::initializer_list |
| -- | 7 | Oversized alignment on win32 | cannot pass a parameter with a too large explicit alignment to a __global__ function on win32 platforms |
| 3669 | 8 | Device-scope parameter without default init | __grid_constant__ parameter warning (device-side check) |
Error 3503 (ellipsis) is checked in the apply handler by testing bit 0 of the function prototype's flags word at offset +16. This bit indicates the parameter list ends with ....
Error 3702 (rvalue reference) is checked in the post-validation pass (sub_6BC890), not in the apply handler. The post-validator walks the parameter list and checks byte offset +32 (bit 1) of each parameter node.
The __restrict__ reference, va_list, initializer_list, and win32 alignment checks are scattered across separate validation functions in nv_transforms.c and are triggered during declaration processing rather than during attribute application.
Error 3669 is checked in the apply handler's parameter iteration loop. It walks each parameter, resolves through cv-qualifier wrappers, and tests whether sub_7A6B60 returns false (meaning the parameter type has bit 5 of byte+133 clear -- not a __grid_constant__ type) AND the scope lookup produces a non-array, non-qualifier type without a default initializer at type+120.
Category 3: Modifiers
| Error | Severity | Check | Message |
|---|---|---|---|
| 3507 | 5 | (signed char)byte_176 < 0 && !(byte_81 & 0x04) | A __global__ function or function template cannot be marked constexpr (warning for static member) |
| 3688 | 8 | byte_81 & 0x04 (local function) | A __global__ function or function template cannot be marked constexpr (constexpr local) |
| 3481 | 8 | Execution space conflict (see matrix) | Conflicting CUDA execution spaces |
| -- | 7 | Function is consteval | A __global__ function or function template cannot be marked consteval |
| 3644 | 7 | byte_166 == 5 (operator function kind) | An operator function cannot be a __global__ function |
| -- | 7 | Defined in friend declaration | A __global__ function or function template cannot be defined in a friend declaration |
| -- | 7 | Exception specification present | An exception specification is not allowed for a __global__ function or function template |
| -- | 7 | Declared in inline unnamed namespace | A __global__ function or function template cannot be declared within an inline unnamed namespace |
| 3538 | 7 | a2 == qword_126EB70 (is main()) | function main cannot be marked __device__ or __global__ |
Error 3507 deserves special attention. The decompiled code shows:
if ((signed char)a2->byte_176 < 0 && !(a2->byte_81 & 0x04))
emit_warning(3507, ...);
The signed char cast means byte_176 >= 0x80 (bit 7 set = static member function). The !(byte_81 & 0x04) condition ensures it is NOT a local function. The emitter uses severity 5 (warning via sub_4F8DB0), meaning this is a warning, not an error -- NVIDIA chose to warn rather than reject __global__ on static members, though the official documentation says it is not allowed. The displayed string is "A __global__ function or function template cannot be marked constexpr" with "__global__" as the attribute name parameter, though the actual semantic is "static member function" per the field being checked.
Error 3644 checks entity+166 == 5. This field stores the "operator function kind" enum value, where 5 corresponds to operator(). This prevents lambda call operators or functors from being directly marked __global__.
Error 3688 is checked after the bitmask is set (byte_182 |= 0x61). It tests byte_81 & 0x04, which indicates a local (block-scope) function. The handler emits with severity 8 (via sub_4F81B0, hard error).
Error 3538 compares the entity pointer against qword_126EB70, which holds the entity pointer for main() (set during initial declaration processing). The condition also requires byte_182 & 0x20 (device annotation bit set), which is always true after |= 0x61.
Category 4: Template Constraints
| Error | Severity | Check | Message |
|---|---|---|---|
| -- | 7 | Pack parameter is not last template parameter | Pack template parameter must be the last template parameter for a variadic __global__ function template |
| -- | 7 | Multiple pack parameters | Multiple pack parameters are not allowed for a variadic __global__ function template |
These checks are performed during template declaration processing in decls.c, not in the apply handler. They constrain variadic __global__ function templates: CUDA requires that pack parameters appear last (so the runtime can enumerate kernel arguments), and only a single pack is permitted (the CUDA launch infrastructure cannot handle multiple parameter packs).
Category 5: Redeclaration
| Error | Severity | Check | Message |
|---|---|---|---|
| -- | 7 | Previously __global__, now no execution space | a __global__ function(%no1) redeclared without __global__ |
| -- | 7 | Previously __global__, now __host__ | a __global__ function(%no1) redeclared with __host__ |
| -- | 7 | Previously __global__, now __device__ | a __global__ function(%no1) redeclared with __device__ |
| -- | 7 | Previously __global__, now __host__ __device__ | a __global__ function(%no1) redeclared with __host__ __device__ |
These four error variants are symmetrical with the reverse direction:
a __device__ function(%no1) redeclared with __global__a __host__ function(%no1) redeclared with __global__a __host__ __device__ function(%no1) redeclared with __global__
Redeclaration checks occur during declaration merging in class_decl.c. When a function is redeclared and the execution space of the new declaration does not match the original, cudafe++ emits one of these errors. The %no1 format specifier inserts the function name. These checks run independently of the apply_nv_global_attr handler -- they operate on the merged entity after both attribute sets have been processed.
Category 6: Constexpr Lambda Linkage
| Error | Severity | Check | Message |
|---|---|---|---|
| 3469 | 5 | (qword_184 & 0x800001000000) == 0x800000000000 | __global__ on constexpr lambda with wrong linkage |
This is the first check in the apply handler and the only one that causes early return. The 48-bit field at entity+184 encodes template and linkage properties. Bit 47 (0x800000000000) indicates internal linkage or a similar constraint, while bit 24 (0x000001000000) indicates a local entity. When bit 47 is set but bit 24 is clear, the entity is a constexpr lambda that cannot legally receive __global__. The handler calls sub_6BC6B0 (get_entity_display_name) to format the entity name for the diagnostic message, then returns without setting the bitmask.
Category 7: Post-Validation (sub_6BC890)
These checks run after all attributes on a declaration have been applied, in the nv_validate_cuda_attributes function:
| Error | Severity | Check | Message |
|---|---|---|---|
| 3702 | 7 | Parameter with rvalue reference flag (bit 1 at param+32) | a __global__ function cannot have a parameter with rvalue reference type |
| 3661 | 7 | __nv_register_params__ on __global__ | __nv_register_params__ is not allowed on a __global__ function |
| 3534 | 7 | __launch_bounds__ on non-__global__ | %s attribute is not allowed on a non-__global__ function |
| 3707 | 7 | maxBlocksPerCluster < cluster product | total number of blocks in cluster computed from %s exceeds __launch_bounds__ specified limit |
| 3715 | 7 | __maxnreg__ on non-__global__ | __maxnreg__ is not allowed on a non-__global__ function |
| 3719 | 7 | Both __launch_bounds__ and __maxnreg__ | __launch_bounds__ and __maxnreg__ may not be used on the same declaration |
| 3695 | 4 | __global__ without __launch_bounds__ | no __launch_bounds__ specified for __global__ function (warning) |
Error 3695 is a severity-4 diagnostic (informational warning). It fires when a __global__ function has no associated launch configuration, encouraging developers to specify __launch_bounds__ for optimal register allocation. This is the only constraint that is a soft advisory rather than a hard or standard error.
Entity Node Field Reference
The apply handler reads and writes specific fields within the entity node. Complete field semantics:
| Offset | Size | Field Name | Role in __global__ Validation |
|---|---|---|---|
+81 | 1 byte | local_flags | Bit 2 (0x04): function is local (block-scope). Checked for 3688 and as exemption for 3507. |
+144 | 8 bytes | type_chain | Pointer to return type. Followed through kind==12 cv-qualifier wrappers. |
+152 | 8 bytes | prototype | Function prototype pointer. At prototype+16: flags (bit 0 = ellipsis). At prototype+56: exception spec pointer. At prototype+0: parameter list head (double deref for first param). |
+166 | 1 byte | operator_kind | Value 5 = operator(). Checked for 3644. |
+176 | 1 byte | member_flags | Bit 7 (0x80, checked as signed char < 0): static member function. Checked for 3507. |
+179 | 1 byte | constexpr_flags | Bit 4 (0x10): function is constexpr. Guards 3505/3506 check (skipped if constexpr). |
+182 | 1 byte | execution_space | The primary execution space bitfield. |= 0x61 sets global kernel. Read for conflict checks (0x60, 0x10 masks). |
+183 | 1 byte | extended_cuda | Bit 3 (0x08): __nv_register_params__. Checked in post-validation. Bit 6 (0x40): __cluster_dims__ set. |
+184 | 8 bytes | linkage_template | 48-bit field encoding template/linkage flags. Only lower 48 bits used; mask 0x800001000000 checks constexpr lambda linkage. |
+191 | 1 byte | lambda_flags | Bit 0 (0x01): entity is a lambda. Routes to 3506 instead of 3505 for void-return check. |
+256 | 8 bytes | launch_config | Pointer to launch configuration struct (56 bytes). NULL if no launch attributes applied. Read in post-validation. |
The 0x61 Bitmask
The OR mask 0x61 sets three bits in the execution space byte:
0x61 = 0b01100001
bit 0 (0x01): device_capable -- function can run on device
bit 5 (0x20): device_annotation -- has explicit device-side annotation
bit 6 (0x40): global_kernel -- function is a __global__ kernel
Bit 0 is shared with __device__ (0x23) and __host__ (0x15). It serves as a "has CUDA annotation" predicate -- any entity with bit 0 set has been explicitly annotated with at least one execution space keyword. This enables fast if (byte_182 & 0x01) checks throughout the codebase.
Bit 5 is shared with __device__. A __global__ function is considered device-annotated because kernel code executes on the GPU.
Bit 6 is unique to __global__. The mask byte_182 & 0x40 is the canonical predicate for "is this a kernel function?" used in dozens of locations throughout the binary.
HD Combined Flag (0x80)
After setting 0x61, the handler checks whether bit 6 (0x40, global kernel) is now set. If so, it ORs 0x80 into the byte. This bit means "combined host+device" and is set as a secondary effect. The logic at the end of the function:
if (a2->byte_182 & 0x40) // just set via |= 0x61
a2->byte_182 |= 0x80; // always true after apply
This means every __global__ function ends up with byte_182 & 0x80 set, which marks it as "combined" in the execution space classification. This is semantically correct: a kernel has both a host-side stub (for launching) and device-side code (for execution).
Parameter Iteration for grid_constant
The final section of the apply handler iterates the function's parameter list to check for parameters that should be annotated __grid_constant__. This check only runs when attr_node->flags bit 0 (a1+11 & 0x01) is set, indicating the attribute application context includes parameter-level processing.
The iteration follows this structure:
// Navigate to function prototype
type_t* proto_type = entity->type_chain; // +144
while (proto_type->kind == 12) // skip cv-qualifiers
proto_type = proto_type->referenced; // +144
// Get parameter list head (double dereference)
param_t** param_list = proto_type->prototype->param_head; // proto+152 -> deref
param_t* param = *param_list; // deref again
for (; param != NULL; param = param->next) {
// Navigate to unqualified parameter type
type_t* ptype = param[1]; // param->type (offset 8)
while (ptype->kind == 12)
ptype = ptype->referenced;
// sub_7A6B60: checks byte+133 bit 5 (0x20) -- "has __grid_constant__"
bool has_gc = (ptype->byte_133 & 0x20) != 0;
if (!has_gc && dword_126C5C4 == -1) {
// Scope table lookup
int64_t scope = qword_126C5E8 + 784 * dword_126C5E4;
uint8_t scope_flags = scope->byte_6;
uint8_t scope_kind = scope->byte_4;
// Skip if scope has qualifier flags or is a cv-qualified scope
if ((scope_flags & 0x06) == 0 && scope_kind != 12) {
// Re-navigate to unqualified type
type_t* ptype2 = param[1];
while (ptype2->kind == 12)
ptype2 = ptype2->referenced;
// Check for default initializer
if (ptype2->qword_120 == 0)
emit_error(3669, &saved_source_loc);
}
}
}
The scope table lookup uses a 784-byte scope structure (at qword_126C5E8 indexed by dword_126C5E4) to determine whether the current context is device-side. The dword_126C5C4 == -1 check verifies we are in device compilation mode. This entire parameter iteration is a device-side warning mechanism: it alerts developers when a kernel parameter lacks a default initializer in a context where __grid_constant__ would be appropriate.
Post-Declaration Validation (sub_6BC890)
After all attributes on a declaration are applied, nv_validate_cuda_attributes (sub_6BC890, 161 lines) performs cross-attribute consistency checks. For __global__ functions, this function enforces:
Rvalue Reference Parameters (3702)
// Walk parameter list
type_t* ret = entity->type_chain;
while (ret->kind == 12)
ret = ret->referenced;
param_t* param = **((param_t***)ret + 19); // proto -> param list
while (param) {
if (param->byte_32 & 0x02) // rvalue reference flag
emit_error(3702, source_loc);
param = param->next;
}
This check scans all parameters for the rvalue reference flag (bit 1 at parameter node offset +32). Kernel functions cannot accept rvalue references because kernel launch involves copying arguments through the CUDA runtime, which does not support move semantics across the host-device boundary.
nv_register_params Conflict (3661)
if (entity->byte_183 & 0x08) { // __nv_register_params__ set
if (entity->byte_182 & 0x40)
emit_error(3661, ..., "__global__");
else if ((entity->byte_182 & 0x30) == 0x20)
emit_error(3661, ..., "__host__");
}
The __nv_register_params__ attribute (bit 3 of byte+183) is incompatible with __global__ because kernel parameter passing uses a fixed ABI that cannot be overridden.
Launch Configuration Without global (3534)
launch_config_t* lc = entity->launch_config; // +256
if (lc && !(entity->byte_182 & 0x40)) {
if (lc->maxThreadsPerBlock || lc->minBlocksPerMultiprocessor)
emit_error(3534, ..., "__launch_bounds__");
}
The __launch_bounds__, __cluster_dims__, and __block_size__ attributes require __global__. If a non-kernel function has any of these, error 3534 fires.
Cluster Dimension Product Check (3707)
if (lc->cluster_dim_x > 0 && lc->maxBlocksPerCluster > 0) {
uint64_t product = lc->cluster_dim_x * lc->cluster_dim_y * lc->cluster_dim_z;
if (lc->maxBlocksPerCluster < product)
emit_error(3707, ...);
}
launch_bounds and maxnreg Conflict (3719)
if (lc->maxThreadsPerBlock && lc->maxnreg >= 0)
emit_error(3719, ..., "__launch_bounds__ and __maxnreg__");
These two attributes provide contradictory register pressure hints and cannot coexist.
Missing launch_bounds Warning (3695)
if ((entity->byte_182 & 0x40) &&
(!lc || (!lc->maxThreadsPerBlock && !lc->minBlocksPerMultiprocessor)))
emit_warning(3695);
Severity 4 (advisory). Encourages developers to annotate kernels with __launch_bounds__ for optimal register allocation.
Execution Space Conflict Matrix
When __global__ is applied to a function that already has an execution space annotation, the handler checks for conflicts using two conditions:
// Condition 1: already __device__ only (without relaxed mode)
if (!dword_106BFF0 && (byte_182 & 0x60) == 0x20)
error(3481);
// Condition 2: already __host__ explicit
if (byte_182 & 0x10)
error(3481);
Current byte_182 | Applying __global__ | (byte & 0x60) == 0x20 | byte & 0x10 | Result |
|---|---|---|---|---|
0x00 (none) | |= 0x61 -> 0x61 | false | false | accepted |
0x23 (__device__) | true | false | error 3481 (unless relaxed) | |
0x15 (__host__) | false | true | error 3481 | |
0x37 (__host__ __device__) | false | true | error 3481 | |
0x61 (__global__) | true | false | error 3481 (unless relaxed) -- idempotent bitmask |
In relaxed mode (dword_106BFF0 != 0), the first condition is suppressed, allowing __device__ + __global__ combinations. The second condition (explicit __host__) is never relaxed.
Helper Functions
| Address | Identity | Lines | Purpose |
|---|---|---|---|
sub_6BC6B0 | get_entity_display_name | 49 | Formats entity name for diagnostic messages. Handles demangling, strips leading ::. |
sub_7A68F0 | skip_typedefs | 19 | Follows type chain through kind==12 wrappers while byte_161 & 0x7F == 0. |
sub_7A6E90 | is_void_type | 16 | Follows type chain through kind==12, returns kind == 1. |
sub_7A6B60 | has_grid_constant_flag | 9 | Follows type chain through kind==12, returns byte_133 & 0x20. |
sub_4F7510 | emit_error_with_names | 66 | Emits error with two string arguments (attribute name + entity name). |
sub_4F8DB0 | emit_warning_with_name | 38 | Emits warning (severity 5) with one string argument. |
sub_4F8200 | emit_error_basic | 10 | Emits error with severity + code + source location. |
sub_4F81B0 | emit_error_minimal | 10 | Emits error (severity 8) with code + source location. |
sub_4F8490 | emit_error_with_extra | 38 | Emits error with one supplementary argument. |
Additional global Constraints (Outside Apply Handler)
Beyond the apply handler and post-validation, several other subsystems enforce __global__-specific rules. These checks occur during template instantiation, lambda processing, and declaration merging:
Template Argument Type Restrictions
CUDA restricts which types can appear as template arguments in __global__ function template instantiations:
- Host-local types (defined inside a
__host__function) cannot be used - Private/protected class members cannot be used (unless the class is local to a
__device__/__global__function) - Unnamed types cannot be used (unless local to a
__device__/__global__function) - Lambda closure types cannot be used (unless the lambda is defined in a
__device__/__global__function, or is an extended lambda with--extended-lambda) - Texture/surface variables cannot be used as non-type template arguments
- Private/protected template template arguments from class scope cannot be used
Static Global Template Stub
In whole-program compilation mode (-rdc=false) with -static-global-template-stub=true:
- Extern
__global__function templates are not supported __global__function template instantiations must have definitions in the current TU
Device-Side Restrictions
Functions marked __global__ (or __device__) are subject to additional restrictions during semantic analysis:
address of labelextension is not supported- ASM operands may specify only one constraint letter
- Certain ASM constraint letters are forbidden
- Texture/surface variables cannot have their address taken or be indirected
- Anonymous union member variables at global/namespace scope cannot be directly accessed
- Function-scope static variables require a memory space specifier
- Dynamic initialization of function-scope static variables is not supported
Function Map
| Address | Identity | Lines | Source File |
|---|---|---|---|
sub_40E1F0 | apply_nv_global_attr (variant 1) | 89 | attribute.c |
sub_40E7F0 | apply_nv_global_attr (variant 2) | 86 | attribute.c |
sub_6BC890 | nv_validate_cuda_attributes | 161 | nv_transforms.c |
sub_6BC6B0 | get_entity_display_name | 49 | nv_transforms.c |
sub_7A68F0 | skip_typedefs | 19 | types.c |
sub_7A6E90 | is_void_type | 16 | types.c |
sub_7A6B60 | has_grid_constant_flag | 9 | types.c |
sub_4F7510 | emit_error_with_names | 66 | error.c |
sub_4F8DB0 | emit_warning_with_name | 38 | error.c |
sub_4F8200 | emit_error_basic | 10 | error.c |
sub_4F81B0 | emit_error_minimal | 10 | error.c |
sub_4F8490 | emit_error_with_extra | 38 | error.c |
sub_413240 | apply_one_attribute (dispatch) | 585 | attribute.c |
Global Variables
| Global | Address | Purpose |
|---|---|---|
dword_106BFF0 | 0x106BFF0 | Relaxed mode flag. When set, suppresses __device__ + __global__ conflict (3481). |
qword_126EB70 | 0x126EB70 | Pointer to the entity node for main(). Compared during 3538 check. |
dword_126C5C4 | 0x126C5C4 | Scope index sentinel (-1 = device compilation mode). Guards 3669 parameter check. |
dword_126C5E4 | 0x126C5E4 | Current scope table index. |
qword_126C5E8 | 0x126C5E8 | Scope table base pointer. Each entry is 784 bytes. |
Cross-References
- Execution Spaces -- bitfield layout, conflict matrix, virtual override checking
- Attribute System Overview -- dispatch table, attribute node structure, application pipeline
- grid_constant -- the parameter attribute that interacts with the 3669 check
- Launch Configuration Attributes --
__launch_bounds__,__cluster_dims__,__block_size__(post-validation errors 3534, 3707, 3715, 3719, 3695) - Entity Node Layout -- full byte map with all CUDA fields
- Kernel Stubs -- host-side stub generation triggered by
byte_182 & 0x40 - CUDA Template Restrictions -- template argument type restrictions for
__global__instantiations - Diagnostics Overview -- error emission functions and severity levels