Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Virtual Override Execution Space Matrix

When a derived class overrides a base class virtual function in CUDA, the execution spaces of both functions must be compatible. A __device__ virtual cannot be overridden by a __host__ function, a __host__ virtual cannot be overridden by a __device__ function, and so on. cudafe++ enforces these rules inside record_virtual_function_override (sub_432280, 437 lines, class_decl.c), which runs each time the EDG front-end registers a virtual override during class body scanning. The function performs three tasks: (1) propagate the base class's execution space obligations onto the derived function, (2) detect illegal mismatches and emit one of six dedicated error messages (3542--3547), and (3) fall through to standard EDG override recording (covariant returns, [[nodiscard]], override/final, requires-clause checks).

This page documents the override checking logic at reimplementation-grade depth: reconstructed pseudocode from the decompiled binary, a complete compatibility matrix, the six error messages with their diagnostic tags, and the relaxed-mode flag that softens certain checks.

Key Facts

PropertyValue
Binary functionsub_432280 (record_virtual_function_override, 437 lines)
Source fileclass_decl.c
Parametersa1=derivation_info, a2=overriding_sym, a3=overridden_sym, a4=base_class_info, a5=covariant_return_adjustment
Entity field readbyte +182 (execution space bitfield) on both overridden and overriding entities
Classification maskbyte & 0x30 -- two-bit extraction: 0x00=implicit host, 0x10=explicit host, 0x20=device, 0x30=HD
Propagation bits0x10 (host_explicit), 0x20 (device_annotation)
Attribute lookupsub_5CEE70 with kind 87 (__device__) and 86 (__host__)
Error emissionsub_4F4F10 with severity 8 (hard error)
Relaxed mode flagdword_106BFF0 (relaxed_attribute_mode)
Implicitly-HD testbyte +177 & 0x10 on entity -- constexpr / __forceinline__ bypass
Override-involved markbyte +176 |= 0x02 on overriding entity
Assertion guardnv_is_device_only_routine from nv_transforms.h:367

Why Virtual Functions Need Execution Space Checks

Standard C++ imposes no concept of execution space on virtual functions. CUDA introduces three execution spaces (__host__, __device__, __host__ __device__) and one launch-only space (__global__). When a virtual function in a base class is declared with one execution space, every override in every derived class must be callable in the same space. If the base declares a __device__ virtual, calling it through a base pointer on the GPU must dispatch to the derived override -- which is only possible if the override is also __device__ (or __host__ __device__).

__global__ functions cannot be virtual at all (error 3505/3506 prevents this at the attribute application stage), so the override matrix only covers three spaces: __host__, __device__, and __host__ __device__. An unannotated function counts as implicit __host__.

Function Entry: Mark and Resolve Entities

The function begins by resolving the actual entity nodes from the symbol table entries:

// sub_432280 entry (lines 60-69 of decompiled output)
//
// a2 = overriding_sym (symbol table entry for the derived-class function)
// a3 = overridden_sym (symbol table entry for the base-class function)
//
// v10 = entity of overridden function:  *(overridden_sym + 88)
// v11 = entity of overriding function:  *(*(overriding_sym) + 88)
//
// The entity node at offset +88 is the "associated routine entity" --
// the actual function representation containing execution space bits.

int64_t overridden_entity = *(int64_t*)(overridden_sym + 88);   // v10
int64_t overriding_entity = *(int64_t*)(*(int64_t*)overriding_sym + 88);  // v11

// Mark the overriding entity as "involved in an override"
*(uint8_t*)(overriding_entity + 176) |= 0x02;

The +176 |= 0x02 flag marks the derived function as "override-involved." This flag is consumed downstream by the exception specification resolver and other class completion logic.

Phase 1: Implicitly-HD Fast Path and Execution Space Propagation

The first branch tests byte +177 & 0x10 on the overriding entity. This bit indicates the function is implicitly __host__ __device__ -- set for constexpr functions (implicitly HD since CUDA 7.5) and __forceinline__ functions. When this bit is set, the override is exempt from mismatch checking, but execution space propagation still occurs.

// Phase 1: implicitly-HD check and propagation (lines 70-94)
void check_and_propagate(int64_t overriding_entity, int64_t overridden_entity) {

    if (overriding_entity->byte_177 & 0x10) {
        // Overriding function is implicitly HD (constexpr / __forceinline__)
        //
        // Skip mismatch errors entirely -- an implicitly-HD function is
        // compatible with any base execution space.  But we must still
        // propagate the base's space obligations onto the derived entity
        // so that downstream passes (IL marking, code generation) know
        // what to emit.

        if (!(overridden_entity->byte_177 & 0x10)) {
            // Overridden function is NOT implicitly HD -- it has an explicit
            // execution space.  We need to propagate that space.
            //
            // Guard: skip propagation for constexpr lambdas with internal
            // linkage but no override flag (a degenerate case).
            if ((overridden_entity->qword_184 & 0x800001000000) == 0x800000000000
                && !(overridden_entity->byte_176 & 0x02)) {
                // Degenerate case -- skip propagation
                goto done_nvidia_checks;
            }

            uint8_t base_es = overridden_entity->byte_182;

            // Propagate __host__ obligation:
            // If the base is NOT device-only (i.e., base is host, HD, or
            // unannotated), the derived function inherits the host obligation.
            if ((base_es & 0x30) != 0x20) {
                overriding_entity->byte_182 |= 0x10;   // set host_explicit
            }

            // Propagate __device__ obligation:
            // If the base has the device_annotation bit set, the derived
            // function inherits the device obligation.
            if (base_es & 0x20) {
                overriding_entity->byte_182 |= 0x20;   // set device_annotation
            }
        }

        goto done_nvidia_checks;
    }

    // ... Phase 2 continues below
}

Why Propagation Matters

Propagation ensures that a derived class inherits its base class's execution space obligations even when the derived function is implicitly HD. Consider:

struct Base {
    __device__ virtual void f();        // byte_182 & 0x30 == 0x20
};

struct Derived : Base {
    constexpr void f() override;        // byte_177 & 0x10 set (implicitly HD)
};

Without propagation, Derived::f would have byte_182 == 0x00 (no explicit annotation). The device-side IL pass would skip it, and a virtual call base_ptr->f() on the GPU would dispatch to a function never compiled for the device. Propagation sets byte_182 |= 0x20 (device_annotation), ensuring the function is included in device IL.

The propagation follows strict rules:

Base byte_182 & 0x30Propagated to overriding entity
0x00 (implicit host)|= 0x10 (host_explicit)
0x10 (explicit host)|= 0x10 (host_explicit)
0x20 (device)|= 0x20 (device_annotation)
0x30 (HD)|= 0x10 then |= 0x20 (both)

Phase 2: Explicit Annotation Mismatch Detection

When the overriding function is NOT implicitly HD (byte_177 & 0x10 == 0), the checker must verify that the derived function's explicit execution space matches the base. It does this by querying the attribute lists on the overriding symbol for __device__ (kind 87) and __host__ (kind 86) attributes using sub_5CEE70.

The overriding symbol has two attribute list pointers: offset +184 (primary attributes) and offset +200 (secondary/redeclaration attributes). Both are checked for each attribute kind.

Reconstructed Pseudocode

// Phase 2: explicit annotation mismatch detection (lines 96-188)
//
// At this point, overriding_entity->byte_177 & 0x10 == 0 (not implicitly HD).
// We must determine what execution space annotations the overriding function
// has, and compare against the overridden function's execution space.

void check_override_mismatch(
    int64_t overriding_sym,       // a2
    int64_t overriding_entity,    // v11
    int64_t overridden_entity,    // v10
    int64_t overridden_sym_list,  // v6 = a2+48 (location info for diagnostics)
    int64_t overridden_sym_arg,   // v8 = a3 (for diagnostics)
    int64_t base_sym              // v9 = *a2 (for diagnostics)
) {
    // -- Assertion: overridden entity must exist --
    if (!overridden_entity) {
        internal_error("nv_transforms.h", 367, "nv_is_device_only_routine");
    }

    // -- Extract overridden execution space --
    uint8_t base_es    = overridden_entity->byte_182;
    uint8_t mask_30    = base_es & 0x30;     // 0x00/0x10/0x20/0x30
    bool    base_no_device_annotation = (base_es & 0x20) == 0;  // v56
    bool    base_is_hd = (mask_30 == 0x30);  // v58
    uint8_t base_device_bit = base_es & 0x20;  // v55

    // -- Check overriding function for __device__ attribute (kind 87) --
    bool has_device_attr = find_attribute(87, overriding_sym->attr_list_184)
                        || find_attribute(87, overriding_sym->attr_list_200);

    if (has_device_attr) {
        // Overriding function has __device__.
        // Now check if it also has __host__ (kind 86) -- making it HD.

        bool has_host_attr = find_attribute(86, overriding_sym->attr_list_184)
                          || find_attribute(86, overriding_sym->attr_list_200);

        if (has_host_attr) {
            // --- Overriding is __host__ __device__ ---
            if (base_device_bit) {
                // Base has device_annotation (bit 5 set).
                // If base is device-only (mask_30 == 0x20), error 3544.
                if (mask_30 == 0x20) {
                    emit_error(8, 3544, location, overridden, base);
                }
                // If base is HD (mask_30 == 0x30), it's legal -- no error.
                // If base has device_bit but mask_30 != 0x20 and != 0x30,
                // that can't happen (bit 5 set implies mask_30 is 0x20 or 0x30).
            } else {
                // Base has no device_annotation -- base is host or implicit host.
                emit_error(8, 3543, location, overridden, base);
            }
        } else {
            // --- Overriding is __device__ only ---
            // Fall through to LABEL_83 logic.
            goto device_only_check;
        }
    } else {
        // Overriding function has NO __device__ attribute.
        // It's either explicit __host__ or implicit host (no annotation).

        if (dword_106BFF0) {
            // Relaxed mode: check if overriding has explicit __host__.
            bool has_host_attr = find_attribute(86, overriding_sym->attr_list_184)
                              || find_attribute(86, overriding_sym->attr_list_200);

            if (!has_host_attr) {
                // No explicit __host__ either -- implicit host.
                // In relaxed mode, an implicit-host override is treated like
                // a device-only override for certain base configurations.
                // Jump into the device-only path with modified conditions.
                goto device_only_check_relaxed;
            }
            // Explicit __host__ in relaxed mode: fall through to normal checks.
        }

        // --- Overriding is __host__ (explicit or implicit) ---
        if (mask_30 == 0x20) {
            // Base is __device__ only
            emit_error(8, 3545, location, overridden, base);
        } else if (mask_30 == 0x30) {
            // Base is __host__ __device__
            emit_error(8, 3546, location, overridden, base);
        }
        // else: base is host/implicit-host, same space -- no error.
        goto done_nvidia_checks;
    }

device_only_check:
    // Overriding is __device__ only (has __device__ but no __host__).
    // v39 = base_no_device_annotation (v56), v40 = 1 (always set entering here).
    {
        bool should_error = base_no_device_annotation;  // v39
        bool relaxed_extra = true;                      // v40

device_only_check_relaxed:
        // (relaxed mode entry: v39 = 0, a1 = v56 = base_no_device_annotation)

        if (dword_106BFF0) {
            // Relaxed mode: the error fires unconditionally when
            // base has no device annotation (base is host/implicit-host).
            // In strict mode, same condition applies.
            should_error = base_no_device_annotation;
            relaxed_extra = true;   // always true in relaxed
        }

        if (should_error) {
            // Base is host-only (no device_annotation) and override is device-only.
            emit_error(8, 3542, location, overridden, base);
        } else if (base_is_hd && relaxed_extra) {
            // Base is HD, override is device-only.
            // v40 (relaxed_extra) is always 1 from Entry A, so this
            // fires in both strict and relaxed modes for D-overrides-HD.
            emit_error(8, 3547, location, overridden, base);
        }
        // else: base is device-only too -- compatible, no error.
    }

done_nvidia_checks:
    // Continue to standard EDG override recording...
}

Decision Tree (Simplified)

overriding byte_177 & 0x10?
  YES (implicitly HD) --> propagate, skip mismatch check
  NO  --> extract base_es = overridden byte_182
          has __device__ attr on overriding?
            YES --> also has __host__ attr?
              YES (override=HD):
                base has device_annotation?
                  YES and mask_30==0x20 --> ERROR 3544
                  NO                    --> ERROR 3543
              NO (override=D-only):
                base has NO device_annotation? --> ERROR 3542
                base is HD?                    --> ERROR 3547
            NO (override=H or implicit-H):
              base mask_30==0x20 --> ERROR 3545
              base mask_30==0x30 --> ERROR 3546
              otherwise         --> legal (same space)

The Six Error Messages

Each mismatch produces one of six errors. All are emitted at severity 8 (hard error) and are individually suppressible by their diagnostic tag via --diag_suppress or #pragma nv_diag_suppress.

InternalDisplayDiagnostic TagMessage Template
354220085vfunc_incompat_exec_h_dexecution space mismatch: overridden entity (%n1) is a __host__ function, but overriding entity (%n2) is a __device__ function
354320086vfunc_incompat_exec_h_hdexecution space mismatch: overridden entity (%n1) is a __host__ function, but overriding entity (%n2) is a __host__ __device__ function
354420087vfunc_incompat_exec_d_hdexecution space mismatch: overridden entity (%n1) is a __device__ function, but overriding entity (%n2) is a __host__ __device__ function
354520088vfunc_incompat_exec_d_hexecution space mismatch: overridden entity (%n1) is a __device__ function, but overriding entity (%n2) is a __host__ function
354620089vfunc_incompat_exec_hd_hexecution space mismatch: overridden entity (%n1) is a __host__ __device__ function, but overriding entity (%n2) is a __host__ function
354720090vfunc_incompat_exec_hd_dexecution space mismatch: overridden entity (%n1) is a __host__ __device__ function, but overriding entity (%n2) is a __device__ function

The display number is computed as internal + 16543 (the standard CUDA error renumbering from construct_text_message). The tag naming convention is vfunc_incompat_exec_{overridden}_{overriding}.

The %n1 and %n2 fill-ins resolve to the entity display names of the base and derived functions respectively, including their full qualified names and parameter types.

Suppression Example

# Suppress by tag (preferred)
nvcc --diag_suppress=vfunc_incompat_exec_h_d file.cu

# Suppress by display number
nvcc --diag_suppress=20085 file.cu

# Suppress in source
#pragma nv_diag_suppress vfunc_incompat_exec_h_d

Complete Compatibility Matrix

This table shows every combination of base (overridden) and derived (overriding) execution space. "Implicit H" means the function has no execution space annotation (byte_182 & 0x30 == 0x00). Since implicit host and explicit __host__ are treated identically for override purposes (both lack the device_annotation bit and have mask_30 != 0x20), they share the same row/column behavior.

__global__ is excluded because __global__ functions cannot be virtual -- the attribute handler rejects __global__ on virtual functions before override checking ever runs.

The matrix is the same in both strict mode (dword_106BFF0 == 0) and relaxed mode (dword_106BFF0 == 1). The relaxed flag changes the code path used to reach the error decision but produces the same result for all input combinations.

Derived: H / implicit HDerived: DDerived: HDDerived: implicitly HD
Base: H / implicit Hlegalerror 3542error 3543legal + propagate |= 0x10
Base: Derror 3545legalerror 3544legal + propagate |= 0x20
Base: HDerror 3546error 3547legallegal + propagate |= 0x10, |= 0x20

Reading the matrix: each row is the base class virtual function's space; each column is the derived class override's space. "Legal" means no error is emitted and the override is recorded normally. "Legal + propagate" means the override is accepted AND the base's execution space bits are OR'd into the derived entity's byte_182.

The diagonal (same space in base and derived) is always legal. The last column (implicitly HD) is always legal because an implicitly HD function is compatible with every execution space -- the mismatch check is skipped entirely and only propagation runs.

Why Both Modes Produce the Same Matrix

Tracing the LABEL_83 code path with the two entry points reveals that dword_106BFF0 does NOT gate error 3547. In the critical device-only-override path (Entry A), v40 is set to 1 before reaching LABEL_83 regardless of the relaxed flag. The flag only changes the assignment to a1 and v40 via conditional moves (cmovz/cmovnz in the disassembly), but the net effect is identical for all input combinations:

LABEL_83 internals (decompiled, annotated):
  a2 = 3542;                          // tentative error
  if (!dword_106BFF0) a1 = v39;       // strict: a1 = v39
  if (dword_106BFF0) v40 = 1;         // relaxed: force v40 = 1
  // BUT v40 was already 1 from Entry A (line 134)
  if (a1) emit_error(3542);           // base has no device_annotation
  else if (v58 && v40) emit_error(3547);  // base is HD
  else skip;                          // base is D-only (compatible)

Entry A sets v39 = v56, v40 = 1, a1 = v56. In strict mode, a1 is overwritten to v39 (same value). In relaxed mode, a1 stays v56 (same value). Either way, a1 = v56 = (base has no device annotation). The v40 = 1 from Entry A is preserved. The result is identical.

The relaxed flag introduces a second entry point (Entry B) for overriding functions with no explicit annotation. In relaxed mode, such functions are routed through LABEL_83 with v39 = 0 and a1 = v56, producing the same device-only check logic. In strict mode, the same functions take the direct H/implicit-H path and produce errors 3545/3546 for device/HD bases. Both paths reach the same conclusions.

Relaxed Mode: The Unannotated Override Path

When dword_106BFF0 == 1 and the overriding function has no __device__ attribute, the checker takes an additional step before falling through to the H/implicit-H path. It queries the overriding symbol for explicit __host__ (kind 86). If __host__ IS found, the function is confirmed as explicit host and errors 3545/3546 apply normally. If __host__ is NOT found (truly unannotated), the function is reclassified through the device-only check path (LABEL_83). This reclassification does not change the error outcome -- an unannotated function overriding a host base still sees no error (both are host-space), and an unannotated function overriding a device or HD base still produces the appropriate error.

Propagation Details

When the overriding function is implicitly HD (byte_177 & 0x10), execution space is propagated from the base to the derived entity by OR-ing bits into byte_182:

// Propagation (direct from decompiled sub_432280, lines 77-91)
uint8_t base_es = overridden_entity->byte_182;

// If base is NOT device-only, derived inherits host obligation
if ((base_es & 0x30) != 0x20) {
    overriding_entity->byte_182 |= 0x10;   // host_explicit bit
    base_es = overridden_entity->byte_182;  // re-read (compiler artifact)
}

// If base has device_annotation, derived inherits device obligation
if (base_es & 0x20) {
    overriding_entity->byte_182 |= 0x20;   // device_annotation bit
}

The re-read of overridden_entity->byte_182 after setting 0x10 on the overriding entity is a compiler artifact (the decompiler shows it reading back from v10+182 into v22, but v10 is the overridden entity, so the value hasn't changed). The OR operations are on the overriding entity only.

Propagation Matrix

Base space (byte_182 & 0x30)Bits OR'd into overriding byte_182Net effect on overriding entity
0x00 (implicit H)|= 0x10Becomes explicit host (0x10)
0x10 (explicit H)|= 0x10Becomes explicit host (0x10)
0x20 (D only)|= 0x20Becomes device-annotated (0x20)
0x30 (HD)|= 0x10, then |= 0x20Becomes HD (0x30)

After propagation, the overriding entity's byte_182 accurately reflects the execution space obligations inherited from its base class. Downstream passes (device/host separation, IL marking, code generation) use this byte to determine whether the function needs device-side compilation, host-side compilation, or both.

Relaxed Mode (dword_106BFF0)

The global flag dword_106BFF0 (relaxed_attribute_mode, default 1 per CLI defaults) controls permissive handling of execution space annotations across the compiler. Its primary effects are on attribute application (allowing __device__ + __global__ coexistence) and cross-space call validation. For virtual override checking, its effect is narrower:

  1. Unannotated override reclassification. In relaxed mode, when the overriding function has neither __device__ nor __host__ attributes explicitly, the checker additionally queries the overriding symbol for __host__ (kind 86). If __host__ is NOT found, the checker treats the unannotated function as potentially device-compatible and routes through the device-only check path (LABEL_83). This can produce error 3542 (D overrides H) for an implicit-host function, which would otherwise only see errors 3545/3546.

  2. No error suppression for overrides. Unlike attribute application where relaxed mode suppresses error 3481, relaxed mode does NOT suppress any of the six override errors. All six fire at severity 8 in both modes. The flag dword_106BFF0 modulates the code path taken to reach the error decision, not the severity or suppression of the error itself.

Additional Override Checks (Non-CUDA)

After the CUDA execution space checks, sub_432280 continues with standard EDG override validation:

ErrorConditionMeaning
1788Base has [[nodiscard]], derived does notMissing [[nodiscard]] on override
1789Derived has [[nodiscard]], base does notExtraneous [[nodiscard]] on override
1850Overriding a final virtual functionOverride of final function
2935Derived has requires-clause, base does notRequires-clause mismatch
2936Base has requires-clause, derived does notRequires-clause mismatch

These are standard C++ checks unrelated to CUDA execution spaces.

Example: Override Interactions

// Example 1: Legal same-space override
struct Base {
    __device__ virtual void f();
};
struct Derived : Base {
    __device__ void f() override;     // Legal: D overrides D
};

// Example 2: Error 3542 -- D overrides H
struct Base2 {
    virtual void f();                 // Implicit __host__
};
struct Derived2 : Base2 {
    __device__ void f() override;     // ERROR 3542 (20085)
};
// error #20085-D: execution space mismatch: overridden entity (Base2::f)
//   is a __host__ function, but overriding entity (Derived2::f)
//   is a __device__ function

// Example 3: Error 3546 -- H overrides HD
struct Base3 {
    __host__ __device__ virtual void f();
};
struct Derived3 : Base3 {
    void f() override;                // ERROR 3546 (20089)
};
// error #20089-D: execution space mismatch: overridden entity (Base3::f)
//   is a __host__ __device__ function, but overriding entity (Derived3::f)
//   is a __host__ function

// Example 4: Legal constexpr override with propagation
struct Base4 {
    __device__ virtual int g();
};
struct Derived4 : Base4 {
    constexpr int g() override;       // Legal: implicitly HD, propagates |= 0x20
};
// Derived4::g now has byte_182 |= 0x20 (device_annotation)
// and is included in device IL compilation.

// Example 5: Error 3547 -- D overrides HD
struct Base5 {
    __host__ __device__ virtual void h();
};
struct Derived5 : Base5 {
    __device__ void h() override;     // ERROR 3547 (20090)
};

Function Map

AddressIdentityLinesSource
sub_432280record_virtual_function_override437class_decl.c
sub_5CEE70find_attribute (attribute list lookup by kind)~30attribute.c
sub_4F4F10emit_diag_with_entity_pair (severity, error, loc, base, derived)~100error.c
sub_4F2930internal_error (assertion failure)~20error.c
sub_41A6E0dump_override_entry (debug trace helper)~40class_decl.c
sub_41D010add_to_override_list~20class_decl.c
sub_5E20D0allocate_override_entry (40-byte node)~15mem.c
sub_432130resolve_indeterminate_exception_specification~60class_decl.c

Override Entry Structure

Each recorded override is stored as a 40-byte linked list node:

Override entry (40 bytes):
  +0x00 (0):   next pointer
  +0x08 (8):   base_class_symbol (entity in base class vtable)
  +0x10 (16):  derived_class_entity (overriding function entity)
  +0x18 (24):  flags (0 initially, set during processing)
  +0x20 (32):  covariant_return_adjustment (pointer or NULL)

The override list is managed via:

  • qword_E7FE98: list head (most recent entry)
  • qword_E7FEA0: free list head (recycled 40-byte entries)
  • qword_E7FE90: allocation counter

When debug tracing is enabled (dword_126EFCC > 3), the function prints "newly created: ", "existing entry: ", "after modification: ", and "removing: " to stderr via fwrite, followed by calls to sub_41A6E0 to dump the entry contents.

Cross-References