Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

EDG 6.6 Overview

cudafe++ is built on top of Edison Design Group's (EDG) commercial C++ frontend, version 6.6. EDG provides the complete C++ language implementation -- lexer, preprocessor, parser, semantic analysis, type system, template instantiation, overload resolution, constant evaluation, and Itanium ABI name mangling. NVIDIA licenses this frontend and compiles it from source with CUDA-specific modifications injected at three distinct integration levels: a dedicated NVIDIA source file (nv_transforms.c), surgical modifications to EDG source files that call into NVIDIA headers, and a large layer of CUDA property-query leaf functions that permeate every compilation phase.

The build path embedded in the binary is:

/dvs/p4/build/sw/rel/gpgpu/toolkit/r13.0/compiler/drivers/compiler/edg/EDG_6.6/src/

Source Tree

The binary contains debug path references to 52 .c files and 13 .h files. Together these constitute the entire EDG frontend plus NVIDIA's single dedicated source file.

Source Files (.c)

#FilePipeline role
1attribute.cC++11/GNU/CUDA attribute parsing and validation
2class_decl.cClass/struct/union declaration processing, lambda scanning
3cmd_line.cCommand-line argument parsing (276 flags)
4const_ints.cCompile-time integer constant evaluation
5cp_gen_be.cBackend -- .int.c code generation, source sequence walking
6debug.cDebug output and IL dump infrastructure
7decl_inits.cDeclaration initializer processing
8decl_spec.cDeclaration specifier parsing (storage class, type qualifiers)
9declarator.cDeclarator parsing (pointers, arrays, function signatures)
10decls.cGeneral declaration processing
11disambig.cSyntactic disambiguation (expression vs. declaration)
12error.cDiagnostic message formatting and emission (3,795 messages)
13expr.cExpression parsing and semantic analysis
14exprutil.cExpression utility functions (coercion, evaluation)
15extasm.cExtended inline assembly parsing
16fe_init.cFrontend initialization (36 subsystem init routines)
17fe_wrapup.cFrontend finalization (5-pass wrapup sequence)
18float_pt.cFloating-point literal parsing
19floating.cIEEE 754 constant folding (arbitrary precision)
20folding.cGeneral constant folding
21func_def.cFunction definition processing
22host_envir.cHost environment interface (file I/O, exit, signals)
23il.cIL node creation, linking, and management
24il_alloc.cIL arena allocator (region-based, 64KB blocks)
25il_to_str.cIL-to-string conversion for debug display
26il_walk.cIL tree walking with 5 callback functions
27interpret.cConstexpr interpreter (compile-time evaluation engine)
28layout.cStruct/class memory layout computation
29lexical.cLexer / tokenizer (357 token kinds)
30literals.cString and numeric literal processing
31lookup.cName lookup (unqualified, qualified, ADL)
32lower_name.cItanium ABI name mangling
33macro.cPreprocessor macro expansion
34mem_manage.cInternal memory management (arena allocator, tracking)
35modules.cC++20 module support (mostly stubs in CUDA build)
36nv_transforms.cNVIDIA-authored -- CUDA AST transforms, lambda wrappers
37overload.cC++ overload resolution
38pch.cPrecompiled header support
39pragma.cPragma processing (43 pragma kinds)
40preproc.cPreprocessor directives (#include, #ifdef, etc.)
41scope_stk.cScope stack management
42src_seq.cSource sequence (declaration ordering for emission)
43statements.cStatement parsing and semantic analysis
44symbol_ref.cSymbol reference tracking
45symbol_tbl.cSymbol table operations (hash-based lookup)
46sys_predef.cSystem predefinitions (built-in types, macros)
47target.cTarget configuration (data model, ABI)
48templates.cTemplate instantiation, specialization, deduction
49trans_copy.cTranslation unit IL deep copy
50trans_corresp.cCross-TU type correspondence verification (RDC)
51trans_unit.cTranslation unit lifecycle (the main entry point)
52types.cC++ type system (22 type kinds, queries, construction)

Header Files (.h)

#FileContents
1decls.hDeclaration node structure definitions
2float_type.hFloating-point type descriptors
3il.hIL entry kind enums, node structure definitions
4lexical.hToken kind enums, lexer state
5mem_manage.hMemory allocator interface
6modules.hModule system declarations
7nv_transforms.hNVIDIA-authored -- CUDA transform API, called from EDG files
8overload.hOverload resolution structures
9scope_stk.hScope stack interface
10symbol_tbl.hSymbol table interface
11types.hType node structure, type kind enum
12util.hGeneral utility macros and inline functions
13walk_entry.hIL walking callback signatures

Code Breakdown

The binary contains approximately 6,300 identifiable functions in the EDG portion of the code:

CategoryFunctions% of binaryDescription
Attributed to source files~2,200~35%Matched to one of the 52 .c files via assert strings, source path references, or address-range mapping
Unmapped EDG functions~2,900~46%EDG code without source file attribution (inlined, optimized, or from headers)
C++ runtime / ABI~1,200~19%Itanium ABI runtime, exception handling, std:: library, operator new/delete

Top 10 Source Files by Function Count

RankFileFunctionsPrimary responsibility
1expr.c~195Expression parsing, operator semantics, implicit conversions
2il.c~185IL node creation, entry kind dispatch, node linking
3templates.c~172Template instantiation worklist, SFINAE, deduction
4exprutil.c~154Expression coercion, arithmetic conversions, lvalue analysis
5symbol_tbl.c~102Symbol table hash operations, scope chain walking
6overload.c~100Candidate set construction, ICS ranking, best viable function
7class_decl.c~90Class body parsing, member declarations, lambda scanning
8attribute.c~83Attribute parsing, CUDA attribute validation dispatch
9cp_gen_be.c~81Backend emission, .int.c generation, device stub writing
10scope_stk.c~72Scope push/pop, scope kind management, lookup context

Architecture: Classic Frontend Pipeline

EDG implements a textbook multi-pass compiler frontend. cudafe++ drives it in a single-threaded, sequential pipeline from main() at 0x408950:

  source.cu
     |
     v
  +-----------+     lexical.c, macro.c, preproc.c, literals.c
  |  Lexer /  |     357 token kinds, trigraph handling, raw string
  |  Preproc  |     adjustment, __CUDA_ARCH__ macro injection
  +-----------+
     |  token stream
     v
  +-----------+     expr.c, declarator.c, decl_spec.c, statements.c,
  |  Parser   |     class_decl.c, disambig.c, func_def.c, extasm.c
  |           |     Recursive-descent with disambiguation
  +-----------+
     |  parse tree
     v
  +-----------+     overload.c, exprutil.c, lookup.c, templates.c,
  | Semantic  |     types.c, attribute.c, const_ints.c, folding.c
  | Analysis  |     Type checking, overload resolution, template
  |           |     instantiation, constexpr evaluation
  +-----------+
     |  annotated AST
     v
  +-----------+     il.c, il_alloc.c, il_walk.c, scope_stk.c,
  |  IL Build |     symbol_tbl.c, src_seq.c, trans_unit.c
  |           |     Scope-linked graph of all declarations, types,
  |           |     expressions, statements, templates
  +-----------+
     |  IL graph
     v
  +-----------+     fe_wrapup.c, lower_name.c, trans_corresp.c
  |  Wrapup   |     5-pass finalization: dead code marking,
  |           |     name lowering, cross-TU correspondence (RDC)
  +-----------+
     |  finalized IL
     v
  +-----------+     cp_gen_be.c, nv_transforms.c, host_envir.c
  |  Backend  |     Walk source sequence, emit .int.c file,
  | Emission  |     inject CUDA stubs, lambda wrappers, host
  |           |     reference arrays, managed variable boilerplate
  +-----------+
     |
     v
  output.int.c

The process_translation_unit function (sub_7A40A0 in trans_unit.c) is the main entry point for compilation. It allocates a 424-byte TU descriptor, opens the source file, and orchestrates the parse-to-IL sequence. For the main compilation path, it calls:

  1. sub_586240 -- parse the translation unit (drives lexer + parser)
  2. sub_4E8A60 -- standard compilation finalization (IL completion)
  3. sub_588F90 -- fe_wrapup (5-pass IL finalization)
  4. sub_489000 -- backend entry (.int.c emission, "Back end time")

NVIDIA Modifications

NVIDIA's CUDA integration is organized in three layers, from most isolated to most pervasive.

Level 1: NVIDIA-Authored Source (nv_transforms.c + nv_transforms.h)

A single dedicated NVIDIA source file at address range 0x6BAE70--0x6BE4A0, containing approximately 34 functions in ~14KB of code. This file implements all CUDA-specific AST transformations:

FunctionAddressPurpose
nv_init_transforms0x6BAE70Zero all NVIDIA transform state at startup
emit_device_lambda_wrapper0x6BB790Generate __nv_dl_wrapper_t<Tag, F1..FN> partial specialization
emit_hdl_wrapper (non-mutable)0x6BBB10Generate __nv_hdl_wrapper_t<false, ...> type-erased wrapper
emit_hdl_wrapper (mutable)0x6BBEE0Same as above but operator() is non-const
emit_array_capture_helpers0x6BC290Generate __nv_lambda_array_wrapper for 2D-8D arrays
nv_validate_cuda_attributes0x6BC890Validate __launch_bounds__, __cluster_dims__, __maxnreg__
nv_reset_capture_bitmasks0x6BCBC0Zero device/host-device capture bitmasks per TU
nv_record_capture_count0x6BCBF0Set bit N in capture bitmap for wrapper generation
nv_emit_lambda_preamble0x6BCC20Master emitter: inject all __nv_* templates into compilation
nv_find_parent_lambda_function0x6BCDD0Walk scope chain for enclosing device/global function
nv_emit_host_reference_array0x6BCF80Generate .nvHRKE/.nvHRDI/etc. ELF section arrays
nv_get_full_nv_static_prefix0x6BE300Build scoped name + register entity in host ref arrays

The companion header nv_transforms.h declares the API surface that EDG source files call into. This is the primary NVIDIA integration point -- EDG code never calls nv_transforms.c functions directly; it calls through the header's declarations.

Key data structures managed by nv_transforms.c:

GlobalSizePurpose
unk_1286980128 bytes (1024 bits)Device lambda capture-count bitmap
unk_1286900128 bytes (1024 bits)Host-device lambda capture-count bitmap
qword_12868F0pointerEntity-to-closure ID hash table
qword_1286A00pointerCached anonymous namespace name (_GLOBAL__N_<file>)
qword_1286760pointerCached static name prefix string
unk_1286780--unk_12868C06 listsHost reference array symbol lists (one per section type)
dword_126E2704 bytesC++17 noexcept-in-type-system flag

Level 2: NVIDIA-Modified EDG Files

Three EDG source files contain direct calls into nv_transforms.h functions, making them the "NVIDIA-aware" EDG files:

cp_gen_be.c -- The backend code generator. When it encounters a type named __nv_lambda_preheader_injection during source sequence walking, it calls nv_emit_lambda_preamble (sub_6BCC20) to inject the entire __nv_* template library. It also calls NVIDIA functions for host reference array emission, managed variable boilerplate, and device stub generation.

class_decl.c -- The class/struct declaration processor. The scan_lambda function (sub_447930, 2113 lines) detects __host__/__device__ annotations on lambda expressions, validates CUDA-specific constraints (35+ error codes in range 3592--3690), and records capture counts in the bitmaps via nv_record_capture_count.

statements.c -- The statement parser. Calls NVIDIA transform functions for statement-level CUDA validation, such as checking that __syncthreads() is not called in divergent control flow within __global__ functions.

Level 3: CUDA Property Query Layer

The most pervasive integration layer consists of 104 small leaf functions clustered at addresses 0x7A6000--0x7AA000 (within types.c). These are type-system query functions that answer questions like "is this type a __device__ pointer?", "does this class have __shared__ storage?", "is this a kernel function type?".

Each follows a canonical pattern:

bool is_<property>_type(type_node *t) {
    while (t->kind == 12)       // 12 = tk_typedef
        t = t->referenced_type; // strip typedef layers
    return <check on underlying type>;
}

These 104 accessors account for 3,648 total call sites across the binary. The top callers by call-site count:

AddressCallersIdentityReturns
0x7A8A30407is_class_or_struct_or_union_typekind in {9, 10, 11}
0x7A9910389type_pointed_toptr->referenced_type (kind == 6)
0x7A9E70319get_cv_qualifiersaccumulated cv-qual bits (& 0x7F)
0x7A6B60299is_dependent_typebit 5 of byte +133
0x7A7630243is_object_pointer_typekind == 6 && !(bit 0 of +152)
0x7A8370221is_array_typekind == 8
0x7A7B30199is_member_pointer_or_refkind == 6 && (bit 0 of +152)
0x7A6AC0185is_reference_typekind == 7
0x7A8DC0169is_function_typekind == 14
0x7A6E90140is_void_typekind == 1

CUDA integration is pervasive because these tiny accessors are called from every phase of compilation -- the parser checks execution space during declaration, semantic analysis validates cross-space calls, the type system queries CUDA qualifiers during overload resolution, and the backend reads them during IL emission. There is no isolated "CUDA layer"; the CUDA awareness is distributed across the entire frontend through these leaf functions.

Type Kind Constants

The type query functions operate on a type_node structure (176 bytes, IL entry kind 6). The kind field at offset +132 encodes:

kindNameDescription
0tk_noneNull/invalid
1tk_voidvoid
2tk_integerAll integer types including bool, char, enums
3tk_floatfloat
4tk_doubledouble
5tk_long_doublelong double
6tk_pointerPointer types (object and member)
7tk_referenceLvalue reference (T&)
8tk_arrayArray types (T[], T[N])
9tk_structstruct
10tk_classclass
11tk_unionunion
12tk_typedefTypedef alias (stripped by all query functions)
13tk_pointer_to_memberPointer-to-member (T C::*)
14tk_functionFunction type
15tk_bitfieldBit-field
16tk_pack_expansionParameter pack expansion
17tk_pack_expansionAlternate pack expansion form
18tk_autoauto / decltype(auto) placeholder
19tk_rvalue_referenceRvalue reference (T&&)
20tk_nullptr_tstd::nullptr_t

Memory Management

EDG uses a custom region-based arena allocator implemented in mem_manage.c (address range 0x6B5E40--0x6BA230). Key characteristics:

  • Block size: 64KB (0x10000) per block
  • Region model: Multiple numbered regions (file-scope = region 1, per-function = region N)
  • Free list recycling: Freed blocks go to qword_1280730 for reuse before new allocation
  • Trim threshold: Blocks with more than 1,887 unused bytes are split; remainder goes to free list
  • Tracking: All allocations recorded for watermark monitoring (qword_1280718 = total, qword_1280710 = peak)
  • Dual mode: Malloc-based (mode 0) or mmap-based (mode 1), selected by dword_1280728 from CLI flag

Block structure (48+ bytes header per 64KB block):

OffsetTypeField
+0void*Next pointer (block chain)
+8void*Current allocation pointer
+16void*High-water mark within block
+24void*End-of-block pointer
+32int64Block total size (0 if sub-block)
+40byteTrimmed flag
+48--Start of usable data

The free_fe function (sub_6BA230, 533 lines) implements a hash-table-based deduplicating allocator for front-end object deallocation, using open addressing with linear probing.

C++20 Modules (Stubs)

The modules.c file (address range 0x7C0C60--0x7C2560) contains approximately 20 functions implementing the C++20 module import/export interface. CUDA does not support C++20 modules, so most functions are stubs that return 0:

  • has_pending_template_definition_from_module -- returns 0
  • has_pending_template_specializations_from_module -- returns 0
  • Seven additional stub functions at 0x7C2350--0x7C2410 -- all return 0

The non-stub functions handle the binary module interface file format (magic header {0x9A, 0x13, 0x37, 0x7D}) and basic module name matching, likely preserved from the EDG baseline for future CUDA module support.

Cross-TU Correspondence (RDC Mode)

When compiling with Relocatable Device Code (--rdc), multiple translation units are processed sequentially. The trans_corresp.c file (address range 0x7A00D0--0x7A38A0) implements structural equivalence checking between types from different TUs:

  • verify_class_type_correspondence (sub_7A00D0, 703 lines) -- Deep comparison of class types: base classes, friend declarations, member functions, nested types, template parameters
  • verify_enum_type_correspondence (sub_7A0E10) -- Enum underlying type and enumerator list comparison
  • verify_function_type_correspondence (sub_7A1230) -- Parameter list and return type comparison
  • set_type_correspondence (sub_7A1460) -- Links two corresponding types across TUs

The trans_unit.c file manages TU lifecycle with a stack-based model:

GlobalPurpose
qword_106BA10Current translation unit pointer
qword_106B9F0Primary (first) translation unit
qword_106BA18TU stack top
dword_106B9E8TU stack depth (excluding primary)

process_translation_unit (sub_7A40A0) allocates a 424-byte TU descriptor and drives the parse-to-completion sequence. switch_translation_unit (sub_7A3D60) saves/restores per-TU state (registered variables, scope stack, file scope) when switching between TUs during RDC compilation.

Cross-References