- Overview
- C++ dialect
- Abbreviations used in the code (AKA Carbon abbreviation decoder ring)
- .def files
- Index types
- ValueStore
- Template metaprogramming
- Local lambdas to reduce duplicate code
- Immediately invoked function expressions (IIFE)
- Declarations in conditions
- CRTP or "Curiously recurring template pattern"
- Multiple inheritance
- Defining constants usable in constexpr contexts
The toolchain implementation uses some implementation techniques that may not be commonly found in typical C++ code.
The toolchain implementation does not use some C++ features, following Google's C++ style guide:
Note that abbreviations are typically only used in code, not comments (except when referring to an entity from the code).
- Addr: "address"
- Arg: "argument"
- Decl: "declaration"
- Expr: "expression"
- SubExpr: "subexpression"
- Float: "floating point"
- Init: "initialization"
- Inst: "instruction"
- Int: "integer"
- Loc: "location"
- Param: "parameter"
- Paren: "parenthesis"
- Ref: "reference"
- Deref: "dereference"
- Subst: "substitute"
Phrase abbreviations (abbreviations for a whole phrase, where the individual words would not all be abbreviated on their own):
- InitRepr: "initializing representation"
- ObjectRepr: "object representation"
- SemIR: "semantics intermediate representation"
- ValueRepr: "value representation"
The Carbon toolchain uses a technique related to X-macros to generate code that operates over a collection of types, enumerators, or another similar list of names. This works as follows:
- A .def file is provided, that is intended to be repeatedly included by way of #include.
- The user of the .def defines a macro, with a name and a form specified by the .def file, for example #define CARBON_EACH_WIDGET(Name) Scope::Name,.
- A #include of the .def file expands to CARBON_EACH_WIDGET(Name1), CARBON_EACH_WIDGET(Name2), ... for each widget name, and then #undefs the CARBON_EACH_WIDGET macro.
For example:
enum Widgets {
#define CARBON_EACH_WIDGET(Name) Name,
#include "widgets.def"
};
... would expand to an enumeration definition with one enumerator per widget name.
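The same idea can be sketched in a single file. This is only an illustration: the list is inlined as a list macro rather than placed in a separate .def file, and every name here (EXAMPLE_WIDGET_LIST, EXAMPLE_EACH_WIDGET, Widget, WidgetNames, WidgetName) is hypothetical rather than taken from the toolchain.

```cpp
#include <array>
#include <string_view>

// Hypothetical widget list. In the toolchain this list would live in a
// separate .def file; it is inlined here to keep the sketch in one file.
#define EXAMPLE_WIDGET_LIST(X) \
  X(Button)                    \
  X(Slider)                    \
  X(Label)

// First expansion: one enumerator per widget name.
enum class Widget {
#define EXAMPLE_EACH_WIDGET(Name) Name,
  EXAMPLE_WIDGET_LIST(EXAMPLE_EACH_WIDGET)
#undef EXAMPLE_EACH_WIDGET
};

// Second expansion of the same list: a parallel table of names, in the same
// order as the enumerators.
static constexpr std::array<std::string_view, 3> WidgetNames = {
#define EXAMPLE_EACH_WIDGET(Name) #Name,
    EXAMPLE_WIDGET_LIST(EXAMPLE_EACH_WIDGET)
#undef EXAMPLE_EACH_WIDGET
};

// Looks up the name generated for a widget.
constexpr auto WidgetName(Widget widget) -> std::string_view {
  return WidgetNames[static_cast<std::size_t>(widget)];
}
```

Because both expansions come from one list, adding a widget updates the enumeration and the name table together.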
Most .def files will have a corresponding EnumBase
child class (if widgets.def has X-macros, widgets.h and widgets.cpp have
the EnumBase child class). These work similarly to an enum class, with the
addition of a name() function and << stream operator support. Many also have
further utility functions for information related to the enum value.
In code, these types and values can be used directly in a switch. They will
convert to an internal actual enum class for the switch, and receive
corresponding compiler safety checks that all enum values are handled.
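As a rough sketch of how a class can behave like an enum class in a switch, the following hypothetical WidgetKind wraps an internal enum and converts to it implicitly. It is not the toolchain's EnumBase; the names RawEnum, WidgetKind, and IsInteractive are assumptions made for illustration.

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical stand-in for an EnumBase child class.
class WidgetKind {
 public:
  // The internal enum that a switch actually operates on.
  enum class RawEnum : std::uint8_t { Button, Slider };

  static const WidgetKind Button;
  static const WidgetKind Slider;

  // The implicit conversion lets values be used directly in a switch.
  constexpr operator RawEnum() const { return value_; }

  constexpr auto name() const -> std::string_view {
    switch (value_) {
      case RawEnum::Button:
        return "Button";
      case RawEnum::Slider:
        return "Slider";
    }
    return "<invalid>";
  }

 private:
  constexpr explicit WidgetKind(RawEnum value) : value_(value) {}
  RawEnum value_;
};

constexpr WidgetKind WidgetKind::Button = WidgetKind(RawEnum::Button);
constexpr WidgetKind WidgetKind::Slider = WidgetKind(RawEnum::Slider);

// Both the condition and the case labels convert to RawEnum, so the compiler
// can warn when an enumerator is not handled.
inline auto IsInteractive(WidgetKind kind) -> bool {
  switch (kind) {
    case WidgetKind::Button:
    case WidgetKind::Slider:
      return true;
  }
  return false;
}
```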
Carbon makes frequent use of
IndexBase and IdBase. The IndexBase and
IdBase types are small wrappers around int32_t to provide a measure of
type-checking when passing around indices to vector-like storage types. The only
difference is that IndexBase supports all comparison operators, whereas
IdBase only supports equality comparison.
Variable naming will often have _id at the end to indicate that it corresponds
to an IdBase. This may include the full type, as in operand_inst_id being an
InstId for an operand.
A block is an array of ids. These will be indicated with either a _block
suffix or pluralization (for example, param_refs pluralizing refs).
The ref concept in a name means that there is an underlying instruction block,
but only a subset of instructions are present in the refs block. For example,
function parameters have a sequence, and also have a refs block with one entry
per parameter. The refs block allows parameters to be counted and accessed
directly, rather than through vector iteration.
Many of Carbon's data types are stored in a
ValueStore or related type with similar
semantics (sem_ir has several such classes).
ValueStore links an indexing type to a value type with vector-like storage.
The indices typically use IdBase.
ValueStore's APIs follow the shape of simple array access and mutation:
- Add takes a value and returns the index.
- Get takes an index and returns a reference to the value (possibly a constant reference).
- Other vector-like functionality, including size or Reserve.
Each ValueStore instance should be named after its value type. When taking a
return of one of these functions, it's common to use auto and rely on the name
of the storage type to imply the returned type.
Some name mirroring examples are:
- ints is a ValueStore<IntId, llvm::APInt>; values are integers.
- functions is a ValueStore<SemIR::FunctionId, SemIR::Function>; values are functions.
- string_literals is a ValueStore<StringLiteralValueId, llvm::StringRef>; values are string literals.
  - A reference is used in order to avoid string copies.
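The shape of this API can be sketched with a small vector-backed class. ExampleValueStore and ExampleNameId below are hypothetical simplifications, not the toolchain's ValueStore, and the id type is assumed to be a plain struct holding an index.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical id type for this sketch.
struct ExampleNameId {
  std::int32_t index;
};

// A simplified store that links an id type to a value type.
template <typename IdT, typename ValueT>
class ExampleValueStore {
 public:
  // Stores the value and returns the id that refers to it.
  auto Add(ValueT value) -> IdT {
    IdT id{static_cast<std::int32_t>(values_.size())};
    values_.push_back(std::move(value));
    return id;
  }

  // Returns a reference to the value for the given id.
  auto Get(IdT id) -> ValueT& { return values_[id.index]; }
  auto Get(IdT id) const -> const ValueT& { return values_[id.index]; }

  auto Reserve(std::size_t size) -> void { values_.reserve(size); }
  auto size() const -> std::size_t { return values_.size(); }

 private:
  std::vector<ValueT> values_;
};

// The store is named after its value type, so `auto` stays readable.
inline auto ExampleUse() -> std::string {
  ExampleValueStore<ExampleNameId, std::string> names;
  auto hello_id = names.Add("hello");
  names.Add("world");
  return names.Get(hello_id);
}
```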
There are also a number of wrappers around ValueStore that provide some
additional functionality and which are named with the Store suffix, such as
InstStore or CanonicalValueStore.
A fairly complete list of ValueStore (and ValueStore wrapper) uses should be
available on checking's Context class.
TODO: show example patterns
- InstLikeTypeInfo from toolchain/sem_ir/inst.h
- templated using
- std::declval
- decltype
- static_assert
- if constexpr
- template specialization, for example Inst::FromRaw<T> (maybe also type traits?)
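Until toolchain examples are filled in, here is a generic sketch of how several of the listed features combine. The types ExampleIntLiteral and ExampleStringLiteral, the alias ValueTypeOf, and the function DescribeLiteral are invented for illustration and do not come from the toolchain.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical types used only for this sketch.
struct ExampleIntLiteral {
  int value;
  auto Get() const -> int { return value; }
};
struct ExampleStringLiteral {
  std::string value;
  auto Get() const -> std::string { return value; }
};

// Templated using, decltype, and std::declval: names the type returned by
// T::Get() without constructing a T.
template <typename T>
using ValueTypeOf = decltype(std::declval<const T&>().Get());

// static_assert: checks the computed types at compile time.
static_assert(std::is_same_v<ValueTypeOf<ExampleIntLiteral>, int>);
static_assert(std::is_same_v<ValueTypeOf<ExampleStringLiteral>, std::string>);

// if constexpr: only the selected branch is instantiated for each T.
template <typename T>
auto DescribeLiteral(const T& literal) -> std::string {
  if constexpr (std::is_same_v<ValueTypeOf<T>, int>) {
    return "int:" + std::to_string(literal.Get());
  } else {
    return "string:" + literal.Get();
  }
}
```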
The toolchain uses a primitive form of struct reflection to operate generically
over the fields in a typed SemIR instruction. This is implemented in
common/struct_reflection.h, and the interface to the functionality is
StructReflection::AsTuple(your_struct), which converts the given struct into a
std::tuple containing the same fields in the same order.
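The conversion itself can be illustrated with structured bindings. This sketch is not the code in common/struct_reflection.h: the caller supplies the field count explicitly, and ExampleAsTuple and ExamplePoint are hypothetical names.

```cpp
#include <cstddef>
#include <tuple>

// Converts a struct with exactly FieldCount fields into a tuple holding
// copies of the same fields in the same order.
template <std::size_t FieldCount, typename T>
constexpr auto ExampleAsTuple(const T& value) {
  if constexpr (FieldCount == 1) {
    const auto& [a] = value;
    return std::tuple(a);
  } else if constexpr (FieldCount == 2) {
    const auto& [a, b] = value;
    return std::tuple(a, b);
  } else {
    static_assert(FieldCount == 3, "this sketch supports at most 3 fields");
    const auto& [a, b, c] = value;
    return std::tuple(a, b, c);
  }
}

// Hypothetical struct for demonstration.
struct ExamplePoint {
  int x;
  int y;
};
```

Once the fields are in a tuple, generic tuple utilities such as std::apply can operate over them.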
The presence of specific fields in a struct with a specified type is detected using the following idiom:
// HasField<T> is true if T has a member named `field` of type FieldType.
template <typename T> concept HasField = requires (T x) {
  { &T::field } -> std::same_as<FieldType T::*>;
};
See HasKindMemberAsField in toolchain/sem_ir/typed_insts.h for an example.
Sometimes code that would be repeated in a function is factored into a local variable containing a lambda:
auto common_code = [&](AType param1, AnotherType param2) {
  // code that would otherwise be repeated
  ...
};
if (something) {
  common_code(...);
}
if (something_else) {
  common_code(...);
}
Compared to defining a new function, this has the advantage of being able to be declared in context and access the local variables of the enclosing function.
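A concrete, compilable version of this pattern might look like the following. The function CollectDiagnostics and its messages are made up for the example.

```cpp
#include <string>
#include <vector>

inline auto CollectDiagnostics(bool missing_name, bool missing_type)
    -> std::vector<std::string> {
  std::vector<std::string> diagnostics;
  int count = 0;

  // The lambda captures the locals by reference, so the shared numbering
  // logic is written once and needs no extra parameters.
  auto add_diagnostic = [&](const std::string& message) {
    ++count;
    diagnostics.push_back(std::to_string(count) + ": " + message);
  };

  if (missing_name) {
    add_diagnostic("missing name");
  }
  if (missing_type) {
    add_diagnostic("missing type");
  }
  return diagnostics;
}
```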
Instead of creating a separate function with its own name that will be called once to produce the initial value for a variable, the function can be declared inline and then immediately called.
This can be used for complex initialization, as in:
// variable declaration
static const llvm::ArrayRef<std::byte> entropy_bytes =
// initializer starts with a lambda
[]() -> llvm::ArrayRef<std::byte> {
static llvm::SmallVector<std::byte> bytes;
// a bunch of code
// return the value to initialize the variable with
return bytes;
// finish defining the lambda, and then immediately invoke it
}();
It can also be used inside a CARBON_DCHECK to avoid computation that is only
needed in debug builds:
CARBON_DCHECK([&] {
// a bunch of code
// condition that will be tested by CARBON_DCHECK
return complicated && multiple_parts;
// finish defining the lambda, and then immediately invoke it
}(), "Complicated things went wrong");
See a description of this technique on Wikipedia.
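For a version of the initialization case that compiles on its own, consider this sketch using only the standard library. FirstSquares is a hypothetical function written for illustration.

```cpp
#include <numeric>
#include <vector>

inline auto FirstSquares() -> const std::vector<int>& {
  // The lambda is defined and invoked in the initializer, so the setup code
  // runs once, the first time this function is called.
  static const std::vector<int> squares = [] {
    std::vector<int> result(5);
    // Fill with 1, 2, 3, 4, 5.
    std::iota(result.begin(), result.end(), 1);
    for (int& value : result) {
      value *= value;
    }
    return result;
  }();
  return squares;
}
```

The variable can stay const because all of the mutation happens inside the lambda.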
The condition part of an if statement may contain a declaration with an
initializer followed by a semicolon (;) and then the proper boolean condition
expression, as in:
if (auto verify = tree.Verify(); !verify.ok()) {
The condition can be replaced by a declaration entirely, as in:
if (auto equals = context.ConsumeIf(Lex::TokenKind::Equal)) {
// Equivalent to:
if (auto equals = context.ConsumeIf(Lex::TokenKind::Equal); equals) {
or
if (auto literal = bound_inst.TryAs<SemIR::IntegerLiteral>()) {
// Equivalent to:
if (auto literal = bound_inst.TryAs<SemIR::IntegerLiteral>(); literal) {
This is a common way of handling a function that returns an optional value.
See https://en.cppreference.com/w/cpp/language/if
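Both forms can be seen together in a small standard-library example. ParseDigit and DescribeEntry are hypothetical helpers, not toolchain functions.

```cpp
#include <map>
#include <optional>
#include <string>

// Returns the digit value of `c`, or nullopt if `c` is not a digit.
inline auto ParseDigit(char c) -> std::optional<int> {
  if (c >= '0' && c <= '9') {
    return c - '0';
  }
  return std::nullopt;
}

inline auto DescribeEntry(const std::map<std::string, char>& table,
                          const std::string& key) -> std::string {
  // Declaration with an initializer, followed by a separate condition.
  if (auto it = table.find(key); it != table.end()) {
    // Declaration used as the condition: the optional converts to bool.
    if (auto digit = ParseDigit(it->second)) {
      return "digit " + std::to_string(*digit);
    }
    return "not a digit";
  }
  return "missing";
}
```

In both forms the declared variable is scoped to the if statement, which keeps it from leaking into the rest of the function.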
Curiously Recurring Template Pattern - cppreference.com
Curiously recurring template pattern - Wikipedia
Examples:
- template <typename DerivedT, ...> in enum_base.h
- template <typename DerivedT> in ostream.h
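As a minimal sketch of the pattern itself, the base class below calls into the derived class through a static_cast instead of a virtual function. ExamplePrintable and ExampleId are hypothetical and only loosely inspired by the idea of a printable mixin; they are not the classes in ostream.h.

```cpp
#include <sstream>
#include <string>

// Hypothetical CRTP mixin: DerivedT must provide Print(std::ostream&) const.
template <typename DerivedT>
class ExamplePrintable {
 public:
  auto ToString() const -> std::string {
    std::ostringstream out;
    // Static dispatch: the derived type is known at compile time.
    static_cast<const DerivedT*>(this)->Print(out);
    return out.str();
  }
};

// The derived class passes itself as the template argument.
struct ExampleId : public ExamplePrintable<ExampleId> {
  auto Print(std::ostream& out) const -> void { out << "id" << index; }
  int index = 0;
};
```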
We use multiple inheritance to support uses of CRTP.
Example:
struct NameScopeId : public IndexBase, public Printable<NameScopeId> {
To declare a constant usable at compile time in constexpr contexts as a static
class member, we use this pattern:
Declaration:
class Foo {
// ...
static const std::array<ElementType, ElementCount> MyTable;
static constexpr auto ComputeMyTable()
-> std::array<ElementType, ElementCount> { ... }
};
Definition:
constexpr std::array<ElementType, ElementCount>
Foo::MyTable = Foo::ComputeMyTable();
Note the const on the declaration does not match the constexpr on the
definition, and that the definition is outside of the class body. This allows
the initializer to depend on the definition of the class.
Further note that this only works with static members of classes, not static variables in functions.
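One case where the initializer depends on the class definition is a constant whose type is the class itself. The following sketch shows the const declaration paired with a constexpr definition; ExampleColor is a hypothetical class, not one from the toolchain.

```cpp
class ExampleColor {
 public:
  constexpr ExampleColor(int red, int green, int blue)
      : red_(red), green_(green), blue_(blue) {}

  constexpr auto red() const -> int { return red_; }
  constexpr auto green() const -> int { return green_; }
  constexpr auto blue() const -> int { return blue_; }

  // ExampleColor is still incomplete at this point, so the member is only
  // declared here, as const and without an initializer.
  static const ExampleColor Red;

 private:
  int red_;
  int green_;
  int blue_;
};

// The class is complete here, so the constexpr definition can construct it.
constexpr ExampleColor ExampleColor::Red = ExampleColor(255, 0, 0);

// The member is usable in constant expressions after its definition.
static_assert(ExampleColor::Red.red() == 255);
static_assert(ExampleColor::Red.green() == 0);
```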
Due to a Clang bug, this technique does not work in a class template. The following pattern can be used instead:
template <typename T>
class Foo {
// ...
template <typename Self = Foo>
static constexpr auto MyValueImpl = Self();
static constexpr const Foo& MyValue = MyValueImpl<>;
// ...
};
The parameters of the variable template can be chosen to allow reuse of the same variable template for multiple static data members.
For example, see NodeStack::IdKindTable in
check/node_stack.h.
A global constant may use a single definition without a separate declaration:
static constexpr std::array<bool, 256> IsIdStartByteTable = [] {
std::array<bool, 256> table = {};
// ...
return table;
}();
Note this example is using an immediately invoked function expression to compute the initial value, which is common.
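A complete variant of such a table, with the elided body filled in for a simpler predicate, could look like this. ExampleIsDigitTable and ExampleIsDigit are hypothetical names, and the digit check stands in for whatever the real table computes.

```cpp
#include <array>

// Computed at compile time by an immediately invoked lambda.
static constexpr std::array<bool, 256> ExampleIsDigitTable = [] {
  std::array<bool, 256> table = {};
  for (int c = '0'; c <= '9'; ++c) {
    table[c] = true;
  }
  return table;
}();

// Indexing with unsigned char keeps the index within the table's bounds.
constexpr auto ExampleIsDigit(unsigned char c) -> bool {
  return ExampleIsDigitTable[c];
}

static_assert(ExampleIsDigit('7'));
static_assert(!ExampleIsDigit('x'));
```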
Examples: