From 7d877e26ca9db943ffc2b89e1ca07428eab5bb70 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Mon, 26 Sep 2022 11:07:23 +0200 Subject: don't refer to the compile-time interpreter as "Miri" (#1471) --- src/doc/rustc-dev-guide/book.toml | 1 + src/doc/rustc-dev-guide/src/SUMMARY.md | 2 +- src/doc/rustc-dev-guide/src/appendix/glossary.md | 3 +- src/doc/rustc-dev-guide/src/const-eval.md | 9 +- .../rustc-dev-guide/src/const-eval/interpret.md | 238 ++++++++++++++++++++ src/doc/rustc-dev-guide/src/miri.md | 239 --------------------- 6 files changed, 246 insertions(+), 246 deletions(-) create mode 100644 src/doc/rustc-dev-guide/src/const-eval/interpret.md delete mode 100644 src/doc/rustc-dev-guide/src/miri.md (limited to 'src/doc/rustc-dev-guide') diff --git a/src/doc/rustc-dev-guide/book.toml b/src/doc/rustc-dev-guide/book.toml index c86ec563825..dc216760ed3 100644 --- a/src/doc/rustc-dev-guide/book.toml +++ b/src/doc/rustc-dev-guide/book.toml @@ -43,3 +43,4 @@ warning-policy = "error" [output.html.redirect] "/compiletest.html" = "tests/compiletest.html" "/diagnostics/sessiondiagnostic.html" = "diagnostics/diagnostic-structs.html" +"/miri.html" = "const-eval/interpret.html" diff --git a/src/doc/rustc-dev-guide/src/SUMMARY.md b/src/doc/rustc-dev-guide/src/SUMMARY.md index 99b24fe5949..360265c0e34 100644 --- a/src/doc/rustc-dev-guide/src/SUMMARY.md +++ b/src/doc/rustc-dev-guide/src/SUMMARY.md @@ -152,7 +152,7 @@ - [MIR optimizations](./mir/optimizations.md) - [Debugging](./mir/debugging.md) - [Constant evaluation](./const-eval.md) - - [miri const evaluator](./miri.md) + - [Interpreter](./const-eval/interpret.md) - [Monomorphization](./backend/monomorph.md) - [Lowering MIR](./backend/lowering-mir.md) - [Code Generation](./backend/codegen.md) diff --git a/src/doc/rustc-dev-guide/src/appendix/glossary.md b/src/doc/rustc-dev-guide/src/appendix/glossary.md index 375db493c8e..42306dc1c2c 100644 --- a/src/doc/rustc-dev-guide/src/appendix/glossary.md +++ b/src/doc/rustc-dev-guide/src/appendix/glossary.md @@ -36,6 +36,7 @@ Term | Meaning infcx   | The type inference context (`InferCtxt`). (see `rustc_middle::infer`) inference variable   | When doing type or region inference, an "inference variable" is a kind of special type/region that represents what you are trying to infer. Think of X in algebra. For example, if we are trying to infer the type of a variable in a program, we create an inference variable to represent that unknown type. intern   | Interning refers to storing certain frequently-used constant data, such as strings, and then referring to the data by an identifier (e.g. a `Symbol`) rather than the data itself, to reduce memory usage and number of allocations. See [this chapter](../memory.md) for more info. +interpreter   | The heart of const evaluation, running MIR code at compile time. ([see more](../const-eval/interpret.md)) intrinsic   | Intrinsics are special functions that are implemented in the compiler itself but exposed (often unstably) to users. They do magical and dangerous things. (See [`std::intrinsics`](https://doc.rust-lang.org/std/intrinsics/index.html)) IR   | Short for Intermediate Representation, a general term in compilers. During compilation, the code is transformed from raw source (ASCII text) to various IRs. In Rust, these are primarily HIR, MIR, and LLVM IR. Each IR is well-suited for some set of computations. For example, MIR is well-suited for the borrow checker, and LLVM IR is well-suited for codegen because LLVM accepts it. IRLO   | `IRLO` or `irlo` is sometimes used as an abbreviation for [internals.rust-lang.org](https://internals.rust-lang.org). @@ -47,7 +48,7 @@ Term | Meaning [LLVM]   | (actually not an acronym :P) an open-source compiler backend. It accepts LLVM IR and outputs native binaries. Various languages (e.g. Rust) can then implement a compiler front-end that outputs LLVM IR and use LLVM to compile to all the platforms LLVM supports. memoization   | The process of storing the results of (pure) computations (such as pure function calls) to avoid having to repeat them in the future. This is typically a trade-off between execution speed and memory usage. MIR   | The Mid-level IR that is created after type-checking for use by borrowck and codegen. ([see more](../mir/index.md)) -miri   | An interpreter for MIR used for constant evaluation. ([see more](../miri.md)) +Miri   | A tool to detect Undefined Behavior in (unsafe) Rust code. ([see more](https://github.com/rust-lang/miri)) monomorphization   | The process of taking generic implementations of types and functions and instantiating them with concrete types. For example, in the code we might have `Vec`, but in the final executable, we will have a copy of the `Vec` code for every concrete type used in the program (e.g. a copy for `Vec`, a copy for `Vec`, etc). normalize   | A general term for converting to a more canonical form, but in the case of rustc typically refers to [associated type normalization](../traits/goals-and-clauses.md#normalizeprojection---type). newtype   | A wrapper around some other type (e.g., `struct Foo(T)` is a "newtype" for `T`). This is commonly used in Rust to give a stronger type for indices. diff --git a/src/doc/rustc-dev-guide/src/const-eval.md b/src/doc/rustc-dev-guide/src/const-eval.md index 35f5c82ea89..a7b1c896333 100644 --- a/src/doc/rustc-dev-guide/src/const-eval.md +++ b/src/doc/rustc-dev-guide/src/const-eval.md @@ -77,11 +77,10 @@ As a consequence, all decoding of `ValTree` must happen by matching on the type decisions depending on that. The value itself gives no useful information without the type that belongs to it. -Other constants get represented as [`ConstValue::Scalar`] -or [`ConstValue::Slice`] if possible. This means that the `const_eval_*` -functions cannot be used to create miri-pointers to the evaluated constant. -If you need the value of a constant inside Miri, you need to directly work with -[`const_to_op`]. +Other constants get represented as [`ConstValue::Scalar`] or +[`ConstValue::Slice`] if possible. These values are only useful outside the +compile-time interpreter. If you need the value of a constant during +interpretation, you need to directly work with [`const_to_op`]. [`GlobalId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/interpret/struct.GlobalId.html [`ConstValue::Scalar`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/interpret/value/enum.ConstValue.html#variant.Scalar diff --git a/src/doc/rustc-dev-guide/src/const-eval/interpret.md b/src/doc/rustc-dev-guide/src/const-eval/interpret.md new file mode 100644 index 00000000000..ee044505e10 --- /dev/null +++ b/src/doc/rustc-dev-guide/src/const-eval/interpret.md @@ -0,0 +1,238 @@ +# Interpreter + + + +The interpreter is a virtual machine for executing MIR without compiling to +machine code. It is usually invoked via `tcx.const_eval_*` functions. The +interpreter is shared between the compiler (for compile-time function +evaluation, CTFE) and the tool [Miri](https://github.com/rust-lang/miri/), which +uses the same virtual machine to detect Undefined Behavior in (unsafe) Rust +code. + +If you start out with a constant: + +```rust +const FOO: usize = 1 << 12; +``` + +rustc doesn't actually invoke anything until the constant is either used or +placed into metadata. + +Once you have a use-site like: + +```rust,ignore +type Foo = [u8; FOO - 42]; +``` + +The compiler needs to figure out the length of the array before being able to +create items that use the type (locals, constants, function arguments, ...). + +To obtain the (in this case empty) parameter environment, one can call +`let param_env = tcx.param_env(length_def_id);`. The `GlobalId` needed is + +```rust,ignore +let gid = GlobalId { + promoted: None, + instance: Instance::mono(length_def_id), +}; +``` + +Invoking `tcx.const_eval(param_env.and(gid))` will now trigger the creation of +the MIR of the array length expression. The MIR will look something like this: + +```mir +Foo::{{constant}}#0: usize = { + let mut _0: usize; + let mut _1: (usize, bool); + + bb0: { + _1 = CheckedSub(const FOO, const 42usize); + assert(!move (_1.1: bool), "attempt to subtract with overflow") -> bb1; + } + + bb1: { + _0 = move (_1.0: usize); + return; + } +} +``` + +Before the evaluation, a virtual memory location (in this case essentially a +`vec![u8; 4]` or `vec![u8; 8]`) is created for storing the evaluation result. + +At the start of the evaluation, `_0` and `_1` are +`Operand::Immediate(Immediate::Scalar(ScalarMaybeUndef::Undef))`. This is quite +a mouthful: [`Operand`] can represent either data stored somewhere in the +[interpreter memory](#memory) (`Operand::Indirect`), or (as an optimization) +immediate data stored in-line. And [`Immediate`] can either be a single +(potentially uninitialized) [scalar value][`Scalar`] (integer or thin pointer), +or a pair of two of them. In our case, the single scalar value is *not* (yet) +initialized. + +When the initialization of `_1` is invoked, the value of the `FOO` constant is +required, and triggers another call to `tcx.const_eval_*`, which will not be shown +here. If the evaluation of FOO is successful, `42` will be subtracted from its +value `4096` and the result stored in `_1` as +`Operand::Immediate(Immediate::ScalarPair(Scalar::Raw { data: 4054, .. }, +Scalar::Raw { data: 0, .. })`. The first part of the pair is the computed value, +the second part is a bool that's true if an overflow happened. A `Scalar::Raw` +also stores the size (in bytes) of this scalar value; we are eliding that here. + +The next statement asserts that said boolean is `0`. In case the assertion +fails, its error message is used for reporting a compile-time error. + +Since it does not fail, `Operand::Immediate(Immediate::Scalar(Scalar::Raw { +data: 4054, .. }))` is stored in the virtual memory was allocated before the +evaluation. `_0` always refers to that location directly. + +After the evaluation is done, the return value is converted from [`Operand`] to +[`ConstValue`] by [`op_to_const`]: the former representation is geared towards +what is needed *during* const evaluation, while [`ConstValue`] is shaped by the +needs of the remaining parts of the compiler that consume the results of const +evaluation. As part of this conversion, for types with scalar values, even if +the resulting [`Operand`] is `Indirect`, it will return an immediate +`ConstValue::Scalar(computed_value)` (instead of the usual `ConstValue::ByRef`). +This makes using the result much more efficient and also more convenient, as no +further queries need to be executed in order to get at something as simple as a +`usize`. + +Future evaluations of the same constants will not actually invoke +the interpreter, but just use the cached result. + +[`Operand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_const_eval/interpret/enum.Operand.html +[`Immediate`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_const_eval/interpret/enum.Immediate.html +[`ConstValue`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/interpret/enum.ConstValue.html +[`Scalar`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/interpret/enum.Scalar.html +[`op_to_const`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_const_eval/const_eval/eval_queries/fn.op_to_const.html + +## Datastructures + +The interpreter's outside-facing datastructures can be found in +[rustc_middle/src/mir/interpret](https://github.com/rust-lang/rust/blob/master/compiler/rustc_middle/src/mir/interpret). +This is mainly the error enum and the [`ConstValue`] and [`Scalar`] types. A +`ConstValue` can be either `Scalar` (a single `Scalar`, i.e., integer or thin +pointer), `Slice` (to represent byte slices and strings, as needed for pattern +matching) or `ByRef`, which is used for anything else and refers to a virtual +allocation. These allocations can be accessed via the methods on +`tcx.interpret_interner`. A `Scalar` is either some `Raw` integer or a pointer; +see [the next section](#memory) for more on that. + +If you are expecting a numeric result, you can use `eval_usize` (panics on +anything that can't be represented as a `u64`) or `try_eval_usize` which results +in an `Option` yielding the `Scalar` if possible. + +## Memory + +To support any kind of pointers, the interpreter needs to have a "virtual memory" that the +pointers can point to. This is implemented in the [`Memory`] type. In the +simplest model, every global variable, stack variable and every dynamic +allocation corresponds to an [`Allocation`] in that memory. (Actually using an +allocation for every MIR stack variable would be very inefficient; that's why we +have `Operand::Immediate` for stack variables that are both small and never have +their address taken. But that is purely an optimization.) + +Such an `Allocation` is basically just a sequence of `u8` storing the value of +each byte in this allocation. (Plus some extra data, see below.) Every +`Allocation` has a globally unique `AllocId` assigned in `Memory`. With that, a +[`Pointer`] consists of a pair of an `AllocId` (indicating the allocation) and +an offset into the allocation (indicating which byte of the allocation the +pointer points to). It may seem odd that a `Pointer` is not just an integer +address, but remember that during const evaluation, we cannot know at which +actual integer address the allocation will end up -- so we use `AllocId` as +symbolic base addresses, which means we need a separate offset. (As an aside, +it turns out that pointers at run-time are +[more than just integers, too](https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#pointer-provenance).) + +These allocations exist so that references and raw pointers have something to +point to. There is no global linear heap in which things are allocated, but each +allocation (be it for a local variable, a static or a (future) heap allocation) +gets its own little memory with exactly the required size. So if you have a +pointer to an allocation for a local variable `a`, there is no possible (no +matter how unsafe) operation that you can do that would ever change said pointer +to a pointer to a different local variable `b`. +Pointer arithmetic on `a` will only ever change its offset; the `AllocId` stays the same. + +This, however, causes a problem when we want to store a `Pointer` into an +`Allocation`: we cannot turn it into a sequence of `u8` of the right length! +`AllocId` and offset together are twice as big as a pointer "seems" to be. This +is what the `relocation` field of `Allocation` is for: the byte offset of the +`Pointer` gets stored as a bunch of `u8`, while its `AllocId` gets stored +out-of-band. The two are reassembled when the `Pointer` is read from memory. +The other bit of extra data an `Allocation` needs is `undef_mask` for keeping +track of which of its bytes are initialized. + +### Global memory and exotic allocations + +`Memory` exists only during evaluation; it gets destroyed when the +final value of the constant is computed. In case that constant contains any +pointers, those get "interned" and moved to a global "const eval memory" that is +part of `TyCtxt`. These allocations stay around for the remaining computation +and get serialized into the final output (so that dependent crates can use +them). + +Moreover, to also support function pointers, the global memory in `TyCtxt` can +also contain "virtual allocations": instead of an `Allocation`, these contain an +`Instance`. That allows a `Pointer` to point to either normal data or a +function, which is needed to be able to evaluate casts from function pointers to +raw pointers. + +Finally, the [`GlobalAlloc`] type used in the global memory also contains a +variant `Static` that points to a particular `const` or `static` item. This is +needed to support circular statics, where we need to have a `Pointer` to a +`static` for which we cannot yet have an `Allocation` as we do not know the +bytes of its value. + +[`Memory`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_const_eval/interpret/struct.Memory.html +[`Allocation`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/interpret/struct.Allocation.html +[`Pointer`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/interpret/struct.Pointer.html +[`GlobalAlloc`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/interpret/enum.GlobalAlloc.html + +### Pointer values vs Pointer types + +One common cause of confusion in the interpreter is that being a pointer *value* and having +a pointer *type* are entirely independent properties. By "pointer value", we +refer to a `Scalar::Ptr` containing a `Pointer` and thus pointing somewhere into +the interpreter's virtual memory. This is in contrast to `Scalar::Raw`, which is just some +concrete integer. + +However, a variable of pointer or reference *type*, such as `*const T` or `&T`, +does not have to have a pointer *value*: it could be obtained by casting or +transmuting an integer to a pointer. +And similarly, when casting or transmuting a reference to some +actual allocation to an integer, we end up with a pointer *value* +(`Scalar::Ptr`) at integer *type* (`usize`). This is a problem because we +cannot meaningfully perform integer operations such as division on pointer +values. + +## Interpretation + +Although the main entry point to constant evaluation is the `tcx.const_eval_*` +functions, there are additional functions in +[rustc_const_eval/src/const_eval](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_const_eval/index.html) +that allow accessing the fields of a `ConstValue` (`ByRef` or otherwise). You should +never have to access an `Allocation` directly except for translating it to the +compilation target (at the moment just LLVM). + +The interpreter starts by creating a virtual stack frame for the current constant that is +being evaluated. There's essentially no difference between a constant and a +function with no arguments, except that constants do not allow local (named) +variables at the time of writing this guide. + +A stack frame is defined by the `Frame` type in +[rustc_const_eval/src/interpret/eval_context.rs](https://github.com/rust-lang/rust/blob/master/compiler/rustc_const_eval/src/interpret/eval_context.rs) +and contains all the local +variables memory (`None` at the start of evaluation). Each frame refers to the +evaluation of either the root constant or subsequent calls to `const fn`. The +evaluation of another constant simply calls `tcx.const_eval_*`, which produce an +entirely new and independent stack frame. + +The frames are just a `Vec`, there's no way to actually refer to a +`Frame`'s memory even if horrible shenanigans are done via unsafe code. The only +memory that can be referred to are `Allocation`s. + +The interpreter now calls the `step` method (in +[rustc_const_eval/src/interpret/step.rs](https://github.com/rust-lang/rust/blob/master/compiler/rustc_const_eval/src/interpret/step.rs) +) until it either returns an error or has no further statements to execute. Each +statement will now initialize or modify the locals or the virtual memory +referred to by a local. This might require evaluating other constants or +statics, which just recursively invokes `tcx.const_eval_*`. diff --git a/src/doc/rustc-dev-guide/src/miri.md b/src/doc/rustc-dev-guide/src/miri.md deleted file mode 100644 index c5de358d209..00000000000 --- a/src/doc/rustc-dev-guide/src/miri.md +++ /dev/null @@ -1,239 +0,0 @@ -# Miri - - - -The Miri (**MIR** **I**nterpreter) engine is a virtual machine for executing MIR without -compiling to machine code. It is usually invoked via `tcx.const_eval_*` functions. -In the following, we will refer to the Miri engine as just "Miri", but note that -there also is a stand-alone -[tool called "Miri"](https://github.com/rust-lang/miri/) that is based on the -engine (sometimes referred to as Miri-the-tool to disambiguate it from the -engine). - -If you start out with a constant: - -```rust -const FOO: usize = 1 << 12; -``` - -rustc doesn't actually invoke anything until the constant is either used or -placed into metadata. - -Once you have a use-site like: - -```rust,ignore -type Foo = [u8; FOO - 42]; -``` - -The compiler needs to figure out the length of the array before being able to -create items that use the type (locals, constants, function arguments, ...). - -To obtain the (in this case empty) parameter environment, one can call -`let param_env = tcx.param_env(length_def_id);`. The `GlobalId` needed is - -```rust,ignore -let gid = GlobalId { - promoted: None, - instance: Instance::mono(length_def_id), -}; -``` - -Invoking `tcx.const_eval(param_env.and(gid))` will now trigger the creation of -the MIR of the array length expression. The MIR will look something like this: - -```mir -Foo::{{constant}}#0: usize = { - let mut _0: usize; - let mut _1: (usize, bool); - - bb0: { - _1 = CheckedSub(const FOO, const 42usize); - assert(!move (_1.1: bool), "attempt to subtract with overflow") -> bb1; - } - - bb1: { - _0 = move (_1.0: usize); - return; - } -} -``` - -Before the evaluation, a virtual memory location (in this case essentially a -`vec![u8; 4]` or `vec![u8; 8]`) is created for storing the evaluation result. - -At the start of the evaluation, `_0` and `_1` are -`Operand::Immediate(Immediate::Scalar(ScalarMaybeUndef::Undef))`. This is quite -a mouthful: [`Operand`] can represent either data stored somewhere in the -[interpreter memory](#memory) (`Operand::Indirect`), or (as an optimization) -immediate data stored in-line. And [`Immediate`] can either be a single -(potentially uninitialized) [scalar value][`Scalar`] (integer or thin pointer), -or a pair of two of them. In our case, the single scalar value is *not* (yet) -initialized. - -When the initialization of `_1` is invoked, the value of the `FOO` constant is -required, and triggers another call to `tcx.const_eval_*`, which will not be shown -here. If the evaluation of FOO is successful, `42` will be subtracted from its -value `4096` and the result stored in `_1` as -`Operand::Immediate(Immediate::ScalarPair(Scalar::Raw { data: 4054, .. }, -Scalar::Raw { data: 0, .. })`. The first part of the pair is the computed value, -the second part is a bool that's true if an overflow happened. A `Scalar::Raw` -also stores the size (in bytes) of this scalar value; we are eliding that here. - -The next statement asserts that said boolean is `0`. In case the assertion -fails, its error message is used for reporting a compile-time error. - -Since it does not fail, `Operand::Immediate(Immediate::Scalar(Scalar::Raw { -data: 4054, .. }))` is stored in the virtual memory was allocated before the -evaluation. `_0` always refers to that location directly. - -After the evaluation is done, the return value is converted from [`Operand`] to -[`ConstValue`] by [`op_to_const`]: the former representation is geared towards -what is needed *during* const evaluation, while [`ConstValue`] is shaped by the -needs of the remaining parts of the compiler that consume the results of const -evaluation. As part of this conversion, for types with scalar values, even if -the resulting [`Operand`] is `Indirect`, it will return an immediate -`ConstValue::Scalar(computed_value)` (instead of the usual `ConstValue::ByRef`). -This makes using the result much more efficient and also more convenient, as no -further queries need to be executed in order to get at something as simple as a -`usize`. - -Future evaluations of the same constants will not actually invoke -Miri, but just use the cached result. - -[`Operand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_const_eval/interpret/enum.Operand.html -[`Immediate`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_const_eval/interpret/enum.Immediate.html -[`ConstValue`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/interpret/enum.ConstValue.html -[`Scalar`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/interpret/enum.Scalar.html -[`op_to_const`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_const_eval/const_eval/eval_queries/fn.op_to_const.html - -## Datastructures - -Miri's outside-facing datastructures can be found in -[rustc_middle/src/mir/interpret](https://github.com/rust-lang/rust/blob/master/compiler/rustc_middle/src/mir/interpret). -This is mainly the error enum and the [`ConstValue`] and [`Scalar`] types. A -`ConstValue` can be either `Scalar` (a single `Scalar`, i.e., integer or thin -pointer), `Slice` (to represent byte slices and strings, as needed for pattern -matching) or `ByRef`, which is used for anything else and refers to a virtual -allocation. These allocations can be accessed via the methods on -`tcx.interpret_interner`. A `Scalar` is either some `Raw` integer or a pointer; -see [the next section](#memory) for more on that. - -If you are expecting a numeric result, you can use `eval_usize` (panics on -anything that can't be represented as a `u64`) or `try_eval_usize` which results -in an `Option` yielding the `Scalar` if possible. - -## Memory - -To support any kind of pointers, Miri needs to have a "virtual memory" that the -pointers can point to. This is implemented in the [`Memory`] type. In the -simplest model, every global variable, stack variable and every dynamic -allocation corresponds to an [`Allocation`] in that memory. (Actually using an -allocation for every MIR stack variable would be very inefficient; that's why we -have `Operand::Immediate` for stack variables that are both small and never have -their address taken. But that is purely an optimization.) - -Such an `Allocation` is basically just a sequence of `u8` storing the value of -each byte in this allocation. (Plus some extra data, see below.) Every -`Allocation` has a globally unique `AllocId` assigned in `Memory`. With that, a -[`Pointer`] consists of a pair of an `AllocId` (indicating the allocation) and -an offset into the allocation (indicating which byte of the allocation the -pointer points to). It may seem odd that a `Pointer` is not just an integer -address, but remember that during const evaluation, we cannot know at which -actual integer address the allocation will end up -- so we use `AllocId` as -symbolic base addresses, which means we need a separate offset. (As an aside, -it turns out that pointers at run-time are -[more than just integers, too](https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#pointer-provenance).) - -These allocations exist so that references and raw pointers have something to -point to. There is no global linear heap in which things are allocated, but each -allocation (be it for a local variable, a static or a (future) heap allocation) -gets its own little memory with exactly the required size. So if you have a -pointer to an allocation for a local variable `a`, there is no possible (no -matter how unsafe) operation that you can do that would ever change said pointer -to a pointer to a different local variable `b`. -Pointer arithmetic on `a` will only ever change its offset; the `AllocId` stays the same. - -This, however, causes a problem when we want to store a `Pointer` into an -`Allocation`: we cannot turn it into a sequence of `u8` of the right length! -`AllocId` and offset together are twice as big as a pointer "seems" to be. This -is what the `relocation` field of `Allocation` is for: the byte offset of the -`Pointer` gets stored as a bunch of `u8`, while its `AllocId` gets stored -out-of-band. The two are reassembled when the `Pointer` is read from memory. -The other bit of extra data an `Allocation` needs is `undef_mask` for keeping -track of which of its bytes are initialized. - -### Global memory and exotic allocations - -`Memory` exists only during the Miri evaluation; it gets destroyed when the -final value of the constant is computed. In case that constant contains any -pointers, those get "interned" and moved to a global "const eval memory" that is -part of `TyCtxt`. These allocations stay around for the remaining computation -and get serialized into the final output (so that dependent crates can use -them). - -Moreover, to also support function pointers, the global memory in `TyCtxt` can -also contain "virtual allocations": instead of an `Allocation`, these contain an -`Instance`. That allows a `Pointer` to point to either normal data or a -function, which is needed to be able to evaluate casts from function pointers to -raw pointers. - -Finally, the [`GlobalAlloc`] type used in the global memory also contains a -variant `Static` that points to a particular `const` or `static` item. This is -needed to support circular statics, where we need to have a `Pointer` to a -`static` for which we cannot yet have an `Allocation` as we do not know the -bytes of its value. - -[`Memory`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_const_eval/interpret/struct.Memory.html -[`Allocation`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/interpret/struct.Allocation.html -[`Pointer`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/interpret/struct.Pointer.html -[`GlobalAlloc`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/interpret/enum.GlobalAlloc.html - -### Pointer values vs Pointer types - -One common cause of confusion in Miri is that being a pointer *value* and having -a pointer *type* are entirely independent properties. By "pointer value", we -refer to a `Scalar::Ptr` containing a `Pointer` and thus pointing somewhere into -Miri's virtual memory. This is in contrast to `Scalar::Raw`, which is just some -concrete integer. - -However, a variable of pointer or reference *type*, such as `*const T` or `&T`, -does not have to have a pointer *value*: it could be obtained by casting or -transmuting an integer to a pointer. -And similarly, when casting or transmuting a reference to some -actual allocation to an integer, we end up with a pointer *value* -(`Scalar::Ptr`) at integer *type* (`usize`). This is a problem because we -cannot meaningfully perform integer operations such as division on pointer -values. - -## Interpretation - -Although the main entry point to constant evaluation is the `tcx.const_eval_*` -functions, there are additional functions in -[rustc_const_eval/src/const_eval](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_const_eval/index.html) -that allow accessing the fields of a `ConstValue` (`ByRef` or otherwise). You should -never have to access an `Allocation` directly except for translating it to the -compilation target (at the moment just LLVM). - -Miri starts by creating a virtual stack frame for the current constant that is -being evaluated. There's essentially no difference between a constant and a -function with no arguments, except that constants do not allow local (named) -variables at the time of writing this guide. - -A stack frame is defined by the `Frame` type in -[rustc_const_eval/src/interpret/eval_context.rs](https://github.com/rust-lang/rust/blob/master/compiler/rustc_const_eval/src/interpret/eval_context.rs) -and contains all the local -variables memory (`None` at the start of evaluation). Each frame refers to the -evaluation of either the root constant or subsequent calls to `const fn`. The -evaluation of another constant simply calls `tcx.const_eval_*`, which produce an -entirely new and independent stack frame. - -The frames are just a `Vec`, there's no way to actually refer to a -`Frame`'s memory even if horrible shenanigans are done via unsafe code. The only -memory that can be referred to are `Allocation`s. - -Miri now calls the `step` method (in -[rustc_const_eval/src/interpret/step.rs](https://github.com/rust-lang/rust/blob/master/compiler/rustc_const_eval/src/interpret/step.rs) -) until it either returns an error or has no further statements to execute. Each -statement will now initialize or modify the locals or the virtual memory -referred to by a local. This might require evaluating other constants or -statics, which just recursively invokes `tcx.const_eval_*`. -- cgit 1.4.1-3-g733a5