Diffstat (limited to 'src/librustc_trans/trans/doc.rs')
-rw-r--r-- src/librustc_trans/trans/doc.rs | 237
1 files changed, 237 insertions, 0 deletions
diff --git a/src/librustc_trans/trans/doc.rs b/src/librustc_trans/trans/doc.rs
new file mode 100644
index 00000000000..013483d0003
--- /dev/null
+++ b/src/librustc_trans/trans/doc.rs
@@ -0,0 +1,237 @@
+// Copyright 2014 The Rust Project Developers. See the COPYRIGHT
+// file at the top-level directory of this distribution and at
+// http://rust-lang.org/COPYRIGHT.
+//
+// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
+// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
+// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
+// option. This file may not be copied, modified, or distributed
+// except according to those terms.
+
+/*!
+
+# Documentation for the trans module
+
+This module contains high-level summaries of how the various modules
+in trans work. It is a work in progress. For detailed comments,
+naturally, you can refer to the individual modules themselves.
+
+## The Expr module
+
+The expr module handles translation of expressions. The most general
+translation routine is `trans()`, which will translate an expression
+into a datum. `trans_into()` is also available, which will translate
+an expression and write the result directly into memory, sometimes
+avoiding the need for a temporary stack slot. Finally,
+`trans_to_lvalue()` is available if you'd like to ensure that the
+result has cleanup scheduled.
+
+Internally, each of these functions dispatches to various other
+expression functions depending on the kind of expression. We divide
+up expressions into:
+
+- **Datum expressions:** Those that most naturally yield values.
+  Examples would be `22`, `box x`, or `a + b` (when not overloaded).
+- **DPS expressions:** Those that most naturally write into a location
+  in memory. Examples would be `foo()` or `Point { x: 3, y: 4 }`.
+- **Statement expressions:** Those that do not generate a meaningful
+  result. Examples would be `while cond { ... }` or `return 44`.
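+
+As a rough illustration of the three categories above, here is a toy
+classifier. It is only a sketch: the `ExprKind` and `Expr` enums and the
+`classify` function are hypothetical names invented for this example and
+are not the types used by trans.
+
```rust
// Hypothetical, simplified model of the three expression categories.
// None of these names are taken from the real trans code.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum ExprKind {
    Datum,     // most naturally yields a value
    Dps,       // most naturally writes into a destination in memory
    Statement, // produces no meaningful result
}

// A toy expression type covering the examples from the list above.
#[allow(dead_code)]
enum Expr {
    IntLit(i64),               // `22`
    Add(Box<Expr>, Box<Expr>), // `a + b` (when not overloaded)
    Call(&'static str),        // `foo()`
    StructLit(&'static str),   // `Point { x: 3, y: 4 }`
    While,                     // `while cond { ... }`
    Return(Box<Expr>),         // `return 44`
}

// Decide which translation strategy fits the expression best.
fn classify(e: &Expr) -> ExprKind {
    match e {
        Expr::IntLit(_) | Expr::Add(_, _) => ExprKind::Datum,
        Expr::Call(_) | Expr::StructLit(_) => ExprKind::Dps,
        Expr::While | Expr::Return(_) => ExprKind::Statement,
    }
}
```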
+
+## The Datum module
+
+A `Datum` encapsulates the result of evaluating a Rust expression. It
+contains a `ValueRef` indicating the result, a `ty::t` describing
+the Rust type, and a *kind*. The kind indicates whether the datum
+has cleanup scheduled (lvalue) or not (rvalue) and -- in the case of
+rvalues -- whether the value is "by ref" or "by value".
+
+The datum API is designed to help you avoid memory errors like
+forgetting to arrange cleanup or duplicating a value. The type of the
+datum incorporates the kind, and thus reflects whether it has cleanup
+scheduled:
+
+- `Datum<Lvalue>` -- by ref, cleanup scheduled
+- `Datum<Rvalue>` -- by value or by ref, no cleanup scheduled
+- `Datum<Expr>` -- either `Datum<Lvalue>` or `Datum<Rvalue>`
+
+Rvalue and expr datums are noncopyable, and most of the methods on
+datums consume the datum itself (with some notable exceptions). This
+reflects the fact that datums may represent affine values which ought
+to be consumed exactly once, and if you were to try to (for example)
+store an affine value multiple times, you would be duplicating it,
+which would certainly be a bug.
+
+Some of the datum methods, however, are designed to work only on
+copyable values such as ints or pointers. Those methods may borrow the
+datum (`&self`) rather than consume it, but they always include
+assertions on the type of the value represented to check that this
+makes sense. An example is `shallow_copy()`, which duplicates
+a datum value.
+
+Translating an expression always yields a `Datum<Expr>` result, but
+the methods `to_[lr]value_datum()` can be used to coerce a
+`Datum<Expr>` into a `Datum<Lvalue>` or `Datum<Rvalue>` as
+needed. Coercing to an lvalue is fairly common, and generally occurs
+whenever it is necessary to inspect a value and pull out its
+subcomponents (for example, a match, or indexing expression). Coercing
+to an rvalue is more unusual; it occurs when moving values from place
+to place, such as in an assignment expression or parameter passing.
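+
+The idea of putting the kind into the datum's type, and of consuming the
+datum when coercing it, can be sketched as follows. This is a minimal
+model under stated assumptions: the field layout, the `u32` stand-in for
+a `ValueRef`, and the `scheduled` list standing in for the cleanup system
+are all invented for illustration; only the names `Lvalue`, `Rvalue`,
+`Expr`, and `to_lvalue_datum` come from the text above.
+
```rust
// Kinds are ordinary types, so the kind is visible in the datum's type.
#[derive(Debug, PartialEq)]
struct Lvalue; // by ref, cleanup scheduled

#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Rvalue {
    ByRef,
    ByValue,
} // no cleanup scheduled

#[derive(Debug, PartialEq)]
enum Expr {
    L(Lvalue),
    R(Rvalue),
} // either of the two

// Deliberately neither Clone nor Copy: a datum is meant to be consumed.
#[derive(Debug, PartialEq)]
struct Datum<K> {
    val: u32, // stand-in for a ValueRef (assumption of this sketch)
    kind: K,
}

impl Datum<Expr> {
    // Takes `self` by value, so the original datum cannot be reused.
    // If the datum was an rvalue, cleanup is scheduled as part of the
    // conversion; `scheduled` is a hypothetical stand-in for that.
    fn to_lvalue_datum(self, scheduled: &mut Vec<u32>) -> Datum<Lvalue> {
        match self.kind {
            Expr::L(l) => Datum { val: self.val, kind: l },
            Expr::R(_) => {
                scheduled.push(self.val);
                Datum { val: self.val, kind: Lvalue }
            }
        }
    }
}
```
+
+Because `to_lvalue_datum` consumes its receiver, using the `Datum<Expr>`
+again after the coercion is a compile-time error in this model, which is
+the property the consuming API is meant to provide.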
+
+### Lvalues in detail
+
+An lvalue datum is one for which cleanup has been scheduled. Lvalue
+datums are always located in memory, and thus the LLVM `ValueRef` of
+an lvalue datum is always a pointer to the actual Rust value. This means
+that if the Datum has a Rust type of `int`, then the LLVM type of the
+`ValueRef` will be `int*` (pointer to int).
+
+Because lvalues already have cleanups scheduled, the memory must be
+zeroed once the value has been moved out, to prevent the cleanup from
+running on it (presuming that the Rust type needs drop in the first
+place; otherwise it doesn't matter). The Datum code automatically
+performs this zeroing when the value is stored to a new location, for
+example.
+
+Lvalues usually result from evaluating lvalue expressions. For
+example, evaluating a local variable `x` yields an lvalue, as does a
+reference to a field like `x.f` or an index `x[i]`.
+
+Lvalue datums can also arise by *converting* an rvalue into an lvalue.
+This is done with the `to_lvalue_datum` method defined on
+`Datum<Expr>`. Basically this method just schedules cleanup if the
+datum is an rvalue, possibly storing the value into a stack slot first
+if needed. Converting rvalues into lvalues occurs in constructs like
+`&foo()` or `match foo() { ref x => ... }`, where the user is
+implicitly requesting a temporary.
+
+Somewhat surprisingly, not all lvalue expressions yield lvalue datums
+when trans'd. Ultimately the reason for this is to micro-optimize
+the resulting LLVM. For example, consider the following code:
+
+    fn foo() -> Box<int> { ... }
+    let x = *foo();
+
+The expression `*foo()` is an lvalue, but if you invoke `expr::trans`,
+it will return an rvalue datum. See `deref_once` in expr.rs for
+more details.
+
+### Rvalues in detail
+
+Rvalue datums are values with no cleanup scheduled. One must be
+careful with rvalue datums to ensure that cleanup is properly
+arranged, usually by converting to an lvalue datum or by invoking the
+`add_clean` method.
+
+### Scratch datums
+
+Sometimes you need some temporary scratch space.  The functions
+`[lr]value_scratch_datum()` can be used to get temporary stack
+space. As their name suggests, they yield lvalues and rvalues
+respectively. That is, the slot from `lvalue_scratch_datum` will have
+cleanup arranged, and the slot from `rvalue_scratch_datum` does not.
+
+## The Cleanup module
+
+The cleanup module tracks what values need to be cleaned up as scopes
+are exited, either via panic or just normal control flow. The basic
+idea is that the function context maintains a stack of cleanup scopes
+that are pushed/popped as we traverse the AST. There is typically
+at least one cleanup scope per AST node; some AST nodes may introduce
+additional temporary scopes.
+
+Cleanup items can be scheduled into any of the scopes on the stack.
+Typically, when a scope is popped, we will also generate the code for
+each of its cleanups at that time. This corresponds to a normal exit
+from a block (for example, an expression completing evaluation
+successfully without panic). However, it is also possible to pop a
+block *without* executing its cleanups; this is typically used to
+guard intermediate values that must be cleaned up on panic, but not
+if everything goes right. See the section on custom scopes below for
+more details.
+
+Cleanup scopes come in three kinds:
+
+- **AST scopes:** each AST node in a function body has a corresponding
+  AST scope. We push the AST scope when we start generating code for an AST
+  node and pop it once the AST node has been fully generated.
+- **Loop scopes:** loops have an additional cleanup scope. Cleanups are
+  never scheduled into loop scopes; instead, they are used to record the
+  basic blocks that we should branch to when a `continue` or `break` statement
+  is encountered.
+- **Custom scopes:** custom scopes are typically used to ensure cleanup
+  of intermediate values.
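+
+The scope stack described above can be modeled in a few lines. This is a
+sketch, not the real data structure: `FnCtxt`, `Scope`, `ScopeKind`, and
+the method names are hypothetical, cleanups are plain strings, and two
+behaviors are assumptions of the model rather than facts from this
+document: that scheduling targets the innermost non-loop scope, and that
+cleanups are emitted in reverse order of scheduling.
+
```rust
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum ScopeKind {
    Ast,
    Loop,
    Custom,
}

struct Scope {
    kind: ScopeKind,
    cleanups: Vec<&'static str>,
}

#[derive(Default)]
struct FnCtxt {
    scopes: Vec<Scope>,          // the stack of cleanup scopes
    emitted: Vec<&'static str>,  // cleanup code "generated" so far
}

impl FnCtxt {
    fn push(&mut self, kind: ScopeKind) {
        self.scopes.push(Scope { kind, cleanups: Vec::new() });
    }

    // Loop scopes never hold cleanups, so skip them (assumed policy:
    // pick the innermost scope that is not a loop scope).
    fn schedule(&mut self, cleanup: &'static str) {
        let scope = self
            .scopes
            .iter_mut()
            .rev()
            .find(|s| s.kind != ScopeKind::Loop)
            .expect("no scope on the stack accepts cleanups");
        scope.cleanups.push(cleanup);
    }

    // Normal exit: pop the scope and generate code for its cleanups.
    fn pop_and_trans(&mut self) {
        let scope = self.scopes.pop().expect("scope stack is empty");
        self.emitted.extend(scope.cleanups.into_iter().rev());
    }

    // Pop the scope but discard its cleanups without generating code.
    fn pop_without_trans(&mut self) {
        self.scopes.pop().expect("scope stack is empty");
    }
}
```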
+
+### When to schedule cleanup
+
+Although the cleanup system is intended to *feel* fairly declarative,
+it's still important to time calls to `schedule_clean()` correctly.
+Basically, you should not schedule cleanup for memory until it has
+been initialized, because if an unwind should occur before the memory
+is fully initialized, then the cleanup will run and try to free or
+drop uninitialized memory. If the initialization itself produces
+byproducts that need to be freed, then you should use temporary custom
+scopes to ensure that those byproducts will get freed on unwind.  For
+example, an expression like `box foo()` will first allocate a box in the
+heap and then call `foo()` -- if `foo()` should panic, this box needs
+to be *shallowly* freed.
+
+### Long-distance jumps
+
+In addition to popping a scope, which corresponds to normal control
+flow exiting the scope, we may also *jump out* of a scope into some
+earlier scope on the stack. This can occur in response to a `return`,
+`break`, or `continue` statement, but also in response to panic. In
+any of these cases, we will generate a series of cleanup blocks for
+each of the scopes that is exited. So, if the stack contains scopes A
+... Z, and we break out of a loop whose corresponding cleanup scope is
+X, we would generate cleanup blocks for the cleanups in X, Y, and Z.
+After cleanup is done we would branch to the exit point for scope X.
+But if panic should occur, we would generate cleanups for all the
+scopes from A to Z and then resume the unwind process afterwards.
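+
+The rule for which scopes get cleanup blocks on a jump can be written
+down as a small function. This is only an illustrative sketch: the
+function name `scopes_exited` is hypothetical, scopes are identified by
+name, and visiting them innermost-first is an assumption of the model.
+
```rust
// `stack` lists the scopes from outermost to innermost.
// `target` is the scope being jumped out of (e.g. the loop's scope for a
// `break`); `None` models a panic, which unwinds through every scope.
// Returns the scopes whose cleanups must be generated, innermost first.
fn scopes_exited<'a>(stack: &[&'a str], target: Option<&str>) -> Vec<&'a str> {
    let start = match target {
        Some(t) => stack
            .iter()
            .position(|s| *s == t)
            .expect("target scope is not on the stack"),
        None => 0,
    };
    // Everything from the target scope inward is exited.
    stack[start..].iter().rev().copied().collect()
}
```
+
+With a stack of `["W", "X", "Y", "Z"]`, breaking out of the loop whose
+scope is `X` yields the cleanups of `Z`, `Y`, and `X`, while a panic
+yields those of all four scopes.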
+
+To avoid generating tons of code, we cache the cleanup blocks that we
+create for breaks, returns, unwinds, and other jumps. Whenever a new
+cleanup is scheduled, though, we must clear these cached blocks. A
+possible improvement would be to keep the cached blocks but simply
+generate a new block which performs the additional cleanup and then
+branches to the existing cached blocks.
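+
+The caching behavior described above amounts to a per-scope cache that is
+cleared whenever a cleanup is added. The sketch below is a hypothetical
+model (the `Exit` enum, the `Scope` fields, and the integer block ids are
+invented for illustration) and does not implement the possible
+improvement mentioned in the last sentence.
+
```rust
use std::collections::HashMap;

// The kinds of jump that need their own cleanup block.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Exit {
    Unwind,
    Return,
    Break,
}

#[derive(Default)]
struct Scope {
    cleanups: Vec<&'static str>,
    cached: HashMap<Exit, usize>, // exit kind -> id of the generated block
    blocks_built: usize,          // how many blocks we had to generate
}

impl Scope {
    // A new cleanup makes every cached block stale, so drop them all.
    fn schedule(&mut self, cleanup: &'static str) {
        self.cleanups.push(cleanup);
        self.cached.clear();
    }

    // Reuse the cached block for this exit kind, or build a new one.
    fn cleanup_block(&mut self, exit: Exit) -> usize {
        if let Some(&id) = self.cached.get(&exit) {
            return id;
        }
        self.blocks_built += 1;
        let id = self.blocks_built;
        self.cached.insert(exit, id);
        id
    }
}
```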
+
+### AST and loop cleanup scopes
+
+AST cleanup scopes are pushed when we begin processing an AST node and
+popped when we finish. They are used to house cleanups related to rvalue
+temporaries that get referenced (e.g., due to an expression like `&Foo()`).
+Whenever an AST scope is popped, we always trans all the cleanups, adding
+the cleanup code after the postdominator of the AST node.
+
+AST nodes that represent breakable loops also push a loop scope; the
+loop scope never has any actual cleanups, it's just used to point to
+the basic blocks where control should flow after a "continue" or
+"break" statement. Popping a loop scope never generates code.
+
+### Custom cleanup scopes
+
+Custom cleanup scopes are used for a variety of purposes. The most
+common, though, is to handle temporary byproducts, where cleanup only
+needs to occur on panic. The general strategy is to push a custom
+cleanup scope, schedule *shallow* cleanups into the custom scope, and
+then pop the custom scope (without transing the cleanups) when
+execution succeeds normally. This way the cleanups are only trans'd on
+unwind, and only up until the point where execution succeeded, at
+which time the complete value should be stored in an lvalue or some
+other place where normal cleanup applies.
+
+To spell it out, here is an example. Imagine an expression `box expr`.
+We would basically:
+
+1. Push a custom cleanup scope C.
+2. Allocate the box.
+3. Schedule a shallow free in the scope C.
+4. Trans `expr` into the box.
+5. Pop the scope C.
+6. Return the box as an rvalue.
+
+This way, if `expr` panics, the custom cleanup scope C is still on the
+stack at that point and hence the box will be freed. The trans
+code for `expr` itself is responsible for freeing any other byproducts
+that may be in play.
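+
+The six steps can be traced with a toy simulation. This sketch models
+the run-time effect rather than the code generation itself: `eval_box`
+is a hypothetical function, the custom scope is a plain vector of
+pending cleanups, and the returned log of event strings is invented for
+illustration.
+
```rust
// Simulates `box expr`, returning the sequence of events that occur.
// `expr_panics` says whether evaluating `expr` unwinds.
fn eval_box(expr_panics: bool) -> Vec<&'static str> {
    let mut log = Vec::new();

    // 1. Push a custom cleanup scope C.
    let mut scope_c: Vec<&'static str> = Vec::new();
    // 2. Allocate the box.
    log.push("allocate box");
    // 3. Schedule a shallow free in scope C.
    scope_c.push("shallow free box");

    // 4. Trans `expr` into the box.
    if expr_panics {
        // C is still on the stack, so its cleanups run during unwinding.
        log.extend(scope_c.drain(..));
        log.push("resume unwind");
        return log;
    }
    log.push("store expr into box");

    // 5. Pop scope C without running its cleanups.
    drop(scope_c);
    // 6. Return the box as an rvalue.
    log.push("return box as rvalue");
    log
}
```
+
+On the success path the shallow free never appears in the log; on the
+panic path it is the only cleanup that runs before unwinding resumes.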
+
+*/