src/rustc/middle/region.rs



/*

Region resolution. This pass runs before typechecking and resolves region
names to the appropriate block.

This seems to be as good a place as any to explain in detail how
region naming, representation, and type checking work.

### Naming and so forth

We really want regions to be very lightweight to use. Therefore,
unlike other named things, the scopes for regions are not explicitly
declared: instead, they are implicitly defined.  Functions declare new
scopes: if the function is not a bare function, then as always it
inherits the names in scope from the outer scope.  Within a function
declaration, new names implicitly declare new region variables.  Outside
of function declarations, new names are illegal.  To make this more
concrete, here is an example:

    fn foo(s: &a.S, t: &b.T) {
        let s1: &a.S = s; // a refers to the same a as in the decl
        let t1: &c.T = t; // illegal: cannot introduce new name here
    }

The code in this file is what actually handles resolving these names.
It creates a couple of maps that map from the AST node representing a
region ptr type to the resolved form of its region parameter.  If new
names are introduced where they shouldn't be, then an error is
reported.

If regions are not given an explicit name, then the behavior depends
a bit on the context.  Within a function declaration, all unnamed regions
are mapped to a single, anonymous parameter.  That is, a function like:

    fn foo(s: &S) -> &S { s }

is equivalent to a declaration like:

    fn foo(s: &a.S) -> &a.S { s }

Within a function body or other non-binding context, an unnamed region
reference is mapped to a fresh region variable whose value can be
inferred as normal.

The resolved form of regions is `ty::region`.  Before I can explain
why this type is set up the way it is, I have to digress a little bit
into some ill-explained type theory.

### Universal Quantification

Regions are more complex than type parameters because, unlike type
parameters, they can be universally quantified within a type.  To put
it another way, you cannot (at least at the time of this writing) have
a variable `x` of type `fn<T>(T) -> T`.  You can have an *item* of
type `fn<T>(T) -> T`, but whenever it is referenced within a method,
that type parameter `T` is replaced with a concrete type *variable*
`$T`.  To make this more concrete, imagine this code:

    fn identity<T>(x: T) -> T { x }
    let f = identity; // f has type fn($T) -> $T
    f(3u); // $T is bound to uint
    f(3);  // Type error

You can see here that a type error will result because the type of `f`
(as opposed to the type of `identity`) is not universally quantified
over `$T`.  That's fancy math speak for saying that the type variable
`$T` refers to a specific type that may not yet be known, unlike the
type parameter `T` which refers to some type which will never be
known.

Anyway, regions work differently.  If you have an item of type
`fn(&a.T) -> &a.T` and you reference it, its type remains the same:
only when the function *is called* is `&a` instantiated with a
concrete region variable.  This means you could call it twice and give
different values for `&a` each time.

This more general form is possible for regions because they do not
impact code generation.  We do not need to monomorphize functions
differently just because they contain region pointers.  In fact, we
don't really do *anything* differently.
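
As a rough illustration only: the sketch below uses present-day Rust
lifetime syntax (`'a`, `for<'a>`), which is an assumption and not the
`&a.T` notation of this file.  The names `pick` and `call_twice` are
hypothetical.  It shows a function value whose type stays quantified
over the region, so each call may pick a different region:

```rust
// Modern-Rust analogy; `pick` and `call_twice` are hypothetical names.
fn pick<'a>(x: &'a u32) -> &'a u32 { x }

fn call_twice() -> (u32, u32) {
    // Referencing `pick` does not fix the region: the pointer type is
    // still quantified over 'a.
    let f: for<'a> fn(&'a u32) -> &'a u32 = pick;

    let outer = 1u32;
    let first = *f(&outer); // 'a instantiated for a borrow of `outer`

    let second = {
        let inner = 2u32;
        let v = *f(&inner); // a different, shorter region this time
        v
    };
    (first, second)
}
```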

### Representing regions; or, why do I care about all that?

The point of this discussion is that the representation of regions
must distinguish between a *bound* reference to a region and a *free*
reference.  A bound reference is one which will be replaced with a
fresh region variable when the function is called, like the type
parameter `T` in `identity`.  They can only appear within function
types.  A free reference is a region that may not yet be concretely
known, like the variable `$T`.

To see why we must distinguish them carefully, consider this program:

    fn item1(s: &a.S) {
        let choose = fn@(s1: &a.S) -> &a.S {
            if some_cond { s } else { s1 }
        };
    }

Here, the variable `s1: &a.S` that appears within the `fn@` is a free
reference to `a`.  That is, when you call `choose()`, you don't
replace `&a` with a fresh region variable, but rather you expect `s1`
to be in the same region as the parameter `s`.

But in this program, this is not the case at all:

    fn item2() {
        let identity = fn@(s1: &a.S) -> &a.S { s1 };
    }

To distinguish between these two cases, `ty::region` contains two
variants: `re_bound` and `re_free`.  In `item1()`, the outer reference
to `&a` would be `re_bound(rid_param("a", 0u))`, and the inner reference
would be `re_free(rid_param("a", 0u))`.  In `item2()`, the inner reference
would be `re_bound(rid_param("a", 0u))`.

#### Implications for typeck

In typeck, whenever we call a function, we must go over and replace
all references to `re_bound()` regions within its parameters with
fresh region variables (we do not, however, replace bound regions within
nested function types, as those nested functions have not yet been
called).

Also, when we typecheck the *body* of an item, we must replace all
`re_bound` references with `re_free` references.  This means that the
region in the type of the argument `s` in `item1()` *within `item1()`*
is not `re_bound(rid_param("a", 0u))` but rather `re_free(rid_param("a",
0u))`.  This is because, for any particular *invocation of `item1()`*,
`&a` will be bound to some specific region, and hence it is no longer
bound.
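
The two substitutions above can be sketched roughly as follows.  This
is a toy model in present-day Rust, not the real `ty::region`: the
types `Region` and `Ty` and the functions `subst`,
`instantiate_for_call` and `enter_body` are all hypothetical.  Leaving
nested function types alone when entering a body is an assumption made
here for simplicity; the text only states it for calls.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for illustration only.
#[derive(Clone, Debug, PartialEq)]
enum Region {
    Bound(String), // instantiated afresh at each call
    Free(String),  // fixed for the current invocation
    Var(u32),      // fresh inference variable
}

#[derive(Clone, Debug, PartialEq)]
enum Ty {
    Ref(Region),          // a region pointer such as `&a.S`
    Fn(Vec<Ty>, Box<Ty>), // a nested function type
}

// Replace bound regions in `ty`; nested function types are left alone.
fn subst(ty: &Ty, f: &mut dyn FnMut(&str) -> Region) -> Ty {
    match ty {
        Ty::Ref(Region::Bound(name)) => Ty::Ref(f(name.as_str())),
        Ty::Ref(other) => Ty::Ref(other.clone()),
        Ty::Fn(..) => ty.clone(),
    }
}

// At a call site: each bound name gets one fresh variable per call.
fn instantiate_for_call(params: &[Ty], next_var: &mut u32) -> Vec<Ty> {
    let mut fresh: HashMap<String, Region> = HashMap::new();
    let mut pick = |name: &str| {
        fresh
            .entry(name.to_string())
            .or_insert_with(|| {
                let v = Region::Var(*next_var);
                *next_var += 1;
                v
            })
            .clone()
    };
    params.iter().map(|t| subst(t, &mut pick)).collect()
}

// When checking the body: bound references become free references.
fn enter_body(params: &[Ty]) -> Vec<Ty> {
    let mut to_free = |name: &str| Region::Free(name.to_string());
    params.iter().map(|t| subst(t, &mut to_free)).collect()
}
```

Calling `instantiate_for_call` twice on the same parameter list yields
different variables each time, which mirrors the point that every call
may choose its own value for `&a`.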

*/

import driver::session::session;
import middle::ty;
import syntax::{ast, visit};
import syntax::codemap::span;
import syntax::print::pprust;
import syntax::ast_util::new_def_hash;

import std::list;
import std::list::list;
import std::map;
import std::map::hashmap;

type parent = option<ast::node_id>;

/* Records the parameter ID of a region name. */
type binding = {node_id: ast::node_id,
                name: str,
                br: ty::bound_region};

// Mapping from a block/expr/binding to the innermost scope that
// bounds its lifetime.  For a block/expression, this is the lifetime
// in which it will be evaluated.  For a binding, this is the lifetime
// in which it is in scope.
type region_map = hashmap<ast::node_id, ast::node_id>;

type ctxt = {
    sess: session,
    def_map: resolve::def_map,
    region_map: region_map,

    // The parent scope is the innermost block, call, or alt
    // expression during the execution of which the current expression
    // will be evaluated.  Generally speaking, the innermost parent
    // scope is also the closest suitable ancestor in the AST tree.
    //
    // There is a subtle point concerning call arguments.  Imagine
    // you have a call:
    //
    // { // block a
    //     foo( // call b
    //        x,
    //        y);
    // }
    //
    // In what lifetime are the expressions `x` and `y` evaluated?  At
    // first, I imagined the answer was the block `a`, as the arguments
    // are evaluated before the call takes place.  But this turns out
    // to be wrong.  The lifetime of the call must encompass the
    // argument evaluation as well.
    //
    // The reason is that evaluation of an earlier argument could
    // create a borrow which exists during the evaluation of later
    // arguments.  Consider this torture test, for example,
    //
    // fn test1(x: @mut ~int) {
    //     foo(&**x, *x = ~5);
    // }
    //
    // Here, the first argument `&**x` will be a borrow of the `~int`,
    // but the second argument overwrites that very value! Bad.
    // (This test is borrowck-pure-scope-in-call.rs, btw)
    parent: parent
};

// Returns true if `subscope` is equal to or is lexically nested inside
// `superscope` and false otherwise.
fn scope_contains(region_map: region_map, superscope: ast::node_id,
                  subscope: ast::node_id) -> bool {
    let mut subscope = subscope;
    while superscope != subscope {
        alt region_map.find(subscope) {
            none { ret false; }
            some(scope) { subscope = scope; }
        }
    }
    ret true;
}

fn nearest_common_ancestor(region_map: region_map, scope_a: ast::node_id,
                           scope_b: ast::node_id) -> option<ast::node_id> {

    fn ancestors_of(region_map: region_map, scope: ast::node_id)
                    -> ~[ast::node_id] {
        let mut result = ~[scope];
        let mut scope = scope;
        loop {
            alt region_map.find(scope) {
                none { ret result; }
                some(superscope) {
                    vec::push(result, superscope);
                    scope = superscope;
                }
            }
        }
    }

    if scope_a == scope_b { ret some(scope_a); }

    let a_ancestors = ancestors_of(region_map, scope_a);
    let b_ancestors = ancestors_of(region_map, scope_b);
    let mut a_index = vec::len(a_ancestors) - 1u;
    let mut b_index = vec::len(b_ancestors) - 1u;

    // Here, a_ancestors and b_ancestors are vectors going from narrow
    // to broad.  The end of each vector will be the item where the
    // scope is defined; if there are any common ancestors, then the
    // tails of the vectors will be the same.  So basically we want to
    // walk backwards from the tail of each vector and find the first
    // point where they diverge.  If one vector is a suffix of the
    // other, then the corresponding scope is a superscope of the other.

    if a_ancestors[a_index] != b_ancestors[b_index] {
        ret none;
    }

    loop {
        // Loop invariant: a_ancestors[a_index] == b_ancestors[b_index]
        // for all indices between a_index and the end of the array
        if a_index == 0u { ret some(scope_a); }
        if b_index == 0u { ret some(scope_b); }
        a_index -= 1u;
        b_index -= 1u;
        if a_ancestors[a_index] != b_ancestors[b_index] {
            ret some(a_ancestors[a_index + 1u]);
        }
    }
}

fn parent_id(cx: ctxt, span: span) -> ast::node_id {
    alt cx.parent {
      none {
        cx.sess.span_bug(span, "crate should not be parent here");
      }
      some(parent_id) {
        parent_id
      }
    }
}

fn record_parent(cx: ctxt, child_id: ast::node_id) {
    alt cx.parent {
      none { /* no-op */ }
      some(parent_id) {
        #debug["parent of node %d is node %d", child_id, parent_id];
        cx.region_map.insert(child_id, parent_id);
      }
    }
}

fn resolve_block(blk: ast::blk, cx: ctxt, visitor: visit::vt<ctxt>) {
    // Record the parent of this block.
    record_parent(cx, blk.node.id);

    // Descend.
    let new_cx: ctxt = {parent: some(blk.node.id) with cx};
    visit::visit_block(blk, new_cx, visitor);
}

fn resolve_arm(arm: ast::arm, cx: ctxt, visitor: visit::vt<ctxt>) {
    visit::visit_arm(arm, cx, visitor);
}

fn resolve_pat(pat: @ast::pat, cx: ctxt, visitor: visit::vt<ctxt>) {
    alt pat.node {
      ast::pat_ident(path, _) {
        let defn_opt = cx.def_map.find(pat.id);
        alt defn_opt {
          some(ast::def_variant(_,_)) {
            /* Nothing to do; this names a variant. */
          }
          _ {
            /* This names a local. Bind it to the containing scope. */
            record_parent(cx, pat.id);
          }
        }
      }
      _ { /* no-op */ }
    }

    visit::visit_pat(pat, cx, visitor);
}

fn resolve_expr(expr: @ast::expr, cx: ctxt, visitor: visit::vt<ctxt>) {
    record_parent(cx, expr.id);
    alt expr.node {
      ast::expr_call(*) {
        #debug["node %d: %s", expr.id, pprust::expr_to_str(expr)];
        let new_cx = {parent: some(expr.id) with cx};
        visit::visit_expr(expr, new_cx, visitor);
      }
      ast::expr_alt(subexpr, _, _) {
        #debug["node %d: %s", expr.id, pprust::expr_to_str(expr)];
        let new_cx = {parent: some(expr.id) with cx};
        visit::visit_expr(expr, new_cx, visitor);
      }
      ast::expr_fn(_, _, _, cap_clause) |
      ast::expr_fn_block(_, _, cap_clause) {
        // although the capture items are not expressions per se, they
        // do get "evaluated" in some sense as copies or moves of the
        // relevant variables so we parent them like an expression
        for (*cap_clause).each |cap_item| {
            record_parent(cx, cap_item.id);
        }
        visit::visit_expr(expr, cx, visitor);
      }
      _ {
        visit::visit_expr(expr, cx, visitor);
      }
    }
}

fn resolve_local(local: @ast::local, cx: ctxt, visitor: visit::vt<ctxt>) {
    record_parent(cx, local.node.id);
    visit::visit_local(local, cx, visitor);
}

fn resolve_item(item: @ast::item, cx: ctxt, visitor: visit::vt<ctxt>) {
    // Items create a new outer block scope as far as we're concerned.
    let new_cx: ctxt = {parent: none with cx};
    visit::visit_item(item, new_cx, visitor);
}

fn resolve_fn(fk: visit::fn_kind, decl: ast::fn_decl, body: ast::blk,
              sp: span, id: ast::node_id, cx: ctxt,
              visitor: visit::vt<ctxt>) {

    let fn_cx = alt fk {
      visit::fk_item_fn(*) | visit::fk_method(*) |
      visit::fk_ctor(*) | visit::fk_dtor(*) {
        // Top-level functions are a root scope.
        {parent: some(id) with cx}
      }

      visit::fk_anon(*) | visit::fk_fn_block(*) {
        // Closures continue with the inherited scope.
        cx
      }
    };

    #debug["visiting fn with body %d. cx.parent: %? \
            fn_cx.parent: %?",
           body.node.id, cx.parent, fn_cx.parent];

    for decl.inputs.each |input| {
        cx.region_map.insert(input.id, body.node.id);
    }

    visit::visit_fn(fk, decl, body, sp, id, fn_cx, visitor);
}

fn resolve_crate(sess: session, def_map: resolve::def_map, crate: @ast::crate)
        -> region_map {
    let cx: ctxt = {sess: sess,
                    def_map: def_map,
                    region_map: map::int_hash(),
                    parent: none};
    let visitor = visit::mk_vt(@{
        visit_block: resolve_block,
        visit_item: resolve_item,
        visit_fn: resolve_fn,
        visit_arm: resolve_arm,
        visit_pat: resolve_pat,
        visit_expr: resolve_expr,
        visit_local: resolve_local
        with *visit::default_visitor()
    });
    visit::visit_crate(*crate, cx, visitor);
    ret cx.region_map;
}