author: Boxy <supbscripter@gmail.com>, 2023-07-11 19:46:07 +0100
committer: Michael Goulet <michael@errs.io>, 2023-07-11 16:43:01 -0400
commit: 26aea847939d57a593eb650aa1e2ee59033a7e5d
tree: 2cc9708f7016eac3f773cbea40a47b850cfefb48 (/src/doc)
parent: 823509ac3209010910c610e5475f66ecc5839127
split file and replace with deep dive doc
Diffstat (limited to 'src/doc')
-rw-r--r-- src/doc/rustc-dev-guide/src/SUMMARY.md | 3
-rw-r--r-- src/doc/rustc-dev-guide/src/bound-vars-and-params.md | 61
-rw-r--r-- src/doc/rustc-dev-guide/src/early-late-bound.md | 266
3 files changed, 242 insertions, 88 deletions
diff --git a/src/doc/rustc-dev-guide/src/SUMMARY.md b/src/doc/rustc-dev-guide/src/SUMMARY.md
index 4870e0cbfa0..ca262e3eebb 100644
--- a/src/doc/rustc-dev-guide/src/SUMMARY.md
+++ b/src/doc/rustc-dev-guide/src/SUMMARY.md
@@ -109,9 +109,10 @@
     - [`TypeFolder` and `TypeFoldable`](./ty-fold.md)
     - [Generic arguments](./generic_arguments.md)
     - [Constants in the type system](./constants.md)
+    - [Bound vars and Parameters](./bound-vars-and-params.md)
 - [Type inference](./type-inference.md)
 - [Trait solving](./traits/resolution.md)
-    - [Early and Late Bound Parameters](./early-late-bound.md)
+    - [Early and Late Bound Parameter Definitions](./early-late-bound.md)
     - [Higher-ranked trait bounds](./traits/hrtb.md)
     - [Caching subtleties](./traits/caching.md)
     - [Specialization](./traits/specialization.md)
diff --git a/src/doc/rustc-dev-guide/src/bound-vars-and-params.md b/src/doc/rustc-dev-guide/src/bound-vars-and-params.md
new file mode 100644
index 00000000000..923eae10fd1
--- /dev/null
+++ b/src/doc/rustc-dev-guide/src/bound-vars-and-params.md
@@ -0,0 +1,61 @@
+# Bound vars and parameters
+
+## Early-bound parameters
+
+Early-bound parameters in rustc are identified by an index, stored in the
+[`ParamTy`] struct for types or the [`EarlyBoundRegion`] struct for lifetimes.
+The index counts from the outermost declaration in scope. This means that as you
+add more binders inside, the index doesn't change.
+
+For example,
+
+```rust,ignore
+trait Foo<T> {
+  type Bar<U> = (Self, T, U);
+}
+```
+
+Here, the type `(Self, T, U)` would be `($0, $1, $2)`, where `$N` means a
+[`ParamTy`] with the index of `N`.
+
+In rustc, the [`Generics`] structure carries this information. So the
+[`Generics`] for `Bar` above would list only `U` as its own parameter and would
+point to the 'parent' generics of `Foo`, which declares `Self` and `T`. You can
+read more in [this chapter](./generics.md).
+
+[`ParamTy`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.ParamTy.html
+[`EarlyBoundRegion`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.EarlyBoundRegion.html
+[`Generics`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Generics.html
+
+## Late-bound parameters
+
+Late-bound parameters in `rustc` are handled quite differently (they are also
+specialized to lifetimes since, right now, only late-bound lifetimes are
+supported, though with GATs that has to change). We indicate their potential
+presence by a [`Binder`] type. The [`Binder`] doesn't know how many variables
+there are at that binding level. This can only be determined by walking the
+type itself and collecting them. So a type like `for<'a, 'b> ('a, 'b)` would be
+`for (^0.a, ^0.b)`. Here, we just write `for` because we don't know the names
+of the things bound within.
+
+Moreover, a reference to a late-bound lifetime is written `^0.a`:
+
+- The `0` is the index; it identifies that this lifetime is bound in the
+  innermost binder (the `for`).
+- The `a` is the "name"; late-bound lifetimes in rustc are identified by a
+  "name" -- the [`BoundRegionKind`] enum. This enum can contain a
+  [`DefId`][defid] or it might have various "anonymous" numbered names. The
+  latter arise from types like `fn(&u32, &u32)`, which are equivalent to
+  something like `for<'a, 'b> fn(&'a u32, &'b u32)`, but the names of those
+  lifetimes must be generated.
+
+This setup of not knowing the full set of variables at a binding level has some
+advantages and some disadvantages. The disadvantage is that you must walk the
+type to find out what is bound at the given level and so forth. The advantage
+is primarily that, when constructing types from Rust syntax, if we encounter
+anonymous regions like in `fn(&u32)`, we just create a fresh index and don't have
+to update the binder.
+
+[`Binder`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Binder.html
+[`BoundRegionKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.BoundRegionKind.html
+[defid]: ./hir.html#identifiers-in-the-hir
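
Note on the early-bound indexing described in `bound-vars-and-params.md` above: the following is a minimal sketch of how an index "counting from the outermost declaration in scope" can be computed by chaining to parent generics. `ToyGenerics`, `own_params`, `count`, `index_of`, and `bar_generics` are hypothetical names made up for illustration; they are not rustc's actual `Generics` API, which carries considerably more information.

```rust
/// Toy stand-in for rustc's `Generics` (hypothetical; field names are illustrative only).
struct ToyGenerics {
    /// Generics of the enclosing item, if any (e.g. the trait for an associated type).
    parent: Option<Box<ToyGenerics>>,
    /// Parameters declared directly on this item.
    own_params: Vec<&'static str>,
}

impl ToyGenerics {
    /// Total number of parameters declared here and in all parents.
    fn count(&self) -> usize {
        let parent_count = self.parent.as_ref().map_or(0, |p| p.count());
        parent_count + self.own_params.len()
    }

    /// Index of `name`, counting from the outermost declaration in scope.
    fn index_of(&self, name: &str) -> Option<usize> {
        let parent_count = self.parent.as_ref().map_or(0, |p| p.count());
        if let Some(pos) = self.own_params.iter().position(|p| *p == name) {
            // Own parameters are numbered after every parent parameter.
            return Some(parent_count + pos);
        }
        // Otherwise the parameter may be declared on an enclosing item.
        self.parent.as_ref().and_then(|p| p.index_of(name))
    }
}

/// Models `trait Foo<T> { type Bar<U>; }`: `Foo` declares `Self` and `T`, `Bar` declares `U`.
fn bar_generics() -> ToyGenerics {
    let foo = ToyGenerics { parent: None, own_params: vec!["Self", "T"] };
    ToyGenerics { parent: Some(Box::new(foo)), own_params: vec!["U"] }
}

fn main() {
    let bar = bar_generics();
    for name in ["Self", "T", "U"] {
        println!("{name} -> {:?}", bar.index_of(name));
    }
}
```

In this toy model, adding further parameters to `Bar` leaves the indices of `Self` and `T` unchanged, which mirrors the property the text describes for early-bound parameters.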
diff --git a/src/doc/rustc-dev-guide/src/early-late-bound.md b/src/doc/rustc-dev-guide/src/early-late-bound.md
index 29b2136b4c4..f0f9f0a85f0 100644
--- a/src/doc/rustc-dev-guide/src/early-late-bound.md
+++ b/src/doc/rustc-dev-guide/src/early-late-bound.md
@@ -1,107 +1,199 @@
-# Early and Late Bound Variables
+# Early and Late Bound Parameter Definitions
 
-In Rust, item definitions (like `fn`) can often have generic parameters, which
-are always [_universally_ quantified][quant]. That is, if you have a function
-like
+Understanding this page likely requires a rudimentary understanding of higher ranked
+trait bounds (`for<'a>`) and of what types such as `dyn for<'a> Trait<'a>` and
+`for<'a> fn(&'a u32)` mean. Reading [the nomicon chapter](https://doc.rust-lang.org/nomicon/hrtb.html)
+on HRTB may be useful for understanding this syntax. The meaning of `for<'a> fn(&'a u32)`
+is very similar to the meaning of `T: for<'a> Trait<'a>`.
 
+If you are looking for information on the `RegionKind` variants `ReLateBound` and `ReEarlyBound`
+you should look at the section on [bound vars and params](./bound-vars-and-params.md). This section
+discusses what makes generic parameters on functions and closures late or early bound. It does not
+cover the general concept of bound vars and generic parameters, which `RegionKind` names in a way
+that is easily confused with this topic.
+
+## What does it mean for parameters to be early or late bound
+
+All function definitions conceptually have a ZST (this is represented by `TyKind::FnDef` in rustc).
+The only generics on this ZST are the early bound parameters of the function definition, e.g.
 ```rust
-fn foo<T>(x: T) { }
+fn foo<'a>(_: &'a u32) {}
+
+fn main() {
+    let b = foo;
+    //  ^ `b` has type `FnDef(foo, [])` (no substs because `'a` is late bound)
+    assert!(std::mem::size_of_val(&b) == 0);
+}
 ```
 
-this function is defined "for all T" (not "for some specific T", which would be
-[_existentially_ quantified][quant]).
+To call `b`, the late bound parameters do need to be provided. They are inferred at the
+call site rather than at the point where we refer to `foo`.
+```rust
+fn foo<'a>(_: &'a u32) {}
+fn main() {
+    let b = foo;
+    let a: &'static u32 = &10;
+    b(a);
+    // the lifetime argument for `'a` on `foo` is inferred here, at the call site (to `'static`)
+}
+```
+
+Because late bound parameters are not part of the `FnDef`'s substs, we are able to prove trait
+bounds such as `F: for<'a> Fn(&'a u32)` where `F` is `foo`'s `FnDef`, e.g.
+```rust,ignore
+fn foo_early<'a, T: Trait<'a>>(_: &'a u32, _: T) {}
+fn foo_late<'a, T>(_: &'a u32, _: T) {}
+
+fn accepts_hr_func<F: for<'a> Fn(&'a u32, u32)>(_: F) {}
+
+fn main() {
+    // doesn't work: the substituted bound is `for<'a> FnDef<'?0>: Fn(&'a u32, u32)`, but
+    // `foo_early` only implements `for<'a> FnDef<'a>: Fn(&'a u32, u32)`, i.e. the lifetime
+    // of the borrow in the function argument must be the same as the lifetime
+    // on the `FnDef`.
+    accepts_hr_func(foo_early);
+    
+    // works, the substituted bound is `for<'a> FnDef: Fn(&'a u32, u32)`
+    accepts_hr_func(foo_late);
+}
+
+// the builtin `Fn` impls for `foo_early` and `foo_late` look something like:
+// `foo_early`
+impl<'a, T: Trait<'a>> Fn(&'a u32, T) for FooEarlyFnDef<'a, T> { ... }
+// `foo_late`
+impl<'a, T> Fn(&'a u32, T) for FooLateFnDef<T> { ... }
+
+```
+
+Early bound parameters are present on the `FnDef`. Late bound generic parameters are not present
+on the `FnDef` but are instead constrained by the builtin `Fn*` impl.
+
+The same distinction applies to closures. Instead of `FnDef` we are talking about the anonymous
+closure type. Closures are [currently unsound](https://github.com/rust-lang/rust/issues/84366) in
+ways that are closely related to the distinction between early/late bound
+parameters (more on this later).
+
+The early/late boundness of generic parameters is only relevant for the desugaring of
+functions/closures into types with builtin `Fn*` impls. It does not make sense to talk about
+it in other contexts.
+
+The `generics_of` query in rustc only contains early bound parameters. In this way,
+`generics_of(my_func)` acts more like the generics of the `FnDef` than the generics available in
+the function body, although it's not clear to the author of this section if this was the actual
+justification for making `generics_of` behave this way.
 
-[quant]: ./appendix/background.md#quantified
+## What parameters are currently late bound
 
-While Rust *items* can be quantified over types, lifetimes, and constants, the
-types of values in Rust are only ever quantified over lifetimes. So you can
-have a type like `for<'a> fn(&'a u32)`, which represents a function pointer
-that takes a reference with any lifetime, or `for<'a> dyn Trait<'a>`, which is
-a `dyn` trait for a trait implemented for any lifetime; but we have no type
-like `for<T> fn(T)`, which would be a function that takes a value of *any type*
-as a parameter. This is a consequence of monomorphization -- to support a value
-of type `for<T> fn(T)`, we would need a single function pointer that can be
-used for a parameter of any type, but in Rust we generate customized code for
-each parameter type.
+Below are the current requirements for determining if a generic parameter is late bound. It is worth
+keeping in mind that these are not necessarily set in stone and it is almost certainly possible to 
+be more flexible.
 
-One consequence of this asymmetry is a weird split in how we represent some
-generic types: _early-_ and _late-_ bound parameters.
-Basically, if we cannot represent a type (e.g. a universally quantified type),
-we have to bind it _early_ so that the unrepresentable type is never around.
+### Must be a lifetime parameter
 
-Consider the following example:
+Rust can't support types such as `for<T> dyn Trait<T>` or `for<T> fn(T)`. This is a
+fundamental limitation of the language, as we are required to monomorphize type/const
+parameters and cannot do so behind dynamic dispatch. (Technically we could probably
+support `for<T> dyn MarkerTrait<T>`, as there is nothing to monomorphize.)
 
-```rust,ignore
-fn foo<'a, 'b, T>(x: &'a u32, y: &'b T) where T: 'b { ... }
+Not being able to support `for<T> dyn Trait<T>` resulted in making all type and const
+parameters early bound. Only lifetime parameters can be late bound.
+
+### Must not appear in the where clauses
+
+In order for a generic parameter to be late bound it must not appear in any where clauses.
+This is currently an incredibly simplistic check that causes lifetimes to be early bound even
+if the where clauses they appear in are always true, or implied by well formedness of function
+arguments, e.g.
+```rust,ignore
+fn foo1<'a: 'a>(_: &'a u32) {}
+//     ^^ early bound parameter because it's in a `'a: 'a` clause
+//        even though the bound obviously holds all the time
+fn foo2<'a, T: Trait<'a>>(a: T, b: &'a u32) {}
+//     ^^ early bound parameter because it's used in the `T: Trait<'a>` clause
+fn foo3<'a, T: 'a>(_: &'a T) {}
+//     ^^ early bound parameter because it's used in the `T: 'a` clause
+//        even though that bound is implied by wellformedness of `&'a T`
+fn foo4<'a, 'b: 'a>(_: Inv<&'a ()>, _: Inv<&'b ()>) {}
+//      ^^  ^^         ^^^ note:
+//      ^^  ^^         `Inv` stands for `Invariant` and is used to
+//      ^^  ^^          make the type parameter invariant. This
+//      ^^  ^^          is necessary for demonstration purposes as
+//      ^^  ^^          `for<'a, 'b> fn(&'a (), &'b ())` and
+//      ^^  ^^          `for<'a> fn(&'a (), &'a ())` are subtypes
+//      ^^  ^^          of each other, which makes the bound trivially
+//      ^^  ^^          satisfiable when making the fnptr. `Inv`
+//      ^^  ^^          disables this subtyping.
+//      ^^  ^^
+//      ^^^^^^ both early bound parameters because they are present in the
+//            `'b: 'a` clause
 ```
 
-We cannot treat `'a`, `'b`, and `T` in the same way.  Types in Rust can't have
-`for<T> { .. }`, only `for<'a> {...}`, so whenever you reference `foo` the type
-you get back can't be `for<'a, 'b, T> fn(&'a u32, y: &'b T)`. Instead, the `T`
-must be substituted early. In particular, you have:
+The reason for this requirement is that we cannot represent the `T: Trait<'a>` or `'b: 'a` clauses
+on a function pointer. `for<'a, 'b> fn(Inv<&'a ()>, Inv<&'b ()>)` is not a valid function pointer to
+represent `foo4` as it would allow calling the function without `'b: 'a` holding.
 
-```rust,ignore
-let x = foo; // T, 'b have to be substituted here
-x(...);      // 'a substituted here, at the point of call
-x(...);      // 'a substituted here with a different value
+### Must be constrained by where clauses or function argument types
+
+The builtin impls of the `Fn*` traits for closures and `FnDef`s cannot have any unconstrained
+parameters. For example the following impl is illegal:
+```rust,ignore
+impl<'a> Trait for u32 { type Assoc = &'a u32; }
+```
+We must not end up with a similar impl for the `Fn*` traits e.g.
+```rust,ignore
+impl<'a> Fn<()> for FnDef { type Assoc = &'a u32; }
+```
 
-## Early-bound parameters
+Violating this rule can trivially lead to unsoundness as seen in [#84366](https://github.com/rust-lang/rust/issues/84366).
+Additionally, if we ever support late bound type params then an impl like:
+```rust,ignore
+impl<T> Fn<()> for FnDef { type Assoc = T; }
+```
+would break the compiler in various ways.
 
-Early-bound parameters in rustc are identified by an index, stored in the
-[`ParamTy`] struct for types or the [`EarlyBoundRegion`] struct for lifetimes.
-The index counts from the outermost declaration in scope. This means that as you
-add more binders inside, the index doesn't change.
+In order to ensure that everything functions correctly, we do not allow generic parameters to
+be late bound if it would result in a builtin impl that does not constrain all of the generic
+parameters on the builtin impl. Making a generic parameter be early bound trivially makes it be
+constrained by the builtin impl as it ends up on the self type.
 
-For example,
+Because of the requirement that late bound parameters must not appear in where clauses, checking
+this is simpler than the rules for checking that impl headers constrain all the parameters on the
+impl. We only have to ensure that all late bound parameters appear at least once in the function
+argument types outside of an alias (e.g. an associated type).
 
-```rust,ignore
-trait Foo<T> {
-  type Bar<U> = (Self, T, U);
-}
+The reason parameters that appear only in the substs of an alias do not count is the
+same reason the following code is forbidden:
+```rust,ignore
+impl<T: Trait> OtherTrait for <T as Trait>::Assoc { type Assoc = T; }
 ```
+There is no guarantee that `<T as Trait>::Assoc` will normalize to different types for every
+instantiation of `T`. If we were to allow this impl we could get overlapping impls and the
+same is true of the builtin `Fn*` impls.
+
+## Making more generic parameters late bound
+
+It is generally considered desirable for more parameters to be late bound as it makes
+the builtin `Fn*` impls more flexible. Right now many of the requirements for making
+a parameter late bound are overly restrictive as they are tied to what we can currently 
+(or can ever) do with fn ptrs.
+
+It would be theoretically possible to support late bound params in `where`-clauses in the 
+language by introducing implication types which would allow us to express types such as: 
+`for<'a, 'b: 'a> fn(Inv<&'a u32>, Inv<&'b u32>)` which would ensure `'b: 'a` is upheld when
+calling the function pointer. 
+
+It would also be theoretically possible to support it by making the coercion to a fn ptr
+instantiate the parameter with an infer var while still allowing the FnDef to not have the
+generic parameter present. This works because trait impls are perfectly capable of representing
+the function's where clauses on the impl itself. It would also allow us to support late bound
+type/const vars, allowing bounds like `F: for<T> Fn(T)` to hold.
+
+It is also somewhat unclear if we can change the `Fn` traits to be structured differently
+so that we never have to make a parameter early bound just to make the builtin impl have all
+generics be constrained. Of all the possible causes of a generic parameter being early bound
+this seems the most difficult to remove.
+
+Whether these would be good ideas to implement is a separate question. They are only brought
+up to illustrate that the current rules are not necessarily set in stone, nor a result of
+"it's the only way of doing this".
 
-Here, the type `(Self, T, U)` would be `($0, $1, $2)`, where `$N` means a
-[`ParamTy`] with the index of `N`.
-
-In rustc, the [`Generics`] structure carries this information. So the
-[`Generics`] for `Bar` above would be just like for `U` and would indicate the
-'parent' generics of `Foo`, which declares `Self` and `T`.  You can read more
-in [this chapter](./generics.md).
-
-[`ParamTy`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.ParamTy.html
-[`EarlyBoundRegion`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.EarlyBoundRegion.html
-[`Generics`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Generics.html
-
-## Late-bound parameters
-
-Late-bound parameters in `rustc` are handled quite differently (they are also
-specialized to lifetimes since, right now, only late-bound lifetimes are
-supported, though with GATs that has to change). We indicate their potential
-presence by a [`Binder`] type. The [`Binder`] doesn't know how many variables
-there are at that binding level. This can only be determined by walking the
-type itself and collecting them. So a type like `for<'a, 'b> ('a, 'b)` would be
-`for (^0.a, ^0.b)`. Here, we just write `for` because we don't know the names
-of the things bound within.
-
-Moreover, a reference to a late-bound lifetime is written `^0.a`:
-
-- The `0` is the index; it identifies that this lifetime is bound in the
-  innermost binder (the `for`).
-- The `a` is the "name"; late-bound lifetimes in rustc are identified by a
-  "name" -- the [`BoundRegionKind`] enum. This enum can contain a
-  [`DefId`][defid] or it might have various "anonymous" numbered names. The
-  latter arise from types like `fn(&u32, &u32)`, which are equivalent to
-  something like `for<'a, 'b> fn(&'a u32, &'b u32)`, but the names of those
-  lifetimes must be generated.
-
-This setup of not knowing the full set of variables at a binding level has some
-advantages and some disadvantages. The disadvantage is that you must walk the
-type to find out what is bound at the given level and so forth. The advantage
-is primarily that, when constructing types from Rust syntax, if we encounter
-anonymous regions like in `fn(&u32)`, we just create a fresh index and don't have
-to update the binder.
-
-[`Binder`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Binder.html
-[`BoundRegionKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.BoundRegionKind.html
-[defid]: ./hir.html#identifiers-in-the-hir
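
Note on the new `early-late-bound.md` text: the early/late bound distinction it describes can be observed from ordinary Rust code, without touching rustc internals. The following is a small sketch under that assumption. The function names `late`, `early`, and `call_with_local` are made up for illustration, and the commented-out call is expected to be rejected by the compiler for the reason the document gives (the lifetime on an early bound `FnDef` is fixed when the function is named, so the higher-ranked bound cannot be proven).

```rust
use std::mem::size_of_val;

// `'a` appears in no where clause and is constrained by the argument type,
// so by the rules described above it is late bound.
fn late<'a>(x: &'a u32) -> &'a u32 {
    x
}

// The (trivially true) `'a: 'a` bound is enough to make `'a` early bound.
fn early<'a: 'a>(x: &'a u32) -> &'a u32 {
    x
}

// Requires a callable that works for *every* lifetime, including that of a local.
fn call_with_local<F: for<'a> Fn(&'a u32) -> &'a u32>(f: F) -> u32 {
    let local = 7;
    *f(&local)
}

fn main() {
    // Function items are zero-sized types regardless of early/late boundness.
    assert_eq!(size_of_val(&late), 0);
    assert_eq!(size_of_val(&early), 0);

    // Works: `'a` is not part of the function item type, so the
    // higher-ranked bound `for<'a> Fn(&'a u32) -> &'a u32` can be proven.
    assert_eq!(call_with_local(late), 7);

    // Expected to be rejected: `'a` is chosen once, when `early` is named.
    // call_with_local(early);

    // An early bound lifetime can be supplied explicitly when naming the function.
    static N: u32 = 3;
    let pinned = early::<'static>;
    assert_eq!(*pinned(&N), 3);
}
```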