about summary refs log tree commit diff
path: root/src/doc/nomicon/vec-alloc.md
diff options
context:
space:
mode:
authorSteve Klabnik <steve@steveklabnik.com>2017-02-07 19:08:12 -0500
committerSteve Klabnik <steve@steveklabnik.com>2017-02-13 13:41:10 -0500
commit22d4adf14aabfb0ea0cfc8daff203e8d474cbc52 (patch)
tree47e38ca93da0581bf477919da94d422e469d55ee /src/doc/nomicon/vec-alloc.md
parente943e68a47dfbdd73d34f3b40e628f3031f90b6a (diff)
downloadrust-22d4adf14aabfb0ea0cfc8daff203e8d474cbc52.tar.gz
rust-22d4adf14aabfb0ea0cfc8daff203e8d474cbc52.zip
Port Nomicon to mdbook
1. move everything under a src directory
2. add README.md to the SUMMARY.md
Diffstat (limited to 'src/doc/nomicon/vec-alloc.md')
-rw-r--r--src/doc/nomicon/vec-alloc.md223
1 files changed, 0 insertions, 223 deletions
diff --git a/src/doc/nomicon/vec-alloc.md b/src/doc/nomicon/vec-alloc.md
deleted file mode 100644
index bc60a577bd3..00000000000
--- a/src/doc/nomicon/vec-alloc.md
+++ /dev/null
@@ -1,223 +0,0 @@
-% Allocating Memory
-
-Using Unique throws a wrench in an important feature of Vec (and indeed all of
-the std collections): an empty Vec doesn't actually allocate at all. So if we
-can't allocate, but also can't put a null pointer in `ptr`, what do we do in
-`Vec::new`? Well, we just put some other garbage in there!
-
-This is perfectly fine because we already have `cap == 0` as our sentinel for no
-allocation. We don't even need to handle it specially in almost any code because
-we usually need to check if `cap > len` or `len > 0` anyway. The traditional
-Rust value to put here is `0x01`. The standard library actually exposes this
-as `alloc::heap::EMPTY`. There are quite a few places where we'll
-want to use `heap::EMPTY` because there's no real allocation to talk about but
-`null` would make the compiler do bad things.
-
-All of the `heap` API is totally unstable under the `heap_api` feature, though.
-We could trivially define `heap::EMPTY` ourselves, but we'll want the rest of
-the `heap` API anyway, so let's just get that dependency over with.
-
-So:
-
-```rust,ignore
-#![feature(alloc, heap_api)]
-
-use std::mem;
-
-use alloc::heap::EMPTY;
-
-impl<T> Vec<T> {
-    fn new() -> Self {
-        assert!(mem::size_of::<T>() != 0, "We're not ready to handle ZSTs");
-        unsafe {
-            // need to cast EMPTY to the actual ptr type we want, let
-            // inference handle it.
-            Vec { ptr: Unique::new(heap::EMPTY as *mut _), len: 0, cap: 0 }
-        }
-    }
-}
-```
-
-I slipped in that assert there because zero-sized types will require some
-special handling throughout our code, and I want to defer the issue for now.
-Without this assert, some of our early drafts will do some Very Bad Things.
-
-Next we need to figure out what to actually do when we *do* want space. For
-that, we'll need to use the rest of the heap APIs. These basically allow us to
-talk directly to Rust's allocator (jemalloc by default).
-
-We'll also need a way to handle out-of-memory (OOM) conditions. The standard
-library calls the `abort` intrinsic, which just calls an illegal instruction to
-crash the whole program. The reason we abort and don't panic is because
-unwinding can cause allocations to happen, and that seems like a bad thing to do
-when your allocator just came back with "hey I don't have any more memory".
-
-Of course, this is a bit silly since most platforms don't actually run out of
-memory in a conventional way. Your operating system will probably kill the
-application by another means if you legitimately start using up all the memory.
-The most likely way we'll trigger OOM is by just asking for ludicrous quantities
-of memory at once (e.g. half the theoretical address space). As such it's
-*probably* fine to panic and nothing bad will happen. Still, we're trying to be
-like the standard library as much as possible, so we'll just kill the whole
-program.
-
-We said we don't want to use intrinsics, so doing exactly what `std` does is
-out. Instead, we'll call `std::process::exit` with some random number.
-
-```rust
-fn oom() {
-    ::std::process::exit(-9999);
-}
-```
-
-Okay, now we can write growing. Roughly, we want to have this logic:
-
-```text
-if cap == 0:
-    allocate()
-    cap = 1
-else:
-    reallocate()
-    cap *= 2
-```
-
-But Rust's only supported allocator API is so low level that we'll need to do a
-fair bit of extra work. We also need to guard against some special
-conditions that can occur with really large allocations or empty allocations.
-
-In particular, `ptr::offset` will cause us a lot of trouble, because it has
-the semantics of LLVM's GEP inbounds instruction. If you're fortunate enough to
-not have dealt with this instruction, here's the basic story with GEP: alias
-analysis, alias analysis, alias analysis. It's super important to an optimizing
-compiler to be able to reason about data dependencies and aliasing.
-
-As a simple example, consider the following fragment of code:
-
-```rust
-# let x = &mut 0;
-# let y = &mut 0;
-*x *= 7;
-*y *= 3;
-```
-
-If the compiler can prove that `x` and `y` point to different locations in
-memory, the two operations can in theory be executed in parallel (by e.g.
-loading them into different registers and working on them independently).
-However the compiler can't do this in general because if x and y point to
-the same location in memory, the operations need to be done to the same value,
-and they can't just be merged afterwards.
-
-When you use GEP inbounds, you are specifically telling LLVM that the offsets
-you're about to do are within the bounds of a single "allocated" entity. The
-ultimate payoff being that LLVM can assume that if two pointers are known to
-point to two disjoint objects, all the offsets of those pointers are *also*
-known to not alias (because you won't just end up in some random place in
-memory). LLVM is heavily optimized to work with GEP offsets, and inbounds
-offsets are the best of all, so it's important that we use them as much as
-possible.
-
-So that's what GEP's about, how can it cause us trouble?
-
-The first problem is that we index into arrays with unsigned integers, but
-GEP (and as a consequence `ptr::offset`) takes a signed integer. This means
-that half of the seemingly valid indices into an array will overflow GEP and
-actually go in the wrong direction! As such we must limit all allocations to
-`isize::MAX` elements. This actually means we only need to worry about
-byte-sized objects, because e.g. `> isize::MAX` `u16`s will truly exhaust all of
-the system's memory. However in order to avoid subtle corner cases where someone
-reinterprets some array of `< isize::MAX` objects as bytes, std limits all
-allocations to `isize::MAX` bytes.
-
-On all 64-bit targets that Rust currently supports we're artificially limited
-to significantly less than all 64 bits of the address space (modern x64
-platforms only expose 48-bit addressing), so we can rely on just running out of
-memory first. However on 32-bit targets, particularly those with extensions to
-use more of the address space (PAE x86 or x32), it's theoretically possible to
-successfully allocate more than `isize::MAX` bytes of memory.
-
-However since this is a tutorial, we're not going to be particularly optimal
-here, and just unconditionally check, rather than use clever platform-specific
-`cfg`s.
-
-The other corner-case we need to worry about is empty allocations. There will
-be two kinds of empty allocations we need to worry about: `cap = 0` for all T,
-and `cap > 0` for zero-sized types.
-
-These cases are tricky because they come
-down to what LLVM means by "allocated". LLVM's notion of an
-allocation is significantly more abstract than how we usually use it. Because
-LLVM needs to work with different languages' semantics and custom allocators,
-it can't really intimately understand allocation. Instead, the main idea behind
-allocation is "doesn't overlap with other stuff". That is, heap allocations,
-stack allocations, and globals don't randomly overlap. Yep, it's about alias
-analysis. As such, Rust can technically play a bit fast and loose with the notion of
-an allocation as long as it's *consistent*.
-
-Getting back to the empty allocation case, there are a couple of places where
-we want to offset by 0 as a consequence of generic code. The question is then:
-is it consistent to do so? For zero-sized types, we have concluded that it is
-indeed consistent to do a GEP inbounds offset by an arbitrary number of
-elements. This is a runtime no-op because every element takes up no space,
-and it's fine to pretend that there's infinite zero-sized types allocated
-at `0x01`. No allocator will ever allocate that address, because they won't
-allocate `0x00` and they generally allocate to some minimal alignment higher
-than a byte. Also generally the whole first page of memory is
-protected from being allocated anyway (a whole 4k, on many platforms).
-
-However what about for positive-sized types? That one's a bit trickier. In
-principle, you can argue that offsetting by 0 gives LLVM no information: either
-there's an element before the address or after it, but it can't know which.
-However we've chosen to conservatively assume that it may do bad things. As
-such we will guard against this case explicitly.
-
-*Phew*
-
-Ok with all the nonsense out of the way, let's actually allocate some memory:
-
-```rust,ignore
-fn grow(&mut self) {
-    // this is all pretty delicate, so let's say it's all unsafe
-    unsafe {
-        // current API requires us to specify size and alignment manually.
-        let align = mem::align_of::<T>();
-        let elem_size = mem::size_of::<T>();
-
-        let (new_cap, ptr) = if self.cap == 0 {
-            let ptr = heap::allocate(elem_size, align);
-            (1, ptr)
-        } else {
-            // as an invariant, we can assume that `self.cap < isize::MAX`,
-            // so this doesn't need to be checked.
-            let new_cap = self.cap * 2;
-            // Similarly this can't overflow due to previously allocating this
-            let old_num_bytes = self.cap * elem_size;
-
-            // check that the new allocation doesn't exceed `isize::MAX` at all
-            // regardless of the actual size of the capacity. This combines the
-            // `new_cap <= isize::MAX` and `new_num_bytes <= usize::MAX` checks
-            // we need to make. We lose the ability to allocate e.g. 2/3rds of
-            // the address space with a single Vec of i16's on 32-bit though.
-            // Alas, poor Yorick -- I knew him, Horatio.
-            assert!(old_num_bytes <= (::std::isize::MAX as usize) / 2,
-                    "capacity overflow");
-
-            let new_num_bytes = old_num_bytes * 2;
-            let ptr = heap::reallocate(*self.ptr as *mut _,
-                                        old_num_bytes,
-                                        new_num_bytes,
-                                        align);
-            (new_cap, ptr)
-        };
-
-        // If allocate or reallocate fail, we'll get `null` back
-        if ptr.is_null() { oom(); }
-
-        self.ptr = Unique::new(ptr as *mut _);
-        self.cap = new_cap;
-    }
-}
-```
-
-Nothing particularly tricky here. Just computing sizes and alignments and doing
-some careful multiplication checks.
-