diff options
| author | Steve Klabnik <steve@steveklabnik.com> | 2017-02-07 19:08:12 -0500 |
|---|---|---|
| committer | Steve Klabnik <steve@steveklabnik.com> | 2017-02-13 13:41:10 -0500 |
| commit | 22d4adf14aabfb0ea0cfc8daff203e8d474cbc52 (patch) | |
| tree | 47e38ca93da0581bf477919da94d422e469d55ee /src/doc/nomicon/vec-alloc.md | |
| parent | e943e68a47dfbdd73d34f3b40e628f3031f90b6a (diff) | |
| download | rust-22d4adf14aabfb0ea0cfc8daff203e8d474cbc52.tar.gz rust-22d4adf14aabfb0ea0cfc8daff203e8d474cbc52.zip | |
Port Nomicon to mdbook
1. move everything under a src directory 2. add README.md to the SUMMARY.md
Diffstat (limited to 'src/doc/nomicon/vec-alloc.md')
| -rw-r--r-- | src/doc/nomicon/vec-alloc.md | 223 |
1 files changed, 0 insertions, 223 deletions
diff --git a/src/doc/nomicon/vec-alloc.md b/src/doc/nomicon/vec-alloc.md deleted file mode 100644 index bc60a577bd3..00000000000 --- a/src/doc/nomicon/vec-alloc.md +++ /dev/null @@ -1,223 +0,0 @@ -% Allocating Memory - -Using Unique throws a wrench in an important feature of Vec (and indeed all of -the std collections): an empty Vec doesn't actually allocate at all. So if we -can't allocate, but also can't put a null pointer in `ptr`, what do we do in -`Vec::new`? Well, we just put some other garbage in there! - -This is perfectly fine because we already have `cap == 0` as our sentinel for no -allocation. We don't even need to handle it specially in almost any code because -we usually need to check if `cap > len` or `len > 0` anyway. The traditional -Rust value to put here is `0x01`. The standard library actually exposes this -as `alloc::heap::EMPTY`. There are quite a few places where we'll -want to use `heap::EMPTY` because there's no real allocation to talk about but -`null` would make the compiler do bad things. - -All of the `heap` API is totally unstable under the `heap_api` feature, though. -We could trivially define `heap::EMPTY` ourselves, but we'll want the rest of -the `heap` API anyway, so let's just get that dependency over with. - -So: - -```rust,ignore -#![feature(alloc, heap_api)] - -use std::mem; - -use alloc::heap::EMPTY; - -impl<T> Vec<T> { - fn new() -> Self { - assert!(mem::size_of::<T>() != 0, "We're not ready to handle ZSTs"); - unsafe { - // need to cast EMPTY to the actual ptr type we want, let - // inference handle it. - Vec { ptr: Unique::new(heap::EMPTY as *mut _), len: 0, cap: 0 } - } - } -} -``` - -I slipped in that assert there because zero-sized types will require some -special handling throughout our code, and I want to defer the issue for now. -Without this assert, some of our early drafts will do some Very Bad Things. - -Next we need to figure out what to actually do when we *do* want space. For -that, we'll need to use the rest of the heap APIs. These basically allow us to -talk directly to Rust's allocator (jemalloc by default). - -We'll also need a way to handle out-of-memory (OOM) conditions. The standard -library calls the `abort` intrinsic, which just calls an illegal instruction to -crash the whole program. The reason we abort and don't panic is because -unwinding can cause allocations to happen, and that seems like a bad thing to do -when your allocator just came back with "hey I don't have any more memory". - -Of course, this is a bit silly since most platforms don't actually run out of -memory in a conventional way. Your operating system will probably kill the -application by another means if you legitimately start using up all the memory. -The most likely way we'll trigger OOM is by just asking for ludicrous quantities -of memory at once (e.g. half the theoretical address space). As such it's -*probably* fine to panic and nothing bad will happen. Still, we're trying to be -like the standard library as much as possible, so we'll just kill the whole -program. - -We said we don't want to use intrinsics, so doing exactly what `std` does is -out. Instead, we'll call `std::process::exit` with some random number. - -```rust -fn oom() { - ::std::process::exit(-9999); -} -``` - -Okay, now we can write growing. Roughly, we want to have this logic: - -```text -if cap == 0: - allocate() - cap = 1 -else: - reallocate() - cap *= 2 -``` - -But Rust's only supported allocator API is so low level that we'll need to do a -fair bit of extra work. We also need to guard against some special -conditions that can occur with really large allocations or empty allocations. - -In particular, `ptr::offset` will cause us a lot of trouble, because it has -the semantics of LLVM's GEP inbounds instruction. If you're fortunate enough to -not have dealt with this instruction, here's the basic story with GEP: alias -analysis, alias analysis, alias analysis. It's super important to an optimizing -compiler to be able to reason about data dependencies and aliasing. - -As a simple example, consider the following fragment of code: - -```rust -# let x = &mut 0; -# let y = &mut 0; -*x *= 7; -*y *= 3; -``` - -If the compiler can prove that `x` and `y` point to different locations in -memory, the two operations can in theory be executed in parallel (by e.g. -loading them into different registers and working on them independently). -However the compiler can't do this in general because if x and y point to -the same location in memory, the operations need to be done to the same value, -and they can't just be merged afterwards. - -When you use GEP inbounds, you are specifically telling LLVM that the offsets -you're about to do are within the bounds of a single "allocated" entity. The -ultimate payoff being that LLVM can assume that if two pointers are known to -point to two disjoint objects, all the offsets of those pointers are *also* -known to not alias (because you won't just end up in some random place in -memory). LLVM is heavily optimized to work with GEP offsets, and inbounds -offsets are the best of all, so it's important that we use them as much as -possible. - -So that's what GEP's about, how can it cause us trouble? - -The first problem is that we index into arrays with unsigned integers, but -GEP (and as a consequence `ptr::offset`) takes a signed integer. This means -that half of the seemingly valid indices into an array will overflow GEP and -actually go in the wrong direction! As such we must limit all allocations to -`isize::MAX` elements. This actually means we only need to worry about -byte-sized objects, because e.g. `> isize::MAX` `u16`s will truly exhaust all of -the system's memory. However in order to avoid subtle corner cases where someone -reinterprets some array of `< isize::MAX` objects as bytes, std limits all -allocations to `isize::MAX` bytes. - -On all 64-bit targets that Rust currently supports we're artificially limited -to significantly less than all 64 bits of the address space (modern x64 -platforms only expose 48-bit addressing), so we can rely on just running out of -memory first. However on 32-bit targets, particularly those with extensions to -use more of the address space (PAE x86 or x32), it's theoretically possible to -successfully allocate more than `isize::MAX` bytes of memory. - -However since this is a tutorial, we're not going to be particularly optimal -here, and just unconditionally check, rather than use clever platform-specific -`cfg`s. - -The other corner-case we need to worry about is empty allocations. There will -be two kinds of empty allocations we need to worry about: `cap = 0` for all T, -and `cap > 0` for zero-sized types. - -These cases are tricky because they come -down to what LLVM means by "allocated". LLVM's notion of an -allocation is significantly more abstract than how we usually use it. Because -LLVM needs to work with different languages' semantics and custom allocators, -it can't really intimately understand allocation. Instead, the main idea behind -allocation is "doesn't overlap with other stuff". That is, heap allocations, -stack allocations, and globals don't randomly overlap. Yep, it's about alias -analysis. As such, Rust can technically play a bit fast and loose with the notion of -an allocation as long as it's *consistent*. - -Getting back to the empty allocation case, there are a couple of places where -we want to offset by 0 as a consequence of generic code. The question is then: -is it consistent to do so? For zero-sized types, we have concluded that it is -indeed consistent to do a GEP inbounds offset by an arbitrary number of -elements. This is a runtime no-op because every element takes up no space, -and it's fine to pretend that there's infinite zero-sized types allocated -at `0x01`. No allocator will ever allocate that address, because they won't -allocate `0x00` and they generally allocate to some minimal alignment higher -than a byte. Also generally the whole first page of memory is -protected from being allocated anyway (a whole 4k, on many platforms). - -However what about for positive-sized types? That one's a bit trickier. In -principle, you can argue that offsetting by 0 gives LLVM no information: either -there's an element before the address or after it, but it can't know which. -However we've chosen to conservatively assume that it may do bad things. As -such we will guard against this case explicitly. - -*Phew* - -Ok with all the nonsense out of the way, let's actually allocate some memory: - -```rust,ignore -fn grow(&mut self) { - // this is all pretty delicate, so let's say it's all unsafe - unsafe { - // current API requires us to specify size and alignment manually. - let align = mem::align_of::<T>(); - let elem_size = mem::size_of::<T>(); - - let (new_cap, ptr) = if self.cap == 0 { - let ptr = heap::allocate(elem_size, align); - (1, ptr) - } else { - // as an invariant, we can assume that `self.cap < isize::MAX`, - // so this doesn't need to be checked. - let new_cap = self.cap * 2; - // Similarly this can't overflow due to previously allocating this - let old_num_bytes = self.cap * elem_size; - - // check that the new allocation doesn't exceed `isize::MAX` at all - // regardless of the actual size of the capacity. This combines the - // `new_cap <= isize::MAX` and `new_num_bytes <= usize::MAX` checks - // we need to make. We lose the ability to allocate e.g. 2/3rds of - // the address space with a single Vec of i16's on 32-bit though. - // Alas, poor Yorick -- I knew him, Horatio. - assert!(old_num_bytes <= (::std::isize::MAX as usize) / 2, - "capacity overflow"); - - let new_num_bytes = old_num_bytes * 2; - let ptr = heap::reallocate(*self.ptr as *mut _, - old_num_bytes, - new_num_bytes, - align); - (new_cap, ptr) - }; - - // If allocate or reallocate fail, we'll get `null` back - if ptr.is_null() { oom(); } - - self.ptr = Unique::new(ptr as *mut _); - self.cap = new_cap; - } -} -``` - -Nothing particularly tricky here. Just computing sizes and alignments and doing -some careful multiplication checks. - |
