mem::swap the obvious way for types smaller than the SIMD optimization's block size
LLVM isn't able to remove the alloca for the unaligned block in the post-SIMD tail in some cases, so doing this helps SRoA work in cases where it currently doesn't. Found in the `replace_with` RFC discussion.
Examples of the improvements:
<details>
<summary>swapping `[u16; 3]` takes 1/3 fewer instructions and no stackalloc</summary>
```rust
type Demo = [u16; 3];
pub fn swap_demo(x: &mut Demo, y: &mut Demo) {
std::mem::swap(x, y);
}
```
nightly:
```asm
_ZN4blah9swap_demo17ha1732a9b71393a7eE:
.seh_proc _ZN4blah9swap_demo17ha1732a9b71393a7eE
sub rsp, 32
.seh_stackalloc 32
.seh_endprologue
movzx eax, word ptr [rcx + 4]
mov word ptr [rsp + 4], ax
mov eax, dword ptr [rcx]
mov dword ptr [rsp], eax
movzx eax, word ptr [rdx + 4]
mov word ptr [rcx + 4], ax
mov eax, dword ptr [rdx]
mov dword ptr [rcx], eax
movzx eax, word ptr [rsp + 4]
mov word ptr [rdx + 4], ax
mov eax, dword ptr [rsp]
mov dword ptr [rdx], eax
add rsp, 32
ret
.seh_handlerdata
.section .text,"xr",one_only,_ZN4blah9swap_demo17ha1732a9b71393a7eE
.seh_endproc
```
this PR:
```asm
_ZN4blah9swap_demo17ha1732a9b71393a7eE:
mov r8d, dword ptr [rcx]
movzx r9d, word ptr [rcx + 4]
movzx eax, word ptr [rdx + 4]
mov word ptr [rcx + 4], ax
mov eax, dword ptr [rdx]
mov dword ptr [rcx], eax
mov word ptr [rdx + 4], r9w
mov dword ptr [rdx], r8d
ret
```
</details>
<details>
<summary>`replace_with` optimizes down much better</summary>
Inspired by https://github.com/rust-lang/rfcs/pull/2490,
```rust
fn replace_with<T, F>(x: &mut Option<T>, f: F)
where F: FnOnce(Option<T>) -> Option<T>
{
*x = f(x.take());
}
pub fn inc_opt(mut x: &mut Option<i32>) {
replace_with(&mut x, |i| i.map(|j| j + 1));
}
```
Rust 1.26.0:
```asm
_ZN4blah7inc_opt17heb0acb64c51777cfE:
mov rax, qword ptr [rcx]
movabs r8, 4294967296
add r8, rax
shl rax, 32
movabs rdx, -4294967296
and rdx, r8
xor r8d, r8d
test rax, rax
cmove rdx, rax
setne r8b
or rdx, r8
mov qword ptr [rcx], rdx
ret
```
Nightly (better thanks to ScalarPair, maybe?):
```asm
_ZN4blah7inc_opt17h66df690be0b5899dE:
mov r8, qword ptr [rcx]
mov rdx, r8
shr rdx, 32
xor eax, eax
test r8d, r8d
setne al
add edx, 1
mov dword ptr [rcx], eax
mov dword ptr [rcx + 4], edx
ret
```
This PR:
```asm
_ZN4blah7inc_opt17h1426dc215ecbdb19E:
xor eax, eax
cmp dword ptr [rcx], 0
setne al
mov dword ptr [rcx], eax
add dword ptr [rcx + 4], 1
ret
```
Where that add is beautiful -- using an addressing mode to not even need to explicitly go through a register -- and the remaining imperfection is well-known (https://github.com/rust-lang/rust/pull/49420#issuecomment-376805721).
</details>

---

The documentation of Unique::empty() and NonNull::dangling() could
suggest that they work as sentinel values indicating a
not-yet-initialized pointer. However, they both return a non-null
pointer equal to the alignment of the type, which could potentially
reference a valid value of that type (specifically, the first such valid
value in memory). Explicitly document that the return value of these
functions does not work as a sentinel value.

---

Add #[repr(transparent)] to some libcore types
* `UnsafeCell`
* `Cell`
* `NonZero*`
* `NonNull`
* `Unique`
CC https://github.com/rust-lang/rust/issues/43036

---

Fixes https://github.com/rust-lang/rust/issues/50657.

---

Implement [T]::align_to
Note that this PR deviates slightly from what was accepted in the RFC by making `align_offset` return an offset in elements rather than in bytes. This is necessary to sanely support `[T]::align_to` and also simply makes more sense™. The caveat is that trying to align a pointer to a ZST is now equivalent to an `is_aligned` check, rather than anything else (as no number of ZST elements will align a misaligned ZST pointer).
It also implements `align_to` slightly differently than proposed in the RFC, in order to properly handle cases where the sizes of `T` and `U` aren't co-prime.
Furthermore, a promise is made that the slice containing `U`s will be as large as possible (contrary to the RFC) – otherwise the function is quite useless.
The implementation uses quite a few underhanded tricks and leans heavily on the fact that alignment is a power of two to optimise the machine code down to as few known-expensive instructions as possible. Currently, calling `ptr.align_offset` with an `align` unknown at compile time results in code with just a single "expensive" modulo operation; the rest is "cheap" arithmetic and bitwise ops.
cc https://github.com/rust-lang/rust/issues/44488 @oli-obk
As mentioned in the commit message for align_offset, many thanks go to Chris McDonald.

---

Keep only the language item. This removes some indirection and makes
codegen worse for debug builds, but simplifies code significantly, which
is a good tradeoff to make, in my opinion.
Besides, the codegen can be improved even further with some constant
evaluation improvements that we expect to happen in the future.

---

This is necessary if we want to implement `[T]::align_to` and is more
useful in general.
This implementation effort has begun during the All Hands and represents
a month of my futile efforts to do any sort of maths. Luckily, I
found the very very nice Chris McDonald (cjm) on IRC who figured out the
core formulas for me! All the thanks for existence of this PR go to
them!
Anyway… Those formulas were mangled by yours truly into the arcane forms
you see here to squeeze out the best assembly possible on most of the
modern architectures (x86 and ARM were evaluated in practice). I mean,
just look at it: *one actual* modulo operation and everything else is
just the cheap single-cycle ops! Admittedly, the naive solution might be
faster in some common scenarios, but this code absolutely butchers the
naive solution in the worst-case scenario.
Alas, the result of this arcane magic also means that the code pretty
heavily relies on the preconditions holding true and breaking those
preconditions will unleash the UB-est of all UBs! So don’t.

---

There was [some confusion](https://github.com/rust-lang/rust/pull/49767#issuecomment-389250815) and I accidentally merged a PR that wasn't ready.

---

It is now an implementation detail of ptr::NonNull and num::NonZero*

---

Rewrite docs for `std::ptr`
This PR attempts to resolve #29371.
This is a fairly major rewrite of the `std::ptr` docs, and deserves a fair bit of scrutiny. It adds links to the GNU libc docs for various instrinsics, adds internal links to types and functions referenced in the docs, adds new, more complex examples for many functions, and introduces a common template for discussing unsafety of functions in `std::ptr`.
All functions in `std::ptr` (with the exception of `ptr::eq`) are unsafe because they either read from or write to a raw pointer. The "Safety" section now explains that each function is unsafe because it dereferences a raw pointer, and requires that any pointer to be read by the function point to "a valid value of type `T`".
Additionally, each function imposes some subset of the following conditions on its arguments.
* The pointer points to valid memory.
* The pointer points to initialized memory.
* The pointer is properly aligned.
These requirements are discussed in the "Undefined Behavior" section along with the consequences of using functions that perform bitwise copies without requiring `T: Copy`. I don't love my new descriptions of the consequences of making such copies. Perhaps the old ones were good enough?
Some issues which still need to be addressed before this is merged:
- [ ] The new docs assert that `drop_in_place` is equivalent to calling `read` and discarding the value. Is this correct?
- [ ] Do `write_bytes` and `swap_nonoverlapping` require properly aligned pointers?
- [ ] The new example for `drop_in_place` is lackluster.
- [ ] Should these docs rigorously define what `valid` memory is? Or is that the job of the reference? Should we link to the reference?
- [ ] Is it correct to require that pointers that will be read from refer to "valid values of type `T`"?
- [x] I can't imagine ever using `{read,write}_volatile` with non-`Copy` types. Should I just link to {read,write} and say that the same issues with non-`Copy` types apply?
- [x] `write_volatile` doesn't link back to `read_volatile`.
- [ ] Update docs for the unstable [`swap_nonoverlapping`](https://github.com/rust-lang/rust/issues/42818)
- [ ] Update docs for the unstable [unsafe pointer methods RFC](https://github.com/rust-lang/rfcs/pull/1966)
Looking forward to your feedback.
r? @steveklabnik

---

Non-`Copy` types should not be in volatile memory.

---

Make `Vec::new` a `const fn`
`RawVec::empty/empty_in` are a hack. They're there because `if size_of::<T>() == 0 { !0 } else { 0 }` is not allowed in `const fn` yet. However, because `RawVec` is unstable, the `empty/empty_in` constructors can be removed once #49146 is done...

---

std: Mark `ptr::Unique` with `#[doc(hidden)]`
`Unique` is now perma-unstable, so let's hide its docs.

---

Fixes #49608

---

- Remove redundant "unsafe" from module description.
- Add a missing `Safety` heading to `read_unaligned`.
- Remove weasel words in `Undefined Behavior` description for
`write{,_unaligned,_bytes}`.

---

The rendered version does not make clear that this is a link to another
page, and it breaks the anchor link.

---

- Add links to the GNU libc docs for `memmove`, `memcpy`, and
`memset`, as well as internally linking to other functions in `std::ptr`
- List sources of UB for all functions.
- Add example to `ptr::drop_in_place` and compares it to `ptr::read`.
- Add examples which more closely mirror real world uses for the
functions in `std::ptr`. Also, move the reimplementation of `mem::swap`
to the examples of `ptr::read` and use a more interesting example for
`copy_nonoverlapping`.
- Change module level description

---

Bonus: might make code that uses `.len()` on slice iterators faster

---

Seems more useful to say that it has the same size as `*mut T`.

---

Introduce unsafe offset_from on pointers
Adds `intrinsics::exact_div` to take advantage of the unsafe precondition, which reduces the implementation from
```asm
sub rcx, rdx
mov rax, rcx
sar rax, 63
shr rax, 62
lea rax, [rax + rcx]
sar rax, 2
ret
```
down to
```asm
sub rcx, rdx
sar rcx, 2
mov rax, rcx
ret
```
(for `*const i32`)
See discussion on the `offset_to` tracking issue https://github.com/rust-lang/rust/issues/41079
Some open questions
- Would you rather I split the intrinsic PR from the library PR?
- Do we even want the safe version of the API? https://github.com/rust-lang/rust/issues/41079#issuecomment-374426786 I've added some text to its documentation that even if it's not UB, it's useless to use it between pointers into different objects.
and todos
- [x] ~~I need to make a codegen test~~ Done
- [x] ~~Can the subtraction use nsw/nuw?~~ No, it can't https://github.com/rust-lang/rust/pull/49297#discussion_r176697574
- [x] ~~Should there be `usize` variants of this, like there are now `add` and `sub` that you almost always want over `offset`? For example, I imagine `sub_ptr` that returns `usize` and where it's UB if the distance is negative.~~ Can wait for later; C gives a signed result https://github.com/rust-lang/rust/issues/41079#issuecomment-375842235, so we might as well, and this existing to go with `offset` makes sense.

---

It has been deprecated for about one release cycle.

---

These will eventually be removed
(though the NonZero<T> lang item will likely stay).

---

also minor doc fixes.
Closes #43941

---

This is less verbose than going through raw pointers to cast with `as`.