|
Use constant eval to do strict mem::uninit/zeroed validity checks
I'm not sure about the code organisation here; I just dumped the check in rustc_const_eval at the root. It's not hard to move it elsewhere, in any case.
Also, this means cranelift codegen intrinsics lose the strict checks, since they don't seem to depend on rustc_const_eval, and I didn't see a point in keeping around two copies.
I also left comments in the is_zero_valid methods about "uhhh help how do i do this"; those apply to both methods equally.
Also rustc_codegen_ssa now depends on rustc_const_eval... is this okay?
Pinging `@RalfJung` since you were the one who mentioned this to me, so I'm assuming you're interested.
Haven't had a chance to run full tests on this since it's really warm, and it's 1AM; I'll check out any failures/comments in the morning :)
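For illustration, a minimal example of the kind of code these stricter checks catch (a sketch; the exact diagnostics are an implementation detail):
```rust
use std::mem;

fn main() {
    // A reference must be non-null, so an all-zero `&i32` is never a valid
    // value; with the stricter validity check this aborts at runtime instead
    // of silently producing an invalid reference.
    let _r: &i32 = unsafe { mem::zeroed() };
}
```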
|
|
|
|
|
|
This is no longer used only for debugging options (e.g. `-Zoutput-width` and `-Zallow-features` are not debugging options).
Rename it to be clearer.
|
|
|
|
|
|
|
|
There are several indications that we should not represent a ZST as a `ScalarInt`:
- We had two ways to have ZST valtrees, either an empty `Branch` or a `Leaf` with a ZST in it.
`ValTree::zst()` used the former, but the latter could possibly arise as well.
- Likewise, the interpreter had `Immediate::Uninit` and `Immediate::Scalar(Scalar::ZST)`.
- LLVM codegen already had to special-case ZST ScalarInt.
So instead add new ZST variants to those types that did not have other variants
which could be used for this purpose.
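A simplified sketch of the first ambiguity, using hypothetical stand-ins for the real `rustc_middle` types:
```rust
// Hypothetical, simplified stand-in for the compiler's valtree type.
enum ValTree {
    Leaf(u128),           // a scalar leaf
    Branch(Vec<ValTree>), // an aggregate of sub-trees
}

fn main() {
    // Before this change, the unit value `()` had two representations,
    // and every consumer had to treat them as equal:
    let _as_branch = ValTree::Branch(vec![]); // what `ValTree::zst()` built
    let _as_leaf = ValTree::Leaf(0);          // a `Leaf` holding a "ZST scalar"
}
```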
|
|
Use less string interning
This removes string interning in a couple of places where interning brings no perf improvement. I also switched one place to use pre-interned symbols.
|
|
|
|
Rollup of 8 pull requests
Successful merges:
- #96856 (Fix ProjectionElem validation)
- #97711 (Improve soundness of rustc_arena)
- #98507 (Finishing touches for `#[expect]` (RFC 2383))
- #98692 (rustdoc: Cleanup more FIXMEs)
- #98901 (incr: cache dwarf objects in work products)
- #98930 (Make MIR basic blocks field public)
- #98973 (Remove (unused) inherent impl anchors)
- #98981 (Edit `rustc_mir_dataflow::framework` documentation)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
|
|
rustc_codegen_ssa: use `project_index`, not `project_field`, for array literals.
See https://github.com/rust-lang/rust/pull/98615#issuecomment-1170082774 for some context.
In short, we were using `project_field` even for array `mir::Rvalue::Aggregate`s, which results in benchmarks like `deep-vector.rs` (and presumably also some real-world use cases?) being impacted by how we handle non-array aggregate fields.
(This is a separate PR so that we can measure the perf effects in isolation)
r? `@nikic`
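For reference, a small example of the affected construct (the array literal below is a `mir::Rvalue::Aggregate`):
```rust
// Each element write into the destination of this array aggregate is now
// projected with `project_index` (i.e. `dest[i]`) instead of `project_field`.
pub fn build() -> [u32; 4] {
    [1, 2, 3, 4]
}
```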
|
|
|
|
Allow arithmetic and certain bitwise ops on AtomicPtr
This is mainly to support migrating from `AtomicUsize`, for the strict provenance experiment.
This is a pretty dubious set of APIs, but it should be sufficient to allow code that's using `AtomicUsize` to manipulate a tagged pointer atomically. It's under a new feature gate, `#![feature(strict_provenance_atomic_ptr)]`, but I'm not sure if it needs its own tracking issue; I'm happy to make one if needed.
I'm unsure whether it needs changes in the various non-LLVM backends; because we just cast things to integers anyway (and were already doing so), I doubt it.
API change proposal: https://github.com/rust-lang/libs-team/issues/60
Fixes #95492
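A minimal sketch of the intended tagged-pointer use, assuming the `fetch_or`/`fetch_and` methods from the API change proposal:
```rust
#![feature(strict_provenance_atomic_ptr)]
use std::sync::atomic::{AtomicPtr, Ordering};

// Set and then clear a low tag bit on an atomic pointer without
// round-tripping through `AtomicUsize`, which would lose provenance.
fn toggle_tag(p: &AtomicPtr<u64>) {
    p.fetch_or(0b1, Ordering::Relaxed);   // set the tag bit
    p.fetch_and(!0b1, Ordering::Relaxed); // clear it again
}

fn main() {
    let mut x = 0u64;
    let p = AtomicPtr::new(&mut x);
    // NB: real code must ensure the pointee's alignment leaves the bit free.
    toggle_tag(&p);
}
```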
|
|
|
|
Change enum->int casts to not go through MIR casts.
follow-up to https://github.com/rust-lang/rust/pull/96814
this simplifies all backends and even gives LLVM more information about the return value of `Rvalue::Discriminant`, enabling optimizations in more cases.
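A source-level sketch of the affected cast, with the conceptual lowering in comments:
```rust
enum E { A, B, C }

fn to_int(e: E) -> u8 {
    // Conceptually this now lowers to two MIR statements,
    //     _2 = discriminant(_1);
    //     _0 = _2 as u8;
    // rather than a single opaque enum-to-int `Rvalue::Cast`.
    e as u8
}

fn main() {
    assert_eq!(to_int(E::A), 0);
    assert_eq!(to_int(E::B), 1);
    assert_eq!(to_int(E::C), 2);
}
```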
|
|
|
|
|
|
Enable MIR inlining
Continuation of https://github.com/rust-lang/rust/pull/82280 by `@wesleywiser`.
#82280 has shown that nice compile-time wins can be obtained by enabling MIR inlining.
Most of the issues in https://github.com/rust-lang/rust/issues/81567 are now fixed,
except the interaction with polymorphization, which is worked around specifically.
I believe we can proceed with enabling MIR inlining in the near future
(preferably just after beta branching, in case we discover new issues).
Steps before merging:
- [x] figure out the interaction with polymorphization;
- [x] figure out how miri should deal with extern types;
- [x] silence the extra arithmetic overflow warnings;
- [x] remove the codegen fulfilment ICE;
- [x] remove the type normalization ICEs while compiling nalgebra;
- [ ] tweak the inlining threshold.
|
|
|
|
This is mainly to support migrating from AtomicUsize, for the strict
provenance experiment.
Fixes #95492
|
|
|
|
r=oli-obk
Added LLVM lifetime annotations to function call argument temporaries.
The goal of this change is to ensure that LLVM will do stack slot
optimization on these temporaries. This ensures that in code like:
```rust
const A: [u8; 1024] = [0; 1024];
fn copy_const() {
f(A);
f(A);
}
```
we only use 1024 bytes of stack space, instead of 2048 bytes.
I am new to developing for the Rust compiler, and as such not entirely sure, but I believe this should be sufficient to close #98156.
Also, this does not contain a test case to ensure this keeps working, primarily because I am not sure how to go about testing this. I would love some suggestions as to how that could be approached.
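One possible approach, sketched below: a FileCheck-style codegen test (like those under `src/test/codegen`) asserting that the lifetime intrinsics appear around the argument temporaries. The directives are illustrative, not a verified test:
```rust
// compile-flags: -O
#[no_mangle]
pub fn copy_const(f: fn([u8; 1024])) {
    const A: [u8; 1024] = [0; 1024];
    // CHECK: call void @llvm.lifetime.start
    f(A);
    // CHECK: call void @llvm.lifetime.end
    f(A);
}
```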
|
|
Instead we generate a discriminant rvalue and cast the result of that.
|
|
|
|
|
|
Simplify memory ordering intrinsics
This changes the names of the atomic intrinsics to always fully include their memory ordering arguments.
```diff
- atomic_cxchg
+ atomic_cxchg_seqcst_seqcst
- atomic_cxchg_acqrel
+ atomic_cxchg_acqrel_release
- atomic_cxchg_acqrel_failrelaxed
+ atomic_cxchg_acqrel_relaxed
// And so on.
```
- `seqcst` is no longer implied
- The failure ordering on cxchg is no longer implied in some cases, but is now always explicitly part of the name.
- `release` is no longer shortened to just `rel`. That was especially confusing, since `relaxed` also starts with `rel`.
- `acquire` is no longer shortened to just `acq`, such that the names now all match the `std::sync::atomic::Ordering` variants exactly.
- This now allows for more combinations on the compare exchange operations, such as `atomic_cxchg_acquire_release`, which is necessary for #68464.
- This PR only exposes the new possibilities through unstable intrinsics, but not yet through the stable API. That's for [a separate PR](https://github.com/rust-lang/rust/pull/98383) that requires an FCP.
Suffixes for operations with a single memory order:
| Order | Before | After |
|---------|--------------|------------|
| Relaxed | `_relaxed` | `_relaxed` |
| Acquire | `_acq` | `_acquire` |
| Release | `_rel` | `_release` |
| AcqRel | `_acqrel` | `_acqrel` |
| SeqCst | (none) | `_seqcst` |
Suffixes for compare-and-exchange operations with two memory orderings:
| Success | Failure | Before | After |
|---------|---------|--------------------------|--------------------|
| Relaxed | Relaxed | `_relaxed` | `_relaxed_relaxed` |
| Relaxed | Acquire | :x: | `_relaxed_acquire` |
| Relaxed | SeqCst | :x: | `_relaxed_seqcst` |
| Acquire | Relaxed | `_acq_failrelaxed` | `_acquire_relaxed` |
| Acquire | Acquire | `_acq` | `_acquire_acquire` |
| Acquire | SeqCst | :x: | `_acquire_seqcst` |
| Release | Relaxed | `_rel` | `_release_relaxed` |
| Release | Acquire | :x: | `_release_acquire` |
| Release | SeqCst | :x: | `_release_seqcst` |
| AcqRel | Relaxed | `_acqrel_failrelaxed` | `_acqrel_relaxed` |
| AcqRel | Acquire | `_acqrel` | `_acqrel_acquire` |
| AcqRel | SeqCst | :x: | `_acqrel_seqcst` |
| SeqCst | Relaxed | `_failrelaxed` | `_seqcst_relaxed` |
| SeqCst | Acquire | `_failacq` | `_seqcst_acquire` |
| SeqCst | SeqCst | (none) | `_seqcst_seqcst` |
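For example, under the new scheme (a sketch using unstable intrinsics; the signature is assumed from the naming tables above):
```rust
#![feature(core_intrinsics)]
use std::intrinsics;

// Compare-exchange with Acquire on success and Relaxed on failure; under the
// old scheme this was `atomic_cxchg_acq_failrelaxed`.
unsafe fn cas_acquire_relaxed(ptr: *mut u32, old: u32, new: u32) -> (u32, bool) {
    intrinsics::atomic_cxchg_acquire_relaxed(ptr, old, new)
}

fn main() {
    let mut v = 1u32;
    let (prev, ok) = unsafe { cas_acquire_relaxed(&mut v, 1, 2) };
    assert!(ok && prev == 1 && v == 2);
}
```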
|
|
|
|
|
|
|
|
The goal of this change is to ensure that LLVM will do stack slot
optimization on these temporaries. This ensures that in code like:
```rust
const A: [u8; 1024] = [0; 1024];
fn copy_const() {
f(A);
f(A);
}
```
we only use 1024 bytes of stack space, instead of 2048 bytes.
|
|
|
|
|
|
Introduce `-Zvirtual-function-elimination` codegen flag
Fixes #68262
This PR adds a codegen flag `-Zvirtual-function-elimination` to enable the VFE optimization in LLVM. To make this work, additional information has to be added to vtables ([`!vcall_visibility` metadata](https://llvm.org/docs/TypeMetadata.html#vcall-visibility-metadata) and a `typeid` of the trait). Furthermore, instead of just `load`ing functions, the [`llvm.type.checked.load` intrinsic](https://llvm.org/docs/LangRef.html#llvm-type-checked-load-intrinsic) has to be used to map functions to vtables.
For technical details of the changes, see the commit messages.
I also tested this flag on https://github.com/tock/tock on different boards to verify that this fixes the issue https://github.com/tock/tock/issues/2594. This flag is able to improve the size of the resulting binary by about 8k-9k bytes by removing the unused debug print functions.
[Rendered documentation update](https://github.com/flip1995/rust/blob/pk-vfe/src/doc/rustc/src/codegen-options/index.md#virtual-function-elimination)
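To illustrate the effect (a sketch; the flag also needs LTO to be effective):
```rust
trait Console {
    fn log(&self);
    fn debug_dump(&self); // reachable only through the vtable, never called
}

struct Uart;

impl Console for Uart {
    fn log(&self) { /* ... */ }
    fn debug_dump(&self) { /* large, normally-unused printing code */ }
}

fn main() {
    let c: &dyn Console = &Uart;
    c.log();
    // `debug_dump` is never invoked through any vtable, so with
    // `-Zvirtual-function-elimination` LLVM can drop its body from the binary.
}
```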
|
|
|
|
Add the intrinsic:
```llvm
declare {i8*, i1} @llvm.type.checked.load(i8* %ptr, i32 %offset, metadata %type)
```
This is used in the VFE optimization when lowering vtable function loads to
LLVM IR. The `metadata` is used to map the function to all vtables it could
belong to. This ensures that functions from vtables that might be used
somewhere won't get removed.
|
|
And likewise for the `Const::val` method. The rename makes sense because the
field's type is called `ConstKind`. Also, `val` is a confusing name because
`ConstKind` is an enum with seven variants, one of which is called `Value`.
Also, this gives consistency with `TyS` and `PredicateS`, which have `kind`
fields.
The commit also renames a few `Const` variables from `val` to `c`, to
avoid confusion with the `ConstKind::Value` variant.
|
|
Co-authored-by: Oli Scherer <github35764891676564198441@oli-obk.de>
|
|
|
|
|
|
Pointer-to-address casts are often special-cased.
Introduce a dedicated cast kind to make them easily distinguishable.
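A minimal sketch of the idea (variant names here are illustrative, not the actual MIR names):
```rust
// Illustrative only: a dedicated variant makes pointer-to-address casts
// matchable directly, instead of re-deriving them from operand/target types.
enum CastKind {
    PointerToAddress, // e.g. `*const T as usize`
    Misc,             // all other casts
}

fn needs_special_handling(kind: &CastKind) -> bool {
    matches!(kind, CastKind::PointerToAddress)
}

fn main() {
    assert!(needs_special_handling(&CastKind::PointerToAddress));
    assert!(!needs_special_handling(&CastKind::Misc));
}
```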
|
|
clone_on_copy
useless_format
bind_instead_of_map
filter_map_identity
useless_conversion
map_flatten
unnecessary_unwrap
|
|
rustc_codegen_ssa: cleanup `AtomicOrdering`
* Remove unused `NotAtomic` ordering.
* Rename `Monotonic` to `Relaxed`, the Rust-specific name.
* Derive `Copy` and `Clone`.
|
|
* Remove unused `NotAtomic` ordering.
* Rename `Monotonic` to `Relaxed`, the Rust-specific name.
|
|
|
|
|
|
|
|
Like we have `add`/`sub`, which are the `usize` versions of `offset`, this adds the `usize` equivalent of `offset_from`. Like how `.add(d)` replaced a whole bunch of `.offset(d as isize)`, you can see from the changes here that it's fairly common that code actually knows the order between the pointers and *wants* a `usize`, not an `isize`.
As a bonus, this can do `sub nuw`+`udiv exact`, rather than `sub`+`sdiv exact`, which can be optimized slightly better because it doesn't have to worry about negatives. That's why the slice iterators weren't using `offset_from`, though I haven't updated that code in this PR because slices are so perf-critical that I'll do it as its own change.
This is an intrinsic, like `offset_from`, so that it can eventually be allowed in CTFE. It also allows checking the extra safety condition -- see the test confirming that CTFE catches it if you pass the pointers in the wrong order.
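A small usage sketch (assuming the method name `sub_ptr` under the `ptr_sub_ptr` feature gate):
```rust
#![feature(ptr_sub_ptr)]

/// # Safety
/// `end` must not be before `start`, and both must point into the same
/// allocation; CTFE rejects a wrong order, as the PR's test demonstrates.
unsafe fn distance<T>(start: *const T, end: *const T) -> usize {
    // The `usize` counterpart of `end.offset_from(start)`.
    end.sub_ptr(start)
}

fn main() {
    let a = [0u32; 8];
    let s = a.as_ptr();
    // SAFETY: both pointers are in-bounds of `a`, and `s.add(3) >= s`.
    assert_eq!(unsafe { distance(s, s.add(3)) }, 3);
}
```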
|
|
The reverse postorder, unlike preorder, is now cached inside the MIR
body. Code generation uses reverse postorder anyway, so it might be
a small perf improvement to use it here as well.
|
|
Reduce duplication of RPO calculation of MIR
Computing the RPO of MIR is not cheap, yet it is duplicated in many places, in particular in the `iterate_to_fixpoint` method, which is called multiple times when computing the data flow.
This PR reduces the number of times the RPO is recalculated as much as possible, which should save some compile time.
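A simplified sketch of the caching idea (hypothetical types; the real cache lives on the MIR body inside the compiler):
```rust
use std::cell::OnceCell;

type BasicBlock = usize;

struct Body {
    // ... basic block graph elided ...
    cached_rpo: OnceCell<Vec<BasicBlock>>,
}

impl Body {
    // Compute the reverse postorder once, then hand out the cached slice.
    fn reverse_postorder(&self) -> &[BasicBlock] {
        self.cached_rpo.get_or_init(|| compute_rpo(self))
    }
}

fn compute_rpo(_body: &Body) -> Vec<BasicBlock> {
    // Stand-in for the actual postorder traversal, reversed.
    Vec::new()
}

fn main() {
    let body = Body { cached_rpo: OnceCell::new() };
    let _rpo = body.reverse_postorder();  // first call computes
    let _rpo2 = body.reverse_postorder(); // subsequent calls reuse the cache
}
```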
|