| Age | Commit message (Collapse) | Author | Lines |
|
Expose algebraic floating point intrinsics
# Problem
A stable Rust implementation of a simple dot product is 8x slower than C++ on modern x86-64 CPUs. The root cause is an inability to let the compiler reorder floating point operations for better vectorization.
See https://github.com/calder/dot-bench for benchmarks. Measurements below were performed on a i7-10875H.
### C++: 10us ✅
With Clang 18.1.3 and `-O2 -march=haswell`:
<table>
<tr>
<th>C++</th>
<th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="cc">
float dot(float *a, float *b, size_t len) {
#pragma clang fp reassociate(on)
float sum = 0.0;
for (size_t i = 0; i < len; ++i) {
sum += a[i] * b[i];
}
return sum;
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/739573c0-380a-4d84-9fd9-141343ce7e68" />
</td>
</tr>
</table>
### Nightly Rust: 10us ✅
With rustc 1.86.0-nightly (8239a37f9) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
<th>Rust</th>
<th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
let mut sum = 0.0;
for i in 0..a.len() {
sum = fadd_algebraic(sum, fmul_algebraic(a[i], b[i]));
}
sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/9dcf953a-2cd7-42f3-bc34-7117de4c5fb9" />
</td>
</tr>
</table>
### Stable Rust: 84us ❌
With rustc 1.84.1 (e71f9a9a9) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
<th>Rust</th>
<th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
let mut sum = 0.0;
for i in 0..a.len() {
sum += a[i] * b[i];
}
sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/936a1f7e-33e4-4ff8-a732-c3cdfe068dca" />
</td>
</tr>
</table>
# Proposed Change
Add `core::intrinsics::f*_algebraic` wrappers to `f16`, `f32`, `f64`, and `f128` gated on a new `float_algebraic` feature.
# Alternatives Considered
https://github.com/rust-lang/rust/issues/21690 has a lot of good discussion of various options for supporting fast math in Rust, but is still open a decade later because any choice that opts in more than individual operations is ultimately contrary to Rust's design principles.
In the mean time, processors have evolved and we're leaving major performance on the table by not supporting vectorization. We shouldn't make users choose between an unstable compiler and an 8x performance hit.
# References
* https://github.com/rust-lang/rust/issues/21690
* https://github.com/rust-lang/libs-team/issues/532
* https://github.com/rust-lang/rust/issues/136469
* https://github.com/calder/dot-bench
* https://www.felixcloutier.com/x86/vfmadd132ps:vfmadd213ps:vfmadd231ps
try-job: x86_64-gnu-nopt
try-job: x86_64-gnu-aux
|
|
|
|
|
|
|
|
|
|
Rustup
|
|
|
|
|
|
|
|
|
|
|
|
|
|
r=wesleywiser,jieyouxu
Mangle rustc_std_internal_symbols functions
This reduces the risk of issues when using a staticlib or rust dylib compiled with a different rustc version in a rust program. Currently this will either (in the case of staticlib) cause a linker error due to duplicate symbol definitions, or (in the case of rust dylibs) cause rustc_std_internal_symbols functions to be silently overridden. As rust gets more commonly used inside the implementation of libraries consumed with a C interface (like Spidermonkey, Ruby YJIT (curently has to do partial linking of all rust code to hide all symbols not part of the C api), the Rusticl OpenCL implementation in mesa) this is becoming much more of an issue. With this PR the only symbols remaining with an unmangled name are rust_eh_personality (LLVM doesn't allow renaming it) and `__rust_no_alloc_shim_is_unstable`.
Helps mitigate https://github.com/rust-lang/rust/issues/104707
try-job: aarch64-gnu-debug
try-job: aarch64-apple
try-job: x86_64-apple-1
try-job: x86_64-mingw-1
try-job: i686-mingw-1
try-job: x86_64-msvc-1
try-job: i686-msvc-1
try-job: test-various
try-job: armhf-gnu
|
|
Shourya742:2025-02-15-change-config.toml-to-bootstrap.toml, r=onur-ozkan,jieyouxu,kobzol
change config.toml to bootstrap.toml
Currently, both Bootstrap and Cargo uses same name as their configuration file, which can be confusing. This PR is based on a discussion to rename `config.toml` to `bootstrap.toml` for Bootstrap. Closes: https://github.com/rust-lang/rust/issues/126875.
I have split the PR into atomic commits to make it easier to review. Once the changes are finalized, I will squash them. I am particularly concerned about the changes made to modules that are not part of Bootstrap. How should we handle those changes? Should we ping the respective maintainers?
|
|
|
|
Stablize anonymous pipe
Since #135822 is staled, I create this PR to stablise anonymous pipe
Closes #127154
try-job: test-various
|
|
|
|
|
|
|
|
|
|
|
|
For macros that are implemented on the compiler, we do *not* mention the `-Zmacro-backtrace` flag. This includes `derive`s and standard macros.
|
|
|
|
|
|
Allow more top-down inlining for single-BB callees
This means that things like `<usize as Step>::forward_unchecked` and `<PartialOrd for f32>::le` will inline even if
we've already done a bunch of inlining to find the calls to them.
Fixes #138136
~~Draft as it's built atop #138135, which adds a mir-opt test that's a nice demonstration of this. To see just this change, look at <https://github.com/rust-lang/rust/pull/138157/commits/48f63e3be552605c2933056b77bf23a326757f92>~~ Rebased to be just the inlining change, as the other existing tests show it great.
|
|
minor interpreter cleanups
- remove the `eval_inline_asm` hook that `@saethlin` added; the usage never materialized and he agreed with removing it
- I tried merging `init_alloc_extra` and `adjust_global_allocation` and it didn't work; leave a comment as to why. Also, make the allocation code path a bit more clear by renaming `init_alloc_extra` to `init_local_allocation`.
r? `@oli-obk`
|
|
Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>
|
|
This means that things like `<usize as Step>::forward_unchecked` and `<PartialOrd for f32>::le` will inline even if we've already done a bunch of inlining to find the calls to them.
|
|
FnAbi Compatability check
|
|
added input arg mismatch test
added detailed ub messages
added return type mismatch test
|
|
allocations
|
|
|
|
|
|
memory FFI can access
|
|
|
|
|
|
|
|
|
|
Show interpreter backtrace error on Ctrl+C
|
|
See https://github.com/rust-lang/rust/pull/111769
|
|
Automatic Rustup
|
|
|
|
|
|
Update documentation about nextest
|
|
Fix tier 2 sysroots job
|
|
|
|
|
|
miri native-call support: all previously exposed provenance is accessible to the callee
When Miri invokes a native C function, the memory C can access needs to be "prepared": to avoid false positives, we need to consider all that memory initialized, and we need to consider it to have arbitrary provenance. So far we did this for all pointers passed to C, but not for pointers that were exposed already before the native call. This PR adjusts the logic so that we now "prepare" all memory that has ever been exposed.
This fixes cases such as:
- cast a pointer to integer, send that integer to C, and access the memory there (`test_pass_ptr_as_int`)
- send a pointer to some memory to C, which stores it somewhere; then in Rust store another pointer in that memory, and access that via C (`test_pass_ptr_via_previously_shared_mem`)
r? `````@oli-obk`````
|
|
|
|
|