| Age | Commit message | Author | Lines |
|
r=lolbinarycat,GuillaumeGomez
rustdoc-search: search backend with partitioned suffix tree
Before:
- https://notriddle.com/windows-docs-rs/doc-old/windows/
- https://doc.rust-lang.org/1.89.0/std/index.html
- https://doc.rust-lang.org/1.89.0/nightly-rustc/rustc_hir/index.html
After:
- https://notriddle.com/windows-docs-rs/doc/windows/
- https://notriddle.com/rustdoc-html-demo-12/stringdex/doc/std/index.html
- https://notriddle.com/rustdoc-html-demo-12/stringdex/compiler-doc/rustc_hir/index.html
## Summary
Rewrites the rustdoc search engine to use an indexed data structure, factored out as a crate called [stringdex](https://crates.io/crates/stringdex), that allows it to perform modified-Levenshtein distance calculations, substring matches, and prefix matches using a reasonably efficient and, more importantly, *incremental* algorithm.
## Motivation
Fixes https://github.com/rust-lang/rust/issues/131156
While the windows-rs crate is definitely the worst offender, I've noticed performance problems with the compiler crates as well. It makes no sense for rustdoc-search to have poor performance: it's basically a spell checker, and those have been usable since the 90's.
Stringdex is particularly designed to quickly return exact matches, to always report those matches first, and to never, ever [place new matches on top of old ones](https://web.dev/articles/cls). It also tries to yield to the event loop occasionally as it runs. This way, you can click the exactly-matched result before the rest of the search finishes running.
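The ordering guarantee described above can be illustrated with a minimal sketch. This is not stringdex's implementation: the `Tier` enum, `classify`, and `search` are hypothetical names, the input is a flat list rather than a suffix tree, and fuzzy matching is left out. It only shows the append-only idea: results are emitted tier by tier, so nothing already shown is pushed down.

```rust
/// Match tiers, from most to least relevant (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tier {
    Exact,
    Prefix,
    Substring,
}

/// Classifies `name` against `query`, or returns `None` if it does not match.
fn classify(query: &str, name: &str) -> Option<Tier> {
    if name == query {
        Some(Tier::Exact)
    } else if name.starts_with(query) {
        Some(Tier::Prefix)
    } else if name.contains(query) {
        Some(Tier::Substring)
    } else {
        None
    }
}

/// Appends results tier by tier. Every pass only pushes to the end of
/// `shown`, so a result that is already displayed never moves.
fn search<'a>(query: &str, names: &[&'a str]) -> Vec<&'a str> {
    let mut shown = Vec::new();
    for tier in [Tier::Exact, Tier::Prefix, Tier::Substring] {
        for &name in names {
            if classify(query, name) == Some(tier) {
                shown.push(name);
            }
        }
        // An incremental implementation could yield to the event loop here.
    }
    shown
}
```

Calling `search("open", &["reopen", "open_at", "open", "close"])` lists the exact match first, then the prefix match, then the substring match.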
## Explanation
A longer description of how name search works can be found in stringdex's [HACKING.md](https://gitlab.com/notriddle/stringdex/-/blob/main/HACKING.md) file.
Type search is done by performing a name search on each element, then performing bitmap operations to narrow down a list of potential matches, then performing type unification on each match.
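As a rough sketch of the first two steps, assume each type name maps to a bitmap of the function ids that mention it. `TypeIndex`, `lookup`, and `candidates` are hypothetical names, the bitmap is a plain `u64` rather than a compressed structure, and name lookup is exact-match only, so this is an illustration rather than the PR's code.

```rust
/// A tiny bitmap over at most 64 function ids.
/// (Assumption: a real index would use a larger, compressed bitmap.)
type Bitmap = u64;

/// Hypothetical posting list: which functions mention a given type name.
struct TypeIndex<'a> {
    postings: Vec<(&'a str, Bitmap)>,
}

impl<'a> TypeIndex<'a> {
    /// Step 1: name search on one query element (exact match only here).
    fn lookup(&self, type_name: &str) -> Bitmap {
        self.postings
            .iter()
            .filter(|(name, _)| *name == type_name)
            .fold(0, |acc, (_, bits)| acc | *bits)
    }

    /// Step 2: intersect the bitmaps of all query elements.
    /// Step 3 (not shown) would run type unification on each candidate.
    fn candidates(&self, query: &[&str]) -> Vec<u32> {
        let bits = query
            .iter()
            .fold(Bitmap::MAX, |acc, elem| acc & self.lookup(elem));
        (0..64u32).filter(|id| (bits & (1u64 << *id)) != 0).collect()
    }
}
```

The bitmap intersection is cheap, so the expensive unification step only has to run on functions that mention every element of the query.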
## Drawbacks
It's rather complex, and takes up more disk space than the current flat list of strings.
## Rationale and alternatives
Instead of a suffix tree, I could've used a different [approximate matching data structure](https://en.wikipedia.org/wiki/Approximate_string_matching). I didn't do that because I wanted to keep the current behavior (for example, a straightforward trigram index won't match [oepn](https://doc.rust-lang.org/nightly/std/?search=oepn) like the current system does).
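To make the `oepn` example concrete, the sketch below compares the trigram sets of `open` and `oepn` and computes their edit distance. It uses plain Levenshtein distance as a stand-in (the modified variant used by rustdoc may score the transposition differently), and both function names are chosen for illustration.

```rust
use std::collections::HashSet;

/// All 3-byte windows of `s` (ASCII input assumed for simplicity).
fn trigrams(s: &str) -> HashSet<&str> {
    (0..s.len().saturating_sub(2)).map(|i| &s[i..i + 3]).collect()
}

/// Classic Levenshtein distance using a single rolling row.
fn levenshtein(a: &str, b: &str) -> usize {
    let b: Vec<char> = b.chars().collect();
    let mut row: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        // `prev_diag` holds the value diagonally up-left of the current cell.
        let mut prev_diag = row[0];
        row[0] = i + 1;
        for (j, &cb) in b.iter().enumerate() {
            let substitute = prev_diag + usize::from(ca != cb);
            prev_diag = row[j + 1];
            row[j + 1] = substitute.min(row[j] + 1).min(prev_diag + 1);
        }
    }
    row[b.len()]
}
```

`open` yields the trigrams `ope` and `pen`, while `oepn` yields `oep` and `epn`. The sets share nothing, so a straightforward trigram lookup finds no candidate, even though the two strings are only two single-character edits apart.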
## Prior art
[Sherlodoc](https://github.com/art-w/sherlodoc) is based on a similar concept, but they:
- use edit distance over a suffix tree for type-based search, instead of the binary matching that's implemented here
- use substring matching for name-based search, [but not fuzzy name matching](https://github.com/art-w/sherlodoc/issues/21)
- actually implement body text search, which is a potential-future feature, but not implemented in this PR
## Future possibilities
### Low-level optimization in stringdex
There are half a dozen low-level optimizations that I still need to explore. I haven't done them yet, because I've been working on bug fixes and rebasing on rustdoc's side, and a more solid and diverse test suite for stringdex itself.
- Stringdex decides whether to bundle two nodes into the same file based on size. To figure out a node's size, I have to run compression on it. This is probably slower than it needs to be.
- Stack compression is limited to the same 256-slot sliding windows as backref compression, and it doesn't have to be. (stack and backref compression are used to optimize the representation of the edge pointer from a parent node to its child; backref uses one byte, while stack is entirely implicit)
- The JS-side decoder is pretty naive. It performs unnecessary hash table lookups when decoding compressed nodes, and retains a list of hashes that it doesn't need. It needs to calculate the hashes in order to construct the merkle tree correctly, but it doesn't need to keep them.
- Data compression happens at the end, while emitting the node. This means it's not being counted when deciding on how to bundle, which is pretty dumb.
### Improved recall in type-driven search
Right now, type-driven search performs very strict matching. It's very precise, but misses a lot of things people would want.
What I'm not sure about is whether to focus more on edit-distance-based approaches, or to focus on type-theoretical approaches. Both offer avenues for improvement, but edit distance is going to be faster while type checking is going to be more precise.
For example, a type theoretical improvement would fix [`Iterator<T>, (T -> U) -> Iterator<U>`](https://doc.rust-lang.org/nightly/std/?search=Iterator%3CT%3E%2C%20(T%20-%3E%20U)%20-%3E%20Iterator%3CU%3E) to give [`Iterator::map`](https://doc.rust-lang.org/nightly/std/iter/trait.Iterator.html#method.map), because it would recognize that the Map struct implements the Iterator trait. I don't know of any clean way to get this result to work without implementing significant type checking logic in search.js, and an edit-distance-based "dirty" approach would likely give a bunch of other results on top of this one.
### Full-text search
Once you've got this fuzzy dictionary matching to work, the logical next step is to implement some kind of information retrieval-based approach to phrase matching.
Like applying edit distance to types, phrase search gets you significantly better recall, but with a few major drawbacks:
- You have to pick between index bloat and the use of stopwords. Stopwords are bad because they might actually be important (try searching "if let" in mdBook if you're feeling brave), but without them, you spend a lot of space on text that doesn't matter.
- Example code also tends to have a lot of irrelevant stuff in it. As with stopwords, we'd have to pick between potentially confusing results and index bloat.
Neither of these problems is a deal-breaker, but they're worth keeping in mind.
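The stopword trade-off can be seen in a toy inverted index. This is only a sketch: the `STOPWORDS` list, `build_index`, and `posting_count` are hypothetical, and the tokenizer is plain whitespace splitting.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stopword list; real lists are much longer.
const STOPWORDS: &[&str] = &["if", "let", "the", "a", "in"];

/// Builds a map from word to the set of document ids containing it,
/// optionally dropping stopwords.
fn build_index(docs: &[&str], drop_stopwords: bool) -> BTreeMap<String, BTreeSet<usize>> {
    let mut index: BTreeMap<String, BTreeSet<usize>> = BTreeMap::new();
    for (id, doc) in docs.iter().enumerate() {
        for word in doc.split_whitespace().map(str::to_lowercase) {
            if drop_stopwords && STOPWORDS.contains(&word.as_str()) {
                continue;
            }
            index.entry(word).or_default().insert(id);
        }
    }
    index
}

/// Total number of (word, document) postings, a rough proxy for index size.
fn posting_count(index: &BTreeMap<String, BTreeSet<usize>>) -> usize {
    index.values().map(BTreeSet::len).sum()
}
```

With stopwords dropped the index shrinks, but a query such as "if let" no longer has any postings to match against.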
|
|
Do not copy .rmeta files into the sysroot of the build compiler during check of rustc/std
Before, when bootstrap did a check build of rustc stage N (with a build compiler that was stage N-1), it automatically copied the resulting `.rmeta` artifacts into the sysroot of the stage N-1 build compiler, so that stage N `rustc_private` tools such as `miri` could be compiled using the stage N-1 build compiler. This has a number of issues:
- It was done unconditionally, even if no `rustc_private` tools were actually built.
- If we did a check and a build of the same stage compiler in the same bootstrap invocation, the generated rmeta and rlib files could clash. This is also why you can see that `check::Std` actually doesn't copy the artifacts anymore (which forced us to build std instead of just checking it in a bunch of `Check` steps).
- It was polluting the sysroot of the build compiler. This is especially annoying for the stage 0 compiler, because we are forced to create an artificial sysroot for it, so that we can copy new stuff into it.
- It was very implicit in bootstrap.
Based on suggestions by `@cuviper` and `@bjorn3`, I tried to change how this behaves. Instead of copying the rmeta artifacts into the sysroot of the build compiler (from where they would be loaded implicitly), they are now stored in a separate transient bootstrap build directory, and they are then explicitly passed *only* when checking `rustc_private` tools using the `-L` flag. The flags are passed out-of-band through our rustc wrapper, to avoid invalidating the build cache. This is the first commit.
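A minimal sketch of what "out-of-band" could look like: the wrapper reads a list of directories from an environment variable and turns it into `-L` arguments. The variable name `WRAPPER_EXTRA_LIB_DIRS` and both helper functions are hypothetical; bootstrap's real wrapper may do this differently.

```rust
use std::ffi::OsString;

/// Hypothetical variable name, chosen for this sketch only.
const EXTRA_LIB_DIRS_VAR: &str = "WRAPPER_EXTRA_LIB_DIRS";

/// Turns a platform path list (as produced by `std::env::join_paths`)
/// into `-L <dir>` argument pairs for the compiler invocation.
fn lib_search_args(value: Option<OsString>) -> Vec<OsString> {
    let mut args = Vec::new();
    if let Some(value) = value {
        for dir in std::env::split_paths(&value) {
            if dir.as_os_str().is_empty() {
                continue; // skip empty entries such as a trailing separator
            }
            args.push(OsString::from("-L"));
            args.push(dir.into_os_string());
        }
    }
    args
}

/// What a wrapper might call before spawning the real compiler.
fn lib_search_args_from_env() -> Vec<OsString> {
    lib_search_args(std::env::var_os(EXTRA_LIB_DIRS_VAR))
}
```

Because the directories travel through the wrapper's environment rather than through the compiler flags that the build system tracks, changing them would presumably not by itself invalidate cached artifacts.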
The second commit does the same for std. For a few months, we built std instead of just checking it when doing a cross-compile check of something that required std; this commit fixes that. The previous ordering requirement still holds, though: `check::Std` has to be executed as the last check step, or rather nothing that requires checked std should be executed *after* it, because that would run into rmeta/rlib duplications (https://github.com/rust-lang/rust/blob/4fa90ef7996f891f7f1e126411e5d75afe64accf/src/bootstrap/src/core/builder/mod.rs#L1066). I tried to fix that in this PR, but it quickly runs into the fact that building things currently copies *rlib* artifacts into the build compiler sysroot. I want to fix that as one of the next steps. After we get rid of all the copies (or rather, we only do the copies for dist/stage2+ and do not copy anything into the stage0 compiler's sysroot), we could hopefully finally get rid of `stage0-sysroot`.
Based on my local tests, this seems to be working fine. If it works on CI, and we don't run into other issues after merging it, I'd like to do the same also for rlib artifacts generated during `x build`.
r? `@jieyouxu`
|
|
Remove the `#[no_sanitize]` attribute in favor of `#[sanitize(xyz = "on|off")]`
This came up during the sanitizer stabilization (rust-lang/rust#123617). Instead of a `#[no_sanitize(xyz)]` attribute, we would like to have a `#[sanitize(xyz = "on|off")]` attribute, which is more powerful and can be extended in the future (instead of just focusing on turning sanitizers off). The implementation is done according to what was [discussed on Zulip](https://rust-lang.zulipchat.com/#narrow/channel/343119-project-exploit-mitigations/topic/Stabilize.20the.20.60no_sanitize.60.20attribute/with/495377292).
The new attribute also works on modules, traits and impl items and thus enables usage as the following:
```rust
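// On a module: applies to the items inside unless they override it.
#[sanitize(address = "off")]
mod foo {
    fn unsanitized() {}

    // Re-enables the address sanitizer for this one function.
    #[sanitize(address = "on")]
    fn sanitized() {}
}

trait MyTrait {
    // On a trait method with a default body.
    #[sanitize(address = "off")]
    fn unsanitized_default() {}
}

// On an impl block.
#[sanitize(thread = "off")]
impl MyTrait for () {
    // ...
}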
```
r? `@rcvalle`
|
|
|
|
ci: add timeout to windows disk cleanup wait
|
|
|
|
|
|
Also use tracing macro syntax instead of format()
|
|
|
|
expression is a local variable
|
|
Update cargo
28 commits in 840b83a10fb0e039a83f4d70ad032892c287570a..71eb84f21aef43c07580c6aed6f806a6299f5042
2025-07-30 13:59:19 +0000 to 2025-08-17 17:18:56 +0000
- update tests to match lint message changes from rust-lang/rust#140794 (rust-lang/cargo#15849)
- chore: downgrade to libc@0.2.174 (rust-lang/cargo#15851)
- Reorder `lto` options in profiles.md (rust-lang/cargo#15841)
- feat(unstable): add -Zbuild-analysis unstable feature (rust-lang/cargo#15845)
- refactor(unstable): group stabilized features (rust-lang/cargo#15846)
- Fixes error while running the cargo clippy --all-targets -- -D warning (rust-lang/cargo#15843)
- Clarify that `cargo doc --no-deps` is cumulative and won’t delete prev (rust-lang/cargo#15800)
- docs: Formatting and cross-linking to build-dir/target-dir docs (rust-lang/cargo#15840)
- Stabilize `build.build-dir` (rust-lang/cargo#15833)
- make resolve features public for cargo-as-a-library (rust-lang/cargo#15835)
- chore(deps): bump slab from 0.4.10 to 0.4.11 (rust-lang/cargo#15832)
- chore: remove x86_64-apple-darwin from CI and tests (rust-lang/cargo#15831)
- chore(deps): update msrv (3 versions) to v1.87 (rust-lang/cargo#15819)
- perf(package): Always reuse the workspace's target-dir (rust-lang/cargo#15783)
- More helpful error for invalid cargo-features = [] (rust-lang/cargo#15781)
- Add initial integration for `--json=timings` behind `-Zsection-timings` (rust-lang/cargo#15780)
- add is_inherited methods to InheritableDependency and InheritableField (rust-lang/cargo#15828)
- chore(deps): update compatible (rust-lang/cargo#15804)
- docs(unstable): Link out to the Plumbing commands effort (rust-lang/cargo#15821)
- chore(deps): update cargo-semver-checks to v0.43.0 (rust-lang/cargo#15825)
- test(build-std): relax the thread name assertion (rust-lang/cargo#15822)
- chore(deps): update msrv (1 version) to v1.89 (rust-lang/cargo#15815)
- Update semver tests for 1.89 (rust-lang/cargo#15816)
- Accessing each build script's `OUT_DIR` and in the correct order (rust-lang/cargo#15776)
- chore: bump to 0.92.0; update changelog (rust-lang/cargo#15807)
- docs: `-Zpackage-workspace` has been stabilized (rust-lang/cargo#15808)
- chore(deps): update rust crate cargo_metadata to 0.21.0 (rust-lang/cargo#15795)
- docs(build-rs): Fix broken intra-doc links (rust-lang/cargo#15810)
|
|
|
|
|
|
|
|
To make it more obvious what it's testing.
This is its own commit to make git blame easier.
|
|
Add tracing to data race functions
|
|
compiler & tools dependencies:
Locking 28 packages to latest compatible versions
Updating anyhow v1.0.98 -> v1.0.99
Updating bitflags v2.9.1 -> v2.9.2
Updating clap v4.5.43 -> v4.5.45
Updating clap_builder v4.5.43 -> v4.5.44
Updating clap_derive v4.5.41 -> v4.5.45
Updating curl v0.4.48 -> v0.4.49
Updating curl-sys v0.4.82+curl-8.14.1 -> v0.4.83+curl-8.15.0
Updating cxx v1.0.166 -> v1.0.168
Updating cxx-build v1.0.166 -> v1.0.168
Updating cxxbridge-cmd v1.0.166 -> v1.0.168
Updating cxxbridge-flags v1.0.166 -> v1.0.168
Updating cxxbridge-macro v1.0.166 -> v1.0.168
Updating glob v0.3.2 -> v0.3.3
Updating object v0.37.2 -> v0.37.3
Updating proc-macro2 v1.0.95 -> v1.0.101
Updating rayon v1.10.0 -> v1.11.0
Updating rayon-core v1.12.1 -> v1.13.0
Updating serde-untagged v0.1.7 -> v0.1.8
Updating socket2 v0.5.10 -> v0.6.0
Updating syn v2.0.104 -> v2.0.106
Updating thiserror v2.0.12 -> v2.0.15
Updating thiserror-impl v2.0.12 -> v2.0.15
Updating uuid v1.17.0 -> v1.18.0
Updating wasm-encoder v0.236.0 -> v0.236.1
Updating wasmparser v0.236.0 -> v0.236.1
Updating wast v236.0.0 -> v236.0.1
Updating wat v1.236.0 -> v1.236.1
note: pass `--verbose` to see 35 unchanged dependencies behind latest
library dependencies:
Locking 2 packages to latest compatible versions
Updating libc v0.2.174 -> v0.2.175
Updating object v0.37.2 -> v0.37.3
note: pass `--verbose` to see 2 unchanged dependencies behind latest
rustbook dependencies:
Locking 13 packages to latest compatible versions
Updating anyhow v1.0.98 -> v1.0.99
Updating bitflags v2.9.1 -> v2.9.2
Updating cc v1.2.32 -> v1.2.33
Updating clap v4.5.43 -> v4.5.45
Updating clap_builder v4.5.43 -> v4.5.44
Updating clap_complete v4.5.56 -> v4.5.57
Updating clap_derive v4.5.41 -> v4.5.45
Updating proc-macro2 v1.0.95 -> v1.0.101
Updating syn v2.0.104 -> v2.0.106
Updating terminal_size v0.4.2 -> v0.4.3
Updating thiserror v2.0.12 -> v2.0.15
Updating thiserror-impl v2.0.12 -> v2.0.15
|
|
|
|
|
|
|
|
|
|
|
|
This removes the #[no_sanitize] attribute, which was behind an unstable
feature named no_sanitize. Instead, we introduce the sanitize attribute,
which is more powerful and can be extended in the future (instead
of just focusing on turning sanitizers off).
This also makes the sanitize(kernel_address = ..) attribute work with
-Zsanitize=address.
To match how clang disables the address sanitizer, we now
disable ASAN on sanitize(kernel_address = "off") and KASAN on
sanitize(address = "off").
The same was added to clang in https://reviews.llvm.org/D44981.
|
|
|
|
fix: Only import the item in "Unqualify method call" if needed
|
|
|
|
Speedup `copy_src_dirs` in bootstrap
I was kinda offended by how slow it was. Just the `copy_src_dirs` part took ~3s locally in the `x dist rustc-src` step. In release mode it was just 1s, but that's kind of cheating (I wonder if we should build bootstrap in release mode on CI though...).
Did some basic optimizations to bring it down to ~1s also in debug mode.
Maybe it's overkill, due to https://github.com/rust-lang/rust/pull/145455. Up to you whether we should merge it or close it :)
r? `@jieyouxu`
|
|
Add static glibc to the nix dev shell
This fixes `tests/ui/process/nofile-limit.rs` which fails to link on nixos for me without this change.
|
|
Fix outdated doc comment
This updates the documentation comment for `Type::is_doc_subtype_of` to more accurately describe its purpose as a subtyping check, rather than an equality check.
fixes rust-lang/rust#138572
r? `@tgross35`
|
|
|
|
|
|
Pull recent changes from https://github.com/rust-lang/rust via Josh.
Upstream ref: 425a9c0a0e365c0b8c6cfd00c2ded83a73bed9a0
Filtered ref: 7e955d5a6c676a099595bdfaec0705d3703e7a3c
This merge was created using https://github.com/rust-lang/josh-sync.
|
|
This updates the rust-version file to 425a9c0a0e365c0b8c6cfd00c2ded83a73bed9a0.
|
|
Pull recent changes from https://github.com/rust-lang/rust via Josh.
Upstream ref: 425a9c0a0e365c0b8c6cfd00c2ded83a73bed9a0
Filtered ref: 26b9fd24259f4fc5fd7634a99dd6dda2821fb2d0
This merge was created using https://github.com/rust-lang/josh-sync.
|
|
This updates the rust-version file to 425a9c0a0e365c0b8c6cfd00c2ded83a73bed9a0.
|
|