| Age | Commit message (Collapse) | Author | Lines |
|
r=lolbinarycat,GuillaumeGomez
rustdoc-search: search backend with partitioned suffix tree
Before:
- https://notriddle.com/windows-docs-rs/doc-old/windows/
- https://doc.rust-lang.org/1.89.0/std/index.html
- https://doc.rust-lang.org/1.89.0/nightly-rustc/rustc_hir/index.html
After:
- https://notriddle.com/windows-docs-rs/doc/windows/
- https://notriddle.com/rustdoc-html-demo-12/stringdex/doc/std/index.html
- https://notriddle.com/rustdoc-html-demo-12/stringdex/compiler-doc/rustc_hir/index.html
## Summary
Rewrites the rustdoc search engine to use an indexed data structure, factored out as a crate called [stringdex](https://crates.io/crates/stringdex), that allows it to perform modified-levenshtein distance calculations, substring matches, and prefix matches in a reasonably efficient, and, more importantly, *incremental* algorithm.
## Motivation
Fixes https://github.com/rust-lang/rust/issues/131156
While the windows-rs crate is definitely the worst offender, I've noticed performance problems with the compiler crates as well. It makes no sense for rustdoc-search to have poor performance: it's basically a spell checker, and those have been usable since the 90's.
Stringdex is particularly designed to quickly return exact matches, to always report those matches first, and to never, ever [place new matches on top of old ones](https://web.dev/articles/cls). It also tries to yield to the event loop occasionally as it runs. This way, you can click the exactly-matched result before the rest of the search finishes running.
## Explanation
A longer description of how name search works can be found in stringdex's [HACKING.md](https://gitlab.com/notriddle/stringdex/-/blob/main/HACKING.md) file.
Type search is done by performing a name search on each element, then performing bitmap operations to narrow down a list of potential matches, then performing type unification on each match.
## Drawbacks
It's rather complex, and takes up more disk space than the current flat list of strings.
## Rationale and alternatives
Instead of a suffix tree, I could've used a different [approximate matching data structure](https://en.wikipedia.org/wiki/Approximate_string_matching). I didn't do that because I wanted to keep the current behavior (for example, a straightforward trigram index won't match [oepn](https://doc.rust-lang.org/nightly/std/?search=oepn) like the current system does).
## Prior art
[Sherlodoc](https://github.com/art-w/sherlodoc) is based on a similar concept, but they:
- use edit distance over a suffix tree for type-based search, instead of the binary matching that's implemented here
- use substring matching for name-based search, [but not fuzzy name matching](https://github.com/art-w/sherlodoc/issues/21)
- actually implement body text search, which is a potential-future feature, but not implemented in this PR
## Future possibilities
### Low-level optimization in stringdex
There are half a dozen low-level optimizations that I still need to explore. I haven't done them yet, because I've been working on bug fixes and rebasing on rustdoc's side, and a more solid and diverse test suite for stringdex itself.
- Stringdex decides whether to bundle two nodes into the same file based on size. To figure out a node's size, I have to run compression on it. This is probably slower than it needs to be.
- Stack compression is limited to the same 256-slot sliding windows as backref compression, and it doesn't have to be. (stack and backref compression are used to optimize the representation of the edge pointer from a parent node to its child; backref uses one byte, while stack is entirely implicit)
- The JS-side decoder is pretty naive. It performs unnecessary hash table lookups when decoding compressed nodes, and retains a list of hashes that it doesn't need. It needs to calculate the hashes in order to construct the merkle tree correctly, but it doesn't need to keep them.
- Data compression happens at the end, while emitting the node. This means it's not being counted when deciding on how to bundle, which is pretty dumb.
### Improved recall in type-driven search
Right now, type-driven search performs very strict matching. It's very precise, but misses a lot of things people would want.
What I'm not sure about is whether to focus more on edit-distance-based approaches, or to focus on type-theoretical approaches. Both gives avenues to improve, but edit distance is going to be faster while type checking is going to be more precise.
For example, a type theoretical improvement would fix [`Iterator<T>, (T -> U) -> Iterator<U>`](https://doc.rust-lang.org/nightly/std/?search=Iterator%3CT%3E%2C%20(T%20-%3E%20U)%20-%3E%20Iterator%3CU%3E) to give [`Iterator::map`](https://doc.rust-lang.org/nightly/std/iter/trait.Iterator.html#method.map), because it would recognize that the Map struct implements the Iterator trait. I don't know of any clean way to get this result to work without implementing significant type checking logic in search.js, and an edit-distance-based "dirty" approach would likely give a bunch of other results on top of this one.
## Full-text search
Once you've got this fuzzy dictionary matching to work, the logical next step is to implement some kind of information retrieval-based approach to phrase matching.
Like applying edit distance to types, phrase search gets you significantly better recall, but with a few major drawbacks:
- You have to pick between index bloat and the use of stopwords. Stopwords are bad because they might actually be important (try searching "if let" in mdBook if you're feeling brave), but without them, you spend a lot of space on text that doesn't matter.
- Example code also tends to have a lot of irrelevant stuff in it. Like stop words, we'd have to pick potentially-confusing or bloat.
Neither of these problems are deal-breakers, but they're worth keeping in mind.
|
|
|
|
When a UI test runs a compiled binary and an error/forbid pattern
check fails, the failure message previously only showed compiler output,
hiding the executed programs stdout/stderr. This makes it harder to
see near-miss or unexpected runtime lines.
|
|
|
|
Simplify polonius location-sensitive analysis
This PR reworks the location-sensitive analysis into what we think is a worthwhile subset of the datalog analysis. A sort of polonius alpha analysis that handles NLL problem case 3 and more, but is still using the faster "reachability as an approximation of liveness", as well as the same loans-in-scope computation as NLLs -- and thus doesn't handle full flow-sensitivity like the datalog implementation.
In the last few months, we've identified this subset as being actionable:
- we believe we can make a stabilizable version of this analysis
- it is an improvement over the status quo
- it can also be modeled in a-mir-formality, or some other formalism, for assurances about soundness, and I believe ````````@nikomatsakis```````` is interested in looking into this during H2.
- and we've identified the areas of work we wish to explore later to gradually expand the supported cases: the differences between reachability and liveness, support of kills, and considerations of time-traveling, for example.
The approach in this PR is to try less to have the graph only represent live paths, by checking whether we reach a live region during traversal and recording the loan as live there, instead of equating traversal with liveness like today because it has subtleties with the typeck edges in statements (that could forward loans to the successor point without ensuring their liveness). We can then also simplify these typeck stmt edges. And we also can simplify traversal by removing looking at kills, because that's enough to handle a bunch of NLL problem 3 cases -- and we can gradually support them more and more in traversal in the future, to reduce the approximation of liveness.
There's still some in-progress pieces of work w/r/t opaque types that I'm expecting [lcnr's opaque types rework](https://github.com/rust-lang/rust/pull/139587), and [amanda's SCCs rework](https://github.com/rust-lang/rust/pull/130227) to handle. That didn't seem to show up in tests until I rebased today (and shows lack of test coverage once again) when https://github.com/rust-lang/rust/pull/142255 introduced a couple of test failures with the new captures rules from edition 2024. It's not unexpected since we know more work is needed with member constraints (and we're not even using SCCs in this prototype yet)
I'll look into these anyways, both for future work, and checking how these other 2 PRs would change things.
---
I'm not sure the following means a lot until we have some formalism in-place, but:
- I've changed the polonius compare-mode to use this analysis: the tests pass with it, except 2 cases with minor diagnostics differences, and the 2 edition 2024 opaque types one I mentioned above and need to investigate
- things that are expected to work still do work: it bootstraps, can run our rustc-perf benchmarks (and the results are not even that bad), and a crater run didn't find any regressions (forgetting that crater currently fails to test around a quarter of all crates 👼)
- I've added tests with improvements, like the NLL problem case 3 and others, as well as some that behave the same as NLLs today and are thus worse than the datalog implementation
r? ````````@jackh726````````
(no rush I know you're deep in phd work and "implmentating" the new trait solver for r-a :p <3)
This also fixes rust-lang/rust#135646, a diagnostics ICE from the previous implementation.
|
|
|
|
|
|
Revert "Preserve the .debug_gdb_scripts section"
https://github.com/rust-lang/rust/pull/143679 introduces a significant build time perf regression for ripgrep. Let's revert it such that we can investigate it without pressure.
|
|
So we don't need to add normalization to every test that includes a
panic message, add a global normalization to compiletest.
|
|
This reverts commit b4d923cea0509933b1fb859930cb20784251f9be.
|
|
Rollup of 12 pull requests
Successful merges:
- rust-lang/rust#144552 (Rehome 33 `tests/ui/issues/` tests to other subdirectories under `tests/ui/`)
- rust-lang/rust#144676 (Add documentation for unstable_feature_bound)
- rust-lang/rust#144836 (Change visibility of Args new function)
- rust-lang/rust#144910 (Add regression tests for seemingly fixed issues)
- rust-lang/rust#144913 ([rustdoc] Fix wrong `i` tooltip icon)
- rust-lang/rust#144924 (compiletest: add hint for when a ui test produces no errors)
- rust-lang/rust#144926 (Correct the use of `must_use` on btree::IterMut)
- rust-lang/rust#144928 (Drop `rust-version` from `rustc_thread_pool`)
- rust-lang/rust#144945 (Autolabel PRs that change explicit tail call tests as `F-explicit_tail_calls`)
- rust-lang/rust#144954 (run-make: Allow blessing snapshot files that don't exist yet)
- rust-lang/rust#144971 (num: Rename `isolate_most_least_significant_one` functions)
- rust-lang/rust#144978 (Fix some doc links for intrinsics)
r? `@ghost`
`@rustbot` modify labels: rollup
|
|
Instead of collecting pretty printers transitively when building
executables/staticlibs/cdylibs, let the debugger find each crate's
pretty printers via its .debug_gdb_scripts section. This covers the case
where libraries defining custom pretty printers are loaded dynamically.
|
|
|
|
For "stage 1" ui-fulldeps, use the stage 1 compiler to query target info
Testing ui-fulldeps in "stage 1" actually uses the stage 0 compiler, so that test programs can link against stage 1 rustc crates.
Unfortunately, using the stage 0 compiler causes problems when compiletest tries to obtain target information from the compiler, but the output format has changed since the last bootstrap beta bump.
We can work around this by also providing compiletest with a stage 1 compiler, and having it use that compiler to query for target information.
---
This fixes the stage 1 ui-fulldeps failure seen at https://github.com/rust-lang/rust/pull/144443#issuecomment-3146992771.
|
|
compiletest: Preliminary cleanup of `ProcRes` printing/unwinding
While experimenting with changes to how compiletest handles output capture, error reporting, and unwinding, I repeatedly ran in to difficulties with this core code for reporting test failures caused by a subprocess.
There should be no change in compiletest output.
r? jieyouxu
|
|
r=jieyouxu
Remove the omit_gdb_pretty_printer_section attribute
Disabling loading of pretty printers in the debugger itself is more reliable. Before this commit the .gdb_debug_scripts section couldn't be included in dylibs or rlibs as otherwise there is no way to disable the section anymore without recompiling the entire standard library.
|
|
Testing ui-fulldeps in "stage 1" actually uses the stage 0 compiler, so that
test programs can link against stage 1 rustc crates.
Unfortunately, using the stage 0 compiler causes problems when compiletest
tries to obtain target information from the compiler, but the output format has
changed since the last bootstrap beta bump.
We can work around this by also providing compiletest with a stage 1 compiler,
and having it use that compiler to query for target information.
|
|
This reduces the amount of "hidden" printing in error-reporting code, which
will be helpful when overhauling compiletest's error handling and output
capture.
|
|
|
|
This method now returns a string instead of printing directly to
(possibly-captured) stdout.
|
|
Properly pass path to staged `rustc` to `compiletest` self-tests
Otherwise, this can do weird things like use a global rustc, or try to use stage 0 rustc. This must be properly configured, because `compiletest` is intended to only support one compiler target spec JSON format (of the in-tree compiler).
Historically, this was probably done so before `bootstrap` was really its own thing, and `compiletest` had to be runnable as a much more "self-standing" tool.
Follow-up to rust-lang/rust#144675, as I didn't realize this until Zalathar pointed it out in [#t-infra/bootstrap > Building vs testing `compiletest` @ 💬](https://rust-lang.zulipchat.com/#narrow/channel/326414-t-infra.2Fbootstrap/topic/Building.20vs.20testing.20.60compiletest.60/near/532040838).
r? ````@Kobzol````
|
|
Disabling loading of pretty printers in the debugger itself is more
reliable. Before this commit the .gdb_debug_scripts section couldn't be
included in dylibs or rlibs as otherwise there is no way to disable the
section anymore without recompiling the entire standard library.
|
|
|
|
Otherwise, this can do weird things like use a global rustc, or try to
use stage 0 rustc. This must be properly configured, because
`compiletest` is intended to only support one compiler target spec JSON
format (of the in-tree compiler).
|
|
compiletest: Move directive names back into a separate file
This list no longer needs to be included in multiple crates, but having the list in its own file makes it easier to find and update when necessary.
As discussed at https://github.com/rust-lang/rust/pull/143850#issuecomment-3130307023.
|
|
This list no longer needs to be included in multiple crates, but having it in
its own file makes it easier to find and update when necessary.
|
|
|
|
move uefi test to run-make
Turn the `uefi` test into a more standard `run-make` test, and execute it using the `test-various` CI job like before.
This is just a straightforward translation of the python code, but using `run-make` to supply the target (hence the 3 separate calls in the docker file).
r? ```@jieyouxu```
cc ```@nicholasbishop```
try-job: test-various
|
|
|
|
Rollup of 9 pull requests
Successful merges:
- rust-lang/rust#140871 (Don't lint against named labels in `naked_asm!`)
- rust-lang/rust#141663 (rustdoc: add ways of collapsing all impl blocks)
- rust-lang/rust#143272 (Upgrade the `fortanix-sgx-abi` dependency)
- rust-lang/rust#143585 (`loop_match`: suggest extracting to a `const` item)
- rust-lang/rust#143698 (Fix unused_parens false positive)
- rust-lang/rust#143859 (Guarantee 8 bytes of alignment in Thread::into_raw)
- rust-lang/rust#144160 (tests: debuginfo: Work around or disable broken tests on powerpc)
- rust-lang/rust#144412 (Small cleanup: Use LocalKey<Cell> methods more)
- rust-lang/rust#144431 (Disable has_reliable_f128_math on musl targets)
r? `@ghost`
`@rustbot` modify labels: rollup
|
|
tests: debuginfo: Work around or disable broken tests on powerpc
f16 support for PowerPC has issues in LLVM, therefore we need a small workaround to prevent LLVM from emitting symbols that don't exist for PowerPC yet.
It also appears that rust-lang/rust#128973 applies to PowerPC targets as well, though I've only tested 64-bit Linux targets.
|
|
|
|
current codegen backend
|
|
|
|
|
|
f16 support for PowerPC has issues in LLVM, therefore we need a small
workaround to prevent LLVM from emitting symbols that don't exist for
PowerPC yet.
It also appears that unused by-value non-immedate issue with gdb applies
to PowerPC targets as well, though I've only tested 64-bit Linux targets.
Signed-off-by: Jens Reidel <adrian@travitia.xyz>
|
|
And introduce two new directives for ui tests:
* `run-crash`
* `run-fail-or-crash`
Normally a `run-fail` ui test like tests that panic shall not be
terminated by a signal like `SIGABRT`. So begin having that as a hard
requirement.
Some of our current tests do terminate by a signal/crash however.
Introduce and use `run-crash` for those tests. Note that Windows crashes
are not handled by signals but by certain high bits set on the process
exit code. Example exit code for crash on Windows: `0xc000001d`.
Because of this, we define "crash" on all platforms as "not exit with
success and not exit with a regular failure code in the range 1..=127".
Some tests behave differently on different targets:
* Targets without unwind support will abort (crash) instead of exit with
failure code 101 after panicking. As a special case, allow crashes for
`run-fail` tests for such targets.
* Different sanitizer implementations handle detected memory problems
differently. Some abort (crash) the process while others exit with
failure code 1. Introduce and use `run-fail-or-crash` for such tests.
|
|
r=workingjubilee,fmease,jieyouxu
tests: Fix duplicated-path-in-error fail with musl
musl's dlopen returns a different error than glibc, which contains the name of the file. This would cause the test to fail, since the filename would appear twice in the output (once in the error from rustc, once in the error message from musl). Split the expected test outputs for the different libc implementations.
Fixes rust-lang/rust#128474
|
|
musl's dlopen returns a different error than glibc, which contains the
name of the file. This would cause the test to fail, since the filename
would appear twice in the output (once in the error from rustc, once in
the error message from musl). Split the expected test outputs for the
different libc implementations.
Signed-off-by: Jens Reidel <adrian@travitia.xyz>
|
|
[COMPILETEST-UNTANGLE 6/N] Use `TestSuite` enum instead of stringly-typed test suites
This is part of a patch series to untangle `compiletest` to hopefully nudge it towards being more maintainable.
This PR should contain no functional changes.
|
|
|
|
|
|
[COMPILETEST-UNTANGLE 5/N] Test mode adjustments and other assorted cleanups
This is part of a patch series to untangle `compiletest` to hopefully nudge it towards being more maintainable.
This PR should contain no functional changes modulo the removed debugger version warning.
- Commit 1: Removes a very outdated debugger version warning.
- Commit 2: Moves `string_enum` out of `common` into `util` module.
- Commit 3: Remove `#[derive(Default)` for `Mode` and `Config`. It is very important for correctness that we *don't* `#[derive(Default)]`, because there are no sensible defaults, so stop pretending there is.
- Commit 4: Rename `Mode` -> `TestMode`, because I would like to introduce a `TestSuite` enum to stop using stringly-typed test suite names where test mode names can be the same as test suite names, and we also have a bunch of other "modes" in compiletest. Make this as unambiguous as possible. A corollary is that now it's more natural to reference via intra-doc links as ``[`TestMode`]``.
- Commit 5: Ditto on `TestSuite`, stop glob-reexporting `TestMode::*` variants, and always use `EnumName::VariantName` form.
- Commit 6: Apparently, `src/tools/rustdoc-gui-test/` depends on `compiletest` for `//@ {compile,run}-paths` directive parsing and extraction, which involves creating a dummy `compiletest` config (hence the existence of the default impls removed in Commit 3). Make this a specific associated function with a FIXME pointing to rust-lang/rust#143827 as I think this setup is quite questionable.
Commits {4, 5} are also intended to help improve the self-consistency in nomenclature used within compiletest.
Best reviewed commit-by-commit.
|
|
|
|
I would like to introduce `TestSuite` over stringly-typed test suite
names, and some test suite names are the same as test modes, which can
make this very confusing.
|
|
It is *critical* that we maintain clear nomenclature in `compiletest`.
We have many types of "modes" in `compiletest` -- pass modes, coverage
modes, compare modes, you name it. `Mode` is also a *super* general
term. Rename it to `TestMode` to leave no room for such ambiguity.
As a follow-up, I also intend to introduce an enum for `TestSuite`, then
rid of all usage of glob re-exported `TestMode::*` enum variants -- many
test suites share the same name as the test mode.
|
|
They do not have sensible defaults, and it is crucial that we get them
right.
|
|
|
|
|
|
Additionally, `NO_DEBUG_ASSERTIONS` is set by CI that threads through to
the `./configure` script, which is somewhat fragile and "spooky action
at a distance". Instead, use env vars controlled by compiletest, whose
debug assertion info comes from bootstrap.
|