about summary refs log tree commit diff
path: root/src/librustdoc/html/static/js/search.js
AgeCommit message (Collapse)AuthorLines
2023-12-15rustdoc-search: remove parallel searchWords arrayMichael Howell-46/+28
This might have made sense if the algorithm could use `searchWords` to skip having to look at `searchIndex`, but since it always does a substring check on both the stock word and the normalizedName, it doesn't seem to help performance anyway.
2023-12-14Use Map instead of Object for source files and search indexGuillaume Gomez-79/+62
2023-12-13rustdoc-search: clean up handleSingleArg type handlingMichael Howell-9/+3
2023-12-13rustdoc-search: better hashing, faster unificationMichael Howell-10/+46
The hash changes are based on some tests with `arti` and various specific queries, aimed at reducing the false positive rate. Sorting the query elements so that generics always come first is instead aimed at reducing the number of Map operations on mgens, assuming if the bloom filter does find a false positive, it'll be able to reject the row without having to track a mapping. - https://hur.st/bloomfilter/?n=3&p=&m=96&k=6 Different functions have different amounts of inputs, and unification isn't very slow anyway, so figuring out a single ideal number of hash functions is nasty, but 6 keeps things low even up to 10 inputs. - https://web.archive.org/web/20210927123933/https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.72.2442&rep=rep1&type=pdf This is the `h1` and `h2`, both derived from `h0`.
2023-12-13rustdoc-search: use set ops for ranking and filteringMichael Howell-40/+185
This commit adds ranking and quick filtering to type-based search, improving performance and having it order results based on their type signatures. Motivation ---------- If I write a query like `str -> String`, a lot of functions come up. That's to be expected, but `String::from_str` should come up on top, and it doesn't right now. This is because the sorting algorithm is based on the functions name, and doesn't consider the type signature at all. `slice::join` even comes up above it! To fix this, the sorting should take into account the function's signature, and the closer match should come up on top. Guide-level description ----------------------- When searching by type signature, types with a "closer" match will show up above types that match less precisely. Reference-level explanation --------------------------- Functions signature search works in three major phases: * A compact "fingerprint," based on the [bloom filter] technique, is used to check for matches and to estimate the distance. It sometimes has false positive matches, but it also operates on 128 bit contiguous memory and requires no backtracking, so it performs a lot better than real unification. The fingerprint represents the set of items in the type signature, but it does not represent nesting, and it ignores when the same item appears more than once. The result is rejected if any query bits are absent in the function, or if the distance is higher than the current maximum and 200 results have already been found. * The second step performs unification. This is where nesting and true bag semantics are taken into account, and it has no false positives. It uses a recursive, backtracking algorithm. The result is rejected if any query elements are absent in the function. [bloom filter]: https://en.wikipedia.org/wiki/Bloom_filter Drawbacks --------- This makes the code bigger. More than that, this design is a subtle trade-off. It makes the cases I've tested against measurably faster, but it's not clear how well this extends to other crates with potentially more functions and fewer types. The more complex things get, the more important it is to gather a good set of data to test with (this is arguably more important than the actual benchmarking ifrastructure right now). Rationale and alternatives -------------------------- Throwing a bloom filter in front makes it faster. More than that, it tries to take a tactic where the system can not only check for potential matches, but also gets an accurate distance function without needing to do unification. That way it can skip unification even on items that have the needed elems, as long as they have more items than the currently found maximum. If I didn't want to be able to cheaply do set operations on the fingerprint, a [cuckoo filter] is supposed to have better performance. But the nice bit-banging set intersection doesn't work AFAIK. I also looked into [minhashing], but since it's actually an unbiased estimate of the similarity coefficient, I'm not sure how it could be used to skip unification (I wouldn't know if the estimate was too low or too high). This function actually uses the number of distinct items as its "distance function." This should give the same results that it would have gotten from a Jaccard Distance $1-\frac{|F\cap{}Q|}{|F\cup{}Q|}$, while being cheaper to compute. This is because: * The function $F$ must be a superset of the query $Q$, so their union is just $F$ and the intersection is $Q$ and it can be reduced to $1-\frac{|Q|}{|F|}. * There are no magic thresholds. These values are only being used to compare against each other while sorting (and, if 200 results are found, to compare with the maximum match). This means we only care if one value is bigger than the other, not what it's actual value is, and since $Q$ is the same for everything, it can be safely left out, reducing the formula to $1-\frac{1}{|F|} = \frac{|F|}{|F|}-\frac{1}{|F|} = |F|-1$. And, since the values are only being compared with each other, $|F|$ is fine. Prior art --------- This is significantly different from how Hoogle does it. It doesn't account for order, and it has no special account for nesting, though `Box<t>` is still two items, while `t` is only one. This should give the same results that it would have gotten from a Jaccard Distance $1-\frac{|A\cap{}B|}{|A\cup{}B|}$, while being cheaper to compute. Unresolved questions -------------------- `[]` and `()`, the slice/array and tuple/union operators, are ignored while building the signature for the query. This is because they match more than one thing, making them ambiguous. Unfortunately, this also makes them a performance cliff. Is this likely to be a problem? Right now, the system just stashes the type distance into the same field that levenshtein distance normally goes in. This means exact query matches show up on top (for example, if you have a function like `fn nothing(a: Nothing, b: i32)`, then searching for `nothing` will show it on top even if there's another function with `fn bar(x: Nothing)` that's technically a closer match in type signature. Future possibilities -------------------- It should be possible to adopt more sorting criteria to act as a tie breaker, which could be determined during unification. [cuckoo filter]: https://en.wikipedia.org/wiki/Cuckoo_filter [minhashing]: https://en.wikipedia.org/wiki/MinHash
2023-12-13rustdoc-search: remove the now-redundant `validateResult`Michael Howell-57/+0
This function dates back to 9a45c9d7c6928743f9e7a7161bf564a65bfc0577 and seems to have been made obsolete when `addIntoResult` grew the ability to check the levenshtein distance matching with commit ba824ec52beb0e49b64e86837c1402a0c2d0c971.
2023-12-12Rollup merge of #118886 - GuillaumeGomez:clean-up-search-vars, r=notriddleJubilee-13/+7
Clean up variables in `search.js` While reviewing https://github.com/rust-lang/rust/pull/118402, I saw a few small clean ups that were needed, mostly about variables creation. r? ```@notriddle```
2023-12-12Clean up variables in `search.js`Guillaume Gomez-13/+7
2023-12-11rustdoc-search: clean up parserMichael Howell-7/+2
The `c === "="` was redundant when `isSeparatorCharacter` already checks that. The function `isStopCharacter` and `isEndCharacter` functions did exactly the same thing and have synonymous names. There doesn't seem much point in having both.
2023-12-10rustdoc-search: fix fast path unboxing bindingsMichael Howell-1/+1
2023-12-10rustdoc-search: do not treat associated type names as typesMichael Howell-13/+20
Before: http://notriddle.com/rustdoc-html-demo-6/tor-before/tor_config/ After: http://notriddle.com/rustdoc-html-demo-6/tor-after/tor_config/ Profile: http://notriddle.com/rustdoc-html-demo-6/tor-profile/ As a bit of background information: in type-based queries, a type name that does not exist gets treated as a generic type variable. This causes a counterintuitive behavior in the `tor_config` crate, which has a trait with an associated type variable called `T`. This isn't a searchable concrete type, but its name still gets stored in the typeNameIdMap, as a convenient way to intern its name.
2023-11-29rustdoc-search: replace TAB/NL/LF with SP firstMichael Howell-10/+6
This way, most of the parsing code doesn't need to be designed to handle it, since they should always be treated exactly the same anyhow.
2023-11-29rustdoc-search: removed dead parser codeMichael Howell-2/+0
This is already covered by the normal unexpected char path.
2023-11-29rustdoc-search: allow `:: ` and ` ::`Michael Howell-8/+5
This restriction made sense back when spaces separated function parameters, but now that they separate path components, there's no real ambiguity any more. Additionally, the Rust language allows it.
2023-11-25rustdoc-search: clean up some DOM codeMichael Howell-11/+5
2023-11-24rustdoc-search: avoid infinite where clause unboxMichael Howell-8/+24
Fixes #118242
2023-11-21rustdoc-search: make primitives and keywords less specialMichael Howell-24/+12
The search sorting code already sorts by item type discriminant, putting things with smaller discriminants first. There was also a special case for sorting keywords and primitives earlier, and this commit removes it by giving them lower discriminants. The sorting code has another criteria where items with descriptions appear earlier than items without, and that criteria has higher priority than the item type. This shouldn't matter, though, because primitives and keywords normally only appear in the standard library, and it always gives them descriptions.
2023-11-21rustdoc-search: clean up `checkPath`Michael Howell-13/+3
This computes the same result with less code by computing many of the old checks at once: * It won't enter the loop if clength > length, because then the result of length - clength will be negative and the loop conditional will fail. * i + clength will never be greater than length, because it starts out as i = length - clength, implying that i + clength equals length, and it only goes down from there. * The aborted variable is replaced with control flow.
2023-11-19rustdoc-search: add support for associated typesMichael Howell-48/+283
2023-11-18rustdoc-search: switch to recursive backtrackingMichael Howell-157/+87
This is significantly faster, because - It allows the one-element fast path to kick in on multi- element queries. - It constructs intermediate data structures more lazily than the old system did. It's measurably faster than the old algo even without the fast path, but that fast path still helps significantly.
2023-11-17rustdoc-search: fix accidental shared, mutable mapMichael Howell-30/+14
2023-11-17rustdoc-search: fast path for 1-query unificationMichael Howell-2/+76
Short queries, in addition to being common, are also the base case for a lot of more complicated queries. We can avoid most of the backtracking data structures, and use simple recursive matching instead, by special casing them. Profile output: https://notriddle.com/rustdoc-html-demo-5/profile-3/index.html
2023-11-17rustdoc-search: less new Maps in unifyFunctionTypeMichael Howell-16/+31
This is a major source of expense on generic queries, and this commit reduces them. Profile output: https://notriddle.com/rustdoc-html-demo-5/profile-2/index.html
2023-11-15rustdoc-search: simplify the checkTypes fast pathMichael Howell-61/+10
This reduces code size while still matching the common case for plain, concrete types.
2023-10-10Rollup merge of #109422 - notriddle:notriddle/impl-disambiguate-search, ↵Guillaume Gomez-3/+16
r=GuillaumeGomez rustdoc-search: add impl disambiguator to duplicate assoc items Preview (to see the difference, click the link and pay attention to the specific function that comes up): | Before | After | |--|--| | [`simd<i64>, simd<i64> -> simd<i64>`](https://doc.rust-lang.org/nightly/std/?search=simd%3Ci64%3E%2C%20simd%3Ci64%3E%20-%3E%20simd%3Ci64%3E) | [`simd<i64>, simd<i64> -> simd<i64>`](https://notriddle.com/rustdoc-demo-html-3/impl-disambiguate-search/std/index.html?search=simd%3Ci64%3E%2C%20simd%3Ci64%3E%20-%3E%20simd%3Ci64%3E) | | [`cow, vec -> bool`](https://doc.rust-lang.org/nightly/std/?search=cow%2C%20vec%20-%3E%20bool) | [`cow, vec -> bool`](https://notriddle.com/rustdoc-demo-html-3/impl-disambiguate-search/std/index.html?search=cow%2C%20vec%20-%3E%20bool) Helps with #90929 This changes the search results, specifically, when there's more than one impl with an associated item with the same name. For example, the search queries `simd<i8> -> simd<i8>` and `simd<i64> -> simd<i64>` don't link to the same function, but most of the functions have the same names. This change should probably be FCP-ed, especially since it adds a new anchor link format for `main.js` to handle, so that URLs like `struct.Vec.html#impl-AsMut<[T]>-for-Vec<T,+A>/method.as_mut` redirect to `struct.Vec.html#method.as_mut-2`. It's a strange design, but there are a few reasons for it: * I'd like to avoid making the HTML bigger. Obviously, fixing this bug is going to add at least a little more data to the search index, but adding more HTML penalises viewers for the benefit of searchers. * Breaking `struct.Vec.html#method.len` would also be a disappointment. On the other hand: * The path-style anchors might be less prone to link rot than the numbered anchors. It's definitely less likely to have URLs that appear to "work", but silently point at the wrong thing. * This commit arranges the path-style anchor to redirect to the numbered anchor. Nothing stops rustdoc from doing the opposite, making path-style anchors the default and redirecting the "legacy" numbered ones. ### The bug On the "Before" links, this example search calls for `i64`: ![image](https://github.com/rust-lang/rust/assets/1593513/9431d89d-41dc-4f68-bbb1-3e2704a973d2) But if I click any of the results, I get `f64` instead. ![image](https://github.com/rust-lang/rust/assets/1593513/6d89c692-1847-421a-84d9-22e359d9cf82) The PR fixes this problem by adding enough information to the search result `href` to disambiguate methods with different types but the same name. More detailed description of the problem at: https://github.com/rust-lang/rust/pull/109422#issuecomment-1491089293 > When a struct/enum/union has multiple impls with different type parameters, it can have multiple methods that have the same name, but which are on different impls. Besides Simd, [Any](https://doc.rust-lang.org/nightly/std/any/trait.Any.html?search=any%3A%3Adowncast) also demonstrates this pattern. It has three methods named `downcast`, on three different impls. > > When that happens, it presents a challenge in linking to the method. Normally we link like `#method.foo`. When there are multiple `foo`, we number them like `#method.foo`, `#method.foo-1`, `#method.foo-2`, etc. > > It also presents a challenge for our search code. Currently we store all the variants in the index, but don’t have any way to generate unambiguous URLs in the results page, or to distinguish them in the SERP. > > To fix this, we need three things: > > 1. A fragment format that fully specifies the impl type parameters when needed to disambiguate (`#impl-SimdOrd-for-Simd<i64,+LANES>/method.simd_max`) > 2. A search index that stores methods with enough information to disambiguate the impl they were on. > 3. A search results interface that can display multiple methods on the same type with the same name, when appropriate OR a disambiguation landing section on item pages? > > For reviewers: it can be hard to see the new fragment format in action since it immediately gets rewritten to the numbered form.
2023-10-05rustdoc-search: fix bug with multi-item impl traitMichael Howell-1/+1
2023-09-21rustdoc-search: add impl disambiguator to duplicate assoc itemsMichael Howell-3/+16
Helps with #90929 This changes the search results, specifically, when there's more than one impl with an associated item with the same name. For example, the search queries `simd<i8> -> simd<i8>` and `simd<i64> -> simd<i64>` don't link to the same function, but most of the functions have the same names. This change should probably be FCP-ed, especially since it adds a new anchor link format for `main.js` to handle, so that URLs like `struct.Vec.html#impl-AsMut<[T]>-for-Vec<T,+A>/method.as_mut` redirect to `struct.Vec.html#method.as_mut-2`. It's a strange design, but there are a few reasons for it: * I'd like to avoid making the HTML bigger. Obviously, fixing this bug is going to add at least a little more data to the search index, but adding more HTML penalises viewers for the benefit of searchers. * Breaking `struct.Vec.html#method.len` would also be a disappointment. On the other hand: * The path-style anchors might be less prone to link rot than the numbered anchors. It's definitely less likely to have URLs that appear to "work", but silently point at the wrong thing. * This commit arranges the path-style anchor to redirect to the numbered anchor. Nothing stops rustdoc from doing the opposite, making path-style anchors the default and redirecting the "legacy" numbered ones.
2023-09-20rustdoc: add comment about numeric spacingMichael Howell-0/+4
2023-09-19rustdoc: add test cases, and fix, search tabsMichael Howell-2/+7
2023-09-09rustdoc-search: fix bugs when unboxing and reordering combineMichael Howell-235/+275
2023-09-03rustdoc: fix test case for generics that look like namesMichael Howell-1/+2
2023-09-03rustdoc: bug fix for `-> option<t>`Michael Howell-1/+4
2023-09-03rustdoc-search: add support for type parametersMichael Howell-171/+283
When writing a type-driven search query in rustdoc, specifically one with more than one query element, non-existent types become generic parameters instead of auto-correcting (which is currently only done for single-element queries) or giving no result. You can also force a generic type parameter by writing `generic:T` (and can force it to not use a generic type parameter with something like `struct:T` or whatever, though if this happens it means the thing you're looking for doesn't exist and will give you no results). There is no syntax provided for specifying type constraints for generic type parameters. When you have a generic type parameter in a search query, it will only match up with generic type parameters in the actual function, not concrete types that match, not concrete types that implement a trait. It also strictly matches based on when they're the same or different, so `option<T>, option<U> -> option<U>` matches `Option::and`, but not `Option::or`. Similarly, `option<T>, option<T> -> option<T>`` matches `Option::or`, but not `Option::and`.
2023-09-03rustdoc-search: `null`, not `-1`, for missing idMichael Howell-13/+13
This allows us to use negative numbers for others purposes.
2023-09-03Rollup merge of #115490 - pitaj:rustdoc-searchjs-comment, r=GuillaumeGomezGuillaume Gomez-8/+17
rustdoc: update comment in search.js for #107629 Addressing https://github.com/rust-lang/rust/pull/107629#issuecomment-1693460106 r? `@jsha`
2023-09-02rustdoc: update comment in search.js for #107629Peter Jaszkowiak-8/+17
2023-09-01[rustdoc] Fix path in type-based searchGuillaume Gomez-23/+89
2023-08-31Improve `search.js` codeGuillaume Gomez-17/+13
2023-07-18Fix display of aliases in rustdoc search resultsGuillaume Gomez-18/+11
2023-07-02Auto merge of #108537 - ↵bors-73/+174
GuillaumeGomez:rustdoc-search-whitespace-as-separator, r=notriddle rustdoc: Allow whitespace as path separator like double colon Fixes https://github.com/rust-lang/rust/issues/108447. I think it makes sense since it allows more common cases, however it also makes the syntax heavier. Not sure what the rest of the team thinks about it. In any case we'll need to go through FCP. Full explanation for the changes is available [here](https://github.com/rust-lang/rust/pull/108537#issuecomment-1589480564). r? `@notriddle`
2023-06-28Fix display of long items in search resultsGuillaume Gomez-3/+5
2023-06-23Abbreviate long typenames so they don't get wrapped in resultsNoah Lev-2/+2
2023-06-19rustdoc: js: change color and reduce size of typename in search resultAlexis (Poliorcetics) Bourget-1/+3
2023-06-17Rollup merge of #112707 - GuillaumeGomez:back-in-history-fix, r=notriddleMatthias Krüger-16/+28
[rustdoc] Fix invalid handling of "going back in history" when "go to only search result" setting is enabled You can test the fix [here](https://rustdoc.crud.net/imperio/back-in-history-fix/lib2/index.html). Enable "Directly go to item in search if there is only one result", then search for `HasALongTraitWithParams` and finally go back to previous page. It should be back on the `index.html` page. The reason for this bug is that the JS state is cached as is, so when we go back to the page, it resumes where it was left, somewhat (very weird), meaning the search is run again etc. The best way to handle this is to force the JS re-execution in this case so that it doesn't try to resume from where it left and then lead us back to the current page. r? ``@notriddle``
2023-06-16Fix invalid handling of "going back in history" when "Directly go to item in ↵Guillaume Gomez-1/+14
search if there is only one result" setting is set to true
2023-06-16Auto merge of #110688 - GuillaumeGomez:result-search-type, r=notriddle,jshabors-9/+34
rustdoc: Add search result item types after their name Here what it looks like: ![Screenshot from 2023-04-22 15-16-58](https://user-images.githubusercontent.com/3050060/233789566-b5f3f625-3b78-4c56-a7ee-0a4f2d62e667.png) The idea is to improve accessibility by providing this information directly in the text and not only in the text color. Currently we already use it for doc aliases and for primitive types, so I extended it to all types. r? `@notriddle`
2023-06-16Unify history interactions in searchGuillaume Gomez-15/+14
2023-06-15Auto merge of #112233 - notriddle:notriddle/search-unify, r=GuillaumeGomezbors-164/+161
rustdoc-search: clean up type unification and "unboxing" This PR redesigns parameter matching, return matching, and generics matching to use a single function that compares two lists of types. It also makes the algorithms more consistent, so the "unboxing" behavior where `Vec<i32>` is considered a match for `i32` works inside generics, and not just at the top level.
2023-06-14Fix eBNF and handling of whitespace characters when not in a pathGuillaume Gomez-2/+11
2023-06-14Correctly display whitespace characters in search errorGuillaume Gomez-1/+1