about summary refs log tree commit diff
path: root/src/librustdoc/html/render/search_index.rs
AgeCommit message (Collapse)AuthorLines
2024-04-08Update search_index.rsMichael Howell-1/+0
2024-04-08rustdoc: improve comments based on feedbackMichael Howell-8/+13
2024-04-08rustdoc-search: single result for items with multiple pathsMichael Howell-16/+134
This change uses the same "exact" paths as trait implementors and type alias inlining to track items with multiple reachable paths. This way, if you search for `vec`, you get only the `std` exports of it, and not the one from `alloc`. It still includes all the items in the search index so that you can search for them by all available paths. For example, try `core::option` and `std::option`, and notice that the results page doesn't show duplicates, but still shows all the items in their respective crates.
2024-04-08Thread pattern types through the HIROli Scherer-1/+1
2024-04-02Rollup merge of #122614 - notriddle:notriddle/search-desc, r=GuillaumeGomezGuillaume Gomez-13/+94
rustdoc-search: shard the search result descriptions ## Preview This makes no visual changes to rustdoc search. It's a pure perf improvement. <details><summary>old</summary> Preview: <http://notriddle.com/rustdoc-html-demo-10/doc/std/index.html?search=vec> WebPageTest Comparison with before branch on a sort of worst case (searching `vec`, winds up downloading most of the shards anyway): <https://www.webpagetest.org/video/compare.php?tests=240317_AiDc61_2EM,240317_AiDcM0_2EN> Waterfall diagram: ![image](https://github.com/rust-lang/rust/assets/1593513/39548f0c-7ad6-411b-abf8-f6668ff4da18) </details> Preview: <http://notriddle.com/rustdoc-html-demo-10/doc2/std/index.html?search=vec> WebPageTest Comparison with before branch on a sort of worst case (searching `vec`, winds up downloading most of the shards anyway): <https://www.webpagetest.org/video/compare.php?tests=240322_BiDcCH_13R,240322_AiDcJY_104> ![image](https://github.com/rust-lang/rust/assets/1593513/4be1f9ff-c3ff-4b96-8f5b-b264df2e662d) ## Description r? `@GuillaumeGomez` The descriptions are, on almost all crates[^1], the majority of the size of the search index, even though they aren't really used for searching. This makes it relatively easy to separate them into their own files. Additionally, this PR pulls out information about whether there's a description into a bitmap. This allows us to sort, truncate, *then* download. This PR also bumps us to ES8. Out of the browsers we support, all of them support async functions according to caniuse. https://caniuse.com/async-functions [^1]: <https://microsoft.github.io/windows-docs-rs/>, a crate with 44MiB of pure names and no descriptions for them, is an outlier and should not be counted. But this PR should improve it, by replacing a long line of empty strings with a compressed bitmap with a single Run section. Just not very much. ## Detailed sizes ```console $ cat test.sh set -ex cp ../search-index*.js search-index.js awk 'FNR==NR {a++;next} FNR<a-3' search-index.js{,} | awk 'NR>1 {gsub(/\],\\$/,""); gsub(/^\["[^"]+",/,""); print} {next}' | sed -E "s:\\\\':':g" > search-index.json jq -c '.t' search-index.json > t.json jq -c '.n' search-index.json > n.json jq -c '.q' search-index.json > q.json jq -c '.D' search-index.json > D.json jq -c '.e' search-index.json > e.json jq -c '.i' search-index.json > i.json jq -c '.f' search-index.json > f.json jq -c '.c' search-index.json > c.json jq -c '.p' search-index.json > p.json jq -c '.a' search-index.json > a.json du -hs t.json n.json q.json D.json e.json i.json f.json c.json p.json a.json $ bash test.sh + cp ../search-index1.78.0.js search-index.js + awk 'FNR==NR {a++;next} FNR<a-3' search-index.js search-index.js + awk 'NR>1 {gsub(/\],\\$/,""); gsub(/^\["[^"]+",/,""); print} {next}' + sed -E 's:\\'\'':'\'':g' + jq -c .t search-index.json + jq -c .n search-index.json + jq -c .q search-index.json + jq -c .D search-index.json + jq -c .e search-index.json + jq -c .i search-index.json + jq -c .f search-index.json + jq -c .c search-index.json + jq -c .p search-index.json + jq -c .a search-index.json + du -hs t.json n.json q.json D.json e.json i.json f.json c.json p.json a.json 64K t.json 800K n.json 8.0K q.json 4.0K D.json 16K e.json 192K i.json 544K f.json 4.0K c.json 36K p.json 20K a.json ``` These are, roughly, the size of each section in the standard library (this tool actually excludes libtest, for parsing-json-with-awk reasons, but libtest is tiny so it's probably not important). t = item type, like "struct", "free fn", or "type alias". Since one byte is used for every item, this implies that there are approximately 64 thousand items in the standard library. n = name, and that's now the largest section of the search index with the descriptions removed from it q = parent *module* path, stored parallel to the items within D = the size of each description shard, stored as vlq hex numbers e = empty description bit flags, stored as a roaring bitmap i = parent *type* index as a link into `p`, stored as decimal json numbers; used only for associated types; might want to switch to vlq hex, since that's shorter, but that would be a separate pr f = function signature, stored as lists of lists that index into `p` c = deprecation flag, stored as a roaring bitmap p = parent *type*, stored separately and linked into from `i` and `f` a = alias, as [[key, value]] pairs ## Search performance http://notriddle.com/rustdoc-html-demo-11/perf-shard/index.html For example, in stm32f4: <table><thead><tr><th>before<th>after</tr></thead> <tbody><tr><td> ``` Testing T -> U ... in_args = 0, returned = 0, others = 200 wall time = 617 Testing T, U ... in_args = 0, returned = 0, others = 200 wall time = 198 Testing T -> T ... in_args = 0, returned = 0, others = 200 wall time = 282 Testing crc32 ... in_args = 0, returned = 0, others = 0 wall time = 426 Testing spi::pac ... in_args = 0, returned = 0, others = 0 wall time = 673 ``` </td><td> ``` Testing T -> U ... in_args = 0, returned = 0, others = 200 wall time = 716 Testing T, U ... in_args = 0, returned = 0, others = 200 wall time = 207 Testing T -> T ... in_args = 0, returned = 0, others = 200 wall time = 289 Testing crc32 ... in_args = 0, returned = 0, others = 0 wall time = 418 Testing spi::pac ... in_args = 0, returned = 0, others = 0 wall time = 687 ``` </td></tr><tr><td> ``` user: 005.345 s sys: 002.955 s wall: 006.899 s child_RSS_high: 583664 KiB group_mem_high: 557876 KiB ``` </td><td> ``` user: 004.652 s sys: 000.565 s wall: 003.865 s child_RSS_high: 538696 KiB group_mem_high: 511724 KiB ``` </td></tr> </table> This perf tester is janky and unscientific enough that the apparent differences might just be noise. If it's not an order of magnitude, it's probably not real. ## Future possibilities * Currently, results are not shown until the descriptions are downloaded. Theoretically, the description-less results could be shown. But actually doing that, and making sure it works properly, would require extra work (we have to be careful to avoid layout jumps). * More than just descriptions can be sharded this way. But we have to be careful to make sure the size wins are worth the round trips. Ideally, data that’s needed only for display should be sharded while data needed for search isn’t. * [Full text search](https://internals.rust-lang.org/t/full-text-search-for-rustdoc-and-doc-rs/20427) also needs this kind of infrastructure. A good implementation might store a compressed bloom filter in the search index, then download the full keyword in shards. But, we have to be careful not just of the amount readers have to download, but also of the amount that [publishers](https://gist.github.com/notriddle/c289e77f3ed469d1c0238d1d135d49e1) have to store.
2024-03-27Add rustdoc hackOli Scherer-2/+3
2024-03-27Remove `DefId`'s `Partial/Ord` implsOli Scherer-2/+2
2024-03-22rustdoc-search: address nitsMichael Howell-259/+42
2024-03-21rustdoc-search: compressed bitmap to sort, then load descMichael Howell-7/+226
This adds a bit more data than "pure sharding" by including information about which items have no description at all. This way, it can sort the results, then truncate, then finally download the description. With the "e" bitmap: 2380KiB Without the "e" bitmap: 2364KiB
2024-03-16rustdoc-search: shard the search result descriptionsMichael Howell-11/+90
The descriptions are, on almost all crates[^1], the majority of the size of the search index, even though they aren't really used for searching. This makes it relatively easy to separate them into their own files. This commit also bumps us to ES8. Out of the browsers we support, all of them support async functions according to caniuse. https://caniuse.com/async-functions [^1]: <https://microsoft.github.io/windows-docs-rs/>, a crate with 44MiB of pure names and no descriptions for them, is an outlier and should not be counted.
2024-03-11rustdoc-search: parse and search with ML-style HOFMichael Howell-1/+39
Option::map, for example, looks like this: option<t>, (t -> u) -> option<u> This syntax searches all of the HOFs in Rust: traits Fn, FnOnce, and FnMut, and bare fn primitives.
2024-01-06Rollup merge of #118194 - notriddle:notriddle/tuple-unit, r=GuillaumeGomezMatthias Krüger-0/+3
rustdoc: search for tuples and unit by type with `()` This feature extends rustdoc to support the syntax that most users will naturally attempt to use to search for tuples. Part of https://github.com/rust-lang/rust/issues/60485 Function signature searches already support tuples and unit. The explicit name `primitive:tuple` and `primitive:unit` can be used to match a tuple or unit, while `()` will match either one. It also follows the direction set by the actual language for parens as a group, so `(u8,)` will only match a tuple, while `(u8)` will match a plain, unwrapped byte—thanks to loose search semantics, it will also match the tuple. ## Preview * [`option<t>, option<u> -> (t, u)`](<https://notriddle.com/rustdoc-html-demo-5/tuple-unit/std/index.html?search=option%3Ct%3E%2C option%3Cu%3E -%3E (t%2C u)>) * [`[t] -> (t,)`](<https://notriddle.com/rustdoc-html-demo-5/tuple-unit/std/index.html?search=[t] -%3E (t%2C)>) * [`(ipaddr,) -> socketaddr`](<https://notriddle.com/rustdoc-html-demo-5/tuple-unit/std/index.html?search=(ipaddr%2C) -%3E socketaddr>) ## Motivation When type-based search was first landed, it was directly [described as incomplete][a comment]. [a comment]: https://github.com/rust-lang/rust/pull/23289#issuecomment-79437386 Filling out the missing functionality is going to mean adding support for more of Rust's [type expression] syntax, such as tuples (in this PR), references, raw pointers, function pointers, and closures. [type expression]: https://doc.rust-lang.org/reference/types.html#type-expressions There does seem to be demand for this sort of thing, such as [this Discord message](https://discord.com/channels/442252698964721669/443150878111694848/1042145740065099796) expressing regret at rustdoc not supporting tuples in search queries. ## Reference description (from the Rustdoc book) <table> <thead> <tr> <th>Shorthand</th> <th>Explicit names</th> </tr> </thead> <tbody> <tr><td colspan="2">Before this PR</td></tr> <tr> <td><code>[]</code></td> <td><code>primitive:slice</code> and/or <code>primitive:array</code></td> </tr> <tr> <td><code>[T]</code></td> <td><code>primitive:slice&lt;T&gt;</code> and/or <code>primitive:array&lt;T&gt;</code></td> </tr> <tr> <td><code>!</code></td> <td><code>primitive:never</code></td> </tr> <tr><td colspan="2">After this PR</td></tr> <tr> <td><code>()</code></td> <td><code>primitive:unit</code> and/or <code>primitive:tuple</code></td> </tr> <tr> <td><code>(T)</code></td> <td><code>T</code></td> </tr> <tr> <td><code>(T,)</code></td> <td><code>primitive:tuple&lt;T&gt;</code></td> </tr> </tbody> </table> A single type expression wrapped in parens is the same as that type expression, since parens act as the grouping operator. If they're empty, though, they will match both `unit` and `tuple`, and if there's more than one type (or a trailing or leading comma) it is the same as `primitive:tuple<...>`. However, since items can be left out of the query, `(T)` will still return results for types that match tuples, even though it also matches the type on its own. That is, `(u32)` matches `(u32,)` for the exact same reason that it also matches `Result<u32, Error>`. ## Future direction The [type expression grammar](https://doc.rust-lang.org/reference/types.html#type-expressions) from the Reference is given below: <pre><code>Syntax Type : TypeNoBounds | <a href="https://doc.rust-lang.org/reference/types/impl-trait.html">ImplTraitType</a> | <a href="https://doc.rust-lang.org/reference/types/trait-object.html">TraitObjectType</a> <br> TypeNoBounds : <a href="https://doc.rust-lang.org/reference/types.html#parenthesized-types">ParenthesizedType</a> | <a href="https://doc.rust-lang.org/reference/types/impl-trait.html">ImplTraitTypeOneBound</a> | <a href="https://doc.rust-lang.org/reference/types/trait-object.html">TraitObjectTypeOneBound</a> | <a href="https://doc.rust-lang.org/reference/paths.html#paths-in-types">TypePath</a> | <a href="https://doc.rust-lang.org/reference/types/tuple.html#tuple-types">TupleType</a> | <a href="https://doc.rust-lang.org/reference/types/never.html">NeverType</a> | <a href="https://doc.rust-lang.org/reference/types/pointer.html#raw-pointers-const-and-mut">RawPointerType</a> | <a href="https://doc.rust-lang.org/reference/types/pointer.html#shared-references-">ReferenceType</a> | <a href="https://doc.rust-lang.org/reference/types/array.html">ArrayType</a> | <a href="https://doc.rust-lang.org/reference/types/slice.html">SliceType</a> | <a href="https://doc.rust-lang.org/reference/types/inferred.html">InferredType</a> | <a href="https://doc.rust-lang.org/reference/paths.html#qualified-paths">QualifiedPathInType</a> | <a href="https://doc.rust-lang.org/reference/types/function-pointer.html">BareFunctionType</a> | <a href="https://doc.rust-lang.org/reference/macros.html#macro-invocation">MacroInvocation</a> </code></pre> ImplTraitType and TraitObjectType (and ImplTraitTypeOneBound and TraitObjectTypeOneBound) are not yet implemented. They would mostly desugar to `trait:`, similarly to how `!` desugars to `primitive:never`. ParenthesizedType and TuplePath are added in this PR. TypePath is already implemented (except const generics, which is not planned, and function-like trait syntax, which is planned as part of closure support). NeverType is already implemented. RawPointerType and ReferenceType require parsing and fixes to the search index to store this information, but otherwise their behavior seems simple enough. Just like tuples and slices, `&T` would be equivalent to `primitive:reference<T>`, `&mut T` would be equivalent to `primitive:reference<keyword:mut, T>`, `*T` would be equivalent to `primitive:pointer<T>`, `*mut T` would be equivalent to `primitive:pointer<keyword:mut, T>`, and `*const T` would be equivalent to `primitive:pointer<keyword:const, T>`. Lifetime generics support is not planned, because lifetime subtyping seems too complicated. ArrayType is subsumed by SliceType right now. Implementing const generics is not planned, because it seems like it would require a lot of implementation complexity for not much gain. InferredType isn't really covered right now. Its semantics in a search context are not obvious. QualifiedPathInType is not implemented, and it is not planned. I would need a use case to justify it, and act as a guide for what the exact semantics should be. BareFunctionType is not implemented. Along with function-like trait syntax, which is formally considered a TypePath, it's the biggest missing feature to be able to do structured searches over generic APIs like `Option`. MacroInvocation is not parsed (macro names are, but they don't mean the same thing here at all). Those are gone by the time Rustdoc sees the source code.
2023-12-31rustdoc-search: tighter encoding for `f` indexMichael Howell-22/+7
Two optimizations for the function signature search: * Instead of using JSON arrays, like `[1,20]`, it uses VLQ hex with no commas, like `[aAd]`. * This also adds backrefs: if you have more than one function with exactly the same signature, it'll not only store it once, it'll *decode* it once, and store in the typeIdMap only once. Size change ----------- standard library ```console $ du -bs search-index-old.js search-index-new.js 4976370 search-index-old.js 4404391 search-index-new.js ``` ((4976370-4404391)/4404391)*100% = 12.9% Benchmarks are similarly shrunk: ```console $ du -hs tmp/{arti,cortex-m,sqlx,stm32f4,ripgrep}/toolchain_{old,new}/doc/search-index.js 10555067 tmp/arti/toolchain_old/doc/search-index.js 8921236 tmp/arti/toolchain_new/doc/search-index.js 77018 tmp/cortex-m/toolchain_old/doc/search-index.js 66676 tmp/cortex-m/toolchain_new/doc/search-index.js 2876330 tmp/sqlx/toolchain_old/doc/search-index.js 2436812 tmp/sqlx/toolchain_new/doc/search-index.js 63632890 tmp/stm32f4/toolchain_old/doc/search-index.js 52337438 tmp/stm32f4/toolchain_new/doc/search-index.js 631150 tmp/ripgrep/toolchain_old/doc/search-index.js 541646 tmp/ripgrep/toolchain_new/doc/search-index.js ```
2023-12-26rustdoc: search for tuples and unit by type with `()`Michael Howell-0/+3
2023-12-14Use Map instead of Object for source files and search indexGuillaume Gomez-1/+1
2023-11-19rustdoc-search: add support for associated typesMichael Howell-60/+362
2023-11-15Re-format code with new rustfmtMark Rousskov-4/+5
2023-09-21rustdoc-search: add impl disambiguator to duplicate assoc itemsMichael Howell-2/+31
Helps with #90929 This changes the search results, specifically, when there's more than one impl with an associated item with the same name. For example, the search queries `simd<i8> -> simd<i8>` and `simd<i64> -> simd<i64>` don't link to the same function, but most of the functions have the same names. This change should probably be FCP-ed, especially since it adds a new anchor link format for `main.js` to handle, so that URLs like `struct.Vec.html#impl-AsMut<[T]>-for-Vec<T,+A>/method.as_mut` redirect to `struct.Vec.html#method.as_mut-2`. It's a strange design, but there are a few reasons for it: * I'd like to avoid making the HTML bigger. Obviously, fixing this bug is going to add at least a little more data to the search index, but adding more HTML penalises viewers for the benefit of searchers. * Breaking `struct.Vec.html#method.len` would also be a disappointment. On the other hand: * The path-style anchors might be less prone to link rot than the numbered anchors. It's definitely less likely to have URLs that appear to "work", but silently point at the wrong thing. * This commit arranges the path-style anchor to redirect to the numbered anchor. Nothing stops rustdoc from doing the opposite, making path-style anchors the default and redirecting the "legacy" numbered ones.
2023-09-03rustdoc-search: add support for type parametersMichael Howell-109/+125
When writing a type-driven search query in rustdoc, specifically one with more than one query element, non-existent types become generic parameters instead of auto-correcting (which is currently only done for single-element queries) or giving no result. You can also force a generic type parameter by writing `generic:T` (and can force it to not use a generic type parameter with something like `struct:T` or whatever, though if this happens it means the thing you're looking for doesn't exist and will give you no results). There is no syntax provided for specifying type constraints for generic type parameters. When you have a generic type parameter in a search query, it will only match up with generic type parameters in the actual function, not concrete types that match, not concrete types that implement a trait. It also strictly matches based on when they're the same or different, so `option<T>, option<U> -> option<U>` matches `Option::and`, but not `Option::or`. Similarly, `option<T>, option<T> -> option<T>`` matches `Option::or`, but not `Option::and`.
2023-09-02Correctly handle paths from foreign itemsGuillaume Gomez-11/+46
2023-09-01Merge all loops into one when generating search indexGuillaume Gomez-92/+65
2023-09-01[rustdoc] Fix path in type-based searchGuillaume Gomez-13/+50
2023-05-30rustdoc: simplify `clean` by removing `FnRetTy`Michael Howell-18/+5
The default fn ret ty is always unit. Just use that. Looking back at the time when `FnRetTy` (then called `FunctionRetTy`) was first added to rustdoc, it seems to originally be there because `-> !` was a special form: the never type didn't exist back then. https://github.com/rust-lang/rust/commit/eb01b17b06eb35542bb80ff7456043b0ed5572ba#diff-384affc1b4190940f114f3fcebbf969e7e18657a71ef9001da6b223a036687d9L921-L924
2023-05-22rustdoc: Cleanup doc string collapsingVadim Petrochenkov-7/+3
2023-04-24rustdoc-search: add slices and arrays to indexMichael Howell-2/+28
This indexes them as primitives with generics, so `slice<u32>` is how you search for `[u32]`, and `array<u32>` for `[u32; 1]`. A future commit will desugar the square bracket syntax to search both arrays and slices at once.
2023-04-12remove some unneeded importsKaDiWa-1/+1
2023-03-16Rollup merge of #108875 - notriddle:notriddle/return-trait, r=GuillaumeGomezMatthias Krüger-2/+7
rustdoc: fix type search for `Option` combinators
2023-03-11Rollup merge of #107629 - pitaj:rustdoc-search-deprecated, r=jshaMatthias Krüger-1/+22
rustdoc: sort deprecated items lower in search closes #98759 ### Screenshots `i32::MAX` show sup above `std::i32::MAX` and `core::i32::MAX` ![image](https://user-images.githubusercontent.com/803701/216725619-40afb7b0-e984-4a2e-ab5b-a95b24736b0e.png) If just searching for `min`, the deprecated results show up far below other things: ![image](https://user-images.githubusercontent.com/803701/216725672-e4325d37-9bfe-47eb-a1fe-0e57092aa811.png) one page later ![image](https://user-images.githubusercontent.com/803701/216725932-cd1c4a42-d527-44fb-a4ab-5a6d243659cc.png) ~~And, as you can see, the "Deprecation planned" message shows up in the search results. The same is true for fully-deprecated items like `mem::uninitialized`: ![image](https://user-images.githubusercontent.com/803701/216726268-1657e77a-563f-45a0-85a7-3a0cf4d66d6f.png)~~ Edit: the deprecation message change was removed from this PR. Only the sorting is changed.
2023-03-10rustdoc: sort deprecated items lower in searchPeter Jaszkowiak-1/+22
serialize `q` (`itemPaths`) sparsely overall 4% reduction in search index size
2023-03-07rustdoc: fix type search when more than one `where` clause appliesMichael Howell-1/+1
2023-03-07rustdoc: fix type search index for `fn<T>() -> &T where T: Trait`Michael Howell-1/+6
2023-03-04rustdoc: function signature search with traits in `where` clauseMichael Howell-21/+13
2023-02-13rustdoc: use a string with one-character codes for search index typesMichael Howell-1/+10
$ wc -c search-index.old.js search-index.new.js 3940530 search-index.old.js 3843222 search-index.new.js ((3940530-3843222)/3940530)*100 = 2.47% $ wc -c search-index.old.js.gz search-index.new.js.gz 380251 search-index.old.js.gz 379434 search-index.new.js.gz ((380251-379434)/380251)*100 = 0.214%
2023-01-15rustdoc: simplify some & ref erencesMatthias Krüger-2/+2
2023-01-13CrateData: don't allocate String when Serialize, &str is enoughklensy-1/+1
2023-01-13IndexItem.name String -> Symbolklensy-4/+4
2023-01-10Remove unneeded ItemId::Primitive variantGuillaume Gomez-20/+55
2022-12-29Fix index out of bounds issues in rustdocyukang-2/+1
2022-08-13avoid cloning and then iteratingKaDiWa-4/+9
2022-06-27Add comments, fixes for `0` sentinelMichael Howell-1/+27
2022-06-24Fix rustdoc under `#[no_core]`Michael Howell-28/+32
2022-06-24rustdoc: reference function signature types from the `p` arrayMichael Howell-51/+102
This reduces the size of the function signature index, because it's common to have many functions that operate on the same types. $ wc -c search-index-old.js search-index-new.js 5224374 search-index-old.js 3932314 search-index-new.js By my math, this reduces the uncompressed size of the search index by 32%. On compressed signatures, the wins are less drastic, a mere 8%: $ wc -c search-index-old.js.gz search-index-new.js.gz 404532 search-index-old.js.gz 371635 search-index-new.js.gz
2022-05-31rustdoc: also index raw pointersMichael Howell-2/+3
Co-authored-by: Noah Lev <camelidcamel@gmail.com>
2022-05-31rustdoc: also index impl traitMichael Howell-6/+24
2022-05-27cleanup librustdoc by making parent stack store itemsMichael Howell-15/+2
2022-05-26rustdoc: factor orphan impl items into an actual structMichael Howell-5/+7
2022-05-25rustdoc: include impl generics / self in search indexMichael Howell-13/+63
2022-05-24fix clippy perf lintsklensy-2/+2
2022-05-24fix simple clippy lintsklensy-1/+1
2022-05-21Remove `crate` visibility modifier in libs, testsJacob Pratt-2/+6