|
Compress amount of hashed bytes for `isize` values in StableHasher
This is another attempt to land https://github.com/rust-lang/rust/pull/92103, this time hopefully with a correct implementation w.r.t. stable hashing guarantees. The previous PR was [reverted](https://github.com/rust-lang/rust/pull/93014) because it could produce the [same hash](https://github.com/rust-lang/rust/pull/92103#issuecomment-1014625442) for different values even in quite simple situations. I have since added a basic [test](https://github.com/rust-lang/rust/pull/93193) that should guard against that situation, I also added a new test in this PR, specialised for this optimization.
## Why this optimization helps
Since the original PR, I have tried to analyze why this optimization even helps (and why it especially helps for `clap`). I found that the vast majority of stable-hashing `i64` actually comes from hashing `isize` (which is converted to `i64` in the stable hasher). I only found a single place where is this datatype used directly in the compiler, and this place has also been showing up in traces that I used to find out when is `isize` being hashed. This place is `rustc_span::FileName::DocTest`, however, I suppose that isizes also come from other places, but they might not be so easy to find (there were some other entries in the trace). `clap` hashes about 8.5 million `isize`s, and all of them fit into a single byte, which is why this optimization has helped it [quite a lot](https://github.com/rust-lang/rust/pull/92103#issuecomment-1005711861).
Now, I'm not sure if special casing `isize` is the correct solution here, maybe something could be done with that `isize` inside `DocTest` or in other places, but that's for another discussion I suppose. In this PR, instead of hardcoding a special case inside `SipHasher128`, I instead put it into `StableHasher`, and only used it for `isize` (I tested that for `i64` it doesn't help, or at least not for `clap` and other few benchmarks that I was testing).
## New approach
Since the most common case is a single byte, I added a fast path for hashing `isize` values which positive value fits within a single byte, and a cold path for the rest of the values.
To avoid the previous correctness problem, we need to make sure that each unique `isize` value will produce a unique hash stream to the hasher. By hash stream I mean a sequence of bytes that will be hashed (a different sequence should produce a different hash, but that is of course not guaranteed).
We have to distinguish different values that produce the same bit pattern when we combine them. For example, if we just simply skipped the leading zero bytes for values that fit within a single byte, `(0xFF, 0xFFFFFFFFFFFFFFFF)` and `(0xFFFFFFFFFFFFFFFF, 0xFF)` would send the same hash stream to the hasher, which must not happen.
To avoid this situation, values `[0, 0xFE]` are hashed as a single byte. When we hash a larger (treating `isize` as `u64`) value, we first hash an additional byte `0xFF`. Since `0xFF` cannot occur when we apply the single byte optimization, we guarantee that the hash streams will be unique when hashing two values `(a, b)` and `(b, a)` if `a != b`:
1) When both `a` and `b` are within `[0, 0xFE]`, their hash streams will be different.
2) When neither `a` and `b` are within `[0, 0xFE]`, their hash streams will be different.
3) When `a` is within `[0, 0xFE]` and `b` isn't, when we hash `(a, b)`, the hash stream will definitely not begin with `0xFF`. When we hash `(b, a)`, the hash stream will definitely begin with `0xFF`. Therefore the hash streams will be different.
r? `@the8472`
|
|
debugger
There are several cases where names of types and functions in the debug info are either ambiguous, or not helpful, such as including ambiguous placeholders (e.g., `{{impl}}`, `{{closure}}` or `dyn _'`) or dropping qualifications (e.g., for dynamic types).
Instead, each debug symbol name should be unique and useful:
* Include disambiguators for anonymous `DefPathDataName` (closures and generators), and unify their formatting when used as a path-qualifier vs item being qualified.
* Qualify the principal trait for dynamic types.
* If there is no principal trait for a dynamic type, emit all other traits instead.
* Respect the `qualified` argument when emitting ref and pointer types.
* For implementations, emit the disambiguator.
* Print const generics when emitting generic parameters or arguments.
Additionally, when targeting MSVC, its debugger treats many command arguments as C++ expressions, even when the argument is defined to be a symbol name. As such names in the debug info need to be more C++-like to be parsed correctly:
* Avoid characters with special meaning (`#`, `[`, `"`, `+`).
* Never start a name with `<` or `{` as this is treated as an operator.
* `>>` is always treated as a right-shift, even when parsing generic arguments (so add a space to avoid this).
* Emit function declarations using C/C++ style syntax (e.g., leading return type).
* Emit arrays as a synthetic `array$<type, size>` type.
* Include a `$` in all synthetic types as this is a legal character for C++, but not Rust (thus we avoid collisions with user types).
|