about summary refs log tree commit diff
path: root/compiler/rustc_data_structures/src
diff options
context:
space:
mode:
authorbors <bors@rust-lang.org>2021-03-07 23:45:57 +0000
committerbors <bors@rust-lang.org>2021-03-07 23:45:57 +0000
commit76c500ec6c36fa8287317d6dc342a64c079301de (patch)
treef80d97279a800721b171c03b0672243a79c76f7f /compiler/rustc_data_structures/src
parent234781afe33d3f339b002f85f948046d8476cfc9 (diff)
parent9e5054498b181fc3984e266d1aa05f076dfea22f (diff)
downloadrust-76c500ec6c36fa8287317d6dc342a64c079301de.tar.gz
rust-76c500ec6c36fa8287317d6dc342a64c079301de.zip
Auto merge of #81635 - michaelwoerister:structured_def_path_hash, r=pnkfelix
Let a portion of DefPathHash uniquely identify the DefPath's crate.

This allows to directly map from a `DefPathHash` to the crate it originates from, without constructing side tables to do that mapping -- something that is useful for incremental compilation where we deal with `DefPathHash` instead of `DefId` a lot.

It also allows to reliably and cheaply check for `DefPathHash` collisions which allows the compiler to gracefully abort compilation instead of running into a subsequent ICE at some random place in the code.

The following new piece of documentation describes the most interesting aspects of the changes:

```rust
/// A `DefPathHash` is a fixed-size representation of a `DefPath` that is
/// stable across crate and compilation session boundaries. It consists of two
/// separate 64-bit hashes. The first uniquely identifies the crate this
/// `DefPathHash` originates from (see [StableCrateId]), and the second
/// uniquely identifies the corresponding `DefPath` within that crate. Together
/// they form a unique identifier within an entire crate graph.
///
/// There is a very small chance of hash collisions, which would mean that two
/// different `DefPath`s map to the same `DefPathHash`. Proceeding compilation
/// with such a hash collision would very probably lead to an ICE and, in the
/// worst case, to a silent mis-compilation. The compiler therefore actively
/// and exhaustively checks for such hash collisions and aborts compilation if
/// it finds one.
///
/// `DefPathHash` uses 64-bit hashes for both the crate-id part and the
/// crate-internal part, even though it is likely that there are many more
/// `LocalDefId`s in a single crate than there are individual crates in a crate
/// graph. Since we use the same number of bits in both cases, the collision
/// probability for the crate-local part will be quite a bit higher (though
/// still very small).
///
/// This imbalance is not by accident: A hash collision in the
/// crate-local part of a `DefPathHash` will be detected and reported while
/// compiling the crate in question. Such a collision does not depend on
/// outside factors and can be easily fixed by the crate maintainer (e.g. by
/// renaming the item in question or by bumping the crate version in a harmless
/// way).
///
/// A collision between crate-id hashes on the other hand is harder to fix
/// because it depends on the set of crates in the entire crate graph of a
/// compilation session. Again, using the same crate with a different version
/// number would fix the issue with a high probability -- but that might be
/// easier said then done if the crates in questions are dependencies of
/// third-party crates.
///
/// That being said, given a high quality hash function, the collision
/// probabilities in question are very small. For example, for a big crate like
/// `rustc_middle` (with ~50000 `LocalDefId`s as of the time of writing) there
/// is a probability of roughly 1 in 14,750,000,000 of a crate-internal
/// collision occurring. For a big crate graph with 1000 crates in it, there is
/// a probability of 1 in 36,890,000,000,000 of a `StableCrateId` collision.
```

Given the probabilities involved I hope that no one will ever actually see the error messages. Nonetheless, I'd be glad about some feedback on how to improve them. Should we create a GH issue describing the problem and possible solutions to point to? Or a page in the rustc book?

r? `@pnkfelix` (feel free to re-assign)
Diffstat (limited to 'compiler/rustc_data_structures/src')
-rw-r--r--compiler/rustc_data_structures/src/fingerprint.rs28
1 files changed, 25 insertions, 3 deletions
diff --git a/compiler/rustc_data_structures/src/fingerprint.rs b/compiler/rustc_data_structures/src/fingerprint.rs
index 08c3419a842..681b49e2ea9 100644
--- a/compiler/rustc_data_structures/src/fingerprint.rs
+++ b/compiler/rustc_data_structures/src/fingerprint.rs
@@ -7,19 +7,30 @@ use std::hash::{Hash, Hasher};
 use std::mem::{self, MaybeUninit};
 
 #[derive(Eq, PartialEq, Ord, PartialOrd, Debug, Clone, Copy)]
+#[repr(C)]
 pub struct Fingerprint(u64, u64);
 
 impl Fingerprint {
     pub const ZERO: Fingerprint = Fingerprint(0, 0);
 
     #[inline]
+    pub fn new(_0: u64, _1: u64) -> Fingerprint {
+        Fingerprint(_0, _1)
+    }
+
+    #[inline]
     pub fn from_smaller_hash(hash: u64) -> Fingerprint {
         Fingerprint(hash, hash)
     }
 
     #[inline]
     pub fn to_smaller_hash(&self) -> u64 {
-        self.0
+        // Even though both halves of the fingerprint are expected to be good
+        // quality hash values, let's still combine the two values because the
+        // Fingerprints in DefPathHash have the StableCrateId portion which is
+        // the same for all DefPathHashes from the same crate. Combining the
+        // two halfs makes sure we get a good quality hash in such cases too.
+        self.0.wrapping_mul(3).wrapping_add(self.1)
     }
 
     #[inline]
@@ -92,8 +103,19 @@ impl<H: Hasher> FingerprintHasher for H {
 impl FingerprintHasher for crate::unhash::Unhasher {
     #[inline]
     fn write_fingerprint(&mut self, fingerprint: &Fingerprint) {
-        // `Unhasher` only wants a single `u64`
-        self.write_u64(fingerprint.0);
+        // Even though both halves of the fingerprint are expected to be good
+        // quality hash values, let's still combine the two values because the
+        // Fingerprints in DefPathHash have the StableCrateId portion which is
+        // the same for all DefPathHashes from the same crate. Combining the
+        // two halfs makes sure we get a good quality hash in such cases too.
+        //
+        // Since `Unhasher` is used only in the context of HashMaps, it is OK
+        // to combine the two components in an order-independent way (which is
+        // cheaper than the more robust Fingerprint::to_smaller_hash()). For
+        // HashMaps we don't really care if Fingerprint(x,y) and
+        // Fingerprint(y, x) result in the same hash value. Collision
+        // probability will still be much better than with FxHash.
+        self.write_u64(fingerprint.0.wrapping_add(fingerprint.1));
     }
 }