| author | Mark Rousskov <mark.simulacrum@gmail.com> | 2020-03-21 10:16:01 -0400 |
|---|---|---|
| committer | Mark Rousskov <mark.simulacrum@gmail.com> | 2020-03-21 11:22:00 -0400 |
| commit | b0e121d9d588b334eaa1b68a127f5ee0fcda4296 (patch) | |
| tree | d3bc693f5f4e5d894ccfcf1173052c381b8ff0f5 /src/test/codegen/src-hash-algorithm/src-hash-algorithm-md5.rs | |
| parent | 6c7691a37bf485b28fecb6856e6ede8fa952f99e (diff) | |
Shrink bitset words through functional mapping

Previously, all words in the (deduplicated) bitset would be stored raw -- a full 64 bits (8 bytes). Now, those words that are equivalent to others through a specific mapping are stored separately and "mapped" to the original when loading; this shrinks the table sizes significantly, as each mapped word is stored in 2 bytes (a 4x decrease from the previous).

The new encoding is also potentially non-optimal: the "mapped" byte is frequently repeated, as in practice many mapped words use the same base word.

Currently we only support two forms of mapping: rotation and inversion. Note that these are both guaranteed to map transitively if at all, and supporting mappings for which this is not true may require a more interesting algorithm for choosing the optimal pairing.

Updated sizes:

| Property | Size | Change |
|---|---|---|
| Alphabetic | 2622 bytes | -414 bytes |
| Case_Ignorable | 1803 bytes | -330 bytes |
| Cased | 808 bytes | -126 bytes |
| Cc | 32 bytes | |
| Grapheme_Extend | 1508 bytes | -252 bytes |
| Lowercase | 901 bytes | -84 bytes |
| N | 1064 bytes | -156 bytes |
| Uppercase | 838 bytes | -96 bytes |
| White_Space | 91 bytes | -6 bytes |
| Total table sizes | 9667 bytes | -1,464 bytes |
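
To illustrate the idea, here is a minimal Rust sketch of splitting deduplicated words into raw "canonical" words and "mapped" entries that point at a base word. It is not the code from this commit: the names `Mapping`, `Table`, `find_mapping`, and `build` are hypothetical, the greedy first-match pairing is an assumption, and the in-memory `(u8, Mapping)` pair only stands in for the 2-byte encoding described above.

```rust
/// Hypothetical description of how a word is derived from a base word.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mapping {
    /// Rotate the base word left by the given number of bits (1..=63).
    Rotate(u32),
    /// Flip every bit of the base word.
    Invert,
}

impl Mapping {
    /// Reconstruct the original word from its base word.
    fn apply(self, base: u64) -> u64 {
        match self {
            Mapping::Rotate(n) => base.rotate_left(n),
            Mapping::Invert => !base,
        }
    }
}

/// Return a mapping that turns `base` into `target`, if one exists.
fn find_mapping(base: u64, target: u64) -> Option<Mapping> {
    if !base == target {
        return Some(Mapping::Invert);
    }
    // Rotation by 0 is skipped: the input is assumed to be deduplicated.
    (1..64).map(Mapping::Rotate).find(|m| m.apply(base) == target)
}

/// Hypothetical table layout: raw words plus (base index, mapping) pairs.
struct Table {
    canonical: Vec<u64>,
    mapped: Vec<(u8, Mapping)>,
}

impl Table {
    /// Load the `i`-th mapped word by applying its mapping to its base word.
    fn load(&self, i: usize) -> u64 {
        let (base, mapping) = self.mapped[i];
        mapping.apply(self.canonical[base as usize])
    }
}

/// Greedily pair each word with the first earlier canonical word it maps from.
/// Assumes fewer than 256 canonical words, so the base index fits in a `u8`.
fn build(words: &[u64]) -> Table {
    let mut canonical: Vec<u64> = Vec::new();
    let mut mapped = Vec::new();
    for &word in words {
        let hit = canonical
            .iter()
            .enumerate()
            .find_map(|(i, &base)| find_mapping(base, word).map(|m| (i as u8, m)));
        match hit {
            Some(entry) => mapped.push(entry),
            None => canonical.push(word),
        }
    }
    Table { canonical, mapped }
}
```

Calling `build` on `[0xff, !0xff, 0xff00]` keeps only `0xff` as a canonical word and records the other two as an inversion and an 8-bit rotation of it. Both mapped entries share base index 0, which is the repetition the message describes as potentially non-optimal.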
Diffstat (limited to 'src/test/codegen/src-hash-algorithm/src-hash-algorithm-md5.rs')
0 files changed, 0 insertions, 0 deletions
