Shrink bitset words through functional mapping - rust - https://github.com/rust-lang/rust

author	Mark Rousskov <mark.simulacrum@gmail.com>	2020-03-21 10:16:01 -0400
committer	Mark Rousskov <mark.simulacrum@gmail.com>	2020-03-21 11:22:00 -0400
commit	b0e121d9d588b334eaa1b68a127f5ee0fcda4296 (patch)
tree	d3bc693f5f4e5d894ccfcf1173052c381b8ff0f5 /src/test/codegen/src-hash-algorithm/src-hash-algorithm-md5.rs
parent	6c7691a37bf485b28fecb6856e6ede8fa952f99e (diff)

Shrink bitset words through functional mapping

Previously, all words in the (deduplicated) bitset were stored raw -- a full
64 bits (8 bytes). Now, those words that are equivalent to others through a
specific mapping are stored separately and "mapped" to the original when
loading; this shrinks the table sizes significantly, as each mapped word is
stored in 2 bytes (a 4x decrease from the previous 8).
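
As a rough illustration of the loading step described above, here is a minimal
sketch. The names (MappedWord, load_word, INVERT, ROTATE_MASK) and the bit
layout of the mapping byte are assumptions made for this example, not taken
from the commit itself; the only facts relied on are that a mapped word
occupies 2 bytes and that it is reconstructed from a base word when loading.

```rust
/// Hypothetical 2-byte entry for a word that is stored as a mapping of
/// another ("base") word instead of as a raw 8-byte value.
#[derive(Clone, Copy)]
struct MappedWord {
    /// Index of the base word in the table of raw words.
    base: u8,
    /// Assumed layout: bit 6 set means "invert", low 6 bits are the
    /// rotate-left amount (0..=63).
    mapping: u8,
}

const INVERT: u8 = 1 << 6;
const ROTATE_MASK: u8 = (1 << 6) - 1;

/// Returns the word at `idx`. Indices below `canonical.len()` refer to raw
/// words; the remaining indices refer to mapped words, which are rebuilt
/// from their base word on the fly.
fn load_word(canonical: &[u64], mapped: &[MappedWord], idx: usize) -> u64 {
    if let Some(&word) = canonical.get(idx) {
        return word;
    }
    let entry = mapped[idx - canonical.len()];
    let mut word = canonical[entry.base as usize];
    if entry.mapping & INVERT != 0 {
        word = !word;
    }
    // Inversion and rotation commute, so the order of the two steps does
    // not affect the result.
    word.rotate_left(u32::from(entry.mapping & ROTATE_MASK))
}
```

With this layout, a table holding one raw word 0xff and the mapped entry
(base 0, rotate by 8) yields 0xff00 for index 1 while storing only 2 extra
bytes instead of 8.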

The new encoding is itself potentially non-optimal: the "mapped" byte is
frequently repeated, because in practice many mapped words share the same
base word.

Currently we only support two forms of mapping: rotation and inversion. Note
that these are both guaranteed to map transitively if at all, and supporting
mappings for which this is not true may require a more interesting algorithm for
choosing the optimal pairing.
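
To make the two supported mappings concrete, the following sketch searches for
a rotation and/or inversion that turns one word into another. The function
name find_mapping and its return shape are hypothetical; the generator in the
commit may pick mappings differently.

```rust
/// Looks for a mapping from `base` to `target` built from an optional
/// inversion followed by a rotate-left. Returns `(invert, rotation)` for
/// the first match, or `None` if the words are not related this way.
fn find_mapping(base: u64, target: u64) -> Option<(bool, u32)> {
    for &invert in &[false, true] {
        let start = if invert { !base } else { base };
        for rotation in 0..64 {
            if start.rotate_left(rotation) == target {
                return Some((invert, rotation));
            }
        }
    }
    None
}
```

Because both operations are invertible and compose with each other, any word
reachable from a base word can also reach that base word and every other word
in the same group, so which group member is kept as the raw word does not
change which words can be mapped.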

Updated sizes:

Alphabetic     : 2622 bytes     (-  414 bytes)
Case_Ignorable : 1803 bytes     (-  330 bytes)
Cased          : 808 bytes      (-  126 bytes)
Cc             : 32 bytes
Grapheme_Extend: 1508 bytes     (-  252 bytes)
Lowercase      : 901 bytes      (-   84 bytes)
N              : 1064 bytes     (-  156 bytes)
Uppercase      : 838 bytes      (-   96 bytes)
White_Space    : 91 bytes       (-    6 bytes)
Total table sizes: 9667 bytes   (-1,464 bytes)

Diffstat (limited to 'src/test/codegen/src-hash-algorithm/src-hash-algorithm-md5.rs')

0 files changed, 0 insertions, 0 deletions