| author | bors <bors@rust-lang.org> | 2023-03-24 10:33:42 +0000 |
|---|---|---|
| committer | bors <bors@rust-lang.org> | 2023-03-24 10:33:42 +0000 |
| commit | f421586eed77de266a3f99ffa8a5687b7d2d893c (patch) | |
| tree | 75671286cb63e9b0b93b6cc55510d2a50c32af5f /compiler/rustc_llvm/llvm-wrapper/RustWrapper.cpp | |
| parent | c763eceae349c1d827d9cfbf5df21ca40b21c861 (diff) | |
| parent | 54f55efb9a147e8a7b5073d24c0cc67f0aad5a13 (diff) | |
Auto merge of #109216 - martingms:unicode-case-lut-shrink, r=Mark-Simulacrum
Shrink unicode case-mapping LUTs by 24k

I was looking into the binary bloat of a small program using `str::to_lowercase` and `str::to_uppercase`, and noticed that the lookup tables used for case mapping had a lot of zero bytes in them. The reason is that some characters map to up to three other characters when lowercased or uppercased, so the LUTs store a `[char; 3]` for each character. However, the vast majority of characters map to a single new character; in other words, most of the entries look like `(lowerc, [upperc, '\0', '\0'])`.

This PR introduces a new encoding scheme for these tables. The change reduces the size of my test binary by about 24K.

I've also done some `#[bench]`marks on unicode-heavy test data, and found that the performance of both `str::to_lowercase` and `str::to_uppercase` improves by up to 20%. These measurements are obviously very dependent on the character distribution of the data.

Someone else will have to decide whether this more complex scheme is worth it or not; I was just goofing around a bit and here's what came out of it :man_shrugging: No hard feelings if this isn't wanted!
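The message does not spell out the new encoding, so the following is only a sketch of one way such a table could be packed, not necessarily what the PR does. It assumes each entry stores a `u32` instead of a `[char; 3]`: if the value is a valid `char`, it is the single mapped character; otherwise it carries an index into a small side table of multi-character mappings. The names `INDEX_FLAG`, `UPPER_TABLE`, `UPPER_MULTI`, and `to_upper` are hypothetical, and the tables hold only a few illustrative entries.

```rust
// Hypothetical flag bit. It lies above char::MAX (0x10FFFF), so any value
// with this bit set can never be a valid `char`.
const INDEX_FLAG: u32 = 0x40_0000;

// Illustrative main table, sorted by source character for binary search.
// Each entry is (char, u32): 8 bytes instead of 16 for (char, [char; 3]).
static UPPER_TABLE: &[(char, u32)] = &[
    ('a', 'A' as u32),
    ('b', 'B' as u32),
    ('ß', INDEX_FLAG),     // index 0 in UPPER_MULTI
    ('ŉ', INDEX_FLAG | 1), // index 1 in UPPER_MULTI
];

// Illustrative side table for the rare multi-character mappings.
static UPPER_MULTI: &[[char; 3]] = &[
    ['S', 'S', '\0'],
    ['ʼ', 'N', '\0'],
];

/// Looks up the uppercase mapping of `c` in the sketch tables.
/// Characters without an entry map to themselves.
fn to_upper(c: char) -> [char; 3] {
    match UPPER_TABLE.binary_search_by(|&(key, _)| key.cmp(&c)) {
        Ok(i) => {
            let v = UPPER_TABLE[i].1;
            match char::from_u32(v) {
                // Common case: the stored value is the mapped character itself.
                Some(single) => [single, '\0', '\0'],
                // Rare case: not a valid char, so the low bits are an index.
                None => UPPER_MULTI[(v & (INDEX_FLAG - 1)) as usize],
            }
        }
        Err(_) => [c, '\0', '\0'],
    }
}
```

Under these assumptions, the zero padding is paid only by the few entries in the side table, while every entry in the main table shrinks from 16 bytes to 8.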
Diffstat (limited to 'compiler/rustc_llvm/llvm-wrapper/RustWrapper.cpp')
0 files changed, 0 insertions, 0 deletions
