about summary refs log tree commit diff
path: root/src/libstd/sys/unix/stack_overflow.rs
diff options
context:
space:
mode:
authorDylan DPC <dylan.dpc@gmail.com>2020-03-28 15:22:00 +0100
committerGitHub <noreply@github.com>2020-03-28 15:22:00 +0100
commit7f1e6261bfd05fb411ff361bf999e48ede6237b6 (patch)
tree40d7dfc1270dde6b529ae14c896a68d1451c2583 /src/libstd/sys/unix/stack_overflow.rs
parentbbd3634f5f22e3e58cc570c37626625ef4109c14 (diff)
parentad679a7f433b60e9d5a7ce5029d50600aa919fd6 (diff)
downloadrust-7f1e6261bfd05fb411ff361bf999e48ede6237b6.tar.gz
rust-7f1e6261bfd05fb411ff361bf999e48ede6237b6.zip
Rollup merge of #70486 - Mark-Simulacrum:unicode-shrink, r=dtolnay
Shrink Unicode tables (even more)

This shrinks the Unicode tables further, building upon the wins in #68232 (the previous counts differ due to an interim Unicode version update, see #69929.

The new data structure is slower by around 3x, on the benchmark of looking up every Unicode scalar value in each data set sequentially in every data set included. Note that for ASCII, the exposed functions on `char` optimize with direct branches, so ASCII will retain the same performance regardless of internal optimizations (or the reverse). Also, note that the size reduction due to the skip list (from where the performance losses come) is around 40%, and, as a result, I believe the performance loss is acceptable, as the routines are still quite fast. Anywhere where this is hot, should probably be using a custom data structure anyway (e.g., a raw bitset) or something optimized for frequently seen values, etc.

This PR updates both the bitset data structure, and introduces a new data structure similar to a skip list. For more details, see the [main.rs] of the table generator, which describes both. The commits mostly work individually and document size wins.

As before, this is tested on all valid chars to have the same results as nightly (and the canonical Unicode data sets), happily, no bugs were found.

[main.rs]: https://github.com/rust-lang/rust/blob/fb4a715e18b/src/tools/unicode-table-generator/src/main.rs

Set             | Previous |  New  |  % of old  | Codepoints | Ranges |
----------------|---------:|------:|-----------:|-----------:|-------:|
Alphabetic      |     3055 |  1599 |        52% |     132875 |    695 |
Case Ignorable  |     2136 |   949 |        44% |       2413 |    410 |
Cased           |      934 |   359 |        38% |       4286 |    141 |
Cc              |       43 |     9 |        20% |         65 |      2 |
Grapheme Extend |     1774 |   813 |        46% |       1979 |    344 |
Lowercase       |      985 |   867 |        88% |       2344 |    652 |
N               |     1266 |   419 |        33% |       1781 |    133 |
Uppercase       |      934 |   777 |        83% |       1911 |    643 |
White_Space     |      140 |    37 |        26% |         25 |     10 |
----------------|----------|-------|------------|------------|--------|
Total           |    11267 |  5829 |        51% |     -      |   -    |
Diffstat (limited to 'src/libstd/sys/unix/stack_overflow.rs')
0 files changed, 0 insertions, 0 deletions