| Age | Commit message (Collapse) | Author | Lines |
|
The ASCII subset of Unicode is fixed and will never change, so we don't
need to generate tables for it with every new Unicode version. This
saves a few bytes of static data and speeds up `char::is_control` and
`char::is_grapheme_extended` on ASCII inputs.
Since the table lookup functions exported from the `unicode` module will
give nonsensical errors on ASCII input (and in fact will panic in debug
mode), I had to add some private wrapper methods to `char` which check
for ASCII-ness first.
|
|
|
|
r=joshtriplett,tgross35
unicode-table-generator refactors
Split off from https://github.com/rust-lang/rust/pull/145219
|
|
Hard-code `char::is_control`
Split off from https://github.com/rust-lang/rust/pull/145219
According to
https://www.unicode.org/policies/stability_policy.html#Property_Value, the set of codepoints in `Cc` will never change. So we can hard-code the patterns to match against instead of using a table.
This doesn't change the generated assembly, since the lookup table is small enough that[ LLVM is able to inline the whole search](https://godbolt.org/z/bG8dM37YG). But this does reduce the chance of regressions if LLVM's heuristics change in the future, and means less generated Rust code checked in to `unicode-data.rs`.
|
|
According to
https://www.unicode.org/policies/stability_policy.html#Property_Value,
the set of codepoints in `Cc` will never change. So we can hard-code
the patterns to match against instead of using a table.
|
|
Include the sizes of the `to_lowercase` and `to_uppercase` tables in the
total size calculations.
|
|
To make changes in table size obvious from git diffs
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
make char::is_whitespace unstably const
I am adding this to the existing https://github.com/rust-lang/rust/issues/132241 feature gate, since `is_digit` and `is_whitespace` seem similar enough that one can group them together.
|
|
unicode_data.rs: show command for generating file
https://github.com/rust-lang/rust/pull/131647 made this an easily runnable tool, now we just have to mention that in the comment. :)
Fixes https://github.com/rust-lang/rust/issues/131640.
|
|
|
|
|
|
|
|
|
|
some const cleanup: remove unnecessary attributes, add const-hack indications
I learned that we use `FIXME(const-hack)` on top of the "const-hack" label. That seems much better since it marks the right place in the code and moves around with the code. So I went through the PRs with that label and added appropriate FIXMEs in the code. IMO this means we can then remove the label -- Cc ``@rust-lang/wg-const-eval.``
I also noticed some const stability attributes that don't do anything useful, and removed them.
r? ``@fee1-dead``
|
|
|
|
|
|
|
|
`use` declarations will be reformatted in #125443. Very rarely, there is
a desire to force a group of `use` declarations together in a way that
auto-formatting will break up. E.g. when you want a single comment to
apply to a group. #126776 dealt with all of these in the codebase,
ensuring that no comments intended for multiple `use` declarations would
end up in the wrong place. But some people were unhappy with it.
This commit uses `#[rustfmt::skip]` to create these custom `use` groups
in an idiomatic way for a few of the cases changed in #126776. This
works because rustfmt treats any `use` item annotated with
`#[rustfmt::skip]` as a barrier and won't reorder other `use` items
around it.
|
|
There are some comments describing multiple subsequent `use` items. When
the big `use` reformatting happens some of these `use` items will be
reordered, possibly moving them away from the comment. With this
additional level of formatting it's not really feasible to have comments
of this type. This commit removes them in various ways:
- merging separate `use` items when appropriate;
- inserting blank lines between the comment and the first `use` item;
- outright deletion (for comments that are relatively low-value);
- adding a separate "top-level" comment.
We also entirely skip formatting for four library files that contain
nothing but `pub use` re-exports, where reordering would be painful.
|
|
This adds a dedicated check for the lower bound
(if it is outside of ASCII range) to the output of the `unicode-table-generator` tool.
This generalized the ASCII-only fast-path, but only for the `Grapheme_Extend` property for now,
as that is the only one with a lower bound outside of ASCII.
|
|
|
|
|
|
|
|
|
|
The indices are encoded as `u32`s in the range of invalid `char`s, so
that we know that if any mapping fails to parse as a `char` we should
use the value for lookup in the multi-table.
This avoids the second binary search in cases where a multi-`char`
mapping is needed.
Idea from @nikic
|
|
The majority of char case replacements are single char replacements,
so storing them as [char; 3] wastes a lot of space.
This commit splits the replacement tables for both `to_lower` and
`to_upper` into two separate tables, one with single-character mappings
and one with multi-character mappings.
This reduces the binary size for programs using all of these tables
with roughly 24K bytes.
|
|
Since ascii chars are already handled by a special case in the
`to_lower` and `to_upper` functions, there's no need to waste space on
them in the LUTs.
|
|
|
|
|
|
|
|
Implements #101400.
|
|
|
|
|
|
Before, it would enter the full expensive check even for normal ascii
characters. Now, it skips the check for the ascii characters in
`32..127`. This range was checked manually from the current behavior.
|
|
|
|
|
|
This extra check has better performance. See discussion here:
https://internals.rust-lang.org/t/to-upper-speed/13896
|
|
My understanding is that these API are perma unstable, so it doesn't
make sense to pollute docs & IDE completion[1] with them.
[1]: https://github.com/rust-analyzer/rust-analyzer/issues/6738
|
|
|