rust - https://github.com/rust-lang/rust

Age	Commit message	Author	Lines
2025-09-07	optimization: Don't include ASCII characters in Unicode tables	Karl Meakin	-243/+276
	The ASCII subset of Unicode is fixed and will never change, so we don't need to generate tables for it with every new Unicode version. This saves a few bytes of static data and speeds up `char::is_control` and `char::is_grapheme_extended` on ASCII inputs. Since the table lookup functions exported from the `unicode` module will give nonsensical errors on ASCII input (and in fact will panic in debug mode), I had to add some private wrapper methods to `char` which check for ASCII-ness first.
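The wrapper pattern described above can be sketched roughly as follows. This is an illustration only: `grapheme_extend_lookup_non_ascii` and its two-range toy table are hypothetical stand-ins for the generated lookup, not the code in `core`.

```rust
/// Hypothetical stand-in for a generated table lookup that only covers
/// non-ASCII code points. Like the lookups described in the commit, it
/// trips a debug assertion when handed ASCII input.
fn grapheme_extend_lookup_non_ascii(c: char) -> bool {
    debug_assert!(!c.is_ascii(), "table lookup called with ASCII input");
    // Toy table (assumption): two ranges that carry `Grapheme_Extend`,
    // nowhere near the full property.
    const RANGES: &[(u32, u32)] = &[(0x0300, 0x036F), (0x200C, 0x200C)];
    let cp = c as u32;
    RANGES.iter().any(|&(lo, hi)| lo <= cp && cp <= hi)
}

/// Wrapper that answers for ASCII directly and only consults the table
/// for everything else.
fn is_grapheme_extended(c: char) -> bool {
    !c.is_ascii() && grapheme_extend_lookup_non_ascii(c)
}
```

Because `&&` short-circuits, the table function is never reached for ASCII input, so the debug assertion cannot fire through the wrapper.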
2025-09-05	change file-is-generated doc comment to inner	Marijn Schouten	-1/+1

2025-09-03	Rollup merge of #145414 - Kmeakin:km/unicode-table-refactors, r=joshtriplett,tgross35	Stuart Cook	-4/+16
	unicode-table-generator refactors. Split off from https://github.com/rust-lang/rust/pull/145219
2025-08-30	Auto merge of #145479 - Kmeakin:km/hardcode-char-is-control, r=joboet	bors	-26/+0
	Hard-code `char::is_control`. Split off from https://github.com/rust-lang/rust/pull/145219. According to https://www.unicode.org/policies/stability_policy.html#Property_Value, the set of codepoints in `Cc` will never change. So we can hard-code the patterns to match against instead of using a table. This doesn't change the generated assembly, since the lookup table is small enough that [LLVM is able to inline the whole search](https://godbolt.org/z/bG8dM37YG). But this does reduce the chance of regressions if LLVM's heuristics change in the future, and means less generated Rust code checked in to `unicode-data.rs`.
2025-08-16	refactor: Hard-code `char::is_control`	Karl Meakin	-26/+0
	According to https://www.unicode.org/policies/stability_policy.html#Property_Value, the set of codepoints in `Cc` will never change. So we can hard-code the patterns to match against instead of using a table.
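A minimal sketch of what a hard-coded check can look like. The `Cc` category consists of U+0000..=U+001F and U+007F..=U+009F; the free function below is illustrative and may not match the exact form used in `core`.

```rust
/// Illustrative hard-coded test for the `Cc` (control) general category,
/// written as a free function rather than a method on `char`.
const fn is_control(c: char) -> bool {
    matches!(c, '\0'..='\x1f' | '\x7f'..='\u{9f}')
}
```

Since the ranges are fixed by the stability policy, the pattern never needs regenerating when the Unicode version is bumped.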
2025-08-15	refactor: Include size of case conversion tables	Karl Meakin	-5/+7
	Include the sizes of the `to_lowercase` and `to_uppercase` tables in the total size calculations.
2025-08-15	refactor: Include table sizes in comment at top of `unicode_data.rs`	Karl Meakin	-0/+10
	To make changes in table size obvious from git diffs
2025-08-13	Hide docs for core::unicode	ltdk	-2/+2

2025-07-10	Remove unnecessary parens in closure body with unused lint	yukang	-1/+1

2025-03-08	Remove unneeded parentheses.	Markus Reiter	-6/+6

2025-03-07	Use `intrinsics::assume` instead of `hint::assert_unchecked`.	Markus Reiter	-2/+8

2025-03-07	Never inline `lookup_slow`.	Markus Reiter	-0/+2

2025-03-06	Add second precondition for `skip_search`.	Markus Reiter	-57/+205

2025-03-06	Allow optimizing out `panic_bounds_check` in Unicode checks.	Markus Reiter	-39/+34

2025-01-20	core: add `#![warn(unreachable_pub)]`	Urgau	-0/+2

2024-12-04	Reformat Python code with `ruff`	Jakub Beránek	-19/+34

2024-11-27	update cfgs	Boxy	-3/+0

2024-11-12	stabilize const_unicode_case_lookup	Ralf Jung	-0/+3

2024-11-06	Auto merge of #132500 - RalfJung:char-is-whitespace-const, r=jhpratt	bors	-1/+1
	make char::is_whitespace unstably const. I am adding this to the existing https://github.com/rust-lang/rust/issues/132241 feature gate, since `is_digit` and `is_whitespace` seem similar enough that one can group them together.
2024-11-03	Rollup merge of #132499 - RalfJung:unicode_data.rs, r=tgross35	Matthias Krüger	-1/+1
	unicode_data.rs: show command for generating file. https://github.com/rust-lang/rust/pull/131647 made this an easily runnable tool, now we just have to mention that in the comment. :) Fixes https://github.com/rust-lang/rust/issues/131640.
2024-11-02	make char::is_whitespace unstably const	Ralf Jung	-1/+1

2024-11-02	unicode_data.rs: show command for generating file	Ralf Jung	-1/+1

2024-11-02	get rid of a whole bunch of unnecessary rustc_const_unstable attributes	Ralf Jung	-3/+0

2024-10-13	switch unicode-data back to 'static'	Ralf Jung	-8/+8

2024-09-12	Rollup merge of #130101 - RalfJung:const-cleanup, r=fee1-dead	Matthias Krüger	-4/+2
	some const cleanup: remove unnecessary attributes, add const-hack indications. I learned that we use `FIXME(const-hack)` on top of the "const-hack" label. That seems much better since it marks the right place in the code and moves around with the code. So I went through the PRs with that label and added appropriate FIXMEs in the code. IMO this means we can then remove the label -- Cc `@rust-lang/wg-const-eval`. I also noticed some const stability attributes that don't do anything useful, and removed them. r? `@fee1-dead`
2024-09-10	Bump unicode printable to version 16.0.0	Marcondiro	-57/+73

2024-09-10	Bump unicode_data to version 16.0.0	Marcondiro	-651/+670

2024-09-08	add FIXME(const-hack)	Ralf Jung	-4/+2

2024-07-19	Use `#[rustfmt::skip]` on some `use` groups to prevent reordering.	Nicholas Nethercote	-4/+6
	`use` declarations will be reformatted in #125443. Very rarely, there is a desire to force a group of `use` declarations together in a way that auto-formatting will break up. E.g. when you want a single comment to apply to a group. #126776 dealt with all of these in the codebase, ensuring that no comments intended for multiple `use` declarations would end up in the wrong place. But some people were unhappy with it. This commit uses `#[rustfmt::skip]` to create these custom `use` groups in an idiomatic way for a few of the cases changed in #126776. This works because rustfmt treats any `use` item annotated with `#[rustfmt::skip]` as a barrier and won't reorder other `use` items around it.
2024-07-17	Avoid comments that describe multiple `use` items.	Nicholas Nethercote	-13/+13
	There are some comments describing multiple subsequent `use` items. When the big `use` reformatting happens some of these `use` items will be reordered, possibly moving them away from the comment. With this additional level of formatting it's not really feasible to have comments of this type. This commit removes them in various ways: - merging separate `use` items when appropriate; - inserting blank lines between the comment and the first `use` item; - outright deletion (for comments that are relatively low-value); - adding a separate "top-level" comment. We also entirely skip formatting for four library files that contain nothing but `pub use` re-exports, where reordering would be painful.
2024-04-20	Add a lower bound check to `unicode-table-generator` output	Arpad Borsos	-0/+4
	This adds a dedicated check for the lower bound (if it is outside of ASCII range) to the output of the `unicode-table-generator` tool. This generalizes the ASCII-only fast-path, but only for the `Grapheme_Extend` property for now, as that is the only one with a lower bound outside of ASCII.
2024-03-28	Bump Unicode printables to version 15.1, align to unicode_data	Marcondiro	-12/+14

2024-02-09	Bump Unicode to version 15.1.0, regenerate tables	Marcondiro	-6/+6

2023-06-16	Apply changes to fix python linting errors	Trevor Gross	-1/+1

2023-03-21	Use hex literal for INDEX_MASK	Martin Gammelsæter	-1/+1

2023-03-16	Improve case mapping encoding scheme	Martin Gammelsæter	-1045/+779
	The indices are encoded as `u32`s in the range of invalid `char`s, so that we know that if any mapping fails to parse as a `char` we should use the value for lookup in the multi-table. This avoids the second binary search in cases where a multi-`char` mapping is needed. Idea from @nikic
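The encoding idea can be sketched as below. The table contents, the `INDEX_BASE` constant, and the function name are assumptions made for illustration; the generated tables in `unicode_data.rs` are laid out by the generator and may differ in detail.

```rust
/// Hypothetical multi-character table; entry 0 is the uppercase of 'ß'.
const MULTI: &[[char; 3]] = &[['S', 'S', '\0']];

/// Base for encoded indices (assumed value). Anything above 0x10FFFF is
/// not a valid `char`, so such a value can never be confused with a
/// real single-character mapping.
const INDEX_BASE: u32 = 0x40_0000;

/// Hypothetical single table, sorted by key. The value is either the
/// mapped `char` as a `u32`, or `INDEX_BASE + index` into `MULTI`.
const SINGLE: &[(char, u32)] = &[
    ('ß', INDEX_BASE), // index 0 in MULTI
    ('à', 'À' as u32),
];

fn to_upper(c: char) -> [char; 3] {
    match SINGLE.binary_search_by(|&(key, _)| key.cmp(&c)) {
        // No entry: the character maps to itself.
        Err(_) => [c, '\0', '\0'],
        Ok(i) => {
            let encoded = SINGLE[i].1;
            match char::from_u32(encoded) {
                // Valid `char`: a plain one-to-one mapping.
                Some(mapped) => [mapped, '\0', '\0'],
                // Not a `char`: decode the index, no second search needed.
                None => MULTI[(encoded - INDEX_BASE) as usize],
            }
        }
    }
}
```

One binary search over the single table is enough in both cases, which is the saving the commit message describes.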
2023-03-16	Split unicode case LUTs in single and multi variants	Martin Gammelsæter	-1682/+963
	The majority of char case replacements are single char replacements, so storing them as [char; 3] wastes a lot of space. This commit splits the replacement tables for both `to_lower` and `to_upper` into two separate tables, one with single-character mappings and one with multi-character mappings. This reduces the binary size for programs using all of these tables by roughly 24K bytes.
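A back-of-the-envelope sketch of where the saving comes from. The type aliases and the helper are hypothetical, and any entry count passed in is a made-up number, so this is not an attempt to reproduce the 24K figure.

```rust
/// Mapping slot before the split: room for three chars in every entry.
type PaddedMapping = [char; 3];
/// Mapping slot for the common one-char case after the split.
type SingleMapping = char;

/// Bytes saved by storing `single_count` one-char mappings without padding.
fn bytes_saved(single_count: usize) -> usize {
    let padded = std::mem::size_of::<PaddedMapping>(); // 12 bytes
    let single = std::mem::size_of::<SingleMapping>(); // 4 bytes
    single_count * (padded - single)
}
```

Each single-character mapping moved out of the padded table saves 8 bytes of mapping storage.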
2023-03-15	Skip serializing ascii chars in case LUTs	Martin Gammelsæter	-26/+0
	Since ascii chars are already handled by a special case in the `to_lower` and `to_upper` functions, there's no need to waste space on them in the LUTs.
2022-12-30	Replace libstd, libcore, liballoc in line comments.	jonathanCogan	-1/+1

2022-09-14	Bump Unicode to version 15.0.0, regenerate tables	Thom Chiovoloni	-173/+190

2022-09-04	Address feedback from PR #101401	Sage Mitchell	-8/+12

2022-09-04	Make `char::is_lowercase` and `char::is_uppercase` const	Sage Mitchell	-15/+18
	Implements #101400.
2022-07-20	add #inline	Bruce A. MacNaughton	-0/+1

2022-07-19	generated code	Bruce A. MacNaughton	-10/+17

2022-05-31	Add unicode fast path to `is_printable`	Nilstrieb	-4/+18
	Before, it would enter the full expensive check even for normal ascii characters. Now, it skips the check for the ascii characters in `32..127`. This range was checked manually from the current behavior.
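A rough sketch of the fast path. `is_printable_slow` is a crude, hypothetical stand-in for the table-driven check, which this example does not reproduce.

```rust
/// Crude stand-in (assumption) for the expensive table-driven check.
fn is_printable_slow(c: char) -> bool {
    !c.is_control()
}

/// Decide the range `32..127` up front and fall back to the slow check
/// only for code points at or above 127.
fn is_printable(c: char) -> bool {
    match c as u32 {
        0..=31 => false,
        32..=126 => true,
        _ => is_printable_slow(c),
    }
}
```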
2021-10-06	Regenerate tables for Unicode 14.0.0	Josh Stone	-553/+653

2021-06-23	Use HTTPS links where possible	Smitty	-2/+2

2021-02-26	Add a check for ASCII characters in to_upper and to_lower	Miccah Castorina	-6/+14
	This extra check has better performance. See discussion here: https://internals.rust-lang.org/t/to-upper-speed/13896
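The shape of such a check might look like the sketch below. The function name is hypothetical, it returns only the first mapped character for brevity, and the non-ASCII branch delegates to the standard library instead of showing a table search.

```rust
/// Illustrative uppercase conversion with an ASCII fast path.
fn to_upper_first(c: char) -> char {
    if c.is_ascii() {
        // Cheap arithmetic path: no table search for ASCII input.
        c.to_ascii_uppercase()
    } else {
        // Stand-in for the table search (assumption: the real code
        // searches the generated case tables here).
        c.to_uppercase().next().unwrap_or(c)
    }
}
```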
2020-12-07	Privatize some of libcore unicode_internals	Aleksey Kladov	-13/+10
	My understanding is that these APIs are perma-unstable, so it doesn't make sense to pollute docs & IDE completion[1] with them. [1]: https://github.com/rust-analyzer/rust-analyzer/issues/6738
2020-07-27	mv std libs to library/	mark	-0/+3103