rust - https://github.com/rust-lang/rust

Age	Commit message	Author	Lines
2025-09-07	optimization: Don't include ASCII characters in Unicode tables	Karl Meakin	-0/+5
	The ASCII subset of Unicode is fixed and will never change, so we don't need to generate tables for it with every new Unicode version. This saves a few bytes of static data and speeds up `char::is_control` and `char::is_grapheme_extended` on ASCII inputs. Since the table lookup functions exported from the `unicode` module will give nonsensical errors on ASCII input (and in fact will panic in debug mode), I had to add some private wrapper methods to `char` which check for ASCII-ness first.
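The wrapper pattern described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `grapheme_extend_lookup` and its toy range (the combining diacritical marks block) are hypothetical stand-ins for a generated table that only covers non-ASCII code points.

```rust
/// Hypothetical stand-in for a generated table lookup that is only
/// meaningful for non-ASCII input (assumption: toy data, not the real table).
fn grapheme_extend_lookup(c: char) -> bool {
    // Mirrors the "panics in debug mode on ASCII input" behaviour described above.
    debug_assert!(!c.is_ascii(), "table lookup called with ASCII input");
    // Toy table: combining diacritical marks, U+0300..=U+036F.
    matches!(c as u32, 0x0300..=0x036F)
}

/// Wrapper that checks for ASCII first, so the table never sees ASCII input.
fn is_grapheme_extended(c: char) -> bool {
    !c.is_ascii() && grapheme_extend_lookup(c)
}
```

Because `&&` short-circuits, ASCII input returns `false` without ever reaching the table lookup.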
2025-09-05	change file-is-generated doc comment to inner	Marijn Schouten	-1/+1

2025-09-03	Rollup merge of #145414 - Kmeakin:km/unicode-table-refactors, r=joshtriplett,tgross35	Stuart Cook	-149/+143
	unicode-table-generator refactors. Split off from https://github.com/rust-lang/rust/pull/145219
2025-08-16	refactor: Hard-code `char::is_control`	Karl Meakin	-1/+0
	According to https://www.unicode.org/policies/stability_policy.html#Property_Value, the set of codepoints in `Cc` will never change. So we can hard-code the patterns to match against instead of using a table.
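As an illustration of what hard-coding the `Cc` category can look like, here is a small sketch. The function name `is_control_hardcoded` is an assumption for this example; the ranges are the `Cc` code points (U+0000 to U+001F and U+007F to U+009F).

```rust
/// Hard-coded check for general category `Cc`, written as a range pattern
/// instead of a generated table lookup.
/// (Illustrative name; not necessarily how the standard library spells it.)
fn is_control_hardcoded(c: char) -> bool {
    matches!(c, '\0'..='\x1f' | '\x7f'..='\u{9f}')
}
```

Since the stability policy guarantees the set will not change, a pattern like this never needs regenerating for a new Unicode version.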
2025-08-15	refactor: Add tests for case conversions	Karl Meakin	-11/+41

2025-08-15	refactor: `generate_tests`	Karl Meakin	-52/+45
	Rewrite `generate_tests` to be more idiomatic.
2025-08-15	refactor: rewrite `ranges_from_set`	Karl Meakin	-66/+17
	The `merge_ranges` function was very complicated and hard to understand. Fortunately, we can use `slice::chunk_by` to achieve the same thing.
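A rough sketch of how `slice::chunk_by` can turn sorted code points into ranges is shown below. The function name and the input type (a sorted, deduplicated slice of `u32`) are assumptions for illustration; the real `ranges_from_set` may take a different input.

```rust
use std::ops::Range;

/// Collapse a sorted, deduplicated slice of code points into half-open ranges.
/// (Hypothetical helper; assumes the input is already sorted and unique.)
fn ranges_from_sorted(codepoints: &[u32]) -> Vec<Range<u32>> {
    codepoints
        // Keep two neighbours in the same chunk while they are consecutive.
        .chunk_by(|a, b| *a + 1 == *b)
        // Each chunk is non-empty, so first and last element always exist.
        .map(|run| run[0]..run[run.len() - 1] + 1)
        .collect()
}
```

Each chunk is a maximal run of consecutive values, so the first and last element of the chunk are enough to describe the range.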
2025-08-15	refactor: Include size of case conversion tables	Karl Meakin	-13/+35
	Include the sizes of the `to_lowercase` and `to_uppercase` tables in the total size calculations.
2025-08-15	refactor: Include table sizes in comment at top of `unicode_data.rs`	Karl Meakin	-11/+9
	To make changes in table size obvious from git diffs
2025-08-05	fix(unicode-table-generator): fix duplicated unique indices	Marco Cavenati	-1/+1
	unicode-table-generator panicked while populating distinct_indices because of duplicated indices. This was introduced by swapping the order of canonical_words.push(...) and canonical_words.len().
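To show why the order of `len()` and `push(...)` matters, here is a minimal sketch of one way such a swap can go wrong. The helper name `intern_word` and the `u64` word type are assumptions; this is not the generator's actual code.

```rust
/// Append `word` and return the index at which it was stored.
/// (Hypothetical helper for illustration.)
fn intern_word(canonical_words: &mut Vec<u64>, word: u64) -> usize {
    // Read the length BEFORE pushing: it is the slot the new word will occupy.
    let index = canonical_words.len();
    canonical_words.push(word);
    // Reading `len()` after the push instead would yield `index + 1`,
    // which points one past the word that was just stored.
    index
}
```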
2025-07-18	unicode-table-gen: more clippy fixes	Marijn Schouten	-8/+8

2025-07-18	unicode-table-gen: edition 2024	Marijn Schouten	-2/+2

2025-07-18	unicode-table-gen: clippy fixes	Marijn Schouten	-35/+28

2025-07-10	Remove unnecessary parens in closure body with unused lint	yukang	-1/+1

2025-03-08	Remove unneeded parentheses.	Markus Reiter	-1/+1

2025-03-08	Fix formatting.	Markus Reiter	-34/+7

2025-03-07	Use `intrinsics::assume` instead of `hint::assert_unchecked`.	Markus Reiter	-2/+8

2025-03-07	Never inline `lookup_slow`.	Markus Reiter	-0/+2

2025-03-06	Add second precondition for `skip_search`.	Markus Reiter	-28/+89

2025-03-06	Allow optimizing out `panic_bounds_check` in Unicode checks.	Markus Reiter	-14/+31

2025-02-08	Rustfmt	bjorn3	-21/+27

2024-11-27	update cfgs	Boxy	-5/+0

2024-11-12	stabilize const_unicode_case_lookup	Ralf Jung	-0/+5

2024-11-06	Auto merge of #132500 - RalfJung:char-is-whitespace-const, r=jhpratt	bors	-1/+1
	make char::is_whitespace unstably const. I am adding this to the existing https://github.com/rust-lang/rust/issues/132241 feature gate, since `is_digit` and `is_whitespace` seem similar enough that one can group them together.
2024-11-03	Rollup merge of #132499 - RalfJung:unicode_data.rs, r=tgross35	Matthias Krüger	-1/+1
	unicode_data.rs: show command for generating file. https://github.com/rust-lang/rust/pull/131647 made this an easily runnable tool, now we just have to mention that in the comment. :) Fixes https://github.com/rust-lang/rust/issues/131640.
2024-11-02	make char::is_whitespace unstably const	Ralf Jung	-1/+1

2024-11-02	unicode_data.rs: show command for generating file	Ralf Jung	-1/+1

2024-11-02	get rid of a whole bunch of unnecessary rustc_const_unstable attributes	Ralf Jung	-6/+0

2024-10-20	Rollup merge of #131647 - jieyouxu:unicode-table-generator, r=Mark-Simulacrum	Matthias Krüger	-5/+3
	Register `src/tools/unicode-table-generator` as a runnable tool. It seems like `src/tools/unicode-table-generator` is not currently managed by bootstrap. This PR wires it up with bootstrap as a runnable tool. This tool seems to take two possible args: 1. (Mandatory) path to `library/core/src/unicode/unicode_data.rs`, and 2. (Optional) path to generate a test file. I only passed the mandatory path to `unicode_data.rs` in bootstrap and didn't do anything about (2). I'm not sure about how this tool is supposed to be run. `Cargo.lock` is modified because I renamed `unicode-table-generator`'s bin name to match the tool name, as bootstrap's tool running logic expects the bin name to be derived from the tool name. I also added a triagebot message reminding contributors not to edit the library source file manually, but to edit the tool and regenerate instead. This should probably be a tidy check (if that's desirable, it can be done in a follow-up PR, though it may be overkill). Helps with #131640 but does not close it because there are still no docs. r? `@Mark-Simulacrum` (since I think you authored this tool?)
2024-10-13	unicode-table-generator: sync comments	许杰友 Jieyou Xu (Joe)	-4/+2
	These comments were updated on master but not through this tool, so the comments in the tool became outdated. Sync the comments to stay consistent.
2024-10-13	unicode-table-generator: match bin name with tool name	许杰友 Jieyou Xu (Joe)	-1/+1
	Bootstrap assumes that the binary name is the same as the tool name; matching them just makes everyone's lives easier.
2024-10-13	switch unicode-data back to 'static'	Ralf Jung	-4/+4

2024-09-22	Reformat using the new identifier sorting from rustfmt	Michael Goulet	-29/+23

2024-07-29	Reformat `use` declarations.	Nicholas Nethercote	-11/+15
	The previous commit updated `rustfmt.toml` appropriately. This commit is the outcome of running `x fmt --all` with the new formatting options.
2024-04-20	Add a lower bound check to `unicode-table-generator` output	Arpad Borsos	-3/+27
	This adds a dedicated check for the lower bound (if it is outside of ASCII range) to the output of the `unicode-table-generator` tool. This generalizes the ASCII-only fast-path, but only for the `Grapheme_Extend` property for now, as that is the only one with a lower bound outside of ASCII.
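On the generator side, emitting such a lower-bound guard could look roughly like the sketch below. The function `emit_lookup_guard` and the exact text it produces are hypothetical; only the idea (emit a cheap comparison when the smallest code point lies outside ASCII) comes from the commit description, and `lookup_slow` is borrowed from a later commit title in this log.

```rust
use std::ops::Range;

/// Build the body of a generated lookup function as source text.
/// (Hypothetical generator helper; the emitted text is illustrative only.)
fn emit_lookup_guard(ranges: &[Range<u32>]) -> String {
    // Smallest code point that has the property, or 0 if the set is empty.
    let lower = ranges.first().map_or(0, |r| r.start);
    if lower > 0x7F {
        // Everything below `lower` can be rejected with one comparison.
        format!("(c as u32) >= {lower:#x} && lookup_slow(c)")
    } else {
        // The set already reaches into ASCII, so no extra guard is emitted.
        "lookup_slow(c)".to_string()
    }
}
```

For a property whose first range starts at U+0300, the emitted guard rejects all of ASCII and Latin-1 before the table is touched.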
2023-04-12	remove some unneeded imports	KaDiWa	-2/+0

2023-03-21	Use hex literal for INDEX_MASK	Martin Gammelsæter	-1/+1

2023-03-16	Improve case mapping encoding scheme	Martin Gammelsæter	-49/+54
	The indices are encoded as `u32`s in the range of invalid `char`s, so that we know that if any mapping fails to parse as a `char` we should use the value for lookup in the multi-table. This avoids the second binary search in cases where a multi-`char` mapping is needed. Idea from @nikic
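The encoding scheme can be sketched as follows. The table names, the toy entries, and the `INDEX_MASK` value are assumptions made for this example; the point is only that a value which does not parse as a `char` is treated as an index into the multi-character table.

```rust
/// Assumed marker bit: any value with this bit set is above 0x10FFFF,
/// so `char::from_u32` rejects it.
const INDEX_MASK: u32 = 0x40_0000;

/// Toy single-mapping table, sorted by key. The second field is either a
/// valid `char` value or `INDEX_MASK | index` into `UPPER_MULTI`.
static UPPER_SINGLE: &[(char, u32)] = &[
    ('ß', INDEX_MASK),      // multi-char mapping, index 0
    ('ä', 'Ä' as u32),      // ordinary single-char mapping
    ('ŉ', INDEX_MASK | 1),  // multi-char mapping, index 1
];

/// Toy multi-character mappings, padded with '\0'.
static UPPER_MULTI: &[[char; 3]] = &[
    ['S', 'S', '\0'],
    ['\u{02BC}', 'N', '\0'],
];

/// Look up the uppercase mapping using a single binary search.
/// (The toy table does not cover ASCII; such input is returned unchanged.)
fn to_upper_sketch(c: char) -> [char; 3] {
    match UPPER_SINGLE.binary_search_by_key(&c, |&(key, _)| key) {
        Err(_) => [c, '\0', '\0'],
        Ok(i) => {
            let encoded = UPPER_SINGLE[i].1;
            match char::from_u32(encoded) {
                // Valid `char`: this is the single-character mapping itself.
                Some(upper) => [upper, '\0', '\0'],
                // Not a `char`: strip the marker bit and index the multi table.
                None => UPPER_MULTI[(encoded & (INDEX_MASK - 1)) as usize],
            }
        }
    }
}
```

Only one binary search is needed in either case, which is the saving the commit message describes.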
2023-03-16	Split unicode case LUTs in single and multi variants	Martin Gammelsæter	-13/+45
	The majority of char case replacements are single char replacements, so storing them as [char; 3] wastes a lot of space. This commit splits the replacement tables for both `to_lower` and `to_upper` into two separate tables, one with single-character mappings and one with multi-character mappings. This reduces the binary size for programs using all of these tables by roughly 24K bytes.
2023-03-15	Skip serializing ascii chars in case LUTs	Martin Gammelsæter	-14/+11
	Since ascii chars are already handled by a special case in the `to_lower` and `to_upper` functions, there's no need to waste space on them in the LUTs.
2022-09-04	Address feedback from PR #101401	Sage Mitchell	-4/+8

2022-09-04	Make `char::is_lowercase` and `char::is_uppercase` const	Sage Mitchell	-10/+16
	Implements #101400.
2022-08-28	Auto merge of #100497 - kadiwa4:remove_clone_into_iter, r=cjgillot	bors	-6/+2
	Avoid cloning a collection only to iterate over it. `@rustbot` label: +C-cleanup
2022-08-27	Rollup merge of #100924 - est31:closure_to_fn_ptr, r=Mark-Simulacrum	Yuki Okushi	-17/+16
	Smaller improvements of tidy and the unicode generator
2022-08-23	Change hint to correct path	est31	-1/+1

2022-08-23	Simplify unicode_downloads.rs	est31	-16/+15
	Reduce duplication by moving fetching logic into a dedicated function.
2022-08-13	avoid cloning and then iterating	KaDiWa	-6/+2

2022-07-20	add #inline	Bruce A. MacNaughton	-0/+1

2022-07-19	formatted	Bruce A. MacNaughton	-34/+20

2022-07-19	working updates	Bruce A. MacNaughton	-2/+108