rust - https://github.com/rust-lang/rust

Age	Commit message	Author	Lines
2024-11-27	update cfgs	Boxy	-5/+0

2024-11-12	stabilize const_unicode_case_lookup	Ralf Jung	-0/+5

2024-11-06	Auto merge of #132500 - RalfJung:char-is-whitespace-const, r=jhpratt	bors	-1/+1
	make char::is_whitespace unstably const. I am adding this to the existing https://github.com/rust-lang/rust/issues/132241 feature gate, since `is_digit` and `is_whitespace` seem similar enough that one can group them together.
2024-11-03	Rollup merge of #132499 - RalfJung:unicode_data.rs, r=tgross35	Matthias Krüger	-1/+1
	unicode_data.rs: show command for generating file. https://github.com/rust-lang/rust/pull/131647 made this an easily runnable tool; now we just have to mention that in the comment. :) Fixes https://github.com/rust-lang/rust/issues/131640.
2024-11-02	make char::is_whitespace unstably const	Ralf Jung	-1/+1

2024-11-02	unicode_data.rs: show command for generating file	Ralf Jung	-1/+1

2024-11-02	get rid of a whole bunch of unnecessary rustc_const_unstable attributes	Ralf Jung	-6/+0

2024-10-20	Rollup merge of #131647 - jieyouxu:unicode-table-generator, r=Mark-Simulacrum	Matthias Krüger	-5/+3
	Register `src/tools/unicode-table-generator` as a runnable tool. It seems like `src/tools/unicode-table-generator` is not currently managed by bootstrap. This PR wires it up with bootstrap as a runnable tool. This tool seems to take two possible args: (1) a mandatory path to `library/core/src/unicode/unicode_data.rs`, and (2) an optional path at which to generate a test file. I only passed the mandatory path to `unicode_data.rs` in bootstrap and didn't do anything about (2). I'm not sure how this tool is supposed to be run. `Cargo.lock` is modified because I renamed `unicode-table-generator`'s bin name to match the tool name, as bootstrap's tool running logic expects the bin name to be derived from the tool name. I also added a triagebot message reminding people not to edit the library source file manually, but to edit the tool and regenerate instead. This should probably be a tidy check (if that's desirable, it can be done in a follow-up PR, though it may be overkill). Helps with #131640 but does not close it because there are still no docs. r? `@Mark-Simulacrum` (since I think you authored this tool?)
2024-10-13	unicode-table-generator: sync comments	许杰友 Jieyou Xu (Joe)	-4/+2
	These comments were updated on master but not through this tool, so the comments in the tool became outdated. Sync the comments to stay consistent.
2024-10-13	unicode-table-generator: match bin name with tool name	许杰友 Jieyou Xu (Joe)	-1/+1
	Bootstrap assumes that the binary name is the same as the tool name; matching them just makes everyone's lives easier.
2024-10-13	switch unicode-data back to 'static'	Ralf Jung	-4/+4

2024-09-22	Reformat using the new identifier sorting from rustfmt	Michael Goulet	-29/+23

2024-07-29	Reformat `use` declarations.	Nicholas Nethercote	-11/+15
	The previous commit updated `rustfmt.toml` appropriately. This commit is the outcome of running `x fmt --all` with the new formatting options.
2024-04-20	Add a lower bound check to `unicode-table-generator` output	Arpad Borsos	-3/+27
	This adds a dedicated check for the lower bound (if it is outside of the ASCII range) to the output of the `unicode-table-generator` tool. This generalizes the ASCII-only fast path, but only for the `Grapheme_Extend` property for now, as that is the only one with a lower bound outside of ASCII.
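A minimal sketch of the idea, assuming a property stored as sorted half-open code point ranges: any code point below the smallest member can be rejected before the table is consulted. `RANGES`, `LOWER_BOUND`, `slow_lookup` and `lookup` are hypothetical names, and the ranges are made up rather than taken from the real `Grapheme_Extend` data.

```rust
/// Hypothetical sorted, non-overlapping half-open ranges of member code points
/// (illustrative values, not the real Grapheme_Extend table).
const RANGES: &[(u32, u32)] = &[(0x0300, 0x0370), (0x0483, 0x048A), (0x0591, 0x05BE)];

/// Smallest member of the set; everything below it is known to be absent.
const LOWER_BOUND: u32 = 0x0300;

/// Table lookup: find the last range starting at or before `c`.
fn slow_lookup(c: char) -> bool {
    let cp = c as u32;
    match RANGES.binary_search_by(|&(start, _)| start.cmp(&cp)) {
        Ok(_) => true,            // `cp` is exactly a range start
        Err(0) => false,          // `cp` precedes every range
        Err(i) => cp < RANGES[i - 1].1, // inside the preceding range?
    }
}

/// Lookup with the lower-bound fast path in front of the table search.
fn lookup(c: char) -> bool {
    // Generalizes an ASCII-only check to "below the smallest member".
    (c as u32) >= LOWER_BOUND && slow_lookup(c)
}
```

For a property whose smallest member is far above ASCII, the comparison rejects all ASCII input (and more) with a single branch.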
2023-04-12	remove some unneeded imports	KaDiWa	-2/+0

2023-03-21	Use hex literal for INDEX_MASK	Martin Gammelsæter	-1/+1

2023-03-16	Improve case mapping encoding scheme	Martin Gammelsæter	-49/+54
	The indices are encoded as `u32`s in the range of invalid `char`s, so that we know that if any mapping fails to parse as a `char` we should use the value for lookup in the multi-table. This avoids the second binary search in cases where a multi-`char` mapping is needed. Idea from @nikic
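The scheme can be sketched as follows, assuming a sorted `(char, u32)` table and an `INDEX_MASK` bit above `char::MAX` (0x10FFFF). The table names, contents, and mask value are hypothetical and only illustrate the technique. `char::from_u32` returns `None` for any value that is not a valid `char`, which is what lets a single binary search decide between the single-char and multi-char cases.

```rust
/// Any value with this bit set exceeds char::MAX (0x10FFFF), so it can never
/// be a valid `char` and is free to carry an index instead (hypothetical value).
const INDEX_MASK: u32 = 0x40_0000;

/// (input, encoded output): a code point, or INDEX_MASK | index into the multi table.
const UPPER_SINGLE: &[(char, u32)] = &[
    ('\u{00DF}', INDEX_MASK | 0), // 'ß' -> "SS", stored by index
    ('\u{00E0}', 0x00C0),         // 'à' -> 'À'
    ('\u{00E9}', 0x00C9),         // 'é' -> 'É'
];

/// Multi-char outputs, padded with '\0'.
const UPPER_MULTI: &[[char; 3]] = &[['S', 'S', '\0']];

fn to_upper(c: char) -> [char; 3] {
    match UPPER_SINGLE.binary_search_by(|&(k, _)| k.cmp(&c)) {
        Ok(i) => {
            let u = UPPER_SINGLE[i].1;
            // One search suffices: a valid char is the mapping itself;
            // otherwise the low bits index the multi-char table.
            match char::from_u32(u) {
                Some(mapped) => [mapped, '\0', '\0'],
                None => UPPER_MULTI[(u & (INDEX_MASK - 1)) as usize],
            }
        }
        // No entry in this toy table: the character maps to itself.
        Err(_) => [c, '\0', '\0'],
    }
}
```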
2023-03-16	Split unicode case LUTs in single and multi variants	Martin Gammelsæter	-13/+45
	The majority of char case replacements are single-char replacements, so storing them as [char; 3] wastes a lot of space. This commit splits the replacement tables for both `to_lower` and `to_upper` into two separate tables, one with single-character mappings and one with multi-character mappings. This reduces the binary size for programs using all of these tables by roughly 24K bytes.
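To see where the waste comes from, a rough sketch compares per-entry sizes with `std::mem::size_of`. The helper `bytes_saved` is hypothetical, and the 16-byte and 8-byte figures assume the layout current compilers choose for these tuples (4-byte `char`s, no padding); tuple layout is not formally guaranteed.

```rust
use std::mem::size_of;

/// Bytes saved by moving `single_entries` single-char mappings out of a
/// combined `(char, [char; 3])` table into a `(char, char)` table.
fn bytes_saved(single_entries: usize) -> usize {
    // Each combined entry reserves room for three output chars...
    let combined = size_of::<(char, [char; 3])>();
    // ...while a single-mapping entry needs only one.
    let single = size_of::<(char, char)>();
    single_entries * (combined - single)
}
```

Each single mapping thus costs half as much, which is why a separate multi-char table only pays for `[char; 3]` where it is actually needed.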
2023-03-15	Skip serializing ascii chars in case LUTs	Martin Gammelsæter	-14/+11
	Since ASCII chars are already handled by a special case in the `to_lower` and `to_upper` functions, there's no need to waste space on them in the LUTs.
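A minimal sketch of the ASCII special case, assuming a sorted `(char, char)` table that contains only non-ASCII entries. `LOWER_TABLE` and `to_lower` are hypothetical, and the real functions return up to three chars rather than one.

```rust
/// Hypothetical non-ASCII-only table: ASCII never needs an entry.
const LOWER_TABLE: &[(char, char)] = &[
    ('\u{00C0}', '\u{00E0}'), // 'À' -> 'à'
    ('\u{00C9}', '\u{00E9}'), // 'É' -> 'é'
    ('\u{0391}', '\u{03B1}'), // 'Α' -> 'α'
];

fn to_lower(c: char) -> char {
    if c.is_ascii() {
        // ASCII is handled by a branch, so the table can omit it entirely.
        return c.to_ascii_lowercase();
    }
    match LOWER_TABLE.binary_search_by(|&(k, _)| k.cmp(&c)) {
        Ok(i) => LOWER_TABLE[i].1,
        Err(_) => c,
    }
}
```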
2022-09-04	Address feedback from PR #101401	Sage Mitchell	-4/+8

2022-09-04	Make `char::is_lowercase` and `char::is_uppercase` const	Sage Mitchell	-10/+16
	Implements #101400.
2022-08-28	Auto merge of #100497 - kadiwa4:remove_clone_into_iter, r=cjgillot	bors	-6/+2
	Avoid cloning a collection only to iterate over it. `@rustbot` label: +C-cleanup
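A generic illustration of this cleanup, since the changed call sites are not shown in this log: iterating by reference avoids allocating and copying a clone that is immediately consumed. `total_len` is a hypothetical function.

```rust
/// Sum the lengths of all words without cloning the collection first.
fn total_len(words: &[String]) -> usize {
    // Before (hypothetical): words.to_vec().into_iter().map(|w| w.len()).sum()
    // After: borrow each element; nothing is allocated or copied.
    words.iter().map(|w| w.len()).sum()
}
```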
2022-08-27	Rollup merge of #100924 - est31:closure_to_fn_ptr, r=Mark-Simulacrum	Yuki Okushi	-17/+16
	Smaller improvements of tidy and the unicode generator
2022-08-23	Change hint to correct path	est31	-1/+1

2022-08-23	Simplify unicode_downloads.rs	est31	-16/+15
	Reduce duplication by moving fetching logic into a dedicated function.
2022-08-13	avoid cloning and then iterating	KaDiWa	-6/+2

2022-07-20	add #inline	Bruce A. MacNaughton	-0/+1

2022-07-19	formatted	Bruce A. MacNaughton	-34/+20

2022-07-19	working updates	Bruce A. MacNaughton	-2/+108

2022-03-10	Use implicit capture syntax in format_args	T-O-R-U-S	-8/+7
	This updates the standard library's documentation to use the new syntax. The documentation is worthwhile to update as it should be more idiomatic (particularly for features like this, which are nice for users to get acquainted with). The general codebase is likely more hassle than benefit to update: it'll hurt git blame, and generally updates can be done by folks updating the code if (and when) that makes things more readable with the new format. A few places in the compiler and library code are updated (mostly just due to already having been done when this commit was first authored).
2021-10-06	Let unicode-table-generator fail gracefully for bitsets	Josh Stone	-4/+6
	The "Alphabetic" property in Unicode 14 grew too big for the bitset representation, panicking "cannot pack 264 into 8 bits". However, we were already choosing the skiplist for that anyway, so this doesn't need to be a hard failure. That panic is now a returned `Err`, and then in `emit_codepoints` we automatically defer to skiplist.
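A sketch of the fail-gracefully pattern, assuming values must fit in a `u8` for the bitset to be usable. `Encoding`, `try_pack` and `choose_encoding` are hypothetical names, not the generator's actual API. Returning `Err` instead of panicking lets the caller fall back to the skip list.

```rust
use std::convert::TryFrom;

#[derive(Debug, PartialEq)]
enum Encoding {
    /// Values packed one per byte.
    Bitset(Vec<u8>),
    /// Fallback representation; its contents are omitted in this sketch.
    SkipList,
}

/// Try to pack each value into 8 bits, reporting the first one that does not fit.
fn try_pack(values: &[u32]) -> Result<Vec<u8>, String> {
    values
        .iter()
        .map(|&v| u8::try_from(v).map_err(|_| format!("cannot pack {} into 8 bits", v)))
        .collect()
}

fn choose_encoding(values: &[u32]) -> Encoding {
    match try_pack(values) {
        Ok(bytes) => Encoding::Bitset(bytes),
        // Formerly a hard failure; now simply defer to the skip list.
        Err(_) => Encoding::SkipList,
    }
}
```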
2021-10-06	Redo #81358 in unicode-table-generator	Josh Stone	-7/+15

2021-09-20	Migrate to 2021	Mark Rousskov	-1/+1

2021-07-29	rfc3052: Remove authors field from Cargo manifests	Jade	-1/+0
	Since RFC 3052 soft deprecated the authors field anyway, hiding it from crates.io, docs.rs, and making Cargo not add it by default, and it is not generally up to date/useful information, we should remove it from crates in this repo.
2020-08-24	unicode_table_generator: fix clippy::writeln_empty_string, clippy::useless_format, clippy::for_kv_map	Matthias Krüger	-6/+6

2020-08-06	Fix typo "biset" -> "bitset"	Izzy Swart	-1/+1

2020-07-27	mv std libs to library/	mark	-1/+1

2020-06-10	Migrate to numeric associated consts	Lzu Tao	-1/+1

2020-04-11	Store UNICODE_VERSION as a tuple	Pyfisch	-1/+1
	Remove the UnicodeVersion struct containing major, minor and update fields and replace it with a 3-tuple containing the version number. As the value of each field is limited to 255, use u8 to store them.
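With the tuple representation, the version destructures directly. `std::char::UNICODE_VERSION` is a real `(u8, u8, u8)` constant in the standard library; `unicode_version_string` is only a hypothetical helper.

```rust
/// Format the standard library's Unicode version as "major.minor.update".
fn unicode_version_string() -> String {
    // The constant is a plain (u8, u8, u8) tuple, so no struct fields are needed.
    let (major, minor, update) = std::char::UNICODE_VERSION;
    format!("{}.{}.{}", major, minor, update)
}
```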
2020-03-27	Update the documentation comment	Mark Rousskov	-39/+73

2020-03-27	Remove separate encoding for a single nonzero-mapping byte	Mark Rousskov	-31/+2
	In practice, for the two data sets that still use the bitset encoding (uppercase and lowercase) this is not a significant win, so just drop it entirely. It costs us about 5 bytes, and the complexity is nontrivial.
2020-03-27	Add skip list based implementation for smaller encoding	Mark Rousskov	-42/+222
	This arranges for the sparser sets (everything except lower and uppercase) to be encoded in a significantly smaller form. However, it is also a performance trade-off (roughly 3x slower than the bitset encoding). The roughly 40% size reduction is deemed to be sufficiently important to merit this performance loss, particularly as it is unlikely that this code is hot anywhere (and if it is, paying the memory cost for a bitset that directly represents the data seems worthwhile).
	Alphabetic     : 1599 bytes (-937 bytes)
	Case_Ignorable :  949 bytes (-822 bytes)
	Cased          :  359 bytes (-429 bytes)
	Cc             :    9 bytes (-15 bytes)
	Grapheme_Extend:  813 bytes (-675 bytes)
	Lowercase      :  863 bytes
	N              :  419 bytes (-619 bytes)
	Uppercase      :  776 bytes
	White_Space    :   37 bytes (-46 bytes)
	Total table sizes: 5824 bytes (-3543 bytes)
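A simplified sketch of run-based membership, assuming membership starts as false at code point 0 and flips at each listed boundary. A binary search counts the boundaries at or below a code point, and the parity of that count decides membership. `BOUNDARIES` and `contains` are hypothetical, the data encodes ASCII letters, and a real encoding would store something more compact than full `u32` boundaries.

```rust
/// Code points where membership flips, starting from "not a member" at 0.
/// Here the members are [0x41, 0x5B) and [0x61, 0x7B), i.e. ASCII letters.
const BOUNDARIES: &[u32] = &[0x41, 0x5B, 0x61, 0x7B];

fn contains(c: char) -> bool {
    let cp = c as u32;
    // Number of boundaries <= cp.
    let flips = match BOUNDARIES.binary_search(&cp) {
        Ok(i) => i + 1,
        Err(i) => i,
    };
    // An odd number of flips means `cp` lies inside a run of members.
    flips % 2 == 1
}
```

The trade-off described above shows up here too: a lookup is a binary search plus a parity test, slower than indexing into a bitset but far smaller for sparse sets.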
2020-03-24	Add richer printing	Mark Rousskov	-1/+9

2020-03-21	Avoid relying on const parameters to function	Mark Rousskov	-4/+4
	LLVM seems to at least sometimes optimize better when the length comes directly from the `len()` of the array vs. an equivalent integer. Also, this allows easier copy/pasting of the function into compiler explorer for experimentation.
2020-03-21	Arrange for zero to be canonical	Mark Rousskov	-1/+15
	We find that it is common for large ranges of chars to be false -- and that means that it is plausibly common for us to ask about a word that is entirely empty. Therefore, we should make sure that we do not need to rotate bits or otherwise perform some operation to map to the zero word; canonicalize it first if possible.
2020-03-21	Push the byte of LAST_CHUNK_MAP into the array	Mark Rousskov	-11/+14
	This optimizes slightly better.
	Alphabetic     : 2536 bytes
	Case_Ignorable : 1771 bytes
	Cased          :  788 bytes
	Cc             :   24 bytes
	Grapheme_Extend: 1488 bytes
	Lowercase      :  863 bytes
	N              : 1038 bytes
	Uppercase      :  776 bytes
	White_Space    :   83 bytes
	Total table sizes: 9367 bytes (-18 bytes; 2 bytes per set)
2020-03-21	Deduplicate test and primary range_search definitions	Mark Rousskov	-55/+53
	This ensures that what we test is what we get for final results as well.
2020-03-21	Add a right shift mapping	Mark Rousskov	-8/+32
	This saves far fewer bytes than the earlier mappings and is likely not the best operator to choose. But for now it works; a better choice may arise later.
	Alphabetic     : 2538 bytes (-84 bytes)
	Case_Ignorable : 1773 bytes (-30 bytes)
	Cased          :  790 bytes (-18 bytes)
	Cc             :   26 bytes (-6 bytes)
	Grapheme_Extend: 1490 bytes (-18 bytes)
	Lowercase      :  865 bytes (-36 bytes)
	N              : 1040 bytes (-24 bytes)
	Uppercase      :  778 bytes (-60 bytes)
	White_Space    :   85 bytes (-6 bytes)
	Total table sizes: 9385 bytes (-282 bytes)
2020-03-21	Shrink bitset words through functional mapping	Mark Rousskov	-19/+249
	Previously, all words in the (deduplicated) bitset were stored raw: a full 64 bits (8 bytes). Now, words that are equivalent to others through a specific mapping are stored separately and "mapped" to the original when loading. This shrinks the table sizes significantly, as each mapped word is stored in 2 bytes (a 4x decrease from the previous 8). The new encoding is also potentially non-optimal: the "mapped" byte is frequently repeated, as in practice many mapped words use the same base word. Currently we only support two forms of mapping: rotation and inversion. Note that these are both guaranteed to map transitively if at all, and supporting mappings for which this is not true may require a more interesting algorithm for choosing the optimal pairing. Updated sizes:
	Alphabetic     : 2622 bytes (-414 bytes)
	Case_Ignorable : 1803 bytes (-330 bytes)
	Cased          :  808 bytes (-126 bytes)
	Cc             :   32 bytes
	Grapheme_Extend: 1508 bytes (-252 bytes)
	Lowercase      :  901 bytes (-84 bytes)
	N              : 1064 bytes (-156 bytes)
	Uppercase      :  838 bytes (-96 bytes)
	White_Space    :   91 bytes (-6 bytes)
	Total table sizes: 9667 bytes (-1464 bytes)
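A sketch of the two mappings, assuming 64-bit words: a word that equals some base word rotated left or bitwise-inverted can be stored as a reference to the base plus the operation, instead of as 8 raw bytes. `Mapping`, `apply` and `find_mapping` are hypothetical names, and the search order (inversion first, then rotations 1 through 63) is an arbitrary choice.

```rust
/// Hypothetical encoding of the two supported mappings.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Mapping {
    Rotate(u32),
    Invert,
}

/// Reconstruct a mapped word from its base word when loading.
fn apply(base: u64, m: Mapping) -> u64 {
    match m {
        Mapping::Rotate(n) => base.rotate_left(n),
        Mapping::Invert => !base,
    }
}

/// Search for a mapping that turns `base` into `word`, if one exists.
/// Rotation by 0 is skipped: identical words are already deduplicated.
fn find_mapping(base: u64, word: u64) -> Option<Mapping> {
    if !base == word {
        return Some(Mapping::Invert);
    }
    (1..64).map(Mapping::Rotate).find(|&m| apply(base, m) == word)
}
```

Because rotation preserves the number of set bits and inversion complements it, a word with a different population count than its candidate base (other than the complement) can never be mapped, which the `None` case reflects.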
2020-03-20	Pre-pop zero chunks before mapping LAST_CHUNK_MAP	Mark Rousskov	-8/+16
	This avoids wasting a small amount of space for some of the data sets. The chunk resizing is caused by, but not directly related to, changes in this commit.
	Alphabetic     : 3036 bytes
	Case_Ignorable : 2133 bytes (-3 bytes)
	Cased          :  934 bytes
	Cc             :   32 bytes
	Grapheme_Extend: 1760 bytes (-14 bytes)
	Lowercase      :  985 bytes
	N              : 1220 bytes (-5 bytes)
	Uppercase      :  934 bytes
	White_Space    :   97 bytes
	Total table sizes: 11131 bytes (-22 bytes)