path: root/src/tools/unicode-table-generator
Age | Commit message | Author | Lines
2025-08-05 | fix(unicode-table-generator): fix duplicated unique indices | Marco Cavenati | -1/+1
`unicode-table-generator` panicked while populating `distinct_indices` because of duplicated indices. This was introduced by swapping the order of `canonical_words.push(...)` and `canonical_words.len()`.
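Why the order matters is easiest to see in an interning loop. This is a minimal sketch, assuming words are deduplicated through a `HashMap`; `intern` and its parameters are hypothetical and not the tool's actual code. Reading `len()` before `push` yields the index of the element about to be stored; with the two statements swapped, every index in this sketch would point one past its entry.

```rust
use std::collections::HashMap;

/// Hypothetical interner: returns a stable index for each distinct word.
fn intern(words: &mut Vec<u64>, seen: &mut HashMap<u64, usize>, word: u64) -> usize {
    if let Some(&idx) = seen.get(&word) {
        // Already interned: reuse the existing index.
        return idx;
    }
    // Read the length *before* pushing: it is the index the new word will occupy.
    let idx = words.len();
    words.push(word);
    seen.insert(word, idx);
    idx
}
```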
2025-07-18 | unicode-table-gen: more clippy fixes | Marijn Schouten | -8/+8
2025-07-18 | unicode-table-gen: edition 2024 | Marijn Schouten | -2/+2
2025-07-18 | unicode-table-gen: clippy fixes | Marijn Schouten | -35/+28
2025-07-10 | Remove unnecessary parens in closure body with unused lint | yukang | -1/+1
2025-03-08 | Remove unneeded parentheses. | Markus Reiter | -1/+1
2025-03-08 | Fix formatting. | Markus Reiter | -34/+7
2025-03-07 | Use `intrinsics::assume` instead of `hint::assert_unchecked`. | Markus Reiter | -2/+8
2025-03-07 | Never inline `lookup_slow`. | Markus Reiter | -0/+2
2025-03-06 | Add second precondition for `skip_search`. | Markus Reiter | -28/+89
2025-03-06 | Allow optimizing out `panic_bounds_check` in Unicode checks. | Markus Reiter | -14/+31
2025-02-08 | Rustfmt | bjorn3 | -21/+27
2024-11-27 | update cfgs | Boxy | -5/+0
2024-11-12 | stabilize const_unicode_case_lookup | Ralf Jung | -0/+5
2024-11-06 | Auto merge of #132500 - RalfJung:char-is-whitespace-const, r=jhpratt | bors | -1/+1
make char::is_whitespace unstably const. I am adding this to the existing https://github.com/rust-lang/rust/issues/132241 feature gate, since `is_digit` and `is_whitespace` seem similar enough to be grouped together.
2024-11-03 | Rollup merge of #132499 - RalfJung:unicode_data.rs, r=tgross35 | Matthias Krüger | -1/+1
unicode_data.rs: show command for generating file. https://github.com/rust-lang/rust/pull/131647 made this an easily runnable tool; now we just have to mention that in the comment. :) Fixes https://github.com/rust-lang/rust/issues/131640.
2024-11-02 | make char::is_whitespace unstably const | Ralf Jung | -1/+1
2024-11-02 | unicode_data.rs: show command for generating file | Ralf Jung | -1/+1
2024-11-02 | get rid of a whole bunch of unnecessary rustc_const_unstable attributes | Ralf Jung | -6/+0
2024-10-20 | Rollup merge of #131647 - jieyouxu:unicode-table-generator, r=Mark-Simulacrum | Matthias Krüger | -5/+3
Register `src/tools/unicode-table-generator` as a runnable tool. It seems like `src/tools/unicode-table-generator` is not currently managed by bootstrap. This PR wires it up with bootstrap as a runnable tool. This tool seems to take two possible args: 1. (mandatory) the path to `library/core/src/unicode/unicode_data.rs`, and 2. (optional) a path at which to generate a test file. I only passed the mandatory path to `unicode_data.rs` in bootstrap and didn't do anything about (2), since I'm not sure how this tool is supposed to be run. `Cargo.lock` is modified because I renamed `unicode-table-generator`'s bin name to match the tool name, as bootstrap's tool-running logic expects the bin name to be derived from the tool name. I also added a triagebot message reminding contributors not to edit the library source file manually, but to edit the tool and regenerate instead; this should probably be a tidy check (if that's desirable, it can be done in a follow-up PR, though it may be overkill). Helps with #131640 but does not close it, because there are still no docs. r? `@Mark-Simulacrum` (since I think you authored this tool?)
2024-10-13 | unicode-table-generator: sync comments | 许杰友 Jieyou Xu (Joe) | -4/+2
These comments were updated on master but not through this tool, so the comments in the tool became outdated. Sync the comments to stay consistent.
2024-10-13 | unicode-table-generator: match bin name with tool name | 许杰友 Jieyou Xu (Joe) | -1/+1
Bootstrap assumes that the binary name is the same as the tool name; matching them just makes everyone's lives easier.
2024-10-13 | switch unicode-data back to 'static' | Ralf Jung | -4/+4
2024-09-22 | Reformat using the new identifier sorting from rustfmt | Michael Goulet | -29/+23
2024-07-29 | Reformat `use` declarations. | Nicholas Nethercote | -11/+15
The previous commit updated `rustfmt.toml` appropriately. This commit is the outcome of running `x fmt --all` with the new formatting options.
2024-04-20 | Add a lower bound check to `unicode-table-generator` output | Arpad Borsos | -3/+27
This adds a dedicated check for the lower bound (when it lies outside the ASCII range) to the output of the `unicode-table-generator` tool. This generalizes the ASCII-only fast path, but only for the `Grapheme_Extend` property for now, as that is the only one with a lower bound outside of ASCII.
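A minimal sketch of the idea, assuming a property stored as sorted, non-overlapping half-open ranges; `RANGES`, `LOWER_BOUND`, and `lookup` are hypothetical names, and the ranges are illustrative rather than the real `Grapheme_Extend` table. Any code point below the first range is rejected before the binary search, which subsumes an ASCII-only check whenever the table starts above U+007F.

```rust
use std::cmp::Ordering;

/// Illustrative sorted, non-overlapping half-open ranges `[lo, hi)` (not the real table).
const RANGES: &[(u32, u32)] = &[(0x0300, 0x0370), (0x0483, 0x048A), (0x0591, 0x05BE)];

/// Start of the first range; nothing below it can match.
const LOWER_BOUND: u32 = 0x0300;

fn lookup(c: char) -> bool {
    let cp = c as u32;
    // Fast path: generalizes "is ASCII" to "is below the table's first entry".
    if cp < LOWER_BOUND {
        return false;
    }
    // Slow path: binary search for a range containing `cp`.
    RANGES
        .binary_search_by(|&(lo, hi)| {
            if hi <= cp {
                Ordering::Less // range lies entirely before `cp`
            } else if lo > cp {
                Ordering::Greater // range lies entirely after `cp`
            } else {
                Ordering::Equal // `cp` falls inside this range
            }
        })
        .is_ok()
}
```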
2023-04-12 | remove some unneeded imports | KaDiWa | -2/+0
2023-03-21 | Use hex literal for INDEX_MASK | Martin Gammelsæter | -1/+1
2023-03-16 | Improve case mapping encoding scheme | Martin Gammelsæter | -49/+54
The indices are encoded as `u32`s in the range of invalid `char`s, so that we know that if any mapping fails to parse as a `char` we should use the value for lookup in the multi-table. This avoids the second binary search in cases where a multi-`char` mapping is needed. Idea from @nikic
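A sketch of the single-search scheme, assuming a tag bit of `0x40_0000` (bit 22, so every tagged value exceeds `char::MAX` and can never be a valid `char`); the table contents and function name are hypothetical. `char::from_u32` returns `None` for any value that is not a valid `char`, and that `None` is what signals that the stored value is an index into the multi-character table.

```rust
/// Assumed tag bit: any value with bit 22 set is above 0x10FFFF, so never a valid `char`.
const INDEX_MASK: u32 = 0x40_0000;

/// Hypothetical table, sorted by key: value is a `char` scalar or `INDEX_MASK | index`.
const LOWER: &[(char, u32)] = &[
    ('À', 'à' as u32),
    ('İ', INDEX_MASK | 0), // index 0 into LOWER_MULTI
    ('Ω', 'ω' as u32),
];

/// Hypothetical multi-character mappings; unused slots are '\0'.
const LOWER_MULTI: &[[char; 3]] = &[['i', '\u{0307}', '\0']];

fn to_lower(c: char) -> [char; 3] {
    match LOWER.binary_search_by_key(&c, |&(k, _)| k) {
        Ok(i) => {
            let v = LOWER[i].1;
            match char::from_u32(v) {
                // Valid scalar value: a single-character mapping.
                Some(lower) => [lower, '\0', '\0'],
                // Invalid `char`: strip the tag and index the multi table directly.
                None => LOWER_MULTI[(v & !INDEX_MASK) as usize],
            }
        }
        // Not in the table: the character maps to itself.
        Err(_) => [c, '\0', '\0'],
    }
}
```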
2023-03-16 | Split unicode case LUTs in single and multi variants | Martin Gammelsæter | -13/+45
The majority of char case replacements are single-char replacements, so storing them as `[char; 3]` wastes a lot of space. This commit splits the replacement tables for both `to_lower` and `to_upper` into two separate tables, one with single-character mappings and one with multi-character mappings. This reduces the binary size of programs using all of these tables by roughly 24K bytes.
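A sketch of the split layout, assuming sorted `(char, char)` and `(char, [char; 3])` tables searched one after the other; the entries are a hypothetical excerpt, not the generated data. In this sketch a single-character entry occupies 8 bytes instead of the 16 bytes a `[char; 3]` entry needs, which is where the savings come from.

```rust
/// Hypothetical excerpt of the single-character table, sorted by key.
const LOWER_SINGLE: &[(char, char)] = &[('À', 'à'), ('Á', 'á'), ('Ω', 'ω')];

/// Hypothetical excerpt of the multi-character table, sorted by key; unused slots are '\0'.
const LOWER_MULTI: &[(char, [char; 3])] = &[('İ', ['i', '\u{0307}', '\0'])];

fn to_lower(c: char) -> [char; 3] {
    // Most mappings are single characters, so search the compact table first.
    if let Ok(i) = LOWER_SINGLE.binary_search_by_key(&c, |&(k, _)| k) {
        return [LOWER_SINGLE[i].1, '\0', '\0'];
    }
    // Fall back to the (much smaller) multi-character table.
    if let Ok(i) = LOWER_MULTI.binary_search_by_key(&c, |&(k, _)| k) {
        return LOWER_MULTI[i].1;
    }
    [c, '\0', '\0']
}
```

The cost of this layout is a second binary search for characters that have a multi-character mapping.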
2023-03-15 | Skip serializing ascii chars in case LUTs | Martin Gammelsæter | -14/+11
Since ASCII chars are already handled by a special case in the `to_lower` and `to_upper` functions, there's no need to waste space on them in the LUTs.
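A sketch of both halves, assuming the generator filters its pairs before emitting the table and the runtime checks `is_ascii` first; `table_entries` and `to_lower` are hypothetical names. Because ASCII input never reaches the table, dropping those entries cannot change any result.

```rust
/// Generator side (hypothetical): keep only pairs whose source is non-ASCII.
fn table_entries(pairs: &[(char, char)]) -> Vec<(char, char)> {
    pairs.iter().copied().filter(|&(from, _)| !from.is_ascii()).collect()
}

/// Runtime side (hypothetical): ASCII is handled before the table is consulted.
fn to_lower(c: char, table: &[(char, char)]) -> char {
    if c.is_ascii() {
        return c.to_ascii_lowercase();
    }
    table
        .binary_search_by_key(&c, |&(k, _)| k)
        .map(|i| table[i].1)
        .unwrap_or(c) // no entry: the character maps to itself
}
```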
2022-09-04 | Address feedback from PR #101401 | Sage Mitchell | -4/+8
2022-09-04 | Make `char::is_lowercase` and `char::is_uppercase` const | Sage Mitchell | -10/+16
Implements #101400.
2022-08-28 | Auto merge of #100497 - kadiwa4:remove_clone_into_iter, r=cjgillot | bors | -6/+2
Avoid cloning a collection only to iterate over it. `@rustbot` label: +C-cleanup
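The pattern being removed, as a minimal sketch with a hypothetical `total_len` function: iterating a borrowed collection visits the same elements without allocating a copy first.

```rust
/// Hypothetical example: sums string lengths without cloning the collection.
fn total_len(words: &[String]) -> usize {
    // Before: `for w in words.to_vec().into_iter()` copies every String just to read it.
    // After: iterate by reference; no allocation and no per-element clone.
    let mut total = 0;
    for w in words {
        total += w.len();
    }
    total
}
```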
2022-08-27 | Rollup merge of #100924 - est31:closure_to_fn_ptr, r=Mark-Simulacrum | Yuki Okushi | -17/+16
Smaller improvements of tidy and the unicode generator
2022-08-23 | Change hint to correct path | est31 | -1/+1
2022-08-23 | Simplify unicode_downloads.rs | est31 | -16/+15
Reduce duplication by moving fetching logic into a dedicated function.
2022-08-13 | avoid cloning and then iterating | KaDiWa | -6/+2
2022-07-20 | add #inline | Bruce A. MacNaughton | -0/+1
2022-07-19 | formatted | Bruce A. MacNaughton | -34/+20
2022-07-19 | working updates | Bruce A. MacNaughton | -2/+108
2022-03-10 | Use implicit capture syntax in format_args | T-O-R-U-S | -8/+7
This updates the standard library's documentation to use the new syntax. The documentation is worthwhile to update as it should be more idiomatic (particularly for features like this, which are nice for users to get acquainted with). The general codebase is likely more hassle than benefit to update: it'll hurt git blame, and generally updates can be done by folks updating the code if (and when) that makes things more readable with the new format. A few places in the compiler and library code are updated (mostly just due to already having been done when this commit was first authored).
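For reference, implicit capture (stable since Rust 1.58) lets a format string name a variable in scope directly; `describe` is a hypothetical example. Only bare identifiers can be captured, so an expression such as `self.len` still needs an explicit argument.

```rust
/// Hypothetical example of the two equivalent spellings.
fn describe(name: &str, count: usize) -> String {
    // Positional form: format!("{} has {} entries", name, count)
    // Implicit capture: identifiers in scope are named inside the braces.
    format!("{name} has {count} entries")
}
```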
2021-10-06 | Let unicode-table-generator fail gracefully for bitsets | Josh Stone | -4/+6
The "Alphabetic" property in Unicode 14 grew too big for the bitset representation, panicking "cannot pack 264 into 8 bits". However, we were already choosing the skiplist for that anyway, so this doesn't need to be a hard failure. That panic is now a returned `Err`, and then in `emit_codepoints` we automatically defer to skiplist.
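A minimal sketch of the fallback, assuming word indices must fit in a `u8`; `pack_index`, `Repr`, and `choose_repr` are hypothetical. The commit message indicates the tool already chooses between the two encodings; this sketch only shows how turning the panic into an `Err` lets the caller pick the skiplist instead of aborting.

```rust
/// Hypothetical packing step: an index must fit in 8 bits.
fn pack_index(idx: usize) -> Result<u8, String> {
    if idx <= u8::MAX as usize {
        Ok(idx as u8)
    } else {
        Err(format!("cannot pack {idx} into 8 bits"))
    }
}

#[derive(Debug, PartialEq)]
enum Repr {
    Bitset(Vec<u8>),
    Skiplist,
}

fn choose_repr(indices: &[usize]) -> Repr {
    // Collecting into `Result` stops at the first `Err`.
    let packed: Result<Vec<u8>, String> = indices.iter().map(|&i| pack_index(i)).collect();
    match packed {
        Ok(bytes) => Repr::Bitset(bytes),
        // Formerly a panic; now the error just selects the other encoding.
        Err(_) => Repr::Skiplist,
    }
}
```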
2021-10-06 | Redo #81358 in unicode-table-generator | Josh Stone | -7/+15
2021-09-20 | Migrate to 2021 | Mark Rousskov | -1/+1
2021-07-29 | rfc3052: Remove authors field from Cargo manifests | Jade | -1/+0
Since RFC 3052 soft deprecated the authors field anyway, hiding it from crates.io, docs.rs, and making Cargo not add it by default, and it is not generally up to date/useful information, we should remove it from crates in this repo.
2020-08-24 | unicode_table_generator: fix clippy::writeln_empty_string, clippy::useless_format, clippy::for_kv_map | Matthias Krüger | -6/+6
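What each of the three lints asks for, shown in a hypothetical `render` function (not the generator's code):

```rust
use std::collections::BTreeMap;
use std::fmt::Write;

/// Hypothetical example: writes each key of `map` on its own line.
fn render(map: &BTreeMap<&str, u32>) -> String {
    let mut out = String::new();
    // clippy::for_kv_map: iterate `map.keys()` instead of `for (key, _) in map`.
    for key in map.keys() {
        // clippy::useless_format: `key.to_string()` instead of `format!("{}", key)`.
        out.push_str(&key.to_string());
        // clippy::writeln_empty_string: `writeln!(out)` instead of `writeln!(out, "")`.
        writeln!(out).unwrap();
    }
    out
}
```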
2020-08-06 | Fix typo "biset" -> "bitset" | Izzy Swart | -1/+1
2020-07-27 | mv std libs to library/ | mark | -1/+1
2020-06-10 | Migrate to numeric associated consts | Lzu Tao | -1/+1