about summary refs log tree commit diff
path: root/src/libcore/unicode/unicode.py
diff options
context:
space:
mode:
authorMazdak Farrokhzad <twingoow@gmail.com>2019-09-05 12:11:04 +0200
committerGitHub <noreply@github.com>2019-09-05 12:11:04 +0200
commita8d4e4f435a3287989ac1b20f86b87f6fca5e09b (patch)
tree69b3a040b81a76b51ee36ce5000249971020c7c4 /src/libcore/unicode/unicode.py
parenta24f636e60a5da57ab641d800ac5952bbde98b65 (diff)
parent206fe8e1c37d55d0bf3a82baaa23eb5fb148880b (diff)
downloadrust-a8d4e4f435a3287989ac1b20f86b87f6fca5e09b.tar.gz
rust-a8d4e4f435a3287989ac1b20f86b87f6fca5e09b.zip
Rollup merge of #62848 - matklad:xid-unicode, r=petrochenkov
Use unicode-xid crate instead of libcore

This PR proposes to remove `char::is_xid_start` and `char::is_xid_continue` functions from `libcore` and use `unicode_xid` crate from crates.io (note that this crate is already present in rust-lang/rust's Cargo.lock).

Reasons to do this:

* removing rustc-binary-specific stuff from libcore
* making sure that, across the ecosystem, there's a single definition of what rust identifier is (`unicode-xid` has almost 10 million downs, as a `proc_macro2` dependency)
* making it easier to share `rustc_lexer` crate with rust-analyzer: no need to `#[cfg]` if we are building as a part of the compiler

Reasons not to do this:

* increased maintenance burden: we'll need to upgrade unicode version both in libcore and in unicode-xid. However, this shouldn't be a too heavy burden: just running `./unicode.py` after new unicode version. I (@matklad) am ready to be a t-compiler side maintainer of unicode-xid. Moreover, given that xid-unicode is an important dependency of syn, *someone* needs to maintain it anyway.
* xid-unicode implementation is significantly slower. It uses a more compact table with binary search, instead of a trie. However, this shouldn't matter in practice, because we have fast-path for ascii anyway, and code size savings is a plus. Moreover, in #59706 not using libcore turned out to be *faster*, presumably beacause checking for whitespace with match is even faster.

<details>

<summary>old description</summary>

Followup to #59706

r? @eddyb

Note that this doesn't actually remove tables from libcore, to avoid conflict with https://github.com/rust-lang/rust/pull/62641.

cc https://github.com/unicode-rs/unicode-xid/pull/11

</details>
Diffstat (limited to 'src/libcore/unicode/unicode.py')
-rwxr-xr-xsrc/libcore/unicode/unicode.py9
1 files changed, 4 insertions, 5 deletions
diff --git a/src/libcore/unicode/unicode.py b/src/libcore/unicode/unicode.py
index 6de5d9e033b..89894f7932d 100755
--- a/src/libcore/unicode/unicode.py
+++ b/src/libcore/unicode/unicode.py
@@ -728,7 +728,7 @@ def generate_property_module(mod, grouped_categories, category_subset):
 
     yield "pub(crate) mod %s {\n" % mod
     for cat in sorted(category_subset):
-        if cat in ("Cc", "White_Space", "Pattern_White_Space"):
+        if cat in ("Cc", "White_Space"):
             generator = generate_small_bool_trie("%s_table" % cat, grouped_categories[cat])
         else:
             generator = generate_bool_trie("%s_table" % cat, grouped_categories[cat])
@@ -841,19 +841,18 @@ def main():
     unicode_data = load_unicode_data(get_path(UnicodeFiles.UNICODE_DATA))
     load_special_casing(get_path(UnicodeFiles.SPECIAL_CASING), unicode_data)
 
-    want_derived = {"XID_Start", "XID_Continue", "Alphabetic", "Lowercase", "Uppercase",
+    want_derived = {"Alphabetic", "Lowercase", "Uppercase",
                     "Cased", "Case_Ignorable", "Grapheme_Extend"}
     derived = load_properties(get_path(UnicodeFiles.DERIVED_CORE_PROPERTIES), want_derived)
 
     props = load_properties(get_path(UnicodeFiles.PROPS),
-                            {"White_Space", "Join_Control", "Noncharacter_Code_Point",
-                             "Pattern_White_Space"})
+                            {"White_Space", "Join_Control", "Noncharacter_Code_Point"})
 
     # Category tables
     for (name, categories, category_subset) in (
             ("general_category", unicode_data.general_categories, ["N", "Cc"]),
             ("derived_property", derived, want_derived),
-            ("property", props, ["White_Space", "Pattern_White_Space"])
+            ("property", props, ["White_Space"])
     ):
         for fragment in generate_property_module(name, categories, category_subset):
             buf.write(fragment)