about summary refs log tree commit diff
path: root/library/std/src/sys_common/wtf8.rs
AgeCommit message (Collapse)AuthorLines
2025-08-20Copy WTF-8 code into core/alloc (for better diffs)ltdk-1083/+0
2025-05-24make `OsString::new` and `PathBuf::new` unstably constcyrgani-1/+1
2025-02-19Skip scanning for surrogates when not known validThalia Archibald-1/+1
2025-02-19Add fast path for displaying pre-validated Wtf8BufThalia Archibald-0/+24
2025-02-19Rollup merge of #137155 - thaliaarchi:wtf8-organize, r=ChrisDentonMatthias Krüger-35/+35
Organize `OsString`/`OsStr` shims Synchronize the `bytes.rs` and `wtf8.rs` shims for `OsString`/`OsStr` so they're easier to diff between each other. This is mostly ordering items the same between the two. I tried to minimize moves and went for the average locations between the files. With them in the same order, it is clear that `FromInner<_>` is not implemented for `bytes::Buf` and `Clone::clone_from` is not implemented for `wtf8::Buf`, but they are for the other. Fix that. I added #[inline] to all inherent methods of the `OsString`/`OsStr` shims, because it seemed that was already the rough pattern. `bytes.rs` has more inlining than `wtf8.rs`, so I added the corresponding ones to `wtf8.rs`. Then, the common missing ones have no discernible pattern to me. They're not divided by non-allocating/allocating. Perhaps the pattern is that UTF-8 validation isn't inlined? Since these types are merely the inner values in `OsStr`/`OsString`, I put inline on all methods and let those public types dictate inlining. I have not inspected codegen or run benchmarks. Also, touch up some (private) documentation comments. r? ``````@ChrisDenton``````
2025-02-16Simplify control flow with while-letThalia Archibald-22/+14
2025-02-16Improve WTF-8 commentsThalia Archibald-15/+23
2025-02-16add MAX_LEN_UTF8 and MAX_LEN_UTF16 constantsHTGAzureX1212-3/+3
2025-01-07Avoid naming variables `str`Josh Triplett-2/+2
This renames variables named `str` to other names, to make sure `str` always refers to a type. It's confusing to read code where `str` (or another standard type name) is used as an identifier. It also produces misleading syntax highlighting.
2024-11-12Make `CloneToUninit` dyn-compatibleZachary S-3/+3
2024-09-25Use `&raw` in the standard libraryJosh Stone-2/+1
Since the stabilization in #127679 has reached stage0, 1.82-beta, we can start using `&raw` freely, and even the soft-deprecated `ptr::addr_of!` and `ptr::addr_of_mut!` can stop allowing the unstable feature. I intentionally did not change any documentation or tests, but the rest of those macro uses are all now using `&raw const` or `&raw mut` in the standard library.
2024-09-22Reformat using the new identifier sorting from rustfmtMichael Goulet-1/+1
2024-07-29Sparkle some attributes over `CloneToUninit` stuffPavel Grigorenko-0/+1
2024-07-29impl CloneToUninit for Path and OsStrPavel Grigorenko-0/+11
2024-07-29Reformat `use` declarations.Nicholas Nethercote-5/+1
The previous commit updated `rustfmt.toml` appropriately. This commit is the outcome of running `x fmt --all` with the new formatting options.
2024-07-14std: Unsafe-wrap in Wtf8 implJubilee Young-4/+10
2024-06-25set self.is_known_utf8 to false in extend_from_sliceash-1/+1
2024-06-25`PathBuf::as_mut_vec` removed and verified for UEFI and Windows platforms ↵ash-6/+6
#126333
2024-06-12Make PathBuf less Ok with adding UTF-16 then `into_string`Jubilee Young-0/+3
2024-06-04impl OsString::leak & PathBuf::leakschvv31n-0/+5
2024-04-26PathBuf: replace transmuting by accessor functionsRalf Jung-0/+6
2024-01-21Move `OsStr::slice_encoded_bytes` validation to platform modulesJan Verbeek-4/+32
On Windows and UEFI this improves performance and error messaging. On other platforms we optimize the fast path a bit more. This also prepares for later relaxing the checks on certain platforms.
2023-08-14std: add some missing repr(transparent)Ralf Jung-0/+2
2023-07-07Allow limited access to `OsString` bytesEd Page-0/+15
This extends #109698 to allow no-cost conversion between `Vec<u8>` and `OsString` as suggested in feedback from `os_str_bytes` crate in #111544.
2023-06-14Rollup merge of #98202 - aticu:impl_tryfrom_osstr_for_str, r=AmanieuMatthias Krüger-7/+2
Implement `TryFrom<&OsStr>` for `&str` Recently when trying to work with `&OsStr` I was surprised to find this `impl` missing. Since the `to_str` method already existed the actual implementation is fairly non-controversial, except for maybe the choice of the error type. I chose an opaque error here instead of something like `std::str::Utf8Error`, since that would already make a number of assumption about the underlying implementation of `OsStr`. As this is a trait implementation, it is insta-stable, if I'm not mistaken? Either way this will need an FCP. I chose "1.64.0" as the version, since this is unlikely to land before the beta cut-off. `@rustbot` modify labels: +T-libs-api API Change Proposal: rust-lang/rust#99031 (accepted)
2023-06-12Implement `TryFrom<&OsStr>` for `&str`aticu-7/+2
2023-03-27Allow access to `OsStr` bytesEd Page-1/+7
`OsStr` has historically kept its implementation details private out of concern for locking us into a specific encoding on Windows. This is an alternative to #95290 which proposed specifying the encoding on Windows. Instead, this only specifies that for cross-platform code, `OsStr`'s encoding is a superset of UTF-8 and defines rules for safely interacting with it At minimum, this can greatly simplify the `os_str_bytes` crate and every arg parser that interacts with `OsStr` directly (which is most of those that support invalid UTF-8).
2023-05-01Inline AsInner implementationsKonrad Borowski-0/+1
2023-03-03Match unmatched backticks in library/est31-1/+1
2023-01-14Use associated items of `char` instead of freestanding items in `core::char`Lukas Markeffsky-3/+3
2022-08-24Auto merge of #96869 - sunfishcode:main, r=joshtriplettbors-17/+75
Optimize `Wtf8Buf::into_string` for the case where it contains UTF-8. Add a `is_known_utf8` flag to `Wtf8Buf`, which tracks whether the string is known to contain UTF-8. This is efficiently computed in many common situations, such as when a `Wtf8Buf` is constructed from a `String` or `&str`, or with `Wtf8Buf::from_wide` which is already doing UTF-16 decoding and already checking for surrogates. This makes `OsString::into_string` O(1) rather than O(N) on Windows in common cases. And, it eliminates the need to scan through the string for surrogates in `Args::next` and `Vars::next`, because the strings are already being translated with `Wtf8Buf::from_wide`. Many things on Windows construct `OsString`s with `Wtf8Buf::from_wide`, such as `DirEntry::file_name` and `fs::read_link`, so with this patch, users of those functions can subsequently call `.into_string()` without paying for an extra scan through the string for surrogates. r? `@ghost`
2022-08-10Guarantee `try_reserve` preserves the contents on errorYOSHIOKA Takuma-1/+2
Update doc comments to make the guarantee explicit. However, some implementations does not have the statement though. * `HashMap`, `HashSet`: require guarantees on hashbrown side. * `PathBuf`: simply redirecting to `OsString`. Fixes #99606.
2022-06-23Don't eagerly scan for `is_known_utf8` in `to_ascii_lowercase`/`uppercase`.Dan Gohman-8/+2
2022-06-23Panic safety.Dan Gohman-7/+7
2022-06-23Optimize `Wtf8Buf::into_string` for the case where it contains UTF-8.Dan Gohman-17/+81
Add a `is_known_utf8` flag to `Wtf8Buf`, which tracks whether the string is known to contain UTF-8. This is efficiently computed in many common situations, such as when a `Wtf8Buf` is constructed from a `String` or `&str`, or with `Wtf8Buf::from_wide` which is already doing UTF-16 decoding and already checking for surrogates. This makes `OsString::into_string` O(1) rather than O(N) on Windows in common cases. And, it eliminates the need to scan through the string for surrogates in `Args::next` and `Vars::next`, because the strings are already being translated with `Wtf8Buf::from_wide`. Many things on Windows construct `OsString`s with `Wtf8Buf::from_wide`, such as `DirEntry::file_name` and `fs::read_link`, so with this patch, users of those functions can subsequently call `.into_string()` without paying for an extra scan through the string for surrogates.
2022-05-09Use Rust 2021 prelude in std itself.Mara Bos-1/+1
2022-04-25Make EncodeWide implement FusedIteratorAron Parker-1/+4
2022-03-10Use implicit capture syntax in format_argsT-O-R-U-S-1/+1
This updates the standard library's documentation to use the new syntax. The documentation is worthwhile to update as it should be more idiomatic (particularly for features like this, which are nice for users to get acquainted with). The general codebase is likely more hassle than benefit to update: it'll hurt git blame, and generally updates can be done by folks updating the code if (and when) that makes things more readable with the new format. A few places in the compiler and library code are updated (mostly just due to already having been done when this commit was first authored).
2021-12-29Address commentsXuanwo-4/+4
Signed-off-by: Xuanwo <github@xuanwo.io>
2021-12-28Implement support in wtf8Xuanwo-0/+37
Signed-off-by: Xuanwo <github@xuanwo.io>
2021-11-21libcore: assume the input of `next_code_point` and `next_code_point_reverse` ↵Eduardo Sánchez Muñoz-1/+2
is UTF-8-like The functions are now `unsafe` and they use `Option::unwrap_unchecked` instead of `unwrap_or_0` `unwrap_or_0` was added in 42357d772b8a3a1ce4395deeac0a5cf1f66e951d. I guess `unwrap_unchecked` was not available back then. Given this example: ```rust pub fn first_char(s: &str) -> Option<char> { s.chars().next() } ``` Previously, the following assembly was produced: ```asm _ZN7example10first_char17ha056ddea6bafad1cE: .cfi_startproc test rsi, rsi je .LBB0_1 movzx edx, byte ptr [rdi] test dl, dl js .LBB0_3 mov eax, edx ret .LBB0_1: mov eax, 1114112 ret .LBB0_3: lea r8, [rdi + rsi] xor eax, eax mov r9, r8 cmp rsi, 1 je .LBB0_5 movzx eax, byte ptr [rdi + 1] add rdi, 2 and eax, 63 mov r9, rdi .LBB0_5: mov ecx, edx and ecx, 31 cmp dl, -33 jbe .LBB0_6 cmp r9, r8 je .LBB0_9 movzx esi, byte ptr [r9] add r9, 1 and esi, 63 shl eax, 6 or eax, esi cmp dl, -16 jb .LBB0_12 .LBB0_13: cmp r9, r8 je .LBB0_14 movzx edx, byte ptr [r9] and edx, 63 jmp .LBB0_16 .LBB0_6: shl ecx, 6 or eax, ecx ret .LBB0_9: xor esi, esi mov r9, r8 shl eax, 6 or eax, esi cmp dl, -16 jae .LBB0_13 .LBB0_12: shl ecx, 12 or eax, ecx ret .LBB0_14: xor edx, edx .LBB0_16: and ecx, 7 shl ecx, 18 shl eax, 6 or eax, ecx or eax, edx ret ``` After this change, the assembly is reduced to: ```asm _ZN7example10first_char17h4318683472f884ccE: .cfi_startproc test rsi, rsi je .LBB0_1 movzx ecx, byte ptr [rdi] test cl, cl js .LBB0_3 mov eax, ecx ret .LBB0_1: mov eax, 1114112 ret .LBB0_3: mov eax, ecx and eax, 31 movzx esi, byte ptr [rdi + 1] and esi, 63 cmp cl, -33 jbe .LBB0_4 movzx edx, byte ptr [rdi + 2] shl esi, 6 and edx, 63 or edx, esi cmp cl, -16 jb .LBB0_7 movzx ecx, byte ptr [rdi + 3] and eax, 7 shl eax, 18 shl edx, 6 and ecx, 63 or ecx, edx or eax, ecx ret .LBB0_4: shl eax, 6 or eax, esi ret .LBB0_7: shl eax, 12 or eax, edx ret ```
2021-10-22docs: Escape brackets to satisfy the linkcheckerNoah Lev-1/+1
My change to use `Type::def_id()` (formerly `Type::def_id_full()`) in more places caused some docs to show up that used to be missed by rustdoc. Those docs contained unescaped square brackets, which triggered linkcheck errors. This commit escapes the square brackets and adds this particular instance to the linkcheck exception list.
2021-08-22Fix typos “an”→“a” and a few different ones that appeared in the ↵Frank Steffahn-1/+1
same search
2021-06-19Account for self.extra in size_hint for EncodeWideDeadbeef-1/+2
2020-08-31std: move "mod tests/benches" to separate filesLzu Tao-404/+3
Also doing fmt inplace as requested.
2020-07-27mv std libs to library/mark-0/+1285