about summary refs log tree commit diff
path: root/library/std/src/sys_common/wtf8.rs
AgeCommit message (Collapse)AuthorLines
2025-01-07Avoid naming variables `str`Josh Triplett-2/+2
This renames variables named `str` to other names, to make sure `str` always refers to a type. It's confusing to read code where `str` (or another standard type name) is used as an identifier. It also produces misleading syntax highlighting.
2024-11-12Make `CloneToUninit` dyn-compatibleZachary S-3/+3
2024-09-25Use `&raw` in the standard libraryJosh Stone-2/+1
Since the stabilization in #127679 has reached stage0, 1.82-beta, we can start using `&raw` freely, and even the soft-deprecated `ptr::addr_of!` and `ptr::addr_of_mut!` can stop allowing the unstable feature. I intentionally did not change any documentation or tests, but the rest of those macro uses are all now using `&raw const` or `&raw mut` in the standard library.
2024-09-22Reformat using the new identifier sorting from rustfmtMichael Goulet-1/+1
2024-07-29Sparkle some attributes over `CloneToUninit` stuffPavel Grigorenko-0/+1
2024-07-29impl CloneToUninit for Path and OsStrPavel Grigorenko-0/+11
2024-07-29Reformat `use` declarations.Nicholas Nethercote-5/+1
The previous commit updated `rustfmt.toml` appropriately. This commit is the outcome of running `x fmt --all` with the new formatting options.
2024-07-14std: Unsafe-wrap in Wtf8 implJubilee Young-4/+10
2024-06-25set self.is_known_utf8 to false in extend_from_sliceash-1/+1
2024-06-25`PathBuf::as_mut_vec` removed and verified for UEFI and Windows platforms ↵ash-6/+6
#126333
2024-06-12Make PathBuf less Ok with adding UTF-16 then `into_string`Jubilee Young-0/+3
2024-06-04impl OsString::leak & PathBuf::leakschvv31n-0/+5
2024-04-26PathBuf: replace transmuting by accessor functionsRalf Jung-0/+6
2024-01-21Move `OsStr::slice_encoded_bytes` validation to platform modulesJan Verbeek-4/+32
On Windows and UEFI this improves performance and error messaging. On other platforms we optimize the fast path a bit more. This also prepares for later relaxing the checks on certain platforms.
2023-08-14std: add some missing repr(transparent)Ralf Jung-0/+2
2023-07-07Allow limited access to `OsString` bytesEd Page-0/+15
This extends #109698 to allow no-cost conversion between `Vec<u8>` and `OsString` as suggested in feedback from `os_str_bytes` crate in #111544.
2023-06-14Rollup merge of #98202 - aticu:impl_tryfrom_osstr_for_str, r=AmanieuMatthias Krüger-7/+2
Implement `TryFrom<&OsStr>` for `&str` Recently when trying to work with `&OsStr` I was surprised to find this `impl` missing. Since the `to_str` method already existed the actual implementation is fairly non-controversial, except for maybe the choice of the error type. I chose an opaque error here instead of something like `std::str::Utf8Error`, since that would already make a number of assumption about the underlying implementation of `OsStr`. As this is a trait implementation, it is insta-stable, if I'm not mistaken? Either way this will need an FCP. I chose "1.64.0" as the version, since this is unlikely to land before the beta cut-off. `@rustbot` modify labels: +T-libs-api API Change Proposal: rust-lang/rust#99031 (accepted)
2023-06-12Implement `TryFrom<&OsStr>` for `&str`aticu-7/+2
2023-03-27Allow access to `OsStr` bytesEd Page-1/+7
`OsStr` has historically kept its implementation details private out of concern for locking us into a specific encoding on Windows. This is an alternative to #95290 which proposed specifying the encoding on Windows. Instead, this only specifies that for cross-platform code, `OsStr`'s encoding is a superset of UTF-8 and defines rules for safely interacting with it At minimum, this can greatly simplify the `os_str_bytes` crate and every arg parser that interacts with `OsStr` directly (which is most of those that support invalid UTF-8).
2023-05-01Inline AsInner implementationsKonrad Borowski-0/+1
2023-03-03Match unmatched backticks in library/est31-1/+1
2023-01-14Use associated items of `char` instead of freestanding items in `core::char`Lukas Markeffsky-3/+3
2022-08-24Auto merge of #96869 - sunfishcode:main, r=joshtriplettbors-17/+75
Optimize `Wtf8Buf::into_string` for the case where it contains UTF-8. Add a `is_known_utf8` flag to `Wtf8Buf`, which tracks whether the string is known to contain UTF-8. This is efficiently computed in many common situations, such as when a `Wtf8Buf` is constructed from a `String` or `&str`, or with `Wtf8Buf::from_wide` which is already doing UTF-16 decoding and already checking for surrogates. This makes `OsString::into_string` O(1) rather than O(N) on Windows in common cases. And, it eliminates the need to scan through the string for surrogates in `Args::next` and `Vars::next`, because the strings are already being translated with `Wtf8Buf::from_wide`. Many things on Windows construct `OsString`s with `Wtf8Buf::from_wide`, such as `DirEntry::file_name` and `fs::read_link`, so with this patch, users of those functions can subsequently call `.into_string()` without paying for an extra scan through the string for surrogates. r? `@ghost`
2022-08-10Guarantee `try_reserve` preserves the contents on errorYOSHIOKA Takuma-1/+2
Update doc comments to make the guarantee explicit. However, some implementations does not have the statement though. * `HashMap`, `HashSet`: require guarantees on hashbrown side. * `PathBuf`: simply redirecting to `OsString`. Fixes #99606.
2022-06-23Don't eagerly scan for `is_known_utf8` in `to_ascii_lowercase`/`uppercase`.Dan Gohman-8/+2
2022-06-23Panic safety.Dan Gohman-7/+7
2022-06-23Optimize `Wtf8Buf::into_string` for the case where it contains UTF-8.Dan Gohman-17/+81
Add a `is_known_utf8` flag to `Wtf8Buf`, which tracks whether the string is known to contain UTF-8. This is efficiently computed in many common situations, such as when a `Wtf8Buf` is constructed from a `String` or `&str`, or with `Wtf8Buf::from_wide` which is already doing UTF-16 decoding and already checking for surrogates. This makes `OsString::into_string` O(1) rather than O(N) on Windows in common cases. And, it eliminates the need to scan through the string for surrogates in `Args::next` and `Vars::next`, because the strings are already being translated with `Wtf8Buf::from_wide`. Many things on Windows construct `OsString`s with `Wtf8Buf::from_wide`, such as `DirEntry::file_name` and `fs::read_link`, so with this patch, users of those functions can subsequently call `.into_string()` without paying for an extra scan through the string for surrogates.
2022-05-09Use Rust 2021 prelude in std itself.Mara Bos-1/+1
2022-04-25Make EncodeWide implement FusedIteratorAron Parker-1/+4
2022-03-10Use implicit capture syntax in format_argsT-O-R-U-S-1/+1
This updates the standard library's documentation to use the new syntax. The documentation is worthwhile to update as it should be more idiomatic (particularly for features like this, which are nice for users to get acquainted with). The general codebase is likely more hassle than benefit to update: it'll hurt git blame, and generally updates can be done by folks updating the code if (and when) that makes things more readable with the new format. A few places in the compiler and library code are updated (mostly just due to already having been done when this commit was first authored).
2021-12-29Address commentsXuanwo-4/+4
Signed-off-by: Xuanwo <github@xuanwo.io>
2021-12-28Implement support in wtf8Xuanwo-0/+37
Signed-off-by: Xuanwo <github@xuanwo.io>
2021-11-21libcore: assume the input of `next_code_point` and `next_code_point_reverse` ↵Eduardo Sánchez Muñoz-1/+2
is UTF-8-like The functions are now `unsafe` and they use `Option::unwrap_unchecked` instead of `unwrap_or_0` `unwrap_or_0` was added in 42357d772b8a3a1ce4395deeac0a5cf1f66e951d. I guess `unwrap_unchecked` was not available back then. Given this example: ```rust pub fn first_char(s: &str) -> Option<char> { s.chars().next() } ``` Previously, the following assembly was produced: ```asm _ZN7example10first_char17ha056ddea6bafad1cE: .cfi_startproc test rsi, rsi je .LBB0_1 movzx edx, byte ptr [rdi] test dl, dl js .LBB0_3 mov eax, edx ret .LBB0_1: mov eax, 1114112 ret .LBB0_3: lea r8, [rdi + rsi] xor eax, eax mov r9, r8 cmp rsi, 1 je .LBB0_5 movzx eax, byte ptr [rdi + 1] add rdi, 2 and eax, 63 mov r9, rdi .LBB0_5: mov ecx, edx and ecx, 31 cmp dl, -33 jbe .LBB0_6 cmp r9, r8 je .LBB0_9 movzx esi, byte ptr [r9] add r9, 1 and esi, 63 shl eax, 6 or eax, esi cmp dl, -16 jb .LBB0_12 .LBB0_13: cmp r9, r8 je .LBB0_14 movzx edx, byte ptr [r9] and edx, 63 jmp .LBB0_16 .LBB0_6: shl ecx, 6 or eax, ecx ret .LBB0_9: xor esi, esi mov r9, r8 shl eax, 6 or eax, esi cmp dl, -16 jae .LBB0_13 .LBB0_12: shl ecx, 12 or eax, ecx ret .LBB0_14: xor edx, edx .LBB0_16: and ecx, 7 shl ecx, 18 shl eax, 6 or eax, ecx or eax, edx ret ``` After this change, the assembly is reduced to: ```asm _ZN7example10first_char17h4318683472f884ccE: .cfi_startproc test rsi, rsi je .LBB0_1 movzx ecx, byte ptr [rdi] test cl, cl js .LBB0_3 mov eax, ecx ret .LBB0_1: mov eax, 1114112 ret .LBB0_3: mov eax, ecx and eax, 31 movzx esi, byte ptr [rdi + 1] and esi, 63 cmp cl, -33 jbe .LBB0_4 movzx edx, byte ptr [rdi + 2] shl esi, 6 and edx, 63 or edx, esi cmp cl, -16 jb .LBB0_7 movzx ecx, byte ptr [rdi + 3] and eax, 7 shl eax, 18 shl edx, 6 and ecx, 63 or ecx, edx or eax, ecx ret .LBB0_4: shl eax, 6 or eax, esi ret .LBB0_7: shl eax, 12 or eax, edx ret ```
2021-10-22docs: Escape brackets to satisfy the linkcheckerNoah Lev-1/+1
My change to use `Type::def_id()` (formerly `Type::def_id_full()`) in more places caused some docs to show up that used to be missed by rustdoc. Those docs contained unescaped square brackets, which triggered linkcheck errors. This commit escapes the square brackets and adds this particular instance to the linkcheck exception list.
2021-08-22Fix typos “an”→“a” and a few different ones that appeared in the ↵Frank Steffahn-1/+1
same search
2021-06-19Account for self.extra in size_hint for EncodeWideDeadbeef-1/+2
2020-08-31std: move "mod tests/benches" to separate filesLzu Tao-404/+3
Also doing fmt inplace as requested.
2020-07-27mv std libs to library/mark-0/+1285