about summary refs log tree commit diff
path: root/compiler/rustc_mir_transform/src/coverage/mod.rs
diff options
context:
space:
mode:
authorAli MJ Al-Nasrawy <alimjalnasrawy@gmail.com>2023-10-11 03:53:16 +0300
committerGitHub <noreply@github.com>2023-10-11 03:53:16 +0300
commit38654ad74198757890bd4679f48b2318b0e84194 (patch)
tree3e17f36b1be26bf18d1cc674a06e119589485696 /compiler/rustc_mir_transform/src/coverage/mod.rs
parentd627cf07ce46d230a93732a4714d16f00df9466b (diff)
parent5facc32e22e8843a8c276305fff4ec84d718e1c0 (diff)
downloadrust-38654ad74198757890bd4679f48b2318b0e84194.tar.gz
rust-38654ad74198757890bd4679f48b2318b0e84194.zip
Rollup merge of #95967 - CAD97:from-utf16, r=dtolnay
Add explicit-endian String::from_utf16 variants

This adds the following APIs under `feature(str_from_utf16_endian)`:

```rust
impl String {
    pub fn from_utf16le(v: &[u8]) -> Result<String, FromUtf16Error>;
    pub fn from_utf16le_lossy(v: &[u8]) -> String;
    pub fn from_utf16be(v: &[u8]) -> Result<String, FromUtf16Error>;
    pub fn from_utf16be_lossy(v: &[u8]) -> String;
}
```

These are versions of `String::from_utf16` that explicitly take [UTF-16LE and UTF-16BE](https://unicode.org/faq/utf_bom.html#gen7). Notably, we can do better than just the obvious `decode_utf16(v.array_chunks::<2>().copied().map(u16::from_le_bytes)).collect()` in that:

- We handle the case where the byte slice is not an even number of bytes, and
- In the case that the UTF-16 is native endian and the slice is aligned, we can forward to `String::from_utf16`.

If the Unicode Consortium actively defines how to handle character replacement when decoding a UTF-16 bytestream with a trailing odd byte, I was unable to find reference. However, the behavior implemented here is fairly self-evidently correct: replace the single errant byte with the replacement character.
Diffstat (limited to 'compiler/rustc_mir_transform/src/coverage/mod.rs')
0 files changed, 0 insertions, 0 deletions