| author | Dylan DPC <dylan.dpc@gmail.com> | 2021-02-27 21:56:15 +0100 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2021-02-27 21:56:15 +0100 |
| commit | be3d1eb3010b48f5b0512fc83cc029bb321fb3ab (patch) | |
| tree | 6260c72ef61516da8803cb6ca2d1cc56aa34856d /compiler/rustc_parse/src | |
| parent | 94736c434ee154b30e2ec22ec112b79e3f6c5884 (diff) | |
| parent | ed8c68644c9a352f61c3b4591b6fc18653e2ffc2 (diff) | |
| download | rust-be3d1eb3010b48f5b0512fc83cc029bb321fb3ab.tar.gz rust-be3d1eb3010b48f5b0512fc83cc029bb321fb3ab.zip | |
Rollup merge of #81856 - Smittyvb:utf16-warn, r=matthewjasper
Suggest character encoding is incorrect when encountering random null bytes

This adds a note whenever null bytes are seen unexpectedly at the start of a token, since those tend to come from UTF-16 encoded files without a [BOM](https://en.wikipedia.org/wiki/Byte_order_mark) (if a UTF-16 BOM appears, the file won't be valid UTF-8, but without a BOM it can be both valid UTF-16 and valid but garbled UTF-8). This approach was suggested in https://github.com/rust-lang/rust/issues/73979#issuecomment-653976451.

Closes #73979.
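To illustrate the encoding argument above, here is a minimal standalone sketch (not part of this change). It assumes an ASCII-only source file saved as UTF-16LE; the helper name `utf16le_without_bom` is hypothetical and exists only for this example.

```rust
/// Hypothetical helper: encode `text` as UTF-16LE bytes with no BOM.
fn utf16le_without_bom(text: &str) -> Vec<u8> {
    text.encode_utf16()
        .flat_map(|unit| unit.to_le_bytes())
        .collect()
}

fn main() {
    let bytes = utf16le_without_bom("fn main() {}");

    // Without a BOM, ASCII-only UTF-16LE is still valid UTF-8,
    // but every second byte is a null byte.
    let garbled = std::str::from_utf8(&bytes).expect("decodes as UTF-8");
    assert_eq!(garbled.len(), 24);
    assert_eq!(garbled.matches('\0').count(), 12);

    // With a UTF-16LE BOM (0xFF 0xFE) in front, the bytes are no longer
    // valid UTF-8, so the file is rejected before lexing starts.
    let mut with_bom = vec![0xFF, 0xFE];
    with_bom.extend_from_slice(&bytes);
    assert!(std::str::from_utf8(&with_bom).is_err());
}
```

In the BOM-less case the compiler reads the file successfully and the lexer then hits `\0` where it expects the start of a token, which is the situation the new help message targets.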
Diffstat (limited to 'compiler/rustc_parse/src')
| -rw-r--r-- | compiler/rustc_parse/src/lexer/mod.rs | 3 |
1 files changed, 3 insertions, 0 deletions
diff --git a/compiler/rustc_parse/src/lexer/mod.rs b/compiler/rustc_parse/src/lexer/mod.rs
index 4a638ec3f80..4bf870eb7ce 100644
--- a/compiler/rustc_parse/src/lexer/mod.rs
+++ b/compiler/rustc_parse/src/lexer/mod.rs
@@ -268,6 +268,9 @@ impl<'a> StringReader<'a> {
                 // tokens like `<<` from `rustc_lexer`, and then add fancier error recovery to it,
                 // as there will be less overall work to do this way.
                 let token = unicode_chars::check_for_substitution(self, start, c, &mut err);
+                if c == '\x00' {
+                    err.help("source files must contain UTF-8 encoded text, unexpected null bytes might occur when a different encoding is used");
+                }
                 err.emit();
                 token?
             }
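The behaviour of the added branch can be modelled outside the compiler. The sketch below is only an approximation: `unknown_token_notes` is a hypothetical function, the first message line is illustrative wording, and a plain `Vec<String>` stands in for rustc's diagnostic builder.

```rust
/// Hypothetical stand-in for the lexer's error path: collects the lines
/// that would be reported for a character that cannot start a token.
fn unknown_token_notes(c: char) -> Vec<String> {
    // Illustrative primary message; the real wording lives in rustc.
    let mut notes = vec![format!("unknown start of token: {}", c.escape_default())];
    // Mirrors the added check: only a null character triggers the hint.
    if c == '\x00' {
        notes.push(
            "help: source files must contain UTF-8 encoded text, unexpected null bytes \
             might occur when a different encoding is used"
                .to_string(),
        );
    }
    notes
}

fn main() {
    for line in unknown_token_notes('\0') {
        println!("{line}");
    }
}
```

Any other unexpected character produces only the primary message, so the encoding hint appears only when a null byte is involved.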
