|
Reading a file into an empty vector or string buffer can incur
unnecessary `read` syscalls and memory re-allocations as the buffer
"warms up" and grows to its final size. This is perhaps a necessary evil
with generic readers, but files can be read in smarter by checking the
file size and reserving that much capacity.
`std::fs::read` and `read_to_string` already perform this optimization:
they open the file, reads its metadata, and call `with_capacity` with
the file size. This ensures that the buffer does not need to be resized
and an initial string of small `read` syscalls.
However, if a user opens the `File` themselves and calls
`file.read_to_end` or `file.read_to_string` they do not get this
optimization.
```rust
let mut buf = Vec::new();
file.read_to_end(&mut buf)?;
```
I searched through this project's codebase and even here are a *lot* of
examples of this. They're found all over in unit tests, which isn't a
big deal, but there are also several real instances in the compiler and
in Cargo. I've documented the ones I found in a comment here:
https://github.com/rust-lang/rust/issues/89516#issuecomment-934423999
Most telling, the `Read` trait and the `read_to_end` method both show
this exact pattern as examples of how to use readers. What this says to
me is that this shouldn't be solved by simply fixing the instances of it
in this codebase. If it's here it's certain to be prevalent in the wider
Rust ecosystem.
To that end, this commit adds specializations of `read_to_end` and
`read_to_string` directly on `File`. This way it's no longer a minor
footgun to start with an empty buffer when reading a file in.
A nice side effect of this change is that code that accesses a `File` as
a bare `Read` constraint or via a `dyn Read` trait object will benefit.
For example, this code from `compiler/rustc_serialize/src/json.rs`:
```rust
pub fn from_reader(rdr: &mut dyn Read) -> Result<Json, BuilderError> {
let mut contents = Vec::new();
match rdr.read_to_end(&mut contents) {
```
Related changes:
- I also added specializations to `BufReader` to delegate to
`self.inner`'s methods. That way it can call `File`'s optimized
implementations if the inner reader is a file.
- The private `std::io::append_to_string` function is now marked
`unsafe`.
- `File::read_to_string` being more efficient means that the performance
note for `io::read_to_string` can be softened. I've added @camelid's
suggested wording from:
https://github.com/rust-lang/rust/issues/80218#issuecomment-936806502
|
|
----------
Fix spacing for links inside code blocks, and improve link tooltips in alloc::fmt
----------
Fix spacing for links inside code blocks, and improve link tooltips in alloc::{rc, sync}
----------
Fix spacing for links inside code blocks, and improve link tooltips in alloc::string
----------
Fix spacing for links inside code blocks in alloc::vec
----------
Fix spacing for links inside code blocks in core::option
----------
Fix spacing for links inside code blocks, and improve a few link tooltips in core::result
----------
Fix spacing for links inside code blocks in core::{iter::{self, iterator}, stream::stream, poll}
----------
Fix spacing for links inside code blocks, and improve a few link tooltips in std::{fs, path}
----------
Fix spacing for links inside code blocks in std::{collections, time}
----------
Fix spacing for links inside code blocks in and make formatting of `&str`-like types consistent in std::ffi::{c_str, os_str}
----------
Fix spacing for links inside code blocks, and improve link tooltips in std::ffi
----------
Fix spacing for links inside code blocks, and improve a few link tooltips
in std::{io::{self, buffered::{bufreader, bufwriter}, cursor, util}, net::{self, addr}}
----------
Fix typo in link to `into` for `OsString` docs
----------
Remove tooltips that will probably become redundant in the future
----------
Apply suggestions from code review
Replacing `…std/primitive.reference.html` paths with just `reference`
Co-authored-by: Joshua Nelson <github@jyn.dev>
----------
Also replace `…std/primitive.reference.html` paths with just `reference` in `core::pin`
|