Add Component examples
r? @frewsxcv

Add missing examples for IpAddr enum
r? @frewsxcv

Add some of UdpSocket's missing URLs and examples
r? @frewsxcv

Add missing examples for Ipv6Addr
r? @steveklabnik
cc @frewsxcv

Fuchsia support for std::process via liblaunchpad.
Now we can launch processes on Fuchsia via the Rust standard library! ... Mostly.
Right now, ~5% of the time, reading the stdout/stderr off the pipes will fail. Some Magenta kernel people think it's probably a bug in Magenta's pipes. I wrote a unit test that demonstrates the issue in C, which I was told will expedite a fix. https://fuchsia-review.googlesource.com/#/c/15628/
Hopefully this can get merged once the issue is fixed :)
@raphlinus

Document that Process::command will search the PATH

Add small-copy optimization for copy_from_slice
## Summary
During benchmarking, I found that one of my programs spent between 5 and 10 percent of the time doing memmoves. Ultimately I tracked these down to single-byte slices being copied with a memcopy. Doing a manual copy if the slice contains only one element can speed things up significantly. For my program, this reduced the running time by 20%.
## Background
I am optimizing a program that relies heavily on reading a single byte at a time. To avoid IO overhead, I read all data into a vector once, and then I use a `Cursor` around that vector to read from. During profiling, I noticed that `__memmove_avx_unaligned_erms` was hot, taking up 7.3% of the running time. It turns out that these were caused by calls to `Cursor::read()`, which calls `<&[u8] as Read>::read()`, which calls `&[T]::copy_from_slice()`, which calls `ptr::copy_nonoverlapping()`. This one is implemented as a memcopy. Copying a single byte with a memcopy is very wasteful, because (at least on my platform) it involves calling `memcpy` in libc. This is an indirect call when libc is linked dynamically, and furthermore `memcpy` is optimized for copying large amounts of data at the cost of a bit of overhead for small copies.
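In essence (a sketch of the idea, not the actual patch; the helper name here is made up for illustration), the optimization replaces the memcpy path with a direct assignment when exactly one element is copied:

```rust
// Hypothetical helper showing the special case: for a single element,
// a plain assignment compiles to one load and one store, with no call
// into libc's memcpy.
fn copy_small_or_large<T: Copy>(dst: &mut [T], src: &[T]) {
    assert_eq!(dst.len(), src.len(), "length mismatch");
    if dst.len() == 1 {
        dst[0] = src[0];
    } else {
        dst.copy_from_slice(src);
    }
}
```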
## Benchmarks
Before I made this change, `perf` reported the following for my program. I only included the relevant functions, and how they rank. (This is on a different machine than where I ran the original benchmarks. It has an older CPU, so `__memmove_sse2_unaligned_erms` is called instead of `__memmove_avx_unaligned_erms`.)
```
#3 5.47% bench_decode libc-2.24.so [.] __memmove_sse2_unaligned_erms
#5 1.67% bench_decode libc-2.24.so [.] memcpy@GLIBC_2.2.5
#6 1.51% bench_decode bench_decode [.] memcpy@plt
```
`memcpy` is eating up 8.65% of the total running time, and the overhead of dispatching to a specialized fast copy function (`memcpy@GLIBC` showing up) is clearly visible. The price of dynamic linking (`memcpy@plt` showing up) is visible too.
After this change, this is what `perf` reports:
```
#5 0.33% bench_decode libc-2.24.so [.] __memmove_sse2_unaligned_erms
#14 0.01% bench_decode libc-2.24.so [.] memcpy@GLIBC_2.2.5
```
Now only 0.34% of the running time is spent on memcopies. The dynamic linking overhead is not significant at all any more.
To add some more data, my program generates timing results for the operation in its main loop. These are the timings before and after the change:
| Time before | Time after | After/Before |
|---------------|---------------|--------------|
| 29.8 ± 0.8 ns | 23.6 ± 0.5 ns | 0.79 ± 0.03 |
The time is basically the total running time divided by a constant; the actual numbers are not important. This change reduced the total running time by 21% (much more than the original 9% spent on memmoves, likely because the CPU stalls far less when data dependencies are more transparent). Of course YMMV and for most programs this will not matter at all. But when it does, the gains can be significant!
## Alternatives
* At first I implemented this in `io::Cursor`. I moved it to `&[T]::copy_from_slice()` instead, but this might be too intrusive, especially because it applies to all `T`, not just `u8`. To restrict this to `io::Read`, `<&[u8] as Read>::read()` is probably the best place.
* I tried copying bytes in a loop up to 64 or 8 bytes before calling `Read::read`, but both resulted in about a 20% slowdown instead of a speedup.

Add creation_flags methods. Fixes #37827
This adds a CommandExt trait for Windows along with an implementation of it
for std::process::Command with methods to set the process creation flags that
are passed to CreateProcess.
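For illustration, a minimal usage sketch of the new method (assuming the `creation_flags` name added by this PR; `CREATE_NO_WINDOW` is a Windows API constant, not part of std):

```rust
#[cfg(windows)]
fn spawn_without_console() -> std::io::Result<std::process::Child> {
    use std::os::windows::process::CommandExt;
    use std::process::Command;

    // CREATE_NO_WINDOW, from the Windows API: the child process is
    // created without a console window.
    const CREATE_NO_WINDOW: u32 = 0x0800_0000;

    Command::new("cmd")
        .args(&["/C", "echo hello"])
        .creation_flags(CREATE_NO_WINDOW)
        .spawn()
}
```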
Destroy launchpads and close handles in Drop impls rather than manually

Now that we've got a beta build, let's use it!

Based on the discussion in https://github.com/rust-lang/rust/pull/37573,
it is likely better to keep this limited to std::io, instead of
modifying a function which users expect to be a memcpy.

Ultimately copy_from_slice is the bottleneck, not io::Cursor::read.
It might be worthwhile to move the check here, so more places can
benefit from it.

During benchmarking, I found that one of my programs spent between 5 and
10 percent of the time doing memmoves. Ultimately I tracked these down
to single-byte slices being copied with a memcopy in io::Cursor::read().
Doing a manual copy if only one byte is requested can speed things up
significantly. For my program, this reduced the running time by 20%.

Why special-case only a single byte, and not a "small" slice in general?
I tried doing this for slices of at most 64 bytes and of at most 8
bytes. In both cases my test program was significantly slower.

Fixes #26554.

Previously the `LineWriter` could successfully write some bytes but then fail to
report that it has done so. Additionally, an erroneous flush after a successful
write was permanently ignored. This commit fixes these two issues by (a)
maintaining a `need_flush` flag to indicate whether a flush should be the first
operation in `LineWriter::write` and (b) avoiding returning an error once some
bytes have been successfully written.
Closes #37807
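A minimal sketch of the pattern, using a hypothetical simplified wrapper rather than the real `LineWriter` internals:

```rust
use std::io::{self, Write};

// Hypothetical simplified writer illustrating the two fixes.
struct LineFlusher<W: Write> {
    inner: W,
    need_flush: bool,
}

impl<W: Write> LineFlusher<W> {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        // (a) A flush postponed by an earlier call runs first, so a
        // still-failing flush is reported before any of `data` counts
        // as written.
        if self.need_flush {
            self.inner.flush()?;
            self.need_flush = false;
        }
        let n = self.inner.write(data)?;
        if data[..n].contains(&b'\n') {
            // (b) Some bytes were just written successfully, so a flush
            // error must not be returned from this call; remember it and
            // surface it on the next write instead.
            if self.inner.flush().is_err() {
                self.need_flush = true;
            }
        }
        Ok(n)
    }
}
```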
Clearer description of std::path::MAIN_SEPARATOR.

Use displacement instead of initial bucket in HashMap code
It makes the code a bit cleaner and also saves a few instructions (handy, since some will soon be spent on adaptive behavior).
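For reference, a small sketch of the quantity in question, assuming a power-of-two table size (an illustration, not the std implementation):

```rust
// An entry's displacement is how far it sits from its ideal bucket,
// wrapping around the table. Robin Hood probing compares displacements,
// so computing this directly avoids carrying the initial bucket around.
fn displacement(index: usize, hash: u64, capacity: usize) -> usize {
    debug_assert!(capacity.is_power_of_two());
    let ideal = (hash as usize) & (capacity - 1); // bucket the hash maps to
    index.wrapping_sub(ideal) & (capacity - 1)
}
```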
Add examples for TcpListener struct
r? @frewsxcv

Add missing URLs and examples to TcpStream
r? @frewsxcv

Document how lock 'guard' structures are created.

Follow our own recommendations in the examples
Remove exclamation marks from the example error descriptions:
> The description [...] should not contain newlines or sentence-ending punctuation
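For illustration, a hypothetical error type that follows the guideline:

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug)]
struct ParseError;

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "invalid token")
    }
}

impl Error for ParseError {
    // Lowercase, one clause, no newlines or sentence-ending punctuation.
    fn description(&self) -> &str {
        "invalid token"
    }
}
```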
Remove completed FIXME.
https://github.com/rust-lang/rust/issues/30530

Define `bound` argument in std::sync::mpsc::sync_channel in the documentation
The `bound` argument in `std::sync::mpsc::sync_channel(bound: usize)` was not defined in the documentation.
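For illustration, a small sketch of what `bound` means in practice:

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

fn main() {
    // `bound` is the capacity of the channel's internal buffer: with a
    // bound of 1, one send completes immediately and further sends block
    // until a value is received. A bound of 0 makes every send wait for
    // a matching receive.
    let (tx, rx) = sync_channel::<i32>(1);
    tx.send(1).unwrap(); // fits in the buffer, returns at once
    let sender = thread::spawn(move || {
        tx.send(2).unwrap(); // blocks until `1` has been received
    });
    assert_eq!(rx.recv().unwrap(), 1);
    assert_eq!(rx.recv().unwrap(), 2);
    sender.join().unwrap();
}
```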
Add missing examples to SocketAddrV6
r? @steveklabnik
cc @frewsxcv

Also, end sentence with a period.