|
This commit primarily adds implementations of the algorithms from William
Clinger's paper "How to Read Floating Point Numbers Accurately". It also
includes a lot of infrastructure necessary for those algorithms, and some
unit tests.
Since these algorithms reject a few (extreme) inputs that were previously
accepted, this could be seen as a [breaking-change]
|
|
- Exposing digits and individual bits
- Counting the number of bits
- Add small (digit-sized) values
- Multiplication by power of 5
- Division with remainder
All are necessary for decimal to floating point conversions.
All but the most trivial ones come with tests.
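As a rough orientation, here is a minimal sketch of the kind of digit-level helpers this adds, written against a toy little-endian bignum with base-2^32 digits. The names and representation are illustrative assumptions, not the actual `core::num::bignum` API.
```rust
type Digit = u32;

// Add a single digit-sized value, propagating the carry.
fn add_small(digits: &mut Vec<Digit>, mut carry: Digit) {
    for d in digits.iter_mut() {
        let (sum, overflow) = d.overflowing_add(carry);
        *d = sum;
        carry = overflow as Digit;
        if carry == 0 {
            return;
        }
    }
    if carry != 0 {
        digits.push(carry);
    }
}

// Multiply by a single digit-sized factor.
fn mul_small(digits: &mut Vec<Digit>, factor: Digit) {
    let mut carry: u64 = 0;
    for d in digits.iter_mut() {
        let prod = *d as u64 * factor as u64 + carry;
        *d = prod as Digit;
        carry = prod >> 32;
    }
    if carry != 0 {
        digits.push(carry as Digit);
    }
}

// Multiplication by a power of 5, done in digit-sized chunks:
// 5^13 is the largest power of five that still fits in a u32.
fn mul_pow5(digits: &mut Vec<Digit>, mut k: u32) {
    while k >= 13 {
        mul_small(digits, 5u32.pow(13));
        k -= 13;
    }
    if k > 0 {
        mul_small(digits, 5u32.pow(k));
    }
}
```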
|
|
This is necessary for decimal-to-float code (in a later commit) to handle
inputs such as 4.9406564584124654e-324 (the smallest subnormal f64).
According to the benchmarks for flt2dec::dragon, this does not
affect performance measurably. It probably uses slightly more stack
space though.
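For reference, a tiny check of the input in question, written against today's stable `str::parse::<f64>` rather than the code in this commit:
```rust
fn main() {
    let tiny: f64 = "4.9406564584124654e-324".parse().unwrap();
    // MIN_POSITIVE * EPSILON == 2^-1022 * 2^-52 == 2^-1074, the smallest subnormal.
    assert_eq!(tiny, f64::MIN_POSITIVE * f64::EPSILON);
}
```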
|
|
The intent is to allow decimal-to-float parsing to use Fp in its fast path.
That code is added in a later commit.
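A rough sketch of the classic fast-path idea (not the code added in the later commit): when both the decimal mantissa and the power of ten are exactly representable as f64, a single correctly rounded multiplication or division already gives the exact result.
```rust
// Minimal sketch of the fast path, assuming `mant` and `exp10` come from a
// prior decimal parse; out-of-range inputs fall back to the slow, exact path.
fn fast_path(mant: u64, exp10: i32) -> Option<f64> {
    const POW10: [f64; 16] = [
        1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7,
        1e8, 1e9, 1e10, 1e11, 1e12, 1e13, 1e14, 1e15,
    ];
    // f64 holds integers up to 2^53 exactly; beyond that the mantissa is inexact.
    if mant >= (1u64 << 53) {
        return None;
    }
    let m = mant as f64;
    match exp10 {
        0 => Some(m),
        1..=15 => Some(m * POW10[exp10 as usize]),
        -15..=-1 => Some(m / POW10[(-exp10) as usize]),
        _ => None, // needs the exact (bignum-based) algorithms
    }
}
```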
|
|
The innermost loop of TwoWaySearcher checks the boundary of the haystack
vs position + needle.len(), and it checks the last byte of the needle
against the byteset.
If these two steps are combined, using the checked indexing of the last
needle byte's position as the bounds check, the algorithm's throughput
improves. We improve the innermost loop by reducing the number of
instructions used and eliminating the panic case for the checked
indexing that was previously used.
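A schematic sketch of the combined check (not the actual TwoWaySearcher code): a single checked lookup of the byte at the last needle position serves both as the haystack bounds check and as the input to the byteset filter.
```rust
fn find_candidate(hay: &[u8], needle: &[u8], mut pos: usize, byteset: u64) -> Option<usize> {
    let last = needle.len() - 1;
    loop {
        // One bounds check: `get` returns None exactly when the window no
        // longer fits in the haystack, which also terminates the search.
        let &b = hay.get(pos + last)?;
        if byteset & (1u64 << (b & 63)) == 0 {
            // Last byte can't be part of a match: skip the whole needle length.
            pos += needle.len();
        } else {
            return Some(pos); // candidate window; verify with the full algorithm
        }
    }
}
```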
Selected benchmarks from the external/workspace testsuite. Benchmarks
improve across the board.
```
before:
test bb_in_aa::twoway_find ... bench: 4,229 ns/iter (+/- 1,305) = 23646 MB/s
test bb_in_aa::twoway_rfind ... bench: 3,873 ns/iter (+/- 101) = 25819 MB/s
test short_1let_long::twoway_find ... bench: 7,075 ns/iter (+/- 29) = 360 MB/s
test short_1let_long::twoway_rfind ... bench: 6,640 ns/iter (+/- 79) = 384 MB/s
test short_2let_long::twoway_find ... bench: 3,823 ns/iter (+/- 16) = 667 MB/s
test short_2let_long::twoway_rfind ... bench: 3,774 ns/iter (+/- 44) = 675 MB/s
test short_3let_long::twoway_find ... bench: 3,582 ns/iter (+/- 47) = 712 MB/s
test short_3let_long::twoway_rfind ... bench: 3,616 ns/iter (+/- 34) = 705 MB/s
with this commit:
test bb_in_aa::twoway_find ... bench: 2,952 ns/iter (+/- 20) = 33875 MB/s
test bb_in_aa::twoway_rfind ... bench: 2,939 ns/iter (+/- 99) = 34025 MB/s
test short_1let_long::twoway_find ... bench: 4,593 ns/iter (+/- 4) = 555 MB/s
test short_1let_long::twoway_rfind ... bench: 4,592 ns/iter (+/- 76) = 555 MB/s
test short_2let_long::twoway_find ... bench: 2,804 ns/iter (+/- 3) = 909 MB/s
test short_2let_long::twoway_rfind ... bench: 2,807 ns/iter (+/- 40) = 908 MB/s
test short_3let_long::twoway_find ... bench: 3,105 ns/iter (+/- 120) = 821 MB/s
test short_3let_long::twoway_rfind ... bench: 3,019 ns/iter (+/- 50) = 844 MB/s
```
- `bb_in_aa`: the fast-skip loop driven by the byteset filter improves.
- 1/2/3let: searches for 1, 2, or 3 ASCII bytes improve.
|
|
This wording was too strong.
Fixes #27523
|
|
This wording was too strong.
Fixes #27523
|
|
We were burying the reason to use this function below a bunch of caveats about
its usage. That's backwards. Why a function should be used belongs at the top of
the docs, not the bottom.
Also, add some extra links to related functions mentioned in the body.
/cc @abhijeetbhagat who pointed this out on IRC
|
|
This commit is an implementation of [RFC 1184][rfc] which tweaks the behavior of
the `#![no_std]` attribute and adds a new `#![no_core]` attribute. The
`#![no_std]` attribute now injects `extern crate core` at the top of the crate
as well as the libcore prelude into all modules (in the same manner as the
standard library's prelude). The `#![no_core]` attribute disables both std and
core injection.
[rfc]: https://github.com/rust-lang/rfcs/pull/1184
Closes #27394
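As a quick illustration of the new behavior (hypothetical crate, not taken from the PR), the following compiles without writing `extern crate core` or importing the prelude items by hand:
```rust
#![no_std]

// `extern crate core;` and the core prelude are injected automatically,
// so `Option`, `Some`, and `None` resolve without any explicit imports.
pub fn first_byte(bytes: &[u8]) -> Option<u8> {
    match bytes.first() {
        Some(&b) => Some(b),
        None => None,
    }
}
```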
|
|
We were burying the reason to use this function below a bunch of caveats about
its usage. That's backwards. Why a function should be used belongs at the top of
the docs, not the bottom.
Also, add some extra links to related functions mentioned in the body.
|
|
This commit is an implementation of [RFC 1184][rfc] which tweaks the behavior of
the `#![no_std]` attribute and adds a new `#![no_core]` attribute. The
`#![no_std]` attribute now injects `extern crate core` at the top of the crate
as well as the libcore prelude into all modules (in the same manner as the
standard library's prelude). The `#![no_core]` attribute disables both std and
core injection.
[rfc]: https://github.com/rust-lang/rfcs/pull/1184
|
|
|
|
|
|
Fix quadratic behavior in StrSearcher in reverse search with periodic
needles.
This commit adds the missing pieces for the "short period" case in
reverse search. The short case will show up when the needle is literally
periodic, for example "abababab".
Two-Way uses a "critical factorization" of the needle: x = u v.
Searching matches v first; on a mismatch at character k, skip k forward.
Then u is matched; on a mismatch there, skip period(x) forward.
To avoid O(mn) behavior after a mismatch in u, memorize the already
matched prefix.
The short period case requires that |u| < period(x).
For the reverse search we need to compute a different critical
factorization x = u' v' where |v'| < period(x), because we are searching
for the reversed needle. A short v' also benefits the algorithm in
general.
The reverse critical factorization is computed quickly by using the same
maximal suffix algorithm, but terminating as soon as we have a location
with local period equal to period(x).
This adds extra fields crit_pos_back and memory_back for the reverse
case. The new overhead for TwoWaySearcher::new is low, and additionally
I think the "short period" case is uncommon in many applications of
string search.
The documentation of the maximal_suffix methods was updated, and the
algorithms were changed to avoid !0 and wrapping add: the variable `left`
is now 1 larger and `offset` 1 smaller.
Use periodicity when computing byteset: in the periodic case, just
iterate over one period instead of the whole needle.
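A rough sketch of that shortcut (names and representation assumed for illustration): the approximate byteset is a 64-bit membership filter, and for a periodic needle only one period's worth of bytes needs to be folded in, since the rest repeats.
```rust
fn byteset(needle: &[u8], period: usize, is_periodic: bool) -> u64 {
    // Bytes beyond one period repeat and add nothing new to the filter.
    let bytes = if is_periodic { &needle[..period] } else { needle };
    bytes.iter().fold(0u64, |set, &b| set | (1u64 << (b & 63)))
}
```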
The benchmark below, before (rfind) and after (twoway_rfind), shows the
removal of the quadratic behavior.
needle: "ab" * 100, haystack: ("bb" + "ab" * 100) * 100
```
test periodic::rfind ... bench: 1,926,595 ns/iter (+/- 11,390) = 10 MB/s
test periodic::twoway_rfind ... bench: 51,740 ns/iter (+/- 66) = 386 MB/s
```
|
|
There are still problems in both the design and implementation of this, so we don't want it landing in 1.2.
cc @arielb1 @nikomatsakis
cc #27364
r? @alexcrichton
|
|
The following APIs were all marked with a `#[stable]` tag:
* process::Child::id
* error::Error::is
* error::Error::downcast
* error::Error::downcast_ref
* error::Error::downcast_mut
* io::Error::get_ref
* io::Error::get_mut
* io::Error::into_inner
* hash::Hash::hash_slice
* hash::Hasher::write_{i,u}{8,16,32,64,size}
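For example, the `Error` downcasting methods recover a concrete error type from a trait object (a small usage sketch, not taken from the PR):
```rust
use std::error::Error;
use std::io;

// Inspect an error trait object: downcast_ref borrows the concrete type
// if it matches, otherwise fall back to the Display representation.
fn describe(err: &(dyn Error + 'static)) -> String {
    if let Some(io_err) = err.downcast_ref::<io::Error>() {
        format!("I/O error of kind {:?}", io_err.kind())
    } else {
        err.to_string()
    }
}
```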
|
|
This isn't actually necessary any more with the advent of `$crate` and changes
in the compiler to expand macros to `::core::$foo` in the context of a
`#![no_std]` crate.
The libcore inner module was also trimmed down a bit to the bare bones.
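A small, hypothetical illustration of the `$crate` mechanism that makes the shim unnecessary: paths written with `$crate` resolve to the defining crate regardless of how the downstream crate is set up.
```rust
// Hypothetical macro in a library crate; `$crate::helper::report_len` points
// back into this crate even when the macro is expanded in a #![no_std] user.
pub mod helper {
    pub fn report_len(n: usize) -> usize {
        n
    }
}

#[macro_export]
macro_rules! log_len {
    ($s:expr) => {
        $crate::helper::report_len($s.len())
    };
}
```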
|
|
I’ve been sitting on this one for ages now. Silly me, if only I had got on and submitted it earlier it’d be into the stable release by now…
|
|
I think this was just missed when `Send` and `Sync` were redone, since it seems odd to not be able to use things like `Arc<AtomicPtr>`. If it was intentional feel free to just close this.
I used another test as a template for writing mine, so I hope I got all the headers and stuff right.
|
|
There are multiple issues with them as designed and implemented.
cc #27364
|
|
This isn't actually necessary any more with the advent of `$crate` and changes
in the compiler to expand macros to `::core::$foo` in the context of a
`#![no_std]` crate.
The libcore inner module was also trimmed down a bit to the bare bones.
|
|
|
|
r? @steveklabnik
|
|
As described in the module documentation, the memory orderings in Rust
are the same as those of LLVM. However, the documentation for the
memory orderings enum says they are the same as those of C++. The two
differ: C++'s orderings include consume reads, while LLVM's do not.
Hence this commit fixes the bug in the documentation for the enum.
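For reference, a minimal use of the enum with the acquire/release pairing it names (plain illustrative code, not from the commit):
```rust
use std::sync::atomic::{AtomicBool, Ordering};

static READY: AtomicBool = AtomicBool::new(false);

// Release on the store pairs with Acquire on the load; there is no Consume
// variant, matching LLVM's set of orderings rather than C++'s.
fn publish() {
    READY.store(true, Ordering::Release);
}

fn is_ready() -> bool {
    READY.load(Ordering::Acquire)
}
```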
|
|
In the spirit of https://internals.rust-lang.org/t/should-we-keep-including-obvious-imports-in-code-examples/2217, show the feature flags we're using in examples.
(Also one instance of `use`.)
|
|
As described in the module documentation, the memory orderings in Rust
are the same as those of LLVM. However, the documentation for the
memory orderings enum says they are the same as those of C++. The two
differ: C++'s orderings include consume reads, while LLVM's do not.
Hence this commit fixes the bug in the documentation for the enum.
|
|
Use raw pointers to avoid aliasing violation in split_at_mut
Fixes #27357
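A sketch of the technique (mirroring, but not copied from, the std implementation): construct both halves from raw pointers so that neither `&mut` slice is derived from the other.
```rust
use std::slice;

fn split_at_mut_raw<T>(v: &mut [T], mid: usize) -> (&mut [T], &mut [T]) {
    let len = v.len();
    assert!(mid <= len);
    let ptr = v.as_mut_ptr();
    unsafe {
        (
            // Each half is built directly from the raw pointer, so the two
            // mutable slices never alias each other.
            slice::from_raw_parts_mut(ptr, mid),
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}
```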
|
|
The following APIs were all marked with a `#[stable]` tag:
* process::Child::id
* error::Error::is
* error::Error::downcast
* error::Error::downcast_ref
* error::Error::downcast_mut
* io::Error::get_ref
* io::Error::get_mut
* io::Error::into_inner
* hash::Hash::hash_slice
* hash::Hasher::write_{i,u}{8,16,32,64,size}
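As a usage sketch for the hashing additions (not taken from the PR), `Hash::hash_slice` and the `Hasher::write_*` methods can be driven directly:
```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hash a whole slice at once, then feed a raw u8 through one of the
// newly stable write_* methods.
fn hash_parts(values: &[u32], tag: u8) -> u64 {
    let mut hasher = DefaultHasher::new();
    u32::hash_slice(values, &mut hasher);
    hasher.write_u8(tag);
    hasher.finish()
}
```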
|
|
Fixes #27357
|
|
|
|
I had to modify some tests: since `wtf8buf_show` and `wtf8_show` were doing exactly the same thing, I repurposed `wtf8_show` as `wtf8buf_show_str`, which ensures `Wtf8Buf` `Debug`-formats the same as `str`.
`write_str_escaped` might also be shared amongst other `fmt` implementations, but I just left it there within `Wtf8::fmt` for review.
|
|
FreeBSD i386 snapshot is missing, failed tests (possibly spurious).
r? @alexcrichton
|
|
Improve siphash performance for longer data
Use `ptr::copy_nonoverlapping` (aka memcpy) to load a u64 from the
byte stream. This is correct for any alignment, and the compiler will
use the appropriate instruction to load the data.
Also contains small tweaks that should benefit hashing short data too:
the commit that removes a temporary variable, and the one enabling
autovectorization of the hash state initialization (in SipHash::reset).
Benchmarks show that hashing longer data benefits from the improved word loading.
Before (using benchmarks from the first commit in the PR):
The before benchmark is a bit noisy.
```
test hash::sip::bench_bytes_4 ... bench: 41 ns/iter (+/- 0) = 97 MB/s
test hash::sip::bench_bytes_7 ... bench: 49 ns/iter (+/- 2) = 142 MB/s
test hash::sip::bench_bytes_8 ... bench: 42 ns/iter (+/- 4) = 190 MB/s
test hash::sip::bench_bytes_a_16 ... bench: 57 ns/iter (+/- 14) = 280 MB/s
test hash::sip::bench_bytes_b_32 ... bench: 85 ns/iter (+/- 74) = 376 MB/s
test hash::sip::bench_bytes_c_128 ... bench: 278 ns/iter (+/- 33) = 460 MB/s
test hash::sip::bench_long_str ... bench: 825 ns/iter (+/- 103)
test hash::sip::bench_str_of_8_bytes ... bench: 151 ns/iter (+/- 66)
test hash::sip::bench_str_over_8_bytes ... bench: 59 ns/iter (+/- 3)
test hash::sip::bench_str_under_8_bytes ... bench: 47 ns/iter (+/- 56)
test hash::sip::bench_u32 ... bench: 39 ns/iter (+/- 93) = 205 MB/s
test hash::sip::bench_u32_keyed ... bench: 40 ns/iter (+/- 88) = 200 MB/s
test hash::sip::bench_u64 ... bench: 54 ns/iter (+/- 96) = 148 MB/s
```
After:
```
test hash::sip::bench_bytes_4 ... bench: 41 ns/iter (+/- 3) = 97 MB/s
test hash::sip::bench_bytes_7 ... bench: 48 ns/iter (+/- 0) = 145 MB/s
test hash::sip::bench_bytes_8 ... bench: 35 ns/iter (+/- 1) = 228 MB/s
test hash::sip::bench_bytes_a_16 ... bench: 45 ns/iter (+/- 1) = 355 MB/s
test hash::sip::bench_bytes_b_32 ... bench: 60 ns/iter (+/- 0) = 533 MB/s
test hash::sip::bench_bytes_c_128 ... bench: 161 ns/iter (+/- 5) = 795 MB/s
test hash::sip::bench_long_str ... bench: 514 ns/iter (+/- 5)
test hash::sip::bench_str_of_8_bytes ... bench: 44 ns/iter (+/- 0)
test hash::sip::bench_str_over_8_bytes ... bench: 51 ns/iter (+/- 0)
test hash::sip::bench_str_under_8_bytes ... bench: 52 ns/iter (+/- 6)
test hash::sip::bench_u32 ... bench: 40 ns/iter (+/- 2) = 200 MB/s
test hash::sip::bench_u32_keyed ... bench: 39 ns/iter (+/- 1) = 205 MB/s
test hash::sip::bench_u64 ... bench: 36 ns/iter (+/- 1) = 222 MB/s
```
|
|
Many of these have long since become obsolete, so this
commit starts the removal process for all of them. The unstable features that
were deprecated are:
* box_heap
* cmp_partial
* fs_time
* hash_default
* int_slice
* iter_min_max
* iter_reset_fuse
* iter_to_vec
* map_in_place
* move_from
* owned_ascii_ext
* page_size
* read_and_zero
* scan_state
* slice_chars
* slice_position_elem
* subslice_offset
|
|
Many of these have long since become obsolete, so this
commit starts the removal process for all of them. The unstable features that
were deprecated are:
* cmp_partial
* fs_time
* hash_default
* int_slice
* iter_min_max
* iter_reset_fuse
* iter_to_vec
* map_in_place
* move_from
* owned_ascii_ext
* page_size
* read_and_zero
* scan_state
* slice_chars
* slice_position_elem
* subslice_offset
|
|
Fixes #27211
Fix Debug for {char, str} in core::fmt
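The expected behavior, as a small check written against current std: `Debug` for `str` and `char` quotes and escapes the value.
```rust
fn main() {
    assert_eq!(format!("{:?}", "ab\n"), r#""ab\n""#);
    assert_eq!(format!("{:?}", '\t'), r"'\t'");
}
```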
|
|
|
|
|
|
|
|
If they are ordered v0, v2, v1, v3, the compiler can find a few
SIMD optimizations on its own.
The new optimization I could observe on x86-64 was the use of 128-bit
registers for the v = key ^ constant operations in new / reset.
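A sketch of the layout idea (field names follow the SipHash paper; the actual struct in libstd may differ): with v0 and v2 adjacent and v1 and v3 adjacent, each pair is initialized from the same key word and can become one 128-bit operation.
```rust
struct SipState {
    k0: u64,
    k1: u64,
    v0: u64,
    v2: u64,
    v1: u64,
    v3: u64,
}

impl SipState {
    fn reset(&mut self) {
        // (v0, v2) and (v1, v3) can each be computed as one 2x64-bit vector op,
        // since the two lanes of each pair use the same key word.
        self.v0 = self.k0 ^ 0x736f6d6570736575;
        self.v2 = self.k0 ^ 0x6c7967656e657261;
        self.v1 = self.k1 ^ 0x646f72616e646f6d;
        self.v3 = self.k1 ^ 0x7465646279746573;
    }
}
```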
|
|
Without this temporary variable, codegen improves slightly and fewer
registers are spilled to the stack in SipHash::write.
|
|
Use `ptr::copy_nonoverlapping` (aka memcpy) to load a u64 from the
byte stream. This is correct for any alignment, and the compiler will
use the appropriate instruction to load the data.
Use unchecked indexing.
This results in a large improvement in throughput (hashed bytes
per second) for long data. The maximum improvement benchmarks at a 70%
increase in throughput for large values (> 256 bytes), but values of 16
bytes or larger already improve.
Unchecked indexing is introduced to reach the best possible throughput;
using ptr::copy_nonoverlapping without unchecked indexing would land the
improvement some 20-30 percentage points lower.
We use a debug assertion so that the test suite checks our use of
unchecked indexing.
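A sketch of the load described above (an assumed helper, close in spirit to what landed): copy eight bytes with `ptr::copy_nonoverlapping` into a `u64`, which is valid for any alignment, with a debug assertion standing in for the elided bounds check.
```rust
use std::ptr;

// Load a little-endian u64 starting at buf[i]; the caller guarantees the
// bound, and the debug_assert lets the test suite verify that guarantee.
unsafe fn load_u64_le(buf: &[u8], i: usize) -> u64 {
    debug_assert!(i + 8 <= buf.len());
    let mut out = 0u64;
    ptr::copy_nonoverlapping(buf.as_ptr().add(i), &mut out as *mut u64 as *mut u8, 8);
    out.to_le()
}
```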
|
|
|
|
|
|
Fixes #26927
|
|
Macro desugaring of `in PLACE { BLOCK }` into "simpler" expressions following the in-development "Placer" protocol.
Includes Placer API that one can override to integrate support for `in` into one's own type. (See [RFC 809].)
[RFC 809]: https://github.com/rust-lang/rfcs/blob/master/text/0809-box-and-in-for-stdlib.md
Part of #22181
Replaced PR #26180.
Turns on the `in PLACE { BLOCK }` syntax, while leaving in support for the old `box (PLACE) EXPR` syntax (since we need to support that at least until we have a snapshot with support for `in PLACE { BLOCK }`).
(Note that we are not 100% committed to the `in PLACE { BLOCK }` syntax. In particular I still want to play around with some other alternatives. Still, I want to get the fundamental framework for the protocol landed so we can play with implementing it for non `Box` types.)
----
Also, this PR leaves out support for desugaring-based `box EXPR`. We will hopefully land that in the future, but for the short term there are type-inference issues injected by that change that we want to resolve separately.
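For orientation, a simplified toy model of the protocol and of what `in PLACE { BLOCK }` roughly desugars to. Trait and method names mirror RFC 809, but this is an illustrative re-implementation, not the unstable std API.
```rust
use std::mem::MaybeUninit;
use std::ptr;

trait Placer<T> {
    type Place: InPlace<T>;
    fn make_place(self) -> Self::Place;
}

trait InPlace<T> {
    type Owner;
    fn pointer(&mut self) -> *mut T;
    unsafe fn finalize(self) -> Self::Owner;
}

// A trivial placer that "places" the value into a fresh heap allocation.
struct HeapPlacer;
struct BoxPlace<T>(Box<MaybeUninit<T>>);

impl<T> Placer<T> for HeapPlacer {
    type Place = BoxPlace<T>;
    fn make_place(self) -> BoxPlace<T> {
        BoxPlace(Box::new(MaybeUninit::uninit()))
    }
}

impl<T> InPlace<T> for BoxPlace<T> {
    type Owner = Box<T>;
    fn pointer(&mut self) -> *mut T {
        self.0.as_mut_ptr()
    }
    unsafe fn finalize(self) -> Box<T> {
        // The value has been written; reinterpret the allocation as Box<T>.
        Box::from_raw(Box::into_raw(self.0) as *mut T)
    }
}

// Roughly what `in HeapPlacer { 42 }` desugars to under the protocol.
fn place_example() -> Box<i32> {
    let mut place = HeapPlacer.make_place();
    let raw = place.pointer();
    let value = 42; // the { BLOCK }
    unsafe {
        ptr::write(raw, value);
        place.finalize()
    }
}
```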
|
|
|
|
Fixes #26927
|
|
works.
|
|
|