| Age | Commit message (Collapse) | Author | Lines |
|
This commit stabilizes and deprecates the FCP (final comment period) APIs for
the upcoming 1.7 beta release. The specific APIs which changed were:
Stabilized
* `Path::strip_prefix` (renamed from `relative_from`)
* `path::StripPrefixError` (new error type returned from `strip_prefix`)
* `Ipv4Addr::is_loopback`
* `Ipv4Addr::is_private`
* `Ipv4Addr::is_link_local`
* `Ipv4Addr::is_multicast`
* `Ipv4Addr::is_broadcast`
* `Ipv4Addr::is_documentation`
* `Ipv6Addr::is_unspecified`
* `Ipv6Addr::is_loopback`
* `Ipv6Addr::is_unique_local`
* `Ipv6Addr::is_multicast`
* `Vec::as_slice`
* `Vec::as_mut_slice`
* `String::as_str`
* `String::as_mut_str`
* `<[T]>::clone_from_slice` - the `usize` return value is removed
* `<[T]>::sort_by_key`
* `i32::checked_rem` (and other signed types)
* `i32::checked_neg` (and other signed types)
* `i32::checked_shl` (and other signed types)
* `i32::checked_shr` (and other signed types)
* `i32::saturating_mul` (and other signed types)
* `i32::overflowing_add` (and other signed types)
* `i32::overflowing_sub` (and other signed types)
* `i32::overflowing_mul` (and other signed types)
* `i32::overflowing_div` (and other signed types)
* `i32::overflowing_rem` (and other signed types)
* `i32::overflowing_neg` (and other signed types)
* `i32::overflowing_shl` (and other signed types)
* `i32::overflowing_shr` (and other signed types)
* `u32::checked_rem` (and other unsigned types)
* `u32::checked_neg` (and other unsigned types)
* `u32::checked_shl` (and other unsigned types)
* `u32::saturating_mul` (and other unsigned types)
* `u32::overflowing_add` (and other unsigned types)
* `u32::overflowing_sub` (and other unsigned types)
* `u32::overflowing_mul` (and other unsigned types)
* `u32::overflowing_div` (and other unsigned types)
* `u32::overflowing_rem` (and other unsigned types)
* `u32::overflowing_neg` (and other unsigned types)
* `u32::overflowing_shl` (and other unsigned types)
* `u32::overflowing_shr` (and other unsigned types)
* `ffi::IntoStringError`
* `CString::into_string`
* `CString::into_bytes`
* `CString::into_bytes_with_nul`
* `From<CString> for Vec<u8>`
* `From<CString> for Vec<u8>`
* `IntoStringError::into_cstring`
* `IntoStringError::utf8_error`
* `Error for IntoStringError`
Deprecated
* `Path::relative_from` - renamed to `strip_prefix`
* `Path::prefix` - use `components().next()` instead
* `os::unix::fs` constants - moved to the `libc` crate
* `fmt::{radix, Radix, RadixFmt}` - not used enough to stabilize
* `IntoCow` - conflicts with `Into` and may come back later
* `i32::{BITS, BYTES}` (and other integers) - not pulling their weight
* `DebugTuple::formatter` - will be removed
* `sync::Semaphore` - not used enough and confused with system semaphores
Closes #23284
cc #27709 (still lots more methods though)
Closes #27712
Closes #27722
Closes #27728
Closes #27735
Closes #27729
Closes #27755
Closes #27782
Closes #27798
|
|
We need to guard that `len` is large enough for the fast skip loop.
|
|
This speeds up the ascii case (and long stretches of ascii in otherwise
mixed UTF-8 data) when checking UTF-8 validity.
Benchmark results suggest that on purely ASCII input, we can improve
throughput (megabytes verified / second) by a factor of 13 to 14!
On xml and mostly english language input (en.wikipedia xml dump),
throughput increases by a factor 7.
On mostly non-ASCII input, performance increases slightly or is the
same.
The UTF-8 validation is rewritten to use indexed access; since all
access is preceded by a (mandatory for validation) length check, they
are statically elided by llvm and this formulation is in fact the best
for performance. A previous version had losses due to slice to iterator
conversions.
A large credit to Björn Steinbrink who improved this patch immensely,
writing this second version.
Benchmark results on x86-64 (Sandy Bridge) compiled with -C opt-level=3.
Old code is `regular`, this PR is called `fast`.
Datasets:
- `ascii` is just ascii (2.5 kB)
- `cyr` is cyrillic script with ascii spaces (5 kB)
- `dewik10` is 10MB of a de.wikipedia xml dump
- `enwik10` is 100MB of an en.wikipedia xml dump
- `jawik10` is 10MB of a ja.wikipedia xml dump
```
test from_utf8_ascii_fast ... bench: 140 ns/iter (+/- 4) = 18221 MB/s
test from_utf8_ascii_regular ... bench: 1,932 ns/iter (+/- 19) = 1320 MB/s
test from_utf8_cyr_fast ... bench: 10,025 ns/iter (+/- 245) = 511 MB/s
test from_utf8_cyr_regular ... bench: 12,250 ns/iter (+/- 437) = 418 MB/s
test from_utf8_dewik10_fast ... bench: 6,017,909 ns/iter (+/- 105,755) = 1740 MB/s
test from_utf8_dewik10_regular ... bench: 11,669,493 ns/iter (+/- 264,045) = 891 MB/s
test from_utf8_enwik8_fast ... bench: 14,085,692 ns/iter (+/- 1,643,316) = 7000 MB/s
test from_utf8_enwik8_regular ... bench: 93,657,410 ns/iter (+/- 5,353,353) = 1000 MB/s
test from_utf8_jawik10_fast ... bench: 29,154,073 ns/iter (+/- 4,659,534) = 340 MB/s
test from_utf8_jawik10_regular ... bench: 29,112,917 ns/iter (+/- 2,475,123) = 340 MB/s
```
Co-authored-by: Björn Steinbrink <bsteinbr@gmail.com>
|
|
|
|
This commit is the standard API stabilization commit for the 1.6 release cycle.
The list of issues and APIs below have all been through their cycle-long FCP and
the libs team decisions are listed below
Stabilized APIs
* `Read::read_exact`
* `ErrorKind::UnexpectedEof` (renamed from `UnexpectedEOF`)
* libcore -- this was a bit of a nuanced stabilization, the crate itself is now
marked as `#[stable]` and the methods appearing via traits for primitives like
`char` and `str` are now also marked as stable. Note that the extension traits
themeselves are marked as unstable as they're imported via the prelude. The
`try!` macro was also moved from the standard library into libcore to have the
same interface. Otherwise the functions all have copied stability from the
standard library now.
* The `#![no_std]` attribute
* `fs::DirBuilder`
* `fs::DirBuilder::new`
* `fs::DirBuilder::recursive`
* `fs::DirBuilder::create`
* `os::unix::fs::DirBuilderExt`
* `os::unix::fs::DirBuilderExt::mode`
* `vec::Drain`
* `vec::Vec::drain`
* `string::Drain`
* `string::String::drain`
* `vec_deque::Drain`
* `vec_deque::VecDeque::drain`
* `collections::hash_map::Drain`
* `collections::hash_map::HashMap::drain`
* `collections::hash_set::Drain`
* `collections::hash_set::HashSet::drain`
* `collections::binary_heap::Drain`
* `collections::binary_heap::BinaryHeap::drain`
* `Vec::extend_from_slice` (renamed from `push_all`)
* `Mutex::get_mut`
* `Mutex::into_inner`
* `RwLock::get_mut`
* `RwLock::into_inner`
* `Iterator::min_by_key` (renamed from `min_by`)
* `Iterator::max_by_key` (renamed from `max_by`)
Deprecated APIs
* `ErrorKind::UnexpectedEOF` (renamed to `UnexpectedEof`)
* `OsString::from_bytes`
* `OsStr::to_cstring`
* `OsStr::to_bytes`
* `fs::walk_dir` and `fs::WalkDir`
* `path::Components::peek`
* `slice::bytes::MutableByteVector`
* `slice::bytes::copy_memory`
* `Vec::push_all` (renamed to `extend_from_slice`)
* `Duration::span`
* `IpAddr`
* `SocketAddr::ip`
* `Read::tee`
* `io::Tee`
* `Write::broadcast`
* `io::Broadcast`
* `Iterator::min_by` (renamed to `min_by_key`)
* `Iterator::max_by` (renamed to `max_by_key`)
* `net::lookup_addr`
New APIs (still unstable)
* `<[T]>::sort_by_key` (added to mirror `min_by_key`)
Closes #27585
Closes #27704
Closes #27707
Closes #27710
Closes #27711
Closes #27727
Closes #27740
Closes #27744
Closes #27799
Closes #27801
cc #27801 (doesn't close as `Chars` is still unstable)
Closes #28968
|
|
Part of https://github.com/rust-lang/rust/issues/29935
The deprecation lint is still called "deprecated", so people can continue using `#[allow(deprecated)]` and similar things.
|
|
|
|
|
|
|
|
|
|
|
|
Remove `stable` stability annotations from inherent impls
|
|
This commit stabilizes and deprecates library APIs whose FCP has closed in the
last cycle, specifically:
Stabilized APIs:
* `fs::canonicalize`
* `Path::{metadata, symlink_metadata, canonicalize, read_link, read_dir, exists,
is_file, is_dir}` - all moved to inherent methods from the `PathExt` trait.
* `Formatter::fill`
* `Formatter::width`
* `Formatter::precision`
* `Formatter::sign_plus`
* `Formatter::sign_minus`
* `Formatter::alternate`
* `Formatter::sign_aware_zero_pad`
* `string::ParseError`
* `Utf8Error::valid_up_to`
* `Iterator::{cmp, partial_cmp, eq, ne, lt, le, gt, ge}`
* `<[T]>::split_{first,last}{,_mut}`
* `Condvar::wait_timeout` - note that `wait_timeout_ms` is not yet deprecated
but will be once 1.5 is released.
* `str::{R,}MatchIndices`
* `str::{r,}match_indices`
* `char::from_u32_unchecked`
* `VecDeque::insert`
* `VecDeque::shrink_to_fit`
* `VecDeque::as_slices`
* `VecDeque::as_mut_slices`
* `VecDeque::swap_remove_front` - (renamed from `swap_front_remove`)
* `VecDeque::swap_remove_back` - (renamed from `swap_back_remove`)
* `Vec::resize`
* `str::slice_mut_unchecked`
* `FileTypeExt`
* `FileTypeExt::{is_block_device, is_char_device, is_fifo, is_socket}`
* `BinaryHeap::from` - `from_vec` deprecated in favor of this
* `BinaryHeap::into_vec` - plus a `Into` impl
* `BinaryHeap::into_sorted_vec`
Deprecated APIs
* `slice::ref_slice`
* `slice::mut_ref_slice`
* `iter::{range_inclusive, RangeInclusive}`
* `std::dynamic_lib`
Closes #27706
Closes #27725
cc #27726 (align not stabilized yet)
Closes #27734
Closes #27737
Closes #27742
Closes #27743
Closes #27772
Closes #27774
Closes #27777
Closes #27781
cc #27788 (a few remaining methods though)
Closes #27790
Closes #27793
Closes #27796
Closes #27810
cc #28147 (not all parts stabilized)
|
|
This commit stabilizes and deprecates library APIs whose FCP has closed in the
last cycle, specifically:
Stabilized APIs:
* `fs::canonicalize`
* `Path::{metadata, symlink_metadata, canonicalize, read_link, read_dir, exists,
is_file, is_dir}` - all moved to inherent methods from the `PathExt` trait.
* `Formatter::fill`
* `Formatter::width`
* `Formatter::precision`
* `Formatter::sign_plus`
* `Formatter::sign_minus`
* `Formatter::alternate`
* `Formatter::sign_aware_zero_pad`
* `string::ParseError`
* `Utf8Error::valid_up_to`
* `Iterator::{cmp, partial_cmp, eq, ne, lt, le, gt, ge}`
* `<[T]>::split_{first,last}{,_mut}`
* `Condvar::wait_timeout` - note that `wait_timeout_ms` is not yet deprecated
but will be once 1.5 is released.
* `str::{R,}MatchIndices`
* `str::{r,}match_indices`
* `char::from_u32_unchecked`
* `VecDeque::insert`
* `VecDeque::shrink_to_fit`
* `VecDeque::as_slices`
* `VecDeque::as_mut_slices`
* `VecDeque::swap_remove_front` - (renamed from `swap_front_remove`)
* `VecDeque::swap_remove_back` - (renamed from `swap_back_remove`)
* `Vec::resize`
* `str::slice_mut_unchecked`
* `FileTypeExt`
* `FileTypeExt::{is_block_device, is_char_device, is_fifo, is_socket}`
* `BinaryHeap::from` - `from_vec` deprecated in favor of this
* `BinaryHeap::into_vec` - plus a `Into` impl
* `BinaryHeap::into_sorted_vec`
Deprecated APIs
* `slice::ref_slice`
* `slice::mut_ref_slice`
* `iter::{range_inclusive, RangeInclusive}`
* `std::dynamic_lib`
Closes #27706
Closes #27725
cc #27726 (align not stabilized yet)
Closes #27734
Closes #27737
Closes #27742
Closes #27743
Closes #27772
Closes #27774
Closes #27777
Closes #27781
cc #27788 (a few remaining methods though)
Closes #27790
Closes #27793
Closes #27796
Closes #27810
cc #28147 (not all parts stabilized)
|
|
Follow https://doc.rust-lang.org/book/documentation.html#special-sections
|
|
|
|
|
|
|
|
Our docs were very basic for the various versions of from_utf8, so
this commit beefs them up.
It also improves docs for the &str variant's error, Utf8Error.
|
|
@marchelzo pointed out on IRC that this doesn't have docs, so, let's
change that.
|
|
This commit updates the `MatchIndices` and `RMatchIndices` iterators to follow
the same pattern as the `chars` and `char_indices` iterators. The `matches`
iterator currently yield `&str` elements, so the `MatchIndices` iterator now
yields the index of the match as well as the `&str` that matched (instead of
start/end indexes).
cc #27743
|
|
This commit updates the `MatchIndices` and `RMatchIndices` iterators to follow
the same pattern as the `chars` and `char_indices` iterators. The `matches`
iterator currently yield `&str` elements, so the `MatchIndices` iterator now
yields the index of the match as well as the `&str` that matched (instead of
start/end indexes).
cc #27743
|
|
|
|
The FCP is coming to a close and 1.4 is coming out soon, so this brings in the
libs team decision for all library features this cycle.
Stabilized APIs:
* `<Box<str>>::into_string`
* `Arc::downgrade`
* `Arc::get_mut`
* `Arc::make_mut`
* `Arc::try_unwrap`
* `Box::from_raw`
* `Box::into_raw`
* `CStr::to_str`
* `CStr::to_string_lossy`
* `CString::from_raw`
* `CString::into_raw`
* `IntoRawFd::into_raw_fd`
* `IntoRawFd`
* `IntoRawHandle::into_raw_handle`
* `IntoRawHandle`
* `IntoRawSocket::into_raw_socket`
* `IntoRawSocket`
* `Rc::downgrade`
* `Rc::get_mut`
* `Rc::make_mut`
* `Rc::try_unwrap`
* `Result::expect`
* `String::into_boxed_slice`
* `TcpSocket::read_timeout`
* `TcpSocket::set_read_timeout`
* `TcpSocket::set_write_timeout`
* `TcpSocket::write_timeout`
* `UdpSocket::read_timeout`
* `UdpSocket::set_read_timeout`
* `UdpSocket::set_write_timeout`
* `UdpSocket::write_timeout`
* `Vec::append`
* `Vec::split_off`
* `VecDeque::append`
* `VecDeque::retain`
* `VecDeque::split_off`
* `rc::Weak::upgrade`
* `rc::Weak`
* `slice::Iter::as_slice`
* `slice::IterMut::into_slice`
* `str::CharIndices::as_str`
* `str::Chars::as_str`
* `str::split_at_mut`
* `str::split_at`
* `sync::Weak::upgrade`
* `sync::Weak`
* `thread::park_timeout`
* `thread::sleep`
Deprecated APIs
* `BTreeMap::with_b`
* `BTreeSet::with_b`
* `Option::as_mut_slice`
* `Option::as_slice`
* `Result::as_mut_slice`
* `Result::as_slice`
* `f32::from_str_radix`
* `f64::from_str_radix`
Closes #27277
Closes #27718
Closes #27736
Closes #27764
Closes #27765
Closes #27766
Closes #27767
Closes #27768
Closes #27769
Closes #27771
Closes #27773
Closes #27775
Closes #27776
Closes #27785
Closes #27792
Closes #27795
Closes #27797
|
|
The FCP is coming to a close and 1.4 is coming out soon, so this brings in the
libs team decision for all library features this cycle.
Stabilized APIs:
* `<Box<str>>::into_string`
* `Arc::downgrade`
* `Arc::get_mut`
* `Arc::make_mut`
* `Arc::try_unwrap`
* `Box::from_raw`
* `Box::into_raw`
* `CStr::to_str`
* `CStr::to_string_lossy`
* `CString::from_raw`
* `CString::into_raw`
* `IntoRawFd::into_raw_fd`
* `IntoRawFd`
* `IntoRawHandle::into_raw_handle`
* `IntoRawHandle`
* `IntoRawSocket::into_raw_socket`
* `IntoRawSocket`
* `Rc::downgrade`
* `Rc::get_mut`
* `Rc::make_mut`
* `Rc::try_unwrap`
* `Result::expect`
* `String::into_boxed_slice`
* `TcpSocket::read_timeout`
* `TcpSocket::set_read_timeout`
* `TcpSocket::set_write_timeout`
* `TcpSocket::write_timeout`
* `UdpSocket::read_timeout`
* `UdpSocket::set_read_timeout`
* `UdpSocket::set_write_timeout`
* `UdpSocket::write_timeout`
* `Vec::append`
* `Vec::split_off`
* `VecDeque::append`
* `VecDeque::retain`
* `VecDeque::split_off`
* `rc::Weak::upgrade`
* `rc::Weak`
* `slice::Iter::as_slice`
* `slice::IterMut::into_slice`
* `str::CharIndices::as_str`
* `str::Chars::as_str`
* `str::split_at_mut`
* `str::split_at`
* `sync::Weak::upgrade`
* `sync::Weak`
* `thread::park_timeout`
* `thread::sleep`
Deprecated APIs
* `BTreeMap::with_b`
* `BTreeSet::with_b`
* `Option::as_mut_slice`
* `Option::as_slice`
* `Result::as_mut_slice`
* `Result::as_slice`
* `f32::from_str_radix`
* `f64::from_str_radix`
Closes #27277
Closes #27718
Closes #27736
Closes #27764
Closes #27765
Closes #27766
Closes #27767
Closes #27768
Closes #27769
Closes #27771
Closes #27773
Closes #27775
Closes #27776
Closes #27785
Closes #27792
Closes #27795
Closes #27797
|
|
llvm seems to be having some trouble optimizing the iterator-based
string comparsion method into some equivalent to memcmp. This
explicitly calls out to the memcmp intrinisic in order to allow
llvm to generate better code. In some manual benchmarking, this
memcmp-based approach is 20 times faster than the iterator approach.
|
|
|
|
|
|
This commit is an implementation of [RFC 1212][rfc] which tweaks the behavior of
the `str::lines` and `BufRead::lines` iterators. Both iterators now account for
`\r\n` sequences in addition to `\n`, allowing for less surprising behavior
across platforms (especially in the `BufRead` case). Splitting *only* on the
`\n` character can still be achieved with `split('\n')` in both cases.
The `str::lines_any` function is also now deprecated as `str::lines` is a
drop-in replacement for it.
[rfc]: https://github.com/rust-lang/rfcs/blob/master/text/1212-line-endings.md
Closes #28032
|
|
|
|
|
|
Specifically, `count`, `last`, and `nth` are implemented to use the
methods of the underlying slice iterator.
Partially closes #24214.
|
|
Specifically, `count`, `last`, and `nth` are implemented to use the
methods of the underlying slice iterator.
Partially closes #24214.
|
|
|
|
StrSearcher: Implement the complete reverse case for the two way algorithm
Fix quadratic behavior in StrSearcher in reverse search with periodic
needles.
This commit adds the missing pieces for the "short period" case in
reverse search. The short case will show up when the needle is literally
periodic, for example "abababab".
Two way uses a "critical factorization" of the needle: x = u v.
Searching matches v first, if mismatch at character k, skip k forward.
Matching u, if mismatch, skip period(x) forward.
To avoid O(mn) behavior after mismatch in u, memorize the already
matched prefix.
The short period case requires that |u| < period(x).
For the reverse search we need to compute a different critical
factorization x = u' v' where |v'| < period(x), because we are searching
for the reversed needle. A short v' also benefits the algorithm in
general.
The reverse critical factorization is computed quickly by using the same
maximal suffix algorithm, but terminating as soon as we have a location
with local period equal to period(x).
This adds extra fields crit_pos_back and memory_back for the reverse
case. The new overhead for TwoWaySearcher::new is low, and additionally
I think the "short period" case is uncommon in many applications of
string search.
The maximal_suffix methods were updated in documentation and the
algorithms updated to not use !0 and wrapping add, variable left is now
1 larger, offset 1 smaller.
Use periodicity when computing byteset: in the periodic case, just
iterate over one period instead of the whole needle.
Example before (rfind) after (twoway_rfind) benchmark shows the removal
of quadratic behavior.
needle: "ab" * 100, haystack: ("bb" + "ab" * 100) * 100
```
test periodic::rfind ... bench: 1,926,595 ns/iter (+/- 11,390) = 10 MB/s
test periodic::twoway_rfind ... bench: 51,740 ns/iter (+/- 66) = 386 MB/s
```
|
|
Break out a separate static method to create the "byteset".
|
|
|
|
|
|
|
|
The replacements are functions that usually use a single `mem::transmute` in
their body and restrict input and output via more concrete types than `T` and
`U`. Worth noting are the `transmute` functions for slices and the `from_utf8*`
family for mutable slices. Additionally, `mem::transmute` was often used for
casting raw pointers, when you can already cast raw pointers just fine with
`as`.
|
|
The innermost loop of TwoWaySearcher checks the boundary of the haystack
vs position + needle.len(), and it checks the last byte of the needle
against the byteset.
If these two steps are combined by using the indexing of the last
needle byte's position as bounds check, the algorithm improves its
throughput. We improve the innermost loop by reducing the number of
instructions used, and elminating the panic case for the checked
indexing that was previously used.
Selected benchmarks from the external/workspace testsuite. Benchmarks
improve across the board.
```
before:
test bb_in_aa::twoway_find ... bench: 4,229 ns/iter (+/- 1,305) = 23646 MB/s
test bb_in_aa::twoway_rfind ... bench: 3,873 ns/iter (+/- 101) = 25819 MB/s
test short_1let_long::twoway_find ... bench: 7,075 ns/iter (+/- 29) = 360 MB/s
test short_1let_long::twoway_rfind ... bench: 6,640 ns/iter (+/- 79) = 384 MB/s
test short_2let_long::twoway_find ... bench: 3,823 ns/iter (+/- 16) = 667 MB/s
test short_2let_long::twoway_rfind ... bench: 3,774 ns/iter (+/- 44) = 675 MB/s
test short_3let_long::twoway_find ... bench: 3,582 ns/iter (+/- 47) = 712 MB/s
test short_3let_long::twoway_rfind ... bench: 3,616 ns/iter (+/- 34) = 705 MB/s
with this commit:
test bb_in_aa::twoway_find ... bench: 2,952 ns/iter (+/- 20) = 33875 MB/s
test bb_in_aa::twoway_rfind ... bench: 2,939 ns/iter (+/- 99) = 34025 MB/s
test short_1let_long::twoway_find ... bench: 4,593 ns/iter (+/- 4) = 555 MB/s
test short_1let_long::twoway_rfind ... bench: 4,592 ns/iter (+/- 76) = 555 MB/s
test short_2let_long::twoway_find ... bench: 2,804 ns/iter (+/- 3) = 909 MB/s
test short_2let_long::twoway_rfind ... bench: 2,807 ns/iter (+/- 40) = 908 MB/s
test short_3let_long::twoway_find ... bench: 3,105 ns/iter (+/- 120) = 821 MB/s
test short_3let_long::twoway_rfind ... bench: 3,019 ns/iter (+/- 50) = 844 MB/s
```
- `bb_in_aa`: fast skip due to byteset filter loop improves.
- 1/2/3let: Searches for 1, 2, or 3 ascii bytes improves.
|
|
This commit is an implementation of [RFC 1184][rfc] which tweaks the behavior of
the `#![no_std]` attribute and adds a new `#![no_core]` attribute. The
`#![no_std]` attribute now injects `extern crate core` at the top of the crate
as well as the libcore prelude into all modules (in the same manner as the
standard library's prelude). The `#![no_core]` attribute disables both std and
core injection.
[rfc]: https://github.com/rust-lang/rfcs/pull/1184
|
|
Fix quadratic behavior in StrSearcher in reverse search with periodic
needles.
This commit adds the missing pieces for the "short period" case in
reverse search. The short case will show up when the needle is literally
periodic, for example "abababab".
Two way uses a "critical factorization" of the needle: x = u v.
Searching matches v first, if mismatch at character k, skip k forward.
Matching u, if mismatch, skip period(x) forward.
To avoid O(mn) behavior after mismatch in u, memorize the already
matched prefix.
The short period case requires that |u| < period(x).
For the reverse search we need to compute a different critical
factorization x = u' v' where |v'| < period(x), because we are searching
for the reversed needle. A short v' also benefits the algorithm in
general.
The reverse critical factorization is computed quickly by using the same
maximal suffix algorithm, but terminating as soon as we have a location
with local period equal to period(x).
This adds extra fields crit_pos_back and memory_back for the reverse
case. The new overhead for TwoWaySearcher::new is low, and additionally
I think the "short period" case is uncommon in many applications of
string search.
The maximal_suffix methods were updated in documentation and the
algorithms updated to not use !0 and wrapping add, variable left is now
1 larger, offset 1 smaller.
Use periodicity when computing byteset: in the periodic case, just
iterate over one period instead of the whole needle.
Example before (rfind) after (twoway_rfind) benchmark shows the removal
of quadratic behavior.
needle: "ab" * 100, haystack: ("bb" + "ab" * 100) * 100
```
test periodic::rfind ... bench: 1,926,595 ns/iter (+/- 11,390) = 10 MB/s
test periodic::twoway_rfind ... bench: 51,740 ns/iter (+/- 66) = 386 MB/s
```
|
|
This isn't actually necessary any more with the advent of `$crate` and changes
in the compiler to expand macros to `::core::$foo` in the context of a
`#![no_std]` crate.
The libcore inner module was also trimmed down a bit to the bare bones.
|
|
eq_slice_() used to be a common implementation for two function that
both called it, but of those only eq_slice() is left, so we can as well
directly inline the code.
|
|
This makes the primitive descriptions on the front page read properly
as descriptions of types and not of the associated modules.
Having the primitive and module docs derived from the same source
causes problems, primarily that they can't contain hyperlinks
cross-referencing each other.
This crates dedicated private modules in `std` to document the
primitive types, then for all primitives that have a corresponding
module, puts hyperlinks in moth the primitive docs and the module docs
cross-linking each other.
This should help clear up confusion when readers find themselves on
the wrong page.
This also removes all the duplicate `#[doc(primitive)]` tags in various places (especially core), so the core docs will no longer attempt to document the primitives for now. Seems like an acceptable tradeoff to get some cleanup for std.
|
|
Having the primitive and module docs derived from the same source
causes problems, primarily that they can't contain hyperlinks
cross-referencing each other.
This crates dedicated private modules in `std` to document the
primitive types, then for all primitives that have a corresponding
module, puts hyperlinks in moth the primitive docs and the module docs
cross-linking each other.
This should help clear up confusion when readers find themselves on
the wrong page.
|
|
|
|
|
|
... matching the existing Index impls.
There is no reason not to if String implement DerefMut.
The code removed in `src/librustc/middle/effect.rs` was added in #9750
to prevent things like `s[0] = 0x80` where `s: String`,
but I belive became unnecessary when the Index(Mut) traits were introduced.
|