| Age | Commit message (Collapse) | Author | Lines |
|
llvm seems to be having some trouble optimizing the iterator-based
string comparsion method into some equivalent to memcmp. This
explicitly calls out to the memcmp intrinisic in order to allow
llvm to generate better code. In some manual benchmarking, this
memcmp-based approach is 20 times faster than the iterator approach.
|
|
|
|
|
|
This commit is an implementation of [RFC 1212][rfc] which tweaks the behavior of
the `str::lines` and `BufRead::lines` iterators. Both iterators now account for
`\r\n` sequences in addition to `\n`, allowing for less surprising behavior
across platforms (especially in the `BufRead` case). Splitting *only* on the
`\n` character can still be achieved with `split('\n')` in both cases.
The `str::lines_any` function is also now deprecated as `str::lines` is a
drop-in replacement for it.
[rfc]: https://github.com/rust-lang/rfcs/blob/master/text/1212-line-endings.md
Closes #28032
|
|
|
|
|
|
Specifically, `count`, `last`, and `nth` are implemented to use the
methods of the underlying slice iterator.
Partially closes #24214.
|
|
Specifically, `count`, `last`, and `nth` are implemented to use the
methods of the underlying slice iterator.
Partially closes #24214.
|
|
|
|
StrSearcher: Implement the complete reverse case for the two way algorithm
Fix quadratic behavior in StrSearcher in reverse search with periodic
needles.
This commit adds the missing pieces for the "short period" case in
reverse search. The short case will show up when the needle is literally
periodic, for example "abababab".
Two way uses a "critical factorization" of the needle: x = u v.
Searching matches v first, if mismatch at character k, skip k forward.
Matching u, if mismatch, skip period(x) forward.
To avoid O(mn) behavior after mismatch in u, memorize the already
matched prefix.
The short period case requires that |u| < period(x).
For the reverse search we need to compute a different critical
factorization x = u' v' where |v'| < period(x), because we are searching
for the reversed needle. A short v' also benefits the algorithm in
general.
The reverse critical factorization is computed quickly by using the same
maximal suffix algorithm, but terminating as soon as we have a location
with local period equal to period(x).
This adds extra fields crit_pos_back and memory_back for the reverse
case. The new overhead for TwoWaySearcher::new is low, and additionally
I think the "short period" case is uncommon in many applications of
string search.
The maximal_suffix methods were updated in documentation and the
algorithms updated to not use !0 and wrapping add, variable left is now
1 larger, offset 1 smaller.
Use periodicity when computing byteset: in the periodic case, just
iterate over one period instead of the whole needle.
Example before (rfind) after (twoway_rfind) benchmark shows the removal
of quadratic behavior.
needle: "ab" * 100, haystack: ("bb" + "ab" * 100) * 100
```
test periodic::rfind ... bench: 1,926,595 ns/iter (+/- 11,390) = 10 MB/s
test periodic::twoway_rfind ... bench: 51,740 ns/iter (+/- 66) = 386 MB/s
```
|
|
Break out a separate static method to create the "byteset".
|
|
|
|
|
|
|
|
The replacements are functions that usually use a single `mem::transmute` in
their body and restrict input and output via more concrete types than `T` and
`U`. Worth noting are the `transmute` functions for slices and the `from_utf8*`
family for mutable slices. Additionally, `mem::transmute` was often used for
casting raw pointers, when you can already cast raw pointers just fine with
`as`.
|
|
The innermost loop of TwoWaySearcher checks the boundary of the haystack
vs position + needle.len(), and it checks the last byte of the needle
against the byteset.
If these two steps are combined by using the indexing of the last
needle byte's position as bounds check, the algorithm improves its
throughput. We improve the innermost loop by reducing the number of
instructions used, and elminating the panic case for the checked
indexing that was previously used.
Selected benchmarks from the external/workspace testsuite. Benchmarks
improve across the board.
```
before:
test bb_in_aa::twoway_find ... bench: 4,229 ns/iter (+/- 1,305) = 23646 MB/s
test bb_in_aa::twoway_rfind ... bench: 3,873 ns/iter (+/- 101) = 25819 MB/s
test short_1let_long::twoway_find ... bench: 7,075 ns/iter (+/- 29) = 360 MB/s
test short_1let_long::twoway_rfind ... bench: 6,640 ns/iter (+/- 79) = 384 MB/s
test short_2let_long::twoway_find ... bench: 3,823 ns/iter (+/- 16) = 667 MB/s
test short_2let_long::twoway_rfind ... bench: 3,774 ns/iter (+/- 44) = 675 MB/s
test short_3let_long::twoway_find ... bench: 3,582 ns/iter (+/- 47) = 712 MB/s
test short_3let_long::twoway_rfind ... bench: 3,616 ns/iter (+/- 34) = 705 MB/s
with this commit:
test bb_in_aa::twoway_find ... bench: 2,952 ns/iter (+/- 20) = 33875 MB/s
test bb_in_aa::twoway_rfind ... bench: 2,939 ns/iter (+/- 99) = 34025 MB/s
test short_1let_long::twoway_find ... bench: 4,593 ns/iter (+/- 4) = 555 MB/s
test short_1let_long::twoway_rfind ... bench: 4,592 ns/iter (+/- 76) = 555 MB/s
test short_2let_long::twoway_find ... bench: 2,804 ns/iter (+/- 3) = 909 MB/s
test short_2let_long::twoway_rfind ... bench: 2,807 ns/iter (+/- 40) = 908 MB/s
test short_3let_long::twoway_find ... bench: 3,105 ns/iter (+/- 120) = 821 MB/s
test short_3let_long::twoway_rfind ... bench: 3,019 ns/iter (+/- 50) = 844 MB/s
```
- `bb_in_aa`: fast skip due to byteset filter loop improves.
- 1/2/3let: Searches for 1, 2, or 3 ascii bytes improves.
|
|
This commit is an implementation of [RFC 1184][rfc] which tweaks the behavior of
the `#![no_std]` attribute and adds a new `#![no_core]` attribute. The
`#![no_std]` attribute now injects `extern crate core` at the top of the crate
as well as the libcore prelude into all modules (in the same manner as the
standard library's prelude). The `#![no_core]` attribute disables both std and
core injection.
[rfc]: https://github.com/rust-lang/rfcs/pull/1184
|
|
Fix quadratic behavior in StrSearcher in reverse search with periodic
needles.
This commit adds the missing pieces for the "short period" case in
reverse search. The short case will show up when the needle is literally
periodic, for example "abababab".
Two way uses a "critical factorization" of the needle: x = u v.
Searching matches v first, if mismatch at character k, skip k forward.
Matching u, if mismatch, skip period(x) forward.
To avoid O(mn) behavior after mismatch in u, memorize the already
matched prefix.
The short period case requires that |u| < period(x).
For the reverse search we need to compute a different critical
factorization x = u' v' where |v'| < period(x), because we are searching
for the reversed needle. A short v' also benefits the algorithm in
general.
The reverse critical factorization is computed quickly by using the same
maximal suffix algorithm, but terminating as soon as we have a location
with local period equal to period(x).
This adds extra fields crit_pos_back and memory_back for the reverse
case. The new overhead for TwoWaySearcher::new is low, and additionally
I think the "short period" case is uncommon in many applications of
string search.
The maximal_suffix methods were updated in documentation and the
algorithms updated to not use !0 and wrapping add, variable left is now
1 larger, offset 1 smaller.
Use periodicity when computing byteset: in the periodic case, just
iterate over one period instead of the whole needle.
Example before (rfind) after (twoway_rfind) benchmark shows the removal
of quadratic behavior.
needle: "ab" * 100, haystack: ("bb" + "ab" * 100) * 100
```
test periodic::rfind ... bench: 1,926,595 ns/iter (+/- 11,390) = 10 MB/s
test periodic::twoway_rfind ... bench: 51,740 ns/iter (+/- 66) = 386 MB/s
```
|
|
This isn't actually necessary any more with the advent of `$crate` and changes
in the compiler to expand macros to `::core::$foo` in the context of a
`#![no_std]` crate.
The libcore inner module was also trimmed down a bit to the bare bones.
|
|
eq_slice_() used to be a common implementation for two function that
both called it, but of those only eq_slice() is left, so we can as well
directly inline the code.
|
|
This makes the primitive descriptions on the front page read properly
as descriptions of types and not of the associated modules.
Having the primitive and module docs derived from the same source
causes problems, primarily that they can't contain hyperlinks
cross-referencing each other.
This crates dedicated private modules in `std` to document the
primitive types, then for all primitives that have a corresponding
module, puts hyperlinks in moth the primitive docs and the module docs
cross-linking each other.
This should help clear up confusion when readers find themselves on
the wrong page.
This also removes all the duplicate `#[doc(primitive)]` tags in various places (especially core), so the core docs will no longer attempt to document the primitives for now. Seems like an acceptable tradeoff to get some cleanup for std.
|
|
Having the primitive and module docs derived from the same source
causes problems, primarily that they can't contain hyperlinks
cross-referencing each other.
This crates dedicated private modules in `std` to document the
primitive types, then for all primitives that have a corresponding
module, puts hyperlinks in moth the primitive docs and the module docs
cross-linking each other.
This should help clear up confusion when readers find themselves on
the wrong page.
|
|
|
|
|
|
... matching the existing Index impls.
There is no reason not to if String implement DerefMut.
The code removed in `src/librustc/middle/effect.rs` was added in #9750
to prevent things like `s[0] = 0x80` where `s: String`,
but I belive became unnecessary when the Index(Mut) traits were introduced.
|
|
The basic str equality in core::str calls memcmp, re-use the same
function in StrSearcher's is_prefix_of, is_suffix_of.
|
|
This is needed to not drop performance, after the trait-based changes.
Force separate versions of the next method to be generated for the short
and long period cases.
|
|
Use a trait to be able to implement both the fast search that skips to
each match, and the slower search that emits `Reject` intervals
regularly. The latter is important for uses of `next_reject`.
|
|
|
|
To improve our substring search performance, revive the two way searcher
and adapt it to the Pattern API.
Fixes #25483, a performance bug: that particular case now completes faster
in optimized rust than in ruby (but they share the same order of magnitude).
Much thanks to @gereeter who helped me understand the reverse case
better and wrote the comment explaining `next_back` in the code.
I had quickcheck to fuzz test forward and reverse searching thoroughly.
The two way searcher implements both forward and reverse search,
but not double ended search. The forward and reverse parts of the two
way searcher are completely independent.
The two way searcher algorithm has very small, constant space overhead,
requiring no dynamic allocation. Our implementation is relatively fast,
especially due to the `byteset` addition to the algorithm, which speeds
up many no-match cases.
A bad case for the two way algorithm is:
```
let haystack = (0..10_000).map(|_| "dac").collect::<String>();
let needle = (0..100).map(|_| "bac").collect::<String>());
```
For this particular case, two way is not much faster than the naive
implementation it replaces.
|
|
This commit stabilizes the `str::{matches, rmatches}` functions and iterators,
but renames the unstable feature for the `str::{matches,rmatches}_indices`
function to `str_match_indices` due to the comment present on the functions
about the iterator's return value.
|
|
These were just exposed to be used elsewhere at some point, but neither is
currently being used so just make them private again.
|
|
This commit shards the broad `core` feature of the libcore library into finer
grained features. This split groups together similar APIs and enables tracking
each API separately, giving a better sense of where each feature is within the
stabilization process.
A few minor APIs were deprecated along the way:
* Iterator::reverse_in_place
* marker::NoCopy
|
|
Because these structures are created by a macro, the doc comments
don't quite work: the leading /// isn't stripped. Instead, just
use #[doc] so that they render correctly.
|
|
Because these structures are created by a macro, the doc comments
don't quite work: the leading /// isn't stripped. Instead, just
use #[doc] so that they render correctly.
|
|
Implement RFC rust-lang/rfcs#1123
Add str method str::split_at(mid: usize) -> (&str, &str).
|
|
Fixes #25250
|
|
In my opinion this looks nicer, but also it matches the whitespace generally
used for stability markers more closely.
|
|
|
|
|
|
Without the inline annotation this:
str::from_utf8_unchecked( slice::from_raw_parts( ptr, len ) )
doesn't get inlined which can be pretty brutal performance-wise
when used in an inner loop of a low level string manipulation method.
|
|
This commit removes all the old casting/generic traits from `std::num` that are
no longer in use by the standard library. This additionally removes the old
`strconv` module which has not seen much use in quite a long time. All generic
functionality has been supplanted with traits in the `num` crate and the
`strconv` module is supplanted with the [rust-strconv crate][rust-strconv].
[rust-strconv]: https://github.com/lifthrasiir/rust-strconv
This is a breaking change due to the removal of these deprecated crates, and the
alternative crates are listed above.
[breaking-change]
|
|
Cleaning out more deprecated items
|
|
Fixes #15679
Fixes #15878
Fixes #15882
Closes #15883
|
|
|
|
Fixes #24008.
|
|
`s/(?<!\{ self)(?<=\.)len\(\) == 0/is_empty()/g`
|
|
Conflicts:
src/libstd/net/ip.rs
src/libstd/sys/unix/fs.rs
src/libstd/sys/unix/mod.rs
src/libstd/sys/windows/mod.rs
|
|
|
|
The meaning of each variant of this enum was somewhat ambiguous and it's uncler
that we wouldn't even want to add more enumeration values in the future. As a
result this error has been altered to instead become an opaque structure.
Learning about the "first invalid byte index" is still an unstable feature, but
the type itself is now stable.
|