about summary refs log tree commit diff
path: root/src/libcore/str
AgeCommit message (Collapse)AuthorLines
2015-09-10Optimize string comparison by using memcmpErick Tryzelaar-15/+20
llvm seems to be having some trouble optimizing the iterator-based string comparsion method into some equivalent to memcmp. This explicitly calls out to the memcmp intrinisic in order to allow llvm to generate better code. In some manual benchmarking, this memcmp-based approach is 20 times faster than the iterator approach.
2015-09-08some more clippy-based improvementsAndre Bogus-3/+3
2015-09-04Auto merge of #28119 - nagisa:bytesderef, r=alexcrichtonbors-30/+3
2015-09-03std: Account for CRLF in {str, BufRead}::linesAlex Crichton-4/+10
This commit is an implementation of [RFC 1212][rfc] which tweaks the behavior of the `str::lines` and `BufRead::lines` iterators. Both iterators now account for `\r\n` sequences in addition to `\n`, allowing for less surprising behavior across platforms (especially in the `BufRead` case). Splitting *only* on the `\n` character can still be achieved with `split('\n')` in both cases. The `str::lines_any` function is also now deprecated as `str::lines` is a drop-in replacement for it. [rfc]: https://github.com/rust-lang/rfcs/blob/master/text/1212-line-endings.md Closes #28032
2015-09-03Elide lifetimes in libcoreManish Goregaokar-11/+11
2015-08-31Change explicit BytesDeref impl into Cloned iteratorSimonas Kazlauskas-30/+3
2015-08-31Auto merge of #28101 - ijks:24214-str-bytes, r=alexcrichtonbors-0/+15
Specifically, `count`, `last`, and `nth` are implemented to use the methods of the underlying slice iterator. Partially closes #24214.
2015-08-30Add overrides to iterator methods for `str::Bytes`Daan Rijks-0/+15
Specifically, `count`, `last`, and `nth` are implemented to use the methods of the underlying slice iterator. Partially closes #24214.
2015-08-28Add .as_str() to str::Chars and str::CharIndices. See #27775.Simon Sapin-0/+24
2015-08-18Auto merge of #27474 - bluss:twoway-reverse, r=brsonbors-64/+193
StrSearcher: Implement the complete reverse case for the two way algorithm Fix quadratic behavior in StrSearcher in reverse search with periodic needles. This commit adds the missing pieces for the "short period" case in reverse search. The short case will show up when the needle is literally periodic, for example "abababab". Two way uses a "critical factorization" of the needle: x = u v. Searching matches v first, if mismatch at character k, skip k forward. Matching u, if mismatch, skip period(x) forward. To avoid O(mn) behavior after mismatch in u, memorize the already matched prefix. The short period case requires that |u| < period(x). For the reverse search we need to compute a different critical factorization x = u' v' where |v'| < period(x), because we are searching for the reversed needle. A short v' also benefits the algorithm in general. The reverse critical factorization is computed quickly by using the same maximal suffix algorithm, but terminating as soon as we have a location with local period equal to period(x). This adds extra fields crit_pos_back and memory_back for the reverse case. The new overhead for TwoWaySearcher::new is low, and additionally I think the "short period" case is uncommon in many applications of string search. The maximal_suffix methods were updated in documentation and the algorithms updated to not use !0 and wrapping add, variable left is now 1 larger, offset 1 smaller. Use periodicity when computing byteset: in the periodic case, just iterate over one period instead of the whole needle. Example before (rfind) after (twoway_rfind) benchmark shows the removal of quadratic behavior. needle: "ab" * 100, haystack: ("bb" + "ab" * 100) * 100 ``` test periodic::rfind ... bench: 1,926,595 ns/iter (+/- 11,390) = 10 MB/s test periodic::twoway_rfind ... bench: 51,740 ns/iter (+/- 66) = 386 MB/s ```
2015-08-16StrSearcher: Additional comments and small code movesUlrik Sverdrup-19/+27
Break out a separate static method to create the "byteset".
2015-08-15core: Fill out issues for unstable featuresAlex Crichton-6/+11
2015-08-12Fallout in libs -- misc missing bounds uncovered by WF checks.Niko Matsakis-1/+2
2015-08-09Make `str::as_bytes_mut` privateTobias Bucher-6/+0
2015-08-09Replace many uses of `mem::transmute` with more specific functionsTobias Bucher-8/+14
The replacements are functions that usually use a single `mem::transmute` in their body and restrict input and output via more concrete types than `T` and `U`. Worth noting are the `transmute` functions for slices and the `from_utf8*` family for mutable slices. Additionally, `mem::transmute` was often used for casting raw pointers, when you can already cast raw pointers just fine with `as`.
2015-08-07StrSearcher: Improve inner loop in TwoWaySearcher::next, next_backUlrik Sverdrup-10/+22
The innermost loop of TwoWaySearcher checks the boundary of the haystack vs position + needle.len(), and it checks the last byte of the needle against the byteset. If these two steps are combined by using the indexing of the last needle byte's position as bounds check, the algorithm improves its throughput. We improve the innermost loop by reducing the number of instructions used, and elminating the panic case for the checked indexing that was previously used. Selected benchmarks from the external/workspace testsuite. Benchmarks improve across the board. ``` before: test bb_in_aa::twoway_find ... bench: 4,229 ns/iter (+/- 1,305) = 23646 MB/s test bb_in_aa::twoway_rfind ... bench: 3,873 ns/iter (+/- 101) = 25819 MB/s test short_1let_long::twoway_find ... bench: 7,075 ns/iter (+/- 29) = 360 MB/s test short_1let_long::twoway_rfind ... bench: 6,640 ns/iter (+/- 79) = 384 MB/s test short_2let_long::twoway_find ... bench: 3,823 ns/iter (+/- 16) = 667 MB/s test short_2let_long::twoway_rfind ... bench: 3,774 ns/iter (+/- 44) = 675 MB/s test short_3let_long::twoway_find ... bench: 3,582 ns/iter (+/- 47) = 712 MB/s test short_3let_long::twoway_rfind ... bench: 3,616 ns/iter (+/- 34) = 705 MB/s with this commit: test bb_in_aa::twoway_find ... bench: 2,952 ns/iter (+/- 20) = 33875 MB/s test bb_in_aa::twoway_rfind ... bench: 2,939 ns/iter (+/- 99) = 34025 MB/s test short_1let_long::twoway_find ... bench: 4,593 ns/iter (+/- 4) = 555 MB/s test short_1let_long::twoway_rfind ... bench: 4,592 ns/iter (+/- 76) = 555 MB/s test short_2let_long::twoway_find ... bench: 2,804 ns/iter (+/- 3) = 909 MB/s test short_2let_long::twoway_rfind ... bench: 2,807 ns/iter (+/- 40) = 908 MB/s test short_3let_long::twoway_find ... bench: 3,105 ns/iter (+/- 120) = 821 MB/s test short_3let_long::twoway_rfind ... bench: 3,019 ns/iter (+/- 50) = 844 MB/s ``` - `bb_in_aa`: fast skip due to byteset filter loop improves. - 1/2/3let: Searches for 1, 2, or 3 ascii bytes improves.
2015-08-03syntax: Implement #![no_core]Alex Crichton-1/+2
This commit is an implementation of [RFC 1184][rfc] which tweaks the behavior of the `#![no_std]` attribute and adds a new `#![no_core]` attribute. The `#![no_std]` attribute now injects `extern crate core` at the top of the crate as well as the libcore prelude into all modules (in the same manner as the standard library's prelude). The `#![no_core]` attribute disables both std and core injection. [rfc]: https://github.com/rust-lang/rfcs/pull/1184
2015-08-02StrSearcher: Implement the full two way algorithm in reverse for rfindUlrik Sverdrup-49/+158
Fix quadratic behavior in StrSearcher in reverse search with periodic needles. This commit adds the missing pieces for the "short period" case in reverse search. The short case will show up when the needle is literally periodic, for example "abababab". Two way uses a "critical factorization" of the needle: x = u v. Searching matches v first, if mismatch at character k, skip k forward. Matching u, if mismatch, skip period(x) forward. To avoid O(mn) behavior after mismatch in u, memorize the already matched prefix. The short period case requires that |u| < period(x). For the reverse search we need to compute a different critical factorization x = u' v' where |v'| < period(x), because we are searching for the reversed needle. A short v' also benefits the algorithm in general. The reverse critical factorization is computed quickly by using the same maximal suffix algorithm, but terminating as soon as we have a location with local period equal to period(x). This adds extra fields crit_pos_back and memory_back for the reverse case. The new overhead for TwoWaySearcher::new is low, and additionally I think the "short period" case is uncommon in many applications of string search. The maximal_suffix methods were updated in documentation and the algorithms updated to not use !0 and wrapping add, variable left is now 1 larger, offset 1 smaller. Use periodicity when computing byteset: in the periodic case, just iterate over one period instead of the whole needle. Example before (rfind) after (twoway_rfind) benchmark shows the removal of quadratic behavior. needle: "ab" * 100, haystack: ("bb" + "ab" * 100) * 100 ``` test periodic::rfind ... bench: 1,926,595 ns/iter (+/- 11,390) = 10 MB/s test periodic::twoway_rfind ... bench: 51,740 ns/iter (+/- 66) = 386 MB/s ```
2015-07-29std: Remove the curious inner moduleAlex Crichton-1/+1
This isn't actually necessary any more with the advent of `$crate` and changes in the compiler to expand macros to `::core::$foo` in the context of a `#![no_std]` crate. The libcore inner module was also trimmed down a bit to the bare bones.
2015-07-21Inline eq_slice_() into eq_slice()Björn Steinbrink-12/+3
eq_slice_() used to be a common implementation for two function that both called it, but of those only eq_slice() is left, so we can as well directly inline the code.
2015-07-21Auto merge of #27168 - brson:stdprim, r=steveklabnikbors-1/+0
This makes the primitive descriptions on the front page read properly as descriptions of types and not of the associated modules. Having the primitive and module docs derived from the same source causes problems, primarily that they can't contain hyperlinks cross-referencing each other. This crates dedicated private modules in `std` to document the primitive types, then for all primitives that have a corresponding module, puts hyperlinks in moth the primitive docs and the module docs cross-linking each other. This should help clear up confusion when readers find themselves on the wrong page. This also removes all the duplicate `#[doc(primitive)]` tags in various places (especially core), so the core docs will no longer attempt to document the primitives for now. Seems like an acceptable tradeoff to get some cleanup for std.
2015-07-20std: Create separate docs for the primitivesBrian Anderson-1/+0
Having the primitive and module docs derived from the same source causes problems, primarily that they can't contain hyperlinks cross-referencing each other. This crates dedicated private modules in `std` to document the primitive types, then for all primitives that have a corresponding module, puts hyperlinks in moth the primitive docs and the module docs cross-linking each other. This should help clear up confusion when readers find themselves on the wrong page.
2015-07-18Fix doc comment parsing in macros.Lee Jeffery-10/+10
2015-07-13Add str::split_at_mutSimon Sapin-0/+15
2015-07-13Implement IndexMut for String and str.Simon Sapin-0/+64
... matching the existing Index impls. There is no reason not to if String implement DerefMut. The code removed in `src/librustc/middle/effect.rs` was added in #9750 to prevent things like `s[0] = 0x80` where `s: String`, but I belive became unnecessary when the Index(Mut) traits were introduced.
2015-07-04core: Use memcmp in is_prefix_of / is_suffix_ofUlrik Sverdrup-5/+4
The basic str equality in core::str calls memcmp, re-use the same function in StrSearcher's is_prefix_of, is_suffix_of.
2015-06-24StrSearcher: Explicitly separate the long and short casesUlrik Sverdrup-5/+11
This is needed to not drop performance, after the trait-based changes. Force separate versions of the next method to be generated for the short and long period cases.
2015-06-21StrSearcher: Use trait to specialize two way algorithm by caseUlrik Sverdrup-57/+133
Use a trait to be able to implement both the fast search that skips to each match, and the slower search that emits `Reject` intervals regularly. The latter is important for uses of `next_reject`.
2015-06-21StrSearcher: Specialize is_prefix_of/is_suffix_of for &strUlrik Sverdrup-17/+33
2015-06-21StrSearcher: Update substring search to use the Two Way algorithmUlrik Sverdrup-425/+466
To improve our substring search performance, revive the two way searcher and adapt it to the Pattern API. Fixes #25483, a performance bug: that particular case now completes faster in optimized rust than in ruby (but they share the same order of magnitude). Much thanks to @gereeter who helped me understand the reverse case better and wrote the comment explaining `next_back` in the code. I had quickcheck to fuzz test forward and reverse searching thoroughly. The two way searcher implements both forward and reverse search, but not double ended search. The forward and reverse parts of the two way searcher are completely independent. The two way searcher algorithm has very small, constant space overhead, requiring no dynamic allocation. Our implementation is relatively fast, especially due to the `byteset` addition to the algorithm, which speeds up many no-match cases. A bad case for the two way algorithm is: ``` let haystack = (0..10_000).map(|_| "dac").collect::<String>(); let needle = (0..100).map(|_| "bac").collect::<String>()); ``` For this particular case, two way is not much faster than the naive implementation it replaces.
2015-06-17std: Stabilize the `str_matches` featureAlex Crichton-2/+2
This commit stabilizes the `str::{matches, rmatches}` functions and iterators, but renames the unstable feature for the `str::{matches,rmatches}_indices` function to `str_match_indices` due to the comment present on the functions about the iterator's return value.
2015-06-17std: Remove two internal `str_internals` functionsAlex Crichton-4/+2
These were just exposed to be used elsewhere at some point, but neither is currently being used so just make them private again.
2015-06-17core: Split apart the global `core` featureAlex Crichton-5/+11
This commit shards the broad `core` feature of the libcore library into finer grained features. This split groups together similar APIs and enables tracking each API separately, giving a better sense of where each feature is within the stabilization process. A few minor APIs were deprecated along the way: * Iterator::reverse_in_place * marker::NoCopy
2015-06-16Auto merge of #26313 - steveklabnik:fix_str_docs, r=alexcrichtonbors-10/+10
Because these structures are created by a macro, the doc comments don't quite work: the leading /// isn't stripped. Instead, just use #[doc] so that they render correctly.
2015-06-15Fix up Split docsSteve Klabnik-10/+10
Because these structures are created by a macro, the doc comments don't quite work: the leading /// isn't stripped. Instead, just use #[doc] so that they render correctly.
2015-06-10Add str::split_atUlrik Sverdrup-0/+13
Implement RFC rust-lang/rfcs#1123 Add str method str::split_at(mid: usize) -> (&str, &str).
2015-05-11docs: Update FromStr documentationUlrik Sverdrup-2/+5
Fixes #25250
2015-05-09Convert #[lang="..."] to #[lang = "..."]Nick Hamann-1/+1
In my opinion this looks nicer, but also it matches the whitespace generally used for stability markers more closely.
2015-05-08core: impl AsRef<[u8]> for strSean McArthur-0/+9
2015-05-04Fix spelling errors in documentation.Joseph Crail-1/+1
2015-05-03Add #[inline(always)] to str::from_utf8_uncheckedJan Bujak-0/+1
Without the inline annotation this: str::from_utf8_unchecked( slice::from_raw_parts( ptr, len ) ) doesn't get inlined which can be pretty brutal performance-wise when used in an inner loop of a low level string manipulation method.
2015-04-21std: Remove deprecated/unstable num functionalityAlex Crichton-1/+0
This commit removes all the old casting/generic traits from `std::num` that are no longer in use by the standard library. This additionally removes the old `strconv` module which has not seen much use in quite a long time. All generic functionality has been supplanted with traits in the `num` crate and the `strconv` module is supplanted with the [rust-strconv crate][rust-strconv]. [rust-strconv]: https://github.com/lifthrasiir/rust-strconv This is a breaking change due to the removal of these deprecated crates, and the alternative crates are listed above. [breaking-change]
2015-04-21std: Remove deprecated AsOsStr/Str/AsSlice traitsAlex Crichton-24/+0
Cleaning out more deprecated items
2015-04-21Auto merge of #24620 - pczarn:model-lexer-issues, r=cmrbors-2/+0
Fixes #15679 Fixes #15878 Fixes #15882 Closes #15883
2015-04-21Model lexer: Fix remaining issuesPiotr Czarnecki-2/+0
2015-04-20Make iterator struct docs more consistent.Steve Klabnik-13/+13
Fixes #24008.
2015-04-14Positive case of `len()` -> `is_empty()`Tamir Duberstein-2/+2
`s/(?<!\{ self)(?<=\.)len\(\) == 0/is_empty()/g`
2015-04-14rollup merge of #24377: apasel422/docsAlex Crichton-4/+4
Conflicts: src/libstd/net/ip.rs src/libstd/sys/unix/fs.rs src/libstd/sys/unix/mod.rs src/libstd/sys/windows/mod.rs
2015-04-13pluralize doc comment verbs and add missing periodsAndrew Paseltiner-4/+4
2015-04-10std: Stabilize the Utf8Error typeAlex Crichton-22/+17
The meaning of each variant of this enum was somewhat ambiguous and it's uncler that we wouldn't even want to add more enumeration values in the future. As a result this error has been altered to instead become an opaque structure. Learning about the "first invalid byte index" is still an unstable feature, but the type itself is now stable.