path: root/src/libstd/str.rs
Age | Commit message | Author | Lines
2013-08-04 | auto merge of #8237 : blake2-ppc/rust/faster-utf8, r=brson | bors | -35/+67
Use unchecked vec indexing since the vector bounds are checked by the loop. Iterators are not easy to use in this case since we skip 1-4 bytes each lap. This part of the commit speeds up is_utf8 for ASCII input.

Check codepoint ranges by checking the byte ranges manually instead of computing a full decoding for multibyte encodings. This is easy to read and corresponds to the UTF-8 syntax in the RFC.

No changes to what we accept. A comment notes that surrogate halves are accepted.

Before:

test str::bench::is_utf8_100_ascii ... bench: 165 ns/iter (+/- 3)
test str::bench::is_utf8_100_multibyte ... bench: 218 ns/iter (+/- 5)

After:

test str::bench::is_utf8_100_ascii ... bench: 130 ns/iter (+/- 1)
test str::bench::is_utf8_100_multibyte ... bench: 156 ns/iter (+/- 3)

An improvement upon the previous pull #8133
2013-08-03 | remove obsolete `foreach` keyword | Daniel Micay | -30/+30
this has been replaced by `for`
2013-08-02 | std: Speed up str::is_utf8 | blake2-ppc | -35/+67
Use unchecked vec indexing since the vector bounds are checked by the loop. Iterators are not easy to use in this case since we skip 1-4 bytes each lap. This part of the commit speeds up is_utf8 for ASCII input.

Check codepoint ranges by checking the byte ranges manually instead of computing a full decoding for multibyte encodings. This is easy to read and corresponds to the UTF-8 syntax in the RFC.

No changes to what we accept. A comment notes that surrogate halves are accepted.

Before:

test str::bench::is_utf8_100_ascii ... bench: 165 ns/iter (+/- 3)
test str::bench::is_utf8_100_multibyte ... bench: 218 ns/iter (+/- 5)

After:

test str::bench::is_utf8_100_ascii ... bench: 130 ns/iter (+/- 1)
test str::bench::is_utf8_100_multibyte ... bench: 156 ns/iter (+/- 3)
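The byte-range approach described in this commit can be sketched in present-day Rust. This is an illustration, not the code from the commit: the function name `is_utf8_by_ranges` is made up, it uses ordinary checked indexing rather than unchecked indexing, and it follows the RFC 3629 syntax strictly, so it rejects surrogate halves, which the 2013 code accepted.

```rust
/// Hypothetical sketch: validate UTF-8 by checking byte ranges only,
/// following the syntax table in RFC 3629. No codepoint is decoded.
pub fn is_utf8_by_ranges(v: &[u8]) -> bool {
    let mut i = 0;
    while i < v.len() {
        let b0 = v[i];
        // Fast path: an ASCII byte is a complete one-byte sequence.
        if b0 < 0x80 {
            i += 1;
            continue;
        }
        // Sequence length and the allowed range of the second byte,
        // selected by the lead byte.
        let (len, lo, hi) = match b0 {
            0xC2..=0xDF => (2, 0x80, 0xBF),
            0xE0 => (3, 0xA0, 0xBF), // excludes overlong 3-byte forms
            0xE1..=0xEC | 0xEE..=0xEF => (3, 0x80, 0xBF),
            0xED => (3, 0x80, 0x9F), // excludes surrogates (assumption: strict RFC)
            0xF0 => (4, 0x90, 0xBF), // excludes overlong 4-byte forms
            0xF1..=0xF3 => (4, 0x80, 0xBF),
            0xF4 => (4, 0x80, 0x8F), // excludes values above U+10FFFF
            _ => return false,       // 0x80..=0xC1 and 0xF5..=0xFF never start a sequence
        };
        if i + len > v.len() {
            return false; // truncated sequence
        }
        if v[i + 1] < lo || v[i + 1] > hi {
            return false;
        }
        // Any remaining bytes must be plain continuation bytes.
        for &b in &v[i + 2..i + len] {
            if !(0x80..=0xBF).contains(&b) {
                return false;
            }
        }
        i += len;
    }
    true
}
```

The loop advances by 1-4 bytes per lap, which is the reason the commit message gives for not using an iterator here.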
2013-08-01 | str: Add method .into_owned(self) -> ~str to Str | Kevin Ballard | -0/+12
The method .into_owned() is meant to be used as an optimization when you need to get a ~str from a Str, but don't want to unnecessarily copy it if it's already a ~str. This is meant to ease functions that look like fn foo<S: Str>(strs: &[S]). Previously they could work with the strings as slices using .as_slice(), but producing ~str required copying the string, even if the vector turned out to be a &[~str] already.
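The same idea survives in present-day Rust, where `Into<String>` plays a role similar to `.into_owned()`: an already-owned string is moved, and a borrowed one is copied. The sketch below is a modern analogue rather than the 2013 API, and `collect_owned` is a hypothetical name.

```rust
/// Hypothetical modern analogue of a function generic over string-like
/// inputs. `String` inputs are moved without copying their buffer;
/// `&str` inputs are copied into fresh allocations.
pub fn collect_owned<S: Into<String>>(strs: Vec<S>) -> Vec<String> {
    strs.into_iter().map(Into::into).collect()
}
```

Calling `collect_owned(vec![String::from("hello")])` hands back the same heap buffer, while `collect_owned(vec!["a", "b"])` allocates a new `String` for each slice.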
2013-08-01 | std: Change `Times` trait to use `do` instead of `for` | blake2-ppc | -1/+1
Change the former repetition::

    for 5.times { }

to::

    do 5.times { }

.times() cannot be broken with `break` or `return` anymore; for those cases, use a numerical range loop instead.
2013-08-01 | migrate many `for` loops to `foreach` | Daniel Micay | -31/+32
2013-08-01 | make `in` and `foreach` get treated as keywords | Daniel Micay | -1/+1
2013-07-30 | std: Mark the static constants in str.rs as private | blake2-ppc | -10/+10
static variables are pub by default, which is not reflected in our code (we need to use priv).
2013-07-30 | std: Add from_bytes test for utf-8 using codepoints above 0xffff | blake2-ppc | -0/+3
2013-07-30 | std: Deny overlong encodings in UTF-8 | blake2-ppc | -8/+45
An 'overlong encoding' is a codepoint encoded non-minimally using the utf-8 format. Denying these ensures that each codepoint has only one valid representation in utf-8.

An example is the byte sequence 0xE0 0x80 0x80, which could be interpreted as U+0, but it's an overlong encoding since the canonical form is just 0x00.

Another example is 0xE0 0x80 0xAF, which was previously accepted and is an overlong encoding of the solidus "/". Directory traversal characters like / and . form the most compelling argument for why this commit is security critical.

Factor out common UTF-8 decoding expressions as macros. This commit will partly duplicate UTF-8 decoding, so it is now present in both fn is_utf8() and .char_range_at(); the latter using an assumption of a valid str.
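The two examples in this message can be checked against the present-day standard library. In the sketch below, `naive_decode3` is a hypothetical helper that extracts the payload bits of a three-byte sequence with no minimality check, which is how an overlong form can be read as "/". `accepted` simply wraps `std::str::from_utf8`, which rejects overlong forms.

```rust
use std::str;

/// Hypothetical helper: decode a three-byte sequence by extracting the
/// payload bits only, with no check that the encoding is minimal.
pub fn naive_decode3(b: [u8; 3]) -> u32 {
    ((b[0] as u32 & 0x0F) << 12) | ((b[1] as u32 & 0x3F) << 6) | (b[2] as u32 & 0x3F)
}

/// True when the modern standard library accepts `bytes` as UTF-8.
pub fn accepted(bytes: &[u8]) -> bool {
    str::from_utf8(bytes).is_ok()
}
```

Here `naive_decode3([0xE0, 0x80, 0xAF])` yields 0x2F, the solidus, yet `accepted(&[0xE0, 0x80, 0xAF])` is false; only the canonical single byte `b"/"` is accepted.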
2013-07-30 | std: Disallow bytes 0xC0, 0xC1 (192, 193) in utf-8 | blake2-ppc | -1/+1
Bytes 0xC0, 0xC1 can only be used to start 2-byte codepoint encodings that are 'overlong encodings' of codepoints below 128. The reference given in a comment -- https://tools.ietf.org/html/rfc3629 -- does in fact already exclude these bytes, so no additional comment should be needed in the code.
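The arithmetic behind this claim is short enough to work through. A two-byte sequence carries five payload bits in the lead byte and six in the continuation byte. The sketch below uses a hypothetical `decode2` helper to show that lead bytes 0xC0 and 0xC1 can never reach codepoint 128.

```rust
/// Hypothetical helper: payload of a two-byte UTF-8 sequence,
/// five bits from the lead byte and six from the continuation byte.
pub fn decode2(lead: u8, cont: u8) -> u32 {
    ((lead as u32 & 0x1F) << 6) | (cont as u32 & 0x3F)
}

/// Largest codepoint reachable with the given lead byte, using the
/// highest continuation byte 0xBF.
pub fn max_two_byte_codepoint(lead: u8) -> u32 {
    decode2(lead, 0xBF)
}
```

With lead byte 0xC0 the maximum is 63 and with 0xC1 it is 127, so both only produce values that already fit in one byte. The smallest value with lead byte 0xC2 is `decode2(0xC2, 0x80)`, which is 128, the first codepoint that genuinely needs two bytes.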
2013-07-30 | auto merge of #8121 : thestinger/rust/offset, r=alexcrichton | bors | -19/+19
Closes #8118, #7136

~~~rust
extern mod extra;

use std::vec;
use std::ptr;

fn bench_from_elem(b: &mut extra::test::BenchHarness) {
    do b.iter {
        let v: ~[u8] = vec::from_elem(1024, 0u8);
    }
}

fn bench_set_memory(b: &mut extra::test::BenchHarness) {
    do b.iter {
        let mut v: ~[u8] = vec::with_capacity(1024);
        unsafe {
            let vp = vec::raw::to_mut_ptr(v);
            ptr::set_memory(vp, 0, 1024);
            vec::raw::set_len(&mut v, 1024);
        }
    }
}

fn bench_vec_repeat(b: &mut extra::test::BenchHarness) {
    do b.iter {
        let v: ~[u8] = ~[0u8, ..1024];
    }
}
~~~

Before:

test bench_from_elem ... bench: 415 ns/iter (+/- 17)
test bench_set_memory ... bench: 85 ns/iter (+/- 4)
test bench_vec_repeat ... bench: 83 ns/iter (+/- 3)

After:

test bench_from_elem ... bench: 84 ns/iter (+/- 2)
test bench_set_memory ... bench: 84 ns/iter (+/- 5)
test bench_vec_repeat ... bench: 84 ns/iter (+/- 3)
2013-07-30 | Added str::char_offset_iter() and str::rev_char_offset_iter() | Marvin Löbel | -590/+489
Renamed bytes_iter to byte_iter to match other iterators
Refactored str Iterators to use DoubleEnded Iterators and typedefs instead of wrapper structs
Reordered the Iterator section
Whitespace fixup
Moved clunky `each_split_within` function to the one place in the tree where it's actually needed
Replaced all block doccomments in str with line doccomments
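For readers who know only present-day Rust, the closest equivalent of the iterators added here is `str::char_indices()`, which is double-ended, so the reversed form is just `.rev()`. The sketch below shows that modern analogue, not the 2013 methods; the two function names are hypothetical.

```rust
/// Modern analogue of a char-offset iterator: each char paired with
/// the byte offset at which it starts.
pub fn char_offsets(s: &str) -> Vec<(usize, char)> {
    s.char_indices().collect()
}

/// The same pairs, walked from the back of the string. This works
/// because the underlying iterator is double-ended.
pub fn rev_char_offsets(s: &str) -> Vec<(usize, char)> {
    s.char_indices().rev().collect()
}
```

For the input `"aé€"` the offsets are 0, 1 and 3, since `é` occupies two bytes and `€` three.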
2013-07-30 | implement pointer arithmetic with GEP | Daniel Micay | -19/+19
Closes #8118, #7136

~~~rust
extern mod extra;

use std::vec;
use std::ptr;

fn bench_from_elem(b: &mut extra::test::BenchHarness) {
    do b.iter {
        let v: ~[u8] = vec::from_elem(1024, 0u8);
    }
}

fn bench_set_memory(b: &mut extra::test::BenchHarness) {
    do b.iter {
        let mut v: ~[u8] = vec::with_capacity(1024);
        unsafe {
            let vp = vec::raw::to_mut_ptr(v);
            ptr::set_memory(vp, 0, 1024);
            vec::raw::set_len(&mut v, 1024);
        }
    }
}

fn bench_vec_repeat(b: &mut extra::test::BenchHarness) {
    do b.iter {
        let v: ~[u8] = ~[0u8, ..1024];
    }
}
~~~

Before:

test bench_from_elem ... bench: 415 ns/iter (+/- 17)
test bench_set_memory ... bench: 85 ns/iter (+/- 4)
test bench_vec_repeat ... bench: 83 ns/iter (+/- 3)

After:

test bench_from_elem ... bench: 84 ns/iter (+/- 2)
test bench_set_memory ... bench: 84 ns/iter (+/- 5)
test bench_vec_repeat ... bench: 84 ns/iter (+/- 3)
2013-07-30 | std: Implement Extendable for hashmap, str and trie | blake2-ppc | -3/+24
2013-07-29 | std: Rename Iterator adaptor types to drop the -Iterator suffix | blake2-ppc | -4/+3
Drop the "Iterator" suffix for the structs in std::iterator. Filter, Zip, Chain etc. are shorter type names for when iterator pipelines need their types written out in full in return value types, so it's easier to read and write. The iterator module already forms enough namespace.
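The short adaptor names are still what present-day Rust uses (`std::iter::Chain`, `std::iter::Zip`, and so on). The sketch below writes a pipeline's type out in full in a return position, which is the situation the message describes; `paired` is a hypothetical function, and the module path is today's `std::iter` rather than the 2013 `std::iterator`.

```rust
use std::iter::{Chain, Zip};
use std::slice::Iter;

/// Hypothetical example: chain two slices, then zip the result with a
/// third, with the adaptor types spelled out in the signature.
pub fn paired<'a>(
    a: &'a [i32],
    b: &'a [i32],
    labels: &'a [char],
) -> Zip<Chain<Iter<'a, i32>, Iter<'a, i32>>, Iter<'a, char>> {
    a.iter().chain(b.iter()).zip(labels.iter())
}
```

Even with the short names the return type is long, which suggests how unwieldy it would be with an `Iterator` suffix on every adaptor.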
2013-07-29 | std: Implement FromIterator for ~str | blake2-ppc | -1/+23
FromIterator initially only implemented for Iterator<char>, which is the type of the main iterator.
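In present-day Rust the corresponding impl is `FromIterator<char> for String`, which is what lets a char pipeline end in `.collect()`. A minimal sketch, with a hypothetical function name:

```rust
/// Build a new string from an iterator of chars. `collect()` picks the
/// `FromIterator<char>` impl for `String` from the return type.
pub fn upper_no_spaces(s: &str) -> String {
    s.chars()
        .filter(|c| !c.is_whitespace())
        .map(|c| c.to_ascii_uppercase())
        .collect()
}
```

For example, `upper_no_spaces("a b c")` produces `"ABC"`.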
2013-07-28 | Refactored vec and str iterators to remove prefixes | jmgrosen | -45/+45
2013-07-27 | auto merge of #8036 : sfackler/rust/container-impls, r=msullivan | bors | -8/+0
A couple of implementations of Container::is_empty weren't exactly self.len() == 0 so I left them alone (e.g. Treemap).
2013-07-26 | Consolidate raw representations of rust values | Alex Crichton | -29/+31
This moves the raw struct layout of closures, vectors, boxes, and strings into a new `unstable::raw` module. This is meant to be a centralized location to find information for the layout of these values. A safe method, `repr`, is provided to convert a rust value to its raw representation. Unsafe methods to convert back are not provided because they are rarely used and too numerous to write an implementation for each (not much of a common pattern).
2013-07-25 | Added default impls for container methods | Steven Fackler | -8/+0
A couple of implementations of Container::is_empty weren't exactly self.len() == 0 so I left them alone (e.g. Treemap).
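A default method of this kind can be sketched in present-day Rust as follows. The trait below is a simplified stand-in, not the 2013 `Container` definition, and `Stack` is a hypothetical implementor that relies on the default.

```rust
/// Simplified stand-in for a container trait with a default method.
pub trait Container {
    fn len(&self) -> usize;

    /// Default implementation; implementors whose emptiness check is
    /// not exactly `len() == 0` can still override it.
    fn is_empty(&self) -> bool {
        self.len() == 0
    }
}

/// Hypothetical implementor that only supplies `len`.
pub struct Stack {
    pub items: Vec<i32>,
}

impl Container for Stack {
    fn len(&self) -> usize {
        self.items.len()
    }
}
```

Each implementor that used to spell out `self.len() == 0` can delete that method, which is consistent with the commit removing eight lines from this file and adding none.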
2013-07-24 | auto merge of #7996 : erickt/rust/cleanup-strs, r=erickt | bors | -178/+152
This is a cleanup pull request that does:

* removes `os::as_c_charp`
* moves `str::as_buf` and `str::as_c_str` into `StrSlice`
* converts some functions from `StrSlice::as_buf` to `StrSlice::as_c_str`
* renames `StrSlice::as_buf` to `StrSlice::as_imm_buf` (and adds `StrSlice::as_mut_buf` to match `vec.rs`)
* renames `UniqueStr::as_bytes_with_null_consume` to `UniqueStr::to_bytes`
* and other misc cleanups and minor optimizations
2013-07-24 | Change 'print(fmt!(...))' to printf!/printfln! in src/lib* | Birunthan Mohanathas | -1/+1
2013-07-23 | std: make str::append move self | Erick Tryzelaar | -6/+5
This eliminates a copy and fixes a FIXME.
2013-07-23 | std: inline str::with_capacity and vec::with_capacity | Erick Tryzelaar | -5/+3
2013-07-23 | std: simplify str::as_imm_buf and vec::as_{imm,mut}_buf | Erick Tryzelaar | -5/+2
2013-07-23 | str: move as_mut_buf into OwnedStr, and make it `self` | Erick Tryzelaar | -18/+18
2013-07-23 | std: remove str::to_owned and str::raw::slice_bytes_owned | Erick Tryzelaar | -41/+22
2013-07-23 | std: rename str.as_buf to as_imm_buf, add str.as_mut_buf | Erick Tryzelaar | -65/+59
2013-07-23 | std: add test for str::as_c_str | Erick Tryzelaar | -0/+22
2013-07-23 | std: move StrUtil::as_c_str into StrSlice | Erick Tryzelaar | -45/+29
2013-07-23 | std: move str::as_buf into StrSlice | Erick Tryzelaar | -48/+48
2013-07-23 | std: rename str.as_bytes_with_null_consume to str.to_bytes_with_null | Erick Tryzelaar | -7/+6
2013-07-23 | std: wrap "long" utf8 lines. | Graydon Hoare | -2/+4
2013-07-22 | std: add preliminary str benchmark. | Graydon Hoare | -0/+45
2013-07-22 | new snapshot | Daniel Micay | -13/+0
2013-07-21 | auto merge of #7932 : blake2-ppc/rust/str-clear, r=huonw | bors | -1/+51
~str and @str need separate implementations for use in generic functions, where it will not automatically use the impl on &str. fixes issue #7900
2013-07-20 | std: Implement Clone for VecIterator and iterators using it | blake2-ppc | -0/+7
The theory is simple: the immutable iterators simply hold state variables (indices or pointers) into frozen containers. We can freely clone these iterators, just like we can clone borrowed pointers. VecIterator needs a manual impl to handle the lifetime struct member.
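The same property holds for present-day `std::slice::Iter`, which is `Clone`: cloning copies only the iterator's position, not the underlying data. The sketch below uses that modern type in place of the 2013 `VecIterator`, and the function name is hypothetical.

```rust
/// Advance past the first element, snapshot the iterator, then consume
/// the original. The snapshot still yields everything from its saved
/// position.
pub fn sum_and_rest_after_first(v: &[i32]) -> (i32, Vec<i32>) {
    let mut it = v.iter();
    it.next(); // skip the first element
    let snapshot = it.clone(); // copies the position only
    let sum: i32 = it.sum(); // consumes `it`
    let rest: Vec<i32> = snapshot.copied().collect(); // clone is unaffected
    (sum, rest)
}
```

For `&[1, 2, 3]` this returns `(5, vec![2, 3])`: both the original and the clone see the two remaining elements.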
2013-07-20 | str: Implement Container for ~str, @str and Mutable for ~str | blake2-ppc | -1/+51
~str and @str need separate implementations for use in generic functions, where it will not automatically use the impl on &str.
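A similar situation exists in present-day Rust: method calls auto-dereference, but a generic bound such as `T: Trait` does not, so an impl for a borrowed string type is not picked up for an owned one. The sketch below is a modern analogue with hypothetical names (`Length`, `total_length`), using `&str` and `String` in place of the 2013 `&str` and `~str`.

```rust
/// Hypothetical trait standing in for a container-style trait.
pub trait Length {
    fn length(&self) -> usize;
}

impl<'a> Length for &'a str {
    fn length(&self) -> usize {
        self.len()
    }
}

// Without this second impl, `total_length::<String>` would not compile:
// the bound `T: Length` is checked against `String` itself and does
// not fall back to the impl for `&str`.
impl Length for String {
    fn length(&self) -> usize {
        self.len()
    }
}

/// Generic function that needs the trait on the element type itself.
pub fn total_length<T: Length>(items: &[T]) -> usize {
    items.iter().map(|x| x.length()).sum()
}
```

Both `total_length(&["ab", "cde"])` and `total_length(&[String::from("ab")])` compile only because each string type has its own impl.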
2013-07-17 | librustc: Remove all uses of "copy". | Patrick Walton | -1/+8
2013-07-15 | remove headers from unique vectors | Daniel Micay | -1/+14
2013-07-11 | Optimize is_utf8 | Gary Linscott | -8/+16
Manually unroll the multibyte loops, and optimize for the single byte chars.
2013-07-11 | char_range_at perf work | Gary Linscott | -28/+60
Moves multibyte code to its own function to make char_range_at easier to inline, and faster for single and multibyte chars. Benchmarked reading example.json 100 times, 1.18s before, 1.08s after.
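The split described here, a small inlinable fast path for single-byte chars plus a separate function for the multibyte case, can be sketched in present-day Rust. This is an illustration under assumptions: it returns a `(char, usize)` tuple rather than the original return type, the helper name `multibyte_char_range_at` is made up, and the slow path leans on the standard library's decoder instead of hand-written decoding.

```rust
/// Return the char starting at byte offset `i` and the offset of the
/// next char. Panics if `i` is out of bounds or not a char boundary.
#[inline]
pub fn char_range_at(s: &str, i: usize) -> (char, usize) {
    let b = s.as_bytes()[i];
    // Fast path: ASCII needs no decoding and is cheap to inline.
    if b < 0x80 {
        return (b as char, i + 1);
    }
    multibyte_char_range_at(s, i)
}

/// Slow path, kept in its own function so the caller stays small.
fn multibyte_char_range_at(s: &str, i: usize) -> (char, usize) {
    let c = s[i..].chars().next().expect("offset is inside the string");
    (c, i + c.len_utf8())
}
```

For `"aé"`, offset 0 takes the fast path and returns `('a', 1)`, while offset 1 goes through the helper and returns `('é', 3)`.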
2013-07-08 | auto merge of #7612 : thestinger/rust/utf8, r=huonw | bors | -22/+2
2013-07-08 | Merge pull request #7595 from thestinger/iterator | Daniel Micay | -1/+1
remove some method resolve workarounds
2013-07-07 | remove some method resolve workarounds | Daniel Micay | -1/+1
2013-07-06 | remove extra::rope | Daniel Micay | -2/+1
It's broken/unmaintained and needs to be rewritten to avoid managed pointers and needless copies. A full rewrite is necessary and the API will need to be redone so it's not worth keeping this around. Closes #2236, #2744
2013-07-05 | str: stop encoding invalid out-of-range `char` | Daniel Micay | -22/+2
2013-07-04 | Convert vec::{as_imm_buf, as_mut_buf} to methods. | Huon Wilson | -3/+4
2013-07-01 | rustc: add a lint to enforce uppercase statics. | Huon Wilson | -40/+40