| Age | Commit message (Collapse) | Author | Lines |
|
|
|
|
|
|
|
`v.len()` counts code units, not UTF-16 bytes. The lower bound is one UTF-8 byte per code unit, not per two code units.
|
|
This reverts commit 40b9f5ded50ac4ce8c9323921ec556ad611af6b7.
|
|
This reverts commit 95cfc35607ccf5f02f02de56a35a9ef50fa23a82.
|
|
This reverts commit 6e0611a48707a1f5d90aee32a02b2b15957ef25b.
|
|
|
|
[breaking-change]
If you are using slicing syntax you will need to add #![feature(slicing_syntax)] to your crate.
|
|
|
|
This breaks code that looks like:
match foo {
1..3 => { ... }
}
Instead, write:
match foo {
1...3 => { ... }
}
Closes #17295.
[breaking-change]
|
|
|
|
Closes #17502
|
|
|
|
|
|
This commit deprecates the String::shift_char() function in favor of the
addition of an insert()/remove() pair of functions. This aligns the API with Vec
in that characters can be inserted at arbitrary positions. Additionaly, there
is no `_char` suffix due to the rationaled laid out in the previous commit.
These functions are both introduced as unstable as their failure semantics,
while in line with slices/vectors, are uncertain about whether they should
remain the same.
|
|
# Rationale
When dealing with strings, many functions deal with either a `char` (unicode
codepoint) or a byte (utf-8 encoding related). There is often an inconsistent
way in which methods are referred to as to whether they contain "byte", "char",
or nothing in their name. There are also issues open to rename *all* methods to
reflect that they operate on utf8 encodings or bytes (e.g. utf8_len() or
byte_len()).
The current state of String seems to largely be what is desired, so this PR
proposes the following rationale for methods dealing with bytes or characters:
> When constructing a string, the input encoding *must* be mentioned (e.g.
> from_utf8). This makes it clear what exactly the input type is expected to be
> in terms of encoding.
>
> When a method operates on anything related to an *index* within the string
> such as length, capacity, position, etc, the method *implicitly* operates on
> bytes. It is an understood fact that String is a utf-8 encoded string, and
> burdening all methods with "bytes" would be redundant.
>
> When a method operates on the *contents* of a string, such as push() or pop(),
> then "char" is the default type. A String can loosely be thought of as being a
> collection of unicode codepoints, but not all collection-related operations
> make sense because some can be woefully inefficient.
# Method stabilization
The following methods have been marked #[stable]
* The String type itself
* String::new
* String::with_capacity
* String::from_utf16_lossy
* String::into_bytes
* String::as_bytes
* String::len
* String::clear
* String::as_slice
The following methods have been marked #[unstable]
* String::from_utf8 - The error type in the returned `Result` may change to
provide a nicer message when it's `unwrap()`'d
* String::from_utf8_lossy - The returned `MaybeOwned` type still needs
stabilization
* String::from_utf16 - The return type may change to become a `Result` which
includes more contextual information like where the error
occurred.
* String::from_chars - This is equivalent to iter().collect(), but currently not
as ergonomic.
* String::from_char - This method is the equivalent of Vec::from_elem, and has
been marked #[unstable] becuase it can be seen as a
duplicate of iterator-based functionality as well as
possibly being renamed.
* String::push_str - This *can* be emulated with .extend(foo.chars()), but is
less efficient because of decoding/encoding. Due to the
desire to minimize API surface this may be able to be
removed in the future for something possibly generic with
no loss in performance.
* String::grow - This is a duplicate of iterator-based functionality, which may
become more ergonomic in the future.
* String::capacity - This function was just added.
* String::push - This function was just added.
* String::pop - This function was just added.
* String::truncate - The failure conventions around String methods and byte
indices isn't totally clear at this time, so the failure
semantics and return value of this method are subject to
change.
* String::as_mut_vec - the naming of this method may change.
* string::raw::* - these functions are all waiting on [an RFC][2]
[2]: https://github.com/rust-lang/rfcs/pull/240
The following method have been marked #[experimental]
* String::from_str - This function only exists as it's more efficient than
to_string(), but having a less ergonomic function for
performance reasons isn't the greatest reason to keep it
around. Like Vec::push_all, this has been marked
experimental for now.
The following methods have been #[deprecated]
* String::append - This method has been deprecated to remain consistent with the
deprecation of Vec::append. While convenient, it is one of
the only functional-style apis on String, and requires more
though as to whether it belongs as a first-class method or
now (and how it relates to other collections).
* String::from_byte - This is fairly rare functionality and can be emulated with
str::from_utf8 plus an assert plus a call to to_string().
Additionally, String::from_char could possibly be used.
* String::byte_capacity - Renamed to String::capacity due to the rationale
above.
* String::push_char - Renamed to String::push due to the rationale above.
* String::pop_char - Renamed to String::pop due to the rationale above.
* String::push_bytes - There are a number of `unsafe` functions on the `String`
type which allow bypassing utf-8 checks. These have all
been deprecated in favor of calling `.as_mut_vec()` and
then operating directly on the vector returned. These
methods were deprecated because naming them with relation
to other methods was difficult to rationalize and it's
arguably more composable to call .as_mut_vec().
* String::as_mut_bytes - See push_bytes
* String::push_byte - See push_bytes
* String::pop_byte - See push_bytes
* String::shift_byte - See push_bytes
# Reservation methods
This commit does not yet touch the methods for reserving bytes. The methods on
Vec have also not yet been modified. These methods are discussed in the upcoming
[Collections reform RFC][1]
[1]: https://github.com/aturon/rfcs/blob/collections-conventions/active/0000-collections-conventions.md#implicit-growth
|
|
|
|
Change to resolve and update compiler and libs for uses.
[breaking-change]
Enum variants are now in both the value and type namespaces. This means that
if you have a variant with the same name as a type in scope in a module, you
will get a name clash and thus an error. The solution is to either rename the
type or the variant.
|
|
|
|
|
|
of `use bar as foo`.
Change all uses of `use foo = bar` to `use bar as foo`.
Implements RFC #47.
Closes #16461.
[breaking-change]
|
|
The first commit improves code generation through a few changes:
- The `#[inline]` attributes allow llvm to constant fold the encoding step away in certain situations. For example, code like this changes from a call to `encode_utf8` in a inner loop to the pushing of a byte constant:
```rust
let mut s = String::new();
for _ in range(0u, 21) {
s.push_char('a');
}
```
- Both methods changed their semantic from causing run time failure if the target buffer is not large enough to returning `None` instead. This makes llvm no longer emit code for causing failure for these methods.
- A few debug `assert!()` calls got removed because they affected code generation due to unwinding, and where basically unnecessary with today's sound handling of `char` as a Unicode scalar value.
~~The second commit is optional. It changes the methods from regular indexing with the `dst[i]` syntax to unsafe indexing with `dst.unsafe_mut_ref(i)`. This does not change code generation directly - in both cases llvm is smart enough to see that there can never be an out-of-bounds access. But it makes it emit a `nounwind` attribute for the function.
However, I'm not sure whether that is a real improvement, so if there is any objection to this I'll remove the commit.~~
This changes how the methods behave on a too small buffer, so this is a
[breaking-change]
|
|
declared with the same name in the same scope.
This breaks several common patterns. First are unused imports:
use foo::bar;
use baz::bar;
Change this code to the following:
use baz::bar;
Second, this patch breaks globs that import names that are shadowed by
subsequent imports. For example:
use foo::*; // including `bar`
use baz::bar;
Change this code to remove the glob:
use foo::{boo, quux};
use baz::bar;
Or qualify all uses of `bar`:
use foo::{boo, quux};
use baz;
... baz::bar ...
Finally, this patch breaks code that, at top level, explicitly imports
`std` and doesn't disable the prelude.
extern crate std;
Because the prelude imports `std` implicitly, there is no need to
explicitly import it; just remove such directives.
The old behavior can be opted into via the `import_shadowing` feature
gate. Use of this feature gate is discouraged.
This implements RFC #116.
Closes #16464.
[breaking-change]
|
|
- Both can now be inlined and constant folded away
- Both can no longer cause failure
- Both now return an `Option` instead
Removed debug `assert!()`s over the valid ranges of a `char`
- It affected optimizations due to unwinding
- Char handling is now sound enought that they became uneccessary
|
|
Deprecate the previous.
|
|
This required some contortions because importing both raw::Slice
and slice::Slice makes rustc crash.
Since `Slice` is in the prelude, this renaming is unlikely to
casue breakage.
[breaking-change]
|
|
|
|
|
|
|
|
|
|
Reword comments on unsafe methods regarding UTF-8.
|
|
|
|
Replaced by `string::raw::from_parts`
[breaking-change]
|
|
Replaced by `string::raw::from_buf_len`
[breaking-change]
|
|
Replaced by `string::raw::from_utf8`
[breaking-change]
|
|
|
|
Implement for Vec, DList, RingBuf. Add MutableSeq to the prelude.
Since the collections traits are in the prelude most consumers of
these methods will continue to work without change.
[breaking-change]
|
|
|
|
I blame @ChrisMorgan for the hyphens.
|
|
|
|
|
|
Use `String::from_utf8_lossy` instead
[breaking-change]
|
|
Use `String::from_utf16_lossy` instead.
[breaking-change]
|
|
Use `String::from_utf16` instead
[breaking-change]
|
|
Replaced by `String::from_byte`
[breaking-change]
|
|
Use `String::from_chars` instead
[breaking-change]
|
|
Use `String::from_utf8` instead
[breaking-change]
|
|
[breaking-change]
|
|
```
test new_push_byte ... bench: 6985 ns/iter (+/- 487) = 17 MB/s
test old_push_byte ... bench: 19335 ns/iter (+/- 1368) = 6 MB/s
```
```rust
extern crate test;
use test::Bencher;
static TEXT: &'static str = "\
Unicode est un standard informatique qui permet des échanges \
de textes dans différentes langues, à un niveau mondial.";
#[bench]
fn old_push_byte(bencher: &mut Bencher) {
bencher.bytes = TEXT.len() as u64;
bencher.iter(|| {
let mut new = String::new();
for b in TEXT.bytes() {
unsafe { new.as_mut_vec().push_all([b]) }
}
})
}
#[bench]
fn new_push_byte(bencher: &mut Bencher) {
bencher.bytes = TEXT.len() as u64;
bencher.iter(|| {
let mut new = String::new();
for b in TEXT.bytes() {
unsafe { new.as_mut_vec().push(b) }
}
})
}
```
|