| Age | Commit message (Collapse) | Author | Lines |
|
|
|
|
|
Currently, `BitSet` doesn't actually know its own domain size; it just
knows how many words it contains. To improve things, this commit makes
the following changes.
- It changes `BitSet` and `SparseBitSet` to store their own domain size,
and do more precise bounds and same-size checks with it. It also
changes the signature of `BitSet::to_string()` (and puts it within
`impl ToString`) now that the domain size need not be passed in from
outside.
- It uses `derive(RustcDecodable, RustcEncodable)` for `BitSet`. This
required adding code to handle `PhantomData` in `libserialize`.
- As a result, it removes the domain size from `HybridBitSet`, making a
lot of that code nicer.
- Both set_up_to() and clear_above() were overly general, working with
arbitrary sizes when they are only needed for the domain size. The
commit removes the former, degeneralizes the latter, and removes the
(overly general) tests.
- Changes `GrowableBitSet::grow()` to `ensure()`, fixing a bug where a
(1-based) domain size was confused with a (0-based) element index.
- Changes `BitMatrix` to store its row count, and do more precise bounds
checks with it.
- Changes `ty_params` in `select.rs` from a `BitSet` to a
`GrowableBitSet` because it repeatedly failed the new, more precise
bounds checks. (Changing the type was simpler than computing an
accurate domain size.)
- Various other minor improvements.
|
|
This requires adding a few extra methods to `HybridBitSet`. (These are
tested in a new unit test.)
This commit reduces the `max-rss` for `nll-check` builds of `html5ever`
by 46%, `ucd` by 45%, `clap-rs` by 23%, `inflate` by 14%. And the
results for the `unic-ucd-name` crate are even more impressive: a 21%
reduction in instructions, a 60% reduction in wall-time, a 96%
reduction in `max-rss`, and a 97% reduction in faults!
Fixes #52028.
|
|
`SparseBitSet` is the only remaining user of `ArrayVec`. This commit
switches it to using `SmallVec`, and removes `array_vec.rs`.
Why the switch? Although `SparseBitSet` is size-limited and doesn't need
the ability to spill to the heap, `SmallVec` has many more features than
`ArrayVec`. In particular, it's now possible to keep `SparseBitSet`'s
elements in sorted order, which gives in-order iteration, which is a
requirement for the next commit.
|
|
|
|
- Rename `BitSet::data` and `BitMatrix::vector` as `words`, because that's
what they are.
- Remove `BitSet::words_mut()`, which is no longer necessary.
- Better distinguish multiple meanins of "word", i.e. "word index" vs
"word ref" vs "word" (i.e. the value itself).
|
|
`BitwiseOperator` is an unnecessarily low-level thing. This commit
replaces it with `BitSetOperator`, which works on `BitSet`s instead of
words. Within `bit_set.rs`, the commit eliminates `Intersect`, `Union`,
and `Subtract` by instead passing a function to `bitwise()`.
|
|
Currently we have two files implementing bitsets (and 2D bit matrices).
This commit combines them into one, taking the best features from each.
This involves renaming a lot of things. The high level changes are as
follows.
- bitvec.rs --> bit_set.rs
- indexed_set.rs --> (removed)
- BitArray + IdxSet --> BitSet (merged, see below)
- BitVector --> GrowableBitSet
- {,Sparse,Hybrid}IdxSet --> {,Sparse,Hybrid}BitSet
- BitMatrix --> BitMatrix
- SparseBitMatrix --> SparseBitMatrix
The changes within the bitset types themselves are as follows.
```
OLD OLD NEW
BitArray<C> IdxSet<T> BitSet<T>
-------- ------ ------
grow - grow
new - (remove)
new_empty new_empty new_empty
new_filled new_filled new_filled
- to_hybrid to_hybrid
clear clear clear
set_up_to set_up_to set_up_to
clear_above - clear_above
count - count
contains(T) contains(&T) contains(T)
contains_all - superset
is_empty - is_empty
insert(T) add(&T) insert(T)
insert_all - insert_all()
remove(T) remove(&T) remove(T)
words words words
words_mut words_mut words_mut
- overwrite overwrite
merge union union
- subtract subtract
- intersect intersect
iter iter iter
```
In general, when choosing names I went with:
- names that are more obvious (e.g. `BitSet` over `IdxSet`).
- names that are more like the Rust libraries (e.g. `T` over `C`,
`insert` over `add`);
- names that are more set-like (e.g. `union` over `merge`, `superset`
over `contains_all`, `domain_size` over `num_bits`).
Also, using `T` for index arguments seems more sensible than `&T` --
even though the latter is standard in Rust collection types -- because
indices are always copyable. It also results in fewer `&` and `*`
sigils in practice.
|