the heap refill code was recursively implemented with two methods.
I merged both methods.
replace recursion in heap refill method with iterative approach
use array for list of LongQueue
This way there is no precondition when accessing the elements
In many use cases it is more efficient to have a getter that returns
a default value instead of throwing an exception. The exception forces
every consumer to call containsKey() before calling get(). In cases
where the consumer wants to use a default value this is unnecessary.
Using the unsafe versions of add/get to improve performance.
Here are some numbers for 2^15 intersections of two random
sorted lists with 16k elements. The numbers were gathered with
perf stat -d -d -d --delay 2000 java -ea -cp bin/test:bin/main
org.lucares.collections.Test 16000 15 (the test class is not committed)
Duration: 9059ms -> 7293ms (80.5%)
Cycles: 26.084.812.051 -> 21.616.608.207 (82.9%)
Instructions: 68.045.848.666 -> 52.306.000.150 (76.9%)
Instructions per Cycle: 2,61 -> 2.42
Branches: 15.007.093.940 -> 9.839.481.658 (65.6%)
Branch Misses: 2.285.461 -> 1.551.906
Cycles per element: 24.87 -> 20.61 (82.9%)
Added getUnsafe and addUnsafe, two methods that skip checks.
The checks are not needed in unionSorted, because we made
sure the code works correctly. getUnsafe still has the checks,
but as assertions.
The speedup is roughly 20%. Here are some results for calling
union 2^16 times for two random sorted lists with 16k
elements:
Duration: 20170 -> 16857 (83.57%)
Instructions: 138.277.743.679 -> 108.474.027.213 (78.4%)
Instructions per cycle: 2,33 -> 2,18
Branches: 30.248.330.644 -> 19.908.207.057 (65.8%)
Branch Misses: 3.012.427 -> 3.477.433
Cycles per list element: 65.93 -> 51.72 (78.4%)
Created with (the test program is not committed) on a Core2Duo 8600:
perf stat -d -d -d --delay 2000 java -cp bin/test:bin/main
org.lucares.collections.Test 16000 16
Use case: The list is used as a buffer, that is re-used and in
each iteration the list is cleared.
Now that clear() does not replace the data array there is no
garbage collection and we do not have to allocated one or
several new arrays in the next iteration.
You can still free the associated memory by calling clear() + trim().
- Remove TODO in unionUnsorted, because I don't think there is a much
better solution.
- Remove TODO for lastIndexOf, because it has been implemented.
- Use local variables for size in unionSorted.
The algorithms for removeAll, retainAll and removeIf were almost
identical. The main difference was, that removeIf calls a lambda,
whereas removeAll and retainAll executed the check directly. Both
methods call indexOf on an IntList, which easily outweighs the extra
cost of the lambda.
It makes sense to consolidate and remove the duplicated code.
I ran a few benchmarks with different list sizes before the replacement.
The results are not as clear as I would have liked. In some cases,
especially for short lists, the special implementations of
removeAll/retainAll were up to 10% faster. In other situations I saw
50% difference the one day, but could not reproduce those results a
day later. This leads me to believe, that my test setup is not
trustworthy.
That means I stay with what I now. The code is identical with one tiny
difference. And that difference shouldn't matter, because indexOf is
much more expensive.
A list can be sorted after shuffling. The trivial cases are
the empty list and a list with only one element or lists
with only identical elements. Lists with only a few elements
have a non-negligible chance to be sorted after shuffling.