BSFile was used to store two types of data. This makes
the API complex. I split the API into two files with
easier and more clear APIs. Interestingly the API of
BSFile is still rather complex and has to consider both
use cases.
Due to a mistake in Tag which added all strings used
by Tag into the String dictionary, the dictionary
did contain all values that were used in queries.
The old implementation searched for all possible values and then
executed each query to see what matches.
The new implementation uses several indices to find only
the matching values.
The reason seems to be the number of memory allocations. In order
to create the union of 100 lists we have 99 memory allocations.
The first needs the space for the first two lists, the second the
space for the first three lists, and so on.
We can reduce the number of allocations drastically (in many
cases to one) by leveraging the fact that many of the lists
were already sorted, non-overlapping and increasing, so that
we can simply concatenate them.
The CacheKey is used as a key in a HashMap. Lookup can
be faster if the CacheKey is comparable when there are
hash collisions.
In this case I was not able to measure any effect. I am
keeping the comparables nonetheless, because the can
only have a positive effect.
We are now reading the CSV input without transforming
the data into strings. This reduces the amount of bytes
that have to be converted and copied.
We also made Tag smaller. It no longer stores pointers
to strings, instead it stored integers obtained by
compressing the strings (see StringCompressor). This
reduces memory usage and it speeds up hashcode and
equals, which speeds up access to the writer cache.
Performance gain is almost 100%:
- 330k entries/s -> 670k entries/s, top speed measured over a second
- 62s -> 32s, to ingest 16 million entries
UniqueStringIntegerPairs stores mappings of integers
0-n to strings and vice versa. Mapping integers to
strings does not need a TreeMap, it can be done with
a List.
Makes insertions 3 times (when using the in-memory
variant that does not write to disk) and 7 times faster
for int to string mapping.
Calling Instant.now() several hundred thousand times per
second can be expensive. In my measurements >10% of the
time spend when loading new data was spend calling
Instant.now().
Fixed this by storing an Instant as static member and
updating it periodically in a separate thread.
One bottleneck was the blocking queue used to transport entries
from the listener thread to the ingestor thread.
Reduced the bottleneck by batching entries.
Interestingly the batch size of 100 was better than batch size
of 1000 and better than 10.
In the last commit I added a lastAccessMap to the HotEntryCache.
This map made it much more efficient to evict entries. But it
also made and put and get operation much more expensive. Overall
that change lead to a 65% decrease in ingestion performance of
the PerformanceDB.
Fixed by removing the map again. Eviction has to look at all
elements again.
I was finally able to show that there is a tiny but measureable
effect of this buffer. I think it was not visible before,
because the parsing was too slow. But now, that I replaced the
date parser, the ingestion thread is twice as fast as the
insertion thread. Therefore the buffer makes more sense.
Compared to FastISODateParser.parse, which returns an
OffsetDateTime object, parseAsEpochMilli returns the
epoch time millis. The performance improvement for
date parsing alone is roughly 100% (8m dates/s to
18m dates/s).
Insertion speed improved from 13-14s for 1.6m entries
to 11.5-12.5s.
Instead of spawning a new thread for every cache, we use a single thread
that will evict entries from all caches.
The thread keeps a weak reference to the caches, so that they can be
garbage collected.
A specialized date parser that can only handle ISO-8601 like dates
(2011-12-03T10:15:30.123Z or 2011-12-03T10:15:30+01:00) but does this
roughly 10 times faster than DateTimeFormatter and 5 times
faster than the FastDateParser of commons-lang3.
diskBlock.force() makes insertion speed very slow, because it adds
two digit ms to tree changes. I disabled it for now. The tree is not
crash resistent anyway.
Replaced Tags.filenameBytes with a SortedSet<Tag>. Tags are now
stored as longs (variable length encoded) in the PersistenMap.
Tags.filenameBytes was introduced to reduce memory consumption, when
all tags were hold in memory. Tags are now stored in a PersistentMap
and only read when needed.
Moved the VariableByteEncoder into its own project, because it was
needed by pdb-api.