The MAX_KEY inserted into the tree had a value of one byte. This
triggered an assertion for maps with values of type Empty, because they
expected values to be empty.
Fixed by using an empty array for the value of the MAX_KEY.
BSFile was used to store two types of data. This makes
the API complex. I split the API into two files with
easier and more clear APIs. Interestingly the API of
BSFile is still rather complex and has to consider both
use cases.
The old implementation searched for all possible values and then
executed each query to see what matches.
The new implementation uses several indices to find only
the matching values.
A specialized date parser that can only handle ISO-8601 like dates
(2011-12-03T10:15:30.123Z or 2011-12-03T10:15:30+01:00) but does this
roughly 10 times faster than DateTimeFormatter and 5 times
faster than the FastDateParser of commons-lang3.
diskBlock.force() makes insertion speed very slow, because it adds
two digit ms to tree changes. I disabled it for now. The tree is not
crash resistent anyway.
Replaced Tags.filenameBytes with a SortedSet<Tag>. Tags are now
stored as longs (variable length encoded) in the PersistenMap.
Tags.filenameBytes was introduced to reduce memory consumption, when
all tags were hold in memory. Tags are now stored in a PersistentMap
and only read when needed.
Moved the VariableByteEncoder into its own project, because it was
needed by pdb-api.
Replaces the use of in-memory data structures with the PersistentMap.
This is the crucial step in reducing memory usage for both persistent
storage and main memory.
Before the offset of the root node was hard-coded.
Now the offset of the pointer to the root node is hard-coded.
That allows us to replace the root node.
We now support negative values. This will allow us to
store time/value sequences that are not monotonically
increasing, so that we do not have to create multiple
files just because some values were send out of order.
This is done by first transforming the values into
positive values by using interleaved encoding (there
is a name for it, but I don't remember it). We are
mapping values like this:
0 -> 1
1 -> 2
-1 -> 3
2 -> 4
-2 -> 5
...
Renamed LongSequenceEncoderDecoder to VariableByteEncoder.
Made methods static.
- The DiskStorage uses only one file instead of millions.
Also the block size is only 512 byte instead of 4kb, which
helps to reduce the memory usage for short sequences.
- Update primitiveCollections to get the new LongList.range
and LongList.rangeClosed methods.
- BSFile now stores Time&Value sequences and knows how to
encode the time values with delta encoding.
- Doc had to do some magic tricks to save memory. The path
was initialized lazy and stored as byte array. This is no
longer necessary. The patch was replaced by the
rootBlockNumber of the BSFile.
- Had to temporarily disable the 'in' queries.
- The stored values are now processed as stream of LongLists
instead of Entry. The overhead for creating Entries is
gone, so is the memory overhead, because Entry was an
object and had a reference to the tags, which is
unnecessary.
It can store multiple streams of integers in a single
file. It uses blocks of 512 byte, which is only 1/8th
of the block size the file based data-store uses. This
reduces the overhead and waste of memory for short
integer streams significantly. Storing data in one big
file, instead of many small files, makes backups much
more efficient.