Replaced Tags.filenameBytes with a SortedSet<Tag>. Tags are now
stored as longs (variable length encoded) in the PersistenMap.
Tags.filenameBytes was introduced to reduce memory consumption, when
all tags were hold in memory. Tags are now stored in a PersistentMap
and only read when needed.
Moved the VariableByteEncoder into its own project, because it was
needed by pdb-api.
Replaces the use of in-memory data structures with the PersistentMap.
This is the crucial step in reducing memory usage for both persistent
storage and main memory.
- The DiskStorage uses only one file instead of millions.
Also the block size is only 512 byte instead of 4kb, which
helps to reduce the memory usage for short sequences.
- Update primitiveCollections to get the new LongList.range
and LongList.rangeClosed methods.
- BSFile now stores Time&Value sequences and knows how to
encode the time values with delta encoding.
- Doc had to do some magic tricks to save memory. The path
was initialized lazy and stored as byte array. This is no
longer necessary. The patch was replaced by the
rootBlockNumber of the BSFile.
- Had to temporarily disable the 'in' queries.
- The stored values are now processed as stream of LongLists
instead of Entry. The overhead for creating Entries is
gone, so is the memory overhead, because Entry was an
object and had a reference to the tags, which is
unnecessary.
The last improvement of memory usage introduced a performance
regression. The ingestion performance dropped by 50%-80%, because
for every inserted entry the Tags were created inefficient.
Before we could only group by a single field. But it is acutally
very useful to group by multiple fields. For example to see the
graph for a small set of methods grouped by host and project.
Store values in sequences of variable length. Instead of using 8 bytes
per entry we are now using between 2 and 20 bytes. But we are also able
to store every non-negative long value.