Date increments usually have higher values.
I had hoped to reduce the file size by a lot, but with my example
data (44 million real-life entries) the storage size only shrank
by 1.5%.
Also fixed a bug in PdbReader that prevented other values for the
CONTINUATION byte.
Also added a small testing tool that prints the content of a pdb file.
It is not (yet) available as a standalone tool, but it is very
useful during debugging sessions.
Before, we could only group by a single field. But it is actually
very useful to group by multiple fields, for example to see the
graph for a small set of methods grouped by host and project.
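
A minimal sketch of grouping by several fields at once, assuming entries
are plain tag maps. The class GroupingSketch, its methods, and the field
names host, project and method are hypothetical and only illustrate the
idea. The real grouping code is not shown here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch, not the project's actual grouping code.
final class GroupingSketch {

    // Helper that builds one entry as a tag map (assumed representation).
    static Map<String, String> entry(String host, String project, String method) {
        Map<String, String> tags = new HashMap<>();
        tags.put("host", host);
        tags.put("project", project);
        tags.put("method", method);
        return tags;
    }

    // Joins the values of all grouping fields, in order, into one group key.
    static String groupKey(Map<String, String> tags, List<String> groupBy) {
        return groupBy.stream()
                .map(field -> tags.getOrDefault(field, ""))
                .collect(Collectors.joining("/"));
    }

    // Counts entries per combination of the grouping fields.
    static Map<String, Long> countByFields(List<Map<String, String>> entries,
                                           List<String> groupBy) {
        return entries.stream()
                .collect(Collectors.groupingBy(
                        tags -> groupKey(tags, groupBy),
                        TreeMap::new,
                        Collectors.counting()));
    }
}
```

Grouping by a single field is then just the special case of a
one-element groupBy list.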
If no new entry is received for 10 seconds, all open
files are flushed and closed.
We do this to make sure that we do not lose data when
we kill the process.
There is still a risk of data loss if we kill the process
while entries are received.
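
The idle check can be sketched roughly as follows. IdleFlusher and its
methods are hypothetical names, and the time is passed in explicitly so
the logic can be tested without waiting. The real implementation may
look different.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not the project's actual class.
final class IdleFlusher {
    private final long idleMillis;
    private final Set<OutputStream> openFiles = new LinkedHashSet<>();
    private long lastEntryMillis;

    IdleFlusher(long idleMillis) {
        this.idleMillis = idleMillis;
    }

    // Records that an entry was written to the given file at nowMillis.
    synchronized void onEntry(OutputStream file, long nowMillis) {
        openFiles.add(file);
        lastEntryMillis = nowMillis;
    }

    // Flushes and closes all open files if no entry arrived for idleMillis.
    // Returns true if files were closed.
    synchronized boolean closeIfIdle(long nowMillis) {
        if (openFiles.isEmpty() || nowMillis - lastEntryMillis < idleMillis) {
            return false;
        }
        for (OutputStream file : openFiles) {
            try {
                file.flush();
                file.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        openFiles.clear();
        return true;
    }
}
```

closeIfIdle could be called periodically, for example from a
ScheduledExecutorService. Entries that are still buffered when the
process is killed in between are lost in this sketch as well.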
Store values in sequences of variable length. Instead of using 8 bytes
per entry we are now using between 2 and 20 bytes. But we are also able
to store every non-negative long value.
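
To illustrate the idea of variable-length values, here is a minimal
sketch of a common encoding (7 data bits per byte plus a continuation
bit). This is an assumption and not necessarily the pdb file format.
The class VarLong is hypothetical, and the byte counts of this sketch
do not have to match the numbers above.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of a variable-length encoding for non-negative longs.
final class VarLong {

    // Encodes the value in groups of 7 bits, lowest group first.
    // The high bit of a byte is set when more bytes follow.
    static byte[] encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("value must be non-negative");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (value >= 0x80) {
            out.write((int) (value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Decodes a single value from the start of the byte array.
    static long decode(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
        throw new IllegalArgumentException("truncated value");
    }
}
```

Small values, which should be the common case for increments, need
only one byte in this scheme, while large values need more than a fixed
8-byte encoding would.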
- The tags come first, then the date,
e.g. "mykey=myvalue_<uuid>/2016/01/01/<uuid>".
- We do this so that we don't have to tag each file,
but only the root folder. This should speed up searches.
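
The layout from the example can be sketched as below. StoragePaths and
relativePath are hypothetical names, and the way tags are turned into a
string is an assumption. Only the order of tags, date and ids is taken
from the example path.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.UUID;

// Hypothetical sketch of the folder layout, not the project's actual code.
final class StoragePaths {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy/MM/dd");

    // Builds "<tags>_<rootId>/<yyyy>/<MM>/<dd>/<fileId>".
    // Only the root folder name carries the tags.
    static String relativePath(String tags, UUID rootId, LocalDate date, UUID fileId) {
        return tags + "_" + rootId + "/" + DAY.format(date) + "/" + fileId;
    }
}
```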
Also read values with MappingIterator.
This made reading 20-30 times faster.
We can now read and index 100k-500k entries per second.
The variance might be due to LuDB slowness.