Commit Graph

108 Commits

Author SHA1 Message Date
5d2fdb4820 create Tags in StringCompressor instead of use StringCompressor in Tags 2021-08-25 20:00:31 +02:00
ceb8d5a7c8 validate that tokens are known 2021-08-14 12:40:21 +02:00
67c66ef89d add second parser that uses a standard CSV reader 2021-08-12 17:54:27 +02:00
85ed5f1ccb file drop support
- Add a folder where you can drop Zip files which will then be
  extracted on the fly and ingested.
- CsvReaderSettings now contains TagMatchers that are applied to the
  first line and can be used to extract additional tags.
- Update to jdk 16 so that we can have records.
2021-08-01 09:31:40 +02:00
bc520f97ed move DateIndexExtension to DataStore 2021-05-12 19:00:25 +02:00
7adfc7029f Revert "introduce indexes"
This reverts commit 36ccc57db6.
2021-05-12 18:18:57 +02:00
75857b553e do tag to string conversion in StringCompressor instead of Tag 2021-05-09 10:44:24 +02:00
6dc335600e do string compression in StringCompressor instead of Tag 2021-05-09 10:37:35 +02:00
36ccc57db6 introduce indexes 2021-05-09 10:33:28 +02:00
6dc0e3c250 performance improvement for queries with wildcards
Computing the union of many LongLists was inefficient, because we were
using a trivial algorithm. I replaced the algorithm with a multi-way
merge. The old algorithm had a runtime of O(n!*m) where n is the number
of lists and m the length of the longest list. The new algorithm has a
runtime of O(log(n) * n*m).
2020-11-15 13:02:15 +01:00
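
A minimal sketch of a multi-way merge for the union of sorted lists, to illustrate the idea in the commit above. It is not the project's LongList code: the class name MultiWayMergeSketch, the method union, and the use of plain long[] arrays with a PriorityQueue are assumptions made for illustration. Each of the at most n*m elements passes through a heap of at most n entries, which gives the O(log(n) * n*m) bound.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Hypothetical illustration; names and types are not taken from the repository.
final class MultiWayMergeSketch {

    /** Returns the sorted union (duplicates removed) of lists that are each sorted ascending. */
    static long[] union(long[]... sortedLists) {
        // Heap entries are {listIndex, positionInList}, ordered by the value they point at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                (a, b) -> Long.compare(sortedLists[a[0]][a[1]], sortedLists[b[0]][b[1]]));

        int total = 0;
        for (int i = 0; i < sortedLists.length; i++) {
            total += sortedLists[i].length;
            if (sortedLists[i].length > 0) {
                heap.add(new int[] {i, 0});
            }
        }

        long[] result = new long[total];
        int size = 0;
        while (!heap.isEmpty()) {
            int[] head = heap.poll();
            long value = sortedLists[head[0]][head[1]];
            // The heap yields values in ascending order, so duplicates are adjacent.
            if (size == 0 || result[size - 1] != value) {
                result[size++] = value;
            }
            // Advance the cursor of the list the value came from.
            if (head[1] + 1 < sortedLists[head[0]].length) {
                heap.add(new int[] {head[0], head[1] + 1});
            }
        }
        return Arrays.copyOf(result, size);
    }
}
```

Calling MultiWayMergeSketch.union with the lists {1, 3, 5}, {2, 3, 6} and an empty list returns {1, 2, 3, 5, 6}.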
46070a31b9 add block size to the header of a PersistentMap and optimize storage
usage for monotonically incrementing keys.
2020-10-17 10:13:46 +02:00
10155f9cdb use special enum for DateBucket units
Preparation step for having custom intervals.
2020-09-27 17:06:27 +02:00
6bb6cdaea7 count disk reads 2020-09-20 19:51:47 +02:00
50f555d23c add interval splitting for bar charts 2020-04-05 08:14:09 +02:00
75391f21ff extract code from DateIndexExtension to LongToDateBucket
Making it possible to reuse the code to sort timestamps
into date based buckets.
2020-04-03 19:46:08 +02:00
9a311313ec use US locale to format strings
This is especially important for all strings that are
passed to gnuplot. Because gnuplot uses the US locale
during parsing.
2020-03-12 19:40:20 +01:00
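
A small sketch of the locale issue described above. The class and method names are hypothetical and not taken from the repository. The point is only that String.format uses the default locale unless one is passed explicitly, so a number can end up with a decimal comma that a parser expecting US formatting will not accept.

```java
import java.util.Locale;

// Hypothetical illustration; not the project's formatting code.
final class UsLocaleFormatSketch {

    /** Formats a number with a '.' decimal separator, independent of the JVM default locale. */
    static String formatForGnuplot(double value) {
        return String.format(Locale.US, "%.3f", value);
    }

    /** Same format string with a German locale, shown only for contrast. */
    static String formatGerman(double value) {
        return String.format(Locale.GERMANY, "%.3f", value);
    }
}
```

formatForGnuplot(1234.5) yields "1234.500", while formatGerman(1234.5) yields "1234,500".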
ffc3832bfa fix: events are added to wrong partition
The writerCache in DataStore did not use the partitionId
in its cache key. Therefore the cache could return the
wrong writer and events were written to the wrong
partition.
Fixed by changing the cache key.
2019-12-23 18:42:54 +01:00
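
A sketch of the kind of fix described above, under stated assumptions: the class WriterCacheKeySketch, its nested Key, the String-typed tags, and the String stand-in for a writer are all hypothetical and do not reflect the actual DataStore code. It shows only why the partition id has to take part in equals and hashCode of the cache key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration; the real writerCache and its key type are not shown in the log.
final class WriterCacheKeySketch {

    /** Cache key that includes the partition id, so equal tags in different partitions do not collide. */
    static final class Key {
        private final String partitionId;
        private final String tags;

        Key(String partitionId, String tags) {
            this.partitionId = partitionId;
            this.tags = tags;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return partitionId.equals(other.partitionId) && tags.equals(other.tags);
        }

        @Override
        public int hashCode() {
            return Objects.hash(partitionId, tags);
        }
    }

    // A String stands in for the writer object.
    private final Map<Key, String> writers = new HashMap<>();

    String writerFor(String partitionId, String tags) {
        return writers.computeIfAbsent(new Key(partitionId, tags),
                k -> "writer[" + k.partitionId + "/" + k.tags + "]");
    }
}
```

With a key built from the tags alone, the second lookup for the same tags in another partition would return the writer of the first partition.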
5d8df6888d move Entry and Entries to data-store 2019-12-13 18:15:10 +01:00
07ad62ddd9 use Junit5 instead of TestNG
We want to be able to use @SpringBootTest tests that fully initialize
the Spring application. This is much easier done with Junit than TestNG.
Gradle does not support (at least not easily) running both Junit and
TestNG tests. Therefore we switch all tests to Junit.
The original reason for using TestNG was that Junit didn't support
data providers. But that finally changed in Junit5 with
ParameterizedTest.
2019-12-13 14:33:20 +01:00
06b379494f apply new code formatter and save action 2019-11-24 10:20:43 +01:00
7636781315 fix StringIndexOutOfBounds when caret is in position 0 2019-10-26 10:30:02 +02:00
8579974051 performance improvement
Queries like "firstname=John and lastname=???" were slightly
inefficient.
They fetched all firstnames, filtered to those that matched the prefix
(e.g. John or Jonathan in this example) and then iterated over all those
values and returned the lastnames.
Fixed by having two implementations. One for the case that only a few
of the values in fieldA match and one for the case that many match.
2019-08-31 19:30:54 +02:00
d8a114dbaf handle globlike patterns in in-expressions 2019-08-31 17:34:17 +02:00
f8e859fb6d cleanup and javadoc 2019-08-31 16:52:13 +02:00
0eee012798 allow 'not' for negation in addition to '!' 2019-08-31 08:30:13 +02:00
4161cd7f98 only field prefixes returned instead of full values
When using autocomplete to return field values I
missed that autocomplete had the feature that cut
values at dots. So instead of returning full field
values only the prefix up to the first dot was
returned.
Fixed by making the cut-at-dot feature optional.
2019-08-27 20:37:07 +02:00
2f35978184 fetch available values for gallery via autocomplete method
We had a method that returned the values of a field
with respect to a query. That method was inefficient,
because it executed the query, fetched all Docs
and collected the values.
The autocomplete method we introduced a while back
can answer the same question but much more efficiently.
2019-08-25 18:52:05 +02:00
6eaf4e10fc add maxSize parameter to HotEntryCache 2019-08-24 19:24:20 +02:00
feda901f6d remove event types
We only have removal events. The additional complexity
of having a generic interface for many different event
types does not pay off.
2019-08-18 20:30:25 +02:00
4d9ea6d2a8 switch back to my own HotEntryCache implementation
Guava's cache does not evict elements reliably by
time. If you configure a cache with a lifetime of n
seconds, you cannot expect that an element is
actually evicted after n seconds with Guava.
2019-08-18 20:14:14 +02:00
3252fcf42d improve trace logging
- Add filename for trace logs for read/write operations.
2019-08-18 09:25:49 +02:00
0b3eb97b96 Fix to string for maps with values of type Empty
The MAX_KEY inserted into the tree had a value of one byte. This
triggered an assertion for maps with values of type Empty, because they
expected values to be empty.
Fixed by using an empty array for the value of the MAX_KEY.
2019-08-12 08:35:40 +02:00
9fb1a136c8 cache last used date prefix
The 99.9999% use case is to ingest data
from the same month.
2019-04-22 09:51:44 +02:00
56085061ed do not return anything if the field/value does not exist
The computation of proposals is done by searching for values in a
combined index. If one of the values didn't exist, then the algorithm
returned all values. Fixed by checking that we query only existing
field/values from the combined index.
2019-04-20 19:48:51 +02:00
dbe0e02517 rename cluster to partition
We are not clustering the indices, we
are partitioning them.
2019-04-14 10:10:16 +02:00
2a1885a77f cluster the indices 2019-03-31 09:01:55 +02:00
95f2f26966 handle IOExceptions earlier 2019-03-17 11:13:46 +01:00
5d0ceb112e add clustering for DiskStore 2019-03-17 10:53:02 +01:00
b5e2d0a217 introduce clustering for query completion indices 2019-03-16 10:19:28 +01:00
fb9f8592ac make ClusteredPersistentMap easier to use 2019-02-24 19:20:44 +01:00
59aea1a15f introduce index clustering (part 1)
In order to prevent files from getting too big and
make it easier to implement retention policies, we
are splitting all files into chunks. Each chunk
contains the data for a time interval (1 month by
default).
This first changeset introduces the ClusteredPersistentMap
that implements this for PersistentMap. It is used
for a couple (not all) of indices.
2019-02-24 16:50:57 +01:00
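
A sketch of how a timestamp could be mapped to a monthly chunk, as described above. Everything here is an assumption for illustration: the class name, the yyyyMM id format, and the use of UTC are not taken from ClusteredPersistentMap.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration; the id format and time zone are assumptions.
final class MonthPartitionSketch {

    private static final DateTimeFormatter MONTH =
            DateTimeFormatter.ofPattern("uuuuMM").withZone(ZoneOffset.UTC);

    /** Returns the id of the monthly chunk that an event with this timestamp belongs to. */
    static String partitionIdFor(long epochMilli) {
        return MONTH.format(Instant.ofEpochMilli(epochMilli));
    }
}
```

All events of one calendar month map to the same id, so a retention policy can drop a whole chunk at once.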
372a073b6d PdbWriter is no longer in the API of DataStore 2019-02-16 16:24:14 +01:00
92a47d9b56 remove TagsToFile
Remove one layer of abstraction by moving the code into the DataStore.
2019-02-16 16:06:46 +01:00
117ef4ea34 use guava's cache as implementation for the HotEntryCache
My own implementation was faster, but was not able to
implement a size limitation.
2019-02-16 10:23:52 +01:00
7b00eede86 refactoring: extract EncoderDecoders from DataStore 2019-02-16 09:16:15 +01:00
cbcb7714bb split BSFile into a TimeSeries and a LongStream file
BSFile was used to store two types of data. This makes
the API complex. I split the API into two files with
simpler and clearer APIs. Interestingly, the API of
BSFile is still rather complex and has to consider both
use cases.
2019-02-10 09:59:16 +01:00
27b83234cc group proposal as if they were hierarchical
We interpret dots ('.') as hierarchy delimiters.
That way we can reduce the number of proposed values
and show only those for the next level.
2019-02-09 15:21:35 +01:00
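
A sketch of grouping proposals by hierarchy level, to illustrate the commit above. The class name, the method nextLevel, and the prefix parameter are hypothetical; the actual proposal code is not shown in the log.

```java
import java.util.Collection;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration; not the project's proposal implementation.
final class HierarchicalProposalSketch {

    /**
     * Returns the distinct values that start with the prefix, each cut at the
     * first dot after the prefix, so that only the next level is proposed.
     */
    static SortedSet<String> nextLevel(Collection<String> values, String prefix) {
        SortedSet<String> result = new TreeSet<>();
        for (String value : values) {
            if (!value.startsWith(prefix)) {
                continue;
            }
            int dot = value.indexOf('.', prefix.length());
            result.add(dot < 0 ? value : value.substring(0, dot));
        }
        return result;
    }
}
```

For the values com.example.Foo, com.example.Bar, com.other.Baz and org.X, an empty prefix proposes com and org, and the prefix "com." proposes com.example and com.other.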
493971bcf3 values used in queries were added to the keys.csv
Due to a mistake in Tag, which added all strings used
by Tag to the String dictionary, the dictionary
contained all values that were used in queries.
2019-02-09 08:28:23 +01:00
ea5884a5e6 move creation of PdbWriter to the DataStore 2019-02-07 18:06:41 +01:00
99cdf557b3 add metric logger for query completion evaluation 2019-02-06 15:51:41 +00:00