784 Commits

Author SHA1 Message Date
1f144846db update 3rd party libs and gradle 2019-04-28 14:40:43 +02:00
9fb1a136c8 cache last used date prefix
The 99.9999% use case is to ingest data
from the same month.
2019-04-22 09:51:44 +02:00
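A minimal sketch of the caching idea in 9fb1a136c8, assuming the date prefix identifies a UTC calendar month. The class name `DatePrefixCache`, the `uuuuMM` prefix format and the `computations` counter are hypothetical and only serve the illustration; they are not taken from the repository.

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration, not the project's code. Not thread-safe.
final class DatePrefixCache {
    // Assumed prefix format: year and month, e.g. "201904".
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("uuuuMM");

    // Bounds of the month the cached prefix is valid for.
    // Initialised to an empty range so that the first call always computes.
    private long startInclusive = 1;
    private long endExclusive = 0;
    private String prefix;

    // Counts how often the prefix had to be recomputed (illustration only).
    int computations;

    String prefixFor(final long epochMilli) {
        if (epochMilli >= startInclusive && epochMilli < endExclusive) {
            return prefix; // fast path: same month as the previous call
        }
        final YearMonth month = YearMonth.from(Instant.ofEpochMilli(epochMilli).atOffset(ZoneOffset.UTC));
        startInclusive = month.atDay(1).atStartOfDay().toInstant(ZoneOffset.UTC).toEpochMilli();
        endExclusive = month.plusMonths(1).atDay(1).atStartOfDay().toInstant(ZoneOffset.UTC).toEpochMilli();
        prefix = FORMAT.format(month);
        computations++;
        return prefix;
    }
}
```

When consecutive entries fall into the same month, only the two comparisons on the fast path are executed per entry.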
dfe9579726 use DateTimeRange.max() instead of arbitrary relative range 2019-04-20 20:36:26 +02:00
d82b33c60e update js libraries 2019-04-20 20:31:51 +02:00
7277670b8b update 3rd party libs 2019-04-20 20:19:24 +02:00
9525ee22a0 add access restrictions for a few unwelcome classes 2019-04-20 20:12:45 +02:00
56085061ed do not return anything if the field/value does not exist
The computation of proposals is done by searching for values in a
combined index. If one of the values didn't exist, then the algorithm
returned all values. Fixed by checking that we query only existing
field/values from the combined index.
2019-04-20 19:48:51 +02:00
dbe0e02517 rename cluster to partition
We are not clustering the indices, we
are partitioning them.
2019-04-14 10:10:16 +02:00
2a1885a77f cluster the indices 2019-03-31 09:01:55 +02:00
95f2f26966 handle IOExceptions earlier 2019-03-17 11:13:46 +01:00
5d0ceb112e add clustering for DiskStore 2019-03-17 10:53:02 +01:00
b5e2d0a217 introduce clustering for query completion indices 2019-03-16 10:19:28 +01:00
fb9f8592ac make ClusteredPersistentMap easier to use 2019-02-24 19:20:44 +01:00
59aea1a15f introduce index clustering (part 1)
In order to prevent files from getting too big and
make it easier to implement retention policies, we
are splitting all files into chunks. Each chunk
contains the data for a time interval (1 month by
default).
This first changeset introduces the ClusteredPersistentMap
that implements this for PersistentMap. It is used
for a couple (not all) of indices.
2019-02-24 16:50:57 +01:00
372a073b6d PdbWriter is no longer in the API of DataStore 2019-02-16 16:24:14 +01:00
92a47d9b56 remove TagsToFile
Remove one layer of abstraction by moving the code into the DataStore.
2019-02-16 16:06:46 +01:00
117ef4ea34 use guava's cache as implementation for the HotEntryCache
My own implementation was faster, but I was not able to
implement a size limitation for it.
2019-02-16 10:23:52 +01:00
7b00eede86 refactoring: extract EncoderDecoders from DataStore 2019-02-16 09:16:15 +01:00
cbcb7714bb split BSFile into a TimeSeries and a LongStream file
BSFile was used to store two types of data. This makes
the API complex. I split the API into two files with
simpler and clearer APIs. Interestingly, the API of
BSFile is still rather complex and has to consider both
use cases.
2019-02-10 09:59:16 +01:00
fd55ea0866 update vuejs to 2.6.4
Added the version to moment.min.js
2019-02-09 15:39:28 +01:00
93dea402a5 remove obsolete class 2019-02-09 15:25:39 +01:00
27b83234cc group proposals as if they were hierarchical
We interpret dots ('.') as hierarchy delimiters.
That way we can reduce the number of proposed values
and show only those for the next level.
2019-02-09 15:21:35 +01:00
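One way to picture the grouping in 27b83234cc, as a sketch under assumptions: given all candidate values and the prefix typed so far, propose only the segment up to the next dot. The class `HierarchicalProposals`, the method `nextLevel` and the choice to keep the trailing dot on intermediate levels are hypothetical.

```java
import java.util.Collection;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration, not the project's code.
final class HierarchicalProposals {

    // Returns the distinct proposals for the hierarchy level that follows the prefix.
    static SortedSet<String> nextLevel(final Collection<String> values, final String prefix) {
        final SortedSet<String> result = new TreeSet<>();
        for (final String value : values) {
            if (!value.startsWith(prefix)) {
                continue;
            }
            // Find the next delimiter after the part that was already typed.
            final int dot = value.indexOf('.', prefix.length());
            // No further dot: the value is a leaf and is proposed in full.
            // Otherwise cut after the dot, so that many values collapse into one proposal.
            result.add(dot < 0 ? value : value.substring(0, dot + 1));
        }
        return result;
    }
}
```

For the values `com.example.Foo`, `com.example.Bar`, `com.other.Baz` and `org.Qux`, an empty prefix yields only `com.` and `org.` instead of all four values.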
493971bcf3 values used in queries were added to the keys.csv
Due to a mistake in Tag which added all strings used
by Tag into the String dictionary, the dictionary
did contain all values that were used in queries.
2019-02-09 08:28:23 +01:00
ea5884a5e6 move creation of PdbWriter to the DataStore 2019-02-07 18:06:41 +01:00
58bfba23bb reset lastEpochMilli when opening a new export file 2019-02-06 15:52:37 +00:00
99cdf557b3 add metric logger for query completion evaluation 2019-02-06 15:51:41 +00:00
668d73c926 introduced a new custom file format used for backup and ingestion
The new file format reduces repetition, is easy to parse,
easy to generate in any language and is human readable.
2019-02-03 15:44:35 +01:00
1d8ca0e21c fetch org.lucares artifacts only from repo.lucares.de 2019-02-02 17:51:20 +01:00
c0fffbf676 update third party libs
gradle to 5.1.1
spring-boot to 2.1.2.RELEASE
antlr to 4.7.2
jackson to 2.9.8
2019-02-02 17:33:21 +01:00
2e48061793 add LRU cache to PersistentMap
This should speed up fetching and inserting of values
that are used often.
2019-02-02 17:26:25 +01:00
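The commit does not say how the LRU cache in 2e48061793 is implemented. A common way to get LRU behaviour in Java, shown here purely as an illustration, is a `LinkedHashMap` in access order with `removeEldestEntry` overridden. The class name `LruCache` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration, not the project's code. Not thread-safe.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;

    private final int maxEntries;

    LruCache(final int maxEntries) {
        // accessOrder = true: iteration order goes from least to most recently accessed.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(final Map.Entry<K, V> eldest) {
        // Called after every insertion; evicts the least recently used entry when full.
        return size() > maxEntries;
    }
}
```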
d4d1685f9f replace stdout with logger 2019-02-02 16:49:21 +01:00
151e9363e1 remove obsolete classes 2019-02-02 16:45:34 +01:00
76e5d441de rewrite query completion
The old implementation searched for all possible values and then
executed each query to see what matches.
The new implementation uses several indices to find only
the matching values.
2019-02-02 15:35:56 +01:00
72e9a9ebe3 prepare more efficient query completion
Adding an index that answers the question:
given a query "a=b and c=", what are the possible values
for c?
2019-01-13 10:22:17 +01:00
5197063ae3 the union of many small lists is expensive
The reason seems to be the number of memory allocations. In order
to create the union of 100 lists we have 99 memory allocations.
The first needs the space for the first two lists, the second the
space for the first three lists, and so on.

We can reduce the number of allocations drastically (in many
cases to one) by leveraging the fact that many of the lists
were already sorted, non-overlapping and increasing, so that
we can simply concatenate them.
2019-01-05 08:52:56 +01:00
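The optimisation described in 5197063ae3 can be sketched as follows, assuming the lists are arrays of long values. The class `SortedListUnion` and the sort-and-deduplicate fallback are hypothetical; the repository may handle the general case differently.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not the project's code.
final class SortedListUnion {

    // True if the lists, read one after another, form a single strictly increasing sequence.
    static boolean canConcatenate(final List<long[]> lists) {
        boolean first = true;
        long previous = 0;
        for (final long[] list : lists) {
            for (final long value : list) {
                if (!first && value <= previous) {
                    return false;
                }
                previous = value;
                first = false;
            }
        }
        return true;
    }

    static long[] union(final List<long[]> lists) {
        int total = 0;
        for (final long[] list : lists) {
            total += list.length;
        }
        // One allocation for the result instead of one per pairwise union.
        final long[] all = new long[total];
        int offset = 0;
        for (final long[] list : lists) {
            System.arraycopy(list, 0, all, offset, list.length);
            offset += list.length;
        }
        if (canConcatenate(lists)) {
            return all; // the concatenation already is the sorted union
        }
        // Fallback for overlapping or unsorted input: sort and drop duplicates.
        Arrays.sort(all);
        return Arrays.stream(all).distinct().toArray();
    }
}
```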
3dca7483de utility that generates a csv with many different tags 2019-01-05 08:33:57 +01:00
f2d16b6758 make CacheKey comparable
The CacheKey is used as a key in a HashMap. Lookup can
be faster if the CacheKey is comparable when there are
hash collisions.
In this case I was not able to measure any effect. I am
keeping the comparables nonetheless, because they can
only have a positive effect.
2019-01-01 08:47:48 +01:00
4cde10a9f2 read csv using input stream instead of reader
We are now reading the CSV input without transforming
the data into strings. This reduces the amount of bytes
that have to be converted and copied.
We also made Tag smaller. It no longer stores pointers
to strings, instead it stores integers obtained by
compressing the strings (see StringCompressor). This
reduces memory usage and it speeds up hashcode and
equals, which speeds up access to the writer cache.

Performance gain is almost 100%:
- 330k entries/s -> 670k entries/s, top speed measured over a second
- 62s -> 32s, to ingest 16 million entries
2019-01-01 08:31:28 +01:00
0487c30582 use List instead of TreeMap for intToString mapping
UniqueStringIntegerPairs stores mappings of integers
0-n to strings and vice versa. Mapping integers to
strings does not need a TreeMap, it can be done with
a List.
Makes insertions 3 times faster (when using the in-memory
variant that does not write to disk) and 7 times faster
for int to string mapping.
2018-12-22 10:07:19 +01:00
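Because the integers are assigned densely from 0 to n, the integer can serve directly as a list index. A sketch of that idea from 0487c30582, with the hypothetical class name `StringIntPairs` and a `HashMap` assumed for the string-to-int direction:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not the project's UniqueStringIntegerPairs.
final class StringIntPairs {
    private final Map<String, Integer> stringToInt = new HashMap<>();
    // The position in the list is the integer, so no sorted map is needed.
    private final List<String> intToString = new ArrayList<>();

    // Returns the integer for the string, assigning the next free one if it is new.
    int getOrAdd(final String value) {
        final Integer existing = stringToInt.get(value);
        if (existing != null) {
            return existing;
        }
        final int id = intToString.size();
        intToString.add(value); // amortised O(1) append
        stringToInt.put(value, id);
        return id;
    }

    // O(1) lookup by index instead of O(log n) in a TreeMap.
    String get(final int id) {
        return intToString.get(id);
    }
}
```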
e537e94d39 HotEntryCache will update Instants only once per second
Calling Instant.now() several hundred thousand times per
second can be expensive. In my measurements >10% of the
time spent when loading new data was spent calling
Instant.now().
Fixed this by storing an Instant as static member and
updating it periodically in a separate thread.
2018-12-21 19:16:55 +01:00
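A sketch of the approach in e537e94d39, assuming a scheduled executor with a daemon thread does the periodic update. The class name `CoarseClock` and the one-second interval wiring are hypothetical; the commit only states that a static Instant is refreshed in a separate thread.

```java
import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration, not the project's HotEntryCache code.
final class CoarseClock {
    // volatile, so that readers see the value written by the updater thread.
    private static volatile Instant now = Instant.now();

    private static final ScheduledExecutorService UPDATER =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                final Thread thread = new Thread(runnable, "coarse-clock-updater");
                thread.setDaemon(true); // do not keep the JVM alive
                return thread;
            });

    static {
        // Refresh the cached Instant once per second.
        UPDATER.scheduleAtFixedRate(() -> now = Instant.now(), 1, 1, TimeUnit.SECONDS);
    }

    private CoarseClock() {
    }

    // Cheap read of a timestamp that is at most about one second old.
    static Instant now() {
        return now;
    }
}
```

The trade-off is precision: callers get a timestamp with roughly one-second resolution, which is acceptable only where such coarse values suffice.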
d95a71e32e batch entries between TcpIngestor and PerformanceDB
One bottleneck was the blocking queue used to transport entries
from the listener thread to the ingestor thread.
Reduced the bottleneck by batching entries.
Interestingly, a batch size of 100 performed better than
batch sizes of both 1000 and 10.
2018-12-21 13:11:35 +01:00
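Batching as in d95a71e32e reduces the number of queue operations, and therefore the synchronisation cost, by a factor equal to the batch size. A sketch with the hypothetical class `EntryBatcher`; the entry type and the way the queue is wired between TcpIngestor and PerformanceDB are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration, not the project's code. Used by a single producer thread.
final class EntryBatcher<T> {
    private final BlockingQueue<List<T>> queue;
    private final int batchSize;
    private List<T> current;

    EntryBatcher(final BlockingQueue<List<T>> queue, final int batchSize) {
        this.queue = queue;
        this.batchSize = batchSize;
        this.current = new ArrayList<>(batchSize);
    }

    // Collects entries locally and hands over a full batch with a single queue operation.
    void add(final T entry) {
        current.add(entry);
        if (current.size() >= batchSize) {
            flush();
        }
    }

    // Hands over a partially filled batch, e.g. when the connection is closed.
    void flush() {
        if (current.isEmpty()) {
            return;
        }
        try {
            queue.put(current); // blocks while the queue is full
        } catch (final InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while handing over a batch", e);
        }
        current = new ArrayList<>(batchSize);
    }
}
```

The consumer side takes one list per `take()` call and iterates over it, so 250 entries with a batch size of 100 cost three queue operations instead of 250.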
73ad27ab96 remove lastAccessMap
In the last commit I added a lastAccessMap to the HotEntryCache.
This map made it much more efficient to evict entries. But it
also made put and get operations much more expensive. Overall
that change led to a 65% decrease in ingestion performance of
the PerformanceDB.
Fixed by removing the map again. Eviction has to look at all
elements again.
2018-12-21 10:28:34 +01:00
afba3b6f77 elements not evicted if new elements are added 2018-12-20 16:13:55 +01:00
d52bfa0916 remove obsolete class RadixConverter 2018-12-17 19:11:33 +01:00
3a4101bbf9 increase the buffer between ingestion and insertion thread
I was finally able to show that there is a tiny but measurable
effect of this buffer. I think it was not visible before,
because the parsing was too slow. But now that I have replaced the
date parser, the ingestion thread is twice as fast as the
insertion thread. Therefore the buffer makes more sense.
2018-12-17 19:07:55 +01:00
d37508b7a1 Pattern.split is faster than StringUtils.splitPreserveAll
Document the fact, so that I do not have to repeat the same
test a third time.
2018-12-17 19:05:34 +01:00
40f4506e13 use FastISODateParser.parseAsEpochMilli
Compared to FastISODateParser.parse, which returns an
OffsetDateTime object, parseAsEpochMilli returns the
epoch time millis. The performance improvement for
date parsing alone is roughly 100% (8m dates/s to
18m dates/s).
Insertion speed improved from 13-14s for 1.6m entries
to 11.5-12.5s.
2018-12-16 19:24:47 +01:00
23f800a441 add date parsing method that returns epochMillis instead of date object 2018-12-16 15:38:26 +01:00
20c555c30a update 3rd party libs and gradle 2018-12-07 14:06:59 +01:00
253bbabd19 cleanup
remove debug output
2018-11-25 07:49:23 +00:00