69 Commits

Author SHA1 Message Date
bff446f0e3 rename getKey to getString 2021-10-16 19:59:30 +02:00
1461d32691 rename get to getInt 2021-10-16 19:59:10 +02:00
2ea11e6adb rename get to getString 2021-10-16 19:57:34 +02:00
491e88463a reduce visibility for getHighestInteger(), only needed for tests 2021-10-16 18:57:23 +02:00
d63a203619 use easier api 2021-10-16 18:56:51 +02:00
fe26068400 remove unnecessary method 2021-10-16 18:55:27 +02:00
7d3ae61656 rename put to putString to make it easier for ides to find references 2021-10-16 18:45:55 +02:00
d7cd4a94a5 rename put method to help IDEs find references 2021-10-16 18:33:20 +02:00
7754e54037 replace keys.csv with persistent map 2021-10-16 17:28:15 +02:00
3d72231415 move filename for keys.csv to UniqueStringIntegerPairs 2021-10-16 15:33:04 +02:00
3e1002a99d remove static string compressor from QueryCompletionIndex 2021-09-18 19:24:30 +02:00
edaba72c33 remove method to create a query based on tags
was only used in tests
2021-09-18 18:48:33 +02:00
6a3d711838 move StringCompressor initialization to StringCompressor 2021-09-18 15:39:54 +02:00
6cd6f578e7 move Tags.asString() to StringCompressor 2021-09-18 10:18:42 +02:00
54acb38e5e move Tags.asValueString to StringCompressor 2021-09-18 10:14:52 +02:00
5d2fdb4820 create Tags in StringCompressor instead of use StringCompressor in Tags 2021-08-25 20:00:31 +02:00
b01567c82e move usages of StringCompressor to outside of TagsBuilder 2021-08-25 19:45:49 +02:00
67c66ef89d add second parser that uses a standard CSV reader 2021-08-12 17:54:27 +02:00
85ed5f1ccb file drop support
- Add a folder where you can drop Zip files, which will then be
  extracted on the fly and ingested.
- CsvReaderSettings now contain TagMatcher that are applied to the
  first line and can be used to extract additional tags.
- Update to jdk 16 so that we can have records.
2021-08-01 09:31:40 +02:00
7adfc7029f Revert "introduce indexes"
This reverts commit 36ccc57db6.
2021-05-12 18:18:57 +02:00
75857b553e do tag to string conversion in StringCompressor instead of Tag 2021-05-09 10:44:24 +02:00
6dc335600e do string compression in StringCompressor instead of Tag 2021-05-09 10:37:35 +02:00
36ccc57db6 introduce indexes 2021-05-09 10:33:28 +02:00
cf7e5ec968 add bar charts 2020-01-19 10:35:07 +01:00
00ba4d2a69 add support for renaming and post processing of csv columns 2019-12-14 18:11:59 +01:00
4e554bfa85 specify additional tags for CSV upload
You can now specify additional tags to be added to all entries.
This makes it possible to remove columns that would be identical
for all entries.
2019-12-14 07:59:22 +01:00
5d8df6888d move Entry and Entries to data-store 2019-12-13 18:15:10 +01:00
550d7ba44e add flag to make CSV upload wait until entries are flushed
To make it easier/possible to write stable unit tests, the CSV upload
can optionally wait until all entries have been flushed to disk.
This is necessary for tests that ingest data and then read the data.
2019-12-13 18:05:20 +01:00
07ad62ddd9 use JUnit5 instead of TestNG
We want to be able to use @SpringBootTest tests that fully initialize
the Spring application. This is much easier with JUnit than with TestNG.
Gradle does not support running JUnit and TestNG tests together (at
least not easily). Therefore we switch all tests to JUnit.
The original reason for using TestNG was that JUnit didn't support
data providers. But that finally changed in JUnit5 with
ParameterizedTest.
2019-12-13 14:33:20 +01:00
ffe5ae8652 add CsvReaderSettings
Preparation to add more complex CSV parsing rules.
2019-11-30 18:32:34 +01:00
06b379494f apply new code formatter and save action 2019-11-24 10:20:43 +01:00
f8e859fb6d cleanup and javadoc 2019-08-31 16:52:13 +02:00
4161cd7f98 only field prefixes returned instead of full values
When using autocomplete to return field values, I
missed that autocomplete has a feature that cuts
values at dots. So instead of the full field
values, only the prefix up to the first dot was
returned.
Fixed by making the cut-at-dot feature optional.
2019-08-27 20:37:07 +02:00
2a1885a77f cluster the indices 2019-03-31 09:01:55 +02:00
b5e2d0a217 introduce clustering for query completion indices 2019-03-16 10:19:28 +01:00
59aea1a15f introduce index clustering (part 1)
In order to prevent files from getting too big and
make it easier to implement retention policies, we
are splitting all files into chunks. Each chunk
contains the data for a time interval (1 month by
default).
This first changeset introduces the ClusteredPersistentMap
that implements this for PersistentMap. It is used
for a couple (not all) of indices.
2019-02-24 16:50:57 +01:00
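
The chunking described in 59aea1a15f can be illustrated with a minimal sketch. It assumes that a chunk is identified by the UTC year and month of an entry's timestamp. The class name ClusterIdSketch, the method clusterIdFor, and the "yyyyMM" id format are hypothetical and are not taken from ClusteredPersistentMap.

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.util.Locale;

// Hypothetical sketch: maps a timestamp to the id of its one-month chunk.
// The id format "yyyyMM" and the use of UTC are assumptions.
final class ClusterIdSketch {

    static String clusterIdFor(long epochMilli) {
        // Reduce the timestamp to its year and month in UTC.
        YearMonth month = YearMonth.from(
                Instant.ofEpochMilli(epochMilli).atOffset(ZoneOffset.UTC));
        // All entries of the same month end up in the same chunk.
        return String.format(Locale.ROOT, "%04d%02d",
                month.getYear(), month.getMonthValue());
    }
}
```

With ids like these, a retention policy could delete whole chunks by id instead of rewriting one large file.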
493971bcf3 values used in queries were added to the keys.csv
Due to a mistake in Tag, which added all strings used
by Tag to the string dictionary, the dictionary
contained all values that were used in queries.
2019-02-09 08:28:23 +01:00
668d73c926 introduced a new custom file format used for backup and ingestion
The new file format reduces repetition, is easy to parse,
easy to generate in any language and is human readable.
2019-02-03 15:44:35 +01:00
76e5d441de rewrite query completion
The old implementation searched for all possible values and then
executed each query to see what matches.
The new implementation uses several indices to find only
the matching values.
2019-02-02 15:35:56 +01:00
72e9a9ebe3 prepare more efficient query completion
Adds an index that answers the question:
given a query "a=b and c=", what are the possible values
for c?
2019-01-13 10:22:17 +01:00
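
The question from 72e9a9ebe3 can be illustrated with a minimal in-memory sketch. It assumes the index maps each "field=value" pair to the values of every other field seen in the same entry. CompletionIndexSketch and its methods are hypothetical. The real QueryCompletionIndex is persistent and is likely structured differently.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch, not the project's QueryCompletionIndex.
final class CompletionIndexSketch {

    // "a=b" -> "c" -> all values of c that occurred together with a=b
    private final Map<String, Map<String, SortedSet<String>>> index = new HashMap<>();

    // Registers the tags of one entry, e.g. {host=h1, pod=p1}.
    void add(Map<String, String> tags) {
        for (Map.Entry<String, String> known : tags.entrySet()) {
            String key = known.getKey() + "=" + known.getValue();
            for (Map.Entry<String, String> other : tags.entrySet()) {
                if (other.getKey().equals(known.getKey())) {
                    continue; // a field does not complete itself
                }
                index.computeIfAbsent(key, k -> new HashMap<>())
                        .computeIfAbsent(other.getKey(), k -> new TreeSet<>())
                        .add(other.getValue());
            }
        }
    }

    // Answers: given "field=value and fieldToComplete=", which values are possible?
    SortedSet<String> complete(String field, String value, String fieldToComplete) {
        return index.getOrDefault(field + "=" + value, Map.of())
                .getOrDefault(fieldToComplete, Collections.emptySortedSet());
    }
}
```

A lookup is then a pair of map accesses, with no query execution per candidate value.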
f2d16b6758 make CacheKey comparable
The CacheKey is used as a key in a HashMap. Lookup can
be faster if the CacheKey is comparable when there are
hash collisions.
In this case I was not able to measure any effect. I am
keeping the comparables nonetheless, because they can
only have a positive effect.
2019-01-01 08:47:48 +01:00
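
Background for f2d16b6758: since Java 8, java.util.HashMap turns buckets with many colliding keys into balanced trees and orders them with compareTo when the keys implement Comparable. A minimal sketch of such a key follows. The class CacheKeySketch and its fields name and bucket are hypothetical and are not taken from the real CacheKey.

```java
import java.util.Comparator;
import java.util.Objects;

// Hypothetical sketch of a HashMap key that is also Comparable.
final class CacheKeySketch implements Comparable<CacheKeySketch> {

    // The ordering uses the same fields as equals/hashCode, so
    // compareTo returns 0 exactly when equals returns true.
    private static final Comparator<CacheKeySketch> ORDER =
            Comparator.comparing((CacheKeySketch k) -> k.name)
                    .thenComparingLong(k -> k.bucket);

    private final String name; // assumed field
    private final long bucket; // assumed field

    CacheKeySketch(String name, long bucket) {
        this.name = Objects.requireNonNull(name);
        this.bucket = bucket;
    }

    @Override
    public int compareTo(CacheKeySketch other) {
        return ORDER.compare(this, other);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof CacheKeySketch)) {
            return false;
        }
        CacheKeySketch other = (CacheKeySketch) obj;
        return bucket == other.bucket && name.equals(other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, bucket);
    }
}
```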
4cde10a9f2 read csv using input stream instead of reader
We are now reading the CSV input without transforming
the data into strings. This reduces the amount of bytes
that have to be converted and copied.
We also made Tag smaller. It no longer stores pointers
to strings; instead it stores integers obtained by
compressing the strings (see StringCompressor). This
reduces memory usage and speeds up hashcode and
equals, which speeds up access to the writer cache.

Performance gain is almost 100%:
- 330k entries/s -> 670k entries/s, top speed measured over a second
- 62s -> 32s, to ingest 16 million entries
2019-01-01 08:31:28 +01:00
0487c30582 use List instead of TreeMap for intToString mapping
UniqueStringIntegerPairs stores mappings of integers
0-n to strings and vice versa. Mapping integers to
strings does not need a TreeMap, it can be done with
a List.
Makes insertions 3 times faster (when using the in-memory
variant that does not write to disk) and int to string
mapping 7 times faster.
2018-12-22 10:07:19 +01:00
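
The idea in 0487c30582 can be sketched as follows, assuming ids are handed out densely from 0 upwards so that the list index can serve as the integer. StringIntPairsSketch is a hypothetical in-memory stand-in. It is not the real UniqueStringIntegerPairs and it does not write to disk.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the project's UniqueStringIntegerPairs.
final class StringIntPairsSketch {

    // The list index is the integer id, so no sorted map is needed.
    private final List<String> intToString = new ArrayList<>();
    private final Map<String, Integer> stringToInt = new HashMap<>();

    // Returns the existing id of the string or assigns the next free one.
    int put(String value) {
        Integer existing = stringToInt.get(value);
        if (existing != null) {
            return existing;
        }
        int id = intToString.size();
        intToString.add(value);
        stringToInt.put(value, id);
        return id;
    }

    // Constant-time lookup by index.
    String getString(int id) {
        return intToString.get(id);
    }

    // Returns null if the string is unknown.
    Integer getInt(String value) {
        return stringToInt.get(value);
    }
}
```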
d95a71e32e batch entries between TcpIngestor and PerformanceDB
One bottleneck was the blocking queue used to transport entries
from the listener thread to the ingestor thread.
Reduced the bottleneck by batching entries.
Interestingly, a batch size of 100 performed better than
batch sizes of both 1000 and 10.
2018-12-21 13:11:35 +01:00
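
The batching from d95a71e32e can be sketched like this, assuming the producer collects entries in a list and hands over whole lists through the queue. EntryBatcherSketch is a hypothetical name. How TcpIngestor and PerformanceDB are actually wired together is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: one queue operation per batch instead of per entry.
final class EntryBatcherSketch<T> {

    private final BlockingQueue<List<T>> queue;
    private final int batchSize;
    private List<T> current;

    EntryBatcherSketch(BlockingQueue<List<T>> queue, int batchSize) {
        this.queue = queue;
        this.batchSize = batchSize;
        this.current = new ArrayList<>(batchSize);
    }

    // Called by the producing thread for every entry.
    void add(T entry) {
        current.add(entry);
        if (current.size() >= batchSize) {
            flush();
        }
    }

    // Hands over the partially filled batch, e.g. when the input ends.
    void flush() {
        if (current.isEmpty()) {
            return;
        }
        List<T> batch = current;
        current = new ArrayList<>(batchSize);
        try {
            queue.put(batch); // blocks if a bounded queue is full
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while enqueuing a batch", e);
        }
    }
}
```

The consumer takes a list from the queue and iterates over it, so the synchronization cost of the queue is paid once per batch.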
d52bfa0916 remove obsolete class RadixConverter 2018-12-17 19:11:33 +01:00
40f4506e13 use FastISODateParser.parseAsEpochMilli
Compared to FastISODateParser.parse, which returns an
OffsetDateTime object, parseAsEpochMilli returns the
epoch time millis. The performance improvement for
date parsing alone is roughly 100% (8m dates/s to
18m dates/s).
Insertion speed improved from 13-14s for 1.6m entries
to 11.5-12.5s.
2018-12-16 19:24:47 +01:00
135ab42cd8 tags are now stored as variable length byte sequences of longs
Replaced Tags.filenameBytes with a SortedSet<Tag>. Tags are now
stored as longs (variable length encoded) in the PersistentMap.
Tags.filenameBytes was introduced to reduce memory consumption when
all tags were held in memory. Tags are now stored in a PersistentMap
and only read when needed.

Moved the VariableByteEncoder into its own project, because it was
needed by pdb-api.
2018-11-17 20:03:46 +01:00
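
Variable length encoding of longs, as mentioned in 135ab42cd8, can be sketched with a common scheme: 7 data bits per byte, with the high bit marking that more bytes follow. This is an assumption made for illustration. The byte layout of the project's VariableByteEncoder may differ, and VarLongSketch is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a variable length encoding for non-negative longs.
final class VarLongSketch {

    static byte[] encode(long... values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (long value : values) {
            if (value < 0) {
                throw new IllegalArgumentException("negative values are not supported");
            }
            long rest = value;
            // Emit the low 7 bits with the continuation bit set
            // while more than 7 bits remain.
            while ((rest & ~0x7FL) != 0) {
                out.write((int) ((rest & 0x7F) | 0x80));
                rest >>>= 7;
            }
            out.write((int) rest); // last byte, continuation bit clear
        }
        return out.toByteArray();
    }

    static List<Long> decode(byte[] bytes) {
        List<Long> result = new ArrayList<>();
        long value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                // Continuation bit clear: the value is complete.
                result.add(value);
                value = 0;
                shift = 0;
            } else {
                shift += 7;
            }
        }
        return result;
    }
}
```

Small values take one byte instead of eight, which is where the space saving comes from when most values are small.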
fce0f6a04d use PersistentMap in DataStore
Replaces the use of in-memory data structures with the PersistentMap.
This is the crucial step in reducing memory usage for both persistent
storage and main memory.
2018-11-17 09:45:35 +01:00
0e5a47ac10 make sure serialized tags are always sorted the same way 2018-10-03 16:50:09 +02:00
d799682b4d Fix build issue with Java 11.
For some reason the Gradle build with Java 11 failed
because of an inner class. After extracting it the build
no longer fails.
2018-09-29 19:50:05 +02:00