Commit Graph

55 Commits

Author SHA1 Message Date
67c66ef89d add second parser that uses a standard CSV reader 2021-08-12 17:54:27 +02:00
85ed5f1ccb file drop support
- Add a folder where you can drop Zip files which will then be
  extracted on the fly and ingsted.
- CsvReaderSettings now contain TagMatcher that are applied to the
  first line and can be used to extract additional tags.
- Update to jdk 16 so that we can have records.
2021-08-01 09:31:40 +02:00
7adfc7029f Revert "introduce indexes"
This reverts commit 36ccc57db6.
2021-05-12 18:18:57 +02:00
75857b553e do tag to string conversion in StringCompressor instead of Tag 2021-05-09 10:44:24 +02:00
6dc335600e do string compression in StringCompressor instead of Tag 2021-05-09 10:37:35 +02:00
36ccc57db6 introduce indexes 2021-05-09 10:33:28 +02:00
cf7e5ec968 add bar charts 2020-01-19 10:35:07 +01:00
00ba4d2a69 add support for renaming and post processing of csv columns 2019-12-14 18:11:59 +01:00
4e554bfa85 specify additional tags for CSV upload
You can now specify additional tags to be added to all entries.
This makes it possible to remove columns that would be identical
for all entries.
2019-12-14 07:59:22 +01:00
5d8df6888d move Entry and Entries to data-store 2019-12-13 18:15:10 +01:00
550d7ba44e add flag to make CSV upload wait until entries are flushed
To make it easier/possible to write stable unit test the CSV upload
can optionally wait until all entries have been flushed to disk.
This is necessary for tests that ingest data and then read the data.
2019-12-13 18:05:20 +01:00
07ad62ddd9 use Junit5 instead of TestNG
We want to be able to use @SpringBootTest tests that fully initialize
the Spring application. This is much easier done with Junit than TestNG.
Gradle does not support (at least not easily) to run Junit and TestNG
tests. Therefore we switch to Junit with all tests.
The original reason for using TestNG was that Junit didn't support
data providers. But that finally changed in Junit5 with
ParameterizedTest.
2019-12-13 14:33:20 +01:00
e931856041 merge projects file-utils, byte-utils and pdb-utils
It turned out that most projects needed at least
two of the utils projects. file-utils and byte-utils
had only one class. Merging them made sense.
2019-12-08 18:47:54 +01:00
ffe5ae8652 add CsvReaderSettings
Preparation to add more complex CSV parsing rules.
2019-11-30 18:32:34 +01:00
06b379494f apply new code formatter and save action 2019-11-24 10:20:43 +01:00
4367323fcd replace deprecated dependency configurations
Using api and implementation instead of the
deprecated compile configuration.

Update to Gradle 6.0.
2019-11-10 11:08:50 +01:00
f8e859fb6d cleanup and javadoc 2019-08-31 16:52:13 +02:00
4161cd7f98 only field prefixes returned instead of full values
When using autocomplete to return field values I
missed, that autocomplete had the feature that cut
values at dots. So instead of returning full field
values only the prefix up to the first dot was
returned.
Fixed by making the cut-at-dot feature optional.
2019-08-27 20:37:07 +02:00
2a1885a77f cluster the indices 2019-03-31 09:01:55 +02:00
b5e2d0a217 introduce clustering for query completion indices 2019-03-16 10:19:28 +01:00
59aea1a15f introduce index clustering (part 1)
In order to prevent files from getting too big and
make it easier to implement retention policies, we
are splitting all files into chunks. Each chunk
contains the data for a time interval (1 month per
default).
This first changeset introduces the ClusteredPersistentMap
that implements this for PersistentMap. It is used
for a couple (not all) of indices.
2019-02-24 16:50:57 +01:00
493971bcf3 values used in queries were added to the keys.csv
Due to a mistake in Tag which added all strings used
by Tag into the String dictionary, the dictionary
did contain all values that were used in queries.
2019-02-09 08:28:23 +01:00
668d73c926 introduced a new custom file format used for backup and ingestion
The new file format reduces repetition, is easy to parse,
easy to generate in any language and is human readable.
2019-02-03 15:44:35 +01:00
76e5d441de rewrite query completion
The old implementation searched for all possible values and then
executed each query to see what matches.
The new implementation uses several indices to find only
the matching values.
2019-02-02 15:35:56 +01:00
72e9a9ebe3 prepare more efficient query completion
adding an index that answers the question
given a query "a=b and c=", what are possible values
for c.
2019-01-13 10:22:17 +01:00
f2d16b6758 make CacheKey comparable
The CacheKey is used as a key in a HashMap. Lookup can
be faster if the CacheKey is comparable when there are
hash collisions.
In this case I was not able to measure any effect. I am
keeping the comparables nonetheless, because the can
only have a positive effect.
2019-01-01 08:47:48 +01:00
4cde10a9f2 read csv using input stream instead of reader
We are now reading the CSV input without transforming
the data into strings. This reduces the amount of bytes
that have to be converted and copied.
We also made Tag smaller. It no longer stores pointers
to strings, instead it stored integers obtained by
compressing the strings (see StringCompressor). This
reduces memory usage and it speeds up hashcode and
equals, which speeds up access to the writer cache.

Performance gain is almost 100%:
- 330k entries/s -> 670k entries/s, top speed measured over a second
- 62s -> 32s, to ingest 16 million entries
2019-01-01 08:31:28 +01:00
0487c30582 use List instead of TreeMap for intToString mapping
UniqueStringIntegerPairs stores mappings of integers
0-n to strings and vice versa. Mapping integers to
strings does not need a TreeMap, it can be done with
a List.
Makes insertions 3 times (when using the in-memory
variant that does not write to disk) and 7 times faster
for int to string mapping.
2018-12-22 10:07:19 +01:00
d95a71e32e batch entries between TcpIngestor and PerformanceDB
One bottleneck was the blocking queue used to transport entries
from the listener thread to the ingestor thread.
Reduced the bottleneck by batching entries.
Interestingly the batch size of 100 was better than batch size
of 1000 and better than 10.
2018-12-21 13:11:35 +01:00
d52bfa0916 remove obsolete class RadixConverter 2018-12-17 19:11:33 +01:00
40f4506e13 use FastISODateParser.parseAsEpochMilli
Compared to FastISODateParser.parse, which returns an
OffsetDateTime object, parseAsEpochMilli returns the
epoch time millis. The performance improvement for
date parsing alone is roughly 100% (8m dates/s to
18m dates/s).
Insertion speed improved from 13-14s for 1.6m entries
to 11.5-12.5s.
2018-12-16 19:24:47 +01:00
20c555c30a update 3rd party libs and gradle 2018-12-07 14:06:59 +01:00
135ab42cd8 tags are now stored as variable length byte sequences of longs
Replaced Tags.filenameBytes with a SortedSet<Tag>. Tags are now
stored as longs (variable length encoded) in the PersistenMap.
Tags.filenameBytes was introduced to reduce memory consumption, when
all tags were hold in memory. Tags are now stored in a PersistentMap
and only read when needed.

Moved the VariableByteEncoder into its own project, because it was
needed by pdb-api.
2018-11-17 20:03:46 +01:00
fce0f6a04d use PersistentMap in DataStore
Replaces the use of in-memory data structures with the PersistentMap.
This is the crucial step in reducing memory usage for both persistent
storage and main memory.
2018-11-17 09:45:35 +01:00
0e5a47ac10 make sure serialized tags are always sorted the same way 2018-10-03 16:50:09 +02:00
d799682b4d Fix build issue with Java 11.
For some reason the Gradle build with Java 11 failed
because of an inner class. After extracting it the build
no longer fails.
2018-09-29 19:50:05 +02:00
24fcfd7763 prepare the addition of a date index 2018-09-28 19:07:01 +02:00
2e433ba969 cleanup 2018-09-13 07:52:14 +02:00
1182d76205 replace the FolderStorage with DiskStorage
- The DiskStorage uses only one file instead of millions.
  Also the block size is only 512 byte instead of 4kb, which
  helps to reduce the memory usage for short sequences.
- Update primitiveCollections to get the new LongList.range
  and LongList.rangeClosed methods.
- BSFile now stores Time&Value sequences and knows how to
  encode the time values with delta encoding.
- Doc had to do some magic tricks to save memory. The path
  was initialized lazy and stored as byte array. This is no
  longer necessary. The patch was replaced by the
  rootBlockNumber of the BSFile.
- Had to temporarily disable the 'in' queries.
- The stored values are now processed as stream of LongLists
  instead of Entry. The overhead for creating Entries is
  gone, so is the memory overhead, because Entry was an
  object and had a reference to the tags, which is
  unnecessary.
2018-09-12 09:35:07 +02:00
bb8dbad393 different tags could be written to the same file
There was a missing synchronization in the code that maps
Strings to Integers.
2018-07-28 08:37:30 +02:00
9f37243ba3 Reduce memory consumption of Tags by 50%
by storing only the bytes instead of the string.
2018-03-28 19:08:53 +02:00
81711d551f fix performance regression
The last improvement of memory usage introduced a performance
regression. The ingestion performance dropped by 50%-80%, because
for every inserted entry the Tags were created inefficient.
2018-03-27 19:30:18 +02:00
c581e352e4 add method that returns a string representation of the tags in Tags 2018-03-19 19:29:22 +01:00
5343c0d427 reduce memory usage
Reduce memory usage by storing the filename as string instead of
individual tags.
2018-03-19 19:21:57 +01:00
f2868fcc1b reduce memory footprint: old generation by 100 MB
This reduces the size of the old generation by 100MB (300MB down to
200MB). Unfortunately the total JVM size didn't change and is still
512MB.

Doc stores the path as byte array instead of Path.
2017-11-18 10:39:01 +01:00
ahr
64db4c48a2 add plots for percentiles 2017-11-06 16:57:22 +01:00
d4fd25dc4c replace LinkedHashMap with a more memory efficient implementation
This saves approximately 50MB of heap space.
2017-09-30 17:51:02 +02:00
7e00594382 add helper class that returns the size of objects 2017-09-30 17:49:21 +02:00
8baf05962f group by multiple fields
Before we could only group by a single field. But it is acutally
very useful to group by multiple fields. For example to see the
graph for a small set of methods grouped by host and project.
2017-04-12 19:16:19 +02:00
ee15594070 remove TODOs
They don't make sense anymore.
E.g. the Tags class is used by classes outside of
org.lucares.performance.db.
2017-04-11 18:09:29 +02:00