Commit Graph

33 Commits

Author SHA1 Message Date
f8e859fb6d cleanup and javadoc 2019-08-31 16:52:13 +02:00
4161cd7f98 only field prefixes returned instead of full values
When using autocomplete to return field values I
missed, that autocomplete had the feature that cut
values at dots. So instead of returning full field
values only the prefix up to the first dot was
returned.
Fixed by making the cut-at-dot feature optional.
2019-08-27 20:37:07 +02:00
2a1885a77f cluster the indices 2019-03-31 09:01:55 +02:00
b5e2d0a217 introduce clustering for query completion indices 2019-03-16 10:19:28 +01:00
59aea1a15f introduce index clustering (part 1)
In order to prevent files from getting too big and
make it easier to implement retention policies, we
are splitting all files into chunks. Each chunk
contains the data for a time interval (1 month per
default).
This first changeset introduces the ClusteredPersistentMap
that implements this for PersistentMap. It is used
for a couple (not all) of indices.
2019-02-24 16:50:57 +01:00
493971bcf3 values used in queries were added to the keys.csv
Due to a mistake in Tag which added all strings used
by Tag into the String dictionary, the dictionary
did contain all values that were used in queries.
2019-02-09 08:28:23 +01:00
668d73c926 introduced a new custom file format used for backup and ingestion
The new file format reduces repetition, is easy to parse,
easy to generate in any language and is human readable.
2019-02-03 15:44:35 +01:00
76e5d441de rewrite query completion
The old implementation searched for all possible values and then
executed each query to see what matches.
The new implementation uses several indices to find only
the matching values.
2019-02-02 15:35:56 +01:00
72e9a9ebe3 prepare more efficient query completion
adding an index that answers the question
given a query "a=b and c=", what are possible values
for c.
2019-01-13 10:22:17 +01:00
f2d16b6758 make CacheKey comparable
The CacheKey is used as a key in a HashMap. Lookup can
be faster if the CacheKey is comparable when there are
hash collisions.
In this case I was not able to measure any effect. I am
keeping the comparables nonetheless, because the can
only have a positive effect.
2019-01-01 08:47:48 +01:00
4cde10a9f2 read csv using input stream instead of reader
We are now reading the CSV input without transforming
the data into strings. This reduces the amount of bytes
that have to be converted and copied.
We also made Tag smaller. It no longer stores pointers
to strings, instead it stored integers obtained by
compressing the strings (see StringCompressor). This
reduces memory usage and it speeds up hashcode and
equals, which speeds up access to the writer cache.

Performance gain is almost 100%:
- 330k entries/s -> 670k entries/s, top speed measured over a second
- 62s -> 32s, to ingest 16 million entries
2019-01-01 08:31:28 +01:00
0487c30582 use List instead of TreeMap for intToString mapping
UniqueStringIntegerPairs stores mappings of integers
0-n to strings and vice versa. Mapping integers to
strings does not need a TreeMap, it can be done with
a List.
Makes insertions 3 times (when using the in-memory
variant that does not write to disk) and 7 times faster
for int to string mapping.
2018-12-22 10:07:19 +01:00
d95a71e32e batch entries between TcpIngestor and PerformanceDB
One bottleneck was the blocking queue used to transport entries
from the listener thread to the ingestor thread.
Reduced the bottleneck by batching entries.
Interestingly the batch size of 100 was better than batch size
of 1000 and better than 10.
2018-12-21 13:11:35 +01:00
d52bfa0916 remove obsolete class RadixConverter 2018-12-17 19:11:33 +01:00
40f4506e13 use FastISODateParser.parseAsEpochMilli
Compared to FastISODateParser.parse, which returns an
OffsetDateTime object, parseAsEpochMilli returns the
epoch time millis. The performance improvement for
date parsing alone is roughly 100% (8m dates/s to
18m dates/s).
Insertion speed improved from 13-14s for 1.6m entries
to 11.5-12.5s.
2018-12-16 19:24:47 +01:00
135ab42cd8 tags are now stored as variable length byte sequences of longs
Replaced Tags.filenameBytes with a SortedSet<Tag>. Tags are now
stored as longs (variable length encoded) in the PersistenMap.
Tags.filenameBytes was introduced to reduce memory consumption, when
all tags were hold in memory. Tags are now stored in a PersistentMap
and only read when needed.

Moved the VariableByteEncoder into its own project, because it was
needed by pdb-api.
2018-11-17 20:03:46 +01:00
fce0f6a04d use PersistentMap in DataStore
Replaces the use of in-memory data structures with the PersistentMap.
This is the crucial step in reducing memory usage for both persistent
storage and main memory.
2018-11-17 09:45:35 +01:00
0e5a47ac10 make sure serialized tags are always sorted the same way 2018-10-03 16:50:09 +02:00
24fcfd7763 prepare the addition of a date index 2018-09-28 19:07:01 +02:00
1182d76205 replace the FolderStorage with DiskStorage
- The DiskStorage uses only one file instead of millions.
  Also the block size is only 512 byte instead of 4kb, which
  helps to reduce the memory usage for short sequences.
- Update primitiveCollections to get the new LongList.range
  and LongList.rangeClosed methods.
- BSFile now stores Time&Value sequences and knows how to
  encode the time values with delta encoding.
- Doc had to do some magic tricks to save memory. The path
  was initialized lazy and stored as byte array. This is no
  longer necessary. The patch was replaced by the
  rootBlockNumber of the BSFile.
- Had to temporarily disable the 'in' queries.
- The stored values are now processed as stream of LongLists
  instead of Entry. The overhead for creating Entries is
  gone, so is the memory overhead, because Entry was an
  object and had a reference to the tags, which is
  unnecessary.
2018-09-12 09:35:07 +02:00
bb8dbad393 different tags could be written to the same file
There was a missing synchronization in the code that maps
Strings to Integers.
2018-07-28 08:37:30 +02:00
9f37243ba3 Reduce memory consumption of Tags by 50%
by storing only the bytes instead of the string.
2018-03-28 19:08:53 +02:00
81711d551f fix performance regression
The last improvement of memory usage introduced a performance
regression. The ingestion performance dropped by 50%-80%, because
for every inserted entry the Tags were created inefficient.
2018-03-27 19:30:18 +02:00
c581e352e4 add method that returns a string representation of the tags in Tags 2018-03-19 19:29:22 +01:00
5343c0d427 reduce memory usage
Reduce memory usage by storing the filename as string instead of
individual tags.
2018-03-19 19:21:57 +01:00
d4fd25dc4c replace LinkedHashMap with a more memory efficient implementation
This saves approximately 50MB of heap space.
2017-09-30 17:51:02 +02:00
8baf05962f group by multiple fields
Before we could only group by a single field. But it is acutally
very useful to group by multiple fields. For example to see the
graph for a small set of methods grouped by host and project.
2017-04-12 19:16:19 +02:00
ee15594070 remove TODOs
They don't make sense anymore.
E.g. the Tags class is used by classes outside of
org.lucares.performance.db.
2017-04-11 18:09:29 +02:00
cd6b71d35a use shorter folder names
reduces the risk of too long file names
2017-04-02 11:13:08 +02:00
a01c8b3907 fix flaky test and improve error handling
just ignore invalid entries
2017-03-18 10:14:41 +01:00
f520f18e13 leverage the cached pdbwriters
this increased performance from 500 entries per second to 4000.
2016-12-29 19:24:16 +01:00
db0b3d6d24 new file format
Store values in sequences of variable length. Instead of using 8 bytes
per entry we are now using between 2 and 20 bytes. But we are also able
to store every non-negative long value.
2016-12-27 10:24:56 +01:00
d1e39513f3 create web application 2016-12-21 17:48:36 +01:00