perfdb

Author	SHA1	Message	Date
andi	67c66ef89d	add second parser that uses a standard CSV reader	2021-08-12 17:54:27 +02:00
andi	85ed5f1ccb	file drop support - Add a folder where you can drop Zip files which will then be extracted on the fly and ingested. - CsvReaderSettings now contains TagMatchers that are applied to the first line and can be used to extract additional tags. - Update to JDK 16 so that we can have records.	2021-08-01 09:31:40 +02:00
andi	7adfc7029f	Revert "introduce indexes" This reverts commit `36ccc57db6`.	2021-05-12 18:18:57 +02:00
andi	75857b553e	do tag to string conversion in StringCompressor instead of Tag	2021-05-09 10:44:24 +02:00
andi	6dc335600e	do string compression in StringCompressor instead of Tag	2021-05-09 10:37:35 +02:00
andi	36ccc57db6	introduce indexes	2021-05-09 10:33:28 +02:00
andi	cf7e5ec968	add bar charts	2020-01-19 10:35:07 +01:00
andi	00ba4d2a69	add support for renaming and post processing of csv columns	2019-12-14 18:11:59 +01:00
andi	4e554bfa85	specify additional tags for CSV upload You can now specify additional tags to be added to all entries. This makes it possible to remove columns that would be identical for all entries.	2019-12-14 07:59:22 +01:00
andi	5d8df6888d	move Entry and Entries to data-store	2019-12-13 18:15:10 +01:00
andi	550d7ba44e	add flag to make CSV upload wait until entries are flushed To make it easier/possible to write stable unit tests the CSV upload can optionally wait until all entries have been flushed to disk. This is necessary for tests that ingest data and then read the data.	2019-12-13 18:05:20 +01:00
andi	07ad62ddd9	use JUnit5 instead of TestNG We want to be able to use @SpringBootTest tests that fully initialize the Spring application. This is much easier done with JUnit than TestNG. Gradle does not support (at least not easily) running both JUnit and TestNG tests. Therefore we switch to JUnit for all tests. The original reason for using TestNG was that JUnit didn't support data providers. But that finally changed in JUnit5 with ParameterizedTest.	2019-12-13 14:33:20 +01:00
andi	e931856041	merge projects file-utils, byte-utils and pdb-utils It turned out that most projects needed at least two of the utils projects. file-utils and byte-utils had only one class. Merging them made sense.	2019-12-08 18:47:54 +01:00
andi	ffe5ae8652	add CsvReaderSettings Preparation to add more complex CSV parsing rules.	2019-11-30 18:32:34 +01:00
andi	06b379494f	apply new code formatter and save action	2019-11-24 10:20:43 +01:00
andi	4367323fcd	replace deprecated dependency configurations Using api and implementation instead of the deprecated compile configuration. Update to Gradle 6.0.	2019-11-10 11:08:50 +01:00
andi	f8e859fb6d	cleanup and javadoc	2019-08-31 16:52:13 +02:00
andi	4161cd7f98	only field prefixes returned instead of full values When using autocomplete to return field values I missed that autocomplete had a feature that cuts values at dots. So instead of returning full field values only the prefix up to the first dot was returned. Fixed by making the cut-at-dot feature optional.	2019-08-27 20:37:07 +02:00
andi	2a1885a77f	cluster the indices	2019-03-31 09:01:55 +02:00
andi	b5e2d0a217	introduce clustering for query completion indices	2019-03-16 10:19:28 +01:00
andi	59aea1a15f	introduce index clustering (part 1) In order to prevent files from getting too big and make it easier to implement retention policies, we are splitting all files into chunks. Each chunk contains the data for a time interval (1 month per default). This first changeset introduces the ClusteredPersistentMap that implements this for PersistentMap. It is used for a couple (not all) of indices.	2019-02-24 16:50:57 +01:00
andi	493971bcf3	values used in queries were added to the keys.csv Due to a mistake in Tag which added all strings used by Tag into the String dictionary, the dictionary did contain all values that were used in queries.	2019-02-09 08:28:23 +01:00
andi	668d73c926	introduced a new custom file format used for backup and ingestion The new file format reduces repetition, is easy to parse, easy to generate in any language and is human readable.	2019-02-03 15:44:35 +01:00
andi	76e5d441de	rewrite query completion The old implementation searched for all possible values and then executed each query to see what matches. The new implementation uses several indices to find only the matching values.	2019-02-02 15:35:56 +01:00
andi	72e9a9ebe3	prepare more efficient query completion adding an index that answers the question given a query "a=b and c=", what are possible values for c.	2019-01-13 10:22:17 +01:00
andi	f2d16b6758	make CacheKey comparable The CacheKey is used as a key in a HashMap. Lookup can be faster if the CacheKey is comparable when there are hash collisions. In this case I was not able to measure any effect. I am keeping the comparables nonetheless, because they can only have a positive effect.	2019-01-01 08:47:48 +01:00
andi	4cde10a9f2	read csv using input stream instead of reader We are now reading the CSV input without transforming the data into strings. This reduces the number of bytes that have to be converted and copied. We also made Tag smaller. It no longer stores pointers to strings, instead it stores integers obtained by compressing the strings (see StringCompressor). This reduces memory usage and it speeds up hashcode and equals, which speeds up access to the writer cache. Performance gain is almost 100%: - 330k entries/s -> 670k entries/s, top speed measured over a second - 62s -> 32s, to ingest 16 million entries	2019-01-01 08:31:28 +01:00
andi	0487c30582	use List instead of TreeMap for intToString mapping UniqueStringIntegerPairs stores mappings of integers 0-n to strings and vice versa. Mapping integers to strings does not need a TreeMap, it can be done with a List. Makes insertions 3 times (when using the in-memory variant that does not write to disk) and 7 times faster for int to string mapping.	2018-12-22 10:07:19 +01:00
andi	d95a71e32e	batch entries between TcpIngestor and PerformanceDB One bottleneck was the blocking queue used to transport entries from the listener thread to the ingestor thread. Reduced the bottleneck by batching entries. Interestingly the batch size of 100 was better than batch size of 1000 and better than 10.	2018-12-21 13:11:35 +01:00
andi	d52bfa0916	remove obsolete class RadixConverter	2018-12-17 19:11:33 +01:00
andi	40f4506e13	use FastISODateParser.parseAsEpochMilli Compared to FastISODateParser.parse, which returns an OffsetDateTime object, parseAsEpochMilli returns the epoch time millis. The performance improvement for date parsing alone is roughly 100% (8m dates/s to 18m dates/s). Insertion speed improved from 13-14s for 1.6m entries to 11.5-12.5s.	2018-12-16 19:24:47 +01:00
andi	20c555c30a	update 3rd party libs and gradle	2018-12-07 14:06:59 +01:00
andi	135ab42cd8	tags are now stored as variable length byte sequences of longs Replaced Tags.filenameBytes with a SortedSet<Tag>. Tags are now stored as longs (variable length encoded) in the PersistentMap. Tags.filenameBytes was introduced to reduce memory consumption, when all tags were held in memory. Tags are now stored in a PersistentMap and only read when needed. Moved the VariableByteEncoder into its own project, because it was needed by pdb-api.	2018-11-17 20:03:46 +01:00
andi	fce0f6a04d	use PersistentMap in DataStore Replaces the use of in-memory data structures with the PersistentMap. This is the crucial step in reducing memory usage for both persistent storage and main memory.	2018-11-17 09:45:35 +01:00
andi	0e5a47ac10	make sure serialized tags are always sorted the same way	2018-10-03 16:50:09 +02:00
andi	d799682b4d	Fix build issue with Java 11. For some reason the Gradle build with Java 11 failed because of an inner class. After extracting it the build no longer fails.	2018-09-29 19:50:05 +02:00
andi	24fcfd7763	prepare the addition of a date index	2018-09-28 19:07:01 +02:00
andi	2e433ba969	cleanup	2018-09-13 07:52:14 +02:00
andi	1182d76205	replace the FolderStorage with DiskStorage - The DiskStorage uses only one file instead of millions. Also the block size is only 512 bytes instead of 4kb, which helps to reduce the memory usage for short sequences. - Update primitiveCollections to get the new LongList.range and LongList.rangeClosed methods. - BSFile now stores Time&Value sequences and knows how to encode the time values with delta encoding. - Doc had to do some magic tricks to save memory. The path was initialized lazily and stored as byte array. This is no longer necessary. The path was replaced by the rootBlockNumber of the BSFile. - Had to temporarily disable the 'in' queries. - The stored values are now processed as stream of LongLists instead of Entry. The overhead for creating Entries is gone, so is the memory overhead, because Entry was an object and had a reference to the tags, which is unnecessary.	2018-09-12 09:35:07 +02:00
andi	bb8dbad393	different tags could be written to the same file There was a missing synchronization in the code that maps Strings to Integers.	2018-07-28 08:37:30 +02:00
andi	9f37243ba3	Reduce memory consumption of Tags by 50% by storing only the bytes instead of the string.	2018-03-28 19:08:53 +02:00
andi	81711d551f	fix performance regression The last improvement of memory usage introduced a performance regression. The ingestion performance dropped by 50%-80%, because for every inserted entry the Tags were created inefficiently.	2018-03-27 19:30:18 +02:00
andi	c581e352e4	add method that returns a string representation of the tags in Tags	2018-03-19 19:29:22 +01:00
andi	5343c0d427	reduce memory usage Reduce memory usage by storing the filename as string instead of individual tags.	2018-03-19 19:21:57 +01:00
andi	f2868fcc1b	reduce memory footprint: old generation by 100 MB This reduces the size of the old generation by 100MB (300MB down to 200MB). Unfortunately the total JVM size didn't change and is still 512MB. Doc stores the path as byte array instead of Path.	2017-11-18 10:39:01 +01:00
ahr	64db4c48a2	add plots for percentiles	2017-11-06 16:57:22 +01:00
andi	d4fd25dc4c	replace LinkedHashMap with a more memory efficient implementation This saves approximately 50MB of heap space.	2017-09-30 17:51:02 +02:00
andi	7e00594382	add helper class that returns the size of objects	2017-09-30 17:49:21 +02:00
andi	8baf05962f	group by multiple fields Before we could only group by a single field. But it is actually very useful to group by multiple fields. For example to see the graph for a small set of methods grouped by host and project.	2017-04-12 19:16:19 +02:00
andi	ee15594070	remove TODOs They don't make sense anymore. E.g. the Tags class is used by classes outside of org.lucares.performance.db.	2017-04-11 18:09:29 +02:00

55 Commits
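
Commits 135ab42cd8 and 1182d76205 mention variable length encoding of longs and delta encoding of time values. The following is a minimal sketch of how the two techniques can combine, not the project's actual VariableByteEncoder or BSFile code. The class name DeltaVarintSketch, its method names, and the exact byte layout (7 payload bits per byte, continuation flag in the high bit) are assumptions made for illustration only.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; names and byte layout are not taken from perfdb.
final class DeltaVarintSketch {

    // Writes a non-negative long using 7 payload bits per byte.
    // The high bit is set on every byte except the last one.
    static void writeVarLong(ByteArrayOutputStream out, long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not supported in this sketch");
        }
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Delta encoding: store each timestamp as the difference to its predecessor.
    // Assumes the input is sorted ascending, so every delta is non-negative.
    static byte[] encodeTimestamps(long[] sortedEpochMillis) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long previous = 0;
        for (long t : sortedEpochMillis) {
            writeVarLong(out, t - previous);
            previous = t;
        }
        return out.toByteArray();
    }

    // Reverses encodeTimestamps: reads each varint delta and accumulates it.
    static long[] decodeTimestamps(byte[] bytes) {
        List<Long> result = new ArrayList<>();
        long previous = 0;
        int i = 0;
        while (i < bytes.length) {
            long delta = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[i++];
                delta |= (long) (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += delta;
            result.add(previous);
        }
        return result.stream().mapToLong(Long::longValue).toArray();
    }

    public static void main(String[] args) {
        long[] timestamps = {1536737707000L, 1536737707005L, 1536737707130L};
        byte[] encoded = encodeTimestamps(timestamps);
        System.out.println("raw bytes: " + timestamps.length * Long.BYTES
                + ", encoded bytes: " + encoded.length);
    }
}
```

Closely spaced timestamps produce small deltas, and small deltas need only one or two bytes each, which is why the combination suits time series of measurements.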