perfdb

Author	SHA1	Message	Date
Andreas Huber	1f144846db	update 3rd party libs and gradle	2019-04-28 14:40:43 +02:00
Andreas Huber	9fb1a136c8	cache last used date prefix The 99.9999% use case is to ingest data from the same month.	2019-04-22 09:51:44 +02:00
Andreas Huber	dfe9579726	use DateTimeRange.max() instead of arbitrary relative range	2019-04-20 20:36:26 +02:00
Andreas Huber	d82b33c60e	update js libraries	2019-04-20 20:31:51 +02:00
Andreas Huber	7277670b8b	update 3rd party libs	2019-04-20 20:19:24 +02:00
Andreas Huber	9525ee22a0	add access restrictions for a few unwelcome classes	2019-04-20 20:12:45 +02:00
Andreas Huber	56085061ed	do not return anything if the field/value does not exist The computation of proposals is done by searching for values in a combined index. If one of the values didn't exist, then the algorithm returned all values. Fixed by checking that we query only existing field/values from the combined index.	2019-04-20 19:48:51 +02:00
Andreas Huber	dbe0e02517	rename cluster to partition We are not clustering the indices, we are partitioning them.	2019-04-14 10:10:16 +02:00
Andreas Huber	2a1885a77f	cluster the indices	2019-03-31 09:01:55 +02:00
Andreas Huber	95f2f26966	handle IOExceptions earlier	2019-03-17 11:13:46 +01:00
Andreas Huber	5d0ceb112e	add clustering for DiskStore	2019-03-17 10:53:02 +01:00
Andreas Huber	b5e2d0a217	introduce clustering for query completion indices	2019-03-16 10:19:28 +01:00
Andreas Huber	fb9f8592ac	make ClusteredPersistentMap easier to use	2019-02-24 19:20:44 +01:00
Andreas Huber	59aea1a15f	introduce index clustering (part 1) In order to prevent files from getting too big and make it easier to implement retention policies, we are splitting all files into chunks. Each chunk contains the data for a time interval (1 month per default). This first changeset introduces the ClusteredPersistentMap that implements this for PersistentMap. It is used for a couple (not all) of indices.	2019-02-24 16:50:57 +01:00
Andreas Huber	372a073b6d	PdbWriter is no longer in the API of DataStore	2019-02-16 16:24:14 +01:00
Andreas Huber	92a47d9b56	remove TagsToFile Remove one layer of abstraction by moving the code into the DataStore.	2019-02-16 16:06:46 +01:00
Andreas Huber	117ef4ea34	use guava's cache as implementation for the HotEntryCache My own implementation was faster, but was not able to implement a size limitation.	2019-02-16 10:23:52 +01:00
Andreas Huber	7b00eede86	refactoring: extract EncoderDecoders from DataStore	2019-02-16 09:16:15 +01:00
Andreas Huber	cbcb7714bb	split BSFile into a TimeSeries and a LongStream file BSFile was used to store two types of data. This makes the API complex. I split the API into two files with easier and more clear APIs. Interestingly the API of BSFile is still rather complex and has to consider both use cases.	2019-02-10 09:59:16 +01:00
Andreas Huber	fd55ea0866	update vuejs to 2.6.4 Added the version to moment.min.js	2019-02-09 15:39:28 +01:00
Andreas Huber	93dea402a5	remove obsolete class	2019-02-09 15:25:39 +01:00
Andreas Huber	27b83234cc	group proposal as if they were hierarchical We interpret dots ('.') as hierarchy delimiter in. That way we can reduce the number of proposed values and show only those for the next level.	2019-02-09 15:21:35 +01:00
Andreas Huber	493971bcf3	values used in queries were added to the keys.csv Due to a mistake in Tag which added all strings used by Tag into the String dictionary, the dictionary did contain all values that were used in queries.	2019-02-09 08:28:23 +01:00
Andreas Huber	ea5884a5e6	move creation of PdbWriter to the DataStore	2019-02-07 18:06:41 +01:00
Andreas Huber	58bfba23bb	reset lastEpochMilli when opening a new export file	2019-02-06 15:52:37 +00:00
Andreas Huber	99cdf557b3	add metric logger for query completion evaluation	2019-02-06 15:51:41 +00:00
Andreas Huber	668d73c926	introduced a new custom file format used for backup and ingestion The new file format reduces repetition, is easy to parse, easy to generate in any language and is human readable.	2019-02-03 15:44:35 +01:00
Andreas Huber	1d8ca0e21c	fetch org.lucares artifacts only from repo.lucares.de	2019-02-02 17:51:20 +01:00
Andreas Huber	c0fffbf676	update third party libs gradle to 5.1.1 spring-boot to 2.1.2.RELEASE antlr to 4.7.2 jackson to 2.9.8	2019-02-02 17:33:21 +01:00
Andreas Huber	2e48061793	add LRU cache to PersistentMap This should speed up fetching and inserting of values that are used often.	2019-02-02 17:26:25 +01:00
Andreas Huber	d4d1685f9f	replace stdout with logger	2019-02-02 16:49:21 +01:00
Andreas Huber	151e9363e1	remove obsolete classes	2019-02-02 16:45:34 +01:00
Andreas Huber	76e5d441de	rewrite query completion The old implementation searched for all possible values and then executed each query to see what matches. The new implementation uses several indices to find only the matching values.	2019-02-02 15:35:56 +01:00
Andreas Huber	72e9a9ebe3	prepare more efficient query completion adding an index that answers the question given a query "a=b and c=", what are possible values for c.	2019-01-13 10:22:17 +01:00
Andreas Huber	5197063ae3	the union of many small lists is expensive The reason seems to be the number of memory allocations. In order to create the union of 100 lists we have 99 memory allocations. The first needs the space for the first two lists, the second the space for the first three lists, and so on. We can reduce the number of allocations drastically (in many cases to one) by leveraging the fact that many of the lists were already sorted, non-overlapping and increasing, so that we can simply concatenate them.	2019-01-05 08:52:56 +01:00
Andreas Huber	3dca7483de	utility that generates a csv with many different tags	2019-01-05 08:33:57 +01:00
Andreas Huber	f2d16b6758	make CacheKey comparable The CacheKey is used as a key in a HashMap. Lookup can be faster if the CacheKey is comparable when there are hash collisions. In this case I was not able to measure any effect. I am keeping the comparables nonetheless, because they can only have a positive effect.	2019-01-01 08:47:48 +01:00
Andreas Huber	4cde10a9f2	read csv using input stream instead of reader We are now reading the CSV input without transforming the data into strings. This reduces the amount of bytes that have to be converted and copied. We also made Tag smaller. It no longer stores pointers to strings; instead it stores integers obtained by compressing the strings (see StringCompressor). This reduces memory usage and it speeds up hashcode and equals, which speeds up access to the writer cache. Performance gain is almost 100%: - 330k entries/s -> 670k entries/s, top speed measured over a second - 62s -> 32s, to ingest 16 million entries	2019-01-01 08:31:28 +01:00
Andreas Huber	0487c30582	use List instead of TreeMap for intToString mapping UniqueStringIntegerPairs stores mappings of integers 0-n to strings and vice versa. Mapping integers to strings does not need a TreeMap, it can be done with a List. Makes insertions 3 times faster (when using the in-memory variant that does not write to disk) and 7 times faster for int to string mapping.	2018-12-22 10:07:19 +01:00
Andreas Huber	e537e94d39	HotEntryCache will update Instants only once per second Calling Instant.now() several hundred thousand times per second can be expensive. In my measurements >10% of the time spent when loading new data was spent calling Instant.now(). Fixed this by storing an Instant as a static member and updating it periodically in a separate thread.	2018-12-21 19:16:55 +01:00
Andreas Huber	d95a71e32e	batch entries between TcpIngestor and PerformanceDB One bottleneck was the blocking queue used to transport entries from the listener thread to the ingestor thread. Reduced the bottleneck by batching entries. Interestingly the batch size of 100 was better than batch size of 1000 and better than 10.	2018-12-21 13:11:35 +01:00
Andreas Huber	73ad27ab96	remove lastAccessMap In the last commit I added a lastAccessMap to the HotEntryCache. This map made it much more efficient to evict entries. But it also made put and get operations much more expensive. Overall that change led to a 65% decrease in ingestion performance of the PerformanceDB. Fixed by removing the map again. Eviction has to look at all elements again.	2018-12-21 10:28:34 +01:00
Andreas Huber	afba3b6f77	elements not evicted if new elements are added	2018-12-20 16:13:55 +01:00
Andreas Huber	d52bfa0916	remove obsolete class RadixConverter	2018-12-17 19:11:33 +01:00
Andreas Huber	3a4101bbf9	increase the buffer between ingestion and insertion thread I was finally able to show that there is a tiny but measurable effect of this buffer. I think it was not visible before, because the parsing was too slow. But now that I have replaced the date parser, the ingestion thread is twice as fast as the insertion thread. Therefore the buffer makes more sense.	2018-12-17 19:07:55 +01:00
Andreas Huber	d37508b7a1	Pattern.split is faster than StringUtils.splitPreserveAll Document the fact, so that I do not have to repeat the same test a third time.	2018-12-17 19:05:34 +01:00
Andreas Huber	40f4506e13	use FastISODateParser.parseAsEpochMilli Compared to FastISODateParser.parse, which returns an OffsetDateTime object, parseAsEpochMilli returns the epoch time millis. The performance improvement for date parsing alone is roughly 100% (8m dates/s to 18m dates/s). Insertion speed improved from 13-14s for 1.6m entries to 11.5-12.5s.	2018-12-16 19:24:47 +01:00
Andreas Huber	23f800a441	add date parsing method that returns epochMillis instead of date object	2018-12-16 15:38:26 +01:00
Andreas Huber	20c555c30a	update 3rd party libs and gradle	2018-12-07 14:06:59 +01:00
Andreas Huber	253bbabd19	cleanup remove debug output	2018-11-25 07:49:23 +00:00
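
Commit 5197063ae3 above describes avoiding repeated allocations when building the union of many small lists: if the lists are already sorted, increasing and non-overlapping, they can simply be concatenated into one array. The following is only a minimal sketch of that idea, not the project's actual code. The class name SortedUnion, the method names, and the use of long[] are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of perfdb: illustrates the concatenation shortcut.
public class SortedUnion {

    /**
     * Returns true if reading all lists back to back yields a strictly
     * increasing sequence, i.e. every list is sorted and no two lists overlap.
     */
    static boolean canConcatenate(final List<long[]> lists) {
        boolean first = true;
        long last = 0;
        for (final long[] list : lists) {
            for (final long value : list) {
                if (!first && value <= last) {
                    return false;
                }
                last = value;
                first = false;
            }
        }
        return true;
    }

    /** Union of all lists as a sorted array without duplicates. */
    static long[] union(final List<long[]> lists) {
        if (canConcatenate(lists)) {
            // Fast path: a single allocation, then plain copies.
            int total = 0;
            for (final long[] list : lists) {
                total += list.length;
            }
            final long[] result = new long[total];
            int pos = 0;
            for (final long[] list : lists) {
                System.arraycopy(list, 0, result, pos, list.length);
                pos += list.length;
            }
            return result;
        }
        // Slow path (assumed fallback): merge everything, sort, drop duplicates.
        return lists.stream().flatMapToLong(Arrays::stream).sorted().distinct().toArray();
    }
}
```

The fast path allocates the result once regardless of how many lists are combined; the fallback is only there to keep the sketch correct for overlapping input.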


784 Commits