perfdb

Author	SHA1	Message	Date
Andreas Huber	10155f9cdb	use special enum for DateBucket units Preparation step for having custom intervals.	2020-09-27 17:06:27 +02:00
Andreas Huber	b8f77dc9a6	replace custom timezones with UTC we are only using UTC	2020-09-27 08:21:24 +02:00
Andreas Huber	273c019df1	remove unused class RuntimeExcecutionException	2020-09-27 07:53:52 +02:00
Andreas Huber	e06f4175a3	illegal state exception when using interval 'year'	2020-09-20 09:11:45 +02:00
Andreas Huber	45f9e36a88	cleanup	2020-04-05 09:50:26 +02:00
Andreas Huber	50f555d23c	add interval splitting for bar charts	2020-04-05 08:14:09 +02:00
Andreas Huber	75391f21ff	extract code from DateIndexExtension to LongToDateBucket Making it possible to reuse the code to sort timestamps into date based buckets.	2020-04-03 19:46:08 +02:00
Andreas Huber	00ba4d2a69	add support for renaming and post processing of csv columns	2019-12-14 18:11:59 +01:00
Andreas Huber	07ad62ddd9	use Junit5 instead of TestNG We want to be able to use @SpringBootTest tests that fully initialize the Spring application. This is much easier to do with Junit than with TestNG. Gradle does not support (at least not easily) running both Junit and TestNG tests. Therefore we switch all tests to Junit. The original reason for using TestNG was that Junit didn't support data providers. But that finally changed in Junit5 with ParameterizedTest.	2019-12-13 14:33:20 +01:00
Andreas Huber	e931856041	merge projects file-utils, byte-utils and pdb-utils It turned out that most projects needed at least two of the utils projects. file-utils and byte-utils had only one class. Merging them made sense.	2019-12-08 18:47:54 +01:00
Andreas Huber	06b379494f	apply new code formatter and save action	2019-11-24 10:20:43 +01:00
Andreas Huber	e2a33ac6e2	make the code that determines which axis to use explicit In the previous changeset the code that determined which axis the plots used was implemented as a side effect of getting the Gnuplot definition of an axis. Changed that to an explicit update call with simpler logic.	2019-11-24 09:08:36 +01:00
Andreas Huber	892d5a6d08	automatically determine which axis a plot needs	2019-11-24 08:18:52 +01:00
Andreas Huber	82a961dbaf	move definition of x-axis to the aggregate handlers	2019-11-23 14:28:18 +01:00
Andreas Huber	4367323fcd	replace deprecated dependency configurations Using api and implementation instead of the deprecated compile configuration. Update to Gradle 6.0.	2019-11-10 11:08:50 +01:00
Andreas Huber	2f35978184	fetch available values for gallery via autocomplete method We had a method that returned the values of a field with respect to a query. That method was inefficient, because it executed the query, fetched all Docs and collected the values. The autocomplete method we introduced a while back can answer the same question but much more efficiently.	2019-08-25 18:52:05 +02:00
Andreas Huber	3a7688d1ae	remember next eviction time and skip eviction	2019-08-24 19:39:59 +02:00
Andreas Huber	6eaf4e10fc	add maxSize parameter to HotEntryCache	2019-08-24 19:24:20 +02:00
Andreas Huber	00c20dae6b	use long instead of Instant for time Working with longs is faster and requires less cache. The space in the L1/L2/L3 caches is precious.	2019-08-19 18:58:24 +02:00
Andreas Huber	feda901f6d	remove event types We only have removal events. The additional complexity of having a generic interface for many different event types does not pay off.	2019-08-18 20:30:25 +02:00
Andreas Huber	4d9ea6d2a8	switch back to my own HotEntryCache implementation Guava's cache does not evict elements reliably by time. If you configure a Guava cache with a lifetime of n seconds, you cannot expect that an element is actually evicted after n seconds.	2019-08-18 20:14:14 +02:00
Andreas Huber	0dc908c79c	show the removal listener of HotEntryCache is not called on expire	2019-08-18 09:27:18 +02:00
Andreas Huber	92a47d9b56	remove TagsToFile Remove one layer of abstraction by moving the code into the DataStore.	2019-02-16 16:06:46 +01:00
Andreas Huber	117ef4ea34	use guava's cache as implementation for the HotEntryCache My own implementation was faster, but was not able to implement a size limitation.	2019-02-16 10:23:52 +01:00
Andreas Huber	76e5d441de	rewrite query completion The old implementation searched for all possible values and then executed each query to see what matches. The new implementation uses several indices to find only the matching values.	2019-02-02 15:35:56 +01:00
Andreas Huber	5197063ae3	the union of many small lists is expensive The reason seems to be the number of memory allocations. In order to create the union of 100 lists we have 99 memory allocations. The first needs the space for the first two lists, the second the space for the first three lists, and so on. We can reduce the number of allocations drastically (in many cases to one) by leveraging the fact that many of the lists were already sorted, non-overlapping and increasing, so that we can simply concatenate them.	2019-01-05 08:52:56 +01:00
Andreas Huber	e537e94d39	HotEntryCache will update Instants only once per second Calling Instant.now() several hundred thousand times per second can be expensive. In my measurements >10% of the time spent when loading new data was spent calling Instant.now(). Fixed this by storing an Instant as static member and updating it periodically in a separate thread.	2018-12-21 19:16:55 +01:00
Andreas Huber	73ad27ab96	remove lastAccessMap In the last commit I added a lastAccessMap to the HotEntryCache. This map made it much more efficient to evict entries. But it also made put and get operations much more expensive. Overall that change led to a 65% decrease in ingestion performance of the PerformanceDB. Fixed by removing the map again. Eviction has to look at all elements again.	2018-12-21 10:28:34 +01:00
Andreas Huber	afba3b6f77	elements not evicted if new elements are added	2018-12-20 16:13:55 +01:00
Andreas Huber	d67e452a91	cache disk blocks in an LRU cache Improves read access by factor 4 for small trees.	2018-11-24 15:07:37 +01:00
Andreas Huber	9889252205	use only one thread for evictions Instead of spawning a new thread for every cache, we use a single thread that will evict entries from all caches. The thread keeps a weak reference to the caches, so that they can be garbage collected.	2018-11-24 08:32:05 +01:00
Andreas Huber	64771417e4	only iterates over elements when at least one element can be evicted	2018-11-23 07:23:38 +01:00
Andreas Huber	f78f69328b	add cache for docId to Doc mapping A Doc does not change once it is created, so it is easy to cache. Speedup was from 1ms per Doc to 3ms for 444 Docs (0.00675ms/Doc).	2018-11-22 19:51:07 +01:00
Andreas Huber	fce0f6a04d	use PersistentMap in DataStore Replaces the use of in-memory data structures with the PersistentMap. This is the crucial step in reducing memory usage for both persistent storage and main memory.	2018-11-17 09:45:35 +01:00
Andreas Huber	f2d5c27668	insertion of many values into the persistent map	2018-11-04 10:11:10 +01:00
Andreas Huber	8b48b8c3e7	add a pointer to the root node Previously the offset of the root node was hard-coded. Now the offset of the pointer to the root node is hard-coded. That allows us to replace the root node.	2018-10-27 08:55:15 +02:00
Andreas Huber	c83b6e11e2	Add first part of a persistent map implementation.	2018-10-14 16:47:17 +02:00
Andreas Huber	5343c0d427	reduce memory usage Reduce memory usage by storing the filename as string instead of individual tags.	2018-03-19 19:21:57 +01:00
ahr	829fddf88c	merge key and value arrays we have several hundred thousand of those MiniMaps and this reduces the memory requirement by 8 bytes per instance	2018-03-09 08:40:12 +01:00
ahr	64db4c48a2	add plots for percentiles	2017-11-06 16:57:22 +01:00
Andreas Huber	d4fd25dc4c	replace LinkedHashMap with a more memory efficient implementation This saves approximately 50MB of heap space.	2017-09-30 17:51:02 +02:00
Andreas Huber	f6a9fc2394	propose for an empty query	2017-04-16 10:39:17 +02:00
Andreas Huber	ac1ee20046	replace ludb with data-store LuDB has a few disadvantages. 1. Most notably disk space. H2 wastes a lot of valuable disk space. For my test data set with 44 million entries it is 14 MB (sometimes a lot more; depends on H2 internal cleanup). With data-store it is 15 KB. Overall I could reduce the disk space from 231 MB to 200 MB (13.4 % in this example). That is an average of 4.6 bytes per entry. 2. Speed: a) Liquibase is slow. The first time it takes approx. three seconds. b) Query and insertion: with data-store we can insert entries up to 1.6 times faster. Data-store uses a few tricks to save disk space: 1. We encode the tags into the file names. 2. To keep them short we translate the key/value of the tag into shorter numbers. For example "foo" -> 12 and "bar" -> 47. So the tag "foo"/"bar" would be 12/47. We then translate this number into a numeral system of base 62 (a-zA-Z0-9), so it can be used for file names and it is shorter. That way we only have to store the mapping of string to int. 3. We do that in a simple tab separated file.	2017-04-16 09:07:28 +02:00
Andreas Huber	d1e39513f3	create web application	2016-12-21 17:48:36 +01:00
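
The tag encoding described in commit ac1ee20046 (strings mapped to small integers, integers written in base 62) might look roughly like the sketch below. This is not the project's code: the class `TagEncoder`, its methods, the `-` separator between key and value, and the use of a-z, A-Z, 0-9 as the digit order are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not taken from the perfdb sources.
class TagEncoder {
    // Assumption: the digits follow the order given in the commit message (a-zA-Z0-9).
    private static final String ALPHABET =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    // String-to-int mapping; the commit stores this in a tab separated file.
    private final Map<String, Integer> ids = new HashMap<>();

    // Returns the id of a string, assigning the next free id on first use.
    int idOf(String s) {
        Integer id = ids.get(s);
        if (id == null) {
            id = ids.size();
            ids.put(s, id);
        }
        return id;
    }

    // Writes a non-negative number in base 62, most significant digit first.
    static String toBase62(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("id must not be negative");
        }
        StringBuilder sb = new StringBuilder();
        do {
            sb.append(ALPHABET.charAt(n % 62));
            n /= 62;
        } while (n > 0);
        return sb.reverse().toString();
    }

    // Assumption: key and value are encoded separately and joined with '-'.
    String encodeTag(String key, String value) {
        return toBase62(idOf(key)) + "-" + toBase62(idOf(value));
    }

    public static void main(String[] args) {
        TagEncoder encoder = new TagEncoder();
        System.out.println(encoder.encodeTag("foo", "bar"));
        System.out.println(toBase62(12) + "/" + toBase62(47));
    }
}
```

With this digit order, the ids 12 and 47 from the commit message become the single characters `m` and `V`, which shows why the file names stay short.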

44 Commits
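
Commit 5197063ae3 avoids repeated allocations when building the union of many lists by concatenating them whenever they are already sorted, increasing and non-overlapping. A minimal sketch of that idea follows, assuming the lists are `long[]` arrays that are each strictly increasing. The class `SortedListUnion`, the method `union`, and the sort-and-deduplicate fallback are hypothetical and not taken from the project.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not taken from the perfdb sources.
class SortedListUnion {

    // Assumption: every input array is itself sorted in strictly increasing order.
    static long[] union(List<long[]> lists) {
        int total = 0;
        boolean concatenable = true;
        boolean seenNonEmpty = false;
        long last = 0;

        // One pass to compute the result size and to check whether each list
        // starts after the previous non-empty list ends.
        for (long[] list : lists) {
            total += list.length;
            if (list.length == 0) {
                continue;
            }
            if (seenNonEmpty && list[0] <= last) {
                concatenable = false;
            }
            last = list[list.length - 1];
            seenNonEmpty = true;
        }

        // Single allocation for the concatenated result.
        long[] result = new long[total];
        int pos = 0;
        for (long[] list : lists) {
            System.arraycopy(list, 0, result, pos, list.length);
            pos += list.length;
        }
        if (concatenable) {
            return result;
        }

        // Fallback for overlapping input (an assumption, not the commit's approach):
        // sort everything and drop duplicates.
        Arrays.sort(result);
        return Arrays.stream(result).distinct().toArray();
    }

    public static void main(String[] args) {
        long[] fast = union(Arrays.asList(new long[] {1, 2}, new long[] {3, 4}));
        long[] slow = union(Arrays.asList(new long[] {1, 3}, new long[] {2, 3}));
        System.out.println(Arrays.toString(fast));
        System.out.println(Arrays.toString(slow));
    }
}
```

In the concatenable case the method allocates the result array exactly once, however many input lists there are.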