perfdb

Author	SHA1	Message	Date
Andreas Huber	76e5d441de	rewrite query completion. The old implementation searched for all possible values and then executed a query for each value to see which ones matched. The new implementation uses several indices to find only the matching values.	2019-02-02 15:35:56 +01:00
Andreas Huber	5197063ae3	the union of many small lists is expensive. The reason seems to be the number of memory allocations: to create the union of 100 lists we need 99 allocations. The first needs space for the first two lists, the second space for the first three lists, and so on. We can reduce the number of allocations drastically (in many cases to one) by exploiting the fact that many of the lists are already sorted, non-overlapping and increasing, so that we can simply concatenate them.	2019-01-05 08:52:56 +01:00
Andreas Huber	e537e94d39	HotEntryCache will update Instants only once per second. Calling Instant.now() several hundred thousand times per second can be expensive. In my measurements, >10% of the time spent loading new data was spent calling Instant.now(). Fixed this by storing an Instant in a static member and updating it periodically in a separate thread.	2018-12-21 19:16:55 +01:00
Andreas Huber	73ad27ab96	remove lastAccessMap. In the last commit I added a lastAccessMap to the HotEntryCache. This map made evicting entries much more efficient, but it also made put and get operations much more expensive. Overall, that change led to a 65% decrease in the ingestion performance of the PerformanceDB. Fixed by removing the map again. Eviction has to look at all elements again.	2018-12-21 10:28:34 +01:00
Andreas Huber	afba3b6f77	elements not evicted if new elements are added	2018-12-20 16:13:55 +01:00
Andreas Huber	d67e452a91	cache disk blocks in an LRU cache. Improves read access by a factor of 4 for small trees.	2018-11-24 15:07:37 +01:00
Andreas Huber	9889252205	use only one thread for evictions. Instead of spawning a new thread for every cache, we use a single thread that evicts entries from all caches. The thread keeps weak references to the caches, so that they can still be garbage collected.	2018-11-24 08:32:05 +01:00
Andreas Huber	64771417e4	only iterates over elements when at least one element can be evicted	2018-11-23 07:23:38 +01:00
Andreas Huber	f78f69328b	add cache for docId to Doc mapping. A Doc does not change once it is created, so it is easy to cache. Lookup time went from 1 ms per Doc to 3 ms for 444 Docs (0.00675 ms/Doc).	2018-11-22 19:51:07 +01:00
Andreas Huber	fce0f6a04d	use PersistentMap in DataStore. Replaces the in-memory data structures with the PersistentMap. This is the crucial step in reducing memory usage for both persistent storage and main memory.	2018-11-17 09:45:35 +01:00
Andreas Huber	f2d5c27668	insertion of many values into the persistent map	2018-11-04 10:11:10 +01:00
Andreas Huber	8b48b8c3e7	add a pointer to the root node. Previously the offset of the root node was hard-coded. Now the offset of the pointer to the root node is hard-coded, which allows us to replace the root node.	2018-10-27 08:55:15 +02:00
Andreas Huber	c83b6e11e2	Add first part of a persistent map implementation.	2018-10-14 16:47:17 +02:00
Andreas Huber	5343c0d427	reduce memory usage. Reduce memory usage by storing the filename as a string instead of as individual tags.	2018-03-19 19:21:57 +01:00
ahr	829fddf88c	merge key and value arrays. We have several hundred thousand of those MiniMaps, and this reduces the memory requirement by 8 bytes per instance.	2018-03-09 08:40:12 +01:00
ahr	64db4c48a2	add plots for percentiles	2017-11-06 16:57:22 +01:00
Andreas Huber	d4fd25dc4c	replace LinkedHashMap with a more memory-efficient implementation. This saves approximately 50 MB of heap space.	2017-09-30 17:51:02 +02:00
Andreas Huber	f6a9fc2394	propose completions for an empty query	2017-04-16 10:39:17 +02:00
Andreas Huber	ac1ee20046	replace ludb with data-store. LuDB has a few disadvantages: 1. Most notably, disk space. H2 wastes a lot of valuable disk space. For my test data set with 44 million entries it uses 14 MB (sometimes a lot more, depending on H2's internal cleanup); with data-store it is 15 KB. Overall I could reduce the disk space from 231 MB to 200 MB (13.4 % in this example). That is an average of 4.6 bytes per entry. 2. Speed: a) Liquibase is slow; the first run takes approx. three seconds. b) Query and insertion: with data-store we can insert entries up to 1.6 times faster. Data-store uses a few tricks to save disk space: 1. We encode the tags into the file names. 2. To keep the file names short, we translate the keys and values of the tags into numbers, for example "foo" -> 12 and "bar" -> 47, so the tag "foo"/"bar" becomes 12/47. We then convert these numbers to base 62 (a-zA-Z0-9), which is shorter and can be used in file names. That way we only have to store the mapping from string to int. 3. We store that mapping in a simple tab-separated file.	2017-04-16 09:07:28 +02:00
Andreas Huber	d1e39513f3	create web application	2016-12-21 17:48:36 +01:00

20 Commits
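Commit 5197063ae3 avoids repeated allocations when building the union of many small lists by noticing that sorted, non-overlapping, increasing lists can simply be concatenated. The following is a minimal sketch of that idea, not the project's code: it assumes the lists are `long[]` arrays that are each sorted ascending without duplicates, and the class and method names (`SortedUnion.union`) are hypothetical. It allocates the result array once; if the lists turn out to overlap, it falls back to sorting and de-duplicating that array, a substitute chosen here for brevity rather than the fallback the commit uses.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating the "concatenate instead of merge" idea.
final class SortedUnion {
    private SortedUnion() {}

    /** Union of arrays that are each sorted ascending and duplicate-free. */
    static long[] union(List<long[]> lists) {
        int total = 0;
        boolean concatenable = true;   // stays true while each list starts after the previous one ends
        boolean seenNonEmpty = false;
        long previousLast = 0;
        for (long[] list : lists) {
            total += list.length;
            if (list.length == 0) {
                continue;
            }
            if (seenNonEmpty && list[0] <= previousLast) {
                concatenable = false;  // lists overlap or are out of order
            }
            previousLast = list[list.length - 1];
            seenNonEmpty = true;
        }

        long[] result = new long[total];  // the single allocation for the common case
        int pos = 0;
        for (long[] list : lists) {
            System.arraycopy(list, 0, result, pos, list.length);
            pos += list.length;
        }
        if (concatenable) {
            return result;
        }

        // Fallback (assumption, not the commit's method): sort, then drop duplicates in place.
        Arrays.sort(result);
        int unique = 0;
        for (int i = 0; i < result.length; i++) {
            if (unique == 0 || result[i] != result[unique - 1]) {
                result[unique++] = result[i];
            }
        }
        return Arrays.copyOf(result, unique);
    }
}
```

For example, `{1, 2}`, `{5, 7}` and `{9}` are concatenated directly into `{1, 2, 5, 7, 9}`, while `{1, 5}` and `{2, 5, 8}` overlap and take the fallback path, yielding `{1, 2, 5, 8}`.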
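Commit e537e94d39 replaces frequent `Instant.now()` calls with an `Instant` held in a static member and refreshed by a separate thread. A minimal sketch of such a coarse clock, assuming a one-second refresh period as in the commit subject; the class name `CoarseClock` and the use of a `ScheduledExecutorService` are assumptions, not taken from the project.

```java
import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical coarse clock: readers get a timestamp that is at most about one second old.
final class CoarseClock {
    // volatile, so reader threads see the value written by the updater thread
    private static volatile Instant now = Instant.now();

    private static final ScheduledExecutorService UPDATER =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "coarse-clock-updater");
                thread.setDaemon(true);  // must not keep the JVM alive
                return thread;
            });

    static {
        // Refresh the cached Instant once per second.
        UPDATER.scheduleAtFixedRate(() -> now = Instant.now(), 1, 1, TimeUnit.SECONDS);
    }

    private CoarseClock() {}

    /** Cheap replacement for Instant.now(): a single volatile read. */
    static Instant now() {
        return now;
    }
}
```

The trade-off is precision for speed: every caller within the same second sees the same `Instant`, which is acceptable for cache bookkeeping but not for measuring short durations.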
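Commit d67e452a91 caches disk blocks in an LRU cache. One common way to build a bounded LRU cache in Java is a `LinkedHashMap` in access order that drops its eldest entry once a size limit is exceeded. The sketch below shows that technique; the class name `LruCache` and the idea of keying by block offset are assumptions, and the project may implement its cache differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded LRU cache, e.g. block offset -> block bytes.
final class LruCache<K, V> {
    private final LinkedHashMap<K, V> map;

    LruCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently accessed
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Called after each put; evicts the least recently used entry when over capacity.
                return size() > maxEntries;
            }
        };
    }

    synchronized V get(K key) {
        return map.get(key);  // also marks the entry as most recently used
    }

    synchronized void put(K key, V value) {
        map.put(key, value);
    }

    synchronized int size() {
        return map.size();
    }
}
```

With a capacity of 2, putting `a` and `b`, reading `a`, and then putting `c` evicts `b`, because `a` was used more recently.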
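Commit 9889252205 uses one eviction thread for all caches and holds the caches through weak references so they can still be garbage collected. A minimal sketch under stated assumptions: the `Evictable` interface, the `SharedEvictor` class and the one-second interval are all hypothetical. A `CopyOnWriteArrayList` lets caches register while the thread iterates, and references whose cache has been collected are dropped.

```java
import java.lang.ref.WeakReference;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical interface implemented by every cache that wants periodic eviction.
interface Evictable {
    void evictExpired();
}

// Hypothetical single eviction thread shared by all caches.
final class SharedEvictor {
    // Weak references: registering a cache does not keep it alive.
    private static final List<WeakReference<Evictable>> CACHES = new CopyOnWriteArrayList<>();

    static {
        Thread thread = new Thread(SharedEvictor::run, "cache-evictor");
        thread.setDaemon(true);
        thread.start();
    }

    private SharedEvictor() {}

    static void register(Evictable cache) {
        CACHES.add(new WeakReference<>(cache));
    }

    /** One eviction pass over all live caches. */
    static void evictAll() {
        for (WeakReference<Evictable> ref : CACHES) {  // iterates over a snapshot
            Evictable cache = ref.get();
            if (cache == null) {
                CACHES.remove(ref);  // the cache was garbage collected
            } else {
                cache.evictExpired();
            }
        }
    }

    private static void run() {
        while (true) {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                return;
            }
            evictAll();
        }
    }
}
```

Compared with one thread per cache, the thread count stays constant no matter how many caches are created.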
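Commits 73ad27ab96 and 64771417e4 describe an eviction pass that has to look at all elements, but only iterates when at least one element can be evicted. The commits do not say how that check is made; one cheap way, shown below as an assumption, is to remember the earliest expiry time of all entries and skip the scan while the current time is before it. The class `ExpiringCache`, the expiry time fixed at insertion, and passing timestamps explicitly (which could come from a coarse clock) are all hypothetical.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical cache whose eviction scans all entries, but only when something can expire.
final class ExpiringCache<K, V> {
    private static final class Slot<T> {
        final T value;
        final long expiresAtMillis;

        Slot(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Slot<V>> entries = new HashMap<>();
    private final long ttlMillis;
    // Lower bound on the expiry time of all entries; Long.MAX_VALUE when nothing can expire.
    private long earliestExpiry = Long.MAX_VALUE;

    ExpiringCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    synchronized void put(K key, V value, long nowMillis) {
        long expiresAt = nowMillis + ttlMillis;
        entries.put(key, new Slot<>(value, expiresAt));
        earliestExpiry = Math.min(earliestExpiry, expiresAt);
    }

    /** Returns the cached value; expired entries stay visible until the next evict call. */
    synchronized V get(K key) {
        Slot<V> slot = entries.get(key);
        return slot == null ? null : slot.value;
    }

    /** Removes expired entries and returns how many were removed. */
    synchronized int evict(long nowMillis) {
        if (nowMillis < earliestExpiry) {
            return 0;  // no entry can be expired yet: skip the full scan
        }
        int evicted = 0;
        long newEarliest = Long.MAX_VALUE;
        Iterator<Slot<V>> it = entries.values().iterator();
        while (it.hasNext()) {
            Slot<V> slot = it.next();
            if (slot.expiresAtMillis <= nowMillis) {
                it.remove();
                evicted++;
            } else {
                newEarliest = Math.min(newEarliest, slot.expiresAtMillis);
            }
        }
        earliestExpiry = newEarliest;  // recomputed exactly after each scan
        return evicted;
    }
}
```

The bound may be too early (for example after a key is overwritten with a later expiry); that only causes an extra scan, never a missed eviction. Unlike the removed lastAccessMap, `put` and `get` pay only for one `Math.min`.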
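Commit f78f69328b notes that a Doc never changes once created, which is what makes caching it easy: a cached entry can never become stale, so no invalidation is needed. A minimal sketch, assuming a hypothetical `DocCache` that wraps some loader function from docId to Doc; it uses `ConcurrentHashMap.computeIfAbsent` so each id is loaded at most once even with concurrent readers. It is unbounded, which a real implementation might combine with one of the eviction strategies from the other commits.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongFunction;

// Hypothetical read-through cache for immutable values keyed by docId.
final class DocCache<D> {
    private final Map<Long, D> cache = new ConcurrentHashMap<>();
    private final LongFunction<D> loader;  // e.g. reads the Doc from disk (assumption)

    DocCache(LongFunction<D> loader) {
        this.loader = loader;
    }

    D get(long docId) {
        // Loads on the first request only; later requests return the cached instance.
        return cache.computeIfAbsent(docId, id -> loader.apply(id));
    }
}
```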
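Commit 8b48b8c3e7 moves from a hard-coded root node offset to a hard-coded offset of a pointer to the root node. The sketch below illustrates that layout in memory: the first 8 bytes hold the root offset, nodes are appended after them, and replacing the root means writing a new node and then updating the pointer. The class `RootPointerFile`, the 8-byte header, and the append-only allocation are assumptions made for the illustration; a `ByteBuffer` stands in for the file.

```java
import java.nio.ByteBuffer;

// Hypothetical storage layout: [root pointer (8 bytes)][node][node]...
final class RootPointerFile {
    private static final int ROOT_POINTER_OFFSET = 0;  // the only hard-coded offset
    private static final int HEADER_SIZE = Long.BYTES;

    private final ByteBuffer file;
    private int nextFree = HEADER_SIZE;  // nodes are appended after the header

    RootPointerFile(int capacity) {
        this.file = ByteBuffer.allocate(capacity);
    }

    /** Writes a node at the end and returns its offset. */
    long appendNode(byte[] node) {
        int offset = nextFree;
        ByteBuffer view = file.duplicate();  // shares content, has its own position
        view.position(offset);
        view.put(node);
        nextFree += node.length;
        return offset;
    }

    byte[] readNode(long offset, int length) {
        byte[] out = new byte[length];
        ByteBuffer view = file.duplicate();
        view.position((int) offset);
        view.get(out);
        return out;
    }

    long rootOffset() {
        return file.getLong(ROOT_POINTER_OFFSET);
    }

    /** Replacing the root only rewrites the pointer; the new root may live anywhere. */
    void setRoot(long newRootOffset) {
        file.putLong(ROOT_POINTER_OFFSET, newRootOffset);
    }
}
```

Readers always start at the fixed pointer, so a tree can grow a new root (for example after a root split) without moving any existing node.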
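Commit 829fddf88c merges the key and value arrays of the MiniMap. A common layout for that is a single `Object[]` with keys at even and values at odd indices, so each instance holds one array instead of two. The sketch below shows that layout with a linear scan, which suits very small maps; it is an assumption about the approach, not the project's MiniMap, and it makes no claim about the exact bytes saved.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical small map storing [k0, v0, k1, v1, ...] in one array.
final class MiniMap<K, V> {
    private Object[] keysAndValues = new Object[0];

    @SuppressWarnings("unchecked")
    V get(K key) {
        for (int i = 0; i < keysAndValues.length; i += 2) {
            if (Objects.equals(keysAndValues[i], key)) {
                return (V) keysAndValues[i + 1];
            }
        }
        return null;
    }

    /** Inserts or replaces a mapping and returns the previous value, or null. */
    V put(K key, V value) {
        for (int i = 0; i < keysAndValues.length; i += 2) {
            if (Objects.equals(keysAndValues[i], key)) {
                @SuppressWarnings("unchecked")
                V old = (V) keysAndValues[i + 1];
                keysAndValues[i + 1] = value;
                return old;
            }
        }
        // Grow by exactly one key/value pair to keep the array minimal.
        int n = keysAndValues.length;
        keysAndValues = Arrays.copyOf(keysAndValues, n + 2);
        keysAndValues[n] = key;
        keysAndValues[n + 1] = value;
        return null;
    }

    int size() {
        return keysAndValues.length / 2;
    }
}
```

Growing by one pair per insertion costs a copy each time, which is cheap for maps with a handful of entries and keeps no spare capacity around.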
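Commit ac1ee20046 maps tag keys and values to numbers and writes those numbers in base 62 using the digits a-zA-Z0-9. A minimal sketch of both steps; the class `TagEncoder`, assigning ids in insertion order, and sharing one dictionary for keys and values are assumptions. The digit order follows the commit message, so 0 is `a`, 26 is `A` and 52 is `0`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical string-to-int dictionary plus base-62 encoding for file names.
final class TagEncoder {
    // Digit order a-z, A-Z, 0-9 as described in the commit message.
    private static final String DIGITS =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    private final Map<String, Integer> ids = new HashMap<>();

    /** Returns the number assigned to s, assigning the next free number on first use. */
    int idOf(String s) {
        Integer id = ids.get(s);
        if (id == null) {
            id = ids.size();
            ids.put(s, id);
        }
        return id;
    }

    static String toBase62(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("negative: " + n);
        }
        if (n == 0) {
            return String.valueOf(DIGITS.charAt(0));
        }
        StringBuilder sb = new StringBuilder();
        while (n > 0) {
            sb.append(DIGITS.charAt(n % 62));  // least significant digit first
            n /= 62;
        }
        return sb.reverse().toString();
    }

    static int fromBase62(String s) {
        int n = 0;
        for (char c : s.toCharArray()) {
            int digit = DIGITS.indexOf(c);
            if (digit < 0) {
                throw new IllegalArgumentException("not a base-62 digit: " + c);
            }
            n = n * 62 + digit;
        }
        return n;
    }
}
```

With the numbers from the commit message, 12 encodes to `m` and 47 to `V`, so the tag "foo"/"bar" would appear as `m`/`V` in a file name; 62 needs two digits and becomes `ba`. The dictionary itself is what would be persisted in the tab-separated file.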