Commit Graph

16 Commits

Author SHA1 Message Date
afba3b6f77 elements not evicted if new elements are added 2018-12-20 16:13:55 +01:00
d67e452a91 cache disk blocks in an LRU cache
Improves read access by a factor of 4 for small trees.
2018-11-24 15:07:37 +01:00
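A common way to build an LRU cache for disk blocks in Java is a LinkedHashMap in access order. The sketch below assumes blocks are keyed by their file offset; the class name BlockCache and the byte[] block representation are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache for disk blocks, keyed by block offset.
// With accessOrder=true, every get/put moves the entry to the tail,
// so the head (eldest) entry is always the least recently used block.
class BlockCache extends LinkedHashMap<Long, byte[]> {
    private final int maxBlocks;

    BlockCache(int maxBlocks) {
        super(16, 0.75f, true); // true = access order instead of insertion order
        this.maxBlocks = maxBlocks;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
        // Called after each insertion; returning true evicts the LRU block.
        return size() > maxBlocks;
    }
}
```

LinkedHashMap is not thread-safe, so a cache shared between threads would need external synchronization.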
9889252205 use only one thread for evictions
Instead of spawning a new thread for every cache, we use a single thread
that will evict entries from all caches.
The thread keeps a weak reference to the caches, so that they can be
garbage collected.
2018-11-24 08:32:05 +01:00
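A single eviction thread that holds only weak references to the caches could look roughly like this. It is a sketch under assumptions: the Evictable interface, SharedEvictor and the fixed-rate schedule are hypothetical names and choices, not taken from the commit.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical interface for anything that can drop stale entries.
interface Evictable {
    void evictExpired();
}

// One shared evictor for all caches. It keeps only weak references,
// so a cache that is unused elsewhere can still be garbage collected.
final class SharedEvictor {
    private final List<WeakReference<Evictable>> caches = new ArrayList<>();

    synchronized void register(Evictable cache) {
        caches.add(new WeakReference<>(cache));
    }

    // One pass over all live caches; cleared references are removed.
    synchronized void evictOnce() {
        Iterator<WeakReference<Evictable>> it = caches.iterator();
        while (it.hasNext()) {
            Evictable cache = it.next().get();
            if (cache == null) {
                it.remove(); // the cache was garbage collected
            } else {
                cache.evictExpired();
            }
        }
    }

    synchronized int registeredCount() {
        return caches.size();
    }

    // Starts the single background thread that calls evictOnce periodically.
    ScheduledExecutorService start(long periodMillis) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "cache-evictor");
            t.setDaemon(true); // do not keep the JVM alive just for eviction
            return t;
        });
        executor.scheduleAtFixedRate(this::evictOnce, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return executor;
    }
}
```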
64771417e4 only iterates over elements when at least one element can be evicted 2018-11-23 07:23:38 +01:00
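One way to skip the iteration when nothing can be evicted is to remember the earliest expiry time of all entries. The sketch assumes time-based expiry with a fixed TTL per entry; ExpiringCache and its fields are hypothetical.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical time-based cache that tracks the earliest expiry of all
// entries, so an eviction pass returns immediately if nothing is expired.
final class ExpiringCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAt;
        Entry(V value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlMillis;
    // May be lower than the true minimum after an overwrite; that only
    // causes one unnecessary scan, which then recomputes the exact value.
    private long earliestExpiry = Long.MAX_VALUE;
    int scans = 0; // number of full iterations, for illustration only

    ExpiringCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    synchronized void put(K key, V value, long nowMillis) {
        long expiresAt = nowMillis + ttlMillis;
        entries.put(key, new Entry<>(value, expiresAt));
        earliestExpiry = Math.min(earliestExpiry, expiresAt);
    }

    synchronized V get(K key) {
        Entry<V> e = entries.get(key);
        return e == null ? null : e.value;
    }

    synchronized void evictExpired(long nowMillis) {
        if (nowMillis < earliestExpiry) {
            return; // no entry can be expired yet, so skip the iteration
        }
        scans++;
        long nextEarliest = Long.MAX_VALUE;
        Iterator<Entry<V>> it = entries.values().iterator();
        while (it.hasNext()) {
            Entry<V> e = it.next();
            if (e.expiresAt <= nowMillis) {
                it.remove();
            } else {
                nextEarliest = Math.min(nextEarliest, e.expiresAt);
            }
        }
        earliestExpiry = nextEarliest;
    }

    synchronized int size() { return entries.size(); }
}
```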
f78f69328b add cache for docId to Doc mapping
A Doc does not change once it is created, so it is easy to cache.
The time went down from 1 ms per Doc to 3 ms for 444 Docs (0.00675 ms/Doc).
2018-11-22 19:51:07 +01:00
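Because a Doc never changes, a read-through cache needs no invalidation. A minimal sketch, assuming a loader function for the uncached lookup; the Doc stand-in, DocCache and the loader are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongFunction;

// Stand-in for the real Doc type: all fields final, so it never changes.
final class Doc {
    final long id;
    final String content;
    Doc(long id, String content) { this.id = id; this.content = content; }
}

// Hypothetical read-through cache for the docId -> Doc mapping.
// A cached Doc can never become stale, so there is no invalidation.
final class DocCache {
    private final Map<Long, Doc> cache = new ConcurrentHashMap<>();
    private final LongFunction<Doc> loader; // e.g. a lookup in the data store

    DocCache(LongFunction<Doc> loader) { this.loader = loader; }

    Doc get(long docId) {
        // The loader runs only on the first request for a given id.
        return cache.computeIfAbsent(docId, id -> loader.apply(id));
    }
}
```

This cache is unbounded; if the number of Docs is large, it could be combined with a size limit.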
fce0f6a04d use PersistentMap in DataStore
Replaces the use of in-memory data structures with the PersistentMap.
This is the crucial step in reducing memory usage for both persistent
storage and main memory.
2018-11-17 09:45:35 +01:00
f2d5c27668 insertion of many values into the persistent map 2018-11-04 10:11:10 +01:00
8b48b8c3e7 add a pointer to the root node
Previously, the offset of the root node was hard-coded.
Now the offset of the pointer to the root node is hard-coded.
That allows us to replace the root node.
2018-10-27 08:55:15 +02:00
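The indirection through a fixed root pointer can be sketched with a ByteBuffer standing in for the file. The layout (an 8-byte pointer at offset 0, nodes appended after it) and all names are assumptions for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical layout: the first 8 bytes hold the offset of the current root.
// Readers always follow this pointer, so replacing the root only means
// writing a new node elsewhere and then updating the pointer.
final class RootPointerFile {
    static final int ROOT_POINTER_OFFSET = 0; // the only hard-coded position
    private final ByteBuffer file;
    private int nextFree = 8; // first free byte after the pointer

    RootPointerFile(int capacity) { this.file = ByteBuffer.allocate(capacity); }

    // Appends a node (here just raw bytes) and returns its offset.
    int appendNode(byte[] node) {
        int offset = nextFree;
        file.position(offset);
        file.put(node);
        nextFree += node.length;
        return offset;
    }

    void setRoot(long nodeOffset) { file.putLong(ROOT_POINTER_OFFSET, nodeOffset); }

    long rootOffset() { return file.getLong(ROOT_POINTER_OFFSET); }

    byte readByte(long offset) { return file.get((int) offset); }
}
```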
c83b6e11e2 Add first part of a persistent map implementation. 2018-10-14 16:47:17 +02:00
5343c0d427 reduce memory usage
Reduce memory usage by storing the filename as string instead of
individual tags.
2018-03-19 19:21:57 +01:00
ahr
829fddf88c merge key and value arrays
We have several hundred thousand of these MiniMaps, and this reduces
the memory requirement by 8 bytes per instance.
2018-03-09 08:40:12 +01:00
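Merging the key and value arrays means storing them interleaved in one array, so each instance needs one array reference instead of two. A sketch assuming a linear-scan map for very few entries; this MiniMap is a hypothetical reconstruction, not the project's class.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical small map storing keys and values interleaved in one
// array: [k0, v0, k1, v1, ...], instead of separate keys[] and values[].
final class MiniMap<K, V> {
    private Object[] keysAndValues = new Object[0];

    @SuppressWarnings("unchecked")
    V get(K key) {
        for (int i = 0; i < keysAndValues.length; i += 2) {
            if (Objects.equals(keysAndValues[i], key)) {
                return (V) keysAndValues[i + 1];
            }
        }
        return null;
    }

    void put(K key, V value) {
        for (int i = 0; i < keysAndValues.length; i += 2) {
            if (Objects.equals(keysAndValues[i], key)) {
                keysAndValues[i + 1] = value; // overwrite existing value
                return;
            }
        }
        // Grow by exactly one pair: trades insert speed for minimal memory.
        Object[] grown = Arrays.copyOf(keysAndValues, keysAndValues.length + 2);
        grown[keysAndValues.length] = key;
        grown[keysAndValues.length + 1] = value;
        keysAndValues = grown;
    }

    int size() { return keysAndValues.length / 2; }
}
```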
ahr
64db4c48a2 add plots for percentiles 2017-11-06 16:57:22 +01:00
d4fd25dc4c replace LinkedHashMap with a more memory efficient implementation
This saves approximately 50MB of heap space.
2017-09-30 17:51:02 +02:00
f6a9fc2394 propose for an empty query 2017-04-16 10:39:17 +02:00
ac1ee20046 replace ludb with data-store
LuDB has a few disadvantages.
  1. Most notably, disk space. H2 wastes a lot of valuable disk space.
     For my test data set with 44 million entries, H2 takes 14 MB
     (sometimes a lot more, depending on H2's internal cleanup). With
     data-store it is 15 KB.
     Overall, I could reduce the disk space from 231 MB to 200 MB (13.4 %
     in this example). That is an average of 4.6 bytes per entry.
  2. Speed:
     a) Liquibase is slow. The first time, it takes approx. three seconds.
     b) Query and insertion: with data-store we can insert entries
        up to 1.6 times faster.

Data-store uses a few tricks to save disk space:
  1. We encode the tags into the file names.
  2. To keep them short, we translate the keys and values of the tags into
     short numbers. For example, "foo" -> 12 and "bar" -> 47, so the
     tag "foo"/"bar" becomes 12/47.
     We then write these numbers in base 62 (a-zA-Z0-9), which is valid
     in file names and shorter than decimal.
     That way we only have to store the mapping of string to int.
  3. We store that mapping in a simple tab-separated file.
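The three tricks can be sketched together: a string-to-int dictionary, base-62 encoding with the alphabet a-zA-Z0-9, and one tab-separated line per dictionary entry. The class TagEncoder, the "-" separator between key and value, and the order of digits in the alphabet are assumptions; the sketch also assumes strings contain no tabs or newlines.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical encoder: string -> int dictionary plus base-62 file-name parts.
final class TagEncoder {
    private static final String ALPHABET =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    // LinkedHashMap keeps insertion order, so ids and TSV lines line up.
    private final Map<String, Integer> ids = new LinkedHashMap<>();

    // Returns the id for a string, assigning the next free id on first use.
    int idOf(String s) {
        Integer id = ids.get(s);
        if (id == null) {
            id = ids.size();
            ids.put(s, id);
        }
        return id;
    }

    // Encodes a non-negative int in base 62.
    static String toBase62(int n) {
        if (n == 0) {
            return String.valueOf(ALPHABET.charAt(0));
        }
        StringBuilder sb = new StringBuilder();
        while (n > 0) {
            sb.append(ALPHABET.charAt(n % 62));
            n /= 62;
        }
        return sb.reverse().toString();
    }

    // File-name fragment for one tag; the "-" separator is an assumption.
    String encodeTag(String key, String value) {
        return toBase62(idOf(key)) + "-" + toBase62(idOf(value));
    }

    // The dictionary as tab-separated lines: "<string>\t<id>".
    List<String> toTsvLines() {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : ids.entrySet()) {
            lines.add(e.getKey() + "\t" + e.getValue());
        }
        return lines;
    }
}
```

With this alphabet, the ids 12 and 47 from the example become "m" and "V".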
2017-04-16 09:07:28 +02:00
d1e39513f3 create web application 2016-12-21 17:48:36 +01:00