Commit Graph

44 Commits

Author SHA1 Message Date
10155f9cdb use special enum for DateBucket units
Preparation step for having custom intervals.
2020-09-27 17:06:27 +02:00
b8f77dc9a6 replace custom timezones with UTC
we are only using UTC
2020-09-27 08:21:24 +02:00
273c019df1 remove unused class RuntimeExcecutionException 2020-09-27 07:53:52 +02:00
e06f4175a3 illegal state exception when using interval 'year' 2020-09-20 09:11:45 +02:00
45f9e36a88 cleanup 2020-04-05 09:50:26 +02:00
50f555d23c add interval splitting for bar charts 2020-04-05 08:14:09 +02:00
75391f21ff extract code from DateIndexExtension to LongToDateBucket
Making it possible to reuse the code to sort timestamps
into date based buckets.
2020-04-03 19:46:08 +02:00
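As an illustration of sorting timestamps into date-based buckets (75391f21ff), here is a minimal sketch. The class name `DateBucketSketch`, the monthly granularity and the `uuuuMM` bucket format are assumptions made for this example and are not taken from the actual LongToDateBucket class.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch: maps an epoch-millisecond timestamp to a monthly bucket name.
final class DateBucketSketch {

    // Assumption: buckets are months, named like "202004", always computed in UTC.
    private static final DateTimeFormatter MONTH_BUCKET =
            DateTimeFormatter.ofPattern("uuuuMM").withZone(ZoneOffset.UTC);

    private DateBucketSketch() {
    }

    static String toBucket(final long epochMilli) {
        return MONTH_BUCKET.format(Instant.ofEpochMilli(epochMilli));
    }
}
```

Timestamps that map to the same string belong to the same bucket, so the result can be used as a grouping key.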
00ba4d2a69 add support for renaming and post processing of csv columns 2019-12-14 18:11:59 +01:00
07ad62ddd9 use Junit5 instead of TestNG
We want to be able to use @SpringBootTest tests that fully initialize
the Spring application. This is much easier done with Junit than TestNG.
Gradle does not support (at least not easily) running both Junit and
TestNG tests. Therefore we switch to Junit for all tests.
The original reason for using TestNG was that Junit didn't support
data providers. But that finally changed in Junit5 with
ParameterizedTest.
2019-12-13 14:33:20 +01:00
e931856041 merge projects file-utils, byte-utils and pdb-utils
It turned out that most projects needed at least
two of the utils projects. file-utils and byte-utils
had only one class. Merging them made sense.
2019-12-08 18:47:54 +01:00
06b379494f apply new code formatter and save action 2019-11-24 10:20:43 +01:00
e2a33ac6e2 make the code that determines which axis to use explicit
In the previous changeset the code that determined
which axis the plots used was implemented as a
side effect of getting the Gnuplot definition of
an axis.
Changed that to an explicit update call with simpler
logic.
2019-11-24 09:08:36 +01:00
892d5a6d08 automatically determine which axis a plot needs 2019-11-24 08:18:52 +01:00
82a961dbaf move definition of x-axis to the aggregate handlers 2019-11-23 14:28:18 +01:00
4367323fcd replace deprecated dependency configurations
Using api and implementation instead of the
deprecated compile configuration.

Update to Gradle 6.0.
2019-11-10 11:08:50 +01:00
2f35978184 fetch available values for gallery via autocomplete method
We had a method that returned the values of a field
with respect to a query. That method was inefficient,
because it executed the query, fetched all Docs
and collected the values.
The autocomplete method we introduced a while back
can answer the same question but much more efficiently.
2019-08-25 18:52:05 +02:00
3a7688d1ae remember next eviction time and skip eviction 2019-08-24 19:39:59 +02:00
6eaf4e10fc add maxSize parameter to HotEntryCache 2019-08-24 19:24:20 +02:00
00c20dae6b use long instead of Instant for time
Working with longs is faster and takes up less
cache space. Space in the L1/L2/L3 caches is precious.
2019-08-19 18:58:24 +02:00
feda901f6d remove event types
We only have removal events. The additional complexity
of having a generic interface for many different event
types does not pay off.
2019-08-18 20:30:25 +02:00
4d9ea6d2a8 switch back to my own HotEntryCache implementation
Guava's cache does not evict elements reliably by
time. If you configure a cache with a lifetime of n
seconds, you cannot expect an element to actually be
evicted after n seconds.
2019-08-18 20:14:14 +02:00
0dc908c79c show the removal listener of HotEntryCache is not called on expire 2019-08-18 09:27:18 +02:00
92a47d9b56 remove TagsToFile
Remove one layer of abstraction by moving the code into the DataStore.
2019-02-16 16:06:46 +01:00
117ef4ea34 use guava's cache as implementation for the HotEntryCache
My own implementation was faster, but was not able to
implement a size limitation.
2019-02-16 10:23:52 +01:00
76e5d441de rewrite query completion
The old implementation searched for all possible values and then
executed each query to see what matches.
The new implementation uses several indices to find only
the matching values.
2019-02-02 15:35:56 +01:00
5197063ae3 the union of many small lists is expensive
The reason seems to be the number of memory allocations. In order
to create the union of 100 lists we have 99 memory allocations.
The first needs the space for the first two lists, the second the
space for the first three lists, and so on.

We can reduce the number of allocations drastically (in many
cases to one) by leveraging the fact that many of the lists
were already sorted, non-overlapping and increasing, so that
we can simply concatenate them.
2019-01-05 08:52:56 +01:00
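The idea in 5197063ae3 can be sketched as follows. This is a minimal illustration and not the project's code: the class `SortedUnion`, the use of `long[]` for the lists and the sort-based fallback are assumptions. Each input list is assumed to be sorted internally.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: union of sorted lists with a single-allocation fast path.
final class SortedUnion {

    private SortedUnion() {
    }

    // True if every non-empty list starts after the previous non-empty list ended.
    static boolean canConcatenate(final List<long[]> lists) {
        boolean havePrevious = false;
        long previousLast = 0;
        for (final long[] list : lists) {
            if (list.length == 0) {
                continue;
            }
            if (havePrevious && list[0] <= previousLast) {
                return false;
            }
            previousLast = list[list.length - 1];
            havePrevious = true;
        }
        return true;
    }

    static long[] union(final List<long[]> lists) {
        // One allocation that is large enough for all lists.
        int total = 0;
        for (final long[] list : lists) {
            total += list.length;
        }
        final long[] all = new long[total];
        int position = 0;
        for (final long[] list : lists) {
            System.arraycopy(list, 0, all, position, list.length);
            position += list.length;
        }
        if (canConcatenate(lists)) {
            return all; // fast path: the concatenation already is the union
        }
        // Fallback (assumption): sort and drop duplicates when lists overlap.
        return Arrays.stream(all).sorted().distinct().toArray();
    }
}
```

In the fast path the result is built with one allocation instead of one allocation per pairwise union.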
e537e94d39 HotEntryCache will update Instants only once per second
Calling Instant.now() several hundred thousand times per
second can be expensive. In my measurements >10% of the
time spent when loading new data was spent calling
Instant.now().
Fixed this by storing an Instant as static member and
updating it periodically in a separate thread.
2018-12-21 19:16:55 +01:00
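A minimal sketch of the approach described in e537e94d39, storing an Instant in a static member and refreshing it from a separate thread. The class name `CoarseClock`, the one-second period and the use of a daemon `ScheduledExecutorService` are assumptions for this example; the actual HotEntryCache code may differ.

```java
import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a clock that is at most about one second stale but cheap to read.
final class CoarseClock {

    // volatile, so readers see the value written by the updater thread
    private static volatile Instant now = Instant.now();

    private static final ScheduledExecutorService UPDATER =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                final Thread thread = new Thread(runnable, "coarse-clock-updater");
                thread.setDaemon(true); // do not keep the JVM alive
                return thread;
            });

    static {
        // Assumption: refresh once per second.
        UPDATER.scheduleAtFixedRate(() -> now = Instant.now(), 1, 1, TimeUnit.SECONDS);
    }

    private CoarseClock() {
    }

    // Reads a field instead of calling Instant.now() on every access.
    static Instant now() {
        return now;
    }
}
```

The trade-off is precision: callers get a time that can lag behind the real clock by up to the refresh period.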
73ad27ab96 remove lastAccessMap
In the last commit I added a lastAccessMap to the HotEntryCache.
This map made it much more efficient to evict entries. But it
also made put and get operations much more expensive. Overall
that change led to a 65% decrease in ingestion performance of
the PerformanceDB.
Fixed by removing the map again. Eviction has to look at all
elements again.
2018-12-21 10:28:34 +01:00
afba3b6f77 elements not evicted if new elements are added 2018-12-20 16:13:55 +01:00
d67e452a91 cache disk blocks in an LRU cache
Improves read access by factor 4 for small trees.
2018-11-24 15:07:37 +01:00
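One common way to build an LRU cache in Java is an access-ordered `LinkedHashMap`. The sketch below shows that idiom for disk blocks (d67e452a91). Whether the project uses this idiom is not stated in the commit, and the class name `LruBlockCache` and the key/value types (block offset to block bytes) are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: LRU cache for disk blocks, keyed by block offset.
final class LruBlockCache extends LinkedHashMap<Long, byte[]> {

    private static final long serialVersionUID = 1L;

    private final int maxBlocks;

    LruBlockCache(final int maxBlocks) {
        // accessOrder = true: iteration order goes from least to most recently used
        super(16, 0.75f, true);
        this.maxBlocks = maxBlocks;
    }

    @Override
    protected boolean removeEldestEntry(final Map.Entry<Long, byte[]> eldest) {
        // Called after each insertion; drops the least recently used block when full.
        return size() > maxBlocks;
    }
}
```

This class is not thread-safe; concurrent readers would need external synchronization.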
9889252205 use only one thread for evictions
Instead of spawning a new thread for every cache, we use a single thread
that will evict entries from all caches.
The thread keeps a weak reference to the caches, so that they can be
garbage collected.
2018-11-24 08:32:05 +01:00
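The single eviction thread with weak references (9889252205) could look roughly like the sketch below. The names `SharedEvictor`, `Evictable` and `evictExpired`, as well as the one-second interval, are assumptions made for this example.

```java
import java.lang.ref.WeakReference;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch: one daemon thread evicts entries from all registered caches.
final class SharedEvictor {

    interface Evictable {
        void evictExpired();
    }

    // Weak references, so registering a cache does not prevent its garbage collection.
    private static final List<WeakReference<Evictable>> CACHES = new CopyOnWriteArrayList<>();

    static {
        final Thread thread = new Thread(SharedEvictor::loop, "cache-evictor");
        thread.setDaemon(true);
        thread.start();
    }

    private SharedEvictor() {
    }

    static void register(final Evictable cache) {
        CACHES.add(new WeakReference<>(cache));
    }

    static void evictAll() {
        // CopyOnWriteArrayList iterates over a snapshot, so removing here is safe.
        for (final WeakReference<Evictable> reference : CACHES) {
            final Evictable cache = reference.get();
            if (cache == null) {
                CACHES.remove(reference); // cache was garbage collected
            } else {
                cache.evictExpired();
            }
        }
    }

    private static void loop() {
        while (true) {
            try {
                Thread.sleep(1000); // assumption: evict once per second
            } catch (final InterruptedException e) {
                return;
            }
            evictAll();
        }
    }
}
```

A cache would call `SharedEvictor.register(this)` once, for example in its constructor, instead of starting its own thread.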
64771417e4 only iterates over elements when at least one element can be evicted 2018-11-23 07:23:38 +01:00
f78f69328b add cache for docId to Doc mapping
A Doc does not change once it is created, so it is easy to cache.
Speedup was from 1ms per Doc to 3ms for 444 Docs (0.00675ms/Doc).
2018-11-22 19:51:07 +01:00
fce0f6a04d use PersistentMap in DataStore
Replaces the use of in-memory data structures with the PersistentMap.
This is the crucial step in reducing memory usage for both persistent
storage and main memory.
2018-11-17 09:45:35 +01:00
f2d5c27668 insertion of many values into the persistent map 2018-11-04 10:11:10 +01:00
8b48b8c3e7 add a pointer to the root node
Before, the offset of the root node was hard-coded.
Now the offset of the pointer to the root node is hard-coded.
That allows us to replace the root node.
2018-10-27 08:55:15 +02:00
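The indirection introduced in 8b48b8c3e7 can be illustrated with a small sketch. The class `RootPointerSketch`, the pointer position 0 and the use of a `ByteBuffer` with 8-byte offsets are assumptions; the actual file layout of the persistent map is not described in the commit.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: a fixed location stores the offset of the root node.
final class RootPointerSketch {

    // Assumption: the pointer lives at position 0 and is an 8-byte offset.
    static final int ROOT_POINTER_OFFSET = 0;

    private final ByteBuffer storage;

    RootPointerSketch(final ByteBuffer storage) {
        this.storage = storage;
    }

    // Only the location of the pointer is hard-coded, not the root node itself.
    long rootOffset() {
        return storage.getLong(ROOT_POINTER_OFFSET);
    }

    // Replacing the root node only requires rewriting the pointer.
    void replaceRoot(final long newRootOffset) {
        storage.putLong(ROOT_POINTER_OFFSET, newRootOffset);
    }
}
```

With this layout a new root node can be written anywhere in the file and then made current by updating the pointer.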
c83b6e11e2 Add first part of a persistent map implementation. 2018-10-14 16:47:17 +02:00
5343c0d427 reduce memory usage
Reduce memory usage by storing the filename as string instead of
individual tags.
2018-03-19 19:21:57 +01:00
ahr
829fddf88c merge key and value arrays
we have several hundred thousand of those MiniMaps and this reduces
the memory requirement by 8 bytes per instance
2018-03-09 08:40:12 +01:00
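Merging the key and value arrays (829fddf88c) could be done by interleaving keys and values in a single array, which removes one array reference per instance. The sketch below shows that layout; the class name `InterleavedMiniMap`, the `Object` element type and the linear search are assumptions and not the actual MiniMap implementation.

```java
import java.util.Arrays;

// Hypothetical sketch: a tiny map that keeps keys and values in one array.
final class InterleavedMiniMap {

    // Layout: [key0, value0, key1, value1, ...]; keys are assumed to be non-null.
    private Object[] entries = new Object[0];

    Object get(final Object key) {
        for (int i = 0; i < entries.length; i += 2) {
            if (entries[i].equals(key)) {
                return entries[i + 1];
            }
        }
        return null;
    }

    void put(final Object key, final Object value) {
        for (int i = 0; i < entries.length; i += 2) {
            if (entries[i].equals(key)) {
                entries[i + 1] = value; // overwrite existing mapping
                return;
            }
        }
        // Grow by exactly one key/value pair to avoid unused slots.
        entries = Arrays.copyOf(entries, entries.length + 2);
        entries[entries.length - 2] = key;
        entries[entries.length - 1] = value;
    }

    int size() {
        return entries.length / 2;
    }
}
```

Linear search is only reasonable because such maps are expected to hold very few entries.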
ahr
64db4c48a2 add plots for percentiles 2017-11-06 16:57:22 +01:00
d4fd25dc4c replace LinkedHashMap with a more memory efficient implementation
This saves approximately 50MB of heap space.
2017-09-30 17:51:02 +02:00
f6a9fc2394 propose for an empty query 2017-04-16 10:39:17 +02:00
ac1ee20046 replace ludb with data-store
LuDB has a few disadvantages. 
  1. Most notably disk space. H2 wastes a lot of valuable disk space.
     For my test data set with 44 million entries it is 14 MB 
     (sometimes a lot more; depends on H2 internal cleanup). With 
     data-store it is 15 KB.
     Overall I could reduce the disk space from 231 MB to 200 MB (13.4 %
     in this example). That is an average of 4.6 bytes per entry.
  2. Speed:
     a) Liquibase is slow. The first time it takes approx. three seconds.
     b) Query and insertion. With data-store we can insert entries
        up to 1.6 times faster.

Data-store uses a few tricks to save disk space:
  1. We encode the tags into the file names.
  2. To keep them short we translate the key/value of the tag into 
     shorter numbers. For example "foo" -> 12 and "bar" -> 47. So the
     tag "foo"/"bar" would be 12/47. 
     We then translate this number into a numeral system of base 62
     (a-zA-Z0-9), so it can be used for file names and it is shorter.
     That way we only have to store the mapping of string to int.
  3. We do that in a simple tab separated file.
2017-04-16 09:07:28 +02:00
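The base 62 translation described in ac1ee20046 can be sketched as follows. The class name `Base62` is hypothetical, and the digit order (a-z, then A-Z, then 0-9) is an assumption derived from the "(a-zA-Z0-9)" listing in the commit message; the real data-store may order the digits differently.

```java
// Hypothetical sketch: encode non-negative numbers in base 62 for use in file names.
final class Base62 {

    // Assumption: digit 0 is 'a', digit 26 is 'A', digit 52 is '0'.
    private static final String ALPHABET =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    private Base62() {
    }

    static String encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("value must be non-negative");
        }
        if (value == 0) {
            return String.valueOf(ALPHABET.charAt(0));
        }
        final StringBuilder digits = new StringBuilder();
        while (value > 0) {
            digits.append(ALPHABET.charAt((int) (value % 62)));
            value /= 62;
        }
        // Digits were produced least significant first.
        return digits.reverse().toString();
    }

    static long decode(final String text) {
        long result = 0;
        for (int i = 0; i < text.length(); i++) {
            final int digit = ALPHABET.indexOf(text.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("not a base 62 digit: " + text.charAt(i));
            }
            result = result * 62 + digit;
        }
        return result;
    }
}
```

With this digit order the tag 12/47 from the commit message would be written as m/V, since 12 maps to "m" and 47 maps to "V".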
d1e39513f3 create web application 2016-12-21 17:48:36 +01:00