Commit Graph

12 Commits

Author SHA1 Message Date
1182d76205 replace the FolderStorage with DiskStorage
- The DiskStorage uses only one file instead of millions.
  Also the block size is only 512 byte instead of 4kb, which
  helps to reduce the memory usage for short sequences.
- Update primitiveCollections to get the new LongList.range
  and LongList.rangeClosed methods.
- BSFile now stores Time&Value sequences and knows how to
  encode the time values with delta encoding.
- Doc had to do some magic tricks to save memory. The path
  was initialized lazy and stored as byte array. This is no
  longer necessary. The patch was replaced by the
  rootBlockNumber of the BSFile.
- Had to temporarily disable the 'in' queries.
- The stored values are now processed as stream of LongLists
  instead of Entry. The overhead for creating Entries is
  gone, so is the memory overhead, because Entry was an
  object and had a reference to the tags, which is
  unnecessary.
2018-09-12 09:35:07 +02:00
b01d267300 update primitiveCollections
The new version of primitiveCollections requires Java 10.
2018-08-18 08:32:27 +02:00
99dbf31d8a update 3rd party libs 2018-08-09 07:20:09 +02:00
daaa0e6907 update dependencies
gradle to 4.8
jackson to 2.9.6
spring-boot to 2.0.3
guava to 25.1-jre
gradle-versions-plugin to 0.19.0
2018-06-17 08:59:48 +02:00
82b8a8a932 reduce memory footprint by lazily intializing the path in Doc
The path in Doc is not optional. This reduces memory consumption,
because we only have to store a long (the offset in the listing file).
This assumes, that only a small percentage of Docs is requested.
2018-05-06 12:58:10 +02:00
b06ccb0d00 update 3rd party libs
spring boot to 2.0.1
guava to 24.1-jre
jackson to 2.9.5
log4j2 to 2.10.0 (same version as pulled by spring boot)
testng to 6.14.3
2018-04-21 20:01:39 +02:00
b439c9d79a update third-party libs
antlr4: 4.7 -> 4.7.1
commons-lang3: 3.6 -> 3.7
2018-01-21 08:44:30 +01:00
ahr
2df66c7b2f update primitiveCollections
This fixes a performance issue where the IntLists were not sorted and
therefore slow union/intersection algorithms were chosen.
2017-12-29 08:20:52 +01:00
ahr
8225dd2077 update primitiveCollections to 0.1.20171216143737
Use intersection and union methods from IntList.
2017-12-16 17:35:16 +01:00
a636f2b9bd update primitive collections to 0.1.20171007100354 2017-11-18 10:09:47 +01:00
347f1fdc74 update 3rd-party libraries 2017-09-23 18:24:51 +02:00
ac1ee20046 replace ludb with data-store
LuDB has a few disadvantages. 
  1. Most notably disk space. H2 wastes a lot of valuable disk space.
     For my test data set with 44 million entries it is 14 MB 
     (sometimes a lot more; depends on H2 internal cleanup). With 
     data-store it is 15 KB.
     Overall I could reduce the disk space from 231 MB to 200 MB (13.4 %
     in this example). That is an average of 4.6 bytes per entry.
  2. Speed:
     a) Liquibase is slow. The first time it takes approx. three seconds
     b) Query and insertion. with data-store we can insert entries 
        up to 1.6 times faster.

Data-store uses a few tricks to save disk space:
  1. We encode the tags into the file names.
  2. To keep them short we translate the key/value of the tag into 
     shorter numbers. For example "foo" -> 12 and "bar" to 47. So the
     tag "foo"/"bar" would be 12/47. 
     We then translate this number into a numeral system of base 62
     (a-zA-Z0-9), so it can be used for file names and it is shorter.
     That way we only have to store the mapping of string to int.
  3. We do that in a simple tab separated file.
2017-04-16 09:07:28 +02:00