Commit Graph

130 Commits

Author SHA1 Message Date
6d5cdbafca FastTime: a faster alternative to System.currentTimeMillis
According to my primitive benchmark, FastTime is 100 times faster
than System.currentTimeMillis, but it is less accurate.
2021-07-30 19:45:44 +02:00
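The commit does not show how FastTime works. A common way to get a cheaper but less accurate clock is sketched here, assuming a daemon thread that refreshes a cached value so that callers only pay for a volatile read. The updater thread and the 1 ms refresh interval are assumptions, not taken from the project.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: trades accuracy for speed by caching
// System.currentTimeMillis() in a volatile field refreshed by a daemon thread.
final class FastTime {
    // Readers only pay for a volatile read instead of a clock call.
    private static volatile long cachedMillis = System.currentTimeMillis();

    static {
        ScheduledExecutorService updater = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "fast-time-updater");
            t.setDaemon(true); // must not keep the JVM alive
            return t;
        });
        // Assumed refresh interval of 1 ms: values may lag by at least that much,
        // more under load or with coarse OS timers.
        updater.scheduleAtFixedRate(
                () -> cachedMillis = System.currentTimeMillis(), 1, 1, TimeUnit.MILLISECONDS);
    }

    private FastTime() {}

    static long currentTimeMillis() {
        return cachedMillis;
    }
}
```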
ee79cb0022 cleanup after revert 2021-05-12 18:20:34 +02:00
7adfc7029f Revert "introduce indexes"
This reverts commit 36ccc57db6.
2021-05-12 18:18:57 +02:00
36ccc57db6 introduce indexes 2021-05-09 10:33:28 +02:00
11beda5432 make logging of insertion speed a little nicer 2020-11-24 10:00:53 +01:00
3e77c2a103 various fixes 2020-08-11 16:12:18 +02:00
9a311313ec use US locale to format strings
This is especially important for all strings that are
passed to gnuplot, because gnuplot uses the US locale
when parsing.
2020-03-12 19:40:20 +01:00
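A minimal illustration of the problem, using a hypothetical GnuplotFormat helper: String.format without an explicit locale uses the JVM's default locale, so on a German system %f produces a comma as decimal separator, which gnuplot would not parse as a number. Passing Locale.US explicitly keeps the dot.

```java
import java.util.Locale;

// Hypothetical helper: formats values for gnuplot independently of the
// JVM's default locale.
final class GnuplotFormat {
    private GnuplotFormat() {}

    static String value(double v) {
        return String.format(Locale.US, "%.3f", v);
    }

    public static void main(String[] args) {
        System.out.println(String.format(Locale.GERMANY, "%.3f", 1.5)); // 1,500
        System.out.println(value(1.5));                                 // 1.500
    }
}
```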
5d8df6888d move Entry and Entries to data-store 2019-12-13 18:15:10 +01:00
550d7ba44e add flag to make CSV upload wait until entries are flushed
To make it possible to write stable unit tests, the CSV upload
can optionally wait until all entries have been flushed to disk.
This is necessary for tests that ingest data and then read the data.
2019-12-13 18:05:20 +01:00
07ad62ddd9 use JUnit 5 instead of TestNG
We want to be able to use @SpringBootTest tests that fully initialize
the Spring application. This is much easier with JUnit than with TestNG.
Gradle does not (at least not easily) support running JUnit and TestNG
tests side by side. Therefore we switch all tests to JUnit.
The original reason for using TestNG was that JUnit didn't support
data providers. That finally changed in JUnit 5 with
@ParameterizedTest.
2019-12-13 14:33:20 +01:00
e931856041 merge projects file-utils, byte-utils and pdb-utils
It turned out that most projects needed at least
two of the utils projects. file-utils and byte-utils
had only one class. Merging them made sense.
2019-12-08 18:47:54 +01:00
85679ca0c8 send CSV file via REST 2019-12-08 18:39:43 +01:00
06b379494f apply new code formatter and save action 2019-11-24 10:20:43 +01:00
4367323fcd replace deprecated dependency configurations
Use api and implementation instead of the
deprecated compile configuration.

Update to Gradle 6.0.
2019-11-10 11:08:50 +01:00
57ad6a1cee update Spring Boot to 2.1.9
Also remove direct dependencies on log4j-api and log4j-core where
possible. log4j-slf4j-impl is enough in many cases.
2019-10-04 20:15:09 +02:00
2f35978184 fetch available values for gallery via autocomplete method
We had a method that returned the values of a field
for a given query. That method was inefficient
because it executed the query, fetched all Docs
and collected their values.
The autocomplete method we introduced a while back
can answer the same question but much more efficiently.
2019-08-25 18:52:05 +02:00
dfe9579726 use DateTimeRange.max() instead of arbitrary relative range 2019-04-20 20:36:26 +02:00
dbe0e02517 rename cluster to partition
We are not clustering the indices, we
are partitioning them.
2019-04-14 10:10:16 +02:00
5d0ceb112e add clustering for DiskStore 2019-03-17 10:53:02 +01:00
b5e2d0a217 introduce clustering for query completion indices 2019-03-16 10:19:28 +01:00
59aea1a15f introduce index clustering (part 1)
In order to prevent files from getting too big and to
make it easier to implement retention policies, we
are splitting all files into chunks. Each chunk
contains the data for one time interval (1 month by
default).
This first changeset introduces the ClusteredPersistentMap,
which implements this for PersistentMap. It is used
for some (not all) of the indices.
2019-02-24 16:50:57 +01:00
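A minimal in-memory sketch of time-based partitioning, assuming monthly partitions keyed by YearMonth in UTC. MonthlyPartitionedMap is a hypothetical name, not the ClusteredPersistentMap API; each HashMap stands in for what would be a separate file on disk.

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: every entry is routed to the partition of the month
// its timestamp falls into (1 month per partition, as in the commit).
final class MonthlyPartitionedMap<K, V> {
    private final TreeMap<YearMonth, Map<K, V>> partitions = new TreeMap<>();

    private static YearMonth partitionOf(long epochMilli) {
        return YearMonth.from(Instant.ofEpochMilli(epochMilli).atOffset(ZoneOffset.UTC));
    }

    void put(long epochMilli, K key, V value) {
        partitions.computeIfAbsent(partitionOf(epochMilli), m -> new HashMap<>()).put(key, value);
    }

    V get(long epochMilli, K key) {
        Map<K, V> partition = partitions.get(partitionOf(epochMilli));
        return partition == null ? null : partition.get(key);
    }

    // Retention: drop whole partitions instead of deleting single entries.
    void dropPartitionsBefore(YearMonth oldestToKeep) {
        partitions.headMap(oldestToKeep, false).clear();
    }

    int partitionCount() {
        return partitions.size();
    }
}
```

Dropping an entire partition is what keeps a retention policy cheap: expired data disappears with one file instead of many individual deletions.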
372a073b6d PdbWriter is no longer in the API of DataStore 2019-02-16 16:24:14 +01:00
92a47d9b56 remove TagsToFile
Remove one layer of abstraction by moving the code into the DataStore.
2019-02-16 16:06:46 +01:00
117ef4ea34 use Guava's cache as implementation for the HotEntryCache
My own implementation was faster, but it did not support
a size limit.
2019-02-16 10:23:52 +01:00
493971bcf3 values used in queries were added to the keys.csv
Due to a bug in Tag, which added every string used
by Tag to the String dictionary, the dictionary
also contained all values that were used in queries.
2019-02-09 08:28:23 +01:00
ea5884a5e6 move creation of PdbWriter to the DataStore 2019-02-07 18:06:41 +01:00
58bfba23bb reset lastEpochMilli when opening a new export file 2019-02-06 15:52:37 +00:00
668d73c926 introduced a new custom file format used for backup and ingestion
The new file format reduces repetition, is easy to parse
and easy to generate in any language, and is human-readable.
2019-02-03 15:44:35 +01:00
f2d16b6758 make CacheKey comparable
The CacheKey is used as a key in a HashMap. When there are
hash collisions, lookups can be faster if the CacheKey is comparable.
In this case I was not able to measure any effect. I am
keeping the Comparable implementation nonetheless, because it can
only have a positive effect.
2019-01-01 08:47:48 +01:00
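Since Java 8, HashMap turns buckets with many colliding keys into balanced trees, and when the keys implement Comparable it can use compareTo to search such a bucket in logarithmic time instead of scanning it. That only matters when many keys collide, which may explain why no effect was measurable. A sketch of a comparable key, with hypothetical fields name and id because the real CacheKey fields are not shown:

```java
import java.util.Comparator;
import java.util.Objects;

// Hypothetical CacheKey: the fields name and id are assumptions.
final class CacheKey implements Comparable<CacheKey> {
    private static final Comparator<CacheKey> ORDER =
            Comparator.comparing((CacheKey k) -> k.name).thenComparingLong(k -> k.id);

    private final String name;
    private final long id;

    CacheKey(String name, long id) {
        this.name = Objects.requireNonNull(name);
        this.id = id;
    }

    // Consistent with equals: compareTo returns 0 exactly when equals is true.
    @Override
    public int compareTo(CacheKey other) {
        return ORDER.compare(this, other);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CacheKey)) return false;
        CacheKey that = (CacheKey) o;
        return id == that.id && name.equals(that.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, id);
    }
}
```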
e537e94d39 HotEntryCache will update Instants only once per second
Calling Instant.now() several hundred thousand times per
second can be expensive. In my measurements, more than 10% of the
time spent loading new data was spent calling
Instant.now().
Fixed this by storing an Instant as a static member and
updating it periodically in a separate thread.
2018-12-21 19:16:55 +01:00
d95a71e32e batch entries between TcpIngestor and PerformanceDB
One bottleneck was the blocking queue used to transport entries
from the listener thread to the ingestor thread.
Reduced the bottleneck by batching entries.
Interestingly, a batch size of 100 performed better than
both 1000 and 10.
2018-12-21 13:11:35 +01:00
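A sketch of the batching idea, assuming entries are plain longs and a hypothetical BatchingProducer class: the listener thread fills a list and puts the whole list into the queue, so the blocking queue's synchronization cost is paid once per batch rather than once per entry. BATCH_SIZE = 100 mirrors the value that measured best here; the demo method exists only to exercise the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: entries are reduced to longs, and the listener hands
// whole batches to the ingestor thread instead of single entries.
final class BatchingProducer {
    static final int BATCH_SIZE = 100;
    // Marks the end of the stream; compared by identity, never filled.
    static final List<Long> END = new ArrayList<>();

    private final BlockingQueue<List<Long>> queue;
    private List<Long> batch = new ArrayList<>(BATCH_SIZE);

    BatchingProducer(BlockingQueue<List<Long>> queue) {
        this.queue = queue;
    }

    void add(long entry) throws InterruptedException {
        batch.add(entry);
        if (batch.size() >= BATCH_SIZE) {
            flush();
        }
    }

    // One queue operation per batch instead of one per entry.
    void flush() throws InterruptedException {
        if (!batch.isEmpty()) {
            queue.put(batch);
            batch = new ArrayList<>(BATCH_SIZE);
        }
    }

    // Sends the last, possibly partial batch and signals the end of input.
    void close() throws InterruptedException {
        flush();
        queue.put(END);
    }

    // Demo: pushes 0..n-1 through the queue and returns the consumer's sum.
    static long demo(int n) {
        BlockingQueue<List<Long>> queue = new ArrayBlockingQueue<>(16);
        long[] sum = new long[1];
        Thread consumer = new Thread(() -> {
            try {
                for (List<Long> b = queue.take(); b != END; b = queue.take()) {
                    for (long entry : b) {
                        sum[0] += entry;
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        try {
            BatchingProducer producer = new BatchingProducer(queue);
            for (long i = 0; i < n; i++) {
                producer.add(i);
            }
            producer.close();
            consumer.join(); // join makes sum[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum[0];
    }
}
```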
40f4506e13 use FastISODateParser.parseAsEpochMilli
Compared to FastISODateParser.parse, which returns an
OffsetDateTime object, parseAsEpochMilli returns the
epoch time in milliseconds. Date parsing alone is more than
twice as fast (8m dates/s to 18m dates/s).
Inserting 1.6m entries now takes 11.5-12.5s instead of
13-14s.
2018-12-16 19:24:47 +01:00
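Much of the gain presumably comes from not allocating an OffsetDateTime per date. A hedged sketch of such a parser, assuming a fixed UTC format yyyy-MM-ddTHH:mm:ss.SSSZ with no validation and no offset handling (the real FastISODateParser is not shown). EpochMilliParser is a hypothetical name; the date arithmetic uses Howard Hinnant's days_from_civil algorithm.

```java
// Hypothetical sketch: parses "yyyy-MM-ddTHH:mm:ss.SSSZ" (UTC only, no
// validation) straight into epoch milliseconds without creating objects.
final class EpochMilliParser {
    private EpochMilliParser() {}

    static long parseAsEpochMilli(String s) {
        int year = digits(s, 0, 4);
        int month = digits(s, 5, 2);
        int day = digits(s, 8, 2);
        int hour = digits(s, 11, 2);
        int minute = digits(s, 14, 2);
        int second = digits(s, 17, 2);
        int milli = digits(s, 20, 3);
        long days = daysFromCivil(year, month, day);
        // days -> hours -> minutes -> milliseconds
        return ((days * 24 + hour) * 60 + minute) * 60_000L + second * 1000L + milli;
    }

    private static int digits(String s, int from, int count) {
        int value = 0;
        for (int i = from; i < from + count; i++) {
            value = value * 10 + (s.charAt(i) - '0');
        }
        return value;
    }

    // Days since 1970-01-01 in the proleptic Gregorian calendar.
    private static long daysFromCivil(long y, int m, int d) {
        y -= m <= 2 ? 1 : 0;                                         // years start in March
        long era = Math.floorDiv(y, 400);
        long yoe = y - era * 400;                                    // [0, 399]
        long doy = (153L * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1; // [0, 365]
        long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            // [0, 146096]
        return era * 146097 + doe - 719468;
    }
}
```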
f78f69328b add cache for docId to Doc mapping
A Doc does not change once it is created, so it is easy to cache.
Lookup time went from 1ms per Doc to 3ms for 444 Docs (0.00675ms/Doc).
2018-11-22 19:51:07 +01:00
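Because a Doc never changes, a plain map without invalidation is enough. A sketch with a hypothetical DocCache and a caller-supplied loader standing in for the disk lookup; note that this version is unbounded and keeps every Doc it has ever loaded.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongFunction;

// Hypothetical sketch: immutable Docs are cached forever by docId.
final class DocCache<D> {
    private final Map<Long, D> cache = new ConcurrentHashMap<>();
    private final LongFunction<D> loader; // stands in for the expensive lookup

    DocCache(LongFunction<D> loader) {
        this.loader = loader;
    }

    D get(long docId) {
        // computeIfAbsent runs the loader at most once per docId;
        // later calls are answered from the map.
        return cache.computeIfAbsent(docId, id -> loader.apply(id));
    }
}
```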
cc0157fe0b update Java 3rd-party libs 2018-11-20 19:13:59 +01:00
eaa234bfa5 rename put to putEntries
The method name put is used so often that Eclipse has a
hard time finding references.
2018-10-11 19:25:01 +02:00
979e001efd TcpIngestor can handle CSV files 2018-10-11 18:56:16 +02:00
979d3269fa remove obsolete classes and methods 2018-10-04 18:46:51 +02:00
8939332004 remove the wrapper class PdbDB
It did not serve any purpose and could be replaced by DataStore.
2018-10-04 18:43:27 +02:00
01b93e32ca replace EhCache with a custom implementation
The cache must remove/evict writers after a few seconds, but EhCache
only evicts entries when a new entry is added. That is not acceptable
for us, because that would leave lots of files open and we would need
a second mechanism to close them.
Therefore I wrote a simple wrapper for a ConcurrentHashMap that evicts
entries after timeToLive+5s.
2018-10-03 20:22:45 +02:00
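A sketch of a cache whose idle entries are evicted by a periodic sweep rather than on insertion, so writers get closed even when no new entries arrive. ExpiringCache is hypothetical; it assumes the timer restarts on each access and that evictExpired() is called every few seconds, for example from a ScheduledExecutorService. The injected clock only serves testability.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Hypothetical sketch: idle entries are removed by a sweep and handed to a
// listener (e.g. to close a writer's file).
final class ExpiringCache<K, V> {
    private static final class Timed<T> {
        final T value;
        volatile long lastAccess;

        Timed(T value, long now) {
            this.value = value;
            this.lastAccess = now;
        }
    }

    private final Map<K, Timed<V>> map = new ConcurrentHashMap<>();
    private final long timeToLiveMillis;
    private final LongSupplier clock;
    private final Consumer<V> onEvict;

    ExpiringCache(long timeToLiveMillis, LongSupplier clock, Consumer<V> onEvict) {
        this.timeToLiveMillis = timeToLiveMillis;
        this.clock = clock;
        this.onEvict = onEvict;
    }

    void put(K key, V value) {
        map.put(key, new Timed<>(value, clock.getAsLong()));
    }

    V get(K key) {
        Timed<V> t = map.get(key);
        if (t == null) {
            return null;
        }
        t.lastAccess = clock.getAsLong(); // access restarts the timer
        return t.value;
    }

    // Call periodically; evicts independently of new insertions.
    void evictExpired() {
        long now = clock.getAsLong();
        map.forEach((key, t) -> {
            // remove(key, t) only succeeds if the entry was not replaced meanwhile
            if (now - t.lastAccess > timeToLiveMillis && map.remove(key, t)) {
                onEvict.accept(t.value);
            }
        });
    }

    int size() {
        return map.size();
    }
}
```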
c9dcc77b53 reuse existing PdbFiles 2018-10-03 16:49:46 +02:00
60578b45ec PdbWriters are now closed by the cache TagsToFile
We no longer have to close the files when the input streams are idle.
2018-10-03 16:47:29 +02:00
ad630fc6b2 simplify caching in TagsToFile
- PdbFiles no longer require dates to be monotonically
  increasing. Therefore TagsToFile does not have to ensure
  this. => We only have one file per Tags.
- Use EhCache instead of HashMap.
2018-09-30 10:38:25 +02:00
f07977c27a update Java, Gradle and third-party libs 2018-09-29 09:08:29 +02:00
24fcfd7763 prepare the addition of a date index 2018-09-28 19:07:01 +02:00
84350c4dfb move TimeStampDeltaDecoder to BSFile
Now the encoding and decoding code is in the same class.
2018-09-13 13:08:45 +02:00
a2e63cca44 cleanup 2018-09-13 08:11:15 +02:00
1182d76205 replace the FolderStorage with DiskStorage
- The DiskStorage uses only one file instead of millions.
  Also, the block size is only 512 bytes instead of 4 KB, which
  helps to reduce the memory usage for short sequences.
- Update primitiveCollections to get the new LongList.range
  and LongList.rangeClosed methods.
- BSFile now stores Time&Value sequences and knows how to
  encode the time values with delta encoding.
- Doc had to do some magic tricks to save memory. The path
  was initialized lazily and stored as a byte array. This is no
  longer necessary. The path was replaced by the
  rootBlockNumber of the BSFile.
- Had to temporarily disable the 'in' queries.
- The stored values are now processed as a stream of LongLists
  instead of Entry objects. This removes the overhead of creating
  Entries as well as their memory overhead: each Entry was an
  object with a reference to the tags, which is unnecessary.
2018-09-12 09:35:07 +02:00
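The delta encoding of time values mentioned for BSFile can be sketched as follows. TimestampDelta is a hypothetical class: it stores the first timestamp as is and each following one as the difference to its predecessor. The deltas are signed, so unsorted input also round-trips; the variable-length integer encoding that would make small deltas compact on disk is omitted.

```java
// Hypothetical sketch of delta encoding and decoding for timestamps.
final class TimestampDelta {
    private TimestampDelta() {}

    static long[] encode(long[] timestamps) {
        long[] deltas = new long[timestamps.length];
        long previous = 0;
        for (int i = 0; i < timestamps.length; i++) {
            deltas[i] = timestamps[i] - previous; // may be negative if unsorted
            previous = timestamps[i];
        }
        return deltas;
    }

    static long[] decode(long[] deltas) {
        long[] timestamps = new long[deltas.length];
        long previous = 0;
        for (int i = 0; i < deltas.length; i++) {
            previous += deltas[i]; // running sum restores the original values
            timestamps[i] = previous;
        }
        return timestamps;
    }
}
```

For example, encoding {1000, 1005, 1012} yields {1000, 5, 7}: for sorted timestamps only the first value is large.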
89840cf9e9 update dependencies 2018-07-28 08:50:42 +02:00
daaa0e6907 update dependencies
gradle to 4.8
jackson to 2.9.6
spring-boot to 2.0.3
guava to 25.1-jre
gradle-versions-plugin to 0.19.0
2018-06-17 08:59:48 +02:00
911062e26b use RandomAccessFile in FolderStorage.getPathByOffset()
The old implementation opened a new buffered reader every time
getPathByOffset was called. This took 1/20th of a second or
longer, so queries that visited thousands of files could
take a long time.
We are now using a RandomAccessFile, which is opened once. The
average time spent in getPathByOffset is now down to 0.11ms.
2018-05-10 10:22:25 +02:00
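A sketch of the open-once approach, assuming entries are UTF-8 strings terminated by '\n' (the actual FolderStorage file layout is not shown). PathsByOffset is a hypothetical class; seek and read on the shared handle are synchronized so that concurrent lookups do not interleave.

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical sketch: the file is opened once; each lookup only seeks.
final class PathsByOffset implements Closeable {
    private final RandomAccessFile file;
    private final byte[] buffer = new byte[256];

    PathsByOffset(Path path) throws IOException {
        this.file = new RandomAccessFile(path.toFile(), "r");
    }

    // Reads from offset up to the next '\n' (or end of file).
    synchronized String getPathByOffset(long offset) throws IOException {
        file.seek(offset);
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int n;
        while ((n = file.read(buffer)) > 0) {
            for (int i = 0; i < n; i++) {
                if (buffer[i] == '\n') {
                    line.write(buffer, 0, i);
                    return new String(line.toByteArray(), StandardCharsets.UTF_8);
                }
            }
            line.write(buffer, 0, n); // no newline in this chunk, keep reading
        }
        return new String(line.toByteArray(), StandardCharsets.UTF_8);
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```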