Commit Graph

493 Commits

Author SHA1 Message Date
ahr
3387ebc134 use epoch millis instead of creating a date object
We only have to check if one timestamp is newer than another.
We don't have to create an expensive date object to do that.
2018-03-09 08:43:37 +01:00
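A minimal sketch of the idea, not the project's actual code: when only the ordering of two timestamps matters, comparing primitive epoch milliseconds is enough. The class and method names here are hypothetical.

```java
import java.time.Instant;

public class NewerCheck {
    // Hypothetical helper: true if the first timestamp is strictly newer
    // than the second. Both values are epoch milliseconds, so no date
    // object has to be allocated for the comparison.
    static boolean isNewer(long aEpochMilli, long bEpochMilli) {
        return aEpochMilli > bEpochMilli;
    }

    public static void main(String[] args) {
        // Instants are used here only to produce sample values.
        long earlier = Instant.parse("2018-03-09T07:40:12Z").toEpochMilli();
        long later = Instant.parse("2018-03-09T07:43:37Z").toEpochMilli();
        System.out.println(isNewer(later, earlier));
    }
}
```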
ahr
829fddf88c merge key and value arrays
we have several hundred thousand of those MiniMaps and this reduces
the memory requirement by 8 bytes per instance
2018-03-09 08:40:12 +01:00
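One way to picture the merged layout, as a sketch only: keys and values share a single array, with keys at even indices and values at odd indices, so each instance holds one array reference instead of two. The String key/value types and the class name are assumptions; the real MiniMap may differ.

```java
import java.util.Arrays;

public class MiniMapSketch {
    // Single backing array: entries[2*i] is a key, entries[2*i + 1] its value.
    private String[] entries = new String[0];

    public void put(String key, String value) {
        for (int i = 0; i < entries.length; i += 2) {
            if (entries[i].equals(key)) {
                entries[i + 1] = value; // overwrite existing mapping
                return;
            }
        }
        // Grow by exactly one key/value pair to keep the array tight.
        entries = Arrays.copyOf(entries, entries.length + 2);
        entries[entries.length - 2] = key;
        entries[entries.length - 1] = value;
    }

    public String get(String key) {
        // Linear scan; acceptable for maps with only a handful of entries.
        for (int i = 0; i < entries.length; i += 2) {
            if (entries[i].equals(key)) {
                return entries[i + 1];
            }
        }
        return null;
    }

    public int size() {
        return entries.length / 2;
    }
}
```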
ahr
7e5b762c0d pre-compute firstByteMaxValue
this operation is executed very often during ingestion
2018-03-09 08:38:58 +01:00
ahr
5a9aae70af handle corrupt json
Entries must be separated by a newline. This allows
us to handle corrupt json entries, because we know
that entries only start at a line beginning.
2018-03-03 09:58:50 +01:00
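The recovery strategy can be sketched as follows, under stated assumptions: the input is split at newlines, each line is parsed on its own, and a line that fails to parse is skipped so that parsing resumes at the next line beginning. The class name and the strictEntry check are hypothetical stand-ins; a real implementation would plug in an actual JSON parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class LineRecovery {
    // Parses one entry per line. A corrupt line only loses that entry,
    // because the next entry is known to start at the next line beginning.
    static <T> List<T> parseLines(String input, Function<String, T> parser) {
        List<T> result = new ArrayList<>();
        for (String line : input.split("\n")) {
            if (line.trim().isEmpty()) {
                continue;
            }
            try {
                result.add(parser.apply(line));
            } catch (RuntimeException e) {
                // Corrupt entry: skip it and continue with the next line.
            }
        }
        return result;
    }

    // Stand-in for a real JSON parser (assumption): accepts only lines
    // that look like a complete object and rejects everything else.
    static String strictEntry(String line) {
        String trimmed = line.trim();
        if (!trimmed.startsWith("{") || !trimmed.endsWith("}")) {
            throw new IllegalArgumentException("corrupt entry: " + trimmed);
        }
        return trimmed;
    }
}
```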
ahr
9d4eb660a5 update gradle and spring
gradle to 4.6
spring to 1.5.10.RELEASE
2018-03-03 08:34:38 +01:00
ahr
6b60fd542c add percentile plots 2018-03-03 08:19:26 +01:00
b4a0514267 draw points and percentile lines with the same color 2018-01-21 14:09:34 +01:00
bb7701e7c4 use enum for line type instead of string 2018-01-21 11:01:30 +01:00
8f15aba0d5 replace individual percentile aggregates with a single one for all 2018-01-21 10:54:13 +01:00
b439c9d79a update third-party libs
antlr4: 4.7 -> 4.7.1
commons-lang3: 3.6 -> 3.7
2018-01-21 08:44:30 +01:00
469dc20411 add gradle plugin: gradle-versions-plugin 2018-01-21 08:43:45 +01:00
9f45eb24ca add trace logging for creation of new writer 2018-01-21 08:36:40 +01:00
ahr
740cb1cb2d print metrics every 10 seconds, not every 10.001 seconds 2018-01-14 09:52:08 +01:00
ahr
d98c45e8bd add index for tags-to-documents
Now we can find the writer much faster, because we don't have to execute
a query for documents that match the tags. We can just look up the
documents in the map.
Speedup: 2-4ms -> 0.002-0.01ms
2018-01-14 09:51:37 +01:00
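A rough sketch of such an index, assuming tags are a Map<String, String> and documents are identified by integer ids (both assumptions, as is the class name): the full tag set is the key of a hash map, so a lookup replaces the query.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TagsToDocsIndex {
    // Key: the complete tag set of a document. Value: ids of matching documents.
    private final Map<Map<String, String>, List<Integer>> index = new HashMap<>();

    public void add(Map<String, String> tags, int docId) {
        // Copy the tags so later changes by the caller cannot corrupt the key.
        index.computeIfAbsent(new HashMap<>(tags), k -> new ArrayList<>()).add(docId);
    }

    public List<Integer> lookup(Map<String, String> tags) {
        // A single hash lookup instead of evaluating a query over all documents.
        return index.getOrDefault(tags, Collections.emptyList());
    }
}
```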
ahr
64613ce43c add metric logging for getWriter 2018-01-13 10:32:03 +01:00
ahr
0f2fcc3c9c extract long_to_string converter 2018-01-06 08:40:58 +01:00
ahr
c5c7c03c66 add example logger 2017-12-30 10:09:19 +01:00
ahr
bcc30f0f3f add trace logging and make set of proposals synchronized
I checked if computing the proposals with a parallel stream would be
beneficial. Turns out the stream uses several threads, but the overall
computation is not faster, because each individual computation is
slower.
2017-12-30 10:08:54 +01:00
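For illustration, a sketch of the experiment described above: results are computed in a parallel stream and collected into a synchronized set, since a plain HashSet is not safe for concurrent adds. The class name and the toLowerCase step, which stands in for the real proposal computation, are assumptions.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ProposalCollector {
    static Set<String> collect(List<String> candidates) {
        // Synchronized wrapper: several stream worker threads add concurrently.
        Set<String> proposals = Collections.synchronizedSet(new HashSet<>());
        candidates.parallelStream()
                .map(String::toLowerCase) // stand-in for the per-proposal computation
                .forEach(proposals::add);
        return proposals;
    }
}
```

Whether the parallel version is actually faster has to be measured; as the message notes, it was not in this case.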
ahr
3cc512f73d update third party libs
testng 6.11 -> 6.13.1
jackson-databind 2.9.1 -> 2.9.3
guava 23.0 -> 23.6-jre
2017-12-30 10:06:57 +01:00
ahr
fc30ffd928 sort IntLists in DataStore
The IntLists were no longer sorted since we made the initialization run
in parallel. Therefore a much slower implementation for
intersection/union was used.
2017-12-30 09:45:50 +01:00
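To show why sorting matters, here is a sketch of an intersection over two sorted int arrays. It is a generic two-pointer merge, not the IntList implementation from primitiveCollections, and the class name is hypothetical. With sorted inputs, one linear pass suffices; unsorted inputs would need a slower strategy.

```java
import java.util.Arrays;

public class SortedIntersection {
    // Both inputs must be sorted in ascending order.
    static int[] intersect(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0;
        int j = 0;
        int n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++; // a[i] cannot occur later in b
            } else if (a[i] > b[j]) {
                j++; // b[j] cannot occur later in a
            } else {
                out[n++] = a[i]; // common element
                i++;
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }
}
```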
ahr
5617547d63 add percentile plots 2017-12-30 09:15:26 +01:00
ahr
9b9554552d Extract interface for DataSeries
This will make it possible to have DataSeries that do not require a csv
file on disk.
2017-12-29 09:15:29 +01:00
ahr
cc70f45c12 add different plot types
Step 1: 
Added PlotType enum and a drop down to the UI.
Extracted the code for scatter plots.
2017-12-29 08:57:34 +01:00
ahr
2df66c7b2f update primitiveCollections
This fixes a performance issue where the IntLists were not sorted and
therefore slow union/intersection algorithms were chosen.
2017-12-29 08:20:52 +01:00
ahr
e060e9761d cleanup 2017-12-23 10:06:52 +01:00
ahr
8037212145 synchronize docIdToDoc list
When we parallelized the initialization we forgot to
synchronize the docIdToDoc list.
Luckily, there is a high probability that queries return
results that are obviously wrong.
2017-12-23 10:06:45 +01:00
ahr
888d25f7ea trim docIdToDoc list
This reduces memory usage by 1 or 2 MB.
33% of an ArrayList can be free. If the list is 1 million entries long,
then the list wastes 2.6 MB.
The Doc objects in the list are much bigger.
2017-12-23 09:42:08 +01:00
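A short sketch of the trimming step, with hypothetical names and String elements standing in for Doc objects: once the list is fully populated, ArrayList.trimToSize() shrinks the backing array to the number of elements actually stored.

```java
import java.util.ArrayList;

public class TrimExample {
    static ArrayList<String> load(int count) {
        ArrayList<String> docs = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            docs.add("doc-" + i); // the backing array grows in steps, leaving spare slots
        }
        // Release the unused capacity after the bulk load is finished.
        docs.trimToSize();
        return docs;
    }
}
```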
ahr
e59caa0f02 parallelize initialization of DataStore
When the files are already in the OS cache, the initialization time
for 750k files goes down from 35 seconds to 15 seconds.
2017-12-23 08:58:42 +01:00
ahr
a6251074cf add trace logging to ExpressionToDocIdVisitor 2017-12-20 11:14:41 +01:00
ahr
6509391059 sometimes plots are missing
The csv generation is running in parallel, but the 
list that collects the results was not synchronized.
2017-12-16 19:22:56 +01:00
ahr
cafaa7343c remove obsolete method 2017-12-16 19:20:38 +01:00
ahr
fd1479760a use same log format for console and file 2017-12-16 19:20:24 +01:00
ahr
a359652f8b log stdout/stderr of the gnuplot process 2017-12-16 19:19:35 +01:00
ahr
04b029e1be add trace logging 2017-12-16 19:19:12 +01:00
ahr
6ef4e7a96b reduce memory footprint of index by trimming IntLists
Reduced the memory usage of the IntLists in the index by 4.1MB (19.9MB
to 15.8MB) for 683,390 files and 4,046,250 values in the IntLists.
2017-12-16 17:57:15 +01:00
ahr
8225dd2077 update primitiveCollections to 0.1.20171216143737
Use intersection and union methods from IntList.
2017-12-16 17:35:16 +01:00
ahr
a2512b210f update to gradle 4.4 2017-12-16 17:33:45 +01:00
ahr
d63fabc85d prevent parallel plot requests
Plotting can take a long time and use a lot of resources. 
Multiple plot requests can cause the machine to run OOM.

We are now allowing plots for 500k files again. This is mainly to
prevent unwanted plots of everything.
2017-12-15 17:20:12 +01:00
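One possible way to enforce this, sketched under assumptions: a semaphore with a single permit lets one plot run at a time and rejects requests that arrive while it is busy. Whether the actual code rejects or queues such requests is not stated in the message, and the class and method names are hypothetical.

```java
import java.util.Optional;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

public class SinglePlotGuard {
    // One permit: at most one plot request runs at any time.
    private final Semaphore permit = new Semaphore(1);

    // Runs the plot if no other plot is in progress; otherwise returns
    // an empty Optional immediately instead of waiting.
    public <T> Optional<T> tryPlot(Supplier<T> plot) {
        if (!permit.tryAcquire()) {
            return Optional.empty();
        }
        try {
            return Optional.ofNullable(plot.get());
        } finally {
            permit.release(); // always free the permit, even if plotting fails
        }
    }
}
```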
ahr
8d48726472 remove unnecessary mapping to TagSpecificBaseDir 2017-12-15 16:52:20 +01:00
ahr
eb1f026c2f update spring-boot to 1.5.9 2017-12-11 08:28:21 +01:00
ahr
8860a048ff remove call of listRecursively on a file
The call was needed in a very early version.
2017-12-10 17:55:16 +01:00
ahr
c4dce942a6 parallelize csv generation
speedup 50% and more
2017-12-10 17:53:53 +01:00
ahr
3ee6336125 log time of query execution 2017-12-10 17:52:32 +01:00
ahr
f17bc55a8f hide prev/next image buttons when splitBy is not active 2017-12-10 17:28:29 +01:00
ahr
f2dfa92966 add refresh button 2017-12-10 17:21:59 +01:00
ahr
8e3213e2fc split by field 2017-12-10 17:00:45 +01:00
ahr
84084c3e08 remove logo 2017-12-10 09:34:10 +01:00
ahr
159c5ff371 write logs to logfile 2017-12-10 09:22:49 +01:00
ahr
06d25e7ceb do not allow search results with more than 100k docs
a) they take a long time to compute
b) danger of OOM
c) users should drill down instead
2017-12-10 09:19:28 +01:00
a6a2236d18 do not compute counts when proposing all keys 2017-11-18 13:03:45 +01:00