Commit Graph

435 Commits

Author SHA1 Message Date
andi 8f15aba0d5 replace individual percentile aggregates with a single one for all 2018-01-21 10:54:13 +01:00
andi b439c9d79a update third-party libs
antlr4: 4.7 -> 4.7.1
commons-lang3: 3.6 -> 3.7
2018-01-21 08:44:30 +01:00
andi 469dc20411 add gradle plugin: gradle-versions-plugin 2018-01-21 08:43:45 +01:00
andi 9f45eb24ca add trace logging for creation of new writer 2018-01-21 08:36:40 +01:00
ahr 740cb1cb2d print metrics every 10 seconds, not every 10.001 seconds 2018-01-14 09:52:08 +01:00
ahr d98c45e8bd add index for tags-to-documents
Now we can find the writer much faster, because we no longer have to
execute a query for documents that match the tags. We can just look up
the documents in the map.
Speedup: 2-4ms -> 0.002-0.01ms
2018-01-14 09:51:37 +01:00
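The lookup described in this commit can be illustrated with a small map-based index. This is only a sketch, not the project's code: the class name `TagsToDocsIndex`, its methods, and the use of plain strings for documents are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical index from a complete tag set to the documents that carry it. */
final class TagsToDocsIndex {
    // Key: all tags of a document; value: every document with exactly those tags.
    private final Map<Map<String, String>, List<String>> docsByTags = new HashMap<>();

    void add(Map<String, String> tags, String doc) {
        // Copy the tags so later changes to the caller's map cannot corrupt the key.
        docsByTags.computeIfAbsent(new HashMap<>(tags), k -> new ArrayList<>()).add(doc);
    }

    /** A single hash lookup replaces the query over all documents. */
    List<String> lookup(Map<String, String> tags) {
        return docsByTags.getOrDefault(tags, Collections.emptyList());
    }
}
```

The lookup cost no longer depends on the number of documents, which is consistent with the speedup reported above, although the real index may be structured differently.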
ahr 64613ce43c add metric logging for getWriter 2018-01-13 10:32:03 +01:00
ahr 0f2fcc3c9c extract long_to_string converter 2018-01-06 08:40:58 +01:00
ahr c5c7c03c66 add example logger 2017-12-30 10:09:19 +01:00
ahr bcc30f0f3f add trace logging and make set of proposals synchronized
I checked if computing the proposals with a parallel stream would be
beneficial. Turns out the stream uses several threads, but the overall
computation is not faster, because each individual computation is
slower.
2017-12-30 10:08:54 +01:00
ahr 3cc512f73d update third party libs
testng 6.11 -> 6.13.1
jackson-databind 2.9.1 -> 2.9.3
guava 23.0 -> 23.6-jre
2017-12-30 10:06:57 +01:00
ahr fc30ffd928 sort IntLists in DataStore
The IntLists were no longer sorted since we made the initialization run
in parallel. Therefore a much slower implementation for
intersection/union was used.
2017-12-30 09:45:50 +01:00
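Why sorted lists matter here: two sorted lists can be intersected in one linear pass, while unsorted lists need a slower strategy. The sketch below shows the generic merge-style intersection on plain `int[]` arrays. It is not the IntList implementation from primitiveCollections, and the class name `SortedIntersection` is an assumption.

```java
import java.util.Arrays;

final class SortedIntersection {
    /**
     * Intersects two ascending-sorted arrays without duplicates
     * in O(a.length + b.length) time.
     */
    static int[] intersect(int[] a, int[] b) {
        int[] result = new int[Math.min(a.length, b.length)];
        int i = 0;
        int j = 0;
        int n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;                 // a[i] cannot occur in the rest of b
            } else if (a[i] > b[j]) {
                j++;                 // b[j] cannot occur in the rest of a
            } else {
                result[n++] = a[i];  // common element
                i++;
                j++;
            }
        }
        return Arrays.copyOf(result, n);
    }
}
```

The same two-pointer walk produces a union if elements are emitted in every branch instead of only the last one.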
ahr 5617547d63 add percentile plots 2017-12-30 09:15:26 +01:00
ahr 9b9554552d Extract interface for DataSeries
This will make it possible to have DataSeries that do not require a csv
file on disk.
2017-12-29 09:15:29 +01:00
ahr cc70f45c12 add different plot types
Step 1: 
Added PlotType enum and a drop down to the UI.
Extracted the code for scatter plots.
2017-12-29 08:57:34 +01:00
ahr 2df66c7b2f update primitiveCollections
This fixes a performance issue where the IntLists were not sorted and
therefore slow union/intersection algorithms were chosen.
2017-12-29 08:20:52 +01:00
ahr e060e9761d cleanup 2017-12-23 10:06:52 +01:00
ahr 8037212145 synchronize docIdToDoc list
When we parallelized the initialization we forgot to
synchronize the docIdToDoc list.
Luckily, there is a high probability that queries return
results that are obviously wrong.
2017-12-23 10:06:45 +01:00
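The kind of bug fixed here can be sketched as follows: a list that is filled from a parallel stream must be thread-safe. The class `ParallelInit` and the method `loadDocIds` are hypothetical, and wrapping the list with `Collections.synchronizedList` is one possible fix, not necessarily the one used in this commit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.IntStream;

final class ParallelInit {
    /** Collects ids from a parallel stream into a thread-safe list. */
    static List<Integer> loadDocIds(int count) {
        // A plain ArrayList is not thread-safe: concurrent add() calls can lose
        // elements or throw. The synchronized wrapper serializes each add().
        List<Integer> docIds = Collections.synchronizedList(new ArrayList<>());
        IntStream.range(0, count).parallel().forEach(id -> docIds.add(id));
        return docIds;
    }
}
```

Note that the insertion order is not deterministic with a parallel stream, so code that relies on a position in the list needs an additional sorting or indexing step.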
ahr 888d25f7ea trim docIdToDoc list
This reduces memory usage by 1 or 2 MB.
33% of an ArrayList can be free. If the list is 1 million entries long,
then the list wastes 2.6 MB.
The Doc objects in the list are much bigger.
2017-12-23 09:42:08 +01:00
ahr e59caa0f02 parallelize initialization of DataStore
When the files are already in the OS cache, then the initialization time
for 750k files went down from 35 seconds to 15 seconds.
2017-12-23 08:58:42 +01:00
ahr a6251074cf add trace logging to ExpressionToDocIdVisitor 2017-12-20 11:14:41 +01:00
ahr 6509391059 sometimes plots are missing
The csv generation is running in parallel, but the 
list that collects the results was not synchronized.
2017-12-16 19:22:56 +01:00
ahr cafaa7343c remove obsolete method 2017-12-16 19:20:38 +01:00
ahr fd1479760a use same log format for console and file 2017-12-16 19:20:24 +01:00
ahr a359652f8b log stdout/stderr of the gnuplot process 2017-12-16 19:19:35 +01:00
ahr 04b029e1be add trace logging 2017-12-16 19:19:12 +01:00
ahr 6ef4e7a96b reduce memory footprint of index by trimming IntLists
Reduced the memory usage of the IntLists in the index by 4.1MB (19.9MB
to 15.8MB) for 683,390 files and 4,046,250 values in the IntLists.
2017-12-16 17:57:15 +01:00
ahr 8225dd2077 update primitiveCollections to 0.1.20171216143737
Use intersection and union methods from IntList.
2017-12-16 17:35:16 +01:00
ahr a2512b210f update to gradle 4.4 2017-12-16 17:33:45 +01:00
ahr d63fabc85d prevent parallel plot requests
Plotting can take a long time and use a lot of resources. 
Multiple plot requests can cause the machine to run OOM.

We are now allowing plots for 500k files again. This is mainly to
prevent unwanted plots of everything.
2017-12-15 17:20:12 +01:00
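One way to prevent parallel plot requests is a guard that lets a single request through and fails fast for all others. The commit does not say whether extra requests are rejected or queued, so the rejecting behaviour, the class name `SinglePlotGuard` and the method `plotExclusively` are assumptions for this sketch.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

final class SinglePlotGuard {
    private final AtomicBoolean plotting = new AtomicBoolean(false);

    /** Runs the plot if no other plot is running; otherwise fails fast. */
    <T> T plotExclusively(Supplier<T> plot) {
        // compareAndSet succeeds for exactly one caller at a time.
        if (!plotting.compareAndSet(false, true)) {
            throw new IllegalStateException("another plot request is already running");
        }
        try {
            return plot.get();
        } finally {
            plotting.set(false); // release the guard even if plotting failed
        }
    }
}
```

Rejecting instead of queueing keeps waiting requests from piling up while a long plot is running. A fair lock or a single-threaded executor would be the alternative if requests should wait their turn.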
ahr 8d48726472 remove unnecessary mapping to TagSpecificBaseDir 2017-12-15 16:52:20 +01:00
ahr eb1f026c2f update spring-boot to 1.5.9 2017-12-11 08:28:21 +01:00
ahr 8860a048ff remove call of listRecursively on a file
The call was needed in a very early version.
2017-12-10 17:55:16 +01:00
ahr c4dce942a6 parallelize csv generation
speedup 50% and more
2017-12-10 17:53:53 +01:00
ahr 3ee6336125 log time of query execution 2017-12-10 17:52:32 +01:00
ahr f17bc55a8f hide prev/next image buttons when splitBy is not active 2017-12-10 17:28:29 +01:00
ahr f2dfa92966 add refresh button 2017-12-10 17:21:59 +01:00
ahr 8e3213e2fc split by field 2017-12-10 17:00:45 +01:00
ahr 84084c3e08 remove logo 2017-12-10 09:34:10 +01:00
ahr 159c5ff371 write logs to logfile 2017-12-10 09:22:49 +01:00
ahr 06d25e7ceb do not allow search results with more than 100k docs
a) they take a long time to compute
b) danger of OOM
c) users should drill down instead
2017-12-10 09:19:28 +01:00
andi a6a2236d18 do not compute counts when proposing all keys 2017-11-18 13:03:45 +01:00
andi 14d1367c4a remove duplicate enums 2017-11-18 12:30:45 +01:00
andi f2868fcc1b reduce memory footprint: old generation by 100 MB
This reduces the size of the old generation by 100MB (300MB down to
200MB). Unfortunately the total JVM size didn't change and is still
512MB.

Doc stores the path as byte array instead of Path.
2017-11-18 10:39:01 +01:00
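The idea of storing the path as a byte array can be sketched like this. The class `CompactDoc` is hypothetical and the UTF-8 encoding is an assumption; the real Doc class may encode the path differently.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical document that keeps only the encoded path, not a Path object. */
final class CompactDoc {
    private final byte[] path;

    CompactDoc(Path path) {
        // A byte[] is a single compact object; the Path is rebuilt only on demand.
        this.path = path.toString().getBytes(StandardCharsets.UTF_8);
    }

    Path getPath() {
        return Paths.get(new String(path, StandardCharsets.UTF_8));
    }
}
```

The trade-off is a small cost for rebuilding the Path on each access in exchange for fewer long-lived objects on the heap.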
andi cc49a8cf2a open PdbReaders only when reading
We used to open all PdbReaders in a search result and then iterate over
them. This used a lot of heap space (> 8GB) for 400k files.
Now the PdbReaders are only opened while they are used. Heap usage was
less than 550 while reading more than 400k files.
2017-11-18 10:12:22 +01:00
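The change described above amounts to opening each reader right before it is used and closing it right after. The sketch below uses a hypothetical `Reader` class as a stand-in for PdbReader that only counts how many instances are open at once; none of these names come from the project.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class LazyReaders {
    static final AtomicInteger openReaders = new AtomicInteger();
    static final AtomicInteger maxOpenReaders = new AtomicInteger();

    /** Hypothetical stand-in for PdbReader that tracks the number of open readers. */
    static final class Reader implements AutoCloseable {
        private final String file;

        Reader(String file) {
            this.file = file;
            int open = openReaders.incrementAndGet();
            maxOpenReaders.accumulateAndGet(open, Math::max);
        }

        int read() {
            return file.length(); // placeholder for reading real data
        }

        @Override
        public void close() {
            openReaders.decrementAndGet();
        }
    }

    /** Opens one reader at a time, so buffers of earlier readers can be freed. */
    static long readAll(List<String> files) {
        long total = 0;
        for (String file : files) {
            try (Reader reader = new Reader(file)) {
                total += reader.read();
            }
        }
        return total;
    }
}
```

With this pattern the number of simultaneously open readers stays at one regardless of the size of the search result, which is why the heap usage no longer grows with the number of files.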
andi a636f2b9bd update primitive collections to 0.1.20171007100354 2017-11-18 10:09:47 +01:00
andi 0555691864 update gradle to 4.3.2 and spring boot to 1.5.8 2017-11-18 09:32:49 +01:00
andi 995558588a add median and 90% percentile 2017-11-18 09:28:41 +01:00
ahr f8c03c434e print thousand delimiter (or whatever they are called) 2017-11-06 17:21:45 +01:00
ahr 78671a2d8c use linespoints instead of line and make linewidth 2 instead of 1 2017-11-06 17:04:56 +01:00