819 Commits

Author SHA1 Message Date
ahr e060e9761d cleanup 2017-12-23 10:06:52 +01:00
ahr 8037212145 synchronize docIdToDoc list
When we parallelized the initialization we forgot to
synchronize the docIdToDoc list.
Luckily, there is a high probability that queries return
results that are obviously wrong.
2017-12-23 10:06:45 +01:00
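
The kind of fix described above can be sketched as follows. This is a minimal illustration, not the project's code: the class name, the method name and the use of a parallel stream are assumptions. It shows that a plain `ArrayList` wrapped with `Collections.synchronizedList` can be filled safely from several threads, at the cost of losing insertion order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical sketch: names are illustrative, not taken from the project.
final class SynchronizedListSketch {

    // Fills a list from a parallel stream. The synchronized wrapper makes
    // each add() atomic; the order of the elements is not preserved.
    static List<Integer> loadInParallel(int count) {
        List<Integer> docs = Collections.synchronizedList(new ArrayList<>());
        IntStream.range(0, count).parallel().forEach(i -> docs.add(i));
        return docs;
    }

    public static void main(String[] args) {
        // With an unsynchronized ArrayList, elements can be lost or an
        // ArrayIndexOutOfBoundsException can be thrown here.
        System.out.println(loadInParallel(100_000).size());
    }
}
```

If the position in the list has to match an ID, the wrapper alone is not enough, because parallel workers append in an unpredictable order.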
ahr 888d25f7ea trim docIdToDoc list
This reduces memory usage by 1 or 2 MB.
Up to 33% of an ArrayList's capacity can be unused. If the list is
1 million entries long, then the list wastes 2.6 MB.
The Doc objects in the list are much bigger.
2017-12-23 09:42:08 +01:00
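
A small sketch of the idea, under stated assumptions: `ArrayList.trimToSize()` is the standard call that shrinks the backing array to the current size, and the helper `wastedBytes` is a hypothetical illustration of the arithmetic. One reading that reproduces the 2.6 MB figure is 33% of 1 million slots at 8 bytes per reference; the reference size is an assumption, since it depends on the JVM settings.

```java
import java.util.ArrayList;

// Hypothetical sketch: the helper and its parameters are illustrative.
final class TrimSketch {

    // Estimated bytes held by unused slots, assuming a fixed fraction of
    // the slots is free and each slot holds one reference.
    static long wastedBytes(long entries, double freeFraction, int bytesPerReference) {
        return Math.round(entries * freeFraction * bytesPerReference);
    }

    public static void main(String[] args) {
        ArrayList<String> docs = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) {
            docs.add("doc" + i);
        }
        // Shrinks the backing array to the current size; contents are unchanged.
        docs.trimToSize();
        System.out.println(docs.size());
        System.out.println(wastedBytes(1_000_000, 0.33, 8));
    }
}
```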
ahr e59caa0f02 parallelize initialization of DataStore
When the files are already in the OS cache, the initialization time
for 750k files went down from 35 seconds to 15 seconds.
2017-12-23 08:58:42 +01:00
ahr a6251074cf add trace logging to ExpressionToDocIdVisitor 2017-12-20 11:14:41 +01:00
ahr 6509391059 sometimes plots are missing
The CSV generation is running in parallel, but the
list that collects the results was not synchronized.
2017-12-16 19:22:56 +01:00
ahr cafaa7343c remove obsolete method 2017-12-16 19:20:38 +01:00
ahr fd1479760a use same log format for console and file 2017-12-16 19:20:24 +01:00
ahr a359652f8b log stdout/stderr of the gnuplot process 2017-12-16 19:19:35 +01:00
ahr 04b029e1be add trace logging 2017-12-16 19:19:12 +01:00
ahr 6ef4e7a96b reduce memory footprint of index by trimming IntLists
Reduced the memory usage of the IntLists in the index by 4.1MB (19.9MB
to 15.8MB) for 683,390 files and 4,046,250 values in the IntLists.
2017-12-16 17:57:15 +01:00
ahr 8225dd2077 update primitiveCollections to 0.1.20171216143737
Use intersection and union methods from IntList.
2017-12-16 17:35:16 +01:00
ahr a2512b210f update to gradle 4.4 2017-12-16 17:33:45 +01:00
ahr d63fabc85d prevent parallel plot requests
Plotting can take a long time and use a lot of resources.
Multiple plot requests can cause the machine to run OOM.

We are now allowing plots for 500k files again. This is mainly to
prevent unwanted plots of everything.
2017-12-15 17:20:12 +01:00
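
One way to allow only a single plot at a time is sketched below. The commit does not say whether a second request is rejected or queued; this sketch assumes it is rejected immediately. The class `SinglePlotGuard` and its method are hypothetical.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical sketch: rejects a plot request while another one is running.
final class SinglePlotGuard {
    private final Semaphore permit = new Semaphore(1);

    // Runs the plot job if no other plot is running; otherwise fails fast
    // instead of letting several expensive jobs compete for memory.
    <T> T plot(Supplier<T> plotJob) {
        if (!permit.tryAcquire()) {
            throw new IllegalStateException("another plot is already running");
        }
        try {
            return plotJob.get();
        } finally {
            // Always release, even if the job throws.
            permit.release();
        }
    }
}
```

Failing fast keeps the memory bound simple: at most one plot job holds resources, and the caller can report the rejection to the user.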
ahr 8d48726472 remove unnecessary mapping to TagSpecificBaseDir 2017-12-15 16:52:20 +01:00
ahr eb1f026c2f update spring-boot to 1.5.9 2017-12-11 08:28:21 +01:00
ahr 8860a048ff remove call of listRecursively on a file
The call was needed in a very early version.
2017-12-10 17:55:16 +01:00
ahr c4dce942a6 parallelize csv generation
Speedup of 50% or more.
2017-12-10 17:53:53 +01:00
ahr 3ee6336125 log time of query execution 2017-12-10 17:52:32 +01:00
ahr f17bc55a8f hide prev/next image buttons when splitBy is not active 2017-12-10 17:28:29 +01:00
ahr f2dfa92966 add refresh button 2017-12-10 17:21:59 +01:00
ahr 8e3213e2fc split by field 2017-12-10 17:00:45 +01:00
ahr 84084c3e08 remove logo 2017-12-10 09:34:10 +01:00
ahr 159c5ff371 write logs to logfile 2017-12-10 09:22:49 +01:00
ahr 06d25e7ceb do not allow search results with more than 100k docs
a) they take a long time to compute
b) danger of OOM
c) users should drill down instead
2017-12-10 09:19:28 +01:00
andi a6a2236d18 do not compute counts when proposing all keys 2017-11-18 13:03:45 +01:00
andi 14d1367c4a remove duplicate enums 2017-11-18 12:30:45 +01:00
andi f2868fcc1b reduce memory footprint: old generation by 100 MB
This reduces the size of the old generation by 100MB (300MB down to
200MB). Unfortunately the total JVM size didn't change and is still
512MB.

Doc stores the path as byte array instead of Path.
2017-11-18 10:39:01 +01:00
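
Storing the path as bytes can be sketched like this. The class `CompactDoc` is hypothetical and the UTF-8 encoding is an assumption; the commit only states that the path is kept as a byte array instead of a `Path`.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: keeps only the encoded path and rebuilds the Path on demand.
final class CompactDoc {
    private final byte[] pathBytes;

    CompactDoc(Path path) {
        // Assumption: the path string can be represented in UTF-8.
        this.pathBytes = path.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Creates a short-lived Path object only when a caller needs one.
    Path getPath() {
        return Paths.get(new String(pathBytes, StandardCharsets.UTF_8));
    }
}
```

The trade-off is a small decoding cost on every `getPath()` call in exchange for fewer long-lived objects per document.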
andi cc49a8cf2a open PdbReaders only when reading
We used to open all PdbReaders in a search result and then iterate over
them. This used a lot of heap space (> 8GB) for 400k files.
Now the PdbReaders are only opened while they are used. Heap usage was
less than 550 while reading more than 400k files.
2017-11-18 10:12:22 +01:00
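
The pattern of opening a reader only while it is in use can be sketched with standard Java I/O. The API of `PdbReader` is not shown in this log, so the sketch uses `BufferedReader` as a stand-in; `LazyReaders` and `countLines` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch: BufferedReader stands in for the project's reader type.
final class LazyReaders {

    // Opens each file only while it is being read, so at most one reader
    // (and its buffer) is alive at any time.
    static long countLines(List<Path> files) throws IOException {
        long lines = 0;
        for (Path file : files) {
            try (BufferedReader reader = Files.newBufferedReader(file)) {
                while (reader.readLine() != null) {
                    lines++;
                }
            } // reader is closed here, before the next file is opened
        }
        return lines;
    }
}
```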
andi a636f2b9bd update primitive collections to 0.1.20171007100354 2017-11-18 10:09:47 +01:00
andi 0555691864 update gradle to 4.3.2 and spring boot to 1.5.8 2017-11-18 09:32:49 +01:00
andi 995558588a add median and 90th percentile 2017-11-18 09:28:41 +01:00
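
As an illustration of these two statistics, here is a sketch that uses the nearest-rank definition of a percentile. Which definition the project uses is not stated in this log, so that choice is an assumption, and `Percentiles` is a hypothetical class. With nearest-rank, the median of an even number of values is the lower of the two middle values, not their average.

```java
import java.util.Arrays;

// Hypothetical sketch using the nearest-rank percentile definition.
final class Percentiles {

    // Returns the smallest value such that at least p percent of the
    // values are less than or equal to it.
    static long percentile(long[] values, double p) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        long[] sorted = values.clone(); // do not modify the caller's array
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        long[] durations = {5, 1, 9, 3, 7, 2, 10, 4, 8, 6};
        System.out.println(percentile(durations, 50)); // median
        System.out.println(percentile(durations, 90)); // 90th percentile
    }
}
```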
ahr f8c03c434e print thousand delimiter (or whatever they are called) 2017-11-06 17:21:45 +01:00
ahr 78671a2d8c use linespoints instead of line and make linewidth 2 instead of 1 2017-11-06 17:04:56 +01:00
ahr 64db4c48a2 add plots for percentiles 2017-11-06 16:57:22 +01:00
ahr 92dde94443 preparation to add plots for percentiles 2017-11-05 09:21:34 +01:00
ahr 870ff492d9 enable logging of metrics 2017-11-05 08:52:33 +01:00
ahr 27db9f934d increase entry buffer 2017-11-05 08:52:10 +01:00
andi 11b3610971 make invaders better
add kill count
do not move all invaders at once
2017-10-01 19:08:59 +02:00
andi 08f1961f51 replace spinner with a little game 2017-10-01 17:23:59 +02:00
andi 386f211377 make it possible to draw the legend outside of the plot area 2017-09-30 17:51:33 +02:00
andi d4fd25dc4c replace LinkedHashMap with a more memory efficient implementation
This saves approximately 50MB of heap space.
2017-09-30 17:51:02 +02:00
andi 7e00594382 add helper class that returns the size of objects 2017-09-30 17:49:21 +02:00
andi e0655f66fa skip invalid entries 2017-09-24 17:21:20 +02:00
andi a7cd918fc6 skip empty files 2017-09-24 17:12:17 +02:00
andi dc8262c37e replace deprecated API 2017-09-24 17:00:08 +02:00
andi 4b53baacae add scrollbar for proposals, again 2017-09-24 13:32:51 +02:00
andi 35500e387a add vertical marker lines that show which area will be used after
zooming
2017-09-24 12:52:44 +02:00
andi 8b74992e66 reduce memory allocations by 25% with a cache of formatted longs 2017-09-24 09:53:25 +02:00
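
A cache of formatted longs could look like the sketch below. The actual cache design is not described in this log; the fixed range of 1024 precomputed values and the class name `LongFormatCache` are assumptions made for illustration.

```java
// Hypothetical sketch: precomputes the strings for a small range of values.
final class LongFormatCache {
    private static final int SIZE = 1024; // assumed cache range
    private static final String[] CACHE = new String[SIZE];

    static {
        for (int i = 0; i < SIZE; i++) {
            CACHE[i] = Long.toString(i);
        }
    }

    // Returns a shared String for cached values and allocates a new one
    // only for values outside the cached range.
    static String format(long value) {
        if (value >= 0 && value < SIZE) {
            return CACHE[(int) value];
        }
        return Long.toString(value);
    }
}
```

Such a cache pays off only when the same small values are formatted repeatedly, which is an assumption about the data being written.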
andi 37eb05aebf use unix timestamps in the CSVs used by gnuplot
This is faster by a factor of 4, because we don't have to format the date.
2017-09-24 09:38:08 +02:00
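
The difference between the two CSV variants can be sketched as follows. The row layout, the date pattern and the use of seconds (not milliseconds) are assumptions; `CsvTimestampSketch` is a hypothetical class. On the gnuplot side, epoch seconds can be read with `set xdata time` and `set timefmt "%s"`.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch: contrasts a formatted date column with a Unix timestamp.
final class CsvTimestampSketch {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneOffset.UTC);

    // Variant with a formatted date: every row pays for date formatting.
    static String formattedRow(Instant time, long value) {
        return FORMAT.format(time) + "," + value;
    }

    // Variant with a Unix timestamp: only a long is appended.
    static String epochRow(Instant time, long value) {
        return time.getEpochSecond() + "," + value;
    }
}
```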