perfdb

Author	SHA1	Message	Date
ahr	2df66c7b2f	update primitiveCollections This fixes a performance issue where the IntLists were not sorted and therefore slow union/intersection algorithms were chosen.	2017-12-29 08:20:52 +01:00
ahr	e060e9761d	cleanup	2017-12-23 10:06:52 +01:00
ahr	8037212145	synchronize docIdToDoc list When we parallelized the initialization we forgot to synchronize the docIdToDoc list. Luckily, there is a high probability that queries return results that are obviously wrong.	2017-12-23 10:06:45 +01:00
ahr	888d25f7ea	trim docIdToDoc list This reduces memory usage by 1 or 2 MB. Up to 33% of an ArrayList's capacity can be unused. If the list is 1 million entries long, then the list wastes 2.6 MB. The Doc objects in the list are much bigger.	2017-12-23 09:42:08 +01:00
ahr	e59caa0f02	parallelize initialization of DataStore With the files already in the OS cache, the initialization time for 750k files went down from 35 seconds to 15 seconds.	2017-12-23 08:58:42 +01:00
ahr	a6251074cf	add trace logging to ExpressionToDocIdVisitor	2017-12-20 11:14:41 +01:00
ahr	04b029e1be	add trace logging	2017-12-16 19:19:12 +01:00
ahr	6ef4e7a96b	reduce memory footprint of index by trimming IntLists Reduced the memory usage of the IntLists in the index by 4.1MB (19.9MB to 15.8MB) for 683,390 files and 4,046,250 values in the IntLists.	2017-12-16 17:57:15 +01:00
ahr	8225dd2077	update primitiveCollections to 0.1.20171216143737 Use intersection and union methods from IntList.	2017-12-16 17:35:16 +01:00
andi	a6a2236d18	do not compute counts when proposing all keys	2017-11-18 13:03:45 +01:00
andi	f2868fcc1b	reduce memory footprint: old generation by 100 MB Doc now stores the path as a byte array instead of a Path. This reduces the size of the old generation by 100 MB (300 MB down to 200 MB). Unfortunately, the total JVM size didn't change and is still 512 MB.	2017-11-18 10:39:01 +01:00
andi	a636f2b9bd	update primitive collections to 0.1.20171007100354	2017-11-18 10:09:47 +01:00
andi	347f1fdc74	update 3rd-party libraries	2017-09-23 18:24:51 +02:00
andi	c9ff8b5586	only propose value if the existing prefix is a real prefix	2017-09-23 13:31:34 +02:00
andi	87858a79c1	compute proposals for blank strings Previously we only provided proposals for empty strings, but blank and empty strings are not that different.	2017-04-20 19:05:21 +02:00
andi	bcb2e6ca83	add query completion We are using ANTLR listeners to find out where in the query the cursor is. Then we generate a list of keys/values that might fit at that position. With that information we can generate new queries and sort them by the number of results they yield.	2017-04-17 16:25:14 +02:00
andi	f6a9fc2394	propose for an empty query	2017-04-16 10:39:17 +02:00
andi	44f30aafee	add a new facade in front of DataStore This is done in preparation for the proposal API. In order to compute proposals we need to consume the API of the DataStore, but the code does not need to be in the DataStore. Extracting the API allows us to separate these concerns.	2017-04-16 10:11:46 +02:00
andi	ac1ee20046	replace ludb with data-store LuDB has a few disadvantages. 1. Most notably disk space. H2 wastes a lot of valuable disk space. For my test data set with 44 million entries it is 14 MB (sometimes a lot more; depends on H2 internal cleanup). With data-store it is 15 KB. Overall I could reduce the disk space from 231 MB to 200 MB (13.4 % in this example). That is an average of 4.6 bytes per entry. 2. Speed: a) Liquibase is slow. The first time it takes approx. three seconds. b) Query and insertion. With data-store we can insert entries up to 1.6 times faster. Data-store uses a few tricks to save disk space: 1. We encode the tags into the file names. 2. To keep them short we translate the key/value of the tag into shorter numbers. For example "foo" -> 12 and "bar" -> 47. So the tag "foo"/"bar" would be 12/47. We then translate these numbers into a numeral system of base 62 (a-zA-Z0-9), so they can be used in file names and are shorter. That way we only have to store the mapping of string to int. 3. We store that mapping in a simple tab-separated file.	2017-04-16 09:07:28 +02:00
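
The base 62 translation described in ac1ee20046 could look roughly like the sketch below. It is not taken from the data-store code: the class name Base62, the digit order (a-z, then A-Z, then 0-9) and the choice to encode key and value separately are assumptions, since the commit message only names the character set. Under these assumptions 12 becomes "m" and 47 becomes "V".

```java
// Hypothetical helper, not part of data-store: converts non-negative
// numbers to and from a base 62 representation that is safe for file names.
final class Base62 {
    // Assumed digit order: a-z, A-Z, 0-9 (the commit only lists the character set).
    private static final String ALPHABET =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    private static final int BASE = ALPHABET.length(); // 62

    static String encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("value must be non-negative");
        }
        if (value == 0) {
            return String.valueOf(ALPHABET.charAt(0));
        }
        StringBuilder digits = new StringBuilder();
        while (value > 0) {
            // Least significant digit first; reversed before returning.
            digits.append(ALPHABET.charAt((int) (value % BASE)));
            value /= BASE;
        }
        return digits.reverse().toString();
    }

    static long decode(String text) {
        if (text.isEmpty()) {
            throw new IllegalArgumentException("text must not be empty");
        }
        long value = 0;
        for (int i = 0; i < text.length(); i++) {
            int digit = ALPHABET.indexOf(text.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("not a base 62 digit: " + text.charAt(i));
            }
            value = value * BASE + digit;
        }
        return value;
    }

    public static void main(String[] args) {
        // Tag "foo"/"bar" mapped to the numbers 12 and 47, as in the commit message.
        System.out.println(encode(12));
        System.out.println(encode(47));
    }
}
```

Numbers below 62 need a single character, and numbers below 3844 (62 * 62) need two, which is where the savings in file name length come from.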

19 Commits
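
Commit 2df66c7b2f notes that unsorted IntLists caused slow union/intersection algorithms to be chosen. The sketch below illustrates why sorting matters: on sorted input, both operations can be done in a single linear pass. It uses plain int arrays and a hypothetical class SortedIntSets, not the primitiveCollections API, and it assumes both inputs are sorted ascending and free of duplicates.

```java
import java.util.Arrays;

// Hypothetical illustration, not the primitiveCollections implementation.
// Both methods assume inputs that are sorted ascending without duplicates.
final class SortedIntSets {

    // Two-pointer merge: O(a.length + b.length) comparisons.
    static int[] intersection(int[] a, int[] b) {
        int[] result = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {
                result[n++] = a[i];
                i++;
                j++;
            }
        }
        // Trim the unused tail of the result array.
        return Arrays.copyOf(result, n);
    }

    static int[] union(int[] a, int[] b) {
        int[] result = new int[a.length + b.length];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                result[n++] = a[i++];
            } else if (a[i] > b[j]) {
                result[n++] = b[j++];
            } else {
                // Value present in both inputs: emit it once.
                result[n++] = a[i];
                i++;
                j++;
            }
        }
        // Copy whatever remains of the longer input.
        while (i < a.length) {
            result[n++] = a[i++];
        }
        while (j < b.length) {
            result[n++] = b[j++];
        }
        return Arrays.copyOf(result, n);
    }

    public static void main(String[] args) {
        int[] a = {1, 3, 5, 7};
        int[] b = {3, 4, 5, 8};
        System.out.println(Arrays.toString(intersection(a, b)));
        System.out.println(Arrays.toString(union(a, b)));
    }
}
```

Without the sorted guarantee, an intersection has to fall back to something like a nested scan or a hash-based lookup, which costs more time or more memory.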