perfdb

Author	SHA1	Message	Date
andi	06b379494f	apply new code formatter and save action	2019-11-24 10:20:43 +01:00
andi	4367323fcd	replace deprecated dependency configurations Using api and implementation instead of the deprecated compile configuration. Update to Gradle 6.0.	2019-11-10 11:08:50 +01:00
andi	7636781315	fix StringIndexOutOfBounds when caret is in position 0	2019-10-26 10:30:02 +02:00
andi	57ad6a1cee	update SpringBoot to 2.1.9 Also remove direct dependencies to log4j-api and log4j-core where possible. log4j-slf4j-impl is enough in many cases.	2019-10-04 20:15:09 +02:00
andi	0e9e2cd53a	remove dependency to Guava	2019-09-01 15:44:36 +02:00
andi	8579974051	performance improvement Queries like "firstname=John and lastname=???" were slightly inefficient. They fetched all firstnames, filtered to those that matched the prefix (e.g. John or Jonathan in this example) and then iterated over all those values and returned the lastnames. Fixed by having two implementations. One for the case that only a few of the values in fieldA match and one for the case that many match.	2019-08-31 19:30:54 +02:00
andi	d8a114dbaf	handle globlike patterns in in-expressions	2019-08-31 17:34:17 +02:00
andi	f8e859fb6d	cleanup and javadoc	2019-08-31 16:52:13 +02:00
andi	0eee012798	allow 'not' for negation in addition to '!'	2019-08-31 08:30:13 +02:00
andi	4161cd7f98	only field prefixes returned instead of full values When using autocomplete to return field values, I missed that autocomplete had a feature that cuts values at dots. So instead of returning full field values, only the prefix up to the first dot was returned. Fixed by making the cut-at-dot feature optional.	2019-08-27 20:37:07 +02:00
andi	2f35978184	fetch available values for gallery via autocomplete method We had a method that returned the values of a field with respect to a query. That method was inefficient, because it executed the query, fetched all Docs and collected the values. The autocomplete method we introduced a while back can answer the same question but much more efficiently.	2019-08-25 18:52:05 +02:00
andi	6eaf4e10fc	add maxSize parameter to HotEntryCache	2019-08-24 19:24:20 +02:00
andi	feda901f6d	remove event types We only have removal events. The additional complexity of having a generic interface for many different event types does not pay off.	2019-08-18 20:30:25 +02:00
andi	4d9ea6d2a8	switch back to my own HotEntryCache implementation Guava's cache does not evict elements reliably by time. If you configure a cache with a lifetime of n seconds, you cannot expect that an element is actually evicted after n seconds with Guava.	2019-08-18 20:14:14 +02:00
andi	3252fcf42d	improve trace logging - Add filename for trace logs for read/write operations.	2019-08-18 09:25:49 +02:00
andi	0b3eb97b96	Fix to string for maps with values of type Empty The MAX_KEY inserted into the tree had a value of one byte. This triggered an assertion for maps with values of type Empty, because they expected values to be empty. Fixed by using an empty array for the value of the MAX_KEY.	2019-08-12 08:35:40 +02:00
andi	9fb1a136c8	cache last used date prefix The 99.9999% use case is to ingest data from the same month.	2019-04-22 09:51:44 +02:00
andi	56085061ed	do not return anything if the field/value does not exist The computation of proposals is done by searching for values in a combined index. If one of the values didn't exist, then the algorithm returned all values. Fixed by checking that we query only existing field/values from the combined index.	2019-04-20 19:48:51 +02:00
andi	dbe0e02517	rename cluster to partition We are not clustering the indices, we are partitioning them.	2019-04-14 10:10:16 +02:00
andi	2a1885a77f	cluster the indices	2019-03-31 09:01:55 +02:00
andi	95f2f26966	handle IOExceptions earlier	2019-03-17 11:13:46 +01:00
andi	5d0ceb112e	add clustering for DiskStore	2019-03-17 10:53:02 +01:00
andi	b5e2d0a217	introduce clustering for query completion indices	2019-03-16 10:19:28 +01:00
andi	fb9f8592ac	make ClusteredPersistentMap easier to use	2019-02-24 19:20:44 +01:00
andi	59aea1a15f	introduce index clustering (part 1) In order to prevent files from getting too big and make it easier to implement retention policies, we are splitting all files into chunks. Each chunk contains the data for a time interval (1 month per default). This first changeset introduces the ClusteredPersistentMap that implements this for PersistentMap. It is used for a couple (not all) of indices.	2019-02-24 16:50:57 +01:00
andi	372a073b6d	PdbWriter is no longer in the API of DataStore	2019-02-16 16:24:14 +01:00
andi	92a47d9b56	remove TagsToFile Remove one layer of abstraction by moving the code into the DataStore.	2019-02-16 16:06:46 +01:00
andi	117ef4ea34	use guava's cache as implementation for the HotEntryCache My own implementation was faster, but was not able to implement a size limitation.	2019-02-16 10:23:52 +01:00
andi	7b00eede86	refactoring: extract EncoderDecoders from DataStore	2019-02-16 09:16:15 +01:00
andi	cbcb7714bb	split BSFile into a TimeSeries and a LongStream file BSFile was used to store two types of data. This makes the API complex. I split the API into two files with easier and more clear APIs. Interestingly the API of BSFile is still rather complex and has to consider both use cases.	2019-02-10 09:59:16 +01:00
andi	27b83234cc	group proposals as if they were hierarchical We interpret dots ('.') as hierarchy delimiters. That way we can reduce the number of proposed values and show only those for the next level.	2019-02-09 15:21:35 +01:00
andi	493971bcf3	values used in queries were added to the keys.csv Due to a mistake in Tag which added all strings used by Tag into the String dictionary, the dictionary did contain all values that were used in queries.	2019-02-09 08:28:23 +01:00
andi	ea5884a5e6	move creation of PdbWriter to the DataStore	2019-02-07 18:06:41 +01:00
andi	99cdf557b3	add metric logger for query completion evaluation	2019-02-06 15:51:41 +00:00
andi	668d73c926	introduced a new custom file format used for backup and ingestion The new file format reduces repetition, is easy to parse, easy to generate in any language and is human readable.	2019-02-03 15:44:35 +01:00
andi	d4d1685f9f	replace stdout with logger	2019-02-02 16:49:21 +01:00
andi	151e9363e1	remove obsolete classes	2019-02-02 16:45:34 +01:00
andi	76e5d441de	rewrite query completion The old implementation searched for all possible values and then executed each query to see what matches. The new implementation uses several indices to find only the matching values.	2019-02-02 15:35:56 +01:00
andi	72e9a9ebe3	prepare more efficient query completion adding an index that answers the question given a query "a=b and c=", what are possible values for c.	2019-01-13 10:22:17 +01:00
andi	5197063ae3	the union of many small lists is expensive The reason seems to be the number of memory allocations. In order to create the union of 100 lists we have 99 memory allocations. The first needs the space for the first two lists, the second the space for the first three lists, and so on. We can reduce the number of allocations drastically (in many cases to one) by leveraging the fact that many of the lists were already sorted, non-overlapping and increasing, so that we can simply concatenate them.	2019-01-05 08:52:56 +01:00
andi	4cde10a9f2	read csv using input stream instead of reader We are now reading the CSV input without transforming the data into strings. This reduces the number of bytes that have to be converted and copied. We also made Tag smaller. It no longer stores pointers to strings, instead it stores integers obtained by compressing the strings (see StringCompressor). This reduces memory usage and it speeds up hashcode and equals, which speeds up access to the writer cache. Performance gain is almost 100%: - 330k entries/s -> 670k entries/s, top speed measured over a second - 62s -> 32s, to ingest 16 million entries	2019-01-01 08:31:28 +01:00
andi	e537e94d39	HotEntryCache will update Instants only once per second Calling Instant.now() several hundred thousand times per second can be expensive. In my measurements >10% of the time spent when loading new data was spent calling Instant.now(). Fixed this by storing an Instant as static member and updating it periodically in a separate thread.	2018-12-21 19:16:55 +01:00
andi	253bbabd19	cleanup remove debug output	2018-11-25 07:49:23 +00:00
andi	593752470c	cleanup	2018-11-25 07:46:58 +01:00
andi	f78f69328b	add cache for docId to Doc mapping A Doc does not change once it is created, so it is easy to cache. Speedup was from 1ms per Doc to 3ms for 444 Docs (0.00675ms/Doc).	2018-11-22 19:51:07 +01:00
andi	cc0157fe0b	update java 3rd-party libs	2018-11-20 19:13:59 +01:00
andi	afd1e36066	fix unsupported operation exception when adding to an unmodifiable set	2018-11-19 19:19:51 +01:00
andi	135ab42cd8	tags are now stored as variable length byte sequences of longs Replaced Tags.filenameBytes with a SortedSet<Tag>. Tags are now stored as longs (variable length encoded) in the PersistentMap. Tags.filenameBytes was introduced to reduce memory consumption, when all tags were held in memory. Tags are now stored in a PersistentMap and only read when needed. Moved the VariableByteEncoder into its own project, because it was needed by pdb-api.	2018-11-17 20:03:46 +01:00
andi	fce0f6a04d	use PersistentMap in DataStore Replaces the use of in-memory data structures with the PersistentMap. This is the crucial step in reducing memory usage for both persistent storage and main memory.	2018-11-17 09:45:35 +01:00
andi	bd88c63aff	ensure BSFiles use blocks that are aligned to 512 Byte offsets	2018-10-14 09:00:26 +02:00
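
The idea in commit 5197063ae3 (the union of many small lists is expensive) can be sketched as follows. This is only an illustration under assumptions, not the code from the repository: the class name `SortedListUnion` and its methods are hypothetical, lists are modelled as `long[]`, and each input list is assumed to be sorted and free of duplicates. When the lists are increasing and non-overlapping, the union is built with a single allocation by concatenation; otherwise the sketch falls back to pairwise merging.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the project's implementation.
final class SortedListUnion {

    /** True if the values, read list by list, are strictly increasing overall. */
    static boolean canConcatenate(List<long[]> lists) {
        boolean first = true;
        long previous = 0;
        for (long[] list : lists) {
            for (long value : list) {
                if (!first && value <= previous) {
                    return false;
                }
                previous = value;
                first = false;
            }
        }
        return true;
    }

    /** Union of sorted, duplicate-free lists. */
    static long[] union(List<long[]> lists) {
        if (canConcatenate(lists)) {
            // Fast path: one allocation, then plain copies.
            int total = 0;
            for (long[] list : lists) {
                total += list.length;
            }
            long[] result = new long[total];
            int pos = 0;
            for (long[] list : lists) {
                System.arraycopy(list, 0, result, pos, list.length);
                pos += list.length;
            }
            return result;
        }
        // Slow path: pairwise merge, one or more allocations per step.
        long[] result = new long[0];
        for (long[] list : lists) {
            result = merge(result, list);
        }
        return result;
    }

    /** Merges two sorted, duplicate-free arrays and drops values present in both. */
    static long[] merge(long[] a, long[] b) {
        long[] tmp = new long[a.length + b.length];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                tmp[n++] = a[i++];
            } else if (a[i] > b[j]) {
                tmp[n++] = b[j++];
            } else {
                tmp[n++] = a[i++];
                j++;
            }
        }
        while (i < a.length) {
            tmp[n++] = a[i++];
        }
        while (j < b.length) {
            tmp[n++] = b[j++];
        }
        return Arrays.copyOf(tmp, n);
    }
}
```

The check in `canConcatenate` is a single pass without allocations, so it costs little compared to the merges it can avoid.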

103 Commits
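
Commit e537e94d39 describes replacing frequent `Instant.now()` calls with a static `Instant` that a separate thread refreshes periodically. A minimal sketch of that pattern, assuming a one-second refresh interval as in the commit title, might look like this. The class name `CachedClock` and the use of a `ScheduledExecutorService` are assumptions for illustration; the commit does not say how the updater thread is implemented.

```java
import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not the HotEntryCache code from the repository.
final class CachedClock {

    // volatile so that readers see the value written by the updater thread
    private static volatile Instant now = Instant.now();

    // Daemon thread, so the updater does not keep the JVM alive.
    private static final ScheduledExecutorService UPDATER =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "cached-clock-updater");
                thread.setDaemon(true);
                return thread;
            });

    static {
        // Refresh the cached Instant once per second.
        UPDATER.scheduleAtFixedRate(() -> now = Instant.now(), 1, 1, TimeUnit.SECONDS);
    }

    private CachedClock() {
    }

    /** Returns a timestamp that may be up to about one second old. */
    static Instant now() {
        return now;
    }
}
```

Callers trade accuracy for speed: a read is a single volatile field access, but the returned value can lag behind the real time by roughly the refresh interval, which is acceptable only when second-level precision is enough.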