perfdb

Author	SHA1	Message	Date
andi	151e9363e1	remove obsolete classes	2019-02-02 16:45:34 +01:00
andi	76e5d441de	rewrite query completion The old implementation searched for all possible values and then executed each query to see what matches. The new implementation uses several indices to find only the matching values.	2019-02-02 15:35:56 +01:00
andi	72e9a9ebe3	prepare more efficient query completion adding an index that answers the question: given a query "a=b and c=", what are the possible values for c?	2019-01-13 10:22:17 +01:00
andi	5197063ae3	the union of many small lists is expensive The reason seems to be the number of memory allocations. In order to create the union of 100 lists we have 99 memory allocations. The first needs the space for the first two lists, the second the space for the first three lists, and so on. We can reduce the number of allocations drastically (in many cases to one) by leveraging the fact that many of the lists were already sorted, non-overlapping and increasing, so that we can simply concatenate them.	2019-01-05 08:52:56 +01:00
andi	3dca7483de	utility that generates a csv with many different tags	2019-01-05 08:33:57 +01:00
andi	f2d16b6758	make CacheKey comparable The CacheKey is used as a key in a HashMap. Lookup can be faster if the CacheKey is comparable when there are hash collisions. In this case I was not able to measure any effect. I am keeping the comparables nonetheless, because they can only have a positive effect.	2019-01-01 08:47:48 +01:00
andi	4cde10a9f2	read csv using input stream instead of reader We are now reading the CSV input without transforming the data into strings. This reduces the number of bytes that have to be converted and copied. We also made Tag smaller. It no longer stores pointers to strings, instead it stores integers obtained by compressing the strings (see StringCompressor). This reduces memory usage and it speeds up hashcode and equals, which speeds up access to the writer cache. Performance gain is almost 100%: - 330k entries/s -> 670k entries/s, top speed measured over a second - 62s -> 32s, to ingest 16 million entries	2019-01-01 08:31:28 +01:00
andi	0487c30582	use List instead of TreeMap for intToString mapping UniqueStringIntegerPairs stores mappings of integers 0-n to strings and vice versa. Mapping integers to strings does not need a TreeMap, it can be done with a List. Makes insertions 3 times (when using the in-memory variant that does not write to disk) and 7 times faster for int to string mapping.	2018-12-22 10:07:19 +01:00
andi	e537e94d39	HotEntryCache will update Instants only once per second Calling Instant.now() several hundred thousand times per second can be expensive. In my measurements >10% of the time spent when loading new data was spent calling Instant.now(). Fixed this by storing an Instant as a static member and updating it periodically in a separate thread.	2018-12-21 19:16:55 +01:00
andi	d95a71e32e	batch entries between TcpIngestor and PerformanceDB One bottleneck was the blocking queue used to transport entries from the listener thread to the ingestor thread. Reduced the bottleneck by batching entries. Interestingly, a batch size of 100 was better than a batch size of 1000 and better than 10.	2018-12-21 13:11:35 +01:00
andi	73ad27ab96	remove lastAccessMap In the last commit I added a lastAccessMap to the HotEntryCache. This map made it much more efficient to evict entries. But it also made put and get operations much more expensive. Overall that change led to a 65% decrease in ingestion performance of the PerformanceDB. Fixed by removing the map again. Eviction has to look at all elements again.	2018-12-21 10:28:34 +01:00
andi	afba3b6f77	elements not evicted if new elements are added	2018-12-20 16:13:55 +01:00
andi	d52bfa0916	remove obsolete class RadixConverter	2018-12-17 19:11:33 +01:00
andi	3a4101bbf9	increase the buffer between ingestion and insertion thread I was finally able to show that there is a tiny but measurable effect of this buffer. I think it was not visible before, because the parsing was too slow. But now that I have replaced the date parser, the ingestion thread is twice as fast as the insertion thread. Therefore the buffer makes more sense.	2018-12-17 19:07:55 +01:00
andi	d37508b7a1	Pattern.split is faster than StringUtils.splitPreserveAllTokens Document the fact, so that I do not have to repeat the same test a third time.	2018-12-17 19:05:34 +01:00
andi	40f4506e13	use FastISODateParser.parseAsEpochMilli Compared to FastISODateParser.parse, which returns an OffsetDateTime object, parseAsEpochMilli returns the epoch time millis. The performance improvement for date parsing alone is roughly 100% (8m dates/s to 18m dates/s). Insertion speed improved from 13-14s for 1.6m entries to 11.5-12.5s.	2018-12-16 19:24:47 +01:00
andi	23f800a441	add date parsing method that returns epochMillis instead of date object	2018-12-16 15:38:26 +01:00
andi	20c555c30a	update 3rd party libs and gradle	2018-12-07 14:06:59 +01:00
andi	253bbabd19	cleanup remove debug output	2018-11-25 07:49:23 +00:00
andi	a86a473b4a	use unix line breaks	2018-11-25 07:49:04 +00:00
andi	37207d67ab	use utf-8 as resource encoding	2018-11-25 07:29:29 +00:00
andi	593752470c	cleanup	2018-11-25 07:46:58 +01:00
andi	5404253bc6	use TreeMap in PersistentMapDiskNode instead of list	2018-11-24 15:57:05 +01:00
andi	d67e452a91	cache disk blocks in an LRU cache Improves read access by a factor of 4 for small trees.	2018-11-24 15:07:37 +01:00
andi	9889252205	use only one thread for evictions Instead of spawning a new thread for every cache, we use a single thread that will evict entries from all caches. The thread keeps a weak reference to the caches, so that they can be garbage collected.	2018-11-24 08:32:05 +01:00
andi	64771417e4	only iterates over elements when at least one element can be evicted	2018-11-23 07:23:38 +01:00
andi	f78f69328b	add cache for docId to Doc mapping A Doc does not change once it is created, so it is easy to cache. Speedup was from 1ms per Doc to 3ms for 444 Docs (0.00675ms/Doc).	2018-11-22 19:51:07 +01:00
andi	6c546bd5b3	update primitiveCollections The new version comes with an improved removeAll method that is O(n+m) on sorted lists.	2018-11-21 18:55:54 +01:00
andi	cc0157fe0b	update java 3rd-party libs	2018-11-20 19:13:59 +01:00
andi	218ea9ed68	use custom date parser A specialized date parser that can only handle ISO-8601 like dates (2011-12-03T10:15:30.123Z or 2011-12-03T10:15:30+01:00) but does this roughly 10 times faster than DateTimeFormatter and 5 times faster than the FastDateParser of commons-lang3.	2018-11-19 19:23:57 +01:00
andi	6f48a25d53	do not force changes to disk diskBlock.force() makes insertion speed very slow, because it adds two-digit milliseconds to tree changes. I disabled it for now. The tree is not crash resistant anyway.	2018-11-19 19:22:27 +01:00
andi	afd1e36066	fix unsupported operation exception when adding to an unmodifiable set	2018-11-19 19:19:51 +01:00
andi	135ab42cd8	tags are now stored as variable length byte sequences of longs Replaced Tags.filenameBytes with a SortedSet<Tag>. Tags are now stored as longs (variable length encoded) in the PersistentMap. Tags.filenameBytes was introduced to reduce memory consumption, when all tags were held in memory. Tags are now stored in a PersistentMap and only read when needed. Moved the VariableByteEncoder into its own project, because it was needed by pdb-api.	2018-11-17 20:03:46 +01:00
andi	b2107acf4e	synchronize access to the PersistentMap The map is not (yet) thread-safe. Eventually we'll replace the synchronized blocks with read/write locks on the nodes.	2018-11-17 10:02:29 +01:00
andi	fce0f6a04d	use PersistentMap in DataStore Replaces the use of in-memory data structures with the PersistentMap. This is the crucial step in reducing memory usage for both persistent storage and main memory.	2018-11-17 09:45:35 +01:00
andi	3ccf526608	PersistentMap now requires only a path instead of a DiskStorage This makes the PersistentMap easier to use.	2018-11-10 10:08:21 +01:00
andi	e90506c1b0	add visitor that finds all values by a prefix of the key	2018-11-10 09:48:36 +01:00
andi	807257d330	remove the unused node visitor	2018-11-04 10:44:05 +01:00
andi	008f0db377	add generics to PersistentMap	2018-11-04 10:42:05 +01:00
andi	f2d5c27668	insertion of many values into the persistent map	2018-11-04 10:11:10 +01:00
andi	c6782df0e5	the root node can have more than two children if it is an inner node It is not yet possible to split inner nodes or the root node.	2018-10-27 10:17:45 +02:00
andi	8b48b8c3e7	add a pointer to the root node Previously the offset of the root node was hard-coded. Now the offset of the pointer to the root node is hard-coded. That allows us to replace the root node.	2018-10-27 08:55:15 +02:00
andi	8bb98deb1e	PersistentMap can store data in multiple nodes	2018-10-26 18:35:32 +02:00
andi	bb4514c940	insert values into root node	2018-10-14 19:53:02 +02:00
andi	3855d03ead	BSFile uses a wrapper for DiskBlock to add BSFile specific stuff This keeps the DiskBlock class clean, so that it can be used for PersistentMap.	2018-10-14 17:13:33 +02:00
andi	c83b6e11e2	Add first part of a persistent map implementation.	2018-10-14 16:47:17 +02:00
andi	bd88c63aff	ensure BSFiles use blocks that are aligned to 512 Byte offsets	2018-10-14 09:00:26 +02:00
andi	a2520c0238	move method only used in tests to the tests	2018-10-13 20:03:02 +02:00
andi	b42fec8fe2	use var keyword	2018-10-13 10:14:52 +02:00
andi	b42bb88dff	DiskStorage can allocate and free blocks of arbitrary sizes	2018-10-13 10:03:41 +02:00
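
Commit 5197063ae3 above explains why building the union of many small lists step by step is expensive and how concatenation avoids most of the allocations. The following is only a rough sketch of that idea, not the repository's code: the class name SortedUnionSketch, the method union and the use of long[] are assumptions made for illustration. It assumes every input list is strictly increasing. If each list starts above the last element of its predecessor, the lists are copied into one pre-sized array; otherwise the sketch falls back to sorting and removing duplicates.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not taken from the repository.
final class SortedUnionSketch {

    /**
     * Returns the sorted, duplicate-free union of the given lists.
     * Assumption: every input list is strictly increasing.
     */
    static long[] union(final List<long[]> lists) {
        int total = 0;
        boolean canConcatenate = true;
        boolean seenNonEmpty = false;
        long previousLast = 0;

        // One pass to compute the total size and to check whether the
        // lists are non-overlapping and in increasing order.
        for (final long[] list : lists) {
            total += list.length;
            if (list.length == 0) {
                continue;
            }
            if (seenNonEmpty && list[0] <= previousLast) {
                canConcatenate = false;
            }
            previousLast = list[list.length - 1];
            seenNonEmpty = true;
        }

        // A single allocation for the concatenation of all lists.
        final long[] result = new long[total];
        int pos = 0;
        for (final long[] list : lists) {
            System.arraycopy(list, 0, result, pos, list.length);
            pos += list.length;
        }
        if (canConcatenate) {
            return result;
        }

        // Fallback for overlapping lists: sort, then drop duplicates in place.
        Arrays.sort(result);
        int n = 0;
        for (int i = 0; i < result.length; i++) {
            if (n == 0 || result[i] != result[n - 1]) {
                result[n++] = result[i];
            }
        }
        return Arrays.copyOf(result, n);
    }
}
```

On the fast path the sketch allocates exactly one array, however many lists are passed in. The fallback is deliberately simple; a real implementation would more likely merge the lists than sort them.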

703 Commits
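
Commit e537e94d39 in the log describes replacing frequent Instant.now() calls with an Instant that is stored in a static member and refreshed periodically by a separate thread. A minimal sketch of that pattern could look as follows. The class name CoarseClockSketch, the thread name and the use of a ScheduledExecutorService are assumptions for illustration; the actual HotEntryCache code may differ.

```java
import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical class for illustration; not taken from the repository.
final class CoarseClockSketch {

    // volatile, so that readers see the value written by the updater thread.
    private static volatile Instant now = Instant.now();

    // A single daemon thread, so the updater does not keep the JVM alive.
    private static final ScheduledExecutorService UPDATER =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                final Thread thread = new Thread(runnable, "coarse-clock-updater");
                thread.setDaemon(true);
                return thread;
            });

    static {
        // Refresh the cached Instant once per second.
        UPDATER.scheduleAtFixedRate(() -> now = Instant.now(), 1, 1, TimeUnit.SECONDS);
    }

    private CoarseClockSketch() {
    }

    /** Returns a timestamp that may lag the real time by roughly one second. */
    static Instant now() {
        return now;
    }
}
```

Callers on the hot path read CoarseClockSketch.now(), which is a plain volatile read. The trade-off is precision: the value can be about a second stale, which is acceptable only where timestamps are used for coarse decisions such as cache eviction.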