perfdb

Author	SHA1	Message	Date
Andreas Huber	01b93e32ca	replace EhCache with a custom implementation The cache must remove/evict writers after a few seconds, but EhCache only evicts entries when a new entry is added. That is not acceptable for us, because it would leave lots of files open and we would need a second mechanism to close them. Therefore I wrote a simple wrapper for a ConcurrentHashMap that evicts entries after timeToLive+5s.	2018-10-03 20:22:45 +02:00
Andreas Huber	c9dcc77b53	reuse existing PdbFiles	2018-10-03 16:49:46 +02:00
Andreas Huber	60578b45ec	PdbWriters are now closed by the cache. In TagsToFile we no longer have to close the files when the input streams are idle.	2018-10-03 16:47:29 +02:00
Andreas Huber	ad630fc6b2	simplify caching in TagsToFile - PdbFiles no longer require dates to be monotonically increasing. Therefore TagsToFile does not have to ensure this. => We only have one file per Tags. - Use EhCache instead of HashMap.	2018-09-30 10:38:25 +02:00
Andreas Huber	f07977c27a	update java, gradle and third party libs	2018-09-29 09:08:29 +02:00
Andreas Huber	24fcfd7763	prepare the addition of a date index	2018-09-28 19:07:01 +02:00
Andreas Huber	84350c4dfb	move TimeStampDeltaDecoder to BSFile Now the encoding and decoding code is in the same class.	2018-09-13 13:08:45 +02:00
Andreas Huber	a2e63cca44	cleanup	2018-09-13 08:11:15 +02:00
Andreas Huber	1182d76205	replace the FolderStorage with DiskStorage - The DiskStorage uses only one file instead of millions. Also the block size is only 512 bytes instead of 4 KB, which helps to reduce the memory usage for short sequences. - Update primitiveCollections to get the new LongList.range and LongList.rangeClosed methods. - BSFile now stores Time&Value sequences and knows how to encode the time values with delta encoding. - Doc had to do some magic tricks to save memory. The path was initialized lazily and stored as a byte array. This is no longer necessary. The path was replaced by the rootBlockNumber of the BSFile. - Had to temporarily disable the 'in' queries. - The stored values are now processed as a stream of LongLists instead of Entry. The overhead for creating Entries is gone, so is the memory overhead, because Entry was an object and had a reference to the tags, which is unnecessary.	2018-09-12 09:35:07 +02:00
Andreas Huber	89840cf9e9	update dependencies	2018-07-28 08:50:42 +02:00
Andreas Huber	daaa0e6907	update dependencies gradle to 4.8 jackson to 2.9.6 spring-boot to 2.0.3 guava to 25.1-jre gradle-versions-plugin to 0.19.0	2018-06-17 08:59:48 +02:00
Andreas Huber	911062e26b	use RandomAccessFile in FolderStorage.getPathByOffset() The old implementation opened a new buffered reader every time getPathByOffset was called. This took 1/20th of a second or longer. For queries that visited thousands of files this could take a long time. We are now using a RandomAccessFile that is opened once. The average time spent in getPathByOffset is now down to 0.11ms.	2018-05-10 10:22:25 +02:00
Andreas Huber	82b8a8a932	reduce memory footprint by lazily initializing the path in Doc The path in Doc is now optional. This reduces memory consumption, because we only have to store a long (the offset in the listing file). This assumes that only a small percentage of Docs is requested.	2018-05-06 12:58:10 +02:00
Andreas Huber	e3102c01d4	use listing.csv instead of iterating through all folders The hope is, that it is faster to read a single file instead of listing hundreds of folders.	2018-05-05 10:46:16 +02:00
Andreas Huber	b06ccb0d00	update 3rd party libs spring boot to 2.0.1 guava to 24.1-jre jackson to 2.9.5 log4j2 to 2.10.0 (same version as pulled by spring boot) testng to 6.14.3	2018-04-21 20:01:39 +02:00
Andreas Huber	22c99f8517	fix null pointer exception Filenames were generated without '$', but the parsing code expected the '$'.	2018-03-28 19:34:48 +02:00
Andreas Huber	81711d551f	fix performance regression The last improvement of memory usage introduced a performance regression. The ingestion performance dropped by 50%-80%, because for every inserted entry the Tags were created inefficiently.	2018-03-27 19:30:18 +02:00
Andreas Huber	5343c0d427	reduce memory usage Reduce memory usage by storing the filename as string instead of individual tags.	2018-03-19 19:21:57 +01:00
ahr	3387ebc134	use epoch millis instead of creating a date object We only have to check if one timestamp is newer than another. We don't have to create an expensive date object to do that.	2018-03-09 08:43:37 +01:00
ahr	7e5b762c0d	pre-compute firstByteMaxValue this operation is executed very often during ingestion	2018-03-09 08:38:58 +01:00
ahr	6b60fd542c	add percentile plots	2018-03-03 08:19:26 +01:00
Andreas Huber	9f45eb24ca	add trace logging for creation of new writer	2018-01-21 08:36:40 +01:00
ahr	740cb1cb2d	print metrics every 10 seconds, not every 10.001 seconds	2018-01-14 09:52:08 +01:00
ahr	d98c45e8bd	add index for tags-to-documents Now we can find the writer much faster, because we don't have to execute a query for documents that match the tags. We can just look up the documents in the map. Speedup: 2-4ms -> 0.002-0.01ms	2018-01-14 09:51:37 +01:00
ahr	64613ce43c	add metric logging for getWriter	2018-01-13 10:32:03 +01:00
ahr	3cc512f73d	update third party libs testng 6.11 -> 6.13.1 jackson-databind 2.9.1 -> 2.9.3 guava 23.0 -> 23.6-jre	2017-12-30 10:06:57 +01:00
ahr	cafaa7343c	remove obsolete method	2017-12-16 19:20:38 +01:00
ahr	04b029e1be	add trace logging	2017-12-16 19:19:12 +01:00
ahr	d63fabc85d	prevent parallel plot requests Plotting can take a long time and use a lot of resources. Multiple plot requests can cause the machine to run OOM. We are now allowing plots for 500k files again. This is mainly to prevent unwanted plots of everything.	2017-12-15 17:20:12 +01:00
ahr	8d48726472	remove unnecessary mapping to TagSpecificBaseDir	2017-12-15 16:52:20 +01:00
ahr	8860a048ff	remove call of listRecursively on a file The call was needed in a very early version.	2017-12-10 17:55:16 +01:00
ahr	3ee6336125	log time of query execution	2017-12-10 17:52:32 +01:00
ahr	06d25e7ceb	do not allow search results with more than 100k docs a) they take a long time to compute b) danger of OOM c) they should drill down	2017-12-10 09:19:28 +01:00
Andreas Huber	cc49a8cf2a	open PdbReaders only when reading We used to open all PdbReaders in a search result and then iterate over them. This used a lot of heap space (> 8GB) for 400k files. Now the PdbReaders are only opened while they are used. Heap usage was less than 550 while reading more than 400k files.	2017-11-18 10:12:22 +01:00
ahr	64db4c48a2	add plots for percentiles	2017-11-06 16:57:22 +01:00
Andreas Huber	a7cd918fc6	skip empty files	2017-09-24 17:12:17 +02:00
Andreas Huber	347f1fdc74	update 3rd-party libraries	2017-09-23 18:24:51 +02:00
Andreas Huber	38873300c8	print last inserted entry during ingestion	2017-09-23 10:55:03 +02:00
Andreas Huber	dc716f8ac4	log more information in a more predictable manner when inserting entries	2017-04-19 19:32:23 +02:00
Andreas Huber	a99f6a276e	fix missing/wrong logging 1. Log the exception in PdbFileIterator with a logger instead of just printing it to stderr. 2. Increase log level for exceptions when inserting entries. 3. Log exception when creation of entry failed in TcpIngestor.	2017-04-17 18:27:25 +02:00
Andreas Huber	c58e7baf69	make sure there is an exception if the file is corrupt	2017-04-17 17:52:11 +02:00
Andreas Huber	bcb2e6ca83	add query completion We are using ANTLR listeners to find out where in the query the cursor is. Then we generate a list of keys/values that might fit at that position. With that information we can generate new queries and sort them by the number of results they yield.	2017-04-17 16:25:14 +02:00
Andreas Huber	f6a9fc2394	propose for an empty query	2017-04-16 10:39:17 +02:00
Andreas Huber	44f30aafee	add a new facade in front of DataStore This is done in preparation for the proposal API. In order to compute proposals we need to consume the API of the DataStore, but the code does not need to be in the DataStore. Extracting the API allows us to separate these concerns.	2017-04-16 10:11:46 +02:00
Andreas Huber	43d6eba7b7	skip entries if we cannot search for the pdb file Happened when the project was 'http:'.	2017-04-16 09:49:21 +02:00
Andreas Huber	ac1ee20046	replace ludb with data-store LuDB has a few disadvantages. 1. Most notably disk space. H2 wastes a lot of valuable disk space. For my test data set with 44 million entries it is 14 MB (sometimes a lot more; depends on H2 internal cleanup). With data-store it is 15 KB. Overall I could reduce the disk space from 231 MB to 200 MB (13.4 % in this example). That is an average of 4.6 bytes per entry. 2. Speed: a) Liquibase is slow. The first time it takes approx. three seconds b) Query and insertion. with data-store we can insert entries up to 1.6 times faster. Data-store uses a few tricks to save disk space: 1. We encode the tags into the file names. 2. To keep them short we translate the key/value of the tag into shorter numbers. For example "foo" -> 12 and "bar" to 47. So the tag "foo"/"bar" would be 12/47. We then translate this number into a numeral system of base 62 (a-zA-Z0-9), so it can be used for file names and it is shorter. That way we only have to store the mapping of string to int. 3. We do that in a simple tab separated file.	2017-04-16 09:07:28 +02:00
Andreas Huber	f22be73b42	switch the byte prefix of DATE_INCREMENT and MEASUREMENT Date increments usually have higher values. I had hoped to reduce the file size by a lot. But in my example data with 44 million entries (real life data) it only reduced the storage size by 1.5%. Also fixed a bug in PdbReader that prevented other values for the CONTINUATION byte. Also added a small testing tool that prints the content of a pdb file. It is not (yet) made available as a standalone tool, but during debugging sessions it is very useful.	2017-04-13 20:19:29 +02:00
Andreas Huber	58f8606cd3	use special logger for insertion metrics This allows us to enable/disable metric logging without having to log other stuff.	2017-04-13 20:12:00 +02:00
Andreas Huber	8baf05962f	group by multiple fields Before we could only group by a single field. But it is actually very useful to group by multiple fields. For example to see the graph for a small set of methods grouped by host and project.	2017-04-12 19:16:19 +02:00
Andreas Huber	b8b4a6d760	remove deprecated constructor and getter	2017-04-10 20:15:22 +02:00
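
Commit ac1ee20046 describes mapping each tag key/value to a small integer and writing that integer in base 62 (a-zA-Z0-9) so that file names stay short. The following is a minimal sketch of such an encoding, not the project's actual code: the class name `Base62`, the method names, and the exact digit order (a-z, then A-Z, then 0-9, taken literally from the commit message) are assumptions.

```java
/** Hypothetical helper; name and digit order are assumptions, not taken from perfdb. */
final class Base62 {
    // Digit order a-z, A-Z, 0-9 as listed in the commit message (assumed to be literal).
    private static final String ALPHABET =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    private Base62() {
    }

    /** Encodes a non-negative number as a base-62 string, most significant digit first. */
    static String encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("value must be non-negative: " + value);
        }
        if (value == 0) {
            return String.valueOf(ALPHABET.charAt(0));
        }
        final StringBuilder digits = new StringBuilder();
        while (value > 0) {
            // Collect the least significant digit first, reverse at the end.
            digits.append(ALPHABET.charAt((int) (value % 62)));
            value /= 62;
        }
        return digits.reverse().toString();
    }

    /** Decodes a string produced by encode back into the number. */
    static long decode(String encoded) {
        long result = 0;
        for (int i = 0; i < encoded.length(); i++) {
            final int digit = ALPHABET.indexOf(encoded.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("not a base-62 digit: " + encoded.charAt(i));
            }
            result = result * 62 + digit;
        }
        return result;
    }
}
```

With this digit order, the example ids from the commit message, 12 and 47, become `m` and `V`. How the real implementation joins the key and value parts inside a file name is not stated in the log, so that step is left out here.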

92 Commits
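
Commit 01b93e32ca replaces EhCache with a simple wrapper around a ConcurrentHashMap that evicts entries after a time to live, so that idle writers get closed. The sketch below illustrates that idea under several assumptions: the class name `TtlCache`, measuring the time to live from the last access, the eviction callback, and passing the current time explicitly (to keep the example deterministic) are all hypothetical and not taken from the repository. In a real setup `evictExpired` would presumably be called periodically, for example from a scheduled background task.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

/** Hypothetical time-to-live cache; a sketch, not the perfdb implementation. */
final class TtlCache<K, V> {

    /** Value together with the time it was last used. */
    private static final class Timed<V> {
        final V value;
        volatile long lastAccessMillis;

        Timed(V value, long nowMillis) {
            this.value = value;
            this.lastAccessMillis = nowMillis;
        }
    }

    private final ConcurrentHashMap<K, Timed<V>> map = new ConcurrentHashMap<>();
    private final long timeToLiveMillis;
    private final BiConsumer<K, V> onEvict;

    /** onEvict is called for every evicted entry, e.g. to close a file writer. */
    TtlCache(long timeToLiveMillis, BiConsumer<K, V> onEvict) {
        this.timeToLiveMillis = timeToLiveMillis;
        this.onEvict = onEvict;
    }

    void put(K key, V value, long nowMillis) {
        map.put(key, new Timed<>(value, nowMillis));
    }

    /** Returns the cached value and refreshes its last-access time, or null if absent. */
    V get(K key, long nowMillis) {
        final Timed<V> timed = map.get(key);
        if (timed == null) {
            return null;
        }
        timed.lastAccessMillis = nowMillis;
        return timed.value;
    }

    /** Removes all entries that were not accessed within the time to live. */
    void evictExpired(long nowMillis) {
        for (Map.Entry<K, Timed<V>> entry : map.entrySet()) {
            final Timed<V> timed = entry.getValue();
            final boolean expired = nowMillis - timed.lastAccessMillis >= timeToLiveMillis;
            // remove(key, value) only succeeds if the entry was not replaced meanwhile.
            if (expired && map.remove(entry.getKey(), timed)) {
                onEvict.accept(entry.getKey(), timed.value);
            }
        }
    }

    int size() {
        return map.size();
    }
}
```

A caller could pass `System.currentTimeMillis()` as `nowMillis` and close the writer inside the callback. The sketch does not guard against a `get` that races with the eviction of the same entry; a production version would need to handle that case.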