perfdb

Author	SHA1	Message	Date
Andreas Huber	a99f6a276e	fix missing/wrong logging 1. Log the exception in PdbFileIterator with a logger instead of just printing it to stderr. 2. Increase log level for exceptions when inserting entries. 3. Log exception when creation of entry failed in TcpIngestor.	2017-04-17 18:27:25 +02:00
Andreas Huber	c58e7baf69	make sure there is an exception if the file is corrupt	2017-04-17 17:52:11 +02:00
Andreas Huber	bcb2e6ca83	add query completion We are using ANTLR listeners to find out where in the query the cursor is. Then we generate a list of keys/values that might fit at that position. With that information we can generate new queries and sort them by the number of results they yield.	2017-04-17 16:25:14 +02:00
Andreas Huber	f6a9fc2394	propose for an empty query	2017-04-16 10:39:17 +02:00
Andreas Huber	44f30aafee	add a new facade in front of DataStore This is done in preparation for the proposal API. In order to compute proposals we need to consume the API of the DataStore, but the code does not need to be in the DataStore. Extracting the API allows us to separate these concerns.	2017-04-16 10:11:46 +02:00
Andreas Huber	43d6eba7b7	skip entries if we cannot search for the pdb file Happened when the project was 'http:'.	2017-04-16 09:49:21 +02:00
Andreas Huber	ac1ee20046	replace ludb with data-store LuDB has a few disadvantages. 1. Most notably disk space. H2 wastes a lot of valuable disk space. For my test data set with 44 million entries it is 14 MB (sometimes a lot more; depends on H2 internal cleanup). With data-store it is 15 KB. Overall I could reduce the disk space from 231 MB to 200 MB (13.4 % in this example). That is an average of 4.6 bytes per entry. 2. Speed: a) Liquibase is slow. The first time it takes approx. three seconds. b) Query and insertion: with data-store we can insert entries up to 1.6 times faster. Data-store uses a few tricks to save disk space: 1. We encode the tags into the file names. 2. To keep them short we translate the key/value of the tag into shorter numbers. For example "foo" -> 12 and "bar" -> 47. So the tag "foo"/"bar" would be 12/47. We then translate this number into a numeral system of base 62 (a-zA-Z0-9), so it can be used for file names and it is shorter. That way we only have to store the mapping of string to int. 3. We do that in a simple tab separated file.	2017-04-16 09:07:28 +02:00
Andreas Huber	f22be73b42	switch the byte prefix of DATE_INCREMENT and MEASUREMENT Date increments usually have higher values. I had hoped to reduce the file size by a lot. But in my example data with 44 million entries (real life data) it only reduced the storage size by 1.5%. Also fixed a bug in PdbReader that prevented other values for the CONTINUATION byte. Also added a small testing tool that prints the content of a pdb file. It is not (yet) made available as a standalone tool, but during debugging sessions it is very useful.	2017-04-13 20:19:29 +02:00
Andreas Huber	58f8606cd3	use special logger for insertion metrics This allows us to enable/disable metric logging without having to log other stuff.	2017-04-13 20:12:00 +02:00
Andreas Huber	8baf05962f	group by multiple fields Before we could only group by a single field. But it is actually very useful to group by multiple fields. For example to see the graph for a small set of methods grouped by host and project.	2017-04-12 19:16:19 +02:00
Andreas Huber	b8b4a6d760	remove deprecated constructor and getter	2017-04-10 20:15:22 +02:00
Andreas Huber	ac8ad8d30f	close open files when no new entries are received If no new entry is received for 10 seconds, then all open files are flushed and closed. We do this to make sure that we do not lose data when we kill the process. There is still a risk of data loss if we kill the process while entries are received.	2017-04-10 20:13:10 +02:00
Andreas Huber	d72d6df0f4	update third-party libraries	2017-04-08 08:18:39 +02:00
Andreas Huber	2d78a70883	duration for inserts was wrong The bug was that we computed the difference between millis and nanos. Also log duration for flushes.	2017-04-02 11:15:24 +02:00
Andreas Huber	ee00ecb4b5	remove obsolete class	2017-03-20 19:02:01 +01:00
Andreas Huber	9ab5d76d93	better exception logging	2017-03-19 09:08:41 +01:00
Andreas Huber	a01c8b3907	fix flaky test and improve error handling just ignore invalid entries	2017-03-18 10:14:41 +01:00
Andreas Huber	513c256352	update third party libraries	2017-03-17 16:23:21 +01:00
Andreas Huber	3456177291	add date range filter	2017-03-17 11:17:57 +01:00
Andreas Huber	5aee6f5e4d	use label '<none>' for values that have no value for the groupBy field	2017-02-12 18:56:37 +01:00
Andreas Huber	562dadb692	group plots by field	2017-02-12 09:59:14 +01:00
Andreas Huber	b238849d65	use text input for filtering, again	2017-02-12 09:32:46 +01:00
Andreas Huber	0c9195011a	use log4j in pdb-ui	2017-02-05 11:20:00 +01:00
Andreas Huber	3722ba02b1	add slf4j via log4j 2 logging	2017-02-05 09:53:25 +01:00
Andreas Huber	175a866c90	update third-party libraries	2017-02-05 08:54:49 +01:00
Andreas Huber	4f77515bbd	test for keywords db performance	2017-01-07 09:10:42 +01:00
Andreas Huber	c283568757	group plots by a single field	2016-12-30 18:45:01 +01:00
Andreas Huber	62437f384f	minor unimportant changes	2016-12-30 13:16:30 +01:00
Andreas Huber	58bb64c80a	save 12 ms when checking if cached writer can be used	2016-12-29 19:33:45 +01:00
Andreas Huber	f520f18e13	leverage the cached pdbwriters This increased performance from 500 entries per second to 4000.	2016-12-29 19:24:16 +01:00
Andreas Huber	de241ceb6d	finalize refactoring	2016-12-29 18:27:15 +01:00
Andreas Huber	68ac1dd631	reuse pdb writers	2016-12-28 08:39:20 +01:00
Andreas Huber	db0b3d6d24	new file format Store values in sequences of variable length. Instead of using 8 bytes per entry we are now using between 2 and 20 bytes. But we are also able to store every non-negative long value.	2016-12-27 10:24:56 +01:00
Andreas Huber	c5f0e8514c	remove debug output	2016-12-23 19:28:11 +01:00
Andreas Huber	580733d267	only store the tag specific base folder in the database before that we added each file (one per day and tag combination) to the db	2016-12-23 19:12:30 +01:00
Andreas Huber	6969c8ce46	all storage files for the same tags use the same storage folder - added an additional data folder as first level	2016-12-23 16:35:00 +01:00
Andreas Huber	85eaee940e	change directory structure - the tags come first, then the date, e.g. "mykey=myvalue_<uuid>/2016/01/01/<uuid>" - We do this, so that we don't have to tag each file, but only the root folder. This should speed up searches	2016-12-23 15:07:08 +01:00
Andreas Huber	5efab12063	test which verifies the dates in each file are monotonically increasing	2016-12-23 13:04:05 +01:00
Andreas Huber	470f3c730d	add UT for testing multiple files for different days	2016-12-23 12:48:26 +01:00
Andreas Huber	95e34831d3	simple auto-completion for the search box	2016-12-23 10:32:51 +01:00
Andreas Huber	d1e39513f3	create web application	2016-12-21 17:48:36 +01:00
Andreas Huber	35054b00b8	check what starts faster json, ludb or mapdb	2016-12-17 10:54:54 +01:00
Andreas Huber	d4c694dea3	group results by a single field	2016-12-14 19:36:38 +01:00
Andreas Huber	b25060a5d2	add first most simple result object	2016-12-14 17:59:04 +01:00
Andreas Huber	fa4921fcc9	use custom csv writer for performance	2016-12-13 18:41:19 +01:00
Andreas Huber	876520eb4c	do not create a new ObjectMapper per entry Also read values with MappingIterator. This made reading 20-30 times faster. We can now read and index 100k-500k entries per second. The variance might be due to LuDB slowness.	2016-12-12 18:45:02 +01:00
Andreas Huber	89fbaf2d06	TcpIngestor that receives a stream of json objects and stores them	2016-12-11 18:40:44 +01:00
Andreas Huber	e936df6f7e	render plot with a single data series	2016-12-10 18:50:29 +01:00
Andreas Huber	81b39c5675	small enhancements	2016-12-10 15:36:06 +01:00
Andreas Huber	4376f8f783	log4j does not guarantee monotonically increasing date values	2016-12-10 15:35:29 +01:00
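
Commit ac1ee20046 describes mapping tag keys and values to small integers and writing those integers in base 62 (a-zA-Z0-9) so they fit into file names. The following is a minimal sketch of that idea only, not the data-store implementation: the class name `TagEncodingSketch`, the method names, the digit order of the alphabet, and the way ids are assigned are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; names and digit order are assumptions, not taken from data-store.
final class TagEncodingSketch {

    // Assumed digit order: a-z = 0..25, A-Z = 26..51, 0-9 = 52..61.
    private static final String ALPHABET =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    // String -> int mapping; the commit says this mapping is what gets persisted.
    private final Map<String, Integer> ids = new HashMap<>();

    // Returns the id of a key or value, assigning the next free id on first use.
    int idOf(String keyOrValue) {
        Integer id = ids.get(keyOrValue);
        if (id == null) {
            id = ids.size();
            ids.put(keyOrValue, id);
        }
        return id;
    }

    // Writes a non-negative number in base 62 using ALPHABET as digits.
    static String toBase62(long number) {
        if (number < 0) {
            throw new IllegalArgumentException("number must be non-negative");
        }
        if (number == 0) {
            return String.valueOf(ALPHABET.charAt(0));
        }
        StringBuilder digits = new StringBuilder();
        while (number > 0) {
            digits.append(ALPHABET.charAt((int) (number % 62)));
            number /= 62;
        }
        return digits.reverse().toString();
    }

    // Inverse of toBase62.
    static long fromBase62(String text) {
        long number = 0;
        for (int i = 0; i < text.length(); i++) {
            int digit = ALPHABET.indexOf(text.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("not a base 62 digit: " + text.charAt(i));
            }
            number = number * 62 + digit;
        }
        return number;
    }
}
```

With this assumed digit order, the ids 12 and 47 from the commit message become the single characters "m" and "V". How the encoded key and value are joined into an actual file name is not stated in the log and is left open here.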

53 Commits
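
Commit db0b3d6d24 switches the file format to variable-length sequences so that small values take fewer bytes while every non-negative long stays representable. The actual pdb byte layout (the DATE_INCREMENT, MEASUREMENT and CONTINUATION prefixes mentioned in f22be73b42) is not described in this log, so the sketch below uses a different, well-known scheme instead: LEB128-style varints with 7 payload bits per byte. The class name `VarIntSketch` and its methods are hypothetical.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of a generic varint; this is NOT the pdb file format.
final class VarIntSketch {

    // Encodes a non-negative long, 7 bits per byte, lowest bits first.
    // The high bit of each byte is set when more bytes follow.
    static byte[] encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("value must be non-negative");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (value >= 0x80) {
            out.write((int) (value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Decodes a single value that starts at index 0 of the given bytes.
    static long decode(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            if (shift > 56) {
                throw new IllegalArgumentException("too many bytes for a non-negative long");
            }
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
        throw new IllegalArgumentException("truncated input");
    }
}
```

In this scheme a value below 128 takes one byte and `Long.MAX_VALUE` takes nine, so the sizes differ from the 2 to 20 bytes per entry reported in the commit message; the sketch only illustrates the general trade-off between fixed 8-byte values and variable-length ones.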