Commit Graph

27 Commits

SHA1 Message Date
60f1a79816 add settings to file upload
This makes it possible to define properties for
the uploaded CSV files. Currently we can define the
separator and which columns are to be ignored.
2019-12-08 20:20:13 +01:00
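
A minimal sketch (in Java, matching the codebase) of what such upload settings could look like; the class name CsvReaderSettings appears in commit 30504672bc below, but the fields and method names here are assumptions:

    import java.util.Set;

    // Sketch of per-upload CSV settings: the separator character and the
    // columns to skip. Only the class name is taken from the history.
    public class CsvReaderSettings {
        private final char separator;
        private final Set<String> ignoredColumns;

        public CsvReaderSettings(char separator, Set<String> ignoredColumns) {
            this.separator = separator;
            this.ignoredColumns = Set.copyOf(ignoredColumns);
        }

        public char getSeparator() {
            return separator;
        }

        // True if the given column header should not be ingested.
        public boolean isIgnored(String columnName) {
            return ignoredColumns.contains(columnName);
        }
    }
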
85679ca0c8 send CSV file via REST 2019-12-08 18:39:43 +01:00
30504672bc extend CsvReaderSettings by list of columns that will not be indexed 2019-12-01 09:21:07 +01:00
08b1be5334 extract CSV reading code to new file
Refactoring to prepare the addition of CSV parsing rules. The parsing
rules will contain information about which columns to ingest or ignore.
This will be used to add the ability to upload files via HTTP post in
addition to the TcpIngestor.
2019-11-30 17:58:01 +01:00
06b379494f apply new code formatter and save action 2019-11-24 10:20:43 +01:00
10a7710940 fix CSV parser corrupting the duration when it is the last element in a line 2019-11-14 18:40:14 +01:00
b7c4fe4c1f move scatter plot creation into an AggregateHandler 2019-10-20 08:11:09 +02:00
2cb81e5acd make it possible to ignore columns using the CSV ingestor 2019-07-04 09:51:33 +02:00
dbe0e02517 rename cluster to partition
We are not clustering the indices; we
are partitioning them.
2019-04-14 10:10:16 +02:00
59aea1a15f introduce index clustering (part 1)
In order to prevent files from getting too big and
to make it easier to implement retention policies, we
are splitting all files into chunks. Each chunk
contains the data for a time interval (1 month by
default).
This first changeset introduces the ClusteredPersistentMap
that implements this for PersistentMap. It is used
for some (but not all) of the indices.
2019-02-24 16:50:57 +01:00
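
A minimal sketch of the chunking idea: each timestamp maps to a month-sized chunk key, so individual chunk files stay bounded and old months can be dropped wholesale. Epoch-millis input and UTC are assumptions; the real key scheme of the ClusteredPersistentMap is not shown in the log.

    import java.time.Instant;
    import java.time.YearMonth;
    import java.time.ZoneOffset;

    public class ChunkKeys {
        // Maps an epoch-millis timestamp to its month chunk,
        // e.g. 2019-02-24T16:50Z -> "2019-02".
        public static String chunkKeyFor(long epochMilli) {
            YearMonth month = YearMonth.from(
                    Instant.ofEpochMilli(epochMilli).atZone(ZoneOffset.UTC));
            return month.toString(); // ISO format, e.g. "2019-02"
        }
    }
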
668d73c926 introduce a new custom file format used for backup and ingestion
The new file format reduces repetition, is easy to parse,
easy to generate in any language, and is human readable.
2019-02-03 15:44:35 +01:00
4cde10a9f2 read csv using input stream instead of reader
We are now reading the CSV input without transforming
the data into strings. This reduces the number of bytes
that have to be converted and copied.
We also made Tag smaller. It no longer stores pointers
to strings; instead it stores integers obtained by
compressing the strings (see StringCompressor). This
reduces memory usage and speeds up hashCode and
equals, which speeds up access to the writer cache.

Performance gain is roughly 100%:
- 330k entries/s -> 670k entries/s, top speed measured over a second
- 62s -> 32s, to ingest 16 million entries
2019-01-01 08:31:28 +01:00
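
A minimal sketch of the string compression described above: an interning map that hands out dense int ids, so a Tag can hold two ints whose hashCode and equals are trivially cheap. Only the name StringCompressor comes from the commit; everything else is an assumption.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Hands out dense int ids for strings; the ids replace the
    // string references inside Tag.
    public class StringCompressor {
        private final Map<String, Integer> ids = new HashMap<>();
        private final List<String> strings = new ArrayList<>();

        // Returns the existing id for s, or assigns the next free one.
        public synchronized int compress(String s) {
            return ids.computeIfAbsent(s, key -> {
                strings.add(key);
                return strings.size() - 1;
            });
        }

        // Inverse mapping, e.g. for rendering query results.
        public synchronized String decompress(int id) {
            return strings.get(id);
        }
    }
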
40f4506e13 use FastISODateParser.parseAsEpochMilli
Compared to FastISODateParser.parse, which returns an
OffsetDateTime object, parseAsEpochMilli returns the
epoch time in milliseconds. The performance improvement
for date parsing alone is more than 100% (8M dates/s to
18M dates/s).
Insertion speed for 1.6M entries improved from 13-14 s
to 11.5-12.5 s.
2018-12-16 19:24:47 +01:00
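
A sketch of why parseAsEpochMilli can be so much faster: for a fixed ISO-8601 layout the digit positions are known, so the value can be computed directly without allocating an OffsetDateTime. This simplified version assumes the 'Z' (UTC) form with milliseconds and skips validation; it is not the real FastISODateParser.

    import java.time.LocalDate;

    public class EpochMilliSketch {
        // Parses "2011-12-03T10:15:30.123Z" straight to epoch millis.
        public static long parseAsEpochMilli(String s) {
            long epochDay = LocalDate.of(
                    digits(s, 0, 4), digits(s, 5, 2), digits(s, 8, 2)).toEpochDay();
            long secondOfDay = digits(s, 11, 2) * 3600L
                    + digits(s, 14, 2) * 60L
                    + digits(s, 17, 2);
            return (epochDay * 86_400L + secondOfDay) * 1000L + digits(s, 20, 3);
        }

        private static int digits(String s, int from, int len) {
            int value = 0;
            for (int i = from; i < from + len; i++) {
                value = value * 10 + (s.charAt(i) - '0');
            }
            return value;
        }
    }
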
23f800a441 add date parsing method that returns epochMillis instead of a date object 2018-12-16 15:38:26 +01:00
218ea9ed68 use custom date parser
A specialized date parser that can only handle ISO-8601-like dates
(2011-12-03T10:15:30.123Z or 2011-12-03T10:15:30+01:00) but does this
roughly 10 times faster than DateTimeFormatter and 5 times
faster than the FastDateParser of commons-lang3.
2018-11-19 19:23:57 +01:00
979e001efd TcpIngestor can handle CSV files 2018-10-11 18:56:16 +02:00
6d4e3da672 add test for sending entries with negative values to the ingestor 2018-10-07 09:08:25 +02:00
ad630fc6b2 simplify caching in TagsToFile
- PdbFiles no longer require dates to be monotonically
  increasing. Therefore TagsToFile does not have to ensure
  this. => We only have one file per Tags.
- Use EhCache instead of HashMap.
2018-09-30 10:38:25 +02:00
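
A minimal sketch of a bounded Ehcache 3 cache for the Tags-to-file lookup; the String key/value types, cache name, and size are assumptions standing in for the project's own classes.

    import org.ehcache.Cache;
    import org.ehcache.CacheManager;
    import org.ehcache.config.builders.CacheConfigurationBuilder;
    import org.ehcache.config.builders.CacheManagerBuilder;
    import org.ehcache.config.builders.ResourcePoolsBuilder;

    public class TagsToFileCacheSketch {
        public static void main(String[] args) {
            // Unlike a plain HashMap, the cache is bounded and evicts
            // old entries instead of growing without limit.
            CacheManager manager = CacheManagerBuilder.newCacheManagerBuilder()
                    .withCache("tagsToFile",
                            CacheConfigurationBuilder.newCacheConfigurationBuilder(
                                    String.class, String.class,
                                    ResourcePoolsBuilder.heap(10_000)))
                    .build(true);
            Cache<String, String> cache =
                    manager.getCache("tagsToFile", String.class, String.class);
            cache.put("host=web1,metric=load", "/data/a1b2c3.pdb");
            System.out.println(cache.get("host=web1,metric=load"));
            manager.close();
        }
    }
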
1182d76205 replace the FolderStorage with DiskStorage
- The DiskStorage uses only one file instead of millions.
  Also the block size is only 512 bytes instead of 4 KB, which
  helps to reduce the memory usage for short sequences.
- Update primitiveCollections to get the new LongList.range
  and LongList.rangeClosed methods.
- BSFile now stores Time&Value sequences and knows how to
  encode the time values with delta encoding.
- Doc had to do some magic tricks to save memory: the path
  was initialized lazily and stored as a byte array. This is
  no longer necessary; the path was replaced by the
  rootBlockNumber of the BSFile.
- Had to temporarily disable the 'in' queries.
- The stored values are now processed as a stream of LongLists
  instead of Entries. The overhead of creating Entries is
  gone, and so is the memory overhead: Entry was an object
  that held a reference to the tags, which is unnecessary.
2018-09-12 09:35:07 +02:00
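
A minimal sketch of the delta encoding mentioned for BSFile: store each timestamp as the difference to its predecessor. Successive timestamps are close together, so the deltas are small and encode compactly; the actual on-disk layout of BSFile is not shown in the log.

    public class DeltaEncodingSketch {
        // Replaces each timestamp with its delta to the previous one.
        public static long[] encode(long[] times) {
            long[] out = new long[times.length];
            long previous = 0;
            for (int i = 0; i < times.length; i++) {
                out[i] = times[i] - previous;
                previous = times[i];
            }
            return out;
        }

        // Inverse: a running sum restores the original timestamps.
        public static long[] decode(long[] deltas) {
            long[] out = new long[deltas.length];
            long previous = 0;
            for (int i = 0; i < deltas.length; i++) {
                previous += deltas[i];
                out[i] = previous;
            }
            return out;
        }
    }
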
bda2de672e improvements
- split the 'sortby' select field into two fields
- sort by average
- legend shows plotted and total values in the date range
- removed InlineDataSeries, because it was not used anymore
2018-05-01 17:32:25 +02:00
5343c0d427 reduce memory usage
Reduce memory usage by storing the filename as a string instead of
individual tags.
2018-03-19 19:21:57 +01:00
5a9aae70af handle corrupt JSON
Entries must be separated by a newline. This allows
us to handle corrupt JSON entries, because we know
that an entry can only start at the beginning of a line.
2018-03-03 09:58:50 +01:00
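
A minimal sketch of that recovery logic: because an entry can only start at the beginning of a line, a corrupt line can simply be skipped without losing the rest of the stream. Jackson is an assumed choice here; the log does not say which JSON library the project uses.

    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    public class NdjsonRecoverySketch {
        private static final ObjectMapper MAPPER = new ObjectMapper();

        // Parses one JSON entry per line, skipping lines that fail to parse.
        public static List<JsonNode> readEntries(BufferedReader reader)
                throws IOException {
            List<JsonNode> entries = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                try {
                    entries.add(MAPPER.readTree(line));
                } catch (IOException corrupt) {
                    // Corrupt entry: drop this line; the next entry
                    // starts at the next line beginning.
                }
            }
            return entries;
        }
    }
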
ac1ee20046 replace ludb with data-store
LuDB has a few disadvantages.
  1. Most notably disk space. H2 wastes a lot of valuable disk space.
     For my test data set with 44 million entries the overhead is 14 MB
     (sometimes a lot more; it depends on H2's internal cleanup). With
     data-store it is 15 KB.
     Overall I could reduce the disk space from 231 MB to 200 MB (13.4 %
     in this example). That is an average of 4.6 bytes per entry.
  2. Speed:
     a) Liquibase is slow. The first start alone takes approx. three seconds.
     b) Query and insertion: with data-store we can insert entries
        up to 1.6 times faster.

Data-store uses a few tricks to save disk space:
  1. We encode the tags into the file names.
  2. To keep them short we translate the key/value of the tag into
     shorter numbers, for example "foo" -> 12 and "bar" -> 47. So the
     tag "foo"/"bar" would be 12/47.
     We then translate each number into a numeral system of base 62
     (a-zA-Z0-9), so it can be used in file names and is shorter.
     That way we only have to store the mapping of string to int.
  3. We store that mapping in a simple tab-separated file.
2017-04-16 09:07:28 +02:00
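
A minimal sketch of the base-62 trick from the commit above: once a tag key or value has been translated to a small int (e.g. "foo" -> 12, "bar" -> 47), the int is rendered in base 62 so it stays short and filename-safe. The alphabet order a-zA-Z0-9 follows the commit text.

    public class Base62Sketch {
        private static final String ALPHABET =
                "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

        // Renders a non-negative int in base 62, e.g. 12 -> "m", 47 -> "V".
        public static String encode(int n) {
            if (n == 0) {
                return String.valueOf(ALPHABET.charAt(0));
            }
            StringBuilder sb = new StringBuilder();
            while (n > 0) {
                sb.append(ALPHABET.charAt(n % 62));
                n /= 62;
            }
            return sb.reverse().toString();
        }
    }
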
72436e9c8c extract utility method
a method to send json over tcp can be used by several tests
2017-04-08 08:21:50 +02:00
ed17d84da4 remove obsolete and disabled test 2017-04-08 08:18:17 +02:00
aadc9cbd21 move TcpIngestor to pdb-ui
and start it in the web application.
Also use the Spring way of handling property files.
2017-03-19 08:00:18 +01:00
d1e39513f3 create web application 2016-12-21 17:48:36 +01:00