Commit Graph

703 Commits

Author SHA1 Message Date
151e9363e1 remove obsolete classes 2019-02-02 16:45:34 +01:00
76e5d441de rewrite query completion
The old implementation searched for all possible values and then
executed each query to see what matches.
The new implementation uses several indices to find only
the matching values.
2019-02-02 15:35:56 +01:00
72e9a9ebe3 prepare more efficient query completion
adding an index that answers the question
given a query "a=b and c=", what are possible values
for c.
2019-01-13 10:22:17 +01:00
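
The index described in 72e9a9ebe3 could be sketched as follows. This is a minimal in-memory illustration, assuming documents are plain field/value string pairs; `CompletionIndexSketch`, `add` and `complete` are hypothetical names and not the project's classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch: for every tag "a=b" remember which values each
// other field takes in documents that also carry "a=b".
final class CompletionIndexSketch {
    // "a=b" -> field "c" -> values of c seen together with a=b
    private final Map<String, Map<String, SortedSet<String>>> index = new HashMap<>();

    // Index one document, given as field -> value.
    void add(Map<String, String> doc) {
        for (Map.Entry<String, String> known : doc.entrySet()) {
            String tag = known.getKey() + "=" + known.getValue();
            Map<String, SortedSet<String>> byField =
                    index.computeIfAbsent(tag, k -> new HashMap<>());
            for (Map.Entry<String, String> other : doc.entrySet()) {
                if (!other.getKey().equals(known.getKey())) {
                    byField.computeIfAbsent(other.getKey(), k -> new TreeSet<>())
                           .add(other.getValue());
                }
            }
        }
    }

    // Answers: given "field=value and targetField=", which values are possible?
    SortedSet<String> complete(String field, String value, String targetField) {
        return index.getOrDefault(field + "=" + value, Map.of())
                    .getOrDefault(targetField, new TreeSet<>());
    }
}
```

A completion request then becomes a lookup such as `complete("a", "b", "c")`, with no query executed per candidate value.
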
5197063ae3 the union of many small lists is expensive
The reason seems to be the number of memory allocations. In order
to create the union of 100 lists we have 99 memory allocations.
The first needs the space for the first two lists, the second the
space for the first three lists, and so on.

We can reduce the number of allocations drastically (in many
cases to one) by leveraging the fact that many of the lists
were already sorted, non-overlapping and increasing, so that
we can simply concatenate them.
2019-01-05 08:52:56 +01:00
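
The concatenation shortcut from 5197063ae3 can be sketched like this. It assumes each list is a sorted `long[]` and only checks the boundaries between neighbouring lists; the class and method names are made up for illustration.

```java
import java.util.List;

// Hypothetical sketch using plain long[] arrays as the lists.
final class SortedListUnionSketch {
    // True if each non-empty list starts after the previous non-empty
    // list ends. Assumes every list is itself sorted ascending.
    static boolean canConcatenate(List<long[]> lists) {
        boolean hasPrevious = false;
        long previousLast = 0;
        for (long[] list : lists) {
            if (list.length == 0) {
                continue;
            }
            if (hasPrevious && list[0] <= previousLast) {
                return false;
            }
            previousLast = list[list.length - 1];
            hasPrevious = true;
        }
        return true;
    }

    // Builds the union with a single allocation of the final size.
    static long[] concatenate(List<long[]> lists) {
        int total = 0;
        for (long[] list : lists) {
            total += list.length;
        }
        long[] result = new long[total];
        int pos = 0;
        for (long[] list : lists) {
            System.arraycopy(list, 0, result, pos, list.length);
            pos += list.length;
        }
        return result;
    }
}
```

When `canConcatenate` returns false, a caller would fall back to a regular merge-based union.
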
3dca7483de utility that generates a csv with many different tags 2019-01-05 08:33:57 +01:00
f2d16b6758 make CacheKey comparable
The CacheKey is used as a key in a HashMap. Lookup can
be faster if the CacheKey is comparable when there are
hash collisions.
In this case I was not able to measure any effect. I am
keeping the comparables nonetheless, because they can
only have a positive effect.
2019-01-01 08:47:48 +01:00
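
Background for f2d16b6758: since Java 8, `java.util.HashMap` turns buckets with many colliding entries into balanced trees, and for keys that implement `Comparable` it can order entries with equal hash codes via `compareTo`. The key below is a hypothetical stand-in with placeholder fields, not the project's CacheKey.

```java
import java.util.Objects;

// Hypothetical key; the fields are placeholders for illustration.
final class ComparableKeySketch implements Comparable<ComparableKeySketch> {
    private final String name;
    private final long id;

    ComparableKeySketch(String name, long id) {
        this.name = name;
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ComparableKeySketch)) {
            return false;
        }
        ComparableKeySketch other = (ComparableKeySketch) o;
        return id == other.id && name.equals(other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, id);
    }

    // Consistent with equals: returns 0 exactly when both fields match.
    @Override
    public int compareTo(ComparableKeySketch other) {
        int byName = name.compareTo(other.name);
        return byName != 0 ? byName : Long.compare(id, other.id);
    }
}
```
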
4cde10a9f2 read csv using input stream instead of reader
We are now reading the CSV input without transforming
the data into strings. This reduces the amount of bytes
that have to be converted and copied.
We also made Tag smaller. It no longer stores pointers
to strings, instead it stores integers obtained by
compressing the strings (see StringCompressor). This
reduces memory usage and it speeds up hashcode and
equals, which speeds up access to the writer cache.

Performance gain is almost 100%:
- 330k entries/s -> 670k entries/s, top speed measured over a second
- 62s -> 32s, to ingest 16 million entries
2019-01-01 08:31:28 +01:00
0487c30582 use List instead of TreeMap for intToString mapping
UniqueStringIntegerPairs stores mappings of integers
0-n to strings and vice versa. Mapping integers to
strings does not need a TreeMap, it can be done with
a List.
Makes insertions 3 times faster (when using the in-memory
variant that does not write to disk) and the int to string
mapping 7 times faster.
2018-12-22 10:07:19 +01:00
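
The idea behind 0487c30582, sketched under the assumption that ids are handed out densely as 0, 1, 2, ...: the list index is the id, so no sorted map is needed for the int to string direction. `StringIntPairsSketch` is a hypothetical in-memory illustration, not the real UniqueStringIntegerPairs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a bidirectional string <-> int mapping.
final class StringIntPairsSketch {
    private final Map<String, Integer> stringToInt = new HashMap<>();
    private final List<String> intToString = new ArrayList<>();

    // Returns the existing id or assigns the next free one (0, 1, 2, ...).
    int idOf(String s) {
        Integer id = stringToInt.get(s);
        if (id == null) {
            id = intToString.size();
            intToString.add(s); // the list index is the id
            stringToInt.put(s, id);
        }
        return id;
    }

    // Constant-time lookup by index instead of a tree traversal.
    String stringOf(int id) {
        return intToString.get(id);
    }
}
```
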
e537e94d39 HotEntryCache will update Instants only once per second
Calling Instant.now() several hundred thousand times per
second can be expensive. In my measurements >10% of the
time spent when loading new data was spent calling
Instant.now().
Fixed this by storing an Instant as static member and
updating it periodically in a separate thread.
2018-12-21 19:16:55 +01:00
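
A minimal sketch of the approach in e537e94d39, assuming a one-second resolution is acceptable to callers. `CoarseClockSketch` and the thread name are hypothetical; the real HotEntryCache may organise this differently.

```java
import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a clock that is refreshed once per second.
final class CoarseClockSketch {
    // volatile so that readers see the value written by the updater thread
    private static volatile Instant now = Instant.now();

    private static final ScheduledExecutorService UPDATER =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "coarse-clock");
                t.setDaemon(true); // do not keep the JVM alive
                return t;
            });

    static {
        UPDATER.scheduleAtFixedRate(() -> now = Instant.now(), 1, 1, TimeUnit.SECONDS);
    }

    // Cheap read of the cached value; may lag behind the real time
    // by up to about one second.
    static Instant now() {
        return now;
    }
}
```
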
d95a71e32e batch entries between TcpIngestor and PerformanceDB
One bottleneck was the blocking queue used to transport entries
from the listener thread to the ingestor thread.
Reduced the bottleneck by batching entries.
Interestingly, a batch size of 100 performed better than both
a batch size of 1000 and a batch size of 10.
2018-12-21 13:11:35 +01:00
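
The batching from d95a71e32e, sketched for a single producer thread. `BatchingProducerSketch` is a hypothetical helper; the batch size is a parameter because the best value has to be measured, as the commit message notes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: one queue operation per batch instead of per entry.
// Not thread-safe; intended for a single producer thread.
final class BatchingProducerSketch<T> {
    private final BlockingQueue<List<T>> queue;
    private final int batchSize;
    private List<T> current;

    BatchingProducerSketch(BlockingQueue<List<T>> queue, int batchSize) {
        this.queue = queue;
        this.batchSize = batchSize;
        this.current = new ArrayList<>(batchSize);
    }

    // Collects entries and hands over a full batch to the queue.
    void add(T entry) throws InterruptedException {
        current.add(entry);
        if (current.size() >= batchSize) {
            flush();
        }
    }

    // Hands over a partial batch, for example when the connection closes.
    void flush() throws InterruptedException {
        if (!current.isEmpty()) {
            queue.put(current);
            current = new ArrayList<>(batchSize);
        }
    }
}
```

The consumer side takes whole lists from the queue and iterates over them, so the synchronization cost of the queue is paid once per batch.
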
73ad27ab96 remove lastAccessMap
In the last commit I added a lastAccessMap to the HotEntryCache.
This map made it much more efficient to evict entries. But it
also made put and get operations much more expensive. Overall
that change led to a 65% decrease in ingestion performance of
the PerformanceDB.
Fixed by removing the map again. Eviction has to look at all
elements again.
2018-12-21 10:28:34 +01:00
afba3b6f77 elements not evicted if new elements are added 2018-12-20 16:13:55 +01:00
d52bfa0916 remove obsolete class RadixConverter 2018-12-17 19:11:33 +01:00
3a4101bbf9 increase the buffer between ingestion and insertion thread
I was finally able to show that there is a tiny but measurable
effect of this buffer. I think it was not visible before,
because the parsing was too slow. But now that I have replaced the
date parser, the ingestion thread is twice as fast as the
insertion thread. Therefore the buffer makes more sense.
2018-12-17 19:07:55 +01:00
d37508b7a1 Pattern.split is faster than StringUtils.splitPreserveAll
Document the fact, so that I do not have to repeat the same
test a third time.
2018-12-17 19:05:34 +01:00
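
For reference, the `Pattern.split` variant mentioned in d37508b7a1 might look like this. The comma delimiter and the class name are assumptions; the negative limit keeps trailing empty fields, which is presumably the behaviour wanted from the commons-lang3 method (named `splitPreserveAllTokens` there).

```java
import java.util.regex.Pattern;

// Hypothetical sketch of splitting a CSV line with a precompiled pattern.
final class CsvSplitSketch {
    // Compile once and reuse for every line.
    private static final Pattern COMMA = Pattern.compile(",");

    // A negative limit keeps trailing empty fields: "a,,b," has 4 tokens.
    static String[] split(String line) {
        return COMMA.split(line, -1);
    }
}
```
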
40f4506e13 use FastISODateParser.parseAsEpochMilli
Compared to FastISODateParser.parse, which returns an
OffsetDateTime object, parseAsEpochMilli returns the
epoch time millis. The performance improvement for
date parsing alone is roughly 100% (8m dates/s to
18m dates/s).
Insertion speed improved from 13-14s for 1.6m entries
to 11.5-12.5s.
2018-12-16 19:24:47 +01:00
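
To illustrate why returning epoch millis avoids object creation, here is a minimal sketch that parses one fixed UTC format without any date objects. It is not the project's FastISODateParser: it accepts only `yyyy-MM-ddTHH:mm:ss.SSSZ`, does no range validation, and uses the well-known days-from-civil calculation.

```java
// Hypothetical sketch: fixed-format ISO-8601 parsing straight to epoch millis.
final class IsoEpochMilliSketch {
    // Parses exactly "yyyy-MM-ddTHH:mm:ss.SSSZ" (UTC only).
    static long parseAsEpochMilli(CharSequence s) {
        if (s.length() != 24 || s.charAt(23) != 'Z') {
            throw new IllegalArgumentException("unsupported format: " + s);
        }
        int year = digits(s, 0, 4);
        int month = digits(s, 5, 2);
        int day = digits(s, 8, 2);
        int hour = digits(s, 11, 2);
        int minute = digits(s, 14, 2);
        int second = digits(s, 17, 2);
        int milli = digits(s, 20, 3);
        long days = daysFromCivil(year, month, day);
        long minutes = (days * 24 + hour) * 60 + minute;
        return minutes * 60_000L + second * 1000L + milli;
    }

    // Reads count decimal digits starting at start.
    private static int digits(CharSequence s, int start, int count) {
        int value = 0;
        for (int i = start; i < start + count; i++) {
            int d = s.charAt(i) - '0';
            if (d < 0 || d > 9) {
                throw new IllegalArgumentException("not a digit at index " + i);
            }
            value = value * 10 + d;
        }
        return value;
    }

    // Days since 1970-01-01 in the proleptic Gregorian calendar.
    private static long daysFromCivil(int y, int m, int d) {
        if (m <= 2) {
            y--; // count January and February as part of the previous year
        }
        long era = Math.floorDiv(y, 400);
        long yearOfEra = y - era * 400;
        long dayOfYear = (153L * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
        long dayOfEra = yearOfEra * 365 + yearOfEra / 4 - yearOfEra / 100 + dayOfYear;
        return era * 146097 + dayOfEra - 719468;
    }
}
```
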
23f800a441 add date parsing method that returns epochMillis instead of date object 2018-12-16 15:38:26 +01:00
20c555c30a update 3rd party libs and gradle 2018-12-07 14:06:59 +01:00
253bbabd19 cleanup
remove debug output
2018-11-25 07:49:23 +00:00
a86a473b4a use unix line breaks 2018-11-25 07:49:04 +00:00
37207d67ab use utf-8 as resource encoding 2018-11-25 07:29:29 +00:00
593752470c cleanup 2018-11-25 07:46:58 +01:00
5404253bc6 use TreeMap in PersistentMapDiskNode instead of list 2018-11-24 15:57:05 +01:00
d67e452a91 cache disk blocks in an LRU cache
Improves read access by factor 4 for small trees.
2018-11-24 15:07:37 +01:00
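
An LRU cache like the one mentioned in d67e452a91 can be sketched with `LinkedHashMap` in access order. This is a generic illustration, not the project's cache; `LruCacheSketch` is a hypothetical name, and the class is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: least recently used entry is dropped when full.
final class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int capacity;

    LruCacheSketch(int capacity) {
        super(16, 0.75f, true); // true = iterate in access order
        this.capacity = capacity;
    }

    // Called by put(); returning true removes the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```
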
9889252205 use only one thread for evictions
Instead of spawning a new thread for every cache, we use a single thread
that will evict entries from all caches.
The thread keeps a weak reference to the caches, so that they can be
garbage collected.
2018-11-24 08:32:05 +01:00
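
The shared eviction thread from 9889252205 might be structured roughly as follows. `Evictable` and `SharedEvictorSketch` are hypothetical names; the point is that the thread holds only weak references, so a cache that is no longer used elsewhere can still be garbage collected.

```java
import java.lang.ref.WeakReference;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical callback that a cache implements.
interface Evictable {
    void evictExpired();
}

// Hypothetical sketch: one thread serves all registered caches.
final class SharedEvictorSketch {
    private final List<WeakReference<Evictable>> caches = new CopyOnWriteArrayList<>();

    void register(Evictable cache) {
        caches.add(new WeakReference<>(cache));
    }

    // One pass over all live caches; returns how many were visited.
    int evictAll() {
        int visited = 0;
        for (WeakReference<Evictable> ref : caches) {
            Evictable cache = ref.get();
            if (cache != null) {
                cache.evictExpired();
                visited++;
            }
        }
        caches.removeIf(ref -> ref.get() == null); // drop collected caches
        return visited;
    }

    // Starts the single daemon thread that calls evictAll() periodically.
    Thread start(long intervalMillis) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    Thread.sleep(intervalMillis);
                    evictAll();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "cache-evictor");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```
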
64771417e4 only iterate over elements when at least one element can be evicted 2018-11-23 07:23:38 +01:00
f78f69328b add cache for docId to Doc mapping
A Doc does not change once it is created, so it is easy to cache.
Speedup was from 1ms per Doc to 3ms for 444 Docs (0.00675ms/Doc).
2018-11-22 19:51:07 +01:00
6c546bd5b3 update primitiveCollections
The new version comes with an improved removeAll method that is O(n+m)
on sorted lists.
2018-11-21 18:55:54 +01:00
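
An O(n+m) removeAll on sorted input, as referred to in 6c546bd5b3, can be done with two cursors that only move forward. The sketch below uses `long[]` arrays and hypothetical names; it is not the primitiveCollections implementation.

```java
import java.util.Arrays;

// Hypothetical sketch of removeAll for two ascending sorted arrays.
final class SortedRemoveAllSketch {
    // Returns the elements of a that do not occur in b.
    static long[] removeAll(long[] a, long[] b) {
        long[] out = new long[a.length];
        int n = 0;
        int j = 0;
        for (long v : a) {
            while (j < b.length && b[j] < v) {
                j++; // j never moves backwards, so the total work is O(n + m)
            }
            if (j == b.length || b[j] != v) {
                out[n++] = v;
            }
        }
        return Arrays.copyOf(out, n);
    }
}
```
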
cc0157fe0b update java 3rd-party libs 2018-11-20 19:13:59 +01:00
218ea9ed68 use custom date parser
A specialized date parser that can only handle ISO-8601 like dates
(2011-12-03T10:15:30.123Z or 2011-12-03T10:15:30+01:00) but does this
roughly 10 times faster than DateTimeFormatter and 5 times
faster than the FastDateParser of commons-lang3.
2018-11-19 19:23:57 +01:00
6f48a25d53 do not force changes to disk
diskBlock.force() makes insertion very slow, because it adds a
two-digit number of milliseconds to every tree change. I disabled it
for now. The tree is not crash resistant anyway.
2018-11-19 19:22:27 +01:00
afd1e36066 fix unsupported operation exception when adding to an unmodifiable set 2018-11-19 19:19:51 +01:00
135ab42cd8 tags are now stored as variable length byte sequences of longs
Replaced Tags.filenameBytes with a SortedSet<Tag>. Tags are now
stored as longs (variable length encoded) in the PersistentMap.
Tags.filenameBytes was introduced to reduce memory consumption, when
all tags were held in memory. Tags are now stored in a PersistentMap
and only read when needed.

Moved the VariableByteEncoder into its own project, because it was
needed by pdb-api.
2018-11-17 20:03:46 +01:00
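
Variable length encoding of longs, as used in 135ab42cd8, commonly works like the sketch below: 7 payload bits per byte, with the high bit marking that more bytes follow. This is a generic illustration with hypothetical names; the project's VariableByteEncoder may use a different byte layout or treat negative numbers differently.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of a variable length encoding for longs.
final class VarLongSketch {
    // Small values need one byte; negative values need ten in this scheme.
    static void encode(long value, ByteArrayOutputStream out) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80)); // more bytes follow
            value >>>= 7;
        }
        out.write((int) value); // last byte, high bit clear
    }

    // Decodes the single value that starts at offset.
    static long decode(byte[] bytes, int offset) {
        long result = 0;
        int shift = 0;
        int i = offset;
        while (true) {
            byte b = bytes[i++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }
}
```
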
b2107acf4e synchronize access to the PersistentMap
The map is not (yet) thread-safe. Eventually we'll replace the
synchronized blocks with read/write locks on the nodes.
2018-11-17 10:02:29 +01:00
fce0f6a04d use PersistentMap in DataStore
Replaces the use of in-memory data structures with the PersistentMap.
This is the crucial step in reducing memory usage for both persistent
storage and main memory.
2018-11-17 09:45:35 +01:00
3ccf526608 PersistentMap now requires only a path instead of a DiskStorage
This makes the PersistentMap easier to use.
2018-11-10 10:08:21 +01:00
e90506c1b0 add visitor that finds all values by a prefix of the key 2018-11-10 09:48:36 +01:00
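
The prefix lookup from e90506c1b0 relies on the keys being sorted: all keys sharing a prefix are adjacent. The sketch below shows the idea with an in-memory `NavigableMap` and natural string ordering as a stand-in for the on-disk tree; it is not the visitor from the commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

// Hypothetical sketch of a prefix scan over a sorted map.
final class PrefixScanSketch {
    // Assumes the map uses the natural ordering of String keys.
    static <V> List<V> valuesByPrefix(NavigableMap<String, V> map, String prefix) {
        List<V> result = new ArrayList<>();
        // Start at the first key >= prefix and stop at the first mismatch.
        for (Map.Entry<String, V> e : map.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // sorted keys: no later key can match
            }
            result.add(e.getValue());
        }
        return result;
    }
}
```
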
807257d330 remove the unused node visitor 2018-11-04 10:44:05 +01:00
008f0db377 add generics to PersistentMap 2018-11-04 10:42:05 +01:00
f2d5c27668 insertion of many values into the persistent map 2018-11-04 10:11:10 +01:00
c6782df0e5 the root node can have more than two children if it is an inner node
It is not yet possible to split inner nodes or the root node.
2018-10-27 10:17:45 +02:00
8b48b8c3e7 add a pointer to the root node
Before, the offset of the root node was hard-coded.
Now the offset of the pointer to the root node is hard-coded.
That allows us to replace the root node.
2018-10-27 08:55:15 +02:00
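
The indirection from 8b48b8c3e7 can be sketched with a `ByteBuffer` standing in for the file. The offset 0 and the 8-byte pointer width are assumptions made for the example, and the names are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: only the position of the pointer is fixed.
final class RootPointerSketch {
    static final int ROOT_POINTER_OFFSET = 0; // assumed location of the pointer

    // Reads where the current root node lives.
    static long readRootOffset(ByteBuffer file) {
        return file.getLong(ROOT_POINTER_OFFSET);
    }

    // Replacing the root rewrites only the 8-byte pointer.
    static void replaceRoot(ByteBuffer file, long newRootOffset) {
        file.putLong(ROOT_POINTER_OFFSET, newRootOffset);
    }
}
```
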
8bb98deb1e PersistentMap can store data in multiple nodes 2018-10-26 18:35:32 +02:00
bb4514c940 insert values into root node 2018-10-14 19:53:02 +02:00
3855d03ead BSFile uses a wrapper for DiskBlock to add BSFile specific stuff
This keeps the DiskBlock class clean, so that it can be used
for PersistentMap.
2018-10-14 17:13:33 +02:00
c83b6e11e2 Add first part of a persistent map implementation. 2018-10-14 16:47:17 +02:00
bd88c63aff ensure BSFiles use blocks that are aligned to 512 Byte offsets 2018-10-14 09:00:26 +02:00
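
Aligning an offset to 512 bytes, as in bd88c63aff, is usually a round-up to the next multiple. A minimal sketch with hypothetical names, relying on 512 being a power of two:

```java
// Hypothetical sketch of rounding offsets up to a 512 byte boundary.
final class BlockAlignmentSketch {
    static final long ALIGNMENT = 512; // must be a power of two for the bit trick

    // Returns the smallest multiple of 512 that is >= offset.
    static long alignUp(long offset) {
        return (offset + ALIGNMENT - 1) & -ALIGNMENT;
    }
}
```
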
a2520c0238 move method only used in tests to the tests 2018-10-13 20:03:02 +02:00
b42fec8fe2 use var keyword 2018-10-13 10:14:52 +02:00
b42bb88dff DiskStorage can allocate and free blocks of arbitrary sizes 2018-10-13 10:03:41 +02:00