Commit Graph

118 Commits

Author SHA1 Message Date
d4d1685f9f replace stdout with logger 2019-02-02 16:49:21 +01:00
151e9363e1 remove obsolete classes 2019-02-02 16:45:34 +01:00
76e5d441de rewrite query completion
The old implementation searched for all possible values and then
executed each query to see what matches.
The new implementation uses several indices to find only
the matching values.
2019-02-02 15:35:56 +01:00
72e9a9ebe3 prepare more efficient query completion
adding an index that answers the question
given a query "a=b and c=", what are possible values
for c.
2019-01-13 10:22:17 +01:00
5197063ae3 the union of many small lists is expensive
The reason seems to be the number of memory allocations. In order
to create the union of 100 lists we have 99 memory allocations.
The first needs the space for the first two lists, the second the
space for the first three lists, and so on.

We can reduce the number of allocations drastically (in many
cases to one) by leveraging the fact that many of the lists
were already sorted, non-overlapping and increasing, so that
we can simply concatenate them.
2019-01-05 08:52:56 +01:00
4cde10a9f2 read csv using input stream instead of reader
We are now reading the CSV input without transforming
the data into strings. This reduces the amount of bytes
that have to be converted and copied.
We also made Tag smaller. It no longer stores pointers
to strings, instead it stored integers obtained by
compressing the strings (see StringCompressor). This
reduces memory usage and it speeds up hashcode and
equals, which speeds up access to the writer cache.

Performance gain is almost 100%:
- 330k entries/s -> 670k entries/s, top speed measured over a second
- 62s -> 32s, to ingest 16 million entries
2019-01-01 08:31:28 +01:00
e537e94d39 HotEntryCache will update Instants only once per second
Calling Instant.now() several hundred thousand times per
second can be expensive. In my measurements >10% of the
time spend when loading new data was spend calling
Instant.now().
Fixed this by storing an Instant as static member and
updating it periodically in a separate thread.
2018-12-21 19:16:55 +01:00
253bbabd19 cleanup
remove debug output
2018-11-25 07:49:23 +00:00
593752470c cleanup 2018-11-25 07:46:58 +01:00
f78f69328b add cache for docId to Doc mapping
A Doc does not change once it is created, so it is easy to cache.
Speedup was from 1ms per Doc to 3ms for 444 Docs (0.00675ms/Doc).
2018-11-22 19:51:07 +01:00
cc0157fe0b update java 3rd-party libs 2018-11-20 19:13:59 +01:00
afd1e36066 fix unsupported operation exception when adding to an unmodifiable set 2018-11-19 19:19:51 +01:00
135ab42cd8 tags are now stored as variable length byte sequences of longs
Replaced Tags.filenameBytes with a SortedSet<Tag>. Tags are now
stored as longs (variable length encoded) in the PersistenMap.
Tags.filenameBytes was introduced to reduce memory consumption, when
all tags were hold in memory. Tags are now stored in a PersistentMap
and only read when needed.

Moved the VariableByteEncoder into its own project, because it was
needed by pdb-api.
2018-11-17 20:03:46 +01:00
fce0f6a04d use PersistentMap in DataStore
Replaces the use of in-memory data structures with the PersistentMap.
This is the crucial step in reducing memory usage for both persistent
storage and main memory.
2018-11-17 09:45:35 +01:00
bd88c63aff ensure BSFiles use blocks that are aligned to 512 Byte offsets 2018-10-14 09:00:26 +02:00
0539080200 use byte offsets instead of block numbers
We want to allow arbitrary allocations in DiskStorage. The
first step was to change the hard coded block size into a
dynamic one.
2018-10-12 08:10:43 +02:00
979d3269fa remove obsolete classes and methods 2018-10-04 18:46:51 +02:00
8939332004 remove the wrapper class PdbDB
It did not serve any purpose and could be replaced by DataStore.
2018-10-04 18:43:27 +02:00
f07977c27a update java, gradle and third party libs 2018-09-29 09:08:29 +02:00
24fcfd7763 prepare the addition of a date index 2018-09-28 19:07:01 +02:00
1d88c8dfd7 update spring-boot to 2.0.5.RELEASE
update commons-lang3 to 3.8
2018-09-13 18:58:07 +02:00
a2e63cca44 cleanup 2018-09-13 08:11:15 +02:00
c6a1291ee6 the pattern must match the property value exactly,
when matching property values to the query. This is important when
you have a property value that is a prefix of another property value,
e.g., AuditService.logEvent and AuditService.logEvents.
2018-09-13 07:55:13 +02:00
61f131571a add CamelCase matching to the query language 2018-09-12 13:42:23 +02:00
86b8f93752 replace 'in' queries with a simpler syntax
field in (val1, val2)
was replaced with
field=val1,val2
or
field=(val1, val2)
2018-09-12 10:10:01 +02:00
1182d76205 replace the FolderStorage with DiskStorage
- The DiskStorage uses only one file instead of millions.
  Also the block size is only 512 byte instead of 4kb, which
  helps to reduce the memory usage for short sequences.
- Update primitiveCollections to get the new LongList.range
  and LongList.rangeClosed methods.
- BSFile now stores Time&Value sequences and knows how to
  encode the time values with delta encoding.
- Doc had to do some magic tricks to save memory. The path
  was initialized lazy and stored as byte array. This is no
  longer necessary. The patch was replaced by the
  rootBlockNumber of the BSFile.
- Had to temporarily disable the 'in' queries.
- The stored values are now processed as stream of LongLists
  instead of Entry. The overhead for creating Entries is
  gone, so is the memory overhead, because Entry was an
  object and had a reference to the tags, which is
  unnecessary.
2018-09-12 09:35:07 +02:00
ea5e16fad5 expressions now support in-queries 2018-08-18 10:31:49 +02:00
b01d267300 update primitiveCollections
The new version of primitiveCollections requires Java 10.
2018-08-18 08:32:27 +02:00
99dbf31d8a update 3rd party libs 2018-08-09 07:20:09 +02:00
182d1edd97 add a datetime picker
Unfortunately the datetime picker does not support seconds. But it is
one of the few that support date and time and are flexible enough to
be used with VueJS.
2018-08-04 08:32:04 +00:00
daaa0e6907 update dependencies
gradle to 4.8
jackson to 2.9.6
spring-boot to 2.0.3
guava to 25.1-jre
gradle-versions-plugin to 0.19.0
2018-06-17 08:59:48 +02:00
b61a34a0e6 use existing RandomAccessFile when updating the listing file
Ingestion speed dropped drastically with the old implementation.
In some situations to 7 entries per second over a 10 second period
(sic!). When using the already opened RandomAccessFile the speed
is back to previous values of 40k-50k entries per second on my 10 year
old machine on an encrypted spinning disk.
2018-05-10 17:41:50 +02:00
911062e26b use RandomAccessFile in FolderStorage.getPathByOffset()
The old implementation opened a new buffered reader everytime
getPathByOffset was called. This took 1/20th of a second or
longer. For queries that visited thousands of files this could
take a long time.
We are now using a RandomAccessFile, that is opened once. The
average time spend in getPathByOffset is now down to 0.11ms.
2018-05-10 10:22:25 +02:00
82b8a8a932 reduce memory footprint by lazily intializing the path in Doc
The path in Doc is not optional. This reduces memory consumption,
because we only have to store a long (the offset in the listing file).
This assumes, that only a small percentage of Docs is requested.
2018-05-06 12:58:10 +02:00
e3102c01d4 use listing.csv instead of iterating through all folders
The hope is, that it is faster to read a single file instead of listing
hundreds of folders.
2018-05-05 10:46:16 +02:00
6d85c56cb0 range definitions for the y-axis
Sometimes it is useful to specify the certain y-axis range. For example
when you are only interested in the values that take longer than a
threshold. Or when you want to exclude some outliers. When you want to
compare plots in a gallery, it is very handy when all plots have the
same data-area.
2018-05-01 10:18:06 +02:00
b06ccb0d00 update 3rd party libs
spring boot to 2.0.1
guava to 24.1-jre
jackson to 2.9.5
log4j2 to 2.10.0 (same version as pulled by spring boot)
testng to 6.14.3
2018-04-21 20:01:39 +02:00
57938d5269 do not check if we can find values when proposing keys
Counting the available values is quite expensive and there are only a
few corner cases where this makes sense. One of them is when the query
is for a method that is not project specific and therefore no project
values can be found.
2018-04-14 10:38:00 +02:00
23e16ff61d ignore null values in tags 2018-04-14 09:58:51 +02:00
1755562a84 do not move the cursor to the end when applying a proposal 2018-04-08 14:06:13 +02:00
68ee88bce0 rewrite autocomplete in vue.js 2018-04-08 08:44:28 +02:00
22c99f8517 fix null pointer exception
filename were generated without '$', but the parsing code expected
the '$'.
2018-03-28 19:34:48 +02:00
de0f8412bd show proposals for empty terminals 2018-03-25 19:17:49 +02:00
5343c0d427 reduce memory usage
Reduce memory usage by storing the filename as string instead of
individual tags.
2018-03-19 19:21:57 +01:00
b439c9d79a update third-party libs
antlr4: 4.7 -> 4.7.1
commons-lang3: 3.6 -> 3.7
2018-01-21 08:44:30 +01:00
ahr
d98c45e8bd add index for tags-to-documents
Now we can find writer much faster, because we don't have to execute
a query for documents that match the tags. We can just look up the 
documents in the map.
Speedup: 2-4ms -> 0.002-0.01ms
2018-01-14 09:51:37 +01:00
ahr
bcc30f0f3f add trace logging and make set of proposals synchronized
I checked if computing the proposals with a parallel stream would be
beneficial. Turns out the stream uses several threads, but the overall
computation is not faster, because each individual computation is
slower.
2017-12-30 10:08:54 +01:00
ahr
fc30ffd928 sort IntLists in DataStore
The IntLists were no longer sorted since we made the initialization run
in parallel. Therefore a much slower implementation for
intersection/union was used.
2017-12-30 09:45:50 +01:00
ahr
cc70f45c12 add different plot types
Step 1: 
Added PlotType enum and a drop down to the UI.
Extracted the code for scatter plots.
2017-12-29 08:57:34 +01:00
ahr
2df66c7b2f update primitiveCollections
This fixes a performance issue where the IntLists were not sorted and
therefore slow union/intersection algorithms were chosen.
2017-12-29 08:20:52 +01:00