The dropdown for plot types should only contain plot types that can
be combined. The reason is that we can only draw images with two
x-axes and two y-axes. Therefore, a combination of types that would need
three or more x-axes or y-axes is not supported.
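The filtering rule can be sketched in Java as follows. This is a minimal illustration, not the project's code: the plot types `SCATTER`, `CUMULATIVE` and `HISTOGRAM`, the `AxisKind` enum and the `PlotTypeFilter` class are all hypothetical, and the sketch assumes that "can be combined" means "can be added to the types already selected". It counts the distinct x- and y-axis kinds of a selection and rejects any selection that needs more than the two axes Gnuplot offers per direction.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical axis kinds; the real plot types and their axes are assumptions.
enum AxisKind { TIME, DURATION, PERCENT, COUNT }

// Hypothetical plot types, each declaring which x- and y-axis kind it needs.
enum PlotType {
    SCATTER(AxisKind.TIME, AxisKind.DURATION),
    CUMULATIVE(AxisKind.PERCENT, AxisKind.DURATION),
    HISTOGRAM(AxisKind.DURATION, AxisKind.COUNT);

    final AxisKind x;
    final AxisKind y;

    PlotType(AxisKind x, AxisKind y) {
        this.x = x;
        this.y = y;
    }
}

final class PlotTypeFilter {
    // Gnuplot offers two x-axes (x1, x2) and two y-axes (y1, y2).
    static final int MAX_AXES_PER_DIRECTION = 2;

    // True if the selection needs at most two x-axes and at most two y-axes.
    static boolean canCombine(Set<PlotType> types) {
        long xAxes = types.stream().map(t -> t.x).distinct().count();
        long yAxes = types.stream().map(t -> t.y).distinct().count();
        return xAxes <= MAX_AXES_PER_DIRECTION && yAxes <= MAX_AXES_PER_DIRECTION;
    }

    // Plot types the dropdown should offer, given the types already selected.
    static List<PlotType> selectable(Set<PlotType> selected) {
        return EnumSet.allOf(PlotType.class).stream()
                .filter(candidate -> {
                    Set<PlotType> combined = EnumSet.noneOf(PlotType.class);
                    combined.addAll(selected);
                    combined.add(candidate);
                    return canCombine(combined);
                })
                .collect(Collectors.toList());
    }
}
```

With these made-up types, `SCATTER` and `CUMULATIVE` share the duration y-axis and fit on two x-axes, but adding `HISTOGRAM` would require a third x-axis, so it is no longer offered.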
Classpaths in Eclipse differ from classpaths in Gradle because
Gradle's 'implementation' configuration does not provide
dependencies transitively.
In the previous changeset, the code that determined
which axes the plots used was implemented as a
side effect of getting the Gnuplot definition of
an axis.
Changed that to an explicit update call with simpler
logic.
Gnuplot does not handle long x-axis tic labels very well.
It should know how wide the labels are and could adapt
the increment accordingly, but it doesn't.
Fixed by explicitly defining the increment for x-axis
labels.
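A possible way to pick that increment, sketched in Java: estimate how many labels fit side by side and take the smallest "round" increment whose label count does not exceed that number. The `XTicIncrement` class, its candidate list and the label-width estimate are assumptions; the text does not say how the increment is computed. The sketch also assumes a time axis (`set xdata time`), on which Gnuplot takes the `set xtics` increment in seconds.

```java
final class XTicIncrement {
    // Hypothetical candidate increments in seconds, from 1 second up to 1 week.
    static final long[] CANDIDATES = {
        1, 2, 5, 10, 15, 30,              // seconds
        60, 120, 300, 600, 900, 1800,     // minutes
        3600, 7200, 10800, 21600, 43200,  // hours
        86400, 172800, 604800             // days
    };

    // Smallest candidate increment whose labels fit next to each other.
    static long choose(long rangeSeconds, int plotWidthPx, int labelWidthPx) {
        long maxLabels = Math.max(1, plotWidthPx / labelWidthPx);
        for (long increment : CANDIDATES) {
            long labels = rangeSeconds / increment + 1; // includes the tic at the start
            if (labels <= maxLabels) {
                return increment;
            }
        }
        return CANDIDATES[CANDIDATES.length - 1];
    }

    // On a time axis, Gnuplot interprets the xtics increment in seconds.
    static String gnuplotCommand(long incrementSeconds) {
        return "set xtics " + incrementSeconds;
    }
}
```

For a one-day range on a 1000px wide plot with labels about 100px wide, at most 10 labels fit, and the sketch picks 10800 seconds (3 hours, i.e. 9 labels).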
We generate CSV files with a comma as separator.
When we write times with milliseconds, we
use floating point numbers. Depending
on the locale, those floating point numbers
may be written with a comma instead of a point.
If that happens, the plots are messed up.
Fixed by enforcing the locale when formatting floats.
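A minimal sketch of the fix in Java, assuming times are written as seconds with a millisecond fraction: passing an explicit locale (`Locale.ROOT` here) to `String.format` makes the decimal separator independent of the JVM's default locale. The `CsvNumbers` class and its methods are hypothetical names.

```java
import java.util.Locale;
import java.util.StringJoiner;

final class CsvNumbers {
    // Locale.ROOT always formats with '.' as decimal separator,
    // regardless of the JVM's default locale.
    static String formatSeconds(long millis) {
        return String.format(Locale.ROOT, "%.3f", millis / 1000.0);
    }

    // One CSV row: time in seconds (with milliseconds) and a value.
    static String row(long timeMillis, long value) {
        StringJoiner row = new StringJoiner(",");
        row.add(formatSeconds(timeMillis));
        row.add(Long.toString(value));
        return row.toString();
    }
}
```

Under a German default locale, `String.format("%.3f", 1.5)` yields `1,500`, which would split one CSV field into two; `CsvNumbers.row(1500, 42)` still yields `1.500,42`.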
This makes it easier to use the mouse wheel
to zoom in. Without it, you could zoom into
a region that had no data and then had to
use the date picker to change the date.
Rendering plots with millions of values is expensive. Before this fix,
we wrote all values into CSV files. The CSV files were then read by
Gnuplot, which did the rendering. But an image with n×m pixels can
show at most n·m different values. In most realistic scenarios, many
values are drawn to the same pixels. So we were wasting time, first
by generating the CSV for too many values and then by parsing that
CSV again.
Fixed by using a sparse 2D array to de-duplicate values before
they get written to the CSV. The additional time we spend de-duplicating
is often smaller than the time saved when writing the CSV, so that
writing the CSV takes about as long as before (sometimes a little
less, sometimes a little more). But the time Gnuplot needs for
rendering drops drastically. The factor depends on the data, of
course. We have seen a factor of 50 in realistic examples, making a
15s job run in 300ms.
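The de-duplication can be sketched in Java like this: each value is mapped to its pixel, and only the first value per pixel is passed on to the CSV writer. The `PixelDeduplicator` class is hypothetical, and a `HashSet<Long>` of pixel indices stands in for the sparse 2D array mentioned above; for millions of values, a primitive long set would avoid the boxing overhead.

```java
import java.util.HashSet;
import java.util.Set;

final class PixelDeduplicator {
    private final long minX, maxX, minY, maxY;
    private final int width, height;
    // Sparse 2D array: only occupied pixels are stored, keyed by y * width + x.
    private final Set<Long> occupied = new HashSet<>();

    PixelDeduplicator(long minX, long maxX, long minY, long maxY, int width, int height) {
        this.minX = minX;
        this.maxX = maxX;
        this.minY = minY;
        this.maxY = maxY;
        this.width = width;
        this.height = height;
    }

    // Returns true if (x, y) lands on a pixel that has not been written yet,
    // i.e. if the value should go into the CSV.
    boolean add(long x, long y) {
        long px = toPixel(x, minX, maxX, width);
        long py = toPixel(y, minY, maxY, height);
        return occupied.add(py * width + px);
    }

    // Maps a value linearly onto 0..pixels-1, clamping values outside the range.
    private static long toPixel(long value, long min, long max, int pixels) {
        if (max <= min) {
            return 0;
        }
        double scaled = (double) (value - min) * (pixels - 1) / (max - min);
        return Math.max(0, Math.min(pixels - 1, (long) scaled));
    }
}
```

On a 100×100 image covering 0..999 in both directions, the points (0, 0) and (5, 5) fall on the same pixel, so only the first one is written.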
Before: To compute the cumulative distribution, we added every duration
to a LongList. This requires O(n) memory, where n is the number of
values.
Now: We store each distinct duration together with its number of
occurrences in a LongLongHashMap. This can reduce the memory
requirements if durations occur multiple times, and there are a lot of
durations of 0, 1 or 2 milliseconds. In the worst case, every duration
is different. In that case, this solution doubles the memory usage.
Future: We currently store durations with millisecond precision.
We don't have to do that. We cannot draw 100 million different values
on the y-axis of an image that is only 1000px high.
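A sketch of the "Now" approach in Java: occurrences are counted per duration, and a percentile is read from the counts in ascending order of duration (nearest-rank method). `java.util.HashMap` stands in for `LongLongHashMap`, whose API is not shown here, and the `CumulativeDistribution` class is hypothetical. Memory grows with the number of distinct durations rather than with the number of values.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class CumulativeDistribution {
    // duration in ms -> number of occurrences (stand-in for LongLongHashMap)
    private final Map<Long, Long> counts = new HashMap<>();
    private long total;

    void add(long durationMillis) {
        counts.merge(durationMillis, 1L, Long::sum);
        total++;
    }

    int distinctDurations() {
        return counts.size();
    }

    // Smallest duration such that at least p percent of all values are <= it.
    long percentile(double p) {
        long threshold = (long) Math.ceil(total * p / 100.0);
        long seen = 0;
        // TreeMap iterates the durations in ascending order.
        for (Map.Entry<Long, Long> e : new TreeMap<>(counts).entrySet()) {
            seen += e.getValue();
            if (seen >= threshold) {
                return e.getKey();
            }
        }
        throw new IllegalStateException("no values");
    }
}
```

Seven values `0, 0, 0, 1, 2, 2, 10` occupy only four map entries; the sketch reports 1 as the median and 10 as the 100th percentile.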
People have trouble understanding durations like
100000 or 2.7E+6 milliseconds. Therefore, we are
changing the labels on the y-axis to include the unit
in the tic labels. We also use multiples of seconds,
minutes, hours and days instead of multiples of 10.
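The labeling scheme can be sketched in Java: choose the smallest tic increment from a list of second, minute, hour and day multiples, label each tic with the largest unit that divides it evenly, and pass explicit labels to Gnuplot with `set ytics ("label" position, ...)`. The `DurationTics` class, the candidate increments and the unit abbreviations are assumptions, not the project's actual choices.

```java
import java.util.StringJoiner;

final class DurationTics {
    static final long SECOND = 1_000, MINUTE = 60 * SECOND, HOUR = 60 * MINUTE, DAY = 24 * HOUR;

    // Candidate tic increments: multiples of seconds, minutes, hours and days.
    static final long[] INCREMENTS = {
        SECOND, 2 * SECOND, 5 * SECOND, 10 * SECOND, 15 * SECOND, 30 * SECOND,
        MINUTE, 2 * MINUTE, 5 * MINUTE, 10 * MINUTE, 15 * MINUTE, 30 * MINUTE,
        HOUR, 2 * HOUR, 6 * HOUR, 12 * HOUR,
        DAY, 2 * DAY, 7 * DAY
    };

    // Smallest increment for which 0..maxMillis needs at most maxTics tics.
    static long increment(long maxMillis, int maxTics) {
        for (long inc : INCREMENTS) {
            if (maxMillis / inc < maxTics) {
                return inc;
            }
        }
        return INCREMENTS[INCREMENTS.length - 1];
    }

    // Label using the largest unit that divides the value evenly.
    static String label(long millis) {
        if (millis == 0) return "0";
        if (millis % DAY == 0) return millis / DAY + "d";
        if (millis % HOUR == 0) return millis / HOUR + "h";
        if (millis % MINUTE == 0) return millis / MINUTE + "min";
        if (millis % SECOND == 0) return millis / SECOND + "s";
        return millis + "ms";
    }

    // Gnuplot accepts explicit tic labels: set ytics ("label" position, ...).
    static String gnuplotYtics(long maxMillis, int maxTics) {
        long inc = increment(maxMillis, maxTics);
        StringJoiner tics = new StringJoiner(", ", "set ytics (", ")");
        for (long v = 0; v <= maxMillis; v += inc) {
            tics.add("\"" + label(v) + "\" " + v);
        }
        return tics.toString();
    }
}
```

With this sketch, 100000 ms becomes `100s`, 2.7E+6 ms becomes `45min`, and `gnuplotYtics(600_000, 5)` produces `set ytics ("0" 0, "5min" 300000, "10min" 600000)`.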
In order to prevent files from getting too big and
to make it easier to implement retention policies, we
are splitting all files into chunks. Each chunk
contains the data for a time interval (1 month by
default).
This first changeset introduces the ClusteredPersistentMap,
which implements this for PersistentMap. It is used
for some (not all) of the indices.
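The idea behind the ClusteredPersistentMap can be sketched in Java as a map of per-month maps. This is not its actual API: the `ClusteredMap` class and its methods are hypothetical, `HashMap` stands in for `PersistentMap`, and clusters are keyed by the UTC month of a timestamp, matching the 1-month default. A retention policy then only has to drop whole clusters.

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: one map per month; HashMap stands in for PersistentMap.
final class ClusteredMap<K, V> {
    private final Map<YearMonth, Map<K, V>> clusters = new TreeMap<>();

    // The cluster (month, in UTC) a timestamp belongs to.
    static YearMonth clusterOf(long epochMillis) {
        return YearMonth.from(Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC));
    }

    void put(long epochMillis, K key, V value) {
        clusters.computeIfAbsent(clusterOf(epochMillis), c -> new HashMap<>()).put(key, value);
    }

    V get(long epochMillis, K key) {
        Map<K, V> cluster = clusters.get(clusterOf(epochMillis));
        return cluster == null ? null : cluster.get(key);
    }

    // Retention: drop every cluster older than the given month.
    void dropBefore(YearMonth oldestToKeep) {
        clusters.keySet().removeIf(month -> month.isBefore(oldestToKeep));
    }

    int clusterCount() {
        return clusters.size();
    }
}
```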
- The DiskStorage uses only one file instead of millions.
Also, the block size is only 512 bytes instead of 4KB, which
helps to reduce the memory usage for short sequences.
- Update primitiveCollections to get the new LongList.range
and LongList.rangeClosed methods.
- BSFile now stores Time&Value sequences and knows how to
encode the time values with delta encoding.
- Doc had to do some magic tricks to save memory. The path
was initialized lazily and stored as a byte array. This is no
longer necessary. The path was replaced by the
rootBlockNumber of the BSFile.
- Had to temporarily disable the 'in' queries.
- The stored values are now processed as a stream of LongLists
instead of Entry objects. The overhead of creating Entries is
gone, and so is the memory overhead, because each Entry was an
object with an unnecessary reference to the tags.
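Delta encoding of time values, as mentioned for BSFile, can be sketched in Java: each timestamp is stored as the difference to its predecessor, which yields small numbers for dense time series. The `DeltaEncoding` class is hypothetical; how BSFile stores the resulting deltas on disk (for example with a variable-length integer encoding) is not described here.

```java
final class DeltaEncoding {
    // Replaces each time value by its difference to the previous one (the first one to 0).
    static long[] encode(long[] times) {
        long[] deltas = new long[times.length];
        long previous = 0;
        for (int i = 0; i < times.length; i++) {
            deltas[i] = times[i] - previous;
            previous = times[i];
        }
        return deltas;
    }

    // Restores the original values by summing up the deltas.
    static long[] decode(long[] deltas) {
        long[] times = new long[deltas.length];
        long current = 0;
        for (int i = 0; i < deltas.length; i++) {
            current += deltas[i];
            times[i] = current;
        }
        return times;
    }
}
```

For the times `1000, 1005, 1005, 1020`, the encoded values are `1000, 5, 0, 15`.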
The new datetimepicker can be used to specify date ranges. We no longer
need to define a start date and a range. This simplifies the code
for zooming and shifting considerably.
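With a start and an end instead of a start date plus a range, zooming and shifting reduce to simple arithmetic on the two instants. A sketch in Java with a hypothetical `DateRange` class; the zoom factor and shift fraction parameters are assumptions about how the UI drives these operations.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical date range as selected in the datetimepicker.
final class DateRange {
    final Instant start;
    final Instant end;

    DateRange(Instant start, Instant end) {
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("end before start");
        }
        this.start = start;
        this.end = end;
    }

    // Zooms around the center: factor < 1 zooms in, factor > 1 zooms out.
    DateRange zoom(double factor) {
        Duration length = Duration.between(start, end);
        long newLengthMillis = Math.round(length.toMillis() * factor);
        Instant center = start.plus(length.dividedBy(2));
        Instant newStart = center.minusMillis(newLengthMillis / 2);
        return new DateRange(newStart, newStart.plusMillis(newLengthMillis));
    }

    // Shifts by a fraction of the range length; negative values move into the past.
    DateRange shift(double fraction) {
        long offsetMillis = Math.round(Duration.between(start, end).toMillis() * fraction);
        return new DateRange(start.plusMillis(offsetMillis), end.plusMillis(offsetMillis));
    }
}
```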