We do not know which fields exist at compile time, but it is a great
help to have some pre-selected fields in groupBy. Solved by adding a
configuration option.
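A minimal sketch of such an option, assuming a plain properties file;
the key name groupBy.preselectedFields is made up for illustration:

    import java.util.Arrays;
    import java.util.List;
    import java.util.Properties;

    public class GroupByConfig {
        // Hypothetical key name; the real option is not named above.
        private static final String KEY = "groupBy.preselectedFields";

        /** Fields to pre-select in the groupBy box, read at runtime
         *  because field names are unknown at compile time. */
        static List<String> preselectedFields(Properties config) {
            String raw = config.getProperty(KEY, "").trim();
            return raw.isEmpty()
                    ? List.of()
                    : Arrays.asList(raw.split("\\s*,\\s*"));
        }
    }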
Gnuplot does not handle long x-axis tick labels very well. It should
know how wide the labels are and adapt the increment accordingly, but
it doesn't.
Fixed by explicitly defining the increment for x-axis
labels.
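A sketch of the fix, assuming the Gnuplot script is written from Java;
set xtics <increment> is standard Gnuplot syntax, but the heuristic of
about ten labels per plot is an assumption:

    import java.io.PrintWriter;

    public class XticsIncrement {
        /** Emits an explicit x-tics increment so that long time labels
         *  do not overlap. Assumes roughly ten labels fit across the
         *  plot; the real heuristic is not described above. */
        static void writeXtics(PrintWriter gnuplot, long minX, long maxX) {
            long increment = Math.max(1, (maxX - minX) / 10);
            gnuplot.println("set xtics " + increment);
        }
    }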
Made the filter panel 18% thinner by moving the date range next to
the query box. The thinner filter box gives us more width for the
plot.
We generate CSV files with a comma as separator. When we write times
with milliseconds, we use floating-point numbers. Depending on the
locale, those floating-point numbers may be written with a comma
instead of a point. When that happens, the plots are messed up.
Fixed by enforcing the locale when formatting floats.
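The standard Java fix is to pin the locale when formatting. A minimal
sketch (the method name is illustrative):

    import java.util.Locale;

    public class CsvFloats {
        /** Formats seconds with millisecond precision, always with '.'
         *  as decimal separator, independent of the default locale. */
        static String formatSeconds(double seconds) {
            return String.format(Locale.ROOT, "%.3f", seconds);
        }
    }

With a German default locale, String.format("%.3f", 1.5) yields
"1,500", which a comma-separated CSV reader splits into two fields;
with Locale.ROOT it is always "1.500".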
This makes it easier to use the mouse wheel to zoom in. Without it
you could zoom into a region that had no data and then had to use the
date picker to change the date.
We are now adding the daterangepicker without Angular. This is a little
bit dirty, because we have to load jquery, moment and daterangepicker
manually via script tags, but it works without major hassle.
It is (again) surprisingly hard to find a decent date+time-range
picker that works with Angular. Daterangepicker, which I used with
my VueJS application, does not work with Angular. I can't get the
angularized version
(https://github.com/fragaria/angular-daterangepicker) to work, and a
native Angular date picker
(https://github.com/GNURub/ngx-daterangepicker) doesn't work either.
Rendering plots with millions of values is expensive. Before this fix
we wrote all values into CSV files. The CSV files were then read by
Gnuplot, which did the rendering. But in an image with n×m pixels
there can be only n·m distinct points. In most realistic scenarios
many values will be drawn onto the same pixels, so we were wasting
time by first generating a CSV with far too many values and then by
parsing that CSV again.
Fixed by using a sparse 2D array to de-duplicate many values before
they get written to the CSV. The additional time we spend de-duplicating
is often smaller than the time saved when writing the CSV, so the
total CSV writing is about as fast as before (sometimes a little
faster, sometimes a little slower). But the time Gnuplot needs for
rendering is drastically reduced. The factor depends on the data, of
course; we have seen a factor of 50 for realistic examples, making a
15 s job run in 300 ms.
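A sketch of the de-duplication idea. The names are invented, and a
hash set of packed pixel coordinates stands in for the sparse 2D
array described above:

    import java.util.HashSet;
    import java.util.Set;

    public class PixelDeduplicator {
        // Sparse: only pixels that were actually hit are stored.
        private final Set<Long> seen = new HashSet<>();
        private final int width, height;
        private final double minX, maxX, minY, maxY;

        PixelDeduplicator(int width, int height, double minX, double maxX,
                          double minY, double maxY) {
            this.width = width;   this.height = height;
            this.minX = minX;     this.maxX = maxX;
            this.minY = minY;     this.maxY = maxY;
        }

        /** True only for the first value that lands on a pixel; later
         *  values on the same pixel can be skipped when writing the CSV.
         *  Assumes (x, y) lies inside the plotted range. */
        boolean firstOnPixel(double x, double y) {
            long px = Math.round((x - minX) / (maxX - minX) * (width - 1));
            long py = Math.round((y - minY) / (maxY - minY) * (height - 1));
            return seen.add(px << 32 | py);
        }
    }

Writing a row only when firstOnPixel returns true caps the CSV at one
row per pixel, which is all Gnuplot can render anyway.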
Before: To compute the cumulative distribution we added every duration
into a LongList. This requires O(n) memory, where n is the number of
values.
Now: We store each duration together with its number of occurrences
in a LongLongHashMap. This has the potential to reduce the memory
requirements when durations occur multiple times, and there are a lot
of durations of 0, 1, or 2 milliseconds. In the worst case every
duration is different; in that case the memory usage doubles with
this solution.
Future: We are currently storing durations with millisecond
precision. We don't have to: we cannot draw 100 million distinct
values on the y-axis of an image that is only 1000 px tall anyway.
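A sketch of the counting, assuming the Eclipse Collections primitive
map (the text above does not say which LongLongHashMap is used):

    import org.eclipse.collections.impl.map.mutable.primitive.LongLongHashMap;

    public class DurationHistogram {
        private final LongLongHashMap countByDuration = new LongLongHashMap();

        /** Memory now grows with the number of distinct durations,
         *  not with the total number of values. */
        void add(long durationMillis) {
            // The future idea above would quantize durationMillis here,
            // e.g. to the pixel resolution of the y-axis, to bound the
            // number of distinct keys.
            countByDuration.addToValue(durationMillis, 1);
        }
    }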
Queries like "firstname=John and lastname=???" were slightly
inefficient.
They fetched all firstnames, filtered them down to those that matched
the prefix (e.g. John or Jonathan in this example), and then iterated
over all those values and returned the lastnames.
Fixed by having two implementations: one for the case that only a few
of the values in fieldA match, and one for the case that many match.
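A sketch of the two-path idea; the threshold, the Index interface and
all method names are invented for illustration, since only the
strategy is described above:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Set;
    import java.util.function.BiConsumer;

    public class TwoPathLookup {
        /** Arbitrary cut-over point; the real heuristic is not given. */
        private static final int FEW = 64;

        /** Hypothetical index interface, only for this sketch. */
        interface Index {
            List<String> lastnamesOf(String firstname);
            void forEachRow(BiConsumer<String, String> firstLast);
        }

        static List<String> lastnamesFor(Set<String> matchingFirstnames,
                                         Index index) {
            List<String> result = new ArrayList<>();
            if (matchingFirstnames.size() <= FEW) {
                // Few matches: one targeted lookup per matching value.
                for (String firstname : matchingFirstnames) {
                    result.addAll(index.lastnamesOf(firstname));
                }
            } else {
                // Many matches: scan all rows once and filter by set
                // membership instead of iterating value by value.
                index.forEachRow((firstname, lastname) -> {
                    if (matchingFirstnames.contains(firstname)) {
                        result.add(lastname);
                    }
                });
            }
            return result;
        }
    }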