chunkIterator: optimize caching and AtT() calculation #7305
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
Change LGTM but I'll let @krajorama 👍 it
54de2e2 to 900087a
	default:
		panic(fmt.Errorf("chunkIterator: calling AtT with unknown chunk encoding %v", i.valType))
	}
	return i.it.Timestamp()
I think we're on the right track; however, this might lead to a degradation in the context of the chunk merge iterator. Currently, the heap sort there makes no calls through an interface, thanks to the caching. Please check whether this iterator is used in that context, and let's run some benchmarks with the chunk merge iterator. https://github.com/grafana/mimir/blob/main/pkg/querier/iterators/chunk_merge_iterator.go
It might turn out that what we really want is to cache just the timestamp and literally nothing else, in case values are read once and never multiple times.
I have gone through the implementation of `chunkMergeIterator`, and these are my observations:
- Calls to `AtT()`, `AtFloat()`, `AtHistogram()`, `AtFloatHistogram()` never call a corresponding method on `chunkIterator`, and always return only cached values, if any.
- Building a new `chunkMergeIterator`:
  - The constructor of `chunkMergeIterator` creates a list of non-overlapping iterators, calls `Next()` on each of the non-overlapping iterators, and finally initializes a heap of non-overlapping chunks.
  - The `Next()` on a non-overlapping iterator calls `Next()` on its current chunk, which is of type `chunkIterator`; this updates the timestamp and value type of that chunk's underlying iterator.
  - The constructor doesn't cache any value, so calls to `AtT()`, `AtFloat()`, `AtHistogram()`, `AtFloatHistogram()` would return erroneous values. This is correct, since the caller is supposed to call `Next()` or `Seek()` before any of the `At*()` functions.
- Calling `chunkMergeIterator.Next()`:
  - This method loads the value from the topmost element in the heap by calling `At()`, `AtHistogram()` or `AtFloatHistogram()`, depending on the type, and caches the returned timestamp and value. Then it calls `Next()` on the topmost element in the heap, so that its timestamp and value type are updated for successive calls to `Next()`.
  - This means that, after a call to `chunkMergeIterator.Next()`, it is safe to return cached values when calling `AtT()`, `At()`, `AtHistogram()` and `AtFloatHistogram()`, because these values have actually been loaded from the corresponding `chunkIterator` during `chunkMergeIterator.Next()`.
- Calling `chunkMergeIterator.Seek()`:
  - This method calls `Seek()` on all the non-overlapping iterators belonging to `chunkMergeIterator`, and then updates the heap.
  - After that, it loads the value from the topmost element in the heap by calling `At()`, `AtHistogram()` or `AtFloatHistogram()`, depending on the type, and caches the returned timestamp and value.
  - This means that, after a call to `chunkMergeIterator.Seek()`, it is safe to return cached values when calling `AtT()`, `At()`, `AtHistogram()` and `AtFloatHistogram()`, because these values have actually been loaded from the corresponding `chunkIterator` during `chunkMergeIterator.Seek()`.
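The `Next()` flow described above can be sketched roughly as follows. This is a simplified illustration, not Mimir's `chunkMergeIterator`: the names `sample`, `sliceIter`, `iterHeap` and `mergeIter` are hypothetical, only float samples are modelled, and `Seek()` and the handling of duplicate timestamps are left out.

```go
package main

import "container/heap"

// sample is a hypothetical (timestamp, value) pair; histograms are omitted.
type sample struct {
	t int64
	v float64
}

// sliceIter is a stand-in for a non-overlapping iterator over sorted samples.
type sliceIter struct {
	samples []sample
	pos     int // index of the current sample; -1 before the first Next()
}

func newSliceIter(s []sample) *sliceIter { return &sliceIter{samples: s, pos: -1} }

func (it *sliceIter) Next() bool { it.pos++; return it.pos < len(it.samples) }
func (it *sliceIter) AtT() int64 { return it.samples[it.pos].t }
func (it *sliceIter) At() (int64, float64) {
	s := it.samples[it.pos]
	return s.t, s.v
}

// iterHeap is a min-heap of iterators ordered by their current timestamp.
type iterHeap []*sliceIter

func (h iterHeap) Len() int            { return len(h) }
func (h iterHeap) Less(i, j int) bool  { return h[i].AtT() < h[j].AtT() }
func (h iterHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *iterHeap) Push(x interface{}) { *h = append(*h, x.(*sliceIter)) }
func (h *iterHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// mergeIter serves AtT()/At() from its own cache, never from the heap.
type mergeIter struct {
	h   iterHeap
	cur sample // filled in by Next()
}

// newMergeIter advances every input once and builds the heap.
// Nothing is cached yet, so At*() is only meaningful after Next().
func newMergeIter(its ...*sliceIter) *mergeIter {
	m := &mergeIter{}
	for _, it := range its {
		if it.Next() {
			m.h = append(m.h, it)
		}
	}
	heap.Init(&m.h)
	return m
}

// Next caches the sample of the topmost iterator, then advances that
// iterator so the heap is ready for the following call.
func (m *mergeIter) Next() bool {
	if len(m.h) == 0 {
		return false
	}
	top := m.h[0]
	t, v := top.At()
	m.cur = sample{t: t, v: v}
	if top.Next() {
		heap.Fix(&m.h, 0) // timestamp changed; restore heap order
	} else {
		heap.Pop(&m.h) // exhausted; drop it from the heap
	}
	return true
}

func (m *mergeIter) AtT() int64           { return m.cur.t }
func (m *mergeIter) At() (int64, float64) { return m.cur.t, m.cur.v }
```

In this sketch the only calls that reach the inner iterators outside `Next()` are the `AtT()` comparisons made by the heap, which is why the cost of the inner `AtT()` matters for the merge iterator.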
Discussed IRL: we'll cache the timestamp independently of the value via updates in `Next()` and `Seek()`. This will allow the `seriesIteratorHeap` to be extremely efficient, while at the same time getting rid of calls to `AtHistogram()` and `AtFloatHistogram()` with `nil`, which are certainly wasteful.
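A minimal sketch of that agreement, assuming a hypothetical underlying iterator `rawIter` and wrapper `cachingIter` (neither is Mimir code): the timestamp is refreshed on every `Next()` and `Seek()`, while the value is decoded lazily and at most once per position.

```go
package main

// rawIter is a hypothetical stand-in for the underlying chunk iterator.
type rawIter struct {
	ts    []int64
	vs    []float64
	pos   int
	loads int // counts how many times a value was decoded
}

func (r *rawIter) Next() bool { r.pos++; return r.pos < len(r.ts) }

// Seek moves to the first sample with timestamp >= t; it never moves backwards.
func (r *rawIter) Seek(t int64) bool {
	if r.pos < 0 {
		r.pos = 0
	}
	for r.pos < len(r.ts) && r.ts[r.pos] < t {
		r.pos++
	}
	return r.pos < len(r.ts)
}

func (r *rawIter) Timestamp() int64 { return r.ts[r.pos] }
func (r *rawIter) Value() (int64, float64) {
	r.loads++
	return r.ts[r.pos], r.vs[r.pos]
}

// cachingIter caches the timestamp eagerly and the value lazily.
type cachingIter struct {
	it          *rawIter
	cachedT     int64
	cachedV     float64
	valueCached bool
}

func newCachingIter(ts []int64, vs []float64) *cachingIter {
	return &cachingIter{it: &rawIter{ts: ts, vs: vs, pos: -1}}
}

func (c *cachingIter) Next() bool {
	c.valueCached = false
	if !c.it.Next() {
		return false
	}
	c.cachedT = c.it.Timestamp() // cheap: no value is decoded here
	return true
}

func (c *cachingIter) Seek(t int64) bool {
	c.valueCached = false
	if !c.it.Seek(t) {
		return false
	}
	c.cachedT = c.it.Timestamp()
	return true
}

// AtT is a plain field read, so a heap comparing timestamps stays cheap.
func (c *cachingIter) AtT() int64 { return c.cachedT }

// At decodes the value on first use and reuses it on repeated calls.
func (c *cachingIter) At() (int64, float64) {
	if !c.valueCached {
		_, c.cachedV = c.it.Value()
		c.valueCached = true
	}
	return c.cachedT, c.cachedV
}
```

With this split, ordering iterators by `AtT()` never triggers a value decode; the decode cost is paid only when a caller actually asks for the value.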
What this PR does
This PR improves the current implementation of `iterators.chunkIterator.AtT()`. Namely, if the cached value is not valid, the current implementation returns the timestamp obtained by loading the appropriate value from the underlying iterator (of type `chunk.Iterator`). Retrieving the current timestamp does not, however, require loading a new value, since the timestamp can be read directly by calling the underlying iterator's `Timestamp()` function. For example:
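As a rough illustration of the difference (not the PR's actual code), the sketch below contrasts the two approaches on a cache miss. The types `underlying` and `chunkIter`, the `valueType` constants, and the method names `atTOld` and `atTNew` are all hypothetical, and only float samples are modelled.

```go
package main

import "fmt"

type valueType int

const (
	valNone valueType = iota
	valFloat
)

// underlying is a hypothetical stand-in for chunk.Iterator.
type underlying struct {
	t       int64
	f       float64
	decodes int // counts how many times a value was loaded
}

func (u *underlying) Timestamp() int64 { return u.t }
func (u *underlying) Value() (int64, float64) {
	u.decodes++
	return u.t, u.f
}

// chunkIter keeps a single cacheValid flag for whatever value is cached.
type chunkIter struct {
	it          *underlying
	valType     valueType
	cacheValid  bool
	cachedTime  int64
	cachedValue float64
}

// atTOld sketches the previous behaviour: on a cache miss it loads the
// whole value just to learn the timestamp.
func (i *chunkIter) atTOld() int64 {
	if i.cacheValid {
		return i.cachedTime
	}
	switch i.valType {
	case valFloat:
		i.cachedTime, i.cachedValue = i.it.Value()
		i.cacheValid = true
		return i.cachedTime
	default:
		panic(fmt.Errorf("unknown value type %v", i.valType))
	}
}

// atTNew sketches the new behaviour: on a cache miss it asks the
// underlying iterator for the timestamp only.
func (i *chunkIter) atTNew() int64 {
	if i.cacheValid {
		return i.cachedTime
	}
	return i.it.Timestamp()
}
```

Both variants return the same timestamp; the difference is that the second one leaves the value undecoded until a caller actually requests it.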
Moreover, this PR simplifies the check for whether the cached value is valid: the auxiliary fields `cachedValueValid`, `cachedHistogramValid` and `cachedFloatHistogramValid` have been replaced with a single field, `cacheValid`.
A comparison between the old and the new way of fetching the current timestamp showed that the latter requires less CPU:
Which issue(s) this PR fixes or relates to
Part of #7235
Checklist
- `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`.
- `about-versioning.md` updated with experimental features.