Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement lazy iteration (ForEach) over collections #765

Merged
merged 2 commits into from
Aug 16, 2018
Merged

Conversation

smira
Copy link
Contributor

@smira smira commented Aug 2, 2018

See #761

Description of the Change

aptly had a concept of loading small amount of info per each object
into memory once collection is accessed for the first time.

This might have simplified some operations, but it doesn't scale well
with huge aptly databases.

This is just intermediate step towards better memory management -
list of objects is not loaded unless some method is called.
ForEach method (mainly used in cleanup) is reimplemented to
iterate over database without ever loading all the objects into memory.

Memory was even worse with previous approach, as for each item usually
LoadComplete() is called, which pulls even more data into memory
and item stays in memory till the end of the iteration as it is referenced
from collection.list.

For the subsequent PR: reimplement ByUUID() and probably other methods
to avoid loading all the items into memory, at least for all the collecitons
except for published repos. When published repository is being loaded, it
might pull source local repo which in turn would trigger loading for all the
local repos which is not acceptable.

Checklist

  • unit-test added (if change is algorithm)
  • functional test added/updated (if change is functional)
  • man page updated (if applicable)
  • bash completion updated (if applicable)
  • documentation updated
  • author name in AUTHORS

@smira smira added this to the 1.4.0 milestone Aug 2, 2018
See #761

aptly had a concept of loading small amount of info per each object
into memory once collection is accessed for the first time.

This might have simplified some operations, but it doesn't scale well
with huge aptly databases.

This is just intermediate step towards better memory management -
list of objects is not loaded unless some method is called.
`ForEach` method (mainly used in cleanup) is reimplemented to
iterate over database without ever loading all the objects into memory.

Memory was even worse with previous approach, as for each item usually
`LoadComplete()` is called, which pulls even more data into memory
and item stays in memory till the end of the iteration as it is referenced
from `collection.list`.

For the subsequent PR: reimplement `ByUUID()` and probably other methods
to avoid loading all the items into memory, at least for all the collecitons
except for published repos. When published repository is being loaded, it
might pull source local repo which in turn would trigger loading for all the
local repos which is not acceptable.
@smira smira requested a review from a team August 3, 2018 21:44
@codecov
Copy link

codecov bot commented Aug 3, 2018

Codecov Report

Merging #765 into master will increase coverage by <.01%.
The diff coverage is 83.95%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #765      +/-   ##
==========================================
+ Coverage    63.7%   63.71%   +<.01%     
==========================================
  Files          50       50              
  Lines        6271     6308      +37     
==========================================
+ Hits         3995     4019      +24     
- Misses       1788     1797       +9     
- Partials      488      492       +4
Impacted Files Coverage Δ
deb/snapshot.go 66.82% <81.81%> (-0.18%) ⬇️
deb/remote.go 63.9% <83.33%> (-0.12%) ⬇️
deb/local.go 85.32% <84.21%> (-1.81%) ⬇️
deb/publish.go 63.81% <86.36%> (+0.14%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 86a1c41...de38011. Read the comment docs.

smira added a commit that referenced this pull request Aug 6, 2018
See #765, #761

Collections were relying on keeping in-memory list of all the objects
for any kind of operation which doesn't scale well the number of
objects in the database.

With this rewrite, objects are loaded only on demand which might
be pessimization in some edge cases but should improve performance
and memory footprint signifcantly.
Copy link
Contributor

@sliverc sliverc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM just a question: Do you have done any measurement whether there is a performance hit with this change?

@smira
Copy link
Contributor Author

smira commented Aug 9, 2018

@sliverc good point, I will try to add a micro-benchmark for it. At least we should see some number, not as good as real test, but getting closer

@smira
Copy link
Contributor Author

smira commented Aug 13, 2018

@sliverc I did a very simple benchmark (going to push as part of this PR) which does ForEach() over collection of 1024 snapshots (could be any other object, code is pretty similar).

Results, before the change (on master):

BenchmarkSnapshotCollectionForEach-8   	     500	   2790150 ns/op	 1352170 B/op	   30743 allocs/op

With this PR:

BenchmarkSnapshotCollectionForEach-8   	     500	   2769661 ns/op	 1066171 B/op	   29712 allocs/op

I would call this a good result, as there's almost no change in performance (which is expected, as ForEach has to go through every snapshot in the end).

For PR #762 I would add more benchmarks which assess performance of methods like ByName and ByUUID which should have a different performance profile.

(Edit): op here is a whole iteration over 1024 snapshots, not a single snapshot processing

@sliverc
Copy link
Contributor

sliverc commented Aug 15, 2018

@smira 👍 I agree this result looks good. I don't think this small performance hit will be felt during normal operation.

@smira smira merged commit 4717793 into master Aug 16, 2018
smira added a commit that referenced this pull request Aug 20, 2018
See #765, #761

Collections were relying on keeping in-memory list of all the objects
for any kind of operation which doesn't scale well the number of
objects in the database.

With this rewrite, objects are loaded only on demand which might
be pessimization in some edge cases but should improve performance
and memory footprint signifcantly.
smira added a commit that referenced this pull request Aug 20, 2018
See #765, #761

Collections were relying on keeping in-memory list of all the objects
for any kind of operation which doesn't scale well the number of
objects in the database.

With this rewrite, objects are loaded only on demand which might
be pessimization in some edge cases but should improve performance
and memory footprint signifcantly.
@smira smira deleted the 761-lazy-iteration branch August 24, 2018 20:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants