Implement lazy iteration (ForEach) over collections #765

smira · 2018-08-02T22:09:49Z

Description of the Change

aptly used to load a small amount of information about each object
into memory the first time a collection was accessed.

This might have simplified some operations, but it doesn't scale well
with huge aptly databases.

This is just an intermediate step towards better memory management:
the list of objects is not loaded until a method that needs it is called.
The ForEach method (mainly used in cleanup) is reimplemented to
iterate over the database without ever loading all the objects into memory.

Memory usage was even worse with the previous approach: LoadComplete()
is usually called for each item, which pulls even more data into memory,
and the item stays in memory until the end of the iteration because it is
referenced from collection.list.
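
The difference between the two approaches can be sketched in a few lines of Go. This is only an illustration under stated assumptions, not aptly's code: `kvStore`, `processByPrefix`, `snapshot`, `eagerCollection`, `lazyCollection` and the `S` key prefix are all hypothetical names, and an in-memory map stands in for the real database.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// kvStore is a hypothetical stand-in for the on-disk database.
// It is not aptly's storage API.
type kvStore map[string]string

// processByPrefix visits every value whose key starts with prefix, in key
// order, and stops at the first error returned by fn. A real database would
// use an iterator here instead of collecting the keys first.
func (s kvStore) processByPrefix(prefix string, fn func(value string) error) error {
	keys := make([]string, 0, len(s))
	for k := range s {
		if strings.HasPrefix(k, prefix) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	for _, k := range keys {
		if err := fn(s[k]); err != nil {
			return err
		}
	}
	return nil
}

// snapshot is a hypothetical decoded object.
type snapshot struct{ Name string }

// eagerCollection mirrors the old behaviour: every object is decoded up
// front and stays referenced from list for the lifetime of the collection.
type eagerCollection struct{ list []*snapshot }

func newEagerCollection(db kvStore) *eagerCollection {
	c := &eagerCollection{}
	_ = db.processByPrefix("S", func(value string) error {
		c.list = append(c.list, &snapshot{Name: value})
		return nil
	})
	return c
}

// lazyCollection keeps only a handle to the database.
type lazyCollection struct{ db kvStore }

// ForEach decodes one object at a time; nothing retains the object once
// handler returns, so it can be garbage collected right away.
func (c *lazyCollection) ForEach(handler func(*snapshot) error) error {
	return c.db.processByPrefix("S", func(value string) error {
		return handler(&snapshot{Name: value})
	})
}

func main() {
	db := kvStore{"Sb": "beta", "Sa": "alpha", "Rx": "a-local-repo"}

	fmt.Println("eager list length:", len(newEagerCollection(db).list))

	lazy := &lazyCollection{db: db}
	_ = lazy.ForEach(func(s *snapshot) error {
		fmt.Println("visiting", s.Name)
		return nil
	})

	// A handler error aborts the iteration early.
	errStop := errors.New("stop")
	err := lazy.ForEach(func(*snapshot) error { return errStop })
	fmt.Println("stopped early:", errors.Is(err, errStop))
}
```

In the lazy variant the handler is free to do expensive per-item work (the equivalent of LoadComplete()) without that data accumulating across the whole iteration.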

For a subsequent PR: reimplement ByUUID() and probably other methods
to avoid loading all the items into memory, at least for all the collections
except published repos. When a published repository is loaded, it
might pull in its source local repo, which in turn would trigger loading of
all the local repos, which is not acceptable.

Checklist

unit-test added (if change is algorithm)
functional test added/updated (if change is functional)
man page updated (if applicable)
bash completion updated (if applicable)
documentation updated
author name in AUTHORS

See #761

codecov · 2018-08-03T22:04:44Z

Codecov Report

Merging #765 into master will increase coverage by <.01%.
The diff coverage is 83.95%.

@@            Coverage Diff             @@
##           master     #765      +/-   ##
==========================================
+ Coverage    63.7%   63.71%   +<.01%     
==========================================
  Files          50       50              
  Lines        6271     6308      +37     
==========================================
+ Hits         3995     4019      +24     
- Misses       1788     1797       +9     
- Partials      488      492       +4

| Impacted Files | Coverage Δ | |
|---|---|---|
| deb/snapshot.go | `66.82% <81.81%> (-0.18%)` | ⬇️ |
| deb/remote.go | `63.9% <83.33%> (-0.12%)` | ⬇️ |
| deb/local.go | `85.32% <84.21%> (-1.81%)` | ⬇️ |
| deb/publish.go | `63.81% <86.36%> (+0.14%)` | ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 86a1c41...de38011.

See #765, #761. Collections were relying on keeping an in-memory list of all the objects for any kind of operation, which doesn't scale well with the number of objects in the database. With this rewrite, objects are loaded only on demand, which might be a pessimization in some edge cases but should improve performance and memory footprint significantly.

sliverc

This LGTM, just one question: have you done any measurements to check whether there is a performance hit with this change?

smira · 2018-08-09T15:58:40Z

@sliverc good point, I will try to add a micro-benchmark for it. At least we would see some numbers: not as good as a real test, but getting closer.

smira · 2018-08-13T21:55:49Z

@sliverc I did a very simple benchmark (going to push it as part of this PR) which runs ForEach() over a collection of 1024 snapshots (it could be any other object, the code is pretty similar).

Results, before the change (on master):

BenchmarkSnapshotCollectionForEach-8   	     500	   2790150 ns/op	 1352170 B/op	   30743 allocs/op

With this PR:

BenchmarkSnapshotCollectionForEach-8   	     500	   2769661 ns/op	 1066171 B/op	   29712 allocs/op

I would call this a good result, as there's almost no change in performance (which is expected, as ForEach has to go through every snapshot in the end).
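
To put the two result lines side by side, the relative change can be worked out directly from the figures quoted above. The small Go program below is just that arithmetic; `result` and `pctChange` are illustrative names. It comes out at roughly 0.7% less time, 21% fewer bytes and 3% fewer allocations per op with this PR.

```go
package main

import "fmt"

// result holds the three per-op figures from one benchmark output line.
type result struct {
	nsPerOp, bytesPerOp, allocsPerOp float64
}

// pctChange returns the relative change from before to after, in percent.
func pctChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// Figures copied from the benchmark output above.
	master := result{nsPerOp: 2790150, bytesPerOp: 1352170, allocsPerOp: 30743}
	pr := result{nsPerOp: 2769661, bytesPerOp: 1066171, allocsPerOp: 29712}

	// One op is a full pass over this many snapshots.
	const snapshots = 1024

	fmt.Printf("time:   %+.2f%%\n", pctChange(master.nsPerOp, pr.nsPerOp))
	fmt.Printf("bytes:  %+.2f%%\n", pctChange(master.bytesPerOp, pr.bytesPerOp))
	fmt.Printf("allocs: %+.2f%%\n", pctChange(master.allocsPerOp, pr.allocsPerOp))
	fmt.Printf("per snapshot with this PR: %.0f ns, %.0f B\n",
		pr.nsPerOp/snapshots, pr.bytesPerOp/snapshots)
}
```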

For PR #762 I would add more benchmarks which assess performance of methods like ByName and ByUUID which should have a different performance profile.

(Edit): an op here is a whole iteration over 1024 snapshots, not the processing of a single snapshot.
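
For readers unfamiliar with Go benchmarks, a micro-benchmark of this shape might look roughly like the sketch below. It is not the benchmark pushed in this PR: `snapshot`, `snapshotCollection`, `newCollection` and `countAll` are hypothetical in-memory stand-ins, so the numbers it prints say nothing about aptly itself. It only shows how one op ends up covering a full pass over 1024 items and where the B/op and allocs/op columns come from.

```go
package main

import (
	"fmt"
	"testing"
)

// snapshot and snapshotCollection are hypothetical stand-ins, not aptly types.
type snapshot struct{ Name string }

type snapshotCollection struct{ names []string }

// newCollection builds a collection with n entries.
func newCollection(n int) *snapshotCollection {
	c := &snapshotCollection{}
	for i := 0; i < n; i++ {
		c.names = append(c.names, fmt.Sprintf("snapshot%d", i))
	}
	return c
}

// ForEach materializes one snapshot at a time and passes it to handler.
func (c *snapshotCollection) ForEach(handler func(*snapshot) error) error {
	for _, name := range c.names {
		if err := handler(&snapshot{Name: name}); err != nil {
			return err
		}
	}
	return nil
}

// countAll runs a full iteration and returns how many snapshots were visited.
func countAll(c *snapshotCollection) int {
	n := 0
	_ = c.ForEach(func(*snapshot) error { n++; return nil })
	return n
}

// BenchmarkForEach measures one full pass over 1024 snapshots per op.
func BenchmarkForEach(b *testing.B) {
	c := newCollection(1024)
	b.ReportAllocs() // adds B/op and allocs/op to the result
	b.ResetTimer()   // exclude the setup above from the measurement
	for i := 0; i < b.N; i++ {
		if countAll(c) != 1024 {
			b.Fatal("unexpected snapshot count")
		}
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of `go test`.
	r := testing.Benchmark(BenchmarkForEach)
	fmt.Println(r.String(), r.MemString())
}
```

Placed in a `_test.go` file instead, the same `BenchmarkForEach` function would be picked up by `go test -bench . -benchmem`.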

sliverc · 2018-08-15T07:05:56Z

@smira 👍 I agree this result looks good. I don't think this small performance hit will be felt during normal operation.

smira added this to the 1.4.0 milestone Aug 2, 2018

smira force-pushed the 761-lazy-iteration branch from 3a9e554 to 0f4bbc4 Compare August 3, 2018 21:26

smira requested a review from a team August 3, 2018 21:44

smira mentioned this pull request Aug 6, 2018

Reimplement DB collections for mirrors, repos and snapshots #766 (merged)

sliverc approved these changes Aug 7, 2018

Add simple benchmark for SnapshotCollection.ForEach()

de38011

smira merged commit 4717793 into master Aug 16, 2018

smira deleted the 761-lazy-iteration branch August 24, 2018 20:47
