Skip other sections when reading metadata #826

jtibshirani · 2024-09-14T16:50:20Z

Heap profiles show that the ReadMetadata function creates a lot of garbage objects. Most of it comes from reading the other sections in the TOC, specifically from decoding compoundSection.offsets. However, to read metadata we only need to parse the metadata sections.

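This PR introduces a skip method that advances past a section without decoding it. In the benchmark below, this cuts the bytes allocated by ReadMetadata by roughly 19x and the time per call by roughly 3x, while the number of allocations barely changes: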

BenchmarkReadMetadata
BenchmarkReadMetadata-10    	   20908	     57245 ns/op	  184963 B/op	     118 allocs/op (before)
BenchmarkReadMetadata-10    	   67215	     17937 ns/op	    9688 B/op	     111 allocs/op (after)
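
To illustrate why skipping saves so much memory, here is a minimal sketch of the idea. It does not use the real index format: the `offsetsSection` type, its layout (a uint32 count followed by that many uint32 values), and the helpers `encodeOffsets` and `readSecond` are all hypothetical, chosen only to contrast a decoding `read` with a non-allocating `skip`.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// offsetsSection is a hypothetical stand-in for a section whose payload is
// a list of offsets. Layout (assumed for this sketch only): a uint32 count
// followed by that many big-endian uint32 values.
type offsetsSection struct {
	offsets []uint32
}

// read decodes the whole payload, allocating a slice proportional to its size.
func (s *offsetsSection) read(r *bytes.Reader) error {
	var n uint32
	if err := binary.Read(r, binary.BigEndian, &n); err != nil {
		return err
	}
	s.offsets = make([]uint32, n)
	return binary.Read(r, binary.BigEndian, s.offsets)
}

// skip reads only the count, then seeks past the payload without decoding it.
func (s *offsetsSection) skip(r *bytes.Reader) error {
	var n uint32
	if err := binary.Read(r, binary.BigEndian, &n); err != nil {
		return err
	}
	_, err := r.Seek(int64(n)*4, io.SeekCurrent)
	return err
}

// encodeOffsets serializes offsets in the layout described above.
func encodeOffsets(offsets []uint32) []byte {
	var buf bytes.Buffer
	// Writing fixed-size values to a bytes.Buffer does not fail.
	_ = binary.Write(&buf, binary.BigEndian, uint32(len(offsets)))
	_ = binary.Write(&buf, binary.BigEndian, offsets)
	return buf.Bytes()
}

// readSecond skips the first section in data and decodes only the second.
func readSecond(data []byte) ([]uint32, error) {
	r := bytes.NewReader(data)
	var first, second offsetsSection
	if err := first.skip(r); err != nil {
		return nil, err
	}
	if err := second.read(r); err != nil {
		return nil, err
	}
	return second.offsets, nil
}

func main() {
	data := append(encodeOffsets([]uint32{1, 2, 3}), encodeOffsets([]uint32{7, 8})...)
	got, err := readSecond(data)
	fmt.Println(got, err)
}
```

In this sketch the skipped section costs one small read and a seek, regardless of how many offsets it holds, which is the same shape of saving the benchmark numbers suggest.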

jtibshirani · 2024-09-14T16:52:24Z

read.go

@@ -126,11 +122,14 @@ func (r *reader) readTOC(toc *indexTOC) error {
 				return err
 			}

+			skipSection := len(tags) > 0 && !slices.Contains(tags, tag)


Instead of introducing the "skip" concept, I could have taken advantage of the fact that the metadata sections are always first in the TOC. However, our index reading code is structured around flexible "section tags", and I got the feeling that section ordering wasn't an invariant we wanted to rely on.
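
The filter in the diff line above can be read in isolation. The sketch below wraps the same condition in a hypothetical helper named `shouldSkip`; the tag strings in `main` are illustrative and not taken from the real TOC. The point it shows is that an empty tag list means "read everything", and a non-empty list skips any section whose tag is not listed.

```go
package main

import (
	"fmt"
	"slices"
)

// shouldSkip mirrors the condition from the diff: with no requested tags,
// nothing is skipped; otherwise every tag not in the list is skipped.
// The function name is hypothetical.
func shouldSkip(tags []string, tag string) bool {
	return len(tags) > 0 && !slices.Contains(tags, tag)
}

func main() {
	// Illustrative section tags, in the order they might appear in a TOC.
	toc := []string{"metaData", "repoMetaData", "fileContents", "fileNames"}
	wanted := []string{"metaData", "repoMetaData"}

	for _, tag := range toc {
		fmt.Printf("%-14s skip=%v\n", tag, shouldSkip(wanted, tag))
	}
}
```

Because the decision depends only on the tag, it keeps working if the order of sections in the TOC changes, which is the property the comment above is after.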

jtibshirani · 2024-09-14T16:52:59Z

read.go

@@ -169,6 +174,27 @@ func (r *reader) readTOC(toc *indexTOC) error {
 	return nil
 }

+func (r *reader) readHeader() (simpleSection, uint32, error) {


I factored out the first part of readTOC (now readTOCSections) into readHeader. This wasn't critical for the change.

jtibshirani · 2024-09-14T16:53:31Z

read.go

@@ -395,9 +421,9 @@ func (r *reader) readIndexData(toc *indexTOC) (*indexData, error) {
 	return &d, nil
 }

-func (r *reader) readMetadata(toc *indexTOC) ([]*Repository, *IndexMetadata, error) {
+func (r *reader) parseMetadata(metaData simpleSection, repoMetaData simpleSection) ([]*Repository, *IndexMetadata, error) {


I also simplified this method to take the two simpleSection values directly, since copying a simpleSection is cheap. Not critical for the change.
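
As a rough illustration of why the copy is cheap, the sketch below assumes simpleSection is a small struct of two uint32 fields. That definition is an assumption, since it is not shown in this diff, and the helpers `sectionSize` and `totalBytes` are hypothetical.

```go
package main

import (
	"fmt"
	"unsafe"
)

// simpleSection is assumed here to be an offset and a size; the real
// definition is not part of this diff.
type simpleSection struct {
	off uint32
	sz  uint32
}

// sectionSize reports how many bytes are copied when a simpleSection
// is passed by value.
func sectionSize() uintptr {
	return unsafe.Sizeof(simpleSection{})
}

// totalBytes takes both sections by value, the same calling style as the
// new parseMetadata signature. It is a hypothetical helper.
func totalBytes(metaData simpleSection, repoMetaData simpleSection) uint64 {
	return uint64(metaData.sz) + uint64(repoMetaData.sz)
}

func main() {
	meta := simpleSection{off: 0, sz: 10}
	repoMeta := simpleSection{off: 10, sz: 5}
	fmt.Println(sectionSize(), totalBytes(meta, repoMeta))
}
```

Under that assumption each argument is 8 bytes, so passing it by value costs no more than passing a pointer on a 64-bit platform.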

keegancsmith

nice find!!!


cla-bot bot added the cla-signed label Sep 14, 2024

jtibshirani commented Sep 14, 2024


keegancsmith approved these changes Sep 16, 2024


jtibshirani requested a review from a team September 16, 2024 20:47

Base automatically changed from jtibs/index-toc to main September 17, 2024 01:34

jtibshirani force-pushed the jtibs/metadata branch from 2b05dff to 508594e Compare September 17, 2024 01:43

Skip other sections when reading metadata

d055f00

jtibshirani force-pushed the jtibs/metadata branch from 508594e to d055f00 Compare September 17, 2024 01:48

jtibshirani merged commit be438ef into main Sep 17, 2024
9 checks passed

jtibshirani deleted the jtibs/metadata branch September 17, 2024 02:00

jtibshirani mentioned this pull request Sep 18, 2024

Fix outdated log line #831

Merged

jtibshirani added a commit that referenced this pull request Sep 19, 2024

Fix outdated log line (#831)

5379bc9

Tiny follow up to #826. I resolved a conflict incorrectly and reverted a log line improvement.
