
Compactor: OOM since 0.32.3, and unsupported type of io.Reader: io.nopCloserWriterTo #6748

Closed
gebn opened this issue Sep 22, 2023 · 12 comments

@gebn

gebn commented Sep 22, 2023

Thanos, Prometheus and Golang version used:

thanos, version 0.32.3 (branch: HEAD, revision: 3d98d7ce7a254b893e4c8ee8122f7f6edd3174bd)
  build user:       root@0b3c549e9dae
  build date:       20230920-07:36:31
  go version:       go1.20.8
  platform:         linux/arm64
  tags:             netgo
prometheus, version 2.45.0 (branch: HEAD, revision: 8ef767e396bf8445f009f945b0162fd71827f445)
  build user:       root@920118f645b7
  build date:       20230623-15:15:37
  go version:       go1.20.5
  platform:         linux/arm64
  tags:             netgo,builtinassets,stringlabels

Object Storage Provider: AWS S3

What happened: The compactor was OOM-killed a few seconds after startup:

[529166.313531] thanos invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
[529166.317255]  oom_kill_process+0x2f0/0x2fc
[529166.337537] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[529166.354118] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/thanos-compact.service,task=thanos,pid=8522,uid=994
[529166.355318] Out of memory: Killed process 8522 (thanos) total-vm:2709100kB, anon-rss:718480kB, file-rss:0kB, shmem-rss:0kB, UID:994 pgtables:1608kB oom_score_adj:0

What you expected to happen: Compaction to run successfully, as it did with v0.32.2.

Full logs to relevant components:

Logs

Sep 22 22:04:39 i-0e0b90ad1f9224fc8 systemd[1]: Started thanos-compact.service - Thanos Compact.
Sep 22 22:04:39 i-0e0b90ad1f9224fc8 thanos[8522]: ts=2023-09-22T22:04:39.520222552Z caller=factory.go:53 level=info msg="loading bucket configuration"
Sep 22 22:04:39 i-0e0b90ad1f9224fc8 thanos[8522]: ts=2023-09-22T22:04:39.527770603Z caller=compact.go:643 level=info msg="starting compact node"
Sep 22 22:04:39 i-0e0b90ad1f9224fc8 thanos[8522]: ts=2023-09-22T22:04:39.52784581Z caller=intrumentation.go:56 level=info msg="changing probe status" status=ready
Sep 22 22:04:39 i-0e0b90ad1f9224fc8 thanos[8522]: ts=2023-09-22T22:04:39.528056196Z caller=intrumentation.go:75 level=info msg="changing probe status" status=healthy
Sep 22 22:04:39 i-0e0b90ad1f9224fc8 thanos[8522]: ts=2023-09-22T22:04:39.528072991Z caller=http.go:73 level=info service=http/server component=compact msg="listening for requests and metrics" address=:10912
Sep 22 22:04:39 i-0e0b90ad1f9224fc8 thanos[8522]: ts=2023-09-22T22:04:39.528323221Z caller=tls_config.go:274 level=info service=http/server component=compact msg="Listening on" address=[::]:10912
Sep 22 22:04:39 i-0e0b90ad1f9224fc8 thanos[8522]: ts=2023-09-22T22:04:39.528354786Z caller=tls_config.go:277 level=info service=http/server component=compact msg="TLS is disabled." http2=false address=[::]:10912
Sep 22 22:04:39 i-0e0b90ad1f9224fc8 thanos[8522]: ts=2023-09-22T22:04:39.528435925Z caller=compact.go:1414 level=info msg="start sync of metas"
Sep 22 22:04:42 i-0e0b90ad1f9224fc8 thanos[8522]: ts=2023-09-22T22:04:42.512172116Z caller=fetcher.go:487 level=info component=block.BaseFetcher msg="successfully synchronized block metadata" duration=2.983650317s duration_ms=2983 cached=524 returned=524 partial=2
Sep 22 22:04:43 i-0e0b90ad1f9224fc8 thanos[8522]: ts=2023-09-22T22:04:43.203893352Z caller=fetcher.go:487 level=info component=block.BaseFetcher msg="successfully synchronized block metadata" duration=3.675665409s duration_ms=3675 cached=524 returned=496 partial=2
Sep 22 22:04:44 i-0e0b90ad1f9224fc8 thanos[8522]: ts=2023-09-22T22:04:44.589161094Z caller=fetcher.go:487 level=info component=block.BaseFetcher msg="successfully synchronized block metadata" duration=2.076799884s duration_ms=2076 cached=524 returned=524 partial=2
Sep 22 22:04:44 i-0e0b90ad1f9224fc8 thanos[8522]: ts=2023-09-22T22:04:44.89471873Z caller=fetcher.go:487 level=info component=block.BaseFetcher msg="successfully synchronized block metadata" duration=1.690391446s duration_ms=1690 cached=524 returned=496 partial=2
Sep 22 22:04:44 i-0e0b90ad1f9224fc8 thanos[8522]: ts=2023-09-22T22:04:44.89486167Z caller=compact.go:1419 level=info msg="start of GC"
Sep 22 22:04:44 i-0e0b90ad1f9224fc8 thanos[8522]: ts=2023-09-22T22:04:44.903706922Z caller=compact.go:1442 level=info msg="start of compactions"
Sep 22 22:04:44 i-0e0b90ad1f9224fc8 thanos[8522]: ts=2023-09-22T22:04:44.903931273Z caller=compact.go:1062 level=info group="0@{az=\"1\", region=\"eu-west-2\"}" groupKey=0@8103448589029548555 msg="compaction available and planned" plan="[01HAY8QTVQWX425QECPJMDS1QA (min time: 1695369600007, max time: 1695376800000) 01HAYFKJ3M83VGVFQ25D9A9Q51 (min time: 1695376800011, max time: 1695384000000) 01HAYPF8CZXQJERA73JADTMPFD (min time: 1695384000004, max time: 1695391200000) 01HAYXB0KM5MKSBJR72TP3HVTJ (min time: 1695391200004, max time: 1695398400000)]"
Sep 22 22:04:44 i-0e0b90ad1f9224fc8 thanos[8522]: ts=2023-09-22T22:04:44.904078439Z caller=compact.go:1071 level=info group="0@{az=\"1\", region=\"eu-west-2\"}" groupKey=0@8103448589029548555 msg="finished running pre compaction callback; downloading blocks" plan="[01HAY8QTVQWX425QECPJMDS1QA (min time: 1695369600007, max time: 1695376800000) 01HAYFKJ3M83VGVFQ25D9A9Q51 (min time: 1695376800011, max time: 1695384000000) 01HAYPF8CZXQJERA73JADTMPFD (min time: 1695384000004, max time: 1695391200000) 01HAYXB0KM5MKSBJR72TP3HVTJ (min time: 1695391200004, max time: 1695398400000)]" duration=9.937µs duration_ms=0
Sep 22 22:04:44 i-0e0b90ad1f9224fc8 thanos[8522]: ts=2023-09-22T22:04:44.903940102Z caller=compact.go:1062 level=info group="0@{az=\"2\", region=\"eu-west-2\"}" groupKey=0@2626878136571751866 msg="compaction available and planned" plan="[01HAY8QTNJ7F2HN8CGKXJE5M7H (min time: 1695369600366, max time: 1695376800000) 01HAYFKHXJS6627CN4YCEX8543 (min time: 1695376800366, max time: 1695384000000) 01HAYPF95KZVSYJDJZH5YCS307 (min time: 1695384000366, max time: 1695391200000) 01HAYXB0DKYM6J3MXQE2J9CRMV (min time: 1695391200366, max time: 1695398400000)]"
Sep 22 22:04:44 i-0e0b90ad1f9224fc8 thanos[8522]: ts=2023-09-22T22:04:44.904252008Z caller=compact.go:1071 level=info group="0@{az=\"2\", region=\"eu-west-2\"}" groupKey=0@2626878136571751866 msg="finished running pre compaction callback; downloading blocks" plan="[01HAY8QTNJ7F2HN8CGKXJE5M7H (min time: 1695369600366, max time: 1695376800000) 01HAYFKHXJS6627CN4YCEX8543 (min time: 1695376800366, max time: 1695384000000) 01HAYPF95KZVSYJDJZH5YCS307 (min time: 1695384000366, max time: 1695391200000) 01HAYXB0DKYM6J3MXQE2J9CRMV (min time: 1695391200366, max time: 1695398400000)]" duration=13.301µs duration_ms=0
Sep 22 22:04:46 i-0e0b90ad1f9224fc8 thanos[8522]: ts=2023-09-22T22:04:46.130309875Z caller=compact.go:1129 level=info group="0@{az=\"1\", region=\"eu-west-2\"}" groupKey=0@8103448589029548555 msg="downloaded and verified blocks; compacting blocks" plan="[/var/opt/thanos-compact/data/compact/0@8103448589029548555/01HAY8QTVQWX425QECPJMDS1QA /var/opt/thanos-compact/data/compact/0@8103448589029548555/01HAYFKJ3M83VGVFQ25D9A9Q51 /var/opt/thanos-compact/data/compact/0@8103448589029548555/01HAYPF8CZXQJERA73JADTMPFD /var/opt/thanos-compact/data/compact/0@8103448589029548555/01HAYXB0KM5MKSBJR72TP3HVTJ]" duration=1.226200963s duration_ms=1226
Sep 22 22:04:46 i-0e0b90ad1f9224fc8 thanos[8522]: ts=2023-09-22T22:04:46.204877058Z caller=compact.go:1129 level=info group="0@{az=\"2\", region=\"eu-west-2\"}" groupKey=0@2626878136571751866 msg="downloaded and verified blocks; compacting blocks" plan="[/var/opt/thanos-compact/data/compact/0@2626878136571751866/01HAY8QTNJ7F2HN8CGKXJE5M7H /var/opt/thanos-compact/data/compact/0@2626878136571751866/01HAYFKHXJS6627CN4YCEX8543 /var/opt/thanos-compact/data/compact/0@2626878136571751866/01HAYPF95KZVSYJDJZH5YCS307 /var/opt/thanos-compact/data/compact/0@2626878136571751866/01HAYXB0DKYM6J3MXQE2J9CRMV]" duration=1.300603569s duration_ms=1300
Sep 22 22:04:46 i-0e0b90ad1f9224fc8 thanos[8522]: ts=2023-09-22T22:04:46.679206302Z caller=compact.go:464 level=info msg="compact blocks" count=4 mint=1695369600007 maxt=1695398400000 ulid=01HAZES15NF3CVX7NX678M4XSP sources="[01HAY8QTVQWX425QECPJMDS1QA 01HAYFKJ3M83VGVFQ25D9A9Q51 01HAYPF8CZXQJERA73JADTMPFD 01HAYXB0KM5MKSBJR72TP3HVTJ]" duration=548.818429ms
Sep 22 22:04:46 i-0e0b90ad1f9224fc8 thanos[8522]: ts=2023-09-22T22:04:46.682521066Z caller=compact.go:1159 level=info group="0@{az=\"1\", region=\"eu-west-2\"}" groupKey=0@8103448589029548555 msg="compacted blocks" new=01HAZES15NF3CVX7NX678M4XSP blocks="[/var/opt/thanos-compact/data/compact/0@8103448589029548555/01HAY8QTVQWX425QECPJMDS1QA /var/opt/thanos-compact/data/compact/0@8103448589029548555/01HAYFKJ3M83VGVFQ25D9A9Q51 /var/opt/thanos-compact/data/compact/0@8103448589029548555/01HAYPF8CZXQJERA73JADTMPFD /var/opt/thanos-compact/data/compact/0@8103448589029548555/01HAYXB0KM5MKSBJR72TP3HVTJ]" duration=552.140046ms duration_ms=552 overlapping_blocks=false
Sep 22 22:04:46 i-0e0b90ad1f9224fc8 thanos[8522]: ts=2023-09-22T22:04:46.716745009Z caller=compact.go:464 level=info msg="compact blocks" count=4 mint=1695369600366 maxt=1695398400000 ulid=01HAZES180BV3B2PR3T721FRF5 sources="[01HAY8QTNJ7F2HN8CGKXJE5M7H 01HAYFKHXJS6627CN4YCEX8543 01HAYPF95KZVSYJDJZH5YCS307 01HAYXB0DKYM6J3MXQE2J9CRMV]" duration=511.333031ms
Sep 22 22:04:46 i-0e0b90ad1f9224fc8 thanos[8522]: ts=2023-09-22T22:04:46.717988524Z caller=compact.go:1159 level=info group="0@{az=\"2\", region=\"eu-west-2\"}" groupKey=0@2626878136571751866 msg="compacted blocks" new=01HAZES180BV3B2PR3T721FRF5 blocks="[/var/opt/thanos-compact/data/compact/0@2626878136571751866/01HAY8QTNJ7F2HN8CGKXJE5M7H /var/opt/thanos-compact/data/compact/0@2626878136571751866/01HAYFKHXJS6627CN4YCEX8543 /var/opt/thanos-compact/data/compact/0@2626878136571751866/01HAYPF95KZVSYJDJZH5YCS307 /var/opt/thanos-compact/data/compact/0@2626878136571751866/01HAYXB0DKYM6J3MXQE2J9CRMV]" duration=512.582085ms duration_ms=512 overlapping_blocks=false
Sep 22 22:04:46 i-0e0b90ad1f9224fc8 thanos[8522]: ts=2023-09-22T22:04:46.734219122Z caller=s3.go:487 level=warn msg="could not guess file size for multipart upload; upload might be not optimized" name=01HAZES15NF3CVX7NX678M4XSP/chunks/000001 err="unsupported type of io.Reader: io.nopCloser"
Sep 22 22:04:46 i-0e0b90ad1f9224fc8 thanos[8522]: ts=2023-09-22T22:04:46.763809166Z caller=s3.go:487 level=warn msg="could not guess file size for multipart upload; upload might be not optimized" name=01HAZES180BV3B2PR3T721FRF5/chunks/000001 err="unsupported type of io.Reader: io.nopCloser"
Sep 22 22:04:47 i-0e0b90ad1f9224fc8 thanos[8522]: ts=2023-09-22T22:04:47.020212245Z caller=s3.go:487 level=warn msg="could not guess file size for multipart upload; upload might be not optimized" name=01HAZES15NF3CVX7NX678M4XSP/index err="unsupported type of io.Reader: io.nopCloser"
Sep 22 22:04:47 i-0e0b90ad1f9224fc8 thanos[8522]: ts=2023-09-22T22:04:47.124561109Z caller=s3.go:487 level=warn msg="could not guess file size for multipart upload; upload might be not optimized" name=01HAZES180BV3B2PR3T721FRF5/index err="unsupported type of io.Reader: io.nopCloser"
Sep 22 22:04:47 i-0e0b90ad1f9224fc8 thanos[8522]: ts=2023-09-22T22:04:47.189118867Z caller=s3.go:487 level=warn msg="could not guess file size for multipart upload; upload might be not optimized" name=01HAZES15NF3CVX7NX678M4XSP/meta.json err="unsupported type of io.Reader: io.nopCloserWriterTo"
Sep 22 22:04:47 i-0e0b90ad1f9224fc8 systemd[1]: thanos-compact.service: Main process exited, code=killed, status=9/KILL
Sep 22 22:04:47 i-0e0b90ad1f9224fc8 systemd[1]: thanos-compact.service: Failed with result 'signal'.
Sep 22 22:04:47 i-0e0b90ad1f9224fc8 systemd[1]: thanos-compact.service: Consumed 3.340s CPU time.

Anything else we need to know: The VM has 1 GiB of memory (t4g.micro) and runs Debian 12.1.

@yeya24
Contributor

yeya24 commented Sep 22, 2023

I am wondering if this is the same error we found in objstore.

Could you take a heap profile during compactor startup?

@saswatamcode We need to release a new version for the two fixes anyway, I guess.

@saswatamcode
Member

@yeya24 Yes, we do! I'm waiting for a few more bug reports before I do it, so that we can tackle all of them in one go in 0.32.4.

@gebn
Author

gebn commented Sep 24, 2023

@yeya24 Please let me know if this captures what you're after:

heap
heap.pprof.tar.gz

@gebn
Author

gebn commented Sep 26, 2023

Same here, v0.32.3 sidecars also started crashing soon after the compactor.

@saswatamcode
Copy link
Member

Thanks for reporting @gebn! Will release v0.32.4 soon. Can you try out a Thanos main image and check if the issue persists?

@gebn
Author

gebn commented Sep 26, 2023

To save compiling, are binaries published anywhere? I'm afraid I'm not using containers. The builds don't seem to have (visible) artefacts.

@saswatamcode
Member

Sorry @gebn, we only publish nightly images I'm afraid. Would you be able to compile a binary from main and test it out?

@Tolsto

Tolsto commented Sep 27, 2023

I was running into the same issue; using main-2023-09-26-20d2900 fixes it.

@saswatamcode
Copy link
Member

@Tolsto that's great! Will try to release a patch version by tomorrow!

@saswatamcode
Copy link
Member

v0.32.4 is now available with these fixes: https://github.com/thanos-io/thanos/releases/tag/v0.32.4

@MichaHoffmann
Copy link
Contributor

I think this could be closed. WDYT @gebn ?

@gebn
Copy link
Author

gebn commented Oct 8, 2023

No issues since upgrading! Thanks all

gebn closed this as completed Oct 8, 2023