[B] Building large Docker images in Kaniko leads to OOMKilled job #283

Open
YevheniiSemendiak opened this issue Jun 22, 2021 · 5 comments
Labels
bug Something isn't working

Comments

YevheniiSemendiak commented Jun 22, 2021

Summary

Building a relatively large Docker image with Kaniko fails: Kubernetes kills the build pod for excessive memory usage (OOMKilled), even on fairly large presets with, say, 10 GB of RAM.

Steps to reproduce

  1. Create a Dockerfile with the following content:
       FROM neuromation/neuro-extras:21.3.19
       RUN wget https://raw.githubusercontent.com/neuro-inc/platform-client-python/master/build-tools/garbage-files-generator.py && \
           python3 garbage-files-generator.py 1 7Gb
  2. Launch the build with:
       neuro-extras image build -s cpu-large . image:test-build-failure
  3. Observe that the job was OOMKilled.
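
For anyone reproducing this without access to the script URL above, here is a minimal stand-in sketch. It is not the actual garbage-files-generator.py. It assumes the two arguments mean "number of files" and "size of each file", and the names parse_size and write_garbage are hypothetical. It writes random bytes so that the resulting layer is large and compresses poorly.

```python
from __future__ import annotations

import os
import re
from pathlib import Path

# Assumed unit table; the real script may interpret sizes differently.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3}


def parse_size(text: str) -> int:
    """Convert a string such as '7Gb' or '512Mb' into a number of bytes."""
    match = re.fullmatch(r"(\d+)\s*([kmg]?b)", text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def write_garbage(directory: Path, count: int, size_each: int,
                  chunk: int = 1024 * 1024) -> list[Path]:
    """Write `count` files of `size_each` random bytes each.

    Data is streamed in `chunk`-sized blocks so the generator itself
    never holds a whole file in memory.
    """
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for index in range(count):
        path = directory / f"garbage-{index}.bin"
        remaining = size_each
        with path.open("wb") as handle:
            while remaining > 0:
                block = min(chunk, remaining)
                handle.write(os.urandom(block))
                remaining -= block
        paths.append(path)
    return paths


# Example (hypothetical, mirrors "1 7Gb" from the Dockerfile above):
# write_garbage(Path("garbage"), 1, parse_size("7Gb"))
```

Calling this from a RUN step in place of the downloaded script should produce a similarly sized layer, though whether it triggers the same memory behaviour in Kaniko is untested.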

Expected result

The build finishes properly.

Environment

Mandatory:
  • neuro-extras version: 21.3.19
  • neuro CLI version: 21.6.17
  • KANIKO_IMAGE_REF = "gcr.io/kaniko-project/executor"
  • KANIKO_IMAGE_TAG = "v1.5.1"

Additional information (optional)

Example job ID: job-98cf4efa-4128-49d6-9f8f-01937011ed67
A manual rerun with Kaniko caching disabled (job-c26e3874-fcfa-478f-a862-72a28394853c) leads to the same error.

@YevheniiSemendiak YevheniiSemendiak added the bug Something isn't working label Jun 22, 2021
YevheniiSemendiak commented Jun 22, 2021

v1.6.0 with --cache=false - OOMKilled (metrics)
v1.6.0 with --cache=false and default --snapshotMode - OOMKilled (job-699d81aa-0e61-4eef-8287-e7a611906765)
v1.5.0 - OOMKilled
v1.3.0 - OK (but without using the --cache-copy-layers flag), metrics

@YevheniiSemendiak
Collaborator Author

Created an issue in the Kaniko repo: GoogleContainerTools/kaniko#1680

@YevheniiSemendiak
Collaborator Author

Downgraded Kaniko to v1.3.0 in #287; we need to bump it again once the issue is fixed in the Kaniko repo.

@YevheniiSemendiak
Collaborator Author

Kaniko release 1.7.0 resolves the problem. We need to expose that flag for users and bump Kaniko.
