
"gcr.io/kaniko-project/executor:latest" failed: step exited with non-zero status: 137 #1669

Open
snthibaud opened this issue Jun 13, 2021 · 17 comments
Labels
area/caching, area/gcb, area/performance, issue/build-fails, issue/oom, platform/cloud-build, priority/p2, work-around-available

Comments

@snthibaud

Actual behavior
I am running a build on Cloud Build. The build itself succeeds, but the caching snapshot at the end fails with the following messages:

Step #0: INFO[0154] Taking snapshot of full filesystem...
Finished Step #0
ERROR
ERROR: build step 0 "gcr.io/kaniko-project/executor:latest" failed: step exited with non-zero status: 137

Expected behavior
I would like the whole build to succeed - including caching.

To Reproduce
Steps to reproduce the behavior:

  1. Build on GCP Cloud Build using a cloudbuild.yaml with Kaniko caching enabled.
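For reference, a minimal cloudbuild.yaml along these lines reproduces the setup; the image name, TTL, and timeout below are illustrative placeholders rather than the original configuration:

steps:
- name: 'gcr.io/kaniko-project/executor:latest'
  args:
  - --destination=gcr.io/$PROJECT_ID/my-image   # placeholder image name
  - --cache=true                                # enable Kaniko layer caching
  - --cache-ttl=24h                             # placeholder cache TTL
timeout: 1800s                                  # placeholder build timeout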

Additional Information
I cannot provide the Dockerfile, but it is based on continuumio/miniconda3 and also installs tensorflow in a conda environment. I think it started failing after tensorflow was added to the list of dependencies.

@snthibaud
Author

Additionally, it builds fine with caching disabled, and also when a heavier 8-CPU machine type is used. However, I think it's strange that Kaniko caching requires more resources than the build itself.
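For reference, the heavier worker mentioned above can be requested through the build options in cloudbuild.yaml; a minimal sketch, where the machine type value is just one example of a high-CPU tier:

options:
  machineType: 'E2_HIGHCPU_8'   # example Cloud Build machine type; pick a tier with enough memory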

@hugbubby

I've been trying to work around this issue for the past several days. Kaniko consistently tries to use more memory than our Kubernetes cluster has available. It only happens with our large images.
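For context, when the executor runs as a Kubernetes pod its memory ceiling comes from the container's resource settings, so one mitigation is to give it explicit headroom. A minimal sketch assuming a generic pod spec; the pod name, destination, and sizes are placeholders, and build-context and registry-credential setup are omitted:

apiVersion: v1
kind: Pod
metadata:
  name: kaniko-build                                 # placeholder pod name
spec:
  restartPolicy: Never
  containers:
  - name: kaniko
    image: gcr.io/kaniko-project/executor:latest
    args:
    - --destination=registry.example.com/my-image    # placeholder destination
    - --cache=true
    resources:
      requests:
        memory: 4Gi                                  # placeholder; size to the build's observed peak
      limits:
        memory: 8Gi                                  # placeholder ceiling; exceeding it still ends in OOMKilled (137)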

@dakl

dakl commented Jul 1, 2021

Any workaround available? My base image is tensorflow/tensorflow:2.4.0-gpu, which weighs 2.35 GB compressed.

@tk42

tk42 commented Jul 8, 2021

@dakl Try downgrading to v1.3.0 (as mentioned in #1680); it works for me.
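For reference, pinning the version just means swapping the step image in cloudbuild.yaml; a minimal sketch with a placeholder image name:

steps:
- name: 'gcr.io/kaniko-project/executor:v1.3.0'   # pinned tag instead of :latest
  args:
  - --destination=gcr.io/$PROJECT_ID/my-image     # placeholder image name
  - --cache=true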

@Mistic92

Any update on this topic? I have this issue with every ML-related Dockerfile where we need to use PyTorch and other libraries.

@imjasonh
Collaborator

The :latest image is quite old, pointing to :v1.6.0 due to issues with :v1.7.0

It's possible the bug is fixed at head, and while we wait for a v1.8.0 release (#1871) you can try out the latest commit-tagged release and see if that helps: gcr.io/kaniko-project/executor:09e70e44d9e9a3fecfcf70cb809a654445837631

If it's not fixed, it sounds like we need to figure out where layer contents are being buffered into memory while being cached, which it sounds like was introduced some time between v1.3 and now. If anybody investigates and finds anything useful, please add it here.

@Mistic92

Looks like it worked, but I tried with the cache disabled. On 1.6 it was failing even with the cache disabled, so that's a good sign.

@wahyueko22

Any update on this issue? I am facing the same problem when deploying an ML image with sentence-transformers and torch>=1.6.0. The image size is more than 3 GB.

@imjasonh
Collaborator

imjasonh commented Mar 8, 2022

> Any update on this issue? I am facing the same problem when deploying an ML image with sentence-transformers and torch>=1.6.0. The image size is more than 3 GB.

It sounds like #1669 (comment) says this works with a newer commit-tagged image, and with caching disabled. It sounds like caching causes filesystem contents to be buffered in memory, which causes problems with large images.

@lappazos

> The :latest image is quite old, pointing to :v1.6.0 due to issues with :v1.7.0
>
> It's possible the bug is fixed at head, and while we wait for a v1.8.0 release (#1871) you can try out the latest commit-tagged release and see if that helps: gcr.io/kaniko-project/executor:09e70e44d9e9a3fecfcf70cb809a654445837631
>
> If it's not fixed, it sounds like we need to figure out where layer contents are being buffered into memory while being cached, which it sounds like was introduced some time between v1.3 and now. If anybody investigates and finds anything useful, please add it here.

This happened to me too with a large image, and the referenced commit solved it. Any update on why it's not solved yet in v1.8.1? @imjasonh

@imjasonh
Collaborator

#2115 is the issue tracking the next release. I don't have any more information than what's in that issue.

@imjasonh
Collaborator

Does this issue still happen at the latest commit-tagged image? With and without caching enabled?

@granthamtaylor

granthamtaylor commented Aug 7, 2022

@imjasonh I am still experiencing this issue with :latest and v1.8.1 for an image with PyTorch installed.

v1.3.0 seems to work as expected. Thank you @tk42 for the suggestion!

@irg1008

irg1008 commented Sep 7, 2022

Any news on this? Still happening on v1.9.0

@spookyuser

Adding --compressed-caching=false works for me on 1.9.0.

devxpy added a commit to GooeyAI/gooey-server that referenced this issue Nov 16, 2022
@jtwigg

jtwigg commented Mar 29, 2023

--compressed-caching=false worked well for most things except for COPY <src> <dst>, and it turns out there's also --cache-copy-layers. I was still getting crushed by PyTorch installations.

This is the cloudbuild.yaml that works really well now:

steps:
- name: 'gcr.io/kaniko-project/executor:latest'
  args:
  - --destination=gcr.io/$PROJECT_ID/<name>   # push target for the built image
  - --cache=true                              # enable Kaniko layer caching
  - --cache-ttl=48h                           # keep cached layers for 48 hours
  - --compressed-caching=false                # skip compressing cached layers to cut memory use
  - --cache-copy-layers=true                  # also cache layers produced by COPY

davidcavazos added a commit to davidcavazos/beam that referenced this issue Jun 5, 2023
Disable cache compression to allow large images, like images depending on `tensorflow` or `torch`.

For more information, see: GoogleContainerTools/kaniko#1669
@aaron-prindle added the area/caching, issue/oom, work-around-available, area/gcb, platform/cloud-build, issue/build-fails, area/performance, and priority/p2 labels Jun 25, 2023
@javiercornejo

I confirm I was having the same issue in Cloud Build, and the --compressed-caching=false flag has solved the problem with :latest so far.

@aaron-prindle added and removed the priority/p0 and priority/p2 labels Mar 11, 2024