
False-positive cache-hit on COPY --from #870

Closed
haizaar opened this issue Nov 20, 2019 · 19 comments
Labels: area/caching (For all bugs related to cache issues), kind/bug (Something isn't working), priority/p2 (High impact feature/bug. Will get a lot of users happy)

Comments

haizaar commented Nov 20, 2019

Actual behavior
I have a two-stage Docker image that first builds Sphinx docs and then copies them into a second nginx image:

FROM python:3.7.5-alpine3.10 AS builder
RUN apk add --no-cache build-base libxml2-dev libxslt-dev graphviz
COPY requirements.txt /
RUN pip install --upgrade -r /requirements.txt
WORKDIR /build
COPY . ./
RUN make html
########################
FROM nginx:1.17.5-alpine AS runner
COPY --from=builder /build/_build/html /usr/share/nginx/html

I build it on Google Cloud Build using the gcr.io/kaniko-project/executor:latest builder.

The problem is the last COPY instruction: even though the generated HTML files differ from the previously built revision, kaniko still prefers the cached version:

Step #0: build succeeded.
Step #0:
Step #0: The HTML pages are in _build/html.
Step #0:
Step #0: Build finished. The HTML pages are in _build/html.
Step #0: INFO[0062] Taking snapshot of full filesystem...
Step #0: INFO[0064] Pushing layer gcr.io/xxx/docs-server/cache:cd602b7af9c972414b3241eb3a192a606755174cf665f9806155078915975932 to cache now
Step #0: INFO[0068] Saving file /build/_build/html for later use.
Step #0: INFO[0068] Deleting filesystem...
Step #0: INFO[0068] Downloading base image nginx:1.17.5-alpine
Step #0: INFO[0069] Error while retrieving image from cache: getting file info: stat /cache/sha256:b4c0378c841cd76f0b75bc63454bfc6fe194a5220d4eab0d75963bccdbc327ff: no such file or directory
Step #0: INFO[0069] Downloading base image nginx:1.17.5-alpine
Step #0: INFO[0069] Checking for cached layer gcr.io/xxx/docs-server/cache:5de22bf6a7763f10cd0ffa0592b8f51c876286a1d0f50564cf2f21bfd6262dc4...
Step #0: INFO[0069] Using caching version of cmd: COPY --from=builder /build/_build/html /usr/share/nginx/html
Step #0: INFO[0069] Skipping unpacking as no commands require it.
Step #0: INFO[0069] Taking snapshot of full filesystem...
Step #0: INFO[0070] COPY --from=builder /build/_build/html /usr/share/nginx/html
Step #0: INFO[0070] Found cached layer, extracting to filesystem

Expected behavior
Detect that the stage 0 files have changed and discard the cache for the COPY --from directive in stage 1.
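
A conceptual sketch of that expectation, assuming nothing about kaniko's actual internals: the cache key for a COPY --from step would have to incorporate a digest of the copied source files, so that changed builder output produces a cache miss.

# Conceptual sketch only -- not kaniko's code. Hash the files that
# COPY --from would copy and mix that digest into the cache key, so a
# change in the builder stage's output yields a different key (cache miss).
src_digest=$(cd /build/_build/html && find . -type f -exec sha256sum {} + | sort | sha256sum | awk '{print $1}')
cache_key=$(printf '%s %s' \
  'COPY --from=builder /build/_build/html /usr/share/nginx/html' \
  "$src_digest" | sha256sum | awk '{print $1}')
echo "cache key: $cache_key"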

To Reproduce
Steps to reproduce the behavior:

  1. Run:
mkdir /tmp/bug; cd /tmp/bug
python3.7 -mvenv /tmp/bug-env && source /tmp/bug-env/bin/activate
mkdir docs
echo 'Sphinx==2.2.1' >> docs/requirements.txt
pip install -r docs/requirements.txt
cd docs
echo no | sphinx-quickstart -p foo -a foo -v 0.0.1 -r 0.0.1 -l en

Then add the above Dockerfile under docs/.

  2. Now add everything to git, commit, and trigger Cloud Build using the following step:
steps:
  - name: gcr.io/kaniko-project/executor:latest
    args:
      - --cache=true
      - --cache-ttl=336h  # 2 weeks
      - --context=/workspace/docs
      - --dockerfile=/workspace/docs/Dockerfile
      - --destination=gcr.io/$PROJECT_ID/docs-server:$BRANCH_NAME-$COMMIT_SHA
  3. Change docs/index.rst, commit, trigger a new build, and observe that the resulting image still has the old version of the docs (a verification sketch follows below).
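
A hedged way to verify step 3, assuming the image reference from the Cloud Build step above (substitute real values for the $PROJECT_ID, $BRANCH_NAME and $COMMIT_SHA substitutions):

# Pull the image produced by the second build and inspect the served docs.
img="gcr.io/$PROJECT_ID/docs-server:$BRANCH_NAME-$COMMIT_SHA"
docker pull "$img"
# With the false cache hit, this still shows the pre-change index.rst content.
docker run --rm "$img" cat /usr/share/nginx/html/index.html | head -n 40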

Additional Information

  • Dockerfile - see above
  • Build Context - see above
  • Kaniko Image sha256:9c40a04cf1bc9d886f7f000e0b7fa5300c31c89e2ad001e97eeeecdce9f07a29

Triage Notes for the Maintainers

Description (Yes/No)
  • Please check if this is a new feature you are proposing
  • Please check if the build works in docker but not in kaniko
  • Please check if this error is seen when you use the --cache flag
  • Please check if your dockerfile is a multistage dockerfile
stroncium commented Nov 20, 2019

I experience this problem too.

My current workaround is to use gcr.io/kaniko-project/executor:v0.13.0.
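
A minimal sketch of that workaround, assuming a Cloud Build setup like the one in the issue description; the only change is pinning the executor tag instead of using :latest.

# Workaround sketch: pin the executor image.
# In the cloudbuild.yaml step above this amounts to:
#   - name: gcr.io/kaniko-project/executor:v0.13.0
# The pinned image can also be pulled and run locally the same way as :latest:
docker pull gcr.io/kaniko-project/executor:v0.13.0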

cvgw added the area/caching, kind/bug, and priority/p2 labels Nov 22, 2019
cvgw self-assigned this Nov 22, 2019
mcfedr (Contributor) commented Nov 25, 2019

Just found this, having the exact same issue using kaniko on Cloud Build. Will try the suggested workaround/rollback.

cvgw (Contributor) commented Nov 26, 2019

This appears to be the same underlying problem as #845 and #589.

mcfedr (Contributor) commented Dec 2, 2019

@cvgw Any movement on getting that fix merged?

cvgw (Contributor) commented Dec 2, 2019

@mcfedr hoping to get it merged this week.

kzap commented Dec 10, 2019

I can also confirm this does not happen in 0.13.0, though I'm not sure why. I hope the fixes can be merged and released soon.

This is how I recreated the issue locally, using a test Dockerfile and AWS ECR as a cache repo.

Testing

We can replicate this bug on our machine using Docker

  • Create our variables and test Dockerfile
# set some variables
docker_image="gcr.io/kaniko-project/executor:debug-v0.14.0"
ecr_endpoint="YOUR_AWS_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com"
ecr_repo="YOUR_ECR_REPO_NAME"

# create our test Dockerfile
mkdir -p ./tmp/
cat <<'EOF' > ./tmp/Dockerfile
FROM busybox:latest as builder
COPY ./date.txt /tmp/date.txt

FROM busybox:latest
COPY --from=builder /tmp/date.txt /tmp/test.txt
EOF
  • Run Kaniko locally using docker
# run kaniko debug image with some env vars
docker run \
  --tty \
  --interactive \
  --rm \
  --volume "$(pwd)/tmp:/app" \
  --volume "$HOME/.aws":/root/.aws \
  --env "AWS_DEFAULT_REGION=us-east-1" \
  --env "ECR_ENDPOINT=${ecr_endpoint}" \
  --env "ECR_REPO=${ecr_repo}" \
  --entrypoint "" \
  "${docker_image}" \
  sh
  • Create ECR Config inside the Kaniko container
# create ECR config file for kaniko
mkdir -p /kaniko/.docker/
cat <<EOF > /kaniko/.docker/config.json
{
  "credHelpers": {
    "${ECR_ENDPOINT}": "ecr-login"
  }
}
EOF
  • Create the 1st image w/ cache from inside the Kaniko container
# Create first image
echo $(date) > /app/date.txt
ecr_tag="test-$(date +%s)"
/kaniko/executor \
  --context "/app" \
  --dockerfile "/app/Dockerfile" \
  --destination "${ECR_ENDPOINT}/${ECR_REPO}:${ecr_tag}" \
  --cache-repo "${ECR_ENDPOINT}/${ECR_REPO}/cache" \
  --cache=true \
  --cleanup

echo "docker run -it --rm ${ECR_ENDPOINT}/${ECR_REPO}:${ecr_tag} cat /tmp/test.txt > image1.txt"
  • Create 2nd Image w/ cache
# Create 2nd Image
echo $(date) > /app/date.txt
ecr_tag="test-$(date +%s)"
/kaniko/executor \
  --context "/app" \
  --dockerfile "/app/Dockerfile" \
  --destination "${ECR_ENDPOINT}/${ECR_REPO}:${ecr_tag}" \
  --cache-repo "${ECR_ENDPOINT}/${ECR_REPO}/cache" \
  --cache=true \
  --cleanup

echo "docker run -it --rm ${ECR_ENDPOINT}/${ECR_REPO}:${ecr_tag} cat /tmp/test.txt > image2.txt"
  • Create 3rd Image w/o using Cache so we can compare the output
# Create 3rd Image w/o Cache
echo $(date) > /app/date.txt
ecr_tag="test-$(date +%s)"
/kaniko/executor \
  --context "/app" \
  --dockerfile "/app/Dockerfile" \
  --destination "${ECR_ENDPOINT}/${ECR_REPO}:${ecr_tag}" \
  --cache-repo "${ECR_ENDPOINT}/${ECR_REPO}/cache" \
  --cache=false \
  --cleanup

echo "docker run -it --rm ${ECR_ENDPOINT}/${ECR_REPO}:${ecr_tag} cat /tmp/test.txt > image3.txt"
  • Run the 3 docker commands which output 3 text files and cat them
$ cat image{1..3}.txt
Tue Dec 10 14:30:10 UTC 2019
Tue Dec 10 14:30:10 UTC 2019
Tue Dec 10 15:09:24 UTC 2019
  • The first two files should differ slightly, but they are identical, meaning kaniko did not give us the latest file; it used the cache. The third file, image3.txt, has an updated date.txt because it did not use the cache (a quick check is sketched below).
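
A quick scripted version of that comparison, assuming the three echoed docker run commands above were executed and produced image1.txt, image2.txt and image3.txt:

# image1.txt == image2.txt indicates the stale COPY --from cache hit;
# image3.txt differing confirms the uncached build picked up the new date.txt.
cmp -s image1.txt image2.txt && echo "image1 == image2 (stale cache hit)"
cmp -s image2.txt image3.txt || echo "image3 differs (cache was bypassed)"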

How to fix:

Let's try the build again, but with the docker image gcr.io/kaniko-project/executor:debug-v0.13.0.

  • Run kaniko 0.13.0:
# Run Kaniko Image 0.13.0
docker_image="gcr.io/kaniko-project/executor:debug-v0.13.0"

# run kaniko debug image with some env vars
docker run \
  --tty \
  --interactive \
  --rm \
  --volume "$(pwd)/tmp:/app" \
  --volume "$HOME/.aws":/root/.aws \
  --env "AWS_DEFAULT_REGION=us-east-1" \
  --env "ECR_ENDPOINT=${ecr_endpoint}" \
  --env "ECR_REPO=${ecr_repo}" \
  --entrypoint "" \
  "${docker_image}" \
  sh
  • Create ECR config for Kaniko
# create ECR config file for kaniko
mkdir -p /kaniko/.docker/
cat <<EOF > /kaniko/.docker/config.json
{
  "credHelpers": {
    "${ECR_ENDPOINT}": "ecr-login"
  }
}
EOF
  • Create 4th Docker Image w/ Cache and 0.13.0
echo $(date) > /app/date.txt
ecr_tag="test-$(date +%s)"
/kaniko/executor \
  --context "/app" \
  --dockerfile "/app/Dockerfile" \
  --destination "${ECR_ENDPOINT}/${ECR_REPO}:${ecr_tag}" \
  --cache-repo "${ECR_ENDPOINT}/${ECR_REPO}/cache" \
  --cache=true \
  --cleanup

echo "docker run -it --rm ${ECR_ENDPOINT}/${ECR_REPO}:${ecr_tag} cat /tmp/test.txt > image4.txt"
  • Create 5th Docker Image w/ Cache and 0.13.0
echo $(date) > /app/date.txt
ecr_tag="test-$(date +%s)"
/kaniko/executor \
  --context "/app" \
  --dockerfile "/app/Dockerfile" \
  --destination "${ECR_ENDPOINT}/${ECR_REPO}:${ecr_tag}" \
  --cache-repo "${ECR_ENDPOINT}/${ECR_REPO}/cache" \
  --cache=true \
  --cleanup

echo "docker run -it --rm ${ECR_ENDPOINT}/${ECR_REPO}:${ecr_tag} cat /tmp/test.txt > image5.txt"
  • Run docker commands that were output and compare outputs
$ cat image{4..5}.txt
Tue Dec 10 15:32:45 UTC 2019
Tue Dec 10 15:35:38 UTC 2019
  • The timestamps are different, so the COPY --from layer is not being stale-cached and we get the latest version of date.txt. This is what we want.

cvgw (Contributor) commented Dec 10, 2019

Once #899 is fixed this issue should be resolved

cvgw (Contributor) commented Dec 17, 2019

This should be fixed as of master@a675098

Some assistance verifying that it is fixed would be appreciated. Thank you

mcfedr (Contributor) commented Dec 17, 2019

This should be fixed as of master@a675098

Some assistance verifying that it is fixed would be appreciated. Thank you

Is there a prebuilt image from master that I can test with? It looks like gcr.io/kaniko-project/executor:latest is pinned to the last release.

fredr commented Dec 17, 2019

@mcfedr I think you can use the git hash, like gcr.io/kaniko-project/executor:a675098b452d020bf678063a3ac2d3be84b0e545
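
For example (commit tag copied from this comment; this is a sketch assuming such commit-tagged images are published):

# Pull the commit-tagged executor and use it in place of :latest,
# e.g. as the builder image in the Cloud Build step.
docker pull gcr.io/kaniko-project/executor:a675098b452d020bf678063a3ac2d3be84b0e545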

edbighead commented Dec 17, 2019

This should be fixed as of master@a675098

Some assistance verifying that it is fixed would be appreciated. Thank you

@cvgw the fix is working for me on the latest version.

Dockerfile

FROM maven:3.6.2-jdk-11-slim AS build  
WORKDIR /usr/src/app  

COPY pom.xml .
RUN mvn dependency:go-offline

COPY src ./src  
RUN mvn clean package

FROM openjdk:11.0.4-jre-slim
WORKDIR /home/user

COPY --from=build /usr/src/app/target/app-0.0.1.jar /home/user/app.jar

RUN chmod +x /home/user/app.jar

ENTRYPOINT ["java", "-jar", "app.jar"]
EXPOSE 8080

Kaniko output when running multiple builds and changing the source code:
INFO[0390] No cached layer found for cmd COPY --from=build /usr/src/app/target/app-0.0.1.jar /home/user/app.jar

toxsick commented Dec 17, 2019

@cvgw works for me too. Thanks!

cvgw (Contributor) commented Dec 17, 2019

Thanks, everyone, for the help solving and verifying this issue. Closing, but feel free to reopen if it isn't 100% fixed.

cvgw closed this as completed Dec 17, 2019
stroncium commented:
Any info on when the release version lands (0.15.0?)? (I can test a commit-based version, but won't be able to use a non-release version in production.)

cvgw (Contributor) commented Dec 17, 2019

@stroncium the plan is to release v0.15.0 this week.

jamil-s commented Apr 19, 2021

Hey folks, we're seeing the same issue on the most recent debug build. In a multi-stage build, the first stage generates a new image, yet the second stage still resolves to the cached version. Not sure if there has been a regression here.

DXCanas commented May 8, 2021

Also seeing this in :latest.

KasnocknDave commented:
Also seeing this in 1.6; examples are described in #1348 as well.
