[CT-284] [Bug] Unable to run dbt inside a docker container due to missing dependencies #4784

Closed
1 task done
alexrosenfeld10 opened this issue Feb 24, 2022 · 30 comments · Fixed by #8069
Labels
bug Something isn't working docker Related to official Docker files/images for dbt help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors install

Comments

@alexrosenfeld10
Contributor

alexrosenfeld10 commented Feb 24, 2022

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Running dbt deps inside a Dockerfile has no effect on the final resulting image. This is pretty much a full blocker for me, and I feel like I'm going crazy that this isn't already an issue anyone has raised. I must be missing something obvious, right?

Expected Behavior

Running dbt deps should persist the generated dbt_packages in the final resulting image. Immutable deploys are super important so you know exactly what you're getting every time you ship code. This means the deps are bundled in the image at build time. That way if anything is down (like the servers where the dbt dependencies are hosted), my image still has everything it needs inside it to start up. Or, if I need to ship the exact same image again, I can.

Steps To Reproduce

Set up this Dockerfile:

FROM ghcr.io/dbt-labs/dbt-core:1.0.latest
RUN pip install dbt-snowflake
COPY / ./
RUN dbt deps
ENTRYPOINT ["dbt", "--log-format", "json", "--warn-error", "run"]

The resulting image will not contain the dbt_packages directory, even though the dbt deps command was run. For example:

❯ docker run -it --entrypoint=/bin/sh my-image:my-tag
# ls
dbt_project.yml  models	     seeds	tests
analyses      macros	       packages.yml  snapshots
#

There is no dbt_packages directory.

Relevant log output

17:09:24  + docker build -t <my-company-jfrog-url>:2022-02-24.43-83023aa-rel -f Dockerfile .
17:09:24  Sending build context to Docker daemon  600.1kB

17:09:24  Step 1/5 : FROM ghcr.io/dbt-labs/dbt-core:1.0.latest
17:09:24   ---> afbbb0909dea
17:09:24  Step 2/5 : RUN pip install dbt-snowflake
17:09:24   ---> Using cache
17:09:24   ---> 218706b2f74e
17:09:24  Step 3/5 : COPY / ./
17:09:25   ---> 614cce69e3d8
17:09:25  Step 4/5 : RUN dbt deps
17:09:25   ---> Running in 2b9fc7ee9938
17:09:27  22:09:27  Running with dbt=1.0.2
17:09:28  22:09:28  Installing dbt-labs/dbt_utils
17:09:29  22:09:28    Installed from version 0.8.1
17:09:29  22:09:28    Up to date!
17:09:29  22:09:28  Installing calogica/dbt_expectations
17:09:29  22:09:29    Installed from version 0.5.1
17:09:29  22:09:29    Updated version available: 0.5.2
17:09:29  22:09:29  Installing calogica/dbt_date
17:09:29  22:09:29    Installed from version 0.5.3
17:09:29  22:09:29    Up to date!
17:09:29  22:09:29  
17:09:29  22:09:29  Updates available for packages: ['calogica/dbt_expectations']                 
17:09:29  Update your versions in packages.yml, then run dbt deps
17:09:30  Removing intermediate container 2b9fc7ee9938
17:09:30   ---> 1420e054b93a
17:09:30  Step 5/5 : ENTRYPOINT ["dbt", "--log-format", "json", "--warn-error", "run"]
17:09:30   ---> Running in 9a6e49e3fe2a
17:09:31  Removing intermediate container 9a6e49e3fe2a
17:09:31   ---> aa6f5ca2bcba
17:09:31  Successfully built aa6f5ca2bcba
17:09:31  Successfully tagged <my-company-jfrog-url>:2022-02-24.43-83023aa-rel

Environment

- OS:
- Python:
- dbt:

What database are you using dbt with?

No response

Additional Context

other folks are having the same issue: https://getdbt.slack.com/archives/C2JRRQDTL/p1645743857126709

@alexrosenfeld10 alexrosenfeld10 added bug Something isn't working triage labels Feb 24, 2022
@github-actions github-actions bot changed the title [Bug] Unable to run dbt inside a docker container due to missing dependencies [CT-284] [Bug] Unable to run dbt inside a docker container due to missing dependencies Feb 24, 2022
@iknox-fa iknox-fa removed the triage label Feb 24, 2022
@alexrosenfeld10
Contributor Author

alexrosenfeld10 commented Feb 24, 2022

OK, I think this is because of this line: https://github.com/dbt-labs/dbt-core/blob/main/docker/Dockerfile#L53

That line is a VOLUME directive, and it makes the default directory effectively immutable at build time, which is far from obvious to the user. I spent a good few hours digging around for this!

Here's how I'm getting around it:

FROM ghcr.io/dbt-labs/dbt-snowflake:1.0.latest

RUN dbt --version

WORKDIR /usr/my-project
COPY . .
RUN dbt deps

# TODO obviously this kind of thing needs to be parameterized in future.
ENTRYPOINT ["dbt", "--log-format", "json", "--warn-error", "run"]

@iknox-fa
Contributor

Hmm. It seems to be working correctly for me:

> docker build -t deps_test .     
[+] Building 4.2s (9/9) FINISHED                                                                                                                        
 => [internal] load build definition from Dockerfile                                                                                               0.0s
 => => transferring dockerfile: 37B                                                                                                                0.0s
 => [internal] load .dockerignore                                                                                                                  0.0s
 => => transferring context: 2B                                                                                                                    0.0s
 => [internal] load metadata for ghcr.io/dbt-labs/dbt-core:1.0.latest                                                                              0.0s
 => [1/4] FROM ghcr.io/dbt-labs/dbt-core:1.0.latest                                                                                                0.0s
 => [internal] load build context                                                                                                                  0.1s
 => => transferring context: 1.29MB                                                                                                                0.1s
 => CACHED [2/4] RUN pip install dbt-snowflake                                                                                                     0.0s
 => [3/4] COPY / ./                                                                                                                                0.0s
 => [4/4] RUN dbt deps                                                                                                                             3.1s
 => exporting to image                                                                                                                             0.9s 
 => => exporting layers                                                                                                                            0.9s 
 => => writing image sha256:1085b060d72afe7f2b77b9f924c4f9269629b7fc686e01c0f45a51075fda9d84                                                       0.0s
 => => naming to docker.io/library/deps_test                                                                                                       0.0s

> docker run -it --entrypoint=/bin/sh deps_test
# ls
Dockerfile  data  dbt_packages	dbt_project.yml  logs  models  profiles.yml  target  tests  unused
# 

If possible, can you provide the packages block from your project file so I can take a closer look? It feels to me like dbt deps is failing somehow.

@alexrosenfeld10
Contributor Author

Thank you so much for taking a look, @iknox-fa. What's really strange is that it works on my local machine but not in my CI job, even though the CI job runs the same commands, as the log I posted shows.

Here's my packages file, it's rather simple:

packages:
  - package: dbt-labs/dbt_utils
    version: 0.8.1
  - package: calogica/dbt_expectations
    version: 0.5.2

@alexrosenfeld10
Contributor Author

also, here's my .dockerignore:

pycharm_settings.png
README.md
Jenkinsfile
Dockerfile
.gitignore
.pre-commit-config.yaml
CODEOWNERS
ansible
scripts

@iknox-fa
Contributor

iknox-fa commented Feb 24, 2022

Lol those are two of the same deps I used to test. Very strange. I've reached the end of my day so I'll take another look with fresh eyes in the AM. In the meantime I'm glad you have a workaround.

@alexrosenfeld10
Contributor Author

alexrosenfeld10 commented Feb 24, 2022

Yeah, I'm fried too; I've been working on this issue all day haha. I hope we can get a consistent reproduction. Using a different directory seems to work, and I got my first successful deployed run of dbt with it, so that's at least something! Thanks again for the help.

@alexrosenfeld10
Contributor Author

alexrosenfeld10 commented Feb 25, 2022

Btw, I tested this a tiny bit further - it definitely is because of the VOLUME statement. Any files created in a docker VOLUME as a result of a RUN command during the build process aren't allowed to persist:

FROM ghcr.io/dbt-labs/dbt-snowflake:1.0.latest

RUN dbt --version

RUN touch test.txt

WORKDIR /usr/my-project
COPY . .
RUN dbt deps

ENTRYPOINT ["dbt", "--log-format", "json", "--warn-error", "run"]

Then, playing around in the resulting container:

❯ docker run -it --entrypoint=/bin/sh <my-company-jfrog-url>:2022-02-25.50-8d25a43-rel
# pwd
/usr/my-project
# ls
dbt_packages     logs    models	     seeds	tests
analyses      dbt_project.yml  macros  packages.yml  snapshots
# ls ..
app  bin  disc-dbt  games  include  lib  libexec  local  sbin  share  src
# cd ../app
# pwd
/usr/app
# ls
dbt
# cd dbt
# pwd
/usr/app/dbt
# ls
# ls -a
.  ..

test.txt is nowhere to be found.
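
The same effect can be reproduced without dbt. What follows is only a minimal sketch: the alpine base image, the /data path, and the write_repro_dockerfile helper are assumptions made for illustration, and the behaviour described applies to the legacy (non-BuildKit) builder.

```shell
#!/bin/sh
# Hypothetical helper: writes a tiny Dockerfile that declares a volume and
# then tries to create a file inside that volume during the build.
write_repro_dockerfile() {
    dir="$1"
    mkdir -p "$dir"
    cat > "$dir/Dockerfile" <<'EOF'
FROM alpine
VOLUME /data
WORKDIR /data
RUN touch test.txt
EOF
}

# Usage (requires Docker; not executed by this snippet):
#   write_repro_dockerfile /tmp/volume-repro
#   DOCKER_BUILDKIT=0 docker build -t volume-repro /tmp/volume-repro
#   docker run --rm volume-repro ls -a /data
```

With the legacy builder the listing is expected to contain no test.txt. Docker's Dockerfile reference notes that BuildKit keeps such changes instead, so the result may differ between machines.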

@alexrosenfeld10
Contributor Author

signing off for reals now 👋 catch'ya on this issue tomorrow or something

@leahwicz
Contributor

@alexrosenfeld10 thanks for reporting! Unfortunately, we won't have time to dig into this further in the near future. I'm going to add the help_wanted label to the issue in case anyone wants to take a crack at it before we free up. The Docker image creation code is in the repo if people want to test out solutions or try something new.

@leahwicz leahwicz added the help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors label May 10, 2022
@alexrosenfeld10
Contributor Author

@leahwicz ok, fair enough. I think it might be as simple as removing the VOLUME /usr/app line, but I don't have any tribal knowledge on why it's there in the first place. 🤷
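
For anyone who wants to confirm whether an image declares that volume, its config can be read with docker image inspect. This is a sketch only; image_volumes and image_declares_volume are hypothetical helper names, not part of Docker or dbt.

```shell
#!/bin/sh
# Print the volumes an image declares via VOLUME, as JSON.
image_volumes() {
    docker image inspect --format '{{json .Config.Volumes}}' "$1"
}

# Succeed if the image declares the given path as a volume.
image_declares_volume() {
    image_volumes "$1" | grep -qF "\"$2\""
}

# Usage (requires Docker and a locally available image):
#   image_volumes ghcr.io/dbt-labs/dbt-snowflake:1.1.latest
#   image_declares_volume ghcr.io/dbt-labs/dbt-snowflake:1.1.latest /usr/app \
#       && echo "image declares /usr/app as a volume"
```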

@alexrosenfeld10
Contributor Author

at the very least, hopefully this issue serves as a searchable destination for users debugging the same weird behavior

@eugene-nikolaev

> at the very least, hopefully this issue serves as a searchable destination for users debugging the same weird behavior

it does)

So is there any workaround?

@eugene-nikolaev

Btw, on macOS the build can change the volume state, which makes this even more confusing.

@jtcohen6 sorry for bugging you, but what is the intended way to set up your own dbt deps using a dbt Docker image?

@alexrosenfeld10
Contributor Author

@eugene-nikolaev yes, the workaround is what I said here: #4784 (comment)

@alexrosenfeld10
Contributor Author

(whoops didn't mean to close)

@eugene-nikolaev

@alexrosenfeld10 you mean copy-paste the Dockerfile from dbt-core and throw away that line? Well, that will work, thanks

@alexrosenfeld10
Contributor Author

alexrosenfeld10 commented Jun 29, 2022

No, I don't. I mean do this:

FROM ghcr.io/dbt-labs/dbt-snowflake:1.1.latest

# We have to move out of the default dir because of https://github.com/dbt-labs/dbt-core/issues/4784
WORKDIR /wherever/you/want
COPY . .
RUN dbt deps

# warnings as errors - https://docs.getdbt.com/reference/global-configs#warnings-as-errors
# further arguments are passed in the kubernetes config as "args"
ENTRYPOINT ["dbt","--log-format", "json", "--warn-error", "run"]

@alexrosenfeld10
Contributor Author

Ugh, the default action on command + enter is "comment and close". Obviously there are workarounds, but the core issue isn't fixed, so I'm gonna reopen.

@eugene-nikolaev

@alexrosenfeld10, thanks a lot, works fine!

@yummydum

yummydum commented Aug 17, 2022

Spent a whole day on the same problem, and finally I found this issue.
On my local Mac I can build an image that includes dbt_packages/, but the same build on a GitHub Actions runner (ubuntu-latest) produces an image without dbt_packages/.
At a minimum, this should be mentioned in the docs.

@sklirg

sklirg commented May 31, 2023

I also spent a lot of time on this problem before finding the root cause, and after that, this issue.

The behaviour differs between, e.g., a local machine and CI because of the builder used. I can reproduce the same behaviour locally with the following Dockerfile:

FROM ghcr.io/dbt-labs/dbt-snowflake:1.0.latest

COPY . .

RUN dbt deps

with the following packages.yml:

packages:
  - package: dbt-labs/dbt_utils
    version: 0.8.1
  - package: calogica/dbt_expectations
    version: 0.5.2

And a valid dbt_project.yml (because this is needed for dbt deps to run).

Using podman build (which builds using buildah) reproduces this issue:

$ podman build --no-cache .
...
$ podman run -it  --entrypoint /bin/sh 5f5231ec1807
# pwd
/usr/app/dbt
# ls -al
total 12
drwxr-xr-x. 2 root root   67 May 31 12:47 .
drwxr-xr-x. 3 root root   17 Apr 25  2022 ..
-rw-r--r--. 1 root root   71 May 31 12:45 Dockerfile
-rw-r--r--. 1 root root 1486 May 31 12:46 dbt_project.yml
-rw-r--r--. 1 root root  119 May 31 12:45 packages.yml

Then, building an image using Docker (buildkit enabled) instead yields the following result:

$ docker build .
...
$ docker run -it --entrypoint /bin/bash cf4946f3e532893de2af62f39d47c22b0c0aef16a7c3a46bd117dd2995881b20
root@9fac9fb1fbd5:/usr/app/dbt# ls -al
total 28
drwxr-xr-x 4 root root 4096 May 31 12:50 .
drwxr-xr-x 3 root root 4096 May 31 12:50 ..
-rw-r--r-- 1 root root   71 May 31 12:45 Dockerfile
drwxr-xr-x 5 root root 4096 May 31 12:50 dbt_packages
-rw-r--r-- 1 root root 1486 May 31 12:46 dbt_project.yml
drwxr-xr-x 2 root root 4096 May 31 12:50 logs
-rw-r--r-- 1 root root  119 May 31 12:45 packages.yml

root@9fac9fb1fbd5:/usr/app/dbt# ls -al dbt_packages/
total 20
drwxr-xr-x 5 root root 4096 May 31 12:50 .
drwxr-xr-x 4 root root 4096 May 31 12:50 ..
drwxrwxr-x 5 root root 4096 May 31 12:50 dbt_date
drwxrwxr-x 7 root root 4096 May 31 12:50 dbt_expectations
drwxrwxr-x 7 root root 4096 May 31 12:50 dbt_utils

And building with docker but without buildkit enabled yields the initial result/"error":

$ DOCKER_BUILDKIT=0 docker build .
...
$ docker run -it --entrypoint /bin/bash 0af8c7d3c809
root@43ed63c3131a:/usr/app/dbt# ls -al
total 20
drwxr-xr-x 2 root root 4096 May 31 12:56 .
drwxr-xr-x 3 root root 4096 May 31 12:56 ..
-rw-r--r-- 1 root root   71 May 31 12:45 Dockerfile
-rw-r--r-- 1 root root 1486 May 31 12:46 dbt_project.yml
-rw-r--r-- 1 root root  119 May 31 12:45 packages.yml

So, the behaviour is inconsistent across docker engines, and only docker buildkit produces the "expected" result.
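
Because the result depends on the builder, a CI job can check the built image itself rather than trust the build log. A sketch, assuming the image's working directory is the dbt project directory and that the image contains /bin/sh; assert_packages_baked_in is a hypothetical helper name.

```shell
#!/bin/sh
# Fail if the image has no packages directory in its working directory.
# The second argument is optional and defaults to dbt_packages.
assert_packages_baked_in() {
    image="$1"
    pkg_dir="${2:-dbt_packages}"
    # Override the entrypoint so the check runs in a shell instead of dbt.
    if docker run --rm --entrypoint /bin/sh "$image" -c 'test -d "$1"' sh "$pkg_dir"; then
        echo "ok: $pkg_dir present in $image"
    else
        echo "error: $pkg_dir missing from $image" >&2
        return 1
    fi
}

# Usage (requires Docker):
#   docker build -t my-image:my-tag . && assert_packages_baked_in my-image:my-tag
```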

Personally, I'm in favour of removing the VOLUME directive from the core Dockerfile. A VOLUME directive indicates that the path is to be considered externally hosted and will always be mounted in from a source external to the Docker runtime. Once a base image has set a VOLUME directive, a derived image cannot unset it.

The way this Dockerfile is currently set up, it is hard to use it as a source to build upon -- because the WORKDIR is in a VOLUME, meaning that any changes (e.g. RUN commands) to the same directory are not persisted. As mentioned previously in this thread, using another directory works fine (e.g. WORKDIR /usr/dbt).

Furthermore, on the VOLUME train, there should be few affected by removing this directive. Specifying the --volume flag to a container runtime (e.g. docker run --volume /opt/dbt:/usr/app) is always allowed and will create a volume binding even though no VOLUME directive is set in the Dockerfile. So, if the VOLUME directive had not already been set in the Dockerfile, mounting dbt sources from an external source would have worked just fine.
However, there might be some "magic" which auto-creates a volume if it is specified in the container image manifest, but it is undefined^ if this is persisted across container restarts (^undefined as in the behaviour varies across container runtimes). Specifying the volume path in a docker-compose.yml-file (but without a binding to a backing file system path) will create a local volume which will be re-used across container restarts, but not written to a "visible" file system path, IIRC.
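
The --volume usage described above can be sketched as a small wrapper. The host path, the /usr/app mount target taken from the example above, and the run_dbt_with_mount name are illustrative assumptions; the sketch also assumes the image's entrypoint is dbt, so the trailing arguments are passed to it.

```shell
#!/bin/sh
# Run a container with a host directory bind-mounted at /usr/app.
# --volume works whether or not the image declares a VOLUME directive.
run_dbt_with_mount() {
    host_dir="$1"
    image="$2"
    shift 2
    docker run --rm --volume "$host_dir:/usr/app" "$image" "$@"
}

# Usage (requires Docker; use an absolute host path for a bind mount):
#   run_dbt_with_mount /opt/dbt ghcr.io/dbt-labs/dbt-snowflake:1.1.latest run
```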

To try to summarise: the current Dockerfile is hard to use without digging relatively deep into how Docker works (or finding this issue). The main pain point is the VOLUME directive. Removing it should be relatively straightforward for existing users as far as I can tell. It would also resolve this issue and make the image a lot easier to use "out of the box", without needing documentation that tells you to use another WORKDIR for things to work (and if that's the solution... why is there a VOLUME specified that is not in use?).

@indy-jonesy

indy-jonesy commented Jun 16, 2023

There is a different approach to solving this volume issue affecting dbt_packages and dbt deps, that I just came across by tinkering around.

I have built my own custom Dockerfile, but faced similar issues when trying to mount a VOLUME on the same directory of the active dbt project. Safe to say, RUN dbt deps was not persisting into the running container.

I was able to come up with a different approach to this problem that solved my use case.

You can simply update where dbt deps installs its packages for your dbt project, using the packages-install-path.

In the case of the current Dockerfile, which has a VOLUME at /usr/app, simply pointing packages-install-path in dbt_project.yml to a directory outside /usr/app should be enough. I used my service account's home directory, so /home/service-account-user/dbt-packages.

✅ TLDR SOLUTION ✅

  1. We know the VOLUME is pointing to the /usr/app directory
  2. Update the dbt_project.yml to have a packages-install-path: /home/service-account-user/dbt-packages
     • (Or really any path not associated with the VOLUME)
  3. Now, when you use RUN dbt deps it will package the installs in the appropriate directory and persist across the Docker layers.

The benefit here is that this is just a modification to your dbt_project.yml, not the actual Dockerfile.
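
Whether a candidate packages-install-path actually lies outside the volume can be checked with a plain prefix test. A sketch in POSIX sh; path_is_under is a hypothetical helper, and it compares absolute paths as strings without resolving symlinks or relative segments.

```shell
#!/bin/sh
# Succeed if the first path equals the second path or lies beneath it.
# A trailing slash is appended to both so /usr/application is not
# mistaken for a child of /usr/app.
path_is_under() {
    case "${1%/}/" in
        "${2%/}/"*) return 0 ;;
    esac
    return 1
}

# Usage:
#   path_is_under /usr/app/dbt/dbt_packages /usr/app && echo "inside the volume"
#   path_is_under /home/service-account-user/dbt-packages /usr/app || echo "outside the volume"
```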

Frankly, dbt labs should just update their image to place their packages outside of that directory to avoid this problem for everyone.

@yehoshuadimarsky

Running into the same issue, thanks everyone for your help. To add to the previous solution, here's what I'm doing: I use an env var (see the Jinja section of the dbt docs) to set packages-install-path, falling back to the default dbt_packages when it is unset. I only set this env var in the Docker image, so I can control it fully.

In dbt_project.yml:

clean-targets:
  - "target"
  - "{{ env_var('MYCOMPANY_DBT_PACKAGE_INSTALL_PATH', 'dbt_packages') }}"
packages-install-path: "{{ env_var('MYCOMPANY_DBT_PACKAGE_INSTALL_PATH', 'dbt_packages') }}"

My Dockerfile:

FROM ghcr.io/dbt-labs/dbt-bigquery:1.5.0

ENV MYCOMPANY_DBT_PACKAGE_INSTALL_PATH=/home/dbt_packages

RUN mkdir -m 777 -p ${MYCOMPANY_DBT_PACKAGE_INSTALL_PATH}

COPY ./my_package /usr/app
COPY ./profiles /root/.dbt/

RUN dbt deps

@alexrosenfeld10
Contributor Author

@jtcohen6 maybe it's time to revisit this issue? Seems like a lot of folks have this problem, and I don't think it'd be a major lift for dbt Labs to fix

@dbeatty10
Contributor

@alexrosenfeld10 would you be interested in raising a pull request, by any chance? e.g., a PR that removes this line (assuming that is the solution you are suggesting):

VOLUME /usr/app

The other key thing we'd need to do is write up an explanation of what users can do to restore the previous behavior (if needed).

@sklirg may have already provided that here:

Furthermore, on the VOLUME train, there should be few affected by removing this directive. Specifying the --volume flag to a container runtime (e.g. docker run --volume /opt/dbt:/usr/app) is always allowed and will create a volume binding even though no VOLUME directive is set in the Dockerfile.

@alexrosenfeld10
Contributor Author

Sure, I can, but I have little context on the impact, test cases, or process needed.

@dbeatty10
Contributor

Here's a rough outline of the most important pieces when opening the PR:

  1. Make your change in a local branch
  2. Install changie, run changie new, and commit the file it creates
  3. Sign the CLA
  4. Open a PR with your change

You've already done most of those in #8069 -- it's looking great 😎

The main part that is missing is the changelog part -- I'll follow-up within the PR itself for next steps there.

Then a member of the dbt Labs engineering team will review the PR once it's open. They'll help figure out what kind of testing is needed and give any other feedback that is needed prior to merge.

@alexrosenfeld10
Contributor Author

alexrosenfeld10 commented Jul 11, 2023

yep, have done that before for actual code changes in here, just don't have the time on hand right now to get my local set up again (new machine). If you or someone else wants to push it over the line, that'd be fine, otherwise I'll get to it.. sometime

@dbeatty10
Contributor

No prob @alexrosenfeld10 -- I just added the changelog entry to the PR ✅

@alexrosenfeld10
Contributor Author

Thanks @dbeatty10


10 participants