tensorflow-osx is failing in CI #21758

Closed
BillyONeal opened this issue Nov 29, 2021 · 15 comments · Fixed by #21912
Labels
category:port-bug The issue is with a library, which is something the port should already support

Comments

@BillyONeal
Member

This issue tracks the investigation into why tensorflow and tensorflow-cc are broken on our macOS CI machines. Consider a recent CI run: https://dev.azure.com/vcpkg/public/_build/results?buildId=63779

I was able to reproduce this locally, tensorflow fails with:

bion@Billys-MBP vcpkg % ./vcpkg install tensorflow tensorflow-cc
Computing installation plan...
The following packages will be built and installed:
    tensorflow[core]:x64-osx -> 2.6.0
    tensorflow-cc[core]:x64-osx -> 2.6.0
  * tensorflow-common[core]:x64-osx -> 2.6.0
Additional packages (*) will be modified to complete this operation.
Detecting compiler hash for triplet x64-osx...
Restored 0 packages from /Users/bion/.cache/vcpkg/archives in 1.243 ms. Use --debug to see more details.
Starting package 1/3: tensorflow-common:x64-osx
Building package tensorflow-common[core]:x64-osx...
-- Installing: /Users/bion/vcpkg/packages/tensorflow-common_x64-osx/share/tensorflow-common/copyright
-- Performing post-build validation
-- Performing post-build validation done
Stored binary cache: /Users/bion/.cache/vcpkg/archives/bf/bff52bffca58f369d3f2db4f408db568032660656907661c9b668adf94fc4109.zip
Installing package tensorflow-common[core]:x64-osx...
Elapsed time for package tensorflow-common:x64-osx: 306.5 ms
Starting package 2/3: tensorflow:x64-osx
Building package tensorflow[core]:x64-osx...
-- Downloading https://github.com/bazelbuild/bazel/releases/download/4.1.0/bazel-4.1.0-darwin-x86_64 -> bazel-4.1.0-darwin-x86_64...
-- Installing: /Users/bion/vcpkg/downloads/tools/bazel/4.1.0-darwin/bazel
CMake Error at scripts/cmake/vcpkg_execute_required_process.cmake:127 (message):
    Command failed: /usr/bin/python3 -m pip install --user -U --force-reinstall numpy
    Working Directory: /Users/bion/vcpkg/buildtrees/tensorflow
    Error code: 1
    See logs for more information:
      /Users/bion/vcpkg/buildtrees/tensorflow/prerequesits-pip-x64-osx-out.log
      /Users/bion/vcpkg/buildtrees/tensorflow/prerequesits-pip-x64-osx-err.log

Call Stack (most recent call first):
  installed/x64-osx/share/tensorflow-common/tensorflow-common.cmake:54 (vcpkg_execute_required_process)
  ports/tensorflow/portfile.cmake:7 (include)
  scripts/ports.cmake:142 (include)


Error: Building package tensorflow:x64-osx failed with: BUILD_FAILED

When I run

/usr/bin/python3 -m pip install --user -U --force-reinstall numpy

directly, outside of vcpkg, I do get build failures while pip attempts to compile numpy. However, I also got a warning message:

ERROR: Could not build wheels for numpy which use PEP 517 and cannot be installed directly
WARNING: You are using pip version 20.2.3; however, version 21.3.1 is available.
You should consider upgrading via the '/Library/Developer/CommandLineTools/usr/bin/python3 -m pip install --upgrade pip' command.

After /Library/Developer/CommandLineTools/usr/bin/python3 -m pip install --upgrade pip, tensorflow appears to install OK.

I'm not sure how to fix this, since upgrading pip seems to be a "system modifying" action we don't want vcpkg to do. I could just manually upgrade pip on all the osx test hardware, but that seems to be just masking the problem.

We could add an extra help message, but ideally there would be some way to update pip in a private "user" way like we try to install numpy in the first place...

/cc @jgehw

@BillyONeal BillyONeal added the category:port-bug The issue is with a library, which is something the port should already support label Nov 29, 2021
@cenit
Contributor

cenit commented Nov 29, 2021

What about also adding the --user option? The upgrade should then happen in the user folder and shadow the old system-wide pip...

@BillyONeal
Member Author

What about also adding the --user option? The upgrade should then happen in the user folder and shadow the old system-wide pip...

Maybe that's the right thing. (Words cannot describe how little I understand about pip :) )

@JackBoosY
Contributor

Should we temporarily skip tensorflow on the failing triplets?
It only affects the ports tensorflow, tensorflow-cc, tensorflow-common and ffmpeg[tensorflow].

@ras0219-msft
Contributor

Simply --user would not be enough; my understanding is that it would still modify the user's "global" version. The user would observe that their pip is now a different version after running vcpkg install tensorflow, which is just as unacceptable as apt install xyz.

If there is an option to bootstrap pip into a 100% private location for vcpkg, that could be interesting. Based on https://docs.python.org/3/library/ensurepip.html, it seems like we might be able to do

python -m ensurepip --upgrade --root /some/private/vcpkg/location
export PATH=/some/private/vcpkg/location:$PATH

@Hoikas
Contributor

Hoikas commented Dec 1, 2021

FWIW, the python3 port installs a functioning interpreter that has pip available.

@BillyONeal
Member Author

FWIW, the python3 port installs a functioning interpreter that has pip available.

We have pip available, just not a version new enough for the numpy wheel tensorflow wants.

Forcing the user to build python to use tensorflow is probably overkill and probably isn't what they want, since that would give them a tensorflow that loads into their custom version of python, not necessarily the release copies.

@cenit
Contributor

cenit commented Dec 1, 2021

I’d have thought that a pip dependency with "host": true was expected along our journey...

@cenit
Contributor

cenit commented Dec 1, 2021

(which means yes, building python to build tensorflow...)

@BillyONeal
Member Author

If there is an option to bootstrap pip into a 100% private location for vcpkg, that could be interesting. Based on https://docs.python.org/3/library/ensurepip.html

Unfortunately that says it uses the version of pip bootstrapped with that particular version of Python, which is apparently too old here: it would be the same version that was bootstrapped automatically when we got Python in the first place.
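
To see which pip a given interpreter would bootstrap, the standard library exposes `ensurepip.version()`. The helper below is only a sketch for checking that claim on a particular machine; the function name `bundled_pip_version` is made up for illustration and is not part of vcpkg.

```shell
# Print the pip version bundled with an interpreter's ensurepip module.
# Usage: bundled_pip_version [PYTHON]   (PYTHON defaults to python3)
# Hypothetical helper name; fails if the interpreter has no ensurepip.
bundled_pip_version() {
    "${1:-python3}" -c 'import ensurepip; print(ensurepip.version())'
}
```

For example, `bundled_pip_version /usr/bin/python3` prints the version that `python -m ensurepip` would install from that interpreter, which can then be compared with the version pip reports as too old.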

@BillyONeal
Member Author

At this point I think a reasonable thing to do would be to print a warning that tensorflow's build may fail unless the user runs that command first, just like we do for ports that have system dependencies installed through apt-get and friends...
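
A pre-flight check of that kind could look roughly like the sketch below. It is written as plain shell rather than as portfile CMake, the function names are hypothetical, and the threshold is an assumption: this thread only establishes that pip 20.2.3 failed and pip 21.3.1 worked, not where the real cutoff is. It also assumes a `sort` that supports `-V` (version sort).

```shell
# Assumed known-good version: 21.3.1 worked in this thread, 20.2.3 did not.
# The true minimum is unknown.
MIN_PIP_VERSION=21.3.1

# Succeed when dotted version $1 is greater than or equal to $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print a warning to stderr and return 1 when the given pip version
# is older than MIN_PIP_VERSION; return 0 otherwise.
warn_if_pip_too_old() {
    found=$1
    if ! version_ge "$found" "$MIN_PIP_VERSION"; then
        echo "warning: pip $found may be too old to install numpy;" \
             "consider running: python3 -m pip install --upgrade pip" >&2
        return 1
    fi
}
```

It could be fed from the interpreter in use with `warn_if_pip_too_old "$(python3 -m pip --version | awk '{print $2}')"`, since the second field of `pip --version` is the version number.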

@ras0219-msft
Contributor

ras0219-msft commented Dec 3, 2021

Unfortunately that says it uses the version of pip bootstrapped with that particular version of Python ...

I see. I think there should still be a potential path like:

$ python3 -m venv --symlinks --upgrade-deps /path/to/buildtrees/tensorflow/$triplet-venv
$ export VIRTUAL_ENV=/path/to/buildtrees/tensorflow/$triplet-venv
$ export PATH=/path/to/buildtrees/tensorflow/$triplet-venv/bin:$PATH
# python3 and pip are now from the virtual environment, with `pip` and `setuptools` upgraded to the latest on pypi

https://docs.python.org/3/library/venv.html

It's then also safe to install whatever tensorflow wants into that virtual environment, since it won't touch anything else on the user's machine (numpy, etc).
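
The same idea, wrapped into reusable functions, might look like the sketch below. The function names are hypothetical and this is not necessarily how the port ended up doing it. Note that `--upgrade-deps` only exists in Python 3.9 and later, so the extra options are passed through by the caller instead of being hard-coded.

```shell
# Create a private virtual environment and put it first on PATH.
# Usage: create_build_venv DIR [extra "python3 -m venv" options...]
create_build_venv() {
    venv_dir=$1
    shift
    python3 -m venv "$@" "$venv_dir" || return 1
    VIRTUAL_ENV=$venv_dir
    PATH=$venv_dir/bin:$PATH
    export VIRTUAL_ENV PATH
}

# Succeed only when python3 now resolves to an interpreter inside a venv
# (inside a venv, sys.prefix differs from sys.base_prefix).
python_is_isolated() {
    python3 -c 'import sys; sys.exit(0 if sys.prefix != sys.base_prefix else 1)'
}
```

A caller would run something like `create_build_venv /path/to/buildtrees/tensorflow/venv --upgrade-deps && python_is_isolated` before any `python3 -m pip install`. On a Python older than 3.9, dropping `--upgrade-deps` and running `python3 -m pip install --upgrade pip setuptools` inside the environment should have a similar effect.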

@jgehw
Contributor

jgehw commented Dec 8, 2021

Thanks for the python venv ideas. I incorporated @ras0219-msft's suggestions in PR #21912.

@jgehw
Contributor

jgehw commented Dec 8, 2021

BTW: From https://github.com/microsoft/vcpkg#telemetry I read that vcpkg collects some statistics. How can I see how many (approx.) installs there are from "my" port, and if this is little or much compared to an average port? Can I also see how "my" port installs are distributed over triplets?

@jgehw
Contributor

jgehw commented Dec 8, 2021

Now Linux CI is failing, complaining that the venv package is missing:

The virtual environment was not created successfully because ensurepip is not
available.  On Debian/Ubuntu systems, you need to install the python3-venv
package using the following command.

    apt install python3.8-venv

I think this is the error message @BillyONeal was referring to. Can one of the maintainers please install this package on CI (or instruct me how to do this, should I surprisingly have sufficient permission to do so)?
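
Since the message above says the failure comes from ensurepip being unavailable, a check like the following could detect the situation before attempting to create the environment. This is only a sketch: the function name is hypothetical, and it assumes that importing `venv` and `ensurepip` is a good enough proxy for `python3 -m venv` being able to bootstrap pip.

```shell
# Succeed when the interpreter can import both venv and ensurepip.
# Usage: venv_prerequisites_ok [PYTHON]   (PYTHON defaults to python3)
# Hypothetical helper; all output from the probe is discarded.
venv_prerequisites_ok() {
    "${1:-python3}" -c 'import venv, ensurepip' >/dev/null 2>&1
}
```

A caller could then print a hint instead of failing halfway through, for example `venv_prerequisites_ok || echo "python3 cannot bootstrap a venv; on Debian/Ubuntu install the python3-venv package" >&2`.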

@BillyONeal
Member Author

BTW: From https://github.com/microsoft/vcpkg#telemetry I read that vcpkg collects some statistics. How can I see how many (approx.) installs there are from "my" port, and if this is little or much compared to an average port? Can I also see how "my" port installs are distributed over triplets?

At risk of getting off topic on this thread, I'm not sure if that's information we can recover from the statistics we collect. For example, user-controlled identifiers must be crypto hashed so that we don't inadvertently collect PII if someone creates their own port "my-social-security-number-is-555-55-5555", and we also can't keep more than a couple weeks of data. We can ask questions like "how many people installed this port at this version in the last week" but not "what are all the things people installed". Here's where that's done; it looks like we do record a SHA per port:

https://github.com/microsoft/vcpkg-tool/blob/583d99603db279c1004799629f703913851a21cd/src/vcpkg/install.cpp#L1258-L1291

(We want no appearance, as well as no reality, of collecting information anyone may consider personal or problematic given that our bootstrap process must be noninteractive, giving us no opportunity to prompt on the question to make it truly opt-in)

6 participants