Skip to content
This repository has been archived by the owner on Sep 9, 2020. It is now read-only.

dep init freeze for hours #947

Closed
cpapidas opened this issue Aug 4, 2017 · 50 comments · Fixed by #1206
Closed

dep init freeze for hours #947

cpapidas opened this issue Aug 4, 2017 · 50 comments · Fixed by #1206

Comments

@cpapidas
Copy link

cpapidas commented Aug 4, 2017

What version of Go (go version) and dep (git describe --tags) are you using?

go version go1.8 linux/amd64
dep latest version

What dep command did you run?

$ dep init -v
No Output

What did you expect to see?

The new files
*Gopkg.toml
*Gopkg.lock
*vendor(folder)

What did you see instead?

Nothing the command stuck there for hours and I have to press ctr+c to exit and have back the terminal.

@sdboyer
Copy link
Member

sdboyer commented Aug 4, 2017

hi! thanks for the issue. this is quite odd.

did you run this on an existing project, or something new? if the former, what if any tool were you using previously? is this reproducible?

no places in dep jump to mind where it could hang prior to generating any output. unless...@matjam, maybe a lockfile thing?

strace output would also probably help enormously to clear this up.

@matjam
Copy link
Contributor

matjam commented Aug 5, 2017

might be.

maybe I should add some stderr logging if its waiting for >60 seconds .. at least until you figure out the notification framework thing.

@jonahfang
Copy link

jonahfang commented Aug 5, 2017

I have the same problem! I use go in docker container and it is a very simple program.

@sdboyer
Copy link
Member

sdboyer commented Aug 5, 2017

@matjam yeah, i'll relent and break my rule a bit more - we can add some print statements around the lock. probably fire one right away if it decides we have to wait, then...maybe one every 10s or so thereafter?

@matjam
Copy link
Contributor

matjam commented Aug 5, 2017

@sdboyer okay, I'll give you a PR with something that does that. I'll use 15s if only to be contrary.

We are just assuming that it's the locking thing, but, it's the most likely culprit.

I did do some research; the locking library is mostly correct in it's behaviour; there is a potential race there, but I've not been able to quantify what the real risk is of hitting it.

What is the addage? Race conditions are so rare that they always happen? Something like that.

matjam added a commit to matjam/dep that referenced this issue Aug 5, 2017
waiting for something, we at least print a message to stderr about it.

Hopefully will make situations like what is described in golang#947 obvious.
@sdboyer
Copy link
Member

sdboyer commented Aug 7, 2017

ok, we have the warning message printing now when the lockfile is busy. @jonahfang, @cpapidas - could you update and see if you're still experiencing the indefinite hang?

@jonahfang
Copy link

@sdboyer It keep saying and still hang:

waiting for lockfile /go/pkg/dep/sm.lock: Lockfile created, but doesn't exist

@matjam
Copy link
Contributor

matjam commented Aug 7, 2017

@jonahfang OK, can we get some more details about the system?

  • What kernel/distribution?
  • Can you show us the mount options (output of mount)?
  • Is this being run inside a container?
  • is there an existing sm.lock file?

@sdboyer Its this code in the lockfile package:

	// return value intentionally ignored, as ignoring it is part of the algorithm
	_ = os.Link(tmplock.Name(), name)

	fiTmp, err := os.Lstat(tmplock.Name())
	if err != nil {
		return err
	}
	fiLock, err := os.Lstat(name)
	if err != nil {
		// tell user that a retry would be a good idea
		if os.IsNotExist(err) {
			return ErrNotExist
		}
		return err
	}

So it's failing to link.

@nightlyone any ideas?

@matjam
Copy link
Contributor

matjam commented Aug 7, 2017

@jonahfang also, can you try the following

$ go get github.com/nightlyone/lockfile
$ cd $GOPATH/src/github.com/nightlyone/lockfile
$ go test

you should see output like this:

$ go test
PASS
ok  	github.com/nightlyone/lockfile	0.011s

@jonahfang
Copy link

@matjam , the test result of nightlyone/lockfile as follows:

/go/src/github.com/nightlyone/lockfile $ go test
Error locking lockfile:  link /go/src/github.com/nightlyone/lockfile/test_lockfile.pid.397504553 /go/src/github.com/nightlyone/lockfile/test_lockfile.pid: operation not permitted
--- FAIL: TestBasicLockUnlock (0.00s)
--- FAIL: TestRogueDeletion (0.00s)
        lockfile_test.go:129: link /go/src/github.com/nightlyone/lockfile/test_lockfile.pid.695546947 /go/src/github.com/nightlyone/lockfile/test_lockfile.pid: operation not permitted
--- FAIL: TestRogueDeletionDeadPid (0.00s)
        lockfile_test.go:158: link /go/src/github.com/nightlyone/lockfile/test_lockfile.pid.451082950 /go/src/github.com/nightlyone/lockfile/test_lockfile.pid: operation not permitted
--- FAIL: TestRemovesStaleLockOnDeadOwner (0.01s)
        lockfile_test.go:202: link /go/src/github.com/nightlyone/lockfile/test_lockfile.pid.451935720 /go/src/github.com/nightlyone/lockfile/test_lockfile.pid: operation not permitted
--- FAIL: TestInvalidPidLeadToReplacedLockfileAndSuccess (0.01s)
        lockfile_test.go:231: unexpected error: link /go/src/github.com/nightlyone/lockfile/test_lockfile.pid.088922970 /go/src/github.com/nightlyone/lockfile/test_lockfile.pid: operation not permitted
FAIL
exit status 1
FAIL    github.com/nightlyone/lockfile  0.043s

@matjam
Copy link
Contributor

matjam commented Aug 7, 2017

@jonahfang yeah something is wrong with that filesystem.

still need to see output of "mount"

@jonahfang
Copy link

I am being run inside a container, there is not an existing sm.lock file. I use image registry.cn-hangzhou.aliyuncs.com/fangzx/govim_1.0.

@jonahfang
Copy link

I run docker image in MacBookPro using Docker Toolbox:

docker run \
  --rm \
  -it \
  --name godemo \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v $PWD:/go \
  -v $PWD/.vim:/home/dev \
  registry.cn-hangzhou.aliyuncs.com/fangzx/govim_1.0

@matjam
Copy link
Contributor

matjam commented Aug 7, 2017

My guess is your container doesn't allow writes to the filesystem for whatever reason?

@jonahfang
Copy link

jonahfang commented Aug 7, 2017

But I use glide and godep well.

@matjam
Copy link
Contributor

matjam commented Aug 7, 2017

But when it's not in a container, it works fine. So...

@jonahfang
Copy link

I use glide and godep in the same docker container well.

@matjam
Copy link
Contributor

matjam commented Aug 7, 2017

That really means nothing, just because other things work, doesn't mean the container isn't broken, or the way you're using it isn't broken, or that there isn't something wrong with your kernel, or ...

That "operation not permitted" message is coming from the kernel. It's a response to a system call for doing file operations on /go/src/github.com/nightlyone/lockfile/test_lockfile.pid.397504553

if the kernel is saying "no" that's a problem with either the container you're using, or docker, or something else on your system.

@jonahfang
Copy link

jonahfang commented Aug 7, 2017

My docker container use ubuntu:14.04,and filesystem mounted througth VirtualBox volume.

@matjam
Copy link
Contributor

matjam commented Aug 7, 2017

It's because it's trying to do a hard link inside the container. That container, or your system, doesn't allow that, so it's failing.

@matjam
Copy link
Contributor

matjam commented Aug 7, 2017

@sdboyer I think we need to change the way we do locking, this is another issue that's being caused by the hard link in the lockfile package; there was the ReFS issue #913 and now this.

@jonahfang are you running Windows 10 with ReFS for the underlying filesystem by any chance?

@jonahfang
Copy link

No, I use mac only.

@matjam
Copy link
Contributor

matjam commented Aug 7, 2017

Well, the container, or something, is not allow hard links in the filesystem, so that's why it's failing.

@matjam
Copy link
Contributor

matjam commented Aug 7, 2017

@jonahfang Can you please grab this library and test it in your container. You should get the following output:

$ go get gopkg.in/check.v1
$ go get github.com/theckman/go-flock
$ cd $GOPATH/src/github.com/theckman/go-flock
$ go test
OK: 7 passed
PASS
ok  	github.com/theckman/go-flock	0.012s

@jonahfang
Copy link

@matjam go test passed!

@jonahfang
Copy link

@matjam Any progress of this issue?

@matjam
Copy link
Contributor

matjam commented Aug 12, 2017

I need someone with ReFS on Windows to test the same module before I can recommend to @sdboyer that we switch to go-flock.

You'll have to find a workaround in the meantime. Not running inside a the container that you're using, for example. Other containers seem to handle links just fine.

@JesseRhoads
Copy link

JesseRhoads commented Aug 18, 2017

Was there a patch for this? I have this exact problem, running a docker container inside an ubuntu trusty vagrant box on a mac.
I can create files inside the container just fine, it just won't do it via go dep.
It does the same thing at the vagrant box level as well. Vagrant mounts the source directory so I don't think it can hard link.

@christianrondeau
Copy link

Update :

$ proot --link2symlink dep ensure -add github.com/src-d/go-git
all dirs lacked any go code

And nothing in the vendor folder... So, better but not a workaround yet.

@detailyang
Copy link

Well, It looks like the linkat syscall get the EPERM as the following:

linkat(AT_FDCWD, "/shared/gopath/pkg/dep/sm.lock.735551337", AT_FDCWD, "/shared/gopath/pkg/dep/sm.lock", 0) = -1 EPERM (Operation not permitted)

/shared (vboxsf) is mounted by virtualbox and my host machine is the Mac osx.
It's recommended to set gopath local file system:)

@matjam
Copy link
Contributor

matjam commented Sep 3, 2017

@cpapidas Would appreciate it if you could test this PR and confirm that it works for you.

@cpapidas
Copy link
Author

cpapidas commented Sep 7, 2017

@matjam I had tried your PR and I always get the same error.

$ dep init
waiting for lockfile /root/gopath/pkg/dep/sm.lock
waiting for lockfile /root/gopath/pkg/dep/sm.lock

also if I run the dep with "official" master version I get the following error

$ dep init
waiting for lockfile /root/gopath/pkg/dep/sm.lock: Lockfile created, but doesn't exist

@matjam
Copy link
Contributor

matjam commented Sep 7, 2017

@cpapidas can you give us information about what system you're running dep on?

Is this being run inside a container?

@cpapidas
Copy link
Author

cpapidas commented Sep 8, 2017

Yes, I run this inside a docker container. All the golang configuration (GOPATH, etc ...) exists only in the container.

docker version Docker version 17.06.2-ce, build cec0b72

Also, have in mind that I run the docker from mac OS 10.12 or Vagrant 1.9.3.

dockerfile

FROM ubuntu:16.04

WORKDIR /root

RUN apt-get update \
  && apt-get install curl git -y \
  && cd ~ \
  && curl -O https://storage.googleapis.com/golang/go1.8.linux-amd64.tar.gz \
  && tar xvf go1.8.linux-amd64.tar.gz \
  && chown -R root:root ./go \
  && mv go /usr/local

ENV GOROOT /usr/local/go
ENV GOPATH /root/gopath
ENV GOBIN /root/gopath/bin
ENV PATH $PATH:$GOROOT/bin:$GOPATH/bin

COPY run /usr/local/bin/run
RUN chmod +x /usr/local/bin/run

CMD ["/usr/local/bin/run"]

The run file just builds and run the golang program.

Please if you want anything else, let me know.

@matjam
Copy link
Contributor

matjam commented Sep 8, 2017

@cpapidas Can you give me the output of "docker info"?

@cpapidas
Copy link
Author

cpapidas commented Sep 8, 2017

$ docker info

Containers: 7
 Running: 0
 Paused: 0
 Stopped: 7
Images: 51
Server Version: 17.06.2-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 67
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 6e23458c129b551d5c9871e5174f6b1b7f6d1170
runc version: 810190ceaa507aa2727d7ae6f4790c76ec150bd2
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-77-generic
Operating System: Ubuntu 16.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.797GiB
Name: ubuntu-xenial
ID: BAX7:PGP6:LPUZ:7QVL:VGGP:YVLI:34R5:Y3SE:SSSC:BCKK:2DWS:ZDYM
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

@matjam
Copy link
Contributor

matjam commented Sep 8, 2017

Okay. So, apparently aufs doesn't do either fnctl style flocking or hard linking in a standard way.

The notes in the "incompatible" section are relevant, if not particularly clear: http://aufs.sourceforge.net/aufs.html

@sdboyer we will need to do something else. We can't use a global lock file like this as a way to arbitrate who gets to write changes.

@nightlyone
Copy link

Please note that the bug seems to only reproduce on non-Linux docker hosts. So I believe it is specific to vboxfs and not to aufs.

@christianrondeau
Copy link

@nightlyone actually I can reproduce in Termux, which is not running Docker.

@jack-chung
Copy link

I'm facing the same problem with Mac OS (host) -> Vagrant (virtualbox) -> Ubuntu 16 (guest). No docker. The $GOPATH folder inside ubuntu is a virtual synced folder from the host OS.

Running dep on the same folder directly on the host OS works.

@sdboyer
Copy link
Member

sdboyer commented Sep 29, 2017

if you're running into this problem, a workaround should be to use the latest tip and set DEPNOLOCK=1 in the environment.

just, make very sure that you don't run two dep processes at once. it's reasonably likely to corrupt your GOPATH/pkg/dep/sources cache 💥 🤷‍♂️

@JesseRhoads
Copy link

JesseRhoads commented Sep 30, 2017

The DEPNOLOCK=1 workaround fixes the problem in virtualbox+docker.
Thanks @ayang64 and @sdboyer.

@dch
Copy link

dch commented Oct 30, 2017

FWIW ran into the same issue reported here while building github.com/sachaos/todoist on FreeBSD with zfs filesystem which is extremely common on FreeBSD. Might this be a stdlib bug?

I added my notes in case this helps future people landing on this erro, work-around works.

> uname -a
FreeBSD wintermute.skunkwerks.at 12.0-CURRENT FreeBSD 12.0-CURRENT #17 r325010+e50db1d5c304(master): Thu Oct 26 13:59:17 UTC 2017     root@wintermute:/usr/obj/usr/src/sys/GENERIC  amd64

> go version
go version go1.9.1 freebsd/amd64

> dep version
dep:
 version     : devel
 build date  : 
 git hash    : 
 go version  : go1.9.1
 go compiler : gc
 platform    : freebsd/amd64
> dep ensure -v
dep ensure
waiting for lockfile /repos/go/pkg/dep/sm.lock: Lockfile created, but doesn't exist
waiting for lockfile /repos/go/pkg/dep/sm.lock: Lockfile created, but doesn't exist
waiting for lockfile /repos/go/pkg/dep/sm.lock: Lockfile created, but doesn't exist
waiting for lockfile /repos/go/pkg/dep/sm.lock: Lockfile created, but doesn't exist
waiting for lockfile /repos/go/pkg/dep/sm.lock: Lockfile created, but doesn't exist
waiting for lockfile /repos/go/pkg/dep/sm.lock: Lockfile created, but doesn't exist
waiting for lockfile /repos/go/pkg/dep/sm.lock: Lockfile created, but doesn't exist

requested test repos

> go get github.com/nightlyone/lockfile
>  cd $GOPATH/src/github.com/nightlyone/lockfile
> go test
Error locking lockfile:  link /repos/go/src/github.com/nightlyone/lockfile/test_lockfile.pid.431102591 /repos/go/src/github.com/nightlyone/lockfile/test_lockfile.pid: operation not permitted
--- FAIL: TestBasicLockUnlock (0.00s)
--- FAIL: TestRogueDeletion (0.00s)
	lockfile_test.go:129: link /repos/go/src/github.com/nightlyone/lockfile/test_lockfile.pid.949387529 /repos/go/src/github.com/nightlyone/lockfile/test_lockfile.pid: operation not permitted
--- FAIL: TestRogueDeletionDeadPid (0.00s)
	lockfile_test.go:158: link /repos/go/src/github.com/nightlyone/lockfile/test_lockfile.pid.448157908 /repos/go/src/github.com/nightlyone/lockfile/test_lockfile.pid: operation not permitted
--- FAIL: TestRemovesStaleLockOnDeadOwner (0.00s)
	lockfile_test.go:202: link /repos/go/src/github.com/nightlyone/lockfile/test_lockfile.pid.959225638 /repos/go/src/github.com/nightlyone/lockfile/test_lockfile.pid: operation not permitted
--- FAIL: TestInvalidPidLeadToReplacedLockfileAndSuccess (0.00s)
	lockfile_test.go:231: unexpected error: link /repos/go/src/github.com/nightlyone/lockfile/test_lockfile.pid.332173640 /repos/go/src/github.com/nightlyone/lockfile/test_lockfile.pid: operation not permitted
FAIL
exit status 1
FAIL	github.com/nightlyone/lockfile	0.004s

>  go get gopkg.in/check.v1
> go get github.com/theckman/go-flock
> cd $GOPATH/src/github.com/theckman/go-flock
> go test
OK: 7 passed
PASS
ok  	github.com/theckman/go-flock	0.003s

results when using workaround:

> env DEPNOLOCK=1 dep ensure -v
Gopkg.lock was already in sync with imports and Gopkg.toml
(1/22) Wrote golang.org/x/sys@master
(2/22) Wrote golang.org/x/text@master
(3/22) Wrote github.com/magiconair/properties@v1.7.3
(4/22) Wrote github.com/spf13/afero@master
(5/22) Wrote gopkg.in/yaml.v2@v2
(6/22) Wrote github.com/spf13/pflag@v1.0.0
(7/22) Wrote github.com/spf13/viper@v1.0.0
(8/22) Wrote github.com/stretchr/testify@v1.1.4
(9/22) Wrote github.com/hashicorp/hcl@master
(10/22) Wrote github.com/fatih/color@v1.5.0
(11/22) Wrote github.com/davecgh/go-spew@v1.1.0
(12/22) Wrote github.com/satori/go.uuid@v1.1.0
(13/22) Wrote github.com/mitchellh/mapstructure@master
(14/22) Wrote github.com/mattn/go-colorable@v0.0.9
(15/22) Wrote github.com/pelletier/go-toml@v1.0.1
(16/22) Wrote github.com/pmezard/go-difflib@v1.0.0
(17/22) Wrote github.com/mattn/go-isatty@v0.0.3
(18/22) Wrote github.com/pkg/browser@master
(19/22) Wrote github.com/spf13/cast@v1.1.0
(20/22) Wrote github.com/fsnotify/fsnotify@v1.4.2
(21/22) Wrote github.com/spf13/jwalterweatherman@master
(22/22) Wrote github.com/urfave/cli@v1.20.0

\o/ thanks!

@sdboyer
Copy link
Member

sdboyer commented Oct 30, 2017

oi. yeah, another reason to swap out the lib we use to do this 😢

@RichyHBM
Copy link

This issue is still occurring, has there been any progress at to change locking mechanism?

This is happening on linux, no container

@JesseRhoads
Copy link

@RichyHBM Did you try export DEPNOLOCK=1 in the shell before running go dep as a workaround?
That is what we have done until the mechanism changes.

@RichyHBM
Copy link

Yes, that works as expected but it shouldnt be the fix

@Napat
Copy link

Napat commented Apr 27, 2018

Lock mechanism still problem with docker(golang:1.10-alpine) on Windows10Pro(hyper-v enable).
Btw workaround is ok.

@jims
Copy link

jims commented Jun 18, 2018

I get this on windows on an exFAT filesystem as well

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet