Race condition in FindTags #946

Closed · shanson7 opened this issue Jun 15, 2018 · 1 comment · Fixed by #949

Comments

@shanson7
Collaborator

Got a panic like:

fatal error: concurrent map read and map write

goroutine 9506682 [running]:
runtime.throw(0xf977b8, 0x21)
        /usr/lib/go/src/runtime/panic.go:605 +0x95 fp=0xc4b587fb88 sp=0xc4b587fb68 pc=0x42bd45
runtime.mapaccess2(0xe26160, 0xc4209fa480, 0xc4b587fc8c, 0x4, 0xc4b587ffa0)
        /usr/lib/go/src/runtime/hashmap.go:413 +0x24e fp=0xc4b587fbd0 sp=0xc4b587fb88 pc=0x4088ae
github.com/grafana/metrictank/idx/memory.(*TagQuery).filterTagsFromChan(0xc5e5bf1200, 0xc6a8d62000, 0xc460ac2d20, 0xc460ac2cc0, 0x1)
        /home/jenkins/workspace/metrictank-build/gopath/src/github.com/grafana/metrictank/idx/memory/tag_query.go:851 +0x5e1 fp=0xc4b587ffb8 sp=0xc4b587fbd0 pc=0xb69111
runtime.goexit()
        /usr/lib/go/src/runtime/asm_amd64.s:2337 +0x1 fp=0xc4b587ffc0 sp=0xc4b587ffb8 pc=0x45ce51
created by github.com/grafana/metrictank/idx/memory.(*TagQuery).RunGetTags
        /home/jenkins/workspace/metrictank-build/gopath/src/github.com/grafana/metrictank/idx/memory/tag_query.go:987 +0x1dd

Further down:

goroutine 7073 [runnable]:
strings.Index(0xc7b458046e, 0x12, 0xf79edd, 0x1, 0x2)
        /usr/lib/go/src/strings/strings_amd64.go:27 +0x5c3
strings.genSplit(0xc7b458046e, 0x12, 0xf79edd, 0x1, 0x0, 0x1, 0x2, 0x2, 0xc5480a0fd0)
        /usr/lib/go/src/strings/strings.go:246 +0x11b
strings.SplitN(0xc7b458046e, 0x12, 0xf79edd, 0x1, 0x2, 0xc545611201, 0x2, 0x2)
        /usr/lib/go/src/strings/strings.go:268 +0x5c
github.com/grafana/metrictank/idx/memory.(*MemoryIdx).indexTags(0xc4201cf110, 0xc4385a9050)
        /home/jenkins/workspace/metrictank-build/gopath/src/github.com/grafana/metrictank/idx/memory/memory.go:296 +0x110
github.com/grafana/metrictank/idx/memory.(*MemoryIdx).AddOrUpdate(0xc4201cf110, 0x8cf797ff0d1924f9, 0xaafed7be6e899a96, 0xc400000001, 0xc6e3b32300, 0xc5000001c1, 0x0, 0x0, 0x0, 0x0, ...)
        /home/jenkins/workspace/metrictank-build/gopath/src/github.com/grafana/metrictank/idx/memory/memory.go:269 +0x3f3
github.com/grafana/metrictank/idx/cassandra.(*CasIdx).AddOrUpdate(0xc4201cf110, 0x8cf797ff0d1924f9, 0xaafed7be6e899a96, 0xaafed7be00000001, 0xc6e3b32300, 0x1c1, 0x0, 0x0, 0x0, 0x0, ...)
        /home/jenkins/workspace/metrictank-build/gopath/src/github.com/grafana/metrictank/idx/cassandra/cassandra.go:298 +0xe6
github.com/grafana/metrictank/input.DefaultHandler.ProcessMetricData(0xc4db3164b8, 0xc4db3164bc, 0xc4db3164c0, 0xc523ed8be0, 0xc523ed8c00, 0xc4db3164c4, 0x1559800, 0xc420153020, 0x156aba0, 0xc4201cf110, ...)

I think it goes something like this:

  1. In FindTags we acquire a read lock here
  2. In RunGetTags we determine that we can end early here
  3. The workers can still iterate one more time before detecting that stopCh was closed, and hit this map read here

While (3) is happening, the read lock from (1) could be long gone, and another thread could have acquired a write lock and be modifying the map. A minimal sketch of that interleaving is below.
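The following is a toy reproduction of the suspected interleaving, not the actual metrictank code (the type and function names are made up for illustration). The caller takes a read lock, spawns a worker that reads a shared map, signals early termination via stopCh, and returns without waiting for the worker; a writer can then take the write lock while the worker is still mid-read. Whether `go run -race` actually flags it on a given run depends on scheduling.

package main

import "sync"

type index struct {
	sync.RWMutex
	tags map[string]string
}

// findTags mirrors the FindTags pattern: take the read lock, start the
// query, and return as soon as the query decides it can end early.
func (ix *index) findTags(stopCh chan struct{}, wg *sync.WaitGroup) {
	ix.RLock()
	defer ix.RUnlock()

	wg.Add(1)
	go func() { // stands in for the filterTagsFromChan worker
		defer wg.Done()
		for {
			select {
			case <-stopCh:
				return
			default:
			}
			// The worker relies on the caller's RLock; one more iteration can
			// run after stopCh is closed and the caller has returned.
			_ = ix.tags["name"]
		}
	}()

	close(stopCh) // RunGetTags decides it can end early and closes stopCh
	// findTags returns here and releases the read lock while the worker may
	// still be in the middle of a final map read.
}

func main() {
	ix := &index{tags: map[string]string{"name": "a"}}
	var wg sync.WaitGroup
	ix.findTags(make(chan struct{}), &wg)

	// Another goroutine now takes the write lock and mutates the map while
	// the stray worker can still be reading it: concurrent map read and write.
	ix.Lock()
	ix.tags["name"] = "b"
	ix.Unlock()

	wg.Wait()
}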

@replay
Contributor

replay commented Jun 20, 2018

Good find, that explanation makes sense.
I guess we could just add a q.wg.Wait() before the return from RunGetTags() here. Then it should not return before all the filterTagsFromChan() goroutines have stopped (I have not tested that). A rough sketch of that pattern is below.
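Applying that idea to the toy example above (again, illustrative code rather than the real RunGetTags): waiting on the WaitGroup before returning keeps the caller's read lock held for every map access the workers perform, so the early-exit path can no longer outlive its lock.

// findTagsFixed is the same sketch with the suggested fix: wait for the
// worker goroutines to finish before returning and releasing the read lock.
func (ix *index) findTagsFixed(stopCh chan struct{}) {
	ix.RLock()
	defer ix.RUnlock()

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for {
			select {
			case <-stopCh:
				return
			default:
			}
			_ = ix.tags["name"] // now always covered by the caller's RLock
		}
	}()

	close(stopCh) // decide to end early, as before
	wg.Wait()     // the q.wg.Wait() idea: don't return until workers are done
}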
