This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

Deadlock in memory index #1833

Closed
shanson7 opened this issue May 22, 2020 · 3 comments · Fixed by #1834

Comments

@shanson7
Collaborator

Describe the bug
Noticed initially during startup: while loading the memory index and doing a meta record update at the same time, loading gets stuck forever. I took a stack trace, but it's 20MB in size. Here is the (aggressively) reduced file via panicparse.

deadlock.clean.stack.txt

Helpful Information
Metrictank Version: master@commit 1dcfc8d
Golang Version (if not using an official binary or docker image): 1.12.6
OS: RHEL 7

@shanson7
Collaborator Author

shanson7 commented May 22, 2020

I think I have it:

First, we have the write queue, which is stuck trying to acquire UnpartitionedMemoryIdx.Lock. While that Lock is pending, every new RLock attempt on UnpartitionedMemoryIdx will block behind it. Nothing inherently wrong here.

1: semacquire [91 minutes] [Created by memory.NewWriteQueue @ write_queue.go:36]
    sync             sema.go:71                   runtime_SemacquireMutex(*uint32(#5693), bool(#31896))
    sync             rwmutex.go:98                (*RWMutex).Lock(*RWMutex(#5692))
    memory           memory.go:517                (*UnpartitionedMemoryIdx).indexTags(*UnpartitionedMemoryIdx(#5692), *MetricDefinition(#25554))
    memory           memory.go:630                (*UnpartitionedMemoryIdx).add(*UnpartitionedMemoryIdx(#5692), *Archive(#25554))
    memory           write_queue.go:76            (*WriteQueue).flush(*WriteQueue(#6007))
    memory           write_queue.go:95            (*WriteQueue).loop(#6007)
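
For anyone unfamiliar with Go's sync.RWMutex semantics, here's a standalone sketch (not Metrictank code) showing why a pending Lock() blocks later RLock() calls even while an earlier reader still holds the lock. That property is what lets the stuck write queue block every RLock on UnpartitionedMemoryIdx:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var mu sync.RWMutex

	mu.RLock() // an existing reader holds the lock (like delMetaRecord below)

	go func() {
		mu.Lock() // the writer queues up behind the reader (like the write queue)
		mu.Unlock()
	}()
	time.Sleep(100 * time.Millisecond) // let the writer park itself

	acquired := make(chan struct{})
	go func() {
		mu.RLock() // a new reader blocks behind the pending writer (like addMetaRecord)
		mu.RUnlock()
		close(acquired)
	}()

	select {
	case <-acquired:
		fmt.Println("new RLock acquired")
	case <-time.After(500 * time.Millisecond):
		fmt.Println("new RLock is blocked behind the pending Lock()")
	}
	mu.RUnlock() // releasing the first reader unblocks the chain
}
```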

Next we have addMetaRecord, which is stuck trying to acquire UnpartitionedMemoryIdx.RLock via the installed idLookup callback. Per the above, it can't acquire that until the write queue's pending Lock has gone through.

17: semacquire [91 minutes] [Created by memory.(*metaTagEnricher).addMetaRecord @ meta_tags.go:734]
    sync             sema.go:71                   runtime_SemacquireMutex(#5694, *)
    sync             rwmutex.go:50                (*RWMutex).RLock(...)
    memory           memory.go:301                NewUnpartitionedMemoryIdx.func1(*, 0, *, 1, 1, 0xffffffffffffffff, *)
    memory           meta_tags.go:91              (*metaTagIdx).getOrgMetaTagIndex.func1(0, *, 1, 1, 0xffffffffffffffff, *)

_addMetaRecord is called from the event listener, triggered by the event that addMetaRecord above pushed. It acquires metaTagEnricher.Lock and is blocked on a channel receive. This channel is fed by addMetaRecord, but it won't get anything until the above RLock is acquired.

1: chan receive [91 minutes] [Created by memory.(*metaTagEnricher).start @ meta_tags.go:539]
    memory           meta_tags.go:748             (*metaTagEnricher)._addMetaRecord(*metaTagEnricher(#8470), interface{}(#56))
    memory           meta_tags.go:573             (*metaTagEnricher).start.func1(#8470)
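
To make that coupling concrete, here is a hypothetical, heavily simplified sketch of the pattern (the types and names are illustrative, not the actual enricher code): the producer pushes an event carrying an id channel, then feeds that channel from inside a lookup that needs the index lock, while a single event-loop goroutine drains the channel inside the handler. If the producer is blocked on the lock, the handler blocks too, and everything queued behind it is stuck:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type event struct {
	ids chan uint32 // stands in for the id chan fed by addMetaRecord
}

func main() {
	var idx sync.RWMutex
	events := make(chan event, 16)

	// Event loop: processes events one at a time, like the enricher queue.
	go func() {
		for ev := range events {
			for id := range ev.ids { // blocks until the producer closes ids
				_ = id
			}
		}
	}()

	idx.Lock() // a writer holds the index lock, standing in for the write queue

	// Producer, like addMetaRecord: push the event, then do the id lookup
	// under the index read lock and feed the channel.
	ev := event{ids: make(chan uint32)}
	events <- ev
	go func() {
		idx.RLock() // blocks behind the writer above
		defer idx.RUnlock()
		ev.ids <- 1
		close(ev.ids)
	}()

	time.Sleep(500 * time.Millisecond)
	fmt.Println("event loop is stuck draining ev.ids; nothing queued behind it can run")
}
```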

Now we have delMetaRecord, which is inside the idLookup callback, holding UnpartitionedMemoryIdx.RLock and thereby blocking everyone else. It is stuck on a WaitGroup in (*TagQueryContext).Run. This WaitGroup is Done()'d in several places.

1: semacquire [91 minutes] [Created by memory.(*metaTagEnricher).delMetaRecord @ meta_tags.go:780]
    sync             sema.go:56                   runtime_Semacquire(*uint32(#49786))
    sync             waitgroup.go:130             (*WaitGroup).Wait(*WaitGroup(#49785))
    memory           tag_query.go:225             (*TagQueryContext).Run(*TagQueryContext(#49785), TagIndex(#13183), map[MKey]*Archive(0x0), *metaTagHierarchy(#45604))
    memory           memory.go:311                NewUnpartitionedMemoryIdx.func1(0x1, 0x0, #2899, 0x1, 0x1, 0xffffffffffffffff, #45604)
    memory           meta_tags.go:91              (*metaTagIdx).getOrgMetaTagIndex.func1(0, #2899, 1, 1, 0xffffffffffffffff, #45604)

One such place that calls Done() on the above WaitGroup is byTagValueFromMetricTagIndex. However, this is stuck in a select, pushing an id into rawResChan from the callback above.

1: select [91 minutes] [Created by memory.(*idSelector).byTagValue @ tag_query_id_selector.go:105]
    memory           tag_query_id_selector.go:129 (*idSelector).byTagValueFromMetricTagIndex(#33146)

The above rawResChan is blocked because the function that receives those ids is the _delMetaRecord event handler, which is stuck behind the _addMetaRecord event above. That closes the cycle.
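
The WaitGroup half of that cycle has a classic shape. Here is an illustrative toy, not the real query code: a worker has to deliver its results on an unbuffered channel before it can call Done(), so if nobody drains the channel, Wait() can never return. That is the position (*TagQueryContext).Run and byTagValueFromMetricTagIndex are in while rawResChan has no reader:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	resCh := make(chan int) // stands in for rawResChan
	var wg sync.WaitGroup

	wg.Add(1)
	go func() {
		defer wg.Done()
		resCh <- 42 // blocks forever: the consumer is stuck elsewhere
	}()

	done := make(chan struct{})
	go func() {
		wg.Wait() // never returns while the send above is blocked
		close(done)
	}()

	select {
	case <-done:
		fmt.Println("Wait returned")
	case <-time.After(500 * time.Millisecond):
		fmt.Println("Wait() is stuck: the worker cannot Done() until resCh is drained")
	}
}
```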

@replay
Contributor

replay commented May 22, 2020

Thanks for the useful analysis. I think the most important change we should make is this:

The enricher event handler should never get stuck waiting for an external lock; that's just asking for a deadlock. I don't think I realized when I wrote that code that the memory index lock can now block the execution of the enricher event queue, because the handler waits for that id chan to get closed. Instead of pushing the addMetaRecord / delMetaRecord events into the queue and concurrently starting the necessary lookup from the memory index, it would be better to call metaTagEnricher.idLookup() synchronously inside metaTagEnricher.addMetaRecord() / metaTagEnricher.delMetaRecord(), collect the results in a slice, and once that's finished push the slice with the addMetaRecord / delMetaRecord events instead of pushing the id chan.

We should be able to use a pool to reduce allocs for those slices, since they get populated in metaTagEnricher.addMetaRecord() / metaTagEnricher.delMetaRecord() and freed in the corresponding event handlers.
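
A rough sketch of what that could look like, with made-up names and types (MKey, enricherEvent, idLookup and idSlicePool here are all illustrative placeholders, not the real Metrictank API), assuming the lookup appends into a pooled slice that the event handler later returns to the pool:

```go
package main

import "sync"

// MKey stands in for the real metric key type; everything here is illustrative.
type MKey [16]byte

// enricherEvent carries the ids already resolved, instead of a live channel.
type enricherEvent struct {
	kind string // "addMetaRecord" or "delMetaRecord"
	ids  []MKey
}

// idSlicePool reduces allocations: slices are taken in addMetaRecord /
// delMetaRecord and returned by the event handler once it has consumed them.
var idSlicePool = sync.Pool{
	New: func() interface{} { return make([]MKey, 0, 1024) },
}

type enricher struct {
	events   chan enricherEvent
	idLookup func(appendTo []MKey) []MKey // acquires the memory index RLock internally
}

// addMetaRecord does the lookup synchronously, outside the event loop, so the
// handler never has to wait on the memory index lock.
func (e *enricher) addMetaRecord() {
	ids := e.idLookup(idSlicePool.Get().([]MKey)[:0])
	e.events <- enricherEvent{kind: "addMetaRecord", ids: ids}
}

func (e *enricher) loop() {
	for ev := range e.events {
		for range ev.ids {
			// apply each id to the enricher's internal state
		}
		idSlicePool.Put(ev.ids[:0]) // free the slice back into the pool
	}
}

func main() {
	e := &enricher{
		events:   make(chan enricherEvent, 16),
		idLookup: func(dst []MKey) []MKey { return dst }, // placeholder lookup
	}
	go e.loop()
	e.addMetaRecord()
}
```
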
You mentioned that your original stack trace was huge... Was it full of routines concurrently doing that id lookup triggered by metaTagEnricher.addMetaRecord(), because tons of meta records got added to the enricher on startup? Or what were most of the routines doing?

@shanson7
Collaborator Author

shanson7 commented May 26, 2020

You mentioned that your original stack trace was huge...what were most of the routines doing?

The summary I posted indicates how many routines were stuck in each stack, but ~1500 routines were coming from index HTTP requests (find_by_tag and autocomplete).

Edit: Looks like ~4000 gocql routines as well: 1300 in (*writeCoalescer).writeFlusher, 1300 in (*Conn).heartBeat, and 1300 stemming from (*Conn).serve. Will need to look at the config settings to see why there might be so many of those.
