This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

Deadlock in memory index #1833

Closed
shanson7 opened this issue May 22, 2020 · 3 comments · Fixed by #1834

Comments

@shanson7
Collaborator

Describe the bug
Noticed initially during startup: while loading the memory index and doing a meta record update at the same time, loading gets stuck forever. I took a stack trace, but it's 20MB in size. Here is the (aggressively) reduced file via panicparse.

deadlock.clean.stack.txt

Helpful Information
Metrictank Version: master@commit 1dcfc8d
Golang Version (if not using an official binary or docker image): 1.12.6
OS: RHEL 7

@shanson7
Collaborator Author

shanson7 commented May 22, 2020

I think I have it:

First, we have the write queue, which is stuck trying to acquire UnpartitionedMemoryIdx.Lock. While that Lock is pending, every new RLock attempt on UnpartitionedMemoryIdx will block behind it. Nothing inherently wrong here.

1: semacquire [91 minutes] [Created by memory.NewWriteQueue @ write_queue.go:36]
    sync             sema.go:71                   runtime_SemacquireMutex(*uint32(#5693), bool(#31896))
    sync             rwmutex.go:98                (*RWMutex).Lock(*RWMutex(#5692))
    memory           memory.go:517                (*UnpartitionedMemoryIdx).indexTags(*UnpartitionedMemoryIdx(#5692), *MetricDefinition(#25554))
    memory           memory.go:630                (*UnpartitionedMemoryIdx).add(*UnpartitionedMemoryIdx(#5692), *Archive(#25554))
    memory           write_queue.go:76            (*WriteQueue).flush(*WriteQueue(#6007))
    memory           write_queue.go:95            (*WriteQueue).loop(#6007)
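
For anyone unfamiliar with Go's sync.RWMutex semantics, here's a standalone sketch (not Metrictank code) showing why a pending Lock() blocks later RLock() calls even while an earlier reader still holds the lock. That property is what lets the stuck write queue block every RLock on UnpartitionedMemoryIdx:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var mu sync.RWMutex

	mu.RLock() // an existing reader holds the lock (like delMetaRecord below)

	go func() {
		mu.Lock() // the writer queues up behind the reader (like the write queue)
		mu.Unlock()
	}()
	time.Sleep(100 * time.Millisecond) // let the writer park itself

	acquired := make(chan struct{})
	go func() {
		mu.RLock() // a new reader blocks behind the pending writer (like addMetaRecord)
		mu.RUnlock()
		close(acquired)
	}()

	select {
	case <-acquired:
		fmt.Println("new RLock acquired")
	case <-time.After(500 * time.Millisecond):
		fmt.Println("new RLock is blocked behind the pending Lock()")
	}
	mu.RUnlock() // releasing the first reader unblocks the chain
}
```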

Next we have addMetaRecord, which is stuck trying to acquire UnpartitionedMemoryIdx.RLock via the installed idLookup callback. Per the above, it can't acquire that until the write queue's pending Lock has gone through.

17: semacquire [91 minutes] [Created by memory.(*metaTagEnricher).addMetaRecord @ meta_tags.go:734]
    sync             sema.go:71                   runtime_SemacquireMutex(#5694, *)
    sync             rwmutex.go:50                (*RWMutex).RLock(...)
    memory           memory.go:301                NewUnpartitionedMemoryIdx.func1(*, 0, *, 1, 1, 0xffffffffffffffff, *)
    memory           meta_tags.go:91              (*metaTagIdx).getOrgMetaTagIndex.func1(0, *, 1, 1, 0xffffffffffffffff, *)

_addMetaRecord is called from the event listener, triggered by the event that addMetaRecord above pushed. It acquires metaTagEnricher.Lock and is blocked on a channel receive. This channel is fed by addMetaRecord, but it won't get anything until the above RLock is acquired.

1: chan receive [91 minutes] [Created by memory.(*metaTagEnricher).start @ meta_tags.go:539]
    memory           meta_tags.go:748             (*metaTagEnricher)._addMetaRecord(*metaTagEnricher(#8470), interface{}(#56))
    memory           meta_tags.go:573             (*metaTagEnricher).start.func1(#8470)
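
To make that coupling concrete, here is a hypothetical, heavily simplified sketch of the pattern (the types and names are illustrative, not the actual enricher code): the producer pushes an event carrying an id channel, then feeds that channel from inside a lookup that needs the index lock, while a single event-loop goroutine drains the channel inside the handler. If the producer is blocked on the lock, the handler blocks too, and everything queued behind it is stuck:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type event struct {
	ids chan uint32 // stands in for the id chan fed by addMetaRecord
}

func main() {
	var idx sync.RWMutex
	events := make(chan event, 16)

	// Event loop: processes events one at a time, like the enricher queue.
	go func() {
		for ev := range events {
			for id := range ev.ids { // blocks until the producer closes ids
				_ = id
			}
		}
	}()

	idx.Lock() // a writer holds the index lock, standing in for the write queue

	// Producer, like addMetaRecord: push the event, then do the id lookup
	// under the index read lock and feed the channel.
	ev := event{ids: make(chan uint32)}
	events <- ev
	go func() {
		idx.RLock() // blocks behind the writer above
		defer idx.RUnlock()
		ev.ids <- 1
		close(ev.ids)
	}()

	time.Sleep(500 * time.Millisecond)
	fmt.Println("event loop is stuck draining ev.ids; nothing queued behind it can run")
}
```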

Now we have delMetaRecord, which is inside the idLookup callback, holding UnpartitionedMemoryIdx.RLock and thereby blocking everyone else. It is stuck on a WaitGroup in (*TagQueryContext).Run. This WaitGroup is Done()'d in several places.

1: semacquire [91 minutes] [Created by memory.(*metaTagEnricher).delMetaRecord @ meta_tags.go:780]
    sync             sema.go:56                   runtime_Semacquire(*uint32(#49786))
    sync             waitgroup.go:130             (*WaitGroup).Wait(*WaitGroup(#49785))
    memory           tag_query.go:225             (*TagQueryContext).Run(*TagQueryContext(#49785), TagIndex(#13183), map[MKey]*Archive(0x0), *metaTagHierarchy(#45604))
    memory           memory.go:311                NewUnpartitionedMemoryIdx.func1(0x1, 0x0, #2899, 0x1, 0x1, 0xffffffffffffffff, #45604)
    memory           meta_tags.go:91              (*metaTagIdx).getOrgMetaTagIndex.func1(0, #2899, 1, 1, 0xffffffffffffffff, #45604)

One such place that calls Done() on the above WaitGroup is byTagValueFromMetricTagIndex. However, this is stuck in a select, pushing an id into rawResChan from the callback above.

1: select [91 minutes] [Created by memory.(*idSelector).byTagValue @ tag_query_id_selector.go:105]
    memory           tag_query_id_selector.go:129 (*idSelector).byTagValueFromMetricTagIndex(#33146)

The above rawResChan is blocked because the function that receives those ids is the _delMetaRecord event handler, which is stuck behind the _addMetaRecord event above. That closes the cycle.
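
The WaitGroup half of that cycle has a classic shape. Here is an illustrative toy, not the real query code: a worker has to deliver its results on an unbuffered channel before it can call Done(), so if nobody drains the channel, Wait() can never return. That is the position (*TagQueryContext).Run and byTagValueFromMetricTagIndex are in while rawResChan has no reader:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	resCh := make(chan int) // stands in for rawResChan
	var wg sync.WaitGroup

	wg.Add(1)
	go func() {
		defer wg.Done()
		resCh <- 42 // blocks forever: the consumer is stuck elsewhere
	}()

	done := make(chan struct{})
	go func() {
		wg.Wait() // never returns while the send above is blocked
		close(done)
	}()

	select {
	case <-done:
		fmt.Println("Wait returned")
	case <-time.After(500 * time.Millisecond):
		fmt.Println("Wait() is stuck: the worker cannot Done() until resCh is drained")
	}
}
```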

@replay
Contributor

replay commented May 22, 2020

Thanks for the useful analysis. I think the most important change we should make is this:

The enricher event handler should never get stuck waiting for an external lock; that's just asking for a deadlock. I don't think I realized when I wrote that code that the memory index lock can now block the execution of the enricher event queue, because the handler waits for that id chan to get closed. Instead of pushing the addMetaRecord / delMetaRecord events into the queue and concurrently starting the necessary lookup from the memory index, it would be better to call metaTagEnricher.idLookup() synchronously inside metaTagEnricher.addMetaRecord() / metaTagEnricher.delMetaRecord(), collect the results in a slice, and once that's finished push the slice with the addMetaRecord / delMetaRecord events instead of pushing the id chan.

We should be able to use a pool to reduce allocs for those slices, since they get populated in metaTagEnricher.addMetaRecord() / metaTagEnricher.delMetaRecord() and freed in the corresponding event handlers.
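
A rough sketch of what that could look like, with made-up names and types (MKey, enricherEvent, idLookup and idSlicePool here are all illustrative placeholders, not the real Metrictank API), assuming the lookup appends into a pooled slice that the event handler later returns to the pool:

```go
package main

import "sync"

// MKey stands in for the real metric key type; everything here is illustrative.
type MKey [16]byte

// enricherEvent carries the ids already resolved, instead of a live channel.
type enricherEvent struct {
	kind string // "addMetaRecord" or "delMetaRecord"
	ids  []MKey
}

// idSlicePool reduces allocations: slices are taken in addMetaRecord /
// delMetaRecord and returned by the event handler once it has consumed them.
var idSlicePool = sync.Pool{
	New: func() interface{} { return make([]MKey, 0, 1024) },
}

type enricher struct {
	events   chan enricherEvent
	idLookup func(appendTo []MKey) []MKey // acquires the memory index RLock internally
}

// addMetaRecord does the lookup synchronously, outside the event loop, so the
// handler never has to wait on the memory index lock.
func (e *enricher) addMetaRecord() {
	ids := e.idLookup(idSlicePool.Get().([]MKey)[:0])
	e.events <- enricherEvent{kind: "addMetaRecord", ids: ids}
}

func (e *enricher) loop() {
	for ev := range e.events {
		for range ev.ids {
			// apply each id to the enricher's internal state
		}
		idSlicePool.Put(ev.ids[:0]) // free the slice back into the pool
	}
}

func main() {
	e := &enricher{
		events:   make(chan enricherEvent, 16),
		idLookup: func(dst []MKey) []MKey { return dst }, // placeholder lookup
	}
	go e.loop()
	e.addMetaRecord()
}
```
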
You mentioned that your original stack trace was huge... Was it full of routines concurrently doing that id lookup triggered by metaTagEnricher.addMetaRecord(), because tons of meta records got added to the enricher on startup? Or what were most of the routines doing?

@shanson7
Collaborator Author

shanson7 commented May 26, 2020

You mentioned that your original stack trace was huge...what were most of the routines doing?

The summary I posted indicates how many routines were stuck in each stack, but ~1500 routines were coming from index HTTP requests (find_by_tag and autocomplete).

Edit: Looks like ~4000 gocql routines as well: 1300 in (*writeCoalescer).writeFlusher, 1300 in (*Conn).heartBeat, and 1300 stemming from (*Conn).serve. Will need to look at the config settings to see why there might be so many of those.
