
refactor cassandra read queries #931

Merged: 8 commits from cass-multiple-rowkeys into master on Jun 16, 2018

Conversation

Dieterbe (Contributor) commented Jun 6, 2018

problem

When a customer queries 2 years' worth of data, we read from 27 different row keys (we use 4-weekly row keys), each with its own query. When using avgConsolidator, double that (we read both sum and cnt), so 54 queries. And this is per series.
This particular customer was querying 743 series, resulting in 743*54 = ~40k read queries (each generally completing in under 10ms, but still adding up to significant overall latency).

We speculate that by issuing fewer queries, each hitting multiple row keys at once, we can get better performance. I'm currently testing this.
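Roughly, the idea is to go from one query per row key to one query spanning all row keys. A minimal sketch of the before/after pattern (illustrative only: helper names, table name and signatures are made up for this comment; the 4-weekly row-key scheme is key_<ts/(28*24*3600)>, and the read query mirrors the constant introduced in this PR):

package storesketch

import (
	"fmt"

	"github.com/gocql/gocql"
)

// 4-weekly row keys: metric key + "_" + (ts / 28 days). Illustrative only.
const monthSec = 60 * 60 * 24 * 28

func rowKeys(metricKey string, start, end uint32) []string {
	var keys []string
	for m := start / monthSec; m <= end/monthSec; m++ {
		keys = append(keys, fmt.Sprintf("%s_%d", metricKey, m))
	}
	return keys
}

// before: one query (one round trip) per row key. A 2y query spans ~27 row
// keys per series, and avgConsolidator doubles that (sum + cnt).
func readPerRowKey(s *gocql.Session, table, metricKey string, start, end uint32) {
	q := fmt.Sprintf("SELECT ts, data FROM %s WHERE key = ? AND ts < ?", table)
	for _, k := range rowKeys(metricKey, start, end) {
		iter := s.Query(q, k, end).Iter()
		_ = iter // ... decode chunks from iter
	}
}

// after: a single query with an IN clause covering all row keys at once
func readAcrossRowKeys(s *gocql.Session, table, metricKey string, start, end uint32) {
	q := fmt.Sprintf("SELECT ts, data FROM %s WHERE key IN ? AND ts < ?", table)
	iter := s.Query(q, rowKeys(metricKey, start, end), end).Iter()
	_ = iter // ... decode chunks; they now need sorting, since keys can come back in any order
}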

procedure

./launch.sh docker-dev-custom-cfg-kafka

Wait until it's up, then (using the latest fakemetrics code) run:

./fakemetrics backfill --kafka-mdm-addr localhost:9092 --kafka-mdm-v2=false --offset $((2*366*24))h --period 1800s --speedup 360000 --mpo 500

This should complete in about 3 minutes.
Load the MT dashboard and confirm it ingested data at 100kHz for 3 minutes.
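Sanity check on those numbers: the backfill spans 2*366*24 = 17568 hours at one point per 1800s, i.e. 35136 points per series, times 500 series (--mpo 500) ≈ 17.6M points; at --speedup 360000 the wall-clock time is 17568*3600/360000 ≈ 176s ≈ 3 minutes, so the ingest rate works out to roughly 17.6M/176 ≈ 100k points/s.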

When done: in grafana, look at some.id.of.a.metric.{1,72,199,320,438};id=*;some=tag to confirm the data looks good (over 2y).

http benchmarking

when I was trying with only 1 series:

curl 'http://localhost:6060/render?target=some.id.of.a.metric.*&format=json&from=-2y' # to spot check result
time curl 'http://localhost:6060/render?target=some.id.of.a.metric.*&format=json&from=-2y' >/dev/null # benchmark

pre and post were both about 1.46 to 1.48s

same but with more series

input:

GET http://localhost:6060/render?target=some.id.of.a.metric.*&format=json&from=-2y
GET http://localhost:6060/
GET http://localhost:6060/
GET http://localhost:6060/
GET http://localhost:6060/

pre

date; cat input | vegeta attack -rate 1 -duration 300s | vegeta report; date
Thu Jun  7 21:04:00 CEST 2018
Requests      [total, rate]            300, 1.00
Duration      [total, attack, wait]    4m59.002713175s, 4m59.000036889s, 2.676286ms
Latencies     [mean, 50, 95, 99, max]  516.551765ms, 2.617993ms, 2.588240007s, 2.640280368s, 2.710676775s
Bytes In      [total, mean]            756023280, 2520077.60
Bytes Out     [total, mean]            0, 0.00
Success       [ratio]                  100.00%
Status Codes  [code:count]             200:300  
Error Set:
Thu Jun  7 21:08:59 CEST 2018

post

date; cat input | vegeta attack -rate 1 -duration 300s | vegeta report; date
Thu Jun  7 21:28:00 CEST 2018
Requests      [total, rate]            300, 1.00
Duration      [total, attack, wait]    4m59.002309685s, 4m59.000077373s, 2.232312ms
Latencies     [mean, 50, 95, 99, max]  509.845307ms, 2.964046ms, 2.549719772s, 2.589971517s, 2.648294993s
Bytes In      [total, mean]            756023172, 2520077.24
Bytes Out     [total, mean]            0, 0.00
Success       [ratio]                  100.00%
Status Codes  [code:count]             200:300  
Error Set:
Thu Jun  7 21:32:59 CEST 2018

ram/cpu/io:
https://snapshot.raintank.io/dashboard/snapshot/K3C0XYLkWl4Vjq3nZtI5oih7wLek4h0J?orgId=2
MT dashboard:
https://snapshot.raintank.io/dashboard/snapshot/OrKp30bsRTMfS2pBsXNBWw6r48rV224I?orgId=2

conclusion

  • far fewer, but slower, cassandra queries (as expected)
  • very minor decrease in http render latency
  • significant cassandra CPU reduction, in exchange for a slight MT CPU increase
  • iters-to-points became 6ms slower: why? Because of the increased cpu? Or did we accidentally return more iters/chunks?
  • cassandra-to-iter is 8ms slower: why? Perhaps because of the sorting?
  • cassandra get-chunks is about 20ms faster

Dieterbe (Contributor, Author) commented Jun 7, 2018

Not looking too promising.
I tried raising get-targets-concurrency, but to no avail: the cassandra searches simply become correspondingly slower, making the end result the same.
CPU spent in go-tsz.(*bstream).readBits, go-tsz.(*bstream).readByte, and go-tsz.(*Iter).Next is about 40%. I tried upgrading go-tsz to the latest version, but it's about the same. Cassandra latency outweighs the rest, so the cpu profile is misleading.

@Dieterbe Dieterbe force-pushed the cass-multiple-rowkeys branch 2 times, most recently from af9b8d0 to 952d194 on June 8, 2018 at 11:35
Dieterbe (Contributor, Author) commented Jun 8, 2018

PLOT TWIST:
The problem happens when enabling the chunk cache: cpu goes up, memory blows up, and the "Failed to submit event to accounting, channel was blocked" errors look very similar to the production event.
Note the 80% success rate (4 out of 5 requests are health checks, so effectively all the render requests were failing).
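For reference, the error comes from the accounting event submission being a non-blocking send, roughly this shape (a sketch with illustrative types, not the actual flat_accnt code):

package accntsketch

import "log"

type event struct {
	t  uint8
	pl interface{}
}

type FlatAccnt struct {
	eventQ chan event
}

// act submits an event without blocking: if the event loop can't keep up and
// the queue is full, the event is dropped and only the log line remains.
// Under a heavy read workload that means cache hit/add accounting gets lost.
func (a *FlatAccnt) act(t uint8, pl interface{}) {
	select {
	case a.eventQ <- event{t: t, pl: pl}:
	default:
		log.Println("Failed to submit event to accounting, channel was blocked")
	}
}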

see:
https://snapshot.raintank.io/dashboard/snapshot/EtYL25AHkAERJJa5p7KH9JrthFLDJgyb?orgId=2
https://snapshot.raintank.io/dashboard/snapshot/qBbjmBD1uzeh3U6eCaM6mw1ltk1w9jgC?orgId=2

date; cat input | vegeta attack -rate 1 -duration 300s | vegeta report; date
Fri Jun  8 14:01:11 CEST 2018
Requests      [total, rate]            300, 1.00
Duration      [total, attack, wait]    4m59.000800494s, 4m58.999803659s, 996.835µs
Latencies     [mean, 50, 95, 99, max]  1.579504ms, 939.818µs, 3.072098ms, 12.04762ms, 97.612382ms
Bytes In      [total, mean]            480, 1.60
Bytes Out     [total, mean]            0, 0.00
Success       [ratio]                  80.00%
Status Codes  [code:count]             200:240  0:60  
Error Set:
Get http://localhost:6060/render?target=some.id.of.a.metric.*&format=json&from=-2y: net/http: timeout awaiting response headers
Fri Jun  8 14:06:36 CEST 2018
metrictank_1    | 2018/06/08 12:06:07 [flat_accnt.go:136 act()] [E] Failed to submit event to accounting, channel was blocked
metrictank_1    | 2018/06/08 12:06:07 [flat_accnt.go:136 act()] [E] Failed to submit event to accounting, channel was blocked
metrictank_1    | 2018/06/08 12:06:07 [flat_accnt.go:136 act()] [E] Failed to submit event to accounting, channel was blocked
metrictank_1    | 2018/06/08 12:06:07 [flat_accnt.go:136 act()] [E] Failed to submit event to accounting, channel was blocked
metrictank_1    | 2018/06/08 12:06:07 [flat_accnt.go:136 act()] [E] Failed to submit event to accounting, channel was blocked
metrictank_1    | 2018/06/08 12:06:07 [flat_accnt.go:136 act()] [E] Failed to submit event to accounting, channel was blocked
metrictank_1    | 2018/06/08 12:06:07 [flat_accnt.go:136 act()] [E] Failed to submit event to accounting, channel was blocked
metrictank_1    | 2018/06/08 12:06:07 [flat_accnt.go:136 act()] [E] Failed to submit event to accounting, channel was blocked
metrictank_1    | 2018/06/08 12:06:07 [flat_accnt.go:136 act()] [E] Failed to submit event to accounting, channel was blocked
metrictank_1    | 2018/06/08 12:06:07 [flat_accnt.go:136 act()] [E] Failed to submit event to accounting, channel was blocked
metrictank_1    | 2018/06/08 12:06:07 [flat_accnt.go:136 act()] [E] Failed to submit event to accounting, channel was blocked
metrictank_1    | 2018/06/08 12:06:07 [flat_accnt.go:136 act()] [E] Failed to submit event to accounting, channel was blocked
metrictank_1    | 2018/06/08 12:06:07 [flat_accnt.go:136 act()] [E] Failed to submit event to accounting, channel was blocked
metrictank_1    | 2018/06/08 12:06:07 [flat_accnt.go:136 act()] [E] Failed to submit event to accounting, channel was blocked
metrictank_1    | 2018/06/08 12:06:07 [flat_accnt.go:136 act()] [E] Failed to submit event to accounting, channel was blocked
metrictank_1    | 2018/06/08 12:06:07 [flat_accnt.go:136 act()] [E] Failed to submit event to accounting, channel was blocked
metrictank_1    | 2018/06/08 12:06:07 [flat_accnt.go:136 act()] [E] Failed to submit event to accounting, channel was blocked
metrictank_1    | [Macaron] 2018-06-08 12:06:08: Started GET / for 172.20.0.1
metrictank_1    | [Macaron] 2018-06-08 12:06:08: Completed / 200 OK in 302.36µs
metrictank_1    | [Macaron] 2018-06-08 12:06:09: Started GET / for 172.20.0.1
metrictank_1    | [Macaron] 2018-06-08 12:06:09: Completed / 200 OK in 311.431µs
metrictank_1    | [Macaron] 2018-06-08 12:06:10: Started GET / for 172.20.0.1
metrictank_1    | [Macaron] 2018-06-08 12:06:10: Completed / 200 OK in 275.913µs
caddy           | 08/Jun/2018:12:06:11 +0000 [ERROR 502 /render] context canceled
metrictank_1    | [Macaron] 2018-06-08 12:06:16: Completed /render?target=some.id.of.a.metric.*&format=json&from=-2y 499  in 44.777299076s
caddy           | 08/Jun/2018:12:06:16 +0000 [ERROR 502 /render] context canceled
metrictank_1    | [Macaron] 2018-06-08 12:06:19: Completed /render?target=some.id.of.a.metric.*&format=json&from=-2y 499  in 43.01550749s
metrictank_1    | [Macaron] 2018-06-08 12:06:19: Completed /render?target=some.id.of.a.metric.*&format=json&from=-2y 499  in 38.015525732s
caddy           | 08/Jun/2018:12:06:21 +0000 [ERROR 502 /render] context canceled
caddy           | 08/Jun/2018:12:06:26 +0000 [ERROR 502 /render] context canceled

Dieterbe (Contributor, Author) commented Jun 13, 2018

After applying #943:

  • cpu is a bit lower
  • ram doesn't blow up anymore
  • requests all finish in under 3s, even when the cache is supposed to be warm (which is still way too high)
  • still seeing many cassandra requests (which I suspect is why latency doesn't go down), and the dashboard shows them all as hit-partial.
  • also still seeing "Failed to submit event to accounting, channel was blocked", which might be preventing the cache from working properly. Printing all events in FlatAccnt.act and running the workload shows about 90% are caused by evnt_hit_chnk and 10% by evnt_add_chnk => will work on batched events for the cache.
    CPU profile:
(pprof) top30 -cum cache
Active filters:
   focus=cache
Showing nodes accounting for 3.72s, 8.48% of 43.88s total
Dropped 91 nodes (cum <= 0.22s)
Showing top 30 nodes out of 125
      flat  flat%   sum%        cum   cum%
     0.20s  0.46%  0.46%      5.98s 13.63%  github.com/grafana/metrictank/mdata/cache.(*CCache).Search.func1
     0.06s  0.14%  0.59%      5.78s 13.17%  github.com/grafana/metrictank/mdata/cache/accnt.(*FlatAccnt).HitChunk
     0.04s 0.091%  0.68%      5.25s 11.96%  github.com/grafana/metrictank/mdata/cache/accnt.(*FlatAccnt).act
     0.04s 0.091%  0.77%      4.08s  9.30%  github.com/grafana/metrictank/vendor/github.com/raintank/worldping-api/pkg/log.Error
     0.03s 0.068%  0.84%      4.04s  9.21%  github.com/grafana/metrictank/vendor/github.com/raintank/worldping-api/pkg/log.(*Logger).Error
     0.13s   0.3%  1.14%      3.77s  8.59%  github.com/grafana/metrictank/mdata/cache/accnt.(*FlatAccnt).eventLoop
     0.07s  0.16%  1.30%      3.45s  7.86%  github.com/grafana/metrictank/vendor/github.com/raintank/worldping-api/pkg/log.(*Logger).writerMsg
     0.29s  0.66%  1.96%      2.70s  6.15%  github.com/grafana/metrictank/mdata/cache/accnt.(*LRU).touch
         0     0%  1.96%      2.51s  5.72%  runtime.systemstack
     0.49s  1.12%  3.08%      1.78s  4.06%  runtime.mallocgc
     0.02s 0.046%  3.12%      1.77s  4.03%  runtime.Caller
     0.85s  1.94%  5.06%      1.51s  3.44%  runtime.mapaccess2
         0     0%  5.06%      1.37s  3.12%  github.com/grafana/metrictank/api.(*Server).getSeriesFixed
         0     0%  5.06%      1.37s  3.12%  github.com/grafana/metrictank/api.(*Server).getTarget
         0     0%  5.06%      1.37s  3.12%  github.com/grafana/metrictank/api.(*Server).getTargetsLocal.func1
         0     0%  5.06%      1.35s  3.08%  github.com/grafana/metrictank/api.(*Server).getSeries
         0     0%  5.06%      1.35s  3.08%  github.com/grafana/metrictank/api.(*Server).getSeriesCachedStore
     0.19s  0.43%  5.49%      1.35s  3.08%  runtime.chansend
     0.02s 0.046%  5.54%      1.13s  2.58%  runtime.selectnbsend
         0     0%  5.54%      1.12s  2.55%  github.com/grafana/metrictank/mdata/cache.(*CCache).Search
         0     0%  5.54%      1.12s  2.55%  github.com/grafana/metrictank/mdata/cache.(*CCacheMetric).Search
     0.12s  0.27%  5.81%      1.12s  2.55%  github.com/grafana/metrictank/mdata/cache.(*CCacheMetric).searchForward
     0.02s 0.046%  5.86%      1.11s  2.53%  runtime.callers
         0     0%  5.86%      1.09s  2.48%  runtime.callers.func1
     0.20s  0.46%  6.31%      1.09s  2.48%  runtime.gentraceback
         0     0%  6.31%      1.01s  2.30%  fmt.Sprintf
     0.06s  0.14%  6.45%      0.90s  2.05%  container/list.(*List).MoveToFront
     0.19s  0.43%  6.88%      0.87s  1.98%  runtime.gcWriteBarrier
     0.34s  0.77%  7.66%      0.85s  1.94%  runtime.pcvalue
     0.36s  0.82%  8.48%      0.84s  1.91%  runtime.lock
(pprof) list LRU.*touch
Total: 43.88s
ROUTINE ======================== github.com/grafana/metrictank/mdata/cache/accnt.(*LRU).touch in /home/dieter/go/src/github.com/grafana/metrictank/mdata/cache/accnt/lru.go
     290ms      2.70s (flat, cum)  6.15% of Total
         .          .     14:		list:  list.New(),
         .          .     15:		items: make(map[interface{}]*list.Element),
         .          .     16:	}
         .          .     17:}
         .          .     18:
      20ms       20ms     19:func (l *LRU) touch(key interface{}) {
     270ms      1.78s     20:	if ent, ok := l.items[key]; ok {
         .      900ms     21:		l.list.MoveToFront(ent)
         .          .     22:	} else {
         .          .     23:		l.items[key] = l.list.PushFront(key)
         .          .     24:	}
         .          .     25:}
         .          .     26:

The logger.Error overhead is coming from the FlatAccnt.act() errors, so moving to batched events should save about 10% cpu.
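To make the batching idea concrete, a rough sketch (illustrative types and signatures; the real payloads differ): the cache search would submit one event per search carrying all hit chunks, instead of one event per chunk, so the event queue sees far fewer sends:

package accntsketch

const (
	evntHitChnk uint8 = iota
	evntHitChnks
)

type HitPayload struct {
	metric string
	ts     uint32
}

type HitsPayload struct {
	metric string
	tss    []uint32
}

type FlatAccnt struct{}

// non-blocking send into the event queue, as sketched earlier
func (a *FlatAccnt) act(t uint8, pl interface{}) {}

// before: one channel send (and potentially one dropped event plus log line) per chunk hit
func (a *FlatAccnt) HitChunk(metric string, ts uint32) {
	a.act(evntHitChnk, &HitPayload{metric: metric, ts: ts})
}

// after: one channel send per cache search, covering all hit chunks at once
func (a *FlatAccnt) HitChunks(metric string, tss []uint32) {
	a.act(evntHitChnks, &HitsPayload{metric: metric, tss: tss})
}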

Dieterbe (Contributor, Author) commented Jun 14, 2018

I ran the experiment above again (with chunkcache-AddRange merged, but not yet the batched accounting).
The stats look a bit different and I'm not sure why (maybe I ran with a different cpu governor last time), but anyway, I just want to compare this against the latest code, which uses batched accounting.

Differences with the previous run: cpu at 250%, full cache hits, no cassandra hits once cached. (This doesn't match my previous observation, where cpu was lower and we only had partial hits, leading to cassandra requests.)

date; cat input | vegeta attack -rate 1 -duration 300s | vegeta report; date
Thu Jun 14 12:50:59 CEST 2018
Requests      [total, rate]            300, 1.00
Duration      [total, attack, wait]    4m59.000888467s, 4m58.99990704s, 981.427µs
Latencies     [mean, 50, 95, 99, max]  506.915838ms, 1.762138ms, 2.570193057s, 2.676656689s, 4.827845583s
Bytes In      [total, mean]            756995640, 2523318.80
Bytes Out     [total, mean]            0, 0.00
Success       [ratio]                  100.00%
Status Codes  [code:count]             200:300  
Error Set:
Thu Jun 14 12:55:58 CEST 2018

https://snapshot.raintank.io/dashboard/snapshot/9j0cGMOzq3sz9b4E7zSO0yIfUa6Wb9dW?orgId=2
https://snapshot.raintank.io/dashboard/snapshot/eZU42MDabp3yGaPxran2v1b4nep1bEQK?orgId=2

batched accounting:

date; cat input | vegeta attack -rate 1 -duration 300s | vegeta report; date
Thu Jun 14 13:47:37 CEST 2018
Requests      [total, rate]            300, 1.00
Duration      [total, attack, wait]    4m59.003074928s, 4m59.000059056s, 3.015872ms
Latencies     [mean, 50, 95, 99, max]  371.625767ms, 2.817488ms, 1.857461051s, 1.959102619s, 3.821066751s
Bytes In      [total, mean]            756931320, 2523104.40
Bytes Out     [total, mean]            0, 0.00
Success       [ratio]                  100.00%
Status Codes  [code:count]             200:300  
Error Set:
Thu Jun 14 13:52:36 CEST 2018

https://snapshot.raintank.io/dashboard/snapshot/EwZbEhluRrfeaurmDwxKNpS67Ax7Z0Zx?orgId=2
https://snapshot.raintank.io/dashboard/snapshot/SFnItYfVuwx5JUce6Jhyo95x5r12qRkD

conclusion:

  • 20% cpu reduction
  • faster response times
  • hit rate is the same for both (100%); no cassandra queries after the initial ones.

Dieterbe added a commit that referenced this pull request Jun 14, 2018
1) less contention on channel
2) prevent channel overflowing
3) less CPU spent printing
   `Failed to submit event to accounting, channel was blocked`

For the record,
When I patch the code like so:

diff --git a/mdata/cache/accnt/flat_accnt.go b/mdata/cache/accnt/flat_accnt.go
index 4097e067..d202960c 100644
--- a/mdata/cache/accnt/flat_accnt.go
+++ b/mdata/cache/accnt/flat_accnt.go
@@ -1,7 +1,9 @@
 package accnt

 import (
+       "fmt"
        "sort"
+       "time"

        "github.com/grafana/metrictank/mdata/chunk"
        "github.com/raintank/worldping-api/pkg/log"
@@ -178,6 +180,7 @@ func (a *FlatAccnt) eventLoop() {
                                        },
                                )
                        case evnt_add_chnks:
+                               pre := time.Now()
                                payload := event.pl.(*AddsPayload)
                                a.addRange(payload.metric, payload.chunks)
                                cacheChunkAdd.Add(len(payload.chunks))
@@ -189,6 +192,7 @@ func (a *FlatAccnt) eventLoop() {
                                                },
                                        )
                                }
+                               fmt.Println("CODE evnt_add_chnks took", time.Since(pre).Nanoseconds())
                        case evnt_hit_chnk:
                                payload := event.pl.(*HitPayload)
                                a.lru.touch(
@@ -198,6 +202,7 @@ func (a *FlatAccnt) eventLoop() {
                                        },
                                )
                        case evnt_hit_chnks:
+                               pre := time.Now()
                                payload := event.pl.(*HitsPayload)
                                for _, chunk := range payload.chunks {
                                        a.lru.touch(
@@ -207,6 +212,7 @@ func (a *FlatAccnt) eventLoop() {
                                                },
                                        )
                                }
+                               fmt.Println("CODE evnt_hit_chnks took", time.Since(pre).Nanoseconds())

and run the workload described in
#931
(queries for 2y of data across 500 series) I get these durations:

awk '/evnt_add_chnks/ {print $4}' code2.txt |goplot hist
1187.00 -> 11160892.40: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇  850
           22320597.80: ▇▇  39 (upto 22 ms)
           33480303.20: ▇  17 (upto 33 ms)
           44640008.60: ▏  5 (upto 45 ms)
           55799714.00: ▏  5 (upto 56 ms)
           66959419.40: ▏  1 (upto 67 ms)
           78119124.80: ▏  0
           89278830.20: ▏  0
          100438535.60: ▏  0
          111598241.00: ▏  1 (upto 111 ms)

awk '/evnt_hit_chnks/ {print $4}' code2.txt |goplot hist
45.00 -> 22366478.50: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇  7624
         44732912.00: ▏  65 (upto 45 ms)
         67099345.50: ▏  29 (upto 67 ms)
         89465779.00: ▏  8 (upto 89 ms)
        111832212.50: ▏  1 (upto 112 ms)
        134198646.00: ▏  1 (upto 134 ms)
        156565079.50: ▏  0
        178931513.00: ▏  1 (upto 179 ms)
        201297946.50: ▏  0
        223664380.00: ▏  1 (upto 224 ms)
Dieterbe (Contributor, Author) commented Jun 14, 2018

The original problem is now addressed via #943, but this PR is still useful to merge, especially to lower cassandra CPU usage.

I just pushed these updates to this PR:

  • removed the "do not merge" commits that introduced and reverted MT_CHUNK_CACHE_MAX_SIZE: 0 and that adjusted the retentions for testing (retentions = 1s:6h:2min:2,30min:2y:6h:1 in docker/docker-dev-custom-cfg-kafka/storage-schemas.conf)
  • rebased to latest master

NOTE: when merging, we need to add nil checks for the new ccache methods introduced in #943.
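Roughly what I mean, as a sketch (types and field names are illustrative stand-ins; the assumption is that the chunk cache can be nil when disabled, so calls to the new #943 methods such as AddRange need guarding):

package apisketch

// illustrative stand-ins for the real types
type IterGen struct{}

type ChunkCache interface {
	AddRange(metric string, itergens []IterGen)
}

type Server struct {
	cache ChunkCache // nil when the chunk cache is disabled
}

func (s *Server) cacheResult(metric string, itergens []IterGen) {
	if s.cache == nil {
		return // chunk cache disabled: nothing to add
	}
	s.cache.AddRange(metric, itergens)
}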

@Dieterbe Dieterbe changed the title from "WIP Cassandra multiple rowkeys" to "refactor cassandra read queries" on Jun 14, 2018
@Dieterbe Dieterbe requested review from replay and woodsaj June 14, 2018 13:00
* CassandraStore.Search: instead of doing a query for each row,
  do a single query across all rows. With 4-week rows, long
  timerange queries would otherwise hit a lot of rows.
* outcome -> readResult
* no more need to sort readResults (there is only one); however,
  we must now sort all chunks coming back from Cassandra.
* change:
  store.cassandra.chunks_per_row to
  store.cassandra.chunks_per_response
* pre-prepared queries for better performance
* remove a bunch of unneeded cruft
* by keeping a few properties in the Table definition
  we can cleanup a bunch of code
const QueryFmtRead = "SELECT ts, data FROM %s WHERE key IN ? AND ts < ?"
const QueryFmtWrite = "INSERT INTO %s (key, ts, data) values(?,?,?) USING TTL ?"

// TTLTables stores table definitions keyed by their TTL
Review comment from a Contributor:

am i right that the TTLs are in hours? if so i'd specify that, otherwise it's very easy to assume that's seconds

Reply from Dieterbe (Contributor, Author):

they are not
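For context on the excerpt above, a rough sketch of the Table / TTLTables idea (only the query constants are taken from the diff; field names and the constructor are illustrative): each table keeps its name, its TTL and its pre-formatted read/write queries, so the rest of the code no longer has to build query strings:

package storesketch

import "fmt"

const (
	QueryFmtRead  = "SELECT ts, data FROM %s WHERE key IN ? AND ts < ?"
	QueryFmtWrite = "INSERT INTO %s (key, ts, data) values(?,?,?) USING TTL ?"
)

// Table bundles everything needed to talk to one Cassandra table.
type Table struct {
	Name       string
	QueryRead  string // QueryFmtRead with the table name substituted in
	QueryWrite string // QueryFmtWrite with the table name substituted in
	TTL        uint32
}

// TTLTables stores table definitions keyed by their TTL
type TTLTables map[uint32]Table

func newTable(name string, ttl uint32) Table {
	return Table{
		Name:       name,
		QueryRead:  fmt.Sprintf(QueryFmtRead, name),
		QueryWrite: fmt.Sprintf(QueryFmtWrite, name),
		TTL:        ttl,
	}
}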

@replay replay (Contributor) left a review:

LGTM

@Dieterbe Dieterbe merged commit d12b42f into master Jun 16, 2018
Dieterbe added a commit that referenced this pull request Jun 18, 2018 (same commit message as the Jun 14 commit above)
@Dieterbe Dieterbe mentioned this pull request Jun 29, 2018
@Dieterbe Dieterbe deleted the cass-multiple-rowkeys branch September 18, 2018 08:58