
[Relay][Quantization] KL-divergence-based per-layer calibration #3538

Merged: 12 commits merged into apache:master from the feature/calibration_v2 branch on Aug 2, 2019

Conversation

@vinx13 (Member) commented Jul 12, 2019

  • KL divergence algorithm ported from the MXNet quantization code (a simplified sketch appears below, after the evaluation-code link)
  • CollectStats pass that collects the input of each simulated_quantize in the annotated graph into a tuple output
  • Support for floating-point scales
  • max_scale as an alternative to power2_scale in weight quantization

Evaluation code

https://gist.github.com/vinx13/6f1eb1f9e2c0a8786149ee881bfcd6aa
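
For reference, here is a heavily simplified Python sketch of the MXNet-style KL-divergence threshold search. The function name, defaults, and the uniform expansion step are illustrative; the version ported in this PR (python/tvm/relay/quantize/kl_divergence.py) also handles zero bins and converts the chosen threshold into a quantization scale.

```python
# Heavily simplified sketch of the MXNet-style KL-divergence threshold
# search this PR ports (see python/tvm/relay/quantize/kl_divergence.py).
# Names and defaults are illustrative; the real version also handles
# zero bins and converts the threshold into a quantization scale.
import numpy as np
import scipy.stats

def find_threshold_by_kl(samples, num_bins=2048, num_quantized_bins=255):
    """Pick the |x| clipping threshold whose requantized histogram has
    minimal KL divergence from the clipped reference histogram."""
    absmax = float(np.abs(samples).max())
    hist, edges = np.histogram(np.abs(samples), bins=num_bins, range=(0.0, absmax))
    best_kl, best_threshold = float("inf"), absmax

    for i in range(num_quantized_bins, num_bins + 1):
        # Reference: clip all mass beyond bin i into the last kept bin.
        ref = hist[:i].astype(np.float64)
        ref[-1] += hist[i:].sum()

        # Candidate: requantize the first i bins into num_quantized_bins
        # levels, then expand uniformly back to i bins for comparison.
        cuts = np.linspace(0, i, num_quantized_bins + 1).astype(int)
        cand = np.zeros(i, dtype=np.float64)
        for j in range(num_quantized_bins):
            lo, hi = cuts[j], cuts[j + 1]
            if hi > lo:
                cand[lo:hi] = hist[lo:hi].sum() / (hi - lo)

        # Smooth to avoid zero-probability bins, then compare KL(ref || cand).
        kl = scipy.stats.entropy(ref + 1e-6, cand + 1e-6)
        if kl < best_kl:
            best_kl, best_threshold = kl, edges[i]

    return best_threshold  # e.g. scale = threshold / 127 for int8
```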

What's left:

  • I added QAnnotateKind.BIAS. I'm not sure whether it is necessary. Currently there are a few tricks in handling bias (nbit_bias, valid_range, ...). It would be good to find a better solution that avoids these tricks.
  • In my evaluation script, I had to write the quantization workflow myself (optimize, annotate, calibrate, realize); see the sketch after this list. Please also share your thoughts on the design of the calibrate function. We need to decide how users can specify the different quantization schemes (max, power2, KLD, ...).
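
For context, a rough sketch of the manual pipeline the evaluation script follows. Treat the entry points and signatures as assumptions standing in for the internal passes, not a stable public API.

```python
# Rough pseudocode of the manual quantization workflow from the
# evaluation script. Function names and signatures are assumptions
# standing in for the internal optimize/annotate/calibrate/realize
# passes, not a stable API.
from tvm.relay import quantize as qtz

def quantize_manually(mod, params, profile_data):
    mod = qtz.optimize(mod, params)         # pre-quantization simplifications
    mod = qtz.annotate(mod)                 # insert simulated_quantize ops
    mod = qtz.calibrate(mod, profile_data)  # choose scales, e.g. via KL search
    return qtz.realize(mod)                 # lower to integer arithmetic
```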

Evaluation result on ImageNet:

max_scale for weights, KL divergence for activations (top-1 / top-5 accuracy):

resnet18_v1, 0.70642 / 0.89702
resnet50_v1, 0.73682 / 0.91664
resnet101_v1, 0.74484 / 0.9208
resnet18_v2, 0.70794 / 0.89832
resnet50_v2, 0.7691 / 0.93268
resnet101_v2, 0.78204 / 0.94124

power2 for weights, KL divergence restricted to power-of-two values for activations (use the --eval-power2 option in my evaluation script); top-1 / top-5 accuracy:

resnet18_v1, 0.70332 / 0.89526
resnet50_v1, 0.73426 / 0.9146
resnet101_v1, 0.72434 / 0.91058
resnet18_v2, 0.70314 / 0.89618
resnet50_v2, 0.76486 / 0.93108
resnet101_v2, 0.78066 / 0.94002

These experiments were run with opt_level=2. With opt_level=3, FoldScaleAxis might generate outliers in the bias vector and cause significant accuracy drops. In that case we should choose the bias scale differently instead of taking the maximum; one possibility is sketched below.
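
One possible mitigation (my illustration, not something this PR implements) is to derive the bias scale from a high percentile of |bias| rather than its maximum, so a few FoldScaleAxis-induced outliers cannot inflate the range:

```python
# Hypothetical alternative to max-based bias scaling: a high percentile
# of |bias| keeps isolated FoldScaleAxis outliers from dominating the
# range. Illustration only, not what the PR implements.
import numpy as np

def bias_scale(bias, nbit_bias=32, percentile=99.99):
    bound = np.percentile(np.abs(bias), percentile)
    return bound / (2 ** (nbit_bias - 1) - 1)
```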

cc @tqchen @ZihengJiang @eqy @ajtulloch @antinucleon @FrozenGene

@vinx13 marked this pull request as ready for review on July 16, 2019.
@vinx13 (Member Author) commented Jul 16, 2019

This one is ready. Please review and share your thoughts on the calibration API design.

@ZihengJiang self-assigned this on Jul 17, 2019.
Review thread: python/tvm/relay/quantize/quantize.py (resolved)
Review thread: src/relay/pass/quantize.cc (outdated, resolved)
// =============
// calibration

class StatsCollector : private ExprMutator {
Contributor:

If we just collect stats, an ExprVisitor should be enough.

@vinx13 (Member Author):

ExprMutator is actually needed: this mutator transforms the annotated expr back to the original expr by removing each simulated_quantize. For example, given the Relay program:

%1 = ..
%2 = simulate_quantize(%1)
%3 = op(%2)
%4 = simulate_quantize(%3)

We need to profile %1 and %3. But %3 takes %2 as input, so we need to replace the input of %3 with %1 (the simulated_quantize in %2 inserted by the Annotate pass is not in passthrough mode, so we must either remove it or rewrite it in passthrough mode). A rough Python sketch of the idea follows.
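
A Python analogue of the C++ pass, for illustration only; the class name and op-name string are assumptions:

```python
# Python analogue of the C++ StatsCollector, for illustration only:
# strip each simulated_quantize so its input flows through at full
# precision, and remember those inputs for profiling. The op-name
# string is an assumption.
from tvm.relay import ExprMutator

class StripSimulatedQuantize(ExprMutator):
    def __init__(self):
        super().__init__()
        self.profile_exprs = []  # the inputs to profile (%1, %3 above)

    def visit_call(self, call):
        new_call = super().visit_call(call)
        op_name = getattr(new_call.op, "name", None)
        if op_name == "relay.op.annotation.simulated_quantize":
            self.profile_exprs.append(new_call.args[0])
            return new_call.args[0]  # pass the input through unquantized
        return new_call

# The collected exprs can then be grouped into a single Tuple output so
# one forward run produces every intermediate tensor to profile.
```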

@vinx13 (Member Author):

@ZihengJiang I was thinking that the other PR #3543 actually breaks this pass (because the result of this pass contains annotations and casts).

@ZihengJiang (Contributor) commented Jul 23, 2019:

@vinx13 Why not collect stats before the annotate pass?

@vinx13 (Member Author):

@ZihengJiang Annotations tell us which nodes should be profiled. If we collected stats before annotation, we would have to duplicate logic similar to the annotate pass just to decide which nodes should be quantized.

Contributor:

Okay, let's keep the current way. #3543 will not break this pass, since annotation.cast_hint and annotation.stop_fusion do not change the running result; they are just annotations and can be viewed as identity. One thing, though: instead of detecting and skipping simulated_quantize inside the IRMutator, let's add an option like simulated_quantize(kind=kIdentity) to eliminate the impact of simulated_quantize.

@vinx13 (Member Author):

@ZihengJiang Updated.

Review thread: src/relay/pass/quantize.cc (outdated, resolved)
Review thread: src/relay/pass/quantize.cc (resolved)
Review thread: python/tvm/relay/quantize/quantize.py (resolved)
@ZihengJiang (Contributor) commented:

@vinx13 Could you please address the other comments?
We can change the calibrate API the way you did for now. In the long term, we should consider something like calibrate(graph, mod, ctx, fcalibrate), where fcalibrate(sq_op, stats) is a callback function provided by the user; a sketch of that shape follows.
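
A sketch of what such a user callback might look like, following the names in the comment above (calibrate, fcalibrate, sq_op, stats); this is the proposed design, not the merged API, and the max-based rule is just an example:

```python
# Sketch of the proposed calibration callback: `sq_op` identifies one
# simulated_quantize op, `stats` holds its profiled input tensors, and
# the return value is the chosen scale. Design proposal only, not the
# merged API.
import numpy as np

def fcalibrate(sq_op, stats):
    flat = np.concatenate([np.abs(s).ravel() for s in stats])
    return float(flat.max()) / 127.0  # simple max-based int8 scale

# hypothetical usage: qgraph = calibrate(graph, mod, ctx, fcalibrate)
```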

@vinx13 force-pushed the feature/calibration_v2 branch 2 times, most recently from 5f0406e to 16b27d4, on July 22, 2019.
@tqchen (Member) commented Jul 24, 2019
@ZihengJiang @vinx13 please follow up on this and let us merge soon.

@zhenhuaw-me (Contributor) left a review:

I left some comments; please ping me if I have misunderstood anything.

Review thread: python/tvm/relay/quantize/kl_divergence.py (resolved)
Review thread: python/tvm/relay/quantize/kl_divergence.py (resolved)
Review thread: python/tvm/relay/quantize/kl_divergence.py (outdated, resolved)
@tqchen (Member) commented Aug 1, 2019

@zhenhuaw-me (Contributor) left a review:

Basically LGTM.

@ZihengJiang merged commit 33ab3c6 into apache:master on Aug 2, 2019.
@tqchen mentioned this pull request on Aug 2, 2019.
wweic pushed a commit to wweic/tvm that referenced this pull request on Aug 9, 2019:
[Relay][Quantization] KL-divergence-based per-layer calibration (apache#3538)

* [Relay][Quantization] Support floating-point scale

* [Relay][Quantization] KL-divergence calibration on dataset

* Fix unhandled LeftShift case in QuantizeRealize

* Fix lint

* drop QBias

* fix lint

* address comments

* address comments

* Update comments

* address comments

* lint

* kQIdentity = 0
wweic pushed a commit to neo-ai/tvm that referenced this pull request on Sep 6, 2019, with the same squashed commit list as above.