
[Relay][Quantization] KL-divergence-based per-layer calibration #3538

Merged: 12 commits merged into apache:master from the feature/calibration_v2 branch on Aug 2, 2019

Conversation

@vinx13 (Member) commented Jul 12, 2019

  • KL divergence algorithm ported from the MXNet quantization code (a simplified sketch appears below, after the evaluation-code link)
  • CollectStats pass that collects the input of each simulated_quantize in the annotated graph into a tuple output
  • Support for floating-point scales
  • max_scale as an alternative to power2_scale in weight quantization

Evaluation code

https://gist.github.com/vinx13/6f1eb1f9e2c0a8786149ee881bfcd6aa
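
For reference, here is a heavily simplified Python sketch of the MXNet-style KL-divergence threshold search. The function name, defaults, and the uniform expansion step are illustrative; the version ported in this PR (python/tvm/relay/quantize/kl_divergence.py) also handles zero bins and converts the chosen threshold into a quantization scale.

```python
# Heavily simplified sketch of the MXNet-style KL-divergence threshold
# search this PR ports (see python/tvm/relay/quantize/kl_divergence.py).
# Names and defaults are illustrative; the real version also handles
# zero bins and converts the threshold into a quantization scale.
import numpy as np
import scipy.stats

def find_threshold_by_kl(samples, num_bins=2048, num_quantized_bins=255):
    """Pick the |x| clipping threshold whose requantized histogram has
    minimal KL divergence from the clipped reference histogram."""
    absmax = float(np.abs(samples).max())
    hist, edges = np.histogram(np.abs(samples), bins=num_bins, range=(0.0, absmax))
    best_kl, best_threshold = float("inf"), absmax

    for i in range(num_quantized_bins, num_bins + 1):
        # Reference: clip all mass beyond bin i into the last kept bin.
        ref = hist[:i].astype(np.float64)
        ref[-1] += hist[i:].sum()

        # Candidate: requantize the first i bins into num_quantized_bins
        # levels, then expand uniformly back to i bins for comparison.
        cuts = np.linspace(0, i, num_quantized_bins + 1).astype(int)
        cand = np.zeros(i, dtype=np.float64)
        for j in range(num_quantized_bins):
            lo, hi = cuts[j], cuts[j + 1]
            if hi > lo:
                cand[lo:hi] = hist[lo:hi].sum() / (hi - lo)

        # Smooth to avoid zero-probability bins, then compare KL(ref || cand).
        kl = scipy.stats.entropy(ref + 1e-6, cand + 1e-6)
        if kl < best_kl:
            best_kl, best_threshold = kl, edges[i]

    return best_threshold  # e.g. scale = threshold / 127 for int8
```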

What's left:

  • I added QAnnotateKind.BIAS. I'm not sure whether it is necessary. Currently there are a few tricks in handling bias (nbit_bias, valid_range, ...). It would be good to find a better solution that avoids these tricks.
  • In my evaluation script, I had to write the quantization workflow myself (optimize, annotate, calibrate, realize); see the sketch after this list. Please also share your thoughts on the design of the calibrate function. We need to decide how users can specify the different quantization schemes (max, power2, KLD, ...).
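
For context, a rough sketch of the manual pipeline the evaluation script follows. Treat the entry points and signatures as assumptions standing in for the internal passes, not a stable public API.

```python
# Rough pseudocode of the manual quantization workflow from the
# evaluation script. Function names and signatures are assumptions
# standing in for the internal optimize/annotate/calibrate/realize
# passes, not a stable API.
from tvm.relay import quantize as qtz

def quantize_manually(mod, params, profile_data):
    mod = qtz.optimize(mod, params)         # pre-quantization simplifications
    mod = qtz.annotate(mod)                 # insert simulated_quantize ops
    mod = qtz.calibrate(mod, profile_data)  # choose scales, e.g. via KL search
    return qtz.realize(mod)                 # lower to integer arithmetic
```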

Evaluation result on ImageNet:

max_scale for weights, KL divergence for activations (top-1 / top-5 accuracy):

resnet18_v1, 0.70642 / 0.89702
resnet50_v1, 0.73682 / 0.91664
resnet101_v1, 0.74484 / 0.9208
resnet18_v2, 0.70794 / 0.89832
resnet50_v2, 0.7691 / 0.93268
resnet101_v2, 0.78204 / 0.94124

power2 for weights, KL divergence restricted to power-of-two values for activations (use the --eval-power2 option in my evaluation script); top-1 / top-5 accuracy:

resnet18_v1, 0.70332 / 0.89526
resnet50_v1, 0.73426 / 0.9146
resnet101_v1, 0.72434 / 0.91058
resnet18_v2, 0.70314 / 0.89618
resnet50_v2, 0.76486 / 0.93108
resnet101_v2, 0.78066 / 0.94002

These experiments were run with opt_level=2. With opt_level=3, FoldScaleAxis might generate outliers in the bias vector and cause significant accuracy drops. In that case we should choose the bias scale differently instead of taking the maximum; one possibility is sketched below.
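
One possible mitigation (my illustration, not something this PR implements) is to derive the bias scale from a high percentile of |bias| rather than its maximum, so a few FoldScaleAxis-induced outliers cannot inflate the range:

```python
# Hypothetical alternative to max-based bias scaling: a high percentile
# of |bias| keeps isolated FoldScaleAxis outliers from dominating the
# range. Illustration only, not what the PR implements.
import numpy as np

def bias_scale(bias, nbit_bias=32, percentile=99.99):
    bound = np.percentile(np.abs(bias), percentile)
    return bound / (2 ** (nbit_bias - 1) - 1)
```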

cc @tqchen @ZihengJiang @eqy @ajtulloch @antinucleon @FrozenGene

@vinx13 marked this pull request as ready for review on July 16, 2019.
@vinx13 (Member Author) commented Jul 16, 2019

This one is ready. Please review and share your thoughts on the calibration API design.

@ZihengJiang self-assigned this on Jul 17, 2019.
Review thread: python/tvm/relay/quantize/quantize.py (resolved)
Review thread: src/relay/pass/quantize.cc (outdated, resolved)
// =============
// calibration

class StatsCollector : private ExprMutator {
Contributor:

If we just collect stats, an ExprVisitor should be enough.

@vinx13 (Member Author):

ExprMutator is actually needed: this mutator transforms the annotated expr back to the original expr by removing each simulated_quantize. For example, given the Relay program:

%1 = ..
%2 = simulate_quantize(%1)
%3 = op(%2)
%4 = simulate_quantize(%3)

We need to profile %1 and %3. But %3 takes %2 as input, so we need to replace the input of %3 with %1 (the simulated_quantize in %2 inserted by the Annotate pass is not in passthrough mode, so we must either remove it or rewrite it in passthrough mode). A rough Python sketch of the idea follows.
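
A Python analogue of the C++ pass, for illustration only; the class name and op-name string are assumptions:

```python
# Python analogue of the C++ StatsCollector, for illustration only:
# strip each simulated_quantize so its input flows through at full
# precision, and remember those inputs for profiling. The op-name
# string is an assumption.
from tvm.relay import ExprMutator

class StripSimulatedQuantize(ExprMutator):
    def __init__(self):
        super().__init__()
        self.profile_exprs = []  # the inputs to profile (%1, %3 above)

    def visit_call(self, call):
        new_call = super().visit_call(call)
        op_name = getattr(new_call.op, "name", None)
        if op_name == "relay.op.annotation.simulated_quantize":
            self.profile_exprs.append(new_call.args[0])
            return new_call.args[0]  # pass the input through unquantized
        return new_call

# The collected exprs can then be grouped into a single Tuple output so
# one forward run produces every intermediate tensor to profile.
```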

@vinx13 (Member Author):

@ZihengJiang I was thinking that the other PR #3543 actually breaks this pass (because the result of this pass contains annotations and casts).

@ZihengJiang (Contributor) commented Jul 23, 2019:

@vinx13 Why not collect stats before the annotate pass?

@vinx13 (Member Author):

@ZihengJiang Annotations tell us which nodes should be profiled. If we collected stats before annotation, we would have to duplicate logic similar to the annotate pass just to decide which nodes should be quantized.

Contributor:

Okay, let's keep the current way. #3543 will not break this pass, since annotation.cast_hint and annotation.stop_fusion do not change the running result; they are just annotations and can be viewed as identity. One thing, though: instead of detecting and skipping simulated_quantize inside the IRMutator, let's add an option like simulated_quantize(kind=kIdentity) to eliminate the impact of simulated_quantize.

@vinx13 (Member Author):

@ZihengJiang Updated.

Review thread: src/relay/pass/quantize.cc (outdated, resolved)
Review thread: src/relay/pass/quantize.cc (resolved)
Review thread: python/tvm/relay/quantize/quantize.py (resolved)
@ZihengJiang (Contributor) commented:

@vinx13 Could you please address the other comments?
We can change the calibrate API the way you did for now. In the long term, we should consider something like calibrate(graph, mod, ctx, fcalibrate), where fcalibrate(sq_op, stats) is a callback function provided by the user; a sketch of that shape follows.
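
A sketch of what such a user callback might look like, following the names in the comment above (calibrate, fcalibrate, sq_op, stats); this is the proposed design, not the merged API, and the max-based rule is just an example:

```python
# Sketch of the proposed calibration callback: `sq_op` identifies one
# simulated_quantize op, `stats` holds its profiled input tensors, and
# the return value is the chosen scale. Design proposal only, not the
# merged API.
import numpy as np

def fcalibrate(sq_op, stats):
    flat = np.concatenate([np.abs(s).ravel() for s in stats])
    return float(flat.max()) / 127.0  # simple max-based int8 scale

# hypothetical usage: qgraph = calibrate(graph, mod, ctx, fcalibrate)
```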

@vinx13 force-pushed the feature/calibration_v2 branch 2 times, most recently from 5f0406e to 16b27d4, on July 22, 2019.
@tqchen (Member) commented Jul 24, 2019
@ZihengJiang @vinx13 please follow up on this and let us merge soon.

@zhenhuaw-me (Contributor) left a review:

I left some comments; please ping me if I have misunderstood anything.

Review thread: python/tvm/relay/quantize/kl_divergence.py (resolved)
Review thread: python/tvm/relay/quantize/kl_divergence.py (resolved)
Review thread: python/tvm/relay/quantize/kl_divergence.py (outdated, resolved)
@tqchen (Member) commented Aug 1, 2019

@zhenhuaw-me (Contributor) left a review:

Basically LGTM.

@ZihengJiang merged commit 33ab3c6 into apache:master on Aug 2, 2019.
@tqchen mentioned this pull request on Aug 2, 2019.
wweic pushed a commit to wweic/tvm that referenced this pull request on Aug 9, 2019:
[Relay][Quantization] KL-divergence-based per-layer calibration (apache#3538)

* [Relay][Quantization] Support floating-point scale

* [Relay][Quantization] KL-divergence calibration on dataset

* Fix unhandled LeftShift case in QuantizeRealize

* Fix lint

* drop QBias

* fix lint

* address comments

* address comments

* Update comments

* address comments

* lint

* kQIdentity = 0
wweic pushed a commit to neo-ai/tvm that referenced this pull request on Sep 6, 2019, with the same squashed commit list as above.