
Add TrOCR example #304

Merged

robertknight merged 3 commits from trocr-example into main on Aug 23, 2024

Conversation

robertknight (Owner) commented Aug 13, 2024

This is a vision-to-text example with a very similar structure to the DistilViT example.
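
For orientation, the overall flow is sketched below: the encoder runs once per image, then the decoder extends the token sequence one step at a time until it emits EOS. The types, token ids, and method names here are illustrative stand-ins, not the example's actual rten / rten-generate API.

```rust
// Sketch of the TrOCR vision-to-text flow. All names are illustrative.

struct Encoder;
struct Decoder;

impl Encoder {
    /// Map the preprocessed image to a sequence of hidden states.
    fn run(&self, _pixels: &[f32]) -> Vec<f32> {
        vec![0.0; 768] // placeholder hidden states
    }
}

impl Decoder {
    /// Given encoder hidden states and the tokens generated so far,
    /// return the next token id.
    fn next_token(&self, _encoder_states: &[f32], _tokens: &[u32]) -> u32 {
        2 // placeholder: pretend the model always emits EOS
    }
}

fn generate(encoder: &Encoder, decoder: &Decoder, pixels: &[f32]) -> Vec<u32> {
    const BOS: u32 = 0; // placeholder special-token ids
    const EOS: u32 = 2;

    // The encoder runs exactly once per image...
    let states = encoder.run(pixels);

    // ...then the decoder loop extends the token sequence one id at a time.
    let mut tokens = vec![BOS];
    loop {
        let next = decoder.next_token(&states, &tokens);
        tokens.push(next);
        if next == EOS || tokens.len() > 100 {
            break;
        }
    }
    tokens
}

fn main() {
    let ids = generate(&Encoder, &Decoder, &vec![0.0; 384 * 384 * 3]);
    println!("generated {} tokens", ids.len());
}
```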

Different TrOCR model sizes use different tokenizers. This example works with the base model, which uses a BPE tokenizer, but not the small model, which uses a unigram tokenizer.
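
A quick way to check which tokenizer family a checkpoint uses is to read the `model.type` field of its `tokenizer.json`. A minimal sketch using `serde_json`, assuming the checkpoint ships a standard Hugging Face `tokenizer.json`:

```rust
use serde_json::Value;

// Return the tokenizer family declared in a tokenizer.json document,
// e.g. "BPE" for trocr-base or "Unigram" for trocr-small.
fn tokenizer_type(json: &str) -> Option<String> {
    let doc: Value = serde_json::from_str(json).ok()?;
    doc.get("model")?
        .get("type")?
        .as_str()
        .map(|s| s.to_string())
}

fn main() {
    let json = r#"{"model": {"type": "BPE", "vocab": {}, "merges": []}}"#;
    assert_eq!(tokenizer_type(json).as_deref(), Some("BPE"));
}
```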

Compared to Ocrs, the TrOCR models are much larger and thus slower to execute. However, the larger models also have more capacity.

TODO:

  • Implement the If operator. This will allow using the "merged" decoder model from Optimum (decoder_model_merged.onnx), which is faster than using the cache-less model (decoder_model.onnx) alone and more size-efficient than using separate models for the initial run and subsequent runs. See Implement If operator #306, and the first sketch after this list.
  • Support cross-attention KV-caches in rten-generate. These are the past_key_values.{layer}.encoder.{key,value} inputs that Optimum uses. Unlike self-attention KV-caches, these are generated once when the encoder is first run and are skipped in subsequent runs (Support cross-attention key-value caches in rten-generate #318). The second sketch after this list illustrates the difference.
  • Investigate why the LayerNormalization op is not fused in the decoder.
    • The problem is that the "shift and scale" pattern in fuse_layer_norm doesn't match because it expects the arguments to the Add and Mul operators to be constants. In this model they are value nodes that capture constants defined in the parent graph. The third sketch after this list shows the mismatch.
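
First, a conceptual view of what the merged decoder's If operator does: a boolean input selects between a cache-less subgraph and a with-cache subgraph, so one file serves both the first decoding step and later steps. The subgraph names and the plain `Vec<f32>` tensors are illustrative, not rten's graph API.

```rust
// Conceptual model of the ONNX `If` operator used by
// decoder_model_merged.onnx: one boolean input picks a branch subgraph.

struct Subgraph {
    name: &'static str,
}

impl Subgraph {
    fn run(&self, inputs: &[Vec<f32>]) -> Vec<f32> {
        println!("running {} with {} inputs", self.name, inputs.len());
        Vec::new() // placeholder outputs
    }
}

/// `If` evaluates exactly one of its two branch subgraphs.
fn if_op(
    cond: bool,
    then_branch: &Subgraph,
    else_branch: &Subgraph,
    inputs: &[Vec<f32>],
) -> Vec<f32> {
    if cond {
        then_branch.run(inputs)
    } else {
        else_branch.run(inputs)
    }
}

fn main() {
    let with_past = Subgraph { name: "decoder_with_past" };
    let no_past = Subgraph { name: "decoder_no_past" };

    // First step: no cache exists yet, so take the cache-less branch.
    if_op(false, &with_past, &no_past, &[]);
    // Later steps: the cache-selection input is true, so reuse the KV-cache.
    if_op(true, &with_past, &no_past, &[vec![0.0; 4]]);
}
```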
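
Second, a sketch contrasting the two cache kinds: self-attention entries are appended every step, while cross-attention entries are derived from the encoder output once and then reused unchanged. Struct and field names are hypothetical, and the "projections" are stand-ins.

```rust
// Per-layer KV-cache, simplified. Self-attention caches grow by one entry per
// generated token; cross-attention caches are filled once and never updated.

#[derive(Default)]
struct LayerCache {
    // past_key_values.{layer}.decoder.{key,value}: extended every step.
    self_attn: Vec<(Vec<f32>, Vec<f32>)>,
    // past_key_values.{layer}.encoder.{key,value}: filled on the first step.
    cross_attn: Option<(Vec<f32>, Vec<f32>)>,
}

impl LayerCache {
    fn step(&mut self, encoder_states: &[f32], new_key: Vec<f32>, new_value: Vec<f32>) {
        // Cross-attention K/V depend only on the encoder output, so compute
        // them once and skip the projection on later steps.
        if self.cross_attn.is_none() {
            let k = encoder_states.to_vec(); // stand-in for a K projection
            let v = encoder_states.to_vec(); // stand-in for a V projection
            self.cross_attn = Some((k, v));
        }
        // Self-attention K/V depend on the tokens generated so far, so append
        // this step's entries.
        self.self_attn.push((new_key, new_value));
    }
}

fn main() {
    let mut cache = LayerCache::default();
    let enc = vec![0.1_f32; 8];
    cache.step(&enc, vec![1.0], vec![2.0]); // first step: fills cross_attn
    cache.step(&enc, vec![3.0], vec![4.0]); // later step: cross_attn reused
    assert_eq!(cache.self_attn.len(), 2);
}
```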
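
Third, the fusion problem reduced to its core: a matcher that accepts only constant operands rejects operands that are value nodes capturing parent-graph constants. The `Operand` enum is illustrative, not rten's actual graph representation.

```rust
// Why the layer-norm fusion fails, in miniature.

enum Operand {
    Constant(Vec<f32>),
    // A value node capturing a constant defined in the parent graph.
    CapturedValue { name: String },
}

/// Strict check, as in the current fuse_layer_norm "shift and scale" pattern.
fn is_constant(op: &Operand) -> bool {
    matches!(op, Operand::Constant(_))
}

/// A fix would also need to resolve captures back to parent-graph constants
/// instead of giving up on them.
fn is_constant_or_captured(op: &Operand) -> bool {
    matches!(op, Operand::Constant(_) | Operand::CapturedValue { .. })
}

fn main() {
    let scale = Operand::CapturedValue { name: "ln.weight".into() };
    assert!(!is_constant(&scale)); // pattern match fails today
    assert!(is_constant_or_captured(&scale)); // what a fix needs to accept
}
```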

Support loading tokenizers which contain entries in the `vocab` map that do not
appear in either `merges` or `added_tokens`. The TrOCR base model on Hugging
Face (https://huggingface.co/microsoft/trocr-base-printed) has an
`<|endoftext|>` token in the vocab which does not appear in the `merges` or
`added_tokens` fields.
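
A minimal sketch of the behavior this change allows, with simplified data shapes (the real loader in rten-text differs): vocab entries missing from both `merges` and `added_tokens` are kept rather than rejected, so ids like `<|endoftext|>` resolve correctly.

```rust
use std::collections::HashMap;

// Build the token-to-id map from a BPE tokenizer definition, keeping entries
// that appear only in `vocab`. Shapes and names here are hypothetical.
fn build_token_map(
    vocab: &HashMap<String, u32>,
    merge_tokens: &[String],
    added_tokens: &[String],
) -> HashMap<String, u32> {
    let mut ids = HashMap::new();
    for (token, &id) in vocab {
        let known = merge_tokens.contains(token) || added_tokens.contains(token);
        if !known {
            // These entries are now accepted instead of failing the load.
            eprintln!("vocab-only token: {token}");
        }
        ids.insert(token.clone(), id);
    }
    ids
}

fn main() {
    let vocab = HashMap::from([
        ("he".to_string(), 0),
        ("llo".to_string(), 1),
        ("<|endoftext|>".to_string(), 2),
    ]);
    let merges = vec!["he".to_string(), "llo".to_string()];
    let map = build_token_map(&vocab, &merges, &[]);
    assert_eq!(map["<|endoftext|>"], 2);
}
```
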
@robertknight robertknight marked this pull request as ready for review August 23, 2024 08:05
@robertknight robertknight merged commit 09e8d19 into main Aug 23, 2024
2 checks passed
@robertknight robertknight deleted the trocr-example branch August 23, 2024 08:05