Add chatbot example using Qwen2 #282

Merged: robertknight merged 8 commits into main from qwen-chat on Jul 16, 2024
Conversation

@robertknight (Owner) commented Jul 15, 2024

Add a chatbot demo. This uses the Qwen2 model since it provides 0.5b and 1.5b sizes that are best-in-class, and its tokenization is conveniently a derivative of the GPT-2 tokenization, which rten-text supports. Larger models would produce better results, but since RTen only supports fp32 precision at present, such models become slow due to memory bandwidth requirements. 1.5b is about the largest size that produces "usable" speed on my Intel i5.

In the process it was necessary to add some capabilities to rten-generate and rten-text to better support chat-like applications:

  • Support adding user input to the model input after the initial generation, via Generator::append_prompt
  • Fix generation of the attention_mask input. It is supposed to have the length of the full input sequence, not just the new input IDs. Previously it was incorrect after the initial step, but the code worked because on subsequent steps the mask had size 1, which gets broadcast to whatever size is required.
  • Support multiple stop tokens. Qwen2 uses <|im_end|> and <|endoftext|>
  • Support temperature in top-K sampling (see the sketch after this list)
  • Add fake support for NFC normalization in text encoding
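
To make the temperature change concrete, here is a minimal standalone sketch of temperature-scaled top-K sampling. This is illustrative only, not rten-generate's actual implementation; it assumes non-empty, finite logits and a temperature greater than zero:

```rust
/// Sample a token index from `logits` using top-K sampling at the given
/// temperature. `uniform` is a random draw in [0, 1).
fn sample_top_k(logits: &[f32], k: usize, temperature: f32, uniform: f32) -> usize {
    // Temperature scaling: values < 1.0 sharpen the distribution (more
    // deterministic), values > 1.0 flatten it (more random).
    let scaled: Vec<f32> = logits.iter().map(|&l| l / temperature).collect();

    // Indices of the K largest scaled logits, best first.
    let mut idx: Vec<usize> = (0..scaled.len()).collect();
    idx.sort_by(|&a, &b| scaled[b].partial_cmp(&scaled[a]).unwrap());
    idx.truncate(k.max(1));

    // Softmax over the retained logits, subtracting the max for stability.
    let max = scaled[idx[0]];
    let exps: Vec<f32> = idx.iter().map(|&i| (scaled[i] - max).exp()).collect();
    let total: f32 = exps.iter().sum();

    // Invert the CDF at `uniform` to pick an index.
    let mut acc = 0.0;
    for (pos, &e) in exps.iter().enumerate() {
        acc += e / total;
        if uniform < acc {
            return idx[pos];
        }
    }
    *idx.last().unwrap()
}
```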

TODO

  • Tests for append_prompt
  • Replace dummy NFC normalization with proper implementation (Deferred for later)
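
For reference, Qwen2's instruction-tuned models use a ChatML-style turn format, which is what the demo's prompt construction needs to produce and why `<|im_end|>` matters as a stop token. A sketch of one turn (the system prompt shown is illustrative):

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant
```

Generation runs from the final `<|im_start|>assistant` line until the model emits `<|im_end|>` or `<|endoftext|>`.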

Commit messages

Zero is a value that is "safe" for most inputs with an `_ids` suffix, such as `position_ids`.

Fix generation of the `attention_mask` input. In HuggingFace models this input's sequence axis is expected to have the same size as the sequence length, not the length of the input IDs being provided at the current step. The length was correct for the initial prompt but wrong for subsequent generation steps. However, when only one new token was added during iterative decoding, the `attention_mask` worked despite being the wrong size, because 1-sized inputs are broadcast by various operators. When appending multiple tokens after the initial generation, such as when adding a tokenized chat message from a user, this broadcasting failed.
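
The fix amounts to sizing the mask to the whole sequence processed so far. A minimal sketch (illustrative, not the actual rten-generate code):

```rust
/// Build the `attention_mask` for a generation step. It must span the full
/// sequence processed so far (prompt + generated + appended tokens), not
/// just the IDs fed at the current step.
fn attention_mask(total_seq_len: usize) -> Vec<i32> {
    vec![1; total_seq_len] // attend to every position
}
```

With the old behaviour, a step that fed one new token produced a length-1 mask, which happened to broadcast; feeding a multi-token chat message produced a short mask against a longer sequence, which does not broadcast.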
Fix a design mistake where using `stop_on_token` would cause generation to stop silently without propagating the error to the caller.

Support multiple end-of-turn token IDs. Qwen2, for example, can emit either `<|endoftext|>` or `<|im_end|>` tokens.
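
A sketch of what multi-stop-token handling looks like in a decode loop (standalone illustration; `decode_until_stop` is a hypothetical helper, not part of rten-generate):

```rust
use std::collections::HashSet;

/// Collect sampled tokens until any configured end-of-turn ID appears
/// (e.g. Qwen2's `<|im_end|>` or `<|endoftext|>`) or a length cap is hit.
fn decode_until_stop(
    mut next_token: impl FnMut() -> u32,
    stop_ids: &HashSet<u32>,
    max_tokens: usize,
) -> Vec<u32> {
    let mut out = Vec::new();
    for _ in 0..max_tokens {
        let tok = next_token();
        if stop_ids.contains(&tok) {
            break; // end of turn
        }
        out.push(tok);
    }
    out
}
```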
Support appending tokens to the model input after the initial generation, via `Generator::append_prompt`. This is useful in chat applications, for example, where generation alternates between iterative decoding and feeding in tokenized user input.
Add fake support for NFC normalization in text encoding. This allows loading `tokenizer.json` files which specify NFC normalization, but doesn't actually implement the normalization yet. This is OK as long as the input text doesn't require it.
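
The stop-gap boils down to accepting the NFC entry when parsing `tokenizer.json` but applying an identity transform. A sketch of the idea (`nfc_normalize` is a hypothetical name):

```rust
/// Stand-in for NFC normalization: returns the input unchanged. Safe only
/// for text that is already in NFC form, which plain ASCII always is.
fn nfc_normalize(text: &str) -> String {
    text.to_string() // TODO: replace with a real NFC implementation
}
```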
Qwen2 was chosen as an initial chatbot example because its tokenization is very
similar to the already-supported GPT-2 and it is one of the best very small
instruction-tuned models.
robertknight marked this pull request as ready for review Jul 16, 2024 06:38
robertknight merged commit 5ef6e67 into main Jul 16, 2024
2 checks passed
robertknight deleted the qwen-chat branch Jul 16, 2024 06:39