
Batch Inference #17

Open
Dhan800 opened this issue Oct 11, 2023 · 2 comments

Dhan800 commented Oct 11, 2023

Thanks for your hard work. I tried to conduct batch inference but encountered some errors. My code looks like:
prompts = tokenizer(test_dataset, return_tensors='pt', padding=True, truncation=True)
gen_tokens = model.generate(
    **prompts,
    do_sample=False,
    max_new_tokens=30,
)
gen_text = tokenizer.batch_decode(gen_tokens, skip_special_tokens=True)
The error message asks me to "report a bug to PyTorch". I think the problem is rooted in "hidden_states.to(torch.float32)". I saw that your evaluation code only has "inference_on_one". Could you provide more guidance?
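
In case it matters, my tokenizer is set up roughly like this (just a sketch; the model name is a placeholder for my actual checkpoint):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder name
if tokenizer.pad_token is None:
    # padding=True needs a pad token; LLaMA-style tokenizers do not define one by default
    tokenizer.pad_token = tokenizer.eos_token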

Thank you for your time and consideration.


Dhan800 commented Oct 11, 2023

Sorry for missing some critical information. I am using QLoRA. Here is my configuration.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
I have 1460 test samples. Without batch inference, it can take up to 46 minutes on a 3090.
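
For completeness, the model is loaded roughly like this (a sketch; the base model name and adapter path are placeholders for my actual checkpoint):

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

# placeholders for my actual base model and LoRA adapter directory
base_model_name = "meta-llama/Llama-2-7b-hf"
adapter_path = "./qlora-adapter"

model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_path)
model.eval()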

@SuperBruceJia

Please check our code for your reference:
https://github.com/vkola-lab/medpodgpt/blob/main/utils/eval_small_utils.py
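
As a general note, batched generation with a decoder-only model usually also needs a pad token and left padding. The snippet below is only a generic sketch of that pattern (not taken from the linked file; the model name and prompts are placeholders):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.padding_side = "left"  # pad on the left so generation continues right after the prompt tokens
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
model.eval()

prompts = ["first test prompt", "second test prompt"]  # placeholder batch
inputs = tokenizer(prompts, return_tensors="pt", padding=True, truncation=True).to(model.device)

with torch.no_grad():
    gen_tokens = model.generate(
        **inputs,
        do_sample=False,
        max_new_tokens=30,
        pad_token_id=tokenizer.pad_token_id,
    )

gen_text = tokenizer.batch_decode(gen_tokens, skip_special_tokens=True)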
