Fine-tuning Grounded Conversation Generation (GCG) Task #52
Hi @hungnh1125, I appreciate your interest in our work. Please note that during training, the global image encoder and grounding image encoder are kept frozen; the region encoder, the projection layers (V-L and L-P), and the pixel decoder are fully fine-tuned; and the LLM is LoRA fine-tuned. The training instructions are provided in this readme. Please note that it took us around 20 hours to run GCG fine-tuning on 8 NVIDIA A100-40GB GPUs. I hope this answers your questions. Good luck, and let me know if you have any further questions.
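The freezing/fine-tuning recipe described above can be sketched in PyTorch. This is a minimal illustration, not GLaMM's actual code: the module names and layer sizes below are hypothetical placeholders, and the `LoRALinear` wrapper is a bare-bones stand-in for a real LoRA implementation such as the `peft` library.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False           # freeze the pretrained weight
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)    # update starts at zero, so training begins from W
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

# Hypothetical module layout mirroring the described training setup.
model = nn.ModuleDict({
    "global_image_encoder":    nn.Linear(64, 64),  # kept frozen
    "grounding_image_encoder": nn.Linear(64, 64),  # kept frozen
    "region_encoder":          nn.Linear(64, 64),  # fully fine-tuned
    "vl_projection":           nn.Linear(64, 64),  # fully fine-tuned
    "pixel_decoder":           nn.Linear(64, 64),  # fully fine-tuned
    "llm_layer":               LoRALinear(nn.Linear(64, 64)),  # LoRA fine-tuned
})
for name in ("global_image_encoder", "grounding_image_encoder"):
    for p in model[name].parameters():
        p.requires_grad = False

# Only the unfrozen modules and the LoRA adapters contribute trainable parameters.
trainable = sorted(n for n, p in model.named_parameters() if p.requires_grad)
```

In practice you would pass only `trainable` parameters to the optimizer, which is what keeps GCG fine-tuning tractable on 8 A100-40GB GPUs.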
@mmaaz60 Thank you so much for replying to me.
Hi @hungnh1125, thank you for your interest in our work. This is because we use images at 336x336 resolution with a patch size of 14, so the encoder produces (336/14)^2 = 576 patch tokens.
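As a quick arithmetic check of the token count mentioned above (a minimal sketch, not project code):

```python
image_size = 336   # input image resolution (square)
patch_size = 14    # ViT patch size

patches_per_side = image_size // patch_size   # 336 / 14 = 24
num_patch_tokens = patches_per_side ** 2      # 24 * 24 = 576

print(num_patch_tokens)  # prints 576, matching the [576, 4096] image embedding
```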
@mmaaz60 I checked and saw that the size of the image embedding is [576, 4096]. Could you help me understand why you didn't add torch.zeros((mask.shape[0], 576)).bool().cuda() before the mask? What is the meaning of adding torch.zeros((mask.shape[0], 575)).bool().cuda() before the mask and torch.zeros((mask.shape[0], 1)).bool().cuda() after the mask? Thank you so much.
Thanks for your amazing work. I would like to reproduce the results for the Grounded Conversation Generation task.
Could you please clarify the following information?