Problems of Parameter registration #7
Comments
I'll check it as soon as possible. Thank you.
Emmm, I realized that registering your parameter is not enough (register_parameter is recommended). You also need to call optim.add_param_group({"params": my_new_param}) to enable training. Maybe there are easier ways; tell me if you find one, thx~
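A minimal sketch of the two-step fix described above, assuming the new parameter is created after the optimizer (the module and parameter names here are just placeholders, not from the repo):

```python
import torch
import torch.nn as nn

# A toy model whose optimizer already exists before we add a parameter.
model = nn.Linear(4, 4)
optim = torch.optim.SGD(model.parameters(), lr=0.1)

# Step 1: register the new parameter so it appears in
# model.named_parameters() and gets saved in the state_dict.
my_new_param = nn.Parameter(torch.randn(3))
model.register_parameter("my_new_param", my_new_param)

# Step 2: tell the optimizer about it. Without this call,
# optim.step() would never update my_new_param.
optim.add_param_group({"params": my_new_param})
```

Both steps are needed: registration alone makes the parameter visible on the module, but an optimizer only updates tensors that are in one of its param groups.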
Thank you for your opinion. I'll solve this problem.
Well, that is a simple but ugly way (laughter). Anyway, fixing h and w in init will solve this problem. By the way, could you provide your reimplementation results on ImageNet?
I know.. If we use relative=True, we have to fix the output shape. If you know how to solve this problem, could you share your idea?
Sorry, I don't have an elegant way either. You can use register_parameter and optim.add_param_group like I said. This enables dynamic network definition, but it is still troublesome.
Thank you for the advice. I'll try it later. Thanks for the issue. I'll make changes to keep it going :)
This is not mentioned in the paper, but I think we can get around the problem of variable input size by combining this with an adaptive pooling layer right before it. This way, even though we still have to fix H and W for this particular layer, we can feed the network whatever input size we want.
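A sketch of that workaround: an nn.AdaptiveAvgPool2d in front of the fixed-shape layer maps any spatial size to the fixed H and W. The second conv here is just a stand-in for the augmented convolution with relative=True; the channel counts and 14x14 target are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool2d((14, 14)),  # any input H, W -> fixed 14x14
    nn.Conv2d(16, 16, kernel_size=3, padding=1),  # stand-in for the
    # attention-augmented conv that needs a fixed spatial shape
)

out_a = net(torch.randn(1, 3, 32, 32))
out_b = net(torch.randn(1, 3, 57, 41))
```

Both inputs produce the same 1x16x14x14 output shape, so the relative position embeddings downstream can stay a fixed size.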
so... why should dim 0 of key_rel_w/h be 2*self.shape-1 rather than something else?
Have you figured out this problem?
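If I understand the relative-attention formulation correctly, the 2*shape-1 comes from counting distinct relative offsets: along an axis of length shape, the offset j - i between a key at position j and a query at position i ranges from -(shape-1) to +(shape-1), and key_rel_w/key_rel_h store one embedding row per offset. A quick check:

```python
# Count the distinct relative offsets j - i along one axis of length
# `shape`; key_rel_w / key_rel_h need one embedding per offset.
shape = 5
offsets = sorted({j - i for i in range(shape) for j in range(shape)})
assert offsets == list(range(-(shape - 1), shape))
assert len(offsets) == 2 * shape - 1  # hence dim 0 = 2*shape - 1
```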
Attention-Augmented-Conv2d/attention_augmented_conv.py
Lines 95 to 99 in c04acfb
Optimizers typically take model.parameters() as input, and optimizer.step() and optimizer.zero_grad() will ignore your key_rel_w and key_rel_h because they are not in model.named_parameters(). (Though the gradients will be calculated normally when loss.backward() is called.) Please use register_parameter for self.key_rel_w and self.key_rel_h instead.
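A minimal repro of the behavior described in this issue, assuming the bug is a tensor assigned as a plain attribute with requires_grad=True instead of an nn.Parameter (the 9x4 shape is arbitrary for illustration):

```python
import torch
import torch.nn as nn

class Broken(nn.Module):
    def __init__(self):
        super().__init__()
        # Gets gradients from loss.backward(), but is invisible to
        # named_parameters(), so an optimizer built from
        # model.parameters() never updates it.
        self.key_rel_w = torch.randn(9, 4, requires_grad=True)

class Fixed(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.Parameter assignment registers it automatically
        # (equivalent to register_parameter("key_rel_w", ...)).
        self.key_rel_w = nn.Parameter(torch.randn(9, 4))

names_broken = [n for n, _ in Broken().named_parameters()]
names_fixed = [n for n, _ in Fixed().named_parameters()]
```

Only the Fixed version exposes key_rel_w to the optimizer and to state_dict saving/loading.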