A question about Section 4.2 #12

Open
qiqigit opened this issue Sep 13, 2024 · 0 comments

Comments

qiqigit commented Sep 13, 2024

@joanrod
Hi Juan! Thanks for sharing this project!

I have a question about Section 4.2 of your paper.
I'm not sure exactly which feature maps are used to compute the OCR perceptual loss.

Section 4.2 says:
"... through the OCR model, and extract L feature maps from intermediate layers. Specifically, we store the activation map after each upsampling layer..."

However, in the code, it seems that the feature maps are extracted from the VGG16-BN part of the network, instead of the upsampling layers (which are in the UNet part of the network).
https://github.com/joanrod/ocr-vqgan/blob/68e36b568b59df275940296c164b1cf40585512b/taming/modules/losses/craft.py#L89

https://github.com/joanrod/ocr-vqgan/blob/68e36b568b59df275940296c164b1cf40585512b/taming/modules/losses/lpips.py#L28-L29
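
To make the distinction concrete, here is a minimal PyTorch sketch of the two extraction points I mean. It is not the repository's code: `ToyEncoderDecoder` and `collect_features` are hypothetical names, and the tiny network only stands in for a backbone followed by a U-Net style decoder. Forward hooks capture the output of whichever blocks are passed in, so the same input yields two different sets of feature maps.

```python
import torch
import torch.nn as nn


class ToyEncoderDecoder(nn.Module):
    """Hypothetical stand-in for a backbone + upsampling decoder (NOT the CRAFT model)."""

    def __init__(self):
        super().__init__()
        # "Backbone" stages: conv + batch norm + ReLU, each halving the resolution.
        self.backbone = nn.ModuleList([
            nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8),
                          nn.ReLU(), nn.MaxPool2d(2)),
            nn.Sequential(nn.Conv2d(8, 16, 3, padding=1), nn.BatchNorm2d(16),
                          nn.ReLU(), nn.MaxPool2d(2)),
        ])
        # "Upsampling" stages: each doubles the resolution.
        self.upsample = nn.ModuleList([
            nn.Sequential(nn.Upsample(scale_factor=2, mode="nearest"),
                          nn.Conv2d(16, 8, 3, padding=1), nn.ReLU()),
            nn.Sequential(nn.Upsample(scale_factor=2, mode="nearest"),
                          nn.Conv2d(8, 4, 3, padding=1), nn.ReLU()),
        ])

    def forward(self, x):
        for block in self.backbone:
            x = block(x)
        for block in self.upsample:
            x = block(x)
        return x


def collect_features(model, blocks, x):
    """Run one forward pass and return the output of each block in `blocks`."""
    feats = []
    hooks = [b.register_forward_hook(lambda mod, inp, out: feats.append(out))
             for b in blocks]
    try:
        model(x)
    finally:
        # Always detach the hooks so later forward passes are unaffected.
        for h in hooks:
            h.remove()
    return feats


model = ToyEncoderDecoder().eval()
x = torch.zeros(1, 3, 32, 32)

backbone_feats = collect_features(model, model.backbone, x)
upsample_feats = collect_features(model, model.upsample, x)

print([tuple(f.shape) for f in backbone_feats])
print([tuple(f.shape) for f in upsample_feats])
```

In this toy setup the backbone features shrink spatially while the upsampling features grow back toward the input resolution, so a perceptual loss built on one set compares quite different activations than one built on the other. That is the difference I am asking about.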
