
rinna RoBERTa's max_length is 510 not 512? #3

Closed
masayakondo opened this issue Sep 2, 2021 · 4 comments

Comments

@masayakondo

Hi, I have been using rinna RoBERTa for a while now, and I have a question.
The max_length of rinna RoBERTa is 510 (not 512), right?
Is this intended? If so, why is max_length 510 instead of 512?

rinna RoBERTa's padding_idx is 3 (not 1), so I think the position ids start at padding_idx + 1 = 4. But the size of position_embeddings in rinna RoBERTa is (514, 768), and if I actually feed in text of length 512, I get an index error.
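To spell out the arithmetic (a rough sketch, assuming positions start at padding_idx + 1 as described above; the numbers are the ones reported in this issue):

# Rough sketch of the arithmetic above; assumes positions start at padding_idx + 1.
padding_idx = 3
num_position_embeddings = 514   # first dimension of position_embeddings

seq_len = 512
last_position_id = padding_idx + seq_len               # 515
print(last_position_id > num_position_embeddings - 1)  # True -> out-of-range index

# Largest length that still fits: padding_idx + L <= 513, i.e. L <= 510
print(num_position_embeddings - 1 - padding_idx)       # 510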

@ZHAOTING
Contributor

ZHAOTING commented Sep 2, 2021

Hi @masayakondo, the maximum length of rinna/japanese-roberta-base is 514, which aligns with the first dimension of the position_embeddings.
padding_idx has nothing to do with the maximum length, since it only interacts with the word_embeddings.

So I believe there should not be any error when inputting a 512-token sequence. Could you please share the code that causes the error? Thanks!
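(For reference, a quick way to check the shapes being discussed; this is just a sketch and assumes the standard transformers RoBERTa attribute names.)

from transformers import AutoModel

model = AutoModel.from_pretrained('rinna/japanese-roberta-base')

# Inspect the embedding tables mentioned above.
print(model.embeddings.position_embeddings.weight.size())  # expected: torch.Size([514, 768])
print(model.embeddings.word_embeddings.padding_idx)        # expected: 3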

@masayakondo
Author

Hi @ZHAOTING, thank you for your reply.
For example, when I ran the following code, I got an index error.

import torch
from transformers import AutoModel

model = AutoModel.from_pretrained('rinna/japanese-roberta-base')

# sample sentence of 512 token ids: "▁", then "ド" × 510, then "</s>"
input_ids = torch.tensor([9] + [100 for _ in range(510)] + [2]).unsqueeze(0)
print(input_ids.size()) # torch.Size([1, 512])
model(input_ids)
# IndexError: index out of range in self

In the case of RoBERTa, judging from how huggingface's code constructs position_ids, I thought that padding_idx and position_embeddings, or padding_idx and the sentence length, were related.
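(For reference, that logic — create_position_ids_from_input_ids in transformers' modeling_roberta.py — looks roughly like the following; this is a paraphrase, so the exact code may differ between versions.)

import torch

# Paraphrase of transformers' create_position_ids_from_input_ids (modeling_roberta.py).
def create_position_ids_from_input_ids(input_ids, padding_idx):
    mask = input_ids.ne(padding_idx).int()
    incremental_indices = torch.cumsum(mask, dim=1).type_as(mask) * mask
    # Non-padding tokens get positions padding_idx + 1, padding_idx + 2, ...
    return incremental_indices.long() + padding_idx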

I am very sorry if my comment is misguided. Thanks!

@ZHAOTING
Contributor

ZHAOTING commented Sep 2, 2021

You are correct about huggingface's roberta code! I didn't notice how they construct position_ids when it is not explicitly provided.
To be honest, I don't understand why they start with padding_idx instead of 0 when constructing position_ids, and I think it is wrong.

To use our model properly, please try constructing position_ids yourself and passing it as an argument along with input_ids. Hope it helps.

import torch
from transformers import AutoModel

model = AutoModel.from_pretrained('rinna/japanese-roberta-base')

input_ids = torch.tensor([9] + [100 for _ in range(510)] + [2]).unsqueeze(0)

# Construct position ids explicitly, starting from 0, instead of relying on
# the default behaviour (which starts from padding_idx + 1).
max_seq_len = input_ids.size(1)
position_ids = torch.LongTensor(list(range(0, max_seq_len))).unsqueeze(0)

output = model(input_ids, position_ids=position_ids)
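(With an explicit position_ids starting from 0, a 512-token input uses indices 0–511, which stay within the 514-entry position_embeddings table; under the default behaviour, where positions start at padding_idx + 1 = 4, the usable length is capped at 510.)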

@masayakondo
Author

To be honest, I don't understand why they start with padding_idx instead of 0 when constructing position_ids, and I think it is wrong.

Yeah, I think you're right, too...

To use our model properly, please try constructing position_ids yourself and passing it as an argument along with input_ids. Hope it helps.

Thank you for the advice, I will follow it. Thanks!
