"One another" and NumForm #123

Closed
amir-zeldes opened this issue Jan 19, 2021 · 11 comments

@amir-zeldes
Contributor

The current analysis of "one another" is:

one/CD/NUM <-det- another

This is fine, but I've just added NumForm to GUM and set "one" to NumForm=Word, and now I get a validation error:

Feature NumForm is not permitted with UPOS DET in language [en].

Possible solutions:

  1. Allow NumForm on DET - I'm not crazy about this because this warning actually caught a legitimate error elsewhere in the corpus
  2. Not use NumForm in this case - a bit annoying, since it's as much "one" as any other use of "one"
  3. Not tag "one" as CD - the PRP generic "one" doesn't get NumForm, since it's not considered a number, so this would be consistent with other uses of pronominal "one"
  4. Add some sort of mechanism exempting specific multiword expressions or subgraphs in the validator

I think I like 3 best, but any opinions? Adding @nschneid and @dan-zeman since it relates to the validator.

@dan-zeman
Member

I like 3, too, but I find 2 equally appealing. I don't think you would ever use Arabic or Roman numerals for "one" in "one another" (disregarding social media, where "4U" is used instead of "for you"), would you? In that case it is not "as much one as any other use of one".

As for 4, the validator can (since yesterday) check language-UPOS-feature-value combinations but not more than that. I would like to add more flexibility in defining language-specific rules in the future but it won't be soon.
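To make the language-UPOS-feature-value check concrete, here is a minimal Python sketch. It is not the validator's actual code: the `PERMITTED` table, the function names, and the wording of the second message are assumptions made for illustration, and multi-valued features (comma-separated values) are not handled.

```python
# Hypothetical allow-list: (language, UPOS, feature) -> permitted values.
# The entries are illustrative and are NOT the validator's real data.
PERMITTED = {
    ("en", "NUM", "NumForm"): {"Digit", "Word", "Roman"},
    ("en", "NUM", "NumType"): {"Card", "Ord", "Frac"},
    ("en", "DET", "PronType"): {"Art", "Dem", "Ind", "Int", "Neg", "Rel", "Tot"},
}


def parse_feats(feats):
    """Turn a CoNLL-U FEATS string such as 'NumForm=Word|NumType=Card' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def check_token(lang, upos, feats):
    """Return error messages for one token's language-UPOS-feature-value combinations."""
    errors = []
    for feature, value in parse_feats(feats).items():
        allowed = PERMITTED.get((lang, upos, feature))
        if allowed is None:
            # The feature is not listed at all for this UPOS in this language.
            errors.append(
                f"Feature {feature} is not permitted with UPOS {upos} in language [{lang}]."
            )
        elif value not in allowed:
            # The feature is listed, but this value is not (message wording is hypothetical).
            errors.append(
                f"Value {value} of feature {feature} is not permitted "
                f"with UPOS {upos} in language [{lang}]."
            )
    return errors
```

A check of this shape looks at one token at a time, which is why it cannot exempt a specific multiword expression such as "one another" (option 4).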

@amir-zeldes
Contributor Author

Thanks Dan - I think if we say that this word is xpos=CD/upos=NUM, then it should have NumForm like all other cases. If you wanted to know which uses of the number "one" are always spelled out in English, you could then discover that this is one of them. If by contrast we don't really think this is a number, then we should just retag it (but then I would like that to apply to EWT as well, which is why I open the issue in this repo).

For reference, PTB has this "one" four times as CD, and twice as NN (I guess by analogy to "anyone"?)
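The kind of query mentioned above (finding contexts in which the number is always spelled out) could look roughly like the following Python sketch. The token tuples are invented toy data, not corpus counts, and the helper names are hypothetical.

```python
from collections import defaultdict

# Invented toy tokens: (form, UPOS, FEATS, following word). Not real corpus data.
TOKENS = [
    ("one", "NUM", "NumForm=Word|NumType=Card", "another"),
    ("1", "NUM", "NumForm=Digit|NumType=Card", "apple"),
    ("one", "NUM", "NumForm=Word|NumType=Card", "apple"),
    ("I", "NUM", "NumForm=Roman|NumType=Card", "."),
]


def numform(feats):
    """Extract the NumForm value from a FEATS string, or None if it is absent."""
    for pair in feats.split("|"):
        name, _, value = pair.partition("=")
        if name == "NumForm":
            return value
    return None


def numforms_by_context(tokens):
    """Map each following word to the set of NumForm values observed before it."""
    seen = defaultdict(set)
    for form, upos, feats, following in tokens:
        if upos == "NUM":
            value = numform(feats)
            if value is not None:
                seen[following.lower()].add(value)
    return dict(seen)


contexts = numforms_by_context(TOKENS)
# Contexts in which the number only ever appears spelled out.
always_spelled_out = sorted(w for w, forms in contexts.items() if forms == {"Word"})
```

This only works if every NUM token carries NumForm, which is the argument for annotating "one" in "one another" like any other number as long as it stays tagged CD/NUM.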

@dan-zeman
Member

Yes. I think that it should either be tagged NUM CD NumType=Card (and allow all values of NumForm), or it should be a pronoun with all consequences for features and dependencies.

@amir-zeldes
Contributor Author

Yes, I'm fine with either of these, but something needs to be changed either way:

  • If we choose NUM: deprel needs to be changed to nummod for one in "one another"
  • If we choose PRON: POS tags need to be changed for the same cases

I can do either, but would like consensus with EWT!

@amir-zeldes
Contributor Author

Actually on second thought there is yet another option that I almost find better - change the deprel and dominance to fixed, maybe even with reflexive pronoun features on the first token. What do you think?

@dan-zeman
Member

fixed does not sound bad but you (and the EWT team) still have to decide on the tags. I think I lean towards PRON now, but not reflexive.

@amir-zeldes
Contributor Author

If I were doing this all over, I would probably choose PRON, but since PTB has it as CD (at least as a majority), that might motivate us to stick with NUM. Either way, if it's fixed, the individual constituent POS tags would not be very important IMO.

@nschneid
Contributor

I would have assumed that "one" in "one another" is PRON. Note that there are other contexts where "one" can be pronominal meaning "someone", which in UD is PRON (versus NN in PTB). I.e. UD takes a broader view of pronouns than PTB.

As for the deprel, fixed sounds good to me.

Related: "each other"

@amir-zeldes
Contributor Author

OK, I went with fixed for the now released GUM V7 - thanks everyone! I think it would be great if EWT adopts this analysis as well. For now I left the POS alone, since PTB already has CD and I don't like messing with existing standards (who knows how many people have already copied that). I also thought briefly about what would happen if we had something like this Dickinson poem:

The Brain—is wider than the Sky—
For—put them side by side—
**The one the other** will contain
...

In this case it would be inconvenient for "one" to be PRON, since it has an article, but this is really the transparent source of the reflexive "one another". So maybe CD is not such a bad idea to begin with...

Leaving this open as a reminder to consider using fixed for EWT as well!
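For readers who want to see the fixed analysis spelled out, here is a small Python sketch that encodes an invented example sentence and recovers its fixed expressions. The sentence, the DET/DT tags on "another", the empty FEATS column, and the helper names are all assumptions for illustration; only the NUM/CD tags on "one" and the fixed relation headed by the first token follow the discussion above.

```python
# Illustrative sentence (invented). Real CoNLL-U separates the ten columns
# with tabs; spaces are used here only so the example is easy to read.
SENT = """\
1 They they PRON PRP _ 2 nsubj _ _
2 help help VERB VBP _ 0 root _ _
3 one one NUM CD _ 2 obj _ _
4 another another DET DT _ 3 fixed _ _
5 . . PUNCT . _ 2 punct _ _
"""


def parse(sent):
    """Read the simplified rows into dicts with the columns needed below."""
    rows = []
    for line in sent.strip().splitlines():
        cols = line.split()
        rows.append(
            {
                "id": int(cols[0]),
                "form": cols[1],
                "upos": cols[3],
                "head": int(cols[6]),
                "deprel": cols[7],
            }
        )
    return rows


def fixed_expressions(rows):
    """Join each head with its `fixed` dependents, in surface order."""
    groups = {}
    for row in rows:
        if row["deprel"] == "fixed":
            groups.setdefault(row["head"], []).append(row["id"])
    form_by_id = {row["id"]: row["form"] for row in rows}
    return [
        " ".join(form_by_id[i] for i in [head] + sorted(deps))
        for head, deps in sorted(groups.items())
    ]
```

With this encoding "one" takes the expression's external relation (obj in the example) and "another" attaches to it as fixed, so the POS tags of the individual tokens matter less for downstream use.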

@nschneid
Contributor

nschneid commented Jan 20, 2021

Oh boy, yes, "one" can have noun-like modifiers ("Do you want the tall one on the left or the short one on the right?") in this anaphoric sense. CGEL calls this a pro-nominal, as opposed to the personal pronoun where it refers to a generic individual, similar to "someone" ("One is forced to conclude that...").

But on the particular topic of this thread, CGEL (p. 428) considers "one another" and "each other" to be multiword reciprocal pronouns:

[image: excerpt from CGEL, p. 428, not reproduced here]

So this supports fixed, and I suppose it wouldn't hurt to tag both the words as PRON.

@nschneid
Contributor

Documented for fixed in UniversalDependencies/docs@ff6acd6
