iobj/obj (with ccomp and xcomp) consistency in matrix verb annotations #55

Closed
tzshi opened this issue Jan 29, 2018 · 43 comments

@tzshi

tzshi commented Jan 29, 2018

Is there some annotation guideline that I missed, or is this indeed an annotation consistency issue about the object of the following cases (ECM/control verbs)?

While most of the sentences annotate the object with obj, a few are annotated with iobj, such as:

  1. 15->16 is annotated with iobj
# sent_id = email-enronsent30_01-0033
# text = Juan communicated some numbers to me and when reviewing this request would like to ask you to consider the following:
...
15	ask	ask	VERB	VB	VerbForm=Inf	13	xcomp	13:xcomp	_
16	you	you	PRON	PRP	Case=Acc|Person=2|PronType=Prs	15	iobj	15:iobj	_
17	to	to	PART	TO	_	18	mark	18:mark	_
18	consider	consider	VERB	VB	VerbForm=Inf	15	xcomp	15:xcomp	_
19	the	the	DET	DT	Definite=Def|PronType=Art	18	obj	18:obj	_
20	following	follow	VERB	VBG	VerbForm=Ger	19	amod	19:amod	SpaceAfter=No
21	:	:	PUNCT	:	_	2	punct	2:punct	_
  2. 4->5 is annotated with iobj
# sent_id = email-enronsent27_01-0058
# text = PS Your brother told me he went to 3 bowl games (when I found out that two of them were the galleryfurniture.com bowl and that one in Shreveport (I can't remember the name of it)) I realized he is a very, very sick college football fan.
...
4	told	tell	VERB	VBD	Mood=Ind|Tense=Past|VerbForm=Fin	0	root	0:root	_
5	me	I	PRON	PRP	Case=Acc|Number=Sing|Person=1|PronType=Prs	4	iobj	4:iobj	_
6	he	he	PRON	PRP	Case=Nom|Gender=Masc|Number=Sing|Person=3|PronType=Prs	7	nsubj	7:nsubj	_
7	went	go	VERB	VBD	Mood=Ind|Tense=Past|VerbForm=Fin	4	ccomp	4:ccomp	_
8	to	to	ADP	IN	_	11	case	11:case	_
9	3	3	NUM	CD	NumType=Card	11	nummod	11:nummod	_
10	bowl	bowl	NOUN	NN	Number=Sing	11	compound	11:compound	_
11	games	game	NOUN	NNS	Number=Plur	7	obl	7:obl	_
...

At the same time, I can find (more) instances with verbs like tell and ask where the objects are annotated with obj.

There are more cases of this kind; they can be found through Grew using the following pattern:

pattern { 
    N1 [cat="VERB"];
    N1 -[iobj]-> N3;
    N1 -[xcomp]-> N2;
    N1 << N3;
    N3 << N2;
}

iobj X ccomp gives 43 occurrences,
obj X ccomp gives 138,
iobj X xcomp gives 20,
obj X xcomp gives 903.
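
For anyone who wants to reproduce counts of this kind outside Grew, here is a minimal Python sketch. It assumes plain 10-column CoNLL-U input and mirrors the pattern above (verb, then object, then clausal complement, all attached to the same verb). The function names and the toy sentence are hypothetical, and the numbers it yields on a real treebank need not match the ones reported here.

```python
from collections import Counter

def read_sentences(conllu_text):
    """Yield each sentence as a list of 10-column token rows (lists of str)."""
    rows = []
    for line in conllu_text.splitlines():
        if not line.strip():
            if rows:
                yield rows
                rows = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            # Keep only plain tokens: skip multiword ranges (1-2) and empty nodes (1.1).
            if len(cols) == 10 and cols[0].isdigit() and cols[6].isdigit():
                rows.append(cols)
    if rows:
        yield rows

def count_object_clause_pairs(conllu_text):
    """Count VERB heads with an obj/iobj followed by a ccomp/xcomp dependent."""
    counts = Counter()
    for sent in read_sentences(conllu_text):
        children = {}
        for cols in sent:
            children.setdefault(int(cols[6]), []).append((int(cols[0]), cols[7]))
        for cols in sent:
            if cols[3] != "VERB":
                continue
            verb_id = int(cols[0])
            deps = children.get(verb_id, [])
            for obj_id, obj_rel in deps:
                # The object must follow the verb, as in the Grew pattern.
                if obj_rel not in ("obj", "iobj") or obj_id < verb_id:
                    continue
                for comp_id, comp_rel in deps:
                    if comp_rel in ("ccomp", "xcomp") and comp_id > obj_id:
                        counts[(obj_rel, comp_rel)] += 1
    return counts

# Toy, renumbered fragment of "told me he went" (illustration only).
SAMPLE_ROWS = [
    ("1", "told", "tell", "VERB", "VBD", "_", "0", "root", "_", "_"),
    ("2", "me", "I", "PRON", "PRP", "_", "1", "iobj", "_", "_"),
    ("3", "he", "he", "PRON", "PRP", "_", "4", "nsubj", "_", "_"),
    ("4", "went", "go", "VERB", "VBD", "_", "1", "ccomp", "_", "_"),
]
SAMPLE = "\n".join("\t".join(r) for r in SAMPLE_ROWS) + "\n"

if __name__ == "__main__":
    print(count_object_clause_pairs(SAMPLE))
```

Grouping the same counts by verb lemma would be a natural next step for spotting verbs that appear with both labels.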

@nschneid
Contributor

My understanding is that the 2 examples you highlight are correct: iobj should be used for the non-case-marked recipient of verbs like ask and tell, even if there's no obj in the sentence (there may be a clausal complement).

Looking at a few obj + xcomp examples, several are verbs like make, let, and keep where the xcomp is a secondary predicate. I believe these are correct.

Looking at a few obj + ccomp examples, several are instances of ask, tell, etc. These indeed look like errors: I believe they should be changed to iobj. But there are some other verbs used in constructions that also match this pattern where obj looks correct.

@jnivre

jnivre commented Jan 30, 2018

I know that the guidelines say that iobj should be used in these cases, but I honestly now think that this was a mistake. All the usual tests indicate that these are direct objects, and their treatment as indirect objects also complicates the analysis of control for the enhanced dependencies. In fact, the original source of the problem is the choice to treat "her" as the indirect object in "he gave her a book". This is essentially a semantic analysis, identifying the indirect object with the recipient role, a type of analysis that UD otherwise rejects. All the syntactic criteria again point to "her" being the direct object and "a book" being what typologists call a secondary object (not an indirect object). UD probably does not want to introduce a special relation for secondary objects, but with hindsight it would have been much better to treat "her" as obj and "a book" as iobj, because it would not have propagated errors to other constructions. I am not sure whether this is something that can be considered a "correction" of the v2 guidelines or if we have to wait for v3.

@dan-zeman
Member

It is English-specific, therefore I think it would not be a change of the universal v2 guidelines and it can be corrected now.

But is "a book" really "less core" than "her" in "he gave her a book"? Aren't both the objects "equally core"? If they are, shouldn't both be labeled obj?

@jnivre

jnivre commented Jan 30, 2018

In most dialects of English, only "her" can be passivised, for example:

she was given a book
*a book was given her

Compare this to:

a book was given to her
*she was given a book to

So it seems that the two "double object" constructions are really quite similar to alternations like "load the truck with hay" vs. "load hay on the truck", where only one thing can be the direct object at a time.

@sylvainkahane

I agree with Joakim. Same thing for relative clauses I think:

the girl I gave the book
*the book I gave her

This analysis was defended in a 1981 paper by Bresnan.

@amir-zeldes
Contributor

amir-zeldes commented Jan 30, 2018

I don't think we can completely take the semantics out of it, or if we do, we might need to revise a lot of things to be more 'surfacy'. For example xcomp for verbs of becoming, which in stronger cased Germanic languages take what looks like a nominative:

  • I became a teacher - xcomp(became,teacher)
  • I became him (note the non-nominative appearance of the pronoun, *I became he)

In this example, I can't see an overt difference to a normal object, notwithstanding the non-passivizability. If that were to be an argument, we could switch to transitive causative xcomp with 'they made him a teacher' (and 'he was made a teacher'). If we don't use semantics, why should that be xcomp and not double obj?

To be clear, I'm not advocating giving up xcomp here, and I do understand the semantic motivation, as well as the analogy to other Germanic languages:

  • Ich bin ein Lehrer geworden (nominative, cf. *einen Lehrer, accusative)

On the other hand, I think the predicate of 'become' is very much a core argument, but is not the same thing as obj, so we should allow some degree of semantic reasoning in deciding what we consider 'the same'.

But maybe a better argument not to make recipients obj is that we generally have this information in the English treebanks, so deleting it would be a big loss! I think it can also be useful when comparing with languages that have more overt indirect object marking.

@sylvainkahane

Dative shift is a regular alternation/redistribution. It seems reasonable to use the same strategy as for passive or causative. In "I gave her a book", we want to say that "her" is a true object and that it corresponds to the dative of the alternative construction ("I gave a book to her"). So a label such as obj:datshift could be an option.

@sylvainkahane

I would like to react to my own proposition. Such a proposition presupposes that we have identified a base construction among the two constructions in alternation and that we can say that one is a redistribution of the other. I'm not certain that the notion of base construction is correct and that a base construction can be identified. For the active and passive voices it is quite clear, because one construction is simpler and more frequent than the other, but for the dative shift I have no intuition (maybe because I'm not a native speaker). Of course, if we consider that "I gave her a book" is the base construction, the annotation changes. Maybe we should adopt a neutral annotation that does not presuppose any direction in the alternation.

@amir-zeldes
Contributor

I'm not sure I like the idea of 'base construction'... A lot of constructionist and psycholinguistic work has shown that that concept is really tied to specific prototypical cases, whereas in other cases the supposedly basic variant is not basic at all. A good example in English is X is based on Y, which looks like a passive of Y bases on X. Although the latter is possible, the former is much more likely, and it would seem to be stored as the 'normal' form of that construction for speakers.

But linguistic evidence aside, I would like to avoid verbs having two things called obj without an ability to distinguish which one corresponds to which argument structure slot. I realize we already do this when there are multiple obl, but those can often be disambiguated by the preposition they govern via case (or in an enhanced representation). For me, iobj currently does the job nicely, so I don't feel a need to change it. But maybe that's just related to the applications I use the data for...

@jnivre

jnivre commented Jan 31, 2018

I don't think you need to assume a base construction to argue that the recipient should be obj in the double object construction. I also want to avoid having two obj, so the proposal would be to swap obj and iobj in the double object construction, because this is more consistent with the overall UD philosophy and makes the right prediction about things like passivisation. It also eliminates the anomaly of having iobj without obj in sentences like "she told him that ...", which otherwise appears to be an ad hoc exception.

@amir-zeldes
Contributor

I agree the base construction discussion can be put to the side, but I'm not so happy about having the recipient be the obj, because in a transfer of possession verb without a recipient, the theme would again be the object:

I gave everything!
obj(gave,everything)

This means that we can no longer rely on theme=obj, recipient=iobj, which is exactly the distinction that I care about for most applications. I'm sure others may see this differently, but from my point of view, this would be breaking something that doesn't need fixing :|

@nschneid
Contributor

nschneid commented Jan 31, 2018

To echo @amir-zeldes, I think the current policy is straightforward to apply because you have parallelism in what argument iobj applies to:

  • She told me_iobj.
  • She told me_iobj the bad news_obj.
  • She told me_iobj of the bad news_obl.
  • She told me_iobj that something happened_ccomp.
  • She told me_iobj to leave_xcomp.
  • She told the news_obj to me_obl.

@jnivre

jnivre commented Jan 31, 2018

Yes, this is a typical conflict between clauses 1-2 (sound for linguistic analysis and typology) and clauses 5-6 (understandable to non-linguists and useful for downstream NLP) in Manning's Law. I happily admit that I have a bias for the former. :)

@jnivre

jnivre commented Jan 31, 2018

On a more serious note, the problem with assuming a one-to-one mapping between grammatical relations and thematic roles is that it will fail in many other cases. For example:

(1) they loaded the truck with hay
(2) they loaded hay on the truck

In (1) obj maps to goal or location, in (2) it maps to theme.

(3) the window broke
(4) he broke the window

In (3) nsubj maps to theme, in (4) it maps to agent.

If you really want to derive a semantic representation, you have to do linking properly. Assuming a consistent mapping is just wishful thinking, and therefore we might as well do proper syntax instead. :)

@gossebouma

I was going over the Dutch data for cases with obj/iobj and ccomp/xcomp and it turns out we have quite a few verbs where both annotations occur. The criterion I think is whether the NP argument can become the subject in a passive or not. So in most cases, we should be able to decide between obj and iobj per predicate.
However, there are also a few cases where the data goes both ways, the verb 'vragen' (to ask) being a case in point. We find both

Ik (nom) werd gevraagd om te komen
I was asked to come
Mij (non-nom) werd gevraagd te komen
Aan (to) mij werd gevraagd te komen

and similar situations where the nominal argument does or does not agree with the finite auxiliary.

Grammar purists tend to point out that only the non-nominative/ non-agreeing cases are correct here, but clearly actual usage does not always obey this rule.

@sylvainkahane

@jnivre It is also a conflict between syntax and semantics. As you recalled, a purely syntactic annotation would encode the complement that can be passivized and extracted as the obj.

But as remarked by @amir-zeldes, if we do that we lose the link between the two possible constructions of give. So the questions are:

• Do we want to keep this link? Is syntax concerned by this link? Or is it essentially semantic and should it be kept at the enhanced level?

• If we want to keep it, how can we proceed?

For the active-passive alternation, the UD scheme has decided to keep the link between the two constructions. The way it is done presupposes that the active construction is the base construction or at least the default construction (which is reasonable for many languages).

@dseddah

dseddah commented Feb 1, 2018

Hi all,

one possible solution could be to encode both the canonical (the "deep" structure) and the final (the surface) realizations as it was proposed, following many other works, in our deep syntax proposals (Candito et al, 2014, [1] for the native scheme, Candito et al (2017) for the Enhanced-like UD one [2], see [3] for Marie's depling slides).

Sylvain, I don't think the enhanced-* scheme focuses on semantics. From what I understood (and discussions are still ongoing anyway), it's more about having complete syntactic structures, namely all core argument relations being represented by an actual edge. Details vary of course :)

[1] http://www.lrec-conf.org/proceedings/lrec2014/pdf/494_Paper.pdf
[2] http://aclweb.org/anthology/W/W17/W17-6507.pdf
[3] http://www.linguist.univ-paris-diderot.fr/~mcandito/Publications/depling17-slides.pdf

@amir-zeldes
Contributor

@jnivre and @sylvainkahane I think it is no coincidence that distinct labels have emerged for both passives and ditransitives in particular. These are precisely the constructions which in English do not have overt adpositional markers, and beyond English, I think this is typologically frequently(ish?) the case as well.

The reason why the spray/load alternations are less worrisome to concerns such as Manning's clauses 5-6 is IMO the fact that the combination of predicate lemma and preposition can nicely disambiguate which argument is which. For true double objects, and especially if you have word order variation (e.g. 'give me it' next to, in some varieties/languages, 'give it me'), the label becomes rather crucial.

Since what is being discussed, at least for English, is giving up a useful existing distinction, I feel obliged to object. On the other hand if we're just thinking of renaming things (e.g. using obj:iobj or something), then that is less crucial of course.

@dan-zeman
Member

@amir-zeldes Yes, I think it should be about renaming things. I think it would be more appropriate to use a subtype like obj:rcpt, which overtly admits that this is primarily about the semantic role. (In other languages, it will be much clearer that the theme is less core than the recipient, so one may again ask what exactly :iobj means.) I agree with you that useful information should not be lost. This is why we now use obl:arg in Czech: we do not want to lose the useful distinction between arguments and adjuncts, which is not supported at the universal level of UD relations.

@amir-zeldes
Contributor

@dan-zeman Sorry for the slow reply, back from NAACL now: yes, I understand, but if it's really just renaming, and we do recommend for languages to make this distinction, I'd just as soon not rename it and stay with iobj. I think we should prioritize stability if possible and only rename things if we really have to.

It also sounds like we need an in person/skype meeting to really work out the oblique/adverbial clause issue. Maybe in conjunction with UDW, if it's not too late for everyone?

@dan-zeman
Member

Better late than never :-) UDW should work.

@nschneid nschneid added this to the v2.12 milestone Dec 5, 2022
amir-zeldes added a commit that referenced this issue Apr 27, 2023
  * Related issues: #55 #282
  * See also UniversalDependencies/docs#916
  * @nschneid please take a look and feel free to change!
@nschneid
Contributor

nschneid commented Apr 27, 2023

Thanks!

  • A few precision errors due to verbs like "(re)assure", "advise", and "inform". I think it's best to monitor the list of iobj verbs and weed out false positives.
  • Do the above changes handle control cases where the object is sister to xcomp (tell us to VP)? https://universal.grew.fr/?custom=6449c8c3a4878 (keeping in mind there should be an E:nsubj:xsubj from the embedded verb)

(Note: changes haven't propagated to the Grew-match server yet. Sometimes takes about an hour.)

@amir-zeldes
Contributor

changes haven't propagated

OK, note I only changed the split up files so far, so the big files are unchanged

precision errors due to verbs like "(re)assure", "advise", and "inform"

I included those verbs based on the ccomp variant:

  • I informed/advised/(re)assured you/iobj that.../ccomp

Is that wrong?

Do the above changes handle control cases where the object is sister to xcomp

Since recall is based on a parser's predictions, it's easy to believe many such cases would be missed. If we want to include those though, where in practice there is only a clausal motivation for the ditransitive reading, then I think it should definitely apply to inform/advise etc. (and some can appear in both constructions, e.g. I advise you to go)

@nschneid
Contributor

Oh you're right—I forgot that we are including verbs that license two objects OR object+ccomp.

How about a Depedit rule: obj(Y=tell/ask/..., X) & xcomp(Y, Z) -> iobj(Y,X) & E:iobj(Y,X) & E:nsubj:xsubj(Z, X)? The list of relevant verbs can be constructed from https://universal.grew.fr/?custom=644aaaed97bac.
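
To make the intent of that rule concrete, here is a rough Python sketch of the same rewrite. It is not DepEdit syntax. The token layout (dicts with `id`, `lemma`, `head`, `deprel`, and an `edeps` set of `(head, relation)` pairs), the function name, and the two-verb seed list are all assumptions made for illustration; the real verb list would have to come from the query linked above.

```python
# Hypothetical seed list; the actual list of verbs is not given here.
IOBJ_VERBS = {"tell", "ask"}

def relabel_object_control(tokens, iobj_verbs=IOBJ_VERBS):
    """Apply obj(Y, X) & xcomp(Y, Z) -> iobj(Y, X), E:iobj(Y, X), E:nsubj:xsubj(Z, X).

    tokens: list of dicts with keys id, lemma, head, deprel, edeps.
    Modifies tokens in place and returns the number of relabeled objects.
    """
    by_id = {t["id"]: t for t in tokens}
    xcomps = {}
    for t in tokens:
        if t["deprel"] == "xcomp":
            xcomps.setdefault(t["head"], []).append(t["id"])
    changed = 0
    for t in tokens:
        head = by_id.get(t["head"])
        if t["deprel"] != "obj" or head is None:
            continue
        if head["lemma"] not in iobj_verbs or head["id"] not in xcomps:
            continue
        t["deprel"] = "iobj"
        # Keep the enhanced graph in sync with the basic relation.
        t["edeps"].discard((head["id"], "obj"))
        t["edeps"].add((head["id"], "iobj"))
        for xcomp_id in xcomps[head["id"]]:
            t["edeps"].add((xcomp_id, "nsubj:xsubj"))
        changed += 1
    return changed

def example_tokens(lemma):
    """Toy 'VERB you to consider' clause, renumbered from 1 (illustration only)."""
    return [
        {"id": 1, "lemma": lemma, "head": 0, "deprel": "root", "edeps": {(0, "root")}},
        {"id": 2, "lemma": "you", "head": 1, "deprel": "obj", "edeps": {(1, "obj")}},
        {"id": 3, "lemma": "to", "head": 4, "deprel": "mark", "edeps": {(4, "mark")}},
        {"id": 4, "lemma": "consider", "head": 1, "deprel": "xcomp", "edeps": {(1, "xcomp")}},
    ]

if __name__ == "__main__":
    toks = example_tokens("ask")
    print(relabel_object_control(toks), toks[1])
```

As written, the sketch treats every listed verb as object control, so a subject control verb such as "promise" would need to stay off the list or get a separate rule.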

(I was going to say that a different enhanced dependency is needed for "promise X to Y" as "promise" is a subject control verb, but I don't see this anywhere in the data!)

@amir-zeldes
Contributor

This all sounds good, but I'm confused about the edeps - I already did E:iobj in the commit, and I think E:nsubj:xsubj should already be in the data, since those edges already applied when it was plain obj. Or am I missing something?

@nschneid
Contributor

nschneid commented Apr 27, 2023

You only did this for objects that the parser labeled as iobj, right? I'm suggesting to apply the rule to catch instances the parser may have missed.

You're right that E:nsubj:xsubj is there for current obj+xcomp except for a few cases due to relative clauses (edit: these are correct—the obj is the relative pronoun and the xsubj is the antecedent). There are 20 instances with iobj where the E:nsubj:xsubj is missing.
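
A check along these lines can be sketched in Python. The version below assumes 10-column CoNLL-U rows with the enhanced graph in the DEPS column (for example `15:iobj|18:nsubj:xsubj`); the helper names are hypothetical, and the example rows are trimmed from the first sentence quoted at the top of this issue.

```python
def parse_deps(deps_field):
    """Parse a DEPS value such as '15:iobj|18:nsubj:xsubj' into (head, rel) pairs."""
    if deps_field in ("_", ""):
        return set()
    # Split on the first colon only, since relations like nsubj:xsubj contain colons.
    return {tuple(item.split(":", 1)) for item in deps_field.split("|")}

def iobj_missing_xsubj(rows):
    """Return (iobj_id, xcomp_id) pairs that lack an enhanced nsubj:xsubj edge.

    rows: one sentence as a list of 10-column CoNLL-U token rows (lists of str).
    """
    xcomps_by_head = {}
    for cols in rows:
        if cols[7] == "xcomp":
            xcomps_by_head.setdefault(cols[6], []).append(cols[0])
    missing = []
    for cols in rows:
        if cols[7] != "iobj":
            continue
        edeps = parse_deps(cols[8])
        # Every xcomp sister of the iobj should take it as its enhanced subject.
        for xcomp_id in xcomps_by_head.get(cols[6], []):
            if (xcomp_id, "nsubj:xsubj") not in edeps:
                missing.append((cols[0], xcomp_id))
    return missing

def make_rows(you_deps):
    """Rows 15-18 of the 'ask you to consider' example, with a chosen DEPS for 'you'."""
    return [
        ["15", "ask", "ask", "VERB", "VB", "VerbForm=Inf", "13", "xcomp", "13:xcomp", "_"],
        ["16", "you", "you", "PRON", "PRP", "_", "15", "iobj", you_deps, "_"],
        ["17", "to", "to", "PART", "TO", "_", "18", "mark", "18:mark", "_"],
        ["18", "consider", "consider", "VERB", "VB", "VerbForm=Inf", "15", "xcomp", "15:xcomp", "_"],
    ]

if __name__ == "__main__":
    print(iobj_missing_xsubj(make_rows("15:iobj")))
    print(iobj_missing_xsubj(make_rows("15:iobj|18:nsubj:xsubj")))
```

The sketch flags every iobj with an xcomp sister, so hits for subject control verbs, if any exist in the data, would be false positives to filter by hand.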

(BTW, build.py will produce the 3 main .conllu files from source docs. I just pushed.)

@nschneid
Contributor

nschneid commented Apr 29, 2023

Of the verbs that license iobj+ccomp:

TODO: others, like "cost", that license iobj+obj or just iobj.

@nschneid
Contributor

I must admit I'm having some qualms about iobj for "allow" and "permit". While the double object construction is possible ("I will allow/permit you 3 cookies") it is rare, and often the verb is sufficiently abstract that the iobj would be a nonvolitional entity where no possession is implied ("These measures will allow the economy to grow at a healthy rate / ?These measures will allow the economy a healthy rate of growth").

@nschneid
Contributor

nschneid commented Apr 29, 2023

I am guessing raising should not trigger iobj in cases like:

  • newly released papers, showing him complicit in the airliner bombing
    • has nothing to do with showing something to him
  • they believe that to be true

These are just obj+xcomp right (even though the verbs do have a sense with iobj)?

@nschneid
Contributor

@amir-zeldes thoughts on "allow" and "permit" (see above)?

@amir-zeldes
Contributor

? I could have sworn I commented on allow somewhere but can't see the comment now... Yes, I'm on board with allow etc., because of "allow/permit me the honor of..."

@nschneid
Contributor

nschneid commented Apr 30, 2023

So TBC, "allow/permit X to Y" should always be iobj? In GUM it's currently obj half the time.

@amir-zeldes
Contributor

It definitely shouldn't be half and half... I will fix it one way or the other, but remind me - are we definitely sure we want this for xcomp? At first I thought maybe this construction should have its own status and be left as always obj, because it never alternates with a prepositional dative, and it is not considered a violation of double object to have obj + xcomp.

On the other hand, I see the case for making the same distinction here as with ccomp, and in "ask him to do" we have the same lexical entry for "ask" as in "ask him that he go ...". What's more, in dative-marking languages we do see the case distinction, so cross-linguistically this would be nice and consistent:

  • She made me do it : sie hat mich.ACC gezwungen, es zu machen
  • She advised me to do it : sie hat mir.DAT geraten, es zu machen

So I guess in sum I would say yeah, I would be OK with using iobj for allow/permit in English based on the alternation behavior with non-infinitival complements. What do you think?

@nschneid
Contributor

nschneid commented May 1, 2023

I think the underlying principle is that removing a complement (whether obj, ccomp, or xcomp) should not change the deprel of the remaining object if the meaning doesn't change. The test for iobj is whether the verb could combine it with obj or ccomp. Whether an xcomp could be present is irrelevant to the obj vs. iobj distinction. (Otherwise, we would end up with "ask him/iobj (a question/obj)" but "ask him/obj to leave/xcomp", as you point out, so the deprel for the first object would not be invariant to dropping the xcomp.)
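
One way to look for violations of this invariance in a treebank is to list verbs whose first object is labeled both ways. The Python sketch below assumes the observations have already been extracted as `(lemma, object_deprel, clausal_deprel)` triples; the function name and the sample triples are hypothetical. It only surfaces candidates for manual review, since a sole object can legitimately be obj for some uses of a verb and iobj for others.

```python
from collections import defaultdict

def mixed_label_lemmas(observations):
    """Return {lemma: {deprel: count}} for verbs seen with both obj and iobj.

    observations: iterable of (lemma, object_deprel, clausal_deprel) triples,
    where clausal_deprel is the complement type the object co-occurred with.
    """
    seen = defaultdict(lambda: defaultdict(int))
    for lemma, obj_rel, _clausal_rel in observations:
        seen[lemma][obj_rel] += 1
    # A lemma is flagged only when more than one object label was observed.
    return {lemma: dict(rels) for lemma, rels in seen.items() if len(rels) > 1}

# Made-up observations for illustration; not drawn from any treebank.
OBSERVATIONS = [
    ("ask", "iobj", "ccomp"),
    ("ask", "obj", "xcomp"),
    ("make", "obj", "xcomp"),
    ("tell", "iobj", "xcomp"),
]

if __name__ == "__main__":
    print(mixed_label_lemmas(OBSERVATIONS))
```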

The reason this policy feels like a stretch for "allow" and "permit" is that these USUALLY do not occur with two objects or object+ccomp. They feel like raising verbs: "we allowed him to leave" = allow(we, leave(he)) as opposed to the control interpretation allow(we, he, leave(he)). But they have an infrequent double object usage: "we'll allow him this indulgence" = allow(we, he, indulgence).

I think we need an exception for raising interpretations anyway, due to cases like

  • newly released papers, showing him complicit in the airliner bombing
    • has nothing to do with showing something to him

But I'm not sure that allows ellipsis of the xcomp:

  • *There was a question of whether he was complicit in the bombing, and in fact, the evidence showed him. (= showed him so, showed him complicit)
    • Compare: He wanted to leave, and we allowed him.

@amir-zeldes
Contributor

Hm, OK - I think the raising argument doesn't hold for allow/permit on a formal level (you don't get the actual 'raising' behavior with expletives like with "seem" or "happen"), so I guess our guidelines force our hand here. If you really don't like it on permit/allow, then I think the only way to exempt them is by arguing that the ditransitive variants are archaic, and maintain a list of exempt verbs that these should go on. These are edge cases so I don't care too deeply which side they land on, but we should be consistent, I can implement it either way.

@nschneid
Contributor

nschneid commented May 1, 2023

I wouldn't call them archaic. "I'll allow you three wishes" is perfectly fine.

I suppose we should just go with iobj, even though semantically it can stretch beyond typical characteristics like animacy when coupled with an infinitival xcomp. (It can even be an event: "We'll allow the meeting to be scheduled on Saturday.")

@amir-zeldes
Contributor

We'll allow the meeting to be scheduled on Saturday

That's completely fine by me! None of this is meant to capture semantic classes IMO - just the opposite, the corpus can now allow us to find non-person entities which function as indirect objects. There are plenty of examples in the data, also with 'give':

  • it is reasonable to give its views greater weight
  • gave it a quick tug

So I'll go with iobj for allow/permit then.

@amir-zeldes
Contributor

I guess this applies to cause + xcomp too, so these should be fixed:

https://universal.grew.fr/?custom=644ff8215ebdc

@nschneid
Contributor

nschneid commented May 1, 2023

Although I guess we can't assign all of them iobj because it's a situation like "tell"—the sole object could be either obj or iobj:

  • I'll allow you/iobj three wishes/obj.
  • I'll allow you/iobj to make three wishes.
  • I'll allow three wishes/obj.
  • I'll allow three wishes/obj to be made.
  • (Do you want to make some wishes?) I'll allow you/iobj.

@nschneid
Contributor

nschneid commented May 1, 2023

With allow/prevent/cause and one object, should the criterion be that affected entities are iobj, and everything else is obj?

  • a change in the climate caused the skies to cloud over
  • back legs were pushed in causing the dresser to lean into the wall

These are not animate, but they are affectees, and could occur in a double object like "cause the dresser damage". So I guess iobj. But in "I'll allow three wishes", "three wishes" is an event.

It could be ambiguous whether an entity is an affectee of causing/allowing or not:

  • The airline allows one pet.
    • Interpretation where the pet is given permission to board: iobj - cf. The airline allows one pet boarding privileges
    • Interpretation where somebody is allowed to bring one pet: obj - cf. The airline allows passengers one pet

@amir-zeldes
Contributor

Although I guess we can't assign all of them iobj because it's a situation like "tell"—the sole object could be either obj or iobj

Absolutely, I'm running a script on GUM based on entity type and confirming changes one by one

should the criterion be that affected entities are iobj

I think you can use that as a first heuristic, but no, I don't think that's the criterion. The real criterion is just 'which slot does it occupy in the ditransitive version':

  • a change in the climate caused the skies to cloud over
    • iobj, because it's the first object in the transformed version "The change caused the sky clouding over" (admittedly awkward, but a CFG allowing "caused the sky harm" should accept it)

Conversely, "a change in the climate caused clouding to occur in the sky" is obj, under the same interpretation.

It could be ambiguous whether an entity is an affectee of causing/allowing or not

Right, and again I would expect them to show up in the corresponding alternation slots in paraphrases.

amir-zeldes added a commit to amir-zeldes/gum that referenced this issue May 1, 2023
amir-zeldes added a commit to amir-zeldes/gum that referenced this issue May 1, 2023
@nschneid
Contributor

nschneid commented May 1, 2023

@nschneid
Contributor

nschneid commented Oct 28, 2023


8 participants