Partial mention matches get higher scores than full ones #123

Open
abhinavkulkarni opened this issue Oct 31, 2022 · 0 comments


Hi,

One quirk I have observed is that partial mentions (for example, a surname on its own) seem to get scored higher by the ED (entity disambiguation) step than full mentions of the same person. The following example illustrates the point:

import requests
import spacy


nlp = spacy.load('en_core_web_trf')


API_URL = "http://0.0.0.0:5555"
text_doc = """In early September, in just 48 hours the UK got a new prime minister (Liz Truss) and a new king (Charles III, following the death of Queen Elizabeth II).

Both take over at a turbulent time in British politics, with no shortage of current and future challenges. To name just a few: a stagnant economy, sky-high energy prices, more Brexit fallout with the EU, and Scots demanding a fresh independence vote.

On GZERO World, Ian Bremmer speaks to former British PM Tony Blair (1997-2007), who believes there will be a lot of uncertainty over the next year or two if Truss insists on big tax cuts and big borrowing.

Blair also looks back at the queen's legacy and the future of the monarchy, explains why Brexit will hurt but probably not fragment the UK, and defends why we need to return to his comfort zone of the political center to fix today's problems.
"""

doc = nlp(text_doc)

# Collect a (start offset, length) pair for every PERSON entity spaCy finds
spans = []
for ent in doc.ents:
    if ent.label_ == 'PERSON':
        span = (ent.start_char, len(ent.text))
        spans.append(span)

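# Send the text together with the pre-computed spans to the ED endpoint
ed_result = requests.post(API_URL, json={
    "text": text_doc,
    "spans": spans
}).json()

# Print one result row per mention
for result in ed_result:
    print(result)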

I get the following output:

[70, 9, 'Liz Truss', 'Liz_Truss', 0.3872783780234141, 0.0, 'NULL']
[97, 11, 'Charles III', 'Charles,_Prince_of_Wales', 0.3447332806264307, 0.0, 'NULL']
[139, 12, 'Elizabeth II', 'Elizabeth_II', 0.5253115976087314, 0.0, 'NULL']
[423, 11, 'Ian Bremmer', 'Ian_Bremmer', 0.3872783780234141, 0.0, 'NULL']
[463, 10, 'Tony Blair', 'Prime_Minister_of_the_United_Kingdom', 0.5042874506104144, 0.0, 'NULL']
[564, 5, 'Truss', 'Liz_Truss', 0.7929572470920269, 0.0, 'NULL']
[614, 5, 'Blair', 'Tony_Blair', 0.959426865481041, 0.0, 'NULL']

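As can be seen, the partial mention Truss gets a much higher score (0.79) than the full mention Liz Truss (0.39), even though both link to Liz_Truss. Similarly, Blair scores 0.96 and links to Tony_Blair, while the full mention Tony Blair scores only 0.50 and links to Prime_Minister_of_the_United_Kingdom.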
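To make the comparison easier to reproduce, here is a minimal sketch that flags every case where a shorter mention outscores a longer mention containing it. The rows are copied from the output above. The field meanings (index 2 = mention text, index 3 = linked entity, index 4 = the score discussed here) are my reading of that output rather than something taken from documentation, and `partial_beats_full` is just a hypothetical helper name.

```python
# Rows copied verbatim from the output above.
# Assumed layout: [start, length, mention, entity, score, <other score>, <tag>]
results = [
    [70, 9, 'Liz Truss', 'Liz_Truss', 0.3872783780234141, 0.0, 'NULL'],
    [97, 11, 'Charles III', 'Charles,_Prince_of_Wales', 0.3447332806264307, 0.0, 'NULL'],
    [139, 12, 'Elizabeth II', 'Elizabeth_II', 0.5253115976087314, 0.0, 'NULL'],
    [423, 11, 'Ian Bremmer', 'Ian_Bremmer', 0.3872783780234141, 0.0, 'NULL'],
    [463, 10, 'Tony Blair', 'Prime_Minister_of_the_United_Kingdom', 0.5042874506104144, 0.0, 'NULL'],
    [564, 5, 'Truss', 'Liz_Truss', 0.7929572470920269, 0.0, 'NULL'],
    [614, 5, 'Blair', 'Tony_Blair', 0.959426865481041, 0.0, 'NULL'],
]


def partial_beats_full(rows):
    """Return (partial, partial_entity, partial_score, full, full_entity, full_score)
    for each pair where the partial mention is one word of the full mention
    and still receives the higher score."""
    flagged = []
    for _, _, part, part_ent, part_score, *_ in rows:
        for _, _, full, full_ent, full_score, *_ in rows:
            # A "partial" mention here is a single token of a longer mention.
            is_partial = part != full and part in full.split()
            if is_partial and part_score > full_score:
                flagged.append((part, part_ent, part_score, full, full_ent, full_score))
    return flagged


flagged = partial_beats_full(results)
for part, part_ent, part_score, full, full_ent, full_score in flagged:
    print(f"{part!r} -> {part_ent} ({part_score:.2f})  vs  "
          f"{full!r} -> {full_ent} ({full_score:.2f})")
```

The token-containment check is deliberately crude (it would miss, say, nicknames), but it is enough to surface both the Truss and the Blair pairs in this example.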

Thanks!

abhinavkulkarni changed the title from "Partical mention matches get higher scores than full ones" to "Partial mention matches get higher scores than full ones" on Dec 10, 2022