Why no sparql for each question ? #4

Open
WaNePr opened this issue Jul 23, 2019 · 23 comments

Comments

@WaNePr

WaNePr commented Jul 23, 2019

Thanks for the dataset, but why is there no SPARQL query for each question?

@kelvin-jiang
Owner

Hi,

These results were extracted from a preprocessed version of Freebase, so we did not use SPARQL queries.

Kelvin

@WaNePr
Author

WaNePr commented Jul 24, 2019

If no SPARQL queries are provided, how should one use this dataset properly?

@kelvin-jiang
Owner

The dataset includes the Freebase MIDs that directly correspond to entities in Freebase. Also, a preprocessed subset of Freebase is linked in the README.

@WaNePr
Author

WaNePr commented Jul 24, 2019

I see, but to simulate the execution of a SPARQL query, shouldn't we know the query form, e.g. SELECT, ASK, SELECT DISTINCT, etc.?

@kelvin-jiang
Owner

Unfortunately, I was not the one who worked on preprocessing Freebase, so I can't provide you with any sample SPARQL queries. Sorry about that.

@WaNePr
Author

WaNePr commented Jul 24, 2019

I am wondering how to check the correct answer to a question if only the variables and predicates are known but not the query format.
Would you mind pointing out the person responsible for the pre-processing, or referring this question to them?

Many thanks !

@kelvin-jiang
Owner

I'm not sure what you mean by format, but the object node in each Freebase triple is always the answer to the question.

@WaNePr
Author

WaNePr commented Jul 27, 2019

How do you evaluate the results?

@WaNePr
Author

WaNePr commented Jul 29, 2019

Is it possible to provide the pre-processing files and the evaluation scripts so that we know how to evaluate the results?

@kelvin-jiang
Owner

This dataset should be used to evaluate your KBQA results; there is no need to evaluate the dataset itself. It was previously labelled and assessed by human annotators.

@WaNePr
Author

WaNePr commented Jul 29, 2019

I do know that the dataset is labelled and assessed by human annotators.

My question is: I see that you provided a subset of Freebase (a 2.2 GB zip) for evaluation, but in the subset only the EntityMid is given, not the corresponding EntityName. Therefore I would like to know where I could find a NameFile for the subset, so that we can do the evaluation if we use this dataset.
P.S. It could be generated during pre-processing, which is why I asked whether you could share the pre-processing files.

Many Thanks.

@kelvin-jiang
Owner

If you want to get from EntityMid to EntityName, use the type.object.name predicates within the Freebase subset. They map a Freebase MID to its name, and you can filter by the language tag (e.g. @en) to get language-specific names. An example from the subset: m.010gj6wc type.object.name "Prague"@en.
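
For example, a minimal way to build this MID-to-name mapping in Python could look like the sketch below (just an illustration; it assumes the subset is stored as one tab-separated triple per line, like the example above, and the file name is a placeholder):

```python
# Sketch: build an EntityMid -> English name map from the Freebase subset.
# Assumptions (not confirmed by the repo): one tab-separated triple per line, e.g.
#   m.010gj6wc <TAB> type.object.name <TAB> "Prague"@en
# and "freebase-subset.txt" is a placeholder path.

def load_mid_to_name(path="freebase-subset.txt", lang="@en"):
    mid_to_name = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) != 3:
                continue
            subj, pred, obj = parts
            # Keep only name triples in the requested language, e.g. "Prague"@en
            if pred == "type.object.name" and obj.endswith(lang):
                mid_to_name[subj] = obj[: -len(lang)].strip('"')
    return mid_to_name

# Usage:
# names = load_mid_to_name()
# names.get("m.010gj6wc")  # -> "Prague", if that triple is present
```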

@WaNePr
Author

WaNePr commented Jul 29, 2019

Can every EntityMid within the subset be indexed this way?

@kelvin-jiang
Owner

Yes, it should be; this is all taken straight from the original Freebase data dumps. As a result, it is possible for some unpopular Freebase MIDs to be missing labels.

@WaNePr
Author

WaNePr commented Jul 29, 2019

But the unpopular Freebase MIDs without labels are not in the FreebaseQA dataset, am I right?

WaNePr closed this as completed Jul 29, 2019
WaNePr reopened this Jul 29, 2019
@kelvin-jiang
Owner

Theoretically, yes; if they didn't have a label, our algorithm would not have been able to pick them up.

@WaNePr
Author

WaNePr commented Aug 1, 2019

To evaluate performance, one can either count the final AnswersMid as correct or count the TopicEntityMid plus the inferential chain as correct. Which way did you use?

@kelvin-jiang
Owner

The TopicEntityMid refers to the Freebase MID for some entity in the question, not the answer. Instead, you probably want to evaluate your model's performance with the final AnswersMid and AnswersName.
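
For example, a rough accuracy computation along those lines could look like this (just a sketch, not an official evaluation script; the top-level "Questions" key, the file name, and the one-prediction-per-question setup are assumptions about how you might structure things):

```python
import json

# Sketch: count a predicted MID (or name) as correct if it matches any AnswersMid,
# or case-insensitively any AnswersName, across a question's parses.
# "FreebaseQA-eval.json" and the "Questions" key are assumptions about the layout.

def accuracy(pred_by_qid, path="FreebaseQA-eval.json"):
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    questions = data["Questions"] if isinstance(data, dict) else data
    correct = 0
    for q in questions:
        gold_mids, gold_names = set(), set()
        for parse in q["Parses"]:
            for answer in parse["Answers"]:
                gold_mids.add(answer["AnswersMid"])
                gold_names.update(n.lower() for n in answer["AnswersName"])
        pred = pred_by_qid.get(q["Question-ID"], "")
        if pred in gold_mids or pred.lower() in gold_names:
            correct += 1
    return correct / len(questions)
```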

@WaNePr
Author

WaNePr commented Aug 2, 2019

I assume that finding the AnswersMid is equivalent to finding the TopicEntityMid plus the InferentialChain. Is it?

@WaNePr
Author

WaNePr commented Aug 2, 2019

Another question: mediator nodes are not, as described in your paper, nodes that "do not have a name or alias associated with them", right? They appear to actually have names in the Freebase subset.

@kelvin-jiang
Owner

No, mediator nodes should not have names (like m.010gj6wc type.object.name "Prague"@en), even in the Freebase subset.

@WaNePr
Author

WaNePr commented Aug 19, 2019

Is it common across the dataset that we are not able to uniquely determine the answer to a question by querying the Freebase subset with the corresponding TopicEntityMid and InferentialChain?

For example:
{
  "Question-ID": "FreebaseQA-eval-31",
  "RawQuestion": "Valencia was the venue for the 2007 and 2010 America's Cup, as the defending yacht was from which landlocked country?",
  "ProcessedQuestion": "valencia was the venue for the 2007 and 2010 america's cup, as the defending yacht was from which landlocked country",
  "Parses": [
    {
      "Parse-Id": "FreebaseQA-eval-31.P0",
      "PotentialTopicEntityMention": "2010 america's cup",
      "TopicEntityName": "2010 america's cup",
      "TopicEntityMid": "m.03hh8pp",
      "InferentialChain": "user.jamie.default_domain.yacht_racing.competition.competitor..user.jamie.default_domain.yacht_racing.competitor.country",
      "Answers": [
        {
          "AnswersMid": "m.06mzp",
          "AnswersName": [
            "switzerland"
          ]
        }
      ]
    }
  ]
}
We can get two answers, m.06mzp (Switzerland) and m.09c7w0 (United States), given the annotated topic entity m.03hh8pp and the inferential chain user.jamie.default_domain.yacht_racing.competition.competitor..user.jamie.default_domain.yacht_racing.competitor.country. Without looking up the entity descriptions, one cannot use the clue "landlocked country" in the question to narrow down the answer set.

Another example:
{
  "Question-ID": "FreebaseQA-eval-35",
  "RawQuestion": "On the 2014 Winter Olympic Games who did the British men's curling team play in the final?",
  "ProcessedQuestion": "on the 2014 winter olympic games who did the british men's curling team play in the final",
  "Parses": [
    {
      "Parse-Id": "FreebaseQA-eval-35.P0",
      "PotentialTopicEntityMention": "2014 winter olympic games",
      "TopicEntityName": "2014 winter olympics",
      "TopicEntityMid": "m.03mfdg",
      "InferentialChain": "olympics.olympic_games.participating_countries",
      "Answers": [
        {
          "AnswersMid": "m.0d060g",
          "AnswersName": [
            "canada"
          ]
        }
      ]
    }
  ]
}
How can we know the answer should be Canada given only the inferential chain olympics.olympic_games.participating_countries, since many countries participated in the Olympic Games that year?
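
To make this concrete, here is roughly how I am retrieving the candidate answers (my own sketch; the tab-separated triple format and splitting the chain on ".." to get the two hops through a mediator node are my assumptions):

```python
from collections import defaultdict

# Sketch: follow the (possibly two-hop) InferentialChain from the TopicEntityMid
# through the subset's triples and collect every object reached.
# Assumes one tab-separated triple per line and that ".." separates the two hops.

def load_graph(path="freebase-subset.txt"):
    graph = defaultdict(list)  # (subject, predicate) -> [objects]
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) == 3:
                subj, pred, obj = parts
                graph[(subj, pred)].append(obj)
    return graph

def candidate_answers(graph, topic_mid, chain):
    frontier = [topic_mid]
    for pred in chain.split(".."):  # one or two hops
        frontier = [obj for node in frontier for obj in graph[(node, pred)]]
    return set(frontier)

# With the first example above, candidate_answers(...) can return both
# m.06mzp (Switzerland) and m.09c7w0 (United States).
```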

@kelvin-jiang
Owner

These examples probably shouldn't have been included in the FreebaseQA dataset, since the inferential chains don't completely reflect the meaning behind the questions. The dataset isn't perfect; as I mentioned earlier, labelling was done by human annotators, so some bad examples occasionally pop up.
