AttributeValueMatcher not stripping datatype string hence not matching values correctly #4

Open
ziqizhang opened this issue May 4, 2017 · 0 comments

@ziqizhang
Owner

TODO

(Thanks to Josef Janoušek from the Odalic project)

"The score method of AttributeValueMatcher ( https://github.com/ziqizhang/sti/blob/master/sti-main/src/uk/ac/shef/dcs/sti/core/scorer/AttributeValueMatcher.java#L104 ) was not able to match the input cell value 694 (of datatype NUMBER) with the attribute whose value is "694"^^http://www.w3.org/2001/XMLSchema#positiveInteger. In the DBpedia knowledge base, the text representation of a literal attribute value also contains the datatype (according to XML Schema), as shown at https://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=select+distinct+%3Fp+%3Fo+where+{%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FA_Game_of_Thrones%3E+%3Fp+%3Fo}&format=text%2Fhtml&CXML_redir_for_subjs=121&CXML_redir_for_hrefs=&timeout=30000&debug=on . Because the value was not matched, all attributes had the score 0.0 and the relation was not discovered.
So I changed how attributes are collected for matching ( https://github.com/ziqizhang/sti/blob/master/sti-main/src/uk/ac/shef/dcs/sti/core/algorithm/tmp/TColumnColumnRelationEnumerator.java#L65 ): when the attribute value contains "^^", I cut off the datatype part of the string and set only the number (e.g. 694) as the value of the attribute. I also set the valueURI of the attribute to null, because otherwise the method classifyAttributeValueDataType of AttributeValueMatcher ( https://github.com/ziqizhang/sti/blob/master/sti-main/src/uk/ac/shef/dcs/sti/core/scorer/AttributeValueMatcher.java#L185 ) sets the datatype to named_entity. After these changes the value of the attribute is just 694 and its datatype is set to NUMBER, so the score method of AttributeValueMatcher is able to match it with the input cell value and the relation is discovered."
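
The string handling described above could look roughly like the sketch below. It is only an illustration, not the actual patch: the class TypedLiteralSketch and its methods are hypothetical names that do not exist in the sti codebase, and the sketch assumes the datatype marker is the last "^^" in the value. Setting valueURI to null on the attribute is left out, because the Attribute API is not reproduced in this issue.

```java
import java.util.Objects;

/**
 * Hypothetical helper (not part of sti): strips an XML Schema datatype
 * suffix such as ^^http://www.w3.org/2001/XMLSchema#positiveInteger
 * from the text representation of a literal attribute value.
 */
final class TypedLiteralSketch {

    // Marker that separates the lexical form from the datatype.
    private static final String DATATYPE_SEPARATOR = "^^";

    private TypedLiteralSketch() {
    }

    /** Returns true if the raw value carries a "^^" datatype marker. */
    static boolean hasDatatype(String rawValue) {
        Objects.requireNonNull(rawValue, "rawValue");
        return rawValue.contains(DATATYPE_SEPARATOR);
    }

    /**
     * Returns the value without its datatype part, or the input unchanged
     * when there is no "^^" marker. Surrounding double quotes around the
     * lexical form are removed as well.
     */
    static String stripDatatype(String rawValue) {
        Objects.requireNonNull(rawValue, "rawValue");
        // Assumption: the datatype follows the last "^^" in the string.
        int separator = rawValue.lastIndexOf(DATATYPE_SEPARATOR);
        if (separator < 0) {
            return rawValue;
        }
        String lexical = rawValue.substring(0, separator).trim();
        if (lexical.length() >= 2
                && lexical.startsWith("\"")
                && lexical.endsWith("\"")) {
            lexical = lexical.substring(1, lexical.length() - 1);
        }
        return lexical;
    }
}
```

With such a helper, the attribute value would be replaced by the result of stripDatatype only when hasDatatype returns true, so plain values and named entities would pass through unchanged.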
