Skip to content

kuhumcst/DanNet

Repository files navigation

DanNet logo

DanNet is a WordNet for the Danish language. The goal of this project is to represent DanNet in full using RDF as its native representation at both the database level, in the application space, and as its primary serialisation format.

Compatibility

Special care has been taken to maximise the compatibility of this iteration of DanNet. Like the DanNet of yore, the base dataset is published as both RDF (Turtle) and CSV. RDF is the native representation and can be loaded as-is inside a suitable RDF graph database, e.g. Apache Jena. The CSV files are now published along with column metadata as CSVW.

Companion datasets

Apart from the base DanNet dataset, several companion datasets exist expanding the graph with additional data. The companion datasets collectively provide a broader view of the data with both implicit and explicit links to other data:

  • The COR companion dataset links DanNet resources to IDs from the COR project.
  • The DDS companion dataset decorates DanNet resources with sentiment data.
  • The OEWN extension companion dataset provides DanNet-like labels for the Open English WordNet to better facilitate browsing the connections between the two datasets.

The current version of the datasets can be downloaded on wordnet.dk/dannet. All of the releases from 2023 and onwards are available as releases on this project page.

Inferred data

Additional data is also implicitly inferred from the base dataset, the aforementioned companion datasets, and any associated ontological metadata. These inferred data points can be browsed along with the rest of the data on the official DanNet website.

Inferring data so can be both computationally expensive and mentally taxing for the consumer of the data, so we do not always publish the fully inferred graph in a DanNet release; when we do, those release will be specifically marked as containing this extra data.

Standards-based

The old DanNet was modelled as tables inside a relational database. Two serialised representations also exist: RDF/XML 1.0 and a custom CSV format. The latter served as input for the new data model, remapping the relations described in these files onto a modern WordNet based on the Ontolex-lemon standard combined with various relations defined by the Global Wordnet Association as used in the official GWA RDF standard.

In Ontolex-lemon...

  • Synsets are analogous to ontolex:LexicalConcept.
  • Word senses are analogous to ontolex:LexicalSense.
  • Words are analogous to ontolex:LexicalEntry.
  • Forms are analogous to ontolex:Form.

alt text

By building DanNet according to these standards we maximise its ability to integrate with other lexical resources, in particular with other WordNets.

Significant changes

New schema, prefixes, URIs

DanNet uses a new schema, available in this repository and also at https://wordnet.dk/dannet/schema.

DanNet uses the following URI prefixes for the dataset instances, concepts (members of a dns:ontologicalType) and the schema itself:

NOTE: these new prefixes/URIs take over from the ones used for DanNet 2.2 (the last version before the 2023 re-release):

All the new URIs resolve to HTTP resources, which is to say that accessing a resource with a GET request (e.g. through a web browser) returns data for the resource (or schema) in question.

Finally, the new DanNet schema is written in accordance with the RDF conventions listed by Philippe Martin.

Implementation