Design: Background

Jonathan Yu edited this page Feb 6, 2017 · 2 revisions

Background

The Data Provider Node ontology was developed as part of the eReefs project. A requirement in eReefs is to be able to discover Data Provider Node services and datasets, and to validate their conformance to, and availability within, the overall eReefs information architecture. The eReefs information architecture was envisaged as a distributed system, shown in the figure below, in which several Data Provider Nodes (DPNs) are orchestrated to provide access to data via agreed conventions. Past experience showed that a centralised information system limits reuse of the overall system. Moreover, these nodes already exist, so they needed to be leveraged rather than changing current governance arrangements and practices.

[Figure: eReefs information architecture]

Rationale

Establishing, or realising, DPNs would conceptually allow them to be reused in other contexts, thus promoting maximal reuse of the information infrastructure, as depicted in the figure below.

[Figure: DPNs]

To meet this requirement, a catalogue of services and datasets is needed, along with lightweight metadata about the DPN, its services and service interfaces, dataset content, provenance, and physical endpoints.

Key enablers are tools that a DPN can use to provide appropriate metadata about its datasets and about the services used to access those datasets, and to verify and validate whether its systems conform to the eReefs conventions (or to other conventions if and when required).
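As a rough illustration of the validation tooling described above, the sketch below checks a DPN's service descriptions against a convention expressed as a set of mandatory metadata fields. The `ServiceDescription` class, its fields (`interface`, `endpoint`, `datasets`), the allowed interface list and the rules themselves are all hypothetical; they are not taken from the eReefs conventions.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceDescription:
    """Hypothetical, minimal description of one service hosted by a DPN."""
    name: str
    interface: str = ""  # e.g. "WMS" or "OPeNDAP" (illustrative values only)
    endpoint: str = ""   # physical endpoint URL
    datasets: list[str] = field(default_factory=list)


# Hypothetical convention: accepted service interfaces.
ALLOWED_INTERFACES = {"WMS", "WFS", "OPeNDAP", "SOS"}


def check_conformance(services: list[ServiceDescription]) -> list[str]:
    """Return human-readable violations; an empty list means the node conforms."""
    violations = []
    for svc in services:
        if svc.interface not in ALLOWED_INTERFACES:
            violations.append(f"{svc.name}: unsupported interface {svc.interface!r}")
        if not svc.endpoint.startswith(("http://", "https://")):
            violations.append(f"{svc.name}: missing or non-HTTP endpoint")
        if not svc.datasets:
            violations.append(f"{svc.name}: no datasets declared")
    return violations


if __name__ == "__main__":
    node_services = [
        ServiceDescription("ocean-model-wms", "WMS", "https://example.org/wms", ["hydro-v1"]),
        ServiceDescription("legacy-ftp", "FTP", "", []),
    ]
    for problem in check_conformance(node_services):
        print(problem)
```

Returning a list of violations, rather than a single pass/fail flag, lets a data provider see everything that needs fixing in one run.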

Initial approach

A machine-readable and semantically rich format/model/schema was defined so that tools can automate the discovery of services and datasets. The DPN ontology was proposed as a lightweight way to describe the services and datasets a data provider may host and publish, and thus as a mechanism for capturing the appropriate metadata for a DPN. The guiding principle was to define an ontology with as little ontological commitment as possible, allowing maximal reuse across a variety of service and dataset descriptions.

The DPN ontology (DPN-O) currently defines a minimal set of classes required to describe a data provider node, its services and service interfaces, and its datasets. Dataset description in DPN-O alone is currently just a stub. This was done to accommodate the varied ways in which datasets are described (e.g. DCAT, DCAT-AP, the data.gov.au Dataset ontology, VoID).
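To make the core model more concrete, the sketch below describes a node with one catalogue, one service and one dataset as RDF triples and prints them as simple Turtle, using only the standard library. The namespace `http://example.org/dpn#` and the properties `dpn:hasCatalog`, `dpn:hostsService`, `dpn:lists` and `dpn:providesAccessTo` are placeholders invented for this example; they are not the published DPN-O terms.

```python
# Placeholder namespaces; the real DPN-O namespace IRI is not given on this page.
PREFIXES = {
    "dpn": "http://example.org/dpn#",
    "ex": "http://example.org/node/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

# (subject, predicate, object) triples written as prefixed names.
TRIPLES = [
    ("ex:demo-node", "rdf:type", "dpn:Node"),
    ("ex:demo-catalog", "rdf:type", "dpn:Catalog"),
    ("ex:demo-wms", "rdf:type", "dpn:Service"),
    ("ex:demo-dataset", "rdf:type", "dpn:Dataset"),
    ("ex:demo-node", "dpn:hasCatalog", "ex:demo-catalog"),
    ("ex:demo-node", "dpn:hostsService", "ex:demo-wms"),
    ("ex:demo-catalog", "dpn:lists", "ex:demo-dataset"),
    ("ex:demo-wms", "dpn:providesAccessTo", "ex:demo-dataset"),
]


def to_turtle(prefixes: dict[str, str], triples: list[tuple[str, str, str]]) -> str:
    """Serialise prefixed-name triples as plain, ungrouped Turtle statements."""
    lines = [f"@prefix {p}: <{iri}> ." for p, iri in prefixes.items()]
    lines.append("")
    lines += [f"{s} {p} {o} ." for s, p, o in triples]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_turtle(PREFIXES, TRIPLES))
```

Richer dataset metadata would be attached to `ex:demo-dataset` using whichever vocabulary the provider already uses (for example DCAT), which is what leaving the dataset description as a stub makes possible.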

DPN ontology

DPN-O features only bare-bones classes such as Node, Service, Dataset and Catalog. To be useful, however, it requires vocabularies and extension classes. These are provided by add-on modules that extend DPN-O, such as the semantic service descriptions module. The DPN semantic service descriptions module (DPN-S) is used to verify and validate a DPN against the eReefs conventions.
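One way to picture an add-on module is as a set of extra classes that specialise the core ones. In the sketch below, a hypothetical service-description module declares `dpns:WMSService` and `dpns:OPeNDAPService` as subclasses of `dpn:Service`, and a transitive `rdfs:subClassOf` lookup lets tooling that only knows about `dpn:Service` still recognise the extended descriptions. All class names here are illustrative and are not taken from DPN-S.

```python
# Subclass axioms contributed by a hypothetical add-on module.
# Each entry reads: child class -> set of direct parent classes.
SUBCLASS_OF = {
    "dpns:WMSService": {"dpn:Service"},
    "dpns:OPeNDAPService": {"dpn:Service"},
    "dpns:ThreddsOPeNDAPService": {"dpns:OPeNDAPService"},
}


def superclasses(cls: str) -> set[str]:
    """Return every ancestor of `cls`, following subclass links transitively."""
    seen: set[str] = set()
    stack = [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_instance_of(declared_type: str, target: str) -> bool:
    """True if a resource typed `declared_type` is also an instance of `target`."""
    return declared_type == target or target in superclasses(declared_type)


if __name__ == "__main__":
    print(is_instance_of("dpns:ThreddsOPeNDAPService", "dpn:Service"))  # True
    print(is_instance_of("dpn:Dataset", "dpn:Service"))  # False
```

Keeping the core classes small and letting modules add specialisations means generic tooling keeps working as new service types are introduced; in practice an OWL/RDFS reasoner would perform this inference rather than hand-written code.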

Once a DPN meets conventions such as the eReefs conventions, it can be integrated into the eReefs information architecture. Describing nodes, services and datasets with DPN-O enables tooling to be developed for the automated discovery of services and datasets, and is a first step towards semantic brokering between user requests for datasets and the datasets held by DPNs. See [Data Brokering].
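In its simplest form, the automated discovery mentioned above reduces to pattern matching over the triples that DPNs publish. The sketch below merges descriptions from two hypothetical nodes and answers "which service endpoints give access to a dataset tagged with a given keyword?". The properties `dpn:providesAccessTo`, `dpn:endpoint` and `dpn:keyword`, and every resource name, are placeholders. A real broker would more likely issue SPARQL queries against a triple store, and semantic brokering would need vocabulary mappings rather than exact keyword matches.

```python
# Triples harvested from two hypothetical DPNs (placeholder identifiers).
NODE_A = [
    ("a:wms", "dpn:providesAccessTo", "a:sst"),
    ("a:wms", "dpn:endpoint", "https://a.example.org/wms"),
    ("a:sst", "dpn:keyword", "sea surface temperature"),
]
NODE_B = [
    ("b:opendap", "dpn:providesAccessTo", "b:salinity"),
    ("b:opendap", "dpn:endpoint", "https://b.example.org/opendap"),
    ("b:salinity", "dpn:keyword", "salinity"),
]


def match(triples, s=None, p=None, o=None):
    """Yield triples that match the pattern; None acts as a wildcard."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t


def find_endpoints(triples, keyword: str) -> list[str]:
    """Return endpoints of services that give access to datasets tagged `keyword`."""
    endpoints = []
    for dataset, _, _ in match(triples, p="dpn:keyword", o=keyword):
        for service, _, _ in match(triples, p="dpn:providesAccessTo", o=dataset):
            endpoints += [ep for _, _, ep in match(triples, s=service, p="dpn:endpoint")]
    return endpoints


if __name__ == "__main__":
    catalogue = NODE_A + NODE_B  # the broker's merged view of both nodes
    print(find_endpoints(catalogue, "salinity"))
```

Because each node publishes its own description, the broker only needs to merge what it harvests; no central system has to own the data.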