Building an RDF browser using Elasticsearch, Kibi and Node-RED

François Belleau edited this page Sep 15, 2017 · 10 revisions

Participants

François Belleau, 2010 Biohacker

Objective

For years we have been building data stores exposed as linked data. One main obstacle to semantic web adoption by researchers is the lack of user-friendly software for analysing life science data in RDF. The goal of this project is to illustrate the potential of the popular data analysis tools Kibana and Elasticsearch for that purpose.

In 2010, I had the privilege of participating in the Tokyo Biohackathon. At the time, the Bio2RDF Virtuoso triplestore was in its early days and I was a fan of this emerging technology. My main contribution then was to promote the semantic web and explain to biohackers its advantages over traditional approaches. OpenLink's Virtuoso software has been central to many successful projects. At that time UniProt, Ensembl, PubChem, Reactome and GO did not have their own SPARQL endpoints; Bio2RDF's mission was to expose the major databases as linked data. Seven years later, production linked data stores in life science are a reality, and our community has been very effective at promoting them. Despite these successes, the lack of a proper linked data browser designed for end users remains an obstacle to browsing semantic web (SW) triplestores.

Seven years later, now that so many semantic web projects (identifiers.org, schema.org, JSON-LD) have gained maturity, many of the initial problems have been solved. Remember TBL's Tabulator software: it was a great idea but never evolved into production software. We all used the Virtuoso Facet Browser to explore and build our projects but, frankly, it has never been a tool suited to end users.

The BioMart project (http://www.biomart.org/) has built a solid community of end users. It took the SRS community to the next level, with open source software that was adopted by major data providers (HGNC, MGI, Reactome, Ensembl, etc.). An attempt was made to create a SPARQL-engine-like interface over the data store, but it was not a success, and this relational database project could not get close to the agility of a triplestore. Its major advantage was a really cool user interface that was the same across different web sites. In my personal experience, there is not yet such an appropriate user interface to help end users adopt triplestores on a daily basis. This is still my goal.

As a data analyst, I use the ELK stack (Elasticsearch, Logstash, Kibana) daily to do my job. Here, I propose to explore the potential of this open source software, to illustrate that the linked data browser for life scientists may already exist.

Finally, I will use the Kibi stack from the Siren company instead of Kibana, because of its built-in plugins and relational capability.

Day 1, Install a Kibi + Elasticsearch (ES) and Node-RED server

The first step was to install the tools.

Day 2, Upload DisGeNet associations and Wikidata Homo sapiens genes into ES

Data were extracted with Node-RED from the DisGeNet SPARQL endpoint and the Wikidata Linked Data Fragments (LDF) API.
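The extraction itself was done in Node-RED, and that flow is not reproduced here. As a rough illustration of what querying a SPARQL endpoint involves, here is a minimal Python sketch that builds a SPARQL 1.1 Protocol GET request asking for JSON results. The endpoint URL and the query are placeholders chosen for the example, not the ones used in the project.

```python
import json
import urllib.parse
import urllib.request


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL GET request that asks for JSON results."""
    # The SPARQL protocol passes the query text in the "query" URL parameter.
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


# Placeholder endpoint and generic query, for illustration only.
ENDPOINT = "http://example.org/sparql"
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"

request = build_sparql_request(ENDPOINT, QUERY)

# To run it against a real endpoint, uncomment:
# with urllib.request.urlopen(request) as response:
#     results = json.load(response)
```

Nothing is sent until `urlopen` is called, so the request can be inspected first (`request.full_url`) before pointing it at a real endpoint.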

Day 3, Upload GO from Ontobee and OMIM from Bio2RDF into ES

Data were extracted with Node-RED from the Bio2RDF and Ontobee SPARQL endpoints.
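To give an idea of the transformation between a SPARQL result and Elasticsearch, the following Python sketch flattens a result in the standard SPARQL JSON results format into plain documents and serialises them for the Elasticsearch bulk API. It is not the Node-RED flow used during the hackathon; the index name, field names and sample values are hypothetical, and older Elasticsearch releases (before 7.x) also expect a `_type` in the action line.

```python
import json


def bindings_to_docs(sparql_json):
    """Flatten SPARQL JSON results into one plain dict per result row."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in sparql_json["results"]["bindings"]
    ]


def to_bulk_ndjson(docs, index, id_field=None):
    """Serialise docs as bulk API lines: an action line, then the document."""
    lines = []
    for doc in docs:
        action = {"index": {"_index": index}}
        if id_field is not None and id_field in doc:
            # Reuse the resource URI as document id so reloads overwrite.
            action["index"]["_id"] = doc[id_field]
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API requires every line, including the last, to end with "\n".
    return "".join(line + "\n" for line in lines)


# Hypothetical one-row result, for illustration only.
sample = {
    "head": {"vars": ["term", "label"]},
    "results": {"bindings": [
        {"term": {"type": "uri", "value": "http://example.org/term/1"},
         "label": {"type": "literal", "value": "example label"}},
    ]},
}

docs = bindings_to_docs(sample)
payload = to_bulk_ndjson(docs, index="example_index", id_field="term")
print(payload, end="")
```

The resulting payload would be POSTed to the `_bulk` endpoint of the Elasticsearch server with the `application/x-ndjson` content type.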

Day 4, Build the Kibana user interface in Kibi

The demo web site

http://vps146209.vps.ovh.ca/goto/cec8f5d474926905c6b5ae2aaad5f3ba

The tabular, text-oriented and graphical user interface for the mashup of the 4 databases.

Node-RED workflow to copy Bio2RDF's OMIM into ES

Activating the Kibi relational connection between the DisGeNet index and the Wikidata index, based on the ncbigene identifier, to find associations related to Paget disease.

The corresponding documents are selected in the Wikidata index!
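Kibi performs this join with its own relational machinery, which is not shown here. The effect can be approximated by hand in two steps: collect the shared identifier from the documents matched in one index, then filter the other index with an Elasticsearch `terms` query. The Python sketch below only builds that query; the field name `ncbigene` and the sample documents are assumptions made for the example.

```python
def terms_query_from_docs(source_docs, source_field, target_field):
    """Build a terms query selecting target docs that share an identifier."""
    # Deduplicate and sort the identifiers found in the source documents.
    values = sorted(
        {doc[source_field] for doc in source_docs if source_field in doc}
    )
    return {"query": {"terms": {target_field: values}}}


# Hypothetical hits from the association index for one disease.
association_hits = [
    {"disease": "disease-A", "ncbigene": "gene-1"},
    {"disease": "disease-A", "ncbigene": "gene-2"},
    {"disease": "disease-A", "ncbigene": "gene-1"},
]

# Query body to send to the gene index to select the matching gene documents.
query = terms_query_from_docs(association_hits, "ncbigene", "ncbigene")
print(query)
```

This emulation ships the identifier list through the client, so it only suits small result sets; a server-side join avoids that round trip.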
