Virtuoso Freebase Setup

sameersingh edited this page Sep 23, 2014 · 5 revisions

These notes keep track of how Virtuoso was set up with the Freebase dump, and how to query it using SPARQL.

Creating the dump

  1. Install Virtuoso Open Source on Ubuntu using sudo aptitude install virtuoso-opensource
  2. Ensure /var/lib/virtuoso-opensource-6.1/db is linked to a disk with plenty of free space.
  3. Download the Freebase dump into the db folder using wget http://download.freebaseapps.com/
  4. Gunzip it (requires ~330 GB of disk space): gunzip freebase-rdf-*.gz
  5. Load the RDF triples into Virtuoso:
    • Start the SQL client (1111 is Virtuoso's default SQL port): isql-vt 1111
    • Register the load request: SQL> ld_dir('.', 'freebase-rdf-*', 'http://freebase.com');
    • Check that the request was registered: SQL> select * from DB.DBA.load_list;
    • Run the loader: SQL> rdf_loader_run();
    • To monitor progress, count the triples per graph from another isql-vt window: SQL> SPARQL SELECT ?g COUNT(*) { GRAPH ?g {?s ?p ?o.} } GROUP BY ?g ORDER BY DESC 2;
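The isql-vt statements above can also be scripted. This is a minimal Python sketch, not part of the original setup: it assumes `isql-vt` is on the PATH, the server listens on port 1111, and the default `dba`/`dba` login is still in place (change these if yours differ). The helper names are hypothetical. It pipes the statements to `isql-vt` on standard input.

```python
import subprocess

# Same statements as the manual steps; '.' is resolved relative to the
# server's working directory (the db folder), where the dump must live.
LOAD_STATEMENTS = [
    "ld_dir('.', 'freebase-rdf-*', 'http://freebase.com');",
    "select * from DB.DBA.load_list;",
    "rdf_loader_run();",
]


def build_isql_script(statements):
    """Join the statements into one script, one statement per line."""
    return "\n".join(statements) + "\n"


def run_isql(script, port=1111, user="dba", password="dba"):
    """Pipe the script into isql-vt (port and credentials are assumptions)."""
    return subprocess.run(
        ["isql-vt", str(port), user, password],
        input=script,
        text=True,
        capture_output=True,
        check=True,  # raise if isql-vt exits with a non-zero status
    )
```

Usage: `run_isql(build_isql_script(LOAD_STATEMENTS))`. Since `rdf_loader_run()` blocks until the load finishes, the per-graph count query is still best run from a separate session to watch progress.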

Resources

Querying Freebase Using SPARQL

Based on https://groups.google.com/forum/#!topic/sindicetech-freebase/93PBGJBnnIU.

Basic steps for each query:

  1. Point a browser to http://localhost:8890/sparql.
  2. Check that the query produces the expected output (use http://freebase.com as the Graph IRI).
  3. Run the same query with curl, with result limits turned off, TSV as the output format, etc.
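The curl step can be sketched in Python as well. This assumes the endpoint and graph IRI above; `query` and `default-graph-uri` are standard SPARQL 1.1 Protocol parameters, while `format` is assumed to be Virtuoso's parameter for choosing the result type (check the Virtuoso SPARQL service documentation). The helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINT = "http://localhost:8890/sparql"
GRAPH_IRI = "http://freebase.com"


def sparql_url(query, endpoint=ENDPOINT, graph=GRAPH_IRI,
               fmt="text/tab-separated-values"):
    """Build a GET URL for the endpoint.

    'query' and 'default-graph-uri' come from the SPARQL 1.1 Protocol;
    'format' is assumed to be Virtuoso's result-type parameter.
    """
    params = {"default-graph-uri": graph, "query": query, "format": fmt}
    return endpoint + "?" + urlencode(params)


def run_query(query, timeout=600):
    """Fetch results as text; the local server must be running.

    The 600-second timeout is an arbitrary choice for long queries.
    """
    with urlopen(sparql_url(query), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `run_query("SELECT COUNT(*) { ?s ?p ?o }")` should return the triple count as TSV text. With curl, the same three parameters can be sent via `curl -G` plus one `--data-urlencode` option per parameter.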

Queries

For the complete reference, see Freebase types and relations, and Virtuoso SPARQL service.

Get the number of triples in the database:

SELECT COUNT(*) { 
  ?s ?p ?o
} 

Get all relations of an entity, given its Freebase MID (first 10 results):

PREFIX ns: <http://rdf.freebase.com/ns/>
select * where {
   ns:m.014zcr ?p ?o
} 
LIMIT 10
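Results fetched as TSV can be split into rows for further processing. This is a minimal Python sketch, assuming the SPARQL 1.1 TSV layout (a header row of variable names, then one tab-separated row per result, with IRIs written as `<...>`). Virtuoso's exact header style may differ, so both `?p` and `"p"` are accepted. The function names are hypothetical.

```python
def parse_tsv(text):
    """Split TSV results into (header, rows) of raw cell strings.

    Header names are normalized by stripping quotes and a leading '?'.
    Cells are returned as-is (e.g. <iri> or "literal"@en), not decoded.
    """
    lines = [line for line in text.splitlines() if line]
    if not lines:
        return [], []
    header = [h.strip('"').lstrip("?") for h in lines[0].split("\t")]
    rows = [line.split("\t") for line in lines[1:]]
    return header, rows


def strip_iri(cell):
    """Turn <http://...> into http://...; leave other cells unchanged."""
    if cell.startswith("<") and cell.endswith(">"):
        return cell[1:-1]
    return cell
```

Literals are deliberately left encoded, since decoding language tags and escapes is only needed for some downstream uses.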

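For multi-hop relations, chain triple patterns through the intermediate node. For example, this query follows an actor's film performances to the films, and returns each film's English name and initial release date: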

PREFIX ns: <http://rdf.freebase.com/ns/>
select * where {
  ns:m.014zcr ns:film.actor.film ?film_performance .
  ?film_performance ns:film.performance.film ?film .
  ?film ns:type.object.name ?name .
  ?film ns:film.film.initial_release_date ?initial_release_date .
  FILTER(lang(?name) = 'en')
} 
LIMIT 1
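To run the multi-hop query for other actors, the MID can be filled in programmatically. This is a hypothetical Python helper; its MID check (`m.` followed by lowercase letters, digits or underscores) is a loose assumption rather than Freebase's exact ID alphabet, and mainly guards against injecting arbitrary text into the query.

```python
import re

# Loose MID check (assumption): "m." followed by [0-9a-z_].
MID_PATTERN = re.compile(r"^m\.[0-9a-z_]+$")

# The multi-hop query above, with the MID and LIMIT as placeholders.
# Literal braces are doubled so str.format leaves them intact.
FILMS_QUERY = """PREFIX ns: <http://rdf.freebase.com/ns/>
select * where {{
  ns:{mid} ns:film.actor.film ?film_performance .
  ?film_performance ns:film.performance.film ?film .
  ?film ns:type.object.name ?name .
  ?film ns:film.film.initial_release_date ?initial_release_date .
  FILTER(lang(?name) = 'en')
}}
LIMIT {limit}"""


def films_query(mid, limit=1):
    """Return the multi-hop film query for the given actor MID."""
    if not MID_PATTERN.match(mid):
        raise ValueError(f"not a Freebase MID: {mid!r}")
    return FILMS_QUERY.format(mid=mid, limit=int(limit))
```

The returned string can be pasted into the SPARQL page or sent with curl as in the basic steps above.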