* Description

  • Requires VoID descriptions that list each class with its entity count, each property with its triple count, and the numbers of distinct subjects and objects.
  • For triple patterns with bound variables that are not covered by the VoID statistics, the system sends a SPARQL ASK query to the pre-selected endpoints (those remaining after predicate and type selection). There is no restriction on whether the predicate is bound.
  • Uses dynamic programming and prefers bushy trees.
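The source selection described above can be sketched as follows. This is a minimal illustration, not SPLENDID's actual code: the `VoidStats` class, the `select_sources` and `build_ask` functions, and the `ask` callback (which would send the query to an endpoint) are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class VoidStats:
    """Hypothetical per-endpoint summary of a VoID description."""
    endpoint: str
    property_triples: dict = field(default_factory=dict)  # predicate IRI -> triple count
    class_entities: dict = field(default_factory=dict)    # class IRI -> entity count


def build_ask(s, p, o):
    """Render a triple pattern as a SPARQL ASK query; None means an unbound variable."""
    def term(value, var):
        if value is None:
            return "?" + var
        if value.startswith('"'):      # already a quoted literal
            return value
        return "<" + value + ">"       # otherwise treat it as an IRI
    return "ASK { %s %s %s }" % (term(s, "s"), term(p, "p"), term(o, "o"))


def select_sources(pattern, stats, ask):
    """Return the endpoints that may answer the triple pattern (s, p, o)."""
    s, p, o = pattern
    # Step 1: predicate and type selection using only the VoID statistics.
    if p is None:
        candidates = list(stats)
    elif p == RDF_TYPE and o is not None:
        candidates = [v for v in stats if o in v.class_entities]
    else:
        candidates = [v for v in stats if p in v.property_triples]
    # Step 2: a bound subject or object is not covered by the statistics,
    # so confirm each remaining candidate with an ASK query.
    needs_ask = s is not None or (o is not None and p != RDF_TYPE)
    if needs_ask:
        query = build_ask(s, p, o)
        candidates = [v for v in candidates if ask(v.endpoint, query)]
    return [v.endpoint for v in candidates]
```

Because the ASK step only runs over the endpoints that survive step 1, the number of remote requests stays small when the predicate is bound.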

* source code: rdffederator - Revision 192: /trunk/src

 http://code.google.com/p/rdffederator/

* Steps:

  1. Make the corresponding VoID files
  • method 1:

1.1 Data download:

drugbank:  http://download.bio2rdf.org/release/3/drugbank/ (drugbank.nq.gz, 19-Dec-2014 13:23, 52M)

omim:  http://download.bio2rdf.org/release/3/omim/ (omim.nq.gz, 11-Nov-2014 03:50, 124M)

pharmgkb:  http://download.bio2rdf.org/release/3/pharmgkb/ (diseases.nq.gz, genes.nq.gz, rsid.nq.gz, drugs.nq.gz, offsides.nq.gz, twosides.nq.gz, 01-Jun-2014 17:54, about 2.4G)

1.2 Generate the VoID file from the decompressed dump: ./generate_void.sh ../data/drugbank.nq drugbank_void.n3

    counting triples and properties
    counting types and entities
    counting distinct objects
    counting distinct subjects

drugbank: time taken: 91 seconds

omim: time taken: 411 seconds
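To show what the four counting stages produce, here is a small Python sketch that computes the same kinds of statistics. It is not the logic of generate_void.sh. The function name `void_counts` is hypothetical, and the parser assumes one well-formed N-Quads statement per line with the graph term always present.

```python
from collections import Counter

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def void_counts(lines):
    """Count triples per property, entities per class, distinct subjects and objects.

    Assumption: every non-empty line is `subject predicate object graph .`
    """
    triples_per_property = Counter()
    entities_per_class = {}
    subjects, objects = set(), set()
    total = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Subject and predicate never contain whitespace; the object may.
        subject, predicate, rest = line.split(None, 2)
        if rest.endswith("."):
            rest = rest[:-1].rstrip()
        # The last whitespace-separated term is the graph label.
        obj, _graph = rest.rsplit(None, 1)
        total += 1
        triples_per_property[predicate] += 1
        subjects.add(subject)
        objects.add(obj)
        if predicate == RDF_TYPE:
            entities_per_class.setdefault(obj, set()).add(subject)
    return {
        "triples": total,
        "property_triples": dict(triples_per_property),
        "class_entities": {c: len(s) for c, s in entities_per_class.items()},
        "distinct_subjects": len(subjects),
        "distinct_objects": len(objects),
    }
```

The sets are held in memory, so this form only suits small samples; for dumps the size of pharmgkb a sort-based or streaming approach would be needed.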

  • method 2:

Generate the VoID files with the statistics queries listed on the following page:  https://github.com/bio2rdf/bio2rdf-scripts/wiki/Bio2RDF-Release-3-Summary-Statistics

2. execute: ./SPLENDID.sh SPLENDID-config.n3 query/query1.txt

With the VoID files from method 1, the following errors occur:

20:13:36 [WARN ] [Rio error] Expected ':', found '>' (465, -1) (ParseErrorLogger.java:28)
20:13:36 [WARN ] [Rio error] Namespace prefix 'LBSL' used but not defined (465,-1) (ParseErrorLogger.java:28)
20:13:36 [ERROR] [Rio fatal] Not a valid (absolute) URI: null (465, -1) (ParseErrorLogger.java:32)
20:13:36 [ERROR] can not parse VOID file file:/opt/services/fsearch/app/splendid/void/omim.n3: Not a valid (absolute) URI: null [line 465] (VoidStatistics.java:407)

Check the corresponding VoID file at the reported line (line 465 of omim.n3 in the log above) and repair the malformed property values. We simply make the following change:

void:property fusion> => void:property <fusion>
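When many lines are affected, the same change can be applied mechanically. The sketch below is one possible approach, not part of the SPLENDID tooling: the function name `repair_void_property` is hypothetical, and it assumes the only defect is a `void:property` value that ends in `>` but lacks the opening `<`.

```python
import re

# Matches a void:property value that ends with '>' but does not start with '<'.
_MISSING_BRACKET = re.compile(r"(void:property\s+)(?!<)(\S+?)>")


def repair_void_property(line):
    """Add the missing '<' to a malformed void:property value; leave other lines unchanged."""
    return _MISSING_BRACKET.sub(r"\1<\2>", line)


# Example use on a whole file (paths are placeholders):
# with open("omim.n3") as src, open("omim_fixed.n3", "w") as dst:
#     for line in src:
#         dst.write(repair_void_property(line))
```

Writing to a separate output file keeps the original available for comparison in case the pattern matches something unexpected.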

Result:

case 1: 5 results

Query \ time (ms)   time 1   time 2   time 3   time 4   time 5   Avg
case1               6724     4763     4766     4765     4763     4764
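The notes do not say how the timings were taken. One simple way to collect comparable wall-clock numbers is sketched below; the helper `time_runs` is hypothetical, and because it times the whole process (including JVM startup) its figures may differ from timings measured inside the application.

```python
import subprocess
import time


def time_runs(command, runs=5):
    """Run `command` `runs` times and return each wall-clock duration in milliseconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command exits with a non-zero status.
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        durations.append(round((time.perf_counter() - start) * 1000))
    return durations


# Example use with the command from step 2:
# print(time_runs(["./SPLENDID.sh", "SPLENDID-config.n3", "query/query1.txt"]))
```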

case 2: error: Caused by: org.openrdf.query.QueryEvaluationException: Virtuoso 37000 Error SP030: SPARQL compiler, line 2: syntax error at '"GLUCOCORTICOID-REMEDIABLE"' before '"i"'

case 3: partial results (216) were returned, then a timeout occurred while processing the request (org.apache.commons.httpclient.HttpMethodDirector executeWithRetry).

With "limit 100", the runs took (ms): 348080, 341991, 339716, 339089, 340024; avg 340205

case 4: timeout after 76 results were returned

With "limit 100", the runs took (ms): 822992, 619742

case 5: timeout after 304 results were returned

With "limit 100", the runs took (ms): 387507, 148426, 130344, 130165, 130181; avg 134779
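The reported averages match the mean of runs 2 to 5 rather than of all five runs, which suggests the first run was treated as a warm-up. This is an inference from the numbers, not something the notes state. The arithmetic can be checked with the short sketch below; the function name `steady_state_avg` is made up for this example.

```python
from statistics import mean


def steady_state_avg(times_ms):
    """Mean of all runs except the first, rounded to the nearest millisecond."""
    return round(mean(times_ms[1:]))


runs = {
    "case 1": [6724, 4763, 4766, 4765, 4763],
    "case 3 (limit 100)": [348080, 341991, 339716, 339089, 340024],
    "case 5 (limit 100)": [387507, 148426, 130344, 130165, 130181],
}

for name, times in runs.items():
    print(name, steady_state_avg(times))
# case 1 4764
# case 3 (limit 100) 340205
# case 5 (limit 100) 134779
```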
