* Description
  * Requires VoID descriptions containing: each class with its number of entities, each property with its number of triples, and the numbers of distinct subjects and distinct objects.
  * For triple patterns with bound variables that are not covered by the VoID statistics, the system sends a SPARQL ASK query to the pre-selected endpoints (those remaining after predicate and type selection). The predicate is not required to be bound.
  * Query optimization uses dynamic programming and prefers bushy join trees.
  * Source code: rdffederator - Revision 192: /trunk/src http://code.google.com/p/rdffederator/
* Steps
  1. Make the corresponding VoID files.
    * Method 1:
      * 1.1 Download the data:
        * drugbank: http://download.bio2rdf.org/release/3/drugbank/ (drugbank.nq.gz, 19-Dec-2014 13:23, 52M)
        * omim: http://download.bio2rdf.org/release/3/omim/ (omim.nq.gz, 11-Nov-2014 03:50, 124M)
        * pharmgkb: http://download.bio2rdf.org/release/3/pharmgkb/ (diseases.nq.gz, genes.nq.gz, rsid.nq.gz, drugs.nq.gz, offsides.nq.gz, twosides.nq.gz; 01-Jun-2014 17:54, 2.4G or so)
      * 1.2 Generate the VoID file: ./generate_void.sh ../data/drugbank.nq drugbank_void.n3
{{{
counting triples and properties
counting types and entities
counting distinct objects
counting distinct subjects
}}}
        * drugbank: time taken: 91 seconds
        * omim: time taken: 411 seconds
    * Method 2: Make the VoID files with the statistics queries on the following page: [https://github.com/bio2rdf/bio2rdf-scripts/wiki/Bio2RDF-Release-3-Summary-Statistics]
  2. Execute: ./SPLENDID.sh SPLENDID-config.n3 query/query1.txt
    * With the VoID files from method 1, the following errors occur:
{{{
20:13:36 [WARN ] [Rio error] Expected ':', found '>' (465, -1) (ParseErrorLogger.java:28)
20:13:36 [WARN ] [Rio error] Namespace prefix 'LBSL' used but not defined (465,-1) (ParseErrorLogger.java:28)
20:13:36 [ERROR] [Rio fatal] Not a valid (absolute) URI: null (465, -1) (ParseErrorLogger.java:32)
20:13:36 [ERROR] can not parse VOID file file:/opt/services/fsearch/app/splendid/void/omim.n3: Not a valid (absolute) URI: null [line 465] (VoidStatistics.java:407)
}}}
    * Check the reported lines in the corresponding VoID files (line 465 of omim.n3 in the log above) and correct them. We simply made the following change: void:property fusion> => void:property
* Result
  * Case 1: 5 results.
||Query\time (ms) ||time 1 ||time 2 ||time 3 ||time 4 ||time 5 ||
||case1 ||6724 ||4763 ||4766 ||4765 ||4763 ||
  * Case 2: error. Caused by: org.openrdf.query.QueryEvaluationException: Virtuoso 37000 Error SP030: SPARQL compiler, line 2: syntax error at '"GLUCOCORTICOID-REMEDIABLE"' before '"i"'
  * Case 3: only part of the results (216) was returned before a timeout occurred while processing the request (org.apache.commons.httpclient.HttpMethodDirector executeWithRetry). With "limit 100", it took (ms): 348080, 341991, 339716, 339089, 340024
  * Case 4: timeout after 76 results were returned. With "limit 100", it took (ms): 822992, 619742
  * Case 5: timeout after 304 results were returned. With "limit 100", it took (ms): 387507, 148426, 130344, 130165, 130181
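For reference, the four counts reported by generate_void.sh in step 1.2 (triples per property, entities per type, distinct objects, distinct subjects) can be sketched in Python as below. This is only an illustration of what the statistics mean, not the script's actual implementation. The names `void_statistics`, `LINE` and `SAMPLE` and the example.org data are hypothetical, and the regular expression is a simplified N-Quads pattern that assumes one statement per line. Every line is counted as one triple, so a triple repeated in several graphs is counted more than once.

```python
import re
from collections import Counter

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

# Simplified N-Quads pattern (assumption: one statement per line).
# Groups: subject, predicate, object, optional graph label.
LINE = re.compile(
    r'^\s*(<[^>]*>|_:\S+)\s+'
    r'(<[^>]*>)\s+'
    r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?)\s*'
    r'(<[^>]*>|_:\S+)?\s*\.\s*$'
)


def void_statistics(lines):
    """Collect VoID-style counts from an iterable of N-Quads lines."""
    triples_per_property = Counter()
    entities_per_class = {}          # class IRI -> set of typed subjects
    subjects, objects = set(), set()
    total = 0
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue                 # skip blank lines and comments
        match = LINE.match(line)
        if match is None:
            continue                 # skip lines the simplified pattern cannot parse
        s, p, o, _graph = match.groups()
        total += 1
        subjects.add(s)
        objects.add(o)
        triples_per_property[p] += 1
        if p == RDF_TYPE:
            entities_per_class.setdefault(o, set()).add(s)
    return {
        "triples": total,
        "distinct_subjects": len(subjects),
        "distinct_objects": len(objects),
        "property_triples": dict(triples_per_property),
        "class_entities": {c: len(e) for c, e in entities_per_class.items()},
    }


# Hypothetical sample data, not taken from the Bio2RDF dumps.
SAMPLE = [
    '<http://example.org/d1> ' + RDF_TYPE + ' <http://example.org/Drug> <http://example.org/g> .',
    '<http://example.org/d2> ' + RDF_TYPE + ' <http://example.org/Drug> <http://example.org/g> .',
    r'<http://example.org/d1> <http://example.org/name> "Aspirin \"tablet\""@en <http://example.org/g> .',
    '<http://example.org/d1> <http://example.org/target> <http://example.org/t1> <http://example.org/g> .',
]

if __name__ == "__main__":
    for key, value in void_statistics(SAMPLE).items():
        print(key, value)
```

For a real dump the function can be fed an open file object, for example `void_statistics(open("../data/drugbank.nq", encoding="utf-8"))`. The sets of subjects and objects are held in memory, which may be too much for the largest files.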