* Description
- Requires VoID descriptions with the following statistics: each class and its entity count, each property and its triple count, and the numbers of distinct subjects and objects.
- For triple patterns with bound variables that are not covered by the VoID statistics, the system sends a SPARQL ASK query to the pre-selected endpoints (those remaining after predicate and type selection). This is not limited to patterns with a bound predicate.
- Join order is optimized with dynamic programming, and bushy trees are preferred.
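The source selection described above can be sketched roughly as follows. This is only an illustration, not SPLENDID's actual code: `VoidIndex`, its fields, `select_sources`, and the `ask` callback are hypothetical names, and the ASK request is passed in as a function instead of being sent to a real endpoint. The sketch also assumes that an ASK check is used whenever the subject is bound, or the object is bound outside an `rdf:type` pattern.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class VoidIndex:
    """Hypothetical in-memory view of the VoID statistics of all endpoints."""
    # predicate URI -> endpoints whose VoID file lists that property
    by_predicate: Dict[str, Set[str]] = field(default_factory=dict)
    # class URI -> endpoints whose VoID file lists that class
    by_class: Dict[str, Set[str]] = field(default_factory=dict)
    all_endpoints: Set[str] = field(default_factory=set)


def select_sources(
    index: VoidIndex,
    s: Optional[str],
    p: Optional[str],
    o: Optional[str],
    ask: Callable[[str, Optional[str], Optional[str], Optional[str]], bool],
) -> List[str]:
    """Return candidate endpoints for one triple pattern (None = variable)."""
    # Step 1: pre-selection from the VoID statistics (predicate and type).
    if p is None:
        candidates = set(index.all_endpoints)
    elif p == RDF_TYPE and o is not None:
        candidates = set(index.by_class.get(o, set()))
    else:
        candidates = set(index.by_predicate.get(p, set()))

    # Step 2: bound terms the statistics cannot answer are checked with ASK,
    # but only against the endpoints that survived step 1.
    needs_ask = s is not None or (o is not None and p != RDF_TYPE)
    if needs_ask:
        candidates = {e for e in candidates if ask(e, s, p, o)}
    return sorted(candidates)
```

In a real setup the `ask` callback would issue something like `ASK { <s> <p> ?o }` against each remaining endpoint; pre-selecting first keeps the number of such requests small.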
* source code: rdffederator - Revision 192: /trunk/src
* Steps:
- make the corresponding VoID files
- method 1:
1.1 data download:
drugbank: http://download.bio2rdf.org/release/3/drugbank/ drugbank.nq.gz 19-Dec-2014 13:23 52M
omim: http://download.bio2rdf.org/release/3/omim/ omim.nq.gz 11-Nov-2014 03:50 124M
pharmgkb: http://download.bio2rdf.org/release/3/pharmgkb/ diseases.nq.gz, genes.nq.gz, rsid.nq.gz, drugs.nq.gz, offsides.nq.gz, twosides.nq.gz 01-Jun-2014 17:54 about 2.4G
1.2 generate the VoID file: ./generate_void.sh ../data/drugbank.nq drugbank_void.n3
the script reports its progress as: counting triples and properties, counting types and entities, counting distinct objects, counting distinct subjects
drugbank: time taken: 91 seconds
omim: time taken: 411 seconds
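As a rough illustration of the four counting passes, the statistics can be computed from an N-Quads dump as below. This is not the `generate_void.sh` script, whose internals are not shown on this page; `void_counts` and the simplified line regex are assumptions for the sketch. It keeps all distinct terms in memory and counts a triple once per graph it appears in, so it may be unsuitable for the larger dumps.

```python
import re
from collections import Counter, defaultdict

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

# Simplified N-Quads term: IRI, blank node, or literal with optional
# datatype or language tag. Not a full implementation of the grammar.
TERM = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?)'
LINE = re.compile(rf'^\s*{TERM}\s+{TERM}\s+{TERM}(?:\s+{TERM})?\s*\.\s*$')


def void_counts(lines):
    """Collect the statistics a VoID description needs from N-Quads lines."""
    triples_per_property = Counter()      # property -> triple count
    entities_per_class = defaultdict(set)  # class -> set of typed subjects
    subjects, objects = set(), set()
    total = 0
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue  # skip blank lines, comments, and unparsable lines
        s, p, o = m.group(1), m.group(2), m.group(3)
        total += 1
        triples_per_property[p] += 1
        if p == RDF_TYPE:
            entities_per_class[o].add(s)
        subjects.add(s)
        objects.add(o)
    return {
        "triples": total,
        "properties": dict(triples_per_property),
        "classes": {c: len(e) for c, e in entities_per_class.items()},
        "distinct_subjects": len(subjects),
        "distinct_objects": len(objects),
    }
```

It could be fed a file object directly, for example `void_counts(open("../data/drugbank.nq", encoding="utf-8"))`.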
- method 2:
Create the VoID files using the statistics queries listed on the following page: https://github.com/bio2rdf/bio2rdf-scripts/wiki/Bio2RDF-Release-3-Summary-Statistics
2. execute: ./SPLENDID.sh SPLENDID-config.n3 query/query1.txt
With the VoID files generated by method 1, the following errors occur:
20:13:36 [WARN ] [Rio error] Expected ':', found '>' (465, -1) (ParseErrorLogger.java:28)
20:13:36 [WARN ] [Rio error] Namespace prefix 'LBSL' used but not defined (465,-1) (ParseErrorLogger.java:28)
20:13:36 [ERROR] [Rio fatal] Not a valid (absolute) URI: null (465, -1) (ParseErrorLogger.java:32)
20:13:36 [ERROR] can not parse VOID file file:/opt/services/fsearch/app/splendid/void/omim.n3: Not a valid (absolute) URI: null [line 465] (VoidStatistics.java:407)
Check the reported line in the corresponding VoID file and repair it. We simply made the following change:
void:property fusion> => void:property <fusion>
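If many lines are affected, the same change can be applied with a small script. The following is a minimal sketch that assumes every broken line has exactly the shape shown above (a `void:property` value missing its opening `<`); `repair_line` is a hypothetical helper, not part of SPLENDID or the generator script.

```python
import re

# Matches a void:property value that ends with '>' but lacks the opening '<'.
# Values that already start with '<' and prefixed names without '>' are left alone.
BROKEN = re.compile(r'(void:property\s+)([^\s<>]+)>')


def repair_line(line: str) -> str:
    """Insert the missing '<' in front of an unterminated-looking IRI."""
    return BROKEN.sub(r'\1<\2>', line)


def repair_file(src_path: str, dst_path: str) -> None:
    """Write a repaired copy of a VoID file, line by line."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(repair_line(line))
```

Running the repair twice is harmless, since a line that is already well-formed no longer matches the pattern.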
Result:
case 1: 5 results
Query\time(ms) | time 1 | time 2 | time 3 | time 4 | time 5 | Avg (times 2-5) |
case1 | 6724 | 4763 | 4766 | 4765 | 4763 | 4764 |
case 2: error: Caused by: org.openrdf.query.QueryEvaluationException: Virtuoso 37000 Error SP030: SPARQL compiler, line 2: syntax error at '"GLUCOCORTICOID-REMEDIABLE"' before '"i"'
case 3: only part of the results (216) was returned before a timeout occurred while processing the request (org.apache.commons.httpclient.HttpMethodDirector executeWithRetry).
with "limit 100", it took (ms): 348080, 341991, 339716, 339089, 340024; avg 340205
case 4: timeout after 76 results were returned
with "limit 100", it took (ms): 822992, 619742
case 5: timeout after 304 results were returned
with "limit 100", it took (ms): 387507, 148426, 130344, 130165, 130181; avg 134779
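The reported averages match the mean of the second to fifth runs, so the first run appears to be treated as a warm-up and left out. This is inferred from the numbers, not stated on the page. A short check, with `warm_average` as a hypothetical helper name:

```python
from statistics import mean


def warm_average(times_ms):
    """Mean of all runs except the first, rounded to whole milliseconds."""
    if len(times_ms) < 2:
        raise ValueError("need at least two runs to drop the first one")
    return round(mean(times_ms[1:]))


# Timings copied from the results above (milliseconds).
runs = {
    "case 1": [6724, 4763, 4766, 4765, 4763],
    "case 3, limit 100": [348080, 341991, 339716, 339089, 340024],
    "case 5, limit 100": [387507, 148426, 130344, 130165, 130181],
}

for name, times in runs.items():
    print(name, warm_average(times))
```

Case 4 is omitted because only two timings were recorded and no average was reported for it.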
Attachments
- SPLENDID-config.n3 (2.0 KB) - added by wu 10 years ago.
- drugbank.n3 (69.1 KB) - added by wu 10 years ago.
- kegg.n3 (74.3 KB) - added by wu 10 years ago.
- omim.n3 (51.1 KB) - added by wu 10 years ago.
- pharmgkb.n3 (52.9 KB) - added by wu 10 years ago.
- sider.n3 (21.2 KB) - added by wu 10 years ago.