survey – TogoRDF

Version 13 (updated by wu, 14 years ago)
--

Triple Store Survey for Life Science Data

Overview
Platform
Data
Approach
Database
- Bigdata
- 4store
- Virtuoso
- Owlim-se
- Mulgara

Summarize

Overview

Platform

* Machine:

OS: GNU/Linux
CPU: GenuineIntel 6; model name: Intel(R) Xeon(R) CPU E5649 @ 2.53GHz; 12 cores, 24 hyper-threads
Mem: 65996128 kB
Harddisk: SCSI RAID 0 (three 1 TB hard disks, two of which are used to store data)

* Software:

JDK: 1.6.0_26
Virtuoso: 6.3 commercial
OwlimSE: 4.3.4238
Mulgara: 2.1.12
4store: 1.1.4
Bigdata: RWSTORE_1_1_0

Data

Allie: .n3 format, 94,420,989 triples. SPARQL queries: attachment:allie.txt.

PDBJ: .rdf.gz format, 589,987,335 triples, 77,878 files, from ftp://ftp.pdbj.org/XML/rdf/. SPARQL queries: attachment:pdbj.txt.

The PDBJ queries are point queries that retrieve the characteristics related to a given EntryID, such as 107L. Their result sets are therefore small, but each query involves a large number of joins.

UniProt: .rdf.gz format, about 4 billion triples; the three largest files are uniprot.rdf.gz, uniparc.rdf.gz, and uniref.rdf.gz, from ftp://ftp.uniprot.org/pub/databases/uniprot/ (the experiment used the November 2011 release). SPARQL queries: attachment:uniprot.txt or http://beta.sparql.uniprot.org/.

DDBJ: .rdf.gz format, about 8 billion triples, 330 files, from ftp://ftp.ddbj.nig.ac.jp/ddbj_database/ddbj/. SPARQL queries: attachment:ddbj.txt.

Approach

We evaluated each data set on every SPARQL endpoint at least twice to make sure that there was no large difference between the two test values.
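
The repeat-and-compare check described above could be sketched as follows. This is only an illustration: the survey does not say how large a difference was considered acceptable, so the function name `runs_agree` and the 10% relative `tolerance` are assumptions.

```python
from typing import Sequence


def runs_agree(
    first: Sequence[float],
    second: Sequence[float],
    tolerance: float = 0.1,  # assumed threshold; not stated in the survey
) -> bool:
    """Return True if every pair of timings differs by at most `tolerance`.

    The difference is measured relative to the larger of the two values,
    so the result does not depend on which run is passed first.
    """
    if len(first) != len(second):
        raise ValueError("both runs must time the same number of queries")
    for a, b in zip(first, second):
        larger = max(a, b)
        # Two zero timings are treated as identical.
        if larger > 0 and abs(a - b) / larger > tolerance:
            return False
    return True
```

If the two runs disagree, a further run would be needed before the values are recorded.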

For the query evaluation, we executed the whole query mix (the queries run in sequence) five times on every SPARQL endpoint and then took the average execution time of each query.
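
A minimal sketch of this measurement loop is shown below. It is not the script used in the survey: `average_query_times` is a hypothetical name, and `run_query` stands for whatever callable sends one query to the endpoint under test. The clock is a parameter so that the logic can be checked without a live endpoint.

```python
import time
from statistics import mean
from typing import Callable, List, Sequence


def average_query_times(
    queries: Sequence[str],
    run_query: Callable[[str], object],  # hypothetical: sends one query to an endpoint
    rounds: int = 5,
    clock: Callable[[], float] = time.perf_counter,
) -> List[float]:
    """Run the whole query mix `rounds` times; return mean seconds per query."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    samples: List[List[float]] = [[] for _ in queries]
    for _ in range(rounds):
        # Execute the mix in its fixed sequence, timing each query separately.
        for i, query in enumerate(queries):
            start = clock()
            run_query(query)
            samples[i].append(clock() - start)
    # One average per query, in the same order as `queries`.
    return [mean(times) for times in samples]
```

Running the complete mix in each round, rather than repeating one query five times in a row, keeps the order of execution the same as in the description above.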

Attachments

allie.txt (2.5 KB) - added by wu 14 years ago.
pdbj.txt (2.6 KB) - added by wu 14 years ago.
cell.txt (9.2 KB) - added by wu 14 years ago.
ddbj.txt (3.0 KB) - added by wu 14 years ago.
uniprot.txt (3.5 KB) - added by wu 14 years ago.
