== Triple Store Survey for Life Science Data ==

 * [#overview Overview]
 * [#platform Platform]
 * [#data Data]
 * [#approach Approach]
 * Database
   * Bigdata [wiki:bigdata => Bigdata]
   * 4store [wiki:4store => 4store]
   * Virtuoso [wiki:Virtuoso => Virtuoso]
   * Owlim-se [wiki:OwlimSe => OwlimSe]
   * Mulgara [wiki:Mulgara => Mulgara]
 * Summarize [wiki:summarize => Summarize]

=== Overview === #overview

=== Platform === #platform

 * Machine:
   * OS: GNU/Linux
   * CPU: GenuineIntel 6; model name: Intel(R) Xeon(R) CPU E5649 @ 2.53GHz; 12 cores, 24 threads with hyper-threading
   * Mem: 65996128 kB
   * Hard disk: SCSI RAID 0 (three 1-terabyte hard disks, two of which are used to store data)
 * Software:
   * JDK: 1.6.0_26
   * Virtuoso: 6.3 commercial
   * OwlimSE: 4.3.4238
   * Mulgara: 2.1.12
   * 4store: 1.1.4
   * Bigdata: RWSTORE_1_1_0

=== Data === #data

 * Allie: .n3 format, 94,420,989 triples. SPARQL queries: attachment:allie.txt
 * PDBJ: .rdf.gz format, 589,987,335 triples, 77,878 files, from [ftp://ftp.pdbj.org/XML/rdf/]. SPARQL queries: attachment:pdbj.txt
 * Uniprot: .rdf.gz format, about 4 billion triples; the three largest files are uniprot.rdf.gz, uniparc.rdf.gz and uniref.rdf.gz, from [ftp://ftp.uniprot.org/pub/databases/uniprot/] (the experiment used the November 2011 release). SPARQL queries: attachment:uniprot.txt or [http://beta.sparql.uniprot.org/]
 * DDBJ: .rdf.gz format, about 8 billion triples, 330 files, from [ftp://ftp.ddbj.nig.ac.jp/ddbj_database/ddbj/]. SPARQL queries: attachment:ddbj.txt

=== Approach === #approach

We evaluated the data on every SPARQL endpoint at least twice to make sure that the two test values did not differ much.

For the query evaluation, we executed the whole query mix (the full query sequence) five times on every SPARQL endpoint and then took the average time cost of each query.
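
The query evaluation procedure described above (run the whole query mix five times, then average the time cost per query) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the harness used in the survey: the function name `benchmark_query_mix`, the `run_query` callable and the `timer` parameter are all hypothetical, and `run_query` stands in for whatever client actually sends a query to the endpoint under test.

```python
import time
from collections import defaultdict
from statistics import mean


def benchmark_query_mix(queries, run_query, runs=5, timer=time.perf_counter):
    """Run the whole query mix `runs` times; return mean seconds per query.

    queries:   ordered list of (name, sparql_text) pairs (the query sequence)
    run_query: hypothetical callable that sends one SPARQL string to an
               endpoint and blocks until the result has been received
    timer:     clock function returning seconds (replaceable for testing)
    """
    timings = defaultdict(list)
    for _ in range(runs):
        # Execute the full sequence in order before starting the next pass.
        for name, sparql in queries:
            start = timer()
            run_query(sparql)
            timings[name].append(timer() - start)
    # Average the time cost of each query over all passes.
    return {name: mean(samples) for name, samples in timings.items()}
```

In such a sketch, `queries` would be filled from one of the attached query files (for example attachment:allie.txt), and the same call would be repeated for every SPARQL endpoint so that the per-query averages can be compared across stores.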