Changes between Version 50 and Version 51 of survey

Timestamp:
2012/07/24 16:59:23 (12 years ago)
Author:
wu
Comment:

--

  • survey

    v50 v51  
    19 19 We present an evaluation of native triple stores on biological data. Compared with data in other areas, biological data is typically huge. Bulk-loading and query performance are therefore essential in deciding whether a triple store can be applied in the biological field. Our goal is to verify whether current triple stores can handle such large biological data efficiently. We test five native triple stores: Virtuoso, OwlimSE, Mulgara, 4store, and Bigdata. We select five real biological data sets, instead of synthetic ones, ranging from tens of millions to eight billion triples. We present their load times and query costs. We do not test inference capabilities this time.
    20 20
    21    For each database we provide several results obtained by adjusting its parameters, which can significantly influence performance. However, these parameters can behave differently on different hardware and software platforms, and even with different data sets. It is difficult to test every case by adjusting and combining all the parameters for every data set, because importing our data sets, such as uniprot and DDBJ, may take over two days or several weeks. Therefore, although we try to find the best performance for each triple store, we do not guarantee that the results we provide represent the best performance of each database.
       21 For each database we provide several results obtained by adjusting its parameters, which can significantly influence performance. However, these parameters can behave differently on different hardware and software platforms, and even with different data sets. It is difficult to test every case by adjusting and combining all the parameters for every data set, because importing our data sets, such as UniProt and DDBJ, may take over one week. Therefore, although we try to find the best performance for each triple store, we do not guarantee that the results we provide represent the best performance of each database.
    22 22
    23 23 '''4store'''
     
    67 67   * Bigdata: RWSTORE_1_1_0
    68 68   * Mulgara: 2.1.13
    69      * OwlimSE: 4.3.4238
       69   * OWLIM-SE: 4.3.4238
    70 70   * Virtuoso: 6.4 commercial
    71 71
     
    74 74 We select five typical real biological data sets instead of synthetic data; their sizes range from 10 million to 8 billion triples. We summarize the query characteristics in [wiki:Query => QueryCharacteristics ].
    75 75
    76    '''Cell cycle''': .rdf (RDF/XML) format, 11,315,866 triples, from [http://www.semantic-systems-biology.org/]. The SPARQL queries: attachment:cell.txt .
       76 '''Cell Cycle Ontology''': .rdf (RDF/XML) format, 11,315,866 triples, from [http://www.semantic-systems-biology.org/]. The SPARQL queries: attachment:cell.txt .
    77 77
    78 78 '''Allie''': .n3 format, 94,420,989 triples. The SPARQL queries: attachment:allie.txt .
    79 79
    80    '''PDBJ''': .rdf.gz format, 589,987,335 triples, 77,878 files, from [ftp://ftp.pdbj.org/XML/rdf/]. The SPARQL queries: attachment:pdbj.txt .
       80 '''PDBj''': .rdf.gz format, 589,987,335 triples, 77,878 files, from [ftp://ftp.pdbj.org/XML/rdf/]. The SPARQL queries: attachment:pdbj.txt .
    81 81
    82 82 The PDBJ queries are point queries that retrieve the related characteristics of a given EntryID, such as 107L. Their result sets are therefore small, but the number of joins per query is large.