Changes from Version 52 to Version 53: survey

Timestamp:
2012/08/28 11:36:38 (12 years ago)
Author:
wu
Comment:

--

  • survey

    v52 v53  
    1717=== Overview === #overview 
    1818 
    19 We present an evaluation of native triple stores on biological data. Compared with the data in other areas biological data is typically huge. Therefore the performance of bulk loading and querying are essential to decide whether a triple store can be applied into the biological field. Our target is to verify whether the current triple stores are efficient to deal with the tremendous biological data.  We test five native triple stores Virtuoso, OwlimSE, Mulgara, 4store, and Bigdata. We select five real biological data set instead of synthetic ones ranging from tens of millions to eight billions. We present their load times and query cost. We do not test the inference ability this time. 
     19We present an evaluation of native triple stores on biological data. Compared with data in other areas, biological data is typically huge. Therefore, the performance of bulk loading and querying is essential in deciding whether a triple store can be applied to the biological field. Our goal is to verify whether current triple stores can deal efficiently with such large biological data.  We tested five native triple stores: Virtuoso, OWLIM-SE, Mulgara, 4store, and Bigdata. We chose five real biological data sets, rather than synthetic ones, ranging from tens of millions to eight billion triples. We present their load times and query costs but did not test inference ability in this study. 
    2020 
    21 For each database we provide several results by adjusting their parameters, which could influence the performance importantly. However  These parameters could perform differently with different hardware and software platforms, and even with different data set. It is difficult to test all the cases by adjusting and combining all the parameters for every data set because the importing of our data set, such as UniProt and DDBJ, may take over one week. Therefore we do not guarantee what we provide is the best performance of each database although we try to find out the best performance for each triple store. 
     21For each database we provide several results obtained by adjusting its parameters, which can influence performance significantly. However, these parameters may behave differently on different hardware and software platforms, and even on different data sets. It is difficult to test every case by adjusting and combining all the parameters for every data set, because importing our data sets, such as UniProt and DDBJ, may take several days. Therefore, although we try to find the best performance for each triple store, we do not guarantee that the results we provide represent the best performance of each database. 
    2222 
    2323'''4store'''  
     
    3232'''OWLIM-SE''' 
    3333 
    34 OwlimSE is a member of OWLIM family, which provides native RDF engines implemented in Java and deliveries full performance through both Sesame and Jena. From OwlimSE 4.3 it begins to support SPARQL 1.1 Federation. It supports for the semantics of RDFS, OWL 2 RL and OWL 2 QL. OwlimSE is only available in commercial license.  Please refer to [http://www.ontotext.com/owlim]. 
     34OWLIM-SE is a member of the OWLIM family, which provides native RDF engines implemented in Java and delivers full performance through both Sesame and Jena. Since version 4.3, OWLIM-SE has supported SPARQL 1.1 Federation. It supports the semantics of RDFS, OWL 2 RL and OWL 2 QL. OWLIM-SE is available only under a commercial license.  Please refer to [http://www.ontotext.com/owlim]. 
    3535 
    3636'''Mulgara''' 
    3737 
    38 Mulgara is written entirely in Java and available in open source. Mulgara provides a SQL-like language iTQL(Interactive Tucana Query Language)  shell  to query and update Mulgara databases, which also support RDFS and OWL inferencing.  It also provides a SPARQL query parser and query engine. Please refer to [http://www.mulgara.org/]. 
     38Mulgara is written entirely in Java and is available as open source. Mulgara provides a shell for the SQL-like language iTQL (Interactive Tucana Query Language) to query and update Mulgara databases. Mulgara also supports RDFS and OWL inferencing and provides a SPARQL query parser and query engine. Please refer to [http://www.mulgara.org/]. 
    3939 
    4040 
     
    4242'''Virtuoso'''  
    4343 
    44 Virtuoso provides a triple storage solution for RDF in RDBMS platform. Virtuoso is a multi-purpose data server for RDBMS, RDF, XML and so on. It offers stored procedures to load RDFXML, ntriples, and compressed triples and supports for SPARQL.  Virtuoso supports limited RDFS and OWL inferencing. Virtuoso can be run in both standalone and  cluster mode.The function as a standalone triple store server is available in both open source and commercial licenses. Please refer to [http://virtuoso.openlinksw.com/]. 
     44Virtuoso provides a triple storage solution for RDF on an RDBMS platform. Virtuoso is a multi-purpose data server for RDBMS, RDF, XML and so on. It offers stored procedures to load RDF/XML, N-Triples, and compressed triples, and it supports SPARQL.  Virtuoso supports limited RDFS and OWL inferencing. Virtuoso can be run in both standalone and cluster mode. The standalone triple store server is available under both open source and commercial licenses. Please refer to [http://virtuoso.openlinksw.com/]. 
    4545 
    4646The following table summarizes some basic information. 
    4747 
    4848 
    49 ||      ||OpenSource||  cluster||       inference||     federated query|| 
     49||Triple Store || OpenSource||  cluster||       inference||     federated query|| 
    5050||4store||      Yes||   Yes||   No||    No|| 
    5151||Bigdata||     Yes||   Yes||   RDFS and limited OWL inference  ||Yes|| 
     
    7272=== Data === #data    
    7373 
    74 We select five real typical biological  data sets instead of synthetic data, the number of triples of which range from 10 Million to 8 Billion.  We summarize the query characteristics in [wiki:Query => QueryCharacteristics ]. 
     74We chose five real, typical biological data sets instead of synthetic data; their sizes range from 10 million to 8 billion triples.  We summarize the query characteristics in [wiki:Query => QueryCharacteristics ]. 
    7575 
    7676'''Cell Cycle Ontology ''': .rdf (RDF/XML) format, 11,315,866 triples, from [http://www.semantic-systems-biology.org/]. The SPARQL query: attachment:cell.txt . 
     
    8888=== Approach === #approach 
    8989    
    90    We imported the data in every Sparql end point at least twice to make it sure that there is no much difference between two test values:|2nd-1st|/max(2nd,1st)<0.1(we took the first value in the summary part now because some loading is still in test ).  
    91    
    92    We did the query evaluation by executing the whole query mix (composed of the query sequence) five times in every Sparql endpoint, remove the highest one and then get the average time cost of other four queries. We report the five detailed time cost in  every database section and the average cost in the summary section. 
     90   We imported the data with the default parameters and with several empirically improved settings. We then tested each triple store twice with the best setting and reported the average of the two runs as the import cost. 
    9391 
     92   We evaluated queries by executing the whole query mix (composed of the query sequence) five times in each triple store, removing the highest time and averaging the time cost of the other four runs. We present the five detailed time costs in each database section and the average cost in the summary section. 
     93
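
The query evaluation procedure described in the Approach section (five runs of the query mix, drop the highest, average the remaining four) can be sketched as follows. This is only an illustrative sketch, not the script used in the survey: the names `time_runs`, `trimmed_average`, and the `run_query_mix` callable are hypothetical, and how the query mix is actually sent to each triple store is left to the caller.

```python
import time
from statistics import mean
from typing import Callable, List


def time_runs(run_query_mix: Callable[[], None], repeats: int = 5) -> List[float]:
    """Run the whole query mix `repeats` times; return each wall-clock cost in seconds.

    `run_query_mix` is a hypothetical callable that executes the full
    query sequence against one triple store.
    """
    costs = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query_mix()
        costs.append(time.perf_counter() - start)
    return costs


def trimmed_average(costs: List[float]) -> float:
    """Drop the single highest cost and average the remaining ones."""
    if len(costs) < 2:
        raise ValueError("need at least two measurements")
    kept = sorted(costs)[:-1]  # discard the slowest run
    return mean(kept)
```

For example, `trimmed_average(time_runs(run_query_mix))` would give the summary value for one triple store, while the list returned by `time_runs` corresponds to the five detailed costs reported per database.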