Changes from Initial Version to Version 1 of OwlimSe4.3

Timestamp: 2012/10/12 17:31:10 (12 years ago)
Author: wu
 * [#Configure OwlimSE configuration]

 * [#load Load performance]
   * [#allieload Allie upload]
   * [#pdbjload PDBJ upload]
   * [#uniprotload UniProt upload]
   * [#ddbjload DDBJ upload]
 * [#Sparql SPARQL query performance]
   * [#cellquery Cell cycle query]
   * [#alliequery Allie query]
   * [#pdbjquery PDBJ query]
   * [#uniprotquery UniProt query]
   * [#ddbjquery DDBJ query]

=== OwlimSE configuration === #Configure

It is well known that building indices can accelerate queries but slow down updates. In OWLIM-SE we set cache-memory = tuple-index-memory; that is, we enable the POS/PSO indices but disable the PCSOT, PTSOC, SP and PO indices as well as full-text search. They can be enabled by setting build-pcsot, build-ptsoc and enablePredicateList to true, and ftsIndexPolicy to an appropriate value ("onCommit", "onStartup" or "onShutdown").

For more information, please refer to [http://owlim.ontotext.com/display/OWLIMv43/OWLIM-SE+Configuration]
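
For illustration only, the extra indices and full-text search could be switched on with system properties like the following (the memory values here are assumptions mirroring the flag blocks later on this page, not a recommendation):

{{{
-Dcache-memory=20000m -Dtuple-index-memory=20000m
-Dbuild-pcsot=true -Dbuild-ptsoc=true
-DenablePredicateList=true
-DftsIndexPolicy=onCommit
}}}

Note that when the extra indices are enabled, cache-memory covers all of them, so it should generally be larger than tuple-index-memory alone.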


=== Load Performance === #load

'''Approach 1:''' use the 'load' command in the Sesame console application, for files containing fewer than one billion triples.

The OWLIM documentation states that a billion statements cannot be loaded from a single large file with the 'load' command.


{{{
Operation steps (we use Allie as an example):

-- create the allie.ttl template:
[togordf@ts01 ~]$ ls ~/.aduna/openrdf-sesame-console/templates/
allie.ttl
-- in the openrdf-console directory:
[togordf@ts01 ~]$ ./console.sh
18:12:24.166 [main] DEBUG info.aduna.platform.PlatformFactory - os.name = linux
18:12:24.171 [main] DEBUG info.aduna.platform.PlatformFactory - Detected Posix platform
Connected to default data directory
Commands end with '.' at the end of a line
Type 'help.' for help
> connect "http://localhost:8080/openrdf-sesame".
Disconnecting from default data directory
Connected to http://localhost:8080/openrdf-sesame
> help create.
Usage:
create <template-name>
  <template-name>   The name of a repository configuration template
> create allie.
> open allie.
Opened repository 'allie'
allie> load $PathOfData
}}}

Please refer to [http://owlim.ontotext.com/display/OWLIMv40/OWLIM-SE+Administrative+Tasks]:

In general, RDF data can be loaded into a given Sesame repository using the 'load' command in the Sesame console application or directly through the workbench web application. However, neither of these approaches will work with a very large number of triples, e.g. a billion statements. A common solution is to convert the RDF data into a line-based RDF format (e.g. N-Triples) and then split it into many smaller files (e.g. using the Linux command 'split'). Each file can then be uploaded separately using either the console or the workbench application.

'''Approach 2:'''

The idea comes from UniProt, which uses OWLIM as a library, as follows:

Basically, they have a dedicated loader program in which one Java thread reads the triples into a blocking queue. A configurable number of threads then take triples from that queue and insert the data into OWLIM-SE (or any other Sesame-API-compatible triple store), normally one inserting thread per OWLIM file-repository fragment. The inserter threads use transactions that commit every half a million statements. The key point is to add statements, not files:

{{{
    // take one statement from the blocking queue filled by the reader thread
    final org.openrdf.model.Statement sesameStatement = getSesameStatement(object);

    connection.add(sesameStatement, graph);

    // ... and every millionth statement, call connection.commit()
}}}

(Please refer to [https://github.com/JervenBolleman/sesame-loader/] for details)
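
The producer/consumer scheme above can be sketched with the JDK alone. This is a hedged illustration, not the sesame-loader code: the class name, the string "statements", and the counter standing in for connection.add()/connection.commit() are all invented here.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// One reader thread fills a bounded blocking queue; nThreads inserter
// threads drain it, "committing" every batchSize statements.
public class Loader {
    private static final String POISON = "<eof>";

    public static long load(int nStatements, int nThreads, int batchSize)
            throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(10_000);
        AtomicLong inserted = new AtomicLong();

        // Reader thread: puts parsed triples on the queue, then one
        // poison pill per worker so every worker terminates.
        Thread reader = new Thread(() -> {
            try {
                for (int i = 0; i < nStatements; i++) {
                    queue.put("<s" + i + "> <p> <o> ."); // stand-in for a parsed statement
                }
                for (int i = 0; i < nThreads; i++) {
                    queue.put(POISON);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.start();

        // Inserter threads: connection.add(stmt, graph) would go where the
        // counter is incremented, connection.commit() where the batch resets.
        Thread[] workers = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            workers[t] = new Thread(() -> {
                try {
                    int sinceCommit = 0;
                    String stmt;
                    while (!(stmt = queue.take()).equals(POISON)) {
                        inserted.incrementAndGet();   // connection.add(stmt, graph)
                        if (++sinceCommit == batchSize) {
                            sinceCommit = 0;          // connection.commit()
                        }
                    }
                    // final connection.commit() for the last partial batch
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[t].start();
        }

        reader.join();
        for (Thread w : workers) {
            w.join();
        }
        return inserted.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("statements inserted: " + load(100_000, 4, 50_000));
    }
}
```

The poison pills let each inserter commit its last partial batch and exit cleanly; the bounded queue keeps the reader from outrunning the inserters.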



=== Allie upload === #allieload

Approach 1: 38 minutes

Approach 2: 28 minutes

=== PDBJ upload === #pdbjload

Approach 2: 197 minutes

=== UniProt upload === #uniprotload

When vm.swappiness=60 and

{{{
-Xmx60G -Xms30G -Druleset=empty -Dentity-index-size=675000000 -Dcache-memory=20633m -DenablePredicateList=false -Dtuple-index-memory=20633m -DftsIndexPolicy=never -Dbuild-pcsot=false -Dbuild-ptsoc=false -Djournaling=true -Drepository-type=file-repository -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -Dentity-id-size=32
}}}

we used 12 threads to import; it took 68 hours and 29 minutes.

When vm.swappiness=60 and

{{{
-Xmx60G -Xms30G -Druleset=empty -Dentity-index-size=800000000 -Dcache-memory=20000m -DenablePredicateList=false -Dtuple-index-memory=20000m -DftsIndexPolicy=never -Dbuild-pcsot=false -Dbuild-ptsoc=false -Djournaling=true -Drepository-type=file-repository -XX:+HeapDumpOnOutOfMemoryError -Dentity-id-size=32
}}}

we used 3 threads to import; it took 59 hours and 15 minutes.



=== DDBJ upload === #ddbjload

When

{{{
-Xmx60G -Xms30G -Druleset=empty -Dentity-index-size=675000000 -Dcache-memory=20633m -DenablePredicateList=false -Dtuple-index-memory=20633m -DftsIndexPolicy=never -Dbuild-pcsot=false -Dbuild-ptsoc=false -Djournaling=true -Drepository-type=file-repository -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -Dentity-id-size=32
}}}

and we used 12 threads to import, the import took 128 hours and 15 minutes, and an OutOfMemoryError occurred at the end. It then took another 54 hours and 53 minutes to restore the data before the database could be used.

When

{{{
-Xmx60G -Xms30G -Druleset=empty -Dentity-index-size=800000000 -Dcache-memory=20000m -DenablePredicateList=false -Dtuple-index-memory=20000m -DftsIndexPolicy=never -Dbuild-pcsot=false -Dbuild-ptsoc=false -Djournaling=true -Drepository-type=file-repository -XX:+HeapDumpOnOutOfMemoryError -Dentity-id-size=32
}}}

and we used 3 threads to import, it took 82 hours and 2 minutes to import DDBJ successfully.

When vm.swappiness=10, it took 49 hours and 12 minutes.
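
For reference, vm.swappiness is a Linux kernel parameter controlling how aggressively the kernel swaps memory pages. A sketch of how it can be changed (requires root):

{{{
# temporary, until reboot:
sysctl -w vm.swappiness=10

# persistent: add this line to /etc/sysctl.conf
vm.swappiness = 10
}}}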


'''Load performance'''

||Load time ||Cell Cycle Ontology ||Allie ||PDBj ||UniProt ||DDBJ ||
||1st time ||3 min ||28 min ||197 min ||59 h 15 min ||49 h 12 min ||
||2nd time ||3 min ||30 min ||219 min || ||50 h 26 min ||
||average ||3 min ||29 min ||208 min || ||49 h 49 min ||

Until the failure, OWLIM had finished 7,883,140,000 triples within 70.5 hours.


=== SPARQL query performance === #Sparql

=== Cell cycle query === #cellquery

||Query\time(ms) ||time 1 ||time 2 ||time 3 ||time 4 ||time 5 ||
||case1 ||111 ||116 ||109 ||109 ||112 ||
||case2 ||6 ||6 ||6 ||6 ||6 ||
||case3 ||2 ||2 ||2 ||2 ||2 ||
||case4 ||156 ||148 ||151 ||148 ||149 ||
||case5 ||416 ||574 ||404 ||401 ||433 ||
||case6 ||2182 ||2120 ||1940 ||2245 ||2040 ||
||case7 ||2 ||2 ||3 ||2 ||6 ||
||case8 ||33 ||33 ||33 ||32 ||33 ||
||case9 ||23 ||23 ||20 ||22 ||22 ||
||case10 ||0 ||0 ||0 ||0 ||0 ||
||case11 ||6 ||6 ||6 ||6 ||6 ||
||case12 ||6 ||7 ||6 ||6 ||7 ||
||case13 ||2 ||2 ||2 ||2 ||2 ||
||case14 ||0 ||0 ||0 ||0 ||0 ||
||case15 ||46043 ||46334 ||45843 ||46294 ||47640 ||
||case16 ||X ||X ||X ||X ||X ||
||case17 ||X ||X ||X ||X ||X ||
||case18 ||X ||X ||X ||X ||X ||
||case19 ||13 ||14 ||13 ||14 ||14 ||

Note: the '''count''' queries in case16, 17 and 18 are not supported.

=== Allie query === #alliequery

||Query\time(ms) ||time 1 ||time 2 ||time 3 ||time 4 ||time 5 ||
||case1 ||149 ||138 ||147 ||152 ||144 ||
||case2 ||2036 ||1954 ||2049 ||1959 ||1971 ||
||case3 ||1520 ||1484 ||1464 ||1467 ||1490 ||
||case4 ||36 ||37 ||40 ||38 ||41 ||
||case5 ||380858 ||67225 ||69009 ||68948 ||68296 ||


=== PDBJ query === #pdbjquery

||Query\time(ms) ||time 1 ||time 2 ||time 3 ||time 4 ||time 5 ||
||case1 ||52 ||61 ||55 ||53 ||50 ||
||case2 ||1 ||1 ||1 ||1 ||1 ||
||case3 ||188 ||191 ||204 ||203 ||182 ||
||case4 ||4 ||4 ||4 ||4 ||4 ||

=== UniProt query === #uniprotquery

||Query\time(ms) ||time 1 ||time 2 ||time 3 ||time 4 ||time 5 ||
||case1 ||305 ||295 ||405 ||864 ||711 ||
||case2 ||349 ||400 ||312 ||470 ||898 ||
||case3 ||440 ||460 ||674 ||500 ||1049 ||
||case4 ||15 ||200 ||170 ||201 ||172 ||
||case5 ||20 ||22 ||20 ||22 ||77 ||
||case6 ||850266 ||605532 ||650282 ||645702 ||612007 ||
||case7 ||1138731 ||446141 ||584173 ||223218 ||482121 ||
||case8 ||13449 ||13617 ||502 ||482 ||13262 ||
||case9 ||3430 ||3166 ||673 ||639 ||3214 ||
||case10 ||127019 ||113550 ||958 ||1085 ||119581 ||
||case11 ||6669 ||6287 ||179 ||142 ||6455 ||
||case12 ||266 ||205 ||39 ||10 ||213 ||
||case13 ||32 ||29 ||6 ||6 ||45 ||
||case14 ||42 ||41 ||45 ||45 ||40 ||
||case15 ||29112 ||38094 ||38291 ||34950 ||67722 ||
||case16 ||378191 ||372805 ||375879 ||274524 ||265025 ||
||case17 ||6163 ||5948 ||5828 ||5916 ||5808 ||
||case18 ||83955 ||8942 ||8842 ||9025 ||8792 ||

=== DDBJ query === #ddbjquery

||Query\time(ms) ||time 1 ||time 2 ||time 3 ||time 4 ||time 5 ||
||case1 ||26500 ||25588 ||17118 ||16823 ||15064 ||
||case2 ||3400 ||3437 ||3136 ||3203 ||3365 ||
||case3 ||3874 ||3923 ||3556 ||3643 ||3765 ||
||case4 ||237 ||104 ||53 ||52 ||118 ||
||case5 ||247 ||83 ||61 ||86 ||110 ||
||case6 ||109 ||129 ||144 ||112 ||104 ||
||case7 ||7871 ||7646 ||3990 ||5923 ||4577 ||
||case8 ||16278 ||14020 ||6991 ||11214 ||9645 ||
||case9 ||3640 ||2824 ||1605 ||2314 ||1656 ||
||case10 ||1 ||1 ||1 ||1 ||1 ||
     231