* [#Configure OwlimSE configuration]

* [#load Load performance]
 * [#allieload Allie upload]
 * [#pdbjload PDBJ upload]
 * [#uniprotload Uniprot upload]
 * [#ddbjload DDBJ upload]
* [#Sparql Sparql query performance]
 * [#cellquery Cell cycle query]
 * [#alliequery Allie query]
 * [#pdbjquery PDBJ query]
 * [#uniprotquery Uniprot query]
 * [#ddbjquery DDBJ query]


=== OwlimSE configuration === #Configure

It is well known that building indices accelerates queries but slows down updates. In OwlimSE we set cache-memory = tuple-index-memory; that is, we enable the POS/PSO indices but disable the PCSOT, PTSOC, SP and PO indices as well as full-text search. These can be enabled again by setting build-pcsot, build-ptsoc and enablePredicateList to true, and ftsIndexPolicy to an appropriate value ("onCommit", "onStartup" or "onShutdown").
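For example, the disabled indices could be switched back on with JVM properties along these lines (a sketch only; the benchmark runs below keep them disabled):

{{{
-Dbuild-pcsot=true -Dbuild-ptsoc=true -DenablePredicateList=true -DftsIndexPolicy=onCommit
}}}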


For more information, please refer to [http://owlim.ontotext.com/display/OWLIMv43/OWLIM-SE+Configuration].

=== Load Performance === #load

'''Approach 1:''' use the 'load' command in the Sesame console application, for files containing fewer than one billion triples.

Ontotext have stated that OWLIM cannot load a billion statements from a single large file with the 'load' command.

{{{
Operation steps (using Allie as an example):

-- create the allie.ttl template:
[togordf@ts01 ~]$ ls ~/.aduna/openrdf-sesame-console/templates/
allie.ttl
-- in the openrdf-console directory:
[togordf@ts01 ~]$ ./console.sh
18:12:24.166 [main] DEBUG info.aduna.platform.PlatformFactory - os.name = linux
18:12:24.171 [main] DEBUG info.aduna.platform.PlatformFactory - Detected Posix platform
Connected to default data directory
Commands end with '.' at the end of a line
Type 'help.' for help
> connect "http://localhost:8080/openrdf-sesame".
Disconnecting from default data directory
Connected to http://localhost:8080/openrdf-sesame
> help create.
Usage:
create <template-name>
  <template-name>   The name of a repository configuration template
> create allie.
> open allie.
Opened repository 'allie'
allie> load $PathOfData .
}}}

Please refer to [http://owlim.ontotext.com/display/OWLIMv40/OWLIM-SE+Administrative+Tasks]:
In general RDF data can be loaded into a given Sesame repository using the 'load' command in the Sesame console application or directly through the workbench web application. However, neither of these approaches will work when using a very large number of triples, e.g. a billion statements. A common solution would be to convert the RDF data into a line-based RDF format (e.g. N-triples) and then split it into many smaller files (e.g. using the linux command 'split'). This would allow each file to be uploaded separately using either the console or workbench applications.
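As a sketch of the splitting step described above (assuming the data has already been converted to a single N-Triples file, here called data.nt; the chunk size is arbitrary):

{{{
# N-Triples is line-based, so a 10-million-line chunk holds 10 million triples
split -l 10000000 data.nt chunk_
}}}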

'''Approach 2:'''

The idea comes from UniProt, which uses OWLIM as a library, as follows:

They have a dedicated loader program in which one Java thread reads the triples into a blocking queue, and a configurable number of threads take triples from that queue and insert them into OWLIM-SE (or any other Sesame-API-compatible triplestore), normally one inserting thread per OWLIM file-repository fragment. The inserter threads use transactions that commit every half a million statements. The basic idea is to add statements, not files:

{{{
// take one statement from the blocking queue filled by the reader thread
final org.openrdf.model.Statement sesameStatement = getSesameStatement(object);

connection.add(sesameStatement, graph);
}}}

and call connection.commit() every millionth statement. (Please refer to [https://github.com/JervenBolleman/sesame-loader/] for details.)
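Below is a minimal sketch of one such inserter thread, reconstructed from the description above rather than taken from sesame-loader itself; the class name, COMMIT_INTERVAL value and shutdown-by-interrupt scheme are our own assumptions, and it targets the Sesame 2.x API:

{{{
import java.util.concurrent.BlockingQueue;

import org.openrdf.model.Statement;
import org.openrdf.repository.RepositoryConnection;

// Hypothetical reconstruction: a reader thread parses the RDF data and
// fills the queue; each inserter thread drains the queue into its own
// repository connection and commits in large batches.
class InserterThread extends Thread {

    private static final long COMMIT_INTERVAL = 500000; // half a million statements

    private final BlockingQueue<Statement> queue;
    private final RepositoryConnection connection;

    InserterThread(BlockingQueue<Statement> queue, RepositoryConnection connection) {
        this.queue = queue;
        this.connection = connection;
    }

    @Override
    public void run() {
        long count = 0;
        try {
            connection.setAutoCommit(false); // batch adds into explicit transactions
            while (true) {
                Statement st = queue.take(); // blocks until the reader supplies a statement
                connection.add(st);
                if (++count % COMMIT_INTERVAL == 0) {
                    connection.commit();
                }
            }
        } catch (InterruptedException e) {
            // the reader signals end of input by interrupting the inserters
            Thread.currentThread().interrupt();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                connection.commit(); // flush the final partial batch
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}
}}}

With one such thread per OWLIM file-repository fragment, as described above, insertion parallelises across fragments.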


=== Allie upload === #allieload

Approach 1: 38 minutes

Approach 2: 28 minutes

=== PDBJ upload === #pdbjload

Approach 2: 197 minutes
=== Uniprot upload === #uniprotload


With vm.swappiness=60 and

{{{
-Xmx60G -Xms30G -Druleset=empty -Dentity-index-size=675000000 -Dcache-memory=20633m -DenablePredicateList=false -Dtuple-index-memory=20633m -DftsIndexPolicy=never -Dbuild-pcsot=false -Dbuild-ptsoc=false -Djournaling=true -Drepository-type=file-repository -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -Dentity-id-size=32
}}}

we used 12 threads to import; it took 68 hours and 29 minutes.

With vm.swappiness=60 and

{{{
-Xmx60G -Xms30G -Druleset=empty -Dentity-index-size=800000000 -Dcache-memory=20000m -DenablePredicateList=false -Dtuple-index-memory=20000m -DftsIndexPolicy=never -Dbuild-pcsot=false -Dbuild-ptsoc=false -Djournaling=true -Drepository-type=file-repository -XX:+HeapDumpOnOutOfMemoryError -Dentity-id-size=32
}}}

we used 3 threads to import; it took 59 hours and 15 minutes.


=== DDBJ upload === #ddbjload

With

{{{
-Xmx60G -Xms30G -Druleset=empty -Dentity-index-size=675000000 -Dcache-memory=20633m -DenablePredicateList=false -Dtuple-index-memory=20633m -DftsIndexPolicy=never -Dbuild-pcsot=false -Dbuild-ptsoc=false -Djournaling=true -Drepository-type=file-repository -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -Dentity-id-size=32
}}}

and 12 import threads, the load took 128 hours and 15 minutes, and an OutOfMemoryError occurred at the end. It took another 54 hours and 53 minutes to restore the data before the database could be used.

With

{{{
-Xmx60G -Xms30G -Druleset=empty -Dentity-index-size=800000000 -Dcache-memory=20000m -DenablePredicateList=false -Dtuple-index-memory=20000m -DftsIndexPolicy=never -Dbuild-pcsot=false -Dbuild-ptsoc=false -Djournaling=true -Drepository-type=file-repository -XX:+HeapDumpOnOutOfMemoryError -Dentity-id-size=32
}}}

and 3 import threads, DDBJ was imported successfully in 82 hours and 2 minutes.

With vm.swappiness=10, it took 49 hours and 12 minutes.
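For reference, vm.swappiness can be lowered at runtime as follows (run as root; adding vm.swappiness=10 to /etc/sysctl.conf makes the change persist across reboots):

{{{
sysctl -w vm.swappiness=10
}}}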


'''Load performance'''


|| Load time || Cell Cycle Ontology || Allie || PDBj || UniProt || DDBJ ||
|| 1st time || 3 mins || 28 mins || 197 mins || 59 hrs 15 mins || 49 hrs 12 mins ||
|| 2nd time || 3 mins || 30 mins || 219 mins || || 50 hrs 26 mins ||
|| average || 3 mins || 29 mins || 208 mins || || 49 hrs 49 mins ||


Up to the point of the failure, OWLIM had loaded 7,883,140,000 triples within 70.5 hours.


=== Sparql query performance === #Sparql

=== Cell cycle query === #cellquery


||Query\time(ms) ||time 1 ||time 2 ||time 3 ||time 4 ||time 5 ||
||case1 ||111 ||116 ||109 ||109 ||112 ||
||case2 ||6 ||6 ||6 ||6 ||6 ||
||case3 ||2 ||2 ||2 ||2 ||2 ||
||case4 ||156 ||148 ||151 ||148 ||149 ||
||case5 ||416 ||574 ||404 ||401 ||433 ||
||case6 ||2182 ||2120 ||1940 ||2245 ||2040 ||
||case7 ||2 ||2 ||3 ||2 ||6 ||
||case8 ||33 ||33 ||33 ||32 ||33 ||
||case9 ||23 ||23 ||20 ||22 ||22 ||
||case10 ||0 ||0 ||0 ||0 ||0 ||
||case11 ||6 ||6 ||6 ||6 ||6 ||
||case12 ||6 ||7 ||6 ||6 ||7 ||
||case13 ||2 ||2 ||2 ||2 ||2 ||
||case14 ||0 ||0 ||0 ||0 ||0 ||
||case15 ||46043 ||46334 ||45843 ||46294 ||47640 ||
||case16 ||X ||X ||X ||X ||X ||
||case17 ||X ||X ||X ||X ||X ||
||case18 ||X ||X ||X ||X ||X ||
||case19 ||13 ||14 ||13 ||14 ||14 ||

Note: the '''count''' queries in cases 16, 17 and 18 are not supported.

=== Allie query === #alliequery

||Query\time(ms) ||time 1 ||time 2 ||time 3 ||time 4 ||time 5 ||
||case1 ||149 ||138 ||147 ||152 ||144 ||
||case2 ||2036 ||1954 ||2049 ||1959 ||1971 ||
||case3 ||1520 ||1484 ||1464 ||1467 ||1490 ||
||case4 ||36 ||37 ||40 ||38 ||41 ||
||case5 ||380858 ||67225 ||69009 ||68948 ||68296 ||


=== PDBJ query === #pdbjquery

||Query\time(ms) ||time 1 ||time 2 ||time 3 ||time 4 ||time 5 ||
||case1 ||52 ||61 ||55 ||53 ||50 ||
||case2 ||1 ||1 ||1 ||1 ||1 ||
||case3 ||188 ||191 ||204 ||203 ||182 ||
||case4 ||4 ||4 ||4 ||4 ||4 ||


=== Uniprot query === #uniprotquery


||Query\time(ms) ||time 1 ||time 2 ||time 3 ||time 4 ||time 5 ||
||case1 ||305 ||295 ||405 ||864 ||711 ||
||case2 ||349 ||400 ||312 ||470 ||898 ||
||case3 ||440 ||460 ||674 ||500 ||1049 ||
||case4 ||15 ||200 ||170 ||201 ||172 ||
||case5 ||20 ||22 ||20 ||22 ||77 ||
||case6 ||850266 ||605532 ||650282 ||645702 ||612007 ||
||case7 ||1138731 ||446141 ||584173 ||223218 ||482121 ||
||case8 ||13449 ||13617 ||502 ||482 ||13262 ||
||case9 ||3430 ||3166 ||673 ||639 ||3214 ||
||case10 ||127019 ||113550 ||958 ||1085 ||119581 ||
||case11 ||6669 ||6287 ||179 ||142 ||6455 ||
||case12 ||266 ||205 ||39 ||10 ||213 ||
||case13 ||32 ||29 ||6 ||6 ||45 ||
||case14 ||42 ||41 ||45 ||45 ||40 ||
||case15 ||29112 ||38094 ||38291 ||34950 ||67722 ||
||case16 ||378191 ||372805 ||375879 ||274524 ||265025 ||
||case17 ||6163 ||5948 ||5828 ||5916 ||5808 ||
||case18 ||83955 ||8942 ||8842 ||9025 ||8792 ||
=== DDBJ query === #ddbjquery

||Query\time(ms) ||time 1 ||time 2 ||time 3 ||time 4 ||time 5 ||
||case1 ||26500 ||25588 ||17118 ||16823 ||15064 ||
||case2 ||3400 ||3437 ||3136 ||3203 ||3365 ||
||case3 ||3874 ||3923 ||3556 ||3643 ||3765 ||
||case4 ||237 ||104 ||53 ||52 ||118 ||
||case5 ||247 ||83 ||61 ||86 ||110 ||
||case6 ||109 ||129 ||144 ||112 ||104 ||
||case7 ||7871 ||7646 ||3990 ||5923 ||4577 ||
||case8 ||16278 ||14020 ||6991 ||11214 ||9645 ||
||case9 ||3640 ||2824 ||1605 ||2314 ||1656 ||
||case10 ||1 ||1 ||1 ||1 ||1 ||