
* OwlimSE Configuration

* Load performance

* SPARQL query performance

OwlimSE Configuration

JVM settings:
 -Xmx55G -Xms30G -XX:+UseG1GC -XX:+TieredCompilation
-Druleset=empty -Dentity-index-size=1147483647 -Dcache-memory=16645m -Dtuple-index-memory=15G -DenablePredicateList=false  -DftsIndexPolicy=never  -Dbuild-pcsot=false -Dbuild-ptsoc=false  -Djournaling=true -Drepository-type=file-repository  -Dentity-id-size=32  
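
The heap and GC flags (-Xmx, -Xms, -XX:...) can only be given on the java command line, but the -D options above are ordinary JVM system properties. As a minimal sketch (not from the original setup), the same values could in principle be set programmatically before the repository is initialized; whether OWLIM-SE picks them up at that point is an assumption, and the -D flags remain the documented route:

public class OwlimSystemProperties {
    /** Call before the OWLIM repository is created, so the properties are set when OWLIM reads them. */
    public static void apply() {
        // Programmatic equivalent of the -D flags listed above; values copied from this page.
        System.setProperty("ruleset", "empty");
        System.setProperty("entity-index-size", "1147483647");
        System.setProperty("cache-memory", "16645m");
        System.setProperty("tuple-index-memory", "15G");
        System.setProperty("enablePredicateList", "false");
        System.setProperty("ftsIndexPolicy", "never");
        System.setProperty("build-pcsot", "false");
        System.setProperty("build-ptsoc", "false");
        System.setProperty("journaling", "true");
        System.setProperty("repository-type", "file-repository");
        System.setProperty("entity-id-size", "32");
        // -Xmx55G -Xms30G -XX:+UseG1GC -XX:+TieredCompilation must still be passed
        // on the java command line; heap and GC settings cannot be changed from code.
    }
}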

For more information, please refer to  http://owlim.ontotext.com/display/OWLIMv43/OWLIM-SE+Configuration

Load Performance

Approach 1: the 'load' command in the Sesame console application, for files containing fewer than one billion triples.

The OWLIM documentation states that a billion statements in a single large file cannot be loaded with the 'load' command (see the excerpt from the documentation below).

Operation steps (using Allie as an example):

-- create the allie.ttl template:
[togordf@ts01 ~]$ ls ~/.aduna/openrdf-sesame-console/templates/
allie.ttl
-- in the openrdf-console directory:
[togordf@ts01 ~]$ ./console.sh
18:12:24.166 [main] DEBUG info.aduna.platform.PlatformFactory - os.name = linux
18:12:24.171 [main] DEBUG info.aduna.platform.PlatformFactory - Detected Posix platform
Connected to default data directory
Commands end with '.' at the end of a line
Type 'help.' for help
> connect "http://localhost:8080/openrdf-sesame".
Disconnecting from default data directory
Connected to http://localhost:8080/openrdf-sesame
> help create.
Usage:
create <template-name>
  <template-name>   The name of a repository configuration template
> create allie.
> open allie.
Opened repository 'allie'
allie> load $PathOfData

Please refer to  http://owlim.ontotext.com/display/OWLIMv40/OWLIM-SE+Administrative+Tasks: In general RDF data can be loaded into a given Sesame repository using the 'load' command in the Sesame console application or directly through the workbench web application. However, neither of these approaches will work when using a very large number of triples, e.g. a billion statements. A common solution would be to convert the RDF data into a line-based RDF format (e.g. N-triples) and then split it into many smaller files (e.g. using the linux command 'split'). This would allow each file to be uploaded separately using either the console or workbench applications.
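
As an illustration of that last step (not from the original page): because N-Triples is line-based, a large file can be split on line boundaries, which is what the linux 'split' command does. The sketch below does the same in Java; the chunk size and output file naming are arbitrary placeholders.

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class SplitNTriples {
    public static void main(String[] args) throws IOException {
        String input = args[0];            // path to a line-based N-Triples file
        long linesPerChunk = 10000000L;    // arbitrary chunk size
        long lineNo = 0;
        int chunk = 0;
        BufferedWriter out = null;
        BufferedReader in = new BufferedReader(new FileReader(input));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                // start a new output file every linesPerChunk lines
                if (lineNo % linesPerChunk == 0) {
                    if (out != null) out.close();
                    out = new BufferedWriter(new FileWriter(input + ".part" + chunk++));
                }
                out.write(line);
                out.newLine();
                lineNo++;
            }
        } finally {
            in.close();
            if (out != null) out.close();
        }
    }
}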

Approach 2:

The idea comes from UniProt, which uses OWLIM as a library, as follows:

Basically, they have a dedicated loader program in which one Java thread reads the triples into a blocking queue. A configurable number of threads then take triples from that queue and insert them into OWLIM-SE (or any other Sesame-API-compatible triple store), normally one inserting thread per OWLIM file-repository fragment. The inserter threads use transactions that commit every half a million statements. The basic idea is to add statements, not files.

// 'object' is taken from the blocking queue filled by the reader thread
final org.openrdf.model.Statement sesameStatement = getSesameStatement(object);
connection.add(sesameStatement, graph);

and every millionth statement, call connection.commit();

(Please refer to  https://github.com/JervenBolleman/sesame-loader/ for details)
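
A rough sketch of that producer/consumer pattern is given below. It is not the sesame-loader code itself; the class name, queue capacity, poll timeout, and the finish() signalling are placeholders, and the Sesame 2.x setAutoCommit()/commit() calls stand in for whatever transaction handling the real loader uses.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

import org.openrdf.model.Resource;
import org.openrdf.model.Statement;
import org.openrdf.repository.Repository;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.repository.RepositoryException;

public class QueueLoaderSketch {

    // One reader thread fills this queue; several inserter threads drain it.
    private final BlockingQueue<Statement> queue = new ArrayBlockingQueue<Statement>(100000);
    private volatile boolean readerFinished = false;

    /** Called by the single reader thread for every parsed statement. */
    public void put(Statement st) throws InterruptedException {
        queue.put(st);
    }

    /** Called by the reader thread once the input is exhausted. */
    public void finish() {
        readerFinished = true;
    }

    /** Body of one inserter thread; normally one such thread per OWLIM file-repository fragment. */
    public Runnable inserter(final Repository repository, final Resource graph,
                             final long commitInterval) {
        return new Runnable() {
            public void run() {
                try {
                    RepositoryConnection connection = repository.getConnection();
                    try {
                        connection.setAutoCommit(false);      // explicit transactions
                        long count = 0;
                        while (true) {
                            Statement st = queue.poll(1, TimeUnit.SECONDS);
                            if (st == null) {
                                if (readerFinished) break;    // queue drained and reader done
                                continue;
                            }
                            connection.add(st, graph);
                            if (++count % commitInterval == 0) {
                                connection.commit();          // e.g. every half a million statements
                            }
                        }
                        connection.commit();                  // flush the final partial batch
                    } finally {
                        connection.close();
                    }
                } catch (RepositoryException e) {
                    throw new RuntimeException(e);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        };
    }
}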

Allie upload

Approach 1: 38 minutes

Approach 2: 28 minutes

PDBJ upload

Approach 2: 127 minutes

Uniprot upload

uniprot.rdf.gz: 3,161,144,451 triples, about 28 hours

DDBJ upload

java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid30632.hprof ...
Dump file is incomplete: file size limit
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at com.ontotext.trree.big.collections.g.if(Unknown Source)
        at com.ontotext.trree.big.collections.c.for(Unknown Source)
        at com.ontotext.trree.big.collections.b.a(Unknown Source)
        at com.ontotext.trree.big.collections.b.g(Unknown Source)
        at com.ontotext.trree.big.h.shutdown(Unknown Source)
        at com.ontotext.trree.OwlimSchemaRepository.doShutDown(Unknown Source)
        at com.ontotext.trree.OwlimSchemaRepository.shutDown(Unknown Source)
        at com.github.sesameloader.owlim.OwlimRepositoryManager.shutDown(OwlimRepositoryManager.java:44)
        at loader.load(loader.java:141)
        at loader.main(loader.java:87)

Before the failure, OWLIM had loaded 7,883,140,000 triples in 70.5 hours.

SPARQL query performance

Allie query performance

Query\time (ms)   time 1   time 2   time 3   time 4   time 5
case1                149      138      147      152      144
case2               2036     1954     2049     1959     1971
case3               1520     1484     1464     1467     1490
case4                 36       37       40       38       41
case5             380858    67225    69009    68948    68296

PDBJ query performance

Query\time (ms)   time 1   time 2   time 3   time 4   time 5
case1                 52       61       55       53       50
case2                  1        1        1        1        1
case3                188      191      204      203      182
case4                  4        4        4        4        4

Uniprot query performance

DDBJ query performance