Version 18 (updated by: wu, 12 years ago)

--

* Bigdata Configuration

* Load Performance

* SPARQL Query Performance

Bigdata Configuration

Bigdata supports two journal modes (see http://sourceforge.net/apps/mediawiki/bigdata/index.php?title=StandaloneGuide for details):

The WORM (Write-Once, Read-Many) journal is the traditional log-structured, append-only journal. It was designed for very fast write rates and is used to buffer writes for scale-out. This is a good choice for immortal databases where people want access to ALL history. It scales to several billion triples.

The RW store (Read-Write) supports recycling of allocation slots on the backing file. It may be used as a time-bounded version of an immortal database, where history is aged off of the database over time. This is a good choice for standalone workloads where updates arrive continuously and older database states may be released. The RW store is also less sensitive to data skew because it can reuse B+Tree node and leaf revisions within a commit group on large data-set loads. Scaling should be better than the WORM for standalone use and could reach 10B+ triples. The default property file is attachment:RWStore.properties.
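As a rough illustration, the journal mode is selected in the property file via the buffer mode; a minimal sketch (file name and values are the usual defaults, adjust for your setup):

```properties
# Backing file for the journal (assumed name)
com.bigdata.journal.AbstractJournal.file=bigdata.jnl
# DiskRW selects the RW store; DiskWORM selects the WORM journal
com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
```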

Load Performance

Approach 1:

Upload data through the Bigdata SPARQL endpoint (NanoSparqlServer), POSTing the data in batches of 10,000 lines. Please refer to attachment:upload.pl for details.
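The batching idea behind upload.pl can be sketched as follows. This is an illustrative Python version, not the attached Perl script; the endpoint URL and the `text/plain` content type for N-Triples are assumptions to adjust for your installation.

```python
import urllib.request

ENDPOINT = "http://localhost:8080/bigdata/sparql"  # assumed NanoSparqlServer URL
BATCH_SIZE = 10_000

def batches(lines, size=BATCH_SIZE):
    """Group an iterable of N-Triples lines into fixed-size batches."""
    batch = []
    for line in lines:
        batch.append(line)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # trailing partial batch
        yield batch

def post_batch(batch):
    """POST one batch of triples to the SPARQL endpoint."""
    req = urllib.request.Request(
        ENDPOINT,
        data="".join(batch).encode("utf-8"),
        headers={"Content-Type": "text/plain"},  # assumed MIME type for N-Triples
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

# Usage (assumes data.nt exists and the server is running):
#   with open("data.nt") as f:
#       for batch in batches(f):
#           post_batch(batch)
```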

Approach 2:

Upload with the com.bigdata.rdf.store.DataLoader tool and the RW store's default parameters.
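A hypothetical invocation might look like the following; the class path and exact arguments vary between Bigdata releases, so treat this as a sketch:

```shell
# Sketch only: load data.nt using the RW store property file
java -Xmx55G -Xms30G -cp bigdata.jar \
    com.bigdata.rdf.store.DataLoader RWStore.properties data.nt
```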

We also tested the effect of adding the following JVM GC settings:

-Xmx55G -Xms30G -XX:+UseG1GC -XX:+TieredCompilation -XX:+HeapDumpOnOutOfMemoryError

Without these GC settings, uploading Allie took 35 minutes.

Approach 3:

We modified the following two important parameters (the remaining tests use these settings by default):

com.bigdata.btree.writeRetentionQueue.capacity=500000
com.bigdata.rdf.sail.BigdataSail.bufferCapacity=1000000

Approach 4: Split the input file into 12 smaller files.
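The split step can be sketched as below. This is a hypothetical helper, not the tool used in the benchmark; splitting on line boundaries is safe for N-Triples, where every line is a complete triple.

```python
def split_file(src, parts=12, prefix="part"):
    """Split `src` into up to `parts` contiguous chunks of roughly equal size."""
    with open(src) as f:
        lines = f.readlines()  # for Uniprot-scale files, stream instead of slurping
    size = -(-len(lines) // parts)  # ceiling division: lines per chunk
    names = []
    for i in range(parts):
        chunk = lines[i * size:(i + 1) * size]
        if not chunk:  # fewer non-empty chunks than requested parts
            break
        name = f"{prefix}{i:02d}.nt"
        with open(name, "w") as out:
            out.writelines(chunk)
        names.append(name)
    return names
```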

Allie upload

Approach 1: 26 hours

Approach 2: 5.89 hours (6.75 hours with the JVM GC settings above)

Approach 3: 2.61 hours

Approach 4: 1.03 hours

PDBJ upload

Result: 8.95 hours

Uniprot upload

Time: over one week (7.48 days): 646336127 ms

INFO : 646335942 main com.bigdata.rdf.store.DataLoader.logCounters(DataLoader.java:1185): extent=249818775552, stmts=3161144450, bytes/stat=79 Wrote: 241474404352 bytes. Total elapsed=646336127ms

Without the GC settings: 74.3 hours

INFO : 267385254 main com.bigdata.rdf.store.DataLoader.logCounters(DataLoader.java:1185): extent=249818775552, stmts=3161144450, bytes/stat=79

Wrote: 241489608704 bytes. Total elapsed=267385843ms

Load time   Cell Cycle Ontology   Allie     PDBj       UniProt
1st time    3 mins                62 mins   537 mins
2nd time    3 mins                mins      mins
average     3 mins                mins      mins

SPARQL Query Performance

Cell cycle query

Query\time(ms)   time 1   time 2   time 3   time 4   time 5
case1               341      353      327      328      327
case2                46       43       41       45       39
case3              3361     3039     2855     3284     3416
case4                32       21        9       22       10
case5               416      574      404      401      433
case6              1216     1295     1135     1277     1134
case7                21       21       19       23       21
case8               105      113       83      108       93
case9                44       43       44       45       40
case10               14       14       14       14       14
case11               25       29       24       29       14
case12               44       49       45       50       32
case13                7       21       19        9       18
case14                3       17       15       18       15
case15            19456    19229    18670    19016    19583
case16                X        X        X        X        X
case17                X        X        X        X        X
case18                X        X        X        X        X
case19               44       36       29       46       37

Note: the COUNT queries in cases 16, 17, and 18 are not supported, so no times could be measured.

Allie query

Query\time(ms)   time 1   time 2   time 3   time 4   time 5
case1               423      424      424      443      436
case2              4160     4200     4263     4201     4264
case3              3352     3230     2329     2329     2308
case4               568      592       92       92       97
case5           1830742   661710    39296    39296    39784

PDBJ query

Query\time(ms)   time 1   time 2   time 3   time 4   time 5
case1               751      213      213      212      213
case2                27       14       15       13       26
case3               188       56       45       53       66
case4               337       58       57       53       59

Uniprot query

DDBJ query
