 * [#configure Bigdata Configuration]
 * [#load Load performance]
 * [#allieload Allie upload]
 * [#pdbjload PDBJ upload]
 * [#uniprotload Uniprot upload]
 * [#ddbjload DDBJ upload]
 * [#Sparql Sparql query performance]
 * [#cellquery Cell cycle query]
 * [#alliequery Allie query]
 * [#pdbjquery PDBJ query]
 * [#uniprotquery Uniprot query]
 * [#ddbjquery DDBJ query]

=== Bigdata Configuration === #configure
Bigdata offers two kinds of journal (please refer to [http://sourceforge.net/apps/mediawiki/bigdata/index.php?title=StandaloneGuide the StandaloneGuide] for details).

The WORM (Write-Once, Read-Many) journal is the traditional log-structured, append-only journal. It was designed for very fast write rates and is used to buffer writes for scale-out. This is a good choice for immortal databases where people want access to ALL history. It scales to several billion triples.

The RW (Read-Write) store supports recycling of allocation slots on the backing file. It may be used as a time-bounded version of an immortal database where history is aged off of the database over time. This is a good choice for standalone workloads where updates are continuously arriving and older database states may be released. The RW store is also less sensitive to data skew because it can reuse B+Tree node and leaf revisions within a commit group on large data set loads. Scaling should be better than the WORM for standalone use and could reach 10B+ triples.

The default property file is attachment:RWStore.properties.

=== Load Performance === #load
Approach 1: Upload the data through the Bigdata SPARQL endpoint (NanoSparqlServer), POSTing the data every 10000 lines. Please refer to attachment:upload.pl for details; a sketch of this approach is given after this list.

Approach 2: Upload with the com.bigdata.rdf.store.DataLoader tool and the default RW store parameters, and also test the effect of adding explicit GC options to the JVM:
{{{
-Xmx55G -Xms30G -XX:+UseG1GC -XX:+TieredCompilation -XX:+HeapDumpOnOutOfMemoryError
}}}
Without collecting garbage it took 35 minutes to upload Allie.

Approach 3: We modified the following two important parameters (the remaining tests use this configuration by default):
{{{
com.bigdata.btree.writeRetentionQueue.capacity=500000
com.bigdata.rdf.sail.BigdataSail.bufferCapacity=1000000
}}}

Approach 4: Split the file into 12 smaller files.
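The following is a minimal sketch of the chunked upload used in Approach 1. The endpoint URL, the input file name, and the text/plain (N-Triples) content type are assumptions for illustration; the script actually used is attachment:upload.pl.
{{{
#!python
# Minimal sketch of Approach 1: POST the data to the SPARQL endpoint
# every 10000 lines. Endpoint URL, file name, and content type are
# assumptions; see attachment:upload.pl for the real script.
import urllib.request

ENDPOINT = "http://localhost:9999/bigdata/sparql"  # assumed endpoint URL
CHUNK_LINES = 10000

def post_chunk(lines):
    """POST one chunk of N-Triples lines to the endpoint."""
    req = urllib.request.Request(
        ENDPOINT,
        data="".join(lines).encode("utf-8"),
        headers={"Content-Type": "text/plain"},  # N-Triples MIME type
    )
    urllib.request.urlopen(req).read()

buf = []
with open("data.nt") as f:  # assumed input file
    for line in f:
        buf.append(line)
        if len(buf) == CHUNK_LINES:
            post_chunk(buf)
            buf = []
if buf:
    post_chunk(buf)  # flush the final partial chunk
}}}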
=== Allie upload === #allieload
Approach 1: 26 hours.

Approach 2: 5.89 hours; 6.75 hours when setting the JVM GC options.

Approach 3: 2.61 hours with UseG1GC; 35 minutes with automatic garbage collection (vm.swappiness=10); 38 minutes with automatic garbage collection (vm.swappiness=60).

Approach 4: 1.03 hours with collecting garbage; 35 minutes without collecting garbage.

=== PDBJ upload === #pdbjload
'''Result:''' 8.95 hours with UseG1GC; 7.15 hours with automatic garbage collection (vm.swappiness=10).

=== Uniprot upload === #uniprotload
Time: over one week (7.48 days), 646336127 ms:
{{{
INFO : 646335942 main com.bigdata.rdf.store.DataLoader.logCounters(DataLoader.java:1185): extent=249818775552, stmts=3161144450, bytes/stat=79
Wrote: 241474404352 bytes. Total elapsed=646336127ms
}}}
Without collecting garbage: 74.3 hours:
{{{
INFO : 267385254 main com.bigdata.rdf.store.DataLoader.logCounters(DataLoader.java:1185): extent=249818775552, stmts=3161144450, bytes/stat=79
Wrote: 241489608704 bytes. Total elapsed=267385843ms
}}}

=== Load performance summary ===
For the results below the system was configured with vm.swappiness = 60 and automatic garbage collection.

||Load time ||Cell Cycle Ontology ||Allie ||PDBj ||UniProt ||
||1st time ||3 mins ||35 mins ||537 mins || ||
||2nd time ||3 mins ||mins ||mins || ||
||average ||3 mins ||mins ||mins || ||

=== Sparql query performance === #Sparql

=== Cell cycle query === #cellquery
||Query\time(ms) ||time 1 ||time 2 ||time 3 ||time 4 ||time 5 ||
||case1 ||341 ||353 ||327 ||328 ||327 ||
||case2 ||46 ||43 ||41 ||45 ||39 ||
||case3 ||3361 ||3039 ||2855 ||3284 ||3416 ||
||case4 ||32 ||21 ||9 ||22 ||10 ||
||case5 ||416 ||574 ||404 ||401 ||433 ||
||case6 ||1216 ||1295 ||1135 ||1277 ||1134 ||
||case7 ||21 ||21 ||19 ||23 ||21 ||
||case8 ||105 ||113 ||83 ||108 ||93 ||
||case9 ||44 ||43 ||44 ||45 ||40 ||
||case10 ||14 ||14 ||14 ||14 ||14 ||
||case11 ||25 ||29 ||24 ||29 ||14 ||
||case12 ||44 ||49 ||45 ||50 ||32 ||
||case13 ||7 ||21 ||19 ||9 ||18 ||
||case14 ||3 ||17 ||15 ||18 ||15 ||
||case15 ||19456 ||19229 ||18670 ||19016 ||19583 ||
||case16 ||X ||X ||X ||X ||X ||
||case17 ||X ||X ||X ||X ||X ||
||case18 ||X ||X ||X ||X ||X ||
||case19 ||44 ||36 ||29 ||46 ||37 ||

Note: the '''count''' queries in cases 16, 17 and 18 are not supported, hence the X entries above; a sketch of this kind of query is given at the end of the page.

=== Allie query === #alliequery
||Query\time(ms) ||time 1 ||time 2 ||time 3 ||time 4 ||time 5 ||
||case1 ||423 ||424 ||424 ||443 ||436 ||
||case2 ||4160 ||4200 ||4263 ||4201 ||4264 ||
||case3 ||3352 ||3230 ||2329 ||2329 ||2308 ||
||case4 ||568 ||592 ||92 ||92 ||97 ||
||case5 ||1830742 ||661710 ||39296 ||39296 ||39784 ||

=== PDBJ query === #pdbjquery
||Query\time(ms) ||time 1 ||time 2 ||time 3 ||time 4 ||time 5 ||
||case1 ||751 ||213 ||213 ||212 ||213 ||
||case2 ||27 ||14 ||15 ||13 ||26 ||
||case3 ||188 ||56 ||45 ||53 ||66 ||
||case4 ||337 ||58 ||57 ||53 ||59 ||

=== Uniprot query === #uniprotquery

=== DDBJ query === #ddbjquery
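For reference, here is a minimal sketch of how a single query round trip can be timed against the SPARQL endpoint. The endpoint URL is an assumption, and the '''count''' query shown is only a hypothetical stand-in for the aggregate queries that failed in cases 16 to 18 of the cell cycle benchmark; it is not one of the actual test cases.
{{{
#!python
# Minimal sketch: time one SPARQL query round trip against the endpoint.
# Assumptions: endpoint URL; the COUNT query is a hypothetical stand-in
# for the aggregate queries of cases 16-18, not an actual test case.
import time
import urllib.parse
import urllib.request

ENDPOINT = "http://localhost:9999/bigdata/sparql"  # assumed endpoint URL
QUERY = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"

req = urllib.request.Request(
    ENDPOINT + "?" + urllib.parse.urlencode({"query": QUERY}),
    headers={"Accept": "application/sparql-results+xml"},
)

start = time.time()
body = urllib.request.urlopen(req).read()
elapsed_ms = (time.time() - start) * 1000.0
print("%.0f ms, %d bytes" % (elapsed_ms, len(body)))
}}}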