= bigdata1.1.0 =

 * [#configure Bigdata Configuration]
 * [#load Load performance]
   * [#allieload Allie upload]
   * [#pdbjload PDBJ upload]
   * [#uniprotload Uniprot upload]
   * [#ddbjload DDBJ upload]
 * [#Sparql Sparql query performance]
   * [#alliequery Allie query]
   * [#pdbjquery PDBJ query]
   * [#uniprotquery Uniprot query]
   * [#ddbjquery DDBJ query]

=== Bigdata Configuration === #configure

Bigdata stores its data in a journal; two journal modes are available (please refer to [http://sourceforge.net/apps/mediawiki/bigdata/index.php?title=StandaloneGuide] for details).

The WORM (Write-Once, Read-Many) store is the traditional log-structured, append-only journal. It was designed for very fast write rates and is used to buffer writes for scale-out. This is a good choice for immortal databases where people want access to ALL history. It scales to several billion triples.

The RW (Read-Write) store supports recycling of allocation slots on the backing file. It may be used as a time-bounded version of an immortal database where history is aged off of the database over time. This is a good choice for standalone workloads where updates are continuously arriving and older database states may be released. The RW store is also less sensitive to data skew because it can reuse B+Tree node and leaf revisions within a commit group on large data set loads. Scaling should be better than the WORM for standalone use and could reach 10B+ triples. The default property file is attachment:RWStore.properties.

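The journal mode itself is selected in the property file. A minimal illustrative excerpt, assuming the standard Bigdata option names (these lines are not quoted from the attached file):

{{{
# Illustrative excerpt (assumed values, not copied from attachment:RWStore.properties)
# DiskRW selects the RW store; DiskWORM selects the WORM journal.
com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
com.bigdata.journal.AbstractJournal.file=bigdata.jnl
}}}
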
=== Load Performance === #load

Approach 1:

 Upload the data through the Bigdata SPARQL endpoint (NanoSparqlServer), posting the data every 10,000 lines. Please refer to attachment:upload.pl for details; a simplified sketch of the same idea is shown below.

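A minimal sketch of this kind of chunked upload, assuming a line-oriented N-Triples input and a NanoSparqlServer endpoint at http://localhost:9999/bigdata/sparql (the URL, file name, chunk handling, and content type are assumptions; the actual script used is attachment:upload.pl):

{{{#!python
# Minimal sketch: POST an N-Triples file to a Bigdata NanoSparqlServer
# endpoint in chunks of 10,000 lines. Endpoint URL and content type are
# assumptions about the test setup, not taken from upload.pl.
import urllib.request

ENDPOINT = "http://localhost:9999/bigdata/sparql"  # assumed endpoint URL
CHUNK_LINES = 10000

def post_chunk(lines):
    data = "".join(lines).encode("utf-8")
    req = urllib.request.Request(
        ENDPOINT, data=data,
        headers={"Content-Type": "text/plain"},  # N-Triples MIME type
        method="POST")
    with urllib.request.urlopen(req) as resp:
        resp.read()

def upload(path):
    buf = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            buf.append(line)
            if len(buf) >= CHUNK_LINES:
                post_chunk(buf)
                buf = []
    if buf:
        post_chunk(buf)

if __name__ == "__main__":
    upload("allie.nt")  # hypothetical file name
}}}
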
Approach 2:

 Upload with the com.bigdata.rdf.store.DataLoader tool and the default RW store parameters.

We also tested the effect of the following JVM GC options:

{{{
-Xmx55G -Xms30G -XX:+UseG1GC -XX:+TieredCompilation -XX:+HeapDumpOnOutOfMemoryError
}}}

Without the explicit GC options it took 35 minutes to upload Allie.

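For reference, a full loader invocation combining these JVM options with the DataLoader class might look roughly as follows. This is a sketch, not the exact command used: the classpath, jar name, and input file name are assumptions; only the JVM flags and the class name come from this page (DataLoader is invoked with a property file followed by the files to load):

{{{
java -cp bigdata.jar -Xmx55G -Xms30G -XX:+UseG1GC -XX:+TieredCompilation \
     -XX:+HeapDumpOnOutOfMemoryError \
     com.bigdata.rdf.store.DataLoader RWStore.properties allie.rdf
}}}
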
Approach 3:

We modified the following two important parameters (the remaining tests use this configuration by default):
{{{
com.bigdata.btree.writeRetentionQueue.capacity=500000
com.bigdata.rdf.sail.BigdataSail.bufferCapacity=1000000
}}}

Approach 4: Split the input file into 12 smaller files (a sketch of one way to do the split is shown below).

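A minimal sketch of such a split, assuming a line-oriented N-Triples dump (the input format, file names, and the line-based splitting strategy are assumptions; an RDF/XML file could not be split this way):

{{{#!python
# Minimal sketch: split a line-oriented N-Triples file into 12 roughly
# equal parts. Assumes one triple per line; not applicable to RDF/XML.
def split_ntriples(path, parts=12):
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()
    per_part = -(-len(lines) // parts)  # ceiling division
    for i in range(parts):
        chunk = lines[i * per_part:(i + 1) * per_part]
        if not chunk:
            break
        with open(f"{path}.part{i:02d}", "w", encoding="utf-8") as out:
            out.writelines(chunk)

if __name__ == "__main__":
    split_ntriples("allie.nt")  # hypothetical file name
}}}
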
=== Allie upload === #allieload

Approach 1: 26 hours

Approach 2: 5.89 hours; 6.75 hours with the JVM GC options set

Approach 3: 2.61 hours with UseG1GC;
            35 minutes with automatic garbage collection (vm.swappiness = 10);
            38 minutes with automatic garbage collection (vm.swappiness = 60)

Approach 4: 1.03 hours with UseG1GC;
            35 minutes with automatic garbage collection

=== PDBJ upload === #pdbjload

'''Result:''' 8.95 hours with UseG1GC;
              7.15 hours (429 minutes) with automatic garbage collection (vm.swappiness = 10)

=== Uniprot upload === #uniprotload

'''uniprot.rdf.gz'''

We first uploaded the file uniprot.rdf.gz (3.16 billion triples).

With UseG1GC it took over one week (7.48 days, 646336127 ms):

INFO : 646335942 main com.bigdata.rdf.store.DataLoader.logCounters(DataLoader.java:1185): extent=249818775552, stmts=3161144450, bytes/stat=79
Wrote: 241474404352 bytes.
Total elapsed=646336127ms

With automatic garbage collection it took 74.3 hours (4458 minutes):

INFO : 267385254      main com.bigdata.rdf.store.DataLoader.logCounters(DataLoader.java:1185): extent=249818775552, stmts=3161144450, bytes/stat=79
Wrote: 241489608704 bytes.
Total elapsed=267385843ms

'''uniref.rdf.gz'''

When adding uniref.rdf.gz, it took over 11 days to import 411800000 statements, so we stopped the procedure because of the poor performance.

INFO : 1032251699      main com.bigdata.rdf.store.DataLoader$2.processingNotification(DataLoader.java:1018): 411800000 stmts buffered in 1032247.664 secs, rate= 398, baseURL=http://purl.uniprot.org, totalStatementsSoFar=411800000

'''uniprot.rdf.gz''' (2nd run)

The second upload of uniprot.rdf.gz took 349235466 ms (about 97 hours):

INFO : 349235249      main com.bigdata.rdf.store.DataLoader.logCounters(DataLoader.java:1185): extent=249818775552, stmts=3161144450, bytes/stat=79
Wrote: 241707974656 bytes.
Total elapsed=349235466ms
INFO : 349235324      main com.bigdata.rdf.store.DataLoader.main(DataLoader.java:1545): Total elapsed=349235466ms

=== Load performance summary ===

For the results below we configured vm.swappiness = 60 and used automatic garbage collection.

|| Load time || Cell Cycle Ontology || Allie || PDBj || UniProt* ||
|| 1st time || 3 mins || 35 mins || 429 mins || 4458 mins ||
|| 2nd time || 3 mins || 38 mins || 412 mins || 5820 mins ||
|| average || 3 mins || 37 mins || 421 mins || 5139 mins ||

UniProt*: We only uploaded uniprot.rdf.gz (3.16 billion triples).

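For reference, vm.swappiness is a Linux kernel parameter; it would typically have been changed with sysctl (this is an assumption about the test setup, the exact procedure is not documented on this page):

{{{
# temporarily set the value (assumed procedure)
sysctl -w vm.swappiness=60
# or, equivalently
echo 60 > /proc/sys/vm/swappiness
}}}
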
=== Sparql query performance === #Sparql

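Each query was run five times against the SPARQL endpoint; the tables below report the elapsed time of each run in milliseconds. A minimal sketch of how such timings can be collected, assuming a NanoSparqlServer endpoint at http://localhost:9999/bigdata/sparql (the URL and the example query are assumptions; the actual benchmark queries are the Allie/PDBJ/Cell Cycle cases listed below):

{{{#!python
# Minimal sketch: time a SPARQL SELECT query against a NanoSparqlServer
# endpoint five times. The endpoint URL and the query are placeholders.
import time
import urllib.parse
import urllib.request

ENDPOINT = "http://localhost:9999/bigdata/sparql"  # assumed endpoint URL
QUERY = "SELECT * WHERE { ?s ?p ?o } LIMIT 10"     # placeholder query

def run_once(query):
    params = urllib.parse.urlencode({"query": query})
    req = urllib.request.Request(
        ENDPOINT + "?" + params,
        headers={"Accept": "application/sparql-results+xml"})
    start = time.time()
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return (time.time() - start) * 1000  # elapsed time in ms

if __name__ == "__main__":
    for i in range(5):
        print(f"time {i + 1}: {run_once(QUERY):.0f} ms")
}}}
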
=== Cell cycle query === #cellquery

||Query\time(ms) ||time 1 ||time 2 ||time 3 ||time 4 ||time 5 ||
||case1 ||341 ||353 ||327 ||328 ||327 ||
||case2 ||46 ||43 ||41 ||45 ||39 ||
||case3 ||3361 ||3039 ||2855 ||3284 ||3416 ||
||case4 ||32 ||21 ||9 ||22 ||10 ||
||case5 ||416 ||574 ||404 ||401 ||433 ||
||case6 ||1216 ||1295 ||1135 ||1277 ||1134 ||
||case7 ||21 ||21 ||19 ||23 ||21 ||
||case8 ||105 ||113 ||83 ||108 ||93 ||
||case9 ||44 ||43 ||44 ||45 ||40 ||
||case10 ||14 ||14 ||14 ||14 ||14 ||
||case11 ||25 ||29 ||24 ||29 ||14 ||
||case12 ||44 ||49 ||45 ||50 ||32 ||
||case13 ||7 ||21 ||19 ||9 ||18 ||
||case14 ||3 ||17 ||15 ||18 ||15 ||
||case15 ||19456 ||19229 ||18670 ||19016 ||19583 ||
||case16 ||X ||X ||X ||X ||X ||
||case17 ||X ||X ||X ||X ||X ||
||case18 ||X ||X ||X ||X ||X ||
||case19 ||44 ||36 ||29 ||46 ||37 ||

Note: cases 16, 17, and 18 use '''count''' queries, which are not supported; hence the X entries.

=== Allie query === #alliequery

||Query\time(ms) ||time 1 ||time 2 ||time 3 ||time 4 ||time 5 ||
||case1 ||423 ||424 ||424 ||443 ||436 ||
||case2 ||4160 ||4200 ||4263 ||4201 ||4264 ||
||case3 ||3352 ||3230 ||2329 ||2329 ||2308 ||
||case4 ||568 ||592 ||92 ||92 ||97 ||
||case5 ||1830742 ||661710 ||39296 ||39296 ||39784 ||

=== PDBJ query === #pdbjquery

||Query\time(ms) ||time 1 ||time 2 ||time 3 ||time 4 ||time 5 ||
||case1 ||751 ||213 ||213 ||212 ||213 ||
||case2 ||27 ||14 ||15 ||13 ||26 ||
||case3 ||188 ||56 ||45 ||53 ||66 ||
||case4 ||337 ||58 ||57 ||53 ||59 ||

=== Uniprot query === #uniprotquery

=== DDBJ query === #ddbjquery
     172