Context Navigation

← 前の更新
Wiki 履歴
次の更新 →

初期バージョンからバージョン 1 における更新: summarizeOldVersion

更新日時:: 2012/11/14 10:41:19 (14 年前)
更新者:: wu
コメント:: --

凡例:

: 変更なし
: 追加
: 削除
: 変更

summarizeOldVersion

v1	v1
	1	'''Load:'''
	2
	3	\|\| Endpoint \|\| Cell Cycle Ontology\|\| Allie \|\| PDBj \|\| UniProt \|\| DDBJ \|\|
	4	\|\| Virtuoso(min) \|\|4 \|\| 47 \|\| 92 \|\| # 41hs28mins \|\| 79hs19mins \|\|
	5	\|\| OWLIM-SE(min) \|\| 3 \|\| 29 \|\| 208 \|\| #59hs15mins \|\|49hs49mins \|\|
	6	\|\| Bigdata(min) \|\| 3 \|\| 37 \|\| 421 \|\| # 85hs38mins \|\| X \|\|
	7	\|\| 4Store (min) \|\| 2 \|\| 12 \|\| 4834 \|\| # \|\| # \|\|
	8	\|\| Mulgara(min) \|\| 10 \|\|86 \|\| X \|\| X \|\| X \|\|
	9
	10
	11	Virtuoso： # It needs another 17 hours to decompress and split the data set.
	12
	13	OWLIM-SE: # This is the cost time we loaded it successfully first and we are testing another time.
	14
	15	Bigdata: # Here it is the cost to load only uniport.rdf.gz (3.16 billion triples) without including the other files such as uniref.rdf.gz, uniparc.rdf.gz and other files because of a bad scalability after finishing importing uniport.rdf.gz.
	16
	17	4store: # We do not test 4Store on larger data set because its scalability is not ideal. From about 100M Allie data set to 500M PDBJ data set the time cost increases 500 times.
	18
	19	X: Some Problem occurred when uploading the data. Please refer to its uploading procedure for details.
	20
	21	[[Image(triple.bmp)]] [[Image(loadtime.bmp)]]
	22
	23	Space used:
	24
	25	\|\| Endpoint \|\| Cell Cycle Ontology \|\| Allie \|\| PDBj \|\| UniProt \|\| DDBJ \|\|
	26	\|\| Virtuoso \|\| 840M \|\| 6.4G \|\| 30G \|\| 308G \|\| 538G \|\|
	27	\|\| OWLIM-SE \|\| 5.7G \|\| 18G \|\| 57G \|\| 481G \|\| 1079G \|\|
	28	\|\| Bigdata \|\| 780M \|\| 6.9G \|\| 34G \|\| testing \|\|X \|\|
	29	\|\| 4Store \|\| 2.2G \|\| 14.7G \|\| * \|\| * \|\| * \|\|
	30	\|\| Mulgara \|\| 2.4G \|\| 15.8G \|\| X \|\| X \|\| X \|\|
	31
	32	'''Query for Cell Cycle Ontology:'''
	33
	34	\|\| Endpoint \|\| case1 \|\|case2\|\| case3\|\| case4\|\| case5\|\| case6 \|\|case7\|\| case8\|\| case9\|\| case10\|\| case11 \|\|case12\|\| case13\|\| case14\|\| case15\|\| case16 \|\|case17\|\| case18\|\| case19\|\|
	35	\|\|Virtuoso (ms) \|\| '''24'''\|\| '''2'''\|\| 23280\|\|3\|\|42500\|\|13073\|\|5\|\|7562\|\|41\|\|2\|\|120\|\|19\|\|5\|\|1\|\|56058\|\|'''46'''\|\|'''15'''\|\|'''16'''\|\|16721\|\|
	36	\|\|OWLIM-SE(ms) \|\|110\|\| 6\|\|2472\|\|'''2'''\|\|149\|\|2071\|\|'''2'''\|\|'''33'''\|\|'''22'''\|\|'''0'''\|\|'''6'''\|\|'''6'''\|\|'''2'''\|\|'''0'''\|\|46129\|\|X\|\|X\|\|X\|\|'''14''' \|\|
	37	\|\|Bigdata(ms) \|\| 331 \|\|42\|\| 3135\|\| 16 \|\|414 \|\|1191\|\| 21\|\| 97 \|\|43\|\| 14\|\| 23\|\| 43\|\| 13\|\| 13\|\| '''19093'''\|\|X\|\|X\|\|X\|\| 37\|\|
	38	\|\|4Store (ms) \|\| 56\|\| 18\|\| '''1236'''\|\| 13\|\| '''33'''\|\| '''64'''\|\| 22\|\| 67\|\| 2035\|\| 7\|\| '''6'''\|\| 1563\|\| 8 \|\|7\|\| * \|\|X\|\|X\|\|X\|\| 15 \|\|
	39	\|\| Mulgara(ms) \|\| 1294 \|\| 20 \|\| 2207 \|\| 9 \|\| 343 \|\|2325\|\| 32\|\| 58\|\|33\|\|4\|\|14\|\|X\|\|9\|\|6\|\|X\|\|X\|\|X\|\|X\|\|38\|\|
	40
	41	X or * shows that the endpoint does not support "count()" function or some unsupported function causes a wrong result.
	42
	43	[[Image(cellcycle_bar.bmp)]] [[Image(cellcycle_pie.bmp)]] The pie chart shows that how many percent an end point accounts for the fastest performers.
	44
	45
	46
	47	The data shows that in the cell cycle queries on the 10 million or so triples:
	48
	49	(1)Virtuoso supports more query. In some cases Virtuoso response fast but some others cost far more than others, such as case5 and case19;
	50
	51	(2)Except that OwlimSE cannot support count() function, it totally perform better and has no worst case;
	52
	53	(3)Bigdata and Mulgara perform averagely well;
	54
	55	(4) 4Store do not support count() and give no response in case15. However it performs distinctively better in some cases such as case5 and case6.
	56
	57
	58	'''Query for Allie:'''
	59
	60	\|\| Endpoint \|\| case1 \|\|case2\|\| case3\|\| case4\|\| case5\|\|
	61	\|\|Virtuoso (ms) \|\| '''23''' \|\| 1413\|\| '''152''' \|\| 95 \|\|'''27299'''\|\|
	62	\|\|OWLIM-SE(ms) \|\| 145 \|\| 1980\|\| 1476 \|\| '''38''' \|\|68369\|\|
	63	\|\|Bigdata(ms) \|\| 427 \|\| 4206\|\| 2549 \|\| 212 \|\| 195021 \|\|
	64	\|\|4Store (ms) \|\| X \|\| '''217''' \|\| X \|\| X \|\|65128 \|\|
	65	\|\| Mulgara(ms) \|\| 373 \|\| 121 \|\| X \|\| X \|\| X \|\|
	66
	67	4Store: Donot support "lang()" function.
	68
	69	Mulgara: Unable to support arbitrarily complex ORDER BY clause.
	70
	71	[[Image(allie_bar.bmp)]] [[Image(allie_pie.bmp)]]
	72
	73	'''Query for PDBj:'''
	74
	75
	76	\|\| Endpoint \|\| case1 \|\|case2\|\| case3\|\| case4\|\|
	77	\|\|Virtuoso (ms) \|\| 147 \|\|2 \|\|''' 2''' \|\| 138 \|\|
	78	\|\|OWLIM-SE(ms) \|\|''' 53''' \|\| '''1''' \|\|191 \|\| '''4''' \|\|
	79	\|\|Bigdata(ms) \|\| 213 \|\|17 \|\|55 \|\|56 \|\|
	80	\|\|4Store (ms) \|\| 1025 \|\| 1274 \|\| 131 \|\| 1524 \|\|
	81	\|\| Mulgara(ms) \|\| X \|\| X \|\| X \|\| X \|\|
	82
	83	[[Image(pdbj_bar.bmp)]] [[Image(pdbj_pie.bmp)]]
	84
	85	In these two groups of queries on about 100 million triples:
	86
	87	(1) Virtuoso and OwlimSE works better than others. Although in Allie Virtuoso performs a little better and OwlimSE is better in PDBJ, there looks no overwhelming advantages over each other.
	88
	89	(2) In Allie 4Store is still limited but performs better when it executes the query such as in case2. However as increasing the number of triples in PDBJ, it performs worst.
	90
	91	(3) Bigdata still keeps it situation: neither the best one nor the worst one.
	92
	93
	94
	95	'''Query for UniProt:'''
	96
	97	\|\| Endpoint \|\| case1 \|\|case2\|\| case3\|\| case4\|\| case5\|\| case6 \|\|case7\|\| case8\|\| case9\|\| case10\|\| case11 \|\|case12\|\| case13\|\| case14\|\| case15\|\| case16 \|\|case17\|\| case18\|\|
	98	\|\|Virtuoso (ms) \|\|51\|\|95\|\|114\|\|2\|\|7\|\|2206\|\|34916\|\|413\|\|605\|\|652\|\|53\|\|4\|\|289\|\|269\|\|10631\|\|9052\|\|2\|\|76\|\|
	99	\|\|OwlimSE (ms) \|\|429\|\|383\|\|519\|\|139\|\|21\|\|628381\|\|433913\|\|6924\|\|1923\|\|58794\|\|3266\|\|117\|\|22\|\|42\|\|35112\|\|322058\|\|5875\|\|8900\|\|
	100
	101	[[Image(uniprot_bar.bmp)]] [[Image(uniprot_pie.bmp)]]
	102
	103	'''Query for DDBJ:'''
	104
	105	\|\| Endpoint \|\| case1 \|\|case2\|\| case3\|\| case4\|\| case5\|\| case6 \|\|case7\|\| case8\|\| case9\|\| case10\|\|
	106	\|\|OwlimSE (ms) \|\|18648\|\|3276\|\|3710\|\|82\|\|85\|\|117\|\|5534\|\|10468\|\|2010\|\|1\|\|
	107	\|\|Virtuoso (ms) \|\|226\|\|218\|\|418\|\|56\|\|7\|\|98\|\|5\|\|4\|\|7\|\|1\|\|
	108
	109	[[Image(ddbj_bar.bmp)]] [[Image(ddbj_pie.bmp)]]
	110
	111	In these two group of queries with about 4 billion and 8 billion triples, we found out that Virtuoso performs obviously better.
	112
	113
	114	'''Conclusion'''
	115
	116	Our evaluation shows that the importing cost of the data depends on the multiple factors: Server configuration(CPU,memory,harddisk and so on), the system property(vm.swappiness, JVM), the application configuration(cachememory,etc.), the data format, the size of data set and even data contents, e.g. DDBJ is nearly 2 times the triple size of Uniprot, but its importing cost is 2 times less than Uniprot(2 times longer expected if simply considering the proportional scaling).
	117
	118	When the number of triple size is less than 100M, 4Store can perform well both in loading data and query although providing only limited features. For data with moderate size such as varying from 100M to 500M or so, Virtuoso and OwlimSE have similar or comparable performance. When increasing data to several billions, Virtuoso works best in the five test triple stores.
	119
	120	In the future we will evaluate federated queries as well as the triple store's inference ability, and try to make each triple work their best. In addition the query use cases we used in this study are designed mainly based on their daily usage, which includes long join operations as long as 10, kinds of filter operations, and almost all the clauses frequently used in the Sparql queries. Some other use cases can be designed aiming to test the detailed performance of each triple store, such as test on PSO,POS indices and so on.
	121
	122
	123
	124