summarize – TogoRDF

Load:

Endpoint	Cell Cycle Ontology	Allie	PDBj	UniProt	DDBJ
Virtuoso (min)	4	47	92	3508 (2488 + 1020)	4759 (79 h 19 min)
OWLIM-SE (min)	3	22	140	3770 (62 h 50 min)	7750 (129 h 10 min)
Bigdata (min)	3	272	1158 (19 h 18 min)	X	X
4Store (min)	2	12	4834	#	#
Mulgara (min)	10	86	X	X	X

Virtuoso: The UniProt load time includes 1020 minutes spent decompressing and splitting the data set.

4Store: # We did not test 4Store on the larger data sets because it does not scale well. From the Allie data set (about 100M triples) to the PDBj data set (about 500M triples), the load time increased about 400-fold.

X: A problem occurred while uploading the data. Please refer to the endpoint's uploading procedure for details.
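
The growth factor quoted in the 4Store note can be re-derived from the load table. The following is a minimal sketch, not part of the original measurements: the dictionary simply copies the minute values from the table, and the helper names `growth` and `as_hours` are made up for this illustration.

```python
# Load times in minutes, copied from the "Load" table above.
# Only the three data sets that most endpoints managed to load are included.
load_minutes = {
    "Virtuoso": {"Cell Cycle Ontology": 4, "Allie": 47, "PDBj": 92},
    "OWLIM-SE": {"Cell Cycle Ontology": 3, "Allie": 22, "PDBj": 140},
    "Bigdata": {"Cell Cycle Ontology": 3, "Allie": 272, "PDBj": 1158},
    "4Store": {"Cell Cycle Ontology": 2, "Allie": 12, "PDBj": 4834},
}


def growth(endpoint, small, large):
    """Ratio of the load time on the larger data set to the smaller one."""
    row = load_minutes[endpoint]
    return row[large] / row[small]


def as_hours(minutes):
    """Split a duration in minutes into (hours, minutes)."""
    return divmod(minutes, 60)


for name in load_minutes:
    factor = growth(name, "Allie", "PDBj")
    print(f"{name}: Allie -> PDBj load time grows {factor:.1f}x")

# Cross-check one of the hour conversions given in the table (DDBJ on Virtuoso).
print(as_hours(4759))
```

Under these numbers the 4Store ratio is 4834 / 12, which is where the "about 400-fold" figure comes from, while the other endpoints grow by a single-digit factor over the same step.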

Space used:

Endpoint	Cell Cycle Ontology	Allie	PDBj	UniProt	DDBJ
Virtuoso	0.84G	6.4G	30G	308G	538G
OWLIM-SE	3.7G	8.2G	27G	213G	513G
Bigdata	0.78G	6.2G	34G	X	X
4Store	2.2G	14.7G	66G	X	X
Mulgara	2.4G	15.8G	X	X	X

Query for Cell Cycle Ontology:

Endpoint	case1	case2	case3	case4	case5	case6	case7	case8	case9	case10	case11	case12	case13	case14	case15	case16	case17	case18	case19
Virtuoso (ms)	24	2	23280	3	42500	13073	5	7562	41	2	120	19	5	1	56058	46	15	16	16721
OWLIM-SE(ms)	121	9	2740	5	149	1722	3	39	25	1	6	47	2	1	52779	7	4	24	17
Bigdata(ms)	282	35	3247	13	52	3320	11	93	47	10	20	27	5	6	18126	X	X	X	30
4Store (ms)	56	18	1236	13	33	64	22	67	2035	7	6	1563	8	7	*	X	X	X	15
Mulgara(ms)	1294	20	2207	9	343	2325	32	58	33	4	14	X	9	6	X	X	X	X	38

X or * indicates that the endpoint does not support the "count()" function, or that some other unsupported function caused a wrong result.

The pie chart shows the percentage of cases in which each endpoint was the fastest performer.
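
The shares behind such a pie chart can be recomputed from the query table. The sketch below is one possible way to do it and rests on two assumptions that the page does not confirm: entries marked X or * are excluded from the comparison, and a tie gives one credit to every tied endpoint. The function name `fastest_counts` is hypothetical.

```python
# Cell Cycle Ontology query times in ms, copied from the table above.
# X (None) stands for an "X" or "*" entry in the table.
X = None
times = {
    "Virtuoso": [24, 2, 23280, 3, 42500, 13073, 5, 7562, 41, 2,
                 120, 19, 5, 1, 56058, 46, 15, 16, 16721],
    "OWLIM-SE": [121, 9, 2740, 5, 149, 1722, 3, 39, 25, 1,
                 6, 47, 2, 1, 52779, 7, 4, 24, 17],
    "Bigdata": [282, 35, 3247, 13, 52, 3320, 11, 93, 47, 10,
                20, 27, 5, 6, 18126, X, X, X, 30],
    "4Store": [56, 18, 1236, 13, 33, 64, 22, 67, 2035, 7,
               6, 1563, 8, 7, X, X, X, X, 15],
    "Mulgara": [1294, 20, 2207, 9, 343, 2325, 32, 58, 33, 4,
                14, X, 9, 6, X, X, X, X, 38],
}


def fastest_counts(times):
    """Count, per endpoint, the cases in which it had the lowest time.

    Assumption: ties give one credit to every tied endpoint.
    """
    n_cases = len(next(iter(times.values())))
    wins = {name: 0 for name in times}
    for i in range(n_cases):
        # Keep only the endpoints that produced a valid result for this case.
        valid = {name: row[i] for name, row in times.items() if row[i] is not None}
        best = min(valid.values())
        for name, t in valid.items():
            if t == best:
                wins[name] += 1
    return wins


wins = fastest_counts(times)
total = sum(wins.values())
for name, w in wins.items():
    print(f"{name}: fastest in {w} case(s), {100 * w / total:.1f}% of credits")
```

Because of the tie rule, the credits can add up to more than the 19 cases; a different tie rule would shift the percentages slightly.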

The data show that, for the Cell Cycle queries on roughly 10 million triples:

(1) Virtuoso and OWLIM-SE support more of the queries. Virtuoso responds quickly in some cases but takes far longer than the other endpoints in others, such as case5 and case19;

(2) OWLIM-SE performs better overall and has no worst case;

(3) Bigdata and Mulgara show average performance;

(4) 4Store does not support count() and gives no response in case15. However, it performs distinctly better in some cases, such as case5 and case6.

Query for Allie:

Endpoint	case1	case2	case3	case4	case5
Virtuoso (ms)	23	1413	152	95	27299
OWLIM-SE(ms)	136	1530	1091	31	78942
Bigdata(ms)	365	690	1779	98	38523
4Store (ms)	X	217	X	X	65128
Mulgara(ms)	373	121	X	X	X

4Store: Does not support the "lang()" function.

Mulgara: Cannot handle arbitrarily complex ORDER BY clauses.

Query for PDBj:

Endpoint	case1	case2	case3	case4
Virtuoso (ms)	147	2	2	138
OWLIM-SE(ms)	72	2	162	7
Bigdata(ms)	190	14	35	54
4Store (ms)	1025	1274	131	1524
Mulgara(ms)	X	X	X	X

In these two groups of queries, on data sets of about 100 million (Allie) and 500 million (PDBj) triples:

(1) Virtuoso and OWLIM-SE work better than the others. Virtuoso performs a little better on Allie and OWLIM-SE is better on PDBj, but neither has an overwhelming advantage over the other.

(2) On Allie, 4Store is still limited in the queries it supports, but it performs well on those it can execute, such as case2. With the larger number of triples in PDBj, however, it performs worst.

(3) Bigdata keeps its position: neither the best nor the worst.

Query for UniProt:

Endpoint	case1	case2	case3	case4	case5	case6	case7	case8	case9	case10	case11	case12	case13	case14	case15	case16	case17	case18
Virtuoso (ms)	51	95	114	2	7	2206	34916	413	605	652	53	4	289	269	10631	9052	2	76
OwlimSE (ms)	931	1920	2627	142	61	89586	86380	674	994	1053	50	10	9	7	15037	32055	2818	8548

Query for DDBJ:

Endpoint	case1	case2	case3	case4	case5	case6	case7	case8	case9	case10
OwlimSE (ms)	4783	4528	4867	12	25	4	470	1078	22	1
Virtuoso (ms)	226	218	418	56	7	98	5	4	7	1

In these two groups of queries, on about 4 billion and 8 billion triples, we found that Virtuoso is clearly faster on most queries.

Simultaneous execution

Simultaneous executions were run with 1, 4, 8, and 64 clients. Mulgara reported the error “Interrupted while waiting to acquire lock” when queried by more than 2 clients. We evaluated 4Store only with Cell Cycle Ontology and Allie because its performance with multiple clients was unstable on larger data. Virtuoso, OWLIM-SE and Bigdata completed the simultaneous executions with good scalability.
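
The page does not describe the harness used for the multi-client runs. As a rough illustration only, a measurement of this kind could be structured as below, with each client being a thread that runs the whole query list once. The names `run_clients` and `fake_query` are hypothetical, and `fake_query` is a stand-in that sleeps instead of contacting a real SPARQL endpoint.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_clients(run_query, queries, n_clients):
    """Run the full query list once per client, all clients in parallel.

    run_query is any callable that executes one query string.
    Returns the elapsed wall-clock time in milliseconds.
    """
    def client(_client_id):
        for q in queries:
            run_query(q)

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        # list() forces completion and re-raises any exception from a client.
        list(pool.map(client, range(n_clients)))
    return (time.perf_counter() - start) * 1000.0


def fake_query(query):
    """Placeholder for sending a query to an endpoint (assumption)."""
    time.sleep(0.001)


queries = ["SELECT * WHERE { ?s ?p ?o } LIMIT 1"] * 3
for n in (1, 4, 8, 64):
    print(f"{n} clients: {run_clients(fake_query, queries, n):.0f} ms")
```

Replacing `fake_query` with a function that actually submits the query to an endpoint would turn the sketch into a usable timing loop; how the original figures were collected may differ.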

OWLIM-SE (ms):

Number of Clients	Cell Cycle Ontology	Allie	PDBj	UniProt	DDBJ
1	6,402	6,704	861	1,651,466	83,179
4	8,474	13,967	1,041	1,911,144	89,626
8	14,190	20,891	1,033	2,216,634	109,195
64	120,126	159,211	2,286	6,058,957	442,181

Virtuoso (ms):

Number of Clients	Cell Cycle Ontology	Allie	PDBj	UniProt	DDBJ
1	14,742	1,421	789	31,876	49,624
4	22,459	7,189	1,168	50,953	5,246
8	27,297	9,870	1,655	58,498	10,426
64	194,850	55,366	8,496	905,697	35,879

4Store (ms):

Number of Clients	Cell Cycle Ontology	Allie	PDBj	UniProt	DDBJ
1	4,706	682	x	x	x
4	15,825	1,413	x	x	x
8	27,604	2,191	x	x	x
64	237,246	15,288	x	x	x

Bigdata (ms):

Number of Clients	Cell Cycle Ontology	Allie	PDBj	UniProt	DDBJ
1	10,757	100,683	2,028	x	x
4	15,617	129,136	2,138	x	x
8	82,579	850,852	2,051	x	x
64	108,755	4,467,378	2,930	x	x
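
One way to read the four tables above is to compare the 64-client time with the 1-client time for each endpoint and data set. The sketch below does only that arithmetic on the published numbers; the helper name `slowdown` is made up, and a ratio well below 64 is taken here, as an assumption, to indicate that the endpoint handled the added clients without proportional cost.

```python
DATASETS = ["Cell Cycle Ontology", "Allie", "PDBj", "UniProt", "DDBJ"]

# Times in ms for 1 and 64 clients, copied from the tables above.
# None stands for an "x" entry (not evaluated).
one_client = {
    "OWLIM-SE": [6402, 6704, 861, 1651466, 83179],
    "Virtuoso": [14742, 1421, 789, 31876, 49624],
    "4Store": [4706, 682, None, None, None],
    "Bigdata": [10757, 100683, 2028, None, None],
}
sixty_four_clients = {
    "OWLIM-SE": [120126, 159211, 2286, 6058957, 442181],
    "Virtuoso": [194850, 55366, 8496, 905697, 35879],
    "4Store": [237246, 15288, None, None, None],
    "Bigdata": [108755, 4467378, 2930, None, None],
}


def slowdown(endpoint):
    """Map each evaluated data set to time(64 clients) / time(1 client)."""
    ratios = {}
    for name, t1, t64 in zip(DATASETS, one_client[endpoint],
                             sixty_four_clients[endpoint]):
        if t1 is not None and t64 is not None:
            ratios[name] = t64 / t1
    return ratios


for endpoint in one_client:
    for dataset, ratio in slowdown(endpoint).items():
        print(f"{endpoint} / {dataset}: {ratio:.1f}x")
```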

Conclusion

Our evaluation shows that the cost of importing data depends on multiple factors: the server configuration (CPU, memory, hard disk, and so on), system properties (vm.swappiness, JVM), the application configuration (cache memory, etc.), the data format, the size of the data set, and even the data contents. For example, DDBJ has nearly 2 times as many triples as UniProt, but its importing cost is 2 times less than UniProt's (simple proportional scaling would predict a cost 2 times longer).

When the data set has fewer than 100M triples, 4Store performs well in both loading and querying, although it provides only limited features. For moderately sized data, from about 100M to 500M triples, Virtuoso and OWLIM-SE have comparable performance. When the data grow to several billion triples, Virtuoso works best among the five triple stores tested.

In the future we will evaluate federated queries as well as the triple stores' inference abilities, and try to make each triple store work at its best. In addition, the query use cases in this study were designed mainly around daily usage; they include long join chains of up to 10 joins, several kinds of filter operations, and almost all the clauses frequently used in SPARQL queries. Other use cases could be designed to test the detailed performance of each triple store, such as tests on the PSO and POS indices.
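
For readers unfamiliar with the index names mentioned above, the toy sketch below shows what a POS or PSO ordering means: the same triples are stored under a different key order so that a pattern with a bound predicate and object (or predicate and subject) becomes a direct lookup. This is an illustration under simplifying assumptions, not a description of how any of the tested stores implement their indices; `build_index` and the `ex:` terms are invented for the example.

```python
from collections import defaultdict


def build_index(triples, order):
    """Build a two-level index over triples.

    order is a permutation string such as "spo", "pos" or "pso" that
    says which triple component is the first, second and third key.
    """
    position = {"s": 0, "p": 1, "o": 2}
    index = defaultdict(lambda: defaultdict(set))
    for triple in triples:
        first, second, third = (triple[position[k]] for k in order)
        index[first][second].add(third)
    return index


triples = [
    ("ex:p1", "rdf:type", "ex:Protein"),
    ("ex:p2", "rdf:type", "ex:Protein"),
    ("ex:p1", "ex:name", "kinase"),
]

# POS order: pattern "?s rdf:type ex:Protein" (predicate and object bound).
pos_index = build_index(triples, "pos")
subjects = pos_index["rdf:type"]["ex:Protein"]

# PSO order: pattern "ex:p1 ex:name ?o" (predicate and subject bound).
pso_index = build_index(triples, "pso")
names = pso_index["ex:name"]["ex:p1"]

print(sorted(subjects), sorted(names))
```

A test aimed at a specific index would use query patterns whose bound positions match that ordering, so that the cost of the lookup dominates the measurement.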
