Virtuoso configuration
About Virtuoso index:
The index scheme consists of the following indices:
- PSOG - primary key.
- POGS - bitmap index for lookups on object value.
- SP - partial index for cases where only S is specified.
- OP - partial index for cases where only O is specified.
- GS - partial index for cases where only G is specified.
* NumberOfBuffers: the number of 8K page buffers, and thus the amount of RAM, that Virtuoso uses to cache database files. This has a critical performance impact, so the value should be fairly high for large databases. Setting it beyond physical memory has a significant negative impact. For a database-only server, about 65% of available RAM can be configured for database buffers. Each buffer caches one 8K page of data and occupies approximately 8700 bytes of memory.
* MaxCheckpointRemap: to avoid out-of-memory errors, make sure that NumberOfBuffers and MaxCheckpointRemap are not set to the same value.
* AsyncQueueMaxThreads: the size of a pool of extra threads that can be used for query parallelization. Set this to either 1.5 * the number of cores or 1.5 * the number of core threads; try both and keep whichever works better.
* ThreadsPerQuery: the maximum number of threads a single query will take. Set this to either the number of cores or the number of core threads; try both and keep whichever works better.
* IndexTreeMaps: the number of mutexes over which control for buffering an index tree is split. This can generally be left at the default (256 in normal operation; valid settings are powers of 2 from 2 to 1024), but setting it to 64, 128, or 512 may be beneficial. A low number leads to frequent contention; values of 64 and above see little contention.
* ThreadCleanupInterval and ResourcesCleanupInterval: set both to 1 to reduce memory leakage.
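The sizing rules in the list above can be turned into a small calculation. The following is a minimal sketch, not an official tuning tool: the function names and the 64 GiB machine size are hypothetical, and the constants are simply the rules of thumb quoted above (65% of RAM, about 8700 bytes per buffer, 1.5 * cores).

```python
import math

BYTES_PER_BUFFER = 8700   # approximate memory per 8K buffer, as quoted above
RAM_FRACTION = 0.65       # share of RAM for a database-only server, as quoted above


def suggest_number_of_buffers(ram_bytes: int, fraction: float = RAM_FRACTION) -> int:
    """Number of 8K buffers that fit into the given fraction of RAM."""
    return int(ram_bytes * fraction // BYTES_PER_BUFFER)


def suggest_thread_settings(cores: int) -> dict:
    """Starting points from the '1.5 * cores' and '1 * cores' rules of thumb."""
    return {
        "AsyncQueueMaxThreads": math.ceil(1.5 * cores),
        "ThreadsPerQuery": cores,
    }


if __name__ == "__main__":
    ram_gib = 64  # hypothetical machine size, not taken from the test setup
    print("NumberOfBuffers =", suggest_number_of_buffers(ram_gib * 1024**3))
    for name, value in suggest_thread_settings(12).items():
        print(name, "=", value)
```

Passing the number of core threads instead of the number of cores gives the second variant of each thread rule. As a cross-check in the other direction, 6,500,000 buffers at roughly 8700 bytes each correspond to about 56.6 GB of memory.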
NumberOfBuffers = 6500000
MaxDirtyBuffers = 5000000
MaxCheckpointRemap = 1000000
AsyncQueueMaxThreads = 18
ThreadsPerQuery = 18
IndexTreeMaps = 512
ThreadCleanupInterval = 1
ResourcesCleanupInterval = 1
See the attached virtuoso.ini.2 for the full set of parameters used in the test.
For more information, see http://docs.openlinksw.com/virtuoso/databaseadmsrv.html and http://www.openlinksw.com/weblog/oerling/?id=1665
As a general rule for the Virtuoso 6.x release, about 16 GB of memory is recommended per billion triples.
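The 16 GB per billion triples rule can be applied to a dataset of known size. This is only a rough sketch of the arithmetic: the function name is hypothetical, and the triple count is the Allie figure reported in the load section of this page.

```python
GB_PER_BILLION_TRIPLES = 16  # rule of thumb quoted above for Virtuoso 6.x


def estimate_memory_gb(triples: int) -> float:
    """Rough RAM estimate in GB for a given number of triples."""
    return triples / 1_000_000_000 * GB_PER_BILLION_TRIPLES


if __name__ == "__main__":
    allie_triples = 94_420_989  # Allie dataset size reported on this page
    print(f"Allie: about {estimate_memory_gb(allie_triples):.2f} GB")
```

For the Allie dataset this comes to roughly 1.5 GB, so the rule only starts to constrain the configuration for the much larger datasets.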
Load Performance
Allie upload
Data: 94,420,989 triples, N3 format.
* Approach 1:
Load the big file in one stream.
Result: 2 hours.
Step:
$ nohup $VIRTUOSO_HOME/bin/isql 1111 dba dba <$VIRTUOSO_HOME/scripts/load.list.isql &
load.list.isql script:
log_enable (2);
DB.DBA.TTLP (file_to_string_output('file path'), ' ', 'http://mydbcls.jp/', 0);
checkpoint;
* Approach 2:
Use one stream per core (not per core thread). Split the big file into 12 small files (precisely 13 files, 12 + 1, since lines per file = total lines / 12).
Result: 46 min 22 s.
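The split described above can be sketched as follows. This is an illustration under assumptions, not the script used in the test: the function name and output file naming are hypothetical, and a plain line-based split is only safe for a one-triple-per-line serialization such as N-Triples (which the `*.rdf.nt` pattern used below suggests), not for general N3 with prefixes or multi-line statements.

```python
from itertools import islice
from pathlib import Path


def split_by_lines(src: Path, out_dir: Path, parts: int = 12) -> list:
    """Split a line-oriented file into chunks of total_lines // parts lines.

    Remainder lines end up in one extra, shorter file, which is how
    12 requested parts can turn into 13 files.
    """
    with src.open("rb") as f:
        total = sum(1 for _ in f)
    lines_per_file = max(1, total // parts)

    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    with src.open("rb") as f:
        index = 0
        while True:
            first = f.readline()
            if not first:
                break  # end of input
            # Keep the original file name as a suffix so '*.rdf.nt' still matches.
            out = out_dir / f"part{index:02d}.{src.name}"
            with out.open("wb") as dst:
                dst.write(first)
                # Stream the rest of the chunk without holding it in memory.
                dst.writelines(islice(f, lines_per_file - 1))
            outputs.append(out)
            index += 1
    return outputs
```

For example, a 25-line input with `parts=12` yields 12 files of 2 lines and a 13th file with the single remaining line.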
Step 1. Register the files in the load list with ld_dir:
$ nohup $VIRTUOSO_HOME/bin/isql 1111 dba dba <$VIRTUOSO_HOME/scripts/load.list.isql &
load.list.isql script:
delete from load_list;
ld_dir('data directory', '*.rdf.nt', 'http://allie.dbcls.jp/');
select * from load_list;
Step 2. Load the files into Virtuoso:
$ nohup $VIRTUOSO_HOME/bin/isql 1111 dba dba <$VIRTUOSO_HOME/scripts/load.data.isql &
load.data.isql script:
-- Record CPU time
select getrusage ()[0] + getrusage ()[1];
rdf_loader_run () &
... (omit 10 times)
rdf_loader_run () &
checkpoint;
-- Record CPU time
select getrusage ()[0] + getrusage ()[1];
All of the uploads below use approach 2, with 12 streams.
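To compare the two approaches on the Allie data, the reported times can be converted into load rates. This is a small sketch of the arithmetic only; the names are hypothetical, and the "2 hours" figure for approach 1 is a rounded value, so the results are approximate.

```python
ALLIE_TRIPLES = 94_420_989          # dataset size reported above

APPROACH_1_SECONDS = 2 * 3600       # "2 hours", single stream
APPROACH_2_SECONDS = 46 * 60 + 22   # 46 min 22 s, 12 streams


def triples_per_second(triples: int, seconds: int) -> float:
    """Average load rate over the whole run."""
    return triples / seconds


if __name__ == "__main__":
    rate_1 = triples_per_second(ALLIE_TRIPLES, APPROACH_1_SECONDS)
    rate_2 = triples_per_second(ALLIE_TRIPLES, APPROACH_2_SECONDS)
    print(f"approach 1: {rate_1:,.0f} triples/s")
    print(f"approach 2: {rate_2:,.0f} triples/s")
    print(f"speedup:    {APPROACH_1_SECONDS / APPROACH_2_SECONDS:.2f}x")
```

On these figures, 12 parallel streams load the data about 2.6 times faster than a single stream, well short of a 12-fold gain.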
Cell Cycle upload
Result: 4 min.
PDBj upload
Result: 103 min 31 s.
UniProt upload
vm.swappiness = 60: 71 h 58 min
vm.swappiness = 10: 42 h 43 min
DDBJ upload
vm.swappiness = 10: 78 h 8 min
Load performance
We loaded all datasets twice with the lowest-cost configuration.
Load time | Cell Cycle Ontology | Allie | PDBj | UniProt | DDBJ |
1st run | 4 min | 46 min | 103 min | 42 h 43 min | 78 h 8 min |
2nd run | 4 min | 48 min | 81 min | 40 h 12 min | 80 h 30 min |
Average | 4 min | 47 min | 92 min | 41 h 28 min | 79 h 19 min |
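The averages in the table can be reproduced from the two runs. The sketch below is only a check of the arithmetic, with hypothetical helper names; it works in whole minutes and rounds half-minutes up, which is an assumption about how the averages were rounded.

```python
import math


def to_minutes(hours: int, minutes: int) -> int:
    """Convert an 'H h M min' duration to minutes."""
    return hours * 60 + minutes


def average_minutes(first: int, second: int) -> int:
    """Mean of two runs in minutes, rounding half-minutes up."""
    return math.floor((first + second) / 2 + 0.5)


# (1st run, 2nd run) in minutes, taken from the table above
RUNS = {
    "Cell Cycle Ontology": (4, 4),
    "Allie": (46, 48),
    "PDBj": (103, 81),
    "UniProt": (to_minutes(42, 43), to_minutes(40, 12)),
    "DDBJ": (to_minutes(78, 8), to_minutes(80, 30)),
}

if __name__ == "__main__":
    for name, (first, second) in RUNS.items():
        hours, minutes = divmod(average_minutes(first, second), 60)
        print(f"{name}: {hours} h {minutes} min")
```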
SPARQL query performance
Cell cycle query
Query\time(ms) | time 1 | time 2 | time 3 | time 4 | time 5 |
case1 | 23 | 23 | 24 | 24 | 24 |
case2 | 2 | 2 | 2 | 2 | 2 |
case3 | 22368 | 23440 | 22961 | 23655 | 23655 |
case4 | 2 | 3 | 10 | 4 | 4 |
case5 | 43265 | 42911 | 42172 | 42459 | 42459 |
case6 | 13062 | 13057 | 13069 | 13102 | 13102 |
case7 | 3 | 3 | 3 | 12 | 12 |
case8 | 7683 | 7479 | 7455 | 7656 | 7656 |
case9 | 40 | 38 | 36 | 51 | 51 |
case10 | 1 | 8 | 1 | 3 | 3 |
case11 | 120 | 119 | 118 | 123 | 123 |
case12 | 521 | 18 | 17 | 20 | 20 |
case13 | 24 | 4 | 2 | 7 | 7 |
case14 | 1 | 1 | 1 | 1 | 1 |
case15 | 55065 | 57530 | 56760 | 56203 | 56203 |
case16 | 36 | 34 | 46 | 65 | 65 |
case17 | 14 | 23 | 18 | 13 | 13 |
case18 | 23 | 17 | 16 | 16 | 16 |
case19 | 16980 | 17064 | 16643 | 16631 | 16631 |
Allie query
Time cost for the five use cases (see http://kiban.dbcls.jp/togordf/wiki/survey#data):
Query\time(ms) | time 1 | time 2 | time 3 | time 4 | time 5 |
case1 | 269 | 27 | 21 | 22 | 21 |
case2 | 1350 | 1273 | 1729 | 1300 | 1381 |
case3 | 395 | 145 | 138 | 155 | 172 |
case4 | 171 | 81 | 71 | 101 | 127 |
case5 | 26934 | 28107 | 27204 | 28276 | 26781 |
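Several cases are noticeably slower on the first run than on later ones, so it can help to read the first measurement separately from the rest. The sketch below does this for the Allie table above; the helper name is hypothetical, and treating the first run as a cold-cache run is an assumption that this page does not state.

```python
from statistics import median

# Times in ms, copied from the Allie table above
ALLIE_MS = {
    "case1": [269, 27, 21, 22, 21],
    "case2": [1350, 1273, 1729, 1300, 1381],
    "case3": [395, 145, 138, 155, 172],
    "case4": [171, 81, 71, 101, 127],
    "case5": [26934, 28107, 27204, 28276, 26781],
}


def summarize(runs: list) -> dict:
    """Separate the first measurement from the median of the later ones."""
    return {"first": runs[0], "median_later": median(runs[1:])}


if __name__ == "__main__":
    for case, runs in ALLIE_MS.items():
        s = summarize(runs)
        print(f"{case}: first {s['first']} ms, later median {s['median_later']} ms")
```

The same helper can be applied to the other query tables on this page by swapping in their rows.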
PDBj query
Query\time(ms) | time 1 | time 2 | time 3 | time 4 | time 5 |
case1 | 184 | 156 | 150 | 152 | 131 |
case2 | 2 | 1 | 2 | 1 | 2 |
case3 | 6 | 3 | 2 | 1 | 1 |
case4 | 114 | 157 | 121 | 164 | 161 |
UniProt query
Query\time(ms) | time 1 | time 2 | time 3 | time 4 | time 5 |
case1 | 49 | 42 | 56 | 157 | 58 |
case2 | 105 | 127 | 94 | 90 | 90 |
case3 | 114 | 108 | 116 | 123 | 116 |
case4 | 2 | 2 | 2 | 2 | 2 |
case5 | 8 | 1 | 66 | 16 | 1 |
case6 | 2252 | 2217 | 2177 | 2237 | 2192 |
case7 | 58027 | 13139 | 42780 | 42017 | 41729 |
case8 | 421 | 402 | 410 | 417 | 487 |
case9 | 589 | 597 | 614 | 644 | 619 |
case10 | 702 | 5862 | 622 | 642 | 643 |
case11 | 70 | 43 | 50 | 57 | 61 |
case12 | 6 | 2 | 21 | 3 | 3 |
case13 | 278 | 317 | 285 | 288 | 276 |
case14 | 268 | 270 | 274 | 271 | 264 |
case15 | 10635 | 10453 | 10785 | 10650 | 10684 |
case16 | 9075 | 9008 | 9049 | 9074 | 9260 |
case17 | 78 | 2 | 1 | 1 | 5 |
case18 | 180 | 70 | 45 | 98 | 89 |
DDBJ query
Query\time(ms) | time 1 | time 2 | time 3 | time 4 | time 5 |
case1 | 248 | 238 | 245 | 208 | 213 |
case2 | 270 | 225 | 212 | 214 | 222 |
case3 | 8672 | 401 | 431 | 430 | 411 |
case4 | 61 | 59 | 57 | 55 | 54 |
case5 | 24 | 11 | 6 | 6 | 5 |
case6 | 110 | 95 | 92 | 94 | 149 |
case7 | 14 | 12 | 3 | 2 | 4 |
case8 | 3 | 3 | 6 | 3 | 5 |
case9 | 13 | 4 | 23 | 4 | 6 |
case10 | 0 | 1 | 1 | 1 | 1 |
Attachments
- virtuoso.ini.2 (6.4 KB) - added by wu 12 years ago.