Version 29 (modified by wu, 13 years ago) |
---|
Virtuoso configuration
* NumberOfBuffers: the amount of RAM Virtuoso uses to cache database files. This setting has a critical impact on performance, so the value should be fairly high for large databases. Setting it beyond physical memory has a significant negative impact. For a database-only server, about 65% of available RAM can be configured for database buffers. Each buffer caches one 8K page of data and occupies approximately 8700 bytes of memory.
* MaxCheckpointRemap: to avoid out-of-memory errors, make sure that NumberOfBuffers and MaxCheckpointRemap are not set to the same value.
* AsyncQueueMaxThreads: the size of a pool of extra threads that can be used for query parallelization. Set this to either 1.5 * the number of cores or 1.5 * the number of core threads (hardware threads); test which works better.
* ThreadsPerQuery: the maximum number of threads a single query will take. Set this to either the number of cores or the number of core threads; test which works better.
* IndexTreeMaps: the number of mutexes over which control for buffering an index tree is split. This can generally be left at the default (256 in normal operation; valid settings are powers of 2 from 2 to 1024), but setting it to 64, 128, or 512 may be beneficial. A low number leads to frequent contention; values of 64 and above see little contention.
* ThreadCleanupInterval & ResourcesCleanupInterval: set both to 1 to reduce memory leaks.
NumberOfBuffers = 6500000
MaxDirtyBuffers = 5000000
MaxCheckpointRemap = 1000000
AsyncQueueMaxThreads = 18
ThreadsPerQuery = 18
IndexTreeMaps = 512
ThreadCleanupInterval = 1
ResourcesCleanupInterval = 1
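The sizing rules in the parameter descriptions reduce to simple arithmetic. The following is a minimal sketch, not part of the original test setup: the function names are hypothetical, the 64 GiB machine is only an illustration, and rounding the thread count up is an assumption. The 65% share, the 8700-byte buffer cost, and the 1.5 factor are taken from the descriptions above.

```python
BYTES_PER_BUFFER = 8700   # approximate memory per 8K buffer, as quoted above
BUFFER_RAM_PERCENT = 65   # suggested share of RAM on a database-only server


def number_of_buffers(ram_bytes: int, percent: int = BUFFER_RAM_PERCENT) -> int:
    """Number of buffers that fit into the given percentage of RAM."""
    return ram_bytes * percent // 100 // BYTES_PER_BUFFER


def async_queue_max_threads(cores: int) -> int:
    """1.5 * cores, rounded up (the rounding direction is an assumption)."""
    return (3 * cores + 1) // 2


if __name__ == "__main__":
    ram = 64 * 1024**3  # hypothetical 64 GiB machine, for illustration only
    print("NumberOfBuffers      =", number_of_buffers(ram))
    print("AsyncQueueMaxThreads =", async_queue_max_threads(12))
```

For 12 cores the thread helper returns 18, which agrees with the thread settings of 18 in the listing above if the test machine has 12 cores, as the 12 load streams used later on this page suggest.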
Refer to attachment:virtuoso.ini.2 for the full set of parameters used in the test.
For more information, refer to http://docs.openlinksw.com/virtuoso/databaseadmsrv.html and http://www.openlinksw.com/weblog/oerling/?id=1665
As a general rule for the Virtuoso 6.x release, 16 GB of memory is recommended per billion triples.
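As a rough illustration of the 16 GB per billion triples rule, the sketch below scales it linearly to the two data sets loaded on this page. Linear scaling below one billion triples is an assumption, and the function name is hypothetical.

```python
GB_PER_BILLION_TRIPLES = 16  # rule of thumb quoted above for Virtuoso 6.x


def estimated_ram_gb(triples: int) -> float:
    """Memory suggested by the rule, assuming it scales linearly."""
    return triples * GB_PER_BILLION_TRIPLES / 1_000_000_000


if __name__ == "__main__":
    # Triple counts as reported in the load sections of this page.
    for name, triples in [("Allie", 94_420_989), ("PDBJ", 589_987_335)]:
        print(f"{name}: {estimated_ram_gb(triples):.2f} GB")
```

Under that assumption the rule suggests roughly 1.5 GB for the Allie data and roughly 9.4 GB for the PDBJ data.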
Load Performance
Allie upload
Data: 94,420,989 triples, N3 format.
* Approach 1:
Load the big file in one stream.
Result: 2 hours.
Step:
$ nohup $VIRTUOSO_HOME/bin/isql 1111 dba dba <$VIRTUOSO_HOME/scripts/load.list.isql &
load.list.isql script:
log_enable (2);
DB.DBA.TTLP (file_to_string_output('file path'), ' ', 'http://mydbcls.jp', 0);
checkpoint;
* Approach 2:
Use one stream per core (not per core thread). Split the big file into 12 smaller files with linesPerFile = totalLines / 12. Because the division leaves a remainder, this actually produces 13 (12 + 1) files.
Result: 46 min 22 s.
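The split can be sketched as follows. This is a minimal sketch under stated assumptions, not the tool used in the test: it assumes a line-based serialization with one triple per line (N-Triples, as the `*.rdf.nt` pattern in step 1 suggests), and that the line count equals the triple count. N3 with prefix declarations or multi-line statements cannot be split safely this way. The function names and part file names are hypothetical.

```python
import itertools
from pathlib import Path
from typing import List, Tuple


def split_plan(total_lines: int, streams: int) -> Tuple[int, int]:
    """Return (lines_per_file, number_of_files) for a split by line count."""
    lines_per_file = total_lines // streams
    if lines_per_file == 0:
        raise ValueError("fewer lines than streams")
    number_of_files = -(-total_lines // lines_per_file)  # ceiling division
    return lines_per_file, number_of_files


def split_by_lines(src: Path, out_dir: Path, lines_per_file: int) -> List[Path]:
    """Write consecutive chunks of src into out_dir, streaming line by line."""
    out_dir.mkdir(parents=True, exist_ok=True)
    parts: List[Path] = []
    with src.open("rb") as source:
        for index in itertools.count():
            chunk = itertools.islice(source, lines_per_file)
            first_line = next(chunk, None)
            if first_line is None:  # input exhausted
                break
            part = out_dir / f"part{index:02d}.nt"  # hypothetical naming scheme
            with part.open("wb") as target:
                target.write(first_line)
                target.writelines(chunk)
            parts.append(part)
    return parts
```

With 94,420,989 lines and 12 streams, `split_plan` gives 7,868,415 lines per file and 13 files, the last one holding the 9 leftover lines.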
Step 1. Register the files in the load list with ld_dir:
$ nohup $VIRTUOSO_HOME/bin/isql 1111 dba dba <$VIRTUOSO_HOME/scripts/load.list.isql &
load.list.isql script:
delete from load_list;
ld_dir('data directory', '*.rdf.nt', 'http://allie.dbcls.jp');
select * from load_list;
Step 2. Load the files into Virtuoso:
$ nohup $VIRTUOSO_HOME/bin/isql 1111 dba dba <$VIRTUOSO_HOME/scripts/load.data.isql &
load.data.isql script:
-- Record CPU time
select getrusage ()[0] + getrusage ()[1];
rdf_loader_run () &
-- ... (the same line repeated 10 more times)
rdf_loader_run () &
checkpoint;
-- Record CPU time
select getrusage ()[0] + getrusage ()[1];
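Since the script is the same loader line repeated once per stream, it can be generated instead of written by hand. The sketch below is only an illustration; the function name is hypothetical, and the statements it emits are exactly those shown in the script above.

```python
CPU_TIME_QUERY = "select getrusage ()[0] + getrusage ()[1];"


def build_load_script(streams: int) -> str:
    """Build the isql script text with one background loader per stream."""
    lines = ["-- Record CPU time", CPU_TIME_QUERY]
    lines += ["rdf_loader_run () &"] * streams  # one loader per stream
    lines += ["checkpoint;", "-- Record CPU time", CPU_TIME_QUERY]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 12 streams, one per core, as in Approach 2.
    print(build_load_script(12), end="")
```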
PDBJ upload
Data: 77,878 .rdf.gz files, 589,987,335 triples, from ftp://ftp.pdbj.org/XML/rdf/.
Upload with 12 streams.
Result: 103 min 31 s.
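To compare the load results, the reported times can be converted into throughput. This is a minimal sketch using only the triple counts and times reported on this page; the function name is hypothetical, and the 2-hour figure for the single-stream Allie load is taken as exactly 7200 s, which is an assumption.

```python
def triples_per_second(triples: int, hours: int = 0, minutes: int = 0,
                       seconds: int = 0) -> float:
    """Average load rate over the whole run."""
    elapsed = hours * 3600 + minutes * 60 + seconds
    return triples / elapsed


# (triples, hours, minutes, seconds) as reported in the load sections.
RUNS = {
    "Allie, 1 stream": (94_420_989, 2, 0, 0),
    "Allie, 12 streams": (94_420_989, 0, 46, 22),
    "PDBJ, 12 streams": (589_987_335, 0, 103, 31),
}

if __name__ == "__main__":
    for name, (triples, h, m, s) in RUNS.items():
        print(f"{name}: {triples_per_second(triples, h, m, s):,.0f} triples/s")
```

This works out to roughly 13,000 triples/s for the single-stream Allie load, 34,000 triples/s for the 12-stream Allie load (about 2.6 times faster), and 95,000 triples/s for the PDBJ load.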
Uniprot upload
DDBJ upload
SPARQL query performance
Cell cycle query
Query (time in ms) | run 1 | run 2 | run 3 | run 4 | run 5 |
---|---|---|---|---|---|
case1 | 23 | 23 | 24 | 24 | 24 |
case2 | 2 | 2 | 2 | 2 | 2 |
case3 | 22368 | 23440 | 22961 | 23655 | 23655 |
case4 | 2 | 3 | 10 | 4 | 4 |
case5 | 43265 | 42911 | 42172 | 42459 | 42459 |
case6 | 13062 | 13057 | 13069 | 13102 | 13102 |
case7 | 3 | 3 | 3 | 12 | 12 |
case8 | 7683 | 7479 | 7455 | 7656 | 7656 |
case9 | 40 | 38 | 36 | 51 | 51 |
case10 | 1 | 8 | 1 | 3 | 3 |
case11 | 120 | 119 | 118 | 123 | 123 |
case12 | 521 | 18 | 17 | 20 | 20 |
case13 | 24 | 4 | 2 | 7 | 7 |
case14 | 1 | 1 | 1 | 1 | 1 |
case15 | 55065 | 57530 | 56760 | 56203 | 56203 |
case16 | 36 | 34 | 46 | 65 | 65 |
case17 | 14 | 23 | 18 | 13 | 13 |
case18 | 23 | 17 | 16 | 16 | 16 |
case19 | 16980 | 17064 | 16643 | 16631 | 16631 |
Allie query
Execution times for the five use cases (see http://kiban.dbcls.jp/togordf/wiki/survey#data):
Query (time in ms) | run 1 | run 2 | run 3 | run 4 | run 5 |
---|---|---|---|---|---|
case1 | 269 | 27 | 21 | 22 | 21 |
case2 | 1350 | 1273 | 1729 | 1300 | 1381 |
case3 | 395 | 145 | 138 | 155 | 172 |
case4 | 171 | 81 | 71 | 101 | 127 |
case5 | 26934 | 28107 | 27204 | 28276 | 26781 |
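Several queries are noticeably slower on the first run than on later ones (for example case1), which may reflect a cold cache, although the page does not say so. A summary that reports the first run separately from the median makes this easier to see. The sketch below uses the Allie timings from the table above; the function name and the choice of statistics are assumptions.

```python
from statistics import median

# Allie query times in ms, copied from the table above.
ALLIE_TIMES_MS = {
    "case1": [269, 27, 21, 22, 21],
    "case2": [1350, 1273, 1729, 1300, 1381],
    "case3": [395, 145, 138, 155, 172],
    "case4": [171, 81, 71, 101, 127],
    "case5": [26934, 28107, 27204, 28276, 26781],
}


def summarize(times):
    """First-run time, median over all runs, and fastest run."""
    return {"first": times[0], "median": median(times), "min": min(times)}


if __name__ == "__main__":
    for case, times in ALLIE_TIMES_MS.items():
        print(case, summarize(times))
```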
PDBJ query
Query (time in ms) | run 1 | run 2 | run 3 | run 4 | run 5 |
---|---|---|---|---|---|
case1 | 184 | 156 | 150 | 152 | 131 |
case2 | 2 | 1 | 2 | 1 | 2 |
case3 | 6 | 3 | 2 | 1 | 1 |
case4 | 114 | 157 | 121 | 164 | 161 |
Uniprot query
DDBJ query
Attachments
- virtuoso.ini.2 (6.4 KB) - added by wu 12 years ago.