Version 26 (modified by wu, 13 years ago)
Virtuoso configuration
* NumberOfBuffers: the amount of RAM that Virtuoso uses to cache database files. This setting has a critical impact on performance, so the value should be fairly high for large databases. Setting it above the available physical memory will significantly degrade performance. For a database-only server, about 65% of available RAM can be allocated to database buffers. Each buffer caches one 8K page of data and occupies approximately 8700 bytes of memory.
* MaxCheckpointRemap: to avoid out-of-memory errors, make sure that NumberOfBuffers and MaxCheckpointRemap are not set to the same value.
* AsyncQueueMaxThreads: the size of a pool of extra threads that can be used for query parallelization. Set this to either 1.5 * the number of cores or 1.5 * the number of core threads; test which works better.
* ThreadsPerQuery: the maximum number of threads a single query will take. Set this to either the number of cores or the number of core threads; test which works better.
* IndexTreeMaps: the number of mutexes over which control for buffering an index tree is split. This can generally be left at the default (256 in normal operation; valid settings are powers of 2 from 2 to 1024), but setting it to 64, 128, or 512 may be beneficial. A low number leads to frequent contention; values of 64 and above cause little contention.
* ThreadCleanupInterval and ResourcesCleanupInterval: set both to 1 to reduce memory leakage.
NumberOfBuffers = 6500000
MaxDirtyBuffers = 5000000
MaxCheckpointRemap = 1000000
AsyncQueueMaxThreads = 18
ThreadsPerQuery = 18
IndexTreeMaps = 512
ThreadCleanupInterval = 1
ResourcesCleanupInterval = 1
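The sizing rules above can be turned into a small calculation. The following is a minimal sketch, not the method used to pick the values in this test: it assumes the 65% rule and the approximate 8700 bytes per buffer quoted above, and the 64 GiB of RAM and 12 cores in the example are hypothetical inputs chosen only for illustration.

```python
BYTES_PER_BUFFER = 8700   # approximate memory per 8K buffer, as quoted above
RAM_FRACTION = 0.65       # share of RAM suggested for a database-only server


def suggest_number_of_buffers(ram_bytes, fraction=RAM_FRACTION):
    """Return a NumberOfBuffers value that uses about `fraction` of RAM."""
    return int(ram_bytes * fraction) // BYTES_PER_BUFFER


def buffers_memory_bytes(number_of_buffers):
    """Return the approximate memory taken by that many buffers."""
    return number_of_buffers * BYTES_PER_BUFFER


def suggest_async_queue_max_threads(cores_or_threads):
    """Apply the 1.5x rule; pass either the core count or the core thread count."""
    return int(1.5 * cores_or_threads)


if __name__ == "__main__":
    ram = 64 * 1024 ** 3  # hypothetical machine with 64 GiB of RAM
    cores = 12            # hypothetical core count
    print("NumberOfBuffers      =", suggest_number_of_buffers(ram))
    print("AsyncQueueMaxThreads =", suggest_async_queue_max_threads(cores))
    print("ThreadsPerQuery      =", cores)
    # Memory implied by the NumberOfBuffers value used in this test.
    print("6500000 buffers take about",
          round(buffers_memory_bytes(6_500_000) / 1e9, 2), "GB")
```

Whichever value is chosen, it should stay below physical memory, and both thread settings are starting points to be checked against measured performance.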
See the attachment virtuoso.ini.2 for the full set of parameters used in the test.
For more information, see http://docs.openlinksw.com/virtuoso/databaseadmsrv.html and http://www.openlinksw.com/weblog/oerling/?id=1665
As a general guideline for the Virtuoso 6.x release, about 16 GB of memory is required per billion triples.
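As a rough illustration of the 16 GB per billion triples guideline, the sketch below applies it to the two dataset sizes reported later on this page. It assumes that the requirement scales linearly with the triple count, which is a simplification; the function name is made up for this example.

```python
GB_PER_BILLION_TRIPLES = 16  # guideline quoted above for Virtuoso 6.x


def estimated_memory_gb(triples):
    """Scale the per-billion guideline linearly to a given triple count."""
    return triples / 1_000_000_000 * GB_PER_BILLION_TRIPLES


if __name__ == "__main__":
    datasets = {
        "Allie": 94_420_989,   # triple count from the Allie upload section
        "PDBJ": 589_987_335,   # triple count from the PDBJ upload section
    }
    for name, triples in datasets.items():
        print(f"{name}: about {estimated_memory_gb(triples):.1f} GB")
```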
Load Performance
Allie upload
Data: 94,420,989 triples, N3 format.
* Approach 1:
Load the big file in one stream.
Result:
2 hours.
Step:
$ nohup $VIRTUOSO_HOME/bin/isql 1111 dba dba <$VIRTUOSO_HOME/scripts/load.list.isql &
load.list.isql script:
log_enable (2);
DB.DBA.TTLP (file_to_string_output('file path'),' ','http://mydbcls.jp', 0);
checkpoint;
* Approach 2:
Use one stream per core (not per core thread). Split the big file into 12 small files (more precisely 13 files, 12 + 1, because the number of lines per file is the total number of lines divided by 12 and the remainder goes into an extra file).
Result: 46 min 22 s.
Step 1. Register the files in the load list with ld_dir:
$ nohup $VIRTUOSO_HOME/bin/isql 1111 dba dba <$VIRTUOSO_HOME/scripts/load.list.isql &
load.list.isql script:
delete from load_list;
ld_dir('data directory','*.rdf.nt','http://allie.dbcls.jp');
select * from load_list;
Step 2. Load the files into Virtuoso:
$ nohup $VIRTUOSO_HOME/bin/isql 1111 dba dba <$VIRTUOSO_HOME/scripts/load.data.isql &
load.data.isql script:
-- Record CPU time
select getrusage ()[0] + getrusage ()[1];
rdf_loader_run () &
... (omit 10 times)
rdf_loader_run () &
checkpoint;
-- Record CPU time
select getrusage ()[0] + getrusage ()[1];
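The file split used in Approach 2 can be sketched as follows. This is one possible way to do it, not the script that was used for the test. It assumes a line-oriented serialization such as N-Triples, where every line is a complete statement; a file with prefix declarations or multi-line statements cannot be cut at arbitrary line boundaries. The function name and the output naming scheme are hypothetical.

```python
from pathlib import Path
from typing import List


def split_by_lines(src: Path, out_dir: Path, parts: int = 12) -> List[Path]:
    """Split `src` into chunks of total_lines // parts lines each.

    When the line count is not a multiple of `parts`, the leftover lines
    end up in one extra file, which gives the 12 + 1 files noted above.
    """
    with src.open("rb") as f:
        total_lines = sum(1 for _ in f)
    lines_per_file = max(1, total_lines // parts)

    out_dir.mkdir(parents=True, exist_ok=True)
    outputs: List[Path] = []
    out = None
    with src.open("rb") as f:
        for index, line in enumerate(f):
            if index % lines_per_file == 0:
                # Start a new chunk; keep the original name as the suffix so
                # that a pattern such as *.rdf.nt still matches in ld_dir.
                if out is not None:
                    out.close()
                path = out_dir / f"part{len(outputs):02d}_{src.name}"
                out = path.open("wb")
                outputs.append(path)
            out.write(line)
    if out is not None:
        out.close()
    return outputs
```

Calling `split_by_lines(Path("allie.rdf.nt"), Path("data directory"))` (with a hypothetical input file name) would write the chunks into the directory that is then registered with ld_dir.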
PDBJ upload
Data: 77,878 .rdf.gz files, 589,987,335 triples, from ftp://ftp.pdbj.org/XML/rdf/.
Loaded with 12 streams.
Result: 103 min 31 s.
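To compare the load results, it can help to convert them into triples per second. The sketch below does this arithmetic on the triple counts and elapsed times reported above; it is only a derived calculation, and the "2 hours" figure for the single-stream run is taken as exactly 7200 seconds, which is an assumption.

```python
def triples_per_second(triples, hours=0, minutes=0, seconds=0):
    """Return the average load rate for a run of the given duration."""
    elapsed = hours * 3600 + minutes * 60 + seconds
    return triples / elapsed


if __name__ == "__main__":
    runs = [
        ("Allie, 1 stream", 94_420_989, dict(hours=2)),
        ("Allie, 12 streams", 94_420_989, dict(minutes=46, seconds=22)),
        ("PDBJ, 12 streams", 589_987_335, dict(minutes=103, seconds=31)),
    ]
    for label, triples, duration in runs:
        rate = triples_per_second(triples, **duration)
        print(f"{label}: {rate:,.0f} triples/s")

    # Speedup of the 12-stream Allie load over the single-stream load.
    speedup = (2 * 3600) / (46 * 60 + 22)
    print(f"Allie speedup with 12 streams: {speedup:.2f}x")
```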
Uniprot upload
DDBJ upload
SPARQL query performance
Allie query performance
Time taken for the five use cases (see http://kiban.dbcls.jp/togordf/wiki/survey#data):
PDBJ query performance
Query | Time 1 (ms) | Time 2 (ms) | Time 3 (ms) | Time 4 (ms) | Time 5 (ms) |
---|---|---|---|---|---|
case1 | 184 | 156 | 150 | 152 | 131 |
case2 | 2 | 1 | 2 | 1 | 2 |
case3 | 6 | 3 | 2 | 1 | 1 |
case4 | 114 | 157 | 121 | 164 | 161 |
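The five timings per query can be condensed into a minimum, mean, and maximum. The sketch below does so for the values in the table above; the choice of these three statistics is an assumption made for illustration and is not part of the original measurements.

```python
from statistics import mean

# Query times in milliseconds, copied from the table above.
TIMINGS_MS = {
    "case1": [184, 156, 150, 152, 131],
    "case2": [2, 1, 2, 1, 2],
    "case3": [6, 3, 2, 1, 1],
    "case4": [114, 157, 121, 164, 161],
}


def summarize(timings):
    """Return min, mean, and max (in ms) for each query."""
    return {
        query: {"min": min(runs), "mean": mean(runs), "max": max(runs)}
        for query, runs in timings.items()
    }


if __name__ == "__main__":
    for query, stats in summarize(TIMINGS_MS).items():
        print(f"{query}: min={stats['min']} ms, "
              f"mean={stats['mean']:.1f} ms, max={stats['max']} ms")
```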
Uniprot query performance
DDBJ query performance
Attachments
- virtuoso.ini.2 (6.4 KB) - added by wu 12 years ago.