Version 27 (edited by wu, 12 years ago)

--

* Virtuoso configuration

* Load performance

* SPARQL query performance

Virtuoso configuration

* NumberOfBuffers: the amount of RAM Virtuoso uses to cache database pages. It has a critical effect on performance, so it should be set fairly high for large databases. A value that exceeds physical memory, however, severely degrades performance. On a server dedicated to the database, about 65% of available RAM can be given to database buffers. Each buffer caches one 8 KB page of data and occupies roughly 8700 bytes of memory.

* MaxCheckpointRemap: to avoid out-of-memory errors, make sure NumberOfBuffers and MaxCheckpointRemap are not set to the same value.

* AsyncQueueMaxThreads: the size of a pool of extra threads that can be used to parallelize queries. Set it to either 1.5 times the number of CPU cores or 1.5 times the number of hardware threads, and benchmark to see which works better.

* ThreadsPerQuery: the maximum number of threads a single query can use. Set it to either the number of CPU cores or the number of hardware threads, and benchmark to see which works better.

* IndexTreeMaps: the number of mutexes across which control of buffering for an index tree is split. The default is usually fine (256 in normal operation; valid values are powers of 2 from 2 to 1024), but 64, 128 or 512 may be beneficial. A low value leads to frequent contention; at 64 and above there is little contention.

* ThreadCleanupInterval and ResourcesCleanupInterval: set both to 1 to reduce memory leakage.
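The sizing rules in the list above can be turned into a small calculator. This is a sketch only. The 65% share, the ~8700 bytes per buffer and the 1.5 factor come from the list. Setting MaxDirtyBuffers to about 3/4 of NumberOfBuffers is an assumption: it is a common rule of thumb, and the test settings use roughly that ratio. The function name suggest_params and the example machine are hypothetical.

```python
BYTES_PER_BUFFER = 8700  # one 8 KB page plus overhead, per the notes above

def suggest_params(ram_bytes, cores, hw_threads, db_share=0.65):
    """Hypothetical helper: starting values for the [Parameters] settings above."""
    buffers = int(ram_bytes * db_share) // BYTES_PER_BUFFER
    return {
        "NumberOfBuffers": buffers,
        # Assumption: keep MaxDirtyBuffers at about 3/4 of NumberOfBuffers.
        "MaxDirtyBuffers": buffers * 3 // 4,
        # Two candidates each (cores vs. hardware threads); benchmark both.
        "AsyncQueueMaxThreads": (int(1.5 * cores), int(1.5 * hw_threads)),
        "ThreadsPerQuery": (cores, hw_threads),
    }

# Example: a 64 GB machine with 12 cores / 24 hardware threads (illustrative values).
for name, value in suggest_params(64 * 1024**3, cores=12, hw_threads=24).items():
    print(name, "=", value)
```

The thread settings are returned as pairs on purpose, because the notes leave the choice between cores and hardware threads to benchmarking.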

Settings used in the test:

[Parameters]
NumberOfBuffers          = 6500000
MaxDirtyBuffers          = 5000000
MaxCheckpointRemap       = 1000000
AsyncQueueMaxThreads     = 18
ThreadsPerQuery          = 18
IndexTreeMaps            = 512
ThreadCleanupInterval    = 1
ResourcesCleanupInterval = 1
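Because these settings are plain INI key/value pairs, a misspelled name is easy to overlook, for example AsyncQueryMaxThreads instead of AsyncQueueMaxThreads. The sketch below uses Python's standard configparser and difflib modules. It flags near-miss spellings of the parameters discussed on this page. It also flags the case where NumberOfBuffers equals MaxCheckpointRemap. The function name check_parameters and the 0.85 similarity cutoff are assumptions for illustration. The sketch does not know the full list of valid Virtuoso parameters.

```python
import configparser
import difflib

# Parameter names discussed on this page ([Parameters] section of virtuoso.ini).
KNOWN = [
    "NumberOfBuffers", "MaxDirtyBuffers", "MaxCheckpointRemap",
    "AsyncQueueMaxThreads", "ThreadsPerQuery", "IndexTreeMaps",
    "ThreadCleanupInterval", "ResourcesCleanupInterval",
]

def check_parameters(ini_text):
    """Return a list of warnings about the [Parameters] settings covered above."""
    cp = configparser.ConfigParser(interpolation=None, strict=False,
                                   allow_no_value=True)
    cp.optionxform = str  # keep Virtuoso's CamelCase names instead of lowercasing
    cp.read_string(ini_text)
    if not cp.has_section("Parameters"):
        return ["no [Parameters] section found"]
    params = cp["Parameters"]
    warnings = []
    # A key that is very close to, but not exactly, a known name is likely a typo.
    for key in params:
        if key not in KNOWN:
            close = difflib.get_close_matches(key, KNOWN, n=1, cutoff=0.85)
            if close:
                warnings.append(f"{key}: did you mean {close[0]}?")

    def as_int(name):
        value = params.get(name)
        return int(value) if value else None

    buffers = as_int("NumberOfBuffers")
    dirty = as_int("MaxDirtyBuffers")
    remap = as_int("MaxCheckpointRemap")
    if buffers is not None and buffers == remap:
        warnings.append("MaxCheckpointRemap equals NumberOfBuffers (risk of out-of-memory)")
    if buffers is not None and dirty is not None and dirty > buffers:
        warnings.append("MaxDirtyBuffers is larger than NumberOfBuffers")
    return warnings
```

Usage: `print(check_parameters(open("virtuoso.ini").read()))`. An empty list means none of these particular problems was found.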

See the attachment virtuoso.ini.2 for the complete configuration used in the test.

For more information, see http://docs.openlinksw.com/virtuoso/databaseadmsrv.html and http://www.openlinksw.com/weblog/oerling/?id=1665.

As a general guideline for the Virtuoso 6.x release, about 16 GB of memory is recommended per billion triples.
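Applying that rule of thumb to the two datasets loaded on this page gives a rough memory estimate. This is plain arithmetic on the 16 GB figure, not a measurement, and the function name is illustrative.

```python
GB_PER_BILLION_TRIPLES = 16  # rule of thumb for Virtuoso 6.x quoted above

def estimated_memory_gb(triples):
    """Rough RAM estimate in GB for a store holding `triples` triples."""
    return triples / 1_000_000_000 * GB_PER_BILLION_TRIPLES

# Dataset sizes from the load tests on this page.
for name, triples in [("Allie", 94_420_989), ("PDBJ", 589_987_335)]:
    print(f"{name}: {triples:,} triples -> about {estimated_memory_gb(triples):.1f} GB")
```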

Load Performance

Allie upload

Data: 94,420,989 triples, N3 format.

* Approach 1:

Load the big file in one stream.

Result: 2 hours.

Step:

$ nohup $VIRTUOSO_HOME/bin/isql 1111 dba dba <$VIRTUOSO_HOME/scripts/load.list.isql &

load.list.isql script:

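-- turn off transaction logging and autocommit row by row during the bulk load
log_enable (2);
-- parse the whole file into the graph http://mydbcls.jp (empty base IRI)
DB.DBA.TTLP (file_to_string_output ('file path'), '', 'http://mydbcls.jp', 0);
-- write the loaded data to disk (nothing went to the transaction log)
checkpoint;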

* Approach 2:

Use one loader stream per CPU core (not per hardware thread). Split the big file into 12 smaller files with #linesPerFile = #totalLines / 12. Because of the remainder, this actually produces 13 (12 + 1) files.
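This is a minimal sketch of the splitting step in Python. It assumes the input is N-Triples, with one complete triple per line; general N3/Turtle cannot be split safely by line. Output names end in .rdf.nt so that they match the '*.rdf.nt' mask passed to ld_dir in Step 1. The function name and file names are hypothetical.

```python
import os

def split_ntriples(src_path, out_dir, n_parts=12, suffix=".rdf.nt"):
    """Split a line-based N-Triples file into chunks of total_lines // n_parts lines.

    For large inputs, the total_lines % n_parts leftover lines end up in one
    extra file, giving n_parts + 1 files as described above.
    """
    # First pass: count lines (one triple per line in N-Triples).
    with open(src_path, "rb") as f:
        total = sum(1 for _ in f)
    per_file = max(1, total // n_parts)
    os.makedirs(out_dir, exist_ok=True)

    paths, out = [], None
    # Second pass: start a new output file every per_file lines.
    with open(src_path, "rb") as f:
        for i, line in enumerate(f):
            if i % per_file == 0:
                if out is not None:
                    out.close()
                path = os.path.join(out_dir, f"part{len(paths):02d}{suffix}")
                paths.append(path)
                out = open(path, "wb")
            out.write(line)
    if out is not None:
        out.close()
    return paths

# Example (hypothetical paths):
# split_ntriples("/data/allie/allie.nt", "/data/allie/split", n_parts=12)
```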

Result: 46 min 22 s.

Step 1. Register the files in the bulk-load list with ld_dir:

$ nohup $VIRTUOSO_HOME/bin/isql 1111 dba dba <$VIRTUOSO_HOME/scripts/load.list.isql &

load.list.isql script:

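-- clear entries left over from earlier loads
delete from DB.DBA.load_list;
-- register every *.rdf.nt file in the directory for the Allie graph
ld_dir ('data directory', '*.rdf.nt', 'http://allie.dbcls.jp');
-- list the registered files (ll_state 0 = not yet loaded)
select * from DB.DBA.load_list;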

Step 2. Load the registered files into Virtuoso:

$ nohup $VIRTUOSO_HOME/bin/isql 1111 dba dba <$VIRTUOSO_HOME/scripts/load.data.isql &

load.data.isql script:

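-- record CPU time before loading
select getrusage ()[0] + getrusage ()[1];

-- start 12 loaders in the background (one per core)
rdf_loader_run () &
-- ... (10 more identical rdf_loader_run () & lines omitted)
rdf_loader_run () &
-- wait for all background loaders to finish before the checkpoint
wait_for_children;
checkpoint;

-- record CPU time after loading
select getrusage ()[0] + getrusage ()[1];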
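load.data.isql is mostly the same line repeated once per stream, so it can be generated. This is a minimal Python sketch. It assumes one rdf_loader_run () per stream, as above. It also assumes the isql wait_for_children; command is used to wait for the background loaders. The function name loader_script is hypothetical.

```python
def loader_script(streams=12):
    """Return the text of a load.data.isql-style script that starts `streams` loaders."""
    lines = [
        "-- record CPU time before loading",
        "select getrusage ()[0] + getrusage ()[1];",
        "",
    ]
    # One background bulk loader per stream (a trailing '&' runs it in the background in isql).
    lines += ["rdf_loader_run () &"] * streams
    lines += [
        "-- wait for the background loaders, then persist the data",
        "wait_for_children;",
        "checkpoint;",
        "",
        "-- record CPU time after loading",
        "select getrusage ()[0] + getrusage ()[1];",
    ]
    return "\n".join(lines) + "\n"

# Write the script used in Step 2 (12 streams, one per core).
with open("load.data.isql", "w") as f:
    f.write(loader_script(12))
```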

PDBJ upload

Data: 77,878 .rdf.gz files, 589,987,335 triples in total, from ftp://ftp.pdbj.org/XML/rdf/.

Loaded with 12 parallel streams.

Result: 103 min 31 s.
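The load results can be compared as average throughput. The calculation below uses only the figures reported above. Approach 1's "2 hours" is rounded, so its rate is approximate. The labels are illustrative.

```python
def triples_per_second(triples, minutes, seconds=0):
    """Average load rate for a run that took `minutes` min `seconds` s."""
    return triples / (minutes * 60 + seconds)

runs = [
    ("Allie, approach 1 (one stream, ~2 h)", 94_420_989, 120, 0),
    ("Allie, approach 2 (12 streams)", 94_420_989, 46, 22),
    ("PDBJ (12 streams)", 589_987_335, 103, 31),
]
for label, triples, m, s in runs:
    print(f"{label}: {triples_per_second(triples, m, s):,.0f} triples/s")
```

By this measure, splitting the Allie load into 12 streams gave roughly 2.6 times the single-stream rate.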

UniProt upload

DDBJ upload

SPARQL query performance

Allie query performance

Query times for the five use cases (the queries are described at http://kiban.dbcls.jp/togordf/wiki/survey#data):

Query \ time (ms)   run 1   run 2   run 3   run 4   run 5
case1                 269      27      21      22      21
case2                1350    1273    1729    1300    1381
case3                 395     145     138     155     172
case4                 171      81      71     101     127
case5               26934   28107   27204   28276   26781
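The page does not say how these times were taken. The sketch below is a hypothetical harness for repeating the measurement. It sends a query over HTTP GET as defined by the SPARQL 1.1 Protocol and records the wall-clock time of each of five runs. The endpoint URL (Virtuoso's default HTTP port 8890 is assumed) and the function names are illustrative. summarize reports the first run separately from the median of the rest. This matters because for several queries (for example case1) the first run is much slower than the following ones.

```python
import statistics
import time
import urllib.parse
import urllib.request

def sparql_get(endpoint, query):
    """Send one SPARQL query via HTTP GET and return the raw response body."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    req = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

def time_runs(run_query, n=5):
    """Call run_query() n times and return each call's wall-clock time in ms."""
    times = []
    for _ in range(n):
        start = time.perf_counter()
        run_query()
        times.append((time.perf_counter() - start) * 1000)
    return times

def summarize(times_ms):
    """Return (first run, median of the remaining runs), both in ms."""
    return times_ms[0], statistics.median(times_ms[1:])

# Usage against a local endpoint (URL and query are placeholders):
# times = time_runs(lambda: sparql_get("http://localhost:8890/sparql",
#                                      "SELECT * WHERE { ?s ?p ?o } LIMIT 10"))
# print(summarize(times))
```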

PDBJ query performance

Query \ time (ms)   run 1   run 2   run 3   run 4   run 5
case1                 184     156     150     152     131
case2                   2       1       2       1       2
case3                   6       3       2       1       1
case4                 114     157     121     164     161

UniProt query performance

DDBJ query performance
