Version 25 (last modified by wu, 12 years ago)

--

* Virtuoso configuration

* Load performance

* SPARQL query performance

Virtuoso configuration

* NumberOfBuffers: the number of buffers, and hence the amount of RAM, that Virtuoso uses to cache database pages. This setting has a critical impact on performance, so it should be fairly high for large databases; however, a value whose buffers exceed physical memory will significantly degrade performance. On a server dedicated to the database, about 65% of the available RAM can be allocated to database buffers. Each buffer caches one 8 KB page of data and occupies approximately 8700 bytes of memory.

* MaxCheckpointRemap: to avoid out-of-memory errors, make sure that NumberOfBuffers and MaxCheckpointRemap are not set to the same value.

* AsyncQueueMaxThreads: the size of the pool of extra threads that can be used to parallelize queries. Set it to either 1.5 × the number of cores or 1.5 × the number of hardware threads; benchmark both and keep whichever performs better.

* ThreadsPerQuery: the maximum number of threads a single query can use. Set it to either the number of cores or the number of hardware threads; benchmark both and keep whichever performs better.

* IndexTreeMaps: the number of mutexes across which control of the buffers for an index tree is split. The default (256 in normal operation; valid values are powers of 2 from 2 to 1024) can generally be kept, but 64, 128, or 512 may be beneficial. A low value leads to frequent contention; at 64 and above there is little contention.

* ThreadCleanupInterval and ResourcesCleanupInterval: set both to 1 to reduce memory leaks.
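The rules of thumb in the list above can be turned into a quick starting-point calculator. This is a minimal sketch, not an OpenLink tool: the function name `suggest_parameters` is hypothetical, it assumes the 65% RAM share and the roughly 8700 bytes per buffer quoted above, and the thread values are only candidates that still need benchmarking.

```python
import os

# Rules of thumb from the list above (assumptions, not official defaults).
BYTES_PER_BUFFER = 8700    # one 8 KB page plus overhead per buffer
BUFFER_RAM_PERCENT = 65    # share of RAM for buffers on a database-only server


def suggest_parameters(ram_bytes: int, cores: int, hw_threads: int) -> dict:
    """Hypothetical helper: candidate [Parameters] values for a Virtuoso server.

    For the thread settings both candidates are returned, because the text
    recommends benchmarking the per-core and per-hardware-thread variants.
    """
    return {
        # Integer arithmetic avoids float rounding in the buffer count.
        "NumberOfBuffers": ram_bytes * BUFFER_RAM_PERCENT // (100 * BYTES_PER_BUFFER),
        "AsyncQueueMaxThreads": (int(1.5 * cores), int(1.5 * hw_threads)),
        "ThreadsPerQuery": (cores, hw_threads),
    }


if __name__ == "__main__":
    # os.cpu_count() reports logical CPUs (hardware threads); the physical core
    # count is assumed here to be half of that (2-way SMT).
    threads = os.cpu_count() or 1
    cores = max(1, threads // 2)
    ram = 64 * 1024**3  # assumption: a server with 64 GiB of RAM
    for name, value in suggest_parameters(ram, cores, threads).items():
        print(f"{name} = {value}")
```

For example, a 12-core, 24-thread machine gives AsyncQueueMaxThreads candidates of 18 and 36.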

Settings used in the test:

NumberOfBuffers          = 6500000
MaxDirtyBuffers          = 5000000
MaxCheckpointRemap       = 1000000
AsyncQueueMaxThreads     = 18
ThreadsPerQuery          = 18
IndexTreeMaps            = 512
ThreadCleanupInterval    = 1
ResourcesCleanupInterval = 1
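The rules from the parameter list can also be checked mechanically against a configuration file. A minimal sketch using Python's standard `configparser`, assuming the settings live in the `[Parameters]` section of virtuoso.ini; `check_parameters` is a hypothetical helper and covers only the rules stated on this page.

```python
import configparser

# The settings above, placed in the [Parameters] section (assumed section name).
EXAMPLE_INI = """
[Parameters]
NumberOfBuffers          = 6500000
MaxDirtyBuffers          = 5000000
MaxCheckpointRemap       = 1000000
AsyncQueueMaxThreads     = 18
ThreadsPerQuery          = 18
IndexTreeMaps            = 512
ThreadCleanupInterval    = 1
ResourcesCleanupInterval = 1
"""

BYTES_PER_BUFFER = 8700  # approximate memory per buffer, from the text above


def check_parameters(ini_text, ram_bytes=None):
    """Hypothetical helper: return a list of rule violations (empty if none)."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep Virtuoso's CamelCase key names
    parser.read_string(ini_text)
    p = parser["Parameters"]
    problems = []

    buffers = p.getint("NumberOfBuffers")
    if buffers == p.getint("MaxCheckpointRemap"):
        problems.append("NumberOfBuffers and MaxCheckpointRemap must differ")
    if ram_bytes is not None and buffers * BYTES_PER_BUFFER > ram_bytes:
        problems.append("NumberOfBuffers exceeds physical memory")

    maps = p.getint("IndexTreeMaps")
    # A power of two has exactly one bit set, so maps & (maps - 1) == 0.
    if not (2 <= maps <= 1024 and maps & (maps - 1) == 0):
        problems.append("IndexTreeMaps must be a power of 2 from 2 to 1024")

    for key in ("ThreadCleanupInterval", "ResourcesCleanupInterval"):
        if p.getint(key) != 1:
            problems.append(f"{key} should be 1")
    return problems


if __name__ == "__main__":
    print(check_parameters(EXAMPLE_INI) or "all checks passed")
```

With the example values, the buffers alone occupy about 6,500,000 × 8700 bytes = 56.55 GB, so passing the server's RAM as `ram_bytes` catches a setting that would exceed physical memory.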

See attachment:virtuoso.ini.2 for the complete parameter settings used in the test.

For more information, see http://docs.openlinksw.com/virtuoso/databaseadmsrv.html and http://www.openlinksw.com/weblog/oerling/?id=1665.

As a general guideline for the Virtuoso 6.x release, about 16 GB of memory is required per billion triples.
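The 16 GB per billion triples guideline is simple proportional arithmetic. A minimal sketch applying it to the triple counts of the Allie and PDBJ loads described below; `estimated_memory_gb` is a hypothetical name, and the result is only as good as the rule of thumb.

```python
GB_PER_BILLION_TRIPLES = 16  # Virtuoso 6.x rule of thumb from the text above


def estimated_memory_gb(triples: int) -> float:
    """Rough RAM estimate in GB for a store holding the given number of triples."""
    return triples / 1e9 * GB_PER_BILLION_TRIPLES


if __name__ == "__main__":
    # Triple counts of the Allie and PDBJ loads described on this page.
    for name, triples in [("Allie", 94_420_989), ("PDBJ", 589_987_335)]:
        print(f"{name}: {triples:,} triples -> about {estimated_memory_gb(triples):.1f} GB")
```

This gives roughly 1.5 GB for Allie and 9.4 GB for PDBJ.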

Load performance

Allie upload

Data: 94,420,989 triples, N3 format.

* Approach 1:

Load the big file in one stream.

Result: 2 hours.

Step:

$ nohup $VIRTUOSO_HOME/bin/isql 1111 dba dba <$VIRTUOSO_HOME/scripts/load.list.isql &

load.list.isql script:

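-- turn off transaction logging and commit row by row (faster bulk load)
log_enable (2);
-- parse the file as Turtle into the graph http://mydbcls.jp (empty base IRI)
DB.DBA.TTLP (file_to_string_output ('file path'), '', 'http://mydbcls.jp', 0);
checkpoint;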

* Approach 2:

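Use one load stream per CPU core (not per hardware thread). Split the big file into 12 smaller files of #linesPerFile = #totalLines / 12 lines each (rounded down); the remaining lines go into one extra file, so there are precisely 13 (12 + 1) files.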

Result: 46 min 22 s.
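The splitting rule for Approach 2 can be sketched in Python. This assumes the input is N-Triples with one triple per line (the `*.rdf.nt` pattern used with ld_dir suggests so); Turtle/N3 statements can span several lines and must not be split this way. `split_ntriples` and the `partNN.rdf.nt` file names are hypothetical.

```python
import tempfile
from itertools import islice
from pathlib import Path


def split_ntriples(src, out_dir, streams=12):
    """Hypothetical helper: split an N-Triples file into chunk files.

    Each chunk holds total_lines // streams lines; any leftover lines form one
    extra file, so an uneven split yields streams + 1 files. File names end in
    .rdf.nt so that ld_dir(..., '*.rdf.nt', ...) picks them up.
    """
    src, out_dir = Path(src), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with src.open("rb") as f:
        total = sum(1 for _ in f)          # first pass: count lines
    per_chunk = max(1, total // streams)

    paths = []
    with src.open("rb") as f:              # second pass: stream lines out
        while True:
            lines = islice(f, per_chunk)
            first = next(lines, None)
            if first is None:
                break
            path = out_dir / f"part{len(paths):02d}.rdf.nt"
            with path.open("wb") as out:
                out.write(first)
                out.writelines(lines)      # rest of this chunk, line by line
            paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "all.nt"
        triple = b'<http://example.org/s%d> <http://example.org/p> "o" .\n'
        src.write_bytes(b"".join(triple % i for i in range(25)))
        parts = split_ntriples(src, Path(tmp) / "parts")
        print(len(parts), "files")  # 25 // 12 = 2 lines per file -> 13 files
```

Reading the file twice keeps memory use flat, which matters for an input of roughly 94 million lines.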

Step 1. Register the files in the load list with ld_dir:

$ nohup $VIRTUOSO_HOME/bin/isql 1111 dba dba <$VIRTUOSO_HOME/scripts/load.list.isql &

load.list.isql script:

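 -- clear entries left over from earlier loads
 delete from load_list;
 -- register every *.rdf.nt file in the directory for the graph http://allie.dbcls.jp
 ld_dir('data directory','*.rdf.nt','http://allie.dbcls.jp');
 -- check which files were registered
 select * from load_list;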

Step 2. Load the registered files into Virtuoso:

$ nohup $VIRTUOSO_HOME/bin/isql 1111 dba dba <$VIRTUOSO_HOME/scripts/load.data.isql &

load.data.isql script:

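-- Record CPU time
select getrusage ()[0] + getrusage ()[1];

-- start 12 loaders in the background (a trailing & runs the statement in the background in isql)
rdf_loader_run () &
-- ... (10 more identical lines omitted; 12 loaders in total)
rdf_loader_run () &
-- wait until all background loaders have finished before the checkpoint
wait_for_children;
checkpoint;

-- Record CPU time
select getrusage ()[0] + getrusage ()[1];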
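Typing the loader line twelve times is error-prone, so the script can also be generated. A minimal sketch; `make_load_script` is hypothetical, and it adds isql's `wait_for_children;` so that the checkpoint runs only after every background `rdf_loader_run ()` has finished.

```python
CPU_TIME = "select getrusage ()[0] + getrusage ()[1];"


def make_load_script(loaders: int = 12) -> str:
    """Hypothetical helper: build load.data.isql for N parallel bulk loaders."""
    lines = ["-- Record CPU time", CPU_TIME, ""]
    lines += ["rdf_loader_run () &"] * loaders   # one background loader per stream
    lines += [
        "wait_for_children;",                    # block until all loaders finish
        "checkpoint;",                           # then persist the loaded data
        "",
        "-- Record CPU time",
        CPU_TIME,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # One loader per core, as in Approach 2; redirect stdout into load.data.isql.
    print(make_load_script(12), end="")
```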

PDBJ upload

Data: 77,878 .rdf.gz files containing 589,987,335 triples, from ftp://ftp.pdbj.org/XML/rdf/.

Loaded with 12 parallel streams.

Result: 103 min 31 s.
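Converting the wall-clock results into load rates makes the runs easier to compare. A small worked calculation using only the figures on this page; the Approach 1 time is given as a rounded 2 hours, so its rate is approximate, and rates across datasets are not strictly comparable because the input formats and file counts differ.

```python
def triples_per_second(triples: int, minutes: int, seconds: int = 0) -> float:
    """Average load rate over the whole wall-clock run."""
    return triples / (minutes * 60 + seconds)


runs = {
    "Allie, 1 stream": triples_per_second(94_420_989, 120),
    "Allie, 12 streams": triples_per_second(94_420_989, 46, 22),
    "PDBJ, 12 streams": triples_per_second(589_987_335, 103, 31),
}

if __name__ == "__main__":
    for name, rate in runs.items():
        print(f"{name}: {rate:,.0f} triples/s")
    speedup = runs["Allie, 12 streams"] / runs["Allie, 1 stream"]
    print(f"Allie speedup from 12 streams: {speedup:.2f}x")
```

This gives about 13,100, 33,900 and 95,000 triples per second respectively, and a speedup of about 2.6× from splitting the Allie load into 12 streams.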

Uniprot upload

DDBJ upload

SPARQL query performance

Allie query performance

Query times for the five use cases (see http://kiban.dbcls.jp/togordf/wiki/survey#data):

Query       case1  case2  case3  case4  case5
Time (ms)   53     1268   160    80     28718
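The page does not say how these times were measured. One way to collect comparable numbers is to time each query over HTTP. A minimal sketch using only the Python standard library, assuming Virtuoso's default SPARQL endpoint at http://localhost:8890/sparql and the SPARQL protocol's GET form (the query passed as the `query` parameter); the helper names are hypothetical, and the measured time includes network and result-serialization overhead.

```python
import time
import urllib.parse
import urllib.request

ENDPOINT = "http://localhost:8890/sparql"  # assumption: default Virtuoso HTTP port


def build_query_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Encode a SPARQL query as a GET request URL."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def time_query_ms(query: str, endpoint: str = ENDPOINT, repeats: int = 3) -> float:
    """Hypothetical helper: best wall-clock time in ms over several runs.

    The first run may be slower (cold cache); taking the minimum reports the
    warm-cache time, so record the first run separately if that matters.
    """
    request = urllib.request.Request(
        build_query_url(query, endpoint),
        headers={"Accept": "application/sparql-results+json"},
    )
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        with urllib.request.urlopen(request) as response:
            response.read()  # include transfer of the full result set
        best = min(best, (time.perf_counter() - start) * 1000.0)
    return best


if __name__ == "__main__":
    print(build_query_url("SELECT * WHERE { ?s ?p ?o } LIMIT 1"))
    # With a running server: print(time_query_ms("SELECT ..."))
```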

PDBJ query performance

Uniprot query performance

DDBJ query performance
