Version 29 (updated by wu, 13 years ago)

--

* Virtuoso configuration

* Load performance

* SPARQL query performance

Virtuoso configuration

* NumberOfBuffers: the amount of RAM, counted in buffers, used by Virtuoso to cache database files. This has a critical performance impact, so the value should be fairly high for large databases. Setting it above physical memory will have a significant negative impact. For a database-only server, about 65% of available RAM could be configured for database buffers. Each buffer caches one 8K page of data and occupies approximately 8700 bytes of memory.

* MaxCheckpointRemap: to avoid out-of-memory errors, make sure that the parameters NumberOfBuffers and MaxCheckpointRemap are not set to the same value.

* AsyncQueueMaxThreads: the size of a pool of extra threads that can be used for query parallelization. This should be set to either 1.5 * the number of cores or 1.5 * the number of core threads; see which works better.

* ThreadsPerQuery: the maximum number of threads a single query will take. This should be set to either the number of cores or the number of core threads; see which works better.

* IndexTreeMaps: the number of mutexes over which control for buffering an index tree is split. This can generally be left at the default (256 in normal operation; valid settings are powers of 2 from 2 to 1024), but setting it to 64, 128, or 512 may be beneficial. A low number will lead to frequent contention; upwards of 64 will have little contention.

* ThreadCleanupInterval and ResourcesCleanupInterval: set both to 1 to reduce memory leaks.

NumberOfBuffers          = 6500000
MaxDirtyBuffers          = 5000000
MaxCheckpointRemap       = 1000000
AsyncQueueMaxThreads     = 18
ThreadsPerQuery          = 18
IndexTreeMaps            = 512
ThreadCleanupInterval    = 1
ResourcesCleanupInterval = 1
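
The sizing rules above can be turned into a small calculation. The following is only a sketch under stated assumptions: the helper name `suggest_settings` and the 64 GiB / 12-core example machine are hypothetical, and the results are starting points to be tuned, not the values used in this test.

```python
BYTES_PER_BUFFER = 8700  # approximate memory per 8K buffer, as quoted above


def suggest_settings(ram_bytes, cores, ram_percent=65):
    """Rough starting values for a database-only server (hypothetical helper)."""
    # About 65% of RAM is given to database buffers.
    buffer_budget = ram_bytes * ram_percent // 100
    return {
        "NumberOfBuffers": buffer_budget // BYTES_PER_BUFFER,
        # 1.5 * cores; the text also suggests trying 1.5 * core threads.
        "AsyncQueueMaxThreads": cores * 3 // 2,
        # Number of cores; the text also suggests trying the number of core threads.
        "ThreadsPerQuery": cores,
    }


if __name__ == "__main__":
    # Assumed example machine: 64 GiB of RAM and 12 cores.
    for name, value in suggest_settings(64 * 2**30, 12).items():
        print(f"{name} = {value}")
```

Integer arithmetic is used throughout so that the suggested buffer count never exceeds the chosen share of RAM because of rounding.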

Please refer to attachment:virtuoso.ini.2 for the detailed parameters used in the test.

For more information, please refer to http://docs.openlinksw.com/virtuoso/databaseadmsrv.html and http://www.openlinksw.com/weblog/oerling/?id=1665

As a general recommendation for the Virtuoso 6.x release, about 16 GB of memory is required per billion triples.
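
As a worked example of this rule of thumb, the sketch below scales the 16 GB figure linearly to the Allie and PDBJ triple counts reported on this page. The linear scaling and the helper name `estimated_memory_gb` are assumptions for illustration only.

```python
GB_PER_BILLION_TRIPLES = 16  # rule of thumb quoted above for Virtuoso 6.x


def estimated_memory_gb(triples):
    """Scale the per-billion recommendation linearly (an assumption)."""
    return triples / 1_000_000_000 * GB_PER_BILLION_TRIPLES


if __name__ == "__main__":
    # Triple counts taken from the load tests on this page.
    for name, triples in [("Allie", 94_420_989), ("PDBJ", 589_987_335)]:
        print(f"{name}: about {estimated_memory_gb(triples):.2f} GB")
```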

Load Performance

Allie upload

Data: 94,420,989 triples, N3 format.

* Approach 1:

Load the big file in one stream.

Result: 2 hours.

Step:

$ nohup $VIRTUOSO_HOME/bin/isql 1111 dba dba <$VIRTUOSO_HOME/scripts/load.list.isql &

load.list.isql script:

log_enable (2);
DB.DBA.TTLP (file_to_string_output('file path'),' ','http://mydbcls.jp', 0);
checkpoint;

* Approach 2:

Use one stream per core (not per core thread). Split the big file into 12 small files (precisely, 13 files: with linesPerFile = totalLines / 12, the 12 full files are followed by 1 file holding the remaining lines).

Result: 46 min 22 s.
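
The split can be sketched as follows. This is a minimal illustration, not the script used in the test: the function name `split_by_lines` and the output file names are hypothetical, and it assumes a line-based serialization (one triple per line, as in N-Triples) so that cutting at line boundaries is safe.

```python
import os


def split_by_lines(src_path, out_dir, parts=12):
    """Split src_path into chunks of total_lines // parts lines each.

    When total_lines is not a multiple of parts, the leftover lines
    end up in one extra file, which gives parts + 1 files.
    """
    with open(src_path, "r", encoding="utf-8") as src:
        total_lines = sum(1 for _ in src)
    lines_per_file = max(1, total_lines // parts)

    os.makedirs(out_dir, exist_ok=True)
    written = []
    out = None
    with open(src_path, "r", encoding="utf-8") as src:
        for index, line in enumerate(src):
            if index % lines_per_file == 0:
                # Start a new chunk file.
                if out is not None:
                    out.close()
                path = os.path.join(out_dir, f"part{len(written):02d}.rdf.nt")
                out = open(path, "w", encoding="utf-8")
                written.append(path)
            out.write(line)
    if out is not None:
        out.close()
    return written
```

The resulting directory can then be passed to ld_dir with a matching file mask.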

Step 1. Register the files in the load list with ld_dir:

$ nohup $VIRTUOSO_HOME/bin/isql 1111 dba dba <$VIRTUOSO_HOME/scripts/load.list.isql &

load.list.isql script:

 delete from load_list;
 ld_dir('data directory','*.rdf.nt','http://allie.dbcls.jp');
 select * from load_list;

Step 2. Load the files into Virtuoso:

$ nohup $VIRTUOSO_HOME/bin/isql 1111 dba dba <$VIRTUOSO_HOME/scripts/load.data.isql &

load.data.isql script:

-- Record CPU time
select getrusage ()[0] + getrusage ()[1];

rdf_loader_run () &
-- ... (10 more identical rdf_loader_run () & lines omitted)
rdf_loader_run () &
checkpoint;

-- Record CPU time
select getrusage ()[0] + getrusage ()[1];
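
Since the script above repeats the same loader line once per stream, it can be generated rather than written by hand. The sketch below only reproduces the statements shown in load.data.isql; the function name `build_load_script` is hypothetical.

```python
def build_load_script(streams=12):
    """Return the text of a load.data.isql-style script with one loader per stream."""
    cpu_time = "select getrusage ()[0] + getrusage ()[1];"
    lines = ["-- Record CPU time", cpu_time, ""]
    # One background loader per stream, as in the script above.
    lines += ["rdf_loader_run () &"] * streams
    lines += ["checkpoint;", "", "-- Record CPU time", cpu_time]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_load_script(12), end="")
```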

PDBJ upload

Data: 77,878 .rdf.gz files, 589,987,335 triples, from ftp://ftp.pdbj.org/XML/rdf/.

Upload with 12 streams.

Result: 103 min 31 s.
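
To compare the load runs on a common scale, the elapsed times reported on this page can be converted to triples per second. This is a small sketch; the helper name `triples_per_second` is hypothetical, and the inputs are the triple counts and times given above.

```python
def triples_per_second(triples, hours=0, minutes=0, seconds=0):
    """Average load rate over the whole run."""
    elapsed = hours * 3600 + minutes * 60 + seconds
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return triples / elapsed


# Triple counts and elapsed times as reported on this page.
RUNS = {
    "Allie, 1 stream": (94_420_989, dict(hours=2)),
    "Allie, 12 streams": (94_420_989, dict(minutes=46, seconds=22)),
    "PDBJ, 12 streams": (589_987_335, dict(minutes=103, seconds=31)),
}

if __name__ == "__main__":
    for name, (triples, elapsed) in RUNS.items():
        print(f"{name}: {triples_per_second(triples, **elapsed):,.0f} triples/s")
```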

Uniprot upload

DDBJ upload

SPARQL query performance

Cell cycle query

Query\time (ms)  time 1  time 2  time 3  time 4  time 5
case1            23      23      24      24      24
case2            2       2       2       2       2
case3            22368   23440   22961   23655   23655
case4            2       3       10      4       4
case5            43265   42911   42172   42459   42459
case6            13062   13057   13069   13102   13102
case7            3       3       3       12      12
case8            7683    7479    7455    7656    7656
case9            40      38      36      51      51
case10           1       8       1       3       3
case11           120     119     118     123     123
case12           521     18      17      20      20
case13           24      4       2       7       7
case14           1       1       1       1       1
case15           55065   57530   56760   56203   56203
case16           36      34      46      65      65
case17           14      23      18      13      13
case18           23      17      16      16      16
case19           16980   17064   16643   16631   16631

Allie query

Query times for the five use cases (please refer to http://kiban.dbcls.jp/togordf/wiki/survey#data):

Query\time (ms)  time 1  time 2  time 3  time 4  time 5
case1            269     27      21      22      21
case2            1350    1273    1729    1300    1381
case3            395     145     138     155     172
case4            171     81      71      101     127
case5            26934   28107   27204   28276   26781

PDBJ query

Query\time (ms)  time 1  time 2  time 3  time 4  time 5
case1            184     156     150     152     131
case2            2       1       2       1       2
case3            6       3       2       1       1
case4            114     157     121     164     161

Uniprot query

DDBJ query
