 * [#Configure Virtuoso Configuration]
 * [#load Load performance]
 * [#allieload Allie upload]
 * [#pdbjload PDBJ upload]
 * [#uniprotload UniProt upload]
 * [#ddbjload DDBJ upload]
 * [#Sparql SPARQL query performance]
 * [#cellquery Cell cycle query performance]
 * [#alliequery Allie query performance]
 * [#pdbjquery PDBJ query performance]
 * [#uniprotquery UniProt query performance]
 * [#ddbjquery DDBJ query performance]

=== Virtuoso Configuration === #Configure

 * NumberOfBuffers: the amount of RAM Virtuoso uses to cache database files. This has a critical performance impact, so the value should be fairly high for large databases; exceeding physical memory, however, has a significant negative impact. On a database-only server, about 65% of available RAM can be given to database buffers. Each buffer caches one 8K page of data and occupies approximately 8700 bytes of memory.
 * MaxCheckpointRemap: to avoid out-of-memory errors, make sure the parameters NumberOfBuffers and MaxCheckpointRemap are not set to the same value.
 * AsyncQueueMaxThreads: the size of a pool of extra threads that can be used for query parallelization. Set this to either 1.5 × the number of cores or 1.5 × the number of hardware threads; see which works better.
 * ThreadsPerQuery: the maximum number of threads a single query will take. Set this to either the number of cores or the number of hardware threads; see which works better.
 * IndexTreeMaps: the number of mutexes over which control of index-tree buffering is split. This can generally be left at the default (256 in normal operation; valid settings are powers of 2 from 2 to 1024), but 64, 128, or 512 may be beneficial. A low number leads to frequent contention; from 64 upwards there is little contention.
 * ThreadCleanupInterval and ResourcesCleanupInterval: set both to 1 to reduce memory leaks.
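As a sanity check on the NumberOfBuffers rule above, the sizing can be computed directly. This is a minimal sketch: the 64 GiB RAM figure is an assumption for illustration, and the 3/4 ratio for MaxDirtyBuffers is an assumption drawn from common Virtuoso sizing tables rather than from this test.

```shell
# Sketch: derive buffer settings from available RAM (assumed 64 GiB here).
# Each buffer caches an 8K page and costs ~8700 bytes; use ~65% of RAM.
RAM_BYTES=$((64 * 1024 * 1024 * 1024))
NUMBER_OF_BUFFERS=$((RAM_BYTES * 65 / 100 / 8700))
# MaxDirtyBuffers is conventionally about 3/4 of NumberOfBuffers (assumption)
MAX_DIRTY_BUFFERS=$((NUMBER_OF_BUFFERS * 3 / 4))
echo "NumberOfBuffers  = $NUMBER_OF_BUFFERS"
echo "MaxDirtyBuffers  = $MAX_DIRTY_BUFFERS"
```

Applying the same arithmetic in reverse, the 6500000-buffer setting used in this test dedicates roughly 6500000 × 8700 ≈ 56 GB of RAM to the buffer pool.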
{{{
NumberOfBuffers          = 6500000
MaxDirtyBuffers          = 5000000
MaxCheckpointRemap       = 1000000
AsyncQueueMaxThreads     = 18
ThreadsPerQuery          = 18
IndexTreeMaps            = 512
ThreadCleanupInterval    = 1
ResourcesCleanupInterval = 1
}}}

Please refer to attachment:virtuoso.ini.2 for the detailed parameters used in this test. For more information, see [http://docs.openlinksw.com/virtuoso/databaseadmsrv.html] and [http://www.openlinksw.com/weblog/oerling/?id=1665]. As a general guideline, the Virtuoso 6.x release requires 16 GB of memory per billion triples.

=== Load performance === #load

=== Allie upload === #allieload

'''Data''': 94,420,989 triples, N3 format.

 * Approach 1: load the big file in one stream.

'''Result:''' 2 hours.

Step:
{{{
$ nohup $VIRTUOSO_HOME/bin/isql 1111 dba dba < $VIRTUOSO_HOME/scripts/load.list.isql &
}}}
load.list.isql script:
{{{
log_enable (2);
DB.DBA.TTLP (file_to_string_output ('file path'), ' ', 'http://mydbcls.jp', 0);
checkpoint;
}}}

 * Approach 2: use one stream per core (not per hardware thread). Split the big file into 12 small files (precisely, 13 (12+1) files; #linesPerFile = #totalLines / 12).

'''Result:''' 46 min 22 s.

Step 1. Register the files in the load_list table:
{{{
$ nohup $VIRTUOSO_HOME/bin/isql 1111 dba dba < $VIRTUOSO_HOME/scripts/load.list.isql &
}}}
load.list.isql script:
{{{
delete from load_list;
ld_dir ('data directory', '*.rdf.nt', 'http://allie.dbcls.jp');
select * from load_list;
}}}
Step 2. Load the files into Virtuoso:
{{{
$ nohup $VIRTUOSO_HOME/bin/isql 1111 dba dba < $VIRTUOSO_HOME/scripts/load.data.isql &
}}}
load.data.isql script:
{{{
-- record CPU time
select getrusage ()[0] + getrusage ()[1];
rdf_loader_run () &
... (10 more omitted)
rdf_loader_run () &
checkpoint;
-- record CPU time
select getrusage ()[0] + getrusage ()[1];
}}}

=== PDBJ upload === #pdbjload

'''Data:''' .rdf.gz, 77,878 files, 589,987,335 triples from [ftp://ftp.pdbj.org/XML/rdf/]. Uploaded with 12 streams.

'''Result:''' 103 min 31 s.
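The 12-way split used in Approach 2 above can be sketched with standard coreutils. This is a minimal sketch: the file names are placeholders, and a tiny generated file stands in for the real 94M-triple dump. N-Triples is line-oriented, so splitting on line boundaries never breaks a triple (this would not be safe for Turtle files with prefix declarations).

```shell
# Demo input: a tiny N-Triples file standing in for the real dump.
printf '<http://example.org/s> <http://example.org/p> "%s" .\n' $(seq 1 100) > allie.nt

PARTS=12
TOTAL=$(wc -l < allie.nt)
LINES=$(( (TOTAL + PARTS - 1) / PARTS ))   # ceiling(total / parts)

# split keeps whole lines, so no triple is ever cut in half
split -l "$LINES" -d allie.nt part_

# rename the chunks so they match the '*.rdf.nt' pattern passed to ld_dir()
for f in part_*; do mv "$f" "$f.rdf.nt"; done
ls part_*.rdf.nt
```

After the split, the chunk directory is registered once with ld_dir() and one rdf_loader_run() session is started per stream, as in Step 1 and Step 2 above.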
=== UniProt upload === #uniprotload

=== DDBJ upload === #ddbjload

=== SPARQL query performance === #Sparql

=== Cell cycle query performance === #cellquery

||Query \ time (ms) ||time 1 ||time 2 ||time 3 ||time 4 ||time 5 ||
||case1 ||23 ||23 ||24 ||24 ||24 ||
||case2 ||2 ||2 ||2 ||2 ||2 ||
||case3 ||22368 ||23440 ||22961 ||23655 ||23655 ||
||case4 ||2 ||3 ||10 ||4 ||4 ||
||case5 ||43265 ||42911 ||42172 ||42459 ||42459 ||
||case6 ||13062 ||13057 ||13069 ||13102 ||13102 ||
||case7 ||3 ||3 ||3 ||12 ||12 ||
||case8 ||7683 ||7479 ||7455 ||7656 ||7656 ||
||case9 ||40 ||38 ||36 ||51 ||51 ||
||case10 ||1 ||8 ||1 ||3 ||3 ||
||case11 ||120 ||119 ||118 ||123 ||123 ||
||case12 ||521 ||18 ||17 ||20 ||20 ||
||case13 ||24 ||4 ||2 ||7 ||7 ||
||case14 ||1 ||1 ||1 ||1 ||1 ||
||case15 ||55065 ||57530 ||56760 ||56203 ||56203 ||
||case16 ||36 ||34 ||46 ||65 ||65 ||
||case17 ||14 ||23 ||18 ||13 ||13 ||
||case18 ||23 ||17 ||16 ||16 ||16 ||
||case19 ||16980 ||17064 ||16643 ||16631 ||16631 ||

=== Allie query performance === #alliequery

The time cost for the five use cases (please refer to [http://kiban.dbcls.jp/togordf/wiki/survey#data]):

||Query \ time (ms) ||time 1 ||time 2 ||time 3 ||time 4 ||time 5 ||
||case1 ||269 ||27 ||21 ||22 ||21 ||
||case2 ||1350 ||1273 ||1729 ||1300 ||1381 ||
||case3 ||395 ||145 ||138 ||155 ||172 ||
||case4 ||171 ||81 ||71 ||101 ||127 ||
||case5 ||26934 ||28107 ||27204 ||28276 ||26781 ||

=== PDBJ query performance === #pdbjquery

||Query \ time (ms) ||time 1 ||time 2 ||time 3 ||time 4 ||time 5 ||
||case1 ||184 ||156 ||150 ||152 ||131 ||
||case2 ||2 ||1 ||2 ||1 ||2 ||
||case3 ||6 ||3 ||2 ||1 ||1 ||
||case4 ||114 ||157 ||121 ||164 ||161 ||

=== UniProt query performance === #uniprotquery

=== DDBJ query performance === #ddbjquery