 * [#Configure Virtuoso configuration]
 * [#load Load performance]
   * [#allieload Allie upload]
   * [#pdbjload PDBJ upload]
   * [#uniprotload Uniprot upload]
   * [#ddbjload DDBJ upload]
 * [#Sparql SPARQL query performance]
   * [#alliequery Allie query performance]
   * [#pdbjquery PDBJ query performance]
   * [#uniprotquery Uniprot query performance]
   * [#ddbjquery DDBJ query performance]

=== Virtuoso configuration === #Configure

 * NumberOfBuffers: the amount of RAM Virtuoso uses to cache database files. This setting has a critical performance impact, so it should be fairly high for large databases. Setting it beyond physical memory will severely hurt performance. On a database-only server, about 65% of available RAM can be given to database buffers. Each buffer caches one 8K page of data and occupies approximately 8,700 bytes of memory.
 * MaxCheckpointRemap: to avoid out-of-memory errors, make sure NumberOfBuffers and MaxCheckpointRemap are not set to the same value.
 * AsyncQueueMaxThreads: the size of a pool of extra threads that can be used to parallelize queries. Set it to either 1.5 times the number of cores or 1.5 times the number of hardware threads, and keep whichever performs better.
 * ThreadsPerQuery: the maximum number of threads a single query can use. Set it to either the number of cores or the number of hardware threads, and keep whichever performs better.
 * IndexTreeMaps: the number of mutexes across which buffer control for an index tree is split. The default is usually fine (256 in normal operation; valid values are powers of 2 from 2 to 1024), but 64, 128 or 512 may help. A low value leads to frequent contention; at 64 and above there is little contention.
 * ThreadCleanupInterval and ResourcesCleanupInterval: set both to 1 to reduce memory leakage.
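The rules of thumb above can be turned into a small sizing helper. This is a hypothetical Python sketch, not an OpenLink tool. The function names are invented for illustration. The 65% RAM share, the ~8,700 bytes per buffer and the 1.5 factor are the heuristics quoted on this page. You supply the RAM size and core count.

```python
# Hypothetical sizing helper for virtuoso.ini, built only from the rules of
# thumb quoted on this page (not an OpenLink tool or API).

BYTES_PER_BUFFER = 8700  # approx. memory per buffer (one 8K page + overhead)
DB_RAM_SHARE = 0.65      # share of RAM for buffers on a database-only server


def number_of_buffers(ram_bytes: int, share: float = DB_RAM_SHARE) -> int:
    """NumberOfBuffers that fits in `share` of the given physical RAM."""
    return int(ram_bytes * share) // BYTES_PER_BUFFER


def buffer_memory_bytes(buffers: int) -> int:
    """Approximate RAM consumed by a given NumberOfBuffers value."""
    return buffers * BYTES_PER_BUFFER


def thread_settings(cores: int) -> dict:
    """1.5 x cores (rounded down) for AsyncQueueMaxThreads, cores for ThreadsPerQuery.

    Pass the hardware-thread count instead of the core count to try the
    other variant, then keep whichever performs better.
    """
    return {
        "AsyncQueueMaxThreads": (3 * cores) // 2,
        "ThreadsPerQuery": cores,
    }


def check_remap(buffers: int, max_checkpoint_remap: int) -> None:
    """Reject the equal-values setup this page warns can cause out-of-memory errors."""
    if buffers == max_checkpoint_remap:
        raise ValueError("NumberOfBuffers and MaxCheckpointRemap should differ")


if __name__ == "__main__":
    ram = 64 * 1024**3  # example input only: a 64 GiB database-only server
    print("NumberOfBuffers ~", number_of_buffers(ram))
    print(thread_settings(12))
    check_remap(6_500_000, 1_000_000)  # values used in this test: passes
```

For example, `thread_settings(12)` suggests AsyncQueueMaxThreads = 18. This assumes 12 cores, which the 12-stream, one-stream-per-core load below suggests. `buffer_memory_bytes(6500000)` shows that the 6,500,000 buffers used in this test take roughly 56.55 GB.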
{{{
NumberOfBuffers = 6500000
MaxDirtyBuffers = 5000000
MaxCheckpointRemap = 1000000
AsyncQueueMaxThreads = 18
ThreadsPerQuery = 18
IndexTreeMaps = 512
ThreadCleanupInterval = 1
ResourcesCleanupInterval = 1
}}}

See attachment:virtuoso.ini.2 for the full set of parameters used in the test. For more information, see:
 * [http://docs.openlinksw.com/virtuoso/databaseadmsrv.html]
 * [http://www.openlinksw.com/weblog/oerling/?id=1665]

=== Load performance === #load

=== Allie upload === #allieload

'''Data:''' 94,420,989 triples, N3 format.

 * Approach 1: load the big file in one stream. '''Result:''' about 2 hours.

Step: `nohup $VIRTUOSO_HOME/bin/isql 1111 dba dba < $VIRTUOSO_HOME/scripts/load.list.isql &`

load.list.isql script:
{{{
-- Disable transaction logging and commit row by row (bulk-load mode)
log_enable (2);
-- Parse the file and insert its triples into graph http://mydbcls.jp
DB.DBA.TTLP (file_to_string_output ('file path'), ' ', 'http://mydbcls.jp', 0);
checkpoint;
}}}

 * Approach 2: use one stream per core (not per hardware thread). Split the big file into 12 smaller files. In practice this gives 13 files: 12, plus one holding the remainder, with lines per file = total lines / 12. '''Result:''' 46 min 22 s.

Step 1. Register the files in the load list with ld_dir:
`nohup $VIRTUOSO_HOME/bin/isql 1111 dba dba < $VIRTUOSO_HOME/scripts/load.list.isql &`

load.list.isql script:
{{{
-- Clear any previously registered files
delete from load_list;
-- Register every file matching the mask, to be loaded into graph http://allie.dbcls.jp
ld_dir ('data directory', 'x*.rdf.nt', 'http://allie.dbcls.jp');
-- Check the registered files
select * from load_list;
}}}

Step 2. Load the registered files into Virtuoso:
`nohup $VIRTUOSO_HOME/bin/isql 1111 dba dba < $VIRTUOSO_HOME/scripts/load.data.isql &`

load.data.isql script:
{{{
-- Record CPU time
select getrusage ()[0] + getrusage ()[1];
-- Start 12 loaders in the background, one per core
rdf_loader_run () &
-- ... (10 more identical "rdf_loader_run () &" lines omitted)
rdf_loader_run () &
checkpoint;
-- Record CPU time
select getrusage ()[0] + getrusage ()[1];
}}}

=== PDBJ upload === #pdbjload

'''Data:''' 77,878 .rdf.gz files, 589,987,335 triples in total, from [ftp://ftp.pdbj.org/XML/rdf/]. Loaded with 12 streams. '''Result:''' 103 min 31 s.
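The file-splitting step of Approach 2 in the Allie upload can be sketched in Python as follows. It uses lines per file = total lines / 12, which yields 13 files when there is a remainder. The sketch makes two assumptions. First, the input is N-Triples (one triple per line), so cutting on line boundaries never breaks a statement; that is not true of N3/Turtle in general. Second, the output names imitate `split` (xaa, xab, ...) with the `.rdf.nt` suffix matched by the ld_dir mask. The function names are hypothetical.

```python
# Hypothetical splitter for Approach 2. Assumes N-Triples input (one triple
# per line), so splitting on line boundaries keeps every statement intact.
import itertools
import string
from pathlib import Path


def chunk_names():
    """Yield xaa.rdf.nt, xab.rdf.nt, ... (split-style names + ld_dir suffix)."""
    for a, b in itertools.product(string.ascii_lowercase, repeat=2):
        yield f"x{a}{b}.rdf.nt"


def split_ntriples(src: Path, out_dir: Path, streams: int = 12) -> list:
    """Write chunks of total_lines // streams lines each.

    For large inputs the leftover total_lines % streams lines go into one
    extra file, giving streams + 1 files (13 for 12 streams).
    """
    with src.open("rb") as f:
        total = sum(1 for _ in f)
    per_file = max(1, total // streams)
    out_dir.mkdir(parents=True, exist_ok=True)
    names = chunk_names()
    written = []
    with src.open("rb") as f:
        while True:
            chunk = list(itertools.islice(f, per_file))
            if not chunk:
                break
            path = out_dir / next(names)
            path.write_bytes(b"".join(chunk))
            written.append(path)
    return written
```

Splitting by line count rather than file size keeps each chunk a valid N-Triples document, so each loader started by `rdf_loader_run ()` gets a roughly equal share of triples.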
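To compare the load runs above on a common scale, the reported triple counts and wall-clock times can be converted to triples per second. This short Python sketch only does arithmetic on the figures on this page. Approach 1 was reported as roughly "2 hours", so its rate is approximate.

```python
# Load throughput computed from the results reported on this page.
# Wall-clock times only; Approach 1 was reported as roughly "2 hours".


def to_seconds(hours: int = 0, minutes: int = 0, seconds: int = 0) -> int:
    """Convert an h/min/s duration into seconds."""
    return hours * 3600 + minutes * 60 + seconds


RUNS = {
    "Allie, approach 1 (1 stream)": (94_420_989, to_seconds(hours=2)),
    "Allie, approach 2 (12 streams)": (94_420_989, to_seconds(minutes=46, seconds=22)),
    "PDBJ (12 streams)": (589_987_335, to_seconds(minutes=103, seconds=31)),
}


def throughput(triples: int, seconds: int) -> float:
    """Average triples loaded per wall-clock second."""
    return triples / seconds


if __name__ == "__main__":
    for name, (triples, secs) in RUNS.items():
        print(f"{name}: {throughput(triples, secs):,.0f} triples/s")
    one = RUNS["Allie, approach 1 (1 stream)"][1]
    twelve = RUNS["Allie, approach 2 (12 streams)"][1]
    print(f"Speed-up from 12 streams on Allie: {one / twelve:.2f}x")
```

This gives roughly 13,100 triples/s for the single-stream Allie load, 33,900 triples/s with 12 streams (about a 2.6x speed-up) and 95,000 triples/s for PDBJ.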
=== Uniprot upload === #uniprotload

=== DDBJ upload === #ddbjload

=== SPARQL query performance === #Sparql

=== Allie query performance === #alliequery

=== PDBJ query performance === #pdbjquery

=== Uniprot query performance === #uniprotquery

=== DDBJ query performance === #ddbjquery