 * [#Configure Virtuoso configuration]
 * [#load Load performance]
 * [#cellload Cell Cycle upload]
 * [#allieload Allie upload]
 * [#pdbjload PDBj upload]
 * [#uniprotload UniProt upload]
 * [#ddbjload DDBJ upload]
 * [#Sparql SPARQL query performance]
 * [#cellquery Cell Cycle query]
 * [#alliequery Allie query]
 * [#pdbjquery PDBj query]
 * [#uniprotquery UniProt query]
 * [#ddbjquery DDBJ query]

=== Virtuoso configuration === #Configure

About the Virtuoso index: the index scheme consists of the following indices:

 * PSOG - primary key.
 * POGS - bitmap index for lookups on object value.
 * SP - partial index for cases where only S is specified.
 * OP - partial index for cases where only O is specified.
 * GS - partial index for cases where only G is specified.

About the configuration parameters:

 * NumberOfBuffers: the amount of RAM used by Virtuoso to cache database files. This has a critical performance impact, so the value should be fairly high for large databases. Exceeding physical memory with this setting will have a significant negative impact. For a database-only server, about 65% of available RAM could be configured for database buffers. Each buffer caches one 8K page of data and occupies approximately 8700 bytes of memory.
 * MaxCheckpointRemap: to avoid out-of-memory errors, make sure that the parameters NumberOfBuffers and MaxCheckpointRemap are not set to the same value.
 * AsyncQueueMaxThreads: the size of a pool of extra threads that can be used for query parallelization. This should be set to either 1.5 * the number of cores or 1.5 * the number of core threads; see which works better.
 * ThreadsPerQuery: the maximum number of threads a single query will take. This should be set to either the number of cores or the number of core threads; see which works better.
 * IndexTreeMaps: the number of mutexes over which control for buffering an index tree is split.
   This can generally be left at the default (256 in normal operation; valid settings are powers of 2 from 2 to 1024), but setting it to 64, 128, or 512 may be beneficial. A low number will lead to frequent contention; values upwards of 64 will have little contention.
 * ThreadCleanupInterval and ResourcesCleanupInterval: set both to 1 in order to reduce memory leaks.

{{{
NumberOfBuffers = 6500000
MaxDirtyBuffers = 5000000
MaxCheckpointRemap = 1000000
AsyncQueueMaxThreads = 18
ThreadsPerQuery = 18
IndexTreeMaps = 512
ThreadCleanupInterval = 1
ResourcesCleanupInterval = 1
}}}

Please refer to attachment:virtuoso.ini.2 for the detailed parameters used in the test. For more information, refer to [http://docs.openlinksw.com/virtuoso/databaseadmsrv.html] and [http://www.openlinksw.com/weblog/oerling/?id=1665].

It is generally recommended that, with the Virtuoso 6.x release, 16 GB of memory be available per billion triples.

=== Load performance === #load

=== Allie upload === #allieload

'''Data''': 94,420,989 triples, N3 format.

 * Approach 1: Load the big file in one stream. '''Result:''' 2 hours.

Step:

$ nohup $VIRTUOSO_HOME/bin/isql 1111 dba dba <$VIRTUOSO_HOME/scripts/load.list.isql &

load.list.isql script:

{{{
-- Disable transaction logging and use row-by-row autocommit for the bulk load
log_enable (2);
-- Parse the file and load its triples into the graph http://mydbcls.jp/
DB.DBA.TTLP (file_to_string_output('file path'),' ','http://mydbcls.jp/', 0);
checkpoint;
}}}

 * Approach 2: Use one stream per core (not per core thread). Split the big file into 12 smaller files (precisely, 13 (12+1) files, because linesPerFile = totalLines / 12 leaves a remainder). '''Result:''' 46 min 22 s.

Step 1. Register the files in the load list with ld_dir:

$ nohup $VIRTUOSO_HOME/bin/isql 1111 dba dba <$VIRTUOSO_HOME/scripts/load.list.isql &

load.list.isql script:

{{{
-- Clear any previous entries from the load list
delete from load_list;
-- Register all matching files in the data directory for the target graph
ld_dir('data directory','*.rdf.nt','http://allie.dbcls.jp/');
-- Check which files were registered
select * from load_list;
}}}

Step 2.
Upload the files into Virtuoso:

$ nohup $VIRTUOSO_HOME/bin/isql 1111 dba dba <$VIRTUOSO_HOME/scripts/load.data.isql &

load.data.isql script:

{{{
-- Record CPU time before loading
select getrusage ()[0] + getrusage ()[1];
-- Start the loader streams in the background (12 in total)
rdf_loader_run () &
-- ... (10 more "rdf_loader_run () &" lines omitted)
rdf_loader_run () &
checkpoint;
-- Record CPU time after loading
select getrusage ()[0] + getrusage ()[1];
}}}

The following uploads all use Approach 2, with 12 streams.

=== Cell Cycle upload === #cellload

'''Result:''' 4 min.

=== PDBj upload === #pdbjload

'''Result:''' 103 min 31 s.

=== UniProt upload === #uniprotload

 * vm.swappiness = 60: 71 h 58 min
 * vm.swappiness = 10: 42 h 43 min

=== DDBJ upload === #ddbjload

 * vm.swappiness = 10: 78 h 8 min

=== Load performance ===

We uploaded all the data twice with the least-cost configuration.

|| Load time || Cell Cycle Ontology || Allie || PDBj || UniProt || DDBJ ||
|| 1st time || 4 min || 46 min || 103 min || 42 h 43 min || 78 h 8 min ||
|| 2nd time || 4 min || 48 min || 81 min || 40 h 12 min || 80 h 30 min ||
|| Average || 4 min || 47 min || 92 min || 41 h 28 min || 79 h 19 min ||

=== SPARQL query performance === #Sparql

=== Cell Cycle query === #cellquery

|| Query \ time (ms) || time 1 || time 2 || time 3 || time 4 || time 5 ||
|| case1 || 23 || 23 || 24 || 24 || 24 ||
|| case2 || 2 || 2 || 2 || 2 || 2 ||
|| case3 || 22368 || 23440 || 22961 || 23655 || 23655 ||
|| case4 || 2 || 3 || 10 || 4 || 4 ||
|| case5 || 43265 || 42911 || 42172 || 42459 || 42459 ||
|| case6 || 13062 || 13057 || 13069 || 13102 || 13102 ||
|| case7 || 3 || 3 || 3 || 12 || 12 ||
|| case8 || 7683 || 7479 || 7455 || 7656 || 7656 ||
|| case9 || 40 || 38 || 36 || 51 || 51 ||
|| case10 || 1 || 8 || 1 || 3 || 3 ||
|| case11 || 120 || 119 || 118 || 123 || 123 ||
|| case12 || 521 || 18 || 17 || 20 || 20 ||
|| case13 || 24 || 4 || 2 || 7 || 7 ||
|| case14 || 1 || 1 || 1 || 1 || 1 ||
|| case15 || 55065 || 57530 || 56760 || 56203 || 56203 ||
|| case16 || 36 || 34 || 46 || 65 || 65 ||
|| case17 || 14 || 23 || 18 || 13 || 13 ||
|| case18 || 23 || 17 || 16 || 16 || 16 ||
|| case19 || 16980 || 17064 || 16643 || 16631 || 16631 ||

=== Allie query === #alliequery

The time cost for the five use cases
(please refer to [http://kiban.dbcls.jp/togordf/wiki/survey#data]):

|| Query \ time (ms) || time 1 || time 2 || time 3 || time 4 || time 5 ||
|| case1 || 269 || 27 || 21 || 22 || 21 ||
|| case2 || 1350 || 1273 || 1729 || 1300 || 1381 ||
|| case3 || 395 || 145 || 138 || 155 || 172 ||
|| case4 || 171 || 81 || 71 || 101 || 127 ||
|| case5 || 26934 || 28107 || 27204 || 28276 || 26781 ||

=== PDBj query === #pdbjquery

|| Query \ time (ms) || time 1 || time 2 || time 3 || time 4 || time 5 ||
|| case1 || 184 || 156 || 150 || 152 || 131 ||
|| case2 || 2 || 1 || 2 || 1 || 2 ||
|| case3 || 6 || 3 || 2 || 1 || 1 ||
|| case4 || 114 || 157 || 121 || 164 || 161 ||

=== UniProt query === #uniprotquery

|| Query \ time (ms) || time 1 || time 2 || time 3 || time 4 || time 5 ||
|| case1 || 49 || 42 || 56 || 157 || 58 ||
|| case2 || 105 || 127 || 94 || 90 || 90 ||
|| case3 || 114 || 108 || 116 || 123 || 116 ||
|| case4 || 2 || 2 || 2 || 2 || 2 ||
|| case5 || 8 || 1 || 66 || 16 || 1 ||
|| case6 || 2252 || 2217 || 2177 || 2237 || 2192 ||
|| case7 || 58027 || 13139 || 42780 || 42017 || 41729 ||
|| case8 || 421 || 402 || 410 || 417 || 487 ||
|| case9 || 589 || 597 || 614 || 644 || 619 ||
|| case10 || 702 || 5862 || 622 || 642 || 643 ||
|| case11 || 70 || 43 || 50 || 57 || 61 ||
|| case12 || 6 || 2 || 21 || 3 || 3 ||
|| case13 || 278 || 317 || 285 || 288 || 276 ||
|| case14 || 268 || 270 || 274 || 271 || 264 ||
|| case15 || 10635 || 10453 || 10785 || 10650 || 10684 ||
|| case16 || 9075 || 9008 || 9049 || 9074 || 9260 ||
|| case17 || 78 || 2 || 1 || 1 || 5 ||
|| case18 || 180 || 70 || 45 || 98 || 89 ||

=== DDBJ query === #ddbjquery

|| Query \ time (ms) || time 1 || time 2 || time 3 || time 4 || time 5 ||
|| case1 || 248 || 238 || 245 || 208 || 213 ||
|| case2 || 270 || 225 || 212 || 214 || 222 ||
|| case3 || 8672 || 401 || 431 || 430 || 411 ||
|| case4 || 61 || 59 || 57 || 55 || 54 ||
|| case5 || 24 || 11 || 6 || 6 || 5 ||
|| case6 || 110 || 95 || 92 || 94 || 149 ||
|| case7 || 14 || 12 || 3 || 2 || 4 ||
|| case8 || 3 || 3 || 6 || 3 || 5 ||
|| case9 || 13 || 4 || 23 || 4 || 6 ||
|| case10 || 0 || 1 || 1 || 1