| 1 | <tool id="rgGRR1" name="GRR:"> |
---|
| 2 | <code file="rgGRR_code.py"/> |
---|
| 3 | <description>Pairwise Allele Sharing</description> |
---|
| 4 | <command interpreter="python"> |
---|
| 5 | rgGRR.py $i.extra_files_path/$i.metadata.base_name "$i.metadata.base_name" |
---|
| 6 | '$out_file1' '$out_file1.files_path' "$title1" '$n' '$Z' |
---|
| 7 | </command> |
---|
| 8 | <inputs> |
---|
| 9 | <param name="i" type="data" label="Genotype data file from your current history" |
---|
| 10 | format="ldindep" /> |
---|
| 11 | <param name='title1' type='text' size="80" value='rgGRR' label="Title for this job"/> |
---|
| 12 | <param name="n" type="integer" label="N snps to use (0=all)" value="5000" /> |
---|
| 13 | <param name="Z" type="float" label="Z score cutoff for outliers (eg 2)" value="6" |
---|
| 14 | help="2 works but for very large numbers of pairs, you might want to see less than 5%" /> |
---|
| 15 | </inputs> |
---|
| 16 | <outputs> |
---|
| 17 | <data format="html" name="out_file1" /> |
---|
| 18 | </outputs> |
---|
| 19 | |
---|
| 20 | <tests> |
---|
| 21 | <test> |
---|
| 22 | <param name='i' value='tinywga' ftype='ldindep' > |
---|
| 23 | <metadata name='base_name' value='tinywga' /> |
---|
| 24 | <composite_data value='tinywga.bim' /> |
---|
| 25 | <composite_data value='tinywga.bed' /> |
---|
| 26 | <composite_data value='tinywga.fam' /> |
---|
| 27 | <edit_attributes type='name' value='tinywga' /> |
---|
| 28 | </param> |
---|
| 29 | <param name='title1' value='rgGRRtest1' /> |
---|
| 30 | <param name='n' value='100' /> |
---|
| 31 | <param name='Z' value='6' /> |
---|
| 32 | <param name='force' value='true' /> |
---|
| 33 | <output name='out_file1' file='rgtestouts/rgGRR/rgGRRtest1.html' ftype='html' compare="diff" lines_diff='350'> |
---|
| 34 | <extra_files type="file" name='Log_rgGRRtest1.txt' value="rgtestouts/rgGRR/Log_rgGRRtest1.txt" compare="diff" lines_diff="170"/> |
---|
| 35 | <extra_files type="file" name='rgGRRtest1.svg' value="rgtestouts/rgGRR/rgGRRtest1.svg" compare="diff" lines_diff="1000" /> |
---|
| 36 | <extra_files type="file" name='rgGRRtest1_table.xls' value="rgtestouts/rgGRR/rgGRRtest1_table.xls" compare="diff" lines_diff="100" /> |
---|
| 37 | </output> |
---|
| 38 | </test> |
---|
| 39 | </tests> |
---|
| 40 | |
---|
| 41 | |
---|
| 42 | <help> |
---|
| 43 | |
---|
| 44 | .. class:: infomark |
---|
| 45 | |
---|
| 46 | **Explanation** |
---|
| 47 | |
---|
| 48 | This tool will calculate allele sharing among all subjects, one pair at a time. It outputs measures of average alleles |
---|
| 49 | shared and measures of variability for each pair of subjects and creates an interactive image where each pair is |
---|
| 50 | plotted in this mean/variance space. It is based on the GRR windows application available at |
---|
| 51 | http://www.sph.umich.edu/csg/abecasis/GRR/ |
---|
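As a rough illustration of the mean/variance space described above, the sketch below computes the mean and standard deviation of alleles shared for every pair of subjects. It is not the code used by this tool. It assumes genotypes are coded as allele counts 0/1/2 with -1 for missing calls, and that sharing at a marker is 2 minus the absolute difference in allele counts; the function name and the coding are hypothetical.

```python
import itertools
import numpy as np

def pair_sharing_stats(genos, missing=-1):
    """Return {(i, j): (mean, sd)} of alleles shared for each subject pair.

    genos: (subjects x markers) array of allele counts 0/1/2 (assumed coding).
    missing: value marking a missing genotype call (assumed to be -1).
    """
    genos = np.asarray(genos).astype(int)
    stats = {}
    # Every subject is compared with every other subject exactly once.
    for i, j in itertools.combinations(range(genos.shape[0]), 2):
        a, b = genos[i], genos[j]
        # Use only markers where both subjects have a genotype call.
        ok = np.logical_and(a != missing, b != missing)
        if not ok.any():
            continue
        # Alleles shared at each marker: 2 (identical), 1, or 0.
        shared = 2 - np.abs(a[ok] - b[ok])
        stats[(i, j)] = (float(shared.mean()), float(shared.std()))
    return stats

if __name__ == "__main__":
    demo = [[0, 1, 2, 2], [0, 1, 2, 2], [2, 1, 0, 2]]
    for pair, (mean, sd) in sorted(pair_sharing_stats(demo).items()):
        print(pair, mean, sd)
```

Each pair then becomes one point in the plot, with the mean on one axis and the standard deviation on the other.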
| 52 | |
---|
| 53 | The plot is interactive - for example, you can unselect one of the relationships in the legend to remove all those points |
---|
| 54 | from the plot. Details of outlier pairs will pop up when the pointer is over them. |
---|
| 55 | This relies on a working browser SVG plugin - try getting one installed for your browser if the interactivity is |
---|
| 56 | broken. |
---|
| 57 | |
---|
| 58 | ----- |
---|
| 59 | |
---|
| 60 | **Syntax** |
---|
| 61 | |
---|
| 62 | - **Genotype file** is the input pedigree data chosen from available library Plink binary files |
---|
| 63 | - **Title** will be used to name the outputs so make it mnemonic and useful |
---|
| 64 | - **N** is the number of SNPs to use - leave it at 0 to use all of them, otherwise a random sample of N SNPs is taken, which is much quicker with little loss of precision above 5000 SNPs |
---|
| 65 | |
---|
| 66 | **Summary** |
---|
| 67 | |
---|
| 68 | Warning - this tool works pairwise, so run time grows quadratically with the number of subjects. An LD-reduced dataset is |
---|
| 69 | strongly recommended as it will give good resolution with relatively few SNPs. Do not use all million SNPs from a whole |
---|
| 70 | genome chip - it's overkill - 5k is good, 10k is almost indistinguishable from 100k. |
---|
| 71 | |
---|
| 72 | SNPs are sampled randomly from the autosomes - otherwise parent/child pairs will be separated by gender. |
---|
| 73 | This tool will estimate mean pairwise allele sharing among all subjects. Based on the work of Abecasis, it has |
---|
| 74 | been rewritten so it can run with much larger data sets, produce cross platform SVG and run |
---|
| 75 | on a Galaxy server, instead of being MS Windows only. It is written in Python, uses numpy, and the innermost loop |
---|
| 76 | is inline C so it can calculate about 50M SNP pairs/sec on a typical Opteron server. |
---|
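The random sampling of autosomal markers can be sketched as follows. This is only an illustration under stated assumptions, not the sampling code used by this tool: it assumes chromosome codes are integers with 1 to 22 denoting the autosomes (as in Plink map files for human data), and the function name and seed parameter are hypothetical.

```python
import numpy as np

def sample_autosomal_markers(chroms, n, seed=None):
    """Return sorted indices of up to n randomly chosen autosomal markers.

    chroms: one integer chromosome code per marker (assumed 1-22 = autosomes).
    n: number of markers to sample; 0 means use every autosomal marker.
    seed: optional seed so that a sample can be reproduced.
    """
    chroms = np.asarray(chroms)
    # Keep only markers on chromosomes 1 to 22; sex-linked markers are dropped.
    autosomal = np.flatnonzero(np.isin(chroms, np.arange(1, 23)))
    if n == 0 or n >= autosomal.size:
        return autosomal
    rng = np.random.default_rng(seed)
    # Sample without replacement so no marker is counted twice.
    return np.sort(rng.choice(autosomal, size=n, replace=False))

if __name__ == "__main__":
    codes = [1, 1, 2, 23, 22, 24, 5]
    print(sample_autosomal_markers(codes, 0))
    print(sample_autosomal_markers(codes, 3, seed=1))
```

Fixing the seed makes a run repeatable, which is convenient when comparing outputs between runs.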
| 77 | |
---|
| 78 | Setting N to a fraction of the available markers will speed up calculation - the difference is most painful for |
---|
| 79 | large numbers of subjects. The real cost is that every subject must be compared to every other one over all genotypes - |
---|
| 80 | n subjects give n(n-1)/2 pairs, so the work grows quadratically with the number of subjects. |
---|
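The cost of comparing every subject with every other one can be estimated with a small sketch. The function names are hypothetical, and the default rate is simply the figure of about 50M SNP pairs/sec quoted above, so actual timings on any given server will differ.

```python
def pair_count(n_subjects):
    """Number of distinct unordered subject pairs: n(n-1)/2."""
    return n_subjects * (n_subjects - 1) // 2

def estimated_seconds(n_subjects, n_snps, snp_pairs_per_sec=50e6):
    """Rough run time: one comparison per pair per SNP at the assumed rate."""
    return pair_count(n_subjects) * n_snps / snp_pairs_per_sec

if __name__ == "__main__":
    # Show how pairs and estimated time grow as the number of subjects grows.
    for n in (100, 1000, 10000):
        print(n, pair_count(n), estimated_seconds(n, 5000))
```

Doubling the number of subjects roughly quadruples the number of pairs, while halving N only halves the work, which is why large cohorts benefit most from a smaller marker sample.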
| 81 | |
---|
| 82 | If you don't see the genotype data set you want here, it can be imported using one of the methods available from |
---|
| 83 | the Rgenetics Get Data tool. |
---|
| 84 | |
---|
| 85 | ----- |
---|
| 86 | |
---|
| 87 | **Attribution** |
---|
| 88 | |
---|
| 89 | Based on an idea from G. Abecasis implemented as GRR (windows only) at http://www.sph.umich.edu/csg/abecasis/GRR/ |
---|
| 90 | |
---|
| 91 | Ross Lazarus wrote the original pdf writer Galaxy tool version. |
---|
| 92 | John Ziniti added the C and created the slick svg representation. |
---|
| 93 | Copyright Ross Lazarus 2007 |
---|
| 94 | Licensed under the terms of the LGPL as documented at http://www.gnu.org/licenses/lgpl.html |
---|
| 95 | </help> |
---|
| 96 | </tool> |
---|