for genotype data

rgGLM.py '$i.extra_files_path/$i.metadata.base_name' '$phef.extra_files_path/$phef.metadata.base_name' "$title1" '$predvar' '$covar' '$out_file1' '$logf' '$i.metadata.base_name' '$inter' '$cond' '$gender' '$mind' '$geno' '$maf' '$logistic' '$gffout'

.. class:: infomark

**Syntax**

Note that this is a two-form tool: you will choose the dependent trait and covariates on the second page, based on the phenotype file you choose on the first page.

- **Genotype file** is the input Plink format compressed genotype (pbed) file
- **Phenotype file** is the input Plink phenotype (pphe) file with FAMID and IID columns followed by phenotypes
- **Dependent variable** is the term on the left of the model and is chosen from the pphe columns on the second page
- **Logistic** should be selected if you are using, for example, disease status (case/control) as the outcome variable - otherwise the model is linear
- **Covariates** are covariate terms on the right of the model, also chosen on the second page
- **Interactions** will add interactions - please be careful how you interpret these - see the Plink documentation
- **Gender** will add gender as a model term - described in the Plink documentation
- **Condition** will condition the model on one or more specific SNP rs ids, given as a whitespace-delimited sequence
- **Format** determines how your data will be returned to your Galaxy workspace

-----

.. class:: infomark

**Summary**

This tool fits a GLM for each SNP, testing the SNP as a predictor of a dependent phenotype variable with adjustment for the specified covariates.

If you don't see the genotype or phenotype data set you want here, it can be imported using one of the methods available from the rg get data tool group.

Output format can be UCSC .bed if you want to see one column of your results as a fully fledged UCSC genome browser track. A map file containing the chromosome and offset for each marker is required for writing this kind of output.
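
To illustrate why the map file is needed for track output, here is a minimal Python sketch of how per-marker results could be joined to chromosome and offset to produce BED lines. It is not the code this tool runs. The function names `read_map` and `to_bed_lines`, the track name, and the assumption that each result has already been scaled to an integer score between 0 and 1000 are all hypothetical; the four-column layout is that of a standard Plink .map file.

```python
# Illustrative sketch only: NOT the tool's own implementation.
# Plink codes the sex and mitochondrial chromosomes numerically; UCSC uses letters.
PLINK_TO_UCSC = {"23": "X", "24": "Y", "25": "X", "26": "M"}


def read_map(lines):
    """Parse Plink .map lines: chromosome, marker id, genetic distance, bp offset."""
    positions = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        chrom, marker, _genetic_distance, bp = fields[:4]
        positions[marker] = (PLINK_TO_UCSC.get(chrom, chrom), int(bp))
    return positions


def to_bed_lines(results, positions, track_name="glm_results"):
    """Yield a track header, then one 5-column BED line per marker found in the map.

    `results` is an iterable of (marker, score) pairs. The score is assumed to be
    pre-scaled by the caller and is clamped to the 0-1000 range BED allows.
    """
    yield 'track name="%s" description="%s"' % (track_name, track_name)
    for marker, score in results:
        if marker not in positions:
            continue  # no chromosome/offset available, so the marker cannot be drawn
        chrom, bp = positions[marker]
        clamped = max(0, min(1000, int(score)))
        # BED starts are 0-based and ends are exclusive; map offsets are taken as 1-based.
        yield "chr%s\t%d\t%d\t%s\t%d" % (chrom, bp - 1, bp, marker, clamped)
```

Markers missing from the map are silently dropped in this sketch, which is one reason a complete map file matters for this output format.
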
Alternatively you can use .gg for the UCSC Genome Graphs tool, which has all of the advantages of the .bed track, plus a neat, visual front end that displays a lot of useful clues. Either of these is a very useful way of quickly getting a look at your data in full genomic context.

Finally, if you can't live without spreadsheet data, choose the .xls tab-delimited format. It's not a stupid binary Excel file. Just a plain old tab-delimited one with a header. Fortunately Excel is dumb enough to open these without much protest.

-----

.. class:: infomark

**Attribution**

This tool allows you to control settings for models using Plink linear models, so we rely on the author (Shaun Purcell) for the documentation you need specific to those settings - they are very nicely documented at http://pngu.mgh.harvard.edu/~purcell/plink/anal.shtml#glm

Tool and Galaxy datatypes originally designed and written for the Rgenetics series of whole genome scale statistical genetics tools by ross lazarus (ross.lazarus@gmail.com) supported by NIH grant

Shaun Purcell created and maintains Plink

Please acknowledge your use of this tool, Galaxy and Plink in your publications and let us know so we can keep track. These tools all rely on highly competitive grant funding, so your letting us know about publications is important to our ongoing support.