<tool id="Annotation_Profiler_0" name="Profile Annotations" version="1.0.0">
  <description>for a set of genomic intervals</description>
  <command interpreter="python">annotation_profiler_for_interval.py -i $input1 -c ${input1.metadata.chromCol} -s ${input1.metadata.startCol} -e ${input1.metadata.endCol} -o $out_file1 $keep_empty -p ${GALAXY_DATA_INDEX_DIR}/annotation_profiler/$dbkey $summary -b 3 -t $table_names</command>
  <inputs>
    <param format="interval" name="input1" type="data" label="Choose Intervals">
      <validator type="dataset_metadata_in_file" filename="annotation_profiler_valid_builds.txt" metadata_name="dbkey" metadata_column="0" message="Profiling is not currently available for this species."/>
    </param>
    <param name="keep_empty" type="select" label="Keep Region/Table Pairs with 0 Coverage">
      <option value="-k">Keep</option>
      <option value="" selected="true">Discard</option>
    </param>
    <param name="summary" type="select" label="Output per Region/Summary">
      <option value="-S">Summary</option>
      <option value="" selected="true">Per Region</option>
    </param>
    <param name="table_names" type="drill_down" display="checkbox" hierarchy="recurse" multiple="true" label="Choose Tables to Use" help="Selecting no tables will result in using all tables." from_file="annotation_profiler_options.xml"/>
  </inputs>
  <outputs>
    <data format="input" name="out_file1">
      <change_format>
        <when input="summary" value="-S" format="tabular" />
      </change_format>
    </data>
  </outputs>
  <tests>
    <test>
      <param name="input1" value="4.bed" dbkey="hg18"/>
      <param name="keep_empty" value=""/>
      <param name="summary" value=""/>
      <param name="table_names" value="acembly,affyGnf1h,knownAlt,knownGene,mrna,multiz17way,multiz28way,refGene,snp126"/>
      <output name="out_file1" file="annotation_profiler_1.out" />
    </test>
    <test>
      <param name="input1" value="3.bed" dbkey="hg18"/>
      <param name="keep_empty" value=""/>
      <param name="summary" value="Summary"/>
      <param name="table_names" value="acembly,affyGnf1h,knownAlt,knownGene,mrna,multiz17way,multiz28way,refGene,snp126"/>
      <output name="out_file1" file="annotation_profiler_2.out" />
    </test>
  </tests>
  <help>
**What it does**

This tool takes a set of input intervals and, for each interval, determines the base coverage of that interval by a set of features (tables) available from UCSC. Genomic regions from the input feature data have been merged by overlap or direct adjacency (e.g. a table having ranges of 1-10, 6-12, 12-20 and 25-28 results in two merged ranges: 1-20 and 25-28).

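The merging rule described above can be illustrated with a minimal Python sketch. It assumes (start, end) pairs in which ranges that share an endpoint count as directly adjacent; ``merge_regions`` is a hypothetical helper written for this illustration and is not taken from the tool's own script.

```python
def merge_regions(regions):
    """Merge (start, end) regions that overlap or touch each other."""
    merged = []
    for start, end in sorted(regions):
        # Extend the previous merged region when this one overlaps or abuts it.
        if merged and merged[-1][1] >= start:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(region) for region in merged]


# The ranges from the example in the paragraph above.
print(merge_regions([(1, 10), (6, 12), (12, 20), (25, 28)]))
# [(1, 20), (25, 28)]
```

Sorting first means each region only has to be compared with the most recently merged one.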

By default, this tool will check the coverage of your intervals against all available features; you may, however, choose to select only those tables that you want to include. Selecting a section heading will effectively cause all of its children to be selected.

You may alternatively choose to receive a summary across all of the intervals that you provide.

-----


**Example**

Using the interval below and selecting several tables::

  chr1 4558 14764 uc001aab.1 0 -

results in::

  chr1 4558 14764 uc001aab.1 0 - snp126Exceptions 151 142
  chr1 4558 14764 uc001aab.1 0 - genomicSuperDups 10206 1
  chr1 4558 14764 uc001aab.1 0 - chainOryLat1 3718 1
  chr1 4558 14764 uc001aab.1 0 - multiz28way 10206 1
  chr1 4558 14764 uc001aab.1 0 - affyHuEx1 3553 32
  chr1 4558 14764 uc001aab.1 0 - netXenTro2 3050 1
  chr1 4558 14764 uc001aab.1 0 - intronEst 10206 1
  chr1 4558 14764 uc001aab.1 0 - xenoMrna 10203 1
  chr1 4558 14764 uc001aab.1 0 - ctgPos 10206 1
  chr1 4558 14764 uc001aab.1 0 - clonePos 10206 1
  chr1 4558 14764 uc001aab.1 0 - chainStrPur2Link 1323 29
  chr1 4558 14764 uc001aab.1 0 - affyTxnPhase3HeLaNuclear 9011 8
  chr1 4558 14764 uc001aab.1 0 - snp126orthoPanTro2RheMac2 61 58
  chr1 4558 14764 uc001aab.1 0 - snp126 205 192
  chr1 4558 14764 uc001aab.1 0 - chainEquCab1 10206 1
  chr1 4558 14764 uc001aab.1 0 - netGalGal3 3686 1
  chr1 4558 14764 uc001aab.1 0 - phastCons28wayPlacMammal 10172 3

Where::

  The first added column is the table name.
  The second added column is the number of bases covered by the table.
  The third added column is the number of regions from the table that are covered by the interval.

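As a rough illustration of how the second and third added columns could be derived for one interval, here is a minimal Python sketch. It assumes half-open (start, end) coordinates and table regions that have already been merged; ``profile_interval`` and the sample regions are hypothetical and may differ from what the tool's script actually does.

```python
def profile_interval(start, end, table_regions):
    """Return (bases covered, regions overlapped) for a single interval.

    table_regions are assumed to be merged, half-open (start, end) pairs.
    """
    covered = 0
    overlapped = 0
    for region_start, region_end in table_regions:
        # Length of the intersection; zero or negative means no overlap.
        overlap = min(end, region_end) - max(start, region_start)
        if overlap > 0:
            covered += overlap
            overlapped += 1
    return covered, overlapped


# Hypothetical table regions, chosen only for illustration.
print(profile_interval(100, 200, [(50, 120), (150, 160), (300, 400)]))
# (30, 2)
```

Here the first region contributes 20 bases, the second 10, and the third lies outside the interval.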

Alternatively, requesting a summary, using the intervals below and selecting several tables::

  chr1 4558 14764 uc001aab.1 0 -
  chr1 4558 19346 uc001aac.1 0 -

results in::

  #tableName tableSize tableRegionCount allIntervalCount allIntervalSize allCoverage allTableRegionsOverlaped allIntervalsOverlapingTable nrIntervalCount nrIntervalSize nrCoverage nrTableRegionsOverlaped nrIntervalsOverlapingTable
  snp126Exceptions 133601 92469 2 24994 388 359 2 1 14788 237 217 1
  genomicSuperDups 12268847 657 2 24994 24994 2 2 1 14788 14788 1 1
  chainOryLat1 70337730 2542 2 24994 7436 2 2 1 14788 3718 1 1
  affyHuEx1 15703901 112274 2 24994 7846 70 2 1 14788 4293 38 1
  netXenTro2 111440392 1877 2 24994 6100 2 2 1 14788 3050 1 1
  snp126orthoPanTro2RheMac2 700436 690674 2 24994 124 118 2 1 14788 63 60 1
  intronEst 135796064 2332 2 24994 24994 2 2 1 14788 14788 1 1
  xenoMrna 129031327 1586 2 24994 20406 2 2 1 14788 10203 1 1
  snp126 956976 838091 2 24994 498 461 2 1 14788 293 269 1
  clonePos 224999719 39 2 24994 24994 2 2 1 14788 14788 1 1
  chainStrPur2Link 7948016 119841 2 24994 2646 58 2 1 14788 1323 29 1
  affyTxnPhase3HeLaNuclear 136797870 140244 2 24994 22601 17 2 1 14788 13590 9 1
  multiz28way 225928588 38 2 24994 24994 2 2 1 14788 14788 1 1
  ctgPos 224999719 39 2 24994 24994 2 2 1 14788 14788 1 1
  chainEquCab1 246306414 141 2 24994 24994 2 2 1 14788 14788 1 1
  netGalGal3 203351973 461 2 24994 7372 2 2 1 14788 3686 1 1
  phastCons28wayPlacMammal 221017670 22803 2 24994 24926 6 2 1 14788 14754 3 1

Where::

  tableName is the name of the table
  tableChromosomeCoverage is the number of positions existing in the table for only the chromosomes that were referenced by the interval file
  tableChromosomeCount is the number of regions existing in the table for only the chromosomes that were referenced by the interval file
  tableRegionCoverage is the number of positions existing in the table between the minimal and maximal bounding regions that were referenced by the interval file
  tableRegionCount is the number of regions existing in the table between the minimal and maximal bounding regions that were referenced by the interval file

  allIntervalCount is the number of provided intervals
  allIntervalSize is the sum of the lengths of the provided intervals
  allCoverage is the sum of the coverage for each provided interval
  allTableRegionsOverlapped is the sum of the number of regions of the table (non-unique) that were overlapped for each interval
  allIntervalsOverlappingTable is the number of provided intervals which overlap the table

  nrIntervalCount is the number of non-redundant intervals
  nrIntervalSize is the sum of the lengths of non-redundant intervals
  nrCoverage is the sum of the coverage of non-redundant intervals
  nrTableRegionsOverlapped is the number of regions of the table (unique) that were overlapped by the non-redundant intervals
  nrIntervalsOverlappingTable is the number of non-redundant intervals which overlap the table


.. class:: infomark

**TIP:** Non-redundant (nr) refers to the set of intervals that remains after the provided intervals have been merged to resolve overlaps.
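
To make the difference between the "all" and "nr" columns concrete, the sketch below recomputes the interval counts and sizes for the two chr1 intervals used in the summary example. It is only an illustration: it assumes the length of an interval is end minus start and that touching intervals are merged along with overlapping ones, and the variable names are hypothetical.

```python
# The two chr1 intervals from the summary example.
intervals = [(4558, 14764), (4558, 19346)]

# "all" statistics count every provided interval separately.
all_count = len(intervals)
all_size = sum(end - start for start, end in intervals)

# "nr" statistics are computed after merging overlapping intervals.
nr_intervals = []
for start, end in sorted(intervals):
    if nr_intervals and nr_intervals[-1][1] >= start:
        nr_intervals[-1][1] = max(nr_intervals[-1][1], end)
    else:
        nr_intervals.append([start, end])
nr_count = len(nr_intervals)
nr_size = sum(end - start for start, end in nr_intervals)

print(all_count, all_size, nr_count, nr_size)
# 2 24994 1 14788
```

These values match the allIntervalCount, allIntervalSize, nrIntervalCount and nrIntervalSize columns in the summary output shown earlier.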

  </help>
</tool>