
cuffcompare_wrapper.xml

Revision 2, 11.6 KB (committed by hatakeyama, 15 years ago)
import galaxy-central

<tool id="cuffcompare" name="Cuffcompare" version="0.9.1">
    <description>compare assembled transcripts to a reference annotation and track Cufflinks transcripts across multiple experiments</description>
    <requirements>
        <requirement type="package">cufflinks</requirement>
    </requirements>
    <command interpreter="python">
        cuffcompare_wrapper.py
        --transcripts-accuracy-output=$transcripts_accuracy
        --input1-tmap-output=$input1_tmap
        --input1-refmap-output=$input1_refmap
        #if $second_gtf.use_second_gtf == "Yes":
            --transcripts-combined-output=$transcripts_combined
            --transcripts-tracking-output=$transcripts_tracking
            --input2-tmap-output=$input2_tmap
            --input2-refmap-output=$input2_refmap
        #end if
        #if $annotation.use_ref_annotation == "Yes":
            -r $annotation.reference_annotation
            #if $annotation.ignore_nonoverlapping_reference:
                -R
            #end if
        #end if
        -1 $input1
        #if $second_gtf.use_second_gtf == "Yes":
            -2 $second_gtf.input2
        #end if
    </command>
    <inputs>
        <param format="gtf" name="input1" type="data" label="GTF file produced by Cufflinks" help=""/>
        <conditional name="second_gtf">
            <param name="use_second_gtf" type="select" label="Use another GTF file produced by Cufflinks?">
                <option value="No">No</option>
                <option value="Yes">Yes</option>
            </param>
            <when value="Yes">
                <param format="gtf" name="input2" type="data" label="GTF file produced by Cufflinks" help=""/>
            </when>
            <when value="No">
            </when>
        </conditional>
        <conditional name="annotation">
            <param name="use_ref_annotation" type="select" label="Use Reference Annotation">
                <option value="No">No</option>
                <option value="Yes">Yes</option>
            </param>
            <when value="Yes">
                <param format="gtf" name="reference_annotation" type="data" label="Reference Annotation" help="Make sure your annotation file is in GTF format and that Galaxy knows that your file is GTF, not GFF."/>
                <param name="ignore_nonoverlapping_reference" type="boolean" label="Ignore reference transcripts that are not overlapped by any transcript in the input files"/>
            </when>
            <when value="No">
            </when>
        </conditional>
    </inputs>

    <outputs>
        <data format="txt" name="transcripts_accuracy" label="${tool.name} on ${on_string}: transcript accuracy"/>
        <data format="tabular" name="input1_tmap" label="${tool.name} on ${on_string}: data ${input1.hid} tmap file"/>
        <data format="tabular" name="input1_refmap" label="${tool.name} on ${on_string}: data ${input1.hid} refmap file"/>
        <data format="tabular" name="input2_tmap" label="${tool.name} on ${on_string}: data ${second_gtf.input2.hid} tmap file">
            <filter>second_gtf['use_second_gtf'] == "Yes"</filter>
        </data>
        <data format="tabular" name="input2_refmap" label="${tool.name} on ${on_string}: data ${second_gtf.input2.hid} refmap file">
            <filter>second_gtf['use_second_gtf'] == "Yes"</filter>
        </data>
        <data format="gtf" name="transcripts_combined" label="${tool.name} on ${on_string}: combined transcripts">
            <filter>second_gtf['use_second_gtf'] == "Yes"</filter>
        </data>
        <data format="tabular" name="transcripts_tracking" label="${tool.name} on ${on_string}: transcript tracking">
            <filter>second_gtf['use_second_gtf'] == "Yes"</filter>
        </data>
    </outputs>

    <tests>
        <!--
            cuffcompare -r cuffcompare_in3.gtf -R cuffcompare_in1.gtf cuffcompare_in2.gtf
        -->
        <test>
            <param name="input1" value="cuffcompare_in1.gtf" ftype="gtf"/>
            <param name="use_second_gtf" value="Yes"/>
            <param name="input2" value="cuffcompare_in2.gtf" ftype="gtf"/>
            <param name="use_ref_annotation" value="Yes"/>
            <param name="reference_annotation" value="cuffcompare_in3.gtf" ftype="gtf"/>
            <param name="ignore_nonoverlapping_reference" value="Yes"/>
            ## HACK: need to specify output name and it needs to work (right now it uses the final output file)
            <output name="XXXX" file="cuffcompare_out6.tracking"/>
            <!--
            ## TODO: transcripts combined file as well.
            <output name="input1_tmap" file="cuffcompare_out1.tmap"/>
            <output name="input1_refmap" file="cuffcompare_out2.refmap"/>
            <output name="input2_tmap" file="cuffcompare_out3.tmap"/>
            <output name="input2_refmap" file="cuffcompare_out4.refmap"/>
            <output name="transcripts_accuracy" file="cuffcompare_out7.txt"/>
            -->
        </test>
    </tests>

    <help>
Cuffcompare Overview

Cuffcompare is part of Cufflinks_. Cuffcompare helps you: (a) compare your assembled transcripts to a reference annotation and (b) track Cufflinks transcripts across multiple experiments (e.g. across a time course). Please cite: Trapnell C, Williams BA, Pertea G, Mortazavi AM, Kwan G, van Baren MJ, Salzberg SL, Wold B, Pachter L. Transcript assembly and abundance estimation from RNA-Seq reveals thousands of new transcripts and switching among isoforms. Nature Biotechnology doi:10.1038/nbt.1621

.. _Cufflinks: http://cufflinks.cbcb.umd.edu/

------

Know what you are doing

.. class:: warningmark

There is no such thing (yet) as an automated gearshift in expression analysis. It is all like stick-shift driving in San Francisco. In other words, running this tool with default parameters will probably not give you meaningful results. A way to deal with this is to understand the parameters by carefully reading the `documentation`__ and experimenting. Fortunately, Galaxy makes experimenting easy.

.. __: http://cufflinks.cbcb.umd.edu/manual.html#cuffcompare

------

Input format

Cuffcompare takes Cufflinks' GTF output as input and can optionally take a "reference" annotation (such as one from Ensembl_).

.. _Ensembl: http://www.ensembl.org

------

Outputs

Cuffcompare produces the following output files:

Transcripts Accuracy File:

Cuffcompare reports various statistics related to the "accuracy" of the transcripts in each sample when compared to the reference annotation data. The typical gene-finding measures of "sensitivity" and "specificity" (as defined in Burset, M., Guigó, R.: Evaluation of gene structure prediction programs (1996) Genomics, 34 (3), pp. 353-367. doi: 10.1006/geno.1996.0298) are calculated at various levels (nucleotide, exon, intron, transcript, gene) for each input file and reported in this file. The Sn and Sp columns show sensitivity and specificity values at each level, while the fSn and fSp columns are "fuzzy" variants of the same accuracy calculations, which allow very small variations in exon boundaries to still count as a "match".
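As a minimal illustration of the two measures (not cuffcompare's own counting code), the sketch below assumes the Burset and Guigó definitions Sn = TP / (TP + FN) and Sp = TP / (TP + FP), where TP, FP and FN are counts of true-positive, false-positive and false-negative features at a single level. The helper name sensitivity_specificity is hypothetical, and deciding which features count as a match at each level is not shown.

```python
def sensitivity_specificity(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (Sn, Sp) as percentages for one level (hypothetical helper).

    Sn = TP / (TP + FN): share of reference features that were recovered.
    Sp = TP / (TP + FP): share of predicted features that are correct.
    An empty denominator yields 0.0 instead of raising ZeroDivisionError.
    """
    sn = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    sp = 100.0 * tp / (tp + fp) if (tp + fp) else 0.0
    return sn, sp


# Example: 80 matching exons, 20 spurious predicted exons, 40 missed reference exons.
sn, sp = sensitivity_specificity(tp=80, fp=20, fn=40)
print(f"Sn={sn:.1f} Sp={sp:.1f}")  # Sn=66.7 Sp=80.0
```

Note that this Sp is the share of predictions that are correct, which is how the gene-finding literature uses "specificity", rather than the TN / (TN + FP) definition used elsewhere in statistics.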

Transcripts Combined File:

Cuffcompare reports a GTF file containing the "union" of all transfrags in each sample. If a transfrag is present in both samples, it is reported only once in the combined GTF.

Transcripts Tracking File:

This file matches transcripts up between samples. Each row contains a transcript structure that is present in one or more input GTF files. Because the transcripts will generally have different IDs (unless you assembled your RNA-Seq reads against a reference transcriptome), cuffcompare examines the structure of each of the transcripts, matching transcripts that agree on the coordinates and order of all of their introns, as well as on strand. Matching transcripts are allowed to differ in the length of the first and last exons, since these lengths naturally vary from sample to sample due to the random nature of sequencing.
If you ran cuffcompare with the -r option, the reference gene id and reference transcript id columns (see the table below) identify the closest matching reference transcript for the transcript described by each row.
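The matching rule above can be sketched as follows. This is a simplified illustration under stated assumptions, not cuffcompare's implementation: exons are hypothetical (start, end) pairs with 1-based inclusive coordinates on the same chromosome, introns are derived from the gaps between consecutive exons, and single-exon transcripts (which have no introns) are simply treated as non-matching here. The names intron_chain and same_structure are hypothetical.

```python
Exon = tuple[int, int]  # (start, end), 1-based inclusive; an assumption of this sketch


def intron_chain(exons: list[Exon]) -> tuple[Exon, ...]:
    """Return the ordered introns implied by a transcript's exons."""
    ordered = sorted(exons)
    return tuple(
        (left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])
    )


def same_structure(strand_a: str, exons_a: list[Exon],
                   strand_b: str, exons_b: list[Exon]) -> bool:
    """True if both transcripts share strand and an identical, non-empty intron chain.

    The outer ends of the first and last exons are ignored because only
    intron coordinates are compared.
    """
    chain_a = intron_chain(exons_a)
    return strand_a == strand_b and bool(chain_a) and chain_a == intron_chain(exons_b)


a = [(100, 200), (300, 400), (500, 600)]
b = [(150, 200), (300, 400), (500, 650)]  # different first/last exon lengths
c = [(100, 200), (310, 400), (500, 600)]  # second exon starts elsewhere
print(same_structure("+", a, "+", b))  # True
print(same_structure("+", a, "+", c))  # False
```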

Here's an example of a line from the tracking file::

    TCONS_00000045 XLOC_000023 Tcea|uc007afj.1 j \
    q1:exp.115|exp.115.0|100|3.061355|0.350242|0.350207 \
    q2:60hr.292|60hr.292.0|100|4.094084|0.000000|0.000000

In this example, a transcript present in the two input files, called exp.115.0 in the first and 60hr.292.0 in the second, doesn't match any reference transcript exactly, but shares exons with uc007afj.1, an isoform of the gene Tcea, as indicated by the class code j. The first five columns are as follows::

    Column  Column name              Example         Description
    -----------------------------------------------------------------------
    1       Cufflinks transfrag id   TCONS_00000045  A unique internal id for the transfrag
    2       Cufflinks locus id       XLOC_000023     A unique internal id for the locus
    3       Reference gene id        Tcea            The gene_name attribute of the reference GTF record for this transcript, or '-' if no reference transcript overlaps this Cufflinks transcript
    4       Reference transcript id  uc007afj.1      The transcript_id attribute of the reference GTF record for this transcript, or '-' if no reference transcript overlaps this Cufflinks transcript
    5       Class code               j               The type of match between the Cufflinks transcripts in the sample columns that follow and the reference transcript (see Class Codes)

Each column after the fifth has the following format:
qJ:gene_id\|transcript_id\|FMI\|FPKM\|conf_lo\|conf_hi

A transcript need not be present in all samples to be reported in the tracking file. A sample not containing a transcript will have a "-" in its entry in the row for that transcript.
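To illustrate the per-sample format above, the sketch below parses a single sample column into its six documented fields and maps a "-" entry to None. It is a hypothetical helper (parse_sample_column and SampleEntry are not part of this wrapper or of Cufflinks), and it assumes exactly the six fields listed here; if a Cufflinks version writes a different number of fields, it raises ValueError rather than guessing.

```python
from typing import NamedTuple, Optional


class SampleEntry(NamedTuple):
    """One qJ:... column of a tracking row (field names follow the format above)."""
    sample: str         # "q1", "q2", ...
    gene_id: str
    transcript_id: str
    fmi: int
    fpkm: float
    conf_lo: float
    conf_hi: float


def parse_sample_column(field: str) -> Optional[SampleEntry]:
    """Parse 'qJ:gene_id|transcript_id|FMI|FPKM|conf_lo|conf_hi'; '-' means absent."""
    field = field.strip()
    if field == "-":
        return None
    sample, sep, payload = field.partition(":")
    parts = payload.split("|")
    if not sep or len(parts) != 6:
        raise ValueError(f"unexpected sample column: {field!r}")
    gene_id, transcript_id, fmi, fpkm, conf_lo, conf_hi = parts
    return SampleEntry(sample, gene_id, transcript_id, int(fmi),
                       float(fpkm), float(conf_lo), float(conf_hi))


entry = parse_sample_column("q1:exp.115|exp.115.0|100|3.061355|0.350242|0.350207")
print(entry.transcript_id, entry.fpkm)  # exp.115.0 3.061355
print(parse_sample_column("-"))         # None
```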

Class Codes

If you ran cuffcompare with the -r option, tracking rows will contain the following values. If you did not use -r, the rows will all contain "-" in their class code column::

    Priority  Code  Description
    ---------------------------------
    1         =     Match
    2         c     Contained
    3         j     New isoform
    4         e     A single exon transcript overlapping a reference exon and at least 10 bp of a reference intron, indicating a possible pre-mRNA fragment.
    5         i     A single exon transcript falling entirely within a reference intron
    6         r     Repeat. Currently determined by looking at the reference sequence and applied to transcripts where at least 50% of the bases are lower case
    7         p     Possible polymerase run-on fragment
    8         u     Unknown, intergenic transcript
    9         o     Unknown, generic overlap with reference
    10        .     (.tracking file only, indicates multiple classifications)
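One way to read the Priority column is as a precedence order. The sketch below assumes (an interpretation of the table, not something stated above) that when several codes could describe the same transcript, the code with the lowest priority number wins. CLASS_CODE_PRIORITY and pick_class_code are hypothetical names, and the multi-classification code "." is deliberately left out.

```python
# Priority values copied from the Class Codes table.
# Assumption: a lower number means higher precedence.
CLASS_CODE_PRIORITY = {
    "=": 1, "c": 2, "j": 3, "e": 4, "i": 5,
    "r": 6, "p": 7, "u": 8, "o": 9,
}


def pick_class_code(candidates: list[str]) -> str:
    """Return the candidate code with the lowest priority number."""
    unknown = [code for code in candidates if code not in CLASS_CODE_PRIORITY]
    if unknown or not candidates:
        raise ValueError(f"unknown or empty class codes: {candidates!r}")
    return min(candidates, key=lambda code: CLASS_CODE_PRIORITY[code])


print(pick_class_code(["o", "j", "c"]))  # c
```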

-------

Settings

All options have a default value, and you can change any of them. Only the Cuffcompare options listed below are exposed by this wrapper.

------

Cuffcompare parameter list

This is a list of implemented Cuffcompare options::

    -r  An optional "reference" annotation GTF. Each sample is matched against this file, and sample isoforms are tagged as overlapping, matching, or novel where appropriate. The matching results for each input are reported in its tmap and refmap output files.
    -R  If -r was specified, this option causes cuffcompare to ignore reference transcripts that are not overlapped by any transcript in one of cuff1.gtf,...,cuffN.gtf. This is useful for ignoring annotated transcripts that are not present in your RNA-Seq samples, and thus for adjusting the "sensitivity" calculation in the accuracy report written to the transcripts_accuracy file.
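How these two settings turn into a cuffcompare call can be sketched as below. The sketch mirrors the argument order in this tool's test comment (cuffcompare -r cuffcompare_in3.gtf -R cuffcompare_in1.gtf cuffcompare_in2.gtf) and the rule in the command template that -R is only passed together with -r. The function name build_cuffcompare_args is hypothetical, and cuffcompare_wrapper.py, which is not shown here, may assemble its command differently.

```python
import shlex
from typing import Optional


def build_cuffcompare_args(input_gtfs: list[str],
                           reference_gtf: Optional[str] = None,
                           ignore_nonoverlapping: bool = False) -> list[str]:
    """Build a cuffcompare argument list (hypothetical helper).

    -R is only meaningful with -r, so it is dropped when no reference is given,
    matching the nested #if in the tool's command template.
    """
    if not input_gtfs:
        raise ValueError("at least one Cufflinks GTF file is required")
    args = ["cuffcompare"]
    if reference_gtf is not None:
        args += ["-r", reference_gtf]
        if ignore_nonoverlapping:
            args.append("-R")
    args.extend(input_gtfs)
    return args


args = build_cuffcompare_args(
    ["cuffcompare_in1.gtf", "cuffcompare_in2.gtf"],
    reference_gtf="cuffcompare_in3.gtf",
    ignore_nonoverlapping=True,
)
print(shlex.join(args))
# cuffcompare -r cuffcompare_in3.gtf -R cuffcompare_in1.gtf cuffcompare_in2.gtf
```

To actually run the command, the list can be passed to subprocess.run(args, check=True) on a machine where cuffcompare is on the PATH; passing a list rather than a shell string avoids quoting problems with unusual file names.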
    </help>
</tool>
