<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
 "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Galaxy Data Formats</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Style-Type" content="text/css">
<style type="text/css">
hr { margin-top: 3ex; margin-bottom: 1ex; border: 1px inset }
</style>
</head>
<body>
<h2>Galaxy Data Formats</h2>
<p>
<br>

<h3>Dataset missing?</h3>
<p>
If you have a dataset in your history that is not appearing in the
drop-down selector for a tool, the most common reason is that it has
the wrong format. Each Galaxy dataset has an associated file format
recorded in its metadata, and tools will only list datasets from your
history that have a format compatible with that particular tool. Of
course, some of these datasets might not actually contain relevant
data, or even the correct columns needed by the tool, but filtering
by format at least shortens the list to select from.
<p>
Some of the formats are defined hierarchically, going from very
general ones like <a href="#tab">Tabular</a> (which includes any text
file with tab-separated columns), to more restrictive sub-formats
like <a href="#interval">Interval</a> (where three of the columns
must be the chromosome, start position, and end position), and on
to even more specific ones such as <a href="#bed">BED</a> that have
additional requirements. So, for example, if a tool's required input
format is Tabular, then all of your history items whose format is
recorded as Tabular will be listed, along with those in all
sub-formats that also qualify as Tabular (Interval, BED, GFF, etc.).
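<p>
The filtering rule described above can be sketched in Python. The
<code>PARENT</code> table and both function names are illustrative
assumptions, not Galaxy's actual implementation; the table encodes only
the "also qualifies as" relationships stated on this page.

```python
# Hypothetical parent table derived from the "also qualifies as" notes
# on this page; Galaxy's real datatype registry is not shown here.
PARENT = {
    "bedgraph": "bed",
    "bed": "interval",
    "interval": "tabular",
    "gff": "tabular",
    "gff3": "tabular",
    "gtf": "tabular",
}


def qualifies_as(fmt, required):
    """Return True if fmt is the required format or one of its sub-formats."""
    while fmt is not None:
        if fmt == required:
            return True
        fmt = PARENT.get(fmt)  # walk up to the more general format
    return False


def selectable(history, required):
    """Keep the names of (name, format) history items a tool would list."""
    return [name for name, fmt in history if qualifies_as(fmt, required)]
```

With this table, a tool that requires Interval would list a BED dataset
but not a GFF or FASTA dataset, while a tool that requires Tabular would
list BED and GFF alike.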
<p>
There are two usual methods for changing a dataset's format in
Galaxy. If the file contents are already in the required format but
the metadata is wrong (perhaps because the Auto-detect feature of the
Upload File tool guessed it incorrectly), you can fix the metadata
manually by clicking on the pencil icon beside that dataset in your
history. If instead the file contents really are in a different format,
Galaxy provides a number of format conversion tools (e.g. in the
Text Manipulation and Convert Formats categories). For instance,
if the tool you want to run requires Tabular but your columns are
delimited by spaces or commas, you can use the "Convert delimiters
to TAB" tool under Text Manipulation to reformat your data. However,
if your files are in a completely unsupported format, then you need
to convert them yourself before uploading.
<p>
<hr>

<h3>Format Descriptions</h3>
<ul>
<li><a href="#ab1">AB1</a>
<li><a href="#axt">AXT</a>
<li><a href="#bam">BAM</a>
<li><a href="#bed">BED</a>
<li><a href="#bedgraph">BedGraph</a>
<li><a href="#binseq">Binseq.zip</a>
<li><a href="#fasta">FASTA</a>
<li><a href="#fastqsolexa">FastqSolexa</a>
<li><a href="#fped">FPED</a>
<li><a href="#gff">GFF</a>
<li><a href="#gff3">GFF3</a>
<li><a href="#gtf">GTF</a>
<li><a href="#html">HTML</a>
<li><a href="#interval">Interval</a>
<li><a href="#lav">LAV</a>
<li><a href="#lped">LPED</a>
<li><a href="#maf">MAF</a>
<li><a href="#pbed">PBED</a>
<li><a href="#psl">PSL</a>
<li><a href="#scf">SCF</a>
<li><a href="#sff">SFF</a>
<li><a href="#table">Table</a>
<li><a href="#tab">Tabular</a>
<li><a href="#txtseqzip">Txtseq.zip</a>
<li><a href="#wig">Wiggle custom track</a>
<li><a href="#text">Other text type</a>
</ul>
<p>

<div><a name="ab1"></a></div>
<hr>
<strong>AB1</strong>
<p>
This is one of the ABIF family of binary sequence formats from
Applied Biosystems Inc.
<!-- Their PDF
<a href="http://www.appliedbiosystems.com/support/software_community/ABIF_File_Format.pdf"
>format specification</a> is unfortunately password-protected. -->
Files should have a '<code>.ab1</code>' file extension. You must
manually select this file format when uploading the file.
<p>

<div><a name="axt"></a></div>
<hr>
<strong>AXT</strong>
<p>
Used for pairwise alignment output from BLASTZ, after post-processing.
Each alignment block contains three lines: a summary line and two
sequence lines. Blocks are separated from one another by blank lines.
The summary line contains chromosomal position and size information
about the alignment, and consists of nine required fields.
<a href="http://main.genome-browser.bx.psu.edu/goldenPath/help/axt.html"
>More information</a>
<!-- (not available on Main)
<dl><dt>Can be converted to:
<dd><ul>
<li>FASTA<br>
Convert Formats → AXT to FASTA
<li>LAV<br>
Convert Formats → AXT to LAV
</ul></dl>
-->
<p>

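The block layout described above (a nine-field summary line followed by
two sequence lines, with blank lines between blocks) can be checked with
a short Python sketch. The function name is hypothetical, the fields are
returned as unnamed strings because this page does not name them, and
skipping lines that start with '<code>#</code>' is an assumption.

```python
def parse_axt(text):
    """Split AXT text into (summary_fields, sequence1, sequence2) blocks."""
    blocks, group = [], []
    # The extra "" at the end flushes the final block.
    for line in text.splitlines() + [""]:
        line = line.strip()
        if line.startswith("#"):  # assumed comment convention
            continue
        if line:
            group.append(line)
        elif group:
            if len(group) != 3:
                raise ValueError("an AXT block must have exactly three lines")
            fields = group[0].split()
            if len(fields) != 9:
                raise ValueError("an AXT summary line must have nine fields")
            blocks.append((fields, group[1], group[2]))
            group = []
    return blocks
```

Any sample data used to try this out would have to be supplied by you;
none is taken from a real alignment here.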
<div><a name="bam"></a></div>
<hr>
<strong>BAM</strong>
<p>
A binary alignment file compressed in the BGZF format with a
'<code>.bam</code>' file extension.
<!-- You must manually select this file format when uploading the file. -->
<a href="http://samtools.sourceforge.net/SAM1.pdf">SAM</a>
is the human-readable text version of this format.
<dl><dt>Can be converted to:
<dd><ul>
<li>SAM<br>
NGS: SAM Tools → BAM-to-SAM
<li>Pileup<br>
NGS: SAM Tools → Generate pileup
<li>Interval<br>
First convert to Pileup as above, then use
NGS: SAM Tools → Pileup-to-Interval
</ul></dl>
<p>

<div><a name="bed"></a></div>
<hr>
<strong>BED</strong>
<p>
<ul>
<li> also qualifies as Tabular
<li> also qualifies as Interval
</ul>
This tab-separated format describes a genomic interval, but has
strict field specifications for use in genome browsers. BED files
can have from 3 to 12 columns, but the order of the columns matters,
and only the end ones can be omitted. Some groups of columns must
be all present or all absent. As in Interval format (but unlike
GFF and its relatives), the interval endpoints use a 0-based,
half-open numbering system.
<a href="http://main.genome-browser.bx.psu.edu/goldenPath/help/hgTracksHelp.html#BED"
>Field specifications</a>
<p>
Example:
<pre>
chr22 1000 5000 cloneA 960 + 1000 5000 0 2 567,488, 0,3512
chr22 2000 6000 cloneB 900 - 2000 6000 0 2 433,399, 0,3601
</pre>
<dl><dt>Can be converted to:
<dd><ul>
<li>GFF<br>
Convert Formats → BED-to-GFF
</ul></dl>
<p>

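As a rough illustration of the rules above, the following Python sketch
checks only the column count and the first three fields of one BED line.
The function name is hypothetical, and the sketch deliberately ignores
the "all present or all absent" column groups, which are defined in the
linked field specifications rather than on this page.

```python
def check_bed_line(line):
    """Return (chrom, start, end, length) for a tab-separated BED line."""
    fields = line.rstrip("\n").split("\t")
    if not 3 <= len(fields) <= 12:
        raise ValueError("BED lines have from 3 to 12 columns")
    start, end = int(fields[1]), int(fields[2])
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    # With 0-based, half-open coordinates the length is simply end - start.
    return fields[0], start, end, end - start
```

Because the end position is excluded, no "+ 1" correction is needed when
computing the interval length.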
<div><a name="bedgraph"></a></div>
<hr>
<strong>BedGraph</strong>
<p>
<ul>
<li> also qualifies as Tabular
<li> also qualifies as Interval
<li> also qualifies as BED
</ul>
<a href="http://main.genome-browser.bx.psu.edu/goldenPath/help/bedgraph.html"
>BedGraph</a> is a BED file in which the name column holds a
floating-point value that is displayed as a wiggle score in tracks.
Unlike in Wiggle format, the exact value of this score can be
retrieved after the file has been loaded as a track.
<p>

<div><a name="binseq"></a></div>
<hr>
<strong>Binseq.zip</strong>
<p>
A zipped archive consisting of binary sequence files in either AB1
or SCF format. All files in this archive must have the same file
extension, which must be either '<code>.ab1</code>' or
'<code>.scf</code>'.
You must manually select this file format when uploading the file.
<p>

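Before uploading, the extension rule above can be checked locally with
Python's standard <code>zipfile</code> module. This is a minimal sketch:
the function name is hypothetical, and treating extensions
case-insensitively is an assumption rather than documented Galaxy
behavior.

```python
import os
import zipfile


def binseq_extension(zip_source):
    """Return the single extension shared by all files in the archive."""
    with zipfile.ZipFile(zip_source) as archive:
        # Skip directory entries, which end with a slash.
        names = [n for n in archive.namelist() if not n.endswith("/")]
    extensions = {os.path.splitext(n)[1].lower() for n in names}
    if len(extensions) != 1 or not extensions <= {".ab1", ".scf"}:
        raise ValueError("all files must share one extension: .ab1 or .scf")
    return extensions.pop()
```

The argument can be a path or an open binary file object; an empty
archive or a mix of extensions raises <code>ValueError</code>.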
<div><a name="fasta"></a></div>
<hr>
<strong>FASTA</strong>
<p>
A sequence in
<a href="http://www.ncbi.nlm.nih.gov/blast/fasta.shtml">FASTA</a>
format consists of a single-line description, followed by lines of
sequence data. The first character of the description line is a
greater-than ('<code>&gt;</code>') symbol. All lines should be
shorter than 80 characters.
<pre>
&gt;sequence1
atgcgtttgcgtgc
gtcggtttcgttgc
&gt;sequence2
tttcgtgcgtatag
tggcgcggtga
</pre>
<dl><dt>Can be converted to:
<dd><ul>
<li>Tabular<br>
Convert Formats → FASTA-to-Tabular
</ul></dl>
<p>

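The structure described above (a description line followed by one or
more sequence lines) can be read with a few lines of Python. This is an
illustrative sketch with a hypothetical function name; it is not the
code behind the FASTA-to-Tabular tool.

```python
def read_fasta(lines):
    """Return a list of (description, sequence) pairs from FASTA lines."""
    records = []
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records
```

Applied to the example above, this yields two records, each with its
sequence lines joined into a single string.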
<div><a name="fastqsolexa"></a></div>
<hr>
<strong>FastqSolexa</strong>
<p>
<a href="http://maq.sourceforge.net/fastq.shtml">FastqSolexa</a>
is the Illumina (Solexa) variant of the FASTQ format, which stores
sequences and quality scores in a single file.
<pre>
@seq1
GACAGCTTGGTTTTTAGTGAGTTGTTCCTTTCTTT
+seq1
hhhhhhhhhhhhhhhhhhhhhhhhhhPW@hhhhhh
@seq2
GCAATGACGGCAGCAATAAACTCAACAGGTGCTGG
+seq2
hhhhhhhhhhhhhhYhhahhhhWhAhFhSIJGChO
</pre>
Or
<pre>
@seq1
GAATTGATCAGGACATAGGACAACTGTAGGCACCAT
+seq1
40 40 40 40 35 40 40 40 25 40 40 26 40 9 33 11 40 35 17 40 40 33 40 7 9 15 3 22 15 30 11 17 9 4 9 4
@seq2
GAGTTCTCGTCGCCTGTAGGCACCATCAATCGTATG
+seq2
40 15 40 17 6 36 40 40 40 25 40 9 35 33 40 14 14 18 15 17 19 28 31 4 24 18 27 14 15 18 2 8 12 8 11 9
</pre>
<dl><dt>Can be converted to:
<dd><ul>
<li>FASTA<br>
NGS: QC and manipulation → Generic FASTQ manipulation → FASTQ to FASTA
<li>Tabular<br>
NGS: QC and manipulation → Generic FASTQ manipulation → FASTQ to Tabular
</ul></dl>
<p>

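The two quality-line styles shown above can be turned into the same list
of integers. The Python sketch below assumes an ASCII offset of 64 for
the character-encoded style, which is the convention usually associated
with Solexa-era files but is not stated on this page; the function name
is hypothetical.

```python
def decode_qualities(quality_line, offset=64):
    """Return integer quality scores from either quality-line style."""
    parts = quality_line.split()
    if len(parts) > 1:
        # Space-separated integers, as in the second example.
        return [int(p) for p in parts]
    # One character per base, as in the first example (offset assumed).
    return [ord(c) - offset for c in quality_line.strip()]
```

The style is guessed from the presence of whitespace, so a read of
length one with a numeric quality line would be misread as characters;
a real converter would need to be told which style a file uses.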
<div><a name="fped"></a></div>
<hr>
<strong>FPED</strong>
<p>
Also known as the FBAT format, for use with the
<a href="http://biosun1.harvard.edu/~fbat/fbat.htm">FBAT</a> program.
It consists of a pedigree file and a phenotype file.
<p>

<div><a name="gff"></a></div>
<hr>
<strong>GFF</strong>
<p>
<ul>
<li> also qualifies as Tabular
</ul>
GFF is a tab-separated format somewhat similar to BED, but it has
different columns and is more flexible. There are
<a href="http://main.genome-browser.bx.psu.edu/FAQ/FAQformat#format3"
>nine required fields</a>.
Note that unlike Interval and BED, GFF and its relatives (GFF3, GTF)
use 1-based inclusive coordinates to specify genomic intervals.
<dl><dt>Can be converted to:
<dd><ul>
<li>BED<br>
Convert Formats → GFF-to-BED
</ul></dl>
<p>

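The difference between the two coordinate systems comes down to the
start position alone, as this small Python sketch shows. The function
names are hypothetical and cover only the arithmetic, not the other
column changes a real GFF-to-BED conversion involves.

```python
def gff_to_bed(start, end):
    """Convert 1-based inclusive coordinates to 0-based half-open."""
    return start - 1, end


def bed_to_gff(start, end):
    """Convert 0-based half-open coordinates to 1-based inclusive."""
    return start + 1, end
```

For example, the first 100 bases of a chromosome are 1..100 in GFF and
START=0, END=100 in BED or Interval; the end value is the same number in
both systems.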
<div><a name="gff3"></a></div>
<hr>
<strong>GFF3</strong>
<p>
<ul>
<li> also qualifies as Tabular
</ul>
The <a href="http://www.sequenceontology.org/gff3.shtml">GFF3</a>
format addresses the most common extensions to GFF, while attempting
to preserve compatibility with previous formats.
Note that unlike Interval and BED, GFF and its relatives (GFF3, GTF)
use 1-based inclusive coordinates to specify genomic intervals.
<p>

<div><a name="gtf"></a></div>
<hr>
<strong>GTF</strong>
<p>
<ul>
<li> also qualifies as Tabular
</ul>
<a href="http://main.genome-browser.bx.psu.edu/FAQ/FAQformat#format4"
>GTF</a> is a format for describing genes and other features associated
with DNA, RNA, and protein sequences. It is a refinement of GFF that
tightens the specification.
Note that unlike Interval and BED, GFF and its relatives (GFF3, GTF)
use 1-based inclusive coordinates to specify genomic intervals.
<!-- (not available on Main)
<dl><dt>Can be converted to:
<dd><ul>
<li>BedGraph<br>
Convert Formats → GTF-to-BEDGraph
</ul></dl>
-->
<p>

<div><a name="html"></a></div>
<hr>
<strong>HTML</strong>
<p>
This format is an HTML web page. Click the eye icon next to the
dataset to view it in your browser.
<p>

<div><a name="interval"></a></div>
<hr>
<strong>Interval</strong>
<p>
<ul>
<li> also qualifies as Tabular
</ul>
This Galaxy format represents genomic intervals. It is tab-separated,
but has the added requirement that three of the columns must be the
chromosome name, start position, and end position, where the positions
use a 0-based, half-open numbering system (see below). An optional
strand column can also be specified, and an initial header row can
be used to label the columns, which do not have to be in any special
order. Arbitrary additional columns can also be present.
<p>
Required fields:
<ul>
<li>CHROM - The name of the chromosome (e.g. chr3, chrY, chr2_random)
or contig (e.g. ctgY1).
<li>START - The starting position of the feature in the chromosome or
contig. The first base in a chromosome is numbered 0.
<li>END - The ending position of the feature in the chromosome or
contig. This base is not included in the feature. For example,
the first 100 bases of a chromosome are described as START=0,
END=100, and span the bases numbered 0-99.
</ul>
Optional:
<ul>
<li>STRAND - Defines the strand, either '<code>+</code>' or
'<code>-</code>'.
<li>Header row
</ul>
Example:
<pre>
#CHROM START END STRAND NAME COMMENT
chr1 10 100 + exon myExon
chrX 1000 10050 - gene myGene
</pre>
<dl><dt>Can be converted to:
<dd><ul>
<li>BED<br>
The exact changes needed and tools to run will vary with what fields
are in the Interval file and what type of BED you are converting to.
In general you will likely use Text Manipulation → Compute, Cut,
or Merge Columns.
</ul></dl>
<p>

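Because Interval columns can appear in any order, a reader has to locate
them before use. The Python sketch below does so from a header row
labeled as in the example above; relying on the header labels CHROM,
START, END, and STRAND is an assumption made for illustration, and the
function name is hypothetical.

```python
def read_interval(lines):
    """Return (chrom, start, end, strand) tuples from Interval lines."""
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]
    # Assumed: the first line is a header such as "#CHROM<TAB>START<TAB>END".
    header = lines[0].lstrip("#").split("\t")
    col = {name.upper(): i for i, name in enumerate(header)}
    rows = []
    for ln in lines[1:]:
        fields = ln.split("\t")
        start, end = int(fields[col["START"]]), int(fields[col["END"]])
        strand = fields[col["STRAND"]] if "STRAND" in col else None
        rows.append((fields[col["CHROM"]], start, end, strand))
    return rows
```

Extra columns such as NAME and COMMENT are simply ignored, and a file
without a STRAND column yields <code>None</code> for the strand.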
<div><a name="lav"></a></div>
<hr>
<strong>LAV</strong>
<p>
<a href="http://www.bx.psu.edu/miller_lab/dist/lav_format.html">LAV</a>
is the raw pairwise alignment format that is output by BLASTZ. The
first line begins with <code>#:lav</code>.
<!-- (not available on Main)
<dl><dt>Can be converted to:
<dd><ul>
<li>BED<br>
Convert Formats → LAV to BED
</ul></dl>
-->
<p>

<div><a name="lped"></a></div>
<hr>
<strong>LPED</strong>
<p>
This is the linkage pedigree format, which consists of separate MAP and PED
files. Together these files describe SNPs; the map file contains the position
and an identifier for the SNP, while the pedigree file has the alleles. To
upload this format into Galaxy, do not use Auto-detect for the file format;
instead select <code>lped</code>. You will then be given two sections for
uploading files, one for the pedigree file and one for the map file. For more
information, see
<a href="http://www.broadinstitute.org/science/programs/medical-and-population-genetics/haploview/input-file-formats-0"
>linkage pedigree</a>,
<a href="http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#map">MAP</a>,
and/or <a href="http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#ped">PED</a>.
<dl><dt>Can be converted to:
<dd><ul>
<li>PBED<br>Automatic
<li>FPED<br>Automatic
</ul></dl>
<p>

<div><a name="maf"></a></div>
<hr>
<strong>MAF</strong>
<p>
<a href="http://main.genome-browser.bx.psu.edu/FAQ/FAQformat#format5"
>MAF</a> is the multi-sequence alignment format that is output by TBA
and Multiz. The first line begins with '<code>##maf</code>'. This
word is followed by whitespace-separated "variable<code>=</code>value"
pairs. There should be no whitespace surrounding the '<code>=</code>'.
<dl><dt>Can be converted to:
<dd><ul>
<li>BED<br>
Convert Formats → MAF to BED
<li>Interval<br>
Convert Formats → MAF to Interval
<li>FASTA<br>
Convert Formats → MAF to FASTA
</ul></dl>
<p>

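The first-line rule above is easy to check in Python. This sketch uses a
hypothetical function name and returns the pairs as strings; the sample
variable names in the usage note are illustrative, not a list of what a
MAF header must contain.

```python
def parse_maf_header(line):
    """Return the variable=value pairs from a '##maf' header line."""
    words = line.split()
    if not words or words[0] != "##maf":
        raise ValueError("the first line must begin with '##maf'")
    pairs = {}
    for word in words[1:]:
        key, sep, value = word.partition("=")
        # Whitespace around '=' splits a pair into separate words,
        # leaving a word with no '=' or with an empty name.
        if not sep or not key:
            raise ValueError("malformed variable=value pair: " + word)
        pairs[key] = value
    return pairs
```

A header such as <code>##maf version=1</code> parses into a one-entry
dictionary, while <code>##maf version = 1</code> is rejected because of
the whitespace around the '<code>=</code>'.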
<div><a name="pbed"></a></div>
<hr>
<strong>PBED</strong>
<p>
This is the binary version of the LPED format.
<dl><dt>Can be converted to:
<dd><ul>
<li>LPED<br>Automatic
</ul></dl>
<p>

<div><a name="psl"></a></div>
<hr>
<strong>PSL</strong>
<p>
<a href="http://main.genome-browser.bx.psu.edu/FAQ/FAQformat#format2">PSL</a>
format is used for alignments returned by
<a href="http://genome.ucsc.edu/cgi-bin/hgBlat?command=start">BLAT</a>.
It does not include any sequence.
<p>

<div><a name="scf"></a></div>
<hr>
<strong>SCF</strong>
<p>
This is a binary sequence format originally designed for the Staden
sequence handling software package. Files should have a
'<code>.scf</code>' file extension. You must manually select this
file format when uploading the file.
<a href="http://staden.sourceforge.net/manual/formats_unix_2.html"
>More information</a>
<p>

<div><a name="sff"></a></div>
<hr>
<strong>SFF</strong>
<p>
This is a binary sequence format used by the Roche 454 GS FLX
sequencing machine, and is documented on p. 528 of their
<a href="http://sequence.otago.ac.nz/download/GS_FLX_Software_Manual.pdf"
>software manual</a>. Files should have a '<code>.sff</code>' file
extension.
<!-- You must manually select this file format when uploading the file. -->
<dl><dt>Can be converted to:
<dd><ul>
<li>FASTA<br>
Convert Formats → SFF converter
<li>FASTQ<br>
Convert Formats → SFF converter
</ul></dl>
<p>

<div><a name="table"></a></div>
<hr>
<strong>Table</strong>
<p>
Text data separated into columns by something other than tabs.
<p>

<div><a name="tab"></a></div>
<hr>
<strong>Tabular (tab-delimited)</strong>
<p>
One or more columns of text data separated by tabs.
<dl><dt>Can be converted to:
<dd><ul>
<li>FASTA<br>
Convert Formats → Tabular-to-FASTA<br>
The Tabular file must have a title and sequence column.
<li>FASTQ<br>
NGS: QC and manipulation → Generic FASTQ manipulation → Tabular to FASTQ
<li>Interval<br>
If the Tabular file has a chromosome column (or is all on one
chromosome) and has a position column, you can create an Interval
file (e.g. for SNPs). If it is all on one chromosome, use
Text Manipulation → Add column to add a CHROM column.
If the given position is 1-based, use
Text Manipulation → Compute with the position column minus 1 to
get the START, and use the original given column for the END.
If the given position is 0-based, use it as the START, and compute
that plus 1 to get the END.
</ul></dl>
<p>

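The arithmetic in the Interval recipe above can be written out as a
short Python sketch. The function name and the <code>one_based</code>
flag are hypothetical; in Galaxy itself the same result comes from the
Compute tool as described.

```python
def position_to_interval(chrom, position, one_based=True):
    """Turn a single-base position into a 0-based, half-open interval."""
    if one_based:
        # START is the position minus 1; END is the original position.
        return chrom, position - 1, position
    # A 0-based position is already the START; END is one past it.
    return chrom, position, position + 1
```

Either way the resulting interval has length 1, which is what a SNP
position should produce.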
<div><a name="txtseqzip"></a></div>
<hr>
<strong>Txtseq.zip</strong>
<p>
A zipped archive consisting of flat text sequence files. All files
in this archive must have the file extension
'<code>.txt</code>'. You must manually select this file format when
uploading the file.
<p>

<div><a name="wig"></a></div>
<hr>
<strong>Wiggle custom track</strong>
<p>
Wiggle tracks are typically used to display per-nucleotide scores
in a genome browser. The Wiggle format for custom tracks is
line-oriented, and the wiggle data is preceded by a track definition
line that specifies which of three different types is being used.
<a href="http://main.genome-browser.bx.psu.edu/goldenPath/help/wiggle.html"
>More information</a>
<dl><dt>Can be converted to:
<dd><ul>
<li>Interval<br>
Get Genomic Scores → Wiggle-to-Interval
<li>As a second step this could be converted to 3- or 4-column BED,
by removing extra columns using
Text Manipulation → Cut columns from a table.
</ul></dl>
<p>

<div><a name="text"></a></div>
<hr>
<strong>Other text type</strong>
<p>
Any text file.
<dl><dt>Can be converted to:
<dd><ul>
<li>Tabular<br>
If the text has fields separated by spaces, commas, or some other
delimiter, it can be converted to Tabular by using
Text Manipulation → Convert delimiters to TAB.
</ul></dl>
<p>

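If you need to prepare a file outside Galaxy, the same kind of
conversion can be approximated in Python. This is a simplified sketch,
not the code of the Convert delimiters to TAB tool: the function name is
hypothetical, and collapsing runs of delimiters into one tab is a
simplifying assumption that would drop empty fields.

```python
import re


def delimiters_to_tab(line, delimiters=", "):
    """Replace each run of delimiter characters with a single tab."""
    # Build a character class such as "[,\ ]+" from the given delimiters.
    pattern = "[" + re.escape(delimiters) + "]+"
    return re.sub(pattern, "\t", line.strip())
```

Data in which empty fields are meaningful (for example, two adjacent
commas in a CSV file) needs a converter that keeps one column per
delimiter instead.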
<!-- blank lines so internal links will jump farther to end -->
<br><br><br><br><br><br><br><br><br><br><br><br>
<br><br><br><br><br><br><br><br><br><br><br><br>
</body>
</html>