| 1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" |
---|
| 2 | "http://www.w3.org/TR/html4/loose.dtd"> |
---|
| 3 | <html> |
---|
| 4 | <head> |
---|
| 5 | <title>Galaxy Data Formats</title> |
---|
| 6 | <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> |
---|
| 7 | <meta http-equiv="Content-Style-Type" content="text/css"> |
---|
| 8 | <style type="text/css"> |
---|
| 9 | hr { margin-top: 3ex; margin-bottom: 1ex; border: 1px inset } |
---|
| 10 | </style> |
---|
| 11 | </head> |
---|
| 12 | <body> |
---|
| 13 | <h2>Galaxy Data Formats</h2> |
---|
| 14 | <p> |
---|
| 15 | <br> |
---|
| 16 | |
---|
| 17 | <h3>Dataset missing?</h3> |
---|
| 18 | <p> |
---|
| 19 | If you have a dataset in your history that is not appearing in the |
---|
| 20 | drop-down selector for a tool, the most common reason is that it has |
---|
| 21 | the wrong format. Each Galaxy dataset has an associated file format |
---|
| 22 | recorded in its metadata, and tools will only list datasets from your |
---|
| 23 | history that have a format compatible with that particular tool. Of |
---|
| 24 | course some of these datasets might not actually contain relevant |
---|
| 25 | data, or even the correct columns needed by the tool, but filtering |
---|
| 26 | by format at least makes the list to select from a bit shorter. |
---|
| 27 | <p> |
---|
| 28 | Some of the formats are defined hierarchically, going from very |
---|
| 29 | general ones like <a href="#tab">Tabular</a> (which includes any text |
---|
| 30 | file with tab-separated columns), to more restrictive sub-formats |
---|
| 31 | like <a href="#interval">Interval</a> (where three of the columns |
---|
| 32 | must be the chromosome, start position, and end position), and on |
---|
| 33 | to even more specific ones such as <a href="#bed">BED</a> that have |
---|
| 34 | additional requirements. For example, if a tool's required input |
---|
| 35 | format is Tabular, then all of your history items whose format is |
---|
| 36 | recorded as Tabular will be listed, along with those in all |
---|
| 37 | sub-formats that also qualify as Tabular (Interval, BED, GFF, etc.). |
---|
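<p>
The filtering rule described above can be sketched in Python as a walk up a chain of parent formats. This is only an illustration under stated assumptions: the <code>PARENT</code> table is built from the "also qualifies as" notes on this page, and the names <code>qualifies_as</code> and <code>selectable</code> are hypothetical, not part of Galaxy.

```python
# Hypothetical parent links, taken from the "also qualifies as" notes on this
# page. This is an illustration, not Galaxy's actual datatype registry.
PARENT = {
    "interval": "tabular",
    "bed": "interval",
    "bedgraph": "bed",
    "gff": "tabular",
    "gff3": "tabular",
    "gtf": "tabular",
}


def qualifies_as(dataset_format, required_format):
    """Return True if dataset_format is required_format or one of its sub-formats."""
    current = dataset_format
    while current is not None:
        if current == required_format:
            return True
        current = PARENT.get(current)  # None once a root format is reached
    return False


def selectable(history, required_format):
    """Keep the names of (name, format) pairs that a tool could list."""
    return [name for name, fmt in history if qualifies_as(fmt, required_format)]
```

With this table, a tool that requires Tabular would list BED and GFF datasets but not FASTA ones, while a tool that requires Interval would list BED but not GFF.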
| 38 | <p> |
---|
| 39 | There are two usual methods for changing a dataset's format in |
---|
| 40 | Galaxy: if the file contents are already in the required format but |
---|
| 41 | the metadata is wrong (perhaps because the Auto-detect feature of the |
---|
| 42 | Upload File tool guessed it incorrectly), you can fix the metadata |
---|
| 43 | manually by clicking on the pencil icon beside that dataset in your |
---|
| 44 | history. Or, if the file contents really are in a different format, |
---|
| 45 | Galaxy provides a number of format conversion tools (e.g. in the |
---|
| 46 | Text Manipulation and Convert Formats categories). For instance, |
---|
| 47 | if the tool you want to run requires Tabular but your columns are |
---|
| 48 | delimited by spaces or commas, you can use the "Convert delimiters |
---|
| 49 | to TAB" tool under Text Manipulation to reformat your data. However, |
---|
| 50 | if your files are in a completely unsupported format, then you need |
---|
| 51 | to convert them yourself before uploading. |
---|
| 52 | <p> |
---|
| 53 | <hr> |
---|
| 54 | |
---|
| 55 | <h3>Format Descriptions</h3> |
---|
| 56 | <ul> |
---|
| 57 | <li><a href="#ab1">AB1</a> |
---|
| 58 | <li><a href="#axt">AXT</a> |
---|
| 59 | <li><a href="#bam">BAM</a> |
---|
| 60 | <li><a href="#bed">BED</a> |
---|
| 61 | <li><a href="#bedgraph">BedGraph</a> |
---|
| 62 | <li><a href="#binseq">Binseq.zip</a> |
---|
| 63 | <li><a href="#fasta">FASTA</a> |
---|
| 64 | <li><a href="#fastqsolexa">FastqSolexa</a> |
---|
| 65 | <li><a href="#fped">FPED</a> |
---|
| 66 | <li><a href="#gff">GFF</a> |
---|
| 67 | <li><a href="#gff3">GFF3</a> |
---|
| 68 | <li><a href="#gtf">GTF</a> |
---|
| 69 | <li><a href="#html">HTML</a> |
---|
| 70 | <li><a href="#interval">Interval</a> |
---|
| 71 | <li><a href="#lav">LAV</a> |
---|
| 72 | <li><a href="#lped">LPED</a> |
---|
| 73 | <li><a href="#maf">MAF</a> |
---|
| 74 | <li><a href="#pbed">PBED</a> |
---|
| 75 | <li><a href="#psl">PSL</a> |
---|
| 76 | <li><a href="#scf">SCF</a> |
---|
| 77 | <li><a href="#sff">SFF</a> |
---|
| 78 | <li><a href="#table">Table</a> |
---|
| 79 | <li><a href="#tab">Tabular</a> |
---|
| 80 | <li><a href="#txtseqzip">Txtseq.zip</a> |
---|
| 81 | <li><a href="#wig">Wiggle custom track</a> |
---|
| 82 | <li><a href="#text">Other text type</a> |
---|
| 83 | </ul> |
---|
| 84 | <p> |
---|
| 85 | |
---|
| 86 | <div><a name="ab1"></a></div> |
---|
| 87 | <hr> |
---|
| 88 | <strong>AB1</strong> |
---|
| 89 | <p> |
---|
| 90 | This is one of the ABIF family of binary sequence formats from |
---|
| 91 | Applied Biosystems Inc. |
---|
| 92 | <!-- Their PDF |
---|
| 93 | <a href="http://www.appliedbiosystems.com/support/software_community/ABIF_File_Format.pdf" |
---|
| 94 | >format specification</a> is unfortunately password-protected. --> |
---|
| 95 | Files should have a '<code>.ab1</code>' file extension. You must |
---|
| 96 | manually select this file format when uploading the file. |
---|
| 97 | <p> |
---|
| 98 | |
---|
| 99 | <div><a name="axt"></a></div> |
---|
| 100 | <hr> |
---|
| 101 | <strong>AXT</strong> |
---|
| 102 | <p> |
---|
| 103 | Used for pairwise alignment output from BLASTZ, after post-processing. |
---|
| 104 | Each alignment block contains three lines: a summary line and two |
---|
| 105 | sequence lines. Blocks are separated from one another by blank lines. |
---|
| 106 | The summary line contains chromosomal position and size information |
---|
| 107 | about the alignment, and consists of nine required fields. |
---|
| 108 | <a href="http://main.genome-browser.bx.psu.edu/goldenPath/help/axt.html" |
---|
| 109 | >More information</a> |
---|
| 110 | <!-- (not available on Main) |
---|
| 111 | <dl><dt>Can be converted to: |
---|
| 112 | <dd><ul> |
---|
| 113 | <li>FASTA<br> |
---|
| 114 | Convert Formats → AXT to FASTA |
---|
| 115 | <li>LAV<br> |
---|
| 116 | Convert Formats → AXT to LAV |
---|
| 117 | </ul></dl> |
---|
| 118 | --> |
---|
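<p>
As a rough sketch, the nine-field summary line could be split in Python as shown below. The field names are assumptions based on the UCSC AXT description linked above (alignment number, primary chromosome/start/end, aligning chromosome/start/end, strand, score); <code>AxtSummary</code> and <code>parse_axt_summary</code> are hypothetical names, and the sample line in the comment is illustrative only.

```python
from collections import namedtuple

# Field names are assumptions based on the UCSC AXT description.
AxtSummary = namedtuple(
    "AxtSummary",
    "number t_chrom t_start t_end q_chrom q_start q_end strand score",
)


def parse_axt_summary(line):
    """Split one AXT summary line into its nine whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError("an AXT summary line needs exactly nine fields")
    number, t_chrom, t_start, t_end, q_chrom, q_start, q_end, strand, score = fields
    return AxtSummary(
        int(number), t_chrom, int(t_start), int(t_end),
        q_chrom, int(q_start), int(q_end), strand, int(score),
    )


# Illustrative input: "0 chr19 3001012 3001075 chr11 70568380 70568443 - 3500"
```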
| 119 | <p> |
---|
| 120 | |
---|
| 121 | <div><a name="bam"></a></div> |
---|
| 122 | <hr> |
---|
| 123 | <strong>BAM</strong> |
---|
| 124 | <p> |
---|
| 125 | A binary alignment file compressed in the BGZF format with a |
---|
| 126 | '<code>.bam</code>' file extension. |
---|
| 127 | <!-- You must manually select this file format when uploading the file. --> |
---|
| 128 | <a href="http://samtools.sourceforge.net/SAM1.pdf">SAM</a> |
---|
| 129 | is the human-readable text version of this format. |
---|
| 130 | <dl><dt>Can be converted to: |
---|
| 131 | <dd><ul> |
---|
| 132 | <li>SAM<br> |
---|
| 133 | NGS: SAM Tools → BAM-to-SAM |
---|
| 134 | <li>Pileup<br> |
---|
| 135 | NGS: SAM Tools → Generate pileup |
---|
| 136 | <li>Interval<br> |
---|
| 137 | First convert to Pileup as above, then use |
---|
| 138 | NGS: SAM Tools → Pileup-to-Interval |
---|
| 139 | </ul></dl> |
---|
| 140 | <p> |
---|
| 141 | |
---|
| 142 | <div><a name="bed"></a></div> |
---|
| 143 | <hr> |
---|
| 144 | <strong>BED</strong> |
---|
| 145 | <p> |
---|
| 146 | <ul> |
---|
| 147 | <li> also qualifies as Tabular |
---|
| 148 | <li> also qualifies as Interval |
---|
| 149 | </ul> |
---|
| 150 | This tab-separated format describes a genomic interval, but has |
---|
| 151 | strict field specifications for use in genome browsers. BED files |
---|
| 152 | can have from 3 to 12 columns, but the order of the columns matters, |
---|
| 153 | and only trailing columns can be omitted. Some groups of columns must |
---|
| 154 | be all present or all absent. As in Interval format (but unlike |
---|
| 155 | GFF and its relatives), the interval endpoints use a 0-based, |
---|
| 156 | half-open numbering system. |
---|
| 157 | <a href="http://main.genome-browser.bx.psu.edu/goldenPath/help/hgTracksHelp.html#BED" |
---|
| 158 | >Field specifications</a> |
---|
| 159 | <p> |
---|
| 160 | Example: |
---|
| 161 | <pre> |
---|
| 162 | chr22 1000 5000 cloneA 960 + 1000 5000 0 2 567,488, 0,3512 |
---|
| 163 | chr22 2000 6000 cloneB 900 - 2000 6000 0 2 433,399, 0,3601 |
---|
| 164 | </pre> |
---|
| 165 | <dl><dt>Can be converted to: |
---|
| 166 | <dd><ul> |
---|
| 167 | <li>GFF<br> |
---|
| 168 | Convert Formats → BED-to-GFF |
---|
| 169 | </ul></dl> |
---|
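<p>
The last three columns of the 12-column example above (block count, block sizes, block starts) are relative to the start of the interval. A minimal Python sketch of turning them into absolute coordinates follows; <code>bed_blocks</code> is a hypothetical helper, and it assumes a tab-separated line with all 12 columns present.

```python
def bed_blocks(line):
    """Return absolute (start, end) pairs for the blocks of a 12-column BED line.

    Assumes tab-separated fields; block starts are offsets from the interval
    start, and all coordinates are 0-based, half-open.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        raise ValueError("expected a 12-column BED line")
    chrom_start = int(fields[1])
    block_count = int(fields[9])
    # The size and start lists may end with a trailing comma, so drop empties.
    sizes = [int(s) for s in fields[10].split(",") if s]
    starts = [int(s) for s in fields[11].split(",") if s]
    if len(sizes) != block_count or len(starts) != block_count:
        raise ValueError("block count does not match the size/start lists")
    return [
        (chrom_start + offset, chrom_start + offset + size)
        for offset, size in zip(starts, sizes)
    ]
```

For the cloneA line above, the two blocks come out as 1000-1567 and 4512-5000, so the last block ends exactly at the interval end.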
| 170 | <p> |
---|
| 171 | |
---|
| 172 | <div><a name="bedgraph"></a></div> |
---|
| 173 | <hr> |
---|
| 174 | <strong>BedGraph</strong> |
---|
| 175 | <p> |
---|
| 176 | <ul> |
---|
| 177 | <li> also qualifies as Tabular |
---|
| 178 | <li> also qualifies as Interval |
---|
| 179 | <li> also qualifies as BED |
---|
| 180 | </ul> |
---|
| 181 | <a href="http://main.genome-browser.bx.psu.edu/goldenPath/help/bedgraph.html" |
---|
| 182 | >BedGraph</a> is a BED file whose name (fourth) column holds a floating-point value |
---|
| 183 | that is displayed as a wiggle score in tracks. Unlike in Wiggle |
---|
| 184 | format, the exact value of this score can be retrieved after being |
---|
| 185 | loaded as a track. |
---|
| 186 | <p> |
---|
| 187 | |
---|
| 188 | <div><a name="binseq"></a></div> |
---|
| 189 | <hr> |
---|
| 190 | <strong>Binseq.zip</strong> |
---|
| 191 | <p> |
---|
| 192 | A zipped archive consisting of binary sequence files in either AB1 |
---|
| 193 | or SCF format. All files in this archive must have the same file |
---|
| 194 | extension, either '<code>.ab1</code>' or '<code>.scf</code>'. |
---|
| 195 | You must manually select this file format when uploading the file. |
---|
| 196 | <p> |
---|
| 197 | |
---|
| 198 | <div><a name="fasta"></a></div> |
---|
| 199 | <hr> |
---|
| 200 | <strong>FASTA</strong> |
---|
| 201 | <p> |
---|
| 202 | A sequence in |
---|
| 203 | <a href="http://www.ncbi.nlm.nih.gov/blast/fasta.shtml">FASTA</a> |
---|
| 204 | format consists of a single-line description, followed by lines of |
---|
| 205 | sequence data. The first character of the description line is a |
---|
| 206 | greater-than ('<code>></code>') symbol. All lines should be |
---|
| 207 | shorter than 80 characters. |
---|
| 208 | <pre> |
---|
| 209 | >sequence1 |
---|
| 210 | atgcgtttgcgtgc |
---|
| 211 | gtcggtttcgttgc |
---|
| 212 | >sequence2 |
---|
| 213 | tttcgtgcgtatag |
---|
| 214 | tggcgcggtga |
---|
| 215 | </pre> |
---|
| 216 | <dl><dt>Can be converted to: |
---|
| 217 | <dd><ul> |
---|
| 218 | <li>Tabular<br> |
---|
| 219 | Convert Formats → FASTA-to-Tabular |
---|
| 220 | </ul></dl> |
---|
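<p>
The layout described above (a description line starting with a greater-than symbol, followed by one or more sequence lines) can be read with a few lines of Python. This is a minimal sketch, not the parser Galaxy uses; <code>read_fasta</code> is a hypothetical name.

```python
def read_fasta(lines):
    """Yield (description, sequence) pairs from an iterable of FASTA lines.

    Sequence lines that appear before the first description line are ignored.
    """
    description, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if description is not None:
                yield description, "".join(chunks)
            description, chunks = line[1:], []
        else:
            chunks.append(line)
    # Emit the final record, if any.
    if description is not None:
        yield description, "".join(chunks)
```

Applied to the example above, this joins the wrapped lines so that each record carries one continuous sequence string.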
| 221 | <p> |
---|
| 222 | |
---|
| 223 | <div><a name="fastqsolexa"></a></div> |
---|
| 224 | <hr> |
---|
| 225 | <strong>FastqSolexa</strong> |
---|
| 226 | <p> |
---|
| 227 | <a href="http://maq.sourceforge.net/fastq.shtml">FastqSolexa</a> |
---|
| 228 | is the Illumina (Solexa) variant of the FASTQ format, which stores |
---|
| 229 | sequences and quality scores in a single file. |
---|
| 230 | <pre> |
---|
| 231 | @seq1 |
---|
| 232 | GACAGCTTGGTTTTTAGTGAGTTGTTCCTTTCTTT |
---|
| 233 | +seq1 |
---|
| 234 | hhhhhhhhhhhhhhhhhhhhhhhhhhPW@hhhhhh |
---|
| 235 | @seq2 |
---|
| 236 | GCAATGACGGCAGCAATAAACTCAACAGGTGCTGG |
---|
| 237 | +seq2 |
---|
| 238 | hhhhhhhhhhhhhhYhhahhhhWhAhFhSIJGChO |
---|
| 239 | </pre> |
---|
| 240 | Or |
---|
| 241 | <pre> |
---|
| 242 | @seq1 |
---|
| 243 | GAATTGATCAGGACATAGGACAACTGTAGGCACCAT |
---|
| 244 | +seq1 |
---|
| 245 | 40 40 40 40 35 40 40 40 25 40 40 26 40 9 33 11 40 35 17 40 40 33 40 7 9 15 3 22 15 30 11 17 9 4 9 4 |
---|
| 246 | @seq2 |
---|
| 247 | GAGTTCTCGTCGCCTGTAGGCACCATCAATCGTATG |
---|
| 248 | +seq2 |
---|
| 249 | 40 15 40 17 6 36 40 40 40 25 40 9 35 33 40 14 14 18 15 17 19 28 31 4 24 18 27 14 15 18 2 8 12 8 11 9 |
---|
| 250 | </pre> |
---|
| 251 | <dl><dt>Can be converted to: |
---|
| 252 | <dd><ul> |
---|
| 253 | <li>FASTA<br> |
---|
| 254 | NGS: QC and manipulation → Generic FASTQ manipulation → FASTQ to FASTA |
---|
| 255 | <li>Tabular<br> |
---|
| 256 | NGS: QC and manipulation → Generic FASTQ manipulation → FASTQ to Tabular |
---|
| 257 | </ul></dl> |
---|
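<p>
The two examples above show the same kind of record with the quality line written either as ASCII characters or as space-separated integers. The Python sketch below decodes both forms. It assumes the Solexa/Illumina convention of an ASCII offset of 64, which is not stated on this page; the function names are hypothetical.

```python
# Assumption: Solexa-style qualities are stored as chr(score + 64).
SOLEXA_OFFSET = 64


def decode_ascii_quality(quality):
    """Convert an ASCII-encoded quality line into a list of integer scores."""
    return [ord(char) - SOLEXA_OFFSET for char in quality.strip()]


def decode_numeric_quality(quality):
    """Convert a space-separated quality line into a list of integer scores."""
    return [int(token) for token in quality.split()]
```

Under that assumption the character <code>h</code> decodes to 40, which matches the highest scores in the numeric form of the example.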
| 258 | <p> |
---|
| 259 | |
---|
| 260 | <div><a name="fped"></a></div> |
---|
| 261 | <hr> |
---|
| 262 | <strong>FPED</strong> |
---|
| 263 | <p> |
---|
| 264 | Also known as the FBAT format, for use with the |
---|
| 265 | <a href="http://biosun1.harvard.edu/~fbat/fbat.htm">FBAT</a> program. |
---|
| 266 | It consists of a pedigree file and a phenotype file. |
---|
| 267 | <p> |
---|
| 268 | |
---|
| 269 | <div><a name="gff"></a></div> |
---|
| 270 | <hr> |
---|
| 271 | <strong>GFF</strong> |
---|
| 272 | <p> |
---|
| 273 | <ul> |
---|
| 274 | <li> also qualifies as Tabular |
---|
| 275 | </ul> |
---|
| 276 | GFF is a tab-separated format somewhat similar to BED, but it has |
---|
| 277 | different columns and is more flexible. There are |
---|
| 278 | <a href="http://main.genome-browser.bx.psu.edu/FAQ/FAQformat#format3" |
---|
| 279 | >nine required fields</a>. |
---|
| 280 | Note that unlike Interval and BED, GFF and its relatives (GFF3, GTF) |
---|
| 281 | use 1-based inclusive coordinates to specify genomic intervals. |
---|
| 282 | <dl><dt>Can be converted to: |
---|
| 283 | <dd><ul> |
---|
| 284 | <li>BED<br> |
---|
| 285 | Convert Formats → GFF-to-BED |
---|
| 286 | </ul></dl> |
---|
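<p>
Converting between the two numbering systems only changes the start coordinate: the end of a 0-based, half-open interval has the same value as the end of the equivalent 1-based, inclusive interval. A small Python sketch with hypothetical helper names:

```python
def zero_based_to_one_based(start, end):
    """Interval/BED (0-based, half-open) to GFF-style (1-based, inclusive)."""
    return start + 1, end


def one_based_to_zero_based(start, end):
    """GFF-style (1-based, inclusive) to Interval/BED (0-based, half-open)."""
    return start - 1, end


def half_open_length(start, end):
    """Number of bases covered by a 0-based, half-open interval."""
    return end - start


def inclusive_length(start, end):
    """Number of bases covered by a 1-based, inclusive interval."""
    return end - start + 1
```

For instance, the first 100 bases of a chromosome are START=0, END=100 in Interval or BED, and 1 to 100 in GFF; both describe 100 bases.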
| 287 | <p> |
---|
| 288 | |
---|
| 289 | <div><a name="gff3"></a></div> |
---|
| 290 | <hr> |
---|
| 291 | <strong>GFF3</strong> |
---|
| 292 | <p> |
---|
| 293 | <ul> |
---|
| 294 | <li> also qualifies as Tabular |
---|
| 295 | </ul> |
---|
| 296 | The <a href="http://www.sequenceontology.org/gff3.shtml">GFF3</a> |
---|
| 297 | format addresses the most common extensions to GFF, while attempting |
---|
| 298 | to preserve compatibility with previous formats. |
---|
| 299 | Note that unlike Interval and BED, GFF and its relatives (GFF3, GTF) |
---|
| 300 | use 1-based inclusive coordinates to specify genomic intervals. |
---|
| 301 | <p> |
---|
| 302 | |
---|
| 303 | <div><a name="gtf"></a></div> |
---|
| 304 | <hr> |
---|
| 305 | <strong>GTF</strong> |
---|
| 306 | <p> |
---|
| 307 | <ul> |
---|
| 308 | <li> also qualifies as Tabular |
---|
| 309 | </ul> |
---|
| 310 | <a href="http://main.genome-browser.bx.psu.edu/FAQ/FAQformat#format4" |
---|
| 311 | >GTF</a> is a format for describing genes and other features associated |
---|
| 312 | with DNA, RNA, and protein sequences. It is a refinement of GFF that |
---|
| 313 | tightens the specification. |
---|
| 314 | Note that unlike Interval and BED, GFF and its relatives (GFF3, GTF) |
---|
| 315 | use 1-based inclusive coordinates to specify genomic intervals. |
---|
| 316 | <!-- (not available on Main) |
---|
| 317 | <dl><dt>Can be converted to: |
---|
| 318 | <dd><ul> |
---|
| 319 | <li>BedGraph<br> |
---|
| 320 | Convert Formats → GTF-to-BEDGraph |
---|
| 321 | </ul></dl> |
---|
| 322 | --> |
---|
| 323 | <p> |
---|
| 324 | |
---|
| 325 | <div><a name="html"></a></div> |
---|
| 326 | <hr> |
---|
| 327 | <strong>HTML</strong> |
---|
| 328 | <p> |
---|
| 329 | This format is an HTML web page. Click the eye icon next to the |
---|
| 330 | dataset to view it in your browser. |
---|
| 331 | <p> |
---|
| 332 | |
---|
| 333 | <div><a name="interval"></a></div> |
---|
| 334 | <hr> |
---|
| 335 | <strong>Interval</strong> |
---|
| 336 | <p> |
---|
| 337 | <ul> |
---|
| 338 | <li> also qualifies as Tabular |
---|
| 339 | </ul> |
---|
| 340 | This Galaxy format represents genomic intervals. It is tab-separated, |
---|
| 341 | but has the added requirement that three of the columns must be the |
---|
| 342 | chromosome name, start position, and end position, where the positions |
---|
| 343 | use a 0-based, half-open numbering system (see below). An optional |
---|
| 344 | strand column can also be specified, and an initial header row can |
---|
| 345 | be used to label the columns, which do not have to be in any special |
---|
| 346 | order. Arbitrary additional columns can also be present. |
---|
| 347 | <p> |
---|
| 348 | Required fields: |
---|
| 349 | <ul> |
---|
| 350 | <li>CHROM - The name of the chromosome (e.g. chr3, chrY, chr2_random) |
---|
| 351 | or contig (e.g. ctgY1). |
---|
| 352 | <li>START - The starting position of the feature in the chromosome or |
---|
| 353 | contig. The first base in a chromosome is numbered 0. |
---|
| 354 | <li>END - The ending position of the feature in the chromosome or |
---|
| 355 | contig. This base is not included in the feature. For example, |
---|
| 356 | the first 100 bases of a chromosome are described as START=0, |
---|
| 357 | END=100, and span the bases numbered 0-99. |
---|
| 358 | </ul> |
---|
| 359 | Optional: |
---|
| 360 | <ul> |
---|
| 361 | <li>STRAND - Defines the strand, either '<code>+</code>' or |
---|
| 362 | '<code>-</code>'. |
---|
| 363 | <li>Header row |
---|
| 364 | </ul> |
---|
| 365 | Example: |
---|
| 366 | <pre> |
---|
| 367 | #CHROM START END STRAND NAME COMMENT |
---|
| 368 | chr1 10 100 + exon myExon |
---|
| 369 | chrX 1000 10050 - gene myGene |
---|
| 370 | </pre> |
---|
| 371 | <dl><dt>Can be converted to: |
---|
| 372 | <dd><ul> |
---|
| 373 | <li>BED<br> |
---|
| 374 | The exact changes needed and tools to run will vary with what fields |
---|
| 375 | are in the Interval file and what type of BED you are converting to. |
---|
| 376 | In general you will likely use Text Manipulation → Compute, Cut, |
---|
| 377 | or Merge Columns. |
---|
| 378 | </ul></dl> |
---|
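<p>
Because Interval columns may appear in any order, a reader has to locate them before using them. The Python sketch below does this from the header row, which is an assumption: it expects a header that labels the CHROM, START and END columns as in the example above, whereas a real file may have no header at all. <code>read_intervals</code> is a hypothetical name.

```python
def read_intervals(lines):
    """Yield (chrom, start, end, strand, length) from tab-separated Interval lines.

    Assumes the first line is a header row naming the CHROM, START and END
    columns (and optionally STRAND); real files may omit the header.
    """
    rows = iter(lines)
    first = next(rows, None)
    if first is None:
        return  # empty input
    header = first.rstrip("\n").lstrip("#").split("\t")
    column = {name.strip().upper(): index for index, name in enumerate(header)}
    for raw in rows:
        fields = raw.rstrip("\n").split("\t")
        if fields == [""]:
            continue  # skip blank lines
        start = int(fields[column["START"]])
        end = int(fields[column["END"]])
        strand = fields[column["STRAND"]] if "STRAND" in column else None
        # Half-open coordinates: the length is simply end - start.
        yield fields[column["CHROM"]], start, end, strand, end - start
```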
| 379 | <p> |
---|
| 380 | |
---|
| 381 | <div><a name="lav"></a></div> |
---|
| 382 | <hr> |
---|
| 383 | <strong>LAV</strong> |
---|
| 384 | <p> |
---|
| 385 | <a href="http://www.bx.psu.edu/miller_lab/dist/lav_format.html">LAV</a> |
---|
| 386 | is the raw pairwise alignment format that is output by BLASTZ. The |
---|
| 387 | first line begins with <code>#:lav</code>. |
---|
| 388 | <!-- (not available on Main) |
---|
| 389 | <dl><dt>Can be converted to: |
---|
| 390 | <dd><ul> |
---|
| 391 | <li>BED<br> |
---|
| 392 | Convert Formats → LAV to BED |
---|
| 393 | </ul></dl> |
---|
| 394 | --> |
---|
| 395 | <p> |
---|
| 396 | |
---|
| 397 | <div><a name="lped"></a></div> |
---|
| 398 | <hr> |
---|
| 399 | <strong>LPED</strong> |
---|
| 400 | <p> |
---|
| 401 | This is the linkage pedigree format, which consists of separate MAP and PED |
---|
| 402 | files. Together these files describe SNPs; the map file contains the position |
---|
| 403 | and an identifier for the SNP, while the pedigree file has the alleles. To |
---|
| 404 | upload this format into Galaxy, do not use Auto-detect for the file format; |
---|
| 405 | instead select <code>lped</code>. You will then be given two sections for |
---|
| 406 | uploading files, one for the pedigree file and one for the map file. For more |
---|
| 407 | information, see |
---|
| 408 | <a href="http://www.broadinstitute.org/science/programs/medical-and-population-genetics/haploview/input-file-formats-0" |
---|
| 409 | >linkage pedigree</a>, |
---|
| 410 | <a href="http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#map">MAP</a>, |
---|
| 411 | and/or <a href="http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#ped">PED</a>. |
---|
| 412 | <dl><dt>Can be converted to: |
---|
| 413 | <dd><ul> |
---|
| 414 | <li>PBED<br>Automatic |
---|
| 415 | <li>FPED<br>Automatic |
---|
| 416 | </ul></dl> |
---|
| 417 | <p> |
---|
| 418 | |
---|
| 419 | <div><a name="maf"></a></div> |
---|
| 420 | <hr> |
---|
| 421 | <strong>MAF</strong> |
---|
| 422 | <p> |
---|
| 423 | <a href="http://main.genome-browser.bx.psu.edu/FAQ/FAQformat#format5" |
---|
| 424 | >MAF</a> is the multi-sequence alignment format that is output by TBA |
---|
| 425 | and Multiz. The first line begins with '<code>##maf</code>'. This |
---|
| 426 | word is followed by whitespace-separated "variable<code>=</code>value" |
---|
| 427 | pairs. There should be no whitespace surrounding the '<code>=</code>'. |
---|
| 428 | <dl><dt>Can be converted to: |
---|
| 429 | <dd><ul> |
---|
| 430 | <li>BED<br> |
---|
| 431 | Convert Formats → MAF to BED |
---|
| 432 | <li>Interval<br> |
---|
| 433 | Convert Formats → MAF to Interval |
---|
| 434 | <li>FASTA<br> |
---|
| 435 | Convert Formats → MAF to FASTA |
---|
| 436 | </ul></dl> |
---|
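<p>
The header rule described above (the word <code>##maf</code> followed by whitespace-separated variable=value pairs) can be checked with a short Python sketch. <code>parse_maf_header</code> is a hypothetical name, and the sample header in the comment uses illustrative values.

```python
def parse_maf_header(line):
    """Return the variable=value pairs of a MAF header line as a dict."""
    tokens = line.split()
    if not tokens or tokens[0] != "##maf":
        raise ValueError("a MAF file must begin with '##maf'")
    pairs = {}
    for token in tokens[1:]:
        name, separator, value = token.partition("=")
        # Whitespace around '=' would leave a token without a name or a value.
        if not separator or not name or not value:
            raise ValueError("malformed variable=value pair: %r" % token)
        pairs[name] = value
    return pairs


# Illustrative input: "##maf version=1 scoring=tba.v8"
```

A header written as <code>version = 1</code> is rejected, since splitting on whitespace leaves a bare <code>=</code> token.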
| 437 | <p> |
---|
| 438 | |
---|
| 439 | <div><a name="pbed"></a></div> |
---|
| 440 | <hr> |
---|
| 441 | <strong>PBED</strong> |
---|
| 442 | <p> |
---|
| 443 | This is the binary version of the LPED format. |
---|
| 444 | <dl><dt>Can be converted to: |
---|
| 445 | <dd><ul> |
---|
| 446 | <li>LPED<br>Automatic |
---|
| 447 | </ul></dl> |
---|
| 448 | <p> |
---|
| 449 | |
---|
| 450 | <div><a name="psl"></a></div> |
---|
| 451 | <hr> |
---|
| 452 | <strong>PSL</strong> |
---|
| 453 | <p> |
---|
| 454 | <a href="http://main.genome-browser.bx.psu.edu/FAQ/FAQformat#format2">PSL</a> |
---|
| 455 | format is used for alignments returned by |
---|
| 456 | <a href="http://genome.ucsc.edu/cgi-bin/hgBlat?command=start">BLAT</a>. |
---|
| 457 | It does not include any sequence. |
---|
| 458 | <p> |
---|
| 459 | |
---|
| 460 | <div><a name="scf"></a></div> |
---|
| 461 | <hr> |
---|
| 462 | <strong>SCF</strong> |
---|
| 463 | <p> |
---|
| 464 | This is a binary sequence format originally designed for the Staden |
---|
| 465 | sequence handling software package. Files should have a |
---|
| 466 | '<code>.scf</code>' file extension. You must manually select this |
---|
| 467 | file format when uploading the file. |
---|
| 468 | <a href="http://staden.sourceforge.net/manual/formats_unix_2.html" |
---|
| 469 | >More information</a> |
---|
| 470 | <p> |
---|
| 471 | |
---|
| 472 | <div><a name="sff"></a></div> |
---|
| 473 | <hr> |
---|
| 474 | <strong>SFF</strong> |
---|
| 475 | <p> |
---|
| 476 | This is a binary sequence format used by the Roche 454 GS FLX |
---|
| 477 | sequencing machine, and is documented on p. 528 of their |
---|
| 478 | <a href="http://sequence.otago.ac.nz/download/GS_FLX_Software_Manual.pdf" |
---|
| 479 | >software manual</a>. Files should have a '<code>.sff</code>' file |
---|
| 480 | extension. |
---|
| 481 | <!-- You must manually select this file format when uploading the file. --> |
---|
| 482 | <dl><dt>Can be converted to: |
---|
| 483 | <dd><ul> |
---|
| 484 | <li>FASTA<br> |
---|
| 485 | Convert Formats → SFF converter |
---|
| 486 | <li>FASTQ<br> |
---|
| 487 | Convert Formats → SFF converter |
---|
| 488 | </ul></dl> |
---|
| 489 | <p> |
---|
| 490 | |
---|
| 491 | <div><a name="table"></a></div> |
---|
| 492 | <hr> |
---|
| 493 | <strong>Table</strong> |
---|
| 494 | <p> |
---|
| 495 | Text data separated into columns by something other than tabs. |
---|
| 496 | <p> |
---|
| 497 | |
---|
| 498 | <div><a name="tab"></a></div> |
---|
| 499 | <hr> |
---|
| 500 | <strong>Tabular (tab-delimited)</strong> |
---|
| 501 | <p> |
---|
| 502 | One or more columns of text data separated by tabs. |
---|
| 503 | <dl><dt>Can be converted to: |
---|
| 504 | <dd><ul> |
---|
| 505 | <li>FASTA<br> |
---|
| 506 | Convert Formats → Tabular-to-FASTA<br> |
---|
| 507 | The Tabular file must have a title and sequence column. |
---|
| 508 | <li>FASTQ<br> |
---|
| 509 | NGS: QC and manipulation → Generic FASTQ manipulation → Tabular to FASTQ |
---|
| 510 | <li>Interval<br> |
---|
| 511 | If the Tabular file has a chromosome column (or is all on one |
---|
| 512 | chromosome) and has a position column, you can create an Interval |
---|
| 513 | file (e.g. for SNPs). If it is all on one chromosome, use |
---|
| 514 | Text Manipulation → Add column to add a CHROM column. |
---|
| 515 | If the given position is 1-based, use |
---|
| 516 | Text Manipulation → Compute with the position column minus 1 to |
---|
| 517 | get the START, and use the original given column for the END. |
---|
| 518 | If the given position is 0-based, use it as the START, and compute |
---|
| 519 | that plus 1 to get the END. |
---|
| 520 | </ul></dl> |
---|
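<p>
The START and END arithmetic described above for single-position data such as SNPs can be written out in Python as follows. This is a sketch of the calculation only, not of the Galaxy tools themselves; <code>position_to_interval</code> is a hypothetical name.

```python
def position_to_interval(chrom, position, one_based=True):
    """Turn a single position into a (CHROM, START, END) Interval triple.

    The result is 0-based and half-open, so it always covers exactly one base.
    """
    position = int(position)
    if one_based:
        # 1-based input: START is position minus 1, END is the position itself.
        return chrom, position - 1, position
    # 0-based input: START is the position, END is position plus 1.
    return chrom, position, position + 1
```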
| 521 | <p> |
---|
| 522 | |
---|
| 523 | <div><a name="txtseqzip"></a></div> |
---|
| 524 | <hr> |
---|
| 525 | <strong>Txtseq.zip</strong> |
---|
| 526 | <p> |
---|
| 527 | A zipped archive consisting of flat text sequence files. All files |
---|
| 528 | in this archive must have the same file extension of |
---|
| 529 | '<code>.txt</code>'. You must manually select this file format when |
---|
| 530 | uploading the file. |
---|
| 531 | <p> |
---|
| 532 | |
---|
| 533 | <div><a name="wig"></a></div> |
---|
| 534 | <hr> |
---|
| 535 | <strong>Wiggle custom track</strong> |
---|
| 536 | <p> |
---|
| 537 | Wiggle tracks are typically used to display per-nucleotide scores |
---|
| 538 | in a genome browser. The Wiggle format for custom tracks is |
---|
| 539 | line-oriented, and the wiggle data is preceded by a track definition |
---|
| 540 | line that specifies which of three different types is being used. |
---|
| 541 | <a href="http://main.genome-browser.bx.psu.edu/goldenPath/help/wiggle.html" |
---|
| 542 | >More information</a> |
---|
| 543 | <dl><dt>Can be converted to: |
---|
| 544 | <dd><ul> |
---|
| 545 | <li>Interval<br> |
---|
| 546 | Get Genomic Scores → Wiggle-to-Interval |
---|
| 547 | <li>BED<br>As a second step, the Interval file can be converted to 3- or 4-column BED |
---|
| 548 | by removing extra columns using |
---|
| 549 | Text Manipulation → Cut columns from a table. |
---|
| 550 | </ul></dl> |
---|
| 551 | <p> |
---|
| 552 | |
---|
| 553 | <div><a name="text"></a></div> |
---|
| 554 | <hr> |
---|
| 555 | <strong>Other text type</strong> |
---|
| 556 | <p> |
---|
| 557 | Any text file. |
---|
| 558 | <dl><dt>Can be converted to: |
---|
| 559 | <dd><ul> |
---|
| 560 | <li>Tabular<br> |
---|
| 561 | If the text has fields separated by spaces, commas, or some other |
---|
| 562 | delimiter, it can be converted to Tabular by using |
---|
| 563 | Text Manipulation → Convert delimiters to TAB. |
---|
| 564 | </ul></dl> |
---|
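<p>
If you need to convert a file yourself before uploading, replacing delimiters with tabs takes only a few lines of Python. This is a minimal sketch and not the implementation of the Galaxy tool: <code>delimiters_to_tab</code> is a hypothetical name, and it assumes that a run of consecutive delimiters separates just two fields, so empty fields are not preserved.

```python
import re


def delimiters_to_tab(line, delimiters=", "):
    """Replace each run of delimiter characters in a line with a single tab.

    Consecutive delimiters collapse into one tab, so empty fields are lost.
    """
    pattern = "[" + re.escape(delimiters) + "]+"
    return re.sub(pattern, "\t", line.strip())
```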
| 565 | <p> |
---|
| 566 | |
---|
| 567 | <!-- blank lines so internal links will jump farther to end --> |
---|
| 568 | <br><br><br><br><br><br><br><br><br><br><br><br> |
---|
| 569 | <br><br><br><br><br><br><br><br><br><br><br><br> |
---|
| 570 | </body> |
---|
| 571 | </html> |
---|