| 1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" |
---|
| 2 | "http://www.w3.org/TR/html4/loose.dtd"> |
---|
| 3 | <html> |
---|
| 4 | <head> |
---|
| 5 | <title>Input Files for Gmaj</title> |
---|
| 6 | <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> |
---|
| 7 | <meta http-equiv="Content-Style-Type" content="text/css"> |
---|
| 8 | <link rel="stylesheet" type="text/css" href="gmaj.css"> |
---|
| 9 | </head> |
---|
| 10 | <body> |
---|
| 11 | <p class=vvlarge> |
---|
| 12 | <h2>Input Files for Gmaj</h2> |
---|
| 13 | <p class=vvlarge> |
---|
| 14 | TABLE OF CONTENTS |
---|
| 15 | <p class=small> |
---|
| 16 | <ul class=notop> |
---|
| 17 | <li><a href="#intro">Introduction</a> |
---|
| 18 | <li><a href="#param">Parameters File</a> |
---|
| 19 | <li><a href="#zip">Compression and Bundling</a> |
---|
| 20 | <li><a href="#coord">Coordinate Systems</a> |
---|
| 21 | <li><a href="#align">Alignments</a> |
---|
| 22 | <li><a href="#exon">Exons</a> |
---|
| 23 | <li><a href="#repeat">Repeats</a> |
---|
| 24 | <li><a href="#link">Linkbars</a> |
---|
| 25 | <li><a href="#under">Underlays</a> |
---|
| 26 | <li><a href="#high">Highlights</a> |
---|
| 27 | <li><a href="#color">Color List</a> |
---|
| 28 | <li><a href="#generic">Generic Annotation Formats</a> |
---|
| 29 | </ul> |
---|
| 30 | <p class=vlarge> |
---|
| 31 | |
---|
| 32 | <p class=hdr> |
---|
| 33 | <h3><a name="intro">Introduction</a></h3> |
---|
| 34 | <p> |
---|
| 35 | This page describes the input files supported by Gmaj, and their |
---|
| 36 | formats. Only the <a href="#align">alignment file</a> is |
---|
| 37 | required; the others are optional. Except where noted, all |
---|
| 38 | information applies to both the stand-alone and applet modes of |
---|
| 39 | Gmaj. |
---|
| 40 | <p> |
---|
| 41 | For annotations, Gmaj supports two broad categories of file |
---|
| 42 | formats. The original set of formats is essentially the same as |
---|
| 43 | those used by <a href="http://pipmaker.bx.psu.edu/pipmaker/" |
---|
| 44 | >PipMaker</a> and <a href="http://globin.bx.psu.edu/dist/laj/" |
---|
| 45 | >Laj</a>, where each destination for the data (exons panel, color |
---|
| 46 | underlays, etc.) has its own file format tailored for the needs of |
---|
| 47 | that display. These files can be cumbersome to prepare manually, |
---|
| 48 | though PipMaker's associated utilities, such as |
---|
| 49 | <a href="http://pipmaker.bx.psu.edu/piphelper/">PipHelper</a> and |
---|
| 50 | the <a href="http://pipmaker.bx.psu.edu/pipmaker/tools.html" |
---|
| 51 | >PipTools</a>, can significantly reduce the burden. |
---|
| 52 | <p> |
---|
| 53 | However, since sequence annotations are increasingly becoming |
---|
| 54 | available in standardized formats from on-line resources such as |
---|
| 55 | the <a href="http://genome.ucsc.edu/cgi-bin/hgTables">UCSC Table |
---|
| 56 | Browser</a>, Gmaj can now accept some of these formats as well. |
---|
| 57 | These are referred to here as "generic" formats because they are |
---|
| 58 | not restricted to a particular biological data type or Gmaj |
---|
| 59 | display panel. |
---|
| 60 | <p> |
---|
| 61 | The PipMaker-style formats are described below in the sections for |
---|
| 62 | each panel, while the generic ones are discussed in a separate |
---|
| 63 | section, <a href="#generic">Generic Annotation Formats</a>. |
---|
| 64 | <p class=large> |
---|
| 65 | <center> |
---|
| 66 | <table width=55%> |
---|
| 67 | <tr> |
---|
| 68 | <td valign=top align=right><img class=lower src="hand14.gif"> |
---|
| 69 | <!-- Pointing hand icon is from Clip Art Warehouse, |
---|
| 70 | at http://www.clipart.co.uk/ --> |
---|
| 71 | </td> |
---|
| 72 | <td valign=top> |
---|
| 73 | <ul class="notop nobottom lessindent"> |
---|
| 74 | <li> <b>All files must consist solely of plain text ASCII |
---|
| 75 | characters.</b> (For example, no Word documents.) |
---|
| 76 | <p class=small> |
---|
| 77 | <li> <b>All <a href="#coord">coordinates</a> for PipMaker-style |
---|
| 78 | annotations are 1-based, closed interval.</b> Those |
---|
| 79 | for generic annotations may be either 1-based or 0-based |
---|
| 80 | and closed or half-open, depending on the format. |
---|
| 81 | </ul> |
---|
| 82 | </td> |
---|
| 83 | </tr> |
---|
| 84 | </table> |
---|
| 85 | </center> |
---|
| 86 | <p> |
---|
| 87 | |
---|
| 88 | <p class=hdr> |
---|
| 89 | <h3><a name="param">Parameters File</a></h3> |
---|
| 90 | <p> |
---|
| 91 | The annotation files are optional. However, because in some |
---|
| 92 | alignments any of the sequences can be viewed as the reference |
---|
| 93 | sequence, there may be a large number of annotation files to |
---|
| 94 | provide: too many to type on the command line or paste into a |
---|
| 95 | dialog box every time you want to view the data. |
---|
| 96 | For this reason, Gmaj uses a meta-level <b>parameters file</b> |
---|
| 97 | that lists the names of all the data files, plus a few other |
---|
| 98 | data-related options. Then when running Gmaj, you only have to |
---|
| 99 | specify that one file name. However, if you don't want to use |
---|
| 100 | any of these annotations or options, you can specify a single |
---|
| 101 | <a href="#align">alignment file</a> directly in place of a |
---|
| 102 | parameters file. |
---|
| 103 | <p> |
---|
| 104 | A sample parameters file that you can use as a template is |
---|
| 105 | provided at <code><a href="sample.gmaj">sample.gmaj</a></code>. |
---|
| 106 | It contains detailed comments at the bottom explaining the syntax |
---|
| 107 | and meaning of the parameters. |
---|
| 108 | <p> |
---|
| 109 | |
---|
| 110 | <p class=hdr> |
---|
| 111 | <h3><a name="zip">Compression and Bundling</a></h3> |
---|
| 112 | <p> |
---|
| 113 | Gmaj supports a "bundle" option, which allows you to collect and |
---|
| 114 | compress some or all of the data files into a single file in |
---|
| 115 | <code>.zip</code> or <code>.jar</code> format (not |
---|
| 116 | <code>.tar</code>, sorry). This is especially useful for |
---|
| 117 | streamlining the applet's data download, but is also supported in |
---|
| 118 | stand-alone mode. A few tips: |
---|
| 119 | <ul> |
---|
| 120 | <li> If the <a href="#param">parameters file</a> is included in |
---|
| 121 | the bundle it must be the first file in it, since Gmaj reads |
---|
| 122 | the bundle sequentially and needs the parameters file to |
---|
| 123 | process the others. In this case, there is no need to |
---|
| 124 | mention the parameters file on the command line or in the |
---|
| 125 | applet tags; just specify the bundle. But if the parameters |
---|
| 126 | file is not in the bundle, specify both. |
---|
| 127 | <p class=small> |
---|
| 128 | <li> Data files in the bundle should be referred to within the |
---|
| 129 | parameters file using their plain filenames, without paths, |
---|
| 130 | and these must be unique. Any data files outside the bundle |
---|
| 131 | should be referred to normally, using the rules described in |
---|
| 132 | <code><a href="sample.gmaj">sample.gmaj</a></code>. |
---|
| 133 | <p class=small> |
---|
| 134 | <li> Do not use filenames containing <code>/</code>, |
---|
| 135 | <code>\</code>, or <code>:</code> in the bundle. Gmaj |
---|
| 136 | needs to remove the path that may have been added to each |
---|
| 137 | name by the zip or jar program, and since it doesn't know |
---|
| 138 | what platform that program was run on, it treats all of |
---|
| 139 | these characters as path separators. |
---|
| 140 | <p class=small> |
---|
| 141 | <li> If you are not using a parameters file (i.e., you want to |
---|
| 142 | specify the <a href="#align">alignment file</a> directly, |
---|
| 143 | without any annotations or other data-related options), |
---|
| 144 | then the alignment file must be listed in place of the |
---|
| 145 | parameters file, not as a bundle (there's nothing else |
---|
| 146 | to bundle with it anyway). |
---|
| 147 | </ul> |
---|
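As a rough illustration of the filename rule in the tips above, the following Python sketch strips any leading path from a bundle entry name by treating `/`, `\`, and `:` all as path separators. The function name `bundle_basename` is hypothetical, and this is not Gmaj's own code; it only mirrors the behavior described here.

```python
import re

def bundle_basename(entry_name: str) -> str:
    """Return the part of a zip/jar entry name after the last path separator."""
    # All three characters are treated as separators, because the platform
    # that created the bundle is unknown.
    return re.split(r"[/\\:]", entry_name)[-1]

print(bundle_basename("data/human.exons"))
print(bundle_basename(r"C:\work\mouse.maf"))
```

This also shows why a data file whose own name contains one of these characters cannot be found in the bundle: everything up to the last such character is discarded.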
| 148 | <p> |
---|
| 149 | As an alternative to bundling, data files can be compressed |
---|
| 150 | individually in <code>.zip</code>, <code>.jar</code>, or |
---|
| 151 | <code>.gz</code> format; this gains the compact size for storage |
---|
| 152 | and transfer, but still requires overhead for multiple HTTP |
---|
| 153 | connections in applet mode. The file name must end with the |
---|
| 154 | corresponding extension for the compression format to be |
---|
| 155 | recognized. (Such files can also be included in the bundle |
---|
| 156 | if desired; though little if any additional compression is |
---|
| 157 | typically achieved, this may be more convenient than unzipping |
---|
| 158 | a large file just to bundle it.) |
---|
| 159 | <p> |
---|
| 160 | |
---|
| 161 | <p class=hdr> |
---|
| 162 | <h3><a name="coord">Coordinate Systems</a></h3> |
---|
| 163 | <p> |
---|
| 164 | If you supply any annotations for Gmaj to display, these files |
---|
| 165 | must all use position coordinates that refer to the same original |
---|
| 166 | sequences identified in the MAF <a href="#align">alignment files</a> |
---|
| 167 | (ignoring any display offsets specified in the <a href="#param" |
---|
| 168 | >parameters file</a>). However, even though the MAF coordinates |
---|
| 169 | are 0-based, the PipMaker-style annotation files all use a |
---|
| 170 | 1-based, closed-interval coordinate system (i.e., the first |
---|
| 171 | nucleotide in the sequence is called "1", and specified ranges |
---|
| 172 | include both endpoints). This is for consistency with PipMaker, |
---|
| 173 | so the same files can be used with both programs, and the same |
---|
| 174 | tools can be used to prepare them. Coordinates for generic |
---|
| 175 | annotations may be either 1-based or 0-based and closed or |
---|
| 176 | half-open, depending on the format, but Gmaj always adjusts |
---|
| 177 | them as needed (including the ones in the MAF files) to convert |
---|
| 178 | everything to a 1-based, closed-interval system for display. |
---|
| 179 | <p> |
---|
| 180 | |
---|
| 181 | <p class=hdr> |
---|
| 182 | <h3><a name="align">Alignments</a></h3> |
---|
| 183 | <p> |
---|
| 184 | Gmaj is designed to display multiple-sequence alignments in |
---|
| 185 | <a href="http://genome.ucsc.edu/FAQ/FAQformat">MAF</a> format. |
---|
| 186 | It is especially suited for sequence-symmetric alignments from |
---|
| 187 | programs such as <a href="http://www.bx.psu.edu/miller_lab/" |
---|
| 188 | >TBA</a>, but can also display MAF files that have a fixed |
---|
| 189 | reference sequence. (In the latter case it is a good idea to |
---|
| 190 | set the <code>refseq</code> field in your <a href="#param" |
---|
| 191 | >parameters file</a>, to prevent displaying the alignments with |
---|
| 192 | an inappropriate reference sequence.) It is possible to display |
---|
| 193 | several alignment files simultaneously on the same plots, e.g. |
---|
| 194 | for comparing output from different alignment programs. |
---|
| 195 | <p> |
---|
| 196 | Gmaj normally requires that each sequence name appear at most |
---|
| 197 | once in each MAF block, i.e., that the values of the "src" field |
---|
| 198 | are unique across all of the <code>s</code> lines within the |
---|
| 199 | same block. However, there is a special exception for the case |
---|
| 200 | of pairwise self-alignments: if all of the blocks have just two |
---|
| 201 | rows, then all of the sequence names can be the same. In this |
---|
| 202 | case Gmaj distinguishes the rows in each block by internally |
---|
| 203 | adding a <code>~</code> suffix to the second row's sequence name; |
---|
| 204 | the <code>~</code> does not show in the main display, but you may |
---|
| 205 | occasionally see it in an error message. |
---|
| 206 | <p> |
---|
| 207 | The downside of this feature is that <b>sequence names in the MAF |
---|
| 208 | files must not end with <code>~</code></b>, even for non-self |
---|
| 209 | alignments. |
---|
| 210 | <p> |
---|
| 211 | |
---|
| 212 | <p class=hdr> |
---|
| 213 | <h3><a name="exon">Exons</a></h3> |
---|
| 214 | <p> |
---|
| 215 | Each of these files lists the locations of genes, exons, and |
---|
| 216 | coding regions in a particular reference sequence. The exons |
---|
| 217 | and UTRs are displayed as black and gray boxes in a separate |
---|
| 218 | panel above the alignment plots. |
---|
| 219 | <p> |
---|
| 220 | In the PipMaker-style exons format, the directionality of a gene |
---|
| 221 | (<code>&gt;</code>, <code>&lt;</code>, or <code>|</code>), its |
---|
| 222 | start and end positions, and name should be on one line, followed |
---|
| 223 | by an optional line beginning with a <code>+</code> character that |
---|
| 224 | indicates the first and last nucleotides of the translated region |
---|
| 225 | (including the initiation codon, <i>Met</i>, and the stop codon). |
---|
| 226 | These are followed by lines specifying the start and end positions |
---|
| 227 | of each exon, which must be listed in order of increasing address |
---|
| 228 | even if the gene is on the reverse strand (<code>&lt;</code>). By |
---|
| 229 | default Gmaj will supply exon numbers, but you can override this |
---|
| 230 | by specifying your own name or number for individual exons. Blank |
---|
| 231 | lines are ignored, and you can put an optional title line at the |
---|
| 232 | top. Thus, the file might begin as follows: |
---|
| 233 | <pre> |
---|
| 234 | My favorite genomic region |
---|
| 235 | |
---|
| 236 | < 100 800 XYZZY |
---|
| 237 | + 150 750 |
---|
| 238 | 100 200 |
---|
| 239 | 600 800 |
---|
| 240 | |
---|
| 241 | > 1000 2000 Frobozz gene |
---|
| 242 | 1000 1200 exon 1 |
---|
| 243 | 1400 1500 alt. spliced exon |
---|
| 244 | 1800 2000 exon 2 |
---|
| 245 | |
---|
| 246 | ... etc. |
---|
| 247 | </pre> |
---|
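To make the layout of the exons file concrete, here is a minimal Python sketch of a reader for the format described above. The function `parse_exons` and the dictionary keys it uses are hypothetical, and the sketch is deliberately lenient: it does not check exon ordering or report errors the way Gmaj might.

```python
def parse_exons(text):
    """Parse PipMaker-style exons text into a list of gene dicts."""
    genes = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # blank lines are ignored
        if fields[0] in (">", "<", "|"):
            # Gene line: direction, start, end, then the name (may contain spaces).
            genes.append({
                "strand": fields[0],
                "start": int(fields[1]),
                "end": int(fields[2]),
                "name": " ".join(fields[3:]),
                "cds": None,
                "exons": [],
            })
        elif fields[0] == "+" and genes:
            # Optional translated region for the most recent gene.
            genes[-1]["cds"] = (int(fields[1]), int(fields[2]))
        elif genes and len(fields) >= 2 and fields[0].isdigit() and fields[1].isdigit():
            # Exon line: start, end, optional name or number.
            label = " ".join(fields[2:]) or None
            genes[-1]["exons"].append((int(fields[0]), int(fields[1]), label))
        # Any other line (such as the optional title) is skipped in this sketch.
    return genes

sample = """My favorite genomic region

< 100 800 XYZZY
+ 150 750
100 200
600 800

> 1000 2000 Frobozz gene
1000 1200 exon 1
1400 1500 alt. spliced exon
1800 2000 exon 2
"""
genes = parse_exons(sample)
print(genes[0]["name"], genes[0]["cds"], genes[0]["exons"])
```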
| 248 | <p> |
---|
| 249 | |
---|
| 250 | <p class=hdr> |
---|
| 251 | <h3><a name="repeat">Repeats</a></h3> |
---|
| 252 | <p> |
---|
| 253 | Each of these files lists interspersed repeats (and possibly other |
---|
| 254 | features such as CpG islands) in a particular reference sequence. |
---|
| 255 | These are displayed in a separate panel just below the exons, |
---|
| 256 | using the same shapes and shading as PipMaker if possible. |
---|
| 257 | <p> |
---|
| 258 | In the PipMaker-style repeats format, the first line identifies |
---|
| 259 | this as a simplified repeats file (as opposed to |
---|
| 260 | <a href="http://www.repeatmasker.org/">RepeatMasker</a> output, |
---|
| 261 | which Gmaj does not yet support). Each subsequent line specifies |
---|
| 262 | the start, end, direction, and type of an individual feature. |
---|
| 263 | <pre> |
---|
| 264 | %:repeats |
---|
| 265 | |
---|
| 266 | 1081 1364 Right Alu |
---|
| 267 | 1365 1405 Simple |
---|
| 268 | ... etc. |
---|
| 269 | </pre> |
---|
| 270 | The allowed PipMaker types are: |
---|
| 271 | <code>Alu</code>, <code>B1</code>, <code>B2</code>, |
---|
| 272 | <code>SINE</code>, <code>LINE1</code>, <code>LINE2</code>, |
---|
| 273 | <code>MIR</code>, <code>LTR</code>, <code>DNA</code>, |
---|
| 274 | <code>RNA</code>, <code>Simple</code>, <code>CpG60</code>, |
---|
| 275 | <code>CpG75</code>, and <code>Other</code>. Of these, all except |
---|
| 276 | <code>Simple</code>, <code>CpG60</code>, and <code>CpG75</code> |
---|
| 277 | require a direction (<code>Right</code> or <code>Left</code>). |
---|
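The direction rule above can be checked mechanically. The following Python sketch validates a single feature line from a simplified repeats file; `parse_repeat_line` is a hypothetical helper, and the error handling is an assumption rather than a description of what Gmaj reports.

```python
REPEAT_TYPES = {"Alu", "B1", "B2", "SINE", "LINE1", "LINE2", "MIR",
                "LTR", "DNA", "RNA", "Simple", "CpG60", "CpG75", "Other"}
UNDIRECTED = {"Simple", "CpG60", "CpG75"}  # types that need no direction

def parse_repeat_line(line):
    """Return (start, end, direction, type); direction is None when omitted."""
    fields = line.split()
    if len(fields) == 4:
        direction, rtype = fields[2], fields[3]
    elif len(fields) == 3:
        direction, rtype = None, fields[2]
    else:
        raise ValueError(f"expected 3 or 4 fields: {line!r}")
    start, end = int(fields[0]), int(fields[1])
    if rtype not in REPEAT_TYPES:
        raise ValueError(f"unknown repeat type: {rtype!r}")
    if direction not in (None, "Right", "Left"):
        raise ValueError(f"bad direction: {direction!r}")
    if direction is None and rtype not in UNDIRECTED:
        raise ValueError(f"{rtype} requires Right or Left")
    return start, end, direction, rtype

print(parse_repeat_line("1081 1364 Right Alu"))
print(parse_repeat_line("1365 1405 Simple"))
```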
| 278 | <p> |
---|
| 279 | |
---|
| 280 | <p class=hdr> |
---|
| 281 | <h3><a name="link">Linkbars</a></h3> |
---|
| 282 | <p> |
---|
| 283 | Each of these files contains reference annotations, i.e., |
---|
| 284 | noteworthy regions in a particular reference sequence, which are |
---|
| 285 | drawn in a separate panel as colored bars. Typically each bar |
---|
| 286 | has an associated URL pointing to a web site with more information |
---|
| 287 | about the region, but this is not required. In applet mode Gmaj |
---|
| 288 | opens a new browser window to visit the linked site when the user |
---|
| 289 | clicks on a bar; in stand-alone mode Gmaj is not running within |
---|
| 290 | a web browser, so it just displays the URL for the user to visit |
---|
| 291 | manually via copy-and-paste. |
---|
| 292 | <p> |
---|
| 293 | The PipMaker-style format first defines various types of links |
---|
| 294 | and associates a color with each of them, then specifies the type, |
---|
| 295 | position, description, and URL for each annotated region. |
---|
| 296 | <pre> |
---|
| 297 | # linkbars for part of the mouse MHC class II region |
---|
| 298 | |
---|
| 299 | %define type |
---|
| 300 | %name PubMed |
---|
| 301 | %color Blue |
---|
| 302 | |
---|
| 303 | %define type |
---|
| 304 | %name LocusLink |
---|
| 305 | %color Orange |
---|
| 306 | |
---|
| 307 | %define annotation |
---|
| 308 | %type PubMed |
---|
| 309 | %range 1 2000 |
---|
| 310 | %label Yang et al. 1997. Daxx, a novel Fas-binding protein... |
---|
| 311 | %summary Yang, X., Khosravi-Far, R., Chang, H., and Baltimore, D. (1997). |
---|
| 312 | Daxx, a novel Fas-binding protein that activates JNK and apoptosis. |
---|
| 313 | Cell 89(7):1067-76. |
---|
| 314 | %url http://www.ncbi.nlm.nih.gov:80/entrez/ |
---|
| 315 | query.fcgi?cmd=Retrieve&db=PubMed&list_uids=9215629&dopt=Abstract |
---|
| 316 | |
---|
| 317 | ... etc. |
---|
| 318 | </pre> |
---|
| 319 | Here, for example, the first stanza requests that each feature |
---|
| 320 | subsequently identified as a PubMed entry be colored blue. |
---|
| 321 | The name must be a single word, perhaps containing underline |
---|
| 322 | characters (e.g., <code>Entry_in_GenBank</code>), and the color |
---|
| 323 | must come from Gmaj's <a href="#color">color list</a>. |
---|
| 324 | <p> |
---|
| 325 | The third stanza associates a PubMed link with positions |
---|
| 326 | 1-2000 in this sequence. The label should be kept fairly |
---|
| 327 | short, as it will be displayed on Gmaj's position indicator line |
---|
| 328 | when the user points at this linkbar. The summary is optional; |
---|
| 329 | it is used only by PipMaker and will be ignored by Gmaj. Also, |
---|
| 330 | while PipMaker allows several summary/URL pairs within a single |
---|
| 331 | annotation, Gmaj expects each field to occur at most once. If |
---|
| 332 | Gmaj encounters extra URLs, it will just use the first one and |
---|
| 333 | display a warning message. |
---|
| 334 | <p> |
---|
| 335 | Note that summaries and URLs (but not labels) can be broken into |
---|
| 336 | several lines for convenience; the line breaks are removed when |
---|
| 337 | the file is read, but they are not replaced with spaces. Thus |
---|
| 338 | a continuation line for a summary typically begins with a space |
---|
| 339 | to separate it from the last word of the previous line, while |
---|
| 340 | a URL continuation does not. |
---|
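The continuation rule is easy to get wrong, so here is a small Python sketch of how one stanza might be read. The function `parse_stanza` is hypothetical, the `example.org` URL is a placeholder, and keeping only the first occurrence of a repeated field is an assumption that generalizes the behavior described above for extra URLs. The sketch does not enforce the rule that labels cannot be continued.

```python
def parse_stanza(lines):
    """Collect the %fields of one linkbar stanza into a dict."""
    fields = {}
    current = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#") or not line.strip():
            continue  # comments and blank lines carry no data
        if line.startswith("%"):
            key, _, value = line[1:].partition(" ")
            if key in fields:
                current = None  # keep the first occurrence, ignore the rest
            else:
                fields[key] = value.lstrip()
                current = key
        elif current is not None:
            # Continuation: the line break is dropped and nothing is inserted,
            # so any separating space must come from the line itself.
            fields[current] += line
    return fields

stanza = [
    "%define annotation",
    "%type PubMed",
    "%range 1 2000",
    "%summary Daxx, a novel Fas-binding protein",
    " that activates JNK.",
    "%url http://example.org/entrez/",
    "query.fcgi?id=1",
]
print(parse_stanza(stanza))
```

Note that the summary continuation carries its own leading space, while the URL continuation starts in the first column so that the two halves join without a gap.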
| 341 | <p> |
---|
| 342 | Also note that stanzas should be separated by blank lines, and |
---|
| 343 | lines beginning with a <code>#</code> character are comments |
---|
| 344 | that will be ignored. The linkbars can appear in the file in |
---|
| 345 | any order, and several can overlap at the same position with no |
---|
| 346 | problem, since Gmaj will display them in multiple rows if |
---|
| 347 | necessary. In PipMaker this format is called "annotations with |
---|
| 348 | hyperlinks". |
---|
| 349 | <p> |
---|
| 350 | |
---|
| 351 | <p class=hdr> |
---|
| 352 | <h3><a name="under">Underlays</a></h3> |
---|
| 353 | <p> |
---|
| 354 | Each of these files specifies underlays (colored bands) to be |
---|
| 355 | painted on a particular pairwise pip and its corresponding |
---|
| 356 | dotplot. The bands are specified as regions in the reference |
---|
| 357 | sequence and are normally drawn vertically; however, for a dotplot, |
---|
| 358 | Gmaj will also look to see if you have specified an underlay file |
---|
| 359 | for the transposed situation where the reference and secondary |
---|
| 360 | sequences are swapped, and if so, will draw those underlays as |
---|
| 361 | horizontal bands in the secondary sequence. |
---|
| 362 | <p> |
---|
| 363 | The PipMaker-style underlay format supported by Gmaj looks like |
---|
| 364 | this: |
---|
| 365 | <pre> |
---|
| 366 | # partial underlays for the BTK region |
---|
| 367 | |
---|
| 368 | LightYellow Gene |
---|
| 369 | Green Exon |
---|
| 370 | Red Strongly_conserved |
---|
| 371 | |
---|
| 372 | 35324 72009 (BTK gene) Gene |
---|
| 373 | 49781 49849 (exon 4) Exon |
---|
| 374 | 51403 51484 Exon |
---|
| 375 | 50350 50513 (conserved 84%) Strongly_conserved 84 |
---|
| 376 | 52376 52603 (Kilroy was here) Strongly_conserved 92 + |
---|
| 377 | ... etc. |
---|
| 378 | </pre> |
---|
| 379 | The first group of lines describes the intended meaning of the |
---|
| 380 | colors, while the second group specifies the location of each band. |
---|
| 381 | Colors must come from Gmaj's <a href="#color">color list</a>, but |
---|
| 382 | the meaning of each color can be any single word chosen by you. |
---|
| 383 | The text in parentheses is an optional label which will be |
---|
| 384 | displayed on Gmaj's position indicator line when the user points |
---|
| 385 | the mouse at that band. The parentheses must be present if the |
---|
| 386 | label is, and the label itself cannot contain any additional |
---|
| 387 | parentheses. The number following the color category is an |
---|
| 388 | optional integer score that can be used to interactively adjust |
---|
| 389 | which underlays are displayed; see "Underlays Box" in the |
---|
| 390 | Menus and Widgets section of <a href="gmaj_help.html" |
---|
| 391 | >Starting and Running Gmaj</a> for more information. (The |
---|
| 392 | label and score are extra features not supported by PipMaker.) |
---|
| 393 | A <code>+</code> or <code>-</code> character at the end of a |
---|
| 394 | location line will paint just the upper or lower half of the band |
---|
| 395 | on the pip (but is ignored for dotplots). This allows you to |
---|
| 396 | differentiate between the two strands, or to plot potentially |
---|
| 397 | overlapping features like gene predictions and database matches. |
---|
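The fields of a location line can be summarized as `start end [(label)] category [score] [+|-]`. The Python sketch below expresses that layout as a regular expression; the names `LOCATION` and `parse_location` are hypothetical, and allowing a negative score is an assumption, since the text above only says the score is an integer.

```python
import re

LOCATION = re.compile(
    r"^\s*(?P<start>\d+)\s+(?P<end>\d+)"     # 1-based, closed-interval range
    r"(?:\s+\((?P<label>[^()]*)\))?"         # optional label in parentheses
    r"\s+(?P<category>\S+)"                  # color category (a single word)
    r"(?:\s+(?P<score>-?\d+))?"              # optional integer score
    r"(?:\s+(?P<half>[+-]))?\s*$"            # optional + or - (upper/lower half)
)

def parse_location(line):
    """Return a dict of the fields on one location line, or None if it does not match."""
    m = LOCATION.match(line)
    if m is None:
        return None
    d = m.groupdict()
    d["start"], d["end"] = int(d["start"]), int(d["end"])
    if d["score"] is not None:
        d["score"] = int(d["score"])
    return d

print(parse_location("52376 52603 (Kilroy was here) Strongly_conserved 92 +"))
print(parse_location("51403 51484 Exon"))
```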
| 398 | <p> |
---|
| 399 | Note that if two bands overlap, the one that was specified last |
---|
| 400 | in the file appears "on top" and obscures the earlier one (except |
---|
| 401 | for the special <code><a href="#hatch">Hatch</a></code> color). |
---|
| 402 | Thus in this example, the green exons and red strongly conserved |
---|
| 403 | regions cover up parts of the long yellow band representing the |
---|
| 404 | gene. As in the links file, lines beginning with a <code>#</code> |
---|
| 405 | character are comments that will be ignored. |
---|
| 406 | <p> |
---|
| 407 | |
---|
| 408 | <p class=hdr> |
---|
| 409 | <h3><a name="high">Highlights</a></h3> |
---|
| 410 | <p> |
---|
| 411 | Highlight files are analogous to the <a href="#under">underlay</a> |
---|
| 412 | files, but each of these specifies colored regions for a |
---|
| 413 | particular sequence in the text view, rather than for a plot. |
---|
| 414 | If you do not specify a highlight file for a particular sequence, |
---|
| 415 | Gmaj will automatically provide default highlights based on the |
---|
| 416 | <a href="#exon">exons</a> file (if you provided one). These will |
---|
| 417 | use one color for whole genes, overlaid with different colors to |
---|
| 418 | indicate exons on the forward vs. reverse strand. If the exons |
---|
| 419 | file specifies a gene's translated region, then the 5´ and |
---|
| 420 | 3´ UTRs will be shaded using lighter colors. These default |
---|
| 421 | highlights make it easy to examine the putative start/stop codons |
---|
| 422 | and splice junctions, as well as providing a visual connection |
---|
| 423 | between the graphical and text views. But if for some reason you |
---|
| 424 | do not want any text highlights, you can suppress them by |
---|
| 425 | specifying an empty highlight file. |
---|
| 426 | <p> |
---|
| 427 | The PipMaker-style format for highlights is the same as for |
---|
| 428 | underlays, except that any <code>+</code> or <code>-</code> |
---|
| 429 | indicators will be ignored, and the <code>Hatch</code> color is |
---|
| 430 | not supported for highlights. Just as with underlays, labels |
---|
| 431 | can be included which will be shown when the user points at |
---|
| 432 | the highlight, scores can be used to limit which entries are |
---|
| 433 | displayed, and highlights that are listed later in the file will |
---|
| 434 | cover up those that appear earlier. |
---|
| 435 | <p> |
---|
| 436 | |
---|
| 437 | <p class=hdr> |
---|
| 438 | <h3><a name="color">Color List</a></h3> |
---|
| 439 | <p> |
---|
| 440 | For Gmaj's PipMaker-style annotations, the available colors are: |
---|
| 441 | <pre> |
---|
| 442 | Black White Clear |
---|
| 443 | Gray LightGray DarkGray |
---|
| 444 | Red LightRed DarkRed |
---|
| 445 | Green LightGreen DarkGreen |
---|
| 446 | Blue LightBlue DarkBlue |
---|
| 447 | Yellow LightYellow DarkYellow |
---|
| 448 | Pink LightPink DarkPink |
---|
| 449 | Cyan LightCyan DarkCyan |
---|
| 450 | Purple LightPurple DarkPurple |
---|
| 451 | Orange LightOrange DarkOrange |
---|
| 452 | Brown LightBrown DarkBrown |
---|
| 453 | </pre> |
---|
| 454 | These names are case-sensitive (i.e., capitalization matters). |
---|
| 455 | Not all of these are supported by PipMaker. Also, be aware that |
---|
| 456 | the appearance of the colors may vary between PipMaker and Gmaj, |
---|
| 457 | and from one printer or monitor to the next. |
---|
| 458 | <p class=subhdr> |
---|
| 459 | <a name="hatch"><b><code>Hatch</code></b></a> |
---|
| 460 | <p> |
---|
| 461 | In addition to the regular colors listed above, Gmaj supports a |
---|
| 462 | special "color" for underlays called <code>Hatch</code>, which |
---|
| 463 | is drawn as a pattern of diagonal gray lines. Normally if two |
---|
| 464 | underlays overlap, the one that was specified last in the file |
---|
| 465 | appears "on top" and obscures the earlier one. However, |
---|
| 466 | <code>Hatch</code> underlays have the special property that they |
---|
| 467 | are always drawn after the other colors, and since the space |
---|
| 468 | between the diagonal lines is transparent, they allow the other |
---|
| 469 | colors to show through. Currently <code>Hatch</code> is only |
---|
| 470 | supported for underlays, not for highlights or linkbars. |
---|
| 471 | <p> |
---|
| 472 | |
---|
| 473 | <p class=hdr> |
---|
| 474 | <h3><a name="generic">Generic Annotation Formats</a></h3> |
---|
| 475 | <p> |
---|
| 476 | The standardized generic formats currently supported by Gmaj |
---|
| 477 | include |
---|
| 478 | <a href="http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml" |
---|
| 479 | >GFF</a> (v1 &amp; v2), |
---|
| 480 | <a href="http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#GTF" |
---|
| 481 | >GTF</a>, and various flavors of |
---|
| 482 | <a href="http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#BED" |
---|
| 483 | >BED</a> (including the full BED12 format, a.k.a. "gene BED"). |
---|
| 484 | For details on these formats, please see the specifications at |
---|
| 485 | the above links; this document will mainly discuss their use |
---|
| 486 | by Gmaj. |
---|
| 487 | <p> |
---|
| 488 | These formats are all <b>tab-separated</b>, and despite their |
---|
| 489 | differences are similar enough that Gmaj can extract comparable |
---|
| 490 | fields and treat them more or less the same. Note that Gmaj is |
---|
| 491 | not intended as a format validator: parsing is more lenient in |
---|
| 492 | some respects than the official format specifications, and Gmaj |
---|
| 493 | will ignore fields it has no use for. Also, interpretation of |
---|
| 494 | these open-ended formats depends partly on what type of annotation |
---|
| 495 | is expected; e.g. if Gmaj is trying to read exons from a GFF v1 |
---|
| 496 | file, it will assume that the group field is the gene name. It |
---|
| 497 | will generally show warning messages to keep the user apprised |
---|
| 498 | of any such assumptions it is making (if these become too annoying |
---|
| 499 | they can be individually suppressed in the <a href="#param" |
---|
| 500 | >parameters file</a>; see <code><a href="sample.gmaj" |
---|
| 501 | >sample.gmaj</a></code> for details). Because one of the main |
---|
| 502 | reasons for supporting these formats is to enable the use of |
---|
| 503 | annotation files obtained from public sources, Gmaj tries not to |
---|
| 504 | balk at anomalies that are probably not the user's fault, and |
---|
| 505 | when practical will simply skip questionable items with a warning |
---|
| 506 | message. Each type of message will generally be displayed only |
---|
| 507 | once, and not repeated for every item with the same problem. |
---|
| 508 | <p> |
---|
| 509 | <p class=subhdr> |
---|
| 510 | <a name="fileext"><b>Filename Extensions</b></a> |
---|
| 511 | <p> |
---|
| 512 | In order to distinguish generic files from PipMaker-style ones |
---|
| 513 | and handle them appropriately, Gmaj requires that files in |
---|
| 514 | generic formats have names ending with any of certain extensions. |
---|
| 515 | The default list is <code>.gff</code>, <code>.gtf</code>, |
---|
| 516 | <code>.bed</code>, <code>.ct</code>, and <code>.trk</code>, but |
---|
| 517 | this can be customized (see <code><a href="sample.gmaj" |
---|
| 518 | >sample.gmaj</a></code>). |
---|
| 519 | <p> |
---|
| 520 | <p class=subhdr> |
---|
| 521 | <a name="quote"><b>Quoting</b></a> |
---|
| 522 | <p> |
---|
| 523 | Some of the generic formats require text values to be enclosed |
---|
| 524 | in double quotes (<code>" "</code>). Even when not strictly |
---|
| 525 | required it is usually a good idea to do so, especially if the |
---|
| 526 | value contains spaces. The official specifications generally |
---|
| 527 | don't say what to do if a value contains embedded quote |
---|
| 528 | characters, but Gmaj supports a rudimentary mechanism for |
---|
| 529 | escaping them with a backslash (<code>\</code>). However, it |
---|
| 530 | does not provide for escaping the backslash: quoted values |
---|
| 531 | should not end with <code>\</code> (insert a space before the |
---|
| 532 | final quote if necessary). |
---|
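To see why a quoted value should not end with a backslash, consider this Python sketch of a reader that treats backslash-quote as an embedded quote and has no way to escape the backslash itself. The function `read_quoted` is hypothetical and is only a model of the mechanism described above, not Gmaj's actual parser.

```python
def read_quoted(text):
    """Read a double-quoted value at the start of text; return (value, rest)."""
    if not text.startswith('"'):
        raise ValueError("value is not quoted")
    out = []
    i = 1
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] == '"':
            out.append('"')   # backslash-quote is an embedded quote
            i += 2
        elif ch == '"':
            return "".join(out), text[i + 1:]   # first unescaped quote closes the value
        else:
            out.append(ch)
            i += 1
    raise ValueError("missing closing quote")

print(read_quoted('"say \\"hi\\"" tail'))
print(read_quoted('"C:\\dir\\ " next'))
```

Under this model, a value written as `"C:\dir\"` never closes, because its final quote is read as an embedded one; inserting a space before the final quote avoids the problem.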
| 533 | <p> |
---|
| 534 | <p class=subhdr> |
---|
| 535 | <a name="empty"><b>Empty Fields</b></a> |
---|
| 536 | <p> |
---|
| 537 | When reading the generic formats, Gmaj treats two adjacent tab |
---|
| 538 | characters as an empty field. However, your files will be easier |
---|
| 539 | for humans to read if you do not leave fields completely empty. |
---|
| 540 | Gmaj recognizes a value of <code>.</code> (the dot character) |
---|
| 541 | to mean "unspecified" for fields such as strand, score, feature, |
---|
| 542 | and color, in some cases even when the official formats don't. |
---|
| 543 | For instance, GFF v2 explicitly calls for using <code>.</code> |
---|
| 544 | when there is no score, but Gmaj allows you to do this with the |
---|
| 545 | other generic formats as well, in order to distinguish between |
---|
| 546 | "no score" and a score that is truly zero. For colors, in |
---|
| 547 | addition to <code>.</code> Gmaj also interprets <code>0</code> |
---|
| 548 | to mean "unspecified", in keeping with examples at UCSC. |
---|
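The conventions for unspecified values might be modeled as in the following Python sketch. The helpers `parse_score` and `parse_color` are hypothetical, and reading scores as floating-point numbers is an assumption made for simplicity.

```python
def parse_score(field):
    """Return the score as a float, or None when the field is empty or '.'."""
    field = field.strip()
    if field in ("", "."):
        return None
    return float(field)

def parse_color(field):
    """Return the color text, or None when it is empty, '.', or '0'."""
    field = field.strip()
    if field in ("", ".", "0"):
        return None
    return field

# Two adjacent tabs yield an empty field when the line is split on tabs.
print("a\t\tb".split("\t"))
print(parse_score("."), parse_score("0"), parse_color("0"))
```

Returning `None` rather than zero keeps "no score" distinct from a score that is truly zero.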
| 549 | <p> |
---|
| 550 | <p class=subhdr> |
---|
| 551 | <a name="gencoord"><b>Coordinates</b></a> |
---|
| 552 | <p> |
---|
| 553 | The GFF and GTF formats use 1-based, closed-interval coordinates |
---|
| 554 | (i.e., sequence numbering starts with "1", and specified ranges |
---|
| 555 | include both endpoints), while BED uses a 0-based, half-open |
---|
| 556 | system (the first nucleotide of the sequence is numbered "0", |
---|
| 557 | and the ending position is not included in the region). For all |
---|
| 558 | of these formats, positions are given relative to the beginning |
---|
| 559 | of the named sequence regardless of which strand the feature is |
---|
| 560 | on (unlike MAF), and <code>start</code> must be less than or |
---|
| 561 | equal to <code>end</code>. |
---|
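As a worked illustration of the two coordinate systems, the following Python sketch converts a range between them. The function names are hypothetical and chosen only for this example. The first ten bases of a sequence are <code>0 10</code> in BED and <code>1 10</code> in GFF or GTF.

```python
def bed_to_gff(start: int, end: int) -> tuple:
    """Convert a 0-based, half-open BED range to a 1-based, closed range."""
    if start >= end:
        # An empty BED interval has no closed-interval equivalent.
        raise ValueError("BED range must contain at least one base")
    return (start + 1, end)


def gff_to_bed(start: int, end: int) -> tuple:
    """Convert a 1-based, closed GFF/GTF range to a 0-based, half-open range."""
    if start > end:
        raise ValueError("start must be less than or equal to end")
    return (start - 1, end)
```

Only the start position changes: the half-open end of a BED range and the closed end of a GFF range are the same number.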
| 562 | <p> |
---|
| 563 | <p class=subhdr> |
---|
| 564 | <a name="gffconv"><b>GFF Conventions</b></a> |
---|
| 565 | <p> |
---|
| 566 | BED format is relatively fixed in how its fields are used, but |
---|
| 567 | GFF and GTF are more variable and require additional conventions |
---|
| 568 | for most effective use with Gmaj. In particular, the values of |
---|
| 569 | the "feature" field and the optional "attributes" affect how Gmaj |
---|
| 570 | will interpret and display an item. |
---|
| 571 | <p> |
---|
| 572 | Values of the feature field that are recognized for special |
---|
| 573 | treatment include: |
---|
| 574 | <p class=tiny> |
---|
| 575 | <ul class="notop nobottom"> |
---|
| 576 | <li> <code>gene</code> or values starting with <code>gene_</code> |
---|
| 577 | <li> <code>exon</code> or values starting with <code>exon_</code> |
---|
| 578 | <li> <code>start_codon</code>, <code>str_codon</code>, |
---|
| 579 | <code>stop_codon</code>, <code>stp_codon</code>, or |
---|
| 580 | <code>cds</code> |
---|
| 581 | <li> <code>repeatmasker</code> or any of the |
---|
| 582 | <a href="#repeat">PipMaker repeat or CpG types</a> |
---|
| 583 | </ul> |
---|
| 584 | <p class=tiny> |
---|
| 585 | Of these, only the PipMaker types are case-sensitive. |
---|
| 586 | <p> |
---|
| 587 | For GFF v2 and GTF, the currently recognized attribute tags are: |
---|
| 588 | <p class=tiny> |
---|
| 589 | <ul class="notop nobottom"> |
---|
| 590 | <li> <code>gene</code> or <code>gene_id</code>: the name of the |
---|
| 591 | gene, e.g. for grouping exons (<code>transcript_id</code> is |
---|
| 592 | ignored) |
---|
| 593 | <li> <code>name</code>: an optional name for this individual item, |
---|
| 594 | e.g. for an exon label |
---|
| 595 | <li> <code>sequence</code> (when feature is |
---|
| 596 | <code>repeatmasker</code>): the name/class/family of the |
---|
| 597 | repeat, e.g. <code>AluJb/SINE/Alu</code> |
---|
| 598 | <li> <code>color</code>: a <a href="#gencolor">color</a> |
---|
| 599 | specification in UCSC format, e.g. <code>0,0,255</code> |
---|
| 600 | <li> <code>url</code> or <code>ucsc_id</code>: the URL for |
---|
| 601 | linkbars; <code>$$</code> will be replaced with the value of |
---|
| 602 | <code>name</code> |
---|
| 603 | </ul> |
---|
| 604 | <p class=tiny> |
---|
| 605 | These keywords are not case-sensitive, but they cannot have |
---|
| 606 | multiple values. |
---|
| 607 | <p> |
---|
| 608 | <p class=subhdr> |
---|
| 609 | <a name="custom"><b>Custom Tracks</b></a> |
---|
| 610 | <p> |
---|
| 611 | Along with the basic formats listed above, Gmaj also supports UCSC |
---|
| 612 | <a href="http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#CustomTracks" |
---|
| 613 | >custom track</a> headers. |
---|
| 614 | <a href="http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#TRACK" |
---|
| 615 | >Track lines</a> can specify certain settings for an entire |
---|
| 616 | track; currently <code><a href="#gencolor">color</a></code>, |
---|
| 617 | <code><a href="#gencolor">itemRgb</a></code>, <code>offset</code>, |
---|
| 618 | and <code>url</code> are supported. They also allow several |
---|
| 619 | tracks (even in mixed formats) to be combined in a single file. |
---|
| 620 | Gmaj does not currently provide a way to use just one particular |
---|
| 621 | track from such a file (it will be treated as one big bag of |
---|
| 622 | annotations), but lines in unsupported formats such as |
---|
| 623 | <a href="http://genome.ucsc.edu/goldenPath/help/wiggle.html" |
---|
| 624 | >WIG</a> are gracefully skipped. |
---|
| 625 | <a href="http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#lines" |
---|
| 626 | >Browser lines</a> are also skipped; Gmaj's initial zoom position |
---|
| 627 | is controlled by command-line or applet parameters rather than by |
---|
| 628 | individual annotation files. |
---|
| 629 | <p> |
---|
| 630 | <p class=subhdr> |
---|
| 631 | <a name="multiseq"><b>Multiple Sequences</b></a> |
---|
| 632 | <p> |
---|
| 633 | Generic files can also contain annotations for several sequences, |
---|
| 634 | because unlike the PipMaker-style formats, they all have a |
---|
| 635 | "seqname" or "chrom" field that Gmaj can use to select the |
---|
| 636 | appropriate lines. Ideally Gmaj expects this field to match |
---|
| 637 | the sequence name from the <a href="#align">alignment files</a>, |
---|
| 638 | but has two ways to deal with exceptions. If there is only one |
---|
| 639 | seqname in the annotation file, then Gmaj will go ahead and use |
---|
| 640 | it, but will display a warning (unless the mismatch can be fixed |
---|
| 641 | by prepending the organism name, or the organism name plus |
---|
| 642 | <code>chr</code>, to the annotation seqname). But if the file |
---|
| 643 | has annotations for several sequences and some don't match the |
---|
| 644 | alignment files, you need to tell Gmaj which is which by adding |
---|
| 645 | an alias in the <a href="#param">parameters file</a> (see |
---|
| 646 | <code><a href="sample.gmaj">sample.gmaj</a></code>). |
---|
| 647 | <p> |
---|
| 648 | <p class=subhdr> |
---|
| 649 | <a name="reuse"><b>Reusing Files</b></a> |
---|
| 650 | <p> |
---|
| 651 | One of the advantages of using generic formats is that files can |
---|
| 652 | be reused in multiple panels without reformatting, e.g. as both |
---|
| 653 | exons and underlays. Normally linkbars, underlays, and text |
---|
| 654 | highlights are simply handled as arbitrary regions of a specified |
---|
| 655 | color, since they could represent any type of biological feature. |
---|
| 656 | However, you can ask Gmaj to interpret them as exons or repeats |
---|
| 657 | by adding a type hint in the <a href="#param">parameters file</a> |
---|
| 658 | (see <code><a href="sample.gmaj">sample.gmaj</a></code>). Note |
---|
| 659 | that currently this will also cause any <a href="#gencolor" |
---|
| 660 | >specified colors</a> in that file to be overridden with Gmaj's |
---|
| 661 | defaults. |
---|
| 662 | <p> |
---|
| 663 | Combining several biological types of annotations (e.g. exons |
---|
| 664 | and repeats) in one file is possible, but not recommended. Gmaj |
---|
| 665 | will try to skip lines that are not appropriate for the type it |
---|
| 666 | is seeking, but it may draw more than you want. |
---|
| 667 | <p> |
---|
| 668 | <p class=subhdr> |
---|
| 669 | <a name="cds"><b>Coding Sequence</b></a> |
---|
| 670 | <p> |
---|
| 671 | Currently Gmaj has no special support for multiple transcripts. |
---|
| 672 | When inferring UTRs, all of the CDS-related items for a single |
---|
| 673 | gene name are combined, and the interval from the lowest |
---|
| 674 | coordinate to the highest is used as the CDS. Also, some of the |
---|
| 675 | formats' rules specify whether or not the initiation and stop |
---|
| 676 | codons should be included in the CDS, but Gmaj does not make |
---|
| 677 | adjustments to compensate for that; instead it simply includes |
---|
| 678 | all of the given endpoints in the CDS. |
---|
| 679 | <!-- and leaves it up to the user to interpret the display based |
---|
| 680 | on the convention used in the files he/she provided. [the user |
---|
| 681 | does not supply files for applets] --> |
---|
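The combining step described above can be sketched as follows. This is a minimal illustration in Python, not Gmaj's actual implementation; the function name <code>cds_span_by_gene</code> and the tuple layout of the input are assumptions made for the example.

```python
def cds_span_by_gene(items):
    """Combine CDS-related items per gene name into one (lowest, highest) span.

    items: iterable of (gene_name, start, end) tuples, taken from features
    such as cds, start_codon, or stop_codon. All given endpoints are
    included; no adjustment is made for codon conventions.
    """
    spans = {}
    for gene, start, end in items:
        if gene in spans:
            low, high = spans[gene]
            spans[gene] = (min(low, start), max(high, end))
        else:
            spans[gene] = (start, end)
    return spans
```

Because items are pooled by gene name alone, two transcripts of the same gene yield a single span covering both.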
| 682 | <p> |
---|
| 683 | <p class=subhdr> |
---|
| 684 | <a name="gencolor"><b>Colors</b></a> |
---|
| 685 | <p> |
---|
| 686 | Colors can be specified for individual annotation lines via the |
---|
| 687 | <code>itemRgb</code> field (for BED) or a <code>color</code> |
---|
| 688 | attribute (for GFF v2 or GTF). However, for <a href="#custom" |
---|
| 689 | >custom tracks</a>, these are governed by the track line's |
---|
| 690 | <code>itemRgb</code> attribute, which defaults to off per the |
---|
| 691 | UCSC specification. Thus if you have track lines and want to |
---|
| 692 | use the per-item colors, you need to include |
---|
| 693 | <code>itemRgb=On</code> in the track attributes. |
---|
| 694 | <p> |
---|
| 695 | Track lines can also have a <code>color</code> attribute for |
---|
| 696 | the entire track, which will be used if <code>itemRgb</code> is |
---|
| 697 | off, or if an individual item does not have its own color. |
---|
| 698 | However, in a rare break from the UCSC specification, Gmaj does |
---|
| 699 | not use black as the default if the track color is unspecified |
---|
| 700 | (black underlays and highlights just don't work with black plots |
---|
| 701 | and text). Instead it uses its own default colors, which for |
---|
| 702 | genes/exons are the same as the colors for <a href="#high" |
---|
| 703 | >default highlights</a>, or light gray for other annotations. |
---|
| 704 | Note that these defaults will also override your colors when |
---|
| 705 | <a href="#reuse">type hints</a> are used. |
---|
| 706 | <p> |
---|
| 707 | All of the above-mentioned color values are specified in UCSC |
---|
| 708 | format, which consists of three comma-separated RGB values from |
---|
| 709 | 0-255 (e.g. <code>0,0,255</code>). |
---|
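The color rules in this section can be summarized with a short Python sketch. It is illustrative only: the function names are hypothetical, the <code>default</code> argument stands in for Gmaj's own default colors, and passing <code>item_rgb_on=True</code> for files without track lines is an assumption of the example.

```python
def parse_ucsc_color(text: str) -> tuple:
    """Parse a UCSC color such as "0,0,255" into an (r, g, b) tuple."""
    parts = text.strip().split(",")
    if len(parts) != 3:
        raise ValueError("expected three comma-separated values: %r" % text)
    values = tuple(int(part) for part in parts)
    if any(value < 0 or value > 255 for value in values):
        raise ValueError("RGB values must be in the range 0-255: %r" % text)
    return values


def effective_color(item_color, track_color, item_rgb_on, default):
    """Choose the color for one item; None means "unspecified".

    The per-item color is used only when itemRgb is on. Otherwise the
    track color applies, and failing that the caller's default.
    """
    if item_rgb_on and item_color is not None:
        return item_color
    if track_color is not None:
        return track_color
    return default
```

For example, an item colored <code>0,0,255</code> in a track without <code>itemRgb=On</code> is drawn in the track color, or in the default color if the track has none.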
| 710 | <p> |
---|
| 711 | <p class=subhdr> |
---|
| 712 | <a name="sort"><b>Sorting</b></a> |
---|
| 713 | <p> |
---|
| 714 | The order of the lines is not supposed to matter in these generic |
---|
| 715 | formats, but for most of the Gmaj panels it does matter: exons |
---|
| 716 | need to be grouped by gene and ordered by position so UTRs can be |
---|
| 717 | inferred and exon numbers assigned, early underlays are covered |
---|
| 718 | up by later ones, etc. Gmaj solves this problem by sorting the |
---|
| 719 | data before it is displayed. Exons are sorted first by gene name |
---|
| 720 | in ascending order, and then within each gene by start position |
---|
| 721 | (ascending), and lastly, in case of a tie, by end position |
---|
| 722 | (descending). All other annotation types are sorted first by |
---|
| 723 | length in descending order, and then, in case of a tie, by start |
---|
| 724 | position (ascending). This usually produces a reasonable display, |
---|
| 725 | but if you need direct control of the order, you can use the |
---|
| 726 | PipMaker-style formats instead. |
---|
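The two sort orders described above can be expressed as Python sort keys. This is a sketch under stated assumptions rather than Gmaj's own code: the key function names and tuple layouts are hypothetical, and gene names are compared as plain strings, which may differ from Gmaj's comparison.

```python
def exon_sort_key(item):
    """Key for (gene, start, end): gene ascending, start ascending, end descending."""
    gene, start, end = item
    return (gene, start, -end)


def other_sort_key(item):
    """Key for (start, end): length descending, then start ascending."""
    start, end = item
    return (-(end - start), start)


exons = [("b", 1, 5), ("a", 10, 20), ("a", 10, 30), ("a", 5, 8)]
underlays = [(0, 10), (5, 100), (20, 30), (2, 12)]

sorted_exons = sorted(exons, key=exon_sort_key)
sorted_underlays = sorted(underlays, key=other_sort_key)
```

Sorting other annotations longest first means that shorter regions come later in the list, so they are not hidden by longer ones that overlap them.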
| 727 | <p> |
---|
| 728 | |
---|
| 729 | <p class=vvlarge> |
---|
| 730 | <hr> |
---|
| 731 | <i>Cathy Riemer, June 2008</i> |
---|
| 732 | |
---|
| 733 | <p class=scrollspace> |
---|
| 734 | </body> |
---|
| 735 | </html> |
---|