<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
        "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Input Files for Gmaj</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta http-equiv="Content-Style-Type" content="text/css">
<link rel="stylesheet" type="text/css" href="gmaj.css">
</head>
<body>
<p class=vvlarge>
<h2>Input Files for Gmaj</h2>
<p class=vvlarge>
TABLE OF CONTENTS
<p class=small>
<ul class=notop>
<li><a href="#intro">Introduction</a>
<li><a href="#param">Parameters File</a>
<li><a href="#zip">Compression and Bundling</a>
<li><a href="#coord">Coordinate Systems</a>
<li><a href="#align">Alignments</a>
<li><a href="#exon">Exons</a>
<li><a href="#repeat">Repeats</a>
<li><a href="#link">Linkbars</a>
<li><a href="#under">Underlays</a>
<li><a href="#high">Highlights</a>
<li><a href="#color">Color List</a>
<li><a href="#generic">Generic Annotation Formats</a>
</ul>
<p class=vlarge>

<p class=hdr>
<h3><a name="intro">Introduction</a></h3>
<p>
This page describes the input files supported by Gmaj and their
formats.  Only the <a href="#align">alignment file</a> is
required; the others are optional.  Except where noted, all
information applies to both the stand-alone and applet modes of
Gmaj.
<p>
For annotations, Gmaj supports two broad categories of file
formats.  The original set of formats is essentially the same as
those used by <a href="http://pipmaker.bx.psu.edu/pipmaker/"
>PipMaker</a> and <a href="http://globin.bx.psu.edu/dist/laj/"
>Laj</a>, where each destination for the data (exons panel, color
underlays, etc.) has its own file format tailored to the needs of
that display.  These files can be cumbersome to prepare manually,
though PipMaker's associated utilities, such as
<a href="http://pipmaker.bx.psu.edu/piphelper/">PipHelper</a> and
the <a href="http://pipmaker.bx.psu.edu/pipmaker/tools.html"
>PipTools</a>, can significantly reduce the burden.
<p>
However, since sequence annotations are increasingly
available in standardized formats from on-line resources such as
the <a href="http://genome.ucsc.edu/cgi-bin/hgTables">UCSC Table
Browser</a>, Gmaj can now accept some of these formats as well.
These are referred to here as "generic" formats because they are
not restricted to a particular biological data type or Gmaj
display panel.
<p>
The PipMaker-style formats are described below in the sections for
each panel, while the generic ones are discussed in a separate
section, <a href="#generic">Generic Annotation Formats</a>.
<p class=large>
<center>
<table width=55%>
<tr>
<td valign=top align=right><img class=lower src="hand14.gif">
<!-- Pointing hand icon is from Clip Art Warehouse,
        at http://www.clipart.co.uk/ -->
</td>
<td valign=top>
<ul class="notop nobottom lessindent">
<li>    <b>All files must consist solely of plain-text ASCII
        characters.</b>  (For example, no Word documents.)
        <p class=small>
<li>    <b>All <a href="#coord">coordinates</a> for PipMaker-style
        annotations are 1-based, closed-interval.</b>  Those
        for generic annotations may be either 1-based or 0-based
        and closed or half-open, depending on the format.
</ul>
</td>
</tr>
</table>
</center>
<p>

<p class=hdr>
<h3><a name="param">Parameters File</a></h3>
<p>
The annotation files are optional, but because in some alignments
any of the sequences can be viewed as the reference sequence,
there can be a large number of annotation files to provide: too
many to type on the command line or paste into a dialog box every
time you want to view the data.
For this reason, Gmaj uses a meta-level <b>parameters file</b>
that lists the names of all the data files, plus a few other
data-related options.  Then when running Gmaj, you only have to
specify that one file name.  However, if you don't want to use
any of these annotations or options, you can specify a single
<a href="#align">alignment file</a> directly in place of a
parameters file.
<p>
A sample parameters file that you can use as a template is
provided at <code><a href="sample.gmaj">sample.gmaj</a></code>.
It contains detailed comments at the bottom explaining the syntax
and meaning of the parameters.
<p>

<p class=hdr>
<h3><a name="zip">Compression and Bundling</a></h3>
<p>
Gmaj supports a "bundle" option, which allows you to collect and
compress some or all of the data files into a single file in
<code>.zip</code> or <code>.jar</code> format (not
<code>.tar</code>, sorry).  This is especially useful for
streamlining the applet's data download, but is also supported in
stand-alone mode.  A few tips:
<ul>
<li>    If the <a href="#param">parameters file</a> is included in
        the bundle, it must be the first file in it, since Gmaj reads
        the bundle sequentially and needs the parameters file to
        process the others.  In this case, there is no need to
        mention the parameters file on the command line or in the
        applet tags; just specify the bundle.  But if the parameters
        file is not in the bundle, specify both.
        <p class=small>
<li>    Data files in the bundle should be referred to within the
        parameters file using their plain filenames, without paths,
        and these must be unique.  Any data files outside the bundle
        should be referred to normally, using the rules described in
        <code><a href="sample.gmaj">sample.gmaj</a></code>.
        <p class=small>
<li>    Do not use filenames containing <code>/</code>,
        <code>\</code>, or <code>:</code> in the bundle.  Gmaj
        needs to remove the path that may have been added to each
        name by the zip or jar program, and since it doesn't know
        what platform that program was run on, it treats all of
        these characters as path separators.
        <p class=small>
<li>    If you are not using a parameters file (i.e., you want to
        specify the <a href="#align">alignment file</a> directly,
        without any annotations or other data-related options),
        then list the alignment file in place of the
        parameters file, not as a bundle (there's nothing else
        to bundle with it anyway).
</ul>
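<p>
To make these tips concrete, here is a small Python sketch of building a
bundle.  It is not part of Gmaj: it assumes Python's standard
<code>zipfile</code> module as one way to produce a <code>.zip</code>
file, and the helper names <code>plain_name</code> and
<code>make_bundle</code> are hypothetical.  <code>make_bundle</code>
writes the parameters file first, stores every entry under its plain
filename, and refuses duplicate names or names containing
<code>/</code>, <code>\</code>, or <code>:</code>, while
<code>plain_name</code> imitates the path stripping described above by
cutting at the last of those three characters.

```python
import os
import zipfile

# Gmaj treats each of these characters as a path separator when it
# strips paths from bundle entry names (see the tips above).
SEPARATORS = "/\\:"


def plain_name(entry_name):
    """Return the part of a bundle entry name after the last /, \\ or :."""
    cut = max(entry_name.rfind(sep) for sep in SEPARATORS)  # -1 if none
    return entry_name[cut + 1:]


def make_bundle(bundle_path, params_file, data_files):
    """Write a .zip bundle whose first entry is the parameters file.

    Hypothetical helper.  Every entry is stored under its plain filename
    (no directories); these names must be unique and free of /, \\ and :.
    """
    paths = [params_file] + list(data_files)
    names = [os.path.basename(p) for p in paths]
    if len(set(names)) != len(names):
        raise ValueError("plain filenames in a bundle must be unique")
    for name in names:
        if any(sep in name for sep in SEPARATORS):
            raise ValueError("unsafe character in bundle filename: " + name)
    with zipfile.ZipFile(bundle_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path, name in zip(paths, names):
            zf.write(path, arcname=name)  # written in order: params first
    return names
```

With this helper, <code>make_bundle("demo.zip", "demo.gmaj",
["demo.maf"])</code> would produce a bundle whose first entry is
<code>demo.gmaj</code> (all three names are placeholders), and the
parameters file would then refer to the alignment simply as
<code>demo.maf</code>.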
<p>
As an alternative to bundling, data files can be compressed
individually in <code>.zip</code>, <code>.jar</code>, or
<code>.gz</code> format; this reduces their size for storage
and transfer, but still incurs the overhead of multiple HTTP
connections in applet mode.  The file name must end with the
corresponding extension for the compression format to be
recognized.  (Such files can also be included in the bundle
if desired; though little if any additional compression is
typically achieved, this may be more convenient than unzipping
a large file just to bundle it.)
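<p>
Because the compression format is recognized from the file name's
extension, a reader for such files can simply dispatch on the suffix.
The Python sketch below illustrates the idea; the function
<code>open_data_file</code> is hypothetical, and for <code>.zip</code>
and <code>.jar</code> it assumes the archive holds a single data file
and reads its first entry, a detail this page does not specify.  Text
is decoded as ASCII, in line with the plain-text requirement above.

```python
import gzip
import io
import zipfile


def open_data_file(path):
    """Open a data file as ASCII text, choosing decompression by suffix.

    Hypothetical helper: only the filename extension decides the format.
    Assumption: a .zip/.jar holds one data file, so the first entry is read.
    """
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="ascii")
    if path.endswith((".zip", ".jar")):
        with zipfile.ZipFile(path) as zf:
            data = zf.read(zf.namelist()[0])
        return io.StringIO(data.decode("ascii"))
    return open(path, "r", encoding="ascii")  # uncompressed plain text
```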
<p>

<p class=hdr>
<h3><a name="coord">Coordinate Systems</a></h3>
<p>
If you supply any annotations for Gmaj to display, these files
must all use position coordinates that refer to the same original
sequences identified in the MAF <a href="#align">alignment files</a>
(ignoring any display offsets specified in the <a href="#param"
>parameters file</a>).  However, even though the MAF coordinates
are 0-based, the PipMaker-style annotation files all use a
1-based, closed-interval coordinate system (i.e., the first
nucleotide in the sequence is called "1", and specified ranges
include both endpoints).  This is for consistency with PipMaker,
so the same files can be used with both programs, and the same
tools can be used to prepare them.  Coordinates for generic
annotations may be either 1-based or 0-based and closed or
half-open, depending on the format, but Gmaj always adjusts
them as needed (including the ones in the MAF files) to convert
everything to a 1-based, closed-interval system for display.
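<p>
The conversions involved are simple arithmetic.  The Python sketch below
(function names are hypothetical; this is not Gmaj's code) turns a
0-based, half-open range such as a BED interval into the 1-based,
closed form and back.  It also converts a MAF <code>s</code>-line range,
relying on a rule from the MAF specification rather than this page:
when the strand is <code>-</code>, the MAF start is counted on the
reverse-complemented sequence, so it must be flipped using the source
sequence size.

```python
def half_open_to_closed(start, end):
    """0-based, half-open [start, end) -> 1-based, closed [start, end]."""
    return start + 1, end


def closed_to_half_open(start, end):
    """1-based, closed [start, end] -> 0-based, half-open [start, end)."""
    return start - 1, end


def maf_to_closed(start, size, strand, src_size):
    """Convert the start/size/strand/srcSize fields of a MAF 's' line to a
    1-based, closed range on the forward strand.

    Per the MAF specification, start is 0-based and, when strand is '-',
    is counted on the reverse-complemented source sequence.
    """
    if strand == "-":
        start = src_size - (start + size)  # flip to forward-strand start
    return half_open_to_closed(start, start + size)
```

For example, a MAF row with start 0, size 10, and strand
<code>-</code> on a 100-base sequence covers the last ten bases,
i.e. positions 91 through 100 in the 1-based, closed system.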
<p>

<p class=hdr>
<h3><a name="align">Alignments</a></h3>
<p>
Gmaj is designed to display multiple-sequence alignments in
<a href="http://genome.ucsc.edu/FAQ/FAQformat">MAF</a> format.
It is especially suited for sequence-symmetric alignments from
programs such as <a href="http://www.bx.psu.edu/miller_lab/"
>TBA</a>, but can also display MAF files that have a fixed
reference sequence.  (In the latter case it is a good idea to
set the <code>refseq</code> field in your <a href="#param"
>parameters file</a>, to prevent displaying the alignments with
an inappropriate reference sequence.)  It is possible to display
several alignment files simultaneously on the same plots, e.g.,
for comparing output from different alignment programs.
<p>
Gmaj normally requires that each sequence name appear at most
once in each MAF block, i.e., that the values of the "src" field
are unique across all of the <code>s</code> lines within the
same block.  However, there is a special exception for the case
of pairwise self-alignments: if all of the blocks have just two
rows, then all of the sequence names can be the same.  In this
case Gmaj distinguishes the rows in each block by internally
adding a <code>~</code> suffix to the second row's sequence name;
the <code>~</code> does not show in the main display, but you may
occasionally see it in an error message.
<p>
The downside of this feature is that <b>sequence names in the MAF
files must not end with <code>~</code></b>, even for non-self
alignments.
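<p>
The naming rules above can be expressed as a short check over the
<code>src</code> fields of each block.  In this Python sketch the
functions <code>src_names</code> and <code>row_names</code> are
hypothetical, and it assumes (this page does not say) that the
<code>~</code> is added only when the file is purely pairwise and the
two names in a block are identical.

```python
def src_names(block_text):
    """Return the src field (second word) of each 's' line in one MAF block."""
    return [line.split()[1] for line in block_text.splitlines()
            if line.startswith("s ")]


def row_names(blocks):
    """Check sequence names for a list of MAF blocks and return the names
    used internally, adding '~' to the second row of self-alignment blocks.

    Assumption of this sketch: the '~' is added only when every block has
    exactly two rows and the two names in a block are identical.
    """
    pairwise = all(len(names) == 2 for names in blocks)
    result = []
    for names in blocks:
        for name in names:
            if name.endswith("~"):  # reserved for the internal suffix
                raise ValueError("sequence name ends with '~': " + name)
        names = list(names)
        if pairwise and names[0] == names[1]:
            names[1] += "~"  # self-alignment: tag the second row
        if len(set(names)) != len(names):
            raise ValueError("duplicate sequence name in block: %r" % names)
        result.append(names)
    return result
```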
<p>

<p class=hdr>
<h3><a name="exon">Exons</a></h3>
<p>
Each of these files lists the locations of genes, exons, and
coding regions in a particular reference sequence.  The exons
and UTRs are displayed as black and gray boxes in a separate
panel above the alignment plots.
<p>
In the PipMaker-style exons format, the directionality of a gene
(<code>&gt;</code>, <code>&lt;</code>, or <code>|</code>), its
start and end positions, and its name should be on one line, followed
by an optional line beginning with a <code>+</code> character that
indicates the first and last nucleotides of the translated region
(including the initiation codon, <i>Met</i>, and the stop codon).
These are followed by lines specifying the start and end positions
of each exon, which must be listed in order of increasing position
even if the gene is on the reverse strand (<code>&lt;</code>).  By
default Gmaj will supply exon numbers, but you can override this
by specifying your own name or number for individual exons.  Blank
lines are ignored, and you can put an optional title line at the
top.  Thus, the file might begin as follows:
<pre>
     My favorite genomic region

     &lt; 100 800 XYZZY
     + 150 750
     100 200
     600 800

     &gt; 1000 2000 Frobozz gene
     1000 1200 exon 1
     1400 1500 alt. spliced exon
     1800 2000 exon 2

     ... etc.
</pre>
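<p>
The layout rules above can be made concrete with a minimal Python parser
for the example.  It is a sketch, not Gmaj's reader: the function
<code>parse_exons</code> is hypothetical, it assumes that any non-blank
line before the first gene line is the title, and it leaves unnamed
exons as <code>None</code> rather than reproducing Gmaj's default
numbering.  It does enforce the increasing-order rule for exons.

```python
import re

GENE_LINE = re.compile(r"^([<>|])\s+(\d+)\s+(\d+)\s+(.*\S)\s*$")
CDS_LINE = re.compile(r"^\+\s+(\d+)\s+(\d+)\s*$")
EXON_LINE = re.compile(r"^(\d+)\s+(\d+)(?:\s+(.*\S))?\s*$")


def parse_exons(text):
    """Parse PipMaker-style exons text into (title, list of gene dicts)."""
    title, genes = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored
        m = GENE_LINE.match(line)
        if m:
            genes.append({"strand": m.group(1), "start": int(m.group(2)),
                          "end": int(m.group(3)), "name": m.group(4),
                          "cds": None, "exons": []})
            continue
        if not genes:
            # Assumption: text before the first gene line is the title.
            if title is not None:
                raise ValueError("unexpected line: " + line)
            title = line
            continue
        gene = genes[-1]
        m = CDS_LINE.match(line)
        if m and not gene["exons"]:  # '+' line comes before the exons
            gene["cds"] = (int(m.group(1)), int(m.group(2)))
            continue
        m = EXON_LINE.match(line)
        if not m:
            raise ValueError("unexpected line: " + line)
        start, end = int(m.group(1)), int(m.group(2))
        if gene["exons"] and start <= gene["exons"][-1][0]:
            raise ValueError("exons must be in increasing order: " + line)
        gene["exons"].append((start, end, m.group(3)))  # name may be None
    return title, genes
```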
<p>

<p class=hdr>
<h3><a name="repeat">Repeats</a></h3>
<p>
Each of these files lists interspersed repeats (and possibly other
features such as CpG islands) in a particular reference sequence.
These are displayed in a separate panel just below the exons,
using the same shapes and shading as PipMaker if possible.
<p>
In the PipMaker-style repeats format, the first line identifies
this as a simplified repeats file (as opposed to
<a href="http://www.repeatmasker.org/">RepeatMasker</a> output,
which Gmaj does not yet support).  Each subsequent line specifies
the start, end, direction, and type of an individual feature.
<pre>
     %:repeats

     1081 1364 Right Alu
     1365 1405 Simple
     ... etc.
</pre>
The allowed PipMaker types are:
<code>Alu</code>, <code>B1</code>, <code>B2</code>,
<code>SINE</code>, <code>LINE1</code>, <code>LINE2</code>,
<code>MIR</code>, <code>LTR</code>, <code>DNA</code>,
<code>RNA</code>, <code>Simple</code>, <code>CpG60</code>,
<code>CpG75</code>, and <code>Other</code>.  Of these, all except
<code>Simple</code>, <code>CpG60</code>, and <code>CpG75</code>
require a direction (<code>Right</code> or <code>Left</code>).
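<p>
A reader for this format mainly needs to check the header line, the
type names, and the direction rule.  The Python sketch below is
illustrative only: <code>parse_repeats</code> is a hypothetical name,
it skips blank lines as in the example, and it accepts but does not
require a direction for the three undirected types, since this page
does not say whether one is allowed there.

```python
REPEAT_TYPES = {"Alu", "B1", "B2", "SINE", "LINE1", "LINE2", "MIR", "LTR",
                "DNA", "RNA", "Simple", "CpG60", "CpG75", "Other"}
UNDIRECTED = {"Simple", "CpG60", "CpG75"}  # types that need no direction


def parse_repeats(text):
    """Parse a PipMaker-style simplified repeats file.

    Returns (start, end, direction, type) tuples; direction is None when
    the line gives none.  Assumption: blank lines may appear anywhere.
    """
    lines = [l.strip() for l in text.splitlines() if l.strip()]
    if not lines or lines[0] != "%:repeats":
        raise ValueError("first line must be %:repeats")
    features = []
    for line in lines[1:]:
        words = line.split()
        if len(words) == 3:
            start, end, kind = words
            direction = None
        elif len(words) == 4:
            start, end, direction, kind = words
        else:
            raise ValueError("bad repeat line: " + line)
        if kind not in REPEAT_TYPES:
            raise ValueError("unknown repeat type: " + kind)
        if direction is None and kind not in UNDIRECTED:
            raise ValueError(kind + " requires Right or Left")
        if direction not in (None, "Right", "Left"):
            raise ValueError("bad direction: " + direction)
        features.append((int(start), int(end), direction, kind))
    return features
```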
<p>

<p class=hdr>
<h3><a name="link">Linkbars</a></h3>
<p>
Each of these files contains reference annotations, i.e.,
noteworthy regions in a particular reference sequence, which are
drawn in a separate panel as colored bars.  Typically each bar
has an associated URL pointing to a web site with more information
about the region, but this is not required.  In applet mode Gmaj
opens a new browser window to visit the linked site when the user
clicks on a bar; in stand-alone mode Gmaj is not running within
a web browser, so it just displays the URL for the user to visit
manually via copy-and-paste.
<p>
The PipMaker-style format first defines various types of links
and associates a color with each of them, then specifies the type,
position, description, and URL for each annotated region.
<pre>
     # linkbars for part of the mouse MHC class II region

     %define type
     %name PubMed
     %color Blue

     %define type
     %name LocusLink
     %color Orange

     %define annotation
     %type PubMed
     %range 1 2000
     %label Yang et al. 1997.  Daxx, a novel Fas-binding protein...
     %summary Yang, X., Khosravi-Far, R., Chang, H., and Baltimore, D. (1997).
       Daxx, a novel Fas-binding protein that activates JNK and apoptosis.
       Cell 89(7):1067-76.
     %url http://www.ncbi.nlm.nih.gov:80/entrez/
     query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;list_uids=9215629&amp;dopt=Abstract

     ... etc.
</pre>
Here, for example, the first stanza requests that each feature
subsequently identified as a PubMed entry be colored blue.
The name must be a single word, perhaps containing underscore
characters (e.g., <code>Entry_in_GenBank</code>), and the color
must come from Gmaj's <a href="#color">color list</a>.
<p>
The third stanza associates a PubMed link with positions
1-2000 in this sequence.  The label should be kept fairly
short, as it will be displayed on Gmaj's position indicator line
when the user points at this linkbar.  The summary is optional;
it is used only by PipMaker and will be ignored by Gmaj.  Also,
while PipMaker allows several summary/URL pairs within a single
annotation, Gmaj expects each field to occur at most once.  If
Gmaj encounters extra URLs, it will just use the first one and
display a warning message.
<p>
Note that summaries and URLs (but not labels) can be broken into
several lines for convenience; the line breaks are removed when
the file is read, but they are not replaced with spaces.  Thus
a continuation line for a summary typically begins with a space
to separate it from the last word of the previous line, while
a URL continuation does not.
<p>
Also note that stanzas should be separated by blank lines, and
lines beginning with a <code>#</code> character are comments
that will be ignored.  The linkbars can appear in the file in
any order, and several can overlap at the same position with no
problem, since Gmaj will display them in multiple rows if
necessary.  In PipMaker this format is called "annotations with
hyperlinks".
<p>

<p class=hdr>
<h3><a name="under">Underlays</a></h3>
<p>
Each of these files specifies underlays (colored bands) to be
painted on a particular pairwise pip and its corresponding
dotplot.  The bands are specified as regions in the reference
sequence and are normally drawn vertically; however, for a dotplot,
Gmaj will also check whether you have specified an underlay file
for the transposed situation where the reference and secondary
sequences are swapped, and if so, will draw those underlays as
horizontal bands in the secondary sequence.
<p>
The PipMaker-style underlay format supported by Gmaj looks like
this:
<pre>
     # partial underlays for the BTK region

     LightYellow Gene
     Green Exon
     Red Strongly_conserved

     35324 72009 (BTK gene) Gene
     49781 49849 (exon 4) Exon
     51403 51484 Exon
     50350 50513 (conserved 84%) Strongly_conserved 84
     52376 52603 (Kilroy was here) Strongly_conserved 92 +
     ... etc.
</pre>
The first group of lines describes the intended meaning of the
colors, while the second group specifies the location of each band.
Colors must come from Gmaj's <a href="#color">color list</a>, but
the meaning of each color can be any single word chosen by you.
The text in parentheses is an optional label that will be
displayed on Gmaj's position indicator line when the user points
the mouse at that band.  The parentheses must be present if the
label is, and the label itself cannot contain any additional
parentheses.  The number following the color category is an
optional integer score that can be used to interactively adjust
which underlays are displayed; see "Underlays Box" in the
Menus and Widgets section of <a href="gmaj_help.html"
>Starting and Running Gmaj</a> for more information.  (The
label and score are extra features not supported by PipMaker.)
A <code>+</code> or <code>-</code> character at the end of a
location line will paint just the upper or lower half of the band
on the pip (but is ignored for dotplots).  This allows you to
differentiate between the two strands, or to plot potentially
overlapping features like gene predictions and database matches.
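<p>
Each location line thus has two required numbers, an optional
parenthesized label, a category, an optional score, and an optional
<code>+</code> or <code>-</code>.  The Python sketch below captures that
layout in one regular expression; <code>parse_underlays</code> is a
hypothetical name, and it assumes each category's color line appears
before the bands that use it, as in the example.

```python
import re

# start end [(label)] category [score] [+|-]
UNDERLAY_LINE = re.compile(
    r"^(\d+)\s+(\d+)"            # start and end (1-based, closed)
    r"(?:\s+\(([^()]*)\))?"      # optional label; no nested parentheses
    r"\s+(\S+)"                  # color category defined earlier
    r"(?:\s+(-?\d+))?"           # optional integer score
    r"(?:\s+([+-]))?\s*$")       # optional upper/lower half indicator


def parse_underlays(text):
    """Return (color_for_category, bands) from PipMaker-style underlay text."""
    colors, bands = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        m = UNDERLAY_LINE.match(line)
        if m:
            start, end, label, category, score, half = m.groups()
            if category not in colors:  # assumption: defined before use
                raise ValueError("undefined category: " + category)
            bands.append({"start": int(start), "end": int(end),
                          "label": label, "category": category,
                          "color": colors[category],
                          "score": None if score is None else int(score),
                          "half": half})
            continue
        words = line.split()
        if len(words) != 2:
            raise ValueError("bad underlay line: " + line)
        colors[words[1]] = words[0]  # e.g. "Green Exon"
    return colors, bands
```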
<p>
Note that if two bands overlap, the one that was specified last
in the file appears "on top" and obscures the earlier one (except
for the special <code><a href="#hatch">Hatch</a></code> color).
Thus in this example, the green exons and red strongly conserved
regions cover up parts of the long yellow band representing the
gene.  As in the linkbars file, lines beginning with a <code>#</code>
character are comments that will be ignored.
<p>

<p class=hdr>
<h3><a name="high">Highlights</a></h3>
<p>
Highlight files are analogous to the <a href="#under">underlay</a>
files, but each of these specifies colored regions for a
particular sequence in the text view, rather than for a plot.
If you do not specify a highlight file for a particular sequence,
Gmaj will automatically provide default highlights based on the
<a href="#exon">exons</a> file (if you provided one).  These will
use one color for whole genes, overlaid with different colors to
indicate exons on the forward vs. reverse strand.  If the exons
file specifies a gene's translated region, then the 5&acute; and
3&acute; UTRs will be shaded using lighter colors.  These default
highlights make it easy to examine the putative start/stop codons
and splice junctions, as well as providing a visual connection
between the graphical and text views.  But if for some reason you
do not want any text highlights, you can suppress them by
specifying an empty highlight file.
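<p>
Which parts of the exons count as 5&acute; or 3&acute; UTR depends on
the gene's direction.  The following Python sketch (hypothetical
function <code>utr_segments</code>; it does not reproduce Gmaj's actual
colors or drawing) splits a gene's exons at the translated region from
the <code>+</code> line and assigns the pieces to the 5&acute; and
3&acute; ends, using the 1-based, closed coordinates of the exons file.

```python
def utr_segments(strand, cds, exons):
    """Return the 5' and 3' UTR pieces of a gene's exons.

    strand is '>' or '<' as in the exons file, cds is the (first, last)
    translated position from the '+' line, and exons are (start, end)
    pairs in increasing order; all positions are 1-based and closed.
    """
    cds_start, cds_end = cds
    low, high = [], []  # exon pieces below / above the translated region
    for start, end in exons:
        if start < cds_start:
            low.append((start, min(end, cds_start - 1)))
        if end > cds_end:
            high.append((max(start, cds_end + 1), end))
    if strand == ">":  # forward gene: 5' end is at the low coordinates
        return {"5'": low, "3'": high}
    if strand == "<":  # reverse gene: 5' end is at the high coordinates
        return {"5'": high, "3'": low}
    raise ValueError("UTR ends are undefined for direction " + repr(strand))
```

Applied to the XYZZY gene from the exons example, this yields a
5&acute; UTR at 751-800 and a 3&acute; UTR at 100-149, because that
gene lies on the reverse strand.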
<p>
The PipMaker-style format for highlights is the same as for
underlays, except that any <code>+</code> or <code>-</code>
indicators will be ignored, and the <code>Hatch</code> color is
not supported for highlights.  Just as with underlays, labels
can be included that will be shown when the user points at
the highlight, scores can be used to limit which entries are
displayed, and highlights that are listed later in the file will
cover up those that appear earlier.
<p>

<p class=hdr>
<h3><a name="color">Color List</a></h3>
<p>
For Gmaj's PipMaker-style annotations, the available colors are:
<pre>
    Black   White        Clear
    Gray    LightGray    DarkGray
    Red     LightRed     DarkRed
    Green   LightGreen   DarkGreen
    Blue    LightBlue    DarkBlue
    Yellow  LightYellow  DarkYellow
    Pink    LightPink    DarkPink
    Cyan    LightCyan    DarkCyan
    Purple  LightPurple  DarkPurple
    Orange  LightOrange  DarkOrange
    Brown   LightBrown   DarkBrown
</pre>
These names are case-sensitive (i.e., capitalization matters).
Not all of these are supported by PipMaker.  Also, be aware that
the appearance of the colors may vary between PipMaker and Gmaj,
and from one printer or monitor to the next.
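<p>
Since the names are case-sensitive, an exact-match lookup is enough to
validate them.  In this Python sketch the set <code>GMAJ_COLORS</code>
is built from the table above, and the hypothetical helper
<code>check_color</code> accepts <code>Hatch</code> only when asked to,
because that special color is limited to underlays.

```python
# The 33 names from the table above, spelled exactly (case-sensitive).
BASE_HUES = ["Gray", "Red", "Green", "Blue", "Yellow", "Pink",
             "Cyan", "Purple", "Orange", "Brown"]
GMAJ_COLORS = {"Black", "White", "Clear"}
for hue in BASE_HUES:
    GMAJ_COLORS.update({hue, "Light" + hue, "Dark" + hue})


def check_color(name, allow_hatch=False):
    """Return name if it is a valid color; Hatch only where allowed."""
    if name in GMAJ_COLORS or (allow_hatch and name == "Hatch"):
        return name
    raise ValueError("unknown color (names are case-sensitive): " + name)
```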
<p class=subhdr>
<a name="hatch"><b><code>Hatch</code></b></a>
<p>
In addition to the regular colors listed above, Gmaj supports a
special "color" for underlays called <code>Hatch</code>, which
is drawn as a pattern of diagonal gray lines.  Normally if two
underlays overlap, the one that was specified last in the file
appears "on top" and obscures the earlier one.  However,
<code>Hatch</code> underlays have the special property that they
are always drawn after the other colors, and since the space
between the diagonal lines is transparent, they allow the other
colors to show through.  Currently <code>Hatch</code> is only
supported for underlays, not for highlights or linkbars.
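<p>
The drawing order for underlays can be modeled as a stable sort that
moves <code>Hatch</code> bands to the end.  The Python sketch below is a
simplified model, not Gmaj's rendering code: <code>paint_order</code>
and <code>top_color</code> are hypothetical, and a position is reported
as covered by the last solid color drawn there plus a flag for any
hatching on top.

```python
def paint_order(bands):
    """Order (color, start, end) bands for drawing: file order, Hatch last.

    sorted() is stable, so file order is kept within each group; later
    solid bands therefore cover earlier ones, and Hatch goes on top.
    """
    return sorted(bands, key=lambda band: band[0] == "Hatch")


def top_color(bands, pos):
    """Return (topmost solid color at pos or None, whether pos is hatched)."""
    solid, hatched = None, False
    for color, start, end in paint_order(bands):
        if start <= pos <= end:
            if color == "Hatch":
                hatched = True  # transparent gaps let the solid color show
            else:
                solid = color
    return solid, hatched
```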
<p>

<p class=hdr>
<h3><a name="generic">Generic Annotation Formats</a></h3>
<p>
The standardized generic formats currently supported by Gmaj
include
<a href="http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml"
>GFF</a> (v1 &amp; v2),
<a href="http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#GTF"
>GTF</a>, and various flavors of
<a href="http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#BED"
>BED</a> (including the full BED12 format, a.k.a. "gene BED").
For details on these formats, please see the specifications at
the above links; this document will mainly discuss their use
by Gmaj.
<p>
These formats are all <b>tab-separated</b>, and despite their
differences are similar enough that Gmaj can extract comparable
fields and treat them more or less the same.  Note that Gmaj is
not intended as a format validator: parsing is more lenient in
some respects than the official format specifications, and Gmaj
will ignore fields it has no use for.  Also, interpretation of
these open-ended formats depends partly on what type of annotation
is expected; e.g., if Gmaj is trying to read exons from a GFF v1
file, it will assume that the group field is the gene name.  It
will generally show warning messages to keep the user apprised
of any such assumptions it is making (if these become too annoying,
they can be individually suppressed in the <a href="#param"
>parameters file</a>; see <code><a href="sample.gmaj"
>sample.gmaj</a></code> for details).  Because one of the main
reasons for supporting these formats is to enable the use of
annotation files obtained from public sources, Gmaj tries not to
balk at anomalies that are probably not the user's fault, and
when practical will simply skip questionable items with a warning
message.  Each type of message will generally be displayed only
once, and not repeated for every item with the same problem.
<p>
<p class=subhdr>
<a name="fileext"><b>Filename Extensions</b></a>
<p>
In order to distinguish generic files from PipMaker-style ones
and handle them appropriately, Gmaj requires that files in
generic formats have names ending with one of a set of recognized
extensions.  The default list is <code>.gff</code>, <code>.gtf</code>,
<code>.bed</code>, <code>.ct</code>, and <code>.trk</code>, but
this can be customized (see <code><a href="sample.gmaj"
>sample.gmaj</a></code>).
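<p>
The check itself is a suffix test against the configured list.  In this
Python sketch, <code>is_generic</code> is a hypothetical helper whose
<code>extensions</code> argument stands in for a list customized in the
parameters file; the sketch does not address how such names interact
with compression suffixes such as <code>.gz</code>.

```python
# Default generic-format extensions listed above; a customized list from
# the parameters file would be passed as `extensions` instead.
DEFAULT_GENERIC_EXTENSIONS = (".gff", ".gtf", ".bed", ".ct", ".trk")


def is_generic(filename, extensions=DEFAULT_GENERIC_EXTENSIONS):
    """True if filename ends with one of the generic-format extensions."""
    return filename.endswith(tuple(extensions))
```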
| 519 | <p> | 
|---|
| 520 | <p class=subhdr> | 
|---|
| 521 | <a name="quote"><b>Quoting</b></a> | 
|---|
| 522 | <p> | 
|---|
| 523 | Some of the generic formats require text values to be enclosed | 
|---|
| 524 | in double quotes (<code>" "</code>).  Even when not strictly | 
|---|
| 525 | required it is usually a good idea to do so, especially if the | 
|---|
| 526 | value contains spaces.  The official specifications generally | 
|---|
| 527 | don't say what to do if a value contains embedded quote | 
|---|
| 528 | characters, but Gmaj supports a rudimentary mechanism for | 
|---|
| 529 | escaping them with a backslash (<code>\</code>).  However it | 
|---|
| 530 | does not provide for escaping the backslash: quoted values | 
|---|
| 531 | should not end with <code>\</code> (insert a space before the | 
|---|
| 532 | final quote if necessary). | 
|---|
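<p>
To see why a quoted value should not end with a backslash, consider a reader that treats <code>\"</code> as an escaped quote and every other backslash literally. This is a minimal sketch of the rule as described, not Gmaj's parser, and <code>read_quoted</code> is a hypothetical name.

```python
def read_quoted(text, pos):
    """Read a double-quoted value starting at text[pos] == '"'.

    Returns (value, index just past the closing quote). A backslash
    escapes only a following quote; any other backslash is kept
    literally, so there is no way to escape a backslash itself.
    """
    if text[pos] != '"':
        raise ValueError("expected an opening quote")
    out = []
    i = pos + 1
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] == '"':
            out.append('"')              # \" becomes a literal quote
            i += 2
        elif ch == '"':
            return "".join(out), i + 1   # closing quote
        else:
            out.append(ch)               # includes lone backslashes
            i += 1
    raise ValueError("unterminated quoted value")
```

With this rule, <code>"path\"</code> never terminates because its last quote is consumed as an escape, while <code>"path\ "</code> reads correctly.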
| 533 | <p> | 
|---|
| 534 | <p class=subhdr> | 
|---|
| 535 | <a name="empty"><b>Empty Fields</b></a> | 
|---|
| 536 | <p> | 
|---|
| 537 | When reading the generic formats, Gmaj treats two adjacent tab | 
|---|
| 538 | characters as an empty field.  However, your files will be easier | 
|---|
| 539 | for humans to read if you do not leave fields completely empty. | 
|---|
| 540 | Gmaj recognizes a value of <code>.</code> (the dot character) | 
|---|
| 541 | to mean "unspecified" for fields such as strand, score, feature, | 
|---|
| 542 | and color, in some cases even when the official formats don't. | 
|---|
| 543 | For instance, GFF v2 explicitly calls for using <code>.</code> | 
|---|
| 544 | when there is no score, but Gmaj allows you to do this with the | 
|---|
| 545 | other generic formats as well, in order to distinguish between | 
|---|
| 546 | "no score" and a score that is truly zero.  For colors, in | 
|---|
| 547 | addition to <code>.</code>, Gmaj also interprets <code>0</code> | 
|---|
| 548 | to mean "unspecified", in keeping with examples at UCSC. | 
|---|
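<p>
The empty-field and "unspecified" rules can be summarized in two small functions. This is a minimal sketch with hypothetical names. Mapping a completely empty field to "unspecified" is an assumption, because the text only says such fields are treated as empty.

```python
def split_fields(line):
    """Split a tab-delimited line; two adjacent tabs yield an empty field."""
    return line.rstrip("\n").split("\t")


def field_value(raw, is_color=False):
    """Return None for 'unspecified' values, else the raw string.

    '.' means unspecified (e.g. strand, score, feature, color); for colors
    '0' is treated the same way. Treating '' as unspecified is an
    assumption of this sketch.
    """
    if raw in ("", "."):
        return None
    if is_color and raw == "0":
        return None
    return raw
```

A score of <code>.</code> becomes "no score", while a score of <code>0</code> stays a real zero.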
| 549 | <p> | 
|---|
| 550 | <p class=subhdr> | 
|---|
| 551 | <a name="gencoord"><b>Coordinates</b></a> | 
|---|
| 552 | <p> | 
|---|
| 553 | The GFF and GTF formats use 1-based, closed-interval coordinates | 
|---|
| 554 | (i.e., sequence numbering starts with "1", and specified ranges | 
|---|
| 555 | include both endpoints), while BED uses a 0-based, half-open | 
|---|
| 556 | system (the first nucleotide of the sequence is numbered "0", | 
|---|
| 557 | and the ending position is not included in the region).  For all | 
|---|
| 558 | of these formats, positions are given relative to the beginning | 
|---|
| 559 | of the named sequence regardless of which strand the feature is | 
|---|
| 560 | on (unlike MAF), and <code>start</code> must be less than or | 
|---|
| 561 | equal to <code>end</code>. | 
|---|
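<p>
The two coordinate conventions differ only at the start position. For example, the first 100 bases of a sequence are <code>1</code>-<code>100</code> in GFF/GTF but <code>0</code>-<code>100</code> in BED. A minimal conversion sketch (function names are illustrative):

```python
def gff_to_bed(start, end):
    """1-based closed [start, end] -> 0-based half-open [start, end)."""
    if start > end:
        raise ValueError("start must be <= end")
    return start - 1, end


def bed_to_gff(start, end):
    """0-based half-open [start, end) -> 1-based closed [start, end]."""
    # A zero-length BED interval (start == end) has no closed-interval
    # equivalent, so it is rejected here.
    if start >= end:
        raise ValueError("BED start must be < end to convert")
    return start + 1, end
```

The region length is <code>end - start + 1</code> in GFF/GTF and <code>end - start</code> in BED, which gives the same number.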
| 562 | <p> | 
|---|
| 563 | <p class=subhdr> | 
|---|
| 564 | <a name="gffconv"><b>GFF Conventions</b></a> | 
|---|
| 565 | <p> | 
|---|
| 566 | BED format is relatively fixed in how its fields are used, but | 
|---|
| 567 | GFF and GTF are more variable and require additional conventions | 
|---|
| 568 | for most effective use with Gmaj.  In particular, the values of | 
|---|
| 569 | the "feature" field and the optional "attributes" affect how Gmaj | 
|---|
| 570 | will interpret and display an item. | 
|---|
| 571 | <p> | 
|---|
| 572 | Values of the feature field that are recognized for special | 
|---|
| 573 | treatment include: | 
|---|
| 574 | <p class=tiny> | 
|---|
| 575 | <ul class="notop nobottom"> | 
|---|
| 576 | <li>    <code>gene</code> or values starting with <code>gene_</code> | 
|---|
| 577 | <li>    <code>exon</code> or values starting with <code>exon_</code> | 
|---|
| 578 | <li>    <code>start_codon</code>, <code>str_codon</code>, | 
|---|
| 579 |         <code>stop_codon</code>, <code>stp_codon</code>, or | 
|---|
| 580 |         <code>cds</code> | 
|---|
| 581 | <li>    <code>repeatmasker</code> or any of the  | 
|---|
| 582 |         <a href="#repeat">PipMaker repeat or CpG types</a> | 
|---|
| 583 | </ul> | 
|---|
| 584 | <p class=tiny> | 
|---|
| 585 | Of these, only the PipMaker types are case-sensitive. | 
|---|
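<p>
The feature values listed above can be mapped to display categories as follows. This is a minimal sketch, not Gmaj's code. The category labels are hypothetical. <code>pipmaker_types</code> stands in for the PipMaker repeat/CpG type names, which are not repeated here, so the test uses a placeholder set.

```python
CDS_FEATURES = {"start_codon", "str_codon", "stop_codon", "stp_codon", "cds"}


def classify_feature(feature, pipmaker_types=frozenset()):
    """Map a GFF/GTF feature value to a category, or None if not special.

    pipmaker_types is compared case-sensitively; all other recognized
    values are compared case-insensitively.
    """
    if feature in pipmaker_types:
        return "repeat"
    f = feature.lower()
    if f == "gene" or f.startswith("gene_"):
        return "gene"
    if f == "exon" or f.startswith("exon_"):
        return "exon"
    if f in CDS_FEATURES:
        return "cds"
    if f == "repeatmasker":
        return "repeat"
    return None
```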
| 586 | <p> | 
|---|
| 587 | For GFF v2 and GTF, the currently recognized attribute tags are: | 
|---|
| 588 | <p class=tiny> | 
|---|
| 589 | <ul class="notop nobottom"> | 
|---|
| 590 | <li>    <code>gene</code> or <code>gene_id</code>: the name of the | 
|---|
| 591 |         gene, e.g. for grouping exons (<code>transcript_id</code> is | 
|---|
| 592 |         ignored) | 
|---|
| 593 | <li>    <code>name</code>: an optional name for this individual item, | 
|---|
| 594 |         e.g. for an exon label | 
|---|
| 595 | <li>    <code>sequence</code> (when feature is | 
|---|
| 596 |         <code>repeatmasker</code>): the name/class/family of the | 
|---|
| 597 |         repeat, e.g. <code>AluJb/SINE/Alu</code> | 
|---|
| 598 | <li>    <code>color</code>: a <a href="#gencolor">color</a> | 
|---|
| 599 |         specification in UCSC format, e.g. <code>0,0,255</code> | 
|---|
| 600 | <li>    <code>url</code> or <code>ucsc_id</code>: the URL for | 
|---|
| 601 |         linkbars; <code>$$</code> will be replaced with the value of | 
|---|
| 602 |         <code>name</code> | 
|---|
| 603 | </ul> | 
|---|
| 604 | <p class=tiny> | 
|---|
| 605 | These keywords are not case-sensitive, but they cannot have | 
|---|
| 606 | multiple values. | 
|---|
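<p>
Reading these tags from a GFF v2/GTF attribute column might look like the following. This is a minimal sketch under stated assumptions. The regular expression is a simplified reading of <code>tag value;</code> pairs, only the first value of a tag is kept, and <code>gene</code> is preferred over <code>gene_id</code> when both appear. All function names are hypothetical.

```python
import re

# One "tag value" pair: the value is a double-quoted string (\" allowed
# inside) or a bare token without spaces, quotes, or semicolons.
PAIR_RE = re.compile(r'([A-Za-z_]\w*)\s+("(?:[^"\\]|\\.)*"|[^;"\s]+)')


def parse_attributes(text):
    """Parse an attribute column into {lowercased tag: first value}."""
    attrs = {}
    for tag, value in PAIR_RE.findall(text):
        if value.startswith('"'):
            value = value[1:-1].replace('\\"', '"')
        attrs.setdefault(tag.lower(), value)   # keep the first occurrence
    return attrs


def interpret(attrs, feature):
    """Pick out the recognized tags; transcript_id is ignored."""
    name = attrs.get("name")
    url = attrs.get("url", attrs.get("ucsc_id"))
    if url is not None and name is not None:
        url = url.replace("$$", name)          # $$ -> value of name
    return {
        "gene": attrs.get("gene", attrs.get("gene_id")),
        "name": name,
        # sequence is only meaningful for repeatmasker features
        "repeat": attrs.get("sequence") if feature.lower() == "repeatmasker" else None,
        "color": attrs.get("color"),
        "url": url,
    }
```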
| 607 | <p> | 
|---|
| 608 | <p class=subhdr> | 
|---|
| 609 | <a name="custom"><b>Custom Tracks</b></a> | 
|---|
| 610 | <p> | 
|---|
| 611 | Along with the basic formats listed above, Gmaj also supports UCSC | 
|---|
| 612 | <a href="http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#CustomTracks" | 
|---|
| 613 | >custom track</a> headers. | 
|---|
| 614 | <a href="http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#TRACK" | 
|---|
| 615 | >Track lines</a> can specify certain settings for an entire | 
|---|
| 616 | track; currently <code><a href="#gencolor">color</a></code>, | 
|---|
| 617 | <code><a href="#gencolor">itemRgb</a></code>, <code>offset</code>, | 
|---|
| 618 | and <code>url</code> are supported.  They also allow several | 
|---|
| 619 | tracks (even in mixed formats) to be combined in a single file. | 
|---|
| 620 | Gmaj does not currently provide a way to use just one particular | 
|---|
| 621 | track from such a file (it will be treated as one big bag of | 
|---|
| 622 | annotations), but lines in unsupported formats such as | 
|---|
| 623 | <a href="http://genome.ucsc.edu/goldenPath/help/wiggle.html" | 
|---|
| 624 | >WIG</a> are gracefully skipped. | 
|---|
| 625 | <a href="http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#lines" | 
|---|
| 626 | >Browser lines</a> are also skipped; Gmaj's initial zoom position | 
|---|
| 627 | is controlled by command-line or applet parameters rather than by | 
|---|
| 628 | individual annotation files. | 
|---|
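<p>
A reader could pick out the supported track-line settings and skip browser lines as sketched below. This example assumes <code>shlex</code>-style quoting is close enough to UCSC's <code>key="value"</code> syntax. It does not attempt to detect WIG data lines. The function names are hypothetical.

```python
import shlex

SUPPORTED = {"color", "itemRgb", "offset", "url"}


def parse_track_line(line):
    """Return the supported settings from a UCSC 'track' line."""
    words = shlex.split(line)          # handles name="My Track" quoting
    if not words or words[0] != "track":
        raise ValueError("not a track line")
    settings = {}
    for word in words[1:]:
        key, sep, value = word.partition("=")
        if sep and key in SUPPORTED:   # other settings are ignored
            settings[key] = value
    if "offset" in settings:
        settings["offset"] = int(settings["offset"])
    return settings


def line_kind(line):
    """Classify a line as 'track', 'skip' (browser/comment/blank), or 'data'."""
    words = line.split()
    if not words or words[0].startswith("#") or words[0] == "browser":
        return "skip"
    if words[0] == "track":
        return "track"
    return "data"
```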
| 629 | <p> | 
|---|
| 630 | <p class=subhdr> | 
|---|
| 631 | <a name="multiseq"><b>Multiple Sequences</b></a> | 
|---|
| 632 | <p> | 
|---|
| 633 | Generic files can also contain annotations for several sequences, | 
|---|
| 634 | because unlike the PipMaker-style formats, they all have a | 
|---|
| 635 | "seqname" or "chrom" field that Gmaj can use to select the | 
|---|
| 636 | appropriate lines.  Ideally this field should match | 
|---|
| 637 | the sequence name from the <a href="#align">alignment files</a>, | 
|---|
| 638 | but Gmaj has two ways to deal with exceptions.  If there is only one | 
|---|
| 639 | seqname in the annotation file, then Gmaj will go ahead and use | 
|---|
| 640 | it, but will display a warning (unless the mismatch can be fixed | 
|---|
| 641 | by prepending the organism name, or the organism name plus | 
|---|
| 642 | <code>chr</code>, to the annotation seqname).  But if the file | 
|---|
| 643 | has annotations for several sequences and some don't match the | 
|---|
| 644 | alignment files, you need to tell Gmaj which is which by adding | 
|---|
| 645 | an alias in the <a href="#param">parameters file</a> (see | 
|---|
| 646 | <code><a href="sample.gmaj">sample.gmaj</a></code>). | 
|---|
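<p>
The matching rules above can be sketched as a single decision function. This sketch assumes alignment names of the form <code>organism.chrom</code> (e.g. <code>hg18.chr7</code>), so "prepending the organism name" is taken to mean adding <code>organism.</code>. The separator and the <code>aliases</code> dictionary, which stands in for aliases from the parameters file, are assumptions. The function name is hypothetical.

```python
def select_seqnames(annot_names, align_name, organism, aliases=None):
    """Decide which annotation seqnames belong to the aligned sequence.

    annot_names: set of seqname/chrom values found in the file.
    align_name:  alignment sequence name, e.g. "hg18.chr7" (assumed form).
    organism:    organism prefix of align_name, e.g. "hg18".
    aliases:     {annotation seqname: alignment name}.
    Returns (selected seqnames, whether to warn).
    """
    aliases = aliases or {}
    direct = {n for n in annot_names
              if n == align_name or aliases.get(n) == align_name}
    if direct or len(annot_names) != 1:
        return direct, False   # several seqnames: only matches/aliases count
    (only,) = annot_names
    # Single seqname: accept silently if prepending the organism name
    # (or organism name plus "chr") fixes the mismatch; "." is assumed.
    if align_name in (organism + "." + only, organism + ".chr" + only):
        return {only}, False
    return {only}, True        # use it anyway, but warn
```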
| 647 | <p> | 
|---|
| 648 | <p class=subhdr> | 
|---|
| 649 | <a name="reuse"><b>Reusing Files</b></a> | 
|---|
| 650 | <p> | 
|---|
| 651 | One of the advantages of using generic formats is that files can | 
|---|
| 652 | be reused in multiple panels without reformatting, e.g. as both | 
|---|
| 653 | exons and underlays.  Normally linkbars, underlays, and text | 
|---|
| 654 | highlights are simply handled as arbitrary regions of a specified | 
|---|
| 655 | color, since they could represent any type of biological feature. | 
|---|
| 656 | However, you can ask Gmaj to interpret them as exons or repeats | 
|---|
| 657 | by adding a type hint in the <a href="#param">parameters file</a> | 
|---|
| 658 | (see <code><a href="sample.gmaj">sample.gmaj</a></code>).  Note | 
|---|
| 659 | that currently this will also cause any <a href="#gencolor" | 
|---|
| 660 | >specified colors</a> in that file to be overridden with Gmaj's | 
|---|
| 661 | defaults. | 
|---|
| 662 | <p> | 
|---|
| 663 | Combining several biological types of annotations (e.g. exons | 
|---|
| 664 | and repeats) in one file is possible, but not recommended.  Gmaj | 
|---|
| 665 | will try to skip lines that are not appropriate for the type it | 
|---|
| 666 | is seeking, but it may draw more than you want. | 
|---|
| 667 | <p> | 
|---|
| 668 | <p class=subhdr> | 
|---|
| 669 | <a name="cds"><b>Coding Sequence</b></a> | 
|---|
| 670 | <p> | 
|---|
| 671 | Currently Gmaj has no special support for multiple transcripts. | 
|---|
| 672 | When inferring UTRs, all of the CDS-related items for a single | 
|---|
| 673 | gene name are combined, and the interval from the lowest | 
|---|
| 674 | coordinate to the highest is used as the CDS.  Also, some of the | 
|---|
| 675 | formats' rules specify whether or not the initiation and stop | 
|---|
| 676 | codons should be included in the CDS, but Gmaj does not make | 
|---|
| 677 | adjustments to compensate for that; instead it simply includes | 
|---|
| 678 | all of the given endpoints in the CDS. | 
|---|
| 679 | <!-- and leaves it up to the user to interpret the display based | 
|---|
| 680 | on the convention used in the files he/she provided.  [the user | 
|---|
| 681 | does not supply files for applets] --> | 
|---|
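<p>
The CDS rule (lowest to highest CDS-related coordinate per gene, with no transcript or codon adjustments) and the resulting UTR inference can be sketched as follows. The sketch uses 1-based closed coordinates. The tuple layout and function names are hypothetical.

```python
CDS_FEATURES = {"start_codon", "str_codon", "stop_codon", "stp_codon", "cds"}


def cds_spans(items):
    """items: (gene, feature, start, end) tuples, 1-based closed.

    Returns {gene: (lowest, highest)} over all CDS-related items,
    ignoring transcripts and codon inclusion rules.
    """
    spans = {}
    for gene, feature, start, end in items:
        if feature.lower() not in CDS_FEATURES:
            continue
        lo, hi = spans.get(gene, (start, end))
        spans[gene] = (min(lo, start), max(hi, end))
    return spans


def split_exon(start, end, cds):
    """Split a closed exon interval into ('utr'|'cds', start, end) pieces."""
    lo, hi = cds
    pieces = []
    if start < lo:
        pieces.append(("utr", start, min(end, lo - 1)))
    if start <= hi and end >= lo:
        pieces.append(("cds", max(start, lo), min(end, hi)))
    if end > hi:
        pieces.append(("utr", max(start, hi + 1), end))
    return pieces
```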
| 682 | <p> | 
|---|
| 683 | <p class=subhdr> | 
|---|
| 684 | <a name="gencolor"><b>Colors</b></a> | 
|---|
| 685 | <p> | 
|---|
| 686 | Colors can be specified for individual annotation lines via the | 
|---|
| 687 | <code>itemRgb</code> field (for BED) or a <code>color</code> | 
|---|
| 688 | attribute (for GFF v2 or GTF).  However, for <a href="#custom" | 
|---|
| 689 | >custom tracks</a>, these are governed by the track line's | 
|---|
| 690 | <code>itemRgb</code> attribute, which defaults to off per the | 
|---|
| 691 | UCSC specification.  Thus if you have track lines and want to | 
|---|
| 692 | use the per-item colors, you need to include | 
|---|
| 693 | <code>itemRgb=On</code> in the track attributes. | 
|---|
| 694 | <p> | 
|---|
| 695 | Track lines can also have a <code>color</code> attribute for | 
|---|
| 696 | the entire track, which will be used if <code>itemRgb</code> is | 
|---|
| 697 | off, or if an individual item does not have its own color. | 
|---|
| 698 | However, in a rare break from the UCSC specification, Gmaj does | 
|---|
| 699 | not use black as the default if the track color is unspecified | 
|---|
| 700 | (black underlays and highlights just don't work with black plots | 
|---|
| 701 | and text).  Instead it uses its own default colors, which for | 
|---|
| 702 | genes/exons are the same as the colors for <a href="#high" | 
|---|
| 703 | >default highlights</a>, or light gray for other annotations. | 
|---|
| 704 | Note that these defaults will also override your colors when | 
|---|
| 705 | <a href="#reuse">type hints</a> are used. | 
|---|
| 706 | <p> | 
|---|
| 707 | All of the above-mentioned color values are specified in UCSC | 
|---|
| 708 | format, which consists of three comma-separated RGB values from | 
|---|
| 709 | 0-255 (e.g. <code>0,0,255</code>). | 
|---|
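<p>
The color precedence described above can be summarized in code. This is a minimal sketch. The <code>default</code> argument stands in for Gmaj's own default colors, whose exact values are not given here. Comparing <code>itemRgb</code> to <code>On</code> case-insensitively is an assumption, and the function names are hypothetical.

```python
def parse_ucsc_color(text):
    """Parse 'R,G,B' (each 0-255) into a tuple; '.', '0', '' mean unspecified."""
    if text is None or text in ("", ".", "0"):
        return None
    parts = text.split(",")
    if len(parts) != 3:
        raise ValueError("expected three comma-separated values")
    rgb = tuple(int(p) for p in parts)
    if not all(0 <= v <= 255 for v in rgb):
        raise ValueError("RGB values must be 0-255")
    return rgb


def effective_color(item_color, default, track=None, type_hint=False):
    """Choose the display color following the rules above.

    track is None when the file has no track line; otherwise a dict of
    track settings such as {"itemRgb": "On", "color": "0,0,255"}.
    """
    if type_hint:
        return default                  # type hints override all colors
    if track is None:
        item_rgb_on = True              # no track line: per-item colors apply
    else:
        item_rgb_on = track.get("itemRgb", "").lower() == "on"
    item = parse_ucsc_color(item_color)
    if item_rgb_on and item is not None:
        return item
    if track is not None:
        track_rgb = parse_ucsc_color(track.get("color"))
        if track_rgb is not None:
            return track_rgb
    return default                      # Gmaj default, not UCSC black
```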
| 710 | <p> | 
|---|
| 711 | <p class=subhdr> | 
|---|
| 712 | <a name="sort"><b>Sorting</b></a> | 
|---|
| 713 | <p> | 
|---|
| 714 | The order of the lines is not supposed to matter in these generic | 
|---|
| 715 | formats, but for most of the Gmaj panels it does matter:  exons | 
|---|
| 716 | need to be grouped by gene and ordered by position so UTRs can be | 
|---|
| 717 | inferred and exon numbers assigned, early underlays are covered | 
|---|
| 718 | up by later ones, etc.  Gmaj solves this problem by sorting the | 
|---|
| 719 | data before it is displayed.  Exons are sorted first by gene name | 
|---|
| 720 | in ascending order, and then within each gene by start position | 
|---|
| 721 | (ascending), and lastly, in case of a tie, by end position | 
|---|
| 722 | (descending).  All other annotation types are sorted first by | 
|---|
| 723 | length in descending order, and then, in case of a tie, by start | 
|---|
| 724 | position (ascending).  This usually produces a reasonable display, | 
|---|
| 725 | but if you need direct control of the order, you can use the | 
|---|
| 726 | PipMaker-style formats instead. | 
|---|
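<p>
The two sort orders correspond to simple composite keys. This sketch uses a hypothetical <code>Item</code> record. Because longer items sort first and later items are drawn on top, shorter underlays remain visible over longer ones.

```python
from dataclasses import dataclass


@dataclass
class Item:
    gene: str
    start: int
    end: int


def sort_exons(exons):
    # gene name ascending, then start ascending, then end descending
    return sorted(exons, key=lambda x: (x.gene, x.start, -x.end))


def sort_other(items):
    # length descending, then start ascending; within one coordinate
    # system, end - start orders lengths the same as end - start + 1
    return sorted(items, key=lambda x: (-(x.end - x.start), x.start))
```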
| 727 | <p> | 
|---|
| 728 |  | 
|---|
| 729 | <p class=vvlarge> | 
|---|
| 730 | <hr> | 
|---|
| 731 | <i>Cathy Riemer, June 2008</i> | 
|---|
| 732 |  | 
|---|
| 733 | <p class=scrollspace> | 
|---|
| 734 | </body> | 
|---|
| 735 | </html> | 
|---|