1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" |
---|
2 | "http://www.w3.org/TR/html4/loose.dtd"> |
---|
3 | <html> |
---|
4 | <head> |
---|
5 | <title>Input Files for Gmaj</title> |
---|
6 | <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> |
---|
7 | <meta http-equiv="Content-Style-Type" content="text/css"> |
---|
8 | <link rel="stylesheet" type="text/css" href="gmaj.css"> |
---|
9 | </head> |
---|
10 | <body> |
---|
11 | <p class=vvlarge> |
---|
12 | <h2>Input Files for Gmaj</h2> |
---|
13 | <p class=vvlarge> |
---|
14 | TABLE OF CONTENTS |
---|
15 | <p class=small> |
---|
16 | <ul class=notop> |
---|
17 | <li><a href="#intro">Introduction</a> |
---|
18 | <li><a href="#param">Parameters File</a> |
---|
19 | <li><a href="#zip">Compression and Bundling</a> |
---|
20 | <li><a href="#coord">Coordinate Systems</a> |
---|
21 | <li><a href="#align">Alignments</a> |
---|
22 | <li><a href="#exon">Exons</a> |
---|
23 | <li><a href="#repeat">Repeats</a> |
---|
24 | <li><a href="#link">Linkbars</a> |
---|
25 | <li><a href="#under">Underlays</a> |
---|
26 | <li><a href="#high">Highlights</a> |
---|
27 | <li><a href="#color">Color List</a> |
---|
28 | <li><a href="#generic">Generic Annotation Formats</a> |
---|
29 | </ul> |
---|
30 | <p class=vlarge> |
---|
31 | |
---|
32 | <p class=hdr> |
---|
33 | <h3><a name="intro">Introduction</a></h3> |
---|
34 | <p> |
---|
35 | This page describes the input files supported by Gmaj, and their |
---|
36 | formats. Only the <a href="#align">alignment file</a> is |
---|
37 | required; the others are optional. Except where noted, all |
---|
38 | information applies to both the stand-alone and applet modes of |
---|
39 | Gmaj. |
---|
40 | <p> |
---|
41 | For annotations, Gmaj supports two broad categories of file |
---|
42 | formats. The original set of formats is essentially the same as |
---|
43 | those used by <a href="http://pipmaker.bx.psu.edu/pipmaker/" |
---|
44 | >PipMaker</a> and <a href="http://globin.bx.psu.edu/dist/laj/" |
---|
45 | >Laj</a>, where each destination for the data (exons panel, color |
---|
46 | underlays, etc.) has its own file format tailored for the needs of |
---|
47 | that display. These files can be cumbersome to prepare manually, |
---|
48 | though PipMaker's associated utilities, such as |
---|
49 | <a href="http://pipmaker.bx.psu.edu/piphelper/">PipHelper</a> and |
---|
50 | the <a href="http://pipmaker.bx.psu.edu/pipmaker/tools.html" |
---|
51 | >PipTools</a>, can significantly reduce the burden. |
---|
52 | <p> |
---|
53 | However, since sequence annotations are increasingly becoming |
---|
54 | available in standardized formats from on-line resources such as |
---|
55 | the <a href="http://genome.ucsc.edu/cgi-bin/hgTables">UCSC Table |
---|
56 | Browser</a>, Gmaj can now accept some of these formats as well. |
---|
57 | These are referred to here as "generic" formats because they are |
---|
58 | not restricted to a particular biological data type or Gmaj |
---|
59 | display panel. |
---|
60 | <p> |
---|
61 | The PipMaker-style formats are described below in the sections for |
---|
62 | each panel, while the generic ones are discussed in a separate |
---|
63 | section, <a href="#generic">Generic Annotation Formats</a>. |
---|
64 | <p class=large> |
---|
65 | <center> |
---|
66 | <table width=55%> |
---|
67 | <tr> |
---|
68 | <td valign=top align=right><img class=lower src="hand14.gif"> |
---|
69 | <!-- Pointing hand icon is from Clip Art Warehouse, |
---|
70 | at http://www.clipart.co.uk/ --> |
---|
71 | </td> |
---|
72 | <td valign=top> |
---|
73 | <ul class="notop nobottom lessindent"> |
---|
74 | <li> <b>All files must consist solely of plain text ASCII |
---|
75 | characters.</b> (For example, no Word documents.) |
---|
76 | <p class=small> |
---|
77 | <li> <b>All <a href="#coord">coordinates</a> for PipMaker-style |
---|
78 | annotations are 1-based, closed interval.</b> Those |
---|
79 | for generic annotations may be either 1-based or 0-based |
---|
80 | and closed or half-open, depending on the format. |
---|
81 | </ul> |
---|
82 | </td> |
---|
83 | </tr> |
---|
84 | </table> |
---|
85 | </center> |
---|
86 | <p> |
---|
87 | |
---|
88 | <p class=hdr> |
---|
89 | <h3><a name="param">Parameters File</a></h3> |
---|
90 | <p> |
---|
91 | Although the annotation files are optional, there can be a |
---|
92 | large number of them, because in some alignments any of the |
---|
93 | sequences can be viewed as the reference sequence. Typing all of |
---|
94 | their names on the command line or pasting them into a dialog |
---|
95 | box every time you want to view the data would be impractical. |
---|
96 | For this reason, Gmaj uses a meta-level <b>parameters file</b> |
---|
97 | that lists the names of all the data files, plus a few other |
---|
98 | data-related options. Then when running Gmaj, you only have to |
---|
99 | specify that one file name. However, if you don't want to use |
---|
100 | any of these annotations or options, you can specify a single |
---|
101 | <a href="#align">alignment file</a> directly in place of a |
---|
102 | parameters file. |
---|
103 | <p> |
---|
104 | A sample parameters file that you can use as a template is |
---|
105 | provided at <code><a href="sample.gmaj">sample.gmaj</a></code>. |
---|
106 | It contains detailed comments at the bottom explaining the syntax |
---|
107 | and meaning of the parameters. |
---|
108 | <p> |
---|
109 | |
---|
110 | <p class=hdr> |
---|
111 | <h3><a name="zip">Compression and Bundling</a></h3> |
---|
112 | <p> |
---|
113 | Gmaj supports a "bundle" option, which allows you to collect and |
---|
114 | compress some or all of the data files into a single file in |
---|
115 | <code>.zip</code> or <code>.jar</code> format (not |
---|
116 | <code>.tar</code>, sorry). This is especially useful for |
---|
117 | streamlining the applet's data download, but is also supported in |
---|
118 | stand-alone mode. A few tips: |
---|
119 | <ul> |
---|
120 | <li> If the <a href="#param">parameters file</a> is included in |
---|
121 | the bundle it must be the first file in it, since Gmaj reads |
---|
122 | the bundle sequentially and needs the parameters file to |
---|
123 | process the others. In this case, there is no need to |
---|
124 | mention the parameters file on the command line or in the |
---|
125 | applet tags; just specify the bundle. But if the parameters |
---|
126 | file is not in the bundle, specify both. |
---|
127 | <p class=small> |
---|
128 | <li> Data files in the bundle should be referred to within the |
---|
129 | parameters file using their plain filenames, without paths, |
---|
130 | and these must be unique. Any data files outside the bundle |
---|
131 | should be referred to normally, using the rules described in |
---|
132 | <code><a href="sample.gmaj">sample.gmaj</a></code>. |
---|
133 | <p class=small> |
---|
134 | <li> Do not use filenames containing <code>/</code>, |
---|
135 | <code>\</code>, or <code>:</code> in the bundle. Gmaj |
---|
136 | needs to remove the path that may have been added to each |
---|
137 | name by the zip or jar program, and since it doesn't know |
---|
138 | what platform that program was run on, it treats all of |
---|
139 | these characters as path separators. |
---|
140 | <p class=small> |
---|
141 | <li> If you are not using a parameters file (i.e., you want to |
---|
142 | specify the <a href="#align">alignment file</a> directly, |
---|
143 | without any annotations or other data-related options), |
---|
144 | then the alignment file must be listed in place of the |
---|
145 | parameters file, not as a bundle (there's nothing else |
---|
146 | to bundle with it anyway). |
---|
147 | </ul> |
---|
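<p>
The tips above can be sketched in Python as follows. This is only an illustration, not part of Gmaj: the function names <code>make_bundle</code> and <code>strip_entry_path</code> are hypothetical, and the second function merely mimics the path-stripping rule described in the third tip.

```python
import os
import re
import zipfile

def make_bundle(bundle_path, params_file, data_files):
    """Write a .zip bundle whose first entry is the parameters file."""
    members = [params_file] + list(data_files)
    names = [os.path.basename(m) for m in members]   # plain filenames, no paths
    if len(set(names)) != len(names):
        raise ValueError("bundled filenames must be unique")
    for name in names:
        if re.search(r"[/\\:]", name):
            raise ValueError(f"unsafe character in bundled name: {name!r}")
    with zipfile.ZipFile(bundle_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path, name in zip(members, names):
            zf.write(path, arcname=name)             # entries keep this order

def strip_entry_path(entry_name):
    """Mimic the rule above: slash, backslash and colon all act as separators."""
    return re.split(r"[/\\:]", entry_name)[-1]
```

Because the parameters file is written first, a reader that processes the bundle sequentially encounters it before any of the data files.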
148 | <p> |
---|
149 | As an alternative to bundling, data files can be compressed |
---|
150 | individually in <code>.zip</code>, <code>.jar</code>, or |
---|
151 | <code>.gz</code> format; this reduces their size for storage |
---|
152 | and transfer, but still incurs the overhead of multiple HTTP |
---|
153 | connections in applet mode. The file name must end with the |
---|
154 | corresponding extension for the compression format to be |
---|
155 | recognized. (Such files can also be included in the bundle |
---|
156 | if desired; though little if any additional compression is |
---|
157 | typically achieved, this may be more convenient than unzipping |
---|
158 | a large file just to bundle it.) |
---|
159 | <p> |
---|
160 | |
---|
161 | <p class=hdr> |
---|
162 | <h3><a name="coord">Coordinate Systems</a></h3> |
---|
163 | <p> |
---|
164 | If you supply any annotations for Gmaj to display, these files |
---|
165 | must all use position coordinates that refer to the same original |
---|
166 | sequences identified in the MAF <a href="#align">alignment files</a> |
---|
167 | (ignoring any display offsets specified in the <a href="#param" |
---|
168 | >parameters file</a>). However, even though the MAF coordinates |
---|
169 | are 0-based, the PipMaker-style annotation files all use a |
---|
170 | 1-based, closed-interval coordinate system (i.e., the first |
---|
171 | nucleotide in the sequence is called "1", and specified ranges |
---|
172 | include both endpoints). This is for consistency with PipMaker, |
---|
173 | so the same files can be used with both programs, and the same |
---|
174 | tools can be used to prepare them. Coordinates for generic |
---|
175 | annotations may be either 1-based or 0-based and closed or |
---|
176 | half-open, depending on the format, but Gmaj always adjusts |
---|
177 | them as needed (including the ones in the MAF files) to convert |
---|
178 | everything to a 1-based, closed-interval system for display. |
---|
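<p>
As a rough illustration of the MAF adjustment mentioned above, the following Python sketch converts one MAF row to a 1-based, closed interval on the forward strand. It relies on the MAF convention that <code>start</code> is 0-based and, for the <code>-</code> strand, counted on the reverse-complemented sequence. The function name is hypothetical, and display offsets are not considered.

```python
def maf_to_display(start, size, strand, src_size):
    """Convert a MAF row (start, size, strand, srcSize) to a 1-based,
    closed interval on the forward strand."""
    if strand == "-":
        # MAF counts minus-strand starts from the end of the sequence
        start = src_size - (start + size)
    return start + 1, start + size
```

For example, the first ten bases of a sequence (<code>start=0</code>, <code>size=10</code>, strand <code>+</code>) become positions 1 through 10.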
179 | <p> |
---|
180 | |
---|
181 | <p class=hdr> |
---|
182 | <h3><a name="align">Alignments</a></h3> |
---|
183 | <p> |
---|
184 | Gmaj is designed to display multiple-sequence alignments in |
---|
185 | <a href="http://genome.ucsc.edu/FAQ/FAQformat">MAF</a> format. |
---|
186 | It is especially suited for sequence-symmetric alignments from |
---|
187 | programs such as <a href="http://www.bx.psu.edu/miller_lab/" |
---|
188 | >TBA</a>, but can also display MAF files that have a fixed |
---|
189 | reference sequence. (In the latter case it is a good idea to |
---|
190 | set the <code>refseq</code> field in your <a href="#param" |
---|
191 | >parameters file</a>, to prevent displaying the alignments with |
---|
192 | an inappropriate reference sequence.) It is possible to display |
---|
193 | several alignment files simultaneously on the same plots, e.g. |
---|
194 | for comparing output from different alignment programs. |
---|
195 | <p> |
---|
196 | Gmaj normally requires that each sequence name appears at most |
---|
197 | once in each MAF block, i.e., that the values of the "src" field |
---|
198 | are unique across all of the <code>s</code> lines within the |
---|
199 | same block. However, there is a special exception for the case |
---|
200 | of pairwise self-alignments: if all of the blocks have just two |
---|
201 | rows, then all of the sequence names can be the same. In this |
---|
202 | case Gmaj distinguishes the rows in each block by internally |
---|
203 | adding a <code>~</code> suffix to the second row's sequence name; |
---|
204 | the <code>~</code> does not show in the main display, but you may |
---|
205 | occasionally see it in an error message. |
---|
206 | <p> |
---|
207 | The downside of this feature is that <b>sequence names in the MAF |
---|
208 | files must not end with <code>~</code></b>, even for non-self |
---|
209 | alignments. |
---|
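<p>
The naming rules above could be checked along these lines. This is a sketch under stated assumptions, not Gmaj's actual logic: the function name is hypothetical, each block is represented simply as a list of its "src" names, and the suffix is added only when the two names in a block are identical.

```python
def label_rows(blocks):
    """blocks: list of MAF blocks, each a list of 'src' names in row order.
    Returns the names made unique within each block, or raises ValueError."""
    for block in blocks:
        for name in block:
            if name.endswith("~"):
                raise ValueError(f"sequence name must not end with '~': {name}")
    pairwise = all(len(block) == 2 for block in blocks)
    result = []
    for block in blocks:
        if pairwise and block[0] == block[1]:
            # self-alignment exception: tag the second row internally
            block = [block[0], block[1] + "~"]
        elif len(set(block)) != len(block):
            raise ValueError(f"duplicate sequence name in block: {block}")
        result.append(list(block))
    return result
```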
210 | <p> |
---|
211 | |
---|
212 | <p class=hdr> |
---|
213 | <h3><a name="exon">Exons</a></h3> |
---|
214 | <p> |
---|
215 | Each of these files lists the locations of genes, exons, and |
---|
216 | coding regions in a particular reference sequence. The exons |
---|
217 | and UTRs are displayed as black and gray boxes in a separate |
---|
218 | panel above the alignment plots. |
---|
219 | <p> |
---|
220 | In the PipMaker-style exons format, the directionality of a gene |
---|
221 | (<code>&gt;</code>, <code>&lt;</code>, or <code>|</code>), its |
---|
222 | start and end positions, and name should be on one line, followed |
---|
223 | by an optional line beginning with a <code>+</code> character that |
---|
224 | indicates the first and last nucleotides of the translated region |
---|
225 | (including the initiation codon, <i>Met</i>, and the stop codon). |
---|
226 | These are followed by lines specifying the start and end positions |
---|
227 | of each exon, which must be listed in order of increasing address |
---|
228 | even if the gene is on the reverse strand (<code>&lt;</code>). By |
---|
229 | default Gmaj will supply exon numbers, but you can override this |
---|
230 | by specifying your own name or number for individual exons. Blank |
---|
231 | lines are ignored, and you can put an optional title line at the |
---|
232 | top. Thus, the file might begin as follows: |
---|
233 | <pre> |
---|
234 | My favorite genomic region |
---|
235 | |
---|
236 | &lt; 100 800 XYZZY |
---|
237 | + 150 750 |
---|
238 | 100 200 |
---|
239 | 600 800 |
---|
240 | |
---|
241 | &gt; 1000 2000 Frobozz gene |
---|
242 | 1000 1200 exon 1 |
---|
243 | 1400 1500 alt. spliced exon |
---|
244 | 1800 2000 exon 2 |
---|
245 | |
---|
246 | ... etc. |
---|
247 | </pre> |
---|
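<p>
To make the layout concrete, here is a minimal Python sketch of a reader for this format. It is not Gmaj's parser: the function name and the dictionary layout are hypothetical, fields are assumed to be separated by whitespace, and any unrecognized line before the first gene is treated as the title.

```python
def parse_exons(text):
    """Parse PipMaker-style exons text into a list of gene dictionaries."""
    genes = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue                                  # blank lines are ignored
        if fields[0] in (">", "<", "|"):              # direction start end name
            genes.append({"strand": fields[0],
                          "start": int(fields[1]), "end": int(fields[2]),
                          "name": " ".join(fields[3:]),
                          "cds": None, "exons": []})
        elif not genes:
            continue                                  # optional title line
        elif fields[0] == "+":                        # translated region
            genes[-1]["cds"] = (int(fields[1]), int(fields[2]))
        elif fields[0].isdigit():                     # start end [name]
            exon = (int(fields[0]), int(fields[1]), " ".join(fields[2:]) or None)
            exons = genes[-1]["exons"]
            if exons and exon[0] < exons[-1][0]:
                raise ValueError("exons must be in order of increasing address")
            exons.append(exon)
        else:
            raise ValueError(f"unrecognized line: {line!r}")
    return genes
```

An exon whose name is <code>None</code> here is one for which Gmaj would supply the number itself.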
248 | <p> |
---|
249 | |
---|
250 | <p class=hdr> |
---|
251 | <h3><a name="repeat">Repeats</a></h3> |
---|
252 | <p> |
---|
253 | Each of these files lists interspersed repeats (and possibly other |
---|
254 | features such as CpG islands) in a particular reference sequence. |
---|
255 | These are displayed in a separate panel just below the exons, |
---|
256 | using the same shapes and shading as PipMaker if possible. |
---|
257 | <p> |
---|
258 | In the PipMaker-style repeats format, the first line identifies |
---|
259 | this as a simplified repeats file (as opposed to |
---|
260 | <a href="http://www.repeatmasker.org/">RepeatMasker</a> output, |
---|
261 | which Gmaj does not yet support). Each subsequent line specifies |
---|
262 | the start, end, direction, and type of an individual feature. |
---|
263 | <pre> |
---|
264 | %:repeats |
---|
265 | |
---|
266 | 1081 1364 Right Alu |
---|
267 | 1365 1405 Simple |
---|
268 | ... etc. |
---|
269 | </pre> |
---|
270 | The allowed PipMaker types are: |
---|
271 | <code>Alu</code>, <code>B1</code>, <code>B2</code>, |
---|
272 | <code>SINE</code>, <code>LINE1</code>, <code>LINE2</code>, |
---|
273 | <code>MIR</code>, <code>LTR</code>, <code>DNA</code>, |
---|
274 | <code>RNA</code>, <code>Simple</code>, <code>CpG60</code>, |
---|
275 | <code>CpG75</code>, and <code>Other</code>. Of these, all except |
---|
276 | <code>Simple</code>, <code>CpG60</code>, and <code>CpG75</code> |
---|
277 | require a direction (<code>Right</code> or <code>Left</code>). |
---|
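<p>
The direction rule can be illustrated with a short Python sketch. The function name is hypothetical, fields are assumed to be whitespace-separated, and a direction on the three undirected types is tolerated here, which is an assumption rather than documented behavior.

```python
REPEAT_TYPES = {"Alu", "B1", "B2", "SINE", "LINE1", "LINE2", "MIR", "LTR",
                "DNA", "RNA", "Simple", "CpG60", "CpG75", "Other"}
UNDIRECTED = {"Simple", "CpG60", "CpG75"}

def parse_repeat_line(line):
    """Parse 'start end [direction] type' into a tuple."""
    fields = line.split()
    start, end = int(fields[0]), int(fields[1])
    rest = fields[2:]
    if len(rest) == 2:
        direction, rtype = rest
    elif len(rest) == 1:
        direction, rtype = None, rest[0]
    else:
        raise ValueError(f"malformed repeat line: {line!r}")
    if rtype not in REPEAT_TYPES:                 # names are matched exactly
        raise ValueError(f"unknown repeat type: {rtype}")
    if direction not in (None, "Right", "Left"):
        raise ValueError(f"bad direction: {direction}")
    if direction is None and rtype not in UNDIRECTED:
        raise ValueError(f"{rtype} requires a direction")
    return start, end, direction, rtype
```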
278 | <p> |
---|
279 | |
---|
280 | <p class=hdr> |
---|
281 | <h3><a name="link">Linkbars</a></h3> |
---|
282 | <p> |
---|
283 | Each of these files contains reference annotations, i.e., |
---|
284 | noteworthy regions in a particular reference sequence, which are |
---|
285 | drawn in a separate panel as colored bars. Typically each bar |
---|
286 | has an associated URL pointing to a web site with more information |
---|
287 | about the region, but this is not required. In applet mode Gmaj |
---|
288 | opens a new browser window to visit the linked site when the user |
---|
289 | clicks on a bar; in stand-alone mode Gmaj is not running within |
---|
290 | a web browser, so it just displays the URL for the user to visit |
---|
291 | manually via copy-and-paste. |
---|
292 | <p> |
---|
293 | The PipMaker-style format first defines various types of links |
---|
294 | and associates a color with each of them, then specifies the type, |
---|
295 | position, description, and URL for each annotated region. |
---|
296 | <pre> |
---|
297 | # linkbars for part of the mouse MHC class II region |
---|
298 | |
---|
299 | %define type |
---|
300 | %name PubMed |
---|
301 | %color Blue |
---|
302 | |
---|
303 | %define type |
---|
304 | %name LocusLink |
---|
305 | %color Orange |
---|
306 | |
---|
307 | %define annotation |
---|
308 | %type PubMed |
---|
309 | %range 1 2000 |
---|
310 | %label Yang et al. 1997. Daxx, a novel Fas-binding protein... |
---|
311 | %summary Yang, X., Khosravi-Far, R., Chang, H., and Baltimore, D. (1997). |
---|
312 | Daxx, a novel Fas-binding protein that activates JNK and apoptosis. |
---|
313 | Cell 89(7):1067-76. |
---|
314 | %url http://www.ncbi.nlm.nih.gov:80/entrez/ |
---|
315 | query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;list_uids=9215629&amp;dopt=Abstract |
---|
316 | |
---|
317 | ... etc. |
---|
318 | </pre> |
---|
319 | Here, for example, the first stanza requests that each feature |
---|
320 | subsequently identified as a PubMed entry be colored blue. |
---|
321 | The name must be a single word, perhaps containing underscore |
---|
322 | characters (e.g., <code>Entry_in_GenBank</code>), and the color |
---|
323 | must come from Gmaj's <a href="#color">color list</a>. |
---|
324 | <p> |
---|
325 | The third stanza associates a PubMed link with positions |
---|
326 | 1-2000 in this sequence. The label should be kept fairly |
---|
327 | short, as it will be displayed on Gmaj's position indicator line |
---|
328 | when the user points at this linkbar. The summary is optional; |
---|
329 | it is used only by PipMaker and will be ignored by Gmaj. Also, |
---|
330 | while PipMaker allows several summary/URL pairs within a single |
---|
331 | annotation, Gmaj expects each field to occur at most once. If |
---|
332 | Gmaj encounters extra URLs, it will just use the first one and |
---|
333 | display a warning message. |
---|
334 | <p> |
---|
335 | Note that summaries and URLs (but not labels) can be broken into |
---|
336 | several lines for convenience; the line breaks are removed when |
---|
337 | the file is read, but they are not replaced with spaces. Thus |
---|
338 | a continuation line for a summary typically begins with a space |
---|
339 | to separate it from the last word of the previous line, while |
---|
340 | a URL continuation does not. |
---|
341 | <p> |
---|
342 | Also note that stanzas should be separated by blank lines, and |
---|
343 | lines beginning with a <code>#</code> character are comments |
---|
344 | that will be ignored. The linkbars can appear in the file in |
---|
345 | any order, and several can overlap at the same position with no |
---|
346 | problem, since Gmaj will display them in multiple rows if |
---|
347 | necessary. In PipMaker this format is called "annotations with |
---|
348 | hyperlinks". |
---|
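<p>
The stanza and continuation rules described above can be sketched in Python as follows. This is an illustration only: the function name is hypothetical, and it assumes that any line not starting with <code>%</code> or <code>#</code> continues the preceding summary or URL.

```python
def parse_linkbars(text):
    """Split linkbars text into stanzas, each a list of [keyword, value]."""
    stanzas, current = [], []
    for line in text.splitlines():
        if line.startswith("#"):
            continue                              # comment line
        if not line.strip():
            if current:
                stanzas.append(current)           # blank line ends the stanza
                current = []
        elif line.startswith("%"):
            parts = line[1:].split(None, 1)
            value = parts[1].rstrip() if len(parts) > 1 else ""
            current.append([parts[0], value])
        elif current and current[-1][0] in ("summary", "url"):
            current[-1][1] += line                # joined with no space added
        else:
            raise ValueError(f"unexpected line: {line!r}")
    if current:
        stanzas.append(current)
    return stanzas
```

Since the line break is dropped without substitution, a summary continuation needs its own leading space, whereas a URL continuation is appended directly.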
349 | <p> |
---|
350 | |
---|
351 | <p class=hdr> |
---|
352 | <h3><a name="under">Underlays</a></h3> |
---|
353 | <p> |
---|
354 | Each of these files specifies underlays (colored bands) to be |
---|
355 | painted on a particular pairwise pip and its corresponding |
---|
356 | dotplot. The bands are specified as regions in the reference |
---|
357 | sequence and are normally drawn vertically; however, for a dotplot, |
---|
358 | Gmaj will also look to see if you have specified an underlay file |
---|
359 | for the transposed situation where the reference and secondary |
---|
360 | sequences are swapped, and if so, will draw those underlays as |
---|
361 | horizontal bands in the secondary sequence. |
---|
362 | <p> |
---|
363 | The PipMaker-style underlay format supported by Gmaj looks like |
---|
364 | this: |
---|
365 | <pre> |
---|
366 | # partial underlays for the BTK region |
---|
367 | |
---|
368 | LightYellow Gene |
---|
369 | Green Exon |
---|
370 | Red Strongly_conserved |
---|
371 | |
---|
372 | 35324 72009 (BTK gene) Gene |
---|
373 | 49781 49849 (exon 4) Exon |
---|
374 | 51403 51484 Exon |
---|
375 | 50350 50513 (conserved 84%) Strongly_conserved 84 |
---|
376 | 52376 52603 (Kilroy was here) Strongly_conserved 92 + |
---|
377 | ... etc. |
---|
378 | </pre> |
---|
379 | The first group of lines describes the intended meaning of the |
---|
380 | colors, while the second group specifies the location of each band. |
---|
381 | Colors must come from Gmaj's <a href="#color">color list</a>, but |
---|
382 | the meaning of each color can be any single word chosen by you. |
---|
383 | The text in parentheses is an optional label which will be |
---|
384 | displayed on Gmaj's position indicator line when the user points |
---|
385 | the mouse at that band. The parentheses must be present if the |
---|
386 | label is, and the label itself cannot contain any additional |
---|
387 | parentheses. The number following the color category is an |
---|
388 | optional integer score that can be used to interactively adjust |
---|
389 | which underlays are displayed; see "Underlays Box" in the |
---|
390 | Menus and Widgets section of <a href="gmaj_help.html" |
---|
391 | >Starting and Running Gmaj</a> for more information. (The |
---|
392 | label and score are extra features not supported by PipMaker.) |
---|
393 | A <code>+</code> or <code>-</code> character at the end of a |
---|
394 | location line will paint just the upper or lower half of the band |
---|
395 | on the pip (but is ignored for dotplots). This allows you to |
---|
396 | differentiate between the two strands, or to plot potentially |
---|
397 | overlapping features like gene predictions and database matches. |
---|
398 | <p> |
---|
399 | Note that if two bands overlap, the one that was specified last |
---|
400 | in the file appears "on top" and obscures the earlier one (except |
---|
401 | for the special <code><a href="#hatch">Hatch</a></code> color). |
---|
402 | Thus in this example, the green exons and red strongly conserved |
---|
403 | regions cover up parts of the long yellow band representing the |
---|
404 | gene. As in the links file, lines beginning with a <code>#</code> |
---|
405 | character are comments that will be ignored. |
---|
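<p>
One way to picture the structure of a location line is as a regular expression. The Python sketch below is an assumption about how such a line could be read, not Gmaj's implementation; the names <code>LOCATION</code> and <code>parse_location</code> are hypothetical.

```python
import re

LOCATION = re.compile(r"""
    ^\s*(\d+)\s+(\d+)        # start and end positions
    (?:\s*\(([^()]*)\))?     # optional label in parentheses
    \s+(\S+)                 # color category (a single word)
    (?:\s+(-?\d+))?          # optional integer score
    (?:\s+([+-]))?           # optional upper/lower half indicator
    \s*$""", re.VERBOSE)

def parse_location(line):
    """Return (start, end, label, category, score, half) for one band."""
    m = LOCATION.match(line)
    if m is None:
        raise ValueError(f"malformed underlay line: {line!r}")
    start, end, label, category, score, half = m.groups()
    return (int(start), int(end), label, category,
            int(score) if score is not None else None, half)
```

Fields that are absent from the line come back as <code>None</code>.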
406 | <p> |
---|
407 | |
---|
408 | <p class=hdr> |
---|
409 | <h3><a name="high">Highlights</a></h3> |
---|
410 | <p> |
---|
411 | Highlight files are analogous to the <a href="#under">underlay</a> |
---|
412 | files, but each of these specifies colored regions for a |
---|
413 | particular sequence in the text view, rather than for a plot. |
---|
414 | If you do not specify a highlight file for a particular sequence, |
---|
415 | Gmaj will automatically provide default highlights based on the |
---|
416 | <a href="#exon">exons</a> file (if you provided one). These will |
---|
417 | use one color for whole genes, overlaid with different colors to |
---|
418 | indicate exons on the forward vs. reverse strand. If the exons |
---|
419 | file specifies a gene's translated region, then the 5´ and |
---|
420 | 3´ UTRs will be shaded using lighter colors. These default |
---|
421 | highlights make it easy to examine the putative start/stop codons |
---|
422 | and splice junctions, as well as providing a visual connection |
---|
423 | between the graphical and text views. But if for some reason you |
---|
424 | do not want any text highlights, you can suppress them by |
---|
425 | specifying an empty highlight file. |
---|
426 | <p> |
---|
427 | The PipMaker-style format for highlights is the same as for |
---|
428 | underlays, except that any <code>+</code> or <code>-</code> |
---|
429 | indicators will be ignored, and the <code>Hatch</code> color is |
---|
430 | not supported for highlights. Just as with underlays, labels |
---|
431 | can be included which will be shown when the user points at |
---|
432 | the highlight, scores can be used to limit which entries are |
---|
433 | displayed, and highlights that are listed later in the file will |
---|
434 | cover up those that appear earlier. |
---|
435 | <p> |
---|
436 | |
---|
437 | <p class=hdr> |
---|
438 | <h3><a name="color">Color List</a></h3> |
---|
439 | <p> |
---|
440 | For Gmaj's PipMaker-style annotations, the available colors are: |
---|
441 | <pre> |
---|
442 | Black White Clear |
---|
443 | Gray LightGray DarkGray |
---|
444 | Red LightRed DarkRed |
---|
445 | Green LightGreen DarkGreen |
---|
446 | Blue LightBlue DarkBlue |
---|
447 | Yellow LightYellow DarkYellow |
---|
448 | Pink LightPink DarkPink |
---|
449 | Cyan LightCyan DarkCyan |
---|
450 | Purple LightPurple DarkPurple |
---|
451 | Orange LightOrange DarkOrange |
---|
452 | Brown LightBrown DarkBrown |
---|
453 | </pre> |
---|
454 | These names are case-sensitive (i.e., capitalization matters). |
---|
455 | Not all of these are supported by PipMaker. Also, be aware that |
---|
456 | the appearance of the colors may vary between PipMaker and Gmaj, |
---|
457 | and from one printer or monitor to the next. |
---|
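<p>
Because the names are case-sensitive, a file can be checked against the list before use. A minimal Python sketch (the names <code>COLORS</code> and <code>is_valid_color</code> are hypothetical):

```python
BASES = ["Gray", "Red", "Green", "Blue", "Yellow",
         "Pink", "Cyan", "Purple", "Orange", "Brown"]

# Each base color also has a Light and a Dark variant
COLORS = {"Black", "White", "Clear"} | {
    prefix + base for base in BASES for prefix in ("", "Light", "Dark")
}

def is_valid_color(name):
    """Exact, case-sensitive match against the list above."""
    return name in COLORS
```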
458 | <p class=subhdr> |
---|
459 | <a name="hatch"><b><code>Hatch</code></b></a> |
---|
460 | <p> |
---|
461 | In addition to the regular colors listed above, Gmaj supports a |
---|
462 | special "color" for underlays called <code>Hatch</code>, which |
---|
463 | is drawn as a pattern of diagonal gray lines. Normally if two |
---|
464 | underlays overlap, the one that was specified last in the file |
---|
465 | appears "on top" and obscures the earlier one. However, |
---|
466 | <code>Hatch</code> underlays have the special property that they |
---|
467 | are always drawn after the other colors, and since the space |
---|
468 | between the diagonal lines is transparent, they allow the other |
---|
469 | colors to show through. Currently <code>Hatch</code> is only |
---|
470 | supported for underlays, not for highlights or linkbars. |
---|
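<p>
The drawing order described above amounts to a stable sort that moves <code>Hatch</code> entries to the end while keeping file order otherwise. A Python sketch, with a hypothetical function name and bands represented as simple tuples:

```python
def paint_order(bands):
    """bands: (start, end, color) tuples in file order.
    Returns the order in which to draw them: file order, with Hatch last."""
    # sorted() is stable, so non-Hatch bands keep their relative order
    return sorted(bands, key=lambda band: band[2] == "Hatch")
```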
471 | <p> |
---|
472 | |
---|
473 | <p class=hdr> |
---|
474 | <h3><a name="generic">Generic Annotation Formats</a></h3> |
---|
475 | <p> |
---|
476 | The standardized generic formats currently supported by Gmaj |
---|
477 | include |
---|
478 | <a href="http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml" |
---|
479 | >GFF</a> (v1 &amp; v2), |
---|
480 | <a href="http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#GTF" |
---|
481 | >GTF</a>, and various flavors of |
---|
482 | <a href="http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#BED" |
---|
483 | >BED</a> (including the full BED12 format, a.k.a. "gene BED"). |
---|
484 | For details on these formats, please see the specifications at |
---|
485 | the above links; this document will mainly discuss their use |
---|
486 | by Gmaj. |
---|
487 | <p> |
---|
488 | These formats are all <b>tab-separated</b>, and despite their |
---|
489 | differences are similar enough that Gmaj can extract comparable |
---|
490 | fields and treat them more or less the same. Note that Gmaj is |
---|
491 | not intended as a format validator: parsing is more lenient in |
---|
492 | some respects than the official format specifications, and Gmaj |
---|
493 | will ignore fields it has no use for. Also, interpretation of |
---|
494 | these open-ended formats depends partly on what type of annotation |
---|
495 | is expected; e.g. if Gmaj is trying to read exons from a GFF v1 |
---|
496 | file, it will assume that the group field is the gene name. It |
---|
497 | will generally show warning messages to keep the user apprised |
---|
498 | of any such assumptions it is making (if these become too annoying |
---|
499 | they can be individually suppressed in the <a href="#param" |
---|
500 | >parameters file</a>; see <code><a href="sample.gmaj" |
---|
501 | >sample.gmaj</a></code> for details). Because one of the main |
---|
502 | reasons for supporting these formats is to enable the use of |
---|
503 | annotation files obtained from public sources, Gmaj tries not to |
---|
504 | balk at anomalies that are probably not the user's fault, and |
---|
505 | when practical will simply skip questionable items with a warning |
---|
506 | message. Each type of message will generally be displayed only |
---|
507 | once, and not repeated for every item with the same problem. |
---|
508 | <p> |
---|
509 | <p class=subhdr> |
---|
510 | <a name="fileext"><b>Filename Extensions</b></a> |
---|
511 | <p> |
---|
512 | In order to distinguish generic files from PipMaker-style ones |
---|
513 | and handle them appropriately, Gmaj requires that files in |
---|
514 | generic formats have names ending with any of certain extensions. |
---|
515 | The default list is <code>.gff</code>, <code>.gtf</code>, |
---|
516 | <code>.bed</code>, <code>.ct</code>, and <code>.trk</code>, but |
---|
517 | this can be customized (see <code><a href="sample.gmaj" |
---|
518 | >sample.gmaj</a></code>). |
---|
519 | <p> |
---|
520 | <p class=subhdr> |
---|
521 | <a name="quote"><b>Quoting</b></a> |
---|
522 | <p> |
---|
523 | Some of the generic formats require text values to be enclosed |
---|
524 | in double quotes (<code>" "</code>). Even when not strictly |
---|
525 | required it is usually a good idea to do so, especially if the |
---|
526 | value contains spaces. The official specifications generally |
---|
527 | don't say what to do if a value contains embedded quote |
---|
528 | characters, but Gmaj supports a rudimentary mechanism for |
---|
529 | escaping them with a backslash (<code>\</code>). However, it |
---|
530 | does not provide for escaping the backslash: quoted values |
---|
531 | should not end with <code>\</code> (insert a space before the |
---|
532 | final quote if necessary). |
---|
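<p>
The following Python sketch shows why a trailing backslash is a problem under such a rudimentary scheme. It is an assumed model of the behavior described above, not Gmaj's code, and the function name is hypothetical.

```python
def read_quoted(text):
    """Read a double-quoted value from the start of text.
    Returns (value, remainder). A quote preceded by a backslash is
    taken as an embedded quote; the backslash itself cannot be escaped."""
    if not text.startswith('"'):
        raise ValueError("value is not quoted")
    pos = 1
    while True:
        end = text.find('"', pos)
        if end == -1:
            raise ValueError("no closing quote found")
        if text[end - 1] != "\\":
            break                      # unescaped quote closes the value
        pos = end + 1                  # escaped quote: keep scanning
    value = text[1:end].replace('\\"', '"')
    return value, text[end + 1:]
```

With this model, a value whose last character is a backslash makes the final quote look escaped, so no closing quote is found; inserting a space before the final quote avoids that.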
533 | <p> |
---|
534 | <p class=subhdr> |
---|
535 | <a name="empty"><b>Empty Fields</b></a> |
---|
536 | <p> |
---|
537 | When reading the generic formats, Gmaj treats two adjacent tab |
---|
538 | characters as an empty field. However, your files will be easier |
---|
539 | for humans to read if you do not leave fields completely empty. |
---|
540 | Gmaj recognizes a value of <code>.</code> (the dot character) |
---|
541 | to mean "unspecified" for fields such as strand, score, feature, |
---|
542 | and color, in some cases even when the official formats don't. |
---|
543 | For instance, GFF v2 explicitly calls for using <code>.</code> |
---|
544 | when there is no score, but Gmaj allows you to do this with the |
---|
545 | other generic formats as well, in order to distinguish between |
---|
546 | "no score" and a score that is truly zero. For colors, in |
---|
547 | addition to <code>.</code> Gmaj also interprets <code>0</code> |
---|
548 | to mean "unspecified", in keeping with examples at UCSC. |
---|
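<p>
As an illustration of these conventions, the Python sketch below treats an empty field or <code>.</code> as "unspecified", and additionally <code>0</code> for colors. The function names are hypothetical, and representing scores as floating-point numbers is an assumption.

```python
def split_fields(line):
    """Two adjacent tabs yield an empty string for that field."""
    return line.rstrip("\n").split("\t")

def parse_score(field):
    """None means 'no score', which is distinct from a score of zero."""
    field = field.strip()
    return None if field in ("", ".") else float(field)

def parse_color(field):
    """Parse an 'R,G,B' color; '.', '0' or an empty field mean unspecified."""
    field = field.strip()
    if field in ("", ".", "0"):
        return None
    red, green, blue = (int(part) for part in field.split(","))
    return red, green, blue
```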
549 | <p> |
---|
550 | <p class=subhdr> |
---|
551 | <a name="gencoord"><b>Coordinates</b></a> |
---|
552 | <p> |
---|
553 | The GFF and GTF formats use 1-based, closed-interval coordinates |
---|
554 | (i.e., sequence numbering starts with "1", and specified ranges |
---|
555 | include both endpoints), while BED uses a 0-based, half-open |
---|
556 | system (the first nucleotide of the sequence is numbered "0", |
---|
557 | and the ending position is not included in the region). For all |
---|
558 | of these formats, positions are given relative to the beginning |
---|
559 | of the named sequence regardless of which strand the feature is |
---|
560 | on (unlike MAF), and <code>start</code> must be less than or |
---|
561 | equal to <code>end</code>. |
---|
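<p>
Converting these to the 1-based, closed-interval system used for display is a small adjustment, sketched here in Python (the function name and the format labels are hypothetical):

```python
def to_display(start, end, fmt):
    """Convert a (start, end) pair from a generic format to 1-based, closed."""
    if start > end:
        raise ValueError("start must be less than or equal to end")
    if fmt == "bed":
        return start + 1, end      # 0-based, half-open: shift the start only
    if fmt in ("gff", "gtf"):
        return start, end          # already 1-based, closed
    raise ValueError(f"unknown format: {fmt}")
```

Only the BED start changes: the excluded end position in a half-open interval has the same number as the last included position in a closed one.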
562 | <p> |
---|
563 | <p class=subhdr> |
---|
564 | <a name="gffconv"><b>GFF Conventions</b></a> |
---|
565 | <p> |
---|
566 | BED format is relatively fixed in how its fields are used, but |
---|
567 | GFF and GTF are more variable and require additional conventions |
---|
568 | for most effective use with Gmaj. In particular, the values of |
---|
569 | the "feature" field and the optional "attributes" affect how Gmaj |
---|
570 | will interpret and display an item. |
---|
571 | <p> |
---|
572 | Values of the feature field that are recognized for special |
---|
573 | treatment include: |
---|
574 | <p class=tiny> |
---|
575 | <ul class="notop nobottom"> |
---|
576 | <li> <code>gene</code> or values starting with <code>gene_</code> |
---|
577 | <li> <code>exon</code> or values starting with <code>exon_</code> |
---|
578 | <li> <code>start_codon</code>, <code>str_codon</code>, |
---|
579 | <code>stop_codon</code>, <code>stp_codon</code>, or |
---|
580 | <code>cds</code> |
---|
581 | <li> <code>repeatmasker</code> or any of the |
---|
582 | <a href="#repeat">PipMaker repeat or CpG types</a> |
---|
583 | </ul> |
---|
584 | <p class=tiny> |
---|
585 | Of these, only the PipMaker types are case-sensitive. |
---|
586 | <p> |
---|
587 | For GFF v2 and GTF, the currently recognized attribute tags are: |
---|
588 | <p class=tiny> |
---|
589 | <ul class="notop nobottom"> |
---|
590 | <li> <code>gene</code> or <code>gene_id</code>: the name of the |
---|
591 | gene, e.g. for grouping exons (<code>transcript_id</code> is |
---|
592 | ignored) |
---|
593 | <li> <code>name</code>: an optional name for this individual item, |
---|
594 | e.g. for an exon label |
---|
595 | <li> <code>sequence</code> (when feature is |
---|
596 | <code>repeatmasker</code>): the name/class/family of the |
---|
597 | repeat, e.g. <code>AluJb/SINE/Alu</code> |
---|
598 | <li> <code>color</code>: a <a href="#gencolor">color</a> |
---|
599 | specification in UCSC format, e.g. <code>0,0,255</code> |
---|
600 | <li> <code>url</code> or <code>ucsc_id</code>: the URL for |
---|
601 | linkbars; <code>$$</code> will be replaced with the value of |
---|
602 | <code>name</code> |
---|
603 | </ul> |
---|
604 | <p class=tiny> |
---|
605 | These keywords are not case-sensitive, but they cannot have |
---|
606 | multiple values. |
---|
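<p>
The attribute conventions above can be sketched in Python as
follows. This is only an approximation of how such a field might
be read, not Gmaj's actual parser: the regular expression, the
function names, and the <code>example.org</code> URL are all
assumptions made for illustration, and leaving <code>$$</code>
untouched when no <code>name</code> is present is likewise a guess.

```python
import re

# Tags the text lists as recognized; matching is not case-sensitive.
RECOGNIZED = {"gene", "gene_id", "name", "sequence", "color", "url", "ucsc_id"}

# One attribute: a tag followed by a single quoted or bare value.
ATTR = re.compile(r'\s*([A-Za-z_][\w.]*)\s+(?:"([^"]*)"|([^\s;]+))\s*(?:;|$)')


def parse_attributes(field):
    """Return the recognized attributes, keyed by lowercased tag."""
    attrs = {}
    for match in ATTR.finditer(field):
        tag = match.group(1).lower()
        value = match.group(2) if match.group(2) is not None else match.group(3)
        if tag in RECOGNIZED:
            attrs[tag] = value
    return attrs


def linkbar_url(attrs):
    """Fill the $$ placeholder with the item's name (assumed behavior if absent)."""
    url = attrs.get("url") or attrs.get("ucsc_id")
    if url is None:
        return None
    name = attrs.get("name")
    return url.replace("$$", name) if name is not None else url


# Hypothetical GTF-style attribute field.
field = ('gene_id "HBB"; transcript_id "HBB.1"; Name "HBB_e2"; '
         'color 0,0,255; url "http://example.org/item?id=$$"')
attrs = parse_attributes(field)
print(attrs)
print(linkbar_url(attrs))
```

<p>
Note that <code>transcript_id</code> is dropped, and that
<code>Name</code> is accepted despite its capitalization, matching
the rules stated above.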
607 | <p> |
---|
608 | <p class=subhdr> |
---|
609 | <a name="custom"><b>Custom Tracks</b></a> |
---|
610 | <p> |
---|
611 | Along with the basic formats listed above, Gmaj also supports UCSC |
---|
612 | <a href="http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#CustomTracks" |
---|
613 | >custom track</a> headers. |
---|
614 | <a href="http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#TRACK" |
---|
615 | >Track lines</a> can specify certain settings for an entire |
---|
616 | track; currently <code><a href="#gencolor">color</a></code>, |
---|
617 | <code><a href="#gencolor">itemRgb</a></code>, <code>offset</code>, |
---|
618 | and <code>url</code> are supported. They also allow several |
---|
619 | tracks (even in mixed formats) to be combined in a single file. |
---|
620 | Gmaj does not currently provide a way to use just one particular |
---|
621 | track from such a file (it will be treated as one big bag of |
---|
622 | annotations), but lines in unsupported formats such as |
---|
623 | <a href="http://genome.ucsc.edu/goldenPath/help/wiggle.html" |
---|
624 | >WIG</a> are gracefully skipped. |
---|
625 | <a href="http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#lines" |
---|
626 | >Browser lines</a> are also skipped; Gmaj's initial zoom position |
---|
627 | is controlled by command-line or applet parameters rather than by |
---|
628 | individual annotation files. |
---|
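<p>
For illustration, a track line can be split into its settings with
a few lines of Python. This sketch keeps only the four settings
named above and ignores everything else; the function name and the
sample track line are hypothetical, and Gmaj's own handling may
differ in details such as case sensitivity.

```python
import shlex

# Settings the text says are currently supported on track lines.
SUPPORTED = {"color", "itemRgb", "offset", "url"}


def parse_track_line(line):
    """Split a UCSC-style track line into its supported key=value settings."""
    tokens = shlex.split(line)  # keeps quoted values such as name="My Exons" whole
    if not tokens or tokens[0] != "track":
        raise ValueError("not a track line")
    settings = {}
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if sep and key in SUPPORTED:
            settings[key] = value
    return settings


# Hypothetical track line; "name" is valid at UCSC but not in the list above.
line = 'track name="My Exons" color=0,0,255 itemRgb=On offset=1000'
print(parse_track_line(line))
```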
629 | <p> |
---|
630 | <p class=subhdr> |
---|
631 | <a name="multiseq"><b>Multiple Sequences</b></a> |
---|
632 | <p> |
---|
633 | Generic files can also contain annotations for several sequences, |
---|
634 | because unlike the PipMaker-style formats, they all have a |
---|
635 | "seqname" or "chrom" field that Gmaj can use to select the |
---|
636 | appropriate lines. Ideally Gmaj expects this field to match |
---|
637 | the sequence name from the <a href="#align">alignment files</a>, |
---|
638 | but has two ways to deal with exceptions. If there is only one |
---|
639 | seqname in the annotation file, then Gmaj will go ahead and use |
---|
640 | it, but will display a warning (unless the mismatch can be fixed |
---|
641 | by prepending the organism name, or the organism name plus |
---|
642 | <code>chr</code>, to the annotation seqname). But if the file |
---|
643 | has annotations for several sequences and some don't match the |
---|
644 | alignment files, you need to tell Gmaj which is which by adding |
---|
645 | an alias in the <a href="#param">parameters file</a> (see |
---|
646 | <code><a href="sample.gmaj">sample.gmaj</a></code>). |
---|
647 | <p> |
---|
648 | <p class=subhdr> |
---|
649 | <a name="reuse"><b>Reusing Files</b></a> |
---|
650 | <p> |
---|
651 | One of the advantages of using generic formats is that files can |
---|
652 | be reused in multiple panels without reformatting, e.g. as both |
---|
653 | exons and underlays. Normally linkbars, underlays, and text |
---|
654 | highlights are simply handled as arbitrary regions of a specified |
---|
655 | color, since they could represent any type of biological feature. |
---|
656 | However, you can ask Gmaj to interpret them as exons or repeats |
---|
657 | by adding a type hint in the <a href="#param">parameters file</a> |
---|
658 | (see <code><a href="sample.gmaj">sample.gmaj</a></code>). Note |
---|
659 | that currently this will also cause any <a href="#gencolor" |
---|
660 | >specified colors</a> in that file to be overridden with Gmaj's |
---|
661 | defaults. |
---|
662 | <p> |
---|
663 | Combining several biological types of annotations (e.g. exons |
---|
664 | and repeats) in one file is possible, but not recommended. Gmaj |
---|
665 | will try to skip lines that are not appropriate for the type it |
---|
666 | is seeking, but it may draw more than you want. |
---|
667 | <p> |
---|
668 | <p class=subhdr> |
---|
669 | <a name="cds"><b>Coding Sequence</b></a> |
---|
670 | <p> |
---|
671 | Currently Gmaj has no special support for multiple transcripts. |
---|
672 | When inferring UTRs, all of the CDS-related items for a single |
---|
673 | gene name are combined, and the interval from the lowest |
---|
674 | coordinate to the highest is used as the CDS. Also, some of the |
---|
675 | formats' rules specify whether or not the initiation and stop |
---|
676 | codons should be included in the CDS, but Gmaj does not make |
---|
677 | adjustments to compensate for that; instead it simply includes |
---|
678 | all of the given endpoints in the CDS. |
---|
679 | <!-- and leaves it up to the user to interpret the display based |
---|
680 | on the convention used in the files he/she provided. [the user |
---|
681 | does not supply files for applets] --> |
---|
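<p>
The rule for combining CDS-related items can be sketched as
follows. The function name and the tuple layout are hypothetical;
the code only restates the rule above (one interval per gene name,
from the lowest coordinate to the highest) and makes no adjustment
for start or stop codons.

```python
def infer_cds(items):
    """Combine CDS-related items per gene into one (lowest, highest) interval.

    `items` is an iterable of (gene_name, start, end) tuples, which may
    come from several transcripts of the same gene.
    """
    bounds = {}
    for gene, start, end in items:
        if gene in bounds:
            low, high = bounds[gene]
            bounds[gene] = (min(low, start), max(high, end))
        else:
            bounds[gene] = (start, end)
    return bounds


# Hypothetical CDS-related items for two genes.
items = [("HBB", 200, 290), ("HBB", 50, 52), ("HBB", 400, 450), ("HBD", 10, 20)]
print(infer_cds(items))
```

<p>
Because items are grouped only by gene name, alternative
transcripts of one gene contribute to a single merged CDS interval.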
682 | <p> |
---|
683 | <p class=subhdr> |
---|
684 | <a name="gencolor"><b>Colors</b></a> |
---|
685 | <p> |
---|
686 | Colors can be specified for individual annotation lines via the |
---|
687 | <code>itemRgb</code> field (for BED) or a <code>color</code> |
---|
688 | attribute (for GFF v2 or GTF). However, for <a href="#custom" |
---|
689 | >custom tracks</a>, these are governed by the track line's |
---|
690 | <code>itemRgb</code> attribute, which defaults to off per the |
---|
691 | UCSC specification. Thus if you have track lines and want to |
---|
692 | use the per-item colors, you need to include |
---|
693 | <code>itemRgb=On</code> in the track attributes. |
---|
694 | <p> |
---|
695 | Track lines can also have a <code>color</code> attribute for |
---|
696 | the entire track, which will be used if <code>itemRgb</code> is |
---|
697 | off, or if an individual item does not have its own color. |
---|
698 | However, in a rare break from the UCSC specification, Gmaj does |
---|
699 | not use black as the default if the track color is unspecified |
---|
700 | (black underlays and highlights just don't work with black plots |
---|
701 | and text). Instead it uses its own default colors, which for |
---|
702 | genes/exons are the same as the colors for <a href="#high" |
---|
703 | >default highlights</a>, or light gray for other annotations. |
---|
704 | Note that these defaults will also override your colors when |
---|
705 | <a href="#reuse">type hints</a> are used. |
---|
706 | <p> |
---|
707 | All of the above-mentioned color values are specified in UCSC |
---|
708 | format, which consists of three comma-separated RGB values, each |
---|
709 | from 0 to 255 (e.g. <code>0,0,255</code>). |
---|
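<p>
The color rules in this section can be summarized in a short
Python sketch. The function names are hypothetical, and the
<code>LIGHT_GRAY</code> value is only a placeholder for Gmaj's
actual default colors, which are not given here. Treating
<code>.</code> and <code>0</code> as "unspecified" for the track
color as well as for item colors is an assumption. When a file has
no track lines, the caller would pass <code>item_rgb_on=True</code>.

```python
LIGHT_GRAY = (192, 192, 192)  # placeholder; Gmaj's real default may differ


def parse_color(text):
    """Parse a UCSC-style "R,G,B" string; "." or "0" means unspecified."""
    if text is None or text in (".", "0"):
        return None
    parts = [int(p) for p in text.split(",")]
    if len(parts) != 3 or not all(0 <= p <= 255 for p in parts):
        raise ValueError(f"bad color: {text!r}")
    return tuple(parts)


def resolve_color(item_color, track_color, item_rgb_on, default=LIGHT_GRAY):
    """Pick one item's color: item color, then track color, then default."""
    if item_rgb_on:
        color = parse_color(item_color)
        if color is not None:
            return color
    color = parse_color(track_color)
    return color if color is not None else default


print(resolve_color("255,0,0", "0,0,255", item_rgb_on=True))
print(resolve_color("255,0,0", "0,0,255", item_rgb_on=False))
print(resolve_color("0", None, item_rgb_on=True))
```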
710 | <p> |
---|
711 | <p class=subhdr> |
---|
712 | <a name="sort"><b>Sorting</b></a> |
---|
713 | <p> |
---|
714 | The order of the lines is not supposed to matter in these generic |
---|
715 | formats, but for most of the Gmaj panels it does matter: exons |
---|
716 | need to be grouped by gene and ordered by position so UTRs can be |
---|
717 | inferred and exon numbers assigned, early underlays are covered |
---|
718 | up by later ones, etc. Gmaj solves this problem by sorting the |
---|
719 | data before it is displayed. Exons are sorted first by gene name |
---|
720 | in ascending order, and then within each gene by start position |
---|
721 | (ascending), and lastly, in case of a tie, by end position |
---|
722 | (descending). All other annotation types are sorted first by |
---|
723 | length in descending order, and then, in case of a tie, by start |
---|
724 | position (ascending). This usually produces a reasonable display, |
---|
725 | but if you need direct control of the order, you can use the |
---|
726 | PipMaker-style formats instead. |
---|
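<p>
The two sort orders described above can be expressed as Python
sort keys. This is a sketch with hypothetical function names and
tuple layouts, not Gmaj's code; in particular, it compares gene
names as plain strings, which may not match Gmaj's ordering
exactly.

```python
def sort_exons(exons):
    """Sort (gene, start, end) tuples: gene, then start ascending, then end descending."""
    return sorted(exons, key=lambda e: (e[0], e[1], -e[2]))


def sort_other(items):
    """Sort (start, end) tuples: length descending, then start ascending."""
    return sorted(items, key=lambda r: (-(r[1] - r[0]), r[0]))


# Hypothetical data.
exons = [("HBD", 5, 9), ("HBB", 30, 40), ("HBB", 10, 20), ("HBB", 10, 25)]
others = [(100, 110), (0, 50), (20, 70), (5, 8)]
print(sort_exons(exons))
print(sort_other(others))
```

<p>
Sorting the other annotation types by descending length means
longer regions come first, so shorter underlays that come later in
the order are not hidden beneath them.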
727 | <p> |
---|
728 | |
---|
729 | <p class=vvlarge> |
---|
730 | <hr> |
---|
731 | <i>Cathy Riemer, June 2008</i> |
---|
732 | |
---|
733 | <p class=scrollspace> |
---|
734 | </body> |
---|
735 | </html> |
---|