<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
        "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Input Files for Gmaj</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta http-equiv="Content-Style-Type" content="text/css">
<link rel="stylesheet" type="text/css" href="gmaj.css">
</head>
<body>
<p class=vvlarge>
<h2>Input Files for Gmaj</h2>
<p class=vvlarge>
TABLE OF CONTENTS
<p class=small>
<ul class=notop>
<li><a href="#intro">Introduction</a>
<li><a href="#param">Parameters File</a>
<li><a href="#zip">Compression and Bundling</a>
<li><a href="#coord">Coordinate Systems</a>
<li><a href="#align">Alignments</a>
<li><a href="#exon">Exons</a>
<li><a href="#repeat">Repeats</a>
<li><a href="#link">Linkbars</a>
<li><a href="#under">Underlays</a>
<li><a href="#high">Highlights</a>
<li><a href="#color">Color List</a>
<li><a href="#generic">Generic Annotation Formats</a>
</ul>
<p class=vlarge>

<p class=hdr>
<h3><a name="intro">Introduction</a></h3>
<p>
This page describes the input files supported by Gmaj and their
formats.  Only the <a href="#align">alignment file</a> is
required; the others are optional.  Except where noted, all
information applies to both the stand-alone and applet modes of
Gmaj.
<p>
For annotations, Gmaj supports two broad categories of file
formats.  The original set of formats is essentially the same as
those used by <a href="http://pipmaker.bx.psu.edu/pipmaker/"
>PipMaker</a> and <a href="http://globin.bx.psu.edu/dist/laj/"
>Laj</a>, where each destination for the data (exons panel, color
underlays, etc.) has its own file format tailored for the needs of
that display.  These files can be cumbersome to prepare manually,
though PipMaker's associated utilities, such as
<a href="http://pipmaker.bx.psu.edu/piphelper/">PipHelper</a> and
the <a href="http://pipmaker.bx.psu.edu/pipmaker/tools.html"
>PipTools</a>, can significantly reduce the burden.
<p>
However, since sequence annotations are increasingly
available in standardized formats from on-line resources such as
the <a href="http://genome.ucsc.edu/cgi-bin/hgTables">UCSC Table
Browser</a>, Gmaj can now accept some of these formats as well.
These are referred to here as "generic" formats because they are
not restricted to a particular biological data type or Gmaj
display panel.
<p>
The PipMaker-style formats are described below in the sections for
each panel, while the generic ones are discussed in a separate
section, <a href="#generic">Generic Annotation Formats</a>.
<p class=large>
<center>
<table width=55%>
<tr>
<td valign=top align=right><img class=lower src="hand14.gif">
<!-- Pointing hand icon is from Clip Art Warehouse,
        at http://www.clipart.co.uk/ -->
</td>
<td valign=top>
<ul class="notop nobottom lessindent">
<li>    <b>All files must consist solely of plain text ASCII
        characters.</b>&nbsp; (For example, no Word documents.)
        <p class=small>
<li>    <b>All <a href="#coord">coordinates</a> for PipMaker-style
        annotations are 1-based, closed interval.</b>&nbsp; Those
        for generic annotations may be either 1-based or 0-based
        and closed or half-open, depending on the format.
</ul>
</td>
</tr>
</table>
</center>
<p>

<p class=hdr>
<h3><a name="param">Parameters File</a></h3>
<p>
The annotation files are optional, but because in some alignments
any of the sequences can be viewed as the reference sequence,
there may be a large number of annotation files to provide: too
many to type on the command line or paste into a dialog box every
time you want to view the data.
For this reason, Gmaj uses a meta-level <b>parameters file</b>
that lists the names of all the data files, plus a few other
data-related options.  Then when running Gmaj, you only have to
specify that one file name.  However, if you don't want to use
any of these annotations or options, you can specify a single
<a href="#align">alignment file</a> directly in place of a
parameters file.
<p>
A sample parameters file that you can use as a template is
provided at <code><a href="sample.gmaj">sample.gmaj</a></code>.
It contains detailed comments at the bottom explaining the syntax
and meaning of the parameters.
<p>

<p class=hdr>
<h3><a name="zip">Compression and Bundling</a></h3>
<p>
Gmaj supports a "bundle" option, which allows you to collect and
compress some or all of the data files into a single file in
<code>.zip</code> or <code>.jar</code> format (not
<code>.tar</code>, sorry).  This is especially useful for
streamlining the applet's data download, but is also supported in
stand-alone mode.  A few tips:
<ul>
<li>    If the <a href="#param">parameters file</a> is included in
        the bundle, it must be the first file in it, since Gmaj reads
        the bundle sequentially and needs the parameters file to
        process the others.  In this case, there is no need to
        mention the parameters file on the command line or in the
        applet tags; just specify the bundle.  But if the parameters
        file is not in the bundle, specify both.
        <p class=small>
<li>    Data files in the bundle should be referred to within the
        parameters file using their plain filenames, without paths,
        and these must be unique.  Any data files outside the bundle
        should be referred to normally, using the rules described in
        <code><a href="sample.gmaj">sample.gmaj</a></code>.
        <p class=small>
<li>    Do not use filenames containing <code>/</code>,
        <code>\</code>, or <code>:</code> in the bundle.  Gmaj
        needs to remove the path that may have been added to each
        name by the zip or jar program, and since it doesn't know
        what platform that program was run on, it treats all of
        these characters as path separators.
        <p class=small>
<li>    If you are not using a parameters file (i.e., you want to
        specify the <a href="#align">alignment file</a> directly,
        without any annotations or other data-related options),
        then the alignment file must be listed in place of the
        parameters file, not as a bundle (there's nothing else
        to bundle with it anyway).
</ul>
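<p>
The bundle rules above can be illustrated with a short Python sketch.
It is not Gmaj's own code; the names <code>plain_name</code> and
<code>read_bundle</code> are hypothetical.  It reads a <code>.zip</code>
or <code>.jar</code> bundle in archive order, strips any path using
<code>/</code>, <code>\</code>, and <code>:</code> alike, rejects
duplicate plain filenames, and reports which file came first (the one
that would have to be the parameters file).

```python
import re
import zipfile

# The creating platform is unknown, so all three characters are treated
# as path separators.
_SEPARATORS = re.compile(r"[/\\:]")


def plain_name(entry_name):
    """Return an archive entry name with any leading path removed."""
    return _SEPARATORS.split(entry_name)[-1]


def read_bundle(source):
    """Read a .zip/.jar bundle (a path or binary file object) in order.

    Returns (first_name, contents): the plain name of the first file in
    the archive, and a dict mapping plain names to their bytes.
    """
    contents = {}
    first_name = None
    with zipfile.ZipFile(source) as bundle:
        for info in bundle.infolist():  # entries in archive order
            if info.is_dir():
                continue
            name = plain_name(info.filename)
            if name in contents:
                raise ValueError(f"plain filename not unique in bundle: {name}")
            contents[name] = bundle.read(info)
            if first_name is None:
                first_name = name
    return first_name, contents
```

<p>
A caller would parse the <code>first_name</code> entry as the parameters
file before looking up the other data files by their plain names.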
<p>
As an alternative to bundling, data files can be compressed
individually in <code>.zip</code>, <code>.jar</code>, or
<code>.gz</code> format; this reduces their size for storage
and transfer, but still incurs the overhead of multiple HTTP
connections in applet mode.  The file name must end with the
corresponding extension for the compression format to be
recognized.  (Such files can also be included in the bundle
if desired; though little if any additional compression is
typically achieved, this may be more convenient than unzipping
a large file just to bundle it.)
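<p>
Recognizing compression purely by filename extension can be sketched in
Python as follows.  This is an illustration, not Gmaj's code: the name
<code>open_data_file</code> is hypothetical, and the sketch assumes that
an individually zipped file contains exactly one entry, which the text
above does not specify.  Decoding as ASCII follows the plain-text ASCII
requirement stated in the Introduction.

```python
import gzip
import io
import zipfile


def open_data_file(path):
    """Open a data file as ASCII text, choosing a decompressor by extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="ascii")
    if path.endswith((".zip", ".jar")):
        # Assumption: an individually compressed file holds a single entry.
        with zipfile.ZipFile(path) as archive:
            data = archive.read(archive.infolist()[0])
        return io.StringIO(data.decode("ascii"))
    return open(path, "r", encoding="ascii")  # uncompressed
```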
<p>

<p class=hdr>
<h3><a name="coord">Coordinate Systems</a></h3>
<p>
If you supply any annotations for Gmaj to display, these files
must all use position coordinates that refer to the same original
sequences identified in the MAF <a href="#align">alignment files</a>
(ignoring any display offsets specified in the <a href="#param"
>parameters file</a>).  However, even though the MAF coordinates
are 0-based, the PipMaker-style annotation files all use a
1-based, closed-interval coordinate system (i.e., the first
nucleotide in the sequence is called "1", and specified ranges
include both endpoints).  This is for consistency with PipMaker,
so the same files can be used with both programs, and the same
tools can be used to prepare them.  Coordinates for generic
annotations may be either 1-based or 0-based and closed or
half-open, depending on the format, but Gmaj always adjusts
them as needed (including the ones in the MAF files) to convert
everything to a 1-based, closed-interval system for display.
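<p>
For MAF, the adjustment follows from the UCSC MAF format description:
<code>start</code> is 0-based, and on the <code>-</code> strand it is
counted from the end of the source sequence.  A minimal Python sketch of
that arithmetic (not Gmaj's code; <code>maf_row_interval</code> is a
hypothetical name) turns an <code>s</code> line into a forward-strand,
1-based, closed interval:

```python
def maf_row_interval(s_line):
    """Return (src, first, last) for a MAF 's' line, where first..last is a
    1-based, closed interval on the forward strand of the source sequence."""
    fields = s_line.split()
    if len(fields) < 7 or fields[0] != "s":
        raise ValueError("not a MAF 's' line")
    src = fields[1]
    start, size = int(fields[2]), int(fields[3])
    strand, src_size = fields[4], int(fields[5])
    if strand == "-":
        # Reverse-strand starts are counted from the end of the sequence;
        # convert to a 0-based forward-strand start.
        start = src_size - (start + size)
    # 0-based half-open [start, start + size) covers the same bases as
    # 1-based closed [start + 1, start + size].
    return src, start + 1, start + size
```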
<p>

<p class=hdr>
<h3><a name="align">Alignments</a></h3>
<p>
Gmaj is designed to display multiple-sequence alignments in
<a href="http://genome.ucsc.edu/FAQ/FAQformat">MAF</a> format.
It is especially suited for sequence-symmetric alignments from
programs such as <a href="http://www.bx.psu.edu/miller_lab/"
>TBA</a>, but can also display MAF files that have a fixed
reference sequence.  (In the latter case it is a good idea to
set the <code>refseq</code> field in your <a href="#param"
>parameters file</a>, to prevent the alignments from being displayed
with an inappropriate reference sequence.)  It is possible to display
several alignment files simultaneously on the same plots, e.g.
for comparing output from different alignment programs.
<p>
Gmaj normally requires that each sequence name appear at most
once in each MAF block, i.e., that the values of the "src" field
be unique across all of the <code>s</code> lines within the
same block.  However, there is a special exception for the case
of pairwise self-alignments: if all of the blocks have just two
rows, then all of the sequence names can be the same.  In this
case Gmaj distinguishes the rows in each block by internally
adding a <code>~</code> suffix to the second row's sequence name;
the <code>~</code> does not show in the main display, but you may
occasionally see it in an error message.
<p>
The downside of this feature is that <b>sequence names in the MAF
files must not end with <code>~</code></b>, even for non-self
alignments.
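<p>
The naming rules above can be sketched in Python as a check over the
<code>src</code> names of each block.  This is an illustration only; the
name <code>row_names</code> is hypothetical, and the sketch assumes the
<code>~</code> suffix is added only when the two names in a block are
identical, which the text does not state explicitly.

```python
def row_names(blocks):
    """Apply Gmaj's sequence-name rules to MAF blocks.

    blocks: a list of blocks, each a list of "src" names from its s lines.
    Returns the names, with the internal "~" suffix added where the
    pairwise self-alignment exception applies.
    """
    for names in blocks:
        for name in names:
            if name.endswith("~"):
                raise ValueError(f"sequence name must not end with '~': {name}")
    # The exception applies only if every block has exactly two rows.
    all_pairwise = all(len(names) == 2 for names in blocks)
    result = []
    for names in blocks:
        if all_pairwise and names[0] == names[1]:
            result.append([names[0], names[1] + "~"])
        elif len(set(names)) != len(names):
            raise ValueError(f"sequence name repeated within a block: {names}")
        else:
            result.append(list(names))
    return result
```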
<p>

<p class=hdr>
<h3><a name="exon">Exons</a></h3>
<p>
Each of these files lists the locations of genes, exons, and
coding regions in a particular reference sequence.  The exons
and UTRs are displayed as black and gray boxes in a separate
panel above the alignment plots.
<p>
In the PipMaker-style exons format, the directionality of a gene
(<code>&gt;</code>, <code>&lt;</code>, or <code>|</code>), its
start and end positions, and its name should be on one line, followed
by an optional line beginning with a <code>+</code> character that
indicates the first and last nucleotides of the translated region
(including the initiation codon, <i>Met</i>, and the stop codon).
These are followed by lines specifying the start and end positions
of each exon, which must be listed in order of increasing position
even if the gene is on the reverse strand (<code>&lt;</code>).  By
default Gmaj will supply exon numbers, but you can override this
by specifying your own name or number for individual exons.  Blank
lines are ignored, and you can put an optional title line at the
top.  Thus, the file might begin as follows:
<pre>
     My favorite genomic region

     < 100 800 XYZZY
     + 150 750
     100 200
     600 800

     > 1000 2000 Frobozz gene
     1000 1200 exon 1
     1400 1500 alt. spliced exon
     1800 2000 exon 2

     ... etc.
</pre>
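<p>
A minimal Python sketch of a reader for this layout, assuming one item
per line exactly as in the example (the function name and the returned
dictionary keys are hypothetical, not part of Gmaj).  Exons without a
name are left as <code>None</code>, because the text does not say how
Gmaj numbers exons on the reverse strand.

```python
import re

GENE_RE = re.compile(r"([<>|])\s+(\d+)\s+(\d+)\s*(.*)")
CDS_RE = re.compile(r"\+\s+(\d+)\s+(\d+)")
EXON_RE = re.compile(r"(\d+)\s+(\d+)\s*(.*)")


def parse_exons(text):
    """Parse a PipMaker-style exons file into (title, genes)."""
    title, genes = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored
        m = GENE_RE.fullmatch(line)
        if m:
            genes.append({"dir": m[1], "start": int(m[2]), "end": int(m[3]),
                          "name": m[4], "cds": None, "exons": []})
            continue
        m = CDS_RE.fullmatch(line)
        if m and genes and not genes[-1]["exons"]:
            genes[-1]["cds"] = (int(m[1]), int(m[2]))  # translated region
            continue
        m = EXON_RE.fullmatch(line)
        if m and genes:
            exons = genes[-1]["exons"]
            start, end = int(m[1]), int(m[2])
            if exons and start < exons[-1][0]:
                raise ValueError(f"exons not in increasing order: {line}")
            # None means "let the viewer supply the exon number".
            exons.append((start, end, m[3] or None))
            continue
        if title is None and not genes:
            title = line  # optional title line at the top
            continue
        raise ValueError(f"unrecognized line: {line}")
    return title, genes
```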
<p>

<p class=hdr>
<h3><a name="repeat">Repeats</a></h3>
<p>
Each of these files lists interspersed repeats (and possibly other
features such as CpG islands) in a particular reference sequence.
These are displayed in a separate panel just below the exons,
using the same shapes and shading as PipMaker if possible.
<p>
In the PipMaker-style repeats format, the first line identifies
this as a simplified repeats file (as opposed to
<a href="http://www.repeatmasker.org/">RepeatMasker</a> output,
which Gmaj does not yet support).  Each subsequent line specifies
the start, end, direction, and type of an individual feature.
<pre>
     %:repeats

     1081 1364 Right Alu
     1365 1405 Simple
     ... etc.
</pre>
The allowed PipMaker types are:
<code>Alu</code>, <code>B1</code>, <code>B2</code>,
<code>SINE</code>, <code>LINE1</code>, <code>LINE2</code>,
<code>MIR</code>, <code>LTR</code>, <code>DNA</code>,
<code>RNA</code>, <code>Simple</code>, <code>CpG60</code>,
<code>CpG75</code>, and <code>Other</code>.  Of these, all except
<code>Simple</code>, <code>CpG60</code>, and <code>CpG75</code>
require a direction (<code>Right</code> or <code>Left</code>).
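<p>
The type and direction rules can be checked with a small Python sketch.
It is not Gmaj's parser; <code>parse_repeats</code> is a hypothetical
name, and the sketch assumes whitespace-separated fields, ignores blank
lines, and matches type names exactly as listed.

```python
PIPMAKER_TYPES = {"Alu", "B1", "B2", "SINE", "LINE1", "LINE2", "MIR", "LTR",
                  "DNA", "RNA", "Simple", "CpG60", "CpG75", "Other"}
UNDIRECTED = {"Simple", "CpG60", "CpG75"}  # direction not required


def parse_repeats(text):
    """Parse a simplified repeats file into (start, end, direction, type)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "%:repeats":
        raise ValueError("first line must be %:repeats")
    features = []
    for raw in lines[1:]:
        fields = raw.split()
        if not fields:
            continue  # blank line
        if len(fields) == 4 and fields[2] in ("Right", "Left"):
            direction, rtype = fields[2], fields[3]
        elif len(fields) == 3:
            direction, rtype = None, fields[2]
        else:
            raise ValueError(f"malformed line: {raw}")
        if rtype not in PIPMAKER_TYPES:  # exact, case-sensitive match
            raise ValueError(f"unknown repeat type: {rtype}")
        if direction is None and rtype not in UNDIRECTED:
            raise ValueError(f"{rtype} requires Right or Left")
        features.append((int(fields[0]), int(fields[1]), direction, rtype))
    return features
```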
<p>

<p class=hdr>
<h3><a name="link">Linkbars</a></h3>
<p>
Each of these files contains reference annotations, i.e.,
noteworthy regions in a particular reference sequence, which are
drawn in a separate panel as colored bars.  Typically each bar
has an associated URL pointing to a web site with more information
about the region, but this is not required.  In applet mode Gmaj
opens a new browser window to visit the linked site when the user
clicks on a bar; in stand-alone mode Gmaj is not running within
a web browser, so it just displays the URL for the user to visit
manually via copy-and-paste.
<p>
The PipMaker-style format first defines various types of links
and associates a color with each of them, then specifies the type,
position, description, and URL for each annotated region.
<pre>
     # linkbars for part of the mouse MHC class II region

     %define type
     %name PubMed
     %color Blue

     %define type
     %name LocusLink
     %color Orange

     %define annotation
     %type PubMed
     %range 1 2000
     %label Yang et al. 1997.  Daxx, a novel Fas-binding protein...
     %summary Yang, X., Khosravi-Far, R., Chang, H., and Baltimore, D. (1997).
       Daxx, a novel Fas-binding protein that activates JNK and apoptosis.
       Cell 89(7):1067-76.
     %url http://www.ncbi.nlm.nih.gov:80/entrez/
     query.fcgi?cmd=Retrieve&db=PubMed&list_uids=9215629&dopt=Abstract

     ... etc.
</pre>
Here, for example, the first stanza requests that each feature
subsequently identified as a PubMed entry be colored blue.
The name must be a single word, possibly containing underscore
characters (e.g., <code>Entry_in_GenBank</code>), and the color
must come from Gmaj's <a href="#color">color list</a>.
<p>
The third stanza associates a PubMed link with positions
1-2000 in this sequence.  The label should be kept fairly
short, as it will be displayed on Gmaj's position indicator line
when the user points at this linkbar.  The summary is optional;
it is used only by PipMaker and will be ignored by Gmaj.  Also,
while PipMaker allows several summary/URL pairs within a single
annotation, Gmaj expects each field to occur at most once.  If
Gmaj encounters extra URLs, it will just use the first one and
display a warning message.
<p>
Note that summaries and URLs (but not labels) can be broken into
several lines for convenience; the line breaks are removed when
the file is read, but they are not replaced with spaces.  Thus
a continuation line for a summary typically begins with a space
to separate it from the last word of the previous line, while
a URL continuation does not.
<p>
Also note that stanzas should be separated by blank lines, and
lines beginning with a <code>#</code> character are comments
that will be ignored.  The linkbars can appear in the file in
any order, and several can overlap at the same position with no
problem, since Gmaj will display them in multiple rows if
necessary.  In PipMaker this format is called "annotations with
hyperlinks".
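<p>
The stanza, comment, and continuation rules above can be sketched in
Python.  This assumes that a line starting with <code>%</code> begins a
field and that any other non-blank line continues the previous one; the
function names and the returned structure are hypothetical, not Gmaj's.

```python
def parse_stanza(lines):
    """Turn one stanza's lines into {field: value}."""
    fields, target = {}, None
    for line in lines:
        if line.startswith("%"):
            key, _, value = line[1:].partition(" ")
            # Only the first occurrence of a field is kept (Gmaj also warns).
            target = key if key not in fields else None
            if target is not None:
                fields[key] = value
        elif target in ("summary", "url"):
            fields[target] += line  # line break removed, no space added
        # Other continuation lines (e.g. of a dropped duplicate) are skipped.
    return fields


def parse_linkbars(text):
    """Return (type_colors, bars) from a PipMaker-style linkbars file."""
    stanzas, current = [], []
    for raw in text.splitlines():
        if raw.startswith("#"):
            continue  # comment
        if not raw.strip():
            if current:
                stanzas.append(current)
            current = []
            continue
        current.append(raw)
    if current:
        stanzas.append(current)

    type_colors, bars = {}, []
    for lines in stanzas:
        f = parse_stanza(lines)
        if f.get("define") == "type":
            type_colors[f["name"]] = f["color"]
        elif f.get("define") == "annotation":
            start, end = (int(x) for x in f["range"].split())
            # %summary is read but not kept: only PipMaker uses it.
            bars.append({"type": f["type"], "start": start, "end": end,
                         "label": f.get("label"), "url": f.get("url")})
    return type_colors, bars
```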
<p>

<p class=hdr>
<h3><a name="under">Underlays</a></h3>
<p>
Each of these files specifies underlays (colored bands) to be
painted on a particular pairwise pip and its corresponding
dotplot.  The bands are specified as regions in the reference
sequence and are normally drawn vertically; however, for a dotplot,
Gmaj will also look to see if you have specified an underlay file
for the transposed situation where the reference and secondary
sequences are swapped, and if so, will draw those underlays as
horizontal bands in the secondary sequence.
<p>
The PipMaker-style underlay format supported by Gmaj looks like
this:
<pre>
     # partial underlays for the BTK region

     LightYellow Gene
     Green Exon
     Red Strongly_conserved

     35324 72009 (BTK gene) Gene
     49781 49849 (exon 4) Exon
     51403 51484 Exon
     50350 50513 (conserved 84%) Strongly_conserved 84
     52376 52603 (Kilroy was here) Strongly_conserved 92 +
     ... etc.
</pre>
The first group of lines describes the intended meaning of the
colors, while the second group specifies the location of each band.
Colors must come from Gmaj's <a href="#color">color list</a>, but
the meaning of each color can be any single word chosen by you.
The text in parentheses is an optional label, which will be
displayed on Gmaj's position indicator line when the user points
the mouse at that band.  The parentheses must be present if the
label is, and the label itself cannot contain any additional
parentheses.  The number following the color category is an
optional integer score that can be used to interactively adjust
which underlays are displayed; see "Underlays Box" in the
Menus and Widgets section of <a href="gmaj_help.html"
>Starting and Running Gmaj</a> for more information.  (The
label and score are extra features not supported by PipMaker.)
A <code>+</code> or <code>-</code> character at the end of a
location line will paint just the upper or lower half of the band
on the pip (but is ignored for dotplots).  This allows you to
differentiate between the two strands, or to plot potentially
overlapping features like gene predictions and database matches.
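<p>
The two groups of lines can be read with a Python sketch like the one
below, using a regular expression for the location lines (start, end,
optional label, category, optional score, optional <code>+</code> or
<code>-</code>).  It is illustrative only; <code>parse_underlays</code>
and the result keys are hypothetical, and the sketch assumes every
category is defined before it is used.  Bands are kept in file order,
which is the order that decides which band ends up on top.

```python
import re

_BASES = ["Gray", "Red", "Green", "Blue", "Yellow", "Pink", "Cyan",
          "Purple", "Orange", "Brown"]
# Gmaj's color list, plus the special Hatch "color" allowed for underlays.
COLORS = ({"Black", "White", "Clear", "Hatch"}
          | {prefix + base for base in _BASES for prefix in ("", "Light", "Dark")})

LOCATION_RE = re.compile(
    r"(\d+)\s+(\d+)"          # start and end, 1-based and closed
    r"(?:\s+\(([^()]*)\))?"   # optional (label) with no inner parentheses
    r"\s+(\S+)"               # category defined in the color group
    r"(?:\s+(-?\d+))?"        # optional integer score
    r"(?:\s+([+-]))?"         # optional upper (+) or lower (-) half
)


def parse_underlays(text):
    """Return (category_colors, bands), with bands in file order."""
    colors, bands = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        m = LOCATION_RE.fullmatch(line)
        if m:
            start, end, label, category, score, half = m.groups()
            if category not in colors:
                raise ValueError(f"category not defined: {category}")
            bands.append({"start": int(start), "end": int(end),
                          "label": label, "color": colors[category],
                          "score": None if score is None else int(score),
                          "half": half})
            continue
        fields = line.split()
        if len(fields) == 2 and fields[0] in COLORS:
            colors[fields[1]] = fields[0]  # e.g. "Green Exon"
        else:
            raise ValueError(f"unrecognized line: {line}")
    return colors, bands
```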
<p>
Note that if two bands overlap, the one that was specified last
in the file appears "on top" and obscures the earlier one (except
for the special <code><a href="#hatch">Hatch</a></code> color).
Thus, in this example, the green exons and red strongly conserved
regions cover up parts of the long yellow band representing the
gene.  As in the links file, lines beginning with a <code>#</code>
character are comments that will be ignored.
<p>

<p class=hdr>
<h3><a name="high">Highlights</a></h3>
<p>
Highlight files are analogous to the <a href="#under">underlay</a>
files, but each of these specifies colored regions for a
particular sequence in the text view, rather than for a plot.
If you do not specify a highlight file for a particular sequence,
Gmaj will automatically provide default highlights based on the
<a href="#exon">exons</a> file (if you provided one).  These will
use one color for whole genes, overlaid with different colors to
indicate exons on the forward vs. reverse strand.  If the exons
file specifies a gene's translated region, then the 5&acute; and
3&acute; UTRs will be shaded using lighter colors.  These default
highlights make it easy to examine the putative start/stop codons
and splice junctions, as well as providing a visual connection
between the graphical and text views.  But if for some reason you
do not want any text highlights, you can suppress them by
specifying an empty highlight file.
<p>
The PipMaker-style format for highlights is the same as for
underlays, except that any <code>+</code> or <code>-</code>
indicators will be ignored, and the <code>Hatch</code> color is
not supported for highlights.  Just as with underlays, labels
can be included that will be shown when the user points at
the highlight, scores can be used to limit which entries are
displayed, and highlights that are listed later in the file will
cover up those that appear earlier.
<p>

<p class=hdr>
<h3><a name="color">Color List</a></h3>
<p>
For Gmaj's PipMaker-style annotations, the available colors are:
<pre>
    Black   White        Clear
    Gray    LightGray    DarkGray
    Red     LightRed     DarkRed
    Green   LightGreen   DarkGreen
    Blue    LightBlue    DarkBlue
    Yellow  LightYellow  DarkYellow
    Pink    LightPink    DarkPink
    Cyan    LightCyan    DarkCyan
    Purple  LightPurple  DarkPurple
    Orange  LightOrange  DarkOrange
    Brown   LightBrown   DarkBrown
</pre>
These names are case-sensitive (i.e., capitalization matters).
Not all of these are supported by PipMaker.  Also, be aware that
the appearance of the colors may vary between PipMaker and Gmaj,
and from one printer or monitor to the next.
<p class=subhdr>
<a name="hatch"><b><code>Hatch</code></b></a>
<p>
In addition to the regular colors listed above, Gmaj supports a
special "color" for underlays called <code>Hatch</code>, which
is drawn as a pattern of diagonal gray lines.  Normally, if two
underlays overlap, the one that was specified last in the file
appears "on top" and obscures the earlier one.  However,
<code>Hatch</code> underlays have the special property that they
are always drawn after the other colors, and since the space
between the diagonal lines is transparent, they allow the other
colors to show through.  Currently <code>Hatch</code> is only
supported for underlays, not for highlights or linkbars.
<p>

<p class=hdr>
<h3><a name="generic">Generic Annotation Formats</a></h3>
<p>
The standardized generic formats currently supported by Gmaj
include
<a href="http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml"
>GFF</a> (v1 &amp; v2),
<a href="http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#GTF"
>GTF</a>, and various flavors of
<a href="http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#BED"
>BED</a> (including the full BED12 format, a.k.a. "gene BED").
For details on these formats, please see the specifications at
the above links; this document will mainly discuss their use
by Gmaj.
<p>
These formats are all <b>tab-separated</b>, and despite their
differences they are similar enough that Gmaj can extract comparable
fields and treat them more or less the same.  Note that Gmaj is
not intended as a format validator: parsing is more lenient in
some respects than the official format specifications, and Gmaj
will ignore fields it has no use for.  Also, interpretation of
these open-ended formats depends partly on what type of annotation
is expected; e.g. if Gmaj is trying to read exons from a GFF v1
file, it will assume that the group field is the gene name.  It
will generally show warning messages to keep the user apprised
of any such assumptions it is making (if these become too annoying
they can be individually suppressed in the <a href="#param"
>parameters file</a>; see <code><a href="sample.gmaj"
>sample.gmaj</a></code> for details).  Because one of the main
reasons for supporting these formats is to enable the use of
annotation files obtained from public sources, Gmaj tries not to
balk at anomalies that are probably not the user's fault, and
when practical will simply skip questionable items with a warning
message.  Each type of message will generally be displayed only
once, and not repeated for every item with the same problem.
<p>
<p class=subhdr>
<a name="fileext"><b>Filename Extensions</b></a>
<p>
In order to distinguish generic files from PipMaker-style ones
and handle them appropriately, Gmaj requires that files in
generic formats have names ending with one of a set of recognized
extensions.  The default list is <code>.gff</code>, <code>.gtf</code>,
<code>.bed</code>, <code>.ct</code>, and <code>.trk</code>, but
this can be customized (see <code><a href="sample.gmaj"
>sample.gmaj</a></code>).
<p>
<p class=subhdr>
<a name="quote"><b>Quoting</b></a>
<p>
Some of the generic formats require text values to be enclosed
in double quotes (<code>" "</code>).  Even when not strictly
required, it is usually a good idea to do so, especially if the
value contains spaces.  The official specifications generally
don't say what to do if a value contains embedded quote
characters, but Gmaj supports a rudimentary mechanism for
escaping them with a backslash (<code>\</code>).  However, it
does not provide a way to escape the backslash itself: quoted values
should not end with <code>\</code> (insert a space before the
final quote if necessary).
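<p>
The escaping rule, and why a trailing backslash is a problem, can be
seen in a minimal Python sketch of a quoted-value reader.  It follows
the rules stated above but is not Gmaj's code; <code>read_quoted</code>
is a hypothetical name.

```python
def read_quoted(text, pos):
    """Read a double-quoted value starting at text[pos] == '"'.

    Returns (value, index just past the closing quote).  A backslash
    directly before a quote yields a literal quote; any other backslash
    is kept as-is, and there is no escape for the backslash itself.
    """
    if text[pos] != '"':
        raise ValueError("expected opening quote")
    chars = []
    i = pos + 1
    while i < len(text):
        c = text[i]
        if c == "\\" and i + 1 < len(text) and text[i + 1] == '"':
            chars.append('"')  # escaped quote
            i += 2
        elif c == '"':
            return "".join(chars), i + 1  # closing quote
        else:
            chars.append(c)
            i += 1
    raise ValueError("unterminated quoted value")
```

<p>
With this rule, <code>"C:\dir\"</code> never terminates, because the
final <code>\"</code> is read as an escaped quote, while
<code>"C:\dir\ "</code> reads correctly.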
<p>
<p class=subhdr>
<a name="empty"><b>Empty Fields</b></a>
<p>
When reading the generic formats, Gmaj treats two adjacent tab
characters as an empty field.  However, your files will be easier
for humans to read if you do not leave fields completely empty.
Gmaj recognizes a value of <code>.</code> (the dot character)
to mean "unspecified" for fields such as strand, score, feature,
and color, in some cases even when the official formats don't.
For instance, GFF v2 explicitly calls for using <code>.</code>
when there is no score, but Gmaj allows you to do this with the
other generic formats as well, in order to distinguish between
"no score" and a score that is truly zero.  For colors, in
addition to <code>.</code>, Gmaj also interprets <code>0</code>
to mean "unspecified", in keeping with examples at UCSC.
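<p>
In Python terms, splitting on a single tab is what preserves empty
fields, and <code>.</code> can then be mapped to "unspecified".  The
sketch below is illustrative; the function names are hypothetical, and
treating a completely empty score field as unspecified is an assumption.

```python
def split_fields(line):
    """Split a generic-format line on tabs, keeping empty fields.

    str.split("\t") returns "" for two adjacent tabs, whereas a bare
    str.split() would merge them and shift every later column.
    """
    return line.rstrip("\r\n").split("\t")


def optional_score(field):
    """Return None for an unspecified score, else the numeric value.

    "." means unspecified; treating an empty field the same way is an
    assumption of this sketch.  "0" is a real score of zero.
    """
    if field in (".", ""):
        return None
    return float(field)
```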
<p>
<p class=subhdr>
<a name="gencoord"><b>Coordinates</b></a>
<p>
The GFF and GTF formats use 1-based, closed-interval coordinates
(i.e., sequence numbering starts with "1", and specified ranges
include both endpoints), while BED uses a 0-based, half-open
system (the first nucleotide of the sequence is numbered "0",
and the ending position is not included in the region).  For all
of these formats, positions are given relative to the beginning
of the named sequence regardless of which strand the feature is
on (unlike MAF), and <code>start</code> must be less than or
equal to <code>end</code>.
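<p>
Converting these to Gmaj's 1-based, closed display coordinates only
requires shifting the BED start, as in this Python sketch (the function
name and format labels are hypothetical):

```python
def to_closed_1based(fmt, start, end):
    """Convert a generic-format interval to 1-based, closed coordinates."""
    if start > end:
        raise ValueError("start must be less than or equal to end")
    if fmt == "bed":
        # 0-based, half-open [start, end) covers bases start+1 .. end
        # when numbering from 1.
        return start + 1, end
    if fmt in ("gff", "gtf"):
        return start, end  # already 1-based, closed
    raise ValueError(f"unknown format: {fmt}")
```

<p>
For example, BED <code>0 100</code> and GFF <code>1 100</code> both
describe the first 100 nucleotides.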
<p>
<p class=subhdr>
<a name="gffconv"><b>GFF Conventions</b></a>
<p>
BED format is relatively fixed in how its fields are used, but
GFF and GTF are more variable and require additional conventions
for most effective use with Gmaj.  In particular, the values of
the "feature" field and the optional "attributes" affect how Gmaj
will interpret and display an item.
<p>
Values of the feature field that are recognized for special
treatment include:
<p class=tiny>
<ul class="notop nobottom">
<li>    <code>gene</code> or values starting with <code>gene_</code>
<li>    <code>exon</code> or values starting with <code>exon_</code>
<li>    <code>start_codon</code>, <code>str_codon</code>,
        <code>stop_codon</code>, <code>stp_codon</code>, or
        <code>cds</code>
<li>    <code>repeatmasker</code> or any of the
        <a href="#repeat">PipMaker repeat or CpG types</a>
</ul>
<p class=tiny>
Of these, only the PipMaker types are case-sensitive.
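<p>
The recognition rules in this list can be summarized as a small Python
function.  The category names it returns (<code>"gene"</code>,
<code>"exon"</code>, <code>"cds"</code>, <code>"repeat"</code>) are
hypothetical labels for illustration, not Gmaj identifiers.

```python
PIPMAKER_TYPES = {"Alu", "B1", "B2", "SINE", "LINE1", "LINE2", "MIR", "LTR",
                  "DNA", "RNA", "Simple", "CpG60", "CpG75", "Other"}
CDS_FEATURES = {"start_codon", "str_codon", "stop_codon", "stp_codon", "cds"}


def feature_category(feature):
    """Map a GFF/GTF feature value to a category, or None if not special."""
    if feature in PIPMAKER_TYPES:      # the only case-sensitive matches
        return "repeat"
    f = feature.lower()                # everything else ignores case
    if f == "gene" or f.startswith("gene_"):
        return "gene"
    if f == "exon" or f.startswith("exon_"):
        return "exon"
    if f in CDS_FEATURES:
        return "cds"
    if f == "repeatmasker":
        return "repeat"
    return None
```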
<p>
For GFF v2 and GTF, the currently recognized attribute tags are:
<p class=tiny>
<ul class="notop nobottom">
<li>    <code>gene</code> or <code>gene_id</code>: the name of the
        gene, e.g. for grouping exons (<code>transcript_id</code> is
        ignored)
<li>    <code>name</code>: an optional name for this individual item,
        e.g. for an exon label
<li>    <code>sequence</code> (when feature is
        <code>repeatmasker</code>): the name/class/family of the
        repeat, e.g. <code>AluJb/SINE/Alu</code>
<li>    <code>color</code>: a <a href="#gencolor">color</a>
        specification in UCSC format, e.g. <code>0,0,255</code>
<li>    <code>url</code> or <code>ucsc_id</code>: the URL for
        linkbars; <code>$$</code> will be replaced with the value of
        <code>name</code>
</ul>
<p class=tiny>
These keywords are not case-sensitive, but they cannot have
multiple values.
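<p>
A Python sketch of extracting these tags from a GFF v2/GTF attribute
column, with the quoting rule from the Quoting section and the
<code>$$</code> substitution.  It is an illustration under assumptions:
tag/value pairs are separated by <code>;</code>, the first occurrence of
a role wins, and <code>parse_attributes</code> and the role names are
hypothetical.

```python
import re

# tag, then either a double-quoted value (\" allowed inside) or a bare word
ATTR_RE = re.compile(r'(\w+)\s+("(?:\\"|[^"])*"|[^;\s]+)')

# recognized tags (compared case-insensitively) -> the role they play
TAG_ROLES = {"gene": "gene", "gene_id": "gene", "name": "name",
             "sequence": "sequence", "color": "color",
             "url": "url", "ucsc_id": "url"}


def parse_attributes(column):
    """Extract the recognized attributes from a GFF v2/GTF attribute column."""
    attrs = {}
    for m in ATTR_RE.finditer(column):
        role = TAG_ROLES.get(m[1].lower())  # transcript_id etc. are ignored
        value = m[2]
        if value.startswith('"'):
            value = value[1:-1].replace('\\"', '"')
        if role is not None and role not in attrs:
            attrs[role] = value
    if "url" in attrs and "name" in attrs:
        attrs["url"] = attrs["url"].replace("$$", attrs["name"])
    return attrs
```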
<p>
<p class=subhdr>
<a name="custom"><b>Custom Tracks</b></a>
<p>
Along with the basic formats listed above, Gmaj also supports UCSC
<a href="http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#CustomTracks"
>custom track</a> headers.
<a href="http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#TRACK"
>Track lines</a> can specify certain settings for an entire
track; currently <code><a href="#gencolor">color</a></code>,
<code><a href="#gencolor">itemRgb</a></code>, <code>offset</code>,
and <code>url</code> are supported.  They also allow several
tracks (even in mixed formats) to be combined in a single file.
Gmaj does not currently provide a way to use just one particular
track from such a file (all of its tracks are combined into a single
set of annotations), but lines in unsupported formats such as
<a href="http://genome.ucsc.edu/goldenPath/help/wiggle.html"
>WIG</a> are gracefully skipped.
<a href="http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#lines"
>Browser lines</a> are also skipped; Gmaj's initial zoom position
is controlled by command-line or applet parameters rather than by
individual annotation files.
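<p>
Track-line settings use <code>key=value</code> pairs with optional
double quotes, which Python's <code>shlex.split</code> can tokenize.
The sketch below keeps only the four supported settings and classifies
lines so browser lines can be skipped.  It is not Gmaj's code; the
function names are hypothetical, and it does not attempt to recognize
WIG data lines.

```python
import shlex

SUPPORTED = {"color", "itemRgb", "offset", "url"}


def line_kind(line):
    """Classify a line of a custom-track file as 'track', 'data', or 'skip'."""
    words = line.split(None, 1)
    if not words or words[0].startswith("#"):
        return "skip"
    if words[0] == "browser":
        return "skip"   # initial zoom comes from Gmaj's own parameters
    if words[0] == "track":
        return "track"
    return "data"


def parse_track_line(line):
    """Return the supported settings from a UCSC track line."""
    tokens = shlex.split(line)  # handles name="two words"
    if not tokens or tokens[0] != "track":
        raise ValueError("not a track line")
    settings = {}
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if not sep or key not in SUPPORTED:
            continue    # name=, description=, visibility=, ... are not used
        if key == "itemRgb":
            settings[key] = value.lower() == "on"
        elif key == "offset":
            settings[key] = int(value)
        else:
            settings[key] = value
    return settings
```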
<p>
<p class=subhdr>
<a name="multiseq"><b>Multiple Sequences</b></a>
<p>
Generic files can also contain annotations for several sequences,
because unlike the PipMaker-style formats, they all have a
"seqname" or "chrom" field that Gmaj can use to select the
appropriate lines.  Ideally this field should match
the sequence name from the <a href="#align">alignment files</a>,
but Gmaj has two ways to deal with exceptions.  If there is only one
seqname in the annotation file, then Gmaj will go ahead and use
it, but will display a warning (unless the mismatch can be fixed
by prepending the organism name, or the organism name plus
<code>chr</code>, to the annotation seqname).  But if the file
has annotations for several sequences and some don't match the
alignment files, you need to tell Gmaj which is which by adding
an alias in the <a href="#param">parameters file</a> (see
<code><a href="sample.gmaj">sample.gmaj</a></code>).
<p>
<p class=subhdr>
<a name="reuse"><b>Reusing Files</b></a>
<p>
One of the advantages of using generic formats is that files can
be reused in multiple panels without reformatting, e.g. as both
exons and underlays.  Normally linkbars, underlays, and text
highlights are simply handled as arbitrary regions of a specified
color, since they could represent any type of biological feature.
However, you can ask Gmaj to interpret them as exons or repeats
by adding a type hint in the <a href="#param">parameters file</a>
(see <code><a href="sample.gmaj">sample.gmaj</a></code>).  Note
that currently this will also cause any <a href="#gencolor"
>specified colors</a> in that file to be overridden with Gmaj's
defaults.
<p>
Combining several biological types of annotations (e.g. exons
and repeats) in one file is possible, but not recommended.  Gmaj
will try to skip lines that are not appropriate for the type it
is seeking, but it may display more than you intend.
<p>
<p class=subhdr>
<a name="cds"><b>Coding Sequence</b></a>
<p>
Currently Gmaj has no special support for multiple transcripts.
When inferring UTRs, all of the CDS-related items for a single
gene name are combined, and the interval from the lowest
coordinate to the highest is used as the CDS.  Also, some of the
formats' rules specify whether or not the initiation and stop
codons should be included in the CDS, but Gmaj does not make
adjustments to compensate for that; instead it simply includes
all of the given endpoints in the CDS.
<!-- and leaves it up to the user to interpret the display based
on the convention used in the files he/she provided.  [the user
does not supply files for applets] -->
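<p>
The combining rule above, and how a combined CDS then splits exons into
coding and UTR pieces, can be sketched in Python.  This is an
illustration only: the function names are hypothetical, all coordinates
are 1-based and closed, and the sketch labels pieces just
<code>"utr"</code> or <code>"cds"</code> without deciding 5&acute; vs.
3&acute;, which depends on the gene's strand.

```python
def cds_by_gene(items):
    """Combine CDS-related items into one interval per gene.

    items: (gene, start, end) for start_codon, stop_codon, cds, etc.
    Returns {gene: (lowest coordinate, highest coordinate)}.
    """
    spans = {}
    for gene, start, end in items:
        if gene in spans:
            lo, hi = spans[gene]
            spans[gene] = (min(lo, start), max(hi, end))
        else:
            spans[gene] = (start, end)
    return spans


def split_utrs(exons, cds):
    """Split exons into (start, end, 'utr' or 'cds') pieces around a CDS."""
    cds_start, cds_end = cds
    pieces = []
    for start, end in exons:
        if start < cds_start:  # part before the CDS
            pieces.append((start, min(end, cds_start - 1), "utr"))
        if end >= cds_start and start <= cds_end:  # overlap with the CDS
            pieces.append((max(start, cds_start), min(end, cds_end), "cds"))
        if end > cds_end:  # part after the CDS
            pieces.append((max(start, cds_end + 1), end, "utr"))
    return pieces
```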
<p>
<p class=subhdr>
<a name="gencolor"><b>Colors</b></a>
<p>
Colors can be specified for individual annotation lines via the
<code>itemRgb</code> field (for BED) or a <code>color</code>
attribute (for GFF v2 or GTF).  However, for <a href="#custom"
>custom tracks</a>, these are governed by the track line's
<code>itemRgb</code> attribute, which defaults to off per the
UCSC specification.  Thus, if you have track lines and want to
use the per-item colors, you need to include
<code>itemRgb=On</code> in the track attributes.
<p>
Track lines can also have a <code>color</code> attribute for
the entire track, which will be used if <code>itemRgb</code> is
off, or if an individual item does not have its own color.
However, in a rare break from the UCSC specification, Gmaj does
not use black as the default if the track color is unspecified
(black underlays and highlights just don't work with black plots
and text).  Instead it uses its own default colors, which for
genes/exons are the same as the colors for <a href="#high"
>default highlights</a>, or light gray for other annotations.
Note that these defaults will also override your colors when
<a href="#reuse">type hints</a> are used.
<p>
All of the above-mentioned color values are specified in UCSC
format, which consists of three comma-separated RGB values from
0-255 (e.g. <code>0,0,255</code>).
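<p>
The precedence described above (type hint, then per-item color when
allowed, then track color, then Gmaj's default) can be sketched in
Python.  The function names are hypothetical, and
<code>LIGHT_GRAY</code> is a placeholder: the exact RGB value Gmaj uses
for its light gray default is an assumption.

```python
def parse_ucsc_color(value):
    """Parse 'r,g,b' (each 0-255); '.', '0', or '' mean unspecified."""
    if value in ("", ".", "0"):
        return None
    parts = value.split(",")
    if len(parts) != 3:
        raise ValueError(f"expected r,g,b: {value!r}")
    rgb = tuple(int(p) for p in parts)
    if not all(0 <= c <= 255 for c in rgb):
        raise ValueError(f"component out of range: {value!r}")
    return rgb


# Placeholder for Gmaj's light gray default; the exact RGB is an assumption.
LIGHT_GRAY = (192, 192, 192)


def resolve_color(item_color, track=None, default=LIGHT_GRAY, type_hint=False):
    """Choose the color to draw for one annotation item.

    item_color: the item's parsed color, or None.
    track: parsed track-line settings, e.g. {"itemRgb": True,
           "color": (0, 0, 255)}, or None if the file has no track line.
    """
    if type_hint:
        return default                      # type hints override all colors
    # With a track line, per-item colors need itemRgb=On (UCSC default: off).
    use_item = track is None or track.get("itemRgb", False)
    if use_item and item_color is not None:
        return item_color
    if track is not None and track.get("color") is not None:
        return track["color"]
    return default                          # Gmaj's default, never black
```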
<p>
<p class=subhdr>
<a name="sort"><b>Sorting</b></a>
<p>
The order of the lines is not supposed to matter in these generic
formats, but for most of the Gmaj panels it does matter:  exons
need to be grouped by gene and ordered by position so UTRs can be
inferred and exon numbers assigned, early underlays are covered
up by later ones, etc.  Gmaj solves this problem by sorting the
data before it is displayed.  Exons are sorted first by gene name
in ascending order, then within each gene by start position
(ascending), and lastly, in case of a tie, by end position
(descending).  All other annotation types are sorted first by
length in descending order, and then, in case of a tie, by start
position (ascending).  This usually produces a reasonable display,
but if you need direct control of the order, you can use the
PipMaker-style formats instead.
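<p>
Both orderings can be expressed as Python sort keys.  This sketch is an
illustration, not Gmaj's code; the function names are hypothetical, and
Python's plain string comparison of gene names may not match whatever
collation Gmaj uses.  Because later items are drawn over earlier ones,
sorting by descending length presumably lets short regions appear on
top of long ones.

```python
def sort_exons(exons):
    """Sort (gene, start, end) tuples: gene name ascending, then start
    ascending, then end descending."""
    return sorted(exons, key=lambda e: (e[0], e[1], -e[2]))


def sort_regions(regions):
    """Sort (start, end) regions (1-based, closed): length descending,
    then start ascending."""
    return sorted(regions, key=lambda r: (-(r[1] - r[0] + 1), r[0]))
```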
<p>

<p class=vvlarge>
<hr>
<i>Cathy Riemer, June 2008</i>

<p class=scrollspace>
</body>
</html>