root/galaxy-central/eggs/docutils-0.4-py2.6.egg/docutils/parsers/rst/states.py

Revision 3, 121.1 KB (committer: kohda, 14 years ago)

Install Unix tools  http://hannonlab.cshl.edu/galaxy_unix_tools/galaxy.html

# Author: David Goodger
# Contact: goodger@users.sourceforge.net
# Revision: $Revision: 4258 $
# Date: $Date: 2006-01-09 04:29:23 +0100 (Mon, 09 Jan 2006) $
# Copyright: This module has been placed in the public domain.

"""
This is the ``docutils.parsers.rst.states`` module, the core of
the reStructuredText parser.  It defines the following:

:Classes:
    - `RSTStateMachine`: reStructuredText parser's entry point.
    - `NestedStateMachine`: recursive StateMachine.
    - `RSTState`: reStructuredText State superclass.
    - `Inliner`: For parsing inline markup.
    - `Body`: Generic classifier of the first line of a block.
    - `SpecializedBody`: Superclass for compound element members.
    - `BulletList`: Second and subsequent bullet_list list_items.
    - `DefinitionList`: Second+ definition_list_items.
    - `EnumeratedList`: Second+ enumerated_list list_items.
    - `FieldList`: Second+ fields.
    - `OptionList`: Second+ option_list_items.
    - `RFC2822List`: Second+ RFC2822-style fields.
    - `ExtensionOptions`: Parses directive option fields.
    - `Explicit`: Second+ explicit markup constructs.
    - `SubstitutionDef`: For embedded directives in substitution definitions.
    - `Text`: Classifier of second line of a text block.
    - `SpecializedText`: Superclass for continuation lines of Text-variants.
    - `Definition`: Second line of potential definition_list_item.
    - `Line`: Second line of overlined section title or transition marker.
    - `Struct`: An auxiliary collection class.

:Exception classes:
    - `MarkupError`
    - `ParserError`
    - `MarkupMismatch`

:Functions:
    - `escape2null()`: Return a string, escape-backslashes converted to nulls.
    - `unescape()`: Return a string, nulls removed or restored to backslashes.

:Attributes:
    - `state_classes`: set of State classes used with `RSTStateMachine`.

Parser Overview
===============

The reStructuredText parser is implemented as a recursive state machine,
examining its input one line at a time.  To understand how the parser works,
please first become familiar with the `docutils.statemachine` module.  In the
description below, references are made to classes defined in this module;
please see the individual classes for details.

Parsing proceeds as follows:

1. The state machine examines each line of input, checking each of the
   transition patterns of the state `Body`, in order, looking for a match.
   The implicit transitions (blank lines and indentation) are checked before
   any others.  The 'text' transition is a catch-all (matches anything).

2. The method associated with the matched transition pattern is called.

   A. Some transition methods are self-contained, appending elements to the
      document tree (`Body.doctest` parses a doctest block).  The parser's
      current line index is advanced to the end of the element, and parsing
      continues with step 1.

   B. Other transition methods trigger the creation of a nested state machine,
      whose job is to parse a compound construct ('indent' does a block quote,
      'bullet' does a bullet list, 'overline' does a section [first checking
      for a valid section header], etc.).

      - In the case of lists and explicit markup, a one-off state machine is
        created and run to parse contents of the first item.

      - A new state machine is created and its initial state is set to the
        appropriate specialized state (`BulletList` in the case of the
        'bullet' transition; see `SpecializedBody` for more detail).  This
        state machine is run to parse the compound element (or series of
        explicit markup elements), and returns as soon as a non-member element
        is encountered.  For example, the `BulletList` state machine ends as
        soon as it encounters an element which is not a list item of that
        bullet list.  The optional omission of inter-element blank lines is
        enabled by this nested state machine.

      - The current line index is advanced to the end of the elements parsed,
        and parsing continues with step 1.

   C. The result of the 'text' transition depends on the next line of text.
      The current state is changed to `Text`, under which the second line is
      examined.  If the second line is:

      - Indented: The element is a definition list item, and parsing proceeds
        similarly to step 2.B, using the `DefinitionList` state.

      - A line of uniform punctuation characters: The element is a section
        header; again, parsing proceeds as in step 2.B, and `Body` is still
        used.

      - Anything else: The element is a paragraph, which is examined for
        inline markup and appended to the parent element.  Processing
        continues with step 1.
"""

__docformat__ = 'reStructuredText'


import sys
import re
import roman
from types import TupleType
from docutils import nodes, statemachine, utils, urischemes
from docutils import ApplicationError, DataError
from docutils.statemachine import StateMachineWS, StateWS
from docutils.nodes import fully_normalize_name as normalize_name
from docutils.nodes import whitespace_normalize_name
from docutils.utils import escape2null, unescape, column_width
from docutils.parsers.rst import directives, languages, tableparser, roles
from docutils.parsers.rst.languages import en as _fallback_language_module


class MarkupError(DataError): pass
class UnknownInterpretedRoleError(DataError): pass
class InterpretedRoleNotImplementedError(DataError): pass
class ParserError(ApplicationError): pass
class MarkupMismatch(Exception): pass


class Struct:

    """Stores data attributes for dotted-attribute access."""

    def __init__(self, **keywordargs):
        self.__dict__.update(keywordargs)


class RSTStateMachine(StateMachineWS):

    """
    reStructuredText's master StateMachine.

    The entry point to reStructuredText parsing is the `run()` method.
    """

    def run(self, input_lines, document, input_offset=0, match_titles=1,
            inliner=None):
        """
        Parse `input_lines` and modify the `document` node in place.

        Extend `StateMachineWS.run()`: set up parse-global data and
        run the StateMachine.
        """
        self.language = languages.get_language(
            document.settings.language_code)
        self.match_titles = match_titles
        if inliner is None:
            inliner = Inliner()
        inliner.init_customizations(document.settings)
        self.memo = Struct(document=document,
                           reporter=document.reporter,
                           language=self.language,
                           title_styles=[],
                           section_level=0,
                           section_bubble_up_kludge=0,
                           inliner=inliner)
        self.document = document
        self.attach_observer(document.note_source)
        self.reporter = self.memo.reporter
        self.node = document
        results = StateMachineWS.run(self, input_lines, input_offset,
                                     input_source=document['source'])
        assert results == [], 'RSTStateMachine.run() results should be empty!'
        self.node = self.memo = None    # remove unneeded references


class NestedStateMachine(StateMachineWS):

    """
    StateMachine run from within other StateMachine runs, to parse nested
    document structures.
    """

    def run(self, input_lines, input_offset, memo, node, match_titles=1):
        """
        Parse `input_lines` and populate a `docutils.nodes.document` instance.

        Extend `StateMachineWS.run()`: set up document-wide data.
        """
        self.match_titles = match_titles
        self.memo = memo
        self.document = memo.document
        self.attach_observer(self.document.note_source)
        self.reporter = memo.reporter
        self.language = memo.language
        self.node = node
        results = StateMachineWS.run(self, input_lines, input_offset)
        assert results == [], ('NestedStateMachine.run() results should be '
                               'empty!')
        return results


class RSTState(StateWS):

    """
    reStructuredText State superclass.

    Contains methods used by all State subclasses.
    """

    nested_sm = NestedStateMachine

    def __init__(self, state_machine, debug=0):
        self.nested_sm_kwargs = {'state_classes': state_classes,
                                 'initial_state': 'Body'}
        StateWS.__init__(self, state_machine, debug)

    def runtime_init(self):
        StateWS.runtime_init(self)
        memo = self.state_machine.memo
        self.memo = memo
        self.reporter = memo.reporter
        self.inliner = memo.inliner
        self.document = memo.document
        self.parent = self.state_machine.node

    def goto_line(self, abs_line_offset):
        """
        Jump to input line `abs_line_offset`, ignoring jumps past the end.
        """
        try:
            self.state_machine.goto_line(abs_line_offset)
        except EOFError:
            pass

    def no_match(self, context, transitions):
        """
        Override `StateWS.no_match` to generate a system message.

        This code should never be run.
        """
        self.reporter.severe(
            'Internal error: no transition pattern match.  State: "%s"; '
            'transitions: %s; context: %s; current line: %r.'
            % (self.__class__.__name__, transitions, context,
               self.state_machine.line),
            line=self.state_machine.abs_line_number())
        return context, None, []

    def bof(self, context):
        """Called at beginning of file."""
        return [], []

    def nested_parse(self, block, input_offset, node, match_titles=0,
                     state_machine_class=None, state_machine_kwargs=None):
        """
        Create a new StateMachine rooted at `node` and run it over the input
        `block`.
        """
        if state_machine_class is None:
            state_machine_class = self.nested_sm
        if state_machine_kwargs is None:
            state_machine_kwargs = self.nested_sm_kwargs
        block_length = len(block)
        state_machine = state_machine_class(debug=self.debug,
                                            **state_machine_kwargs)
        state_machine.run(block, input_offset, memo=self.memo,
                          node=node, match_titles=match_titles)
        state_machine.unlink()
        new_offset = state_machine.abs_line_offset()
        # No `block.parent` implies disconnected -- lines aren't in sync:
        if block.parent and (len(block) - block_length) != 0:
            # Adjustment for block if modified in nested parse:
            self.state_machine.next_line(len(block) - block_length)
        return new_offset

    def nested_list_parse(self, block, input_offset, node, initial_state,
                          blank_finish,
                          blank_finish_state=None,
                          extra_settings={},
                          match_titles=0,
                          state_machine_class=None,
                          state_machine_kwargs=None):
        """
        Create a new StateMachine rooted at `node` and run it over the input
        `block`. Also keep track of optional intermediate blank lines and the
        required final one.
        """
        if state_machine_class is None:
            state_machine_class = self.nested_sm
        if state_machine_kwargs is None:
            state_machine_kwargs = self.nested_sm_kwargs.copy()
        state_machine_kwargs['initial_state'] = initial_state
        state_machine = state_machine_class(debug=self.debug,
                                            **state_machine_kwargs)
        if blank_finish_state is None:
            blank_finish_state = initial_state
        state_machine.states[blank_finish_state].blank_finish = blank_finish
        for key, value in extra_settings.items():
            setattr(state_machine.states[initial_state], key, value)
        state_machine.run(block, input_offset, memo=self.memo,
                          node=node, match_titles=match_titles)
        blank_finish = state_machine.states[blank_finish_state].blank_finish
        state_machine.unlink()
        return state_machine.abs_line_offset(), blank_finish

    def section(self, title, source, style, lineno, messages):
        """Check for a valid subsection and create one if it checks out."""
        if self.check_subsection(source, style, lineno):
            self.new_subsection(title, lineno, messages)

    def check_subsection(self, source, style, lineno):
        """
        Check for a valid subsection header.  Return 1 (true) or None (false).

        When a new section is reached that isn't a subsection of the current
        section, back up the line count (use ``previous_line(-x)``), then
        ``raise EOFError``.  The current StateMachine will finish, then the
        calling StateMachine can re-examine the title.  This will work its way
        back up the calling chain until the correct section level is reached.

        @@@ Alternative: Evaluate the title, store the title info & level, and
        back up the chain until that level is reached.  Store in memo? Or
        return in results?

        :Exception: `EOFError` when a sibling or supersection encountered.
        """
        memo = self.memo
        title_styles = memo.title_styles
        mylevel = memo.section_level
        try:                            # check for existing title style
            level = title_styles.index(style) + 1
        except ValueError:              # new title style
            if len(title_styles) == memo.section_level: # new subsection
                title_styles.append(style)
                return 1
            else:                       # not at lowest level
                self.parent += self.title_inconsistent(source, lineno)
                return None
        if level <= mylevel:            # sibling or supersection
            memo.section_level = level   # bubble up to parent section
            if len(style) == 2:
                memo.section_bubble_up_kludge = 1
            # back up 2 lines for underline title, 3 for overline title
            self.state_machine.previous_line(len(style) + 1)
            raise EOFError              # let parent section re-evaluate
        if level == mylevel + 1:        # immediate subsection
            return 1
        else:                           # invalid subsection
            self.parent += self.title_inconsistent(source, lineno)
            return None

    def title_inconsistent(self, sourcetext, lineno):
        error = self.reporter.severe(
            'Title level inconsistent:', nodes.literal_block('', sourcetext),
            line=lineno)
        return error

    def new_subsection(self, title, lineno, messages):
        """Append new subsection to document tree. On return, check level."""
        memo = self.memo
        mylevel = memo.section_level
        memo.section_level += 1
        section_node = nodes.section()
        self.parent += section_node
        textnodes, title_messages = self.inline_text(title, lineno)
        titlenode = nodes.title(title, '', *textnodes)
        name = normalize_name(titlenode.astext())
        section_node['names'].append(name)
        section_node += titlenode
        section_node += messages
        section_node += title_messages
        self.document.note_implicit_target(section_node, section_node)
        offset = self.state_machine.line_offset + 1
        absoffset = self.state_machine.abs_line_offset() + 1
        newabsoffset = self.nested_parse(
              self.state_machine.input_lines[offset:], input_offset=absoffset,
              node=section_node, match_titles=1)
        self.goto_line(newabsoffset)
        if memo.section_level <= mylevel: # can't handle next section?
            raise EOFError              # bubble up to supersection
        # reset section_level; next pass will detect it properly
        memo.section_level = mylevel

    def paragraph(self, lines, lineno):
        """
        Return a list (paragraph & messages) & a boolean: literal_block next?
        """
        data = '\n'.join(lines).rstrip()
        if re.search(r'(?<!\\)(\\\\)*::$', data):
            if len(data) == 2:
                return [], 1
            elif data[-3] in ' \n':
                text = data[:-3].rstrip()
            else:
                text = data[:-1]
            literalnext = 1
        else:
            text = data
            literalnext = 0
        textnodes, messages = self.inline_text(text, lineno)
        p = nodes.paragraph(data, '', *textnodes)
        p.line = lineno
        return [p] + messages, literalnext

    def inline_text(self, text, lineno):
        """
        Return 2 lists: nodes (text and inline elements), and system_messages.
        """
        return self.inliner.parse(text, lineno, self.memo, self.parent)

    def unindent_warning(self, node_name):
        return self.reporter.warning(
            '%s ends without a blank line; unexpected unindent.' % node_name,
            line=(self.state_machine.abs_line_number() + 1))


def build_regexp(definition, compile=1):
    """
    Build, compile and return a regular expression based on `definition`.

    :Parameter: `definition`: a 4-tuple (group name, prefix, suffix, parts),
        where "parts" is a list of regular expressions and/or regular
        expression definitions to be joined into an or-group.
    """
    name, prefix, suffix, parts = definition
    part_strings = []
    for part in parts:
        if type(part) is TupleType:
            part_strings.append(build_regexp(part, None))
        else:
            part_strings.append(part)
    or_group = '|'.join(part_strings)
    regexp = '%(prefix)s(?P<%(name)s>%(or_group)s)%(suffix)s' % locals()
    if compile:
        return re.compile(regexp, re.UNICODE)
    else:
        return regexp


class Inliner:

    """
    Parse inline markup; call the `parse()` method.
    """

    def __init__(self):
        self.implicit_dispatch = [(self.patterns.uri, self.standalone_uri),]
        """List of (pattern, bound method) tuples, used by
        `self.implicit_inline`."""

    def init_customizations(self, settings):
        """Setting-based customizations; run when parsing begins."""
        if settings.pep_references:
            self.implicit_dispatch.append((self.patterns.pep,
                                           self.pep_reference))
        if settings.rfc_references:
            self.implicit_dispatch.append((self.patterns.rfc,
                                           self.rfc_reference))

    def parse(self, text, lineno, memo, parent):
        # Needs to be refactored for nested inline markup.
        # Add nested_parse() method?
        """
        Return 2 lists: nodes (text and inline elements), and system_messages.

        Using `self.patterns.initial`, a pattern which matches start-strings
        (emphasis, strong, interpreted, phrase reference, literal,
        substitution reference, and inline target) and complete constructs
        (simple reference, footnote reference), search for a candidate.  When
        one is found, check for validity (e.g., not a quoted '*' character).
        If valid, search for the corresponding end string if applicable, and
        check it for validity.  If not found or invalid, generate a warning
        and ignore the start-string.  Implicit inline markup (e.g. standalone
        URIs) is found last.
        """
        self.reporter = memo.reporter
        self.document = memo.document
        self.language = memo.language
        self.parent = parent
        pattern_search = self.patterns.initial.search
        dispatch = self.dispatch
        remaining = escape2null(text)
        processed = []
        unprocessed = []
        messages = []
        while remaining:
            match = pattern_search(remaining)
            if match:
                groups = match.groupdict()
                method = dispatch[groups['start'] or groups['backquote']
                                  or groups['refend'] or groups['fnend']]
                before, inlines, remaining, sysmessages = method(self, match,
                                                                 lineno)
                unprocessed.append(before)
                messages += sysmessages
                if inlines:
                    processed += self.implicit_inline(''.join(unprocessed),
                                                      lineno)
                    processed += inlines
                    unprocessed = []
            else:
                break
        remaining = ''.join(unprocessed) + remaining
        if remaining:
            processed += self.implicit_inline(remaining, lineno)
        return processed, messages

    openers = '\'"([{<'
    closers = '\'")]}>'
    start_string_prefix = (r'((?<=^)|(?<=[-/: \n%s]))' % re.escape(openers))
    end_string_suffix = (r'((?=$)|(?=[-/:.,;!? \n\x00%s]))'
                         % re.escape(closers))
    non_whitespace_before = r'(?<![ \n])'
    non_whitespace_escape_before = r'(?<![ \n\x00])'
    non_whitespace_after = r'(?![ \n])'
    # Alphanumerics with isolated internal [-._] chars (i.e. not 2 together):
    simplename = r'(?:(?!_)\w)+(?:[-._](?:(?!_)\w)+)*'
    # Valid URI characters (see RFC 2396 & RFC 2732);
    # final \x00 allows backslash escapes in URIs:
    uric = r"""[-_.!~*'()[\];/:@&=+$,%a-zA-Z0-9\x00]"""
    # Delimiter indicating the end of a URI (not part of the URI):
    uri_end_delim = r"""[>]"""
    # Last URI character; same as uric but no punctuation:
    urilast = r"""[_~*/=+a-zA-Z0-9]"""
    # End of a URI (either 'urilast' or 'uric followed by a
    # uri_end_delim'):
    uri_end = r"""(?:%(urilast)s|%(uric)s(?=%(uri_end_delim)s))""" % locals()
    emailc = r"""[-_!~*'{|}/#?^`&=+$%a-zA-Z0-9\x00]"""
    email_pattern = r"""
          %(emailc)s+(?:\.%(emailc)s+)*   # name
          (?<!\x00)@                      # at
          %(emailc)s+(?:\.%(emailc)s*)*   # host
          %(uri_end)s                     # final URI char
          """
    parts = ('initial_inline', start_string_prefix, '',
             [('start', '', non_whitespace_after,  # simple start-strings
               [r'\*\*',                # strong
                r'\*(?!\*)',            # emphasis but not strong
                r'``',                  # literal
                r'_`',                  # inline internal target
                r'\|(?!\|)']            # substitution reference
               ),
              ('whole', '', end_string_suffix, # whole constructs
               [# reference name & end-string
                r'(?P<refname>%s)(?P<refend>__?)' % simplename,
                ('footnotelabel', r'\[', r'(?P<fnend>\]_)',
                 [r'[0-9]+',               # manually numbered
                  r'\#(%s)?' % simplename, # auto-numbered (w/ label?)
                  r'\*',                   # auto-symbol
                  r'(?P<citationlabel>%s)' % simplename] # citation reference
                 )
                ]
               ),
              ('backquote',             # interpreted text or phrase reference
               '(?P<role>(:%s:)?)' % simplename, # optional role
               non_whitespace_after,
               ['`(?!`)']               # but not literal
               )
              ]
             )
    patterns = Struct(
          initial=build_regexp(parts),
          emphasis=re.compile(non_whitespace_escape_before
                              + r'(\*)' + end_string_suffix),
          strong=re.compile(non_whitespace_escape_before
                            + r'(\*\*)' + end_string_suffix),
          interpreted_or_phrase_ref=re.compile(
              r"""
              %(non_whitespace_escape_before)s
              (
                `
                (?P<suffix>
                  (?P<role>:%(simplename)s:)?
                  (?P<refend>__?)?
                )
              )
              %(end_string_suffix)s
              """ % locals(), re.VERBOSE | re.UNICODE),
          embedded_uri=re.compile(
              r"""
              (
                (?:[ \n]+|^)            # spaces or beginning of line/string
                <                       # open bracket
                %(non_whitespace_after)s
                ([^<>\x00]+)            # anything but angle brackets & nulls
                %(non_whitespace_before)s
                >                       # close bracket w/o whitespace before
              )
              $                         # end of string
              """ % locals(), re.VERBOSE),
          literal=re.compile(non_whitespace_before + '(``)'
                             + end_string_suffix),
          target=re.compile(non_whitespace_escape_before
                            + r'(`)' + end_string_suffix),
          substitution_ref=re.compile(non_whitespace_escape_before
                                      + r'(\|_{0,2})'
                                      + end_string_suffix),
          email=re.compile(email_pattern % locals() + '$', re.VERBOSE),
          uri=re.compile(
                (r"""
                %(start_string_prefix)s
                (?P<whole>
                  (?P<absolute>           # absolute URI
                    (?P<scheme>             # scheme (http, ftp, mailto)
                      [a-zA-Z][a-zA-Z0-9.+-]*
                    )
                    :
                    (
                      (                       # either:
                        (//?)?                  # hierarchical URI
                        %(uric)s*               # URI characters
                        %(uri_end)s             # final URI char
                      )
                      (                       # optional query
                        \?%(uric)s*
                        %(uri_end)s
                      )?
                      (                       # optional fragment
                        \#%(uric)s*
                        %(uri_end)s
                      )?
                    )
                  )
                |                       # *OR*
                  (?P<email>              # email address
                    """ + email_pattern + r"""
                  )
                )
                %(end_string_suffix)s
                """) % locals(), re.VERBOSE),
          pep=re.compile(
                r"""
                %(start_string_prefix)s
                (
                  (pep-(?P<pepnum1>\d+)(.txt)?) # reference to source file
                |
                  (PEP\s+(?P<pepnum2>\d+))      # reference by name
                )
                %(end_string_suffix)s""" % locals(), re.VERBOSE),
          rfc=re.compile(
                r"""
                %(start_string_prefix)s
                (RFC(-|\s+)?(?P<rfcnum>\d+))
                %(end_string_suffix)s""" % locals(), re.VERBOSE))

    def quoted_start(self, match):
        """Return 1 if inline markup start-string is 'quoted', 0 if not."""
        string = match.string
        start = match.start()
        end = match.end()
        if start == 0:                  # start-string at beginning of text
            return 0
        prestart = string[start - 1]
        try:
            poststart = string[end]
            if self.openers.index(prestart) \
                  == self.closers.index(poststart):   # quoted
                return 1
        except IndexError:              # start-string at end of text
            return 1
        except ValueError:              # not quoted
            pass
        return 0

    def inline_obj(self, match, lineno, end_pattern, nodeclass,
                   restore_backslashes=0):
        string = match.string
        matchstart = match.start('start')
        matchend = match.end('start')
        if self.quoted_start(match):
            return (string[:matchend], [], string[matchend:], [], '')
        endmatch = end_pattern.search(string[matchend:])
        if endmatch and endmatch.start(1):  # 1 or more chars
            text = unescape(endmatch.string[:endmatch.start(1)],
                            restore_backslashes)
            textend = matchend + endmatch.end(1)
            rawsource = unescape(string[matchstart:textend], 1)
            return (string[:matchstart], [nodeclass(rawsource, text)],
                    string[textend:], [], endmatch.group(1))
        msg = self.reporter.warning(
              'Inline %s start-string without end-string.'
              % nodeclass.__name__, line=lineno)
        text = unescape(string[matchstart:matchend], 1)
        rawsource = unescape(string[matchstart:matchend], 1)
        prb = self.problematic(text, rawsource, msg)
        return string[:matchstart], [prb], string[matchend:], [msg], ''

    def problematic(self, text, rawsource, message):
        msgid = self.document.set_id(message, self.parent)
        problematic = nodes.problematic(rawsource, text, refid=msgid)
        prbid = self.document.set_id(problematic)
        message.add_backref(prbid)
        return problematic

    def emphasis(self, match, lineno):
        before, inlines, remaining, sysmessages, endstring = self.inline_obj(
              match, lineno, self.patterns.emphasis, nodes.emphasis)
        return before, inlines, remaining, sysmessages

    def strong(self, match, lineno):
        before, inlines, remaining, sysmessages, endstring = self.inline_obj(
              match, lineno, self.patterns.strong, nodes.strong)
        return before, inlines, remaining, sysmessages

    def interpreted_or_phrase_ref(self, match, lineno):
        end_pattern = self.patterns.interpreted_or_phrase_ref
        string = match.string
        matchstart = match.start('backquote')
        matchend = match.end('backquote')
        rolestart = match.start('role')
        role = match.group('role')
        position = ''
        if role:
            role = role[1:-1]
            position = 'prefix'
        elif self.quoted_start(match):
            return (string[:matchend], [], string[matchend:], [])
        endmatch = end_pattern.search(string[matchend:])
        if endmatch and endmatch.start(1):  # 1 or more chars
            textend = matchend + endmatch.end()
            if endmatch.group('role'):
                if role:
                    msg = self.reporter.warning(
                        'Multiple roles in interpreted text (both '
                        'prefix and suffix present; only one allowed).',
                        line=lineno)
                    text = unescape(string[rolestart:textend], 1)
                    prb = self.problematic(text, text, msg)
                    return string[:rolestart], [prb], string[textend:], [msg]
                role = endmatch.group('suffix')[1:-1]
                position = 'suffix'
            escaped = endmatch.string[:endmatch.start(1)]
            rawsource = unescape(string[matchstart:textend], 1)
            if rawsource[-1:] == '_':
                if role:
                    msg = self.reporter.warning(
                          'Mismatch: both interpreted text role %s and '
                          'reference suffix.' % position, line=lineno)
                    text = unescape(string[rolestart:textend], 1)
                    prb = self.problematic(text, text, msg)
                    return string[:rolestart], [prb], string[textend:], [msg]
                return self.phrase_ref(string[:matchstart], string[textend:],
                                       rawsource, escaped, unescape(escaped))
737                          'Mismatch: both interpreted text role %s and '
738                          'reference suffix.' % position, line=lineno)
739                    text = unescape(string[rolestart:textend], 1)
740                    prb = self.problematic(text, text, msg)
741                    return string[:rolestart], [prb], string[textend:], [msg]
742                return self.phrase_ref(string[:matchstart], string[textend:],
743                                       rawsource, escaped, unescape(escaped))
744            else:
745                rawsource = unescape(string[rolestart:textend], 1)
746                nodelist, messages = self.interpreted(rawsource, escaped, role,
747                                                      lineno)
748                return (string[:rolestart], nodelist,
749                        string[textend:], messages)
750        msg = self.reporter.warning(
751              'Inline interpreted text or phrase reference start-string '
752              'without end-string.', line=lineno)
753        text = unescape(string[matchstart:matchend], 1)
754        prb = self.problematic(text, text, msg)
755        return string[:matchstart], [prb], string[matchend:], [msg]
756
757    def phrase_ref(self, before, after, rawsource, escaped, text):
758        match = self.patterns.embedded_uri.search(escaped)
759        if match:
760            text = unescape(escaped[:match.start(0)])
761            uri_text = match.group(2)
762            uri = ''.join(uri_text.split())
763            uri = self.adjust_uri(uri)
764            if uri:
765                target = nodes.target(match.group(1), refuri=uri)
766            else:
767                raise ApplicationError('problem with URI: %r' % uri_text)
768            if not text:
769                text = uri
770        else:
771            target = None
772        refname = normalize_name(text)
773        reference = nodes.reference(rawsource, text,
774                                    name=whitespace_normalize_name(text))
775        node_list = [reference]
776        if rawsource[-2:] == '__':
777            if target:
778                reference['refuri'] = uri
779            else:
780                reference['anonymous'] = 1
781        else:
782            if target:
783                reference['refuri'] = uri
784                target['names'].append(refname)
785                self.document.note_explicit_target(target, self.parent)
786                node_list.append(target)
787            else:
788                reference['refname'] = refname
789                self.document.note_refname(reference)
790        return before, node_list, after, []
791
792    def adjust_uri(self, uri):
793        match = self.patterns.email.match(uri)
794        if match:
795            return 'mailto:' + uri
796        else:
797            return uri
798
799    def interpreted(self, rawsource, text, role, lineno):
800        role_fn, messages = roles.role(role, self.language, lineno,
801                                       self.reporter)
802        if role_fn:
803            nodes, messages2 = role_fn(role, rawsource, text, lineno, self)
804            return nodes, messages + messages2
805        else:
806            msg = self.reporter.error(
807                'Unknown interpreted text role "%s".' % role,
808                line=lineno)
809            return ([self.problematic(rawsource, rawsource, msg)],
810                    messages + [msg])
811
812    def literal(self, match, lineno):
813        before, inlines, remaining, sysmessages, endstring = self.inline_obj(
814              match, lineno, self.patterns.literal, nodes.literal,
815              restore_backslashes=1)
816        return before, inlines, remaining, sysmessages
817
818    def inline_internal_target(self, match, lineno):
819        before, inlines, remaining, sysmessages, endstring = self.inline_obj(
820              match, lineno, self.patterns.target, nodes.target)
821        if inlines and isinstance(inlines[0], nodes.target):
822            assert len(inlines) == 1
823            target = inlines[0]
824            name = normalize_name(target.astext())
825            target['names'].append(name)
826            self.document.note_explicit_target(target, self.parent)
827        return before, inlines, remaining, sysmessages
828
829    def substitution_reference(self, match, lineno):
830        before, inlines, remaining, sysmessages, endstring = self.inline_obj(
831              match, lineno, self.patterns.substitution_ref,
832              nodes.substitution_reference)
833        if len(inlines) == 1:
834            subref_node = inlines[0]
835            if isinstance(subref_node, nodes.substitution_reference):
836                subref_text = subref_node.astext()
837                self.document.note_substitution_ref(subref_node, subref_text)
838                if endstring[-1:] == '_':
839                    reference_node = nodes.reference(
840                        '|%s%s' % (subref_text, endstring), '')
841                    if endstring[-2:] == '__':
842                        reference_node['anonymous'] = 1
843                    else:
844                        reference_node['refname'] = normalize_name(subref_text)
845                        self.document.note_refname(reference_node)
846                    reference_node += subref_node
847                    inlines = [reference_node]
848        return before, inlines, remaining, sysmessages
849
850    def footnote_reference(self, match, lineno):
851        """
852        Handles `nodes.footnote_reference` and `nodes.citation_reference`
853        elements.
854        """
855        label = match.group('footnotelabel')
856        refname = normalize_name(label)
857        string = match.string
858        before = string[:match.start('whole')]
859        remaining = string[match.end('whole'):]
860        if match.group('citationlabel'):
861            refnode = nodes.citation_reference('[%s]_' % label,
862                                               refname=refname)
863            refnode += nodes.Text(label)
864            self.document.note_citation_ref(refnode)
865        else:
866            refnode = nodes.footnote_reference('[%s]_' % label)
867            if refname[0] == '#':
868                refname = refname[1:]
869                refnode['auto'] = 1
870                self.document.note_autofootnote_ref(refnode)
871            elif refname == '*':
872                refname = ''
873                refnode['auto'] = '*'
874                self.document.note_symbol_footnote_ref(
875                      refnode)
876            else:
877                refnode += nodes.Text(label)
878            if refname:
879                refnode['refname'] = refname
880                self.document.note_footnote_ref(refnode)
881            if utils.get_trim_footnote_ref_space(self.document.settings):
882                before = before.rstrip()
883        return (before, [refnode], remaining, [])
884
885    def reference(self, match, lineno, anonymous=None):
886        referencename = match.group('refname')
887        refname = normalize_name(referencename)
888        referencenode = nodes.reference(
889            referencename + match.group('refend'), referencename,
890            name=whitespace_normalize_name(referencename))
891        if anonymous:
892            referencenode['anonymous'] = 1
893        else:
894            referencenode['refname'] = refname
895            self.document.note_refname(referencenode)
896        string = match.string
897        matchstart = match.start('whole')
898        matchend = match.end('whole')
899        return (string[:matchstart], [referencenode], string[matchend:], [])
900
901    def anonymous_reference(self, match, lineno):
902        return self.reference(match, lineno, anonymous=1)
903
904    def standalone_uri(self, match, lineno):
905        if (not match.group('scheme')
906            or match.group('scheme').lower() in urischemes.schemes):
907            if match.group('email'):
908                addscheme = 'mailto:'
909            else:
910                addscheme = ''
911            text = match.group('whole')
912            unescaped = unescape(text, 0)
913            return [nodes.reference(unescape(text, 1), unescaped,
914                                    refuri=addscheme + unescaped)]
915        else:                   # not a valid scheme
916            raise MarkupMismatch
917
918    pep_url = 'pep-%04d.html'
919
920    def pep_reference(self, match, lineno):
921        text = match.group(0)
922        if text.startswith('pep-'):
923            pepnum = int(match.group('pepnum1'))
924        elif text.startswith('PEP'):
925            pepnum = int(match.group('pepnum2'))
926        else:
927            raise MarkupMismatch
928        ref = self.document.settings.pep_base_url + self.pep_url % pepnum
929        unescaped = unescape(text, 0)
930        return [nodes.reference(unescape(text, 1), unescaped, refuri=ref)]
931
932    rfc_url = 'rfc%d.html'
933
934    def rfc_reference(self, match, lineno):
935        text = match.group(0)
936        if text.startswith('RFC'):
937            rfcnum = int(match.group('rfcnum'))
938            ref = self.document.settings.rfc_base_url + self.rfc_url % rfcnum
939        else:
940            raise MarkupMismatch
941        unescaped = unescape(text, 0)
942        return [nodes.reference(unescape(text, 1), unescaped, refuri=ref)]
943
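The two handlers above build the target URI by joining a base URL from the document settings with a filled-in filename template. A minimal sketch of that string formatting follows; the `example.org` base URLs are hypothetical placeholders, not Docutils defaults, and the function names are illustrative only.

```python
# Filename templates copied from the listing above.
PEP_URL = 'pep-%04d.html'
RFC_URL = 'rfc%d.html'

# Hypothetical base URLs; in the listing these come from
# document.settings.pep_base_url and document.settings.rfc_base_url.
PEP_BASE_URL = 'https://example.org/peps/'
RFC_BASE_URL = 'https://example.org/rfcs/'


def pep_ref_uri(pepnum):
    # '%' binds tighter than '+', so the template is filled in first.
    return PEP_BASE_URL + PEP_URL % pepnum


def rfc_ref_uri(rfcnum):
    return RFC_BASE_URL + RFC_URL % rfcnum
```

The `%04d` template zero-pads the PEP number to four digits, while the RFC template leaves the number unpadded.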
944    def implicit_inline(self, text, lineno):
945        """
946        Check each of the patterns in `self.implicit_dispatch` for a match,
947        and dispatch to the stored method for the pattern.  Recursively check
948        the text before and after the match.  Return a list of `nodes.Text`
949        and inline element nodes.
950        """
951        if not text:
952            return []
953        for pattern, method in self.implicit_dispatch:
954            match = pattern.search(text)
955            if match:
956                try:
957                    # Must recurse on strings before *and* after the match;
958                    # there may be multiple patterns.
959                    return (self.implicit_inline(text[:match.start()], lineno)
960                            + method(match, lineno) +
961                            self.implicit_inline(text[match.end():], lineno))
962                except MarkupMismatch:
963                    pass
964        return [nodes.Text(unescape(text), rawsource=unescape(text, 1))]
965
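The recursion in `implicit_inline` can be sketched in isolation. The version below is a simplified stand-in, not the Docutils implementation: it returns plain tuples instead of `nodes` objects, and the two patterns and handlers are hypothetical examples. It does keep the control flow: search each pattern in order, recurse on the text before and after a match, and fall through to the next pattern when a handler raises `MarkupMismatch`.

```python
import re


class MarkupMismatch(Exception):
    """Raised by a handler to reject a tentative match."""


def pep_handler(match):
    return [('reference', match.group(0))]


def rfc_handler(match):
    # Hypothetical rejection rule, only to exercise the fallback path.
    if int(match.group('num')) == 0:
        raise MarkupMismatch
    return [('reference', match.group(0))]


# Hypothetical dispatch list of (compiled pattern, handler) pairs.
IMPLICIT_DISPATCH = [
    (re.compile(r'PEP\s+(?P<num>\d+)'), pep_handler),
    (re.compile(r'RFC\s+(?P<num>\d+)'), rfc_handler),
]


def implicit_inline(text):
    if not text:
        return []
    for pattern, handler in IMPLICIT_DISPATCH:
        match = pattern.search(text)
        if match:
            try:
                # Recurse on both sides; other patterns may match there.
                return (implicit_inline(text[:match.start()])
                        + handler(match)
                        + implicit_inline(text[match.end():]))
            except MarkupMismatch:
                pass  # try the next pattern
    return [('text', text)]
```

When no pattern produces an accepted match, the whole string comes back as a single text item, which mirrors the final `nodes.Text` return in the listing.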
966    dispatch = {'*': emphasis,
967                '**': strong,
968                '`': interpreted_or_phrase_ref,
969                '``': literal,
970                '_`': inline_internal_target,
971                ']_': footnote_reference,
972                '|': substitution_reference,
973                '_': reference,
974                '__': anonymous_reference}
975
976
977def _loweralpha_to_int(s, _zero=(ord('a')-1)):
978    return ord(s) - _zero
979
980def _upperalpha_to_int(s, _zero=(ord('A')-1)):
981    return ord(s) - _zero
982
983def _lowerroman_to_int(s):
984    return roman.fromRoman(s.upper())
985
986
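The three helpers above convert enumerator text to an ordinal. A standalone sketch of the same idea follows. The alphabetic converters mirror the listing; `roman_to_int` is a hypothetical, non-validating stand-in for `roman.fromRoman`, which in the real module also rejects malformed numerals.

```python
def loweralpha_to_int(s, _zero=ord('a') - 1):
    # 'a' -> 1, 'b' -> 2, ... 'z' -> 26
    return ord(s) - _zero


def upperalpha_to_int(s, _zero=ord('A') - 1):
    return ord(s) - _zero


_ROMAN_VALUES = {'I': 1, 'V': 5, 'X': 10, 'L': 50,
                 'C': 100, 'D': 500, 'M': 1000}


def roman_to_int(s):
    # Simplified stand-in: no validation of the numeral's shape.
    values = [_ROMAN_VALUES[ch] for ch in s.upper()]
    total = 0
    for i, value in enumerate(values):
        # A smaller value before a larger one is subtracted (IV, IX, ...).
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total
```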
987class Body(RSTState):
988
989    """
990    Generic classifier of the first line of a block.
991    """
992
993    double_width_pad_char = tableparser.TableParser.double_width_pad_char
994    """Padding character for East Asian double-width text."""
995
996    enum = Struct()
997    """Enumerated list parsing information."""
998
999    enum.formatinfo = {
1000          'parens': Struct(prefix='(', suffix=')', start=1, end=-1),
1001          'rparen': Struct(prefix='', suffix=')', start=0, end=-1),
1002          'period': Struct(prefix='', suffix='.', start=0, end=-1)}
1003    enum.formats = enum.formatinfo.keys()
1004    enum.sequences = ['arabic', 'loweralpha', 'upperalpha',
1005                      'lowerroman', 'upperroman'] # ORDERED!
1006    enum.sequencepats = {'arabic': '[0-9]+',
1007                         'loweralpha': '[a-z]',
1008                         'upperalpha': '[A-Z]',
1009                         'lowerroman': '[ivxlcdm]+',
1010                         'upperroman': '[IVXLCDM]+',}
1011    enum.converters = {'arabic': int,
1012                       'loweralpha': _loweralpha_to_int,
1013                       'upperalpha': _upperalpha_to_int,
1014                       'lowerroman': _lowerroman_to_int,
1015                       'upperroman': roman.fromRoman}
1016
1017    enum.sequenceregexps = {}
1018    for sequence in enum.sequences:
1019        enum.sequenceregexps[sequence] = re.compile(
1020              enum.sequencepats[sequence] + '$')
1021
1022    grid_table_top_pat = re.compile(r'\+-[-+]+-\+ *$')
1023    """Matches the top (& bottom) of a full table."""
1024
1025    simple_table_top_pat = re.compile('=+( +=+)+ *$')
1026    """Matches the top of a simple table."""
1027
1028    simple_table_border_pat = re.compile('=+[ =]*$')
1029    """Matches the bottom & header bottom of a simple table."""
1030
1031    pats = {}
1032    """Fragments of patterns used by transitions."""
1033
1034    pats['nonalphanum7bit'] = '[!-/:-@[-`{-~]'
1035    pats['alpha'] = '[a-zA-Z]'
1036    pats['alphanum'] = '[a-zA-Z0-9]'
1037    pats['alphanumplus'] = '[a-zA-Z0-9_-]'
1038    pats['enum'] = ('(%(arabic)s|%(loweralpha)s|%(upperalpha)s|%(lowerroman)s'
1039                    '|%(upperroman)s|#)' % enum.sequencepats)
1040    pats['optname'] = '%(alphanum)s%(alphanumplus)s*' % pats
1041    # @@@ Loosen up the pattern?  Allow Unicode?
1042    pats['optarg'] = '(%(alpha)s%(alphanumplus)s*|<[^<>]+>)' % pats
1043    pats['shortopt'] = r'(-|\+)%(alphanum)s( ?%(optarg)s)?' % pats
1044    pats['longopt'] = r'(--|/)%(optname)s([ =]%(optarg)s)?' % pats
1045    pats['option'] = r'(%(shortopt)s|%(longopt)s)' % pats
1046
1047    for format in enum.formats:
1048        pats[format] = '(?P<%s>%s%s%s)' % (
1049              format, re.escape(enum.formatinfo[format].prefix),
1050              pats['enum'], re.escape(enum.formatinfo[format].suffix))
1051
1052    patterns = {
1053          'bullet': r'[-+*]( +|$)',
1054          'enumerator': r'(%(parens)s|%(rparen)s|%(period)s)( +|$)' % pats,
1055          'field_marker': r':(?![: ])([^:\\]|\\.)*(?<! ):( +|$)',
1056          'option_marker': r'%(option)s(, %(option)s)*(  +| ?$)' % pats,
1057          'doctest': r'>>>( +|$)',
1058          'line_block': r'\|( +|$)',
1059          'grid_table_top': grid_table_top_pat,
1060          'simple_table_top': simple_table_top_pat,
1061          'explicit_markup': r'\.\.( +|$)',
1062          'anonymous': r'__( +|$)',
1063          'line': r'(%(nonalphanum7bit)s)\1* *$' % pats,
1064          'text': r''}
1065    initial_transitions = (
1066          'bullet',
1067          'enumerator',
1068          'field_marker',
1069          'option_marker',
1070          'doctest',
1071          'line_block',
1072          'grid_table_top',
1073          'simple_table_top',
1074          'explicit_markup',
1075          'anonymous',
1076          'line',
1077          'text')
1078
1079    def indent(self, match, context, next_state):
1080        """Block quote."""
1081        indented, indent, line_offset, blank_finish = \
1082              self.state_machine.get_indented()
1083        blockquote, messages = self.block_quote(indented, line_offset)
1084        self.parent += blockquote
1085        self.parent += messages
1086        if not blank_finish:
1087            self.parent += self.unindent_warning('Block quote')
1088        return context, next_state, []
1089
1090    def block_quote(self, indented, line_offset):
1091        blockquote_lines, attribution_lines, attribution_offset = \
1092              self.check_attribution(indented, line_offset)
1093        blockquote = nodes.block_quote()
1094        self.nested_parse(blockquote_lines, line_offset, blockquote)
1095        messages = []
1096        if attribution_lines:
1097            attribution, messages = self.parse_attribution(attribution_lines,
1098                                                           attribution_offset)
1099            blockquote += attribution
1100        return blockquote, messages
1101
1102    # u'\u2014' is an em-dash:
1103    attribution_pattern = re.compile(ur'(---?(?!-)|\u2014) *(?=[^ \n])')
1104
1105    def check_attribution(self, indented, line_offset):
1106        """
1107        Check for an attribution in the last contiguous block of `indented`.
1108
1109        * First line after last blank line must begin with "--" (etc.).
1110        * Every line after that must have consistent indentation.
1111
1112        Return a 3-tuple: (block quote lines, attribution lines,
1113        attribution offset).
1114        """
1116        blank = None
1117        nonblank_seen = None
1118        indent = 0
1119        for i in range(len(indented) - 1, 0, -1): # don't check first line
1120            this_line_blank = not indented[i].strip()
1121            if nonblank_seen and this_line_blank:
1122                match = self.attribution_pattern.match(indented[i + 1])
1123                if match:
1124                    blank = i
1125                break
1126            elif not this_line_blank:
1127                nonblank_seen = 1
1128        if blank and len(indented) - blank > 2: # multi-line attribution
1129            indent = (len(indented[blank + 2])
1130                      - len(indented[blank + 2].lstrip()))
1131            for j in range(blank + 3, len(indented)):
1132                if ( indented[j]        # may be blank last line
1133                     and indent != (len(indented[j])
1134                                    - len(indented[j].lstrip()))):
1135                    # bad shape
1136                    blank = None
1137                    break
1138        if blank:
1139            a_lines = indented[blank + 1:]
1140            a_lines.trim_left(match.end(), end=1)
1141            a_lines.trim_left(indent, start=1)
1142            return (indented[:blank], a_lines, line_offset + blank + 1)
1143        else:
1144            return (indented, None, None)
1145
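To see what `attribution_pattern` accepts, the sketch below restates it as a Python 3 pattern (the listing uses a Python 2 `ur''` literal) and strips the marker from a candidate line. The helper name `attribution_text` is hypothetical.

```python
import re

# Two or three hyphens (not followed by another hyphen) or an em-dash,
# then optional spaces, then at least one non-space character.
ATTRIBUTION_PATTERN = re.compile(r'(---?(?!-)|\u2014) *(?=[^ \n])')


def attribution_text(line):
    """Return the line with its attribution marker removed, or None."""
    match = ATTRIBUTION_PATTERN.match(line)
    if match is None:
        return None
    return line[match.end():]
```

A bare marker with nothing after it is rejected by the trailing lookahead, and a run of four or more hyphens is rejected by the negative lookahead.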
1146    def parse_attribution(self, indented, line_offset):
1147        text = '\n'.join(indented).rstrip()
1148        lineno = self.state_machine.abs_line_number() + line_offset
1149        textnodes, messages = self.inline_text(text, lineno)
1150        node = nodes.attribution(text, '', *textnodes)
1151        node.line = lineno
1152        return node, messages
1153
1154    def bullet(self, match, context, next_state):
1155        """Bullet list item."""
1156        bulletlist = nodes.bullet_list()
1157        self.parent += bulletlist
1158        bulletlist['bullet'] = match.string[0]
1159        i, blank_finish = self.list_item(match.end())
1160        bulletlist += i
1161        offset = self.state_machine.line_offset + 1   # next line
1162        new_line_offset, blank_finish = self.nested_list_parse(
1163              self.state_machine.input_lines[offset:],
1164              input_offset=self.state_machine.abs_line_offset() + 1,
1165              node=bulletlist, initial_state='BulletList',
1166              blank_finish=blank_finish)
1167        self.goto_line(new_line_offset)
1168        if not blank_finish:
1169            self.parent += self.unindent_warning('Bullet list')
1170        return [], next_state, []
1171
1172    def list_item(self, indent):
1173        if self.state_machine.line[indent:]:
1174            indented, line_offset, blank_finish = (
1175                self.state_machine.get_known_indented(indent))
1176        else:
1177            indented, indent, line_offset, blank_finish = (
1178                self.state_machine.get_first_known_indented(indent))
1179        listitem = nodes.list_item('\n'.join(indented))
1180        if indented:
1181            self.nested_parse(indented, input_offset=line_offset,
1182                              node=listitem)
1183        return listitem, blank_finish
1184
1185    def enumerator(self, match, context, next_state):
1186        """Enumerated list item."""
1187        format, sequence, text, ordinal = self.parse_enumerator(match)
1188        if not self.is_enumerated_list_item(ordinal, sequence, format):
1189            raise statemachine.TransitionCorrection('text')
1190        enumlist = nodes.enumerated_list()
1191        self.parent += enumlist
1192        if sequence == '#':
1193            enumlist['enumtype'] = 'arabic'
1194        else:
1195            enumlist['enumtype'] = sequence
1196        enumlist['prefix'] = self.enum.formatinfo[format].prefix
1197        enumlist['suffix'] = self.enum.formatinfo[format].suffix
1198        if ordinal != 1:
1199            enumlist['start'] = ordinal
1200            msg = self.reporter.info(
1201                'Enumerated list start value not ordinal-1: "%s" (ordinal %s)'
1202                % (text, ordinal), line=self.state_machine.abs_line_number())
1203            self.parent += msg
1204        listitem, blank_finish = self.list_item(match.end())
1205        enumlist += listitem
1206        offset = self.state_machine.line_offset + 1   # next line
1207        newline_offset, blank_finish = self.nested_list_parse(
1208              self.state_machine.input_lines[offset:],
1209              input_offset=self.state_machine.abs_line_offset() + 1,
1210              node=enumlist, initial_state='EnumeratedList',
1211              blank_finish=blank_finish,
1212              extra_settings={'lastordinal': ordinal,
1213                              'format': format,
1214                              'auto': sequence == '#'})
1215        self.goto_line(newline_offset)
1216        if not blank_finish:
1217            self.parent += self.unindent_warning('Enumerated list')
1218        return [], next_state, []
1219
1220    def parse_enumerator(self, match, expected_sequence=None):
1221        """
1222        Analyze an enumerator and return the results.
1223
1224        :Return:
1225            - the enumerator format ('period', 'parens', or 'rparen'),
1226            - the sequence used ('arabic', 'loweralpha', 'upperroman', etc.),
1227            - the text of the enumerator, stripped of formatting, and
1228            - the ordinal value of the enumerator ('a' -> 1, 'ii' -> 2, etc.;
1229              ``None`` is returned for invalid enumerator text).
1230
1231        The enumerator format has already been determined by the regular
1232        expression match. If `expected_sequence` is given, that sequence is
1233        tried first. If not, we check for Roman numeral 1. This way,
1234        single-character Roman numerals (which are also alphabetical) can be
1235        matched. If no sequence has been matched, all sequences are checked in
1236        order.
1237        """
1238        groupdict = match.groupdict()
1239        sequence = ''
1240        for format in self.enum.formats:
1241            if groupdict[format]:       # was this the format matched?
1242                break                   # yes; keep `format`
1243        else:                           # shouldn't happen
1244            raise ParserError('enumerator format not matched')
1245        text = groupdict[format][self.enum.formatinfo[format].start
1246                                 :self.enum.formatinfo[format].end]
1247        if text == '#':
1248            sequence = '#'
1249        elif expected_sequence:
1250            try:
1251                if self.enum.sequenceregexps[expected_sequence].match(text):
1252                    sequence = expected_sequence
1253            except KeyError:            # shouldn't happen
1254                raise ParserError('unknown enumerator sequence: %s'
1255                                  % expected_sequence)
1256        elif text == 'i':
1257            sequence = 'lowerroman'
1258        elif text == 'I':
1259            sequence = 'upperroman'
1260        if not sequence:
1261            for sequence in self.enum.sequences:
1262                if self.enum.sequenceregexps[sequence].match(text):
1263                    break
1264            else:                       # shouldn't happen
1265                raise ParserError('enumerator sequence not matched')
1266        if sequence == '#':
1267            ordinal = 1
1268        else:
1269            try:
1270                ordinal = self.enum.converters[sequence](text)
1271            except roman.InvalidRomanNumeralError:
1272                ordinal = None
1273        return format, sequence, text, ordinal
1274
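The sequence-selection order described in the docstring of `parse_enumerator` can be sketched on its own. The function below is a simplified illustration, assuming the enumerator text has already been stripped of its prefix and suffix; `classify_enumerator` is a hypothetical name, and the ordinal conversion is left out.

```python
import re

SEQUENCES = ['arabic', 'loweralpha', 'upperalpha',
             'lowerroman', 'upperroman']  # ordered, as in the listing
SEQUENCE_PATS = {'arabic': '[0-9]+',
                 'loweralpha': '[a-z]',
                 'upperalpha': '[A-Z]',
                 'lowerroman': '[ivxlcdm]+',
                 'upperroman': '[IVXLCDM]+'}
SEQUENCE_REGEXPS = {name: re.compile(SEQUENCE_PATS[name] + '$')
                    for name in SEQUENCES}


def classify_enumerator(text, expected_sequence=None):
    sequence = ''
    if text == '#':
        sequence = '#'
    elif expected_sequence:
        # Inside an existing list, try the list's own sequence first.
        if SEQUENCE_REGEXPS[expected_sequence].match(text):
            sequence = expected_sequence
    elif text == 'i':
        sequence = 'lowerroman'
    elif text == 'I':
        sequence = 'upperroman'
    if not sequence:
        # Fall back to the first sequence, in order, that matches.
        for candidate in SEQUENCES:
            if SEQUENCE_REGEXPS[candidate].match(text):
                sequence = candidate
                break
        else:
            raise ValueError('enumerator sequence not matched: %r' % text)
    return sequence
```

Without an expected sequence, a lone `v` is classified as alphabetic because `loweralpha` precedes `lowerroman` in the ordered list. Only `i` and `I` start a Roman list from scratch.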
1275    def is_enumerated_list_item(self, ordinal, sequence, format):
1276        """
1277        Check validity based on the ordinal value and the second line.
1278
1279        Return true iff the ordinal is valid and the second line is blank,
1280        indented, or starts with the next enumerator or an auto-enumerator.
1281        """
1282        if ordinal is None:
1283            return None
1284        try:
1285            next_line = self.state_machine.next_line()
1286        except EOFError:              # end of input lines
1287            self.state_machine.previous_line()
1288            return 1
1289        else:
1290            self.state_machine.previous_line()
1291        if not next_line[:1].strip():   # blank or indented
1292            return 1
1293        result = self.make_enumerator(ordinal + 1, sequence, format)
1294        if result:
1295            next_enumerator, auto_enumerator = result
1296            try:
1297                if ( next_line.startswith(next_enumerator) or
1298                     next_line.startswith(auto_enumerator) ):
1299                    return 1
1300            except TypeError:
1301                pass
1302        return None
1303
1304    def make_enumerator(self, ordinal, sequence, format):
1305        """
1306        Construct and return the next enumerated list item marker, and an
1307        auto-enumerator ("#" instead of the regular enumerator).
1308
1309        Return ``None`` for invalid (out of range) ordinals.
1310        """ #"
1311        if sequence == '#':
1312            enumerator = '#'
1313        elif sequence == 'arabic':
1314            enumerator = str(ordinal)
1315        else:
1316            if sequence.endswith('alpha'):
1317                if ordinal > 26:
1318                    return None
1319                enumerator = chr(ordinal + ord('a') - 1)
1320            elif sequence.endswith('roman'):
1321                try:
1322                    enumerator = roman.toRoman(ordinal)
1323                except roman.RomanError:
1324                    return None
1325            else:                       # shouldn't happen
1326                raise ParserError('unknown enumerator sequence: "%s"'
1327                                  % sequence)
1328            if sequence.startswith('lower'):
1329                enumerator = enumerator.lower()
1330            elif sequence.startswith('upper'):
1331                enumerator = enumerator.upper()
1332            else:                       # shouldn't happen
1333                raise ParserError('unknown enumerator sequence: "%s"'
1334                                  % sequence)
1335        formatinfo = self.enum.formatinfo[format]
1336        next_enumerator = (formatinfo.prefix + enumerator + formatinfo.suffix
1337                           + ' ')
1338        auto_enumerator = formatinfo.prefix + '#' + formatinfo.suffix + ' '
1339        return next_enumerator, auto_enumerator
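A reduced sketch of `make_enumerator` follows, covering only the auto, arabic, and alphabetic sequences. Roman numerals are omitted because they depend on the bundled `roman` module. `FORMAT_INFO` is a hypothetical plain-tuple stand-in for `enum.formatinfo`.

```python
# (prefix, suffix) per enumerator format, as in enum.formatinfo.
FORMAT_INFO = {
    'parens': ('(', ')'),
    'rparen': ('', ')'),
    'period': ('', '.'),
}


def make_enumerator(ordinal, sequence, format):
    """Return (next_enumerator, auto_enumerator), or None if out of range."""
    if sequence == '#':
        enumerator = '#'
    elif sequence == 'arabic':
        enumerator = str(ordinal)
    elif sequence.endswith('alpha'):
        if ordinal > 26:
            return None  # no letter after 'z'
        enumerator = chr(ordinal + ord('a') - 1)
        if sequence.startswith('upper'):
            enumerator = enumerator.upper()
    else:
        raise ValueError('sequence not covered by this sketch: %r' % sequence)
    prefix, suffix = FORMAT_INFO[format]
    return (prefix + enumerator + suffix + ' ',
            prefix + '#' + suffix + ' ')
```

Both returned markers end in a space, so a `startswith` check against the next line, as in `is_enumerated_list_item`, only succeeds when the marker is followed by a space.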
1340
1341    def field_marker(self, match, context, next_state):
1342        """Field list item."""
1343        field_list = nodes.field_list()
1344        self.parent += field_list
1345        field, blank_finish = self.field(match)
1346        field_list += field
1347        offset = self.state_machine.line_offset + 1   # next line
1348        newline_offset, blank_finish = self.nested_list_parse(
1349              self.state_machine.input_lines[offset:],
1350              input_offset=self.state_machine.abs_line_offset() + 1,
1351              node=field_list, initial_state='FieldList',
1352              blank_finish=blank_finish)
1353        self.goto_line(newline_offset)
1354        if not blank_finish:
1355            self.parent += self.unindent_warning('Field list')
1356        return [], next_state, []
1357
1358    def field(self, match):
1359        name = self.parse_field_marker(match)
1360        lineno = self.state_machine.abs_line_number()
1361        indented, indent, line_offset, blank_finish = \
1362              self.state_machine.get_first_known_indented(match.end())
1363        field_node = nodes.field()
1364        field_node.line = lineno
1365        name_nodes, name_messages = self.inline_text(name, lineno)
1366        field_node += nodes.field_name(name, '', *name_nodes)
1367        field_body = nodes.field_body('\n'.join(indented), *name_messages)
1368        field_node += field_body
1369        if indented:
1370            self.parse_field_body(indented, line_offset, field_body)
1371        return field_node, blank_finish
1372
1373    def parse_field_marker(self, match):
1374        """Extract & return field name from a field marker match."""
1375        field = match.group()[1:]        # strip off leading ':'
1376        field = field[:field.rfind(':')] # strip off trailing ':' etc.
1377        return field
1378
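The field-name extraction can be tried outside the state machine. The sketch below applies the `field_marker` pattern from `patterns` to a single line and then strips the colons the same way `parse_field_marker` does. Here the function takes a line rather than a match object, which is an adaptation for illustration.

```python
import re

# 'field_marker' pattern copied from the listing above.
FIELD_MARKER = re.compile(r':(?![: ])([^:\\]|\\.)*(?<! ):( +|$)')


def parse_field_marker(line):
    """Return the field name from a field-marker line, or None."""
    match = FIELD_MARKER.match(line)
    if match is None:
        return None
    field = match.group()[1:]         # strip off leading ':'
    return field[:field.rfind(':')]   # strip off trailing ':' and spaces
```

A backslash-escaped colon inside the name stays in the result, because `rfind` locates the last colon of the matched marker.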
1379    def parse_field_body(self, indented, offset, node):
1380        self.nested_parse(indented, input_offset=offset, node=node)
1381
1382    def option_marker(self, match, context, next_state):
1383        """Option list item."""
1384        optionlist = nodes.option_list()
1385        try:
1386            listitem, blank_finish = self.option_list_item(match)
1387        except MarkupError, (message, lineno):
1388            # This shouldn't happen; pattern won't match.
1389            msg = self.reporter.error(
1390                'Invalid option list marker: %s' % message, line=lineno)
1391            self.parent += msg
1392            indented, indent, line_offset, blank_finish = \
1393                  self.state_machine.get_first_known_indented(match.end())
1394            blockquote, messages = self.block_quote(indented, line_offset)
1395            self.parent += blockquote
1396            self.parent += messages
1397            if not blank_finish:
1398                self.parent += self.unindent_warning('Option list')
1399            return [], next_state, []
1400        self.parent += optionlist
1401        optionlist += listitem
1402        offset = self.state_machine.line_offset + 1   # next line
1403        newline_offset, blank_finish = self.nested_list_parse(
1404              self.state_machine.input_lines[offset:],
1405              input_offset=self.state_machine.abs_line_offset() + 1,
1406              node=optionlist, initial_state='OptionList',
1407              blank_finish=blank_finish)
1408        self.goto_line(newline_offset)
1409        if not blank_finish:
1410            self.parent += self.unindent_warning('Option list')
1411        return [], next_state, []
1412
1413    def option_list_item(self, match):
1414        offset = self.state_machine.abs_line_offset()
1415        options = self.parse_option_marker(match)
1416        indented, indent, line_offset, blank_finish = \
1417              self.state_machine.get_first_known_indented(match.end())
1418        if not indented:                # not an option list item
1419            self.goto_line(offset)
1420            raise statemachine.TransitionCorrection('text')
1421        option_group = nodes.option_group('', *options)
1422        description = nodes.description('\n'.join(indented))
1423        option_list_item = nodes.option_list_item('', option_group,
1424                                                  description)
1425        if indented:
1426            self.nested_parse(indented, input_offset=line_offset,
1427                              node=description)
1428        return option_list_item, blank_finish
1429
1430    def parse_option_marker(self, match):
1431        """
1432        Return a list of `nodes.option` objects (each with an optional
1433        `nodes.option_argument` child), parsed from an option marker match.
1434
1435        :Exception: `MarkupError` for invalid option markers.
1436        """
1437        optlist = []
1438        optionstrings = match.group().rstrip().split(', ')
1439        for optionstring in optionstrings:
1440            tokens = optionstring.split()
1441            delimiter = ' '
1442            firstopt = tokens[0].split('=')
1443            if len(firstopt) > 1:
1444                # "--opt=value" form
1445                tokens[:1] = firstopt
1446                delimiter = '='
1447            elif (len(tokens[0]) > 2
1448                  and ((tokens[0].startswith('-')
1449                        and not tokens[0].startswith('--'))
1450                       or tokens[0].startswith('+'))):
1451                # "-ovalue" form
1452                tokens[:1] = [tokens[0][:2], tokens[0][2:]]
1453                delimiter = ''
1454            if len(tokens) > 1 and (tokens[1].startswith('<')
1455                                    and tokens[-1].endswith('>')):
1456                # "-o <value1 value2>" form; join all values into one token
1457                tokens[1:] = [' '.join(tokens[1:])]
1458            if 0 < len(tokens) <= 2:
1459                option = nodes.option(optionstring)
1460                option += nodes.option_string(tokens[0], tokens[0])
1461                if len(tokens) > 1:
1462                    option += nodes.option_argument(tokens[1], tokens[1],
1463                                                    delimiter=delimiter)
1464                optlist.append(option)
1465            else:
1466                raise MarkupError(
1467                    'wrong number of option tokens (=%s), should be 1 or 2: '
1468                    '"%s"' % (len(tokens), optionstring),
1469                    self.state_machine.abs_line_number() + 1)
1470        return optlist
1471
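The tokenizing rules of `parse_option_marker` can be exercised without a parser instance. This is a simplified sketch, not the docutils API: `split_option_marker` is a hypothetical name, plain tuples stand in for `nodes.option` objects, and `ValueError` stands in for `MarkupError`.

```python
def split_option_marker(marker):
    """Return (option, argument, delimiter) tuples for an option marker."""
    result = []
    for optionstring in marker.rstrip().split(', '):
        tokens = optionstring.split()
        delimiter = ' '
        firstopt = tokens[0].split('=')
        if len(firstopt) > 1:
            tokens[:1] = firstopt                        # "--opt=value" form
            delimiter = '='
        elif (len(tokens[0]) > 2
              and ((tokens[0].startswith('-')
                    and not tokens[0].startswith('--'))
                   or tokens[0].startswith('+'))):
            tokens[:1] = [tokens[0][:2], tokens[0][2:]]  # "-ovalue" form
            delimiter = ''
        if len(tokens) > 1 and (tokens[1].startswith('<')
                                and tokens[-1].endswith('>')):
            tokens[1:] = [' '.join(tokens[1:])]          # "-o <a b>" form
        if not 0 < len(tokens) <= 2:
            raise ValueError(
                'wrong number of option tokens: %r' % optionstring)
        argument = None
        if len(tokens) > 1:
            argument = tokens[1]
        result.append((tokens[0], argument, delimiter))
    return result
```

For example, `split_option_marker('-f FILE, --file=FILE')` yields one tuple per synonym, with `' '` and `'='` as the respective delimiters, while `'-a b c'` is rejected because three tokens remain.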
1472    def doctest(self, match, context, next_state):
1473        data = '\n'.join(self.state_machine.get_text_block())
1474        self.parent += nodes.doctest_block(data, data)
1475        return [], next_state, []
1476
1477    def line_block(self, match, context, next_state):
1478        """First line of a line block."""
1479        block = nodes.line_block()
1480        self.parent += block
1481        lineno = self.state_machine.abs_line_number()
1482        line, messages, blank_finish = self.line_block_line(match, lineno)
1483        block += line
1484        self.parent += messages
1485        if not blank_finish:
1486            offset = self.state_machine.line_offset + 1   # next line
1487            new_line_offset, blank_finish = self.nested_list_parse(
1488                  self.state_machine.input_lines[offset:],
1489                  input_offset=self.state_machine.abs_line_offset() + 1,
1490                  node=block, initial_state='LineBlock',
1491                  blank_finish=0)
1492            self.goto_line(new_line_offset)
1493        if not blank_finish:
1494            self.parent += self.reporter.warning(
1495                'Line block ends without a blank line.',
1496                line=(self.state_machine.abs_line_number() + 1))
1497        if len(block):
1498            if block[0].indent is None:
1499                block[0].indent = 0
1500            self.nest_line_block_lines(block)
1501        return [], next_state, []
1502
1503    def line_block_line(self, match, lineno):
1504        """Return one line element of a line_block."""
1505        indented, indent, line_offset, blank_finish = \
1506              self.state_machine.get_first_known_indented(match.end(),
1507                                                          until_blank=1)
1508        text = u'\n'.join(indented)
1509        text_nodes, messages = self.inline_text(text, lineno)
1510        line = nodes.line(text, '', *text_nodes)
1511        if match.string.rstrip() != '|': # not empty
1512            line.indent = len(match.group(1)) - 1
1513        return line, messages, blank_finish
1514
1515    def nest_line_block_lines(self, block):
1516        for index in range(1, len(block)):
1517            if block[index].indent is None:
1518                block[index].indent = block[index - 1].indent
1519        self.nest_line_block_segment(block)
1520
1521    def nest_line_block_segment(self, block):
1522        indents = [item.indent for item in block]
1523        least = min(indents)
1524        new_items = []
1525        new_block = nodes.line_block()
1526        for item in block:
1527            if item.indent > least:
1528                new_block.append(item)
1529            else:
1530                if len(new_block):
1531                    self.nest_line_block_segment(new_block)
1532                    new_items.append(new_block)
1533                    new_block = nodes.line_block()
1534                new_items.append(item)
1535        if len(new_block):
1536            self.nest_line_block_segment(new_block)
1537            new_items.append(new_block)
1538        block[:] = new_items
1539
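The recursion in `nest_line_block_segment` can be easier to follow on plain data. A minimal sketch, assuming each line is a hypothetical `(indent, text)` tuple and nested blocks are plain lists rather than `nodes.line_block` objects:

```python
def nest_by_indent(items):
    """Group (indent, text) pairs so deeper-indented runs become sub-lists."""
    least = min(indent for indent, _ in items)
    nested = []
    run = []                            # current run of deeper-indented items
    for item in items:
        if item[0] > least:
            run.append(item)
        else:
            if run:
                nested.append(nest_by_indent(run))
                run = []
            nested.append(item)
    if run:                             # flush a trailing deeper run
        nested.append(nest_by_indent(run))
    return nested


lines = [(0, 'a'), (2, 'b'), (4, 'c'), (2, 'd'), (0, 'e')]
print(nest_by_indent(lines))
# [(0, 'a'), [(2, 'b'), [(4, 'c')], (2, 'd')], (0, 'e')]
```

Unlike the method above, the sketch returns a new list instead of replacing the block's children in place.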
1540    def grid_table_top(self, match, context, next_state):
1541        """Top border of a full table."""
1542        return self.table_top(match, context, next_state,
1543                              self.isolate_grid_table,
1544                              tableparser.GridTableParser)
1545
1546    def simple_table_top(self, match, context, next_state):
1547        """Top border of a simple table."""
1548        return self.table_top(match, context, next_state,
1549                              self.isolate_simple_table,
1550                              tableparser.SimpleTableParser)
1551
1552    def table_top(self, match, context, next_state,
1553                  isolate_function, parser_class):
1554        """Top border of a generic table."""
1555        nodelist, blank_finish = self.table(isolate_function, parser_class)
1556        self.parent += nodelist
1557        if not blank_finish:
1558            msg = self.reporter.warning(
1559                'Blank line required after table.',
1560                line=self.state_machine.abs_line_number() + 1)
1561            self.parent += msg
1562        return [], next_state, []
1563
1564    def table(self, isolate_function, parser_class):
1565        """Parse a table."""
1566        block, messages, blank_finish = isolate_function()
1567        if block:
1568            try:
1569                parser = parser_class()
1570                tabledata = parser.parse(block)
1571                tableline = (self.state_machine.abs_line_number() - len(block)
1572                             + 1)
1573                table = self.build_table(tabledata, tableline)
1574                nodelist = [table] + messages
1575            except tableparser.TableMarkupError, detail:
1576                nodelist = self.malformed_table(
1577                    block, ' '.join(detail.args)) + messages
1578        else:
1579            nodelist = messages
1580        return nodelist, blank_finish
1581
1582    def isolate_grid_table(self):
1583        messages = []
1584        blank_finish = 1
1585        try:
1586            block = self.state_machine.get_text_block(flush_left=1)
1587        except statemachine.UnexpectedIndentationError, instance:
1588            block, source, lineno = instance.args
1589            messages.append(self.reporter.error('Unexpected indentation.',
1590                                                source=source, line=lineno))
1591            blank_finish = 0
1592        block.disconnect()
1593        # pad double-width (East Asian) chars to keep columns aligned:
1594        block.pad_double_width(self.double_width_pad_char)
1595        width = len(block[0].strip())
1596        for i in range(len(block)):
1597            block[i] = block[i].strip()
1598            if block[i][0] not in '+|': # check left edge
1599                blank_finish = 0
1600                self.state_machine.previous_line(len(block) - i)
1601                del block[i:]
1602                break
1603        if not self.grid_table_top_pat.match(block[-1]): # find bottom
1604            blank_finish = 0
1605            # from second-last to third line of table:
1606            for i in range(len(block) - 2, 1, -1):
1607                if self.grid_table_top_pat.match(block[i]):
1608                    self.state_machine.previous_line(len(block) - i + 1)
1609                    del block[i+1:]
1610                    break
1611            else:
1612                messages.extend(self.malformed_table(block))
1613                return [], messages, blank_finish
1614        for i in range(len(block)):     # check right edge
1615            if len(block[i]) != width or block[i][-1] not in '+|':
1616                messages.extend(self.malformed_table(block))
1617                return [], messages, blank_finish
1618        return block, messages, blank_finish
1619
1620    def isolate_simple_table(self):
1621        start = self.state_machine.line_offset
1622        lines = self.state_machine.input_lines
1623        limit = len(lines) - 1
1624        toplen = len(lines[start].strip())
1625        pattern_match = self.simple_table_border_pat.match
1626        found = 0
1627        found_at = None
1628        i = start + 1
1629        while i <= limit:
1630            line = lines[i]
1631            match = pattern_match(line)
1632            if match:
1633                if len(line.strip()) != toplen:
1634                    self.state_machine.next_line(i - start)
1635                    messages = self.malformed_table(
1636                        lines[start:i+1], 'Bottom/header table border does '
1637                        'not match top border.')
1638                    return [], messages, i == limit or not lines[i+1].strip()
1639                found += 1
1640                found_at = i
1641                if found == 2 or i == limit or not lines[i+1].strip():
1642                    end = i
1643                    break
1644            i += 1
1645        else:                           # reached end of input_lines
1646            if found:
1647                extra = ' or no blank line after table bottom'
1648                self.state_machine.next_line(found_at - start)
1649                block = lines[start:found_at+1]
1650            else:
1651                extra = ''
1652                self.state_machine.next_line(i - start - 1)
1653                block = lines[start:]
1654            messages = self.malformed_table(
1655                block, 'No bottom table border found%s.' % extra)
1656            return [], messages, not extra
1657        self.state_machine.next_line(end - start)
1658        block = lines[start:end+1]
1659        # pad double-width (East Asian) chars to keep columns aligned:
1660        block.pad_double_width(self.double_width_pad_char)
1661        return block, [], end == limit or not lines[end+1].strip()
1662
1663    def malformed_table(self, block, detail=''):
1664        block.replace(self.double_width_pad_char, '')
1665        data = '\n'.join(block)
1666        message = 'Malformed table.'
1667        lineno = self.state_machine.abs_line_number() - len(block) + 1
1668        if detail:
1669            message += '\n' + detail
1670        error = self.reporter.error(message, nodes.literal_block(data, data),
1671                                    line=lineno)
1672        return [error]
1673
1674    def build_table(self, tabledata, tableline, stub_columns=0):
1675        colwidths, headrows, bodyrows = tabledata
1676        table = nodes.table()
1677        tgroup = nodes.tgroup(cols=len(colwidths))
1678        table += tgroup
1679        for colwidth in colwidths:
1680            colspec = nodes.colspec(colwidth=colwidth)
1681            if stub_columns:
1682                colspec.attributes['stub'] = 1
1683                stub_columns -= 1
1684            tgroup += colspec
1685        if headrows:
1686            thead = nodes.thead()
1687            tgroup += thead
1688            for row in headrows:
1689                thead += self.build_table_row(row, tableline)
1690        tbody = nodes.tbody()
1691        tgroup += tbody
1692        for row in bodyrows:
1693            tbody += self.build_table_row(row, tableline)
1694        return table
1695
1696    def build_table_row(self, rowdata, tableline):
1697        row = nodes.row()
1698        for cell in rowdata:
1699            if cell is None:
1700                continue
1701            morerows, morecols, offset, cellblock = cell
1702            attributes = {}
1703            if morerows:
1704                attributes['morerows'] = morerows
1705            if morecols:
1706                attributes['morecols'] = morecols
1707            entry = nodes.entry(**attributes)
1708            row += entry
1709            if ''.join(cellblock):
1710                self.nested_parse(cellblock, input_offset=tableline+offset,
1711                                  node=entry)
1712        return row
1713
1714
1715    explicit = Struct()
1716    """Patterns and constants used for explicit markup recognition."""
1717
1718    explicit.patterns = Struct(
1719          target=re.compile(r"""
1720                            (
1721                              _               # anonymous target
1722                            |               # *OR*
1723                              (?P<quote>`?)   # optional open quote
1724                              (?![ `])        # first char. not space or
1725                                              # backquote
1726                              (?P<name>       # reference name
1727                                .+?
1728                              )
1729                              %(non_whitespace_escape_before)s
1730                              (?P=quote)      # close quote if open quote used
1731                            )
1732                            (?<!(?<!\x00):) # no unescaped colon at end
1733                            %(non_whitespace_escape_before)s
1734                            [ ]?            # optional space
1735                            :               # end of reference name
1736                            ([ ]+|$)        # followed by whitespace
1737                            """ % vars(Inliner), re.VERBOSE),
1738          reference=re.compile(r"""
1739                               (
1740                                 (?P<simple>%(simplename)s)_
1741                               |                  # *OR*
1742                                 `                  # open backquote
1743                                 (?![ ])            # not space
1744                                 (?P<phrase>.+?)    # hyperlink phrase
1745                                 %(non_whitespace_escape_before)s
1746                                 `_                 # close backquote,
1747                                                    # reference mark
1748                               )
1749                               $                  # end of string
1750                               """ % vars(Inliner), re.VERBOSE | re.UNICODE),
1751          substitution=re.compile(r"""
1752                                  (
1753                                    (?![ ])          # first char. not space
1754                                    (?P<name>.+?)    # substitution text
1755                                    %(non_whitespace_escape_before)s
1756                                    \|               # close delimiter
1757                                  )
1758                                  ([ ]+|$)           # followed by whitespace
1759                                  """ % vars(Inliner), re.VERBOSE),)
1760
1761    def footnote(self, match):
1762        lineno = self.state_machine.abs_line_number()
1763        indented, indent, offset, blank_finish = \
1764              self.state_machine.get_first_known_indented(match.end())
1765        label = match.group(1)
1766        name = normalize_name(label)
1767        footnote = nodes.footnote('\n'.join(indented))
1768        footnote.line = lineno
1769        if name[0] == '#':              # auto-numbered
1770            name = name[1:]             # autonumber label
1771            footnote['auto'] = 1
1772            if name:
1773                footnote['names'].append(name)
1774            self.document.note_autofootnote(footnote)
1775        elif name == '*':               # auto-symbol
1776            name = ''
1777            footnote['auto'] = '*'
1778            self.document.note_symbol_footnote(footnote)
1779        else:                           # manually numbered
1780            footnote += nodes.label('', label)
1781            footnote['names'].append(name)
1782            self.document.note_footnote(footnote)
1783        if name:
1784            self.document.note_explicit_target(footnote, footnote)
1785        else:
1786            self.document.set_id(footnote, footnote)
1787        if indented:
1788            self.nested_parse(indented, input_offset=offset, node=footnote)
1789        return [footnote], blank_finish
1790
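The three branches of `footnote` depend only on the normalized label. The sketch below isolates that decision; `classify_footnote_label` is a hypothetical helper, and the lowercase-and-collapse-whitespace expression is an assumed stand-in for `normalize_name`.

```python
def classify_footnote_label(label):
    """Return (kind, name) the way footnote() branches on its label."""
    name = ' '.join(label.lower().split())   # assumed stand-in for normalize_name
    if name[0] == '#':                       # auto-numbered, optionally named
        return 'auto-numbered', name[1:]
    elif name == '*':                        # auto-symbol
        return 'auto-symbol', ''
    else:                                    # manually numbered
        return 'manual', name


for label in ['#', '#Note', '*', '1']:
    print(classify_footnote_label(label))
```

An empty name (plain `#` or `*`) corresponds to the case where the method above calls `set_id` instead of `note_explicit_target`.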
1791    def citation(self, match):
1792        lineno = self.state_machine.abs_line_number()
1793        indented, indent, offset, blank_finish = \
1794              self.state_machine.get_first_known_indented(match.end())
1795        label = match.group(1)
1796        name = normalize_name(label)
1797        citation = nodes.citation('\n'.join(indented))
1798        citation.line = lineno
1799        citation += nodes.label('', label)
1800        citation['names'].append(name)
1801        self.document.note_citation(citation)
1802        self.document.note_explicit_target(citation, citation)
1803        if indented:
1804            self.nested_parse(indented, input_offset=offset, node=citation)
1805        return [citation], blank_finish
1806
1807    def hyperlink_target(self, match):
1808        pattern = self.explicit.patterns.target
1809        lineno = self.state_machine.abs_line_number()
1810        block, indent, offset, blank_finish = \
1811              self.state_machine.get_first_known_indented(
1812              match.end(), until_blank=1, strip_indent=0)
1813        blocktext = match.string[:match.end()] + '\n'.join(block)
1814        block = [escape2null(line) for line in block]
1815        escaped = block[0]
1816        blockindex = 0
1817        while 1:
1818            targetmatch = pattern.match(escaped)
1819            if targetmatch:
1820                break
1821            blockindex += 1
1822            try:
1823                escaped += block[blockindex]
1824            except IndexError:
1825                raise MarkupError('malformed hyperlink target.', lineno)
1826        del block[:blockindex]
1827        block[0] = (block[0] + ' ')[targetmatch.end()-len(escaped)-1:].strip()
1828        target = self.make_target(block, blocktext, lineno,
1829                                  targetmatch.group('name'))
1830        return [target], blank_finish
1831
1832    def make_target(self, block, block_text, lineno, target_name):
1833        target_type, data = self.parse_target(block, block_text, lineno)
1834        if target_type == 'refname':
1835            target = nodes.target(block_text, '', refname=normalize_name(data))
1836            target.indirect_reference_name = data
1837            self.add_target(target_name, '', target, lineno)
1838            self.document.note_indirect_target(target)
1839            return target
1840        elif target_type == 'refuri':
1841            target = nodes.target(block_text, '')
1842            self.add_target(target_name, data, target, lineno)
1843            return target
1844        else:
1845            return data
1846
1847    def parse_target(self, block, block_text, lineno):
1848        """
1849        Determine the type of reference of a target.
1850
1851        :Return: A 2-tuple, one of:
1852
1853            - 'refname' and the indirect reference name
1854            - 'refuri' and the URI
1855            - 'malformed' and a system_message node
1856        """
1857        if block and block[-1].strip()[-1:] == '_': # possible indirect target
1858            reference = ' '.join([line.strip() for line in block])
1859            refname = self.is_reference(reference)
1860            if refname:
1861                return 'refname', refname
1862        reference = ''.join([''.join(line.split()) for line in block])
1863        return 'refuri', unescape(reference)
1864
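The choice between an indirect target and a URI in `parse_target` can be sketched in isolation. This is a rough approximation, not the docutils implementation: `REFERENCE` is a simplified, assumed replacement for `explicit.patterns.reference` (the real pattern is built from `Inliner` attributes and handles escapes), `classify_target` is a hypothetical name, and the `unescape` step is omitted.

```python
import re

# Simplified stand-in for explicit.patterns.reference (an assumption).
REFERENCE = re.compile(
    r"(?:(?P<simple>\w[\w.-]*)_|`(?![ ])(?P<phrase>.+?)`_)$")


def classify_target(block):
    """Return ('refname', name) or ('refuri', uri) for a target's lines."""
    if block and block[-1].strip()[-1:] == '_':     # possible indirect target
        reference = ' '.join(line.strip() for line in block)
        match = REFERENCE.match(' '.join(reference.split()))
        if match:
            return 'refname', match.group('simple') or match.group('phrase')
    # Otherwise treat the block as a URI and remove all whitespace.
    return 'refuri', ''.join(''.join(line.split()) for line in block)
```

A block such as `['http://example.org/a/', '  long/path.html']` is joined into one URI with the line break and indentation removed, while a URI that merely ends in an underscore still falls through to the `'refuri'` branch because the reference pattern does not match it.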
1865    def is_reference(self, reference):
1866        match = self.explicit.patterns.reference.match(
1867            whitespace_normalize_name(reference))
1868        if not match:
1869            return None
1870        return unescape(match.group('simple') or match.group('phrase'))
1871
1872    def add_target(self, targetname, refuri, target, lineno):
1873        target.line = lineno
1874        if targetname:
1875            name = normalize_name(unescape(targetname))
1876            target['names'].append(name)
1877            if refuri:
1878                uri = self.inliner.adjust_uri(refuri)
1879                if uri:
1880                    target['refuri'] = uri
1881                else:
1882                    raise ApplicationError('problem with URI: %r' % refuri)
1883            self.document.note_explicit_target(target, self.parent)
1884        else:                       # anonymous target
1885            if refuri:
1886                target['refuri'] = refuri
1887            target['anonymous'] = 1
1888            self.document.note_anonymous_target(target)
1889
1890    def substitution_def(self, match):
1891        pattern = self.explicit.patterns.substitution
1892        lineno = self.state_machine.abs_line_number()
1893        block, indent, offset, blank_finish = \
1894              self.state_machine.get_first_known_indented(match.end(),
1895                                                          strip_indent=0)
1896        blocktext = (match.string[:match.end()] + '\n'.join(block))
1897        block.disconnect()
1898        escaped = escape2null(block[0].rstrip())
1899        blockindex = 0
1900        while 1:
1901            subdefmatch = pattern.match(escaped)
1902            if subdefmatch:
1903                break
1904            blockindex += 1
1905            try:
1906                escaped = escaped + ' ' + escape2null(block[blockindex].strip())
1907            except IndexError:
1908                raise MarkupError('malformed substitution definition.',
1909                                  lineno)
1910        del block[:blockindex]          # strip out the substitution marker
1911        block[0] = (block[0].strip() + ' ')[subdefmatch.end()-len(escaped)-1:-1]
1912        if not block[0]:
1913            del block[0]
1914            offset += 1
1915        while block and not block[-1].strip():
1916            block.pop()
1917        subname = subdefmatch.group('name')
1918        substitution_node = nodes.substitution_definition(blocktext)
1919        substitution_node.line = lineno
1920        if not block:
1921            msg = self.reporter.warning(
1922                'Substitution definition "%s" missing contents.' % subname,
1923                nodes.literal_block(blocktext, blocktext), line=lineno)
1924            return [msg], blank_finish
1925        block[0] = block[0].strip()
1926        substitution_node['names'].append(
1927            nodes.whitespace_normalize_name(subname))
1928        new_abs_offset, blank_finish = self.nested_list_parse(
1929              block, input_offset=offset, node=substitution_node,
1930              initial_state='SubstitutionDef', blank_finish=blank_finish)
1931        i = 0
1932        for node in substitution_node[:]:
1933            if not (isinstance(node, nodes.Inline) or
1934                    isinstance(node, nodes.Text)):
1935                self.parent += substitution_node[i]
1936                del substitution_node[i]
1937            else:
1938                i += 1
1939        for node in substitution_node.traverse(nodes.Element):
1940            if self.disallowed_inside_substitution_definitions(node):
1941                pformat = nodes.literal_block('', node.pformat().rstrip())
1942                msg = self.reporter.error(
1943                    'Substitution definition contains illegal element:',
1944                    pformat, nodes.literal_block(blocktext, blocktext),
1945                    line=lineno)
1946                return [msg], blank_finish
1947        if len(substitution_node) == 0:
1948            msg = self.reporter.warning(
1949                  'Substitution definition "%s" empty or invalid.'
1950                  % subname,
1951                  nodes.literal_block(blocktext, blocktext), line=lineno)
1952            return [msg], blank_finish
1953        self.document.note_substitution_def(
1954            substitution_node, subname, self.parent)
1955        return [substitution_node], blank_finish
1956
1957    def disallowed_inside_substitution_definitions(self, node):
1958        if (node['ids'] or
1959            isinstance(node, nodes.reference) and node.get('anonymous') or
1960            isinstance(node, nodes.footnote_reference) and node.get('auto')):
1961            return 1
1962        else:
1963            return 0
1964
1965    def directive(self, match, **option_presets):
1966        """Returns a 2-tuple: list of nodes, and a "blank finish" boolean."""
1967        type_name = match.group(1)
1968        directive_function, messages = directives.directive(
1969            type_name, self.memo.language, self.document)
1970        self.parent += messages
1971        if directive_function:
1972            return self.run_directive(
1973                directive_function, match, type_name, option_presets)
1974        else:
1975            return self.unknown_directive(type_name)
1976
1977    def run_directive(self, directive_fn, match, type_name, option_presets):
1978        """
1979        Parse a directive then run its directive function.
1980
1981        Parameters:
1982
1983        - `directive_fn`: The function implementing the directive.  Uses
1984          function attributes ``arguments``, ``options``, and/or ``content``
1985          if present.
1986
1987        - `match`: A regular expression match object which matched the first
1988          line of the directive.
1989
1990        - `type_name`: The directive name, as used in the source text.
1991
1992        - `option_presets`: A dictionary of preset options, defaults for the
1993          directive options.  Currently, only an "alt" option is passed by
1994          substitution definitions (value: the substitution name), which may
1995          be used by an embedded image directive.
1996
1997        Returns a 2-tuple: list of nodes, and a "blank finish" boolean.
1998        """
1999        lineno = self.state_machine.abs_line_number()
2000        initial_line_offset = self.state_machine.line_offset
2001        indented, indent, line_offset, blank_finish \
2002                  = self.state_machine.get_first_known_indented(match.end(),
2003                                                                strip_top=0)
2004        block_text = '\n'.join(self.state_machine.input_lines[
2005            initial_line_offset : self.state_machine.line_offset + 1])
2006        try:
2007            arguments, options, content, content_offset = (
2008                self.parse_directive_block(indented, line_offset,
2009                                           directive_fn, option_presets))
2010        except MarkupError, detail:
2011            error = self.reporter.error(
2012                'Error in "%s" directive:\n%s.' % (type_name,
2013                                                   ' '.join(detail.args)),
2014                nodes.literal_block(block_text, block_text), line=lineno)
2015            return [error], blank_finish
2016        result = directive_fn(type_name, arguments, options, content, lineno,
2017                              content_offset, block_text, self,
2018                              self.state_machine)
2019        return (result,
2020                blank_finish or self.state_machine.is_next_line_blank())
2021
2022    def parse_directive_block(self, indented, line_offset, directive_fn,
2023                              option_presets):
2024        arguments = []
2025        options = {}
2026        argument_spec = getattr(directive_fn, 'arguments', None)
2027        if argument_spec and argument_spec[:2] == (0, 0):
2028            argument_spec = None
2029        option_spec = getattr(directive_fn, 'options', None)
2030        content_spec = getattr(directive_fn, 'content', None)
2031        if indented and not indented[0].strip():
2032            indented.trim_start()
2033            line_offset += 1
2034        while indented and not indented[-1].strip():
2035            indented.trim_end()
2036        if indented and (argument_spec or option_spec):
2037            for i in range(len(indented)):
2038                if not indented[i].strip():
2039                    break
2040            else:
2041                i += 1
2042            arg_block = indented[:i]
2043            content = indented[i+1:]
2044            content_offset = line_offset + i + 1
2045        else:
2046            content = indented
2047            content_offset = line_offset
2048            arg_block = []
2049        while content and not content[0].strip():
2050            content.trim_start()
2051            content_offset += 1
2052        if option_spec:
2053            options, arg_block = self.parse_directive_options(
2054                option_presets, option_spec, arg_block)
2055            if arg_block and not argument_spec:
2056                raise MarkupError('no arguments permitted; blank line '
2057                                  'required before content block')
2058        if argument_spec:
2059            arguments = self.parse_directive_arguments(
2060                argument_spec, arg_block)
2061        if content and not content_spec:
2062            raise MarkupError('no content permitted')
2063        return (arguments, options, content, content_offset)
2064
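How `parse_directive_block` separates the argument/option lines from the content can be shown with plain lists. A minimal sketch under stated assumptions: `split_directive_block` and its `has_args_or_options` flag are hypothetical (the flag replaces the checks on the directive function's `arguments` and `options` attributes), and ordinary lists replace the `StringList` used by docutils.

```python
def split_directive_block(lines, has_args_or_options):
    """Return (arg_block, content, content_offset) for a directive's lines."""
    lines = list(lines)
    offset = 0
    if lines and not lines[0].strip():       # drop one leading blank line
        lines.pop(0)
        offset += 1
    while lines and not lines[-1].strip():   # drop trailing blank lines
        lines.pop()
    if lines and has_args_or_options:
        for i, line in enumerate(lines):     # first blank line ends the args
            if not line.strip():
                break
        else:
            i = len(lines)                   # no blank line: all args
        arg_block = lines[:i]
        content = lines[i + 1:]
        content_offset = offset + i + 1
    else:
        arg_block = []
        content = lines
        content_offset = offset
    while content and not content[0].strip():
        content.pop(0)
        content_offset += 1
    return arg_block, content, content_offset
```

With `['image.png', ':alt: text', '', 'Caption line']` and the flag set, the first two lines form the argument block and the content starts at offset 3; with the flag cleared, all four lines are treated as content.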
2065    def parse_directive_options(self, option_presets, option_spec, arg_block):
2066        options = option_presets.copy()
2067        for i in range(len(arg_block)):
2068            if arg_block[i][:1] == ':':
2069                opt_block = arg_block[i:]
2070                arg_block = arg_block[:i]
2071                break
2072        else:
2073            opt_block = []
2074        if opt_block:
2075            success, data = self.parse_extension_options(option_spec,
2076                                                         opt_block)
2077            if success:                 # data is a dict of options
2078                options.update(data)
2079            else:                       # data is an error string
2080                raise MarkupError(data)
2081        return options, arg_block
2082
2083    def parse_directive_arguments(self, argument_spec, arg_block):
2084        required, optional, last_whitespace = argument_spec
2085        arg_text = '\n'.join(arg_block)
2086        arguments = arg_text.split()
2087        if len(arguments) < required:
2088            raise MarkupError('%s argument(s) required, %s supplied'
2089                              % (required, len(arguments)))
2090        elif len(arguments) > required + optional:
2091            if last_whitespace:
2092                arguments = arg_text.split(None, required + optional - 1)
2093            else:
2094                raise MarkupError(
2095                    'maximum %s argument(s) allowed, %s supplied'
2096                    % (required + optional, len(arguments)))
2097        return arguments
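
# The argument-splitting rule above can be tried in isolation. This is a
# sketch in Python 3, not the docutils API: `split_directive_arguments` is a
# hypothetical name, and it raises ValueError where the method above raises
# MarkupError. When `last_whitespace` is true, surplus words are folded into
# the final argument instead of being rejected.

```python
def split_directive_arguments(arg_text, required, optional, last_whitespace):
    """Split `arg_text` into at most required + optional arguments."""
    arguments = arg_text.split()
    if len(arguments) < required:
        raise ValueError('%s argument(s) required, %s supplied'
                         % (required, len(arguments)))
    if len(arguments) > required + optional:
        if not last_whitespace:
            raise ValueError('maximum %s argument(s) allowed, %s supplied'
                             % (required + optional, len(arguments)))
        # Re-split so that the final argument keeps its internal whitespace.
        arguments = arg_text.split(None, required + optional - 1)
    return arguments


print(split_directive_arguments('figure.png My long title', 1, 1, True))
```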

    def parse_extension_options(self, option_spec, datalines):
        """
        Parse `datalines` for a field list containing extension options
        matching `option_spec`.

        :Parameters:
            - `option_spec`: a mapping of option name to conversion
              function, which should raise an exception on bad input.
            - `datalines`: a list of input strings.

        :Return:
            - Success value, 1 or 0.
            - An option dictionary on success, an error string on failure.
        """
        node = nodes.field_list()
        newline_offset, blank_finish = self.nested_list_parse(
              datalines, 0, node, initial_state='ExtensionOptions',
              blank_finish=1)
        if newline_offset != len(datalines): # incomplete parse of block
            return 0, 'invalid option block'
        try:
            options = utils.extract_extension_options(node, option_spec)
        except KeyError, detail:
            return 0, ('unknown option: "%s"' % detail.args[0])
        except (ValueError, TypeError), detail:
            return 0, ('invalid option value: %s' % ' '.join(detail.args))
        except utils.ExtensionOptionError, detail:
            return 0, ('invalid option data: %s' % ' '.join(detail.args))
        if blank_finish:
            return 1, options
        else:
            return 0, 'option data incompletely parsed'

    def unknown_directive(self, type_name):
        lineno = self.state_machine.abs_line_number()
        indented, indent, offset, blank_finish = \
              self.state_machine.get_first_known_indented(0, strip_indent=0)
        text = '\n'.join(indented)
        error = self.reporter.error(
              'Unknown directive type "%s".' % type_name,
              nodes.literal_block(text, text), line=lineno)
        return [error], blank_finish

    def comment(self, match):
        if not match.string[match.end():].strip() \
              and self.state_machine.is_next_line_blank(): # an empty comment?
            return [nodes.comment()], 1 # "A tiny but practical wart."
        indented, indent, offset, blank_finish = \
              self.state_machine.get_first_known_indented(match.end())
        while indented and not indented[-1].strip():
            indented.trim_end()
        text = '\n'.join(indented)
        return [nodes.comment(text, text)], blank_finish

    explicit.constructs = [
          (footnote,
           re.compile(r"""
                      \.\.[ ]+          # explicit markup start
                      \[
                      (                 # footnote label:
                          [0-9]+          # manually numbered footnote
                        |               # *OR*
                          \#              # anonymous auto-numbered footnote
                        |               # *OR*
                          \#%s            # auto-numbered footnote label
                        |               # *OR*
                          \*              # auto-symbol footnote
                      )
                      \]
                      ([ ]+|$)          # whitespace or end of line
                      """ % Inliner.simplename, re.VERBOSE | re.UNICODE)),
          (citation,
           re.compile(r"""
                      \.\.[ ]+          # explicit markup start
                      \[(%s)\]          # citation label
                      ([ ]+|$)          # whitespace or end of line
                      """ % Inliner.simplename, re.VERBOSE | re.UNICODE)),
          (hyperlink_target,
           re.compile(r"""
                      \.\.[ ]+          # explicit markup start
                      _                 # target indicator
                      (?![ ]|$)         # first char. not space or EOL
                      """, re.VERBOSE)),
          (substitution_def,
           re.compile(r"""
                      \.\.[ ]+          # explicit markup start
                      \|                # substitution indicator
                      (?![ ]|$)         # first char. not space or EOL
                      """, re.VERBOSE)),
          (directive,
           re.compile(r"""
                      \.\.[ ]+          # explicit markup start
                      (%s)              # directive name
                      [ ]?              # optional space
                      ::                # directive delimiter
                      ([ ]+|$)          # whitespace or end of line
                      """ % Inliner.simplename, re.VERBOSE | re.UNICODE))]

    def explicit_markup(self, match, context, next_state):
        """Footnotes, hyperlink targets, directives, comments."""
        nodelist, blank_finish = self.explicit_construct(match)
        self.parent += nodelist
        self.explicit_list(blank_finish)
        return [], next_state, []

    def explicit_construct(self, match):
        """Determine which explicit construct this is, parse & return it."""
        errors = []
        for method, pattern in self.explicit.constructs:
            expmatch = pattern.match(match.string)
            if expmatch:
                try:
                    return method(self, expmatch)
                except MarkupError, (message, lineno): # never reached?
                    errors.append(self.reporter.warning(message, line=lineno))
                    break
        nodelist, blank_finish = self.comment(match)
        return nodelist + errors, blank_finish
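
# The dispatch above tries each (method, pattern) pair in order and falls
# back to a comment when nothing matches. The sketch below (Python 3)
# reproduces only that ordering logic. `SIMPLENAME` is an assumed, simplified
# stand-in for `Inliner.simplename`, which is defined elsewhere in this
# module, and `classify` is a hypothetical helper that returns a construct
# name instead of calling a parser method.

```python
import re

# Assumption: a simplified stand-in for Inliner.simplename.
SIMPLENAME = r'[a-zA-Z0-9]+(?:[-._][a-zA-Z0-9]+)*'

CONSTRUCTS = [
    ('footnote', re.compile(r"""
        \.\.[ ]+                  # explicit markup start
        \[([0-9]+|\#|\#%s|\*)\]   # footnote label
        ([ ]+|$)                  # whitespace or end of line
        """ % SIMPLENAME, re.VERBOSE)),
    ('citation', re.compile(r"""
        \.\.[ ]+                  # explicit markup start
        \[(%s)\]                  # citation label
        ([ ]+|$)                  # whitespace or end of line
        """ % SIMPLENAME, re.VERBOSE)),
    ('hyperlink_target', re.compile(r"""
        \.\.[ ]+                  # explicit markup start
        _                         # target indicator
        (?![ ]|$)                 # first char. not space or EOL
        """, re.VERBOSE)),
    ('substitution_def', re.compile(r"""
        \.\.[ ]+                  # explicit markup start
        \|                        # substitution indicator
        (?![ ]|$)                 # first char. not space or EOL
        """, re.VERBOSE)),
    ('directive', re.compile(r"""
        \.\.[ ]+                  # explicit markup start
        (%s)                      # directive name
        [ ]?                      # optional space
        ::                        # directive delimiter
        ([ ]+|$)                  # whitespace or end of line
        """ % SIMPLENAME, re.VERBOSE)),
]


def classify(line):
    """Return the name of the first construct whose pattern matches."""
    for name, pattern in CONSTRUCTS:
        if pattern.match(line):
            return name
    return 'comment'                # nothing matched: treat as a comment


for sample in ['.. [1] A footnote', '.. image:: picture.png', '.. a remark']:
    print(sample, '->', classify(sample))
```

# Order matters: with this stand-in name pattern, '.. [1] text' would also
# satisfy the citation pattern, but the footnote pattern is tried first.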

    def explicit_list(self, blank_finish):
        """
        Create a nested state machine for a series of explicit markup
        constructs (including anonymous hyperlink targets).
        """
        offset = self.state_machine.line_offset + 1   # next line
        newline_offset, blank_finish = self.nested_list_parse(
              self.state_machine.input_lines[offset:],
              input_offset=self.state_machine.abs_line_offset() + 1,
              node=self.parent, initial_state='Explicit',
              blank_finish=blank_finish,
              match_titles=self.state_machine.match_titles)
        self.goto_line(newline_offset)
        if not blank_finish:
            self.parent += self.unindent_warning('Explicit markup')

    def anonymous(self, match, context, next_state):
        """Anonymous hyperlink targets."""
        nodelist, blank_finish = self.anonymous_target(match)
        self.parent += nodelist
        self.explicit_list(blank_finish)
        return [], next_state, []

    def anonymous_target(self, match):
        lineno = self.state_machine.abs_line_number()
        block, indent, offset, blank_finish \
              = self.state_machine.get_first_known_indented(match.end(),
                                                            until_blank=1)
        blocktext = match.string[:match.end()] + '\n'.join(block)
        block = [escape2null(line) for line in block]
        target = self.make_target(block, blocktext, lineno, '')
        return [target], blank_finish

    def line(self, match, context, next_state):
        """Section title overline or transition marker."""
        if self.state_machine.match_titles:
            return [match.string], 'Line', []
        elif match.string.strip() == '::':
            raise statemachine.TransitionCorrection('text')
        elif len(match.string.strip()) < 4:
            msg = self.reporter.info(
                'Unexpected possible title overline or transition.\n'
                "Treating it as ordinary text because it's so short.",
                line=self.state_machine.abs_line_number())
            self.parent += msg
            raise statemachine.TransitionCorrection('text')
        else:
            blocktext = self.state_machine.line
            msg = self.reporter.severe(
                  'Unexpected section title or transition.',
                  nodes.literal_block(blocktext, blocktext),
                  line=self.state_machine.abs_line_number())
            self.parent += msg
            return [], next_state, []

    def text(self, match, context, next_state):
        """Titles, definition lists, paragraphs."""
        return [match.string], 'Text', []


class RFC2822Body(Body):

    """
    RFC2822 headers are only valid as the first constructs in documents.  As
    soon as anything else appears, the `Body` state should take over.
    """

    patterns = Body.patterns.copy()     # can't modify the original
    patterns['rfc2822'] = r'[!-9;-~]+:( +|$)'
    initial_transitions = [(name, 'Body')
                           for name in Body.initial_transitions]
    initial_transitions.insert(-1, ('rfc2822', 'Body')) # just before 'text'

    def rfc2822(self, match, context, next_state):
        """RFC2822-style field list item."""
        fieldlist = nodes.field_list(classes=['rfc2822'])
        self.parent += fieldlist
        field, blank_finish = self.rfc2822_field(match)
        fieldlist += field
        offset = self.state_machine.line_offset + 1   # next line
        newline_offset, blank_finish = self.nested_list_parse(
              self.state_machine.input_lines[offset:],
              input_offset=self.state_machine.abs_line_offset() + 1,
              node=fieldlist, initial_state='RFC2822List',
              blank_finish=blank_finish)
        self.goto_line(newline_offset)
        if not blank_finish:
            self.parent += self.unindent_warning(
                  'RFC2822-style field list')
        return [], next_state, []

    def rfc2822_field(self, match):
        name = match.string[:match.string.find(':')]
        indented, indent, line_offset, blank_finish = \
              self.state_machine.get_first_known_indented(match.end(),
                                                          until_blank=1)
        fieldnode = nodes.field()
        fieldnode += nodes.field_name(name, name)
        fieldbody = nodes.field_body('\n'.join(indented))
        fieldnode += fieldbody
        if indented:
            self.nested_parse(indented, input_offset=line_offset,
                              node=fieldbody)
        return fieldnode, blank_finish
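
# A small sketch (Python 3) of what the 'rfc2822' pattern above accepts. The
# character class [!-9;-~] covers printable ASCII except the space and the
# colon, so a header name is one unbroken word followed by a colon and then
# spaces or end of line. `header_name` is a hypothetical helper that mirrors
# the name extraction in `rfc2822_field`.

```python
import re

RFC2822 = re.compile(r'[!-9;-~]+:( +|$)')


def header_name(line):
    """Return the header name if `line` looks like an RFC2822 field."""
    if RFC2822.match(line):
        return line[:line.find(':')]
    return None


for sample in ['Author: David Goodger', 'Two words: no', 'http://example.org']:
    print(repr(sample), '->', header_name(sample))
```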


class SpecializedBody(Body):

    """
    Superclass for second and subsequent compound element members.  Compound
    elements are lists and list-like constructs.

    All transition methods are disabled (redefined as `invalid_input`).
    Override individual methods in subclasses to re-enable.

    For example, once an initial bullet list item, say, is recognized, the
    `BulletList` subclass takes over, with a "bullet_list" node as its
    container.  Upon encountering the initial bullet list item, `Body.bullet`
    calls its ``self.nested_list_parse`` (`RSTState.nested_list_parse`), which
    starts up a nested parsing session with `BulletList` as the initial state.
    Only the ``bullet`` transition method is enabled in `BulletList`; as long
    as only bullet list items are encountered, they are parsed and inserted
    into the container.  The first construct which is *not* a bullet list item
    triggers the `invalid_input` method, which ends the nested parse and
    closes the container.  `BulletList` needs to recognize input that is
    invalid in the context of a bullet list, which means everything *other
    than* bullet list items, so it inherits the transition list created in
    `Body`.
    """

    def invalid_input(self, match=None, context=None, next_state=None):
        """Not a compound element member. Abort this state machine."""
        self.state_machine.previous_line() # back up so parent SM can reassess
        raise EOFError

    indent = invalid_input
    bullet = invalid_input
    enumerator = invalid_input
    field_marker = invalid_input
    option_marker = invalid_input
    doctest = invalid_input
    line_block = invalid_input
    grid_table_top = invalid_input
    simple_table_top = invalid_input
    explicit_markup = invalid_input
    anonymous = invalid_input
    line = invalid_input
    text = invalid_input
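
# The disable-then-re-enable pattern described in the docstring above can be
# shown with toy classes. This is a sketch in Python 3 under simplified
# assumptions: `ToyBody`, `ToySpecializedBody` and `ToyBulletList` are
# hypothetical names, the methods take a single string rather than
# (match, context, next_state), and there is no state machine to back up.

```python
class ToyBody:
    """Every transition method is enabled here."""

    def bullet(self, line):
        return 'first bullet item: ' + line

    def text(self, line):
        return 'paragraph: ' + line


class ToySpecializedBody(ToyBody):
    """Disable every transition by aliasing it to `invalid_input`."""

    def invalid_input(self, line=None):
        raise EOFError              # ends the nested parse

    bullet = invalid_input
    text = invalid_input


class ToyBulletList(ToySpecializedBody):
    """Re-enable only the transition this list understands."""

    def bullet(self, line):
        return 'next bullet item: ' + line


state = ToyBulletList()
print(state.bullet('second'))
try:
    state.text('not a bullet')
except EOFError:
    print('nested parse ended')
```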


class BulletList(SpecializedBody):

    """Second and subsequent bullet_list list_items."""

    def bullet(self, match, context, next_state):
        """Bullet list item."""
        if match.string[0] != self.parent['bullet']:
            # different bullet: new list
            self.invalid_input()
        listitem, blank_finish = self.list_item(match.end())
        self.parent += listitem
        self.blank_finish = blank_finish
        return [], next_state, []


class DefinitionList(SpecializedBody):

    """Second and subsequent definition_list_items."""

    def text(self, match, context, next_state):
        """Definition lists."""
        return [match.string], 'Definition', []


class EnumeratedList(SpecializedBody):

    """Second and subsequent enumerated_list list_items."""

    def enumerator(self, match, context, next_state):
        """Enumerated list item."""
        format, sequence, text, ordinal = self.parse_enumerator(
              match, self.parent['enumtype'])
        if ( format != self.format
             or (sequence != '#' and (sequence != self.parent['enumtype']
                                      or self.auto
                                      or ordinal != (self.lastordinal + 1)))
             or not self.is_enumerated_list_item(ordinal, sequence, format)):
            # different enumeration: new list
            self.invalid_input()
        if sequence == '#':
            self.auto = 1
        listitem, blank_finish = self.list_item(match.end())
        self.parent += listitem
        self.blank_finish = blank_finish
        self.lastordinal = ordinal
        return [], next_state, []


class FieldList(SpecializedBody):

    """Second and subsequent field_list fields."""

    def field_marker(self, match, context, next_state):
        """Field list field."""
        field, blank_finish = self.field(match)
        self.parent += field
        self.blank_finish = blank_finish
        return [], next_state, []


class OptionList(SpecializedBody):

    """Second and subsequent option_list option_list_items."""

    def option_marker(self, match, context, next_state):
        """Option list item."""
        try:
            option_list_item, blank_finish = self.option_list_item(match)
        except MarkupError, (message, lineno):
            self.invalid_input()
        self.parent += option_list_item
        self.blank_finish = blank_finish
        return [], next_state, []


class RFC2822List(SpecializedBody, RFC2822Body):

    """Second and subsequent RFC2822-style field_list fields."""

    patterns = RFC2822Body.patterns
    initial_transitions = RFC2822Body.initial_transitions

    def rfc2822(self, match, context, next_state):
        """RFC2822-style field list item."""
        field, blank_finish = self.rfc2822_field(match)
        self.parent += field
        self.blank_finish = blank_finish
        return [], 'RFC2822List', []

    blank = SpecializedBody.invalid_input


class ExtensionOptions(FieldList):

    """
    Parse field_list fields for extension options.

    No nested parsing is done (including inline markup parsing).
    """

    def parse_field_body(self, indented, offset, node):
        """Override `Body.parse_field_body` for simpler parsing."""
        lines = []
        for line in list(indented) + ['']:
            if line.strip():
                lines.append(line)
            elif lines:
                text = '\n'.join(lines)
                node += nodes.paragraph(text, text)
                lines = []
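
# The loop above groups non-blank lines into paragraphs; the extra '' that is
# appended acts as a sentinel so the last paragraph is flushed as well. A
# standalone sketch (Python 3) follows. `group_paragraphs` is a hypothetical
# helper that returns plain strings instead of appending paragraph nodes.

```python
def group_paragraphs(indented):
    """Join runs of non-blank lines; blank lines separate paragraphs."""
    paragraphs = []
    lines = []
    for line in list(indented) + ['']:      # '' flushes the final run
        if line.strip():
            lines.append(line)
        elif lines:
            paragraphs.append('\n'.join(lines))
            lines = []
    return paragraphs


print(group_paragraphs(['first line', 'second line', '', 'next paragraph']))
```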


class LineBlock(SpecializedBody):

    """Second and subsequent lines of a line_block."""

    blank = SpecializedBody.invalid_input

    def line_block(self, match, context, next_state):
        """New line of line block."""
        lineno = self.state_machine.abs_line_number()
        line, messages, blank_finish = self.line_block_line(match, lineno)
        self.parent += line
        self.parent.parent += messages
        self.blank_finish = blank_finish
        return [], next_state, []


class Explicit(SpecializedBody):

    """Second and subsequent explicit markup construct."""

    def explicit_markup(self, match, context, next_state):
        """Footnotes, hyperlink targets, directives, comments."""
        nodelist, blank_finish = self.explicit_construct(match)
        self.parent += nodelist
        self.blank_finish = blank_finish
        return [], next_state, []

    def anonymous(self, match, context, next_state):
        """Anonymous hyperlink targets."""
        nodelist, blank_finish = self.anonymous_target(match)
        self.parent += nodelist
        self.blank_finish = blank_finish
        return [], next_state, []

    blank = SpecializedBody.invalid_input


class SubstitutionDef(Body):

    """
    Parser for the contents of a substitution_definition element.
    """

    patterns = {
          'embedded_directive': re.compile(r'(%s)::( +|$)'
                                           % Inliner.simplename, re.UNICODE),
          'text': r''}
    initial_transitions = ['embedded_directive', 'text']

    def embedded_directive(self, match, context, next_state):
        nodelist, blank_finish = self.directive(match,
                                                alt=self.parent['names'][0])
        self.parent += nodelist
        if not self.state_machine.at_eof():
            self.blank_finish = blank_finish
        raise EOFError

    def text(self, match, context, next_state):
        if not self.state_machine.at_eof():
            self.blank_finish = self.state_machine.is_next_line_blank()
        raise EOFError


class Text(RSTState):

    """
    Classifier of second line of a text block.

    Could be a paragraph, a definition list item, or a title.
    """

    patterns = {'underline': Body.patterns['line'],
                'text': r''}
    initial_transitions = [('underline', 'Body'), ('text', 'Body')]

    def blank(self, match, context, next_state):
        """End of paragraph."""
        paragraph, literalnext = self.paragraph(
              context, self.state_machine.abs_line_number() - 1)
        self.parent += paragraph
        if literalnext:
            self.parent += self.literal_block()
        return [], 'Body', []

    def eof(self, context):
        if context:
            self.blank(None, context, None)
        return []

    def indent(self, match, context, next_state):
        """Definition list item."""
        definitionlist = nodes.definition_list()
        definitionlistitem, blank_finish = self.definition_list_item(context)
        definitionlist += definitionlistitem
        self.parent += definitionlist
        offset = self.state_machine.line_offset + 1   # next line
        newline_offset, blank_finish = self.nested_list_parse(
              self.state_machine.input_lines[offset:],
              input_offset=self.state_machine.abs_line_offset() + 1,
              node=definitionlist, initial_state='DefinitionList',
              blank_finish=blank_finish, blank_finish_state='Definition')
        self.goto_line(newline_offset)
        if not blank_finish:
            self.parent += self.unindent_warning('Definition list')
        return [], 'Body', []

    def underline(self, match, context, next_state):
        """Section title."""
        lineno = self.state_machine.abs_line_number()
        title = context[0].rstrip()
        underline = match.string.rstrip()
        source = title + '\n' + underline
        messages = []
        if column_width(title) > len(underline):
            if len(underline) < 4:
                if self.state_machine.match_titles:
                    msg = self.reporter.info(
                        'Possible title underline, too short for the title.\n'
                        "Treating it as ordinary text because it's so short.",
                        line=lineno)
                    self.parent += msg
                raise statemachine.TransitionCorrection('text')
            else:
                blocktext = context[0] + '\n' + self.state_machine.line
                msg = self.reporter.warning(
                    'Title underline too short.',
                    nodes.literal_block(blocktext, blocktext), line=lineno)
                messages.append(msg)
        if not self.state_machine.match_titles:
            blocktext = context[0] + '\n' + self.state_machine.line
            msg = self.reporter.severe(
                'Unexpected section title.',
                nodes.literal_block(blocktext, blocktext), line=lineno)
            self.parent += messages
            self.parent += msg
            return [], next_state, []
        style = underline[0]
        context[:] = []
        self.section(title, source, style, lineno - 1, messages)
        return [], next_state, []
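
# The underline check above compares the display width of the title with the
# length of the underline. `column_width` is imported from elsewhere and is
# not shown in this listing, so the version below is only an assumed
# stand-in (Python 3) that counts East Asian wide and fullwidth characters
# as two columns. `underline_too_short` is likewise a hypothetical helper.

```python
import unicodedata


def column_width(text):
    """Assumed stand-in: wide and fullwidth characters take two columns."""
    return sum(2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1
               for ch in text)


def underline_too_short(title, underline):
    """True when the underline does not span the displayed title."""
    return column_width(title.rstrip()) > len(underline.rstrip())


print(underline_too_short('Title', '====='))
print(underline_too_short('日本語', '===='))
```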

    def text(self, match, context, next_state):
        """Paragraph."""
        startline = self.state_machine.abs_line_number() - 1
        msg = None
        try:
            block = self.state_machine.get_text_block(flush_left=1)
        except statemachine.UnexpectedIndentationError, instance:
            block, source, lineno = instance.args
            msg = self.reporter.error('Unexpected indentation.',
                                      source=source, line=lineno)
        lines = context + list(block)
        paragraph, literalnext = self.paragraph(lines, startline)
        self.parent += paragraph
        self.parent += msg
        if literalnext:
            try:
                self.state_machine.next_line()
            except EOFError:
                pass
            self.parent += self.literal_block()
        return [], next_state, []

    def literal_block(self):
        """Return a list of nodes."""
        indented, indent, offset, blank_finish = \
              self.state_machine.get_indented()
        while indented and not indented[-1].strip():
            indented.trim_end()
        if not indented:
            return self.quoted_literal_block()
        data = '\n'.join(indented)
        literal_block = nodes.literal_block(data, data)
        literal_block.line = offset + 1
        nodelist = [literal_block]
        if not blank_finish:
            nodelist.append(self.unindent_warning('Literal block'))
        return nodelist

    def quoted_literal_block(self):
        abs_line_offset = self.state_machine.abs_line_offset()
        offset = self.state_machine.line_offset
        parent_node = nodes.Element()
        new_abs_offset = self.nested_parse(
            self.state_machine.input_lines[offset:],
            input_offset=abs_line_offset, node=parent_node, match_titles=0,
            state_machine_kwargs={'state_classes': (QuotedLiteralBlock,),
                                  'initial_state': 'QuotedLiteralBlock'})
        self.goto_line(new_abs_offset)
        return parent_node.children

    def definition_list_item(self, termline):
        indented, indent, line_offset, blank_finish = \
              self.state_machine.get_indented()
        definitionlistitem = nodes.definition_list_item(
            '\n'.join(termline + list(indented)))
        lineno = self.state_machine.abs_line_number() - 1
        definitionlistitem.line = lineno
        termlist, messages = self.term(termline, lineno)
        definitionlistitem += termlist
        definition = nodes.definition('', *messages)
        definitionlistitem += definition
        if termline[0][-2:] == '::':
            definition += self.reporter.info(
                  'Blank line missing before literal block (after the "::")? '
                  'Interpreted as a definition list item.', line=line_offset+1)
        self.nested_parse(indented, input_offset=line_offset, node=definition)
        return definitionlistitem, blank_finish

    classifier_delimiter = re.compile(' +: +')

    def term(self, lines, lineno):
        """Return a definition_list's term and optional classifiers."""
        assert len(lines) == 1
        text_nodes, messages = self.inline_text(lines[0], lineno)
        term_node = nodes.term()
        node_list = [term_node]
        for i in range(len(text_nodes)):
            node = text_nodes[i]
            if isinstance(node, nodes.Text):
                parts = self.classifier_delimiter.split(node.rawsource)
                if len(parts) == 1:
                    node_list[-1] += node
                else:
                    node_list[-1] += nodes.Text(parts[0].rstrip())
                    for part in parts[1:]:
                        classifier_node = nodes.classifier('', part)
                        node_list.append(classifier_node)
            else:
                node_list[-1] += node
        return node_list, messages
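
# How `classifier_delimiter` separates a term from its classifiers, as a
# standalone sketch (Python 3). `split_term` is a hypothetical helper that
# works on a plain string; the method above applies the same split to each
# text node produced by inline parsing. Note that the colon needs a space on
# both sides to count as a delimiter.

```python
import re

CLASSIFIER_DELIMITER = re.compile(' +: +')


def split_term(line):
    """Return (term, classifiers) for a definition list term line."""
    parts = CLASSIFIER_DELIMITER.split(line)
    return parts[0].rstrip(), parts[1:]


print(split_term('term : classifier one : classifier two'))
print(split_term('time: 12:30'))
```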


class SpecializedText(Text):

    """
    Superclass for second and subsequent lines of Text-variants.

    All transition methods are disabled. Override individual methods in
    subclasses to re-enable.
    """

    def eof(self, context):
        """Incomplete construct."""
        return []

    def invalid_input(self, match=None, context=None, next_state=None):
        """Not a compound element member. Abort this state machine."""
        raise EOFError

    blank = invalid_input
    indent = invalid_input
    underline = invalid_input
    text = invalid_input


class Definition(SpecializedText):

    """Second line of potential definition_list_item."""

    def eof(self, context):
        """Not a definition."""
        self.state_machine.previous_line(2) # so parent SM can reassess
        return []

    def indent(self, match, context, next_state):
        """Definition list item."""
        definitionlistitem, blank_finish = self.definition_list_item(context)
        self.parent += definitionlistitem
        self.blank_finish = blank_finish
        return [], 'DefinitionList', []


class Line(SpecializedText):

    """
    Second line of over- & underlined section title or transition marker.
    """

    eofcheck = 1                        # @@@ ???
    """Set to 0 while parsing sections, so that we don't catch the EOF."""

    def eof(self, context):
        """Transition marker at end of section or document."""
        marker = context[0].strip()
        if self.memo.section_bubble_up_kludge:
            self.memo.section_bubble_up_kludge = 0
        elif len(marker) < 4:
            self.state_correction(context)
        if self.eofcheck:               # ignore EOFError with sections
            lineno = self.state_machine.abs_line_number() - 1
            transition = nodes.transition(rawsource=context[0])
            transition.line = lineno
            self.parent += transition
        self.eofcheck = 1
        return []
2777
2778    def blank(self, match, context, next_state):
2779        """Transition marker."""
2780        lineno = self.state_machine.abs_line_number() - 1
2781        marker = context[0].strip()
2782        if len(marker) < 4:
2783            self.state_correction(context)
2784        transition = nodes.transition(rawsource=marker)
2785        transition.line = lineno
2786        self.parent += transition
2787        return [], 'Body', []
2788
    def text(self, match, context, next_state):
        """Potential over- & underlined title."""
        lineno = self.state_machine.abs_line_number() - 1
        overline = context[0]
        title = match.string
        underline = ''
        try:
            underline = self.state_machine.next_line()
        except EOFError:
            blocktext = overline + '\n' + title
            if len(overline.rstrip()) < 4:
                self.short_overline(context, blocktext, lineno, 2)
            else:
                msg = self.reporter.severe(
                    'Incomplete section title.',
                    nodes.literal_block(blocktext, blocktext), line=lineno)
                self.parent += msg
                return [], 'Body', []
        source = '%s\n%s\n%s' % (overline, title, underline)
        overline = overline.rstrip()
        underline = underline.rstrip()
        if not self.transitions['underline'][0].match(underline):
            blocktext = overline + '\n' + title + '\n' + underline
            if len(overline.rstrip()) < 4:
                self.short_overline(context, blocktext, lineno, 2)
            else:
                msg = self.reporter.severe(
                    'Missing matching underline for section title overline.',
                    nodes.literal_block(source, source), line=lineno)
                self.parent += msg
                return [], 'Body', []
        elif overline != underline:
            blocktext = overline + '\n' + title + '\n' + underline
            if len(overline.rstrip()) < 4:
                self.short_overline(context, blocktext, lineno, 2)
            else:
                msg = self.reporter.severe(
                      'Title overline & underline mismatch.',
                      nodes.literal_block(source, source), line=lineno)
                self.parent += msg
                return [], 'Body', []
        title = title.rstrip()
        messages = []
        if column_width(title) > len(overline):
            blocktext = overline + '\n' + title + '\n' + underline
            if len(overline.rstrip()) < 4:
                self.short_overline(context, blocktext, lineno, 2)
            else:
                msg = self.reporter.warning(
                      'Title overline too short.',
                      nodes.literal_block(source, source), line=lineno)
                messages.append(msg)
        style = (overline[0], underline[0])
        self.eofcheck = 0               # @@@ not sure this is correct
        self.section(title.lstrip(), source, style, lineno + 1, messages)
        self.eofcheck = 1
        return [], 'Body', []

    indent = text                       # indented title

    def underline(self, match, context, next_state):
        """Marker line directly after the overline: not a valid title."""
        overline = context[0]
        blocktext = overline + '\n' + self.state_machine.line
        lineno = self.state_machine.abs_line_number() - 1
        if len(overline.rstrip()) < 4:
            self.short_overline(context, blocktext, lineno, 1)
        msg = self.reporter.error(
              'Invalid section title or transition marker.',
              nodes.literal_block(blocktext, blocktext), line=lineno)
        self.parent += msg
        return [], 'Body', []

    def short_overline(self, context, blocktext, lineno, lines=1):
        """Report a too-short overline, then reparse it as ordinary text."""
        msg = self.reporter.info(
            'Possible incomplete section title.\nTreating the overline as '
            "ordinary text because it's so short.", line=lineno)
        self.parent += msg
        self.state_correction(context, lines)

    def state_correction(self, context, lines=1):
        """Back up `lines` lines, clear `context`, and retry as Body text."""
        self.state_machine.previous_line(lines)
        context[:] = []
        raise statemachine.StateCorrection('Body', 'text')


class QuotedLiteralBlock(RSTState):

    """
    Nested parse handler for quoted (unindented) literal blocks.

    Special-purpose.  Not for inclusion in `state_classes`.
    """

    patterns = {'initial_quoted': r'(%(nonalphanum7bit)s)' % Body.pats,
                'text': r''}
    initial_transitions = ('initial_quoted', 'text')

    def __init__(self, state_machine, debug=0):
        RSTState.__init__(self, state_machine, debug)
        self.messages = []
        self.initial_lineno = None

    def blank(self, match, context, next_state):
        """Skip leading blank lines; a blank line after content ends the block."""
        if context:
            raise EOFError
        else:
            return context, next_state, []

    def eof(self, context):
        """Emit the collected literal block, or warn if none was found."""
        if context:
            text = '\n'.join(context)
            literal_block = nodes.literal_block(text, text)
            literal_block.line = self.initial_lineno
            self.parent += literal_block
        else:
            self.parent += self.reporter.warning(
                'Literal block expected; none found.',
                line=self.state_machine.abs_line_number())
            self.state_machine.previous_line()
        self.parent += self.messages
        return []

    def indent(self, match, context, next_state):
        """Indented line inside the block: record an error and end the block."""
        assert context, ('QuotedLiteralBlock.indent: context should not '
                         'be empty!')
        self.messages.append(
            self.reporter.error('Unexpected indentation.',
                                line=self.state_machine.abs_line_number()))
        self.state_machine.previous_line()
        raise EOFError

    def initial_quoted(self, match, context, next_state):
        """Match arbitrary quote character on the first line only."""
        self.remove_transition('initial_quoted')
        quote = match.string[0]
        pattern = re.compile(re.escape(quote))
        # New transition matches consistent quotes only:
        self.add_transition('quoted',
                            (pattern, self.quoted, self.__class__.__name__))
        self.initial_lineno = self.state_machine.abs_line_number()
        return [match.string], next_state, []

    def quoted(self, match, context, next_state):
        """Match consistent quotes on subsequent lines."""
        context.append(match.string)
        return context, next_state, []

    def text(self, match, context, next_state):
        """Line without the expected quote: end the block (error if started)."""
        if context:
            self.messages.append(
                self.reporter.error('Inconsistent literal block quoting.',
                                    line=self.state_machine.abs_line_number()))
            self.state_machine.previous_line()
        raise EOFError


state_classes = (Body, BulletList, DefinitionList, EnumeratedList, FieldList,
                 OptionList, LineBlock, ExtensionOptions, Explicit, Text,
                 Definition, Line, SubstitutionDef, RFC2822Body, RFC2822List)
"""Standard set of State classes used to start `RSTStateMachine`."""
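
The order of checks that `Line.text` applies to an overlined title can be illustrated with a small standalone sketch. It is not part of docutils: `classify_overlined_title` and `MARKER` are hypothetical names, `len()` stands in for `column_width()`, and the regular expression is only an approximation of the pattern behind the `underline` transition. The returned labels are illustrative, not the parser's actual system messages.

```python
import re

# Assumption: a marker line is a single non-alphanumeric printable ASCII
# character repeated to the end of the line, optionally followed by spaces.
MARKER = re.compile(r'([!-/:-@\[-`{-~])\1* *$')


def classify_overlined_title(overline, title, underline):
    """Return a label for the outcome Line.text would reach (sketch only)."""
    overline = overline.rstrip()
    underline = underline.rstrip()
    title = title.rstrip()
    if not MARKER.match(underline):
        problem = 'missing matching underline'
    elif overline != underline:
        problem = 'overline and underline mismatch'
    elif len(title) > len(overline):
        problem = 'overline too short'
    else:
        return 'section'
    # Any problem with an overline shorter than 4 characters is resolved
    # by treating the overline as ordinary text instead of reporting it.
    if len(overline) < 4:
        return 'ordinary text'
    if problem == 'overline too short':
        # Only a warning in the original code: the section is still created.
        return 'section (warning: overline too short)'
    return 'error: ' + problem


if __name__ == '__main__':
    for lines in [('=====', 'Title', '====='),
                  ('=====', 'Title', '-----'),
                  ('===', 'A long title', '===')]:
        print(classify_overlined_title(*lines))
```

Note that a three-character overline is not rejected by itself: it falls back to ordinary text only when one of the other checks fails.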