Context Navigation

states.py

リビジョン 3, 121.1 KB (コミッタ: kohda, 14 年前)
Install Unix tools http://hannonlab.cshl.edu/galaxy_unix_tools/galaxy.html

行番号
1	# Author: David Goodger
2	# Contact: goodger@users.sourceforge.net
3	# Revision: $Revision: 4258 $
4	# Date: $Date: 2006-01-09 04:29:23 +0100 (Mon, 09 Jan 2006) $
5	# Copyright: This module has been placed in the public domain.
6
7	"""
8	This is the ``docutils.parsers.restructuredtext.states`` module, the core of
9	the reStructuredText parser. It defines the following:
10
11	:Classes:
12	- `RSTStateMachine`: reStructuredText parser's entry point.
13	- `NestedStateMachine`: recursive StateMachine.
14	- `RSTState`: reStructuredText State superclass.
15	- `Inliner`: For parsing inline markup.
16	- `Body`: Generic classifier of the first line of a block.
17	- `SpecializedBody`: Superclass for compound element members.
18	- `BulletList`: Second and subsequent bullet_list list_items
19	- `DefinitionList`: Second+ definition_list_items.
20	- `EnumeratedList`: Second+ enumerated_list list_items.
21	- `FieldList`: Second+ fields.
22	- `OptionList`: Second+ option_list_items.
23	- `RFC2822List`: Second+ RFC2822-style fields.
24	- `ExtensionOptions`: Parses directive option fields.
25	- `Explicit`: Second+ explicit markup constructs.
26	- `SubstitutionDef`: For embedded directives in substitution definitions.
27	- `Text`: Classifier of second line of a text block.
28	- `SpecializedText`: Superclass for continuation lines of Text-variants.
29	- `Definition`: Second line of potential definition_list_item.
30	- `Line`: Second line of overlined section title or transition marker.
31	- `Struct`: An auxiliary collection class.
32
33	:Exception classes:
34	- `MarkupError`
35	- `ParserError`
36	- `MarkupMismatch`
37
38	:Functions:
39	- `escape2null()`: Return a string, escape-backslashes converted to nulls.
40	- `unescape()`: Return a string, nulls removed or restored to backslashes.
41
42	:Attributes:
43	- `state_classes`: set of State classes used with `RSTStateMachine`.
44
45	Parser Overview
46	===============
47
48	The reStructuredText parser is implemented as a recursive state machine,
49	examining its input one line at a time. To understand how the parser works,
50	please first become familiar with the `docutils.statemachine` module. In the
51	description below, references are made to classes defined in this module;
52	please see the individual classes for details.
53
54	Parsing proceeds as follows:
55
56	1. The state machine examines each line of input, checking each of the
57	transition patterns of the state `Body`, in order, looking for a match.
58	The implicit transitions (blank lines and indentation) are checked before
59	any others. The 'text' transition is a catch-all (matches anything).
60
61	2. The method associated with the matched transition pattern is called.
62
63	A. Some transition methods are self-contained, appending elements to the
64	document tree (`Body.doctest` parses a doctest block). The parser's
65	current line index is advanced to the end of the element, and parsing
66	continues with step 1.
67
68	B. Other transition methods trigger the creation of a nested state machine,
69	whose job is to parse a compound construct ('indent' does a block quote,
70	'bullet' does a bullet list, 'overline' does a section [first checking
71	for a valid section header], etc.).
72
73	- In the case of lists and explicit markup, a one-off state machine is
74	created and run to parse contents of the first item.
75
76	- A new state machine is created and its initial state is set to the
77	appropriate specialized state (`BulletList` in the case of the
78	'bullet' transition; see `SpecializedBody` for more detail). This
79	state machine is run to parse the compound element (or series of
80	explicit markup elements), and returns as soon as a non-member element
81	is encountered. For example, the `BulletList` state machine ends as
82	soon as it encounters an element which is not a list item of that
83	bullet list. The optional omission of inter-element blank lines is
84	enabled by this nested state machine.
85
86	- The current line index is advanced to the end of the elements parsed,
87	and parsing continues with step 1.
88
89	C. The result of the 'text' transition depends on the next line of text.
90	The current state is changed to `Text`, under which the second line is
91	examined. If the second line is:
92
93	- Indented: The element is a definition list item, and parsing proceeds
94	similarly to step 2.B, using the `DefinitionList` state.
95
96	- A line of uniform punctuation characters: The element is a section
97	header; again, parsing proceeds as in step 2.B, and `Body` is still
98	used.
99
100	- Anything else: The element is a paragraph, which is examined for
101	inline markup and appended to the parent element. Processing
102	continues with step 1.
103	"""
104
105	__docformat__ = 'reStructuredText'
106
107
108	import sys
109	import re
110	import roman
111	from types import TupleType
112	from docutils import nodes, statemachine, utils, urischemes
113	from docutils import ApplicationError, DataError
114	from docutils.statemachine import StateMachineWS, StateWS
115	from docutils.nodes import fully_normalize_name as normalize_name
116	from docutils.nodes import whitespace_normalize_name
117	from docutils.utils import escape2null, unescape, column_width
118	from docutils.parsers.rst import directives, languages, tableparser, roles
119	from docutils.parsers.rst.languages import en as _fallback_language_module
120
121
122	class MarkupError(DataError): pass
123	class UnknownInterpretedRoleError(DataError): pass
124	class InterpretedRoleNotImplementedError(DataError): pass
125	class ParserError(ApplicationError): pass
126	class MarkupMismatch(Exception): pass
127
128
129	class Struct:
130
131	"""Stores data attributes for dotted-attribute access."""
132
133	def __init__(self, **keywordargs):
134	self.__dict__.update(keywordargs)
135
136
137	class RSTStateMachine(StateMachineWS):
138
139	"""
140	reStructuredText's master StateMachine.
141
142	The entry point to reStructuredText parsing is the `run()` method.
143	"""
144
145	def run(self, input_lines, document, input_offset=0, match_titles=1,
146	inliner=None):
147	"""
148	Parse `input_lines` and modify the `document` node in place.
149
150	Extend `StateMachineWS.run()`: set up parse-global data and
151	run the StateMachine.
152	"""
153	self.language = languages.get_language(
154	document.settings.language_code)
155	self.match_titles = match_titles
156	if inliner is None:
157	inliner = Inliner()
158	inliner.init_customizations(document.settings)
159	self.memo = Struct(document=document,
160	reporter=document.reporter,
161	language=self.language,
162	title_styles=[],
163	section_level=0,
164	section_bubble_up_kludge=0,
165	inliner=inliner)
166	self.document = document
167	self.attach_observer(document.note_source)
168	self.reporter = self.memo.reporter
169	self.node = document
170	results = StateMachineWS.run(self, input_lines, input_offset,
171	input_source=document['source'])
172	assert results == [], 'RSTStateMachine.run() results should be empty!'
173	self.node = self.memo = None # remove unneeded references
174
175
176	class NestedStateMachine(StateMachineWS):
177
178	"""
179	StateMachine run from within other StateMachine runs, to parse nested
180	document structures.
181	"""
182
183	def run(self, input_lines, input_offset, memo, node, match_titles=1):
184	"""
185	Parse `input_lines` and populate a `docutils.nodes.document` instance.
186
187	Extend `StateMachineWS.run()`: set up document-wide data.
188	"""
189	self.match_titles = match_titles
190	self.memo = memo
191	self.document = memo.document
192	self.attach_observer(self.document.note_source)
193	self.reporter = memo.reporter
194	self.language = memo.language
195	self.node = node
196	results = StateMachineWS.run(self, input_lines, input_offset)
197	assert results == [], ('NestedStateMachine.run() results should be '
198	'empty!')
199	return results
200
201
202	class RSTState(StateWS):
203
204	"""
205	reStructuredText State superclass.
206
207	Contains methods used by all State subclasses.
208	"""
209
210	nested_sm = NestedStateMachine
211
212	def __init__(self, state_machine, debug=0):
213	self.nested_sm_kwargs = {'state_classes': state_classes,
214	'initial_state': 'Body'}
215	StateWS.__init__(self, state_machine, debug)
216
217	def runtime_init(self):
218	StateWS.runtime_init(self)
219	memo = self.state_machine.memo
220	self.memo = memo
221	self.reporter = memo.reporter
222	self.inliner = memo.inliner
223	self.document = memo.document
224	self.parent = self.state_machine.node
225
226	def goto_line(self, abs_line_offset):
227	"""
228	Jump to input line `abs_line_offset`, ignoring jumps past the end.
229	"""
230	try:
231	self.state_machine.goto_line(abs_line_offset)
232	except EOFError:
233	pass
234
235	def no_match(self, context, transitions):
236	"""
237	Override `StateWS.no_match` to generate a system message.
238
239	This code should never be run.
240	"""
241	self.reporter.severe(
242	'Internal error: no transition pattern match. State: "%s"; '
243	'transitions: %s; context: %s; current line: %r.'
244	% (self.__class__.__name__, transitions, context,
245	self.state_machine.line),
246	line=self.state_machine.abs_line_number())
247	return context, None, []
248
249	def bof(self, context):
250	"""Called at beginning of file."""
251	return [], []
252
253	def nested_parse(self, block, input_offset, node, match_titles=0,
254	state_machine_class=None, state_machine_kwargs=None):
255	"""
256	Create a new StateMachine rooted at `node` and run it over the input
257	`block`.
258	"""
259	if state_machine_class is None:
260	state_machine_class = self.nested_sm
261	if state_machine_kwargs is None:
262	state_machine_kwargs = self.nested_sm_kwargs
263	block_length = len(block)
264	state_machine = state_machine_class(debug=self.debug,
265	**state_machine_kwargs)
266	state_machine.run(block, input_offset, memo=self.memo,
267	node=node, match_titles=match_titles)
268	state_machine.unlink()
269	new_offset = state_machine.abs_line_offset()
270	# No `block.parent` implies disconnected -- lines aren't in sync:
271	if block.parent and (len(block) - block_length) != 0:
272	# Adjustment for block if modified in nested parse:
273	self.state_machine.next_line(len(block) - block_length)
274	return new_offset
275
276	def nested_list_parse(self, block, input_offset, node, initial_state,
277	blank_finish,
278	blank_finish_state=None,
279	extra_settings={},
280	match_titles=0,
281	state_machine_class=None,
282	state_machine_kwargs=None):
283	"""
284	Create a new StateMachine rooted at `node` and run it over the input
285	`block`. Also keep track of optional intermediate blank lines and the
286	required final one.
287	"""
288	if state_machine_class is None:
289	state_machine_class = self.nested_sm
290	if state_machine_kwargs is None:
291	state_machine_kwargs = self.nested_sm_kwargs.copy()
292	state_machine_kwargs['initial_state'] = initial_state
293	state_machine = state_machine_class(debug=self.debug,
294	**state_machine_kwargs)
295	if blank_finish_state is None:
296	blank_finish_state = initial_state
297	state_machine.states[blank_finish_state].blank_finish = blank_finish
298	for key, value in extra_settings.items():
299	setattr(state_machine.states[initial_state], key, value)
300	state_machine.run(block, input_offset, memo=self.memo,
301	node=node, match_titles=match_titles)
302	blank_finish = state_machine.states[blank_finish_state].blank_finish
303	state_machine.unlink()
304	return state_machine.abs_line_offset(), blank_finish
305
306	def section(self, title, source, style, lineno, messages):
307	"""Check for a valid subsection and create one if it checks out."""
308	if self.check_subsection(source, style, lineno):
309	self.new_subsection(title, lineno, messages)
310
311	def check_subsection(self, source, style, lineno):
312	"""
313	Check for a valid subsection header. Return 1 (true) or None (false).
314
315	When a new section is reached that isn't a subsection of the current
316	section, back up the line count (use ``previous_line(-x)``), then
317	``raise EOFError``. The current StateMachine will finish, then the
318	calling StateMachine can re-examine the title. This will work its way
319	back up the calling chain until the correct section level isreached.
320
321	@@@ Alternative: Evaluate the title, store the title info & level, and
322	back up the chain until that level is reached. Store in memo? Or
323	return in results?
324
325	:Exception: `EOFError` when a sibling or supersection encountered.
326	"""
327	memo = self.memo
328	title_styles = memo.title_styles
329	mylevel = memo.section_level
330	try: # check for existing title style
331	level = title_styles.index(style) + 1
332	except ValueError: # new title style
333	if len(title_styles) == memo.section_level: # new subsection
334	title_styles.append(style)
335	return 1
336	else: # not at lowest level
337	self.parent += self.title_inconsistent(source, lineno)
338	return None
339	if level <= mylevel: # sibling or supersection
340	memo.section_level = level # bubble up to parent section
341	if len(style) == 2:
342	memo.section_bubble_up_kludge = 1
343	# back up 2 lines for underline title, 3 for overline title
344	self.state_machine.previous_line(len(style) + 1)
345	raise EOFError # let parent section re-evaluate
346	if level == mylevel + 1: # immediate subsection
347	return 1
348	else: # invalid subsection
349	self.parent += self.title_inconsistent(source, lineno)
350	return None
351
352	def title_inconsistent(self, sourcetext, lineno):
353	error = self.reporter.severe(
354	'Title level inconsistent:', nodes.literal_block('', sourcetext),
355	line=lineno)
356	return error
357
358	def new_subsection(self, title, lineno, messages):
359	"""Append new subsection to document tree. On return, check level."""
360	memo = self.memo
361	mylevel = memo.section_level
362	memo.section_level += 1
363	section_node = nodes.section()
364	self.parent += section_node
365	textnodes, title_messages = self.inline_text(title, lineno)
366	titlenode = nodes.title(title, '', *textnodes)
367	name = normalize_name(titlenode.astext())
368	section_node['names'].append(name)
369	section_node += titlenode
370	section_node += messages
371	section_node += title_messages
372	self.document.note_implicit_target(section_node, section_node)
373	offset = self.state_machine.line_offset + 1
374	absoffset = self.state_machine.abs_line_offset() + 1
375	newabsoffset = self.nested_parse(
376	self.state_machine.input_lines[offset:], input_offset=absoffset,
377	node=section_node, match_titles=1)
378	self.goto_line(newabsoffset)
379	if memo.section_level <= mylevel: # can't handle next section?
380	raise EOFError # bubble up to supersection
381	# reset section_level; next pass will detect it properly
382	memo.section_level = mylevel
383
384	def paragraph(self, lines, lineno):
385	"""
386	Return a list (paragraph & messages) & a boolean: literal_block next?
387	"""
388	data = '\n'.join(lines).rstrip()
389	if re.search(r'(?<!\\)(\\\\)*::$', data):
390	if len(data) == 2:
391	return [], 1
392	elif data[-3] in ' \n':
393	text = data[:-3].rstrip()
394	else:
395	text = data[:-1]
396	literalnext = 1
397	else:
398	text = data
399	literalnext = 0
400	textnodes, messages = self.inline_text(text, lineno)
401	p = nodes.paragraph(data, '', *textnodes)
402	p.line = lineno
403	return [p] + messages, literalnext
404
405	def inline_text(self, text, lineno):
406	"""
407	Return 2 lists: nodes (text and inline elements), and system_messages.
408	"""
409	return self.inliner.parse(text, lineno, self.memo, self.parent)
410
411	def unindent_warning(self, node_name):
412	return self.reporter.warning(
413	'%s ends without a blank line; unexpected unindent.' % node_name,
414	line=(self.state_machine.abs_line_number() + 1))
415
416
417	def build_regexp(definition, compile=1):
418	"""
419	Build, compile and return a regular expression based on `definition`.
420
421	:Parameter: `definition`: a 4-tuple (group name, prefix, suffix, parts),
422	where "parts" is a list of regular expressions and/or regular
423	expression definitions to be joined into an or-group.
424	"""
425	name, prefix, suffix, parts = definition
426	part_strings = []
427	for part in parts:
428	if type(part) is TupleType:
429	part_strings.append(build_regexp(part, None))
430	else:
431	part_strings.append(part)
432	or_group = '\|'.join(part_strings)
433	regexp = '%(prefix)s(?P<%(name)s>%(or_group)s)%(suffix)s' % locals()
434	if compile:
435	return re.compile(regexp, re.UNICODE)
436	else:
437	return regexp
438
439
440	class Inliner:
441
442	"""
443	Parse inline markup; call the `parse()` method.
444	"""
445
446	def __init__(self):
447	self.implicit_dispatch = [(self.patterns.uri, self.standalone_uri),]
448	"""List of (pattern, bound method) tuples, used by
449	`self.implicit_inline`."""
450
451	def init_customizations(self, settings):
452	"""Setting-based customizations; run when parsing begins."""
453	if settings.pep_references:
454	self.implicit_dispatch.append((self.patterns.pep,
455	self.pep_reference))
456	if settings.rfc_references:
457	self.implicit_dispatch.append((self.patterns.rfc,
458	self.rfc_reference))
459
460	def parse(self, text, lineno, memo, parent):
461	# Needs to be refactored for nested inline markup.
462	# Add nested_parse() method?
463	"""
464	Return 2 lists: nodes (text and inline elements), and system_messages.
465
466	Using `self.patterns.initial`, a pattern which matches start-strings
467	(emphasis, strong, interpreted, phrase reference, literal,
468	substitution reference, and inline target) and complete constructs
469	(simple reference, footnote reference), search for a candidate. When
470	one is found, check for validity (e.g., not a quoted '*' character).
471	If valid, search for the corresponding end string if applicable, and
472	check it for validity. If not found or invalid, generate a warning
473	and ignore the start-string. Implicit inline markup (e.g. standalone
474	URIs) is found last.
475	"""
476	self.reporter = memo.reporter
477	self.document = memo.document
478	self.language = memo.language
479	self.parent = parent
480	pattern_search = self.patterns.initial.search
481	dispatch = self.dispatch
482	remaining = escape2null(text)
483	processed = []
484	unprocessed = []
485	messages = []
486	while remaining:
487	match = pattern_search(remaining)
488	if match:
489	groups = match.groupdict()
490	method = dispatch[groups['start'] or groups['backquote']
491	or groups['refend'] or groups['fnend']]
492	before, inlines, remaining, sysmessages = method(self, match,
493	lineno)
494	unprocessed.append(before)
495	messages += sysmessages
496	if inlines:
497	processed += self.implicit_inline(''.join(unprocessed),
498	lineno)
499	processed += inlines
500	unprocessed = []
501	else:
502	break
503	remaining = ''.join(unprocessed) + remaining
504	if remaining:
505	processed += self.implicit_inline(remaining, lineno)
506	return processed, messages
507
508	openers = '\'"([{<'
509	closers = '\'")]}>'
510	start_string_prefix = (r'((?<=^)\|(?<=[-/: \n%s]))' % re.escape(openers))
511	end_string_suffix = (r'((?=$)\|(?=[-/:.,;!? \n\x00%s]))'
512	% re.escape(closers))
513	non_whitespace_before = r'(?<![ \n])'
514	non_whitespace_escape_before = r'(?<![ \n\x00])'
515	non_whitespace_after = r'(?![ \n])'
516	# Alphanumerics with isolated internal [-._] chars (i.e. not 2 together):
517	simplename = r'(?:(?!_)\w)+(?:[-._](?:(?!_)\w)+)*'
518	# Valid URI characters (see RFC 2396 & RFC 2732);
519	# final \x00 allows backslash escapes in URIs:
520	uric = r"""[-_.!~*'()[\];/:@&=+$,%a-zA-Z0-9\x00]"""
521	# Delimiter indicating the end of a URI (not part of the URI):
522	uri_end_delim = r"""[>]"""
523	# Last URI character; same as uric but no punctuation:
524	urilast = r"""[_~*/=+a-zA-Z0-9]"""
525	# End of a URI (either 'urilast' or 'uric followed by a
526	# uri_end_delim'):
527	uri_end = r"""(?:%(urilast)s\|%(uric)s(?=%(uri_end_delim)s))""" % locals()
528	emailc = r"""[-_!~*'{\|}/#?^`&=+$%a-zA-Z0-9\x00]"""
529	email_pattern = r"""
530	%(emailc)s+(?:\.%(emailc)s+)* # name
531	(?<!\x00)@ # at
532	%(emailc)s+(?:\.%(emailc)s) # host
533	%(uri_end)s # final URI char
534	"""
535	parts = ('initial_inline', start_string_prefix, '',
536	[('start', '', non_whitespace_after, # simple start-strings
537	[r'\\', # strong
538	r'\(?!\)', # emphasis but not strong
539	r'``', # literal
540	r'_`', # inline internal target
541	r'\\|(?!\\|)'] # substitution reference
542	),
543	('whole', '', end_string_suffix, # whole constructs
544	[# reference name & end-string
545	r'(?P<refname>%s)(?P<refend>__?)' % simplename,
546	('footnotelabel', r'\[', r'(?P<fnend>\]_)',
547	[r'[0-9]+', # manually numbered
548	r'\#(%s)?' % simplename, # auto-numbered (w/ label?)
549	r'\*', # auto-symbol
550	r'(?P<citationlabel>%s)' % simplename] # citation reference
551	)
552	]
553	),
554	('backquote', # interpreted text or phrase reference
555	'(?P<role>(:%s:)?)' % simplename, # optional role
556	non_whitespace_after,
557	['`(?!`)'] # but not literal
558	)
559	]
560	)
561	patterns = Struct(
562	initial=build_regexp(parts),
563	emphasis=re.compile(non_whitespace_escape_before
564	+ r'(\*)' + end_string_suffix),
565	strong=re.compile(non_whitespace_escape_before
566	+ r'(\\)' + end_string_suffix),
567	interpreted_or_phrase_ref=re.compile(
568	r"""
569	%(non_whitespace_escape_before)s
570	(
571	`
572	(?P<suffix>
573	(?P<role>:%(simplename)s:)?
574	(?P<refend>__?)?
575	)
576	)
577	%(end_string_suffix)s
578	""" % locals(), re.VERBOSE \| re.UNICODE),
579	embedded_uri=re.compile(
580	r"""
581	(
582	(?:[ \n]+\|^) # spaces or beginning of line/string
583	< # open bracket
584	%(non_whitespace_after)s
585	([^<>\x00]+) # anything but angle brackets & nulls
586	%(non_whitespace_before)s
587	> # close bracket w/o whitespace before
588	)
589	$ # end of string
590	""" % locals(), re.VERBOSE),
591	literal=re.compile(non_whitespace_before + '(``)'
592	+ end_string_suffix),
593	target=re.compile(non_whitespace_escape_before
594	+ r'(`)' + end_string_suffix),
595	substitution_ref=re.compile(non_whitespace_escape_before
596	+ r'(\\|_{0,2})'
597	+ end_string_suffix),
598	email=re.compile(email_pattern % locals() + '$', re.VERBOSE),
599	uri=re.compile(
600	(r"""
601	%(start_string_prefix)s
602	(?P<whole>
603	(?P<absolute> # absolute URI
604	(?P<scheme> # scheme (http, ftp, mailto)
605	[a-zA-Z][a-zA-Z0-9.+-]*
606	)
607	:
608	(
609	( # either:
610	(//?)? # hierarchical URI
611	%(uric)s* # URI characters
612	%(uri_end)s # final URI char
613	)
614	( # optional query
615	\?%(uric)s*
616	%(uri_end)s
617	)?
618	( # optional fragment
619	\#%(uric)s*
620	%(uri_end)s
621	)?
622	)
623	)
624	\| # OR
625	(?P<email> # email address
626	""" + email_pattern + r"""
627	)
628	)
629	%(end_string_suffix)s
630	""") % locals(), re.VERBOSE),
631	pep=re.compile(
632	r"""
633	%(start_string_prefix)s
634	(
635	(pep-(?P<pepnum1>\d+)(.txt)?) # reference to source file
636	\|
637	(PEP\s+(?P<pepnum2>\d+)) # reference by name
638	)
639	%(end_string_suffix)s""" % locals(), re.VERBOSE),
640	rfc=re.compile(
641	r"""
642	%(start_string_prefix)s
643	(RFC(-\|\s+)?(?P<rfcnum>\d+))
644	%(end_string_suffix)s""" % locals(), re.VERBOSE))
645
646	def quoted_start(self, match):
647	"""Return 1 if inline markup start-string is 'quoted', 0 if not."""
648	string = match.string
649	start = match.start()
650	end = match.end()
651	if start == 0: # start-string at beginning of text
652	return 0
653	prestart = string[start - 1]
654	try:
655	poststart = string[end]
656	if self.openers.index(prestart) \
657	== self.closers.index(poststart): # quoted
658	return 1
659	except IndexError: # start-string at end of text
660	return 1
661	except ValueError: # not quoted
662	pass
663	return 0
664
665	def inline_obj(self, match, lineno, end_pattern, nodeclass,
666	restore_backslashes=0):
667	string = match.string
668	matchstart = match.start('start')
669	matchend = match.end('start')
670	if self.quoted_start(match):
671	return (string[:matchend], [], string[matchend:], [], '')
672	endmatch = end_pattern.search(string[matchend:])
673	if endmatch and endmatch.start(1): # 1 or more chars
674	text = unescape(endmatch.string[:endmatch.start(1)],
675	restore_backslashes)
676	textend = matchend + endmatch.end(1)
677	rawsource = unescape(string[matchstart:textend], 1)
678	return (string[:matchstart], [nodeclass(rawsource, text)],
679	string[textend:], [], endmatch.group(1))
680	msg = self.reporter.warning(
681	'Inline %s start-string without end-string.'
682	% nodeclass.__name__, line=lineno)
683	text = unescape(string[matchstart:matchend], 1)
684	rawsource = unescape(string[matchstart:matchend], 1)
685	prb = self.problematic(text, rawsource, msg)
686	return string[:matchstart], [prb], string[matchend:], [msg], ''
687
688	def problematic(self, text, rawsource, message):
689	msgid = self.document.set_id(message, self.parent)
690	problematic = nodes.problematic(rawsource, text, refid=msgid)
691	prbid = self.document.set_id(problematic)
692	message.add_backref(prbid)
693	return problematic
694
695	def emphasis(self, match, lineno):
696	before, inlines, remaining, sysmessages, endstring = self.inline_obj(
697	match, lineno, self.patterns.emphasis, nodes.emphasis)
698	return before, inlines, remaining, sysmessages
699
700	def strong(self, match, lineno):
701	before, inlines, remaining, sysmessages, endstring = self.inline_obj(
702	match, lineno, self.patterns.strong, nodes.strong)
703	return before, inlines, remaining, sysmessages
704
705	def interpreted_or_phrase_ref(self, match, lineno):
706	end_pattern = self.patterns.interpreted_or_phrase_ref
707	string = match.string
708	matchstart = match.start('backquote')
709	matchend = match.end('backquote')
710	rolestart = match.start('role')
711	role = match.group('role')
712	position = ''
713	if role:
714	role = role[1:-1]
715	position = 'prefix'
716	elif self.quoted_start(match):
717	return (string[:matchend], [], string[matchend:], [])
718	endmatch = end_pattern.search(string[matchend:])
719	if endmatch and endmatch.start(1): # 1 or more chars
720	textend = matchend + endmatch.end()
721	if endmatch.group('role'):
722	if role:
723	msg = self.reporter.warning(
724	'Multiple roles in interpreted text (both '
725	'prefix and suffix present; only one allowed).',
726	line=lineno)
727	text = unescape(string[rolestart:textend], 1)
728	prb = self.problematic(text, text, msg)
729	return string[:rolestart], [prb], string[textend:], [msg]
730	role = endmatch.group('suffix')[1:-1]
731	position = 'suffix'
732	escaped = endmatch.string[:endmatch.start(1)]
733	rawsource = unescape(string[matchstart:textend], 1)
734	if rawsource[-1:] == '_':
735	if role:
736	msg = self.reporter.warning(
737	'Mismatch: both interpreted text role %s and '
738	'reference suffix.' % position, line=lineno)
739	text = unescape(string[rolestart:textend], 1)
740	prb = self.problematic(text, text, msg)
741	return string[:rolestart], [prb], string[textend:], [msg]
742	return self.phrase_ref(string[:matchstart], string[textend:],
743	rawsource, escaped, unescape(escaped))
744	else:
745	rawsource = unescape(string[rolestart:textend], 1)
746	nodelist, messages = self.interpreted(rawsource, escaped, role,
747	lineno)
748	return (string[:rolestart], nodelist,
749	string[textend:], messages)
750	msg = self.reporter.warning(
751	'Inline interpreted text or phrase reference start-string '
752	'without end-string.', line=lineno)
753	text = unescape(string[matchstart:matchend], 1)
754	prb = self.problematic(text, text, msg)
755	return string[:matchstart], [prb], string[matchend:], [msg]
756
757	def phrase_ref(self, before, after, rawsource, escaped, text):
758	match = self.patterns.embedded_uri.search(escaped)
759	if match:
760	text = unescape(escaped[:match.start(0)])
761	uri_text = match.group(2)
762	uri = ''.join(uri_text.split())
763	uri = self.adjust_uri(uri)
764	if uri:
765	target = nodes.target(match.group(1), refuri=uri)
766	else:
767	raise ApplicationError('problem with URI: %r' % uri_text)
768	if not text:
769	text = uri
770	else:
771	target = None
772	refname = normalize_name(text)
773	reference = nodes.reference(rawsource, text,
774	name=whitespace_normalize_name(text))
775	node_list = [reference]
776	if rawsource[-2:] == '__':
777	if target:
778	reference['refuri'] = uri
779	else:
780	reference['anonymous'] = 1
781	else:
782	if target:
783	reference['refuri'] = uri
784	target['names'].append(refname)
785	self.document.note_explicit_target(target, self.parent)
786	node_list.append(target)
787	else:
788	reference['refname'] = refname
789	self.document.note_refname(reference)
790	return before, node_list, after, []
791
792	def adjust_uri(self, uri):
793	match = self.patterns.email.match(uri)
794	if match:
795	return 'mailto:' + uri
796	else:
797	return uri
798
799	def interpreted(self, rawsource, text, role, lineno):
800	role_fn, messages = roles.role(role, self.language, lineno,
801	self.reporter)
802	if role_fn:
803	nodes, messages2 = role_fn(role, rawsource, text, lineno, self)
804	return nodes, messages + messages2
805	else:
806	msg = self.reporter.error(
807	'Unknown interpreted text role "%s".' % role,
808	line=lineno)
809	return ([self.problematic(rawsource, rawsource, msg)],
810	messages + [msg])
811
812	def literal(self, match, lineno):
813	before, inlines, remaining, sysmessages, endstring = self.inline_obj(
814	match, lineno, self.patterns.literal, nodes.literal,
815	restore_backslashes=1)
816	return before, inlines, remaining, sysmessages
817
818	def inline_internal_target(self, match, lineno):
819	before, inlines, remaining, sysmessages, endstring = self.inline_obj(
820	match, lineno, self.patterns.target, nodes.target)
821	if inlines and isinstance(inlines[0], nodes.target):
822	assert len(inlines) == 1
823	target = inlines[0]
824	name = normalize_name(target.astext())
825	target['names'].append(name)
826	self.document.note_explicit_target(target, self.parent)
827	return before, inlines, remaining, sysmessages
828
829	def substitution_reference(self, match, lineno):
830	before, inlines, remaining, sysmessages, endstring = self.inline_obj(
831	match, lineno, self.patterns.substitution_ref,
832	nodes.substitution_reference)
833	if len(inlines) == 1:
834	subref_node = inlines[0]
835	if isinstance(subref_node, nodes.substitution_reference):
836	subref_text = subref_node.astext()
837	self.document.note_substitution_ref(subref_node, subref_text)
838	if endstring[-1:] == '_':
839	reference_node = nodes.reference(
840	'\|%s%s' % (subref_text, endstring), '')
841	if endstring[-2:] == '__':
842	reference_node['anonymous'] = 1
843	else:
844	reference_node['refname'] = normalize_name(subref_text)
845	self.document.note_refname(reference_node)
846	reference_node += subref_node
847	inlines = [reference_node]
848	return before, inlines, remaining, sysmessages
849
850	def footnote_reference(self, match, lineno):
851	"""
852	Handles `nodes.footnote_reference` and `nodes.citation_reference`
853	elements.
854	"""
855	label = match.group('footnotelabel')
856	refname = normalize_name(label)
857	string = match.string
858	before = string[:match.start('whole')]
859	remaining = string[match.end('whole'):]
860	if match.group('citationlabel'):
861	refnode = nodes.citation_reference('[%s]_' % label,
862	refname=refname)
863	refnode += nodes.Text(label)
864	self.document.note_citation_ref(refnode)
865	else:
866	refnode = nodes.footnote_reference('[%s]_' % label)
867	if refname[0] == '#':
868	refname = refname[1:]
869	refnode['auto'] = 1
870	self.document.note_autofootnote_ref(refnode)
871	elif refname == '*':
872	refname = ''
873	refnode['auto'] = '*'
874	self.document.note_symbol_footnote_ref(
875	refnode)
876	else:
877	refnode += nodes.Text(label)
878	if refname:
879	refnode['refname'] = refname
880	self.document.note_footnote_ref(refnode)
881	if utils.get_trim_footnote_ref_space(self.document.settings):
882	before = before.rstrip()
883	return (before, [refnode], remaining, [])
884
885	def reference(self, match, lineno, anonymous=None):
886	referencename = match.group('refname')
887	refname = normalize_name(referencename)
888	referencenode = nodes.reference(
889	referencename + match.group('refend'), referencename,
890	name=whitespace_normalize_name(referencename))
891	if anonymous:
892	referencenode['anonymous'] = 1
893	else:
894	referencenode['refname'] = refname
895	self.document.note_refname(referencenode)
896	string = match.string
897	matchstart = match.start('whole')
898	matchend = match.end('whole')
899	return (string[:matchstart], [referencenode], string[matchend:], [])
900
901	def anonymous_reference(self, match, lineno):
902	return self.reference(match, lineno, anonymous=1)
903
904	def standalone_uri(self, match, lineno):
905	if not match.group('scheme') or urischemes.schemes.has_key(
906	match.group('scheme').lower()):
907	if match.group('email'):
908	addscheme = 'mailto:'
909	else:
910	addscheme = ''
911	text = match.group('whole')
912	unescaped = unescape(text, 0)
913	return [nodes.reference(unescape(text, 1), unescaped,
914	refuri=addscheme + unescaped)]
915	else: # not a valid scheme
916	raise MarkupMismatch
917
918	pep_url = 'pep-%04d.html'
919
920	def pep_reference(self, match, lineno):
921	text = match.group(0)
922	if text.startswith('pep-'):
923	pepnum = int(match.group('pepnum1'))
924	elif text.startswith('PEP'):
925	pepnum = int(match.group('pepnum2'))
926	else:
927	raise MarkupMismatch
928	ref = self.document.settings.pep_base_url + self.pep_url % pepnum
929	unescaped = unescape(text, 0)
930	return [nodes.reference(unescape(text, 1), unescaped, refuri=ref)]
931
932	rfc_url = 'rfc%d.html'
933
934	def rfc_reference(self, match, lineno):
935	text = match.group(0)
936	if text.startswith('RFC'):
937	rfcnum = int(match.group('rfcnum'))
938	ref = self.document.settings.rfc_base_url + self.rfc_url % rfcnum
939	else:
940	raise MarkupMismatch
941	unescaped = unescape(text, 0)
942	return [nodes.reference(unescape(text, 1), unescaped, refuri=ref)]
943
944	def implicit_inline(self, text, lineno):
945	"""
946	Check each of the patterns in `self.implicit_dispatch` for a match,
947	and dispatch to the stored method for the pattern. Recursively check
948	the text before and after the match. Return a list of `nodes.Text`
949	and inline element nodes.
950	"""
951	if not text:
952	return []
953	for pattern, method in self.implicit_dispatch:
954	match = pattern.search(text)
955	if match:
956	try:
957	# Must recurse on strings before and after the match;
958	# there may be multiple patterns.
959	return (self.implicit_inline(text[:match.start()], lineno)
960	+ method(match, lineno) +
961	self.implicit_inline(text[match.end():], lineno))
962	except MarkupMismatch:
963	pass
964	return [nodes.Text(unescape(text), rawsource=unescape(text, 1))]
965
966	dispatch = {'*': emphasis,
967	'**': strong,
968	'`': interpreted_or_phrase_ref,
969	'``': literal,
970	'_`': inline_internal_target,
971	']_': footnote_reference,
972	'\|': substitution_reference,
973	'_': reference,
974	'__': anonymous_reference}
975
976
977	def _loweralpha_to_int(s, _zero=(ord('a')-1)):
978	return ord(s) - _zero
979
980	def _upperalpha_to_int(s, _zero=(ord('A')-1)):
981	return ord(s) - _zero
982
983	def _lowerroman_to_int(s):
984	return roman.fromRoman(s.upper())
985
986
987	class Body(RSTState):
988
989	"""
990	Generic classifier of the first line of a block.
991	"""
992
993	double_width_pad_char = tableparser.TableParser.double_width_pad_char
994	"""Padding character for East Asian double-width text."""
995
996	enum = Struct()
997	"""Enumerated list parsing information."""
998
999	enum.formatinfo = {
1000	'parens': Struct(prefix='(', suffix=')', start=1, end=-1),
1001	'rparen': Struct(prefix='', suffix=')', start=0, end=-1),
1002	'period': Struct(prefix='', suffix='.', start=0, end=-1)}
1003	enum.formats = enum.formatinfo.keys()
1004	enum.sequences = ['arabic', 'loweralpha', 'upperalpha',
1005	'lowerroman', 'upperroman'] # ORDERED!
1006	enum.sequencepats = {'arabic': '[0-9]+',
1007	'loweralpha': '[a-z]',
1008	'upperalpha': '[A-Z]',
1009	'lowerroman': '[ivxlcdm]+',
1010	'upperroman': '[IVXLCDM]+',}
1011	enum.converters = {'arabic': int,
1012	'loweralpha': _loweralpha_to_int,
1013	'upperalpha': _upperalpha_to_int,
1014	'lowerroman': _lowerroman_to_int,
1015	'upperroman': roman.fromRoman}
1016
1017	enum.sequenceregexps = {}
1018	for sequence in enum.sequences:
1019	enum.sequenceregexps[sequence] = re.compile(
1020	enum.sequencepats[sequence] + '$')
1021
1022	grid_table_top_pat = re.compile(r'\+-[-+]+-\+ *$')
1023	"""Matches the top (& bottom) of a full table)."""
1024
1025	simple_table_top_pat = re.compile('=+( +=+)+ *$')
1026	"""Matches the top of a simple table."""
1027
1028	simple_table_border_pat = re.compile('=+[ =]*$')
1029	"""Matches the bottom & header bottom of a simple table."""
1030
1031	pats = {}
1032	"""Fragments of patterns used by transitions."""
1033
1034	pats['nonalphanum7bit'] = '[!-/:-@[-`{-~]'
1035	pats['alpha'] = '[a-zA-Z]'
1036	pats['alphanum'] = '[a-zA-Z0-9]'
1037	pats['alphanumplus'] = '[a-zA-Z0-9_-]'
1038	pats['enum'] = ('(%(arabic)s\|%(loweralpha)s\|%(upperalpha)s\|%(lowerroman)s'
1039	'\|%(upperroman)s\|#)' % enum.sequencepats)
1040	pats['optname'] = '%(alphanum)s%(alphanumplus)s*' % pats
1041	# @@@ Loosen up the pattern? Allow Unicode?
1042	pats['optarg'] = '(%(alpha)s%(alphanumplus)s*\|<[^<>]+>)' % pats
1043	pats['shortopt'] = r'(-\|\+)%(alphanum)s( ?%(optarg)s)?' % pats
1044	pats['longopt'] = r'(--\|/)%(optname)s([ =]%(optarg)s)?' % pats
1045	pats['option'] = r'(%(shortopt)s\|%(longopt)s)' % pats
1046
1047	for format in enum.formats:
1048	pats[format] = '(?P<%s>%s%s%s)' % (
1049	format, re.escape(enum.formatinfo[format].prefix),
1050	pats['enum'], re.escape(enum.formatinfo[format].suffix))
1051
1052	patterns = {
1053	'bullet': r'[-+*]( +\|$)',
1054	'enumerator': r'(%(parens)s\|%(rparen)s\|%(period)s)( +\|$)' % pats,
1055	'field_marker': r':(?![: ])([^:\\]\|\\.)*(?<! ):( +\|$)',
1056	'option_marker': r'%(option)s(, %(option)s)*( +\| ?$)' % pats,
1057	'doctest': r'>>>( +\|$)',
1058	'line_block': r'\\|( +\|$)',
1059	'grid_table_top': grid_table_top_pat,
1060	'simple_table_top': simple_table_top_pat,
1061	'explicit_markup': r'\.\.( +\|$)',
1062	'anonymous': r'__( +\|$)',
1063	'line': r'(%(nonalphanum7bit)s)\1* *$' % pats,
1064	'text': r''}
1065	initial_transitions = (
1066	'bullet',
1067	'enumerator',
1068	'field_marker',
1069	'option_marker',
1070	'doctest',
1071	'line_block',
1072	'grid_table_top',
1073	'simple_table_top',
1074	'explicit_markup',
1075	'anonymous',
1076	'line',
1077	'text')
1078
1079	def indent(self, match, context, next_state):
1080	"""Block quote."""
1081	indented, indent, line_offset, blank_finish = \
1082	self.state_machine.get_indented()
1083	blockquote, messages = self.block_quote(indented, line_offset)
1084	self.parent += blockquote
1085	self.parent += messages
1086	if not blank_finish:
1087	self.parent += self.unindent_warning('Block quote')
1088	return context, next_state, []
1089
1090	def block_quote(self, indented, line_offset):
1091	blockquote_lines, attribution_lines, attribution_offset = \
1092	self.check_attribution(indented, line_offset)
1093	blockquote = nodes.block_quote()
1094	self.nested_parse(blockquote_lines, line_offset, blockquote)
1095	messages = []
1096	if attribution_lines:
1097	attribution, messages = self.parse_attribution(attribution_lines,
1098	attribution_offset)
1099	blockquote += attribution
1100	return blockquote, messages
1101
1102	# u'\u2014' is an em-dash:
1103	attribution_pattern = re.compile(ur'(---?(?!-)\|\u2014) *(?=[^ \n])')
1104
1105	def check_attribution(self, indented, line_offset):
1106	"""
1107	Check for an attribution in the last contiguous block of `indented`.
1108
1109	* First line after last blank line must begin with "--" (etc.).
1110	* Every line after that must have consistent indentation.
1111
1112	Return a 3-tuple: (block quote lines, attribution lines,
1113	attribution offset).
1114	"""
1115	#import pdb ; pdb.set_trace()
1116	blank = None
1117	nonblank_seen = None
1118	indent = 0
1119	for i in range(len(indented) - 1, 0, -1): # don't check first line
1120	this_line_blank = not indented[i].strip()
1121	if nonblank_seen and this_line_blank:
1122	match = self.attribution_pattern.match(indented[i + 1])
1123	if match:
1124	blank = i
1125	break
1126	elif not this_line_blank:
1127	nonblank_seen = 1
1128	if blank and len(indented) - blank > 2: # multi-line attribution
1129	indent = (len(indented[blank + 2])
1130	- len(indented[blank + 2].lstrip()))
1131	for j in range(blank + 3, len(indented)):
1132	if ( indented[j] # may be blank last line
1133	and indent != (len(indented[j])
1134	- len(indented[j].lstrip()))):
1135	# bad shape
1136	blank = None
1137	break
1138	if blank:
1139	a_lines = indented[blank + 1:]
1140	a_lines.trim_left(match.end(), end=1)
1141	a_lines.trim_left(indent, start=1)
1142	return (indented[:blank], a_lines, line_offset + blank + 1)
1143	else:
1144	return (indented, None, None)
1145
1146	def parse_attribution(self, indented, line_offset):
1147	text = '\n'.join(indented).rstrip()
1148	lineno = self.state_machine.abs_line_number() + line_offset
1149	textnodes, messages = self.inline_text(text, lineno)
1150	node = nodes.attribution(text, '', *textnodes)
1151	node.line = lineno
1152	return node, messages
1153
1154	def bullet(self, match, context, next_state):
1155	"""Bullet list item."""
1156	bulletlist = nodes.bullet_list()
1157	self.parent += bulletlist
1158	bulletlist['bullet'] = match.string[0]
1159	i, blank_finish = self.list_item(match.end())
1160	bulletlist += i
1161	offset = self.state_machine.line_offset + 1 # next line
1162	new_line_offset, blank_finish = self.nested_list_parse(
1163	self.state_machine.input_lines[offset:],
1164	input_offset=self.state_machine.abs_line_offset() + 1,
1165	node=bulletlist, initial_state='BulletList',
1166	blank_finish=blank_finish)
1167	self.goto_line(new_line_offset)
1168	if not blank_finish:
1169	self.parent += self.unindent_warning('Bullet list')
1170	return [], next_state, []
1171
1172	def list_item(self, indent):
1173	if self.state_machine.line[indent:]:
1174	indented, line_offset, blank_finish = (
1175	self.state_machine.get_known_indented(indent))
1176	else:
1177	indented, indent, line_offset, blank_finish = (
1178	self.state_machine.get_first_known_indented(indent))
1179	listitem = nodes.list_item('\n'.join(indented))
1180	if indented:
1181	self.nested_parse(indented, input_offset=line_offset,
1182	node=listitem)
1183	return listitem, blank_finish
1184
1185	def enumerator(self, match, context, next_state):
1186	"""Enumerated List Item"""
1187	format, sequence, text, ordinal = self.parse_enumerator(match)
1188	if not self.is_enumerated_list_item(ordinal, sequence, format):
1189	raise statemachine.TransitionCorrection('text')
1190	enumlist = nodes.enumerated_list()
1191	self.parent += enumlist
1192	if sequence == '#':
1193	enumlist['enumtype'] = 'arabic'
1194	else:
1195	enumlist['enumtype'] = sequence
1196	enumlist['prefix'] = self.enum.formatinfo[format].prefix
1197	enumlist['suffix'] = self.enum.formatinfo[format].suffix
1198	if ordinal != 1:
1199	enumlist['start'] = ordinal
1200	msg = self.reporter.info(
1201	'Enumerated list start value not ordinal-1: "%s" (ordinal %s)'
1202	% (text, ordinal), line=self.state_machine.abs_line_number())
1203	self.parent += msg
1204	listitem, blank_finish = self.list_item(match.end())
1205	enumlist += listitem
1206	offset = self.state_machine.line_offset + 1 # next line
1207	newline_offset, blank_finish = self.nested_list_parse(
1208	self.state_machine.input_lines[offset:],
1209	input_offset=self.state_machine.abs_line_offset() + 1,
1210	node=enumlist, initial_state='EnumeratedList',
1211	blank_finish=blank_finish,
1212	extra_settings={'lastordinal': ordinal,
1213	'format': format,
1214	'auto': sequence == '#'})
1215	self.goto_line(newline_offset)
1216	if not blank_finish:
1217	self.parent += self.unindent_warning('Enumerated list')
1218	return [], next_state, []
1219
1220	def parse_enumerator(self, match, expected_sequence=None):
1221	"""
1222	Analyze an enumerator and return the results.
1223
1224	:Return:
1225	- the enumerator format ('period', 'parens', or 'rparen'),
1226	- the sequence used ('arabic', 'loweralpha', 'upperroman', etc.),
1227	- the text of the enumerator, stripped of formatting, and
1228	- the ordinal value of the enumerator ('a' -> 1, 'ii' -> 2, etc.;
1229	``None`` is returned for invalid enumerator text).
1230
1231	The enumerator format has already been determined by the regular
1232	expression match. If `expected_sequence` is given, that sequence is
1233	tried first. If not, we check for Roman numeral 1. This way,
1234	single-character Roman numerals (which are also alphabetical) can be
1235	matched. If no sequence has been matched, all sequences are checked in
1236	order.
1237	"""
1238	groupdict = match.groupdict()
1239	sequence = ''
1240	for format in self.enum.formats:
1241	if groupdict[format]: # was this the format matched?
1242	break # yes; keep `format`
1243	else: # shouldn't happen
1244	raise ParserError('enumerator format not matched')
1245	text = groupdict[format][self.enum.formatinfo[format].start
1246	:self.enum.formatinfo[format].end]
1247	if text == '#':
1248	sequence = '#'
1249	elif expected_sequence:
1250	try:
1251	if self.enum.sequenceregexps[expected_sequence].match(text):
1252	sequence = expected_sequence
1253	except KeyError: # shouldn't happen
1254	raise ParserError('unknown enumerator sequence: %s'
1255	% sequence)
1256	elif text == 'i':
1257	sequence = 'lowerroman'
1258	elif text == 'I':
1259	sequence = 'upperroman'
1260	if not sequence:
1261	for sequence in self.enum.sequences:
1262	if self.enum.sequenceregexps[sequence].match(text):
1263	break
1264	else: # shouldn't happen
1265	raise ParserError('enumerator sequence not matched')
1266	if sequence == '#':
1267	ordinal = 1
1268	else:
1269	try:
1270	ordinal = self.enum.converters[sequence](text)
1271	except roman.InvalidRomanNumeralError:
1272	ordinal = None
1273	return format, sequence, text, ordinal
1274
1275	def is_enumerated_list_item(self, ordinal, sequence, format):
1276	"""
1277	Check validity based on the ordinal value and the second line.
1278
1279	Return true iff the ordinal is valid and the second line is blank,
1280	indented, or starts with the next enumerator or an auto-enumerator.
1281	"""
1282	if ordinal is None:
1283	return None
1284	try:
1285	next_line = self.state_machine.next_line()
1286	except EOFError: # end of input lines
1287	self.state_machine.previous_line()
1288	return 1
1289	else:
1290	self.state_machine.previous_line()
1291	if not next_line[:1].strip(): # blank or indented
1292	return 1
1293	result = self.make_enumerator(ordinal + 1, sequence, format)
1294	if result:
1295	next_enumerator, auto_enumerator = result
1296	try:
1297	if ( next_line.startswith(next_enumerator) or
1298	next_line.startswith(auto_enumerator) ):
1299	return 1
1300	except TypeError:
1301	pass
1302	return None
1303
1304	def make_enumerator(self, ordinal, sequence, format):
1305	"""
1306	Construct and return the next enumerated list item marker, and an
1307	auto-enumerator ("#" instead of the regular enumerator).
1308
1309	Return ``None`` for invalid (out of range) ordinals.
1310	""" #"
1311	if sequence == '#':
1312	enumerator = '#'
1313	elif sequence == 'arabic':
1314	enumerator = str(ordinal)
1315	else:
1316	if sequence.endswith('alpha'):
1317	if ordinal > 26:
1318	return None
1319	enumerator = chr(ordinal + ord('a') - 1)
1320	elif sequence.endswith('roman'):
1321	try:
1322	enumerator = roman.toRoman(ordinal)
1323	except roman.RomanError:
1324	return None
1325	else: # shouldn't happen
1326	raise ParserError('unknown enumerator sequence: "%s"'
1327	% sequence)
1328	if sequence.startswith('lower'):
1329	enumerator = enumerator.lower()
1330	elif sequence.startswith('upper'):
1331	enumerator = enumerator.upper()
1332	else: # shouldn't happen
1333	raise ParserError('unknown enumerator sequence: "%s"'
1334	% sequence)
1335	formatinfo = self.enum.formatinfo[format]
1336	next_enumerator = (formatinfo.prefix + enumerator + formatinfo.suffix
1337	+ ' ')
1338	auto_enumerator = formatinfo.prefix + '#' + formatinfo.suffix + ' '
1339	return next_enumerator, auto_enumerator
1340
1341	def field_marker(self, match, context, next_state):
1342	"""Field list item."""
1343	field_list = nodes.field_list()
1344	self.parent += field_list
1345	field, blank_finish = self.field(match)
1346	field_list += field
1347	offset = self.state_machine.line_offset + 1 # next line
1348	newline_offset, blank_finish = self.nested_list_parse(
1349	self.state_machine.input_lines[offset:],
1350	input_offset=self.state_machine.abs_line_offset() + 1,
1351	node=field_list, initial_state='FieldList',
1352	blank_finish=blank_finish)
1353	self.goto_line(newline_offset)
1354	if not blank_finish:
1355	self.parent += self.unindent_warning('Field list')
1356	return [], next_state, []
1357
1358	def field(self, match):
1359	name = self.parse_field_marker(match)
1360	lineno = self.state_machine.abs_line_number()
1361	indented, indent, line_offset, blank_finish = \
1362	self.state_machine.get_first_known_indented(match.end())
1363	field_node = nodes.field()
1364	field_node.line = lineno
1365	name_nodes, name_messages = self.inline_text(name, lineno)
1366	field_node += nodes.field_name(name, '', *name_nodes)
1367	field_body = nodes.field_body('\n'.join(indented), *name_messages)
1368	field_node += field_body
1369	if indented:
1370	self.parse_field_body(indented, line_offset, field_body)
1371	return field_node, blank_finish
1372
1373	def parse_field_marker(self, match):
1374	"""Extract & return field name from a field marker match."""
1375	field = match.group()[1:] # strip off leading ':'
1376	field = field[:field.rfind(':')] # strip off trailing ':' etc.
1377	return field
1378
1379	def parse_field_body(self, indented, offset, node):
1380	self.nested_parse(indented, input_offset=offset, node=node)
1381
1382	def option_marker(self, match, context, next_state):
1383	"""Option list item."""
1384	optionlist = nodes.option_list()
1385	try:
1386	listitem, blank_finish = self.option_list_item(match)
1387	except MarkupError, (message, lineno):
1388	# This shouldn't happen; pattern won't match.
1389	msg = self.reporter.error(
1390	'Invalid option list marker: %s' % message, line=lineno)
1391	self.parent += msg
1392	indented, indent, line_offset, blank_finish = \
1393	self.state_machine.get_first_known_indented(match.end())
1394	blockquote, messages = self.block_quote(indented, line_offset)
1395	self.parent += blockquote
1396	self.parent += messages
1397	if not blank_finish:
1398	self.parent += self.unindent_warning('Option list')
1399	return [], next_state, []
1400	self.parent += optionlist
1401	optionlist += listitem
1402	offset = self.state_machine.line_offset + 1 # next line
1403	newline_offset, blank_finish = self.nested_list_parse(
1404	self.state_machine.input_lines[offset:],
1405	input_offset=self.state_machine.abs_line_offset() + 1,
1406	node=optionlist, initial_state='OptionList',
1407	blank_finish=blank_finish)
1408	self.goto_line(newline_offset)
1409	if not blank_finish:
1410	self.parent += self.unindent_warning('Option list')
1411	return [], next_state, []
1412
1413	def option_list_item(self, match):
1414	offset = self.state_machine.abs_line_offset()
1415	options = self.parse_option_marker(match)
1416	indented, indent, line_offset, blank_finish = \
1417	self.state_machine.get_first_known_indented(match.end())
1418	if not indented: # not an option list item
1419	self.goto_line(offset)
1420	raise statemachine.TransitionCorrection('text')
1421	option_group = nodes.option_group('', *options)
1422	description = nodes.description('\n'.join(indented))
1423	option_list_item = nodes.option_list_item('', option_group,
1424	description)
1425	if indented:
1426	self.nested_parse(indented, input_offset=line_offset,
1427	node=description)
1428	return option_list_item, blank_finish
1429
1430	def parse_option_marker(self, match):
1431	"""
1432	Return a list of `node.option` and `node.option_argument` objects,
1433	parsed from an option marker match.
1434
1435	:Exception: `MarkupError` for invalid option markers.
1436	"""
1437	optlist = []
1438	optionstrings = match.group().rstrip().split(', ')
1439	for optionstring in optionstrings:
1440	tokens = optionstring.split()
1441	delimiter = ' '
1442	firstopt = tokens[0].split('=')
1443	if len(firstopt) > 1:
1444	# "--opt=value" form
1445	tokens[:1] = firstopt
1446	delimiter = '='
1447	elif (len(tokens[0]) > 2
1448	and ((tokens[0].startswith('-')
1449	and not tokens[0].startswith('--'))
1450	or tokens[0].startswith('+'))):
1451	# "-ovalue" form
1452	tokens[:1] = [tokens[0][:2], tokens[0][2:]]
1453	delimiter = ''
1454	if len(tokens) > 1 and (tokens[1].startswith('<')
1455	and tokens[-1].endswith('>')):
1456	# "-o <value1 value2>" form; join all values into one token
1457	tokens[1:] = [' '.join(tokens[1:])]
1458	if 0 < len(tokens) <= 2:
1459	option = nodes.option(optionstring)
1460	option += nodes.option_string(tokens[0], tokens[0])
1461	if len(tokens) > 1:
1462	option += nodes.option_argument(tokens[1], tokens[1],
1463	delimiter=delimiter)
1464	optlist.append(option)
1465	else:
1466	raise MarkupError(
1467	'wrong number of option tokens (=%s), should be 1 or 2: '
1468	'"%s"' % (len(tokens), optionstring),
1469	self.state_machine.abs_line_number() + 1)
1470	return optlist
1471
1472	def doctest(self, match, context, next_state):
1473	data = '\n'.join(self.state_machine.get_text_block())
1474	self.parent += nodes.doctest_block(data, data)
1475	return [], next_state, []
1476
1477	def line_block(self, match, context, next_state):
1478	"""First line of a line block."""
1479	block = nodes.line_block()
1480	self.parent += block
1481	lineno = self.state_machine.abs_line_number()
1482	line, messages, blank_finish = self.line_block_line(match, lineno)
1483	block += line
1484	self.parent += messages
1485	if not blank_finish:
1486	offset = self.state_machine.line_offset + 1 # next line
1487	new_line_offset, blank_finish = self.nested_list_parse(
1488	self.state_machine.input_lines[offset:],
1489	input_offset=self.state_machine.abs_line_offset() + 1,
1490	node=block, initial_state='LineBlock',
1491	blank_finish=0)
1492	self.goto_line(new_line_offset)
1493	if not blank_finish:
1494	self.parent += self.reporter.warning(
1495	'Line block ends without a blank line.',
1496	line=(self.state_machine.abs_line_number() + 1))
1497	if len(block):
1498	if block[0].indent is None:
1499	block[0].indent = 0
1500	self.nest_line_block_lines(block)
1501	return [], next_state, []
1502
1503	def line_block_line(self, match, lineno):
1504	"""Return one line element of a line_block."""
1505	indented, indent, line_offset, blank_finish = \
1506	self.state_machine.get_first_known_indented(match.end(),
1507	until_blank=1)
1508	text = u'\n'.join(indented)
1509	text_nodes, messages = self.inline_text(text, lineno)
1510	line = nodes.line(text, '', *text_nodes)
1511	if match.string.rstrip() != '\|': # not empty
1512	line.indent = len(match.group(1)) - 1
1513	return line, messages, blank_finish
1514
1515	def nest_line_block_lines(self, block):
1516	for index in range(1, len(block)):
1517	if block[index].indent is None:
1518	block[index].indent = block[index - 1].indent
1519	self.nest_line_block_segment(block)
1520
1521	def nest_line_block_segment(self, block):
1522	indents = [item.indent for item in block]
1523	least = min(indents)
1524	new_items = []
1525	new_block = nodes.line_block()
1526	for item in block:
1527	if item.indent > least:
1528	new_block.append(item)
1529	else:
1530	if len(new_block):
1531	self.nest_line_block_segment(new_block)
1532	new_items.append(new_block)
1533	new_block = nodes.line_block()
1534	new_items.append(item)
1535	if len(new_block):
1536	self.nest_line_block_segment(new_block)
1537	new_items.append(new_block)
1538	block[:] = new_items
1539
1540	def grid_table_top(self, match, context, next_state):
1541	"""Top border of a full table."""
1542	return self.table_top(match, context, next_state,
1543	self.isolate_grid_table,
1544	tableparser.GridTableParser)
1545
1546	def simple_table_top(self, match, context, next_state):
1547	"""Top border of a simple table."""
1548	return self.table_top(match, context, next_state,
1549	self.isolate_simple_table,
1550	tableparser.SimpleTableParser)
1551
1552	def table_top(self, match, context, next_state,
1553	isolate_function, parser_class):
1554	"""Top border of a generic table."""
1555	nodelist, blank_finish = self.table(isolate_function, parser_class)
1556	self.parent += nodelist
1557	if not blank_finish:
1558	msg = self.reporter.warning(
1559	'Blank line required after table.',
1560	line=self.state_machine.abs_line_number() + 1)
1561	self.parent += msg
1562	return [], next_state, []
1563
1564	def table(self, isolate_function, parser_class):
1565	"""Parse a table."""
1566	block, messages, blank_finish = isolate_function()
1567	if block:
1568	try:
1569	parser = parser_class()
1570	tabledata = parser.parse(block)
1571	tableline = (self.state_machine.abs_line_number() - len(block)
1572	+ 1)
1573	table = self.build_table(tabledata, tableline)
1574	nodelist = [table] + messages
1575	except tableparser.TableMarkupError, detail:
1576	nodelist = self.malformed_table(
1577	block, ' '.join(detail.args)) + messages
1578	else:
1579	nodelist = messages
1580	return nodelist, blank_finish
1581
1582	def isolate_grid_table(self):
1583	messages = []
1584	blank_finish = 1
1585	try:
1586	block = self.state_machine.get_text_block(flush_left=1)
1587	except statemachine.UnexpectedIndentationError, instance:
1588	block, source, lineno = instance.args
1589	messages.append(self.reporter.error('Unexpected indentation.',
1590	source=source, line=lineno))
1591	blank_finish = 0
1592	block.disconnect()
1593	# for East Asian chars:
1594	block.pad_double_width(self.double_width_pad_char)
1595	width = len(block[0].strip())
1596	for i in range(len(block)):
1597	block[i] = block[i].strip()
1598	if block[i][0] not in '+\|': # check left edge
1599	blank_finish = 0
1600	self.state_machine.previous_line(len(block) - i)
1601	del block[i:]
1602	break
1603	if not self.grid_table_top_pat.match(block[-1]): # find bottom
1604	blank_finish = 0
1605	# from second-last to third line of table:
1606	for i in range(len(block) - 2, 1, -1):
1607	if self.grid_table_top_pat.match(block[i]):
1608	self.state_machine.previous_line(len(block) - i + 1)
1609	del block[i+1:]
1610	break
1611	else:
1612	messages.extend(self.malformed_table(block))
1613	return [], messages, blank_finish
1614	for i in range(len(block)): # check right edge
1615	if len(block[i]) != width or block[i][-1] not in '+\|':
1616	messages.extend(self.malformed_table(block))
1617	return [], messages, blank_finish
1618	return block, messages, blank_finish
1619
1620	def isolate_simple_table(self):
1621	start = self.state_machine.line_offset
1622	lines = self.state_machine.input_lines
1623	limit = len(lines) - 1
1624	toplen = len(lines[start].strip())
1625	pattern_match = self.simple_table_border_pat.match
1626	found = 0
1627	found_at = None
1628	i = start + 1
1629	while i <= limit:
1630	line = lines[i]
1631	match = pattern_match(line)
1632	if match:
1633	if len(line.strip()) != toplen:
1634	self.state_machine.next_line(i - start)
1635	messages = self.malformed_table(
1636	lines[start:i+1], 'Bottom/header table border does '
1637	'not match top border.')
1638	return [], messages, i == limit or not lines[i+1].strip()
1639	found += 1
1640	found_at = i
1641	if found == 2 or i == limit or not lines[i+1].strip():
1642	end = i
1643	break
1644	i += 1
1645	else: # reached end of input_lines
1646	if found:
1647	extra = ' or no blank line after table bottom'
1648	self.state_machine.next_line(found_at - start)
1649	block = lines[start:found_at+1]
1650	else:
1651	extra = ''
1652	self.state_machine.next_line(i - start - 1)
1653	block = lines[start:]
1654	messages = self.malformed_table(
1655	block, 'No bottom table border found%s.' % extra)
1656	return [], messages, not extra
1657	self.state_machine.next_line(end - start)
1658	block = lines[start:end+1]
1659	# for East Asian chars:
1660	block.pad_double_width(self.double_width_pad_char)
1661	return block, [], end == limit or not lines[end+1].strip()
1662
1663	def malformed_table(self, block, detail=''):
1664	block.replace(self.double_width_pad_char, '')
1665	data = '\n'.join(block)
1666	message = 'Malformed table.'
1667	lineno = self.state_machine.abs_line_number() - len(block) + 1
1668	if detail:
1669	message += '\n' + detail
1670	error = self.reporter.error(message, nodes.literal_block(data, data),
1671	line=lineno)
1672	return [error]
1673
1674	def build_table(self, tabledata, tableline, stub_columns=0):
1675	colwidths, headrows, bodyrows = tabledata
1676	table = nodes.table()
1677	tgroup = nodes.tgroup(cols=len(colwidths))
1678	table += tgroup
1679	for colwidth in colwidths:
1680	colspec = nodes.colspec(colwidth=colwidth)
1681	if stub_columns:
1682	colspec.attributes['stub'] = 1
1683	stub_columns -= 1
1684	tgroup += colspec
1685	if headrows:
1686	thead = nodes.thead()
1687	tgroup += thead
1688	for row in headrows:
1689	thead += self.build_table_row(row, tableline)
1690	tbody = nodes.tbody()
1691	tgroup += tbody
1692	for row in bodyrows:
1693	tbody += self.build_table_row(row, tableline)
1694	return table
1695
1696	def build_table_row(self, rowdata, tableline):
1697	row = nodes.row()
1698	for cell in rowdata:
1699	if cell is None:
1700	continue
1701	morerows, morecols, offset, cellblock = cell
1702	attributes = {}
1703	if morerows:
1704	attributes['morerows'] = morerows
1705	if morecols:
1706	attributes['morecols'] = morecols
1707	entry = nodes.entry(**attributes)
1708	row += entry
1709	if ''.join(cellblock):
1710	self.nested_parse(cellblock, input_offset=tableline+offset,
1711	node=entry)
1712	return row
1713
1714
1715	explicit = Struct()
1716	"""Patterns and constants used for explicit markup recognition."""
1717
1718	explicit.patterns = Struct(
1719	target=re.compile(r"""
1720	(
1721	_ # anonymous target
1722	\| # OR
1723	(?P<quote>`?) # optional open quote
1724	(?![ `]) # first char. not space or
1725	# backquote
1726	(?P<name> # reference name
1727	.+?
1728	)
1729	%(non_whitespace_escape_before)s
1730	(?P=quote) # close quote if open quote used
1731	)
1732	(?<!(?<!\x00):) # no unescaped colon at end
1733	%(non_whitespace_escape_before)s
1734	[ ]? # optional space
1735	: # end of reference name
1736	([ ]+\|$) # followed by whitespace
1737	""" % vars(Inliner), re.VERBOSE),
1738	reference=re.compile(r"""
1739	(
1740	(?P<simple>%(simplename)s)_
1741	\| # OR
1742	` # open backquote
1743	(?![ ]) # not space
1744	(?P<phrase>.+?) # hyperlink phrase
1745	%(non_whitespace_escape_before)s
1746	`_ # close backquote,
1747	# reference mark
1748	)
1749	$ # end of string
1750	""" % vars(Inliner), re.VERBOSE \| re.UNICODE),
1751	substitution=re.compile(r"""
1752	(
1753	(?![ ]) # first char. not space
1754	(?P<name>.+?) # substitution text
1755	%(non_whitespace_escape_before)s
1756	\\| # close delimiter
1757	)
1758	([ ]+\|$) # followed by whitespace
1759	""" % vars(Inliner), re.VERBOSE),)
1760
1761	def footnote(self, match):
1762	lineno = self.state_machine.abs_line_number()
1763	indented, indent, offset, blank_finish = \
1764	self.state_machine.get_first_known_indented(match.end())
1765	label = match.group(1)
1766	name = normalize_name(label)
1767	footnote = nodes.footnote('\n'.join(indented))
1768	footnote.line = lineno
1769	if name[0] == '#': # auto-numbered
1770	name = name[1:] # autonumber label
1771	footnote['auto'] = 1
1772	if name:
1773	footnote['names'].append(name)
1774	self.document.note_autofootnote(footnote)
1775	elif name == '*': # auto-symbol
1776	name = ''
1777	footnote['auto'] = '*'
1778	self.document.note_symbol_footnote(footnote)
1779	else: # manually numbered
1780	footnote += nodes.label('', label)
1781	footnote['names'].append(name)
1782	self.document.note_footnote(footnote)
1783	if name:
1784	self.document.note_explicit_target(footnote, footnote)
1785	else:
1786	self.document.set_id(footnote, footnote)
1787	if indented:
1788	self.nested_parse(indented, input_offset=offset, node=footnote)
1789	return [footnote], blank_finish
1790
1791	def citation(self, match):
1792	lineno = self.state_machine.abs_line_number()
1793	indented, indent, offset, blank_finish = \
1794	self.state_machine.get_first_known_indented(match.end())
1795	label = match.group(1)
1796	name = normalize_name(label)
1797	citation = nodes.citation('\n'.join(indented))
1798	citation.line = lineno
1799	citation += nodes.label('', label)
1800	citation['names'].append(name)
1801	self.document.note_citation(citation)
1802	self.document.note_explicit_target(citation, citation)
1803	if indented:
1804	self.nested_parse(indented, input_offset=offset, node=citation)
1805	return [citation], blank_finish
1806
1807	def hyperlink_target(self, match):
1808	pattern = self.explicit.patterns.target
1809	lineno = self.state_machine.abs_line_number()
1810	block, indent, offset, blank_finish = \
1811	self.state_machine.get_first_known_indented(
1812	match.end(), until_blank=1, strip_indent=0)
1813	blocktext = match.string[:match.end()] + '\n'.join(block)
1814	block = [escape2null(line) for line in block]
1815	escaped = block[0]
1816	blockindex = 0
1817	while 1:
1818	targetmatch = pattern.match(escaped)
1819	if targetmatch:
1820	break
1821	blockindex += 1
1822	try:
1823	escaped += block[blockindex]
1824	except IndexError:
1825	raise MarkupError('malformed hyperlink target.', lineno)
1826	del block[:blockindex]
1827	block[0] = (block[0] + ' ')[targetmatch.end()-len(escaped)-1:].strip()
1828	target = self.make_target(block, blocktext, lineno,
1829	targetmatch.group('name'))
1830	return [target], blank_finish
1831
1832	def make_target(self, block, block_text, lineno, target_name):
1833	target_type, data = self.parse_target(block, block_text, lineno)
1834	if target_type == 'refname':
1835	target = nodes.target(block_text, '', refname=normalize_name(data))
1836	target.indirect_reference_name = data
1837	self.add_target(target_name, '', target, lineno)
1838	self.document.note_indirect_target(target)
1839	return target
1840	elif target_type == 'refuri':
1841	target = nodes.target(block_text, '')
1842	self.add_target(target_name, data, target, lineno)
1843	return target
1844	else:
1845	return data
1846
1847	def parse_target(self, block, block_text, lineno):
1848	"""
1849	Determine the type of reference of a target.
1850
1851	:Return: A 2-tuple, one of:
1852
1853	- 'refname' and the indirect reference name
1854	- 'refuri' and the URI
1855	- 'malformed' and a system_message node
1856	"""
1857	if block and block[-1].strip()[-1:] == '_': # possible indirect target
1858	reference = ' '.join([line.strip() for line in block])
1859	refname = self.is_reference(reference)
1860	if refname:
1861	return 'refname', refname
1862	reference = ''.join([''.join(line.split()) for line in block])
1863	return 'refuri', unescape(reference)
1864
1865	def is_reference(self, reference):
1866	match = self.explicit.patterns.reference.match(
1867	whitespace_normalize_name(reference))
1868	if not match:
1869	return None
1870	return unescape(match.group('simple') or match.group('phrase'))
1871
1872	def add_target(self, targetname, refuri, target, lineno):
1873	target.line = lineno
1874	if targetname:
1875	name = normalize_name(unescape(targetname))
1876	target['names'].append(name)
1877	if refuri:
1878	uri = self.inliner.adjust_uri(refuri)
1879	if uri:
1880	target['refuri'] = uri
1881	else:
1882	raise ApplicationError('problem with URI: %r' % refuri)
1883	self.document.note_explicit_target(target, self.parent)
1884	else: # anonymous target
1885	if refuri:
1886	target['refuri'] = refuri
1887	target['anonymous'] = 1
1888	self.document.note_anonymous_target(target)
1889
1890	def substitution_def(self, match):
1891	pattern = self.explicit.patterns.substitution
1892	lineno = self.state_machine.abs_line_number()
1893	block, indent, offset, blank_finish = \
1894	self.state_machine.get_first_known_indented(match.end(),
1895	strip_indent=0)
1896	blocktext = (match.string[:match.end()] + '\n'.join(block))
1897	block.disconnect()
1898	escaped = escape2null(block[0].rstrip())
1899	blockindex = 0
1900	while 1:
1901	subdefmatch = pattern.match(escaped)
1902	if subdefmatch:
1903	break
1904	blockindex += 1
1905	try:
1906	escaped = escaped + ' ' + escape2null(block[blockindex].strip())
1907	except IndexError:
1908	raise MarkupError('malformed substitution definition.',
1909	lineno)
1910	del block[:blockindex] # strip out the substitution marker
1911	block[0] = (block[0].strip() + ' ')[subdefmatch.end()-len(escaped)-1:-1]
1912	if not block[0]:
1913	del block[0]
1914	offset += 1
1915	while block and not block[-1].strip():
1916	block.pop()
1917	subname = subdefmatch.group('name')
1918	substitution_node = nodes.substitution_definition(blocktext)
1919	substitution_node.line = lineno
1920	if not block:
1921	msg = self.reporter.warning(
1922	'Substitution definition "%s" missing contents.' % subname,
1923	nodes.literal_block(blocktext, blocktext), line=lineno)
1924	return [msg], blank_finish
1925	block[0] = block[0].strip()
1926	substitution_node['names'].append(
1927	nodes.whitespace_normalize_name(subname))
1928	new_abs_offset, blank_finish = self.nested_list_parse(
1929	block, input_offset=offset, node=substitution_node,
1930	initial_state='SubstitutionDef', blank_finish=blank_finish)
1931	i = 0
1932	for node in substitution_node[:]:
1933	if not (isinstance(node, nodes.Inline) or
1934	isinstance(node, nodes.Text)):
1935	self.parent += substitution_node[i]
1936	del substitution_node[i]
1937	else:
1938	i += 1
1939	for node in substitution_node.traverse(nodes.Element):
1940	if self.disallowed_inside_substitution_definitions(node):
1941	pformat = nodes.literal_block('', node.pformat().rstrip())
1942	msg = self.reporter.error(
1943	'Substitution definition contains illegal element:',
1944	pformat, nodes.literal_block(blocktext, blocktext),
1945	line=lineno)
1946	return [msg], blank_finish
1947	if len(substitution_node) == 0:
1948	msg = self.reporter.warning(
1949	'Substitution definition "%s" empty or invalid.'
1950	% subname,
1951	nodes.literal_block(blocktext, blocktext), line=lineno)
1952	return [msg], blank_finish
1953	self.document.note_substitution_def(
1954	substitution_node, subname, self.parent)
1955	return [substitution_node], blank_finish
1956
1957	def disallowed_inside_substitution_definitions(self, node):
1958	if (node['ids'] or
1959	isinstance(node, nodes.reference) and node.get('anonymous') or
1960	isinstance(node, nodes.footnote_reference) and node.get('auto')):
1961	return 1
1962	else:
1963	return 0
1964
1965	def directive(self, match, **option_presets):
1966	"""Returns a 2-tuple: list of nodes, and a "blank finish" boolean."""
1967	type_name = match.group(1)
1968	directive_function, messages = directives.directive(
1969	type_name, self.memo.language, self.document)
1970	self.parent += messages
1971	if directive_function:
1972	return self.run_directive(
1973	directive_function, match, type_name, option_presets)
1974	else:
1975	return self.unknown_directive(type_name)
1976
1977	def run_directive(self, directive_fn, match, type_name, option_presets):
1978	"""
1979	Parse a directive then run its directive function.
1980
1981	Parameters:
1982
1983	- `directive_fn`: The function implementing the directive. Uses
1984	function attributes ``arguments``, ``options``, and/or ``content``
1985	if present.
1986
1987	- `match`: A regular expression match object which matched the first
1988	line of the directive.
1989
1990	- `type_name`: The directive name, as used in the source text.
1991
1992	- `option_presets`: A dictionary of preset options, defaults for the
1993	directive options. Currently, only an "alt" option is passed by
1994	substitution definitions (value: the substitution name), which may
1995	be used by an embedded image directive.
1996
1997	Returns a 2-tuple: list of nodes, and a "blank finish" boolean.
1998	"""
1999	lineno = self.state_machine.abs_line_number()
2000	initial_line_offset = self.state_machine.line_offset
2001	indented, indent, line_offset, blank_finish \
2002	= self.state_machine.get_first_known_indented(match.end(),
2003	strip_top=0)
2004	block_text = '\n'.join(self.state_machine.input_lines[
2005	initial_line_offset : self.state_machine.line_offset + 1])
2006	try:
2007	arguments, options, content, content_offset = (
2008	self.parse_directive_block(indented, line_offset,
2009	directive_fn, option_presets))
2010	except MarkupError, detail:
2011	error = self.reporter.error(
2012	'Error in "%s" directive:\n%s.' % (type_name,
2013	' '.join(detail.args)),
2014	nodes.literal_block(block_text, block_text), line=lineno)
2015	return [error], blank_finish
2016	result = directive_fn(type_name, arguments, options, content, lineno,
2017	content_offset, block_text, self,
2018	self.state_machine)
2019	return (result,
2020	blank_finish or self.state_machine.is_next_line_blank())
2021
2022	def parse_directive_block(self, indented, line_offset, directive_fn,
2023	option_presets):
2024	arguments = []
2025	options = {}
2026	argument_spec = getattr(directive_fn, 'arguments', None)
2027	if argument_spec and argument_spec[:2] == (0, 0):
2028	argument_spec = None
2029	option_spec = getattr(directive_fn, 'options', None)
2030	content_spec = getattr(directive_fn, 'content', None)
2031	if indented and not indented[0].strip():
2032	indented.trim_start()
2033	line_offset += 1
2034	while indented and not indented[-1].strip():
2035	indented.trim_end()
2036	if indented and (argument_spec or option_spec):
2037	for i in range(len(indented)):
2038	if not indented[i].strip():
2039	break
2040	else:
2041	i += 1
2042	arg_block = indented[:i]
2043	content = indented[i+1:]
2044	content_offset = line_offset + i + 1
2045	else:
2046	content = indented
2047	content_offset = line_offset
2048	arg_block = []
2049	while content and not content[0].strip():
2050	content.trim_start()
2051	content_offset += 1
2052	if option_spec:
2053	options, arg_block = self.parse_directive_options(
2054	option_presets, option_spec, arg_block)
2055	if arg_block and not argument_spec:
2056	raise MarkupError('no arguments permitted; blank line '
2057	'required before content block')
2058	if argument_spec:
2059	arguments = self.parse_directive_arguments(
2060	argument_spec, arg_block)
2061	if content and not content_spec:
2062	raise MarkupError('no content permitted')
2063	return (arguments, options, content, content_offset)
2064
2065	def parse_directive_options(self, option_presets, option_spec, arg_block):
2066	options = option_presets.copy()
2067	for i in range(len(arg_block)):
2068	if arg_block[i][:1] == ':':
2069	opt_block = arg_block[i:]
2070	arg_block = arg_block[:i]
2071	break
2072	else:
2073	opt_block = []
2074	if opt_block:
2075	success, data = self.parse_extension_options(option_spec,
2076	opt_block)
2077	if success: # data is a dict of options
2078	options.update(data)
2079	else: # data is an error string
2080	raise MarkupError(data)
2081	return options, arg_block
2082
2083	def parse_directive_arguments(self, argument_spec, arg_block):
2084	required, optional, last_whitespace = argument_spec
2085	arg_text = '\n'.join(arg_block)
2086	arguments = arg_text.split()
2087	if len(arguments) < required:
2088	raise MarkupError('%s argument(s) required, %s supplied'
2089	% (required, len(arguments)))
2090	elif len(arguments) > required + optional:
2091	if last_whitespace:
2092	arguments = arg_text.split(None, required + optional - 1)
2093	else:
2094	raise MarkupError(
2095	'maximum %s argument(s) allowed, %s supplied'
2096	% (required + optional, len(arguments)))
2097	return arguments
2098
2099	def parse_extension_options(self, option_spec, datalines):
2100	"""
2101	Parse `datalines` for a field list containing extension options
2102	matching `option_spec`.
2103
2104	:Parameters:
2105	- `option_spec`: a mapping of option name to conversion
2106	function, which should raise an exception on bad input.
2107	- `datalines`: a list of input strings.
2108
2109	:Return:
2110	- Success value, 1 or 0.
2111	- An option dictionary on success, an error string on failure.
2112	"""
2113	node = nodes.field_list()
2114	newline_offset, blank_finish = self.nested_list_parse(
2115	datalines, 0, node, initial_state='ExtensionOptions',
2116	blank_finish=1)
2117	if newline_offset != len(datalines): # incomplete parse of block
2118	return 0, 'invalid option block'
2119	try:
2120	options = utils.extract_extension_options(node, option_spec)
2121	except KeyError, detail:
2122	return 0, ('unknown option: "%s"' % detail.args[0])
2123	except (ValueError, TypeError), detail:
2124	return 0, ('invalid option value: %s' % ' '.join(detail.args))
2125	except utils.ExtensionOptionError, detail:
2126	return 0, ('invalid option data: %s' % ' '.join(detail.args))
2127	if blank_finish:
2128	return 1, options
2129	else:
2130	return 0, 'option data incompletely parsed'
2131
2132	def unknown_directive(self, type_name):
2133	lineno = self.state_machine.abs_line_number()
2134	indented, indent, offset, blank_finish = \
2135	self.state_machine.get_first_known_indented(0, strip_indent=0)
2136	text = '\n'.join(indented)
2137	error = self.reporter.error(
2138	'Unknown directive type "%s".' % type_name,
2139	nodes.literal_block(text, text), line=lineno)
2140	return [error], blank_finish
2141
2142	def comment(self, match):
2143	if not match.string[match.end():].strip() \
2144	and self.state_machine.is_next_line_blank(): # an empty comment?
2145	return [nodes.comment()], 1 # "A tiny but practical wart."
2146	indented, indent, offset, blank_finish = \
2147	self.state_machine.get_first_known_indented(match.end())
2148	while indented and not indented[-1].strip():
2149	indented.trim_end()
2150	text = '\n'.join(indented)
2151	return [nodes.comment(text, text)], blank_finish
2152
2153	explicit.constructs = [
2154	(footnote,
2155	re.compile(r"""
2156	\.\.[ ]+ # explicit markup start
2157	\[
2158	( # footnote label:
2159	[0-9]+ # manually numbered footnote
2160	\| # OR
2161	\# # anonymous auto-numbered footnote
2162	\| # OR
2163	\#%s # auto-number ed?) footnote label
2164	\| # OR
2165	\* # auto-symbol footnote
2166	)
2167	\]
2168	([ ]+\|$) # whitespace or end of line
2169	""" % Inliner.simplename, re.VERBOSE \| re.UNICODE)),
2170	(citation,
2171	re.compile(r"""
2172	\.\.[ ]+ # explicit markup start
2173	\[(%s)\] # citation label
2174	([ ]+\|$) # whitespace or end of line
2175	""" % Inliner.simplename, re.VERBOSE \| re.UNICODE)),
2176	(hyperlink_target,
2177	re.compile(r"""
2178	\.\.[ ]+ # explicit markup start
2179	_ # target indicator
2180	(?![ ]\|$) # first char. not space or EOL
2181	""", re.VERBOSE)),
2182	(substitution_def,
2183	re.compile(r"""
2184	\.\.[ ]+ # explicit markup start
2185	\\| # substitution indicator
2186	(?![ ]\|$) # first char. not space or EOL
2187	""", re.VERBOSE)),
2188	(directive,
2189	re.compile(r"""
2190	\.\.[ ]+ # explicit markup start
2191	(%s) # directive name
2192	[ ]? # optional space
2193	:: # directive delimiter
2194	([ ]+\|$) # whitespace or end of line
2195	""" % Inliner.simplename, re.VERBOSE \| re.UNICODE))]
2196
2197	def explicit_markup(self, match, context, next_state):
2198	"""Footnotes, hyperlink targets, directives, comments."""
2199	nodelist, blank_finish = self.explicit_construct(match)
2200	self.parent += nodelist
2201	self.explicit_list(blank_finish)
2202	return [], next_state, []
2203
2204	def explicit_construct(self, match):
2205	"""Determine which explicit construct this is, parse & return it."""
2206	errors = []
2207	for method, pattern in self.explicit.constructs:
2208	expmatch = pattern.match(match.string)
2209	if expmatch:
2210	try:
2211	return method(self, expmatch)
2212	except MarkupError, (message, lineno): # never reached?
2213	errors.append(self.reporter.warning(message, line=lineno))
2214	break
2215	nodelist, blank_finish = self.comment(match)
2216	return nodelist + errors, blank_finish
2217
2218	def explicit_list(self, blank_finish):
2219	"""
2220	Create a nested state machine for a series of explicit markup
2221	constructs (including anonymous hyperlink targets).
2222	"""
2223	offset = self.state_machine.line_offset + 1 # next line
2224	newline_offset, blank_finish = self.nested_list_parse(
2225	self.state_machine.input_lines[offset:],
2226	input_offset=self.state_machine.abs_line_offset() + 1,
2227	node=self.parent, initial_state='Explicit',
2228	blank_finish=blank_finish,
2229	match_titles=self.state_machine.match_titles)
2230	self.goto_line(newline_offset)
2231	if not blank_finish:
2232	self.parent += self.unindent_warning('Explicit markup')
2233
2234	def anonymous(self, match, context, next_state):
2235	"""Anonymous hyperlink targets."""
2236	nodelist, blank_finish = self.anonymous_target(match)
2237	self.parent += nodelist
2238	self.explicit_list(blank_finish)
2239	return [], next_state, []
2240
2241	def anonymous_target(self, match):
2242	lineno = self.state_machine.abs_line_number()
2243	block, indent, offset, blank_finish \
2244	= self.state_machine.get_first_known_indented(match.end(),
2245	until_blank=1)
2246	blocktext = match.string[:match.end()] + '\n'.join(block)
2247	block = [escape2null(line) for line in block]
2248	target = self.make_target(block, blocktext, lineno, '')
2249	return [target], blank_finish
2250
2251	def line(self, match, context, next_state):
2252	"""Section title overline or transition marker."""
2253	if self.state_machine.match_titles:
2254	return [match.string], 'Line', []
2255	elif match.string.strip() == '::':
2256	raise statemachine.TransitionCorrection('text')
2257	elif len(match.string.strip()) < 4:
2258	msg = self.reporter.info(
2259	'Unexpected possible title overline or transition.\n'
2260	"Treating it as ordinary text because it's so short.",
2261	line=self.state_machine.abs_line_number())
2262	self.parent += msg
2263	raise statemachine.TransitionCorrection('text')
2264	else:
2265	blocktext = self.state_machine.line
2266	msg = self.reporter.severe(
2267	'Unexpected section title or transition.',
2268	nodes.literal_block(blocktext, blocktext),
2269	line=self.state_machine.abs_line_number())
2270	self.parent += msg
2271	return [], next_state, []
2272
2273	def text(self, match, context, next_state):
2274	"""Titles, definition lists, paragraphs."""
2275	return [match.string], 'Text', []
2276
2277
2278	class RFC2822Body(Body):
2279
2280	"""
2281	RFC2822 headers are only valid as the first constructs in documents. As
2282	soon as anything else appears, the `Body` state should take over.
2283	"""
2284
2285	patterns = Body.patterns.copy() # can't modify the original
2286	patterns['rfc2822'] = r'[!-9;-~]+:( +\|$)'
2287	initial_transitions = [(name, 'Body')
2288	for name in Body.initial_transitions]
2289	initial_transitions.insert(-1, ('rfc2822', 'Body')) # just before 'text'
2290
2291	def rfc2822(self, match, context, next_state):
2292	"""RFC2822-style field list item."""
2293	fieldlist = nodes.field_list(classes=['rfc2822'])
2294	self.parent += fieldlist
2295	field, blank_finish = self.rfc2822_field(match)
2296	fieldlist += field
2297	offset = self.state_machine.line_offset + 1 # next line
2298	newline_offset, blank_finish = self.nested_list_parse(
2299	self.state_machine.input_lines[offset:],
2300	input_offset=self.state_machine.abs_line_offset() + 1,
2301	node=fieldlist, initial_state='RFC2822List',
2302	blank_finish=blank_finish)
2303	self.goto_line(newline_offset)
2304	if not blank_finish:
2305	self.parent += self.unindent_warning(
2306	'RFC2822-style field list')
2307	return [], next_state, []
2308
2309	def rfc2822_field(self, match):
2310	name = match.string[:match.string.find(':')]
2311	indented, indent, line_offset, blank_finish = \
2312	self.state_machine.get_first_known_indented(match.end(),
2313	until_blank=1)
2314	fieldnode = nodes.field()
2315	fieldnode += nodes.field_name(name, name)
2316	fieldbody = nodes.field_body('\n'.join(indented))
2317	fieldnode += fieldbody
2318	if indented:
2319	self.nested_parse(indented, input_offset=line_offset,
2320	node=fieldbody)
2321	return fieldnode, blank_finish
2322
2323
2324	class SpecializedBody(Body):
2325
2326	"""
2327	Superclass for second and subsequent compound element members. Compound
2328	elements are lists and list-like constructs.
2329
2330	All transition methods are disabled (redefined as `invalid_input`).
2331	Override individual methods in subclasses to re-enable.
2332
2333	For example, once an initial bullet list item, say, is recognized, the
2334	`BulletList` subclass takes over, with a "bullet_list" node as its
2335	container. Upon encountering the initial bullet list item, `Body.bullet`
2336	calls its ``self.nested_list_parse`` (`RSTState.nested_list_parse`), which
2337	starts up a nested parsing session with `BulletList` as the initial state.
2338	Only the ``bullet`` transition method is enabled in `BulletList`; as long
2339	as only bullet list items are encountered, they are parsed and inserted
2340	into the container. The first construct which is not a bullet list item
2341	triggers the `invalid_input` method, which ends the nested parse and
2342	closes the container. `BulletList` needs to recognize input that is
2343	invalid in the context of a bullet list, which means everything *other
2344	than* bullet list items, so it inherits the transition list created in
2345	`Body`.
2346	"""
2347
2348	def invalid_input(self, match=None, context=None, next_state=None):
2349	"""Not a compound element member. Abort this state machine."""
2350	self.state_machine.previous_line() # back up so parent SM can reassess
2351	raise EOFError
2352
2353	indent = invalid_input
2354	bullet = invalid_input
2355	enumerator = invalid_input
2356	field_marker = invalid_input
2357	option_marker = invalid_input
2358	doctest = invalid_input
2359	line_block = invalid_input
2360	grid_table_top = invalid_input
2361	simple_table_top = invalid_input
2362	explicit_markup = invalid_input
2363	anonymous = invalid_input
2364	line = invalid_input
2365	text = invalid_input
2366
2367
2368	class BulletList(SpecializedBody):
2369
2370	"""Second and subsequent bullet_list list_items."""
2371
2372	def bullet(self, match, context, next_state):
2373	"""Bullet list item."""
2374	if match.string[0] != self.parent['bullet']:
2375	# different bullet: new list
2376	self.invalid_input()
2377	listitem, blank_finish = self.list_item(match.end())
2378	self.parent += listitem
2379	self.blank_finish = blank_finish
2380	return [], next_state, []
2381
2382
2383	class DefinitionList(SpecializedBody):
2384
2385	"""Second and subsequent definition_list_items."""
2386
2387	def text(self, match, context, next_state):
2388	"""Definition lists."""
2389	return [match.string], 'Definition', []
2390
2391
2392	class EnumeratedList(SpecializedBody):
2393
2394	"""Second and subsequent enumerated_list list_items."""
2395
2396	def enumerator(self, match, context, next_state):
2397	"""Enumerated list item."""
2398	format, sequence, text, ordinal = self.parse_enumerator(
2399	match, self.parent['enumtype'])
2400	if ( format != self.format
2401	or (sequence != '#' and (sequence != self.parent['enumtype']
2402	or self.auto
2403	or ordinal != (self.lastordinal + 1)))
2404	or not self.is_enumerated_list_item(ordinal, sequence, format)):
2405	# different enumeration: new list
2406	self.invalid_input()
2407	if sequence == '#':
2408	self.auto = 1
2409	listitem, blank_finish = self.list_item(match.end())
2410	self.parent += listitem
2411	self.blank_finish = blank_finish
2412	self.lastordinal = ordinal
2413	return [], next_state, []
2414
2415
2416	class FieldList(SpecializedBody):
2417
2418	"""Second and subsequent field_list fields."""
2419
2420	def field_marker(self, match, context, next_state):
2421	"""Field list field."""
2422	field, blank_finish = self.field(match)
2423	self.parent += field
2424	self.blank_finish = blank_finish
2425	return [], next_state, []
2426
2427
2428	class OptionList(SpecializedBody):
2429
2430	"""Second and subsequent option_list option_list_items."""
2431
2432	def option_marker(self, match, context, next_state):
2433	"""Option list item."""
2434	try:
2435	option_list_item, blank_finish = self.option_list_item(match)
2436	except MarkupError, (message, lineno):
2437	self.invalid_input()
2438	self.parent += option_list_item
2439	self.blank_finish = blank_finish
2440	return [], next_state, []
2441
2442
2443	class RFC2822List(SpecializedBody, RFC2822Body):
2444
2445	"""Second and subsequent RFC2822-style field_list fields."""
2446
2447	patterns = RFC2822Body.patterns
2448	initial_transitions = RFC2822Body.initial_transitions
2449
2450	def rfc2822(self, match, context, next_state):
2451	"""RFC2822-style field list item."""
2452	field, blank_finish = self.rfc2822_field(match)
2453	self.parent += field
2454	self.blank_finish = blank_finish
2455	return [], 'RFC2822List', []
2456
2457	blank = SpecializedBody.invalid_input
2458
2459
2460	class ExtensionOptions(FieldList):
2461
2462	"""
2463	Parse field_list fields for extension options.
2464
2465	No nested parsing is done (including inline markup parsing).
2466	"""
2467
2468	def parse_field_body(self, indented, offset, node):
2469	"""Override `Body.parse_field_body` for simpler parsing."""
2470	lines = []
2471	for line in list(indented) + ['']:
2472	if line.strip():
2473	lines.append(line)
2474	elif lines:
2475	text = '\n'.join(lines)
2476	node += nodes.paragraph(text, text)
2477	lines = []
2478
2479
2480	class LineBlock(SpecializedBody):
2481
2482	"""Second and subsequent lines of a line_block."""
2483
2484	blank = SpecializedBody.invalid_input
2485
2486	def line_block(self, match, context, next_state):
2487	"""New line of line block."""
2488	lineno = self.state_machine.abs_line_number()
2489	line, messages, blank_finish = self.line_block_line(match, lineno)
2490	self.parent += line
2491	self.parent.parent += messages
2492	self.blank_finish = blank_finish
2493	return [], next_state, []
2494
2495
2496	class Explicit(SpecializedBody):
2497
2498	"""Second and subsequent explicit markup construct."""
2499
2500	def explicit_markup(self, match, context, next_state):
2501	"""Footnotes, hyperlink targets, directives, comments."""
2502	nodelist, blank_finish = self.explicit_construct(match)
2503	self.parent += nodelist
2504	self.blank_finish = blank_finish
2505	return [], next_state, []
2506
2507	def anonymous(self, match, context, next_state):
2508	"""Anonymous hyperlink targets."""
2509	nodelist, blank_finish = self.anonymous_target(match)
2510	self.parent += nodelist
2511	self.blank_finish = blank_finish
2512	return [], next_state, []
2513
2514	blank = SpecializedBody.invalid_input
2515
2516
2517	class SubstitutionDef(Body):
2518
2519	"""
2520	Parser for the contents of a substitution_definition element.
2521	"""
2522
2523	patterns = {
2524	'embedded_directive': re.compile(r'(%s)::( +\|$)'
2525	% Inliner.simplename, re.UNICODE),
2526	'text': r''}
2527	initial_transitions = ['embedded_directive', 'text']
2528
2529	def embedded_directive(self, match, context, next_state):
2530	nodelist, blank_finish = self.directive(match,
2531	alt=self.parent['names'][0])
2532	self.parent += nodelist
2533	if not self.state_machine.at_eof():
2534	self.blank_finish = blank_finish
2535	raise EOFError
2536
2537	def text(self, match, context, next_state):
2538	if not self.state_machine.at_eof():
2539	self.blank_finish = self.state_machine.is_next_line_blank()
2540	raise EOFError
2541
2542
2543	class Text(RSTState):
2544
2545	"""
2546	Classifier of second line of a text block.
2547
2548	Could be a paragraph, a definition list item, or a title.
2549	"""
2550
2551	patterns = {'underline': Body.patterns['line'],
2552	'text': r''}
2553	initial_transitions = [('underline', 'Body'), ('text', 'Body')]
2554
2555	def blank(self, match, context, next_state):
2556	"""End of paragraph."""
2557	paragraph, literalnext = self.paragraph(
2558	context, self.state_machine.abs_line_number() - 1)
2559	self.parent += paragraph
2560	if literalnext:
2561	self.parent += self.literal_block()
2562	return [], 'Body', []
2563
2564	def eof(self, context):
2565	if context:
2566	self.blank(None, context, None)
2567	return []
2568
2569	def indent(self, match, context, next_state):
2570	"""Definition list item."""
2571	definitionlist = nodes.definition_list()
2572	definitionlistitem, blank_finish = self.definition_list_item(context)
2573	definitionlist += definitionlistitem
2574	self.parent += definitionlist
2575	offset = self.state_machine.line_offset + 1 # next line
2576	newline_offset, blank_finish = self.nested_list_parse(
2577	self.state_machine.input_lines[offset:],
2578	input_offset=self.state_machine.abs_line_offset() + 1,
2579	node=definitionlist, initial_state='DefinitionList',
2580	blank_finish=blank_finish, blank_finish_state='Definition')
2581	self.goto_line(newline_offset)
2582	if not blank_finish:
2583	self.parent += self.unindent_warning('Definition list')
2584	return [], 'Body', []
2585
2586	def underline(self, match, context, next_state):
2587	"""Section title."""
2588	lineno = self.state_machine.abs_line_number()
2589	title = context[0].rstrip()
2590	underline = match.string.rstrip()
2591	source = title + '\n' + underline
2592	messages = []
2593	if column_width(title) > len(underline):
2594	if len(underline) < 4:
2595	if self.state_machine.match_titles:
2596	msg = self.reporter.info(
2597	'Possible title underline, too short for the title.\n'
2598	"Treating it as ordinary text because it's so short.",
2599	line=lineno)
2600	self.parent += msg
2601	raise statemachine.TransitionCorrection('text')
2602	else:
2603	blocktext = context[0] + '\n' + self.state_machine.line
2604	msg = self.reporter.warning(
2605	'Title underline too short.',
2606	nodes.literal_block(blocktext, blocktext), line=lineno)
2607	messages.append(msg)
2608	if not self.state_machine.match_titles:
2609	blocktext = context[0] + '\n' + self.state_machine.line
2610	msg = self.reporter.severe(
2611	'Unexpected section title.',
2612	nodes.literal_block(blocktext, blocktext), line=lineno)
2613	self.parent += messages
2614	self.parent += msg
2615	return [], next_state, []
2616	style = underline[0]
2617	context[:] = []
2618	self.section(title, source, style, lineno - 1, messages)
2619	return [], next_state, []
2620
2621	def text(self, match, context, next_state):
2622	"""Paragraph."""
2623	startline = self.state_machine.abs_line_number() - 1
2624	msg = None
2625	try:
2626	block = self.state_machine.get_text_block(flush_left=1)
2627	except statemachine.UnexpectedIndentationError, instance:
2628	block, source, lineno = instance.args
2629	msg = self.reporter.error('Unexpected indentation.',
2630	source=source, line=lineno)
2631	lines = context + list(block)
2632	paragraph, literalnext = self.paragraph(lines, startline)
2633	self.parent += paragraph
2634	self.parent += msg
2635	if literalnext:
2636	try:
2637	self.state_machine.next_line()
2638	except EOFError:
2639	pass
2640	self.parent += self.literal_block()
2641	return [], next_state, []
2642
2643	def literal_block(self):
2644	"""Return a list of nodes."""
2645	indented, indent, offset, blank_finish = \
2646	self.state_machine.get_indented()
2647	while indented and not indented[-1].strip():
2648	indented.trim_end()
2649	if not indented:
2650	return self.quoted_literal_block()
2651	data = '\n'.join(indented)
2652	literal_block = nodes.literal_block(data, data)
2653	literal_block.line = offset + 1
2654	nodelist = [literal_block]
2655	if not blank_finish:
2656	nodelist.append(self.unindent_warning('Literal block'))
2657	return nodelist
2658
2659	def quoted_literal_block(self):
2660	abs_line_offset = self.state_machine.abs_line_offset()
2661	offset = self.state_machine.line_offset
2662	parent_node = nodes.Element()
2663	new_abs_offset = self.nested_parse(
2664	self.state_machine.input_lines[offset:],
2665	input_offset=abs_line_offset, node=parent_node, match_titles=0,
2666	state_machine_kwargs={'state_classes': (QuotedLiteralBlock,),
2667	'initial_state': 'QuotedLiteralBlock'})
2668	self.goto_line(new_abs_offset)
2669	return parent_node.children
2670
2671	def definition_list_item(self, termline):
2672	indented, indent, line_offset, blank_finish = \
2673	self.state_machine.get_indented()
2674	definitionlistitem = nodes.definition_list_item(
2675	'\n'.join(termline + list(indented)))
2676	lineno = self.state_machine.abs_line_number() - 1
2677	definitionlistitem.line = lineno
2678	termlist, messages = self.term(termline, lineno)
2679	definitionlistitem += termlist
2680	definition = nodes.definition('', *messages)
2681	definitionlistitem += definition
2682	if termline[0][-2:] == '::':
2683	definition += self.reporter.info(
2684	'Blank line missing before literal block (after the "::")? '
2685	'Interpreted as a definition list item.', line=line_offset+1)
2686	self.nested_parse(indented, input_offset=line_offset, node=definition)
2687	return definitionlistitem, blank_finish
2688
2689	classifier_delimiter = re.compile(' +: +')
2690
2691	def term(self, lines, lineno):
2692	"""Return a definition_list's term and optional classifiers."""
2693	assert len(lines) == 1
2694	text_nodes, messages = self.inline_text(lines[0], lineno)
2695	term_node = nodes.term()
2696	node_list = [term_node]
2697	for i in range(len(text_nodes)):
2698	node = text_nodes[i]
2699	if isinstance(node, nodes.Text):
2700	parts = self.classifier_delimiter.split(node.rawsource)
2701	if len(parts) == 1:
2702	node_list[-1] += node
2703	else:
2704
2705	node_list[-1] += nodes.Text(parts[0].rstrip())
2706	for part in parts[1:]:
2707	classifier_node = nodes.classifier('', part)
2708	node_list.append(classifier_node)
2709	else:
2710	node_list[-1] += node
2711	return node_list, messages
2712
2713
2714	class SpecializedText(Text):
2715
2716	"""
2717	Superclass for second and subsequent lines of Text-variants.
2718
2719	All transition methods are disabled. Override individual methods in
2720	subclasses to re-enable.
2721	"""
2722
2723	def eof(self, context):
2724	"""Incomplete construct."""
2725	return []
2726
2727	def invalid_input(self, match=None, context=None, next_state=None):
2728	"""Not a compound element member. Abort this state machine."""
2729	raise EOFError
2730
2731	blank = invalid_input
2732	indent = invalid_input
2733	underline = invalid_input
2734	text = invalid_input
2735
2736
2737	class Definition(SpecializedText):
2738
2739	"""Second line of potential definition_list_item."""
2740
2741	def eof(self, context):
2742	"""Not a definition."""
2743	self.state_machine.previous_line(2) # so parent SM can reassess
2744	return []
2745
2746	def indent(self, match, context, next_state):
2747	"""Definition list item."""
2748	definitionlistitem, blank_finish = self.definition_list_item(context)
2749	self.parent += definitionlistitem
2750	self.blank_finish = blank_finish
2751	return [], 'DefinitionList', []
2752
2753
2754	class Line(SpecializedText):
2755
2756	"""
2757	Second line of over- & underlined section title or transition marker.
2758	"""
2759
2760	eofcheck = 1 # @@@ ???
2761	"""Set to 0 while parsing sections, so that we don't catch the EOF."""
2762
2763	def eof(self, context):
2764	"""Transition marker at end of section or document."""
2765	marker = context[0].strip()
2766	if self.memo.section_bubble_up_kludge:
2767	self.memo.section_bubble_up_kludge = 0
2768	elif len(marker) < 4:
2769	self.state_correction(context)
2770	if self.eofcheck: # ignore EOFError with sections
2771	lineno = self.state_machine.abs_line_number() - 1
2772	transition = nodes.transition(rawsource=context[0])
2773	transition.line = lineno
2774	self.parent += transition
2775	self.eofcheck = 1
2776	return []
2777
2778	def blank(self, match, context, next_state):
2779	"""Transition marker."""
2780	lineno = self.state_machine.abs_line_number() - 1
2781	marker = context[0].strip()
2782	if len(marker) < 4:
2783	self.state_correction(context)
2784	transition = nodes.transition(rawsource=marker)
2785	transition.line = lineno
2786	self.parent += transition
2787	return [], 'Body', []
2788
2789	def text(self, match, context, next_state):
2790	"""Potential over- & underlined title."""
2791	lineno = self.state_machine.abs_line_number() - 1
2792	overline = context[0]
2793	title = match.string
2794	underline = ''
2795	try:
2796	underline = self.state_machine.next_line()
2797	except EOFError:
2798	blocktext = overline + '\n' + title
2799	if len(overline.rstrip()) < 4:
2800	self.short_overline(context, blocktext, lineno, 2)
2801	else:
2802	msg = self.reporter.severe(
2803	'Incomplete section title.',
2804	nodes.literal_block(blocktext, blocktext), line=lineno)
2805	self.parent += msg
2806	return [], 'Body', []
2807	source = '%s\n%s\n%s' % (overline, title, underline)
2808	overline = overline.rstrip()
2809	underline = underline.rstrip()
2810	if not self.transitions['underline'][0].match(underline):
2811	blocktext = overline + '\n' + title + '\n' + underline
2812	if len(overline.rstrip()) < 4:
2813	self.short_overline(context, blocktext, lineno, 2)
2814	else:
2815	msg = self.reporter.severe(
2816	'Missing matching underline for section title overline.',
2817	nodes.literal_block(source, source), line=lineno)
2818	self.parent += msg
2819	return [], 'Body', []
2820	elif overline != underline:
2821	blocktext = overline + '\n' + title + '\n' + underline
2822	if len(overline.rstrip()) < 4:
2823	self.short_overline(context, blocktext, lineno, 2)
2824	else:
2825	msg = self.reporter.severe(
2826	'Title overline & underline mismatch.',
2827	nodes.literal_block(source, source), line=lineno)
2828	self.parent += msg
2829	return [], 'Body', []
2830	title = title.rstrip()
2831	messages = []
2832	if column_width(title) > len(overline):
2833	blocktext = overline + '\n' + title + '\n' + underline
2834	if len(overline.rstrip()) < 4:
2835	self.short_overline(context, blocktext, lineno, 2)
2836	else:
2837	msg = self.reporter.warning(
2838	'Title overline too short.',
2839	nodes.literal_block(source, source), line=lineno)
2840	messages.append(msg)
2841	style = (overline[0], underline[0])
2842	self.eofcheck = 0 # @@@ not sure this is correct
2843	self.section(title.lstrip(), source, style, lineno + 1, messages)
2844	self.eofcheck = 1
2845	return [], 'Body', []
2846
2847	indent = text # indented title
2848
2849	def underline(self, match, context, next_state):
2850	overline = context[0]
2851	blocktext = overline + '\n' + self.state_machine.line
2852	lineno = self.state_machine.abs_line_number() - 1
2853	if len(overline.rstrip()) < 4:
2854	self.short_overline(context, blocktext, lineno, 1)
2855	msg = self.reporter.error(
2856	'Invalid section title or transition marker.',
2857	nodes.literal_block(blocktext, blocktext), line=lineno)
2858	self.parent += msg
2859	return [], 'Body', []
2860
2861	def short_overline(self, context, blocktext, lineno, lines=1):
2862	msg = self.reporter.info(
2863	'Possible incomplete section title.\nTreating the overline as '
2864	"ordinary text because it's so short.", line=lineno)
2865	self.parent += msg
2866	self.state_correction(context, lines)
2867
2868	def state_correction(self, context, lines=1):
2869	self.state_machine.previous_line(lines)
2870	context[:] = []
2871	raise statemachine.StateCorrection('Body', 'text')
2872
2873
2874	class QuotedLiteralBlock(RSTState):
2875
2876	"""
2877	Nested parse handler for quoted (unindented) literal blocks.
2878
2879	Special-purpose. Not for inclusion in `state_classes`.
2880	"""
2881
2882	patterns = {'initial_quoted': r'(%(nonalphanum7bit)s)' % Body.pats,
2883	'text': r''}
2884	initial_transitions = ('initial_quoted', 'text')
2885
2886	def __init__(self, state_machine, debug=0):
2887	RSTState.__init__(self, state_machine, debug)
2888	self.messages = []
2889	self.initial_lineno = None
2890
2891	def blank(self, match, context, next_state):
2892	if context:
2893	raise EOFError
2894	else:
2895	return context, next_state, []
2896
2897	def eof(self, context):
2898	if context:
2899	text = '\n'.join(context)
2900	literal_block = nodes.literal_block(text, text)
2901	literal_block.line = self.initial_lineno
2902	self.parent += literal_block
2903	else:
2904	self.parent += self.reporter.warning(
2905	'Literal block expected; none found.',
2906	line=self.state_machine.abs_line_number())
2907	self.state_machine.previous_line()
2908	self.parent += self.messages
2909	return []
2910
2911	def indent(self, match, context, next_state):
2912	assert context, ('QuotedLiteralBlock.indent: context should not '
2913	'be empty!')
2914	self.messages.append(
2915	self.reporter.error('Unexpected indentation.',
2916	line=self.state_machine.abs_line_number()))
2917	self.state_machine.previous_line()
2918	raise EOFError
2919
2920	def initial_quoted(self, match, context, next_state):
2921	"""Match arbitrary quote character on the first line only."""
2922	self.remove_transition('initial_quoted')
2923	quote = match.string[0]
2924	pattern = re.compile(re.escape(quote))
2925	# New transition matches consistent quotes only:
2926	self.add_transition('quoted',
2927	(pattern, self.quoted, self.__class__.__name__))
2928	self.initial_lineno = self.state_machine.abs_line_number()
2929	return [match.string], next_state, []
2930
2931	def quoted(self, match, context, next_state):
2932	"""Match consistent quotes on subsequent lines."""
2933	context.append(match.string)
2934	return context, next_state, []
2935
2936	def text(self, match, context, next_state):
2937	if context:
2938	self.messages.append(
2939	self.reporter.error('Inconsistent literal block quoting.',
2940	line=self.state_machine.abs_line_number()))
2941	self.state_machine.previous_line()
2942	raise EOFError
2943
2944
2945	state_classes = (Body, BulletList, DefinitionList, EnumeratedList, FieldList,
2946	OptionList, LineBlock, ExtensionOptions, Explicit, Text,
2947	Definition, Line, SubstitutionDef, RFC2822Body, RFC2822List)
2948	"""Standard set of State classes used to start `RSTStateMachine`."""

Note: リポジトリブラウザについてのヘルプは TracBrowser を参照してください。

Context Navigation

root/galaxy-central/eggs/docutils-0.4-py2.6.egg/docutils/parsers/rst/states.py

異なるフォーマットでダウンロード: