Context Navigation

markdown.py @ 3

リビジョン 3, 58.9 KB (コミッタ: kohda, 14 年前)
Install Unix tools http://hannonlab.cshl.edu/galaxy_unix_tools/galaxy.html

行番号
1	#!/usr/bin/env python
2
3	SPEED_TEST = 0
4
5	"""
6	====================================================================
7	IF YOU ARE LOOKING TO EXTEND MARKDOWN, SEE THE "FOOTNOTES" SECTION
8	====================================================================
9
10	Python-Markdown
11	===============
12
13	Converts Markdown to HTML. Basic usage as a module:
14
15	import markdown
16	html = markdown.markdown(your_text_string)
17
18	Started by [Manfred Stienstra](http://www.dwerg.net/). Continued and
19	maintained by [Yuri Takhteyev](http://www.freewisdom.org).
20
21	Project website: http://www.freewisdom.org/projects/python-markdown
22	Contact: yuri [at] freewisdom.org
23
24	License: GPL 2 (http://www.gnu.org/copyleft/gpl.html) or BSD
25
26	Version: 1.5 (May 15, 2006)
27
28	For changelog, see end of file
29	"""
30
31	import re, sys, os, random
32
33	# set debug level: 3 none, 2 critical, 1 informative, 0 all
34	(VERBOSE, INFO, CRITICAL, NONE) = range(4)
35
36	MESSAGE_THRESHOLD = CRITICAL
37
38	def message(level, text) :
39	if level >= MESSAGE_THRESHOLD :
40	print text
41
42
43	# --------------- CONSTANTS YOU MIGHT WANT TO MODIFY -----------------
44
45	# all tabs will be expanded to up to this many spaces
46	TAB_LENGTH = 4
47	ENABLE_ATTRIBUTES = 1
48	SMART_EMPHASIS = 1
49
50	# --------------- CONSTANTS YOU _SHOULD NOT_ HAVE TO CHANGE ----------
51
52	FN_BACKLINK_TEXT = "zz1337820767766393qq"
53	# a template for html placeholders
54	HTML_PLACEHOLDER_PREFIX = "qaodmasdkwaspemas"
55	HTML_PLACEHOLDER = HTML_PLACEHOLDER_PREFIX + "%dajkqlsmdqpakldnzsdfls"
56
57	BLOCK_LEVEL_ELEMENTS = ['p', 'div', 'blockquote', 'pre', 'table',
58	'dl', 'ol', 'ul', 'script', 'noscript',
59	'form', 'fieldset', 'iframe', 'math', 'ins',
60	'del', 'hr', 'hr/']
61
62	def is_block_level (tag) :
63	return ( (tag in BLOCK_LEVEL_ELEMENTS) or
64	(tag[0] == 'h' and tag[1] in "0123456789") )
65
66	"""
67	======================================================================
68	========================== NANODOM ===================================
69	======================================================================
70
71	The three classes below implement some of the most basic DOM
72	methods. I use this instead of minidom because I need a simpler
73	functionality and do not want to require additional libraries.
74
75	Importantly, NanoDom does not do normalization, which is what we
76	want. It also adds extra white space when converting DOM to string
77	"""
78
79
80	class Document :
81
82	def appendChild(self, child) :
83	self.documentElement = child
84	child.parent = self
85	self.entities = {}
86
87	def createElement(self, tag, textNode=None) :
88	el = Element(tag)
89	el.doc = self
90	if textNode :
91	el.appendChild(self.createTextNode(textNode))
92	return el
93
94	def createTextNode(self, text) :
95	node = TextNode(text)
96	node.doc = self
97	return node
98
99	def createEntityReference(self, entity):
100	if entity not in self.entities:
101	self.entities[entity] = EntityReference(entity)
102	return self.entities[entity]
103
104	def toxml (self) :
105	return self.documentElement.toxml()
106
107	def normalizeEntities(self, text) :
108
109	pairs = [ #("&", "&"),
110	("<", "<"),
111	(">", ">"),
112	("\"", """)]
113
114	for old, new in pairs :
115	text = text.replace(old, new)
116	return text
117
118	def find(self, test) :
119	return self.documentElement.find(test)
120
121	def unlink(self) :
122	self.documentElement.unlink()
123	self.documentElement = None
124
125
126	class Element :
127
128	type = "element"
129
130	def __init__ (self, tag) :
131
132	self.nodeName = tag
133	self.attributes = []
134	self.attribute_values = {}
135	self.childNodes = []
136
137	def unlink(self) :
138	for child in self.childNodes :
139	if child.type == "element" :
140	child.unlink()
141	self.childNodes = None
142
143	def setAttribute(self, attr, value) :
144	if not attr in self.attributes :
145	self.attributes.append(attr)
146
147	self.attribute_values[attr] = value
148
149	def insertChild(self, position, child) :
150	self.childNodes.insert(position, child)
151	child.parent = self
152
153	def removeChild(self, child) :
154	self.childNodes.remove(child)
155
156	def replaceChild(self, oldChild, newChild) :
157	position = self.childNodes.index(oldChild)
158	self.removeChild(oldChild)
159	self.insertChild(position, newChild)
160
161	def appendChild(self, child) :
162	self.childNodes.append(child)
163	child.parent = self
164
165	def handleAttributes(self) :
166	pass
167
168	def find(self, test, depth=0) :
169	""" Returns a list of descendants that pass the test function """
170	matched_nodes = []
171	for child in self.childNodes :
172	if test(child) :
173	matched_nodes.append(child)
174	if child.type == "element" :
175	matched_nodes += child.find(test, depth+1)
176	return matched_nodes
177
178	def toxml(self):
179	if ENABLE_ATTRIBUTES :
180	for child in self.childNodes:
181	child.handleAttributes()
182	buffer = ""
183	if self.nodeName in ['h1', 'h2', 'h3', 'h4'] :
184	buffer += "\n"
185	elif self.nodeName in ['li'] :
186	buffer += "\n "
187	buffer += "<" + self.nodeName
188	for attr in self.attributes :
189	value = self.attribute_values[attr]
190	value = self.doc.normalizeEntities(value)
191	buffer += ' %s="%s"' % (attr, value)
192	if self.childNodes or self.nodeName in ['blockquote']:
193	buffer += ">"
194	for child in self.childNodes :
195	buffer += child.toxml()
196	if self.nodeName == 'p' :
197	buffer += "\n"
198	elif self.nodeName == 'li' :
199	buffer += "\n "
200	buffer += "</%s>" % self.nodeName
201	else :
202	buffer += "/>"
203	if self.nodeName in ['p', 'li', 'ul', 'ol',
204	'h1', 'h2', 'h3', 'h4'] :
205	buffer += "\n"
206
207	return buffer
208
209
210	class TextNode :
211
212	type = "text"
213	attrRegExp = re.compile(r'\{@([^\}])=([^\}])}') # {@id=123}
214
215	def __init__ (self, text) :
216	self.value = text
217
218	def attributeCallback(self, match) :
219	self.parent.setAttribute(match.group(1), match.group(2))
220
221	def handleAttributes(self) :
222	self.value = self.attrRegExp.sub(self.attributeCallback, self.value)
223
224	def toxml(self) :
225	text = self.value
226	if not text.startswith(HTML_PLACEHOLDER_PREFIX):
227	if self.parent.nodeName == "p" :
228	text = text.replace("\n", "\n ")
229	elif (self.parent.nodeName == "li"
230	and self.parent.childNodes[0]==self):
231	text = "\n " + text.replace("\n", "\n ")
232	text = self.doc.normalizeEntities(text)
233	return text
234
235
236	class EntityReference:
237
238	type = "entity_ref"
239
240	def __init__(self, entity):
241	self.entity = entity
242
243	def handleAttributes(self):
244	pass
245
246	def toxml(self):
247	return "&" + self.entity + ";"
248
249
250	"""
251	======================================================================
252	========================== PRE-PROCESSORS ============================
253	======================================================================
254
255	Preprocessors munge source text before we start doing anything too
256	complicated.
257
258	Each preprocessor implements a "run" method that takes a pointer to
259	a list of lines of the document, modifies it as necessary and
260	returns either the same pointer or a pointer to a new list.
261	"""
262
263	class HeaderPreprocessor :
264
265	"""
266	Replaces underlined headers with hashed headers to avoid
267	the nead for lookahead later.
268	"""
269
270	def run (self, lines) :
271
272	for i in range(len(lines)) :
273	if not lines[i] :
274	continue
275
276	if lines[i].startswith("#") :
277	lines.insert(i+1, "\n")
278
279	if (i+1 <= len(lines)
280	and lines[i+1]
281	and lines[i+1][0] in ['-', '=']) :
282
283	underline = lines[i+1].strip()
284
285	if underline == "="*len(underline) :
286	lines[i] = "# " + lines[i].strip()
287	lines[i+1] = ""
288	elif underline == "-"*len(underline) :
289	lines[i] = "## " + lines[i].strip()
290	lines[i+1] = ""
291
292	return lines
293
294	HEADER_PREPROCESSOR = HeaderPreprocessor()
295
296	class LinePreprocessor :
297	"""Deals with HR lines (needs to be done before processing lists)"""
298
299	def run (self, lines) :
300	for i in range(len(lines)) :
301	if self._isLine(lines[i]) :
302	lines[i] = "<hr />"
303	return lines
304
305	def _isLine(self, block) :
306	"""Determines if a block should be replaced with an <HR>"""
307	if block.startswith(" ") : return 0 # a code block
308	text = "".join([x for x in block if not x.isspace()])
309	if len(text) <= 2 :
310	return 0
311	for pattern in ['isline1', 'isline2', 'isline3'] :
312	m = RE.regExp[pattern].match(text)
313	if (m and m.group(1)) :
314	return 1
315	else:
316	return 0
317
318	LINE_PREPROCESSOR = LinePreprocessor()
319
320
321	class LineBreaksPreprocessor :
322	"""Replaces double spaces at the end of the lines with <br/ >."""
323
324	def run (self, lines) :
325	for i in range(len(lines)) :
326	if (lines[i].endswith(" ")
327	and not RE.regExp['tabbed'].match(lines[i]) ):
328	lines[i] += "<br />"
329	return lines
330
331	LINE_BREAKS_PREPROCESSOR = LineBreaksPreprocessor()
332
333
334	class HtmlBlockPreprocessor :
335	"""Removes html blocks from self.lines"""
336
337	def run (self, lines) :
338	new_blocks = []
339	text = "\n".join(lines)
340	for block in text.split("\n\n") :
341	if block.startswith("\n") :
342	block = block[1:]
343	if ( (block.startswith("<") and block.rstrip().endswith(">"))
344	and (block[1] in ["!", "?", "@", "%"]
345	or is_block_level( block[1:].replace(">", " ")
346	.split()[0].lower()))) :
347	new_blocks.append(
348	self.stash.store(block.strip()))
349	else :
350	new_blocks.append(block)
351	return "\n\n".join(new_blocks).split("\n")
352
353	HTML_BLOCK_PREPROCESSOR = HtmlBlockPreprocessor()
354
355
356	class ReferencePreprocessor :
357
358	def run (self, lines) :
359	new_text = [];
360	for line in lines:
361	m = RE.regExp['reference-def'].match(line)
362	if m:
363	id = m.group(2).strip().lower()
364	title = dequote(m.group(4).strip()) #.replace('"', """)
365	self.references[id] = (m.group(3), title)
366	else:
367	new_text.append(line)
368	return new_text #+ "\n"
369
370	REFERENCE_PREPROCESSOR = ReferencePreprocessor()
371
372	"""
373	======================================================================
374	========================== INLINE PATTERNS ===========================
375	======================================================================
376
377	Inline patterns such as emphasis are handled by means of auxiliary
378	objects, one per pattern. Each pattern object uses a single regular
379	expression and needs support the following methods:
380
381	pattern.getCompiledRegExp() - returns a regular expression
382
383	pattern.handleMatch(m, doc) - takes a match object and returns
384	a NanoDom node (as a part of the provided
385	doc) or None
386
387	All of python markdown's built-in patterns subclass from BasePatter,
388	but you can add additional patterns that don't.
389
390	Also note that all the regular expressions used by inline must
391	capture the whole block. For this reason, they all start with
392	'^(.)' and end with '(.)!'. In case with built-in expression
393	BasePattern takes care of adding the "^(.)" and "(.)!".
394
395	Finally, the order in which regular expressions are applied is very
396	important - e.g. if we first replace http://.../ links with <a> tags
397	and _then_ try to replace inline html, we would end up with a mess.
398	So, we apply the expressions in the following order:
399
400	* escape and backticks have to go before everything else, so
401	that we can preempt any markdown patterns by escaping them.
402
403	* then we handle auto-links (must be done before inline html)
404
405	* then we handle inline HTML. At this point we will simply
406	replace all inline HTML strings with a placeholder and add
407	the actual HTML to a hash.
408
409	* then inline images (must be done before links)
410
411	* then bracketed links, first regular then reference-style
412
413	* finally we apply strong and emphasis
414	"""
415
416	NOBRACKET = r'[^\]\[]*'
417	BRK = ( r'\[('
418	+ (NOBRACKET + r'(\['+NOBRACKET)*6
419	+ (NOBRACKET+ r'\])'+NOBRACKET)6
420	+ NOBRACKET + r')\]' )
421
422	BACKTICK_RE = r'\`([^\`])\`' # `e= mc^2`
423	DOUBLE_BACKTICK_RE = r'\`\`(.*)\`\`' # ``e=f("`")``
424	ESCAPE_RE = r'\\(.)' # \<
425	EMPHASIS_RE = r'\([^\])\' # emphasis
426	STRONG_RE = r'\\(.)\\' # strong*
427	STRONG_EM_RE = r'\\\([^_])\\\' # strong*
428
429	if SMART_EMPHASIS:
430	EMPHASIS_2_RE = r'(?<!\S)_(\S[^_]*)_' # _emphasis_
431	else :
432	EMPHASIS_2_RE = r'_([^_]*)_' # _emphasis_
433
434	STRONG_2_RE = r'__([^_]*)__' # __strong__
435	STRONG_EM_2_RE = r'___([^_]*)___' # ___strong___
436
437	LINK_RE = BRK + r'\s$([^$])\)' # [text](url)
438	LINK_ANGLED_RE = BRK + r'\s$<([^$])>\)' # [text](<url>)
439	IMAGE_LINK_RE = r'\!' + BRK + r'\s$([^$])\)' # ![alttxt](http://x.com/)
440	REFERENCE_RE = BRK+ r'\s\[([^\]])\]' # [Google][3]
441	IMAGE_REFERENCE_RE = r'\!' + BRK + '\s\[([^\]])\]' # ![alt text][2]
442	NOT_STRONG_RE = r'( \* )' # stand-alone * or _
443	AUTOLINK_RE = r'<(http://[^>]*)>' # <http://www.123.com>
444	AUTOMAIL_RE = r'<([^> ]@[^> ])>' # <me@example.com>
445	HTML_RE = r'(\<[^\>]*\>)' # <...>
446	ENTITY_RE = r'(&[\#a-zA-Z0-9]*;)' # &
447
448	class BasePattern:
449
450	def __init__ (self, pattern) :
451	self.pattern = pattern
452	self.compiled_re = re.compile("^(.)%s(.)$" % pattern, re.DOTALL)
453
454	def getCompiledRegExp (self) :
455	return self.compiled_re
456
457	class SimpleTextPattern (BasePattern) :
458
459	def handleMatch(self, m, doc) :
460	return doc.createTextNode(m.group(2))
461
462	class SimpleTagPattern (BasePattern):
463
464	def __init__ (self, pattern, tag) :
465	BasePattern.__init__(self, pattern)
466	self.tag = tag
467
468	def handleMatch(self, m, doc) :
469	el = doc.createElement(self.tag)
470	el.appendChild(doc.createTextNode(m.group(2)))
471	return el
472
473	class BacktickPattern (BasePattern):
474
475	def __init__ (self, pattern):
476	BasePattern.__init__(self, pattern)
477	self.tag = "code"
478
479	def handleMatch(self, m, doc) :
480	el = doc.createElement(self.tag)
481	text = m.group(2).strip()
482	text = text.replace("&", "&")
483	el.appendChild(doc.createTextNode(text))
484	return el
485
486
487	class DoubleTagPattern (SimpleTagPattern) :
488
489	def handleMatch(self, m, doc) :
490	tag1, tag2 = self.tag.split(",")
491	el1 = doc.createElement(tag1)
492	el2 = doc.createElement(tag2)
493	el1.appendChild(el2)
494	el2.appendChild(doc.createTextNode(m.group(2)))
495	return el1
496
497
498	class HtmlPattern (BasePattern):
499
500	def handleMatch (self, m, doc) :
501	place_holder = self.stash.store(m.group(2))
502	return doc.createTextNode(place_holder)
503
504
505	class LinkPattern (BasePattern):
506
507	def handleMatch(self, m, doc) :
508	el = doc.createElement('a')
509	el.appendChild(doc.createTextNode(m.group(2)))
510	parts = m.group(9).split()
511	# We should now have [], [href], or [href, title]
512	if parts :
513	el.setAttribute('href', parts[0])
514	else :
515	el.setAttribute('href', "")
516	if len(parts) > 1 :
517	# we also got a title
518	title = " ".join(parts[1:]).strip()
519	title = dequote(title) #.replace('"', """)
520	el.setAttribute('title', title)
521	return el
522
523
524	class ImagePattern (BasePattern):
525
526	def handleMatch(self, m, doc):
527	el = doc.createElement('img')
528	src_parts = m.group(9).split()
529	el.setAttribute('src', src_parts[0])
530	if len(src_parts) > 1 :
531	el.setAttribute('title', dequote(" ".join(src_parts[1:])))
532	if ENABLE_ATTRIBUTES :
533	text = doc.createTextNode(m.group(2))
534	el.appendChild(text)
535	text.handleAttributes()
536	truealt = text.value
537	el.childNodes.remove(text)
538	else:
539	truealt = m.group(2)
540	el.setAttribute('alt', truealt)
541	return el
542
543	class ReferencePattern (BasePattern):
544
545	def handleMatch(self, m, doc):
546	if m.group(9) :
547	id = m.group(9).lower()
548	else :
549	# if we got something like "[Google][]"
550	# we'll use "google" as the id
551	id = m.group(2).lower()
552	if not self.references.has_key(id) : # ignore undefined refs
553	return None
554	href, title = self.references[id]
555	text = m.group(2)
556	return self.makeTag(href, title, text, doc)
557
558	def makeTag(self, href, title, text, doc):
559	el = doc.createElement('a')
560	el.setAttribute('href', href)
561	if title :
562	el.setAttribute('title', title)
563	el.appendChild(doc.createTextNode(text))
564	return el
565
566
567	class ImageReferencePattern (ReferencePattern):
568
569	def makeTag(self, href, title, text, doc):
570	el = doc.createElement('img')
571	el.setAttribute('src', href)
572	if title :
573	el.setAttribute('title', title)
574	el.setAttribute('alt', text)
575	return el
576
577
578	class AutolinkPattern (BasePattern):
579
580	def handleMatch(self, m, doc):
581	el = doc.createElement('a')
582	el.setAttribute('href', m.group(2))
583	el.appendChild(doc.createTextNode(m.group(2)))
584	return el
585
586	class AutomailPattern (BasePattern):
587
588	def handleMatch(self, m, doc) :
589	el = doc.createElement('a')
590	email = m.group(2)
591	if email.startswith("mailto:"):
592	email = email[len("mailto:"):]
593	for letter in email:
594	entity = doc.createEntityReference("#%d" % ord(letter))
595	el.appendChild(entity)
596	mailto = "mailto:" + email
597	mailto = "".join(['&#%d;' % ord(letter) for letter in mailto])
598	el.setAttribute('href', mailto)
599	return el
600
601	ESCAPE_PATTERN = SimpleTextPattern(ESCAPE_RE)
602	NOT_STRONG_PATTERN = SimpleTextPattern(NOT_STRONG_RE)
603
604	BACKTICK_PATTERN = BacktickPattern(BACKTICK_RE)
605	DOUBLE_BACKTICK_PATTERN = BacktickPattern(DOUBLE_BACKTICK_RE)
606	STRONG_PATTERN = SimpleTagPattern(STRONG_RE, 'strong')
607	STRONG_PATTERN_2 = SimpleTagPattern(STRONG_2_RE, 'strong')
608	EMPHASIS_PATTERN = SimpleTagPattern(EMPHASIS_RE, 'em')
609	EMPHASIS_PATTERN_2 = SimpleTagPattern(EMPHASIS_2_RE, 'em')
610
611	STRONG_EM_PATTERN = DoubleTagPattern(STRONG_EM_RE, 'strong,em')
612	STRONG_EM_PATTERN_2 = DoubleTagPattern(STRONG_EM_2_RE, 'strong,em')
613
614	LINK_PATTERN = LinkPattern(LINK_RE)
615	LINK_ANGLED_PATTERN = LinkPattern(LINK_ANGLED_RE)
616	IMAGE_LINK_PATTERN = ImagePattern(IMAGE_LINK_RE)
617	IMAGE_REFERENCE_PATTERN = ImageReferencePattern(IMAGE_REFERENCE_RE)
618	REFERENCE_PATTERN = ReferencePattern(REFERENCE_RE)
619
620	HTML_PATTERN = HtmlPattern(HTML_RE)
621	ENTITY_PATTERN = HtmlPattern(ENTITY_RE)
622
623	AUTOLINK_PATTERN = AutolinkPattern(AUTOLINK_RE)
624	AUTOMAIL_PATTERN = AutomailPattern(AUTOMAIL_RE)
625
626
627	"""
628	======================================================================
629	========================== POST-PROCESSORS ===========================
630	======================================================================
631
632	Markdown also allows post-processors, which are similar to
633	preprocessors in that they need to implement a "run" method. Unlike
634	pre-processors, they take a NanoDom document as a parameter and work
635	with that.
636	#
637	There are currently no standard post-processors, but the footnote
638	extension below uses one.
639	"""
640	"""
641	======================================================================
642	========================== MISC AUXILIARY CLASSES ====================
643	======================================================================
644	"""
645
646	class HtmlStash :
647	"""This class is used for stashing HTML objects that we extract
648	in the beginning and replace with place-holders."""
649
650	def __init__ (self) :
651	self.html_counter = 0 # for counting inline html segments
652	self.rawHtmlBlocks=[]
653
654	def store(self, html) :
655	"""Saves an HTML segment for later reinsertion. Returns a
656	placeholder string that needs to be inserted into the
657	document.
658
659	@param html: an html segment
660	@returns : a placeholder string """
661	self.rawHtmlBlocks.append(html)
662	placeholder = HTML_PLACEHOLDER % self.html_counter
663	self.html_counter += 1
664	return placeholder
665
666
667	class BlockGuru :
668
669	def _findHead(self, lines, fn, allowBlank=0) :
670
671	"""Functional magic to help determine boundaries of indented
672	blocks.
673
674	@param lines: an array of strings
675	@param fn: a function that returns a substring of a string
676	if the string matches the necessary criteria
677	@param allowBlank: specifies whether it's ok to have blank
678	lines between matching functions
679	@returns: a list of post processes items and the unused
680	remainder of the original list"""
681
682	items = []
683	item = -1
684
685	i = 0 # to keep track of where we are
686
687	for line in lines :
688
689	if not line.strip() and not allowBlank:
690	return items, lines[i:]
691
692	if not line.strip() and allowBlank:
693	# If we see a blank line, this _might_ be the end
694	i += 1
695
696	# Find the next non-blank line
697	for j in range(i, len(lines)) :
698	if lines[j].strip() :
699	next = lines[j]
700	break
701	else :
702	# There is no more text => this is the end
703	break
704
705	# Check if the next non-blank line is still a part of the list
706
707	part = fn(next)
708
709	if part :
710	items.append("")
711	continue
712	else :
713	break # found end of the list
714
715	part = fn(line)
716
717	if part :
718	items.append(part)
719	i += 1
720	continue
721	else :
722	return items, lines[i:]
723	else :
724	i += 1
725
726	return items, lines[i:]
727
728
729	def detabbed_fn(self, line) :
730	""" An auxiliary method to be passed to _findHead """
731	m = RE.regExp['tabbed'].match(line)
732	if m:
733	return m.group(4)
734	else :
735	return None
736
737
738	def detectTabbed(self, lines) :
739
740	return self._findHead(lines, self.detabbed_fn,
741	allowBlank = 1)
742
743
744	def print_error(string):
745	"""Print an error string to stderr"""
746	sys.stderr.write(string +'\n')
747
748
749	def dequote(string) :
750	""" Removes quotes from around a string """
751	if ( ( string.startswith('"') and string.endswith('"'))
752	or (string.startswith("'") and string.endswith("'")) ) :
753	return string[1:-1]
754	else :
755	return string
756
757	"""
758	======================================================================
759	========================== CORE MARKDOWN =============================
760	======================================================================
761
762	This stuff is ugly, so if you are thinking of extending the syntax,
763	see first if you can do it via pre-processors, post-processors,
764	inline patterns or a combination of the three.
765	"""
766
767	class CorePatterns :
768	"""This class is scheduled for removal as part of a refactoring
769	effort."""
770
771	patterns = {
772	'header': r'(#)([^#])(#*)', # # A title
773	'reference-def' : r'(\ ?\ ?\ ?)\[([^\]])\]:\s([^ ])(.)',
774	# [Google]: http://www.google.com/
775	'containsline': r'([-])$\|^([=])', # -----, =====, etc.
776	'ol': r'[ ]{0,3}[\d]\.\s+(.)', # 1. text
777	'ul': r'[ ]{0,3}[+-]\s+(.)', # "* text"
778	'isline1': r'(\)', # *
779	'isline2': r'(\-*)', # ---
780	'isline3': r'(\_*)', # ___
781	'tabbed': r'((\t)\|( ))(.*)', # an indented line
782	'quoted' : r'> ?(.*)', # a quoted block ("> ...")
783	}
784
785	def __init__ (self) :
786
787	self.regExp = {}
788	for key in self.patterns.keys() :
789	self.regExp[key] = re.compile("^%s$" % self.patterns[key],
790	re.DOTALL)
791
792	self.regExp['containsline'] = re.compile(r'^([-])$\|^([=])$', re.M)
793
794	RE = CorePatterns()
795
796
797	class Markdown:
798	""" Markdown formatter class for creating an html document from
799	Markdown text """
800
801
802	def __init__(self, source=None):
803	"""Creates a new Markdown instance.
804
805	@param source: The text in Markdown format. """
806
807	if isinstance(source, unicode):
808	source = source.encode('utf8')
809	self.source = source
810	self.blockGuru = BlockGuru()
811	self.registeredExtensions = []
812	self.stripTopLevelTags = 1
813
814	self.preprocessors = [ HEADER_PREPROCESSOR,
815	LINE_PREPROCESSOR,
816	HTML_BLOCK_PREPROCESSOR,
817	LINE_BREAKS_PREPROCESSOR,
818	# A footnote preprocessor will
819	# get inserted here
820	REFERENCE_PREPROCESSOR ]
821
822
823	self.postprocessors = [] # a footnote postprocessor will get
824	# inserted later
825
826	self.prePatterns = []
827
828
829	self.inlinePatterns = [ DOUBLE_BACKTICK_PATTERN,
830	BACKTICK_PATTERN,
831	ESCAPE_PATTERN,
832	IMAGE_LINK_PATTERN,
833	IMAGE_REFERENCE_PATTERN,
834	REFERENCE_PATTERN,
835	LINK_ANGLED_PATTERN,
836	LINK_PATTERN,
837	AUTOLINK_PATTERN,
838	AUTOMAIL_PATTERN,
839	HTML_PATTERN,
840	ENTITY_PATTERN,
841	NOT_STRONG_PATTERN,
842	STRONG_EM_PATTERN,
843	STRONG_EM_PATTERN_2,
844	STRONG_PATTERN,
845	STRONG_PATTERN_2,
846	EMPHASIS_PATTERN,
847	EMPHASIS_PATTERN_2
848	# The order of the handlers matters!!!
849	]
850
851	self.reset()
852
853	def registerExtension(self, extension) :
854	self.registeredExtensions.append(extension)
855
856	def reset(self) :
857	"""Resets all state variables so that we can start
858	with a new text."""
859	self.references={}
860	self.htmlStash = HtmlStash()
861
862	HTML_BLOCK_PREPROCESSOR.stash = self.htmlStash
863	REFERENCE_PREPROCESSOR.references = self.references
864	HTML_PATTERN.stash = self.htmlStash
865	ENTITY_PATTERN.stash = self.htmlStash
866	REFERENCE_PATTERN.references = self.references
867	IMAGE_REFERENCE_PATTERN.references = self.references
868
869	for extension in self.registeredExtensions :
870	extension.reset()
871
872
873	def _transform(self):
874	"""Transforms the Markdown text into a XHTML body document
875
876	@returns: A NanoDom Document """
877
878	# Setup the document
879
880	self.doc = Document()
881	self.top_element = self.doc.createElement("span")
882	self.top_element.appendChild(self.doc.createTextNode('\n'))
883	self.top_element.setAttribute('class', 'markdown')
884	self.doc.appendChild(self.top_element)
885
886	# Fixup the source text
887	text = self.source.strip()
888	text = text.replace("\r\n", "\n").replace("\r", "\n")
889	text += "\n\n"
890	text = text.expandtabs(TAB_LENGTH)
891
892	# Split into lines and run the preprocessors that will work with
893	# self.lines
894
895	self.lines = text.split("\n")
896
897	# Run the pre-processors on the lines
898	for prep in self.preprocessors :
899	self.lines = prep.run(self.lines)
900
901	# Create a NanoDom tree from the lines and attach it to Document
902
903
904	buffer = []
905	for line in self.lines :
906	if line.startswith("#") :
907	self._processSection(self.top_element, buffer)
908	buffer = [line]
909	else :
910	buffer.append(line)
911	self._processSection(self.top_element, buffer)
912
913	#self._processSection(self.top_element, self.lines)
914
915	# Not sure why I put this in but let's leave it for now.
916	self.top_element.appendChild(self.doc.createTextNode('\n'))
917
918	# Run the post-processors
919	for postprocessor in self.postprocessors :
920	postprocessor.run(self.doc)
921
922	return self.doc
923
924
925	def _processSection(self, parent_elem, lines,
926	inList = 0, looseList = 0) :
927
928	"""Process a section of a source document, looking for high
929	level structural elements like lists, block quotes, code
930	segments, html blocks, etc. Some those then get stripped
931	of their high level markup (e.g. get unindented) and the
932	lower-level markup is processed recursively.
933
934	@param parent_elem: A NanoDom element to which the content
935	will be added
936	@param lines: a list of lines
937	@param inList: a level
938	@returns: None"""
939
940	if not lines :
941	return
942
943	# Check if this section starts with a list, a blockquote or
944	# a code block
945
946	processFn = { 'ul' : self._processUList,
947	'ol' : self._processOList,
948	'quoted' : self._processQuote,
949	'tabbed' : self._processCodeBlock }
950
951	for regexp in ['ul', 'ol', 'quoted', 'tabbed'] :
952	m = RE.regExp[regexp].match(lines[0])
953	if m :
954	processFn[regexp](parent_elem, lines, inList)
955	return
956
957	# We are NOT looking at one of the high-level structures like
958	# lists or blockquotes. So, it's just a regular paragraph
959	# (though perhaps nested inside a list or something else). If
960	# we are NOT inside a list, we just need to look for a blank
961	# line to find the end of the block. If we ARE inside a
962	# list, however, we need to consider that a sublist does not
963	# need to be separated by a blank line. Rather, the following
964	# markup is legal:
965	#
966	# * The top level list item
967	#
968	# Another paragraph of the list. This is where we are now.
969	# * Underneath we might have a sublist.
970	#
971
972	if inList :
973
974	start, theRest = self._linesUntil(lines, (lambda line:
975	RE.regExp['ul'].match(line)
976	or RE.regExp['ol'].match(line)
977	or not line.strip()))
978
979	self._processSection(parent_elem, start,
980	inList - 1, looseList = looseList)
981	self._processSection(parent_elem, theRest,
982	inList - 1, looseList = looseList)
983
984
985	else : # Ok, so it's just a simple block
986
987	paragraph, theRest = self._linesUntil(lines, lambda line:
988	not line.strip())
989
990	if len(paragraph) and paragraph[0].startswith('#') :
991	m = RE.regExp['header'].match(paragraph[0])
992	if m :
993	level = len(m.group(1))
994	h = self.doc.createElement("h%d" % level)
995	parent_elem.appendChild(h)
996	for item in self._handleInlineWrapper2(m.group(2).strip()) :
997	h.appendChild(item)
998	else :
999	message(CRITICAL, "We've got a problem header!")
1000
1001	elif paragraph :
1002
1003	list = self._handleInlineWrapper2("\n".join(paragraph))
1004
1005	if ( parent_elem.nodeName == 'li'
1006	and not (looseList or parent_elem.childNodes)):
1007
1008	#and not parent_elem.childNodes) :
1009	# If this is the first paragraph inside "li", don't
1010	# put <p> around it - append the paragraph bits directly
1011	# onto parent_elem
1012	el = parent_elem
1013	else :
1014	# Otherwise make a "p" element
1015	el = self.doc.createElement("p")
1016	parent_elem.appendChild(el)
1017
1018	for item in list :
1019	el.appendChild(item)
1020
1021	if theRest :
1022	theRest = theRest[1:] # skip the first (blank) line
1023
1024	self._processSection(parent_elem, theRest, inList)
1025
1026
1027
1028	def _processUList(self, parent_elem, lines, inList) :
1029	self._processList(parent_elem, lines, inList,
1030	listexpr='ul', tag = 'ul')
1031
1032	def _processOList(self, parent_elem, lines, inList) :
1033	self._processList(parent_elem, lines, inList,
1034	listexpr='ol', tag = 'ol')
1035
1036
1037	def _processList(self, parent_elem, lines, inList, listexpr, tag) :
1038	"""Given a list of document lines starting with a list item,
1039	finds the end of the list, breaks it up, and recursively
1040	processes each list item and the remainder of the text file.
1041
1042	@param parent_elem: A dom element to which the content will be added
1043	@param lines: a list of lines
1044	@param inList: a level
1045	@returns: None"""
1046
1047	ul = self.doc.createElement(tag) # ul might actually be '<ol>'
1048	parent_elem.appendChild(ul)
1049
1050	looseList = 0
1051
1052	# Make a list of list items
1053	items = []
1054	item = -1
1055
1056	i = 0 # a counter to keep track of where we are
1057
1058	for line in lines :
1059
1060	loose = 0
1061	if not line.strip() :
1062	# If we see a blank line, this _might_ be the end of the list
1063	i += 1
1064	loose = 1
1065
1066	# Find the next non-blank line
1067	for j in range(i, len(lines)) :
1068	if lines[j].strip() :
1069	next = lines[j]
1070	break
1071	else :
1072	# There is no more text => end of the list
1073	break
1074
1075	# Check if the next non-blank line is still a part of the list
1076	if ( RE.regExp['ul'].match(next) or
1077	RE.regExp['ol'].match(next) or
1078	RE.regExp['tabbed'].match(next) ):
1079	# get rid of any white space in the line
1080	items[item].append(line.strip())
1081	looseList = loose or looseList
1082	continue
1083	else :
1084	break # found end of the list
1085
1086	# Now we need to detect list items (at the current level)
1087	# while also detabing child elements if necessary
1088
1089	for expr in ['ul', 'ol', 'tabbed']:
1090
1091	m = RE.regExp[expr].match(line)
1092	if m :
1093	if expr in ['ul', 'ol'] : # We are looking at a new item
1094	if m.group(1) :
1095	items.append([m.group(1)])
1096	item += 1
1097	elif expr == 'tabbed' : # This line needs to be detabbed
1098	items[item].append(m.group(4)) #after the 'tab'
1099
1100	i += 1
1101	break
1102	else :
1103	items[item].append(line) # Just regular continuation
1104	i += 1 # added on 2006.02.25
1105	else :
1106	i += 1
1107
1108	# Add the dom elements
1109	for item in items :
1110	li = self.doc.createElement("li")
1111	ul.appendChild(li)
1112
1113	self._processSection(li, item, inList + 1, looseList = looseList)
1114
1115	# Process the remaining part of the section
1116
1117	self._processSection(parent_elem, lines[i:], inList)
1118
1119
1120	def _linesUntil(self, lines, condition) :
1121	""" A utility function to break a list of lines upon the
1122	first line that satisfied a condition. The condition
1123	argument should be a predicate function.
1124	"""
1125
1126	i = -1
1127	for line in lines :
1128	i += 1
1129	if condition(line) : break
1130	else :
1131	i += 1
1132	return lines[:i], lines[i:]
1133
1134	def _processQuote(self, parent_elem, lines, inList) :
1135	"""Given a list of document lines starting with a quote finds
1136	the end of the quote, unindents it and recursively
1137	processes the body of the quote and the remainder of the
1138	text file.
1139
1140	@param parent_elem: DOM element to which the content will be added
1141	@param lines: a list of lines
1142	@param inList: a level
1143	@returns: None """
1144
1145	dequoted = []
1146	i = 0
1147	for line in lines :
1148	m = RE.regExp['quoted'].match(line)
1149	if m :
1150	dequoted.append(m.group(1))
1151	i += 1
1152	else :
1153	break
1154	else :
1155	i += 1
1156
1157	blockquote = self.doc.createElement('blockquote')
1158	parent_elem.appendChild(blockquote)
1159
1160	self._processSection(blockquote, dequoted, inList)
1161	self._processSection(parent_elem, lines[i:], inList)
1162
1163
1164
1165
1166	def _processCodeBlock(self, parent_elem, lines, inList) :
1167	"""Given a list of document lines starting with a code block
1168	finds the end of the block, puts it into the dom verbatim
1169	wrapped in ("<pre><code>") and recursively processes the
1170	the remainder of the text file.
1171
1172	@param parent_elem: DOM element to which the content will be added
1173	@param lines: a list of lines
1174	@param inList: a level
1175	@returns: None"""
1176
1177	detabbed, theRest = self.blockGuru.detectTabbed(lines)
1178
1179	pre = self.doc.createElement('pre')
1180	code = self.doc.createElement('code')
1181	parent_elem.appendChild(pre)
1182	pre.appendChild(code)
1183	text = "\n".join(detabbed).rstrip()+"\n"
1184	text = text.replace("&", "&")
1185	code.appendChild(self.doc.createTextNode(text))
1186	self._processSection(parent_elem, theRest, inList)
1187
1188
1189	def _handleInlineWrapper2 (self, line) :
1190
1191
1192	parts = [line]
1193
1194	#if not(line):
1195	# return [self.doc.createTextNode(' ')]
1196
1197	for pattern in self.inlinePatterns :
1198
1199	#print
1200	#print self.inlinePatterns.index(pattern)
1201
1202	i = 0
1203
1204	#print parts
1205	while i < len(parts) :
1206
1207	x = parts[i]
1208	#print i
1209	if isinstance(x, (str, unicode)) :
1210	result = self._applyPattern(x, pattern)
1211	#print result
1212	#print result
1213	#print parts, i
1214	if result :
1215	i -= 1
1216	parts.remove(x)
1217	for y in result :
1218	parts.insert(i+1,y)
1219
1220	i += 1
1221
1222	for i in range(len(parts)) :
1223	x = parts[i]
1224	if isinstance(x, (str, unicode)) :
1225	parts[i] = self.doc.createTextNode(x)
1226
1227	return parts
1228
1229
1230
1231	def _handleInlineWrapper (self, line) :
1232
1233	# A wrapper around _handleInline to avoid recursion
1234
1235	parts = [line]
1236
1237	i = 0
1238
1239	while i < len(parts) :
1240	x = parts[i]
1241	if isinstance(x, (str, unicode)) :
1242	parts.remove(x)
1243	result = self._handleInline(x)
1244	for y in result :
1245	parts.insert(i,y)
1246	else :
1247	i += 1
1248
1249	return parts
1250
1251	def _handleInline(self, line):
1252	"""Transform a Markdown line with inline elements to an XHTML
1253	fragment.
1254
1255	This function uses auxiliary objects called inline patterns.
1256	See notes on inline patterns above.
1257
1258	@param item: A block of Markdown text
1259	@return: A list of NanoDom nodes """
1260
1261	if not(line):
1262	return [self.doc.createTextNode(' ')]
1263
1264	for pattern in self.inlinePatterns :
1265	list = self._applyPattern( line, pattern)
1266	if list: return list
1267
1268	return [self.doc.createTextNode(line)]
1269
1270	def _applyPattern(self, line, pattern) :
1271	""" Given a pattern name, this function checks if the line
1272	fits the pattern, creates the necessary elements, and returns
1273	back a list consisting of NanoDom elements and/or strings.
1274
1275	@param line: the text to be processed
1276	@param pattern: the pattern to be checked
1277
1278	@returns: the appropriate newly created NanoDom element if the
1279	pattern matches, None otherwise.
1280	"""
1281
1282	# match the line to pattern's pre-compiled reg exp.
1283	# if no match, move on.
1284
1285	m = pattern.getCompiledRegExp().match(line)
1286	if not m :
1287	return None
1288
1289	# if we got a match let the pattern make us a NanoDom node
1290	# if it doesn't, move on
1291	node = pattern.handleMatch(m, self.doc)
1292
1293	if node :
1294	# Those are in the reverse order!
1295	return ( m.groups()[-1], # the string to the left
1296	node, # the new node
1297	m.group(1)) # the string to the right of the match
1298
1299	else :
1300	return None
1301
1302	def __str__(self):
1303	"""Return the document in XHTML format.
1304
1305	@returns: A serialized XHTML body."""
1306	#try :
1307	doc = self._transform()
1308	xml = doc.toxml()
1309	#finally:
1310	# doc.unlink()
1311
1312	# Let's stick in all the raw html pieces
1313
1314	for i in range(self.htmlStash.html_counter) :
1315	xml = xml.replace("<p>%s\n</p>" % (HTML_PLACEHOLDER % i),
1316	self.htmlStash.rawHtmlBlocks[i] + "\n")
1317	xml = xml.replace(HTML_PLACEHOLDER % i,
1318	self.htmlStash.rawHtmlBlocks[i])
1319
1320	xml = xml.replace(FN_BACKLINK_TEXT, "↩")
1321
1322	# And return everything but the top level tag
1323
1324	if self.stripTopLevelTags :
1325	xml = xml.strip()[23:-7]
1326
1327	if isinstance(xml, unicode) :
1328	xml = xml.encode("utf8")
1329
1330	return xml
1331
1332
1333	toString = __str__
1334
1335
1336	"""
1337	========================= FOOTNOTES =================================
1338
1339	This section adds footnote handling to markdown. It can be used as
1340	an example for extending python-markdown with relatively complex
1341	functionality. While in this case the extension is included inside
1342	the module itself, it could just as easily be added from outside the
1343	module. Not that all markdown classes above are ignorant about
1344	footnotes. All footnote functionality is provided separately and
1345	then added to the markdown instance at the run time.
1346
1347	Footnote functionality is attached by calling extendMarkdown()
1348	method of FootnoteExtension. The method also registers the
1349	extension to allow it's state to be reset by a call to reset()
1350	method.
1351	"""
1352
1353	class FootnoteExtension :
1354
1355	DEF_RE = re.compile(r'(\ ?\ ?\ ?)\[\^([^\]])\]:\s(.*)')
1356	SHORT_USE_RE = re.compile(r'\[\^([^\]]*)\]', re.M) # [^a]
1357
1358	FN_PLACE_MARKER = "///Footnotes Go Here///"
1359
1360	def __init__ (self) :
1361	self.reset()
1362
1363	def extendMarkdown(self, md) :
1364
1365	self.md = md
1366
1367	# Stateless extensions do not need to be registered
1368	md.registerExtension(self)
1369
1370	# Insert a preprocessor before ReferencePreprocessor
1371	index = md.preprocessors.index(REFERENCE_PREPROCESSOR)
1372	preprocessor = FootnotePreprocessor(self)
1373	preprocessor.md = md
1374	md.preprocessors.insert(index, preprocessor)
1375
1376	# Insert an inline pattern before ImageReferencePattern
1377	FOOTNOTE_RE = r'\[\^([^\]]*)\]' # blah blah [^1] blah
1378	index = md.inlinePatterns.index(IMAGE_REFERENCE_PATTERN)
1379	md.inlinePatterns.insert(index, FootnotePattern(FOOTNOTE_RE, self))
1380
1381	# Insert a post-processor that would actually add the footnote div
1382	postprocessor = FootnotePostprocessor(self)
1383	postprocessor.extension = self
1384
1385	md.postprocessors.append(postprocessor)
1386
1387
1388	def reset(self) :
1389	# May be called by Markdown is state reset is desired
1390
1391	self.footnote_suffix = "-" + str(int(random.random()*1000000000))
1392	self.used_footnotes={}
1393	self.footnotes = {}
1394
1395	def findFootnotesPlaceholder(self, doc) :
1396	def findFootnotePlaceholderFn(node=None, indent=0):
1397	if node.type == 'text':
1398	if node.value.find(self.FN_PLACE_MARKER) > -1 :
1399	return True
1400
1401	fn_div_list = doc.find(findFootnotePlaceholderFn)
1402	if fn_div_list :
1403	return fn_div_list[0]
1404
1405
1406	def setFootnote(self, id, text) :
1407	self.footnotes[id] = text
1408
1409	def makeFootnoteId(self, num) :
1410	return 'fn%d%s' % (num, self.footnote_suffix)
1411
1412	def makeFootnoteRefId(self, num) :
1413	return 'fnr%d%s' % (num, self.footnote_suffix)
1414
1415	def makeFootnotesDiv (self, doc) :
1416	"""Creates the div with class='footnote' and populates it with
1417	the text of the footnotes.
1418
1419	@returns: the footnote div as a dom element """
1420
1421	if not self.footnotes.keys() :
1422	return None
1423
1424	div = doc.createElement("div")
1425	div.setAttribute('class', 'footnote')
1426	hr = doc.createElement("hr")
1427	div.appendChild(hr)
1428	ol = doc.createElement("ol")
1429	div.appendChild(ol)
1430
1431	footnotes = [(self.used_footnotes[id], id)
1432	for id in self.footnotes.keys()]
1433	footnotes.sort()
1434
1435	for i, id in footnotes :
1436	li = doc.createElement('li')
1437	li.setAttribute('id', self.makeFootnoteId(i))
1438
1439	self.md._processSection(li, self.footnotes[id].split("\n"))
1440
1441	#li.appendChild(doc.createTextNode(self.footnotes[id]))
1442
1443	backlink = doc.createElement('a')
1444	backlink.setAttribute('href', '#' + self.makeFootnoteRefId(i))
1445	backlink.setAttribute('class', 'footnoteBackLink')
1446	backlink.setAttribute('title',
1447	'Jump back to footnote %d in the text' % 1)
1448	backlink.appendChild(doc.createTextNode(FN_BACKLINK_TEXT))
1449
1450	if li.childNodes :
1451	node = li.childNodes[-1]
1452	if node.type == "text" :
1453	node = li
1454	node.appendChild(backlink)
1455
1456	ol.appendChild(li)
1457
1458	return div
1459
1460
1461	class FootnotePreprocessor :
1462
1463	def __init__ (self, footnotes) :
1464	self.footnotes = footnotes
1465
1466	def run(self, lines) :
1467
1468	self.blockGuru = BlockGuru()
1469	lines = self._handleFootnoteDefinitions (lines)
1470
1471	# Make a hash of all footnote marks in the text so that we
1472	# know in what order they are supposed to appear. (This
1473	# function call doesn't really substitute anything - it's just
1474	# a way to get a callback for each occurence.
1475
1476	text = "\n".join(lines)
1477	self.footnotes.SHORT_USE_RE.sub(self.recordFootnoteUse, text)
1478
1479	return text.split("\n")
1480
1481
1482	def recordFootnoteUse(self, match) :
1483
1484	id = match.group(1)
1485	id = id.strip()
1486	nextNum = len(self.footnotes.used_footnotes.keys()) + 1
1487	self.footnotes.used_footnotes[id] = nextNum
1488
1489
1490	def _handleFootnoteDefinitions(self, lines) :
1491	"""Recursively finds all footnote definitions in the lines.
1492
1493	@param lines: a list of lines of text
1494	@returns: a string representing the text with footnote
1495	definitions removed """
1496
1497	i, id, footnote = self._findFootnoteDefinition(lines)
1498
1499	if id :
1500
1501	plain = lines[:i]
1502
1503	detabbed, theRest = self.blockGuru.detectTabbed(lines[i+1:])
1504
1505	self.footnotes.setFootnote(id,
1506	footnote + "\n"
1507	+ "\n".join(detabbed))
1508
1509	more_plain = self._handleFootnoteDefinitions(theRest)
1510	return plain + [""] + more_plain
1511
1512	else :
1513	return lines
1514
1515	def _findFootnoteDefinition(self, lines) :
1516	"""Finds the first line of a footnote definition.
1517
1518	@param lines: a list of lines of text
1519	@returns: the index of the line containing a footnote definition """
1520
1521	counter = 0
1522	for line in lines :
1523	m = self.footnotes.DEF_RE.match(line)
1524	if m :
1525	return counter, m.group(2), m.group(3)
1526	counter += 1
1527	return counter, None, None
1528
1529
1530	class FootnotePattern (BasePattern) :
1531
1532	def __init__ (self, pattern, footnotes) :
1533
1534	BasePattern.__init__(self, pattern)
1535	self.footnotes = footnotes
1536
1537	def handleMatch(self, m, doc) :
1538	sup = doc.createElement('sup')
1539	a = doc.createElement('a')
1540	sup.appendChild(a)
1541	id = m.group(2)
1542	num = self.footnotes.used_footnotes[id]
1543	sup.setAttribute('id', self.footnotes.makeFootnoteRefId(num))
1544	a.setAttribute('href', '#' + self.footnotes.makeFootnoteId(num))
1545	a.appendChild(doc.createTextNode(str(num)))
1546	return sup
1547
1548	class FootnotePostprocessor :
1549
1550	def __init__ (self, footnotes) :
1551	self.footnotes = footnotes
1552
1553	def run(self, doc) :
1554	footnotesDiv = self.footnotes.makeFootnotesDiv(doc)
1555	if footnotesDiv :
1556	fnPlaceholder = self.extension.findFootnotesPlaceholder(doc)
1557	if fnPlaceholder :
1558	fnPlaceholder.parent.replaceChild(fnPlaceholder, footnotesDiv)
1559	else :
1560	doc.documentElement.appendChild(footnotesDiv)
1561
1562	# ====================================================================
1563
1564	def markdown(text) :
1565	message(VERBOSE, "in markdown.py, received text:\n%s" % text)
1566	return Markdown(text).toString()
1567
1568	def markdownWithFootnotes(text):
1569	message(VERBOSE, "Running markdown with footnotes, "
1570	+ "received text:\n%s" % text)
1571	md = Markdown()
1572	footnoteExtension = FootnoteExtension()
1573	footnoteExtension.extendMarkdown(md)
1574	md.source = text
1575
1576	return str(md)
1577
1578	def test_markdown(args):
1579	"""test markdown at the command line.
1580	in each test, arg 0 is the module name"""
1581	print "\nTEST 1: no arguments on command line"
1582	cmd_line(["markdown.py"])
1583	print "\nTEST 2a: 1 argument on command line: a good option"
1584	cmd_line(["markdown.py","-footnotes"])
1585	print "\nTEST 2b: 1 argument on command line: a bad option"
1586	cmd_line(["markdown.py","-foodnotes"])
1587	print "\nTEST 3: 1 argument on command line: non-existent input file"
1588	cmd_line(["markdown.py","junk.txt"])
1589	print "\nTEST 4: 1 argument on command line: existing input file"
1590	lines = """
1591	Markdown text with[^1]:
1592
1593	2. bold text,
1594	3. italic text.
1595
1596	Then more:
1597
1598	beginning of code block;
1599	another line of code block.
1600
1601	a second paragraph of code block.
1602
1603	more text to end our file.
1604
1605	[^1]: "italic" means emphasis.
1606	"""
1607	fid = "markdown-test.txt"
1608	f1 = open(fid, 'w+')
1609	f1.write(lines)
1610	f1.close()
1611	cmd_line(["markdown.py",fid])
1612	print "\nTEST 5: 2 arguments on command line: nofootnotes and input file"
1613	cmd_line(["markdown.py","-nofootnotes", fid])
1614	print "\nTEST 6: 2 arguments on command line: footnotes and input file"
1615	cmd_line(["markdown.py","-footnotes", fid])
1616	print "\nTEST 7: 3 arguments on command line: nofootnotes,inputfile, outputfile"
1617	fidout = "markdown-test.html"
1618	cmd_line(["markdown.py","-nofootnotes", fid, fidout])
1619
1620
1621	def get_vars(args):
1622	"""process the command-line args received; return usable variables"""
1623	#firstly get the variables
1624
1625	message(VERBOSE, "in get_vars(), args: %s" % args)
1626
1627	if len(args) <= 1:
1628	option, inFile, outFile = (None, None, None)
1629	elif len(args) >= 4:
1630	option, inFile, outFile = args[1:4]
1631	elif len(args) == 3:
1632	temp1, temp2 = args[1:3]
1633	if temp1[0] == '-':
1634	#then we have an option and inFile
1635	option, inFile, outFile = temp1, temp2, None
1636	else:
1637	#we have no option, so we must have inFile and outFile
1638	option, inFile, outFile = None, temp1, temp2
1639	else:
1640	#len(args) = 2
1641	#we have only one usable arg: might be an option or a file
1642	temp1 = args[1]
1643
1644	message(VERBOSE, "our single arg is: %s" % str(temp1))
1645
1646	if temp1[0] == '-':
1647	#then we have an option
1648	option, inFile, outFile = temp1, None, None
1649	else:
1650	#we have no option, so we must have inFile
1651	option, inFile, outFile = None, temp1, None
1652
1653	message(VERBOSE,
1654	"prior to validation, option: %s, inFile: %s, outFile: %s" %
1655	(str(option), str(inFile), str(outFile),))
1656
1657	return option, inFile, outFile
1658
1659
1660	USAGE = """
1661	\nUsing markdown.py:
1662
1663	python markdown.py [option] input_file_with_markdown.txt [output_file.html]
1664
1665	Options:
1666
1667	-footnotes or -fn : generate markdown with footnotes
1668	-test or -t : run a self-test
1669	-help or -h : print this message
1670
1671	"""
1672
1673	VALID_OPTIONS = ['footnotes','nofootnotes', 'fn', 'test', 't', 'f',
1674	'help', 'h']
1675
1676	EXPANDED_OPTIONS = { "fn" : "footnotes",
1677	"t" : "test",
1678	"h" : "help" }
1679
1680
1681	def validate_option(option) :
1682
1683	""" Check if the option makes sense and print an appropriate message
1684	if it isn't.
1685
1686	@return: valid option string or None
1687	"""
1688
1689	#now validate the variables
1690	if (option is not None):
1691	if (len(option) > 1 and option[1:] in VALID_OPTIONS) :
1692	option = option[1:]
1693
1694	if option in EXPANDED_OPTIONS.keys() :
1695	option = EXPANDED_OPTIONS[option]
1696	return option
1697	else:
1698	message(CRITICAL,
1699	"\nSorry, I don't understand option %s" % option)
1700	message(CRITICAL, USAGE)
1701	return None
1702
1703
1704	def validate_input_file(inFile) :
1705	""" Check if the input file is specified and exists.
1706
1707	@return: valid input file path or None
1708	"""
1709
1710	if not inFile :
1711	message(CRITICAL,
1712	"\nI need an input filename.\n")
1713	message(CRITICAL, USAGE)
1714	return None
1715
1716
1717	if os.access(inFile, os.R_OK):
1718	return inFile
1719	else :
1720	message(CRITICAL, "Sorry, I can't find input file %s" % str(inFile))
1721	return None
1722
1723
1724
1725
1726	def cmd_line(args):
1727
1728	message(VERBOSE, "in cmd_line with args: %s" % args)
1729
1730	option, inFile, outFile = get_vars(args)
1731
1732	if option :
1733	option = validate_option(option)
1734	if not option : return
1735
1736	if option == "help" :
1737	message(CRITICAL, USAGE)
1738	return
1739	elif option == "test" :
1740	test_markdown(None)
1741	return
1742
1743	inFile = validate_input_file(inFile)
1744	if not inFile :
1745	return
1746	else :
1747	input = file(inFile).read()
1748
1749	message(VERBOSE, "Validated command line parameters:" +
1750	"\n\toption: %s, \n\tinFile: %s, \n\toutFile: %s" % (
1751	str(option), str(inFile), str(outFile),))
1752
1753	if option == "footnotes" :
1754	md_function = markdownWithFootnotes
1755	else :
1756	md_function = markdown
1757
1758	if outFile is None:
1759	print md_function(input)
1760	else:
1761	output = md_function(input)
1762	f1 = open(outFile, "w+")
1763	f1.write(output)
1764	f1.close()
1765
1766	if os.access(outFile, os.F_OK):
1767	message(INFO, "Successfully wrote %s" % outFile)
1768	else:
1769	message(INFO, "Failed to write %s" % outFile)
1770
1771
1772	if __name__ == '__main__':
1773	""" Run Markdown from the command line.
1774	Set debug = 3 at top of file to get diagnostic output"""
1775	args = sys.argv
1776
1777	#set testing=1 to test the command-line response of markdown.py
1778	testing = 0
1779	if testing:
1780	test_markdown(args)
1781	else:
1782	import time
1783	t0 = time.time()
1784	#for x in range(10) :
1785	cmd_line(args)
1786	#import profile
1787	#profile.run('cmd_line(args)', 'profile')
1788	t1 = time.time()
1789	#print "Time: %f - %f = %f" % (t1, t0, t1-t0)
1790
1791	"""
1792	CHANGELOG
1793	=========
1794
1795	May 15, 2006: A bug with lists, recursion on block-level elements,
1796	run-in headers, spaces before headers, unicode input (thanks to Aaron
1797	Swartz). Sourceforge tracker #s: 1489313, 1489312, 1489311, 1488370,
1798	1485178, 1485176. (v. 1.5)
1799
1800	Mar. 24, 2006: Switched to a not-so-recursive algorithm with
1801	_handleInline. (Version 1.4)
1802
1803	Mar. 15, 2006: Replaced some instance variables with class variables
1804	(a patch from Stelios Xanthakis). Chris Clark's new regexps that do
1805	not trigger midword underlining.
1806
1807	Feb. 28, 2006: Clean-up and command-line handling by Stewart
1808	Midwinter. (Version 1.3)
1809
1810	Feb. 24, 2006: Fixed a bug with the last line of the list appearing
1811	again as a separate paragraph. Incorporated Chris Clark's "mailto"
1812	patch. Added support for <br /> at the end of lines ending in two or
1813	more spaces. Fixed a crashing bug when using ImageReferencePattern.
1814	Added several utility methods to Nanodom. (Version 1.2)
1815
1816	Jan. 31, 2006: Added "hr" and "hr/" to BLOCK_LEVEL_ELEMENTS and
1817	changed <hr/> to <hr />. (Thanks to Sergej Chodarev.)
1818
1819	Nov. 26, 2005: Fixed a bug with certain tabbed lines inside lists
1820	getting wrapped in <pre><code>. (v. 1.1)
1821
1822	Nov. 19, 2005: Made "<!...", "<?...", etc. behave like block-level
1823	HTML tags.
1824
1825	Nov. 14, 2005: Added entity code and email autolink fix by Tiago
1826	Cogumbreiro. Fixed some small issues with backticks to get 100%
1827	compliance with John's test suite. (v. 1.0)
1828
1829	Nov. 7, 2005: Added an unlink method for documents to aid with memory
1830	collection (per Doug Sauder's suggestion).
1831
1832	Oct. 29, 2005: Restricted a set of html tags that get treated as
1833	block-level elements.
1834
1835	Sept. 18, 2005: Refactored the whole script to make it easier to
1836	customize it and made footnote functionality into an extension.
1837	(v. 0.9)
1838
1839	Sept. 5, 2005: Fixed a bug with multi-paragraph footnotes. Added
1840	attribute support.
1841
1842	Sept. 1, 2005: Changed the way headers are handled to allow inline
1843	syntax in headers (e.g. links) and got the lists to use p-tags
1844	correctly (v. 0.8)
1845
1846	Aug. 29, 2005: Added flexible tabs, fixed a few small issues, added
1847	basic support for footnotes. Got rid of xml.dom.minidom and added
1848	pretty-printing. (v. 0.7)
1849
1850	Aug. 13, 2005: Fixed a number of small bugs in order to conform to the
1851	test suite. (v. 0.6)
1852
1853	Aug. 11, 2005: Added support for inline html and entities, inline
1854	images, autolinks, underscore emphasis. Cleaned up and refactored the
1855	code, added some more comments.
1856
1857	Feb. 19, 2005: Rewrote the handling of high-level elements to allow
1858	multi-line list items and all sorts of nesting.
1859
1860	Feb. 3, 2005: Reference-style links, single-line lists, backticks,
1861	escape, emphasis in the beginning of the paragraph.
1862
1863	Nov. 2004: Added links, blockquotes, html blocks to Manfred
1864	Stienstra's code
1865
1866	Apr. 2004: Manfred's version at http://www.dwerg.net/projects/markdown/
1867
1868	"""
1869
1870
1871
1872
1873
1874

Note: リポジトリブラウザについてのヘルプは TracBrowser を参照してください。

Context Navigation

root/galaxy-central/eggs/WebHelpers-0.2-py2.6.egg/webhelpers/markdown.py @ 3

異なるフォーマットでダウンロード: