Context Navigation

seqmapping.py @ 3

リビジョン 3, 2.9 KB (コミッタ: kohda, 14 年前)
Install Unix tools http://hannonlab.cshl.edu/galaxy_unix_tools/galaxy.html

行番号
1	"""
2	Classes for char-to-int mapping and int-to-int mapping.
3
4	:Author: James Taylor (james@bx.psu.edu)
5
6	The char-to-int mapping can be used to translate a list of strings
7	over some alphabet to a single int array (example for encoding a multiple
8	sequence alignment).
9
10	The int-to-int mapping is particularly useful for creating partitions,
11	and provides methods to merge/split symbols in the output mapping.
12
13	The two forms of mapping can be combined, for example to encode a
14	multiple sequence alignment in a reduced alphabet defined by a partition
15	of alignment columns. Many of the helper methods provided are for
16	solving such alignment oriented problems.
17
18	This code was originally written for the `ESPERR`_ project which includes
19	software for searcing for alignment encodings that work well for specific
20	classification problems using various Markov chain classifiers over the
21	reduced encodings.
22
23	Most of the core implementation is in the pyrex/C extension
24	"_seqmapping.pyx" for performance reasons (specifically to avoid the
25	excessive bounds checking that would make a sequence/array lookup heavy
26	problem like this slow in pure python).
27
28	.. _ESPERR: http://www.bx.psu.edu/projects/esperr/
29	"""
30
31	from _seqmapping import *
32
33	# Char->Int mapping for DNA characters with missing data
34
35	DNA = CharToIntArrayMapping()
36	DNA.set_mapping( "a", 0 )
37	DNA.set_mapping( "A", 0 )
38	DNA.set_mapping( "c", 1 )
39	DNA.set_mapping( "C", 1 )
40	DNA.set_mapping( "g", 2 )
41	DNA.set_mapping( "G", 2 )
42	DNA.set_mapping( "t", 3 )
43	DNA.set_mapping( "T", 3 )
44	DNA.set_mapping( "-", 4 )
45	DNA.set_mapping( "*", 5 )
46
47	# Creating mappings
48
49	def alignment_mapping_from_file( f, char_mapping=DNA ):
50	"""
51	Create a mapping from a file of alignment columns.
52	"""
53	columns, symbols = [], []
54	for line in f:
55	column, symbol = line.split()
56	columns.append( column )
57	symbols.append( int( symbol ) )
58
59	align_count = len( columns[0] )
60
61	mapping = IntToIntMapping( char_mapping.get_out_size() ** align_count )
62
63	for column, symbol in zip( columns, symbols ):
64	index = char_mapping.translate_list( list( column ) )[0]
65	mapping.set_mapping( index, symbol )
66
67	return align_count, mapping
68
69	def second_mapping_from_file( f, first_mapping, char_mapping=DNA ):
70
71	columns, symbols = [], []
72	for line in f:
73	column, symbol = line.split()
74	columns.append( column )
75	symbols.append( int( symbol ) )
76
77	align_count = len( columns[0] )
78
79	mapping = IntToIntMapping( first_mapping.get_out_size() )
80
81	for column, symbol in zip( columns, symbols ):
82	index = char_mapping.translate_list( list( column ) )[0]
83	if first_mapping[index] >= 0:
84	mapping.set_mapping( first_mapping[index], symbol )
85
86	return mapping
87
88
89	def identity_mapping( size ):
90	mapping = IntToIntMapping( size )
91	for i in range( size ):
92	mapping.set_mapping( i, i )
93	return mapping
94

Note: リポジトリブラウザについてのヘルプは TracBrowser を参照してください。

Context Navigation

root/galaxy-central/eggs/bx_python-0.5.0_dev_f74aec067563-py2.6-macosx-10.6-universal-ucs2.egg/bx/seqmapping.py @ 3

異なるフォーマットでダウンロード: