# Copyright (c) 1999-2002 Gary Strangman; All Rights Reserved.
#
# This software is distributable under the terms of the GNU
# General Public License (GPL) v2, the text of which can be found at
# http://www.gnu.org/copyleft/gpl.html. Installing, importing or otherwise
# using this module constitutes acceptance of the terms of this License.
#
# Disclaimer
#
# This software is provided "as-is". There are no expressed or implied
# warranties of any kind, including, but not limited to, the warranties
# of merchantability and fitness for a given application. In no event
# shall Gary Strangman be liable for any direct, indirect, incidental,
# special, exemplary or consequential damages (including, but not limited
# to, loss of use, data or profits, or business interruption) however
# caused and on any theory of liability, whether in contract, strict
# liability or tort (including negligence or otherwise) arising in any way
# out of the use of this software, even if advised of the possibility of
# such damage.
#
# Comments and/or additions are welcome (send e-mail to:
# strang@nmr.mgh.harvard.edu).
#
"""
stats.py module

(Requires pstat.py module.)

#################################################
#######  Written by:  Gary Strangman  ###########
#######  Last modified:  May 10, 2002  ##########
#################################################

A collection of basic statistical functions for python. The function
names appear below.

IMPORTANT: There are really *3* sets of functions. The first set has an 'l'
prefix, which can be used with list or tuple arguments. The second set has
an 'a' prefix, which can accept NumPy array arguments. These latter
functions are defined only when NumPy is available on the system. The third
type has NO prefix (i.e., has the name that appears below). Functions of
this set are members of a "Dispatch" class, c/o David Ascher. This class
allows different functions to be called depending on the type of the passed
arguments. Thus, stats.mean is a member of the Dispatch class and
stats.mean(range(20)) will call stats.lmean(range(20)) while
stats.mean(Numeric.arange(20)) will call stats.amean(Numeric.arange(20)).
This is a handy way to keep consistent function names when different
argument types require different functions to be called. Having
implemented the Dispatch class, however, means that to get info on
a given function, you must use the REAL function name ... that is
"print stats.lmean.__doc__" or "print stats.amean.__doc__" work fine,
while "print stats.mean.__doc__" will print the doc for the Dispatch
class. NUMPY FUNCTIONS ('a' prefix) generally have more argument options
but should otherwise be consistent with the corresponding list functions.

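For example (illustrative only; assumes the module is imported as 'stats'
and, for the array case, that NumPy/Numeric is installed):

    >>> import stats
    >>> stats.mean([1, 2, 3, 4])          # list argument -> dispatches to lmean
    2.5
    >>> stats.mean(Numeric.arange(5))     # array argument -> dispatches to amean
    2.0
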
Disclaimers: The function list is obviously incomplete and, worse, the
functions are not optimized. All functions have been tested (some more
so than others), but they are far from bulletproof. Thus, as with any
free software, no warranty or guarantee is expressed or implied. :-) A
few extra functions that don't appear in the list below can be found by
interested treasure-hunters. These functions don't necessarily have
both list and array versions but were deemed useful.

CENTRAL TENDENCY:  geometricmean
                   harmonicmean
                   mean
                   median
                   medianscore
                   mode

MOMENTS:  moment
          variation
          skew
          kurtosis
          skewtest (for Numpy arrays only)
          kurtosistest (for Numpy arrays only)
          normaltest (for Numpy arrays only)

ALTERED VERSIONS:  tmean (for Numpy arrays only)
                   tvar (for Numpy arrays only)
                   tmin (for Numpy arrays only)
                   tmax (for Numpy arrays only)
                   tstdev (for Numpy arrays only)
                   tsem (for Numpy arrays only)
                   describe

FREQUENCY STATS:  itemfreq
                  scoreatpercentile
                  percentileofscore
                  histogram
                  cumfreq
                  relfreq

VARIABILITY:  obrientransform
              samplevar
              samplestdev
              signaltonoise (for Numpy arrays only)
              var
              stdev
              sterr
              sem
              z
              zs
              zmap (for Numpy arrays only)

TRIMMING FCNS:  threshold (for Numpy arrays only)
                trimboth
                trim1
                round (round all vals to 'n' decimals; Numpy only)

CORRELATION FCNS:  covariance (for Numpy arrays only)
                   correlation (for Numpy arrays only)
                   paired
                   pearsonr
                   spearmanr
                   pointbiserialr
                   kendalltau
                   linregress

INFERENTIAL STATS:  ttest_1samp
                    ttest_ind
                    ttest_rel
                    chisquare
                    ks_2samp
                    mannwhitneyu
                    ranksums
                    wilcoxont
                    kruskalwallish
                    friedmanchisquare

PROBABILITY CALCS:  chisqprob
                    erfcc
                    zprob
                    ksprob
                    fprob
                    betacf
                    gammln
                    betai

ANOVA FUNCTIONS:  F_oneway
                  F_value

SUPPORT FUNCTIONS:  writecc
                    incr
                    sign (for Numpy arrays only)
                    sum
                    cumsum
                    ss
                    summult
                    sumdiffsquared
                    square_of_sums
                    shellsort
                    rankdata
                    outputpairedstats
                    findwithin
"""
## CHANGE LOG:
## ===========
## 02-11-19 ... fixed attest_ind and attest_rel for div-by-zero Overflows
## 02-05-10 ... fixed lchisqprob indentation (failed when df=even)
## 00-12-28 ... removed aanova() to separate module, fixed licensing to
##                   match Python License, fixed doc string & imports
## 00-04-13 ... pulled all "global" statements, except from aanova()
##                   added/fixed lots of documentation, removed io.py dependency
##                   changed to version 0.5
## 99-11-13 ... added asign() function
## 99-11-01 ... changed version to 0.4 ... enough incremental changes now
## 99-10-25 ... added acovariance and acorrelation functions
## 99-10-10 ... fixed askew/akurtosis to avoid divide-by-zero errors
##                   added aglm function (crude, but will be improved)
## 99-10-04 ... upgraded acumsum, ass, asummult, asamplevar, avar, etc. to
##                   all handle lists of 'dimension's and keepdims
##                   REMOVED ar0, ar2, ar3, ar4 and replaced them with around
##                   reinserted fixes for abetai to avoid math overflows
## 99-09-05 ... rewrote achisqprob/aerfcc/aksprob/afprob/abetacf/abetai to
##                   handle multi-dimensional arrays (whew!)
## 99-08-30 ... fixed l/amoment, l/askew, l/akurtosis per D'Agostino (1990)
##                   added anormaltest per same reference
##                   re-wrote azprob to calc arrays of probs all at once
## 99-08-22 ... edited attest_ind printing section so arrays could be rounded
## 99-08-19 ... fixed amean and aharmonicmean for non-error(!) overflow on
##                   short/byte arrays (mean of #s btw 100-300 = -150??)
## 99-08-09 ... fixed asum so that the None case works for Byte arrays
## 99-08-08 ... fixed 7/3 'improvement' to handle t-calcs on N-D arrays
## 99-07-03 ... improved attest_ind, attest_rel (zero-division errortrap)
## 99-06-24 ... fixed bug(?) in attest_ind (n1=a.shape[0])
## 04/11/99 ... added asignaltonoise, athreshold functions, changed all
##                   max/min in array section to N.maximum/N.minimum,
##                   fixed square_of_sums to prevent integer overflow
## 04/10/99 ... !!! Changed function name ... sumsquared ==> square_of_sums
## 03/18/99 ... Added ar0, ar2, ar3 and ar4 rounding functions
## 02/28/99 ... Fixed aobrientransform to return an array rather than a list
## 01/15/99 ... Essentially ceased updating list-versions of functions (!!!)
## 01/13/99 ... CHANGED TO VERSION 0.3
##                   fixed bug in a/lmannwhitneyu p-value calculation
## 12/31/98 ... fixed variable-name bug in ldescribe
## 12/19/98 ... fixed bug in findwithin (fcns needed pstat. prefix)
## 12/16/98 ... changed amedianscore to return float (not array) for 1 score
## 12/14/98 ... added atmin and atmax functions
##                   removed umath from import line (not needed)
##                   l/ageometricmean modified to reduce chance of overflows (take
##                       nth root first, then multiply)
## 12/07/98 ... added __version__ variable (now 0.2)
##                   removed all 'stats.' from anova() fcn
## 12/06/98 ... changed those functions (except shellsort) that altered
##                   arguments in-place ... cumsum, ranksort, ...
##                   updated (and fixed some) doc-strings
## 12/01/98 ... added anova() function (requires NumPy)
##                   incorporated Dispatch class
## 11/12/98 ... added functionality to amean, aharmonicmean, ageometricmean
##                   added 'asum' function (added functionality to N.add.reduce)
##                   fixed both moment and amoment (two errors)
##                   changed name of skewness and askewness to skew and askew
##                   fixed (a)histogram (which sometimes counted points <lowerlimit)

import pstat               # required 3rd party module
import math, string, copy  # required python modules
from types import *

__version__ = 0.6

############# DISPATCH CODE ##############


class Dispatch:
    """
    The Dispatch class, care of David Ascher, allows different functions to
    be called depending on the argument types. This way, there can be one
    function name regardless of the argument type. To access function doc
    in stats.py module, prefix the function with an 'l' or 'a' for list or
    array arguments, respectively. That is, print stats.lmean.__doc__ or
    print stats.amean.__doc__ or whatever.
    """

    def __init__(self, *tuples):
        self._dispatch = {}
        for func, types in tuples:
            for t in types:
                if t in self._dispatch.keys():
                    raise ValueError, "can't have two dispatches on "+str(t)
                self._dispatch[t] = func
        self._types = self._dispatch.keys()

    def __call__(self, arg1, *args, **kw):
        if type(arg1) not in self._types:
            raise TypeError, "don't know how to dispatch %s arguments" % type(arg1)
        return apply(self._dispatch[type(arg1)], (arg1,) + args, kw)

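## Illustrative sketch (not part of the original code): a dispatcher instance
## is built from (function, accepted-types) pairs, so one name such as 'mean'
## can route list/tuple arguments to lmean. The module builds such bindings
## later, after the list- and array-based functions are defined; this is only
## an example of the pattern:
##
##     mean = Dispatch( (lmean, (ListType, TupleType)), )
##     mean([1, 2, 3, 4])      # -> lmean([1, 2, 3, 4]) -> 2.5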

##########################################################################
########################  LIST-BASED FUNCTIONS  #########################
##########################################################################

### Define these regardless

####################################
#######  CENTRAL TENDENCY  #########
####################################

def lgeometricmean (inlist):
    """
    Calculates the geometric mean of the values in the passed list.
    That is: n-th root of (x1 * x2 * ... * xn). Assumes a '1D' list.

    Usage: lgeometricmean(inlist)
    """
    mult = 1.0
    one_over_n = 1.0/len(inlist)
    for item in inlist:
        mult = mult * pow(item,one_over_n)
    return mult

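# Illustrative example (not part of the original module): the geometric mean
# of [2, 8] is sqrt(2*8) = 4, so lgeometricmean([2, 8]) returns 4.0 (subject
# to floating-point rounding).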

def lharmonicmean (inlist):
    """
    Calculates the harmonic mean of the values in the passed list.
    That is: n / (1/x1 + 1/x2 + ... + 1/xn). Assumes a '1D' list.

    Usage: lharmonicmean(inlist)
    """
    sum = 0
    for item in inlist:
        sum = sum + 1.0/item
    return len(inlist) / sum


def lmean (inlist):
    """
    Returns the arithmetic mean of the values in the passed list.
    Assumes a '1D' list, but will function on the 1st dim of an array(!).

    Usage: lmean(inlist)
    """
    sum = 0
    for item in inlist:
        sum = sum + item
    return sum/float(len(inlist))


def lmedian (inlist,numbins=1000):
    """
    Returns the computed median value of a list of numbers, given the
    number of bins to use for the histogram (more bins brings the computed value
    closer to the median score, default number of bins = 1000). See G.W.
    Heiman's Basic Stats (1st Edition), or CRC Probability & Statistics.

    Usage: lmedian (inlist, numbins=1000)
    """
    (hist, smallest, binsize, extras) = histogram(inlist,numbins) # make histog
    cumhist = cumsum(hist)              # make cumulative histogram
    for i in range(len(cumhist)):       # get 1st(!) index holding 50%ile score
        if cumhist[i]>=len(inlist)/2.0:
            cfbin = i
            break
    LRL = smallest + binsize*cfbin      # get lower real limit of that bin
    cfbelow = cumhist[cfbin-1]
    freq = float(hist[cfbin])           # frequency IN the 50%ile bin
    median = LRL + ((len(inlist)/2.0 - cfbelow)/float(freq))*binsize # median formula
    return median

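# Illustrative example (not part of the original module): because the median
# is interpolated within a histogram bin, lmedian(range(1, 10)) returns a
# value very close to (but not necessarily exactly) 5.0.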

def lmedianscore (inlist):
    """
    Returns the 'middle' score of the passed list. If there is an even
    number of scores, the mean of the 2 middle scores is returned.

    Usage: lmedianscore(inlist)
    """

    newlist = copy.deepcopy(inlist)
    newlist.sort()
    if len(newlist) % 2 == 0:   # if even number of scores, average middle 2
        index = len(newlist)/2  # integer division correct
        median = float(newlist[index] + newlist[index-1]) /2
    else:
        index = len(newlist)/2  # int division gives mid value when count from 0
        median = newlist[index]
    return median


def lmode(inlist):
    """
    Returns a list of the modal (most common) score(s) in the passed
    list. If there is more than one such score, all are returned. The
    bin-count for the mode(s) is also returned.

    Usage: lmode(inlist)
    Returns: bin-count for mode(s), a list of modal value(s)
    """

    scores = pstat.unique(inlist)
    scores.sort()
    freq = []
    for item in scores:
        freq.append(inlist.count(item))
    maxfreq = max(freq)
    mode = []
    stillmore = 1
    while stillmore:
        try:
            indx = freq.index(maxfreq)
            mode.append(scores[indx])
            del freq[indx]
            del scores[indx]
        except ValueError:
            stillmore=0
    return maxfreq, mode

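# Illustrative example (not part of the original module): with two equally
# common scores both are returned, e.g. lmode([1, 2, 2, 3, 3]) -> (2, [2, 3]).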

####################################
############  MOMENTS  #############
####################################

def lmoment(inlist,moment=1):
    """
    Calculates the nth moment about the mean for a sample (defaults to
    the 1st moment). Used to calculate coefficients of skewness and kurtosis.

    Usage: lmoment(inlist,moment=1)
    Returns: appropriate moment (r) from ... 1/n * SUM((inlist(i)-mean)**r)
    """
    if moment == 1:
        return 0.0
    else:
        mn = mean(inlist)
        n = len(inlist)
        s = 0
        for x in inlist:
            s = s + (x-mn)**moment
        return s/float(n)


def lvariation(inlist):
    """
    Returns the coefficient of variation, as defined in CRC Standard
    Probability and Statistics, p.6.

    Usage: lvariation(inlist)
    """
    return 100.0*samplestdev(inlist)/float(mean(inlist))


def lskew(inlist):
    """
    Returns the skewness of a distribution, as defined in Numerical
    Recipes (alternate defn in CRC Standard Probability and Statistics, p.6.)

    Usage: lskew(inlist)
    """
    return moment(inlist,3)/pow(moment(inlist,2),1.5)


def lkurtosis(inlist):
    """
    Returns the kurtosis of a distribution, as defined in Numerical
    Recipes (alternate defn in CRC Standard Probability and Statistics, p.6.)

    Usage: lkurtosis(inlist)
    """
    return moment(inlist,4)/pow(moment(inlist,2),2.0)


def ldescribe(inlist):
    """
    Returns some descriptive statistics of the passed list (assumed to be 1D).

    Usage: ldescribe(inlist)
    Returns: n, (min,max), mean, standard deviation, skew, kurtosis
    """
    n = len(inlist)
    mm = (min(inlist),max(inlist))
    m = mean(inlist)
    sd = stdev(inlist)
    sk = skew(inlist)
    kurt = kurtosis(inlist)
    return n, mm, m, sd, sk, kurt


####################################
#######  FREQUENCY STATS  ##########
####################################

def litemfreq(inlist):
    """
    Returns a list of pairs. Each pair consists of one of the scores in inlist
    and its frequency count. Assumes a 1D list is passed.

    Usage: litemfreq(inlist)
    Returns: a 2D frequency table (col [0:n-1]=scores, col n=frequencies)
    """
    scores = pstat.unique(inlist)
    scores.sort()
    freq = []
    for item in scores:
        freq.append(inlist.count(item))
    return pstat.abut(scores, freq)


def lscoreatpercentile (inlist, percent):
    """
    Returns the score at a given percentile relative to the distribution
    given by inlist.

    Usage: lscoreatpercentile(inlist,percent)
    """
    if percent > 1:
        print "\nDividing percent>1 by 100 in lscoreatpercentile().\n"
        percent = percent / 100.0
    targetcf = percent*len(inlist)
    h, lrl, binsize, extras = histogram(inlist)
    cumhist = cumsum(copy.deepcopy(h))
    for i in range(len(cumhist)):
        if cumhist[i] >= targetcf:
            break
    score = binsize * ((targetcf - cumhist[i-1]) / float(h[i])) + (lrl+binsize*i)
    return score


def lpercentileofscore (inlist, score, histbins=10, defaultlimits=None):
    """
    Returns the percentile value of a score relative to the distribution
    given by inlist. Formula depends on the values used to histogram the data(!).

    Usage: lpercentileofscore(inlist,score,histbins=10,defaultlimits=None)
    """

    h, lrl, binsize, extras = histogram(inlist,histbins,defaultlimits)
    cumhist = cumsum(copy.deepcopy(h))
    i = int((score - lrl)/float(binsize))
    pct = (cumhist[i-1]+((score-(lrl+binsize*i))/float(binsize))*h[i])/float(len(inlist)) * 100
    return pct


def lhistogram (inlist,numbins=10,defaultreallimits=None,printextras=0):
    """
    Returns (i) a list of histogram bin counts, (ii) the smallest value
    of the histogram binning, (iii) the bin width (the last 2 are not
    necessarily integers), and (iv) the number of points falling outside the
    bins. Default number of bins is 10. If no sequence object is given for
    defaultreallimits, the routine picks (usually non-pretty) bins
    spanning all the numbers in the inlist.

    Usage: lhistogram (inlist, numbins=10, defaultreallimits=None, printextras=0)
    Returns: list of bin values, lowerreallimit, binsize, extrapoints
    """
    if (defaultreallimits <> None):
        if type(defaultreallimits) not in [ListType,TupleType] or len(defaultreallimits)==1: # only one limit given, assumed to be lower one & upper is calc'd
            lowerreallimit = defaultreallimits
            upperreallimit = 1.0001 * max(inlist)
        else: # assume both limits given
            lowerreallimit = defaultreallimits[0]
            upperreallimit = defaultreallimits[1]
        binsize = (upperreallimit-lowerreallimit)/float(numbins)
    else: # no limits given for histogram, both must be calc'd
        estbinwidth=(max(inlist)-min(inlist))/float(numbins) + 1 # 1=>cover all
        binsize = ((max(inlist)-min(inlist)+estbinwidth))/float(numbins)
        lowerreallimit = min(inlist) - binsize/2 # lower real limit, 1st bin
    bins = [0]*(numbins)
    extrapoints = 0
    for num in inlist:
        try:
            if (num-lowerreallimit) < 0:
                extrapoints = extrapoints + 1
            else:
                bintoincrement = int((num-lowerreallimit)/float(binsize))
                bins[bintoincrement] = bins[bintoincrement] + 1
        except:
            extrapoints = extrapoints + 1
    if (extrapoints > 0 and printextras == 1):
        print '\nPoints outside given histogram range =',extrapoints
    return (bins, lowerreallimit, binsize, extrapoints)


def lcumfreq(inlist,numbins=10,defaultreallimits=None):
    """
    Returns a cumulative frequency histogram, using the histogram function.

    Usage: lcumfreq(inlist,numbins=10,defaultreallimits=None)
    Returns: list of cumfreq bin values, lowerreallimit, binsize, extrapoints
    """
    h,l,b,e = histogram(inlist,numbins,defaultreallimits)
    cumhist = cumsum(copy.deepcopy(h))
    return cumhist,l,b,e


def lrelfreq(inlist,numbins=10,defaultreallimits=None):
    """
    Returns a relative frequency histogram, using the histogram function.

    Usage: lrelfreq(inlist,numbins=10,defaultreallimits=None)
    Returns: list of relative freq bin values, lowerreallimit, binsize, extrapoints
    """
    h,l,b,e = histogram(inlist,numbins,defaultreallimits)
    for i in range(len(h)):
        h[i] = h[i]/float(len(inlist))
    return h,l,b,e

####################################
#####  VARIABILITY FUNCTIONS  ######
####################################

def lobrientransform(*args):
    """
    Computes a transform on input data (any number of columns). Used to
    test for homogeneity of variance prior to running one-way stats. From
    Maxwell and Delaney, p.112.

    Usage: lobrientransform(*args)
    Returns: transformed data for use in an ANOVA
    """
    TINY = 1e-10
    k = len(args)
    n = [0.0]*k
    v = [0.0]*k
    m = [0.0]*k
    nargs = []
    for i in range(k):
        nargs.append(copy.deepcopy(args[i]))
        n[i] = float(len(nargs[i]))
        v[i] = var(nargs[i])
        m[i] = mean(nargs[i])
    for j in range(k):
        for i in range(n[j]):
            t1 = (n[j]-1.5)*n[j]*(nargs[j][i]-m[j])**2
            t2 = 0.5*v[j]*(n[j]-1.0)
            t3 = (n[j]-1.0)*(n[j]-2.0)
            nargs[j][i] = (t1-t2) / float(t3)
    check = 1
    for j in range(k):
        if v[j] - mean(nargs[j]) > TINY:
            check = 0
    if check <> 1:
        raise ValueError, 'Problem in obrientransform.'
    else:
        return nargs


def lsamplevar (inlist):
    """
    Returns the variance of the values in the passed list using
    N for the denominator (i.e., DESCRIBES the sample variance only).

    Usage: lsamplevar(inlist)
    """
    n = len(inlist)
    mn = mean(inlist)
    deviations = []
    for item in inlist:
        deviations.append(item-mn)
    return ss(deviations)/float(n)


def lsamplestdev (inlist):
    """
    Returns the standard deviation of the values in the passed list using
    N for the denominator (i.e., DESCRIBES the sample stdev only).

    Usage: lsamplestdev(inlist)
    """
    return math.sqrt(samplevar(inlist))


def lvar (inlist):
    """
    Returns the variance of the values in the passed list using N-1
    for the denominator (i.e., for estimating population variance).

    Usage: lvar(inlist)
    """
    n = len(inlist)
    mn = mean(inlist)
    deviations = [0]*len(inlist)
    for i in range(len(inlist)):
        deviations[i] = inlist[i] - mn
    return ss(deviations)/float(n-1)


def lstdev (inlist):
    """
    Returns the standard deviation of the values in the passed list
    using N-1 in the denominator (i.e., to estimate population stdev).

    Usage: lstdev(inlist)
    """
    return math.sqrt(var(inlist))


def lsterr(inlist):
    """
    Returns the standard error of the values in the passed list using N-1
    in the denominator (i.e., to estimate population standard error).

    Usage: lsterr(inlist)
    """
    return stdev(inlist) / float(math.sqrt(len(inlist)))


def lsem (inlist):
    """
    Returns the estimated standard error of the mean (sx-bar) of the
    values in the passed list. sem = stdev / sqrt(n)

    Usage: lsem(inlist)
    """
    sd = stdev(inlist)
    n = len(inlist)
    return sd/math.sqrt(n)


def lz (inlist, score):
    """
    Returns the z-score for a given input score, given that score and the
    list from which that score came. Not appropriate for population calculations.

    Usage: lz(inlist, score)
    """
    z = (score-mean(inlist))/samplestdev(inlist)
    return z


def lzs (inlist):
    """
    Returns a list of z-scores, one for each score in the passed list.

    Usage: lzs(inlist)
    """
    zscores = []
    for item in inlist:
        zscores.append(z(inlist,item))
    return zscores

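# Illustrative example (not part of the original module): for [1, 2, 3, 4, 5]
# the sample standard deviation (N denominator) is sqrt(2), so
# lz([1, 2, 3, 4, 5], 5) is (5-3)/sqrt(2), roughly 1.41.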

####################################
#######  TRIMMING FUNCTIONS  #######
####################################

def ltrimboth (l,proportiontocut):
    """
    Slices off the passed proportion of items from BOTH ends of the passed
    list (i.e., with proportiontocut=0.1, slices 'leftmost' 10% AND 'rightmost'
    10% of scores). Assumes list is sorted by magnitude. Slices off LESS if
    proportion results in a non-integer slice index (i.e., conservatively
    slices off proportiontocut).

    Usage: ltrimboth (l,proportiontocut)
    Returns: trimmed version of list l
    """
    lowercut = int(proportiontocut*len(l))
    uppercut = len(l) - lowercut
    return l[lowercut:uppercut]


def ltrim1 (l,proportiontocut,tail='right'):
    """
    Slices off the passed proportion of items from ONE end of the passed
    list (i.e., if proportiontocut=0.1, slices off 'leftmost' or 'rightmost'
    10% of scores). Slices off LESS if proportion results in a non-integer
    slice index (i.e., conservatively slices off proportiontocut).

    Usage: ltrim1 (l,proportiontocut,tail='right') or set tail='left'
    Returns: trimmed version of list l
    """
    if tail == 'right':
        lowercut = 0
        uppercut = len(l) - int(proportiontocut*len(l))
    elif tail == 'left':
        lowercut = int(proportiontocut*len(l))
        uppercut = len(l)
    return l[lowercut:uppercut]

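# Illustrative example (not part of the original module): with a sorted list
# of 10 scores, ltrimboth(range(10), 0.1) drops one score from each end and
# returns [1, 2, 3, 4, 5, 6, 7, 8], while ltrim1(range(10), 0.1) drops only
# the rightmost score.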

####################################
#####  CORRELATION FUNCTIONS  ######
####################################

def lpaired(x,y):
    """
    Interactively determines the type of data and then runs the
    appropriate statistic for paired group data.

    Usage: lpaired(x,y)
    Returns: appropriate statistic name, value, and probability
    """
    samples = ''
    while samples not in ['i','r','I','R','c','C']:
        print '\nIndependent or related samples, or correlation (i,r,c): ',
        samples = raw_input()

    if samples in ['i','I','r','R']:
        print '\nComparing variances ...',
        # USE O'BRIEN'S TEST FOR HOMOGENEITY OF VARIANCE, Maxwell & delaney, p.112
        r = obrientransform(x,y)
        f,p = F_oneway(pstat.colex(r,0),pstat.colex(r,1))
        if p<0.05:
            vartype='unequal, p='+str(round(p,4))
        else:
            vartype='equal'
        print vartype
        if samples in ['i','I']:
            if vartype[0]=='e':
                t,p = ttest_ind(x,y,0)
                print '\nIndependent samples t-test: ', round(t,4),round(p,4)
            else:
                if len(x)>20 or len(y)>20:
                    z,p = ranksums(x,y)
                    print '\nRank Sums test (NONparametric, n>20): ', round(z,4),round(p,4)
                else:
                    u,p = mannwhitneyu(x,y)
                    print '\nMann-Whitney U-test (NONparametric, ns<20): ', round(u,4),round(p,4)

        else: # RELATED SAMPLES
            if vartype[0]=='e':
                t,p = ttest_rel(x,y,0)
                print '\nRelated samples t-test: ', round(t,4),round(p,4)
            else:
                t,p = ranksums(x,y)
                print '\nWilcoxon T-test (NONparametric): ', round(t,4),round(p,4)
    else: # CORRELATION ANALYSIS
        corrtype = ''
        while corrtype not in ['c','C','r','R','d','D']:
            print '\nIs the data Continuous, Ranked, or Dichotomous (c,r,d): ',
            corrtype = raw_input()
        if corrtype in ['c','C']:
            m,b,r,p,see = linregress(x,y)
            print '\nLinear regression for continuous variables ...'
            lol = [['Slope','Intercept','r','Prob','SEestimate'],[round(m,4),round(b,4),round(r,4),round(p,4),round(see,4)]]
            pstat.printcc(lol)
        elif corrtype in ['r','R']:
            r,p = spearmanr(x,y)
            print '\nCorrelation for ranked variables ...'
            print "Spearman's r: ",round(r,4),round(p,4)
        else: # DICHOTOMOUS
            r,p = pointbiserialr(x,y)
            print '\nAssuming x contains a dichotomous variable ...'
            print 'Point Biserial r: ',round(r,4),round(p,4)
    print '\n\n'
    return None


def lpearsonr(x,y):
    """
    Calculates a Pearson correlation coefficient and the associated
    probability value. Taken from Heiman's Basic Statistics for the Behav.
    Sci (2nd), p.195.

    Usage: lpearsonr(x,y) where x and y are equal-length lists
    Returns: Pearson's r value, two-tailed p-value
    """
    TINY = 1.0e-30
    if len(x) <> len(y):
        raise ValueError, 'Input values not paired in pearsonr. Aborting.'
    n = len(x)
    x = map(float,x)
    y = map(float,y)
    xmean = mean(x)
    ymean = mean(y)
    r_num = n*(summult(x,y)) - sum(x)*sum(y)
    r_den = math.sqrt((n*ss(x) - square_of_sums(x))*(n*ss(y)-square_of_sums(y)))
    r = (r_num / r_den)  # denominator already a float
    df = n-2
    t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY)))
    prob = betai(0.5*df,0.5,df/float(df+t*t))
    return r, prob

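# Illustrative example (not part of the original module): perfectly linear
# data such as x=[1,2,3,4], y=[2,4,6,8] gives r == 1.0 with a p-value near 0.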

def lspearmanr(x,y):
    """
    Calculates a Spearman rank-order correlation coefficient. Taken
    from Heiman's Basic Statistics for the Behav. Sci (1st), p.192.

    Usage: lspearmanr(x,y) where x and y are equal-length lists
    Returns: Spearman's r, two-tailed p-value
    """
    TINY = 1e-30
    if len(x) <> len(y):
        raise ValueError, 'Input values not paired in spearmanr. Aborting.'
    n = len(x)
    rankx = rankdata(x)
    ranky = rankdata(y)
    dsq = sumdiffsquared(rankx,ranky)
    rs = 1 - 6*dsq / float(n*(n**2-1))
    t = rs * math.sqrt((n-2) / ((rs+1.0)*(1.0-rs)))
    df = n-2
    probrs = betai(0.5*df,0.5,df/(df+t*t))  # t already a float
    # probability values for rs are from part 2 of the spearman function in
    # Numerical Recipes, p.510. They are close to tables, but not exact. (?)
    return rs, probrs


def lpointbiserialr(x,y):
    """
    Calculates a point-biserial correlation coefficient and the associated
    probability value. Taken from Heiman's Basic Statistics for the Behav.
    Sci (1st), p.194.

    Usage: lpointbiserialr(x,y) where x,y are equal-length lists
    Returns: Point-biserial r, two-tailed p-value
    """
    TINY = 1e-30
    if len(x) <> len(y):
        raise ValueError, 'INPUT VALUES NOT PAIRED IN pointbiserialr. ABORTING.'
    data = pstat.abut(x,y)
    categories = pstat.unique(x)
    if len(categories) <> 2:
        raise ValueError, "Exactly 2 categories required for pointbiserialr()."
    else: # there are 2 categories, continue
        codemap = pstat.abut(categories,range(2))
        recoded = pstat.recode(data,codemap,0)
        x = pstat.linexand(data,0,categories[0])
        y = pstat.linexand(data,0,categories[1])
        xmean = mean(pstat.colex(x,1))
        ymean = mean(pstat.colex(y,1))
        n = len(data)
        adjust = math.sqrt((len(x)/float(n))*(len(y)/float(n)))
        rpb = (ymean - xmean)/samplestdev(pstat.colex(data,1))*adjust
        df = n-2
        t = rpb*math.sqrt(df/((1.0-rpb+TINY)*(1.0+rpb+TINY)))
        prob = betai(0.5*df,0.5,df/(df+t*t))  # t already a float
        return rpb, prob


def lkendalltau(x,y):
    """
    Calculates Kendall's tau ... correlation of ordinal data. Adapted
    from function kendl1 in Numerical Recipes. Needs good test-routine.@@@

    Usage: lkendalltau(x,y)
    Returns: Kendall's tau, two-tailed p-value
    """
    n1 = 0
    n2 = 0
    iss = 0
    for j in range(len(x)-1):
        for k in range(j,len(y)):
            a1 = x[j] - x[k]
            a2 = y[j] - y[k]
            aa = a1 * a2
            if (aa): # neither list has a tie
                n1 = n1 + 1
                n2 = n2 + 1
                if aa > 0:
                    iss = iss + 1
                else:
                    iss = iss -1
            else:
                if (a1):
                    n1 = n1 + 1
                else:
                    n2 = n2 + 1
    tau = iss / math.sqrt(n1*n2)
    svar = (4.0*len(x)+10.0) / (9.0*len(x)*(len(x)-1))
    z = tau / math.sqrt(svar)
    prob = erfcc(abs(z)/1.4142136)
    return tau, prob


def llinregress(x,y):
    """
    Calculates a regression line on x,y pairs.

    Usage: llinregress(x,y) x,y are equal-length lists of x-y coordinates
    Returns: slope, intercept, r, two-tailed prob, sterr-of-estimate
    """
    TINY = 1.0e-20
    if len(x) <> len(y):
        raise ValueError, 'Input values not paired in linregress. Aborting.'
    n = len(x)
    x = map(float,x)
    y = map(float,y)
    xmean = mean(x)
    ymean = mean(y)
    r_num = float(n*(summult(x,y)) - sum(x)*sum(y))
    r_den = math.sqrt((n*ss(x) - square_of_sums(x))*(n*ss(y)-square_of_sums(y)))
    r = r_num / r_den
    z = 0.5*math.log((1.0+r+TINY)/(1.0-r+TINY))
    df = n-2
    t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY)))
    prob = betai(0.5*df,0.5,df/(df+t*t))
    slope = r_num / float(n*ss(x) - square_of_sums(x))
    intercept = ymean - slope*xmean
    sterrest = math.sqrt(1-r*r)*samplestdev(y)
    return slope, intercept, r, prob, sterrest

####################################
#####  INFERENTIAL STATISTICS  #####
####################################

def lttest_1samp(a,popmean,printit=0,name='Sample',writemode='a'):
    """
    Calculates the t-obtained for the independent samples T-test on ONE group
    of scores a, given a population mean. If printit=1, results are printed
    to the screen. If printit='filename', the results are output to 'filename'
    using the given writemode (default=append). Returns t-value, and prob.

    Usage: lttest_1samp(a,popmean,printit=0,name='Sample',writemode='a')
    Returns: t-value, two-tailed prob
    """
    x = mean(a)
    v = var(a)
    n = len(a)
    df = n-1
    svar = ((n-1)*v)/float(df)
    t = (x-popmean)/math.sqrt(svar*(1.0/n))
    prob = betai(0.5*df,0.5,float(df)/(df+t*t))

    if printit <> 0:
        statname = 'Single-sample T-test.'
        outputpairedstats(printit,writemode,
                          'Population','--',popmean,0,0,0,
                          name,n,x,v,min(a),max(a),
                          statname,t,prob)
    return t,prob


def lttest_ind (a, b, printit=0, name1='Samp1', name2='Samp2', writemode='a'):
    """
    Calculates the t-obtained T-test on TWO INDEPENDENT samples of
    scores a, and b. From Numerical Recipes, p.483. If printit=1, results
    are printed to the screen. If printit='filename', the results are output
    to 'filename' using the given writemode (default=append). Returns t-value,
    and prob.

    Usage: lttest_ind(a,b,printit=0,name1='Samp1',name2='Samp2',writemode='a')
    Returns: t-value, two-tailed prob
    """
    x1 = mean(a)
    x2 = mean(b)
    v1 = stdev(a)**2
    v2 = stdev(b)**2
    n1 = len(a)
    n2 = len(b)
    df = n1+n2-2
    svar = ((n1-1)*v1+(n2-1)*v2)/float(df)
    t = (x1-x2)/math.sqrt(svar*(1.0/n1 + 1.0/n2))
    prob = betai(0.5*df,0.5,df/(df+t*t))

    if printit <> 0:
        statname = 'Independent samples T-test.'
        outputpairedstats(printit,writemode,
                          name1,n1,x1,v1,min(a),max(a),
                          name2,n2,x2,v2,min(b),max(b),
                          statname,t,prob)
    return t,prob


def lttest_rel (a,b,printit=0,name1='Sample1',name2='Sample2',writemode='a'):
    """
    Calculates the t-obtained T-test on TWO RELATED samples of scores,
    a and b. From Numerical Recipes, p.483. If printit=1, results are
    printed to the screen. If printit='filename', the results are output to
    'filename' using the given writemode (default=append). Returns t-value,
    and prob.

    Usage: lttest_rel(a,b,printit=0,name1='Sample1',name2='Sample2',writemode='a')
    Returns: t-value, two-tailed prob
    """
    if len(a)<>len(b):
        raise ValueError, 'Unequal length lists in ttest_rel.'
    x1 = mean(a)
    x2 = mean(b)
    v1 = var(a)
    v2 = var(b)
    n = len(a)
    cov = 0
    for i in range(len(a)):
        cov = cov + (a[i]-x1) * (b[i]-x2)
    df = n-1
    cov = cov / float(df)
    sd = math.sqrt((v1+v2 - 2.0*cov)/float(n))
    t = (x1-x2)/sd
    prob = betai(0.5*df,0.5,df/(df+t*t))

    if printit <> 0:
        statname = 'Related samples T-test.'
        outputpairedstats(printit,writemode,
                          name1,n,x1,v1,min(a),max(a),
                          name2,n,x2,v2,min(b),max(b),
                          statname,t,prob)
    return t, prob


def lchisquare(f_obs,f_exp=None):
    """
    Calculates a one-way chi square for list of observed frequencies and returns
    the result. If no expected frequencies are given, the total N is assumed to
    be equally distributed across all groups.

    Usage: lchisquare(f_obs, f_exp=None) f_obs = list of observed cell freq.
    Returns: chisquare-statistic, associated p-value
    """
    k = len(f_obs)  # number of groups
    if f_exp == None:
        f_exp = [sum(f_obs)/float(k)] * len(f_obs)  # create k bins with = freq.
    chisq = 0
    for i in range(len(f_obs)):
        chisq = chisq + (f_obs[i]-f_exp[i])**2 / float(f_exp[i])
    return chisq, chisqprob(chisq, k-1)

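# Illustrative example (not part of the original module): observed counts that
# exactly match a uniform expectation give a chi-square of 0 and p = 1.0,
# e.g. lchisquare([10, 10, 10, 10]) -> (0.0, 1.0).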

def lks_2samp (data1,data2):
    """
    Computes the Kolmogorov-Smirnov statistic on 2 samples. From
    Numerical Recipes in C, page 493.

    Usage: lks_2samp(data1,data2) data1&2 are lists of values for 2 conditions
    Returns: KS D-value, associated p-value
    """
    j1 = 0
    j2 = 0
    fn1 = 0.0
    fn2 = 0.0
    n1 = len(data1)
    n2 = len(data2)
    en1 = n1
    en2 = n2
    d = 0.0
    data1.sort()
    data2.sort()
    while j1 < n1 and j2 < n2:
        d1=data1[j1]
        d2=data2[j2]
        if d1 <= d2:
            fn1 = (j1)/float(en1)
            j1 = j1 + 1
        if d2 <= d1:
            fn2 = (j2)/float(en2)
            j2 = j2 + 1
        dt = (fn2-fn1)
        if math.fabs(dt) > math.fabs(d):
            d = dt
    try:
        en = math.sqrt(en1*en2/float(en1+en2))
        prob = ksprob((en+0.12+0.11/en)*abs(d))
    except:
        prob = 1.0
    return d, prob


def lmannwhitneyu(x,y):
    """
    Calculates a Mann-Whitney U statistic on the provided scores and
    returns the result. Use only when the n in each condition is < 20 and
    you have 2 independent samples of ranks. NOTE: Mann-Whitney U is
    significant if the u-obtained is LESS THAN or equal to the critical
    value of U found in the tables. Equivalent to Kruskal-Wallis H with
    just 2 groups.

    Usage: lmannwhitneyu(x,y) where x,y are lists of scores for 2 conditions
    Returns: u-statistic, one-tailed p-value (i.e., p(z(U)))
    """
    n1 = len(x)
    n2 = len(y)
    ranked = rankdata(x+y)
    rankx = ranked[0:n1]        # get the x-ranks
    ranky = ranked[n1:]         # the rest are y-ranks
    u1 = n1*n2 + (n1*(n1+1))/2.0 - sum(rankx)  # calc U for x
    u2 = n1*n2 - u1                            # remainder is U for y
    bigu = max(u1,u2)
    smallu = min(u1,u2)
    T = math.sqrt(tiecorrect(ranked))  # correction factor for tied scores
    if T == 0:
        raise ValueError, 'All numbers are identical in lmannwhitneyu'
    sd = math.sqrt(T*n1*n2*(n1+n2+1)/12.0)
    z = abs((bigu-n1*n2/2.0) / sd)  # normal approximation for prob calc
    return smallu, 1.0 - zprob(z)


def ltiecorrect(rankvals):
    """
    Corrects for ties in Mann Whitney U and Kruskal Wallis H tests. See
    Siegel, S. (1956) Nonparametric Statistics for the Behavioral Sciences.
    New York: McGraw-Hill. Code adapted from |Stat rankind.c code.

    Usage: ltiecorrect(rankvals)
    Returns: T correction factor for U or H
    """
    sorted,posn = shellsort(rankvals)
    n = len(sorted)
    T = 0.0
    i = 0
    while (i<n-1):
        if sorted[i] == sorted[i+1]:
            nties = 1
            while (i<n-1) and (sorted[i] == sorted[i+1]):
                nties = nties +1
                i = i +1
            T = T + nties**3 - nties
        i = i+1
    T = T / float(n**3-n)
    return 1.0 - T


def lranksums(x,y):
    """
    Calculates the rank sums statistic on the provided scores and
    returns the result. Use only when the n in each condition is > 20 and you
    have 2 independent samples of ranks.

    Usage: lranksums(x,y)
    Returns: a z-statistic, two-tailed p-value
    """
    n1 = len(x)
    n2 = len(y)
    alldata = x+y
    ranked = rankdata(alldata)
    x = ranked[:n1]
    y = ranked[n1:]
    s = sum(x)
    expected = n1*(n1+n2+1) / 2.0
    z = (s - expected) / math.sqrt(n1*n2*(n1+n2+1)/12.0)
    prob = 2*(1.0 -zprob(abs(z)))
    return z, prob


def lwilcoxont(x,y):
    """
    Calculates the Wilcoxon T-test for related samples and returns the
    result. A non-parametric T-test.

    Usage: lwilcoxont(x,y)
    Returns: a t-statistic, two-tail probability estimate
    """
    if len(x) <> len(y):
        raise ValueError, 'Unequal N in wilcoxont. Aborting.'
    d=[]
    for i in range(len(x)):
        diff = x[i] - y[i]
        if diff <> 0:
            d.append(diff)
    count = len(d)
    absd = map(abs,d)
    absranked = rankdata(absd)
    r_plus = 0.0
    r_minus = 0.0
    for i in range(len(absd)):
        if d[i] < 0:
            r_minus = r_minus + absranked[i]
        else:
            r_plus = r_plus + absranked[i]
    wt = min(r_plus, r_minus)
    mn = count * (count+1) * 0.25
    se = math.sqrt(count*(count+1)*(2.0*count+1.0)/24.0)
    z = math.fabs(wt-mn) / se
    prob = 2*(1.0 -zprob(abs(z)))
    return wt, prob

| 1209 | def lkruskalwallish(*args): |
|---|
| 1210 | """ |
|---|
| 1211 | The Kruskal-Wallis H-test is a non-parametric ANOVA for 3 or more |
|---|
| 1212 | groups, requiring at least 5 subjects in each group. This function |
|---|
| 1213 | calculates the Kruskal-Wallis H-test for 3 or more independent samples |
|---|
| 1214 | and returns the result. |
|---|
| 1215 | |
|---|
| 1216 | Usage: lkruskalwallish(*args) |
|---|
| 1217 | Returns: H-statistic (corrected for ties), associated p-value |
|---|
| 1218 | """ |
|---|
| 1219 | args = list(args) |
|---|
| 1220 | n = [0]*len(args) |
|---|
| 1221 | all = [] |
|---|
| 1222 | n = map(len,args) |
|---|
| 1223 | for i in range(len(args)): |
|---|
| 1224 | all = all + args[i] |
|---|
| 1225 | ranked = rankdata(all) |
|---|
| 1226 | T = tiecorrect(ranked) |
|---|
| 1227 | for i in range(len(args)): |
|---|
| 1228 | args[i] = ranked[0:n[i]] |
|---|
| 1229 | del ranked[0:n[i]] |
|---|
| 1230 | rsums = [] |
|---|
| 1231 | for i in range(len(args)): |
|---|
| 1232 | rsums.append(sum(args[i])**2) |
|---|
| 1233 | rsums[i] = rsums[i] / float(n[i]) |
|---|
| 1234 | ssbn = sum(rsums) |
|---|
| 1235 | totaln = sum(n) |
|---|
| 1236 | h = 12.0 / (totaln*(totaln+1)) * ssbn - 3*(totaln+1) |
|---|
| 1237 | df = len(args) - 1 |
|---|
| 1238 | if T == 0: |
|---|
| 1239 | raise ValueError, 'All numbers are identical in lkruskalwallish' |
|---|
| 1240 | h = h / float(T) |
|---|
| 1241 | return h, chisqprob(h,df) |
|---|
| 1242 | |
|---|
| 1243 | |
|---|
| 1244 | def lfriedmanchisquare(*args): |
|---|
| 1245 | """ |
|---|
| 1246 | Friedman Chi-Square is a non-parametric, one-way within-subjects |
|---|
| 1247 | ANOVA. This function calculates the Friedman Chi-square test for repeated |
|---|
| 1248 | measures and returns the result, along with the associated probability |
|---|
| 1249 | value. It assumes 3 or more repeated measures. Only 3 levels requires a |
|---|
| 1250 | minimum of 10 subjects in the study. Four levels requires 5 subjects per |
|---|
| 1251 | level(??). |
|---|
| 1252 | |
|---|
| 1253 | Usage: lfriedmanchisquare(*args) |
|---|
| 1254 | Returns: chi-square statistic, associated p-value |
|---|
| 1255 | """ |
|---|
| 1256 | k = len(args) |
|---|
| 1257 | if k < 3: |
|---|
| 1258 | raise ValueError, 'Less than 3 levels. Friedman test not appropriate.' |
|---|
| 1259 | n = len(args[0]) |
|---|
| 1260 | data = apply(pstat.abut,tuple(args)) |
|---|
| 1261 | for i in range(len(data)): |
|---|
| 1262 | data[i] = rankdata(data[i]) |
|---|
| 1263 | ssbn = 0 |
|---|
| 1264 | for i in range(k): |
|---|
| 1265 | ssbn = ssbn + sum(pstat.colex(data,i))**2   # sum of ranks within each condition
|---|
| 1266 | chisq = 12.0 / (k*n*(k+1)) * ssbn - 3*n*(k+1) |
|---|
| 1267 | return chisq, chisqprob(chisq,k-1) |
|---|
| 1268 | |
|---|
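|  | # Illustrative usage (hypothetical repeated measures; one list per condition,
|---|
|  | # each of length n subjects):
|---|
|  | #   c1 = [8, 6, 7, 5, 9, 6, 8, 7, 6, 7]
|---|
|  | #   c2 = [7, 5, 6, 4, 8, 5, 7, 6, 5, 6]
|---|
|  | #   c3 = [6, 4, 5, 3, 7, 4, 6, 5, 4, 5]
|---|
|  | #   chisq, p = lfriedmanchisquare(c1, c2, c3)
|---|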
| 1269 | |
|---|
| 1270 | #################################### |
|---|
| 1271 | #### PROBABILITY CALCULATIONS #### |
|---|
| 1272 | #################################### |
|---|
| 1273 | |
|---|
| 1274 | def lchisqprob(chisq,df): |
|---|
| 1275 | """ |
|---|
| 1276 | Returns the (1-tailed) probability value associated with the provided |
|---|
| 1277 | chi-square value and df. Adapted from chisq.c in Gary Perlman's |Stat. |
|---|
| 1278 | |
|---|
| 1279 | Usage: lchisqprob(chisq,df) |
|---|
| 1280 | """ |
|---|
| 1281 | BIG = 20.0 |
|---|
| 1282 | def ex(x): |
|---|
| 1283 | BIG = 20.0 |
|---|
| 1284 | if x < -BIG: |
|---|
| 1285 | return 0.0 |
|---|
| 1286 | else: |
|---|
| 1287 | return math.exp(x) |
|---|
| 1288 | |
|---|
| 1289 | if chisq <=0 or df < 1: |
|---|
| 1290 | return 1.0 |
|---|
| 1291 | a = 0.5 * chisq |
|---|
| 1292 | if df%2 == 0: |
|---|
| 1293 | even = 1 |
|---|
| 1294 | else: |
|---|
| 1295 | even = 0 |
|---|
| 1296 | if df > 1: |
|---|
| 1297 | y = ex(-a) |
|---|
| 1298 | if even: |
|---|
| 1299 | s = y |
|---|
| 1300 | else: |
|---|
| 1301 | s = 2.0 * zprob(-math.sqrt(chisq)) |
|---|
| 1302 | if (df > 2): |
|---|
| 1303 | chisq = 0.5 * (df - 1.0) |
|---|
| 1304 | if even: |
|---|
| 1305 | z = 1.0 |
|---|
| 1306 | else: |
|---|
| 1307 | z = 0.5 |
|---|
| 1308 | if a > BIG: |
|---|
| 1309 | if even: |
|---|
| 1310 | e = 0.0 |
|---|
| 1311 | else: |
|---|
| 1312 | e = math.log(math.sqrt(math.pi)) |
|---|
| 1313 | c = math.log(a) |
|---|
| 1314 | while (z <= chisq): |
|---|
| 1315 | e = math.log(z) + e |
|---|
| 1316 | s = s + ex(c*z-a-e) |
|---|
| 1317 | z = z + 1.0 |
|---|
| 1318 | return s |
|---|
| 1319 | else: |
|---|
| 1320 | if even: |
|---|
| 1321 | e = 1.0 |
|---|
| 1322 | else: |
|---|
| 1323 | e = 1.0 / math.sqrt(math.pi) / math.sqrt(a) |
|---|
| 1324 | c = 0.0 |
|---|
| 1325 | while (z <= chisq): |
|---|
| 1326 | e = e * (a/float(z)) |
|---|
| 1327 | c = c + e |
|---|
| 1328 | z = z + 1.0 |
|---|
| 1329 | return (c*y+s) |
|---|
| 1330 | else: |
|---|
| 1331 | return s |
|---|
| 1332 | |
|---|
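|  | # Illustrative usage: right-tail (1-tailed) p for a chi-square statistic
|---|
|  | #   p = lchisqprob(3.84, 1)   # roughly 0.05 for chisq=3.84 with df=1
|---|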
| 1333 | |
|---|
| 1334 | def lerfcc(x): |
|---|
| 1335 | """ |
|---|
| 1336 | Returns the complementary error function erfc(x) with fractional |
|---|
| 1337 | error everywhere less than 1.2e-7. Adapted from Numerical Recipes.
|---|
| 1338 | |
|---|
| 1339 | Usage: lerfcc(x) |
|---|
| 1340 | """ |
|---|
| 1341 | z = abs(x) |
|---|
| 1342 | t = 1.0 / (1.0+0.5*z) |
|---|
| 1343 | ans = t * math.exp(-z*z-1.26551223 + t*(1.00002368+t*(0.37409196+t*(0.09678418+t*(-0.18628806+t*(0.27886807+t*(-1.13520398+t*(1.48851587+t*(-0.82215223+t*0.17087277))))))))) |
|---|
| 1344 | if x >= 0: |
|---|
| 1345 | return ans |
|---|
| 1346 | else: |
|---|
| 1347 | return 2.0 - ans |
|---|
| 1348 | |
|---|
| 1349 | |
|---|
| 1350 | def lzprob(z): |
|---|
| 1351 | """ |
|---|
| 1352 | Returns the area under the normal curve 'to the left of' the given z value. |
|---|
| 1353 | Thus, |
|---|
| 1354 | for z<0, zprob(z) = 1-tail probability |
|---|
| 1355 | for z>0, 1.0-zprob(z) = 1-tail probability |
|---|
| 1356 | for any z, 2.0*(1.0-zprob(abs(z))) = 2-tail probability |
|---|
| 1357 | Adapted from z.c in Gary Perlman's |Stat. |
|---|
| 1358 | |
|---|
| 1359 | Usage: lzprob(z) |
|---|
| 1360 | """ |
|---|
| 1361 | Z_MAX = 6.0 # maximum meaningful z-value |
|---|
| 1362 | if z == 0.0: |
|---|
| 1363 | x = 0.0 |
|---|
| 1364 | else: |
|---|
| 1365 | y = 0.5 * math.fabs(z) |
|---|
| 1366 | if y >= (Z_MAX*0.5): |
|---|
| 1367 | x = 1.0 |
|---|
| 1368 | elif (y < 1.0): |
|---|
| 1369 | w = y*y |
|---|
| 1370 | x = ((((((((0.000124818987 * w |
|---|
| 1371 | -0.001075204047) * w +0.005198775019) * w |
|---|
| 1372 | -0.019198292004) * w +0.059054035642) * w |
|---|
| 1373 | -0.151968751364) * w +0.319152932694) * w |
|---|
| 1374 | -0.531923007300) * w +0.797884560593) * y * 2.0 |
|---|
| 1375 | else: |
|---|
| 1376 | y = y - 2.0 |
|---|
| 1377 | x = (((((((((((((-0.000045255659 * y |
|---|
| 1378 | +0.000152529290) * y -0.000019538132) * y |
|---|
| 1379 | -0.000676904986) * y +0.001390604284) * y |
|---|
| 1380 | -0.000794620820) * y -0.002034254874) * y |
|---|
| 1381 | +0.006549791214) * y -0.010557625006) * y |
|---|
| 1382 | +0.011630447319) * y -0.009279453341) * y |
|---|
| 1383 | +0.005353579108) * y -0.002141268741) * y |
|---|
| 1384 | +0.000535310849) * y +0.999936657524 |
|---|
| 1385 | if z > 0.0: |
|---|
| 1386 | prob = ((x+1.0)*0.5) |
|---|
| 1387 | else: |
|---|
| 1388 | prob = ((1.0-x)*0.5) |
|---|
| 1389 | return prob |
|---|
| 1390 | |
|---|
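|  | # Illustrative usage: two-tailed p from a z-score
|---|
|  | #   z = 1.96
|---|
|  | #   p_two_tail = 2.0*(1.0-lzprob(abs(z)))   # roughly 0.05 for |z|=1.96
|---|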
| 1391 | |
|---|
| 1392 | def lksprob(alam): |
|---|
| 1393 | """ |
|---|
| 1394 | Computes a Kolmogorov-Smirnov test significance level. Adapted from
|---|
| 1395 | Numerical Recipes.
|---|
| 1396 | |
|---|
| 1397 | Usage: lksprob(alam) |
|---|
| 1398 | """ |
|---|
| 1399 | fac = 2.0 |
|---|
| 1400 | sum = 0.0 |
|---|
| 1401 | termbf = 0.0 |
|---|
| 1402 | a2 = -2.0*alam*alam |
|---|
| 1403 | for j in range(1,201): |
|---|
| 1404 | term = fac*math.exp(a2*j*j) |
|---|
| 1405 | sum = sum + term |
|---|
| 1406 | if math.fabs(term) <= (0.001*termbf) or math.fabs(term) < (1.0e-8*sum): |
|---|
| 1407 | return sum |
|---|
| 1408 | fac = -fac |
|---|
| 1409 | termbf = math.fabs(term) |
|---|
| 1410 | return 1.0 # Get here only if fails to converge; was 0.0!! |
|---|
| 1411 | |
|---|
| 1412 | |
|---|
| 1413 | def lfprob (dfnum, dfden, F): |
|---|
| 1414 | """ |
|---|
| 1415 | Returns the (1-tailed) significance level (p-value) of an F |
|---|
| 1416 | statistic given the degrees of freedom for the numerator (dfR-dfF) and |
|---|
| 1417 | the degrees of freedom for the denominator (dfF). |
|---|
| 1418 | |
|---|
| 1419 | Usage: lfprob(dfnum, dfden, F) where usually dfnum=dfbn, dfden=dfwn |
|---|
| 1420 | """ |
|---|
| 1421 | p = betai(0.5*dfden, 0.5*dfnum, dfden/float(dfden+dfnum*F)) |
|---|
| 1422 | return p |
|---|
| 1423 | |
|---|
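|  | # Illustrative usage: right-tail p of an F statistic (df values here are arbitrary)
|---|
|  | #   p = lfprob(2, 27, 4.5)   # p for F(2,27) at F=4.5
|---|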
| 1424 | |
|---|
| 1425 | def lbetacf(a,b,x): |
|---|
| 1426 | """ |
|---|
| 1427 | This function evaluates the continued fraction form of the incomplete |
|---|
| 1428 | Beta function, betai. (Adapted from: Numerical Recipes in C.)
|---|
| 1429 | |
|---|
| 1430 | Usage: lbetacf(a,b,x) |
|---|
| 1431 | """ |
|---|
| 1432 | ITMAX = 200 |
|---|
| 1433 | EPS = 3.0e-7 |
|---|
| 1434 | |
|---|
| 1435 | bm = az = am = 1.0 |
|---|
| 1436 | qab = a+b |
|---|
| 1437 | qap = a+1.0 |
|---|
| 1438 | qam = a-1.0 |
|---|
| 1439 | bz = 1.0-qab*x/qap |
|---|
| 1440 | for i in range(ITMAX+1): |
|---|
| 1441 | em = float(i+1) |
|---|
| 1442 | tem = em + em |
|---|
| 1443 | d = em*(b-em)*x/((qam+tem)*(a+tem)) |
|---|
| 1444 | ap = az + d*am |
|---|
| 1445 | bp = bz+d*bm |
|---|
| 1446 | d = -(a+em)*(qab+em)*x/((qap+tem)*(a+tem)) |
|---|
| 1447 | app = ap+d*az |
|---|
| 1448 | bpp = bp+d*bz |
|---|
| 1449 | aold = az |
|---|
| 1450 | am = ap/bpp |
|---|
| 1451 | bm = bp/bpp |
|---|
| 1452 | az = app/bpp |
|---|
| 1453 | bz = 1.0 |
|---|
| 1454 | if (abs(az-aold)<(EPS*abs(az))): |
|---|
| 1455 | return az |
|---|
| 1456 | print 'a or b too big, or ITMAX too small in Betacf.' |
|---|
| 1457 | |
|---|
| 1458 | |
|---|
| 1459 | def lgammln(xx): |
|---|
| 1460 | """ |
|---|
| 1461 | Returns the natural log of the gamma function of xx.
|---|
| 1462 | Gamma(z) = Integral(0,infinity) of t^(z-1)exp(-t) dt. |
|---|
| 1463 | (Adapted from: Numerical Recipes in C.)
|---|
| 1464 | |
|---|
| 1465 | Usage: lgammln(xx) |
|---|
| 1466 | """ |
|---|
| 1467 | |
|---|
| 1468 | coeff = [76.18009173, -86.50532033, 24.01409822, -1.231739516, |
|---|
| 1469 | 0.120858003e-2, -0.536382e-5] |
|---|
| 1470 | x = xx - 1.0 |
|---|
| 1471 | tmp = x + 5.5 |
|---|
| 1472 | tmp = tmp - (x+0.5)*math.log(tmp) |
|---|
| 1473 | ser = 1.0 |
|---|
| 1474 | for j in range(len(coeff)): |
|---|
| 1475 | x = x + 1 |
|---|
| 1476 | ser = ser + coeff[j]/x |
|---|
| 1477 | return -tmp + math.log(2.50662827465*ser) |
|---|
| 1478 | |
|---|
| 1479 | |
|---|
| 1480 | def lbetai(a,b,x): |
|---|
| 1481 | """ |
|---|
| 1482 | Returns the incomplete beta function: |
|---|
| 1483 | |
|---|
| 1484 | I-sub-x(a,b) = 1/B(a,b)*(Integral(0,x) of t^(a-1)(1-t)^(b-1) dt) |
|---|
| 1485 | |
|---|
| 1486 | where a,b>0 and B(a,b) = G(a)*G(b)/(G(a+b)) where G(a) is the gamma |
|---|
| 1487 | function of a. The continued fraction formulation is implemented here, |
|---|
| 1488 | using the betacf function. (Adapted from: Numerical Recipes in C.)
|---|
| 1489 | |
|---|
| 1490 | Usage: lbetai(a,b,x) |
|---|
| 1491 | """ |
|---|
| 1492 | if (x<0.0 or x>1.0): |
|---|
| 1493 | raise ValueError, 'Bad x in lbetai' |
|---|
| 1494 | if (x==0.0 or x==1.0): |
|---|
| 1495 | bt = 0.0 |
|---|
| 1496 | else: |
|---|
| 1497 | bt = math.exp(gammln(a+b)-gammln(a)-gammln(b)+a*math.log(x)+b* |
|---|
| 1498 | math.log(1.0-x)) |
|---|
| 1499 | if (x<(a+1.0)/(a+b+2.0)): |
|---|
| 1500 | return bt*betacf(a,b,x)/float(a) |
|---|
| 1501 | else: |
|---|
| 1502 | return 1.0-bt*betacf(b,a,1.0-x)/float(b) |
|---|
| 1503 | |
|---|
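|  | # Illustrative usage: I-sub-x(a,b); e.g., the two-tailed t probability used
|---|
|  | # elsewhere in this module is betai(0.5*df, 0.5, df/(df+t*t)) for a t-value
|---|
|  | # with df degrees of freedom.
|---|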
| 1504 | |
|---|
| 1505 | #################################### |
|---|
| 1506 | ####### ANOVA CALCULATIONS ####### |
|---|
| 1507 | #################################### |
|---|
| 1508 | |
|---|
| 1509 | def lF_oneway(*lists): |
|---|
| 1510 | """ |
|---|
| 1511 | Performs a 1-way ANOVA, returning an F-value and probability given |
|---|
| 1512 | any number of groups. From Heiman, pp.394-7. (Note: this list-version uses Numeric arrays internally, so it requires Numeric to be installed.)
|---|
| 1513 | |
|---|
| 1514 | Usage: lF_oneway(*lists) where *lists is any number of lists, one per
|---|
| 1515 | treatment group |
|---|
| 1516 | Returns: F value, one-tailed p-value |
|---|
| 1517 | """ |
|---|
| 1518 | a = len(lists) # ANOVA on 'a' groups, each in its own list
|---|
| 1519 | means = [0]*a |
|---|
| 1520 | vars = [0]*a |
|---|
| 1521 | ns = [0]*a |
|---|
| 1522 | alldata = [] |
|---|
| 1523 | tmp = map(N.array,lists) |
|---|
| 1524 | means = map(amean,tmp) |
|---|
| 1525 | vars = map(avar,tmp) |
|---|
| 1526 | ns = map(len,lists) |
|---|
| 1527 | for i in range(len(lists)): |
|---|
| 1528 | alldata = alldata + lists[i] |
|---|
| 1529 | alldata = N.array(alldata) |
|---|
| 1530 | bign = len(alldata) |
|---|
| 1531 | sstot = ass(alldata)-(asquare_of_sums(alldata)/float(bign)) |
|---|
| 1532 | ssbn = 0 |
|---|
| 1533 | for list in lists: |
|---|
| 1534 | ssbn = ssbn + asquare_of_sums(N.array(list))/float(len(list)) |
|---|
| 1535 | ssbn = ssbn - (asquare_of_sums(alldata)/float(bign)) |
|---|
| 1536 | sswn = sstot-ssbn |
|---|
| 1537 | dfbn = a-1 |
|---|
| 1538 | dfwn = bign - a |
|---|
| 1539 | msb = ssbn/float(dfbn) |
|---|
| 1540 | msw = sswn/float(dfwn) |
|---|
| 1541 | f = msb/msw |
|---|
| 1542 | prob = fprob(dfbn,dfwn,f) |
|---|
| 1543 | return f, prob |
|---|
| 1544 | |
|---|
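|  | # Illustrative usage (hypothetical treatment groups; requires Numeric, see note
|---|
|  | # in the docstring above):
|---|
|  | #   F, p = lF_oneway([3,4,5,6], [5,6,7,8], [7,8,9,10])
|---|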
| 1545 | |
|---|
| 1546 | def lF_value (ER,EF,dfnum,dfden): |
|---|
| 1547 | """ |
|---|
| 1548 | Returns an F-statistic given the following: |
|---|
| 1549 | ER = error associated with the null hypothesis (the Restricted model) |
|---|
| 1550 | EF = error associated with the alternate hypothesis (the Full model) |
|---|
| 1551 | dfR-dfF = degrees of freedom of the numerator |
|---|
| 1552 | dfF = degrees of freedom associated with the denominator/Full model |
|---|
| 1553 | |
|---|
| 1554 | Usage: lF_value(ER,EF,dfnum,dfden) |
|---|
| 1555 | """ |
|---|
| 1556 | return ((ER-EF)/float(dfnum) / (EF/float(dfden))) |
|---|
| 1557 | |
|---|
| 1558 | |
|---|
| 1559 | #################################### |
|---|
| 1560 | ######## SUPPORT FUNCTIONS ####### |
|---|
| 1561 | #################################### |
|---|
| 1562 | |
|---|
| 1563 | def writecc (listoflists,file,writetype='w',extra=2): |
|---|
| 1564 | """ |
|---|
| 1565 | Writes a list of lists to a file in columns, with each column's width
|---|
| 1566 | set to the maximum item size in that column plus 'extra' characters
|---|
| 1567 | (default 2). File-overwrite is the default.
|---|
| 1568 | |
|---|
| 1569 | Usage: writecc (listoflists,file,writetype='w',extra=2) |
|---|
| 1570 | Returns: None |
|---|
| 1571 | """ |
|---|
| 1572 | if type(listoflists[0]) not in [ListType,TupleType]: |
|---|
| 1573 | listoflists = [listoflists] |
|---|
| 1574 | outfile = open(file,writetype) |
|---|
| 1575 | rowstokill = [] |
|---|
| 1576 | list2print = copy.deepcopy(listoflists) |
|---|
| 1577 | for i in range(len(listoflists)): |
|---|
| 1578 | if listoflists[i] == ['\n'] or listoflists[i]=='\n' or listoflists[i]=='dashes': |
|---|
| 1579 | rowstokill = rowstokill + [i] |
|---|
| 1580 | rowstokill.reverse() |
|---|
| 1581 | for row in rowstokill: |
|---|
| 1582 | del list2print[row] |
|---|
| 1583 | maxsize = [0]*len(list2print[0]) |
|---|
| 1584 | for col in range(len(list2print[0])): |
|---|
| 1585 | items = pstat.colex(list2print,col) |
|---|
| 1586 | items = map(pstat.makestr,items) |
|---|
| 1587 | maxsize[col] = max(map(len,items)) + extra |
|---|
| 1588 | for row in listoflists: |
|---|
| 1589 | if row == ['\n'] or row == '\n': |
|---|
| 1590 | outfile.write('\n') |
|---|
| 1591 | elif row == ['dashes'] or row == 'dashes': |
|---|
| 1592 | dashes = [0]*len(maxsize) |
|---|
| 1593 | for j in range(len(maxsize)): |
|---|
| 1594 | dashes[j] = '-'*(maxsize[j]-2) |
|---|
| 1595 | outfile.write(pstat.lineincustcols(dashes,maxsize)) |
|---|
| 1596 | else: |
|---|
| 1597 | outfile.write(pstat.lineincustcols(row,maxsize)) |
|---|
| 1598 | outfile.write('\n') |
|---|
| 1599 | outfile.close() |
|---|
| 1600 | return None |
|---|
| 1601 | |
|---|
| 1602 | |
|---|
| 1603 | def lincr(l,cap): # to increment a list up to a max-list of 'cap' |
|---|
| 1604 | """ |
|---|
| 1605 | Simulate a counting system from an n-dimensional list. |
|---|
| 1606 | |
|---|
| 1607 | Usage: lincr(l,cap) l=list to increment, cap=max values for each list pos'n |
|---|
| 1608 | Returns: next set of values for list l, OR -1 (if overflow) |
|---|
| 1609 | """ |
|---|
| 1610 | l[0] = l[0] + 1 # e.g., [0,0,0] --> [2,4,3] (=cap) |
|---|
| 1611 | for i in range(len(l)): |
|---|
| 1612 | if l[i] > cap[i] and i < len(l)-1: # if carryover AND not done |
|---|
| 1613 | l[i] = 0 |
|---|
| 1614 | l[i+1] = l[i+1] + 1 |
|---|
| 1615 | elif l[i] > cap[i] and i == len(l)-1: # overflow past last column, must be finished |
|---|
| 1616 | l = -1 |
|---|
| 1617 | return l |
|---|
| 1618 | |
|---|
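|  | # Illustrative behavior (odometer-style counting, least-significant position first):
|---|
|  | #   lincr([0,0,0], [2,4,3])  -->  [1,0,0]
|---|
|  | #   lincr([2,0,0], [2,4,3])  -->  [0,1,0]    (carry into the next position)
|---|
|  | #   lincr([2,4,3], [2,4,3])  -->  -1         (overflow past the cap; counting done)
|---|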
| 1619 | |
|---|
| 1620 | def lsum (inlist): |
|---|
| 1621 | """ |
|---|
| 1622 | Returns the sum of the items in the passed list. |
|---|
| 1623 | |
|---|
| 1624 | Usage: lsum(inlist) |
|---|
| 1625 | """ |
|---|
| 1626 | s = 0 |
|---|
| 1627 | for item in inlist: |
|---|
| 1628 | s = s + item |
|---|
| 1629 | return s |
|---|
| 1630 | |
|---|
| 1631 | |
|---|
| 1632 | def lcumsum (inlist): |
|---|
| 1633 | """ |
|---|
| 1634 | Returns a list consisting of the cumulative sum of the items in the |
|---|
| 1635 | passed list. |
|---|
| 1636 | |
|---|
| 1637 | Usage: lcumsum(inlist) |
|---|
| 1638 | """ |
|---|
| 1639 | newlist = copy.deepcopy(inlist) |
|---|
| 1640 | for i in range(1,len(newlist)): |
|---|
| 1641 | newlist[i] = newlist[i] + newlist[i-1] |
|---|
| 1642 | return newlist |
|---|
| 1643 | |
|---|
| 1644 | |
|---|
| 1645 | def lss(inlist): |
|---|
| 1646 | """ |
|---|
| 1647 | Squares each value in the passed list, adds up these squares and |
|---|
| 1648 | returns the result. |
|---|
| 1649 | |
|---|
| 1650 | Usage: lss(inlist) |
|---|
| 1651 | """ |
|---|
| 1652 | ss = 0 |
|---|
| 1653 | for item in inlist: |
|---|
| 1654 | ss = ss + item*item |
|---|
| 1655 | return ss |
|---|
| 1656 | |
|---|
| 1657 | |
|---|
| 1658 | def lsummult (list1,list2): |
|---|
| 1659 | """ |
|---|
| 1660 | Multiplies elements in list1 and list2, element by element, and |
|---|
| 1661 | returns the sum of all resulting multiplications. Must provide equal |
|---|
| 1662 | length lists. |
|---|
| 1663 | |
|---|
| 1664 | Usage: lsummult(list1,list2) |
|---|
| 1665 | """ |
|---|
| 1666 | if len(list1) <> len(list2): |
|---|
| 1667 | raise ValueError, "Lists not equal length in summult." |
|---|
| 1668 | s = 0 |
|---|
| 1669 | for item1,item2 in pstat.abut(list1,list2): |
|---|
| 1670 | s = s + item1*item2 |
|---|
| 1671 | return s |
|---|
| 1672 | |
|---|
| 1673 | |
|---|
| 1674 | def lsumdiffsquared(x,y): |
|---|
| 1675 | """ |
|---|
| 1676 | Takes pairwise differences of the values in lists x and y, squares |
|---|
| 1677 | these differences, and returns the sum of these squares. |
|---|
| 1678 | |
|---|
| 1679 | Usage: lsumdiffsquared(x,y) |
|---|
| 1680 | Returns: sum[(x[i]-y[i])**2] |
|---|
| 1681 | """ |
|---|
| 1682 | sds = 0 |
|---|
| 1683 | for i in range(len(x)): |
|---|
| 1684 | sds = sds + (x[i]-y[i])**2 |
|---|
| 1685 | return sds |
|---|
| 1686 | |
|---|
| 1687 | |
|---|
| 1688 | def lsquare_of_sums(inlist): |
|---|
| 1689 | """ |
|---|
| 1690 | Adds the values in the passed list, squares the sum, and returns |
|---|
| 1691 | the result. |
|---|
| 1692 | |
|---|
| 1693 | Usage: lsquare_of_sums(inlist) |
|---|
| 1694 | Returns: sum(inlist[i])**2 |
|---|
| 1695 | """ |
|---|
| 1696 | s = sum(inlist) |
|---|
| 1697 | return float(s)*s |
|---|
| 1698 | |
|---|
| 1699 | |
|---|
| 1700 | def lshellsort(inlist): |
|---|
| 1701 | """ |
|---|
| 1702 | Shellsort algorithm. Sorts a 1D-list. |
|---|
| 1703 | |
|---|
| 1704 | Usage: lshellsort(inlist) |
|---|
| 1705 | Returns: sorted-inlist, sorting-index-vector (for original list) |
|---|
| 1706 | """ |
|---|
| 1707 | n = len(inlist) |
|---|
| 1708 | svec = copy.deepcopy(inlist) |
|---|
| 1709 | ivec = range(n) |
|---|
| 1710 | gap = n/2 # integer division needed |
|---|
| 1711 | while gap >0: |
|---|
| 1712 | for i in range(gap,n): |
|---|
| 1713 | for j in range(i-gap,-1,-gap): |
|---|
| 1714 | while j>=0 and svec[j]>svec[j+gap]: |
|---|
| 1715 | temp = svec[j] |
|---|
| 1716 | svec[j] = svec[j+gap] |
|---|
| 1717 | svec[j+gap] = temp |
|---|
| 1718 | itemp = ivec[j] |
|---|
| 1719 | ivec[j] = ivec[j+gap] |
|---|
| 1720 | ivec[j+gap] = itemp |
|---|
| 1721 | gap = gap / 2 # integer division needed |
|---|
| 1722 | # svec is now sorted inlist, and ivec has the order svec[i] = vec[ivec[i]] |
|---|
| 1723 | return svec, ivec |
|---|
| 1724 | |
|---|
| 1725 | |
|---|
| 1726 | def lrankdata(inlist): |
|---|
| 1727 | """ |
|---|
| 1728 | Ranks the data in inlist, dealing with ties appropriately. Assumes
|---|
| 1729 | a 1D inlist. Adapted from Gary Perlman's |Stat ranksort. |
|---|
| 1730 | |
|---|
| 1731 | Usage: lrankdata(inlist) |
|---|
| 1732 | Returns: a list of length equal to inlist, containing rank scores |
|---|
| 1733 | """ |
|---|
| 1734 | n = len(inlist) |
|---|
| 1735 | svec, ivec = shellsort(inlist) |
|---|
| 1736 | sumranks = 0 |
|---|
| 1737 | dupcount = 0 |
|---|
| 1738 | newlist = [0]*n |
|---|
| 1739 | for i in range(n): |
|---|
| 1740 | sumranks = sumranks + i |
|---|
| 1741 | dupcount = dupcount + 1 |
|---|
| 1742 | if i==n-1 or svec[i] <> svec[i+1]: |
|---|
| 1743 | averank = sumranks / float(dupcount) + 1 |
|---|
| 1744 | for j in range(i-dupcount+1,i+1): |
|---|
| 1745 | newlist[ivec[j]] = averank |
|---|
| 1746 | sumranks = 0 |
|---|
| 1747 | dupcount = 0 |
|---|
| 1748 | return newlist |
|---|
| 1749 | |
|---|
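|  | # Illustrative behavior (tied values share the mean of the ranks they span):
|---|
|  | #   lrankdata([3, 1, 4, 1])  -->  [3.0, 1.5, 4.0, 1.5]
|---|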
| 1750 | |
|---|
| 1751 | def outputpairedstats(fname,writemode,name1,n1,m1,se1,min1,max1,name2,n2,m2,se2,min2,max2,statname,stat,prob): |
|---|
| 1752 | """ |
|---|
| 1753 | Prints, or writes to a file, stats for two groups, using the name, n,
|---|
| 1754 | mean, sterr, min and max for each group, as well as the statistic name, |
|---|
| 1755 | its value, and the associated p-value. |
|---|
| 1756 | |
|---|
| 1757 | Usage: outputpairedstats(fname,writemode, |
|---|
| 1758 | name1,n1,mean1,stderr1,min1,max1, |
|---|
| 1759 | name2,n2,mean2,stderr2,min2,max2, |
|---|
| 1760 | statname,stat,prob) |
|---|
| 1761 | Returns: None |
|---|
| 1762 | """ |
|---|
| 1763 | suffix = '' # for *s after the p-value |
|---|
| 1764 | try: |
|---|
| 1765 | x = prob.shape |
|---|
| 1766 | prob = prob[0] |
|---|
| 1767 | except: |
|---|
| 1768 | pass |
|---|
| 1769 | if prob < 0.001: suffix = ' ***' |
|---|
| 1770 | elif prob < 0.01: suffix = ' **' |
|---|
| 1771 | elif prob < 0.05: suffix = ' *' |
|---|
| 1772 | title = [['Name','N','Mean','SD','Min','Max']] |
|---|
| 1773 | lofl = title+[[name1,n1,round(m1,3),round(math.sqrt(se1),3),min1,max1], |
|---|
| 1774 | [name2,n2,round(m2,3),round(math.sqrt(se2),3),min2,max2]] |
|---|
| 1775 | if type(fname)<>StringType or len(fname)==0: |
|---|
| 1776 | print |
|---|
| 1777 | print statname |
|---|
| 1778 | print |
|---|
| 1779 | pstat.printcc(lofl) |
|---|
| 1780 | print |
|---|
| 1781 | try: |
|---|
| 1782 | if stat.shape == (): |
|---|
| 1783 | stat = stat[0] |
|---|
| 1784 | if prob.shape == (): |
|---|
| 1785 | prob = prob[0] |
|---|
| 1786 | except: |
|---|
| 1787 | pass |
|---|
| 1788 | print 'Test statistic = ',round(stat,3),' p = ',round(prob,3),suffix |
|---|
| 1789 | print |
|---|
| 1790 | else: |
|---|
| 1791 | file = open(fname,writemode) |
|---|
| 1792 | file.write('\n'+statname+'\n\n') |
|---|
| 1793 | file.close() |
|---|
| 1794 | writecc(lofl,fname,'a') |
|---|
| 1795 | file = open(fname,'a') |
|---|
| 1796 | try: |
|---|
| 1797 | if stat.shape == (): |
|---|
| 1798 | stat = stat[0] |
|---|
| 1799 | if prob.shape == (): |
|---|
| 1800 | prob = prob[0] |
|---|
| 1801 | except: |
|---|
| 1802 | pass |
|---|
| 1803 | file.write(pstat.list2string(['\nTest statistic = ',round(stat,4),' p = ',round(prob,4),suffix,'\n\n'])) |
|---|
| 1804 | file.close() |
|---|
| 1805 | return None |
|---|
| 1806 | |
|---|
| 1807 | |
|---|
| 1808 | def lfindwithin (data): |
|---|
| 1809 | """ |
|---|
| 1810 | Returns an integer representing a binary vector, where 1=within- |
|---|
| 1811 | subject factor, 0=between. Input equals the entire data 2D list (i.e., |
|---|
| 1812 | column 0=random factor, column -1=measured values; those two columns are skipped).
|---|
| 1813 | Note: input data is in |Stat format ... a list of lists ("2D list") with |
|---|
| 1814 | one row per measured value, first column=subject identifier, last column= |
|---|
| 1815 | score, one in-between column per factor (these columns contain level |
|---|
| 1816 | designations on each factor). See also stats.anova.__doc__. |
|---|
| 1817 | |
|---|
| 1818 | Usage: lfindwithin(data) data in |Stat format |
|---|
| 1819 | """ |
|---|
| 1820 | |
|---|
| 1821 | numfact = len(data[0])-1 |
|---|
| 1822 | withinvec = 0 |
|---|
| 1823 | for col in range(1,numfact): |
|---|
| 1824 | examplelevel = pstat.unique(pstat.colex(data,col))[0] |
|---|
| 1825 | rows = pstat.linexand(data,col,examplelevel) # get 1 level of this factor |
|---|
| 1826 | factsubjs = pstat.unique(pstat.colex(rows,0)) |
|---|
| 1827 | allsubjs = pstat.unique(pstat.colex(data,0)) |
|---|
| 1828 | if len(factsubjs) == len(allsubjs): # every subject has a score at this level ==> within-subjects factor
|---|
| 1829 | withinvec = withinvec + (1 << col) |
|---|
| 1830 | return withinvec |
|---|
| 1831 | |
|---|
| 1832 | |
|---|
| 1833 | ######################################################### |
|---|
| 1834 | ######################################################### |
|---|
| 1835 | ####### DISPATCH LISTS AND TUPLES TO ABOVE FCNS ######### |
|---|
| 1836 | ######################################################### |
|---|
| 1837 | ######################################################### |
|---|
| 1838 | |
|---|
| 1839 | ## CENTRAL TENDENCY: |
|---|
| 1840 | geometricmean = Dispatch ( (lgeometricmean, (ListType, TupleType)), ) |
|---|
| 1841 | harmonicmean = Dispatch ( (lharmonicmean, (ListType, TupleType)), ) |
|---|
| 1842 | mean = Dispatch ( (lmean, (ListType, TupleType)), ) |
|---|
| 1843 | median = Dispatch ( (lmedian, (ListType, TupleType)), ) |
|---|
| 1844 | medianscore = Dispatch ( (lmedianscore, (ListType, TupleType)), ) |
|---|
| 1845 | mode = Dispatch ( (lmode, (ListType, TupleType)), ) |
|---|
| 1846 | |
|---|
| 1847 | ## MOMENTS: |
|---|
| 1848 | moment = Dispatch ( (lmoment, (ListType, TupleType)), ) |
|---|
| 1849 | variation = Dispatch ( (lvariation, (ListType, TupleType)), ) |
|---|
| 1850 | skew = Dispatch ( (lskew, (ListType, TupleType)), ) |
|---|
| 1851 | kurtosis = Dispatch ( (lkurtosis, (ListType, TupleType)), ) |
|---|
| 1852 | describe = Dispatch ( (ldescribe, (ListType, TupleType)), ) |
|---|
| 1853 | |
|---|
| 1854 | ## FREQUENCY STATISTICS: |
|---|
| 1855 | itemfreq = Dispatch ( (litemfreq, (ListType, TupleType)), ) |
|---|
| 1856 | scoreatpercentile = Dispatch ( (lscoreatpercentile, (ListType, TupleType)), ) |
|---|
| 1857 | percentileofscore = Dispatch ( (lpercentileofscore, (ListType, TupleType)), ) |
|---|
| 1858 | histogram = Dispatch ( (lhistogram, (ListType, TupleType)), ) |
|---|
| 1859 | cumfreq = Dispatch ( (lcumfreq, (ListType, TupleType)), ) |
|---|
| 1860 | relfreq = Dispatch ( (lrelfreq, (ListType, TupleType)), ) |
|---|
| 1861 | |
|---|
| 1862 | ## VARIABILITY: |
|---|
| 1863 | obrientransform = Dispatch ( (lobrientransform, (ListType, TupleType)), ) |
|---|
| 1864 | samplevar = Dispatch ( (lsamplevar, (ListType, TupleType)), ) |
|---|
| 1865 | samplestdev = Dispatch ( (lsamplestdev, (ListType, TupleType)), ) |
|---|
| 1866 | var = Dispatch ( (lvar, (ListType, TupleType)), ) |
|---|
| 1867 | stdev = Dispatch ( (lstdev, (ListType, TupleType)), ) |
|---|
| 1868 | sterr = Dispatch ( (lsterr, (ListType, TupleType)), ) |
|---|
| 1869 | sem = Dispatch ( (lsem, (ListType, TupleType)), ) |
|---|
| 1870 | z = Dispatch ( (lz, (ListType, TupleType)), ) |
|---|
| 1871 | zs = Dispatch ( (lzs, (ListType, TupleType)), ) |
|---|
| 1872 | |
|---|
| 1873 | ## TRIMMING FCNS: |
|---|
| 1874 | trimboth = Dispatch ( (ltrimboth, (ListType, TupleType)), ) |
|---|
| 1875 | trim1 = Dispatch ( (ltrim1, (ListType, TupleType)), ) |
|---|
| 1876 | |
|---|
| 1877 | ## CORRELATION FCNS: |
|---|
| 1878 | paired = Dispatch ( (lpaired, (ListType, TupleType)), ) |
|---|
| 1879 | pearsonr = Dispatch ( (lpearsonr, (ListType, TupleType)), ) |
|---|
| 1880 | spearmanr = Dispatch ( (lspearmanr, (ListType, TupleType)), ) |
|---|
| 1881 | pointbiserialr = Dispatch ( (lpointbiserialr, (ListType, TupleType)), ) |
|---|
| 1882 | kendalltau = Dispatch ( (lkendalltau, (ListType, TupleType)), ) |
|---|
| 1883 | linregress = Dispatch ( (llinregress, (ListType, TupleType)), ) |
|---|
| 1884 | |
|---|
| 1885 | ## INFERENTIAL STATS: |
|---|
| 1886 | ttest_1samp = Dispatch ( (lttest_1samp, (ListType, TupleType)), ) |
|---|
| 1887 | ttest_ind = Dispatch ( (lttest_ind, (ListType, TupleType)), ) |
|---|
| 1888 | ttest_rel = Dispatch ( (lttest_rel, (ListType, TupleType)), ) |
|---|
| 1889 | chisquare = Dispatch ( (lchisquare, (ListType, TupleType)), ) |
|---|
| 1890 | ks_2samp = Dispatch ( (lks_2samp, (ListType, TupleType)), ) |
|---|
| 1891 | mannwhitneyu = Dispatch ( (lmannwhitneyu, (ListType, TupleType)), ) |
|---|
| 1892 | ranksums = Dispatch ( (lranksums, (ListType, TupleType)), ) |
|---|
| 1893 | tiecorrect = Dispatch ( (ltiecorrect, (ListType, TupleType)), ) |
|---|
| 1894 | wilcoxont = Dispatch ( (lwilcoxont, (ListType, TupleType)), ) |
|---|
| 1895 | kruskalwallish = Dispatch ( (lkruskalwallish, (ListType, TupleType)), ) |
|---|
| 1896 | friedmanchisquare = Dispatch ( (lfriedmanchisquare, (ListType, TupleType)), ) |
|---|
| 1897 | |
|---|
| 1898 | ## PROBABILITY CALCS: |
|---|
| 1899 | chisqprob = Dispatch ( (lchisqprob, (IntType, FloatType)), ) |
|---|
| 1900 | zprob = Dispatch ( (lzprob, (IntType, FloatType)), ) |
|---|
| 1901 | ksprob = Dispatch ( (lksprob, (IntType, FloatType)), ) |
|---|
| 1902 | fprob = Dispatch ( (lfprob, (IntType, FloatType)), ) |
|---|
| 1903 | betacf = Dispatch ( (lbetacf, (IntType, FloatType)), ) |
|---|
| 1904 | betai = Dispatch ( (lbetai, (IntType, FloatType)), ) |
|---|
| 1905 | erfcc = Dispatch ( (lerfcc, (IntType, FloatType)), ) |
|---|
| 1906 | gammln = Dispatch ( (lgammln, (IntType, FloatType)), ) |
|---|
| 1907 | |
|---|
| 1908 | ## ANOVA FUNCTIONS: |
|---|
| 1909 | F_oneway = Dispatch ( (lF_oneway, (ListType, TupleType)), ) |
|---|
| 1910 | F_value = Dispatch ( (lF_value, (ListType, TupleType)), ) |
|---|
| 1911 | |
|---|
| 1912 | ## SUPPORT FUNCTIONS: |
|---|
| 1913 | incr = Dispatch ( (lincr, (ListType, TupleType)), ) |
|---|
| 1914 | sum = Dispatch ( (lsum, (ListType, TupleType)), ) |
|---|
| 1915 | cumsum = Dispatch ( (lcumsum, (ListType, TupleType)), ) |
|---|
| 1916 | ss = Dispatch ( (lss, (ListType, TupleType)), ) |
|---|
| 1917 | summult = Dispatch ( (lsummult, (ListType, TupleType)), ) |
|---|
| 1918 | square_of_sums = Dispatch ( (lsquare_of_sums, (ListType, TupleType)), ) |
|---|
| 1919 | sumdiffsquared = Dispatch ( (lsumdiffsquared, (ListType, TupleType)), ) |
|---|
| 1920 | shellsort = Dispatch ( (lshellsort, (ListType, TupleType)), ) |
|---|
| 1921 | rankdata = Dispatch ( (lrankdata, (ListType, TupleType)), ) |
|---|
| 1922 | findwithin = Dispatch ( (lfindwithin, (ListType, TupleType)), ) |
|---|
| 1923 | |
|---|
| 1924 | |
|---|
| 1925 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1926 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1927 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1928 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1929 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1930 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1931 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1932 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1933 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1934 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1935 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1936 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1937 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1938 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1939 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1940 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1941 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1942 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1943 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1944 | |
|---|
| 1945 | try: # DEFINE THESE *ONLY* IF NUMERIC IS AVAILABLE |
|---|
| 1946 | import Numeric |
|---|
| 1947 | N = Numeric |
|---|
| 1948 | import LinearAlgebra |
|---|
| 1949 | LA = LinearAlgebra |
|---|
| 1950 | |
|---|
| 1951 | |
|---|
| 1952 | ##################################### |
|---|
| 1953 | ######## ACENTRAL TENDENCY ######## |
|---|
| 1954 | ##################################### |
|---|
| 1955 | |
|---|
| 1956 | def ageometricmean (inarray,dimension=None,keepdims=0): |
|---|
| 1957 | """ |
|---|
| 1958 | Calculates the geometric mean of the values in the passed array. |
|---|
| 1959 | That is: n-th root of (x1 * x2 * ... * xn). Defaults to ALL values in |
|---|
| 1960 | the passed array. Use dimension=None to flatten array first. REMEMBER: if |
|---|
| 1961 | dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and |
|---|
| 1962 | if dimension is a sequence, it collapses over all specified dimensions. If |
|---|
| 1963 | keepdims is set to 1, the resulting array will have as many dimensions as |
|---|
| 1964 | inarray, with only 1 'level' per dim that was collapsed over. |
|---|
| 1965 | |
|---|
| 1966 | Usage: ageometricmean(inarray,dimension=None,keepdims=0) |
|---|
| 1967 | Returns: geometric mean computed over dim(s) listed in dimension |
|---|
| 1968 | """ |
|---|
| 1969 | inarray = N.array(inarray,N.Float) |
|---|
| 1970 | if dimension == None: |
|---|
| 1971 | inarray = N.ravel(inarray) |
|---|
| 1972 | size = len(inarray) |
|---|
| 1973 | mult = N.power(inarray,1.0/size) |
|---|
| 1974 | mult = N.multiply.reduce(mult) |
|---|
| 1975 | elif type(dimension) in [IntType,FloatType]: |
|---|
| 1976 | size = inarray.shape[dimension] |
|---|
| 1977 | mult = N.power(inarray,1.0/size) |
|---|
| 1978 | mult = N.multiply.reduce(mult,dimension) |
|---|
| 1979 | if keepdims == 1: |
|---|
| 1980 | shp = list(inarray.shape) |
|---|
| 1981 | shp[dimension] = 1 |
|---|
| 1982 | mult = N.reshape(mult,shp)
|---|
| 1983 | else: # must be a SEQUENCE of dims to average over |
|---|
| 1984 | dims = list(dimension) |
|---|
| 1985 | dims.sort() |
|---|
| 1986 | dims.reverse() |
|---|
| 1987 | size = N.array(N.multiply.reduce(N.take(inarray.shape,dims)),N.Float) |
|---|
| 1988 | mult = N.power(inarray,1.0/size) |
|---|
| 1989 | for dim in dims: |
|---|
| 1990 | mult = N.multiply.reduce(mult,dim) |
|---|
| 1991 | if keepdims == 1: |
|---|
| 1992 | shp = list(inarray.shape) |
|---|
| 1993 | for dim in dims: |
|---|
| 1994 | shp[dim] = 1 |
|---|
| 1995 | mult = N.reshape(mult,shp) |
|---|
| 1996 | return mult |
|---|
| 1997 | |
|---|
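|  | # Illustrative usage (hypothetical 2x3 array; requires Numeric):
|---|
|  | #   a = N.array([[1.0, 2.0, 4.0], [2.0, 4.0, 8.0]])
|---|
|  | #   ageometricmean(a)               # single value over all 6 entries
|---|
|  | #   ageometricmean(a, dimension=0)  # collapses over rows: one value per column
|---|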
| 1998 | |
|---|
| 1999 | def aharmonicmean (inarray,dimension=None,keepdims=0): |
|---|
| 2000 | """ |
|---|
| 2001 | Calculates the harmonic mean of the values in the passed array. |
|---|
| 2002 | That is: n / (1/x1 + 1/x2 + ... + 1/xn). Defaults to ALL values in |
|---|
| 2003 | the passed array. Use dimension=None to flatten array first. REMEMBER: if |
|---|
| 2004 | dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and |
|---|
| 2005 | if dimension is a sequence, it collapses over all specified dimensions. If |
|---|
| 2006 | keepdims is set to 1, the resulting array will have as many dimensions as |
|---|
| 2007 | inarray, with only 1 'level' per dim that was collapsed over. |
|---|
| 2008 | |
|---|
| 2009 | Usage: aharmonicmean(inarray,dimension=None,keepdims=0) |
|---|
| 2010 | Returns: harmonic mean computed over dim(s) in dimension |
|---|
| 2011 | """ |
|---|
| 2012 | inarray = inarray.astype(N.Float) |
|---|
| 2013 | if dimension == None: |
|---|
| 2014 | inarray = N.ravel(inarray) |
|---|
| 2015 | size = len(inarray) |
|---|
| 2016 | s = N.add.reduce(1.0 / inarray) |
|---|
| 2017 | elif type(dimension) in [IntType,FloatType]: |
|---|
| 2018 | size = float(inarray.shape[dimension]) |
|---|
| 2019 | s = N.add.reduce(1.0/inarray, dimension) |
|---|
| 2020 | if keepdims == 1: |
|---|
| 2021 | shp = list(inarray.shape) |
|---|
| 2022 | shp[dimension] = 1 |
|---|
| 2023 | s = N.reshape(s,shp) |
|---|
| 2024 | else: # must be a SEQUENCE of dims to average over |
|---|
| 2025 | dims = list(dimension) |
|---|
| 2026 | dims.sort() |
|---|
| 2027 | nondims = [] |
|---|
| 2028 | for i in range(len(inarray.shape)): |
|---|
| 2029 | if i not in dims: |
|---|
| 2030 | nondims.append(i) |
|---|
| 2031 | tinarray = N.transpose(inarray,nondims+dims) # put keep-dims first |
|---|
| 2032 | idx = [0] *len(nondims) |
|---|
| 2033 | if idx == []: |
|---|
| 2034 | size = len(N.ravel(inarray)) |
|---|
| 2035 | s = asum(1.0 / inarray) |
|---|
| 2036 | if keepdims == 1: |
|---|
| 2037 | s = N.reshape([s],N.ones(len(inarray.shape))) |
|---|
| 2038 | else: |
|---|
| 2039 | idx[0] = -1 |
|---|
| 2040 | loopcap = N.array(tinarray.shape[0:len(nondims)]) -1 |
|---|
| 2041 | s = N.zeros(loopcap+1,N.Float) |
|---|
| 2042 | while incr(idx,loopcap) <> -1: |
|---|
| 2043 | s[idx] = asum(1.0/tinarray[idx]) |
|---|
| 2044 | size = N.multiply.reduce(N.take(inarray.shape,dims)) |
|---|
| 2045 | if keepdims == 1: |
|---|
| 2046 | shp = list(inarray.shape) |
|---|
| 2047 | for dim in dims: |
|---|
| 2048 | shp[dim] = 1 |
|---|
| 2049 | s = N.reshape(s,shp) |
|---|
| 2050 | return size / s |
|---|
| 2051 | |
|---|
| 2052 | |
|---|
| 2053 | def amean (inarray,dimension=None,keepdims=0): |
|---|
| 2054 | """ |
|---|
| 2055 | Calculates the arithmetic mean of the values in the passed array.
|---|
| 2056 | That is: 1/n * (x1 + x2 + ... + xn). Defaults to ALL values in the |
|---|
| 2057 | passed array. Use dimension=None to flatten array first. REMEMBER: if |
|---|
| 2058 | dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and |
|---|
| 2059 | if dimension is a sequence, it collapses over all specified dimensions. If |
|---|
| 2060 | keepdims is set to 1, the resulting array will have as many dimensions as |
|---|
| 2061 | inarray, with only 1 'level' per dim that was collapsed over. |
|---|
| 2062 | |
|---|
| 2063 | Usage: amean(inarray,dimension=None,keepdims=0) |
|---|
| 2064 | Returns: arithmetic mean calculated over dim(s) in dimension
|---|
| 2065 | """ |
|---|
| 2066 | if inarray.typecode() in ['l','s','b']: |
|---|
| 2067 | inarray = inarray.astype(N.Float) |
|---|
| 2068 | if dimension == None: |
|---|
| 2069 | inarray = N.ravel(inarray) |
|---|
| 2070 | sum = N.add.reduce(inarray) |
|---|
| 2071 | denom = float(len(inarray)) |
|---|
| 2072 | elif type(dimension) in [IntType,FloatType]: |
|---|
| 2073 | sum = asum(inarray,dimension) |
|---|
| 2074 | denom = float(inarray.shape[dimension]) |
|---|
| 2075 | if keepdims == 1: |
|---|
| 2076 | shp = list(inarray.shape) |
|---|
| 2077 | shp[dimension] = 1 |
|---|
| 2078 | sum = N.reshape(sum,shp) |
|---|
| 2079 | else: # must be a TUPLE of dims to average over |
|---|
| 2080 | dims = list(dimension) |
|---|
| 2081 | dims.sort() |
|---|
| 2082 | dims.reverse() |
|---|
| 2083 | sum = inarray *1.0 |
|---|
| 2084 | for dim in dims: |
|---|
| 2085 | sum = N.add.reduce(sum,dim) |
|---|
| 2086 | denom = N.array(N.multiply.reduce(N.take(inarray.shape,dims)),N.Float) |
|---|
| 2087 | if keepdims == 1: |
|---|
| 2088 | shp = list(inarray.shape) |
|---|
| 2089 | for dim in dims: |
|---|
| 2090 | shp[dim] = 1 |
|---|
| 2091 | sum = N.reshape(sum,shp) |
|---|
| 2092 | return sum/denom |
|---|
| 2093 | |
|---|
| 2094 | |
|---|
| 2095 | def amedian (inarray,numbins=1000): |
|---|
| 2096 | """ |
|---|
| 2097 | Calculates the COMPUTED median value of an array of numbers, given the |
|---|
| 2098 | number of bins to use for the histogram (more bins approaches finding the |
|---|
| 2099 | precise median value of the array; default number of bins = 1000). From |
|---|
| 2100 | G.W. Heiman's Basic Stats, or CRC Probability & Statistics. |
|---|
| 2101 | NOTE: THIS ROUTINE ALWAYS uses the entire passed array (flattens it first). |
|---|
| 2102 | |
|---|
| 2103 | Usage: amedian(inarray,numbins=1000) |
|---|
| 2104 | Returns: median calculated over ALL values in inarray |
|---|
| 2105 | """ |
|---|
| 2106 | inarray = N.ravel(inarray) |
|---|
| 2107 | (hist, smallest, binsize, extras) = ahistogram(inarray,numbins) |
|---|
| 2108 | cumhist = N.cumsum(hist) # make cumulative histogram |
|---|
| 2109 | otherbins = N.greater_equal(cumhist,len(inarray)/2.0) |
|---|
| 2110 | otherbins = list(otherbins) # list of 0/1s, 1s start at median bin |
|---|
| 2111 | cfbin = otherbins.index(1) # get 1st(!) index holding 50%ile score |
|---|
| 2112 | LRL = smallest + binsize*cfbin # get lower real limit of that bin
|---|
| 2113 | cfbelow = N.add.reduce(hist[0:cfbin]) # cum. freq. below bin |
|---|
| 2114 | freq = hist[cfbin] # frequency IN the 50%ile bin |
|---|
| 2115 | median = LRL + ((len(inarray)/2.0-cfbelow)/float(freq))*binsize # MEDIAN |
|---|
| 2116 | return median |
|---|
| 2117 | |
|---|
| 2118 | |
|---|
| 2119 | def amedianscore (inarray,dimension=None): |
|---|
| 2120 | """ |
|---|
| 2121 | Returns the 'middle' score of the passed array. If there is an even |
|---|
| 2122 | number of scores, the mean of the 2 middle scores is returned. Can function |
|---|
| 2123 | with 1D arrays, or on the FIRST dimension of 2D arrays (i.e., dimension can |
|---|
| 2124 | be None, to pre-flatten the array, or else dimension must equal 0). |
|---|
| 2125 | |
|---|
| 2126 | Usage: amedianscore(inarray,dimension=None) |
|---|
| 2127 | Returns: 'middle' score of the array, or the mean of the 2 middle scores |
|---|
| 2128 | """ |
|---|
| 2129 | if dimension == None: |
|---|
| 2130 | inarray = N.ravel(inarray) |
|---|
| 2131 | dimension = 0 |
|---|
| 2132 | inarray = N.sort(inarray,dimension) |
|---|
| 2133 | if inarray.shape[dimension] % 2 == 0: # if even number of elements |
|---|
| 2134 | indx = inarray.shape[dimension]/2 # integer division correct |
|---|
| 2135 | median = N.asarray(inarray[indx]+inarray[indx-1]) / 2.0 |
|---|
| 2136 | else: |
|---|
| 2137 | indx = inarray.shape[dimension] / 2 # integer division correct |
|---|
| 2138 | median = N.take(inarray,[indx],dimension) |
|---|
| 2139 | if median.shape == (1,): |
|---|
| 2140 | median = median[0] |
|---|
| 2141 | return median |
|---|
| 2142 | |
|---|
| 2143 | |
|---|
| 2144 | def amode(a, dimension=None): |
|---|
| 2145 | """ |
|---|
| 2146 | Returns an array of the modal (most common) score in the passed array. |
|---|
| 2147 | If there is more than one such score, ONLY THE FIRST is returned. |
|---|
| 2148 | The bin-count for the modal values is also returned. Operates on whole |
|---|
| 2149 | array (dimension=None), or on a given dimension. |
|---|
| 2150 | |
|---|
| 2151 | Usage: amode(a, dimension=None) |
|---|
| 2152 | Returns: array of bin-counts for mode(s), array of corresponding modal values |
|---|
| 2153 | """ |
|---|
| 2154 | |
|---|
| 2155 | if dimension == None: |
|---|
| 2156 | a = N.ravel(a) |
|---|
| 2157 | dimension = 0 |
|---|
| 2158 | scores = pstat.aunique(N.ravel(a)) # get ALL unique values |
|---|
| 2159 | testshape = list(a.shape) |
|---|
| 2160 | testshape[dimension] = 1 |
|---|
| 2161 | oldmostfreq = N.zeros(testshape) |
|---|
| 2162 | oldcounts = N.zeros(testshape) |
|---|
| 2163 | for score in scores: |
|---|
| 2164 | template = N.equal(a,score) |
|---|
| 2165 | counts = asum(template,dimension,1) |
|---|
| 2166 | mostfrequent = N.where(N.greater(counts,oldcounts),score,oldmostfreq) |
|---|
| 2167 | oldcounts = N.where(N.greater(counts,oldcounts),counts,oldcounts) |
|---|
| 2168 | oldmostfreq = mostfrequent |
|---|
| 2169 | return oldcounts, mostfrequent |
|---|
| 2170 | |
|---|
| 2171 | |
|---|
| 2172 | def atmean(a,limits=None,inclusive=(1,1)): |
|---|
| 2173 | """ |
|---|
| 2174 | Returns the arithmetic mean of all values in an array, ignoring values |
|---|
| 2175 | strictly outside the sequence passed to 'limits'. Note: either limit |
|---|
| 2176 | in the sequence, or the value of limits itself, can be set to None. The |
|---|
| 2177 | inclusive list/tuple determines whether the lower and upper limiting bounds |
|---|
| 2178 | (respectively) are open/exclusive (0) or closed/inclusive (1). |
|---|
| 2179 | |
|---|
| 2180 | Usage: atmean(a,limits=None,inclusive=(1,1)) |
|---|
| 2181 | """ |
|---|
| 2182 | if a.typecode() in ['l','s','b']: |
|---|
| 2183 | a = a.astype(N.Float) |
|---|
| 2184 | if limits == None: |
|---|
| 2185 | return mean(a) |
|---|
| 2186 | assert type(limits) in [ListType,TupleType,N.ArrayType], "Wrong type for limits in atmean" |
|---|
| 2187 | if inclusive[0]: lowerfcn = N.greater_equal |
|---|
| 2188 | else: lowerfcn = N.greater |
|---|
| 2189 | if inclusive[1]: upperfcn = N.less_equal |
|---|
| 2190 | else: upperfcn = N.less |
|---|
| 2191 | if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)): |
|---|
| 2192 | raise ValueError, "No array values within given limits (atmean)." |
|---|
| 2193 | elif limits[0]==None and limits[1]<>None: |
|---|
| 2194 | mask = upperfcn(a,limits[1]) |
|---|
| 2195 | elif limits[0]<>None and limits[1]==None: |
|---|
| 2196 | mask = lowerfcn(a,limits[0]) |
|---|
| 2197 | elif limits[0]<>None and limits[1]<>None: |
|---|
| 2198 | mask = lowerfcn(a,limits[0])*upperfcn(a,limits[1]) |
|---|
| 2199 | s = float(N.add.reduce(N.ravel(a*mask))) |
|---|
| 2200 | n = float(N.add.reduce(N.ravel(mask))) |
|---|
| 2201 | return s/n |
|---|
| 2202 | |
|---|
| 2203 | |
|---|
| 2204 | def atvar(a,limits=None,inclusive=(1,1)): |
|---|
| 2205 | """ |
|---|
| 2206 | Returns the sample variance of values in an array, (i.e., using N-1), |
|---|
| 2207 | ignoring values strictly outside the sequence passed to 'limits'. |
|---|
| 2208 | Note: either limit in the sequence, or the value of limits itself, |
|---|
| 2209 | can be set to None. The inclusive list/tuple determines whether the lower |
|---|
| 2210 | and upper limiting bounds (respectively) are open/exclusive (0) or |
|---|
| 2211 | closed/inclusive (1). |
|---|
| 2212 | |
|---|
| 2213 | Usage: atvar(a,limits=None,inclusive=(1,1)) |
|---|
| 2214 | """ |
|---|
| 2215 | a = a.astype(N.Float) |
|---|
| 2216 | if limits == None or limits == [None,None]: |
|---|
| 2217 | term1 = N.add.reduce(N.ravel(a*a)) |
|---|
| 2218 | n = float(len(N.ravel(a))) - 1 |
|---|
| 2219 | term2 = N.add.reduce(N.ravel(a))**2 / n |
|---|
| 2220 | # print term1, term2, n   # debug output (disabled)
|---|
| 2221 | return (term1 - term2) / n |
|---|
| 2222 | assert type(limits) in [ListType,TupleType,N.ArrayType], "Wrong type for limits in atvar" |
|---|
| 2223 | if inclusive[0]: lowerfcn = N.greater_equal |
|---|
| 2224 | else: lowerfcn = N.greater |
|---|
| 2225 | if inclusive[1]: upperfcn = N.less_equal |
|---|
| 2226 | else: upperfcn = N.less |
|---|
| 2227 | if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)): |
|---|
| 2228 | raise ValueError, "No array values within given limits (atvar)." |
|---|
| 2229 | elif limits[0]==None and limits[1]<>None: |
|---|
| 2230 | mask = upperfcn(a,limits[1]) |
|---|
| 2231 | elif limits[0]<>None and limits[1]==None: |
|---|
| 2232 | mask = lowerfcn(a,limits[0]) |
|---|
| 2233 | elif limits[0]<>None and limits[1]<>None: |
|---|
| 2234 | mask = lowerfcn(a,limits[0])*upperfcn(a,limits[1]) |
|---|
| 2235 | term1 = N.add.reduce(N.ravel(a*a*mask)) |
|---|
| 2236 | n = float(N.add.reduce(N.ravel(mask))) - 1 |
|---|
| 2237 | term2 = N.add.reduce(N.ravel(a*mask))**2 / n |
|---|
| 2238 | # print term1, term2, n   # debug output (disabled)
|---|
| 2239 | return (term1 - term2) / n |
|---|
| 2240 | |
|---|
| 2241 | |
|---|
| 2242 | def atmin(a,lowerlimit=None,dimension=None,inclusive=1): |
|---|
| 2243 | """ |
|---|
| 2244 | Returns the minimum value of a, along dimension, including only values greater
|---|
| 2245 | than (or equal to, if inclusive=1) lowerlimit. If the limit is set to None, |
|---|
| 2246 | all values in the array are used. |
|---|
| 2247 | |
|---|
| 2248 | Usage: atmin(a,lowerlimit=None,dimension=None,inclusive=1) |
|---|
| 2249 | """ |
|---|
| 2250 | if inclusive: lowerfcn = N.greater |
|---|
| 2251 | else: lowerfcn = N.greater_equal |
|---|
| 2252 | if dimension == None: |
|---|
| 2253 | a = N.ravel(a) |
|---|
| 2254 | dimension = 0 |
|---|
| 2255 | if lowerlimit == None: |
|---|
| 2256 | lowerlimit = N.minimum.reduce(N.ravel(a))-11 |
|---|
| 2257 | biggest = N.maximum.reduce(N.ravel(a)) |
|---|
| 2258 | ta = N.where(lowerfcn(a,lowerlimit),a,biggest) |
|---|
| 2259 | return N.minimum.reduce(ta,dimension) |
|---|
| 2260 | |
|---|
| 2261 | |
|---|
| 2262 | def atmax(a,upperlimit,dimension=None,inclusive=1): |
|---|
| 2263 | """ |
|---|
| 2264 | Returns the maximum value of a, along dimension, including only values less
|---|
| 2265 | than (or equal to, if inclusive=1) upperlimit. If the limit is set to None, |
|---|
| 2266 | a limit larger than the max value in the array is used. |
|---|
| 2267 | |
|---|
| 2268 | Usage: atmax(a,upperlimit,dimension=None,inclusive=1) |
|---|
| 2269 | """ |
|---|
| 2270 | if inclusive: upperfcn = N.less |
|---|
| 2271 | else: upperfcn = N.less_equal |
|---|
| 2272 | if dimension == None: |
|---|
| 2273 | a = N.ravel(a) |
|---|
| 2274 | dimension = 0 |
|---|
| 2275 | if upperlimit == None: |
|---|
| 2276 | upperlimit = N.maximum.reduce(N.ravel(a))+1 |
|---|
| 2277 | smallest = N.minimum.reduce(N.ravel(a)) |
|---|
| 2278 | ta = N.where(upperfcn(a,upperlimit),a,smallest) |
|---|
| 2279 | return N.maximum.reduce(ta,dimension) |
|---|
| 2280 | |
|---|
| 2281 | |
|---|
| 2282 | def atstdev(a,limits=None,inclusive=(1,1)): |
|---|
| 2283 | """ |
|---|
| 2284 | Returns the standard deviation of all values in an array, ignoring values |
|---|
| 2285 | strictly outside the sequence passed to 'limits'. Note: either limit |
|---|
| 2286 | in the sequence, or the value of limits itself, can be set to None. The |
|---|
| 2287 | inclusive list/tuple determines whether the lower and upper limiting bounds |
|---|
| 2288 | (respectively) are open/exclusive (0) or closed/inclusive (1). |
|---|
| 2289 | |
|---|
| 2290 | Usage: atstdev(a,limits=None,inclusive=(1,1)) |
|---|
| 2291 | """ |
|---|
| 2292 | return N.sqrt(tvar(a,limits,inclusive)) |
|---|
| 2293 | |
|---|
| 2294 | |
|---|
| 2295 | def atsem(a,limits=None,inclusive=(1,1)): |
|---|
| 2296 | """ |
|---|
| 2297 | Returns the standard error of the mean for the values in an array, |
|---|
| 2298 | (i.e., using N for the denominator), ignoring values strictly outside |
|---|
| 2299 | the sequence passed to 'limits'. Note: either limit in the sequence, |
|---|
| 2300 | or the value of limits itself, can be set to None. The inclusive list/tuple |
|---|
| 2301 | determines whether the lower and upper limiting bounds (respectively) are |
|---|
| 2302 | open/exclusive (0) or closed/inclusive (1). |
|---|
| 2303 | |
|---|
| 2304 | Usage: atsem(a,limits=None,inclusive=(1,1)) |
|---|
| 2305 | """ |
|---|
| 2306 | sd = tstdev(a,limits,inclusive) |
|---|
| 2307 | if limits == None or limits == [None,None]: |
|---|
| 2308 | return sd/math.sqrt(float(len(N.ravel(a))))  # no limits given: use all values
|---|
| 2309 | assert type(limits) in [ListType,TupleType,N.ArrayType], "Wrong type for limits in atsem" |
|---|
| 2310 | if inclusive[0]: lowerfcn = N.greater_equal |
|---|
| 2311 | else: lowerfcn = N.greater |
|---|
| 2312 | if inclusive[1]: upperfcn = N.less_equal |
|---|
| 2313 | else: upperfcn = N.less |
|---|
| 2314 | if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)): |
|---|
| 2315 | raise ValueError, "No array values within given limits (atsem)." |
|---|
| 2316 | elif limits[0]==None and limits[1]<>None: |
|---|
| 2317 | mask = upperfcn(a,limits[1]) |
|---|
| 2318 | elif limits[0]<>None and limits[1]==None: |
|---|
| 2319 | mask = lowerfcn(a,limits[0]) |
|---|
| 2320 | elif limits[0]<>None and limits[1]<>None: |
|---|
| 2321 | mask = lowerfcn(a,limits[0])*upperfcn(a,limits[1]) |
|---|
| 2322 | term1 = N.add.reduce(N.ravel(a*a*mask)) |
|---|
| 2323 | n = float(N.add.reduce(N.ravel(mask))) |
|---|
| 2324 | return sd/math.sqrt(n) |
|---|
| 2325 | |
|---|
| 2326 | |
|---|
| 2327 | ##################################### |
|---|
| 2328 | ############ AMOMENTS ############# |
|---|
| 2329 | ##################################### |
|---|
| 2330 | |
|---|
| 2331 | def amoment(a,moment=1,dimension=None): |
|---|
| 2332 | """ |
|---|
| 2333 | Calculates the nth moment about the mean for a sample (defaults to the |
|---|
| 2334 | 1st moment). Generally used to calculate coefficients of skewness and |
|---|
| 2335 | kurtosis. Dimension can equal None (ravel array first), an integer |
|---|
| 2336 | (the dimension over which to operate), or a sequence (operate over |
|---|
| 2337 | multiple dimensions). |
|---|
| 2338 | |
|---|
| 2339 | Usage: amoment(a,moment=1,dimension=None) |
|---|
| 2340 | Returns: appropriate moment along given dimension |
|---|
| 2341 | """ |
|---|
| 2342 | if dimension == None: |
|---|
| 2343 | a = N.ravel(a) |
|---|
| 2344 | dimension = 0 |
|---|
| 2345 | if moment == 1: |
|---|
| 2346 | return 0.0 |
|---|
| 2347 | else: |
|---|
| 2348 | mn = amean(a,dimension,1) # 1=keepdims |
|---|
| 2349 | s = N.power((a-mn),moment) |
|---|
| 2350 | return amean(s,dimension) |
|---|
| 2351 | |
|---|
| 2352 | |
|---|
| 2353 | def avariation(a,dimension=None): |
|---|
| 2354 | """ |
|---|
| 2355 | Returns the coefficient of variation, as defined in CRC Standard |
|---|
| 2356 | Probability and Statistics, p.6. Dimension can equal None (ravel array |
|---|
| 2357 | first), an integer (the dimension over which to operate), or a |
|---|
| 2358 | sequence (operate over multiple dimensions). |
|---|
| 2359 | |
|---|
| 2360 | Usage: avariation(a,dimension=None) |
|---|
| 2361 | """ |
|---|
| 2362 | return 100.0*asamplestdev(a,dimension)/amean(a,dimension) |
|---|
| 2363 | |
|---|
| 2364 | |
|---|
| 2365 | def askew(a,dimension=None): |
|---|
| 2366 | """ |
|---|
| 2367 | Returns the skewness of a distribution (normal ==> 0.0; >0 means extra |
|---|
| 2368 | weight in the right tail). Use askewtest() to see if it's close enough.
|---|
| 2369 | Dimension can equal None (ravel array first), an integer (the |
|---|
| 2370 | dimension over which to operate), or a sequence (operate over multiple |
|---|
| 2371 | dimensions). |
|---|
| 2372 | |
|---|
| 2373 | Usage: askew(a, dimension=None) |
|---|
| 2374 | Returns: skew of vals in a along dimension, returning ZERO where all vals equal |
|---|
| 2375 | """ |
|---|
| 2376 | denom = N.power(amoment(a,2,dimension),1.5) |
|---|
| 2377 | zero = N.equal(denom,0) |
|---|
| 2378 | if type(denom) == N.ArrayType and asum(zero) <> 0: |
|---|
| 2379 | print "Number of zeros in askew: ",asum(zero) |
|---|
| 2380 | denom = denom + zero # prevent divide-by-zero |
|---|
| 2381 | return N.where(zero, 0, amoment(a,3,dimension)/denom) |
|---|
| 2382 | |
|---|
| 2383 | |
|---|
| 2384 | def akurtosis(a,dimension=None): |
|---|
| 2385 | """ |
|---|
| 2386 | Returns the kurtosis of a distribution (normal ==> 3.0; >3 means |
|---|
| 2387 | heavier in the tails, and usually more peaked). Use akurtosistest() |
|---|
| 2388 | to see if it's close enough. Dimension can equal None (ravel array |
|---|
| 2389 | first), an integer (the dimension over which to operate), or a |
|---|
| 2390 | sequence (operate over multiple dimensions). |
|---|
| 2391 | |
|---|
| 2392 | Usage: akurtosis(a,dimension=None) |
|---|
| 2393 | Returns: kurtosis of values in a along dimension, and ZERO where all vals equal |
|---|
| 2394 | """ |
|---|
| 2395 | denom = N.power(amoment(a,2,dimension),2) |
|---|
| 2396 | zero = N.equal(denom,0) |
|---|
| 2397 | if type(denom) == N.ArrayType and asum(zero) <> 0: |
|---|
| 2398 | print "Number of zeros in akurtosis: ",asum(zero) |
|---|
| 2399 | denom = denom + zero # prevent divide-by-zero |
|---|
| 2400 | return N.where(zero,0,amoment(a,4,dimension)/denom) |
|---|
| 2401 | |
|---|
| 2402 | |
|---|
| 2403 | def adescribe(inarray,dimension=None): |
|---|
| 2404 | """ |
|---|
| 2405 | Returns several descriptive statistics of the passed array. Dimension |
|---|
| 2406 | can equal None (ravel array first), an integer (the dimension over |
|---|
| 2407 | which to operate), or a sequence (operate over multiple dimensions). |
|---|
| 2408 | |
|---|
| 2409 | Usage: adescribe(inarray,dimension=None) |
|---|
| 2410 | Returns: n, (min,max), mean, standard deviation, skew, kurtosis |
|---|
| 2411 | """ |
|---|
| 2412 | if dimension == None: |
|---|
| 2413 | inarray = N.ravel(inarray) |
|---|
| 2414 | dimension = 0 |
|---|
| 2415 | n = inarray.shape[dimension] |
|---|
| 2416 | mm = (N.minimum.reduce(inarray),N.maximum.reduce(inarray)) |
|---|
| 2417 | m = amean(inarray,dimension) |
|---|
| 2418 | sd = astdev(inarray,dimension) |
|---|
| 2419 | skew = askew(inarray,dimension) |
|---|
| 2420 | kurt = akurtosis(inarray,dimension) |
|---|
| 2421 | return n, mm, m, sd, skew, kurt |
|---|
| 2422 | |
|---|
| 2423 | |
|---|
| 2424 | ##################################### |
|---|
| 2425 | ######## NORMALITY TESTS ########## |
|---|
| 2426 | ##################################### |
|---|
| 2427 | |
|---|
| 2428 | def askewtest(a,dimension=None): |
|---|
| 2429 | """ |
|---|
| 2430 | Tests whether the skew is significantly different from a normal |
|---|
| 2431 | distribution. Dimension can equal None (ravel array first), an |
|---|
| 2432 | integer (the dimension over which to operate), or a sequence (operate |
|---|
| 2433 | over multiple dimensions). |
|---|
| 2434 | |
|---|
| 2435 | Usage: askewtest(a,dimension=None) |
|---|
| 2436 | Returns: z-score and 2-tail z-probability |
|---|
| 2437 | """ |
|---|
| 2438 | if dimension == None: |
|---|
| 2439 | a = N.ravel(a) |
|---|
| 2440 | dimension = 0 |
|---|
| 2441 | b2 = askew(a,dimension) |
|---|
| 2442 | n = float(a.shape[dimension]) |
|---|
| 2443 | y = b2 * N.sqrt(((n+1)*(n+3)) / (6.0*(n-2)) ) |
|---|
| 2444 | beta2 = ( 3.0*(n*n+27*n-70)*(n+1)*(n+3) ) / ( (n-2.0)*(n+5)*(n+7)*(n+9) ) |
|---|
| 2445 | W2 = -1 + N.sqrt(2*(beta2-1)) |
|---|
| 2446 | delta = 1/N.sqrt(N.log(N.sqrt(W2))) |
|---|
| 2447 | alpha = N.sqrt(2/(W2-1)) |
|---|
| 2448 | y = N.where(N.equal(y,0),1,y) |
|---|
| 2449 | Z = delta*N.log(y/alpha + N.sqrt((y/alpha)**2+1)) |
|---|
| 2450 | return Z, (1.0-zprob(Z))*2 |
|---|
| 2451 | |
|---|
| 2452 | |
|---|
| 2453 | def akurtosistest(a,dimension=None): |
|---|
| 2454 | """ |
|---|
| 2455 | Tests whether a dataset has normal kurtosis (i.e., |
|---|
| 2456 | kurtosis=3(n-1)/(n+1)).  Valid only for n>=20.  Dimension can equal None
|---|
| 2457 | (ravel array first), an integer (the dimension over which to operate), |
|---|
| 2458 | or a sequence (operate over multiple dimensions). |
|---|
| 2459 | |
|---|
| 2460 | Usage: akurtosistest(a,dimension=None) |
|---|
| 2461 | Returns: z-score and 2-tail z-probability, returns 0 for bad pixels |
|---|
| 2462 | """ |
|---|
| 2463 | if dimension == None: |
|---|
| 2464 | a = N.ravel(a) |
|---|
| 2465 | dimension = 0 |
|---|
| 2466 | n = float(a.shape[dimension]) |
|---|
| 2467 | if n<20: |
|---|
| 2468 | print "akurtosistest only valid for n>=20 ... continuing anyway, n=",n |
|---|
| 2469 | b2 = akurtosis(a,dimension) |
|---|
| 2470 | E = 3.0*(n-1) /(n+1) |
|---|
| 2471 | varb2 = 24.0*n*(n-2)*(n-3) / ((n+1)*(n+1)*(n+3)*(n+5)) |
|---|
| 2472 | x = (b2-E)/N.sqrt(varb2) |
|---|
| 2473 | sqrtbeta1 = 6.0*(n*n-5*n+2)/((n+7)*(n+9)) * N.sqrt((6.0*(n+3)*(n+5))/ |
|---|
| 2474 | (n*(n-2)*(n-3))) |
|---|
| 2475 | A = 6.0 + 8.0/sqrtbeta1 *(2.0/sqrtbeta1 + N.sqrt(1+4.0/(sqrtbeta1**2))) |
|---|
| 2476 | term1 = 1 -2/(9.0*A) |
|---|
| 2477 | denom = 1 +x*N.sqrt(2/(A-4.0)) |
|---|
| 2478 | denom = N.where(N.less(denom,0), 99, denom) |
|---|
| 2479 | term2 = N.where(N.equal(denom,0), term1, N.power((1-2.0/A)/denom,1/3.0)) |
|---|
| 2480 | Z = ( term1 - term2 ) / N.sqrt(2/(9.0*A)) |
|---|
| 2481 | Z = N.where(N.equal(denom,99), 0, Z) |
|---|
| 2482 | return Z, (1.0-zprob(Z))*2 |
|---|
| 2483 | |
|---|
| 2484 | |
|---|
| 2485 | def anormaltest(a,dimension=None): |
|---|
| 2486 | """ |
|---|
| 2487 | Tests whether skew and/OR kurtosis of dataset differs from normal |
|---|
| 2488 | curve. Can operate over multiple dimensions. Dimension can equal |
|---|
| 2489 | None (ravel array first), an integer (the dimension over which to |
|---|
| 2490 | operate), or a sequence (operate over multiple dimensions). |
|---|
| 2491 | |
|---|
| 2492 | Usage: anormaltest(a,dimension=None) |
|---|
| 2493 | Returns: z-score and 2-tail probability |
|---|
| 2494 | """ |
|---|
| 2495 | if dimension == None: |
|---|
| 2496 | a = N.ravel(a) |
|---|
| 2497 | dimension = 0 |
|---|
| 2498 | s,p = askewtest(a,dimension) |
|---|
| 2499 | k,p = akurtosistest(a,dimension) |
|---|
| 2500 | k2 = N.power(s,2) + N.power(k,2) |
|---|
| 2501 | return k2, achisqprob(k2,2) |
|---|
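|  | # Illustrative workflow (added; not in the original source).  askewtest and
|---|
|  | # akurtosistest each return (z, two-tailed p); anormaltest combines the two
|---|
|  | # z-scores as k2 = z_skew**2 + z_kurt**2, referred to chi-square with 2 df:
|---|
|  | #   >>> zs, ps = askewtest(data)        # data = 1D Numeric array
|---|
|  | #   >>> zk, pk = akurtosistest(data)    # n >= 20 avoids its warning
|---|
|  | #   >>> k2, p = anormaltest(data)       # small p ==> non-normal
|---|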
| 2502 | |
|---|
| 2503 | |
|---|
| 2504 | ##################################### |
|---|
| 2505 | ###### AFREQUENCY FUNCTIONS ####### |
|---|
| 2506 | ##################################### |
|---|
| 2507 | |
|---|
| 2508 | def aitemfreq(a): |
|---|
| 2509 | """ |
|---|
| 2510 | Returns a 2D array of item frequencies. Column 1 contains item values, |
|---|
| 2511 | column 2 contains their respective counts. Assumes a 1D array is passed. |
|---|
| 2512 | |
|---|
| 2513 | Usage: aitemfreq(a) |
|---|
| 2514 | Returns: a 2D frequency table (col 0 = sorted scores, col 1 = frequencies)
|---|
| 2515 | """ |
|---|
| 2516 | scores = pstat.aunique(a) |
|---|
| 2517 | scores = N.sort(scores) |
|---|
| 2518 | freq = N.zeros(len(scores)) |
|---|
| 2519 | for i in range(len(scores)): |
|---|
| 2520 | freq[i] = N.add.reduce(N.equal(a,scores[i])) |
|---|
| 2521 | return N.array(pstat.aabut(scores, freq)) |
|---|
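|  | # Illustrative usage (added; not in the original source):
|---|
|  | #   >>> aitemfreq(N.array([1,1,2,3,3,3]))
|---|
|  | #   yields rows (1,2), (2,1), (3,3): column 0 holds the sorted unique
|---|
|  | #   scores, column 1 their counts.
|---|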
| 2522 | |
|---|
| 2523 | |
|---|
| 2524 | def ascoreatpercentile (inarray, percent): |
|---|
| 2525 | """ |
|---|
| 2526 | Usage: ascoreatpercentile(inarray,percent) 0<percent<100 |
|---|
| 2527 | Returns: score at given percentile, relative to inarray distribution |
|---|
| 2528 | """ |
|---|
| 2529 | percent = percent / 100.0 |
|---|
| 2530 | targetcf = percent*len(inarray) |
|---|
| 2531 | h, lrl, binsize, extras = histogram(inarray) |
|---|
| 2532 | cumhist = cumsum(h*1) |
|---|
| 2533 | for i in range(len(cumhist)): |
|---|
| 2534 | if cumhist[i] >= targetcf: |
|---|
| 2535 | break |
|---|
| 2536 | score = binsize * ((targetcf - cumhist[i-1]) / float(h[i])) + (lrl+binsize*i) |
|---|
| 2537 | return score |
|---|
| 2538 | |
|---|
| 2539 | |
|---|
| 2540 | def apercentileofscore (inarray,score,histbins=10,defaultlimits=None): |
|---|
| 2541 | """ |
|---|
| 2542 | Note: result of this function depends on the values used to histogram |
|---|
| 2543 | the data(!). |
|---|
| 2544 | |
|---|
| 2545 | Usage: apercentileofscore(inarray,score,histbins=10,defaultlimits=None) |
|---|
| 2546 | Returns: percentile-position of score (0-100) relative to inarray |
|---|
| 2547 | """ |
|---|
| 2548 | h, lrl, binsize, extras = histogram(inarray,histbins,defaultlimits) |
|---|
| 2549 | cumhist = cumsum(h*1) |
|---|
| 2550 | i = int((score - lrl)/float(binsize)) |
|---|
| 2551 | pct = (cumhist[i-1]+((score-(lrl+binsize*i))/float(binsize))*h[i])/float(len(inarray)) * 100 |
|---|
| 2552 | return pct |
|---|
| 2553 | |
|---|
| 2554 | |
|---|
| 2555 | def ahistogram (inarray,numbins=10,defaultlimits=None,printextras=1): |
|---|
| 2556 | """ |
|---|
| 2557 | Returns (i) an array of histogram bin counts, (ii) the smallest value
|---|
| 2558 | (lower real limit) of the histogram binning, (iii) the bin width, and
|---|
| 2559 | (iv) the number of points outside the histogram range (the middle two
|---|
| 2560 | are not necessarily integers).  Default number of bins is 10.
|---|
| 2561 | Defaultlimits can be None (the routine picks bins spanning all the
|---|
| 2562 | numbers in the inarray) or a 2-sequence (lowerlimit, upperlimit).
|---|
| 2563 | |
|---|
| 2564 | Usage: ahistogram(inarray,numbins=10,defaultlimits=None,printextras=1) |
|---|
| 2565 | Returns: (array of bin counts, bin-minimum, bin-width, #-points-outside-range)
|---|
| 2566 | """ |
|---|
| 2567 | inarray = N.ravel(inarray) # flatten any >1D arrays |
|---|
| 2568 | if (defaultlimits <> None): |
|---|
| 2569 | lowerreallimit = defaultlimits[0] |
|---|
| 2570 | upperreallimit = defaultlimits[1] |
|---|
| 2571 | binsize = (upperreallimit-lowerreallimit) / float(numbins) |
|---|
| 2572 | else: |
|---|
| 2573 | Min = N.minimum.reduce(inarray) |
|---|
| 2574 | Max = N.maximum.reduce(inarray) |
|---|
| 2575 | estbinwidth = float(Max - Min)/float(numbins) + 1 |
|---|
| 2576 | binsize = (Max-Min+estbinwidth)/float(numbins) |
|---|
| 2577 | lowerreallimit = Min - binsize/2.0 #lower real limit,1st bin |
|---|
| 2578 | bins = N.zeros(numbins) |
|---|
| 2579 | extrapoints = 0 |
|---|
| 2580 | for num in inarray: |
|---|
| 2581 | try: |
|---|
| 2582 | if (num-lowerreallimit) < 0: |
|---|
| 2583 | extrapoints = extrapoints + 1 |
|---|
| 2584 | else: |
|---|
| 2585 | bintoincrement = int((num-lowerreallimit) / float(binsize)) |
|---|
| 2586 | bins[bintoincrement] = bins[bintoincrement] + 1 |
|---|
| 2587 | except: # point outside lower/upper limits |
|---|
| 2588 | extrapoints = extrapoints + 1 |
|---|
| 2589 | if (extrapoints > 0 and printextras == 1): |
|---|
| 2590 | print '\nPoints outside given histogram range =',extrapoints |
|---|
| 2591 | return (bins, lowerreallimit, binsize, extrapoints) |
|---|
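|  | # Illustrative usage (added; not in the original source).  With
|---|
|  | # defaultlimits=None the routine widens the range slightly so that every
|---|
|  | # point falls inside some bin:
|---|
|  | #   >>> counts, lowerreal, width, extra = ahistogram(data, numbins=10)
|---|
|  | #   bin i covers [lowerreal + i*width, lowerreal + (i+1)*width)
|---|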
| 2592 | |
|---|
| 2593 | |
|---|
| 2594 | def acumfreq(a,numbins=10,defaultreallimits=None): |
|---|
| 2595 | """ |
|---|
| 2596 | Returns a cumulative frequency histogram, using the histogram function. |
|---|
| 2597 | Defaultreallimits can be None (use all data), or a 2-sequence containing |
|---|
| 2598 | lower and upper limits on values to include. |
|---|
| 2599 | |
|---|
| 2600 | Usage: acumfreq(a,numbins=10,defaultreallimits=None) |
|---|
| 2601 | Returns: array of cumfreq bin values, lowerreallimit, binsize, extrapoints |
|---|
| 2602 | """ |
|---|
| 2603 | h,l,b,e = histogram(a,numbins,defaultreallimits) |
|---|
| 2604 | cumhist = cumsum(h*1) |
|---|
| 2605 | return cumhist,l,b,e |
|---|
| 2606 | |
|---|
| 2607 | |
|---|
| 2608 | def arelfreq(a,numbins=10,defaultreallimits=None): |
|---|
| 2609 | """ |
|---|
| 2610 | Returns a relative frequency histogram, using the histogram function. |
|---|
| 2611 | Defaultreallimits can be None (use all data), or a 2-sequence containing |
|---|
| 2612 | lower and upper limits on values to include. |
|---|
| 2613 | |
|---|
| 2614 | Usage: arelfreq(a,numbins=10,defaultreallimits=None) |
|---|
| 2615 | Returns: array of relative freq. bin values, lowerreallimit, binsize, extrapoints
|---|
| 2616 | """ |
|---|
| 2617 | h,l,b,e = histogram(a,numbins,defaultreallimits) |
|---|
| 2618 | h = N.array(h/float(a.shape[0])) |
|---|
| 2619 | return h,l,b,e |
|---|
| 2620 | |
|---|
| 2621 | |
|---|
| 2622 | ##################################### |
|---|
| 2623 | ###### AVARIABILITY FUNCTIONS ##### |
|---|
| 2624 | ##################################### |
|---|
| 2625 | |
|---|
| 2626 | def aobrientransform(*args): |
|---|
| 2627 | """ |
|---|
| 2628 | Computes a transform on input data (any number of columns). Used to |
|---|
| 2629 | test for homogeneity of variance prior to running one-way stats. Each |
|---|
| 2630 | array in *args is one level of a factor.  If F_oneway() run on the
|---|
| 2631 | transformed data is found significant, the variances are unequal.  From
|---|
| 2632 | Maxwell and Delaney, p.112. |
|---|
| 2633 | |
|---|
| 2634 | Usage: aobrientransform(*args) *args = 1D arrays, one per level of factor |
|---|
| 2635 | Returns: transformed data for use in an ANOVA |
|---|
| 2636 | """ |
|---|
| 2637 | TINY = 1e-10 |
|---|
| 2638 | k = len(args) |
|---|
| 2639 | n = N.zeros(k,N.Float) |
|---|
| 2640 | v = N.zeros(k,N.Float) |
|---|
| 2641 | m = N.zeros(k,N.Float) |
|---|
| 2642 | nargs = [] |
|---|
| 2643 | for i in range(k): |
|---|
| 2644 | nargs.append(args[i].astype(N.Float)) |
|---|
| 2645 | n[i] = float(len(nargs[i])) |
|---|
| 2646 | v[i] = var(nargs[i]) |
|---|
| 2647 | m[i] = mean(nargs[i]) |
|---|
| 2648 | for j in range(k): |
|---|
| 2649 | for i in range(n[j]): |
|---|
| 2650 | t1 = (n[j]-1.5)*n[j]*(nargs[j][i]-m[j])**2 |
|---|
| 2651 | t2 = 0.5*v[j]*(n[j]-1.0) |
|---|
| 2652 | t3 = (n[j]-1.0)*(n[j]-2.0) |
|---|
| 2653 | nargs[j][i] = (t1-t2) / float(t3) |
|---|
| 2654 | check = 1 |
|---|
| 2655 | for j in range(k): |
|---|
| 2656 | if v[j] - mean(nargs[j]) > TINY: |
|---|
| 2657 | check = 0 |
|---|
| 2658 | if check <> 1: |
|---|
| 2659 | raise ValueError, 'Lack of convergence in obrientransform.' |
|---|
| 2660 | else: |
|---|
| 2661 | return N.array(nargs) |
|---|
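|  | # Illustrative usage (added; not in the original source).  As in apaired()
|---|
|  | # further below, the transformed columns are passed to the one-way ANOVA
|---|
|  | # defined elsewhere in this module; a significant F on the transformed data
|---|
|  | # means the group variances are unequal:
|---|
|  | #   >>> r = obrientransform(x, y)                     # x,y = 1D arrays
|---|
|  | #   >>> f, p = F_oneway(pstat.colex(r,0), pstat.colex(r,1))
|---|
|  | #   >>> # p < 0.05  ==>  unequal variances
|---|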
| 2662 | |
|---|
| 2663 | |
|---|
| 2664 | def asamplevar (inarray,dimension=None,keepdims=0): |
|---|
| 2665 | """ |
|---|
| 2666 | Returns the sample variance of the values in the passed
|---|
| 2667 | array (i.e., using N). Dimension can equal None (ravel array first), |
|---|
| 2668 | an integer (the dimension over which to operate), or a sequence |
|---|
| 2669 | (operate over multiple dimensions). Set keepdims=1 to return an array |
|---|
| 2670 | with the same number of dimensions as inarray. |
|---|
| 2671 | |
|---|
| 2672 | Usage: asamplevar(inarray,dimension=None,keepdims=0) |
|---|
| 2673 | """ |
|---|
| 2674 | if dimension == None: |
|---|
| 2675 | inarray = N.ravel(inarray) |
|---|
| 2676 | dimension = 0 |
|---|
| 2677 | if dimension == 1: |
|---|
| 2678 | mn = amean(inarray,dimension)[:,N.NewAxis] |
|---|
| 2679 | else: |
|---|
| 2680 | mn = amean(inarray,dimension,keepdims=1) |
|---|
| 2681 | deviations = inarray - mn |
|---|
| 2682 | if type(dimension) == ListType: |
|---|
| 2683 | n = 1 |
|---|
| 2684 | for d in dimension: |
|---|
| 2685 | n = n*inarray.shape[d] |
|---|
| 2686 | else: |
|---|
| 2687 | n = inarray.shape[dimension] |
|---|
| 2688 | svar = ass(deviations,dimension,keepdims) / float(n) |
|---|
| 2689 | return svar |
|---|
| 2690 | |
|---|
| 2691 | |
|---|
| 2692 | def asamplestdev (inarray, dimension=None, keepdims=0): |
|---|
| 2693 | """ |
|---|
| 2694 | Returns the sample standard deviation of the values in the passed |
|---|
| 2695 | array (i.e., using N). Dimension can equal None (ravel array first), |
|---|
| 2696 | an integer (the dimension over which to operate), or a sequence |
|---|
| 2697 | (operate over multiple dimensions). Set keepdims=1 to return an array |
|---|
| 2698 | with the same number of dimensions as inarray. |
|---|
| 2699 | |
|---|
| 2700 | Usage: asamplestdev(inarray,dimension=None,keepdims=0) |
|---|
| 2701 | """ |
|---|
| 2702 | return N.sqrt(asamplevar(inarray,dimension,keepdims)) |
|---|
| 2703 | |
|---|
| 2704 | |
|---|
| 2705 | def asignaltonoise(instack,dimension=0): |
|---|
| 2706 | """ |
|---|
| 2707 | Calculates signal-to-noise. Dimension can equal None (ravel array |
|---|
| 2708 | first), an integer (the dimension over which to operate), or a |
|---|
| 2709 | sequence (operate over multiple dimensions). |
|---|
| 2710 | |
|---|
| 2711 | Usage: asignaltonoise(instack,dimension=0)
|---|
| 2712 | Returns: array containing the value of (mean/stdev) along dimension, |
|---|
| 2713 | or 0 when stdev=0 |
|---|
| 2714 | """ |
|---|
| 2715 | m = mean(instack,dimension) |
|---|
| 2716 | sd = stdev(instack,dimension) |
|---|
| 2717 | return N.where(N.equal(sd,0),0,m/sd) |
|---|
| 2718 | |
|---|
| 2719 | |
|---|
| 2720 | def avar (inarray, dimension=None,keepdims=0): |
|---|
| 2721 | """ |
|---|
| 2722 | Returns the estimated population variance of the values in the passed |
|---|
| 2723 | array (i.e., N-1). Dimension can equal None (ravel array first), an |
|---|
| 2724 | integer (the dimension over which to operate), or a sequence (operate |
|---|
| 2725 | over multiple dimensions). Set keepdims=1 to return an array with the |
|---|
| 2726 | same number of dimensions as inarray. |
|---|
| 2727 | |
|---|
| 2728 | Usage: avar(inarray,dimension=None,keepdims=0) |
|---|
| 2729 | """ |
|---|
| 2730 | if dimension == None: |
|---|
| 2731 | inarray = N.ravel(inarray) |
|---|
| 2732 | dimension = 0 |
|---|
| 2733 | mn = amean(inarray,dimension,1) |
|---|
| 2734 | deviations = inarray - mn |
|---|
| 2735 | if type(dimension) == ListType: |
|---|
| 2736 | n = 1 |
|---|
| 2737 | for d in dimension: |
|---|
| 2738 | n = n*inarray.shape[d] |
|---|
| 2739 | else: |
|---|
| 2740 | n = inarray.shape[dimension] |
|---|
| 2741 | var = ass(deviations,dimension,keepdims)/float(n-1) |
|---|
| 2742 | return var |
|---|
| 2743 | |
|---|
| 2744 | |
|---|
| 2745 | def astdev (inarray, dimension=None, keepdims=0): |
|---|
| 2746 | """ |
|---|
| 2747 | Returns the estimated population standard deviation of the values in |
|---|
| 2748 | the passed array (i.e., N-1). Dimension can equal None (ravel array |
|---|
| 2749 | first), an integer (the dimension over which to operate), or a |
|---|
| 2750 | sequence (operate over multiple dimensions). Set keepdims=1 to return |
|---|
| 2751 | an array with the same number of dimensions as inarray. |
|---|
| 2752 | |
|---|
| 2753 | Usage: astdev(inarray,dimension=None,keepdims=0) |
|---|
| 2754 | """ |
|---|
| 2755 | return N.sqrt(avar(inarray,dimension,keepdims)) |
|---|
| 2756 | |
|---|
| 2757 | |
|---|
| 2758 | def asterr (inarray, dimension=None, keepdims=0): |
|---|
| 2759 | """ |
|---|
| 2760 | Returns the estimated population standard error of the values in the |
|---|
| 2761 | passed array (i.e., N-1). Dimension can equal None (ravel array |
|---|
| 2762 | first), an integer (the dimension over which to operate), or a |
|---|
| 2763 | sequence (operate over multiple dimensions). Set keepdims=1 to return |
|---|
| 2764 | an array with the same number of dimensions as inarray. |
|---|
| 2765 | |
|---|
| 2766 | Usage: asterr(inarray,dimension=None,keepdims=0) |
|---|
| 2767 | """ |
|---|
| 2768 | if dimension == None: |
|---|
| 2769 | inarray = N.ravel(inarray) |
|---|
| 2770 | dimension = 0 |
|---|
| 2771 | return astdev(inarray,dimension,keepdims) / float(N.sqrt(inarray.shape[dimension])) |
|---|
| 2772 | |
|---|
| 2773 | |
|---|
| 2774 | def asem (inarray, dimension=None, keepdims=0): |
|---|
| 2775 | """ |
|---|
| 2776 | Returns the standard error of the mean (i.e., using N) of the values |
|---|
| 2777 | in the passed array. Dimension can equal None (ravel array first), an |
|---|
| 2778 | integer (the dimension over which to operate), or a sequence (operate |
|---|
| 2779 | over multiple dimensions). Set keepdims=1 to return an array with the |
|---|
| 2780 | same number of dimensions as inarray. |
|---|
| 2781 | |
|---|
| 2782 | Usage: asem(inarray,dimension=None, keepdims=0) |
|---|
| 2783 | """ |
|---|
| 2784 | if dimension == None: |
|---|
| 2785 | inarray = N.ravel(inarray) |
|---|
| 2786 | dimension = 0 |
|---|
| 2787 | if type(dimension) == ListType: |
|---|
| 2788 | n = 1 |
|---|
| 2789 | for d in dimension: |
|---|
| 2790 | n = n*inarray.shape[d] |
|---|
| 2791 | else: |
|---|
| 2792 | n = inarray.shape[dimension] |
|---|
| 2793 | s = asamplestdev(inarray,dimension,keepdims) / N.sqrt(n-1) |
|---|
| 2794 | return s |
|---|
| 2795 | |
|---|
| 2796 | |
|---|
| 2797 | def az (a, score): |
|---|
| 2798 | """ |
|---|
| 2799 | Returns the z-score of a given input score, given the array from which
|---|
| 2800 | that score came. Not appropriate for population calculations, nor for |
|---|
| 2801 | arrays > 1D. |
|---|
| 2802 | |
|---|
| 2803 | Usage: az(a, score) |
|---|
| 2804 | """ |
|---|
| 2805 | z = (score-amean(a)) / asamplestdev(a) |
|---|
| 2806 | return z |
|---|
| 2807 | |
|---|
| 2808 | |
|---|
| 2809 | def azs (a): |
|---|
| 2810 | """ |
|---|
| 2811 | Returns a 1D array of z-scores, one for each score in the passed array, |
|---|
| 2812 | computed relative to the passed array. |
|---|
| 2813 | |
|---|
| 2814 | Usage: azs(a) |
|---|
| 2815 | """ |
|---|
| 2816 | zscores = [] |
|---|
| 2817 | for item in a: |
|---|
| 2818 | zscores.append(z(a,item)) |
|---|
| 2819 | return N.array(zscores) |
|---|
| 2820 | |
|---|
| 2821 | |
|---|
| 2822 | def azmap (scores, compare, dimension=0): |
|---|
| 2823 | """ |
|---|
| 2824 | Returns an array of z-scores the shape of scores (e.g., [x,y]), compared to |
|---|
| 2825 | array passed to compare (e.g., [time,x,y]). Assumes collapsing over dim 0 |
|---|
| 2826 | of the compare array. |
|---|
| 2827 | |
|---|
| 2828 | Usage: azmap(scores, compare, dimension=0)
|---|
| 2829 | """ |
|---|
| 2830 | mns = amean(compare,dimension) |
|---|
| 2831 | sstd = asamplestdev(compare,dimension)
|---|
| 2832 | return (scores - mns) / sstd |
|---|
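|  | # Illustrative usage (added; not in the original source).  Typical use is
|---|
|  | # z-scoring an [x,y] map against a [time,x,y] baseline stack, collapsing
|---|
|  | # over dimension 0 of the comparison array:
|---|
|  | #   >>> zmap = azmap(single_map, baseline_stack)   # shapes [x,y], [t,x,y]
|---|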
| 2833 | |
|---|
| 2834 | |
|---|
| 2835 | ##################################### |
|---|
| 2836 | ####### ATRIMMING FUNCTIONS ####### |
|---|
| 2837 | ##################################### |
|---|
| 2838 | |
|---|
| 2839 | def around(a,digits=1): |
|---|
| 2840 | """ |
|---|
| 2841 | Rounds all values in array a to 'digits' decimal places. |
|---|
| 2842 | |
|---|
| 2843 | Usage: around(a,digits) |
|---|
| 2844 | Returns: a, where each value is rounded to 'digits' decimals |
|---|
| 2845 | """ |
|---|
| 2846 | def ar(x,d=digits): |
|---|
| 2847 | return round(x,d) |
|---|
| 2848 | |
|---|
| 2849 | if type(a) <> N.ArrayType: |
|---|
| 2850 | try: |
|---|
| 2851 | a = N.array(a) |
|---|
| 2852 | except: |
|---|
| 2853 | a = N.array(a,'O') |
|---|
| 2854 | shp = a.shape |
|---|
| 2855 | if a.typecode() in ['f','F','d','D']: |
|---|
| 2856 | b = N.ravel(a) |
|---|
| 2857 | b = N.array(map(ar,b)) |
|---|
| 2858 | b.shape = shp |
|---|
| 2859 | elif a.typecode() in ['o','O']: |
|---|
| 2860 | b = N.ravel(a)*1 |
|---|
| 2861 | for i in range(len(b)): |
|---|
| 2862 | if type(b[i]) == FloatType: |
|---|
| 2863 | b[i] = round(b[i],digits) |
|---|
| 2864 | b.shape = shp |
|---|
| 2865 | else: # not a float, double or Object array |
|---|
| 2866 | b = a*1 |
|---|
| 2867 | return b |
|---|
| 2868 | |
|---|
| 2869 | |
|---|
| 2870 | def athreshold(a,threshmin=None,threshmax=None,newval=0): |
|---|
| 2871 | """ |
|---|
| 2872 | Like Numeric.clip() except that values <threshmin or >threshmax are replaced
|---|
| 2873 | by newval instead of by threshmin/threshmax (respectively). |
|---|
| 2874 | |
|---|
| 2875 | Usage: athreshold(a,threshmin=None,threshmax=None,newval=0) |
|---|
| 2876 | Returns: a, with values <threshmin or >threshmax replaced with newval |
|---|
| 2877 | """ |
|---|
| 2878 | mask = N.zeros(a.shape) |
|---|
| 2879 | if threshmin <> None: |
|---|
| 2880 | mask = mask + N.where(N.less(a,threshmin),1,0) |
|---|
| 2881 | if threshmax <> None: |
|---|
| 2882 | mask = mask + N.where(N.greater(a,threshmax),1,0) |
|---|
| 2883 | mask = N.clip(mask,0,1) |
|---|
| 2884 | return N.where(mask,newval,a) |
|---|
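|  | # Illustrative usage (added; not in the original source):
|---|
|  | #   >>> athreshold(N.array([1,5,12]), threshmin=2, threshmax=10, newval=0)
|---|
|  | #   keeps the 5 but replaces 1 and 12 with 0, whereas N.clip() would have
|---|
|  | #   substituted the limits 2 and 10 instead.
|---|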
| 2885 | |
|---|
| 2886 | |
|---|
| 2887 | def atrimboth (a,proportiontocut): |
|---|
| 2888 | """ |
|---|
| 2889 | Slices off the passed proportion of items from BOTH ends of the passed |
|---|
| 2890 | array (i.e., with proportiontocut=0.1, slices 'leftmost' 10% AND |
|---|
| 2891 | 'rightmost' 10% of scores).  You must pre-sort the array if you want
|---|
| 2892 | "proper" trimming. Slices off LESS if proportion results in a |
|---|
| 2893 | non-integer slice index (i.e., conservatively slices off |
|---|
| 2894 | proportiontocut). |
|---|
| 2895 | |
|---|
| 2896 | Usage: atrimboth (a,proportiontocut) |
|---|
| 2897 | Returns: trimmed version of array a |
|---|
| 2898 | """ |
|---|
| 2899 | lowercut = int(proportiontocut*len(a)) |
|---|
| 2900 | uppercut = len(a) - lowercut |
|---|
| 2901 | return a[lowercut:uppercut] |
|---|
| 2902 | |
|---|
| 2903 | |
|---|
| 2904 | def atrim1 (a,proportiontocut,tail='right'): |
|---|
| 2905 | """ |
|---|
| 2906 | Slices off the passed proportion of items from ONE end of the passed |
|---|
| 2907 | array (i.e., if proportiontocut=0.1, slices off 'leftmost' or 'rightmost' |
|---|
| 2908 | 10% of scores). Slices off LESS if proportion results in a non-integer |
|---|
| 2909 | slice index (i.e., conservatively slices off proportiontocut). |
|---|
| 2910 | |
|---|
| 2911 | Usage: atrim1(a,proportiontocut,tail='right') or set tail='left' |
|---|
| 2912 | Returns: trimmed version of array a |
|---|
| 2913 | """ |
|---|
| 2914 | if string.lower(tail) == 'right': |
|---|
| 2915 | lowercut = 0 |
|---|
| 2916 | uppercut = len(a) - int(proportiontocut*len(a)) |
|---|
| 2917 | elif string.lower(tail) == 'left': |
|---|
| 2918 | lowercut = int(proportiontocut*len(a)) |
|---|
| 2919 | uppercut = len(a) |
|---|
| 2920 | return a[lowercut:uppercut] |
|---|
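|  | # Illustrative usage (added; not in the original source).  Sort first if
|---|
|  | # "proper" (order-based) trimming is wanted:
|---|
|  | #   >>> a = N.sort(raw_scores)
|---|
|  | #   >>> middle80 = atrimboth(a, 0.1)         # drops lowest and highest 10%
|---|
|  | #   >>> no_top10 = atrim1(a, 0.1, 'right')   # drops only the highest 10%
|---|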
| 2921 | |
|---|
| 2922 | |
|---|
| 2923 | ##################################### |
|---|
| 2924 | ##### ACORRELATION FUNCTIONS ###### |
|---|
| 2925 | ##################################### |
|---|
| 2926 | |
|---|
| 2927 | def acovariance(X): |
|---|
| 2928 | """ |
|---|
| 2929 | Computes the covariance matrix of a matrix X. Requires a 2D matrix input. |
|---|
| 2930 | |
|---|
| 2931 | Usage: acovariance(X) |
|---|
| 2932 | Returns: covariance matrix of X |
|---|
| 2933 | """ |
|---|
| 2934 | if len(X.shape) <> 2: |
|---|
| 2935 | raise TypeError, "acovariance requires 2D matrices" |
|---|
| 2936 | n = X.shape[0] |
|---|
| 2937 | mX = amean(X,0) |
|---|
| 2938 | return N.dot(N.transpose(X),X) / float(n) - N.multiply.outer(mX,mX) |
|---|
| 2939 | |
|---|
| 2940 | |
|---|
| 2941 | def acorrelation(X): |
|---|
| 2942 | """ |
|---|
| 2943 | Computes the correlation matrix of a matrix X. Requires a 2D matrix input. |
|---|
| 2944 | |
|---|
| 2945 | Usage: acorrelation(X) |
|---|
| 2946 | Returns: correlation matrix of X |
|---|
| 2947 | """ |
|---|
| 2948 | C = acovariance(X) |
|---|
| 2949 | V = N.diagonal(C) |
|---|
| 2950 | return C / N.sqrt(N.multiply.outer(V,V)) |
|---|
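|  | # Illustrative usage (added; not in the original source).  X is observations
|---|
|  | # by variables (rows x columns); both functions return a variables-by-
|---|
|  | # variables matrix, acovariance() divides by n (not n-1), and the diagonal
|---|
|  | # of acorrelation(X) is all 1s:
|---|
|  | #   >>> C = acovariance(X)       # X.shape = (n_observations, n_variables)
|---|
|  | #   >>> R = acorrelation(X)
|---|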
| 2951 | |
|---|
| 2952 | |
|---|
| 2953 | def apaired(x,y): |
|---|
| 2954 | """ |
|---|
| 2955 | Interactively determines the type of data in x and y, and then runs the |
|---|
| 2956 | appropriate statistic for paired group data.
|---|
| 2957 | |
|---|
| 2958 | Usage: apaired(x,y) x,y = the two arrays of values to be compared |
|---|
| 2959 | Returns: appropriate statistic name, value, and probability |
|---|
| 2960 | """ |
|---|
| 2961 | samples = '' |
|---|
| 2962 | while samples not in ['i','r','I','R','c','C']: |
|---|
| 2963 | print '\nIndependent or related samples, or correlation (i,r,c): ', |
|---|
| 2964 | samples = raw_input() |
|---|
| 2965 | |
|---|
| 2966 | if samples in ['i','I','r','R']: |
|---|
| 2967 | print '\nComparing variances ...', |
|---|
| 2968 | # USE O'BRIEN'S TEST FOR HOMOGENEITY OF VARIANCE, Maxwell & delaney, p.112 |
|---|
| 2969 | r = obrientransform(x,y) |
|---|
| 2970 | f,p = F_oneway(pstat.colex(r,0),pstat.colex(r,1)) |
|---|
| 2971 | if p<0.05: |
|---|
| 2972 | vartype='unequal, p='+str(round(p,4)) |
|---|
| 2973 | else: |
|---|
| 2974 | vartype='equal' |
|---|
| 2975 | print vartype |
|---|
| 2976 | if samples in ['i','I']: |
|---|
| 2977 | if vartype[0]=='e': |
|---|
| 2978 | t,p = ttest_ind(x,y,None,0) |
|---|
| 2979 | print '\nIndependent samples t-test: ', round(t,4),round(p,4) |
|---|
| 2980 | else: |
|---|
| 2981 | if len(x)>20 or len(y)>20: |
|---|
| 2982 | z,p = ranksums(x,y) |
|---|
| 2983 | print '\nRank Sums test (NONparametric, n>20): ', round(z,4),round(p,4) |
|---|
| 2984 | else: |
|---|
| 2985 | u,p = mannwhitneyu(x,y) |
|---|
| 2986 | print '\nMann-Whitney U-test (NONparametric, ns<20): ', round(u,4),round(p,4) |
|---|
| 2987 | |
|---|
| 2988 | else: # RELATED SAMPLES |
|---|
| 2989 | if vartype[0]=='e': |
|---|
| 2990 | t,p = ttest_rel(x,y,0) |
|---|
| 2991 | print '\nRelated samples t-test: ', round(t,4),round(p,4) |
|---|
| 2992 | else: |
|---|
| 2993 | t,p = wilcoxont(x,y)
|---|
| 2994 | print '\nWilcoxon T-test (NONparametric): ', round(t,4),round(p,4) |
|---|
| 2995 | else: # CORRELATION ANALYSIS |
|---|
| 2996 | corrtype = '' |
|---|
| 2997 | while corrtype not in ['c','C','r','R','d','D']: |
|---|
| 2998 | print '\nIs the data Continuous, Ranked, or Dichotomous (c,r,d): ', |
|---|
| 2999 | corrtype = raw_input() |
|---|
| 3000 | if corrtype in ['c','C']: |
|---|
| 3001 | m,b,r,p,see = linregress(x,y) |
|---|
| 3002 | print '\nLinear regression for continuous variables ...' |
|---|
| 3003 | lol = [['Slope','Intercept','r','Prob','SEestimate'],[round(m,4),round(b,4),round(r,4),round(p,4),round(see,4)]] |
|---|
| 3004 | pstat.printcc(lol) |
|---|
| 3005 | elif corrtype in ['r','R']: |
|---|
| 3006 | r,p = spearmanr(x,y) |
|---|
| 3007 | print '\nCorrelation for ranked variables ...' |
|---|
| 3008 | print "Spearman's r: ",round(r,4),round(p,4) |
|---|
| 3009 | else: # DICHOTOMOUS |
|---|
| 3010 | r,p = pointbiserialr(x,y) |
|---|
| 3011 | print '\nAssuming x contains a dichotomous variable ...' |
|---|
| 3012 | print 'Point Biserial r: ',round(r,4),round(p,4) |
|---|
| 3013 | print '\n\n' |
|---|
| 3014 | return None |
|---|
| 3015 | |
|---|
| 3016 | |
|---|
| 3017 | def apearsonr(x,y,verbose=1): |
|---|
| 3018 | """ |
|---|
| 3019 | Calculates a Pearson correlation coefficient and returns p. Taken |
|---|
| 3020 | from Heiman's Basic Statistics for the Behav. Sci (2nd), p.195. |
|---|
| 3021 | |
|---|
| 3022 | Usage: apearsonr(x,y,verbose=1) where x,y are equal length arrays |
|---|
| 3023 | Returns: Pearson's r, two-tailed p-value |
|---|
| 3024 | """ |
|---|
| 3025 | TINY = 1.0e-20 |
|---|
| 3026 | n = len(x) |
|---|
| 3027 | xmean = amean(x) |
|---|
| 3028 | ymean = amean(y) |
|---|
| 3029 | r_num = n*(N.add.reduce(x*y)) - N.add.reduce(x)*N.add.reduce(y) |
|---|
| 3030 | r_den = math.sqrt((n*ass(x) - asquare_of_sums(x))*(n*ass(y)-asquare_of_sums(y))) |
|---|
| 3031 | r = (r_num / r_den) |
|---|
| 3032 | df = n-2 |
|---|
| 3033 | t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY))) |
|---|
| 3034 | prob = abetai(0.5*df,0.5,df/(df+t*t),verbose) |
|---|
| 3035 | return r,prob |
|---|
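|  | # Illustrative usage (added; not in the original source):
|---|
|  | #   >>> r, p = apearsonr(x, y)    # x,y = equal-length 1D Numeric arrays
|---|
|  | #   r is the correlation; p is the two-tailed probability from a t with
|---|
|  | #   n-2 degrees of freedom, computed via abetai().
|---|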
| 3036 | |
|---|
| 3037 | |
|---|
| 3038 | def aspearmanr(x,y): |
|---|
| 3039 | """ |
|---|
| 3040 | Calculates a Spearman rank-order correlation coefficient. Taken |
|---|
| 3041 | from Heiman's Basic Statistics for the Behav. Sci (1st), p.192. |
|---|
| 3042 | |
|---|
| 3043 | Usage: aspearmanr(x,y) where x,y are equal-length arrays |
|---|
| 3044 | Returns: Spearman's r, two-tailed p-value |
|---|
| 3045 | """ |
|---|
| 3046 | TINY = 1e-30 |
|---|
| 3047 | n = len(x) |
|---|
| 3048 | rankx = rankdata(x) |
|---|
| 3049 | ranky = rankdata(y) |
|---|
| 3050 | dsq = N.add.reduce((rankx-ranky)**2) |
|---|
| 3051 | rs = 1 - 6*dsq / float(n*(n**2-1)) |
|---|
| 3052 | t = rs * math.sqrt((n-2) / ((rs+1.0)*(1.0-rs))) |
|---|
| 3053 | df = n-2 |
|---|
| 3054 | probrs = abetai(0.5*df,0.5,df/(df+t*t)) |
|---|
| 3055 | # probability values for rs are from part 2 of the spearman function in |
|---|
| 3056 | # Numerical Recipes, p.510.  They are close to tables, but not exact. (?)
|---|
| 3057 | return rs, probrs |
|---|
| 3058 | |
|---|
| 3059 | |
|---|
| 3060 | def apointbiserialr(x,y): |
|---|
| 3061 | """ |
|---|
| 3062 | Calculates a point-biserial correlation coefficient and the associated |
|---|
| 3063 | probability value. Taken from Heiman's Basic Statistics for the Behav. |
|---|
| 3064 | Sci (1st), p.194. |
|---|
| 3065 | |
|---|
| 3066 | Usage: apointbiserialr(x,y) where x,y are equal length arrays |
|---|
| 3067 | Returns: Point-biserial r, two-tailed p-value |
|---|
| 3068 | """ |
|---|
| 3069 | TINY = 1e-30 |
|---|
| 3070 | categories = pstat.aunique(x) |
|---|
| 3071 | data = pstat.aabut(x,y) |
|---|
| 3072 | if len(categories) <> 2: |
|---|
| 3073 | raise ValueError, "Exactly 2 categories required (in x) for pointbiserialr()." |
|---|
| 3074 | else: # there are 2 categories, continue |
|---|
| 3075 | codemap = pstat.aabut(categories,N.arange(2)) |
|---|
| 3076 | recoded = pstat.arecode(data,codemap,0) |
|---|
| 3077 | x = pstat.alinexand(data,0,categories[0]) |
|---|
| 3078 | y = pstat.alinexand(data,0,categories[1]) |
|---|
| 3079 | xmean = amean(pstat.acolex(x,1)) |
|---|
| 3080 | ymean = amean(pstat.acolex(y,1)) |
|---|
| 3081 | n = len(data) |
|---|
| 3082 | adjust = math.sqrt((len(x)/float(n))*(len(y)/float(n))) |
|---|
| 3083 | rpb = (ymean - xmean)/asamplestdev(pstat.acolex(data,1))*adjust |
|---|
| 3084 | df = n-2 |
|---|
| 3085 | t = rpb*math.sqrt(df/((1.0-rpb+TINY)*(1.0+rpb+TINY))) |
|---|
| 3086 | prob = abetai(0.5*df,0.5,df/(df+t*t)) |
|---|
| 3087 | return rpb, prob |
|---|
| 3088 | |
|---|
| 3089 | |
|---|
| 3090 | def akendalltau(x,y): |
|---|
| 3091 | """ |
|---|
| 3092 | Calculates Kendall's tau ... correlation of ordinal data. Adapted |
|---|
| 3093 | from function kendl1 in Numerical Recipes.  Needs good test-cases.@@@
|---|
| 3094 | |
|---|
| 3095 | Usage: akendalltau(x,y) |
|---|
| 3096 | Returns: Kendall's tau, two-tailed p-value |
|---|
| 3097 | """ |
|---|
| 3098 | n1 = 0 |
|---|
| 3099 | n2 = 0 |
|---|
| 3100 | iss = 0 |
|---|
| 3101 | for j in range(len(x)-1): |
|---|
| 3102 | for k in range(j+1,len(y)):
|---|
| 3103 | a1 = x[j] - x[k] |
|---|
| 3104 | a2 = y[j] - y[k] |
|---|
| 3105 | aa = a1 * a2 |
|---|
| 3106 | if (aa): # neither array has a tie |
|---|
| 3107 | n1 = n1 + 1 |
|---|
| 3108 | n2 = n2 + 1 |
|---|
| 3109 | if aa > 0: |
|---|
| 3110 | iss = iss + 1 |
|---|
| 3111 | else: |
|---|
| 3112 | iss = iss -1 |
|---|
| 3113 | else: |
|---|
| 3114 | if (a1): |
|---|
| 3115 | n1 = n1 + 1 |
|---|
| 3116 | else: |
|---|
| 3117 | n2 = n2 + 1 |
|---|
| 3118 | tau = iss / math.sqrt(n1*n2) |
|---|
| 3119 | svar = (4.0*len(x)+10.0) / (9.0*len(x)*(len(x)-1)) |
|---|
| 3120 | z = tau / math.sqrt(svar) |
|---|
| 3121 | prob = erfcc(abs(z)/1.4142136) |
|---|
| 3122 | return tau, prob |
|---|
| 3123 | |
|---|
| 3124 | |
|---|
| 3125 | def alinregress(*args): |
|---|
| 3126 | """ |
|---|
| 3127 | Calculates a regression line on two arrays, x and y, corresponding to x,y |
|---|
| 3128 | pairs. If a single 2D array is passed, alinregress finds dim with 2 levels |
|---|
| 3129 | and splits data into x,y pairs along that dim. |
|---|
| 3130 | |
|---|
| 3131 | Usage: alinregress(*args) args=2 equal-length arrays, or one 2D array |
|---|
| 3132 | Returns: slope, intercept, r, two-tailed prob, sterr-of-the-estimate |
|---|
| 3133 | """ |
|---|
| 3134 | TINY = 1.0e-20 |
|---|
| 3135 | if len(args) == 1: # more than 1D array? |
|---|
| 3136 | args = args[0] |
|---|
| 3137 | if len(args) == 2: |
|---|
| 3138 | x = args[0] |
|---|
| 3139 | y = args[1] |
|---|
| 3140 | else: |
|---|
| 3141 | x = args[:,0] |
|---|
| 3142 | y = args[:,1] |
|---|
| 3143 | else: |
|---|
| 3144 | x = args[0] |
|---|
| 3145 | y = args[1] |
|---|
| 3146 | n = len(x) |
|---|
| 3147 | xmean = amean(x) |
|---|
| 3148 | ymean = amean(y) |
|---|
| 3149 | r_num = n*(N.add.reduce(x*y)) - N.add.reduce(x)*N.add.reduce(y) |
|---|
| 3150 | r_den = math.sqrt((n*ass(x) - asquare_of_sums(x))*(n*ass(y)-asquare_of_sums(y))) |
|---|
| 3151 | r = r_num / r_den |
|---|
| 3152 | z = 0.5*math.log((1.0+r+TINY)/(1.0-r+TINY)) |
|---|
| 3153 | df = n-2 |
|---|
| 3154 | t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY))) |
|---|
| 3155 | prob = abetai(0.5*df,0.5,df/(df+t*t)) |
|---|
| 3156 | slope = r_num / (float(n)*ass(x) - asquare_of_sums(x)) |
|---|
| 3157 | intercept = ymean - slope*xmean |
|---|
| 3158 | sterrest = math.sqrt(1-r*r)*asamplestdev(y) |
|---|
| 3159 | return slope, intercept, r, prob, sterrest |
|---|
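|  | # Illustrative usage (added; not in the original source).  Either call form
|---|
|  | # works -- two 1D arrays, or a single 2D array of x,y pairs ((2,n) or (n,2)):
|---|
|  | #   >>> slope, intercept, r, p, se_est = alinregress(x, y)
|---|
|  | #   >>> predicted = slope*x + intercept
|---|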
| 3160 | |
|---|
| 3161 | |
|---|
| 3162 | ##################################### |
|---|
| 3163 | ##### AINFERENTIAL STATISTICS ##### |
|---|
| 3164 | ##################################### |
|---|
| 3165 | |
|---|
| 3166 | def attest_1samp(a,popmean,printit=0,name='Sample',writemode='a'): |
|---|
| 3167 | """ |
|---|
| 3168 | Calculates the t-obtained for the single-sample T-test on ONE group
|---|
| 3169 | of scores a, given a population mean. If printit=1, results are printed |
|---|
| 3170 | to the screen. If printit='filename', the results are output to 'filename' |
|---|
| 3171 | using the given writemode (default=append). Returns t-value, and prob. |
|---|
| 3172 | |
|---|
| 3173 | Usage: attest_1samp(a,popmean,printit=0,name='Sample',writemode='a')
|---|
| 3174 | Returns: t-value, two-tailed prob |
|---|
| 3175 | """ |
|---|
| 3176 | if type(a) != N.ArrayType: |
|---|
| 3177 | a = N.array(a) |
|---|
| 3178 | x = amean(a) |
|---|
| 3179 | v = avar(a) |
|---|
| 3180 | n = len(a) |
|---|
| 3181 | df = n-1 |
|---|
| 3182 | svar = ((n-1)*v) / float(df) |
|---|
| 3183 | t = (x-popmean)/math.sqrt(svar*(1.0/n)) |
|---|
| 3184 | prob = abetai(0.5*df,0.5,df/(df+t*t)) |
|---|
| 3185 | |
|---|
| 3186 | if printit <> 0: |
|---|
| 3187 | statname = 'Single-sample T-test.' |
|---|
| 3188 | outputpairedstats(printit,writemode, |
|---|
| 3189 | 'Population','--',popmean,0,0,0, |
|---|
| 3190 | name,n,x,v,N.minimum.reduce(N.ravel(a)), |
|---|
| 3191 | N.maximum.reduce(N.ravel(a)), |
|---|
| 3192 | statname,t,prob) |
|---|
| 3193 | return t,prob |
|---|
| 3194 | |
|---|
| 3195 | |
|---|
| 3196 | def attest_ind (a, b, dimension=None, printit=0, name1='Samp1', name2='Samp2',writemode='a'): |
|---|
| 3197 | """ |
|---|
| 3198 | Calculates the t-obtained T-test on TWO INDEPENDENT samples of scores |
|---|
| 3199 | a, and b.  From Numerical Recipes, p.483.  If printit=1, results are
|---|
| 3200 | printed to the screen. If printit='filename', the results are output |
|---|
| 3201 | to 'filename' using the given writemode (default=append). Dimension |
|---|
| 3202 | can equal None (ravel array first), or an integer (the dimension over |
|---|
| 3203 | which to operate on a and b). |
|---|
| 3204 | |
|---|
| 3205 | Usage: attest_ind (a,b,dimension=None,printit=0, |
|---|
| 3206 | name1='Samp1',name2='Samp2',writemode='a')
|---|
| 3207 | Returns: t-value, two-tailed p-value |
|---|
| 3208 | """ |
|---|
| 3209 | if dimension == None: |
|---|
| 3210 | a = N.ravel(a) |
|---|
| 3211 | b = N.ravel(b) |
|---|
| 3212 | dimension = 0 |
|---|
| 3213 | x1 = amean(a,dimension) |
|---|
| 3214 | x2 = amean(b,dimension) |
|---|
| 3215 | v1 = avar(a,dimension) |
|---|
| 3216 | v2 = avar(b,dimension) |
|---|
| 3217 | n1 = a.shape[dimension] |
|---|
| 3218 | n2 = b.shape[dimension] |
|---|
| 3219 | df = n1+n2-2 |
|---|
| 3220 | svar = ((n1-1)*v1+(n2-1)*v2) / float(df) |
|---|
| 3221 | zerodivproblem = N.equal(svar,0) |
|---|
| 3222 | svar = N.where(zerodivproblem,1,svar) # avoid zero-division in 1st place |
|---|
| 3223 | t = (x1-x2)/N.sqrt(svar*(1.0/n1 + 1.0/n2)) # N-D COMPUTATION HERE!!!!!! |
|---|
| 3224 | t = N.where(zerodivproblem,1.0,t) # replace NaN/wrong t-values with 1.0 |
|---|
| 3225 | probs = abetai(0.5*df,0.5,float(df)/(df+t*t)) |
|---|
| 3226 | |
|---|
| 3227 | if type(t) == N.ArrayType: |
|---|
| 3228 | probs = N.reshape(probs,t.shape) |
|---|
| 3229 | if len(probs) == 1: |
|---|
| 3230 | probs = probs[0] |
|---|
| 3231 | |
|---|
| 3232 | if printit <> 0: |
|---|
| 3233 | if type(t) == N.ArrayType: |
|---|
| 3234 | t = t[0] |
|---|
| 3235 | if type(probs) == N.ArrayType: |
|---|
| 3236 | probs = probs[0] |
|---|
| 3237 | statname = 'Independent samples T-test.' |
|---|
| 3238 | outputpairedstats(printit,writemode, |
|---|
| 3239 | name1,n1,x1,v1,N.minimum.reduce(N.ravel(a)), |
|---|
| 3240 | N.maximum.reduce(N.ravel(a)), |
|---|
| 3241 | name2,n2,x2,v2,N.minimum.reduce(N.ravel(b)), |
|---|
| 3242 | N.maximum.reduce(N.ravel(b)), |
|---|
| 3243 | statname,t,probs) |
|---|
| 3244 | return |
|---|
| 3245 | return t, probs |
|---|
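|  | # Illustrative usage (added; not in the original source).  With printit=0 the
|---|
|  | # function just returns the statistic and two-tailed p; dimension=None ravels
|---|
|  | # both inputs first, while an integer dimension gives one t per remaining
|---|
|  | # cell (the N-D computation noted above):
|---|
|  | #   >>> t, p = attest_ind(group1, group2)
|---|
|  | #   >>> tmap, pmap = attest_ind(stack1, stack2, dimension=0)
|---|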
| 3246 | |
|---|
| 3247 | |
|---|
| 3248 | def attest_rel (a,b,dimension=None,printit=0,name1='Samp1',name2='Samp2',writemode='a'): |
|---|
| 3249 | """ |
|---|
| 3250 | Calculates the t-obtained T-test on TWO RELATED samples of scores, a |
|---|
| 3251 | and b.  From Numerical Recipes, p.483.  If printit=1, results are
|---|
| 3252 | printed to the screen. If printit='filename', the results are output |
|---|
| 3253 | to 'filename' using the given writemode (default=append). Dimension |
|---|
| 3254 | can equal None (ravel array first), or an integer (the dimension over |
|---|
| 3255 | which to operate on a and b). |
|---|
| 3256 | |
|---|
| 3257 | Usage: attest_rel(a,b,dimension=None,printit=0, |
|---|
| 3258 | name1='Samp1',name2='Samp2',writemode='a') |
|---|
| 3259 | Returns: t-value, two-tailed p-value |
|---|
| 3260 | """ |
|---|
| 3261 | if dimension == None: |
|---|
| 3262 | a = N.ravel(a) |
|---|
| 3263 | b = N.ravel(b) |
|---|
| 3264 | dimension = 0 |
|---|
| 3265 | if len(a)<>len(b): |
|---|
| 3266 | raise ValueError, 'Unequal length arrays.' |
|---|
| 3267 | x1 = amean(a,dimension) |
|---|
| 3268 | x2 = amean(b,dimension) |
|---|
| 3269 | v1 = avar(a,dimension) |
|---|
| 3270 | v2 = avar(b,dimension) |
|---|
| 3271 | n = a.shape[dimension] |
|---|
| 3272 | df = float(n-1) |
|---|
| 3273 | d = (a-b).astype('d') |
|---|
| 3274 | |
|---|
| 3275 | denom = N.sqrt((n*N.add.reduce(d*d,dimension) - N.add.reduce(d,dimension)**2) /df) |
|---|
| 3276 | zerodivproblem = N.equal(denom,0) |
|---|
| 3277 | denom = N.where(zerodivproblem,1,denom) # avoid zero-division in 1st place |
|---|
| 3278 | t = N.add.reduce(d,dimension) / denom # N-D COMPUTATION HERE!!!!!! |
|---|
| 3279 | t = N.where(zerodivproblem,1.0,t) # replace NaN/wrong t-values with 1.0 |
|---|
| 3280 | probs = abetai(0.5*df,0.5,float(df)/(df+t*t)) |
|---|
| 3281 | if type(t) == N.ArrayType: |
|---|
| 3282 | probs = N.reshape(probs,t.shape) |
|---|
| 3283 | if len(probs) == 1: |
|---|
| 3284 | probs = probs[0] |
|---|
| 3285 | |
|---|
| 3286 | if printit <> 0: |
|---|
| 3287 | statname = 'Related samples T-test.' |
|---|
| 3288 | outputpairedstats(printit,writemode, |
|---|
| 3289 | name1,n,x1,v1,N.minimum.reduce(N.ravel(a)), |
|---|
| 3290 | N.maximum.reduce(N.ravel(a)), |
|---|
| 3291 | name2,n,x2,v2,N.minimum.reduce(N.ravel(b)), |
|---|
| 3292 | N.maximum.reduce(N.ravel(b)), |
|---|
| 3293 | statname,t,probs) |
|---|
| 3294 | return |
|---|
| 3295 | return t, probs |
|---|
| 3296 | |
|---|
| 3297 | |
|---|
| 3298 | def achisquare(f_obs,f_exp=None): |
|---|
| 3299 | """ |
|---|
| 3300 | Calculates a one-way chi square for array of observed frequencies and returns |
|---|
| 3301 | the result. If no expected frequencies are given, the total N is assumed to |
|---|
| 3302 | be equally distributed across all groups. |
|---|
| 3303 | |
|---|
| 3304 | Usage: achisquare(f_obs, f_exp=None) f_obs = array of observed cell freq. |
|---|
| 3305 | Returns: chisquare-statistic, associated p-value |
|---|
| 3306 | """ |
|---|
| 3307 | |
|---|
| 3308 | k = len(f_obs) |
|---|
| 3309 | if f_exp == None: |
|---|
| 3310 | f_exp = N.array([sum(f_obs)/float(k)] * len(f_obs),N.Float) |
|---|
| 3311 | f_exp = f_exp.astype(N.Float) |
|---|
| 3312 | chisq = N.add.reduce((f_obs-f_exp)**2 / f_exp) |
|---|
| 3313 | return chisq, chisqprob(chisq, k-1) |
|---|
| 3314 | |
|---|
| 3315 | |
|---|
| 3316 | def aks_2samp (data1,data2): |
|---|
| 3317 | """ |
|---|
| 3318 | Computes the Kolmogorov-Smirnov statistic on 2 samples.  Modified from
|---|
| 3319 | Numerical Recipes in C, page 493.  Returns KS D-value, prob.  Not ufunc-
|---|
| 3320 | like. |
|---|
| 3321 | |
|---|
| 3322 | Usage: aks_2samp(data1,data2) where data1 and data2 are 1D arrays |
|---|
| 3323 | Returns: KS D-value, p-value |
|---|
| 3324 | """ |
|---|
| 3325 | j1 = 0 # N.zeros(data1.shape[1:]) TRIED TO MAKE THIS UFUNC-LIKE |
|---|
| 3326 | j2 = 0 # N.zeros(data2.shape[1:]) |
|---|
| 3327 | fn1 = 0.0 # N.zeros(data1.shape[1:],N.Float) |
|---|
| 3328 | fn2 = 0.0 # N.zeros(data2.shape[1:],N.Float) |
|---|
| 3329 | n1 = data1.shape[0] |
|---|
| 3330 | n2 = data2.shape[0] |
|---|
| 3331 | en1 = n1*1 |
|---|
| 3332 | en2 = n2*1 |
|---|
| 3333 | d = N.zeros(data1.shape[1:],N.Float) |
|---|
| 3334 | data1 = N.sort(data1,0) |
|---|
| 3335 | data2 = N.sort(data2,0) |
|---|
| 3336 | while j1 < n1 and j2 < n2: |
|---|
| 3337 | d1=data1[j1] |
|---|
| 3338 | d2=data2[j2] |
|---|
| 3339 | if d1 <= d2: |
|---|
| 3340 | fn1 = (j1)/float(en1) |
|---|
| 3341 | j1 = j1 + 1 |
|---|
| 3342 | if d2 <= d1: |
|---|
| 3343 | fn2 = (j2)/float(en2) |
|---|
| 3344 | j2 = j2 + 1 |
|---|
| 3345 | dt = (fn2-fn1) |
|---|
| 3346 | if abs(dt) > abs(d): |
|---|
| 3347 | d = dt |
|---|
| 3348 | try: |
|---|
| 3349 | en = math.sqrt(en1*en2/float(en1+en2)) |
|---|
| 3350 | prob = aksprob((en+0.12+0.11/en)*N.fabs(d)) |
|---|
| 3351 | except: |
|---|
| 3352 | prob = 1.0 |
|---|
| 3353 | return d, prob |
|---|
| 3354 | |
|---|
| 3355 | |
|---|
| 3356 | def amannwhitneyu(x,y): |
|---|
| 3357 | """ |
|---|
| 3358 | Calculates a Mann-Whitney U statistic on the provided scores and |
|---|
| 3359 | returns the result. Use only when the n in each condition is < 20 and |
|---|
| 3360 | you have 2 independent samples of ranks. REMEMBER: Mann-Whitney U is |
|---|
| 3361 | significant if the u-obtained is LESS THAN or equal to the critical |
|---|
| 3362 | value of U. |
|---|
| 3363 | |
|---|
| 3364 | Usage: amannwhitneyu(x,y) where x,y are arrays of values for 2 conditions |
|---|
| 3365 | Returns: u-statistic, one-tailed p-value (i.e., p(z(U))) |
|---|
| 3366 | """ |
|---|
| 3367 | n1 = len(x) |
|---|
| 3368 | n2 = len(y) |
|---|
| 3369 | ranked = rankdata(N.concatenate((x,y))) |
|---|
| 3370 | rankx = ranked[0:n1] # get the x-ranks |
|---|
| 3371 | ranky = ranked[n1:] # the rest are y-ranks |
|---|
| 3372 | u1 = n1*n2 + (n1*(n1+1))/2.0 - sum(rankx) # calc U for x |
|---|
| 3373 | u2 = n1*n2 - u1 # remainder is U for y |
|---|
| 3374 | bigu = max(u1,u2) |
|---|
| 3375 | smallu = min(u1,u2) |
|---|
| 3376 | T = math.sqrt(tiecorrect(ranked)) # correction factor for tied scores |
|---|
| 3377 | if T == 0: |
|---|
| 3378 | raise ValueError, 'All numbers are identical in amannwhitneyu' |
|---|
| 3379 | sd = math.sqrt(T*n1*n2*(n1+n2+1)/12.0) |
|---|
| 3380 | z = abs((bigu-n1*n2/2.0) / sd) # normal approximation for prob calc |
|---|
| 3381 | return smallu, 1.0 - zprob(z) |
|---|
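|  | # Illustrative usage (added; not in the original source).  Intended for two
|---|
|  | # small (n < 20) independent samples of ranks; the returned U is the smaller
|---|
|  | # of U1/U2 and the p-value uses the normal approximation:
|---|
|  | #   >>> u, p = amannwhitneyu(x, y)
|---|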
| 3382 | |
|---|
| 3383 | |
|---|
| 3384 | def atiecorrect(rankvals): |
|---|
| 3385 | """ |
|---|
| 3386 | Tie-corrector for ties in Mann Whitney U and Kruskal Wallis H tests. |
|---|
| 3387 | See Siegel, S. (1956) Nonparametric Statistics for the Behavioral |
|---|
| 3388 | Sciences. New York: McGraw-Hill. Code adapted from |Stat rankind.c |
|---|
| 3389 | code. |
|---|
| 3390 | |
|---|
| 3391 | Usage: atiecorrect(rankvals) |
|---|
| 3392 | Returns: T correction factor for U or H |
|---|
| 3393 | """ |
|---|
| 3394 | sorted,posn = ashellsort(N.array(rankvals)) |
|---|
| 3395 | n = len(sorted) |
|---|
| 3396 | T = 0.0 |
|---|
| 3397 | i = 0 |
|---|
| 3398 | while (i<n-1): |
|---|
| 3399 | if sorted[i] == sorted[i+1]: |
|---|
| 3400 | nties = 1 |
|---|
| 3401 | while (i<n-1) and (sorted[i] == sorted[i+1]): |
|---|
| 3402 | nties = nties +1 |
|---|
| 3403 | i = i +1 |
|---|
| 3404 | T = T + nties**3 - nties |
|---|
| 3405 | i = i+1 |
|---|
| 3406 | T = T / float(n**3-n) |
|---|
| 3407 | return 1.0 - T |
|---|
| 3408 | |
|---|
| 3409 | |
|---|
| 3410 | def aranksums(x,y): |
|---|
| 3411 | """ |
|---|
| 3412 | Calculates the rank sums statistic on the provided scores and returns |
|---|
| 3413 | the result. |
|---|
| 3414 | |
|---|
| 3415 | Usage: aranksums(x,y) where x,y are arrays of values for 2 conditions |
|---|
| 3416 | Returns: z-statistic, two-tailed p-value |
|---|
| 3417 | """ |
|---|
| 3418 | n1 = len(x) |
|---|
| 3419 | n2 = len(y) |
|---|
| 3420 | alldata = N.concatenate((x,y)) |
|---|
| 3421 | ranked = arankdata(alldata) |
|---|
| 3422 | x = ranked[:n1] |
|---|
| 3423 | y = ranked[n1:] |
|---|
| 3424 | s = sum(x) |
|---|
| 3425 | expected = n1*(n1+n2+1) / 2.0 |
|---|
| 3426 | z = (s - expected) / math.sqrt(n1*n2*(n1+n2+1)/12.0) |
|---|
| 3427 | prob = 2*(1.0 -zprob(abs(z))) |
|---|
| 3428 | return z, prob |
|---|
| 3429 | |
|---|
| 3430 | |
|---|
| 3431 | def awilcoxont(x,y): |
|---|
| 3432 | """ |
|---|
| 3433 | Calculates the Wilcoxon T-test for related samples and returns the |
|---|
| 3434 | result. A non-parametric T-test. |
|---|
| 3435 | |
|---|
| 3436 | Usage: awilcoxont(x,y) where x,y are equal-length arrays for 2 conditions |
|---|
| 3437 | Returns: t-statistic, two-tailed p-value |
|---|
| 3438 | """ |
|---|
| 3439 | if len(x) <> len(y): |
|---|
| 3440 | raise ValueError, 'Unequal N in awilcoxont. Aborting.' |
|---|
| 3441 | d = x-y |
|---|
| 3442 | d = N.compress(N.not_equal(d,0),d) # Keep all non-zero differences |
|---|
| 3443 | count = len(d) |
|---|
| 3444 | absd = abs(d) |
|---|
| 3445 | absranked = arankdata(absd) |
|---|
| 3446 | r_plus = 0.0 |
|---|
| 3447 | r_minus = 0.0 |
|---|
| 3448 | for i in range(len(absd)): |
|---|
| 3449 | if d[i] < 0: |
|---|
| 3450 | r_minus = r_minus + absranked[i] |
|---|
| 3451 | else: |
|---|
| 3452 | r_plus = r_plus + absranked[i] |
|---|
| 3453 | wt = min(r_plus, r_minus) |
|---|
| 3454 | mn = count * (count+1) * 0.25 |
|---|
| 3455 | se = math.sqrt(count*(count+1)*(2.0*count+1.0)/24.0) |
|---|
| 3457 | z = math.fabs(wt-mn) / se |
|---|
| 3458 | prob = 2*(1.0 -zprob(abs(z))) |
|---|
| 3459 | return wt, prob |
|---|
| 3460 | |
|---|
| 3461 | |
|---|
| 3462 | def akruskalwallish(*args): |
|---|
| 3463 | """ |
|---|
| 3464 | The Kruskal-Wallis H-test is a non-parametric ANOVA for 3 or more |
|---|
| 3465 | groups, requiring at least 5 subjects in each group. This function |
|---|
| 3466 | calculates the Kruskal-Wallis H and associated p-value for 3 or more |
|---|
| 3467 | independent samples. |
|---|
| 3468 | |
|---|
| 3469 | Usage: akruskalwallish(*args) args are separate arrays for 3+ conditions |
|---|
| 3470 | Returns: H-statistic (corrected for ties), associated p-value |
|---|
| 3471 | """ |
|---|
| 3472 | assert len(args) >= 3, "Need at least 3 groups in stats.akruskalwallish()"
|---|
| 3473 | args = list(args) |
|---|
| 3474 | n = [0]*len(args) |
|---|
| 3475 | n = map(len,args) |
|---|
| 3476 | all = [] |
|---|
| 3477 | for i in range(len(args)): |
|---|
| 3478 | all = all + args[i].tolist() |
|---|
| 3479 | ranked = rankdata(all) |
|---|
| 3480 | T = tiecorrect(ranked) |
|---|
| 3481 | for i in range(len(args)): |
|---|
| 3482 | args[i] = ranked[0:n[i]] |
|---|
| 3483 | del ranked[0:n[i]] |
|---|
| 3484 | rsums = [] |
|---|
| 3485 | for i in range(len(args)): |
|---|
| 3486 | rsums.append(sum(args[i])**2) |
|---|
| 3487 | rsums[i] = rsums[i] / float(n[i]) |
|---|
| 3488 | ssbn = sum(rsums) |
|---|
| 3489 | totaln = sum(n) |
|---|
| 3490 | h = 12.0 / (totaln*(totaln+1)) * ssbn - 3*(totaln+1) |
|---|
| 3491 | df = len(args) - 1 |
|---|
| 3492 | if T == 0: |
|---|
| 3493 | raise ValueError, 'All numbers are identical in akruskalwallish' |
|---|
| 3494 | h = h / float(T) |
|---|
| 3495 | return h, chisqprob(h,df) |
|---|
| 3496 | |
|---|
| 3497 | |
|---|
| 3498 | def afriedmanchisquare(*args): |
|---|
| 3499 | """ |
|---|
| 3500 | Friedman Chi-Square is a non-parametric, one-way within-subjects |
|---|
| 3501 | ANOVA. This function calculates the Friedman Chi-square test for |
|---|
| 3502 | repeated measures and returns the result, along with the associated |
|---|
| 3503 | probability value.  It assumes 3 or more repeated measures.  With only 3
|---|
| 3504 | levels a minimum of 10 subjects in the study is required; four levels
|---|
| 3505 | requires at least 5 subjects per level(??).
|---|
| 3506 | |
|---|
| 3507 | Usage: afriedmanchisquare(*args)   args are separate arrays for 3+ conditions
|---|
| 3508 | Returns: chi-square statistic, associated p-value |
|---|
| 3509 | """ |
|---|
| 3510 | k = len(args) |
|---|
| 3511 | if k < 3: |
|---|
| 3512 | raise ValueError, '\nLess than 3 levels. Friedman test not appropriate.\n' |
|---|
| 3513 | n = len(args[0]) |
|---|
| 3514 | data = apply(pstat.aabut,args) |
|---|
| 3515 | data = data.astype(N.Float) |
|---|
| 3516 | for i in range(len(data)): |
|---|
| 3517 | data[i] = arankdata(data[i]) |
|---|
| 3518 | ssbn = asum(asum(data,0)**2)
|---|
| 3519 | chisq = 12.0 / (k*n*(k+1)) * ssbn - 3*n*(k+1) |
|---|
| 3520 | return chisq, chisqprob(chisq,k-1) |
|---|
| 3521 | |
|---|
| 3522 | |
|---|
| 3523 | ##################################### |
|---|
| 3524 | #### APROBABILITY CALCULATIONS #### |
|---|
| 3525 | ##################################### |
|---|
| 3526 | |
|---|
| 3527 | def achisqprob(chisq,df): |
|---|
| 3528 | """ |
|---|
| 3529 | Returns the (1-tail) probability value associated with the provided chi-square |
|---|
| 3530 | value and df. Heavily modified from chisq.c in Gary Perlman's |Stat. Can |
|---|
| 3531 | handle multiple dimensions. |
|---|
| 3532 | |
|---|
| 3533 | Usage: achisqprob(chisq,df) chisq=chisquare stat., df=degrees of freedom |
|---|
| 3534 | """ |
|---|
| 3535 | BIG = 200.0 |
|---|
| 3536 | def ex(x): |
|---|
| 3537 | BIG = 200.0 |
|---|
| 3538 | exponents = N.where(N.less(x,-BIG),-BIG,x) |
|---|
| 3539 | return N.exp(exponents) |
|---|
| 3540 | |
|---|
| 3541 | if type(chisq) == N.ArrayType: |
|---|
| 3542 | arrayflag = 1 |
|---|
| 3543 | else: |
|---|
| 3544 | arrayflag = 0 |
|---|
| 3545 | chisq = N.array([chisq]) |
|---|
| 3546 | if df < 1: |
|---|
| 3547 | return N.ones(chisq.shape,N.Float)
|---|
| 3548 | probs = N.zeros(chisq.shape,N.Float) |
|---|
| 3549 | probs = N.where(N.less_equal(chisq,0),1.0,probs) # set prob=1 for chisq<=0
|---|
| 3550 | a = 0.5 * chisq |
|---|
| 3551 | if df > 1: |
|---|
| 3552 | y = ex(-a) |
|---|
| 3553 | if df%2 == 0: |
|---|
| 3554 | even = 1 |
|---|
| 3555 | s = y*1 |
|---|
| 3556 | s2 = s*1 |
|---|
| 3557 | else: |
|---|
| 3558 | even = 0 |
|---|
| 3559 | s = 2.0 * azprob(-N.sqrt(chisq)) |
|---|
| 3560 | s2 = s*1 |
|---|
| 3561 | if (df > 2): |
|---|
| 3562 | chisq = 0.5 * (df - 1.0) |
|---|
| 3563 | if even: |
|---|
| 3564 | z = N.ones(probs.shape,N.Float) |
|---|
| 3565 | else: |
|---|
| 3566 | z = 0.5 *N.ones(probs.shape,N.Float) |
|---|
| 3567 | if even: |
|---|
| 3568 | e = N.zeros(probs.shape,N.Float) |
|---|
| 3569 | else: |
|---|
| 3570 | e = N.log(N.sqrt(N.pi)) *N.ones(probs.shape,N.Float) |
|---|
| 3571 | c = N.log(a) |
|---|
| 3572 | mask = N.zeros(probs.shape) |
|---|
| 3573 | a_big = N.greater(a,BIG) |
|---|
| 3574 | a_big_frozen = -1 *N.ones(probs.shape,N.Float) |
|---|
| 3575 | totalelements = N.multiply.reduce(N.array(probs.shape)) |
|---|
| 3576 | while asum(mask)<>totalelements: |
|---|
| 3577 | e = N.log(z) + e |
|---|
| 3578 | s = s + ex(c*z-a-e) |
|---|
| 3579 | z = z + 1.0 |
|---|
| 3580 | # print z, e, s |
|---|
| 3581 | newmask = N.greater(z,chisq) |
|---|
| 3582 | a_big_frozen = N.where(newmask*N.equal(mask,0)*a_big, s, a_big_frozen) |
|---|
| 3583 | mask = N.clip(newmask+mask,0,1) |
|---|
| 3584 | if even: |
|---|
| 3585 | z = N.ones(probs.shape,N.Float) |
|---|
| 3586 | e = N.ones(probs.shape,N.Float) |
|---|
| 3587 | else: |
|---|
| 3588 | z = 0.5 *N.ones(probs.shape,N.Float) |
|---|
| 3589 | e = 1.0 / N.sqrt(N.pi) / N.sqrt(a) * N.ones(probs.shape,N.Float) |
|---|
| 3590 | c = 0.0 |
|---|
| 3591 | mask = N.zeros(probs.shape) |
|---|
| 3592 | a_notbig_frozen = -1 *N.ones(probs.shape,N.Float) |
|---|
| 3593 | while asum(mask)<>totalelements: |
|---|
| 3594 | e = e * (a/z.astype(N.Float)) |
|---|
| 3595 | c = c + e |
|---|
| 3596 | z = z + 1.0 |
|---|
| 3597 | # print '#2', z, e, c, s, c*y+s2 |
|---|
| 3598 | newmask = N.greater(z,chisq) |
|---|
| 3599 | a_notbig_frozen = N.where(newmask*N.equal(mask,0)*(1-a_big), |
|---|
| 3600 | c*y+s2, a_notbig_frozen) |
|---|
| 3601 | mask = N.clip(newmask+mask,0,1) |
|---|
| 3602 | probs = N.where(N.equal(probs,1),1, |
|---|
| 3603 | N.where(N.greater(a,BIG),a_big_frozen,a_notbig_frozen)) |
|---|
| 3604 | return probs |
|---|
| 3605 | else: |
|---|
| 3606 | return s |
|---|
| 3607 | |
|---|
| 3608 | |
|---|
| 3609 | def aerfcc(x): |
|---|
| 3610 | """ |
|---|
| 3611 | Returns the complementary error function erfc(x) with fractional error |
|---|
| 3612 | everywhere less than 1.2e-7.  Adapted from Numerical Recipes.  Can
|---|
| 3613 | handle multiple dimensions. |
|---|
| 3614 | |
|---|
| 3615 | Usage: aerfcc(x) |
|---|
| 3616 | """ |
|---|
| 3617 | z = abs(x) |
|---|
| 3618 | t = 1.0 / (1.0+0.5*z) |
|---|
| 3619 | ans = t * N.exp(-z*z-1.26551223 + t*(1.00002368+t*(0.37409196+t*(0.09678418+t*(-0.18628806+t*(0.27886807+t*(-1.13520398+t*(1.48851587+t*(-0.82215223+t*0.17087277))))))))) |
|---|
| 3620 | return N.where(N.greater_equal(x,0), ans, 2.0-ans) |
|---|
| 3621 | |
|---|
| 3622 | |
|---|
| 3623 | def azprob(z): |
|---|
| 3624 | """ |
|---|
| 3625 | Returns the area under the normal curve 'to the left of' the given z value. |
|---|
| 3626 | Thus, |
|---|
| 3627 | for z<0, zprob(z) = 1-tail probability |
|---|
| 3628 | for z>0, 1.0-zprob(z) = 1-tail probability |
|---|
| 3629 | for any z, 2.0*(1.0-zprob(abs(z))) = 2-tail probability |
|---|
| 3630 | Adapted from z.c in Gary Perlman's |Stat. Can handle multiple dimensions. |
|---|
| 3631 | |
|---|
| 3632 | Usage: azprob(z) where z is a z-value |
|---|
| 3633 | """ |
|---|
| 3634 | def yfunc(y): |
|---|
| 3635 | x = (((((((((((((-0.000045255659 * y |
|---|
| 3636 | +0.000152529290) * y -0.000019538132) * y |
|---|
| 3637 | -0.000676904986) * y +0.001390604284) * y |
|---|
| 3638 | -0.000794620820) * y -0.002034254874) * y |
|---|
| 3639 | +0.006549791214) * y -0.010557625006) * y |
|---|
| 3640 | +0.011630447319) * y -0.009279453341) * y |
|---|
| 3641 | +0.005353579108) * y -0.002141268741) * y |
|---|
| 3642 | +0.000535310849) * y +0.999936657524 |
|---|
| 3643 | return x |
|---|
| 3644 | |
|---|
| 3645 | def wfunc(w): |
|---|
| 3646 | x = ((((((((0.000124818987 * w |
|---|
| 3647 | -0.001075204047) * w +0.005198775019) * w |
|---|
| 3648 | -0.019198292004) * w +0.059054035642) * w |
|---|
| 3649 | -0.151968751364) * w +0.319152932694) * w |
|---|
| 3650 | -0.531923007300) * w +0.797884560593) * N.sqrt(w) * 2.0 |
|---|
| 3651 | return x |
|---|
| 3652 | |
|---|
| 3653 | Z_MAX = 6.0 # maximum meaningful z-value |
|---|
| 3654 | x = N.zeros(z.shape,N.Float) # initialize |
|---|
| 3655 | y = 0.5 * N.fabs(z) |
|---|
| 3656 | x = N.where(N.less(y,1.0),wfunc(y*y),yfunc(y-2.0)) # get x's |
|---|
| 3657 | x = N.where(N.greater(y,Z_MAX*0.5),1.0,x) # kill those with big Z |
|---|
| 3658 | prob = N.where(N.greater(z,0),(x+1)*0.5,(1-x)*0.5) |
|---|
| 3659 | return prob |
|---|
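|  | # Illustrative usage (added; not in the original source), following the
|---|
|  | # conventions in the docstring above (z must be a Numeric array here, since
|---|
|  | # z.shape is used):
|---|
|  | #   >>> p_left  = azprob(z)                       # area to the left of z
|---|
|  | #   >>> p_2tail = 2.0*(1.0 - azprob(abs(z)))      # two-tailed probability
|---|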
| 3660 | |
|---|
| 3661 | |
|---|
| 3662 | def aksprob(alam): |
|---|
| 3663 | """ |
|---|
| 3664 | Returns the probability value for a K-S statistic computed via ks_2samp. |
|---|
| 3665 | Adapted from Numerical Recipes.  Can handle multiple dimensions.
|---|
| 3666 | |
|---|
| 3667 | Usage: aksprob(alam) |
|---|
| 3668 | """ |
|---|
| 3669 | if type(alam) == N.ArrayType: |
|---|
| 3670 | frozen = -1 *N.ones(alam.shape,N.Float64) |
|---|
| 3671 | alam = alam.astype(N.Float64) |
|---|
| 3672 | arrayflag = 1 |
|---|
| 3673 | else: |
|---|
| 3674 | frozen = N.array(-1.) |
|---|
| 3675 | alam = N.array(alam,N.Float64) |
|---|
| 3676 | mask = N.zeros(alam.shape) |
|---|
| 3677 | fac = 2.0 *N.ones(alam.shape,N.Float) |
|---|
| 3678 | sum = N.zeros(alam.shape,N.Float) |
|---|
| 3679 | termbf = N.zeros(alam.shape,N.Float) |
|---|
| 3680 | a2 = N.array(-2.0*alam*alam,N.Float64) |
|---|
| 3681 | totalelements = N.multiply.reduce(N.array(mask.shape)) |
|---|
| 3682 | for j in range(1,201): |
|---|
| 3683 | if asum(mask) == totalelements: |
|---|
| 3684 | break |
|---|
| 3685 | exponents = (a2*j*j) |
|---|
| 3686 | overflowmask = N.less(exponents,-746) |
|---|
| 3687 | frozen = N.where(overflowmask,0,frozen) |
|---|
| 3688 | mask = mask+overflowmask |
|---|
| 3689 | term = fac*N.exp(exponents) |
|---|
| 3690 | sum = sum + term |
|---|
| 3691 | newmask = N.where(N.less_equal(abs(term),(0.001*termbf)) + |
|---|
| 3692 | N.less(abs(term),1.0e-8*sum), 1, 0) |
|---|
| 3693 | frozen = N.where(newmask*N.equal(mask,0), sum, frozen) |
|---|
| 3694 | mask = N.clip(mask+newmask,0,1) |
|---|
| 3695 | fac = -fac |
|---|
| 3696 | termbf = abs(term) |
|---|
| 3697 | if arrayflag: |
|---|
| 3698 | return N.where(N.equal(frozen,-1), 1.0, frozen) # 1.0 if doesn't converge |
|---|
| 3699 | else: |
|---|
| 3700 | return N.where(N.equal(frozen,-1), 1.0, frozen)[0] # 1.0 if doesn't converge |
|---|
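| | # Illustrative example (approximate values): small K-S statistics give p near 1,
|---|
| | # large ones give p near 0.
|---|
| | #     aksprob(N.array([0.5, 2.0]))          # --> roughly [0.96, 0.0007]
|---|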
| 3701 | |
|---|
| 3702 | |
|---|
| 3703 | def afprob (dfnum, dfden, F): |
|---|
| 3704 | """ |
|---|
| 3705 | Returns the 1-tailed significance level (p-value) of an F statistic |
|---|
| 3706 | given the degrees of freedom for the numerator (dfR-dfF) and the degrees |
|---|
| 3707 | of freedom for the denominator (dfF). Can handle multiple dims for F. |
|---|
| 3708 | |
|---|
| 3709 | Usage: afprob(dfnum, dfden, F) where usually dfnum=dfbn, dfden=dfwn |
|---|
| 3710 | """ |
|---|
| 3711 | if type(F) == N.ArrayType: |
|---|
| 3712 | return abetai(0.5*dfden, 0.5*dfnum, dfden/(1.0*dfden+dfnum*F)) |
|---|
| 3713 | else: |
|---|
| 3714 | return abetai(0.5*dfden, 0.5*dfnum, dfden/float(dfden+dfnum*F)) |
|---|
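| | # Illustrative example (approximate): F=3.35 with (2, 27) df sits near the
|---|
| | # conventional 5% cutoff.
|---|
| | #     afprob(2, 27, 3.35)                   # --> about 0.05
|---|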
| 3715 | |
|---|
| 3716 | |
|---|
| 3717 | def abetacf(a,b,x,verbose=1): |
|---|
| 3718 | """ |
|---|
| 3719 | Evaluates the continued fraction form of the incomplete Beta function, |
|---|
| 3720 | betai. (Adapted from: Numerical Recipes in C.) Can handle multiple
|---|
| 3721 | dimensions for x. |
|---|
| 3722 | |
|---|
| 3723 | Usage: abetacf(a,b,x,verbose=1) |
|---|
| 3724 | """ |
|---|
| 3725 | ITMAX = 200 |
|---|
| 3726 | EPS = 3.0e-7 |
|---|
| 3727 | |
|---|
| 3728 | arrayflag = 1 |
|---|
| 3729 | if type(x) == N.ArrayType: |
|---|
| 3730 | frozen = N.ones(x.shape,N.Float) *-1 #start out w/ -1s, should replace all |
|---|
| 3731 | else: |
|---|
| 3732 | arrayflag = 0 |
|---|
| 3733 | frozen = N.array([-1]) |
|---|
| 3734 | x = N.array([x]) |
|---|
| 3735 | mask = N.zeros(x.shape) |
|---|
| 3736 | bm = az = am = 1.0 |
|---|
| 3737 | qab = a+b |
|---|
| 3738 | qap = a+1.0 |
|---|
| 3739 | qam = a-1.0 |
|---|
| 3740 | bz = 1.0-qab*x/qap |
|---|
| 3741 | for i in range(ITMAX+1): |
|---|
| 3742 | if N.sum(N.ravel(N.equal(frozen,-1)))==0: |
|---|
| 3743 | break |
|---|
| 3744 | em = float(i+1) |
|---|
| 3745 | tem = em + em |
|---|
| 3746 | d = em*(b-em)*x/((qam+tem)*(a+tem)) |
|---|
| 3747 | ap = az + d*am |
|---|
| 3748 | bp = bz+d*bm |
|---|
| 3749 | d = -(a+em)*(qab+em)*x/((qap+tem)*(a+tem)) |
|---|
| 3750 | app = ap+d*az |
|---|
| 3751 | bpp = bp+d*bz |
|---|
| 3752 | aold = az*1 |
|---|
| 3753 | am = ap/bpp |
|---|
| 3754 | bm = bp/bpp |
|---|
| 3755 | az = app/bpp |
|---|
| 3756 | bz = 1.0 |
|---|
| 3757 | newmask = N.less(abs(az-aold),EPS*abs(az)) |
|---|
| 3758 | frozen = N.where(newmask*N.equal(mask,0), az, frozen) |
|---|
| 3759 | mask = N.clip(mask+newmask,0,1) |
|---|
| 3760 | noconverge = asum(N.equal(frozen,-1)) |
|---|
| 3761 | if noconverge <> 0 and verbose: |
|---|
| 3762 | print 'a or b too big, or ITMAX too small in Betacf for ',noconverge,' elements' |
|---|
| 3763 | if arrayflag: |
|---|
| 3764 | return frozen |
|---|
| 3765 | else: |
|---|
| 3766 | return frozen[0] |
|---|
| 3767 | |
|---|
| 3768 | |
|---|
| 3769 | def agammln(xx): |
|---|
| 3770 | """ |
|---|
| 3771 | Returns the natural log of the gamma function of xx, i.e. ln(Gamma(xx)), where
|---|
| 3772 | Gamma(z) = Integral(0,infinity) of t^(z-1)exp(-t) dt.
|---|
| 3773 | Adapted from: Numerical Recipes in C. Can handle multiple dims, though it
|---|
| 3774 | rarely needs to.
|---|
| 3775 | |
|---|
| 3776 | Usage: agammln(xx) |
|---|
| 3777 | """ |
|---|
| 3778 | coeff = [76.18009173, -86.50532033, 24.01409822, -1.231739516, |
|---|
| 3779 | 0.120858003e-2, -0.536382e-5] |
|---|
| 3780 | x = xx - 1.0 |
|---|
| 3781 | tmp = x + 5.5 |
|---|
| 3782 | tmp = tmp - (x+0.5)*N.log(tmp) |
|---|
| 3783 | ser = 1.0 |
|---|
| 3784 | for j in range(len(coeff)): |
|---|
| 3785 | x = x + 1 |
|---|
| 3786 | ser = ser + coeff[j]/x |
|---|
| 3787 | return -tmp + N.log(2.50662827465*ser) |
|---|
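| | # Illustrative example: Gamma(4) = 3! = 6, so
|---|
| | #     agammln(4.0)                          # --> about 1.7918 (= log(6))
|---|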
| 3788 | |
|---|
| 3789 | |
|---|
| 3790 | def abetai(a,b,x,verbose=1): |
|---|
| 3791 | """ |
|---|
| 3792 | Returns the incomplete beta function: |
|---|
| 3793 | |
|---|
| 3794 | I-sub-x(a,b) = 1/B(a,b)*(Integral(0,x) of t^(a-1)(1-t)^(b-1) dt) |
|---|
| 3795 | |
|---|
| 3796 | where a,b>0 and B(a,b) = G(a)*G(b)/(G(a+b)) where G(a) is the gamma |
|---|
| 3797 | function of a. The continued fraction formulation is implemented |
|---|
| 3798 | here, using the betacf function. (Adapted from: Numerical Recipes in
|---|
| 3799 | C.) Can handle multiple dimensions. |
|---|
| 3800 | |
|---|
| 3801 | Usage: abetai(a,b,x,verbose=1) |
|---|
| 3802 | """ |
|---|
| 3803 | TINY = 1e-15 |
|---|
| 3804 | if type(a) == N.ArrayType: |
|---|
| 3805 | if asum(N.less(x,0)+N.greater(x,1)) <> 0: |
|---|
| 3806 | raise ValueError, 'Bad x in abetai' |
|---|
| 3807 | x = N.where(N.equal(x,0),TINY,x) |
|---|
| 3808 | x = N.where(N.equal(x,1.0),1-TINY,x) |
|---|
| 3809 | |
|---|
| 3810 | bt = N.where(N.equal(x,0)+N.equal(x,1), 0, -1) |
|---|
| 3811 | exponents = ( gammln(a+b)-gammln(a)-gammln(b)+a*N.log(x)+b* |
|---|
| 3812 | N.log(1.0-x) ) |
|---|
| 3813 | # exp() of anything much below -745 underflows to 0, so clip the exponents at -740
|---|
| 3814 | exponents = N.where(N.less(exponents,-740),-740,exponents) |
|---|
| 3815 | bt = N.exp(exponents) |
|---|
| 3816 | if type(x) == N.ArrayType: |
|---|
| 3817 | ans = N.where(N.less(x,(a+1)/(a+b+2.0)), |
|---|
| 3818 | bt*abetacf(a,b,x,verbose)/float(a), |
|---|
| 3819 | 1.0-bt*abetacf(b,a,1.0-x,verbose)/float(b)) |
|---|
| 3820 | else: |
|---|
| 3821 | if x<(a+1)/(a+b+2.0): |
|---|
| 3822 | ans = bt*abetacf(a,b,x,verbose)/float(a) |
|---|
| 3823 | else: |
|---|
| 3824 | ans = 1.0-bt*abetacf(b,a,1.0-x,verbose)/float(b) |
|---|
| 3825 | return ans |
|---|
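| | # Illustrative example: with a=b=1 the incomplete beta reduces to x itself,
|---|
| | #     abetai(1, 1, 0.3)                     # --> about 0.3
|---|
| | # and abetai(0.5*df, 0.5, df/(df+t*t)) is the form used for t-test p-values
|---|
| | # elsewhere in this module.
|---|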
| 3826 | |
|---|
| 3827 | |
|---|
| 3828 | ##################################### |
|---|
| 3829 | ####### AANOVA CALCULATIONS ####### |
|---|
| 3830 | ##################################### |
|---|
| 3831 | |
|---|
| 3832 | import LinearAlgebra, operator |
|---|
| 3833 | LA = LinearAlgebra |
|---|
| 3834 | |
|---|
| 3835 | def aglm(data,para): |
|---|
| 3836 | """ |
|---|
| 3837 | Calculates a linear model fit ... anova/ancova/lin-regress/t-test/etc. Taken |
|---|
| 3838 | from: |
|---|
| 3839 | Peterson et al. Statistical limitations in functional neuroimaging |
|---|
| 3840 | I. Non-inferential methods and statistical models. Phil Trans Royal Soc |
|---|
| 3841 | Lond B 354: 1239-1260. |
|---|
| 3842 | |
|---|
| 3843 | Usage: aglm(data,para) |
|---|
| 3844 | Returns: t-statistic and p-value (currently computed only for the two-group case)
|---|
| 3845 | """ |
|---|
| 3846 | if len(para) <> len(data): |
|---|
| 3847 | print "data and para must be same length in aglm" |
|---|
| 3848 | return |
|---|
| 3849 | n = len(para) |
|---|
| 3850 | p = pstat.aunique(para) |
|---|
| 3851 | x = N.zeros((n,len(p))) # design matrix |
|---|
| 3852 | for l in range(len(p)): |
|---|
| 3853 | x[:,l] = N.equal(para,p[l]) |
|---|
| 3854 | b = N.dot(N.dot(LA.inverse(N.dot(N.transpose(x),x)), # i.e., b=inv(X'X)X'Y |
|---|
| 3855 | N.transpose(x)), |
|---|
| 3856 | data) |
|---|
| 3857 | diffs = (data - N.dot(x,b)) |
|---|
| 3858 | s_sq = 1./(n-len(p)) * N.dot(N.transpose(diffs), diffs) |
|---|
| 3859 | |
|---|
| 3860 | if len(p) == 2: # ttest_ind |
|---|
| 3861 | c = N.array([1,-1]) |
|---|
| 3862 | df = n-2 |
|---|
| 3863 | fact = asum(1.0/asum(x,0)) # i.e., 1/n1 + 1/n2 + 1/n3 ... |
|---|
| 3864 | t = N.dot(c,b) / N.sqrt(s_sq*fact) |
|---|
| 3865 | probs = abetai(0.5*df,0.5,float(df)/(df+t*t)) |
|---|
| 3866 | return t, probs |
|---|
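| | # Illustrative sketch (made-up data; with 2 groups this reduces to an
|---|
| | # independent-samples t-test):
|---|
| | #     data = N.array([4.,5.,6.,5., 8.,9.,7.,8.])
|---|
| | #     para = N.array([1,1,1,1, 2,2,2,2])    # group membership codes
|---|
| | #     t, p = aglm(data, para)
|---|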
| 3867 | |
|---|
| 3868 | |
|---|
| 3869 | def aF_oneway(*args): |
|---|
| 3870 | """ |
|---|
| 3871 | Performs a 1-way ANOVA, returning an F-value and probability given |
|---|
| 3872 | any number of groups. From Heiman, pp.394-7. |
|---|
| 3873 | |
|---|
| 3874 | Usage: aF_oneway (*args) where *args is 2 or more arrays, one per |
|---|
| 3875 | treatment group |
|---|
| 3876 | Returns: f-value, probability |
|---|
| 3877 | """ |
|---|
| 3878 | na = len(args) # ANOVA on 'na' groups, each in its own array
|---|
| 3879 | means = [0]*na |
|---|
| 3880 | vars = [0]*na |
|---|
| 3881 | ns = [0]*na |
|---|
| 3882 | alldata = [] |
|---|
| 3883 | tmp = map(N.array,args) |
|---|
| 3884 | means = map(amean,tmp) |
|---|
| 3885 | vars = map(avar,tmp) |
|---|
| 3886 | ns = map(len,args) |
|---|
| 3887 | alldata = N.concatenate(args) |
|---|
| 3888 | bign = len(alldata) |
|---|
| 3889 | sstot = ass(alldata)-(asquare_of_sums(alldata)/float(bign)) |
|---|
| 3890 | ssbn = 0 |
|---|
| 3891 | for a in args: |
|---|
| 3892 | ssbn = ssbn + asquare_of_sums(N.array(a))/float(len(a)) |
|---|
| 3893 | ssbn = ssbn - (asquare_of_sums(alldata)/float(bign)) |
|---|
| 3894 | sswn = sstot-ssbn |
|---|
| 3895 | dfbn = na-1 |
|---|
| 3896 | dfwn = bign - na |
|---|
| 3897 | msb = ssbn/float(dfbn) |
|---|
| 3898 | msw = sswn/float(dfwn) |
|---|
| 3899 | f = msb/msw |
|---|
| 3900 | prob = fprob(dfbn,dfwn,f) |
|---|
| 3901 | return f, prob |
|---|
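| | # Illustrative sketch (made-up groups):
|---|
| | #     g1 = N.array([3.,4.,5.])
|---|
| | #     g2 = N.array([5.,6.,7.])
|---|
| | #     g3 = N.array([8.,9.,10.])
|---|
| | #     f, p = aF_oneway(g1, g2, g3)
|---|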
| 3902 | |
|---|
| 3903 | |
|---|
| 3904 | def aF_value (ER,EF,dfR,dfF): |
|---|
| 3905 | """ |
|---|
| 3906 | Returns an F-statistic given the following: |
|---|
| 3907 | ER = error associated with the null hypothesis (the Restricted model) |
|---|
| 3908 | EF = error associated with the alternate hypothesis (the Full model) |
|---|
| 3909 | dfR = degrees of freedom associated with the Restricted model
|---|
| 3910 | dfF = degrees of freedom associated with the Full model
|---|
| |
|---|
| | Usage:   aF_value(ER,EF,dfR,dfF)
|---|
| 3911 | """ |
|---|
| 3912 | return ((ER-EF)/float(dfR-dfF) / (EF/float(dfF))) |
|---|
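| | # Illustrative example: ER=100 with dfR=20 and EF=80 with dfF=18 give
|---|
| | #     aF_value(100., 80., 20, 18)           # --> ((100-80)/2) / (80/18) = 2.25
|---|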
| 3913 | |
|---|
| 3914 | |
|---|
| 3915 | def outputfstats(Enum, Eden, dfnum, dfden, f, prob): |
|---|
| 3916 | Enum = round(Enum,3) |
|---|
| 3917 | Eden = round(Eden,3) |
|---|
| 3918 | dfnum = round(dfnum,3)
|---|
| 3919 | dfden = round(dfden,3) |
|---|
| 3920 | f = round(f,3) |
|---|
| 3921 | prob = round(prob,3) |
|---|
| 3922 | suffix = '' # for *s after the p-value |
|---|
| 3923 | if prob < 0.001: suffix = ' ***' |
|---|
| 3924 | elif prob < 0.01: suffix = ' **' |
|---|
| 3925 | elif prob < 0.05: suffix = ' *' |
|---|
| 3926 | title = [['EF/ER','DF','Mean Square','F-value','prob','']] |
|---|
| 3927 | lofl = title+[[Enum, dfnum, round(Enum/float(dfnum),3), f, prob, suffix], |
|---|
| 3928 | [Eden, dfden, round(Eden/float(dfden),3),'','','']] |
|---|
| 3929 | pstat.printcc(lofl) |
|---|
| 3930 | return |
|---|
| 3931 | |
|---|
| 3932 | |
|---|
| 3933 | def F_value_multivariate(ER, EF, dfnum, dfden): |
|---|
| 3934 | """ |
|---|
| 3935 | Returns an F-statistic given the following: |
|---|
| 3936 | ER = error associated with the null hypothesis (the Restricted model) |
|---|
| 3937 | EF = error associated with the alternate hypothesis (the Full model) |
|---|
| 3938 | dfnum = numerator degrees of freedom (dfR-dfF)
|---|
| 3939 | dfden = denominator degrees of freedom (dfF)
|---|
| 3940 | where ER and EF are matrices from a multivariate F calculation. |
|---|
| 3941 | """ |
|---|
| 3942 | if type(ER) in [IntType, FloatType]: |
|---|
| 3943 | ER = N.array([[ER]]) |
|---|
| 3944 | if type(EF) in [IntType, FloatType]: |
|---|
| 3945 | EF = N.array([[EF]]) |
|---|
| 3946 | n_um = (LA.determinant(ER) - LA.determinant(EF)) / float(dfnum) |
|---|
| 3947 | d_en = LA.determinant(EF) / float(dfden) |
|---|
| 3948 | return n_um / d_en |
|---|
| 3949 | |
|---|
| 3950 | |
|---|
| 3951 | ##################################### |
|---|
| 3952 | ####### ASUPPORT FUNCTIONS ######## |
|---|
| 3953 | ##################################### |
|---|
| 3954 | |
|---|
| 3955 | def asign(a): |
|---|
| 3956 | """ |
|---|
| 3957 | Usage: asign(a) |
|---|
| 3958 | Returns: array shape of a, with -1 where a<0, 0 where a==0, and +1 where a>0
|---|
| 3959 | """ |
|---|
| 3960 | a = N.asarray(a) |
|---|
| 3961 | if ((type(a) == type(1.4)) or (type(a) == type(1))): |
|---|
| 3962 | return a-a-N.less(a,0)+N.greater(a,0) |
|---|
| 3963 | else: |
|---|
| 3964 | return N.zeros(N.shape(a))-N.less(a,0)+N.greater(a,0) |
|---|
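| | # Illustrative example:
|---|
| | #     asign(N.array([-3., 0., 2.]))         # --> [-1, 0, 1]
|---|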
| 3965 | |
|---|
| 3966 | |
|---|
| 3967 | def asum (a, dimension=None,keepdims=0): |
|---|
| 3968 | """ |
|---|
| 3969 | An alternative to the Numeric.add.reduce function, which allows one to |
|---|
| 3970 | (1) collapse over multiple dimensions at once, and/or (2) to retain |
|---|
| 3971 | all dimensions in the original array (squashing the collapsed ones down to size 1).
|---|
| 3972 | Dimension can equal None (ravel array first), an integer (the |
|---|
| 3973 | dimension over which to operate), or a sequence (operate over multiple |
|---|
| 3974 | dimensions). If keepdims=1, the resulting array will have as many |
|---|
| 3975 | dimensions as the input array. |
|---|
| 3976 | |
|---|
| 3977 | Usage: asum(a, dimension=None, keepdims=0) |
|---|
| 3978 | Returns: array summed along 'dimension'(s), same _number_ of dims if keepdims=1 |
|---|
| 3979 | """ |
|---|
| 3980 | if type(a) == N.ArrayType and a.typecode() in ['l','s','b']: |
|---|
| 3981 | a = a.astype(N.Float) |
|---|
| 3982 | if dimension == None: |
|---|
| 3983 | s = N.sum(N.ravel(a)) |
|---|
| 3984 | elif type(dimension) in [IntType,FloatType]: |
|---|
| 3985 | s = N.add.reduce(a, dimension) |
|---|
| 3986 | if keepdims == 1: |
|---|
| 3987 | shp = list(a.shape) |
|---|
| 3988 | shp[dimension] = 1 |
|---|
| 3989 | s = N.reshape(s,shp) |
|---|
| 3990 | else: # must be a SEQUENCE of dims to sum over |
|---|
| 3991 | dims = list(dimension) |
|---|
| 3992 | dims.sort() |
|---|
| 3993 | dims.reverse() |
|---|
| 3994 | s = a *1.0 |
|---|
| 3995 | for dim in dims: |
|---|
| 3996 | s = N.add.reduce(s,dim) |
|---|
| 3997 | if keepdims == 1: |
|---|
| 3998 | shp = list(a.shape) |
|---|
| 3999 | for dim in dims: |
|---|
| 4000 | shp[dim] = 1 |
|---|
| 4001 | s = N.reshape(s,shp) |
|---|
| 4002 | return s |
|---|
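| | # Illustrative example (integer input is up-cast to floats):
|---|
| | #     a = N.array([[1,2],[3,4]])
|---|
| | #     asum(a)                    # --> 10.0        (all elements)
|---|
| | #     asum(a, 0)                 # --> [ 4., 6.]   (collapse dimension 0)
|---|
| | #     asum(a, 0, keepdims=1)     # --> [[ 4., 6.]] (same, but still 2-D)
|---|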
| 4003 | |
|---|
| 4004 | |
|---|
| 4005 | def acumsum (a,dimension=None): |
|---|
| 4006 | """ |
|---|
| 4007 | Returns an array consisting of the cumulative sum of the items in the |
|---|
| 4008 | passed array. Dimension can equal None (ravel array first), an |
|---|
| 4009 | integer (the dimension over which to operate), or a sequence (operate |
|---|
| 4010 | over multiple dimensions, but this last one just barely makes sense). |
|---|
| 4011 | |
|---|
| 4012 | Usage: acumsum(a,dimension=None) |
|---|
| 4013 | """ |
|---|
| 4014 | if dimension == None: |
|---|
| 4015 | a = N.ravel(a) |
|---|
| 4016 | dimension = 0 |
|---|
| 4017 | if type(dimension) in [ListType, TupleType, N.ArrayType]: |
|---|
| 4018 | dimension = list(dimension) |
|---|
| 4019 | dimension.sort() |
|---|
| 4020 | dimension.reverse() |
|---|
| 4021 | for d in dimension: |
|---|
| 4022 | a = N.add.accumulate(a,d) |
|---|
| 4023 | return a |
|---|
| 4024 | else: |
|---|
| 4025 | return N.add.accumulate(a,dimension) |
|---|
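| | # Illustrative example:
|---|
| | #     acumsum(N.array([1., 2., 3.]))        # --> [1., 3., 6.]
|---|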
| 4026 | |
|---|
| 4027 | |
|---|
| 4028 | def ass(inarray, dimension=None, keepdims=0): |
|---|
| 4029 | """ |
|---|
| 4030 | Squares each value in the passed array, adds these squares & returns |
|---|
| 4031 | the result. Unfortunate function name. :-) Defaults to ALL values in |
|---|
| 4032 | the array. Dimension can equal None (ravel array first), an integer |
|---|
| 4033 | (the dimension over which to operate), or a sequence (operate over |
|---|
| 4034 | multiple dimensions). Set keepdims=1 to maintain the original number |
|---|
| 4035 | of dimensions. |
|---|
| 4036 | |
|---|
| 4037 | Usage: ass(inarray, dimension=None, keepdims=0) |
|---|
| 4038 | Returns: sum-along-'dimension' for (inarray*inarray) |
|---|
| 4039 | """ |
|---|
| 4040 | if dimension == None: |
|---|
| 4041 | inarray = N.ravel(inarray) |
|---|
| 4042 | dimension = 0 |
|---|
| 4043 | return asum(inarray*inarray,dimension,keepdims) |
|---|
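| | # Illustrative example:
|---|
| | #     ass(N.array([1., 2., 3.]))            # --> 14.0  (1+4+9)
|---|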
| 4044 | |
|---|
| 4045 | |
|---|
| 4046 | def asummult (array1,array2,dimension=None,keepdims=0): |
|---|
| 4047 | """ |
|---|
| 4048 | Multiplies elements in array1 and array2, element by element, and |
|---|
| 4049 | returns the sum (along 'dimension') of all resulting multiplications. |
|---|
| 4050 | Dimension can equal None (ravel array first), an integer (the |
|---|
| 4051 | dimension over which to operate), or a sequence (operate over multiple |
|---|
| 4052 | dimensions). A trivial function, but included for completeness. |
|---|
| 4053 | |
|---|
| 4054 | Usage: asummult(array1,array2,dimension=None,keepdims=0) |
|---|
| 4055 | """ |
|---|
| 4056 | if dimension == None: |
|---|
| 4057 | array1 = N.ravel(array1) |
|---|
| 4058 | array2 = N.ravel(array2) |
|---|
| 4059 | dimension = 0 |
|---|
| 4060 | return asum(array1*array2,dimension,keepdims) |
|---|
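| | # Illustrative example:
|---|
| | #     asummult(N.array([1.,2.,3.]), N.array([4.,5.,6.]))   # --> 32.0
|---|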
| 4061 | |
|---|
| 4062 | |
|---|
| 4063 | def asquare_of_sums(inarray, dimension=None, keepdims=0): |
|---|
| 4064 | """ |
|---|
| 4065 | Adds the values in the passed array, squares that sum, and returns the |
|---|
| 4066 | result. Dimension can equal None (ravel array first), an integer (the |
|---|
| 4067 | dimension over which to operate), or a sequence (operate over multiple |
|---|
| 4068 | dimensions). If keepdims=1, the returned array will have the same |
|---|
| 4069 | NUMBER of dimensions as the original. |
|---|
| 4070 | |
|---|
| 4071 | Usage: asquare_of_sums(inarray, dimension=None, keepdims=0) |
|---|
| 4072 | Returns: the square of the sum over dim(s) in dimension |
|---|
| 4073 | """ |
|---|
| 4074 | if dimension == None: |
|---|
| 4075 | inarray = N.ravel(inarray) |
|---|
| 4076 | dimension = 0 |
|---|
| 4077 | s = asum(inarray,dimension,keepdims) |
|---|
| 4078 | if type(s) == N.ArrayType: |
|---|
| 4079 | return s.astype(N.Float)*s |
|---|
| 4080 | else: |
|---|
| 4081 | return float(s)*s |
|---|
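| | # Illustrative example:
|---|
| | #     asquare_of_sums(N.array([1., 2., 3.]))   # --> 36.0, i.e. (1+2+3)**2
|---|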
| 4082 | |
|---|
| 4083 | |
|---|
| 4084 | def asumdiffsquared(a,b, dimension=None, keepdims=0): |
|---|
| 4085 | """ |
|---|
| 4086 | Takes pairwise differences of the values in arrays a and b, squares |
|---|
| 4087 | these differences, and returns the sum of these squares. Dimension |
|---|
| 4088 | can equal None (ravel array first), an integer (the dimension over |
|---|
| 4089 | which to operate), or a sequence (operate over multiple dimensions). |
|---|
| 4090 | If keepdims=1, the returned array has the same number of dims as a and b.
|---|
| 4091 |
|---|
| 4092 | Usage: asumdiffsquared(a,b,dimension=None,keepdims=0)
|---|
| 4093 | Returns: sum[ravel(a-b)**2] |
|---|
| 4094 | """ |
|---|
| 4095 | if dimension == None: |
|---|
| 4096 | a = N.ravel(a)
|---|
| | b = N.ravel(b)
|---|
| 4097 | dimension = 0 |
|---|
| 4098 | return asum((a-b)**2,dimension,keepdims) |
|---|
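| | # Illustrative example:
|---|
| | #     asumdiffsquared(N.array([1.,2.,3.]), N.array([1.,1.,1.]))   # --> 5.0 (0+1+4)
|---|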
| 4099 | |
|---|
| 4100 | |
|---|
| 4101 | def ashellsort(inarray): |
|---|
| 4102 | """ |
|---|
| 4103 | Shellsort algorithm. Sorts a 1D-array. |
|---|
| 4104 | |
|---|
| 4105 | Usage: ashellsort(inarray) |
|---|
| 4106 | Returns: sorted-inarray, sorting-index-vector (for original array) |
|---|
| 4107 | """ |
|---|
| 4108 | n = len(inarray) |
|---|
| 4109 | svec = inarray *1.0 |
|---|
| 4110 | ivec = range(n) |
|---|
| 4111 | gap = n/2 # integer division needed |
|---|
| 4112 | while gap >0: |
|---|
| 4113 | for i in range(gap,n): |
|---|
| 4114 | for j in range(i-gap,-1,-gap): |
|---|
| 4115 | while j>=0 and svec[j]>svec[j+gap]: |
|---|
| 4116 | temp = svec[j] |
|---|
| 4117 | svec[j] = svec[j+gap] |
|---|
| 4118 | svec[j+gap] = temp |
|---|
| 4119 | itemp = ivec[j] |
|---|
| 4120 | ivec[j] = ivec[j+gap] |
|---|
| 4121 | ivec[j+gap] = itemp |
|---|
| 4122 | gap = gap / 2 # integer division needed |
|---|
| 4123 | # svec is now sorted input vector, ivec has the order svec[i] = vec[ivec[i]] |
|---|
| 4124 | return svec, ivec |
|---|
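| | # Illustrative example:
|---|
| | #     ashellsort(N.array([3., 1., 2.]))     # --> ([1., 2., 3.], [1, 2, 0])
|---|
| | #     # i.e., sorted values plus the index vector with svec[i] == inarray[ivec[i]]
|---|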
| 4125 | |
|---|
| 4126 | |
|---|
| 4127 | def arankdata(inarray): |
|---|
| 4128 | """ |
|---|
| 4129 | Ranks the data in inarray, dealing with ties appropriately. Assumes
|---|
| 4130 | a 1D inarray. Adapted from Gary Perlman's |Stat ranksort. |
|---|
| 4131 | |
|---|
| 4132 | Usage: arankdata(inarray) |
|---|
| 4133 | Returns: array of length equal to inarray, containing rank scores |
|---|
| 4134 | """ |
|---|
| 4135 | n = len(inarray) |
|---|
| 4136 | svec, ivec = ashellsort(inarray) |
|---|
| 4137 | sumranks = 0 |
|---|
| 4138 | dupcount = 0 |
|---|
| 4139 | newarray = N.zeros(n,N.Float) |
|---|
| 4140 | for i in range(n): |
|---|
| 4141 | sumranks = sumranks + i |
|---|
| 4142 | dupcount = dupcount + 1 |
|---|
| 4143 | if i==n-1 or svec[i] <> svec[i+1]: |
|---|
| 4144 | averank = sumranks / float(dupcount) + 1 |
|---|
| 4145 | for j in range(i-dupcount+1,i+1): |
|---|
| 4146 | newarray[ivec[j]] = averank |
|---|
| 4147 | sumranks = 0 |
|---|
| 4148 | dupcount = 0 |
|---|
| 4149 | return newarray |
|---|
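| | # Illustrative example (tied values share the average of their ranks):
|---|
| | #     arankdata(N.array([40., 10., 30., 10.]))   # --> [ 4., 1.5, 3., 1.5]
|---|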
| 4150 | |
|---|
| 4151 | |
|---|
| 4152 | def afindwithin(data): |
|---|
| 4153 | """ |
|---|
| 4154 | Returns a binary vector, 1=within-subject factor, 0=between. Input |
|---|
| 4155 | equals the entire data array (i.e., column 1=random factor, last |
|---|
| 4156 | column = measured values).
|---|
| 4157 | |
|---|
| 4158 | Usage: afindwithin(data) data in |Stat format |
|---|
| 4159 | """ |
|---|
| 4160 | numfact = len(data[0])-2 |
|---|
| 4161 | withinvec = [0]*numfact |
|---|
| 4162 | for col in range(1,numfact+1): |
|---|
| 4163 | rows = pstat.linexand(data,col,pstat.unique(pstat.colex(data,1))[0]) # get 1 level of this factor |
|---|
| 4164 | if len(pstat.unique(pstat.colex(rows,0))) < len(rows): # if fewer subjects than scores on this factor |
|---|
| 4165 | withinvec[col-1] = 1 |
|---|
| 4166 | return withinvec |
|---|
| 4167 | |
|---|
| 4168 | |
|---|
| 4169 | ######################################################### |
|---|
| 4170 | ######################################################### |
|---|
| 4171 | ###### RE-DEFINE DISPATCHES TO INCLUDE ARRAYS ######### |
|---|
| 4172 | ######################################################### |
|---|
| 4173 | ######################################################### |
|---|
| 4174 | |
|---|
| 4175 | ## CENTRAL TENDENCY: |
|---|
| 4176 | geometricmean = Dispatch ( (lgeometricmean, (ListType, TupleType)), |
|---|
| 4177 | (ageometricmean, (N.ArrayType,)) ) |
|---|
| 4178 | harmonicmean = Dispatch ( (lharmonicmean, (ListType, TupleType)), |
|---|
| 4179 | (aharmonicmean, (N.ArrayType,)) ) |
|---|
| 4180 | mean = Dispatch ( (lmean, (ListType, TupleType)), |
|---|
| 4181 | (amean, (N.ArrayType,)) ) |
|---|
| 4182 | median = Dispatch ( (lmedian, (ListType, TupleType)), |
|---|
| 4183 | (amedian, (N.ArrayType,)) ) |
|---|
| 4184 | medianscore = Dispatch ( (lmedianscore, (ListType, TupleType)), |
|---|
| 4185 | (amedianscore, (N.ArrayType,)) ) |
|---|
| 4186 | mode = Dispatch ( (lmode, (ListType, TupleType)), |
|---|
| 4187 | (amode, (N.ArrayType,)) ) |
|---|
| 4188 | tmean = Dispatch ( (atmean, (N.ArrayType,)) ) |
|---|
| 4189 | tvar = Dispatch ( (atvar, (N.ArrayType,)) ) |
|---|
| 4190 | tstdev = Dispatch ( (atstdev, (N.ArrayType,)) ) |
|---|
| 4191 | tsem = Dispatch ( (atsem, (N.ArrayType,)) ) |
|---|
| 4192 | |
|---|
| 4193 | ## VARIATION: |
|---|
| 4194 | moment = Dispatch ( (lmoment, (ListType, TupleType)), |
|---|
| 4195 | (amoment, (N.ArrayType,)) ) |
|---|
| 4196 | variation = Dispatch ( (lvariation, (ListType, TupleType)), |
|---|
| 4197 | (avariation, (N.ArrayType,)) ) |
|---|
| 4198 | skew = Dispatch ( (lskew, (ListType, TupleType)), |
|---|
| 4199 | (askew, (N.ArrayType,)) ) |
|---|
| 4200 | kurtosis = Dispatch ( (lkurtosis, (ListType, TupleType)), |
|---|
| 4201 | (akurtosis, (N.ArrayType,)) ) |
|---|
| 4202 | describe = Dispatch ( (ldescribe, (ListType, TupleType)), |
|---|
| 4203 | (adescribe, (N.ArrayType,)) ) |
|---|
| 4204 | |
|---|
| 4205 | ## DISTRIBUTION TESTS |
|---|
| 4206 | |
|---|
| 4207 | skewtest = Dispatch ( (askewtest, (ListType, TupleType)), |
|---|
| 4208 | (askewtest, (N.ArrayType,)) ) |
|---|
| 4209 | kurtosistest = Dispatch ( (akurtosistest, (ListType, TupleType)), |
|---|
| 4210 | (akurtosistest, (N.ArrayType,)) ) |
|---|
| 4211 | normaltest = Dispatch ( (anormaltest, (ListType, TupleType)), |
|---|
| 4212 | (anormaltest, (N.ArrayType,)) ) |
|---|
| 4213 | |
|---|
| 4214 | ## FREQUENCY STATS: |
|---|
| 4215 | itemfreq = Dispatch ( (litemfreq, (ListType, TupleType)), |
|---|
| 4216 | (aitemfreq, (N.ArrayType,)) ) |
|---|
| 4217 | scoreatpercentile = Dispatch ( (lscoreatpercentile, (ListType, TupleType)), |
|---|
| 4218 | (ascoreatpercentile, (N.ArrayType,)) ) |
|---|
| 4219 | percentileofscore = Dispatch ( (lpercentileofscore, (ListType, TupleType)), |
|---|
| 4220 | (apercentileofscore, (N.ArrayType,)) ) |
|---|
| 4221 | histogram = Dispatch ( (lhistogram, (ListType, TupleType)), |
|---|
| 4222 | (ahistogram, (N.ArrayType,)) ) |
|---|
| 4223 | cumfreq = Dispatch ( (lcumfreq, (ListType, TupleType)), |
|---|
| 4224 | (acumfreq, (N.ArrayType,)) ) |
|---|
| 4225 | relfreq = Dispatch ( (lrelfreq, (ListType, TupleType)), |
|---|
| 4226 | (arelfreq, (N.ArrayType,)) ) |
|---|
| 4227 | |
|---|
| 4228 | ## VARIABILITY: |
|---|
| 4229 | obrientransform = Dispatch ( (lobrientransform, (ListType, TupleType)), |
|---|
| 4230 | (aobrientransform, (N.ArrayType,)) ) |
|---|
| 4231 | samplevar = Dispatch ( (lsamplevar, (ListType, TupleType)), |
|---|
| 4232 | (asamplevar, (N.ArrayType,)) ) |
|---|
| 4233 | samplestdev = Dispatch ( (lsamplestdev, (ListType, TupleType)), |
|---|
| 4234 | (asamplestdev, (N.ArrayType,)) ) |
|---|
| 4235 | signaltonoise = Dispatch( (asignaltonoise, (N.ArrayType,)),) |
|---|
| 4236 | var = Dispatch ( (lvar, (ListType, TupleType)), |
|---|
| 4237 | (avar, (N.ArrayType,)) ) |
|---|
| 4238 | stdev = Dispatch ( (lstdev, (ListType, TupleType)), |
|---|
| 4239 | (astdev, (N.ArrayType,)) ) |
|---|
| 4240 | sterr = Dispatch ( (lsterr, (ListType, TupleType)), |
|---|
| 4241 | (asterr, (N.ArrayType,)) ) |
|---|
| 4242 | sem = Dispatch ( (lsem, (ListType, TupleType)), |
|---|
| 4243 | (asem, (N.ArrayType,)) ) |
|---|
| 4244 | z = Dispatch ( (lz, (ListType, TupleType)), |
|---|
| 4245 | (az, (N.ArrayType,)) ) |
|---|
| 4246 | zs = Dispatch ( (lzs, (ListType, TupleType)), |
|---|
| 4247 | (azs, (N.ArrayType,)) ) |
|---|
| 4248 | |
|---|
| 4249 | ## TRIMMING FCNS: |
|---|
| 4250 | threshold = Dispatch( (athreshold, (N.ArrayType,)),) |
|---|
| 4251 | trimboth = Dispatch ( (ltrimboth, (ListType, TupleType)), |
|---|
| 4252 | (atrimboth, (N.ArrayType,)) ) |
|---|
| 4253 | trim1 = Dispatch ( (ltrim1, (ListType, TupleType)), |
|---|
| 4254 | (atrim1, (N.ArrayType,)) ) |
|---|
| 4255 | |
|---|
| 4256 | ## CORRELATION FCNS: |
|---|
| 4257 | paired = Dispatch ( (lpaired, (ListType, TupleType)), |
|---|
| 4258 | (apaired, (N.ArrayType,)) ) |
|---|
| 4259 | pearsonr = Dispatch ( (lpearsonr, (ListType, TupleType)), |
|---|
| 4260 | (apearsonr, (N.ArrayType,)) ) |
|---|
| 4261 | spearmanr = Dispatch ( (lspearmanr, (ListType, TupleType)), |
|---|
| 4262 | (aspearmanr, (N.ArrayType,)) ) |
|---|
| 4263 | pointbiserialr = Dispatch ( (lpointbiserialr, (ListType, TupleType)), |
|---|
| 4264 | (apointbiserialr, (N.ArrayType,)) ) |
|---|
| 4265 | kendalltau = Dispatch ( (lkendalltau, (ListType, TupleType)), |
|---|
| 4266 | (akendalltau, (N.ArrayType,)) ) |
|---|
| 4267 | linregress = Dispatch ( (llinregress, (ListType, TupleType)), |
|---|
| 4268 | (alinregress, (N.ArrayType,)) ) |
|---|
| 4269 | |
|---|
| 4270 | ## INFERENTIAL STATS: |
|---|
| 4271 | ttest_1samp = Dispatch ( (lttest_1samp, (ListType, TupleType)), |
|---|
| 4272 | (attest_1samp, (N.ArrayType,)) ) |
|---|
| 4273 | ttest_ind = Dispatch ( (lttest_ind, (ListType, TupleType)), |
|---|
| 4274 | (attest_ind, (N.ArrayType,)) ) |
|---|
| 4275 | ttest_rel = Dispatch ( (lttest_rel, (ListType, TupleType)), |
|---|
| 4276 | (attest_rel, (N.ArrayType,)) ) |
|---|
| 4277 | chisquare = Dispatch ( (lchisquare, (ListType, TupleType)), |
|---|
| 4278 | (achisquare, (N.ArrayType,)) ) |
|---|
| 4279 | ks_2samp = Dispatch ( (lks_2samp, (ListType, TupleType)), |
|---|
| 4280 | (aks_2samp, (N.ArrayType,)) ) |
|---|
| 4281 | mannwhitneyu = Dispatch ( (lmannwhitneyu, (ListType, TupleType)), |
|---|
| 4282 | (amannwhitneyu, (N.ArrayType,)) ) |
|---|
| 4283 | tiecorrect = Dispatch ( (ltiecorrect, (ListType, TupleType)), |
|---|
| 4284 | (atiecorrect, (N.ArrayType,)) ) |
|---|
| 4285 | ranksums = Dispatch ( (lranksums, (ListType, TupleType)), |
|---|
| 4286 | (aranksums, (N.ArrayType,)) ) |
|---|
| 4287 | wilcoxont = Dispatch ( (lwilcoxont, (ListType, TupleType)), |
|---|
| 4288 | (awilcoxont, (N.ArrayType,)) ) |
|---|
| 4289 | kruskalwallish = Dispatch ( (lkruskalwallish, (ListType, TupleType)), |
|---|
| 4290 | (akruskalwallish, (N.ArrayType,)) ) |
|---|
| 4291 | friedmanchisquare = Dispatch ( (lfriedmanchisquare, (ListType, TupleType)), |
|---|
| 4292 | (afriedmanchisquare, (N.ArrayType,)) ) |
|---|
| 4293 | |
|---|
| 4294 | ## PROBABILITY CALCS: |
|---|
| 4295 | chisqprob = Dispatch ( (lchisqprob, (IntType, FloatType)), |
|---|
| 4296 | (achisqprob, (N.ArrayType,)) ) |
|---|
| 4297 | zprob = Dispatch ( (lzprob, (IntType, FloatType)), |
|---|
| 4298 | (azprob, (N.ArrayType,)) ) |
|---|
| 4299 | ksprob = Dispatch ( (lksprob, (IntType, FloatType)), |
|---|
| 4300 | (aksprob, (N.ArrayType,)) ) |
|---|
| 4301 | fprob = Dispatch ( (lfprob, (IntType, FloatType)), |
|---|
| 4302 | (afprob, (N.ArrayType,)) ) |
|---|
| 4303 | betacf = Dispatch ( (lbetacf, (IntType, FloatType)), |
|---|
| 4304 | (abetacf, (N.ArrayType,)) ) |
|---|
| 4305 | betai = Dispatch ( (lbetai, (IntType, FloatType)), |
|---|
| 4306 | (abetai, (N.ArrayType,)) ) |
|---|
| 4307 | erfcc = Dispatch ( (lerfcc, (IntType, FloatType)), |
|---|
| 4308 | (aerfcc, (N.ArrayType,)) ) |
|---|
| 4309 | gammln = Dispatch ( (lgammln, (IntType, FloatType)), |
|---|
| 4310 | (agammln, (N.ArrayType,)) ) |
|---|
| 4311 | |
|---|
| 4312 | ## ANOVA FUNCTIONS: |
|---|
| 4313 | F_oneway = Dispatch ( (lF_oneway, (ListType, TupleType)), |
|---|
| 4314 | (aF_oneway, (N.ArrayType,)) ) |
|---|
| 4315 | F_value = Dispatch ( (lF_value, (ListType, TupleType)), |
|---|
| 4316 | (aF_value, (N.ArrayType,)) ) |
|---|
| 4317 | |
|---|
| 4318 | ## SUPPORT FUNCTIONS: |
|---|
| 4319 | incr = Dispatch ( (lincr, (ListType, TupleType, N.ArrayType)), ) |
|---|
| 4320 | sum = Dispatch ( (lsum, (ListType, TupleType)), |
|---|
| 4321 | (asum, (N.ArrayType,)) ) |
|---|
| 4322 | cumsum = Dispatch ( (lcumsum, (ListType, TupleType)), |
|---|
| 4323 | (acumsum, (N.ArrayType,)) ) |
|---|
| 4324 | ss = Dispatch ( (lss, (ListType, TupleType)), |
|---|
| 4325 | (ass, (N.ArrayType,)) ) |
|---|
| 4326 | summult = Dispatch ( (lsummult, (ListType, TupleType)), |
|---|
| 4327 | (asummult, (N.ArrayType,)) ) |
|---|
| 4328 | square_of_sums = Dispatch ( (lsquare_of_sums, (ListType, TupleType)), |
|---|
| 4329 | (asquare_of_sums, (N.ArrayType,)) ) |
|---|
| 4330 | sumdiffsquared = Dispatch ( (lsumdiffsquared, (ListType, TupleType)), |
|---|
| 4331 | (asumdiffsquared, (N.ArrayType,)) ) |
|---|
| 4332 | shellsort = Dispatch ( (lshellsort, (ListType, TupleType)), |
|---|
| 4333 | (ashellsort, (N.ArrayType,)) ) |
|---|
| 4334 | rankdata = Dispatch ( (lrankdata, (ListType, TupleType)), |
|---|
| 4335 | (arankdata, (N.ArrayType,)) ) |
|---|
| 4336 | findwithin = Dispatch ( (lfindwithin, (ListType, TupleType)), |
|---|
| 4337 | (afindwithin, (N.ArrayType,)) ) |
|---|
| 4338 | |
|---|
| 4339 | ###################### END OF NUMERIC FUNCTION BLOCK ##################### |
|---|
| 4340 | |
|---|
| 4341 | ###################### END OF STATISTICAL FUNCTIONS ###################### |
|---|
| 4342 | |
|---|
| 4343 | except ImportError: |
|---|
| 4344 | pass |
|---|