# Copyright (c) 1999-2002 Gary Strangman; All Rights Reserved.
#
# This software is distributable under the terms of the GNU
# General Public License (GPL) v2, the text of which can be found at
# http://www.gnu.org/copyleft/gpl.html. Installing, importing or otherwise
# using this module constitutes acceptance of the terms of this License.
#
# Disclaimer
#
# This software is provided "as-is". There are no expressed or implied
# warranties of any kind, including, but not limited to, the warranties
# of merchantability and fitness for a given application. In no event
# shall Gary Strangman be liable for any direct, indirect, incidental,
# special, exemplary or consequential damages (including, but not limited
# to, loss of use, data or profits, or business interruption) however
# caused and on any theory of liability, whether in contract, strict
# liability or tort (including negligence or otherwise) arising in any way
# out of the use of this software, even if advised of the possibility of
# such damage.
#
# Comments and/or additions are welcome (send e-mail to:
# strang@nmr.mgh.harvard.edu).
#
"""
stats.py module

(Requires pstat.py module.)

#################################################
#######  Written by:  Gary Strangman  ###########
#######  Last modified:  May 10, 2002  ##########
#################################################

A collection of basic statistical functions for python. The function
names appear below.

IMPORTANT: There are really *3* sets of functions. The first set has an 'l'
prefix, which can be used with list or tuple arguments. The second set has
an 'a' prefix, which can accept NumPy array arguments. These latter
functions are defined only when NumPy is available on the system. The third
type has NO prefix (i.e., has the name that appears below). Functions of
this set are members of a "Dispatch" class, c/o David Ascher. This class
allows different functions to be called depending on the type of the passed
arguments. Thus, stats.mean is a member of the Dispatch class and
stats.mean(range(20)) will call stats.lmean(range(20)) while
stats.mean(Numeric.arange(20)) will call stats.amean(Numeric.arange(20)).
This is a handy way to keep consistent function names when different
argument types require different functions to be called. Having
implemented the Dispatch class, however, means that to get info on
a given function, you must use the REAL function name ... that is
"print stats.lmean.__doc__" or "print stats.amean.__doc__" work fine,
while "print stats.mean.__doc__" will print the doc for the Dispatch
class. NUMPY FUNCTIONS ('a' prefix) generally have more argument options
but should otherwise be consistent with the corresponding list functions.

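For example (illustrative only; assumes the module is imported as 'stats'
and, for the array case, that NumPy/Numeric is installed):

    >>> import stats
    >>> stats.mean([1, 2, 3, 4])          # list argument -> dispatches to lmean
    2.5
    >>> stats.mean(Numeric.arange(5))     # array argument -> dispatches to amean
    2.0
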
Disclaimers: The function list is obviously incomplete and, worse, the
functions are not optimized. All functions have been tested (some more
so than others), but they are far from bulletproof. Thus, as with any
free software, no warranty or guarantee is expressed or implied. :-) A
few extra functions that don't appear in the list below can be found by
interested treasure-hunters. These functions don't necessarily have
both list and array versions but were deemed useful.

CENTRAL TENDENCY:  geometricmean
                   harmonicmean
                   mean
                   median
                   medianscore
                   mode

MOMENTS:  moment
          variation
          skew
          kurtosis
          skewtest (for Numpy arrays only)
          kurtosistest (for Numpy arrays only)
          normaltest (for Numpy arrays only)

ALTERED VERSIONS:  tmean (for Numpy arrays only)
                   tvar (for Numpy arrays only)
                   tmin (for Numpy arrays only)
                   tmax (for Numpy arrays only)
                   tstdev (for Numpy arrays only)
                   tsem (for Numpy arrays only)
                   describe

FREQUENCY STATS:  itemfreq
                  scoreatpercentile
                  percentileofscore
                  histogram
                  cumfreq
                  relfreq

VARIABILITY:  obrientransform
              samplevar
              samplestdev
              signaltonoise (for Numpy arrays only)
              var
              stdev
              sterr
              sem
              z
              zs
              zmap (for Numpy arrays only)

TRIMMING FCNS:  threshold (for Numpy arrays only)
                trimboth
                trim1
                round (round all vals to 'n' decimals; Numpy only)

CORRELATION FCNS:  covariance (for Numpy arrays only)
                   correlation (for Numpy arrays only)
                   paired
                   pearsonr
                   spearmanr
                   pointbiserialr
                   kendalltau
                   linregress

INFERENTIAL STATS:  ttest_1samp
                    ttest_ind
                    ttest_rel
                    chisquare
                    ks_2samp
                    mannwhitneyu
                    ranksums
                    wilcoxont
                    kruskalwallish
                    friedmanchisquare

PROBABILITY CALCS:  chisqprob
                    erfcc
                    zprob
                    ksprob
                    fprob
                    betacf
                    gammln
                    betai

ANOVA FUNCTIONS:  F_oneway
                  F_value

SUPPORT FUNCTIONS:  writecc
                    incr
                    sign (for Numpy arrays only)
                    sum
                    cumsum
                    ss
                    summult
                    sumdiffsquared
                    square_of_sums
                    shellsort
                    rankdata
                    outputpairedstats
                    findwithin
"""
## CHANGE LOG:
## ===========
## 02-11-19 ... fixed attest_ind and attest_rel for div-by-zero Overflows
## 02-05-10 ... fixed lchisqprob indentation (failed when df=even)
## 00-12-28 ... removed aanova() to separate module, fixed licensing to
##                   match Python License, fixed doc string & imports
## 00-04-13 ... pulled all "global" statements, except from aanova()
##                   added/fixed lots of documentation, removed io.py dependency
##                   changed to version 0.5
## 99-11-13 ... added asign() function
## 99-11-01 ... changed version to 0.4 ... enough incremental changes now
## 99-10-25 ... added acovariance and acorrelation functions
## 99-10-10 ... fixed askew/akurtosis to avoid divide-by-zero errors
##                   added aglm function (crude, but will be improved)
## 99-10-04 ... upgraded acumsum, ass, asummult, asamplevar, avar, etc. to
##                   all handle lists of 'dimension's and keepdims
##                   REMOVED ar0, ar2, ar3, ar4 and replaced them with around
##                   reinserted fixes for abetai to avoid math overflows
## 99-09-05 ... rewrote achisqprob/aerfcc/aksprob/afprob/abetacf/abetai to
##                   handle multi-dimensional arrays (whew!)
## 99-08-30 ... fixed l/amoment, l/askew, l/akurtosis per D'Agostino (1990)
##                   added anormaltest per same reference
##                   re-wrote azprob to calc arrays of probs all at once
## 99-08-22 ... edited attest_ind printing section so arrays could be rounded
## 99-08-19 ... fixed amean and aharmonicmean for non-error(!) overflow on
##                   short/byte arrays (mean of #s btw 100-300 = -150??)
## 99-08-09 ... fixed asum so that the None case works for Byte arrays
## 99-08-08 ... fixed 7/3 'improvement' to handle t-calcs on N-D arrays
## 99-07-03 ... improved attest_ind, attest_rel (zero-division errortrap)
## 99-06-24 ... fixed bug(?) in attest_ind (n1=a.shape[0])
## 04/11/99 ... added asignaltonoise, athreshold functions, changed all
##                   max/min in array section to N.maximum/N.minimum,
##                   fixed square_of_sums to prevent integer overflow
## 04/10/99 ... !!! Changed function name ... sumsquared ==> square_of_sums
## 03/18/99 ... Added ar0, ar2, ar3 and ar4 rounding functions
## 02/28/99 ... Fixed aobrientransform to return an array rather than a list
## 01/15/99 ... Essentially ceased updating list-versions of functions (!!!)
## 01/13/99 ... CHANGED TO VERSION 0.3
##                   fixed bug in a/lmannwhitneyu p-value calculation
## 12/31/98 ... fixed variable-name bug in ldescribe
## 12/19/98 ... fixed bug in findwithin (fcns needed pstat. prefix)
## 12/16/98 ... changed amedianscore to return float (not array) for 1 score
## 12/14/98 ... added atmin and atmax functions
##                   removed umath from import line (not needed)
##                   l/ageometricmean modified to reduce chance of overflows (take
##                       nth root first, then multiply)
## 12/07/98 ... added __version__ variable (now 0.2)
##                   removed all 'stats.' from anova() fcn
## 12/06/98 ... changed those functions (except shellsort) that altered
##                   arguments in-place ... cumsum, ranksort, ...
##                   updated (and fixed some) doc-strings
## 12/01/98 ... added anova() function (requires NumPy)
##                   incorporated Dispatch class
## 11/12/98 ... added functionality to amean, aharmonicmean, ageometricmean
##                   added 'asum' function (added functionality to N.add.reduce)
##                   fixed both moment and amoment (two errors)
##                   changed name of skewness and askewness to skew and askew
##                   fixed (a)histogram (which sometimes counted points <lowerlimit)

import pstat               # required 3rd party module
import math, string, copy  # required python modules
from types import *

__version__ = 0.6

############# DISPATCH CODE ##############


class Dispatch:
    """
    The Dispatch class, care of David Ascher, allows different functions to
    be called depending on the argument types. This way, there can be one
    function name regardless of the argument type. To access function doc
    in stats.py module, prefix the function with an 'l' or 'a' for list or
    array arguments, respectively. That is, print stats.lmean.__doc__ or
    print stats.amean.__doc__ or whatever.
    """

    def __init__(self, *tuples):
        self._dispatch = {}
        for func, types in tuples:
            for t in types:
                if t in self._dispatch.keys():
                    raise ValueError, "can't have two dispatches on "+str(t)
                self._dispatch[t] = func
        self._types = self._dispatch.keys()

    def __call__(self, arg1, *args, **kw):
        if type(arg1) not in self._types:
            raise TypeError, "don't know how to dispatch %s arguments" % type(arg1)
        return apply(self._dispatch[type(arg1)], (arg1,) + args, kw)

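## Illustrative sketch (not part of the original code): a dispatcher instance
## is built from (function, accepted-types) pairs, so one name such as 'mean'
## can route list/tuple arguments to lmean. The module builds such bindings
## later, after the list- and array-based functions are defined; this is only
## an example of the pattern:
##
##     mean = Dispatch( (lmean, (ListType, TupleType)), )
##     mean([1, 2, 3, 4])      # -> lmean([1, 2, 3, 4]) -> 2.5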

##########################################################################
########################  LIST-BASED FUNCTIONS  #########################
##########################################################################

### Define these regardless

####################################
#######  CENTRAL TENDENCY  #########
####################################

def lgeometricmean (inlist):
    """
    Calculates the geometric mean of the values in the passed list.
    That is: n-th root of (x1 * x2 * ... * xn). Assumes a '1D' list.

    Usage: lgeometricmean(inlist)
    """
    mult = 1.0
    one_over_n = 1.0/len(inlist)
    for item in inlist:
        mult = mult * pow(item,one_over_n)
    return mult

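# Illustrative example (not part of the original module): the geometric mean
# of [2, 8] is sqrt(2*8) = 4, so lgeometricmean([2, 8]) returns 4.0 (subject
# to floating-point rounding).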

def lharmonicmean (inlist):
    """
    Calculates the harmonic mean of the values in the passed list.
    That is: n / (1/x1 + 1/x2 + ... + 1/xn). Assumes a '1D' list.

    Usage: lharmonicmean(inlist)
    """
    sum = 0
    for item in inlist:
        sum = sum + 1.0/item
    return len(inlist) / sum


def lmean (inlist):
    """
    Returns the arithmetic mean of the values in the passed list.
    Assumes a '1D' list, but will function on the 1st dim of an array(!).

    Usage: lmean(inlist)
    """
    sum = 0
    for item in inlist:
        sum = sum + item
    return sum/float(len(inlist))


def lmedian (inlist,numbins=1000):
    """
    Returns the computed median value of a list of numbers, given the
    number of bins to use for the histogram (more bins brings the computed value
    closer to the median score, default number of bins = 1000). See G.W.
    Heiman's Basic Stats (1st Edition), or CRC Probability & Statistics.

    Usage: lmedian (inlist, numbins=1000)
    """
    (hist, smallest, binsize, extras) = histogram(inlist,numbins) # make histog
    cumhist = cumsum(hist)              # make cumulative histogram
    for i in range(len(cumhist)):       # get 1st(!) index holding 50%ile score
        if cumhist[i]>=len(inlist)/2.0:
            cfbin = i
            break
    LRL = smallest + binsize*cfbin      # get lower real limit of that bin
    cfbelow = cumhist[cfbin-1]
    freq = float(hist[cfbin])           # frequency IN the 50%ile bin
    median = LRL + ((len(inlist)/2.0 - cfbelow)/float(freq))*binsize # median formula
    return median

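# Illustrative example (not part of the original module): because the median
# is interpolated within a histogram bin, lmedian(range(1, 10)) returns a
# value very close to (but not necessarily exactly) 5.0.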

def lmedianscore (inlist):
    """
    Returns the 'middle' score of the passed list. If there is an even
    number of scores, the mean of the 2 middle scores is returned.

    Usage: lmedianscore(inlist)
    """

    newlist = copy.deepcopy(inlist)
    newlist.sort()
    if len(newlist) % 2 == 0:   # if even number of scores, average middle 2
        index = len(newlist)/2  # integer division correct
        median = float(newlist[index] + newlist[index-1]) /2
    else:
        index = len(newlist)/2  # int division gives mid value when count from 0
        median = newlist[index]
    return median


def lmode(inlist):
    """
    Returns a list of the modal (most common) score(s) in the passed
    list. If there is more than one such score, all are returned. The
    bin-count for the mode(s) is also returned.

    Usage: lmode(inlist)
    Returns: bin-count for mode(s), a list of modal value(s)
    """

    scores = pstat.unique(inlist)
    scores.sort()
    freq = []
    for item in scores:
        freq.append(inlist.count(item))
    maxfreq = max(freq)
    mode = []
    stillmore = 1
    while stillmore:
        try:
            indx = freq.index(maxfreq)
            mode.append(scores[indx])
            del freq[indx]
            del scores[indx]
        except ValueError:
            stillmore=0
    return maxfreq, mode

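# Illustrative example (not part of the original module): with two equally
# common scores both are returned, e.g. lmode([1, 2, 2, 3, 3]) -> (2, [2, 3]).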

####################################
############  MOMENTS  #############
####################################

def lmoment(inlist,moment=1):
    """
    Calculates the nth moment about the mean for a sample (defaults to
    the 1st moment). Used to calculate coefficients of skewness and kurtosis.

    Usage: lmoment(inlist,moment=1)
    Returns: appropriate moment (r) from ... 1/n * SUM((inlist(i)-mean)**r)
    """
    if moment == 1:
        return 0.0
    else:
        mn = mean(inlist)
        n = len(inlist)
        s = 0
        for x in inlist:
            s = s + (x-mn)**moment
        return s/float(n)


def lvariation(inlist):
    """
    Returns the coefficient of variation, as defined in CRC Standard
    Probability and Statistics, p.6.

    Usage: lvariation(inlist)
    """
    return 100.0*samplestdev(inlist)/float(mean(inlist))


def lskew(inlist):
    """
    Returns the skewness of a distribution, as defined in Numerical
    Recipes (alternate defn in CRC Standard Probability and Statistics, p.6.)

    Usage: lskew(inlist)
    """
    return moment(inlist,3)/pow(moment(inlist,2),1.5)


def lkurtosis(inlist):
    """
    Returns the kurtosis of a distribution, as defined in Numerical
    Recipes (alternate defn in CRC Standard Probability and Statistics, p.6.)

    Usage: lkurtosis(inlist)
    """
    return moment(inlist,4)/pow(moment(inlist,2),2.0)


def ldescribe(inlist):
    """
    Returns some descriptive statistics of the passed list (assumed to be 1D).

    Usage: ldescribe(inlist)
    Returns: n, (min,max), mean, standard deviation, skew, kurtosis
    """
    n = len(inlist)
    mm = (min(inlist),max(inlist))
    m = mean(inlist)
    sd = stdev(inlist)
    sk = skew(inlist)
    kurt = kurtosis(inlist)
    return n, mm, m, sd, sk, kurt


####################################
#######  FREQUENCY STATS  ##########
####################################

def litemfreq(inlist):
    """
    Returns a list of pairs. Each pair consists of one of the scores in inlist
    and its frequency count. Assumes a 1D list is passed.

    Usage: litemfreq(inlist)
    Returns: a 2D frequency table (col [0:n-1]=scores, col n=frequencies)
    """
    scores = pstat.unique(inlist)
    scores.sort()
    freq = []
    for item in scores:
        freq.append(inlist.count(item))
    return pstat.abut(scores, freq)


def lscoreatpercentile (inlist, percent):
    """
    Returns the score at a given percentile relative to the distribution
    given by inlist.

    Usage: lscoreatpercentile(inlist,percent)
    """
    if percent > 1:
        print "\nDividing percent>1 by 100 in lscoreatpercentile().\n"
        percent = percent / 100.0
    targetcf = percent*len(inlist)
    h, lrl, binsize, extras = histogram(inlist)
    cumhist = cumsum(copy.deepcopy(h))
    for i in range(len(cumhist)):
        if cumhist[i] >= targetcf:
            break
    score = binsize * ((targetcf - cumhist[i-1]) / float(h[i])) + (lrl+binsize*i)
    return score


def lpercentileofscore (inlist, score, histbins=10, defaultlimits=None):
    """
    Returns the percentile value of a score relative to the distribution
    given by inlist. Formula depends on the values used to histogram the data(!).

    Usage: lpercentileofscore(inlist,score,histbins=10,defaultlimits=None)
    """

    h, lrl, binsize, extras = histogram(inlist,histbins,defaultlimits)
    cumhist = cumsum(copy.deepcopy(h))
    i = int((score - lrl)/float(binsize))
    pct = (cumhist[i-1]+((score-(lrl+binsize*i))/float(binsize))*h[i])/float(len(inlist)) * 100
    return pct


def lhistogram (inlist,numbins=10,defaultreallimits=None,printextras=0):
    """
    Returns (i) a list of histogram bin counts, (ii) the smallest value
    of the histogram binning, (iii) the bin width (the last 2 are not
    necessarily integers), and (iv) the number of points falling outside the
    bins. Default number of bins is 10. If no sequence object is given for
    defaultreallimits, the routine picks (usually non-pretty) bins
    spanning all the numbers in the inlist.

    Usage: lhistogram (inlist, numbins=10, defaultreallimits=None, printextras=0)
    Returns: list of bin values, lowerreallimit, binsize, extrapoints
    """
    if (defaultreallimits <> None):
        if type(defaultreallimits) not in [ListType,TupleType] or len(defaultreallimits)==1: # only one limit given, assumed to be lower one & upper is calc'd
            lowerreallimit = defaultreallimits
            upperreallimit = 1.0001 * max(inlist)
        else: # assume both limits given
            lowerreallimit = defaultreallimits[0]
            upperreallimit = defaultreallimits[1]
        binsize = (upperreallimit-lowerreallimit)/float(numbins)
    else: # no limits given for histogram, both must be calc'd
        estbinwidth=(max(inlist)-min(inlist))/float(numbins) + 1 # 1=>cover all
        binsize = ((max(inlist)-min(inlist)+estbinwidth))/float(numbins)
        lowerreallimit = min(inlist) - binsize/2 # lower real limit, 1st bin
    bins = [0]*(numbins)
    extrapoints = 0
    for num in inlist:
        try:
            if (num-lowerreallimit) < 0:
                extrapoints = extrapoints + 1
            else:
                bintoincrement = int((num-lowerreallimit)/float(binsize))
                bins[bintoincrement] = bins[bintoincrement] + 1
        except:
            extrapoints = extrapoints + 1
    if (extrapoints > 0 and printextras == 1):
        print '\nPoints outside given histogram range =',extrapoints
    return (bins, lowerreallimit, binsize, extrapoints)


def lcumfreq(inlist,numbins=10,defaultreallimits=None):
    """
    Returns a cumulative frequency histogram, using the histogram function.

    Usage: lcumfreq(inlist,numbins=10,defaultreallimits=None)
    Returns: list of cumfreq bin values, lowerreallimit, binsize, extrapoints
    """
    h,l,b,e = histogram(inlist,numbins,defaultreallimits)
    cumhist = cumsum(copy.deepcopy(h))
    return cumhist,l,b,e


def lrelfreq(inlist,numbins=10,defaultreallimits=None):
    """
    Returns a relative frequency histogram, using the histogram function.

    Usage: lrelfreq(inlist,numbins=10,defaultreallimits=None)
    Returns: list of relative freq bin values, lowerreallimit, binsize, extrapoints
    """
    h,l,b,e = histogram(inlist,numbins,defaultreallimits)
    for i in range(len(h)):
        h[i] = h[i]/float(len(inlist))
    return h,l,b,e

####################################
#####  VARIABILITY FUNCTIONS  ######
####################################

def lobrientransform(*args):
    """
    Computes a transform on input data (any number of columns). Used to
    test for homogeneity of variance prior to running one-way stats. From
    Maxwell and Delaney, p.112.

    Usage: lobrientransform(*args)
    Returns: transformed data for use in an ANOVA
    """
    TINY = 1e-10
    k = len(args)
    n = [0.0]*k
    v = [0.0]*k
    m = [0.0]*k
    nargs = []
    for i in range(k):
        nargs.append(copy.deepcopy(args[i]))
        n[i] = float(len(nargs[i]))
        v[i] = var(nargs[i])
        m[i] = mean(nargs[i])
    for j in range(k):
        for i in range(n[j]):
            t1 = (n[j]-1.5)*n[j]*(nargs[j][i]-m[j])**2
            t2 = 0.5*v[j]*(n[j]-1.0)
            t3 = (n[j]-1.0)*(n[j]-2.0)
            nargs[j][i] = (t1-t2) / float(t3)
    check = 1
    for j in range(k):
        if v[j] - mean(nargs[j]) > TINY:
            check = 0
    if check <> 1:
        raise ValueError, 'Problem in obrientransform.'
    else:
        return nargs


def lsamplevar (inlist):
    """
    Returns the variance of the values in the passed list using
    N for the denominator (i.e., DESCRIBES the sample variance only).

    Usage: lsamplevar(inlist)
    """
    n = len(inlist)
    mn = mean(inlist)
    deviations = []
    for item in inlist:
        deviations.append(item-mn)
    return ss(deviations)/float(n)


def lsamplestdev (inlist):
    """
    Returns the standard deviation of the values in the passed list using
    N for the denominator (i.e., DESCRIBES the sample stdev only).

    Usage: lsamplestdev(inlist)
    """
    return math.sqrt(samplevar(inlist))


def lvar (inlist):
    """
    Returns the variance of the values in the passed list using N-1
    for the denominator (i.e., for estimating population variance).

    Usage: lvar(inlist)
    """
    n = len(inlist)
    mn = mean(inlist)
    deviations = [0]*len(inlist)
    for i in range(len(inlist)):
        deviations[i] = inlist[i] - mn
    return ss(deviations)/float(n-1)


def lstdev (inlist):
    """
    Returns the standard deviation of the values in the passed list
    using N-1 in the denominator (i.e., to estimate population stdev).

    Usage: lstdev(inlist)
    """
    return math.sqrt(var(inlist))


def lsterr(inlist):
    """
    Returns the standard error of the values in the passed list using N-1
    in the denominator (i.e., to estimate population standard error).

    Usage: lsterr(inlist)
    """
    return stdev(inlist) / float(math.sqrt(len(inlist)))


def lsem (inlist):
    """
    Returns the estimated standard error of the mean (sx-bar) of the
    values in the passed list. sem = stdev / sqrt(n)

    Usage: lsem(inlist)
    """
    sd = stdev(inlist)
    n = len(inlist)
    return sd/math.sqrt(n)


def lz (inlist, score):
    """
    Returns the z-score for a given input score, given that score and the
    list from which that score came. Not appropriate for population calculations.

    Usage: lz(inlist, score)
    """
    z = (score-mean(inlist))/samplestdev(inlist)
    return z


def lzs (inlist):
    """
    Returns a list of z-scores, one for each score in the passed list.

    Usage: lzs(inlist)
    """
    zscores = []
    for item in inlist:
        zscores.append(z(inlist,item))
    return zscores

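# Illustrative example (not part of the original module): for [1, 2, 3, 4, 5]
# the sample standard deviation (N denominator) is sqrt(2), so
# lz([1, 2, 3, 4, 5], 5) is (5-3)/sqrt(2), roughly 1.41.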

####################################
#######  TRIMMING FUNCTIONS  #######
####################################

def ltrimboth (l,proportiontocut):
    """
    Slices off the passed proportion of items from BOTH ends of the passed
    list (i.e., with proportiontocut=0.1, slices 'leftmost' 10% AND 'rightmost'
    10% of scores). Assumes list is sorted by magnitude. Slices off LESS if
    proportion results in a non-integer slice index (i.e., conservatively
    slices off proportiontocut).

    Usage: ltrimboth (l,proportiontocut)
    Returns: trimmed version of list l
    """
    lowercut = int(proportiontocut*len(l))
    uppercut = len(l) - lowercut
    return l[lowercut:uppercut]


def ltrim1 (l,proportiontocut,tail='right'):
    """
    Slices off the passed proportion of items from ONE end of the passed
    list (i.e., if proportiontocut=0.1, slices off 'leftmost' or 'rightmost'
    10% of scores). Slices off LESS if proportion results in a non-integer
    slice index (i.e., conservatively slices off proportiontocut).

    Usage: ltrim1 (l,proportiontocut,tail='right') or set tail='left'
    Returns: trimmed version of list l
    """
    if tail == 'right':
        lowercut = 0
        uppercut = len(l) - int(proportiontocut*len(l))
    elif tail == 'left':
        lowercut = int(proportiontocut*len(l))
        uppercut = len(l)
    return l[lowercut:uppercut]

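# Illustrative example (not part of the original module): with a sorted list
# of 10 scores, ltrimboth(range(10), 0.1) drops one score from each end and
# returns [1, 2, 3, 4, 5, 6, 7, 8], while ltrim1(range(10), 0.1) drops only
# the rightmost score.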

####################################
#####  CORRELATION FUNCTIONS  ######
####################################

def lpaired(x,y):
    """
    Interactively determines the type of data and then runs the
    appropriate statistic for paired group data.

    Usage: lpaired(x,y)
    Returns: appropriate statistic name, value, and probability
    """
    samples = ''
    while samples not in ['i','r','I','R','c','C']:
        print '\nIndependent or related samples, or correlation (i,r,c): ',
        samples = raw_input()

    if samples in ['i','I','r','R']:
        print '\nComparing variances ...',
        # USE O'BRIEN'S TEST FOR HOMOGENEITY OF VARIANCE, Maxwell & delaney, p.112
        r = obrientransform(x,y)
        f,p = F_oneway(pstat.colex(r,0),pstat.colex(r,1))
        if p<0.05:
            vartype='unequal, p='+str(round(p,4))
        else:
            vartype='equal'
        print vartype
        if samples in ['i','I']:
            if vartype[0]=='e':
                t,p = ttest_ind(x,y,0)
                print '\nIndependent samples t-test: ', round(t,4),round(p,4)
            else:
                if len(x)>20 or len(y)>20:
                    z,p = ranksums(x,y)
                    print '\nRank Sums test (NONparametric, n>20): ', round(z,4),round(p,4)
                else:
                    u,p = mannwhitneyu(x,y)
                    print '\nMann-Whitney U-test (NONparametric, ns<20): ', round(u,4),round(p,4)

        else: # RELATED SAMPLES
            if vartype[0]=='e':
                t,p = ttest_rel(x,y,0)
                print '\nRelated samples t-test: ', round(t,4),round(p,4)
            else:
                t,p = ranksums(x,y)
                print '\nWilcoxon T-test (NONparametric): ', round(t,4),round(p,4)
    else: # CORRELATION ANALYSIS
        corrtype = ''
        while corrtype not in ['c','C','r','R','d','D']:
            print '\nIs the data Continuous, Ranked, or Dichotomous (c,r,d): ',
            corrtype = raw_input()
        if corrtype in ['c','C']:
            m,b,r,p,see = linregress(x,y)
            print '\nLinear regression for continuous variables ...'
            lol = [['Slope','Intercept','r','Prob','SEestimate'],[round(m,4),round(b,4),round(r,4),round(p,4),round(see,4)]]
            pstat.printcc(lol)
        elif corrtype in ['r','R']:
            r,p = spearmanr(x,y)
            print '\nCorrelation for ranked variables ...'
            print "Spearman's r: ",round(r,4),round(p,4)
        else: # DICHOTOMOUS
            r,p = pointbiserialr(x,y)
            print '\nAssuming x contains a dichotomous variable ...'
            print 'Point Biserial r: ',round(r,4),round(p,4)
    print '\n\n'
    return None


def lpearsonr(x,y):
    """
    Calculates a Pearson correlation coefficient and the associated
    probability value. Taken from Heiman's Basic Statistics for the Behav.
    Sci (2nd), p.195.

    Usage: lpearsonr(x,y) where x and y are equal-length lists
    Returns: Pearson's r value, two-tailed p-value
    """
    TINY = 1.0e-30
    if len(x) <> len(y):
        raise ValueError, 'Input values not paired in pearsonr. Aborting.'
    n = len(x)
    x = map(float,x)
    y = map(float,y)
    xmean = mean(x)
    ymean = mean(y)
    r_num = n*(summult(x,y)) - sum(x)*sum(y)
    r_den = math.sqrt((n*ss(x) - square_of_sums(x))*(n*ss(y)-square_of_sums(y)))
    r = (r_num / r_den)  # denominator already a float
    df = n-2
    t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY)))
    prob = betai(0.5*df,0.5,df/float(df+t*t))
    return r, prob

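# Illustrative example (not part of the original module): perfectly linear
# data such as x=[1,2,3,4], y=[2,4,6,8] gives r == 1.0 with a p-value near 0.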

def lspearmanr(x,y):
    """
    Calculates a Spearman rank-order correlation coefficient. Taken
    from Heiman's Basic Statistics for the Behav. Sci (1st), p.192.

    Usage: lspearmanr(x,y) where x and y are equal-length lists
    Returns: Spearman's r, two-tailed p-value
    """
    TINY = 1e-30
    if len(x) <> len(y):
        raise ValueError, 'Input values not paired in spearmanr. Aborting.'
    n = len(x)
    rankx = rankdata(x)
    ranky = rankdata(y)
    dsq = sumdiffsquared(rankx,ranky)
    rs = 1 - 6*dsq / float(n*(n**2-1))
    t = rs * math.sqrt((n-2) / ((rs+1.0)*(1.0-rs)))
    df = n-2
    probrs = betai(0.5*df,0.5,df/(df+t*t))  # t already a float
    # probability values for rs are from part 2 of the spearman function in
    # Numerical Recipes, p.510. They are close to tables, but not exact. (?)
    return rs, probrs


def lpointbiserialr(x,y):
    """
    Calculates a point-biserial correlation coefficient and the associated
    probability value. Taken from Heiman's Basic Statistics for the Behav.
    Sci (1st), p.194.

    Usage: lpointbiserialr(x,y) where x,y are equal-length lists
    Returns: Point-biserial r, two-tailed p-value
    """
    TINY = 1e-30
    if len(x) <> len(y):
        raise ValueError, 'INPUT VALUES NOT PAIRED IN pointbiserialr. ABORTING.'
    data = pstat.abut(x,y)
    categories = pstat.unique(x)
    if len(categories) <> 2:
        raise ValueError, "Exactly 2 categories required for pointbiserialr()."
    else: # there are 2 categories, continue
        codemap = pstat.abut(categories,range(2))
        recoded = pstat.recode(data,codemap,0)
        x = pstat.linexand(data,0,categories[0])
        y = pstat.linexand(data,0,categories[1])
        xmean = mean(pstat.colex(x,1))
        ymean = mean(pstat.colex(y,1))
        n = len(data)
        adjust = math.sqrt((len(x)/float(n))*(len(y)/float(n)))
        rpb = (ymean - xmean)/samplestdev(pstat.colex(data,1))*adjust
        df = n-2
        t = rpb*math.sqrt(df/((1.0-rpb+TINY)*(1.0+rpb+TINY)))
        prob = betai(0.5*df,0.5,df/(df+t*t))  # t already a float
        return rpb, prob


def lkendalltau(x,y):
    """
    Calculates Kendall's tau ... correlation of ordinal data. Adapted
    from function kendl1 in Numerical Recipes. Needs good test-routine.@@@

    Usage: lkendalltau(x,y)
    Returns: Kendall's tau, two-tailed p-value
    """
    n1 = 0
    n2 = 0
    iss = 0
    for j in range(len(x)-1):
        for k in range(j,len(y)):
            a1 = x[j] - x[k]
            a2 = y[j] - y[k]
            aa = a1 * a2
            if (aa): # neither list has a tie
                n1 = n1 + 1
                n2 = n2 + 1
                if aa > 0:
                    iss = iss + 1
                else:
                    iss = iss -1
            else:
                if (a1):
                    n1 = n1 + 1
                else:
                    n2 = n2 + 1
    tau = iss / math.sqrt(n1*n2)
    svar = (4.0*len(x)+10.0) / (9.0*len(x)*(len(x)-1))
    z = tau / math.sqrt(svar)
    prob = erfcc(abs(z)/1.4142136)
    return tau, prob


def llinregress(x,y):
    """
    Calculates a regression line on x,y pairs.

    Usage: llinregress(x,y) x,y are equal-length lists of x-y coordinates
    Returns: slope, intercept, r, two-tailed prob, sterr-of-estimate
    """
    TINY = 1.0e-20
    if len(x) <> len(y):
        raise ValueError, 'Input values not paired in linregress. Aborting.'
    n = len(x)
    x = map(float,x)
    y = map(float,y)
    xmean = mean(x)
    ymean = mean(y)
    r_num = float(n*(summult(x,y)) - sum(x)*sum(y))
    r_den = math.sqrt((n*ss(x) - square_of_sums(x))*(n*ss(y)-square_of_sums(y)))
    r = r_num / r_den
    z = 0.5*math.log((1.0+r+TINY)/(1.0-r+TINY))
    df = n-2
    t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY)))
    prob = betai(0.5*df,0.5,df/(df+t*t))
    slope = r_num / float(n*ss(x) - square_of_sums(x))
    intercept = ymean - slope*xmean
    sterrest = math.sqrt(1-r*r)*samplestdev(y)
    return slope, intercept, r, prob, sterrest

####################################
#####  INFERENTIAL STATISTICS  #####
####################################

def lttest_1samp(a,popmean,printit=0,name='Sample',writemode='a'):
    """
    Calculates the t-obtained for the independent samples T-test on ONE group
    of scores a, given a population mean. If printit=1, results are printed
    to the screen. If printit='filename', the results are output to 'filename'
    using the given writemode (default=append). Returns t-value, and prob.

    Usage: lttest_1samp(a,popmean,printit=0,name='Sample',writemode='a')
    Returns: t-value, two-tailed prob
    """
    x = mean(a)
    v = var(a)
    n = len(a)
    df = n-1
    svar = ((n-1)*v)/float(df)
    t = (x-popmean)/math.sqrt(svar*(1.0/n))
    prob = betai(0.5*df,0.5,float(df)/(df+t*t))

    if printit <> 0:
        statname = 'Single-sample T-test.'
        outputpairedstats(printit,writemode,
                          'Population','--',popmean,0,0,0,
                          name,n,x,v,min(a),max(a),
                          statname,t,prob)
    return t,prob


def lttest_ind (a, b, printit=0, name1='Samp1', name2='Samp2', writemode='a'):
    """
    Calculates the t-obtained T-test on TWO INDEPENDENT samples of
    scores a, and b. From Numerical Recipes, p.483. If printit=1, results
    are printed to the screen. If printit='filename', the results are output
    to 'filename' using the given writemode (default=append). Returns t-value,
    and prob.

    Usage: lttest_ind(a,b,printit=0,name1='Samp1',name2='Samp2',writemode='a')
    Returns: t-value, two-tailed prob
    """
    x1 = mean(a)
    x2 = mean(b)
    v1 = stdev(a)**2
    v2 = stdev(b)**2
    n1 = len(a)
    n2 = len(b)
    df = n1+n2-2
    svar = ((n1-1)*v1+(n2-1)*v2)/float(df)
    t = (x1-x2)/math.sqrt(svar*(1.0/n1 + 1.0/n2))
    prob = betai(0.5*df,0.5,df/(df+t*t))

    if printit <> 0:
        statname = 'Independent samples T-test.'
        outputpairedstats(printit,writemode,
                          name1,n1,x1,v1,min(a),max(a),
                          name2,n2,x2,v2,min(b),max(b),
                          statname,t,prob)
    return t,prob


def lttest_rel (a,b,printit=0,name1='Sample1',name2='Sample2',writemode='a'):
    """
    Calculates the t-obtained T-test on TWO RELATED samples of scores,
    a and b. From Numerical Recipes, p.483. If printit=1, results are
    printed to the screen. If printit='filename', the results are output to
    'filename' using the given writemode (default=append). Returns t-value,
    and prob.

    Usage: lttest_rel(a,b,printit=0,name1='Sample1',name2='Sample2',writemode='a')
    Returns: t-value, two-tailed prob
    """
    if len(a)<>len(b):
        raise ValueError, 'Unequal length lists in ttest_rel.'
    x1 = mean(a)
    x2 = mean(b)
    v1 = var(a)
    v2 = var(b)
    n = len(a)
    cov = 0
    for i in range(len(a)):
        cov = cov + (a[i]-x1) * (b[i]-x2)
    df = n-1
    cov = cov / float(df)
    sd = math.sqrt((v1+v2 - 2.0*cov)/float(n))
    t = (x1-x2)/sd
    prob = betai(0.5*df,0.5,df/(df+t*t))

    if printit <> 0:
        statname = 'Related samples T-test.'
        outputpairedstats(printit,writemode,
                          name1,n,x1,v1,min(a),max(a),
                          name2,n,x2,v2,min(b),max(b),
                          statname,t,prob)
    return t, prob


def lchisquare(f_obs,f_exp=None):
    """
    Calculates a one-way chi square for list of observed frequencies and returns
    the result. If no expected frequencies are given, the total N is assumed to
    be equally distributed across all groups.

    Usage: lchisquare(f_obs, f_exp=None) f_obs = list of observed cell freq.
    Returns: chisquare-statistic, associated p-value
    """
    k = len(f_obs)  # number of groups
    if f_exp == None:
        f_exp = [sum(f_obs)/float(k)] * len(f_obs)  # create k bins with = freq.
    chisq = 0
    for i in range(len(f_obs)):
        chisq = chisq + (f_obs[i]-f_exp[i])**2 / float(f_exp[i])
    return chisq, chisqprob(chisq, k-1)

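# Illustrative example (not part of the original module): observed counts that
# exactly match a uniform expectation give a chi-square of 0 and p = 1.0,
# e.g. lchisquare([10, 10, 10, 10]) -> (0.0, 1.0).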

def lks_2samp (data1,data2):
    """
    Computes the Kolmogorov-Smirnov statistic on 2 samples. From
    Numerical Recipes in C, page 493.

    Usage: lks_2samp(data1,data2) data1&2 are lists of values for 2 conditions
    Returns: KS D-value, associated p-value
    """
    j1 = 0
    j2 = 0
    fn1 = 0.0
    fn2 = 0.0
    n1 = len(data1)
    n2 = len(data2)
    en1 = n1
    en2 = n2
    d = 0.0
    data1.sort()
    data2.sort()
    while j1 < n1 and j2 < n2:
        d1=data1[j1]
        d2=data2[j2]
        if d1 <= d2:
            fn1 = (j1)/float(en1)
            j1 = j1 + 1
        if d2 <= d1:
            fn2 = (j2)/float(en2)
            j2 = j2 + 1
        dt = (fn2-fn1)
        if math.fabs(dt) > math.fabs(d):
            d = dt
    try:
        en = math.sqrt(en1*en2/float(en1+en2))
        prob = ksprob((en+0.12+0.11/en)*abs(d))
    except:
        prob = 1.0
    return d, prob


def lmannwhitneyu(x,y):
    """
    Calculates a Mann-Whitney U statistic on the provided scores and
    returns the result. Use only when the n in each condition is < 20 and
    you have 2 independent samples of ranks. NOTE: Mann-Whitney U is
    significant if the u-obtained is LESS THAN or equal to the critical
    value of U found in the tables. Equivalent to Kruskal-Wallis H with
    just 2 groups.

    Usage: lmannwhitneyu(x,y) where x,y are lists of scores for 2 conditions
    Returns: u-statistic, one-tailed p-value (i.e., p(z(U)))
    """
    n1 = len(x)
    n2 = len(y)
    ranked = rankdata(x+y)
    rankx = ranked[0:n1]        # get the x-ranks
    ranky = ranked[n1:]         # the rest are y-ranks
    u1 = n1*n2 + (n1*(n1+1))/2.0 - sum(rankx)  # calc U for x
    u2 = n1*n2 - u1                            # remainder is U for y
    bigu = max(u1,u2)
    smallu = min(u1,u2)
    T = math.sqrt(tiecorrect(ranked))  # correction factor for tied scores
    if T == 0:
        raise ValueError, 'All numbers are identical in lmannwhitneyu'
    sd = math.sqrt(T*n1*n2*(n1+n2+1)/12.0)
    z = abs((bigu-n1*n2/2.0) / sd)  # normal approximation for prob calc
    return smallu, 1.0 - zprob(z)


def ltiecorrect(rankvals):
    """
    Corrects for ties in Mann Whitney U and Kruskal Wallis H tests. See
    Siegel, S. (1956) Nonparametric Statistics for the Behavioral Sciences.
    New York: McGraw-Hill. Code adapted from |Stat rankind.c code.

    Usage: ltiecorrect(rankvals)
    Returns: T correction factor for U or H
    """
    sorted,posn = shellsort(rankvals)
    n = len(sorted)
    T = 0.0
    i = 0
    while (i<n-1):
        if sorted[i] == sorted[i+1]:
            nties = 1
            while (i<n-1) and (sorted[i] == sorted[i+1]):
                nties = nties +1
                i = i +1
            T = T + nties**3 - nties
        i = i+1
    T = T / float(n**3-n)
    return 1.0 - T


def lranksums(x,y):
    """
    Calculates the rank sums statistic on the provided scores and
    returns the result. Use only when the n in each condition is > 20 and you
    have 2 independent samples of ranks.

    Usage: lranksums(x,y)
    Returns: a z-statistic, two-tailed p-value
    """
    n1 = len(x)
    n2 = len(y)
    alldata = x+y
    ranked = rankdata(alldata)
    x = ranked[:n1]
    y = ranked[n1:]
    s = sum(x)
    expected = n1*(n1+n2+1) / 2.0
    z = (s - expected) / math.sqrt(n1*n2*(n1+n2+1)/12.0)
    prob = 2*(1.0 -zprob(abs(z)))
    return z, prob


def lwilcoxont(x,y):
    """
    Calculates the Wilcoxon T-test for related samples and returns the
    result. A non-parametric T-test.

    Usage: lwilcoxont(x,y)
    Returns: a t-statistic, two-tail probability estimate
    """
    if len(x) <> len(y):
        raise ValueError, 'Unequal N in wilcoxont. Aborting.'
    d=[]
    for i in range(len(x)):
        diff = x[i] - y[i]
        if diff <> 0:
            d.append(diff)
    count = len(d)
    absd = map(abs,d)
    absranked = rankdata(absd)
    r_plus = 0.0
    r_minus = 0.0
    for i in range(len(absd)):
        if d[i] < 0:
            r_minus = r_minus + absranked[i]
        else:
            r_plus = r_plus + absranked[i]
    wt = min(r_plus, r_minus)
    mn = count * (count+1) * 0.25
    se = math.sqrt(count*(count+1)*(2.0*count+1.0)/24.0)
    z = math.fabs(wt-mn) / se
    prob = 2*(1.0 -zprob(abs(z)))
    return wt, prob

| 1209 | def lkruskalwallish(*args): |
|---|
| 1210 | """ |
|---|
| 1211 | The Kruskal-Wallis H-test is a non-parametric ANOVA for 3 or more |
|---|
| 1212 | groups, requiring at least 5 subjects in each group. This function |
|---|
| 1213 | calculates the Kruskal-Wallis H-test for 3 or more independent samples |
|---|
| 1214 | and returns the result. |
|---|
| 1215 | |
|---|
| 1216 | Usage: lkruskalwallish(*args) |
|---|
| 1217 | Returns: H-statistic (corrected for ties), associated p-value |
|---|
| 1218 | """ |
|---|
| 1219 | args = list(args) |
|---|
| 1220 | n = [0]*len(args) |
|---|
| 1221 | all = [] |
|---|
| 1222 | n = map(len,args) |
|---|
| 1223 | for i in range(len(args)): |
|---|
| 1224 | all = all + args[i] |
|---|
| 1225 | ranked = rankdata(all) |
|---|
| 1226 | T = tiecorrect(ranked) |
|---|
| 1227 | for i in range(len(args)): |
|---|
| 1228 | args[i] = ranked[0:n[i]] |
|---|
| 1229 | del ranked[0:n[i]] |
|---|
| 1230 | rsums = [] |
|---|
| 1231 | for i in range(len(args)): |
|---|
| 1232 | rsums.append(sum(args[i])**2) |
|---|
| 1233 | rsums[i] = rsums[i] / float(n[i]) |
|---|
| 1234 | ssbn = sum(rsums) |
|---|
| 1235 | totaln = sum(n) |
|---|
| 1236 | h = 12.0 / (totaln*(totaln+1)) * ssbn - 3*(totaln+1) |
|---|
| 1237 | df = len(args) - 1 |
|---|
| 1238 | if T == 0: |
|---|
| 1239 | raise ValueError, 'All numbers are identical in lkruskalwallish' |
|---|
| 1240 | h = h / float(T) |
|---|
| 1241 | return h, chisqprob(h,df) |
|---|
| 1242 | |
|---|
| 1243 | |
|---|
| 1244 | def lfriedmanchisquare(*args): |
|---|
| 1245 | """ |
|---|
| 1246 | Friedman Chi-Square is a non-parametric, one-way within-subjects |
|---|
| 1247 | ANOVA. This function calculates the Friedman Chi-square test for repeated |
|---|
| 1248 | measures and returns the result, along with the associated probability |
|---|
| 1249 | value. It assumes 3 or more repeated measures. Only 3 levels requires a |
|---|
| 1250 | minimum of 10 subjects in the study. Four levels requires 5 subjects per |
|---|
| 1251 | level(??). |
|---|
| 1252 | |
|---|
| 1253 | Usage: lfriedmanchisquare(*args) |
|---|
| 1254 | Returns: chi-square statistic, associated p-value |
|---|
| 1255 | """ |
|---|
| 1256 | k = len(args) |
|---|
| 1257 | if k < 3: |
|---|
| 1258 | raise ValueError, 'Less than 3 levels. Friedman test not appropriate.' |
|---|
| 1259 | n = len(args[0]) |
|---|
| 1260 | data = apply(pstat.abut,tuple(args)) |
|---|
| 1261 | for i in range(len(data)): |
|---|
| 1262 | data[i] = rankdata(data[i]) |
|---|
| 1263 | ssbn = 0 |
|---|
| 1264 | for i in range(k): |
|---|
| 1265 | ssbn = ssbn + sum(pstat.colex(data,i))**2   # sum of ranks within each condition
|---|
| 1266 | chisq = 12.0 / (k*n*(k+1)) * ssbn - 3*n*(k+1) |
|---|
| 1267 | return chisq, chisqprob(chisq,k-1) |
|---|
| 1268 | |
|---|
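|  | # Illustrative usage (hypothetical repeated measures; one list per condition,
|---|
|  | # each of length n subjects):
|---|
|  | #   c1 = [8, 6, 7, 5, 9, 6, 8, 7, 6, 7]
|---|
|  | #   c2 = [7, 5, 6, 4, 8, 5, 7, 6, 5, 6]
|---|
|  | #   c3 = [6, 4, 5, 3, 7, 4, 6, 5, 4, 5]
|---|
|  | #   chisq, p = lfriedmanchisquare(c1, c2, c3)
|---|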
| 1269 | |
|---|
| 1270 | #################################### |
|---|
| 1271 | #### PROBABILITY CALCULATIONS #### |
|---|
| 1272 | #################################### |
|---|
| 1273 | |
|---|
| 1274 | def lchisqprob(chisq,df): |
|---|
| 1275 | """ |
|---|
| 1276 | Returns the (1-tailed) probability value associated with the provided |
|---|
| 1277 | chi-square value and df. Adapted from chisq.c in Gary Perlman's |Stat. |
|---|
| 1278 | |
|---|
| 1279 | Usage: lchisqprob(chisq,df) |
|---|
| 1280 | """ |
|---|
| 1281 | BIG = 20.0 |
|---|
| 1282 | def ex(x): |
|---|
| 1283 | BIG = 20.0 |
|---|
| 1284 | if x < -BIG: |
|---|
| 1285 | return 0.0 |
|---|
| 1286 | else: |
|---|
| 1287 | return math.exp(x) |
|---|
| 1288 | |
|---|
| 1289 | if chisq <=0 or df < 1: |
|---|
| 1290 | return 1.0 |
|---|
| 1291 | a = 0.5 * chisq |
|---|
| 1292 | if df%2 == 0: |
|---|
| 1293 | even = 1 |
|---|
| 1294 | else: |
|---|
| 1295 | even = 0 |
|---|
| 1296 | if df > 1: |
|---|
| 1297 | y = ex(-a) |
|---|
| 1298 | if even: |
|---|
| 1299 | s = y |
|---|
| 1300 | else: |
|---|
| 1301 | s = 2.0 * zprob(-math.sqrt(chisq)) |
|---|
| 1302 | if (df > 2): |
|---|
| 1303 | chisq = 0.5 * (df - 1.0) |
|---|
| 1304 | if even: |
|---|
| 1305 | z = 1.0 |
|---|
| 1306 | else: |
|---|
| 1307 | z = 0.5 |
|---|
| 1308 | if a > BIG: |
|---|
| 1309 | if even: |
|---|
| 1310 | e = 0.0 |
|---|
| 1311 | else: |
|---|
| 1312 | e = math.log(math.sqrt(math.pi)) |
|---|
| 1313 | c = math.log(a) |
|---|
| 1314 | while (z <= chisq): |
|---|
| 1315 | e = math.log(z) + e |
|---|
| 1316 | s = s + ex(c*z-a-e) |
|---|
| 1317 | z = z + 1.0 |
|---|
| 1318 | return s |
|---|
| 1319 | else: |
|---|
| 1320 | if even: |
|---|
| 1321 | e = 1.0 |
|---|
| 1322 | else: |
|---|
| 1323 | e = 1.0 / math.sqrt(math.pi) / math.sqrt(a) |
|---|
| 1324 | c = 0.0 |
|---|
| 1325 | while (z <= chisq): |
|---|
| 1326 | e = e * (a/float(z)) |
|---|
| 1327 | c = c + e |
|---|
| 1328 | z = z + 1.0 |
|---|
| 1329 | return (c*y+s) |
|---|
| 1330 | else: |
|---|
| 1331 | return s |
|---|
| 1332 | |
|---|
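|  | # Illustrative usage: right-tail (1-tailed) p for a chi-square statistic
|---|
|  | #   p = lchisqprob(3.84, 1)   # roughly 0.05 for chisq=3.84 with df=1
|---|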
| 1333 | |
|---|
| 1334 | def lerfcc(x): |
|---|
| 1335 | """ |
|---|
| 1336 | Returns the complementary error function erfc(x) with fractional |
|---|
| 1337 | error everywhere less than 1.2e-7. Adapted from Numerical Recipes.
|---|
| 1338 | |
|---|
| 1339 | Usage: lerfcc(x) |
|---|
| 1340 | """ |
|---|
| 1341 | z = abs(x) |
|---|
| 1342 | t = 1.0 / (1.0+0.5*z) |
|---|
| 1343 | ans = t * math.exp(-z*z-1.26551223 + t*(1.00002368+t*(0.37409196+t*(0.09678418+t*(-0.18628806+t*(0.27886807+t*(-1.13520398+t*(1.48851587+t*(-0.82215223+t*0.17087277))))))))) |
|---|
| 1344 | if x >= 0: |
|---|
| 1345 | return ans |
|---|
| 1346 | else: |
|---|
| 1347 | return 2.0 - ans |
|---|
| 1348 | |
|---|
| 1349 | |
|---|
| 1350 | def lzprob(z): |
|---|
| 1351 | """ |
|---|
| 1352 | Returns the area under the normal curve 'to the left of' the given z value. |
|---|
| 1353 | Thus, |
|---|
| 1354 | for z<0, zprob(z) = 1-tail probability |
|---|
| 1355 | for z>0, 1.0-zprob(z) = 1-tail probability |
|---|
| 1356 | for any z, 2.0*(1.0-zprob(abs(z))) = 2-tail probability |
|---|
| 1357 | Adapted from z.c in Gary Perlman's |Stat. |
|---|
| 1358 | |
|---|
| 1359 | Usage: lzprob(z) |
|---|
| 1360 | """ |
|---|
| 1361 | Z_MAX = 6.0 # maximum meaningful z-value |
|---|
| 1362 | if z == 0.0: |
|---|
| 1363 | x = 0.0 |
|---|
| 1364 | else: |
|---|
| 1365 | y = 0.5 * math.fabs(z) |
|---|
| 1366 | if y >= (Z_MAX*0.5): |
|---|
| 1367 | x = 1.0 |
|---|
| 1368 | elif (y < 1.0): |
|---|
| 1369 | w = y*y |
|---|
| 1370 | x = ((((((((0.000124818987 * w |
|---|
| 1371 | -0.001075204047) * w +0.005198775019) * w |
|---|
| 1372 | -0.019198292004) * w +0.059054035642) * w |
|---|
| 1373 | -0.151968751364) * w +0.319152932694) * w |
|---|
| 1374 | -0.531923007300) * w +0.797884560593) * y * 2.0 |
|---|
| 1375 | else: |
|---|
| 1376 | y = y - 2.0 |
|---|
| 1377 | x = (((((((((((((-0.000045255659 * y |
|---|
| 1378 | +0.000152529290) * y -0.000019538132) * y |
|---|
| 1379 | -0.000676904986) * y +0.001390604284) * y |
|---|
| 1380 | -0.000794620820) * y -0.002034254874) * y |
|---|
| 1381 | +0.006549791214) * y -0.010557625006) * y |
|---|
| 1382 | +0.011630447319) * y -0.009279453341) * y |
|---|
| 1383 | +0.005353579108) * y -0.002141268741) * y |
|---|
| 1384 | +0.000535310849) * y +0.999936657524 |
|---|
| 1385 | if z > 0.0: |
|---|
| 1386 | prob = ((x+1.0)*0.5) |
|---|
| 1387 | else: |
|---|
| 1388 | prob = ((1.0-x)*0.5) |
|---|
| 1389 | return prob |
|---|
| 1390 | |
|---|
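|  | # Illustrative usage: two-tailed p from a z-score
|---|
|  | #   z = 1.96
|---|
|  | #   p_two_tail = 2.0*(1.0-lzprob(abs(z)))   # roughly 0.05 for |z|=1.96
|---|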
| 1391 | |
|---|
| 1392 | def lksprob(alam): |
|---|
| 1393 | """ |
|---|
| 1394 | Computes a Kolmogorov-Smirnov test significance level. Adapted from
|---|
| 1395 | Numerical Recipes.
|---|
| 1396 | |
|---|
| 1397 | Usage: lksprob(alam) |
|---|
| 1398 | """ |
|---|
| 1399 | fac = 2.0 |
|---|
| 1400 | sum = 0.0 |
|---|
| 1401 | termbf = 0.0 |
|---|
| 1402 | a2 = -2.0*alam*alam |
|---|
| 1403 | for j in range(1,201): |
|---|
| 1404 | term = fac*math.exp(a2*j*j) |
|---|
| 1405 | sum = sum + term |
|---|
| 1406 | if math.fabs(term) <= (0.001*termbf) or math.fabs(term) < (1.0e-8*sum): |
|---|
| 1407 | return sum |
|---|
| 1408 | fac = -fac |
|---|
| 1409 | termbf = math.fabs(term) |
|---|
| 1410 | return 1.0 # Get here only if fails to converge; was 0.0!! |
|---|
| 1411 | |
|---|
| 1412 | |
|---|
| 1413 | def lfprob (dfnum, dfden, F): |
|---|
| 1414 | """ |
|---|
| 1415 | Returns the (1-tailed) significance level (p-value) of an F |
|---|
| 1416 | statistic given the degrees of freedom for the numerator (dfR-dfF) and |
|---|
| 1417 | the degrees of freedom for the denominator (dfF). |
|---|
| 1418 | |
|---|
| 1419 | Usage: lfprob(dfnum, dfden, F) where usually dfnum=dfbn, dfden=dfwn |
|---|
| 1420 | """ |
|---|
| 1421 | p = betai(0.5*dfden, 0.5*dfnum, dfden/float(dfden+dfnum*F)) |
|---|
| 1422 | return p |
|---|
| 1423 | |
|---|
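|  | # Illustrative usage: right-tail p of an F statistic (df values here are arbitrary)
|---|
|  | #   p = lfprob(2, 27, 4.5)   # p for F(2,27) at F=4.5
|---|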
| 1424 | |
|---|
| 1425 | def lbetacf(a,b,x): |
|---|
| 1426 | """ |
|---|
| 1427 | This function evaluates the continued fraction form of the incomplete |
|---|
| 1428 | Beta function, betai. (Adapted from: Numerical Recipes in C.)
|---|
| 1429 | |
|---|
| 1430 | Usage: lbetacf(a,b,x) |
|---|
| 1431 | """ |
|---|
| 1432 | ITMAX = 200 |
|---|
| 1433 | EPS = 3.0e-7 |
|---|
| 1434 | |
|---|
| 1435 | bm = az = am = 1.0 |
|---|
| 1436 | qab = a+b |
|---|
| 1437 | qap = a+1.0 |
|---|
| 1438 | qam = a-1.0 |
|---|
| 1439 | bz = 1.0-qab*x/qap |
|---|
| 1440 | for i in range(ITMAX+1): |
|---|
| 1441 | em = float(i+1) |
|---|
| 1442 | tem = em + em |
|---|
| 1443 | d = em*(b-em)*x/((qam+tem)*(a+tem)) |
|---|
| 1444 | ap = az + d*am |
|---|
| 1445 | bp = bz+d*bm |
|---|
| 1446 | d = -(a+em)*(qab+em)*x/((qap+tem)*(a+tem)) |
|---|
| 1447 | app = ap+d*az |
|---|
| 1448 | bpp = bp+d*bz |
|---|
| 1449 | aold = az |
|---|
| 1450 | am = ap/bpp |
|---|
| 1451 | bm = bp/bpp |
|---|
| 1452 | az = app/bpp |
|---|
| 1453 | bz = 1.0 |
|---|
| 1454 | if (abs(az-aold)<(EPS*abs(az))): |
|---|
| 1455 | return az |
|---|
| 1456 | print 'a or b too big, or ITMAX too small in Betacf.' |
|---|
| 1457 | |
|---|
| 1458 | |
|---|
| 1459 | def lgammln(xx): |
|---|
| 1460 | """ |
|---|
| 1461 | Returns the natural log of the gamma function of xx.
|---|
| 1462 | Gamma(z) = Integral(0,infinity) of t^(z-1)exp(-t) dt. |
|---|
| 1463 | (Adapted from: Numerical Recipes in C.)
|---|
| 1464 | |
|---|
| 1465 | Usage: lgammln(xx) |
|---|
| 1466 | """ |
|---|
| 1467 | |
|---|
| 1468 | coeff = [76.18009173, -86.50532033, 24.01409822, -1.231739516, |
|---|
| 1469 | 0.120858003e-2, -0.536382e-5] |
|---|
| 1470 | x = xx - 1.0 |
|---|
| 1471 | tmp = x + 5.5 |
|---|
| 1472 | tmp = tmp - (x+0.5)*math.log(tmp) |
|---|
| 1473 | ser = 1.0 |
|---|
| 1474 | for j in range(len(coeff)): |
|---|
| 1475 | x = x + 1 |
|---|
| 1476 | ser = ser + coeff[j]/x |
|---|
| 1477 | return -tmp + math.log(2.50662827465*ser) |
|---|
| 1478 | |
|---|
| 1479 | |
|---|
| 1480 | def lbetai(a,b,x): |
|---|
| 1481 | """ |
|---|
| 1482 | Returns the incomplete beta function: |
|---|
| 1483 | |
|---|
| 1484 | I-sub-x(a,b) = 1/B(a,b)*(Integral(0,x) of t^(a-1)(1-t)^(b-1) dt) |
|---|
| 1485 | |
|---|
| 1486 | where a,b>0 and B(a,b) = G(a)*G(b)/(G(a+b)) where G(a) is the gamma |
|---|
| 1487 | function of a. The continued fraction formulation is implemented here, |
|---|
| 1488 | using the betacf function. (Adapted from: Numerical Recipes in C.)
|---|
| 1489 | |
|---|
| 1490 | Usage: lbetai(a,b,x) |
|---|
| 1491 | """ |
|---|
| 1492 | if (x<0.0 or x>1.0): |
|---|
| 1493 | raise ValueError, 'Bad x in lbetai' |
|---|
| 1494 | if (x==0.0 or x==1.0): |
|---|
| 1495 | bt = 0.0 |
|---|
| 1496 | else: |
|---|
| 1497 | bt = math.exp(gammln(a+b)-gammln(a)-gammln(b)+a*math.log(x)+b* |
|---|
| 1498 | math.log(1.0-x)) |
|---|
| 1499 | if (x<(a+1.0)/(a+b+2.0)): |
|---|
| 1500 | return bt*betacf(a,b,x)/float(a) |
|---|
| 1501 | else: |
|---|
| 1502 | return 1.0-bt*betacf(b,a,1.0-x)/float(b) |
|---|
| 1503 | |
|---|
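|  | # Illustrative usage: I-sub-x(a,b); e.g., the two-tailed t probability used
|---|
|  | # elsewhere in this module is betai(0.5*df, 0.5, df/(df+t*t)) for a t-value
|---|
|  | # with df degrees of freedom.
|---|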
| 1504 | |
|---|
| 1505 | #################################### |
|---|
| 1506 | ####### ANOVA CALCULATIONS ####### |
|---|
| 1507 | #################################### |
|---|
| 1508 | |
|---|
| 1509 | def lF_oneway(*lists): |
|---|
| 1510 | """ |
|---|
| 1511 | Performs a 1-way ANOVA, returning an F-value and probability given |
|---|
| 1512 | any number of groups. From Heiman, pp.394-7. (Note: this list-version uses Numeric arrays internally, so it requires Numeric to be installed.)
|---|
| 1513 | |
|---|
| 1514 | Usage: lF_oneway(*lists) where *lists is any number of lists, one per
|---|
| 1515 | treatment group |
|---|
| 1516 | Returns: F value, one-tailed p-value |
|---|
| 1517 | """ |
|---|
| 1518 | a = len(lists) # ANOVA on 'a' groups, each in its own list
|---|
| 1519 | means = [0]*a |
|---|
| 1520 | vars = [0]*a |
|---|
| 1521 | ns = [0]*a |
|---|
| 1522 | alldata = [] |
|---|
| 1523 | tmp = map(N.array,lists) |
|---|
| 1524 | means = map(amean,tmp) |
|---|
| 1525 | vars = map(avar,tmp) |
|---|
| 1526 | ns = map(len,lists) |
|---|
| 1527 | for i in range(len(lists)): |
|---|
| 1528 | alldata = alldata + lists[i] |
|---|
| 1529 | alldata = N.array(alldata) |
|---|
| 1530 | bign = len(alldata) |
|---|
| 1531 | sstot = ass(alldata)-(asquare_of_sums(alldata)/float(bign)) |
|---|
| 1532 | ssbn = 0 |
|---|
| 1533 | for list in lists: |
|---|
| 1534 | ssbn = ssbn + asquare_of_sums(N.array(list))/float(len(list)) |
|---|
| 1535 | ssbn = ssbn - (asquare_of_sums(alldata)/float(bign)) |
|---|
| 1536 | sswn = sstot-ssbn |
|---|
| 1537 | dfbn = a-1 |
|---|
| 1538 | dfwn = bign - a |
|---|
| 1539 | msb = ssbn/float(dfbn) |
|---|
| 1540 | msw = sswn/float(dfwn) |
|---|
| 1541 | f = msb/msw |
|---|
| 1542 | prob = fprob(dfbn,dfwn,f) |
|---|
| 1543 | return f, prob |
|---|
| 1544 | |
|---|
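|  | # Illustrative usage (hypothetical treatment groups; requires Numeric, see note
|---|
|  | # in the docstring above):
|---|
|  | #   F, p = lF_oneway([3,4,5,6], [5,6,7,8], [7,8,9,10])
|---|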
| 1545 | |
|---|
| 1546 | def lF_value (ER,EF,dfnum,dfden): |
|---|
| 1547 | """ |
|---|
| 1548 | Returns an F-statistic given the following: |
|---|
| 1549 | ER = error associated with the null hypothesis (the Restricted model) |
|---|
| 1550 | EF = error associated with the alternate hypothesis (the Full model) |
|---|
| 1551 | dfR-dfF = degrees of freedom of the numerator |
|---|
| 1552 | dfF = degrees of freedom associated with the denominator/Full model |
|---|
| 1553 | |
|---|
| 1554 | Usage: lF_value(ER,EF,dfnum,dfden) |
|---|
| 1555 | """ |
|---|
| 1556 | return ((ER-EF)/float(dfnum) / (EF/float(dfden))) |
|---|
| 1557 | |
|---|
| 1558 | |
|---|
| 1559 | #################################### |
|---|
| 1560 | ######## SUPPORT FUNCTIONS ####### |
|---|
| 1561 | #################################### |
|---|
| 1562 | |
|---|
| 1563 | def writecc (listoflists,file,writetype='w',extra=2): |
|---|
| 1564 | """ |
|---|
| 1565 | Writes a list of lists to a file in columns, with each column's width
|---|
| 1566 | set to the maximum item size in that column plus 'extra' characters
|---|
| 1567 | (default 2). File-overwrite is the default.
|---|
| 1568 | |
|---|
| 1569 | Usage: writecc (listoflists,file,writetype='w',extra=2) |
|---|
| 1570 | Returns: None |
|---|
| 1571 | """ |
|---|
| 1572 | if type(listoflists[0]) not in [ListType,TupleType]: |
|---|
| 1573 | listoflists = [listoflists] |
|---|
| 1574 | outfile = open(file,writetype) |
|---|
| 1575 | rowstokill = [] |
|---|
| 1576 | list2print = copy.deepcopy(listoflists) |
|---|
| 1577 | for i in range(len(listoflists)): |
|---|
| 1578 | if listoflists[i] == ['\n'] or listoflists[i]=='\n' or listoflists[i]=='dashes': |
|---|
| 1579 | rowstokill = rowstokill + [i] |
|---|
| 1580 | rowstokill.reverse() |
|---|
| 1581 | for row in rowstokill: |
|---|
| 1582 | del list2print[row] |
|---|
| 1583 | maxsize = [0]*len(list2print[0]) |
|---|
| 1584 | for col in range(len(list2print[0])): |
|---|
| 1585 | items = pstat.colex(list2print,col) |
|---|
| 1586 | items = map(pstat.makestr,items) |
|---|
| 1587 | maxsize[col] = max(map(len,items)) + extra |
|---|
| 1588 | for row in listoflists: |
|---|
| 1589 | if row == ['\n'] or row == '\n': |
|---|
| 1590 | outfile.write('\n') |
|---|
| 1591 | elif row == ['dashes'] or row == 'dashes': |
|---|
| 1592 | dashes = [0]*len(maxsize) |
|---|
| 1593 | for j in range(len(maxsize)): |
|---|
| 1594 | dashes[j] = '-'*(maxsize[j]-2) |
|---|
| 1595 | outfile.write(pstat.lineincustcols(dashes,maxsize)) |
|---|
| 1596 | else: |
|---|
| 1597 | outfile.write(pstat.lineincustcols(row,maxsize)) |
|---|
| 1598 | outfile.write('\n') |
|---|
| 1599 | outfile.close() |
|---|
| 1600 | return None |
|---|
| 1601 | |
|---|
| 1602 | |
|---|
| 1603 | def lincr(l,cap): # to increment a list up to a max-list of 'cap' |
|---|
| 1604 | """ |
|---|
| 1605 | Simulate a counting system from an n-dimensional list. |
|---|
| 1606 | |
|---|
| 1607 | Usage: lincr(l,cap) l=list to increment, cap=max values for each list pos'n |
|---|
| 1608 | Returns: next set of values for list l, OR -1 (if overflow) |
|---|
| 1609 | """ |
|---|
| 1610 | l[0] = l[0] + 1 # e.g., [0,0,0] --> [2,4,3] (=cap) |
|---|
| 1611 | for i in range(len(l)): |
|---|
| 1612 | if l[i] > cap[i] and i < len(l)-1: # if carryover AND not done |
|---|
| 1613 | l[i] = 0 |
|---|
| 1614 | l[i+1] = l[i+1] + 1 |
|---|
| 1615 | elif l[i] > cap[i] and i == len(l)-1: # overflow past last column, must be finished |
|---|
| 1616 | l = -1 |
|---|
| 1617 | return l |
|---|
| 1618 | |
|---|
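|  | # Illustrative behavior (odometer-style counting, least-significant position first):
|---|
|  | #   lincr([0,0,0], [2,4,3])  -->  [1,0,0]
|---|
|  | #   lincr([2,0,0], [2,4,3])  -->  [0,1,0]    (carry into the next position)
|---|
|  | #   lincr([2,4,3], [2,4,3])  -->  -1         (overflow past the cap; counting done)
|---|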
| 1619 | |
|---|
| 1620 | def lsum (inlist): |
|---|
| 1621 | """ |
|---|
| 1622 | Returns the sum of the items in the passed list. |
|---|
| 1623 | |
|---|
| 1624 | Usage: lsum(inlist) |
|---|
| 1625 | """ |
|---|
| 1626 | s = 0 |
|---|
| 1627 | for item in inlist: |
|---|
| 1628 | s = s + item |
|---|
| 1629 | return s |
|---|
| 1630 | |
|---|
| 1631 | |
|---|
| 1632 | def lcumsum (inlist): |
|---|
| 1633 | """ |
|---|
| 1634 | Returns a list consisting of the cumulative sum of the items in the |
|---|
| 1635 | passed list. |
|---|
| 1636 | |
|---|
| 1637 | Usage: lcumsum(inlist) |
|---|
| 1638 | """ |
|---|
| 1639 | newlist = copy.deepcopy(inlist) |
|---|
| 1640 | for i in range(1,len(newlist)): |
|---|
| 1641 | newlist[i] = newlist[i] + newlist[i-1] |
|---|
| 1642 | return newlist |
|---|
| 1643 | |
|---|
| 1644 | |
|---|
| 1645 | def lss(inlist): |
|---|
| 1646 | """ |
|---|
| 1647 | Squares each value in the passed list, adds up these squares and |
|---|
| 1648 | returns the result. |
|---|
| 1649 | |
|---|
| 1650 | Usage: lss(inlist) |
|---|
| 1651 | """ |
|---|
| 1652 | ss = 0 |
|---|
| 1653 | for item in inlist: |
|---|
| 1654 | ss = ss + item*item |
|---|
| 1655 | return ss |
|---|
| 1656 | |
|---|
| 1657 | |
|---|
| 1658 | def lsummult (list1,list2): |
|---|
| 1659 | """ |
|---|
| 1660 | Multiplies elements in list1 and list2, element by element, and |
|---|
| 1661 | returns the sum of all resulting multiplications. Must provide equal |
|---|
| 1662 | length lists. |
|---|
| 1663 | |
|---|
| 1664 | Usage: lsummult(list1,list2) |
|---|
| 1665 | """ |
|---|
| 1666 | if len(list1) <> len(list2): |
|---|
| 1667 | raise ValueError, "Lists not equal length in summult." |
|---|
| 1668 | s = 0 |
|---|
| 1669 | for item1,item2 in pstat.abut(list1,list2): |
|---|
| 1670 | s = s + item1*item2 |
|---|
| 1671 | return s |
|---|
| 1672 | |
|---|
| 1673 | |
|---|
| 1674 | def lsumdiffsquared(x,y): |
|---|
| 1675 | """ |
|---|
| 1676 | Takes pairwise differences of the values in lists x and y, squares |
|---|
| 1677 | these differences, and returns the sum of these squares. |
|---|
| 1678 | |
|---|
| 1679 | Usage: lsumdiffsquared(x,y) |
|---|
| 1680 | Returns: sum[(x[i]-y[i])**2] |
|---|
| 1681 | """ |
|---|
| 1682 | sds = 0 |
|---|
| 1683 | for i in range(len(x)): |
|---|
| 1684 | sds = sds + (x[i]-y[i])**2 |
|---|
| 1685 | return sds |
|---|
| 1686 | |
|---|
| 1687 | |
|---|
| 1688 | def lsquare_of_sums(inlist): |
|---|
| 1689 | """ |
|---|
| 1690 | Adds the values in the passed list, squares the sum, and returns |
|---|
| 1691 | the result. |
|---|
| 1692 | |
|---|
| 1693 | Usage: lsquare_of_sums(inlist) |
|---|
| 1694 | Returns: sum(inlist[i])**2 |
|---|
| 1695 | """ |
|---|
| 1696 | s = sum(inlist) |
|---|
| 1697 | return float(s)*s |
|---|
| 1698 | |
|---|
| 1699 | |
|---|
| 1700 | def lshellsort(inlist): |
|---|
| 1701 | """ |
|---|
| 1702 | Shellsort algorithm. Sorts a 1D-list. |
|---|
| 1703 | |
|---|
| 1704 | Usage: lshellsort(inlist) |
|---|
| 1705 | Returns: sorted-inlist, sorting-index-vector (for original list) |
|---|
| 1706 | """ |
|---|
| 1707 | n = len(inlist) |
|---|
| 1708 | svec = copy.deepcopy(inlist) |
|---|
| 1709 | ivec = range(n) |
|---|
| 1710 | gap = n/2 # integer division needed |
|---|
| 1711 | while gap >0: |
|---|
| 1712 | for i in range(gap,n): |
|---|
| 1713 | for j in range(i-gap,-1,-gap): |
|---|
| 1714 | while j>=0 and svec[j]>svec[j+gap]: |
|---|
| 1715 | temp = svec[j] |
|---|
| 1716 | svec[j] = svec[j+gap] |
|---|
| 1717 | svec[j+gap] = temp |
|---|
| 1718 | itemp = ivec[j] |
|---|
| 1719 | ivec[j] = ivec[j+gap] |
|---|
| 1720 | ivec[j+gap] = itemp |
|---|
| 1721 | gap = gap / 2 # integer division needed |
|---|
| 1722 | # svec is now sorted inlist, and ivec has the order svec[i] = vec[ivec[i]] |
|---|
| 1723 | return svec, ivec |
|---|
| 1724 | |
|---|
| 1725 | |
|---|
| 1726 | def lrankdata(inlist): |
|---|
| 1727 | """ |
|---|
| 1728 | Ranks the data in inlist, dealing with ties appropriately. Assumes
|---|
| 1729 | a 1D inlist. Adapted from Gary Perlman's |Stat ranksort. |
|---|
| 1730 | |
|---|
| 1731 | Usage: lrankdata(inlist) |
|---|
| 1732 | Returns: a list of length equal to inlist, containing rank scores |
|---|
| 1733 | """ |
|---|
| 1734 | n = len(inlist) |
|---|
| 1735 | svec, ivec = shellsort(inlist) |
|---|
| 1736 | sumranks = 0 |
|---|
| 1737 | dupcount = 0 |
|---|
| 1738 | newlist = [0]*n |
|---|
| 1739 | for i in range(n): |
|---|
| 1740 | sumranks = sumranks + i |
|---|
| 1741 | dupcount = dupcount + 1 |
|---|
| 1742 | if i==n-1 or svec[i] <> svec[i+1]: |
|---|
| 1743 | averank = sumranks / float(dupcount) + 1 |
|---|
| 1744 | for j in range(i-dupcount+1,i+1): |
|---|
| 1745 | newlist[ivec[j]] = averank |
|---|
| 1746 | sumranks = 0 |
|---|
| 1747 | dupcount = 0 |
|---|
| 1748 | return newlist |
|---|
| 1749 | |
|---|
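|  | # Illustrative behavior (tied values share the mean of the ranks they span):
|---|
|  | #   lrankdata([3, 1, 4, 1])  -->  [3.0, 1.5, 4.0, 1.5]
|---|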
| 1750 | |
|---|
| 1751 | def outputpairedstats(fname,writemode,name1,n1,m1,se1,min1,max1,name2,n2,m2,se2,min2,max2,statname,stat,prob): |
|---|
| 1752 | """ |
|---|
| 1753 | Prints, or writes to a file, stats for two groups, using the name, n,
|---|
| 1754 | mean, sterr, min and max for each group, as well as the statistic name, |
|---|
| 1755 | its value, and the associated p-value. |
|---|
| 1756 | |
|---|
| 1757 | Usage: outputpairedstats(fname,writemode, |
|---|
| 1758 | name1,n1,mean1,stderr1,min1,max1, |
|---|
| 1759 | name2,n2,mean2,stderr2,min2,max2, |
|---|
| 1760 | statname,stat,prob) |
|---|
| 1761 | Returns: None |
|---|
| 1762 | """ |
|---|
| 1763 | suffix = '' # for *s after the p-value |
|---|
| 1764 | try: |
|---|
| 1765 | x = prob.shape |
|---|
| 1766 | prob = prob[0] |
|---|
| 1767 | except: |
|---|
| 1768 | pass |
|---|
| 1769 | if prob < 0.001: suffix = ' ***' |
|---|
| 1770 | elif prob < 0.01: suffix = ' **' |
|---|
| 1771 | elif prob < 0.05: suffix = ' *' |
|---|
| 1772 | title = [['Name','N','Mean','SD','Min','Max']] |
|---|
| 1773 | lofl = title+[[name1,n1,round(m1,3),round(math.sqrt(se1),3),min1,max1], |
|---|
| 1774 | [name2,n2,round(m2,3),round(math.sqrt(se2),3),min2,max2]] |
|---|
| 1775 | if type(fname)<>StringType or len(fname)==0: |
|---|
| 1776 | print |
|---|
| 1777 | print statname |
|---|
| 1778 | print |
|---|
| 1779 | pstat.printcc(lofl) |
|---|
| 1780 | print |
|---|
| 1781 | try: |
|---|
| 1782 | if stat.shape == (): |
|---|
| 1783 | stat = stat[0] |
|---|
| 1784 | if prob.shape == (): |
|---|
| 1785 | prob = prob[0] |
|---|
| 1786 | except: |
|---|
| 1787 | pass |
|---|
| 1788 | print 'Test statistic = ',round(stat,3),' p = ',round(prob,3),suffix |
|---|
| 1789 | print |
|---|
| 1790 | else: |
|---|
| 1791 | file = open(fname,writemode) |
|---|
| 1792 | file.write('\n'+statname+'\n\n') |
|---|
| 1793 | file.close() |
|---|
| 1794 | writecc(lofl,fname,'a') |
|---|
| 1795 | file = open(fname,'a') |
|---|
| 1796 | try: |
|---|
| 1797 | if stat.shape == (): |
|---|
| 1798 | stat = stat[0] |
|---|
| 1799 | if prob.shape == (): |
|---|
| 1800 | prob = prob[0] |
|---|
| 1801 | except: |
|---|
| 1802 | pass |
|---|
| 1803 | file.write(pstat.list2string(['\nTest statistic = ',round(stat,4),' p = ',round(prob,4),suffix,'\n\n'])) |
|---|
| 1804 | file.close() |
|---|
| 1805 | return None |
|---|
| 1806 | |
|---|
| 1807 | |
|---|
| 1808 | def lfindwithin (data): |
|---|
| 1809 | """ |
|---|
| 1810 | Returns an integer representing a binary vector, where 1=within- |
|---|
| 1811 | subject factor, 0=between. Input equals the entire data 2D list (i.e., |
|---|
| 1812 | column 0=random factor, column -1=measured values; those two columns are skipped).
|---|
| 1813 | Note: input data is in |Stat format ... a list of lists ("2D list") with |
|---|
| 1814 | one row per measured value, first column=subject identifier, last column= |
|---|
| 1815 | score, one in-between column per factor (these columns contain level |
|---|
| 1816 | designations on each factor). See also stats.anova.__doc__. |
|---|
| 1817 | |
|---|
| 1818 | Usage: lfindwithin(data) data in |Stat format |
|---|
| 1819 | """ |
|---|
| 1820 | |
|---|
| 1821 | numfact = len(data[0])-1 |
|---|
| 1822 | withinvec = 0 |
|---|
| 1823 | for col in range(1,numfact): |
|---|
| 1824 | examplelevel = pstat.unique(pstat.colex(data,col))[0] |
|---|
| 1825 | rows = pstat.linexand(data,col,examplelevel) # get 1 level of this factor |
|---|
| 1826 | factsubjs = pstat.unique(pstat.colex(rows,0)) |
|---|
| 1827 | allsubjs = pstat.unique(pstat.colex(data,0)) |
|---|
| 1828 | if len(factsubjs) == len(allsubjs): # every subject has a score at this level ==> within-subjects factor
|---|
| 1829 | withinvec = withinvec + (1 << col) |
|---|
| 1830 | return withinvec |
|---|
| 1831 | |
|---|
| 1832 | |
|---|
| 1833 | ######################################################### |
|---|
| 1834 | ######################################################### |
|---|
| 1835 | ####### DISPATCH LISTS AND TUPLES TO ABOVE FCNS ######### |
|---|
| 1836 | ######################################################### |
|---|
| 1837 | ######################################################### |
|---|
| 1838 | |
|---|
| 1839 | ## CENTRAL TENDENCY: |
|---|
| 1840 | geometricmean = Dispatch ( (lgeometricmean, (ListType, TupleType)), ) |
|---|
| 1841 | harmonicmean = Dispatch ( (lharmonicmean, (ListType, TupleType)), ) |
|---|
| 1842 | mean = Dispatch ( (lmean, (ListType, TupleType)), ) |
|---|
| 1843 | median = Dispatch ( (lmedian, (ListType, TupleType)), ) |
|---|
| 1844 | medianscore = Dispatch ( (lmedianscore, (ListType, TupleType)), ) |
|---|
| 1845 | mode = Dispatch ( (lmode, (ListType, TupleType)), ) |
|---|
| 1846 | |
|---|
| 1847 | ## MOMENTS: |
|---|
| 1848 | moment = Dispatch ( (lmoment, (ListType, TupleType)), ) |
|---|
| 1849 | variation = Dispatch ( (lvariation, (ListType, TupleType)), ) |
|---|
| 1850 | skew = Dispatch ( (lskew, (ListType, TupleType)), ) |
|---|
| 1851 | kurtosis = Dispatch ( (lkurtosis, (ListType, TupleType)), ) |
|---|
| 1852 | describe = Dispatch ( (ldescribe, (ListType, TupleType)), ) |
|---|
| 1853 | |
|---|
| 1854 | ## FREQUENCY STATISTICS: |
|---|
| 1855 | itemfreq = Dispatch ( (litemfreq, (ListType, TupleType)), ) |
|---|
| 1856 | scoreatpercentile = Dispatch ( (lscoreatpercentile, (ListType, TupleType)), ) |
|---|
| 1857 | percentileofscore = Dispatch ( (lpercentileofscore, (ListType, TupleType)), ) |
|---|
| 1858 | histogram = Dispatch ( (lhistogram, (ListType, TupleType)), ) |
|---|
| 1859 | cumfreq = Dispatch ( (lcumfreq, (ListType, TupleType)), ) |
|---|
| 1860 | relfreq = Dispatch ( (lrelfreq, (ListType, TupleType)), ) |
|---|
| 1861 | |
|---|
| 1862 | ## VARIABILITY: |
|---|
| 1863 | obrientransform = Dispatch ( (lobrientransform, (ListType, TupleType)), ) |
|---|
| 1864 | samplevar = Dispatch ( (lsamplevar, (ListType, TupleType)), ) |
|---|
| 1865 | samplestdev = Dispatch ( (lsamplestdev, (ListType, TupleType)), ) |
|---|
| 1866 | var = Dispatch ( (lvar, (ListType, TupleType)), ) |
|---|
| 1867 | stdev = Dispatch ( (lstdev, (ListType, TupleType)), ) |
|---|
| 1868 | sterr = Dispatch ( (lsterr, (ListType, TupleType)), ) |
|---|
| 1869 | sem = Dispatch ( (lsem, (ListType, TupleType)), ) |
|---|
| 1870 | z = Dispatch ( (lz, (ListType, TupleType)), ) |
|---|
| 1871 | zs = Dispatch ( (lzs, (ListType, TupleType)), ) |
|---|
| 1872 | |
|---|
| 1873 | ## TRIMMING FCNS: |
|---|
| 1874 | trimboth = Dispatch ( (ltrimboth, (ListType, TupleType)), ) |
|---|
| 1875 | trim1 = Dispatch ( (ltrim1, (ListType, TupleType)), ) |
|---|
| 1876 | |
|---|
| 1877 | ## CORRELATION FCNS: |
|---|
| 1878 | paired = Dispatch ( (lpaired, (ListType, TupleType)), ) |
|---|
| 1879 | pearsonr = Dispatch ( (lpearsonr, (ListType, TupleType)), ) |
|---|
| 1880 | spearmanr = Dispatch ( (lspearmanr, (ListType, TupleType)), ) |
|---|
| 1881 | pointbiserialr = Dispatch ( (lpointbiserialr, (ListType, TupleType)), ) |
|---|
| 1882 | kendalltau = Dispatch ( (lkendalltau, (ListType, TupleType)), ) |
|---|
| 1883 | linregress = Dispatch ( (llinregress, (ListType, TupleType)), ) |
|---|
| 1884 | |
|---|
| 1885 | ## INFERENTIAL STATS: |
|---|
| 1886 | ttest_1samp = Dispatch ( (lttest_1samp, (ListType, TupleType)), ) |
|---|
| 1887 | ttest_ind = Dispatch ( (lttest_ind, (ListType, TupleType)), ) |
|---|
| 1888 | ttest_rel = Dispatch ( (lttest_rel, (ListType, TupleType)), ) |
|---|
| 1889 | chisquare = Dispatch ( (lchisquare, (ListType, TupleType)), ) |
|---|
| 1890 | ks_2samp = Dispatch ( (lks_2samp, (ListType, TupleType)), ) |
|---|
| 1891 | mannwhitneyu = Dispatch ( (lmannwhitneyu, (ListType, TupleType)), ) |
|---|
| 1892 | ranksums = Dispatch ( (lranksums, (ListType, TupleType)), ) |
|---|
| 1893 | tiecorrect = Dispatch ( (ltiecorrect, (ListType, TupleType)), ) |
|---|
| 1894 | wilcoxont = Dispatch ( (lwilcoxont, (ListType, TupleType)), ) |
|---|
| 1895 | kruskalwallish = Dispatch ( (lkruskalwallish, (ListType, TupleType)), ) |
|---|
| 1896 | friedmanchisquare = Dispatch ( (lfriedmanchisquare, (ListType, TupleType)), ) |
|---|
| 1897 | |
|---|
| 1898 | ## PROBABILITY CALCS: |
|---|
| 1899 | chisqprob = Dispatch ( (lchisqprob, (IntType, FloatType)), ) |
|---|
| 1900 | zprob = Dispatch ( (lzprob, (IntType, FloatType)), ) |
|---|
| 1901 | ksprob = Dispatch ( (lksprob, (IntType, FloatType)), ) |
|---|
| 1902 | fprob = Dispatch ( (lfprob, (IntType, FloatType)), ) |
|---|
| 1903 | betacf = Dispatch ( (lbetacf, (IntType, FloatType)), ) |
|---|
| 1904 | betai = Dispatch ( (lbetai, (IntType, FloatType)), ) |
|---|
| 1905 | erfcc = Dispatch ( (lerfcc, (IntType, FloatType)), ) |
|---|
| 1906 | gammln = Dispatch ( (lgammln, (IntType, FloatType)), ) |
|---|
| 1907 | |
|---|
| 1908 | ## ANOVA FUNCTIONS: |
|---|
| 1909 | F_oneway = Dispatch ( (lF_oneway, (ListType, TupleType)), ) |
|---|
| 1910 | F_value = Dispatch ( (lF_value, (ListType, TupleType)), ) |
|---|
| 1911 | |
|---|
| 1912 | ## SUPPORT FUNCTIONS: |
|---|
| 1913 | incr = Dispatch ( (lincr, (ListType, TupleType)), ) |
|---|
| 1914 | sum = Dispatch ( (lsum, (ListType, TupleType)), ) |
|---|
| 1915 | cumsum = Dispatch ( (lcumsum, (ListType, TupleType)), ) |
|---|
| 1916 | ss = Dispatch ( (lss, (ListType, TupleType)), ) |
|---|
| 1917 | summult = Dispatch ( (lsummult, (ListType, TupleType)), ) |
|---|
| 1918 | square_of_sums = Dispatch ( (lsquare_of_sums, (ListType, TupleType)), ) |
|---|
| 1919 | sumdiffsquared = Dispatch ( (lsumdiffsquared, (ListType, TupleType)), ) |
|---|
| 1920 | shellsort = Dispatch ( (lshellsort, (ListType, TupleType)), ) |
|---|
| 1921 | rankdata = Dispatch ( (lrankdata, (ListType, TupleType)), ) |
|---|
| 1922 | findwithin = Dispatch ( (lfindwithin, (ListType, TupleType)), ) |
|---|
| 1923 | |
|---|
| 1924 | |
|---|
| 1925 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1926 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1927 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1928 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1929 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1930 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1931 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1932 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1933 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1934 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1935 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1936 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1937 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1938 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1939 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1940 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1941 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1942 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1943 | #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== |
|---|
| 1944 | |
|---|
| 1945 | try: # DEFINE THESE *ONLY* IF NUMERIC IS AVAILABLE |
|---|
| 1946 | import Numeric |
|---|
| 1947 | N = Numeric |
|---|
| 1948 | import LinearAlgebra |
|---|
| 1949 | LA = LinearAlgebra |
|---|
| 1950 | |
|---|
| 1951 | |
|---|
| 1952 | ##################################### |
|---|
| 1953 | ######## ACENTRAL TENDENCY ######## |
|---|
| 1954 | ##################################### |
|---|
| 1955 | |
|---|
| 1956 | def ageometricmean (inarray,dimension=None,keepdims=0): |
|---|
| 1957 | """ |
|---|
| 1958 | Calculates the geometric mean of the values in the passed array. |
|---|
| 1959 | That is: n-th root of (x1 * x2 * ... * xn). Defaults to ALL values in |
|---|
| 1960 | the passed array. Use dimension=None to flatten array first. REMEMBER: if |
|---|
| 1961 | dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and |
|---|
| 1962 | if dimension is a sequence, it collapses over all specified dimensions. If |
|---|
| 1963 | keepdims is set to 1, the resulting array will have as many dimensions as |
|---|
| 1964 | inarray, with only 1 'level' per dim that was collapsed over. |
|---|
| 1965 | |
|---|
| 1966 | Usage: ageometricmean(inarray,dimension=None,keepdims=0) |
|---|
| 1967 | Returns: geometric mean computed over dim(s) listed in dimension |
|---|
| 1968 | """ |
|---|
| 1969 | inarray = N.array(inarray,N.Float) |
|---|
| 1970 | if dimension == None: |
|---|
| 1971 | inarray = N.ravel(inarray) |
|---|
| 1972 | size = len(inarray) |
|---|
| 1973 | mult = N.power(inarray,1.0/size) |
|---|
| 1974 | mult = N.multiply.reduce(mult) |
|---|
| 1975 | elif type(dimension) in [IntType,FloatType]: |
|---|
| 1976 | size = inarray.shape[dimension] |
|---|
| 1977 | mult = N.power(inarray,1.0/size) |
|---|
| 1978 | mult = N.multiply.reduce(mult,dimension) |
|---|
| 1979 | if keepdims == 1: |
|---|
| 1980 | shp = list(inarray.shape) |
|---|
| 1981 | shp[dimension] = 1 |
|---|
| 1982 | mult = N.reshape(mult,shp)
|---|
| 1983 | else: # must be a SEQUENCE of dims to average over |
|---|
| 1984 | dims = list(dimension) |
|---|
| 1985 | dims.sort() |
|---|
| 1986 | dims.reverse() |
|---|
| 1987 | size = N.array(N.multiply.reduce(N.take(inarray.shape,dims)),N.Float) |
|---|
| 1988 | mult = N.power(inarray,1.0/size) |
|---|
| 1989 | for dim in dims: |
|---|
| 1990 | mult = N.multiply.reduce(mult,dim) |
|---|
| 1991 | if keepdims == 1: |
|---|
| 1992 | shp = list(inarray.shape) |
|---|
| 1993 | for dim in dims: |
|---|
| 1994 | shp[dim] = 1 |
|---|
| 1995 | mult = N.reshape(mult,shp) |
|---|
| 1996 | return mult |
|---|
| 1997 | |
|---|
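|  | # Illustrative usage (hypothetical 2x3 array; requires Numeric):
|---|
|  | #   a = N.array([[1.0, 2.0, 4.0], [2.0, 4.0, 8.0]])
|---|
|  | #   ageometricmean(a)               # single value over all 6 entries
|---|
|  | #   ageometricmean(a, dimension=0)  # collapses over rows: one value per column
|---|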
| 1998 | |
|---|
| 1999 | def aharmonicmean (inarray,dimension=None,keepdims=0): |
|---|
| 2000 | """ |
|---|
| 2001 | Calculates the harmonic mean of the values in the passed array. |
|---|
| 2002 | That is: n / (1/x1 + 1/x2 + ... + 1/xn). Defaults to ALL values in |
|---|
| 2003 | the passed array. Use dimension=None to flatten array first. REMEMBER: if |
|---|
| 2004 | dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and |
|---|
| 2005 | if dimension is a sequence, it collapses over all specified dimensions. If |
|---|
| 2006 | keepdims is set to 1, the resulting array will have as many dimensions as |
|---|
| 2007 | inarray, with only 1 'level' per dim that was collapsed over. |
|---|
| 2008 | |
|---|
| 2009 | Usage: aharmonicmean(inarray,dimension=None,keepdims=0) |
|---|
| 2010 | Returns: harmonic mean computed over dim(s) in dimension |
|---|
| 2011 | """ |
|---|
| 2012 | inarray = inarray.astype(N.Float) |
|---|
| 2013 | if dimension == None: |
|---|
| 2014 | inarray = N.ravel(inarray) |
|---|
| 2015 | size = len(inarray) |
|---|
| 2016 | s = N.add.reduce(1.0 / inarray) |
|---|
| 2017 | elif type(dimension) in [IntType,FloatType]: |
|---|
| 2018 | size = float(inarray.shape[dimension]) |
|---|
| 2019 | s = N.add.reduce(1.0/inarray, dimension) |
|---|
| 2020 | if keepdims == 1: |
|---|
| 2021 | shp = list(inarray.shape) |
|---|
| 2022 | shp[dimension] = 1 |
|---|
| 2023 | s = N.reshape(s,shp) |
|---|
| 2024 | else: # must be a SEQUENCE of dims to average over |
|---|
| 2025 | dims = list(dimension) |
|---|
| 2026 | dims.sort() |
|---|
| 2027 | nondims = [] |
|---|
| 2028 | for i in range(len(inarray.shape)): |
|---|
| 2029 | if i not in dims: |
|---|
| 2030 | nondims.append(i) |
|---|
| 2031 | tinarray = N.transpose(inarray,nondims+dims) # put keep-dims first |
|---|
| 2032 | idx = [0] *len(nondims) |
|---|
| 2033 | if idx == []: |
|---|
| 2034 | size = len(N.ravel(inarray)) |
|---|
| 2035 | s = asum(1.0 / inarray) |
|---|
| 2036 | if keepdims == 1: |
|---|
| 2037 | s = N.reshape([s],N.ones(len(inarray.shape))) |
|---|
| 2038 | else: |
|---|
| 2039 | idx[0] = -1 |
|---|
| 2040 | loopcap = N.array(tinarray.shape[0:len(nondims)]) -1 |
|---|
| 2041 | s = N.zeros(loopcap+1,N.Float) |
|---|
| 2042 | while incr(idx,loopcap) <> -1: |
|---|
| 2043 | s[idx] = asum(1.0/tinarray[idx]) |
|---|
| 2044 | size = N.multiply.reduce(N.take(inarray.shape,dims)) |
|---|
| 2045 | if keepdims == 1: |
|---|
| 2046 | shp = list(inarray.shape) |
|---|
| 2047 | for dim in dims: |
|---|
| 2048 | shp[dim] = 1 |
|---|
| 2049 | s = N.reshape(s,shp) |
|---|
| 2050 | return size / s |
|---|
| 2051 | |
|---|
| 2052 | |
|---|
| 2053 | def amean (inarray,dimension=None,keepdims=0): |
|---|
| 2054 | """ |
|---|
| 2055 | Calculates the arithmetic mean of the values in the passed array.
|---|
| 2056 | That is: 1/n * (x1 + x2 + ... + xn). Defaults to ALL values in the |
|---|
| 2057 | passed array. Use dimension=None to flatten array first. REMEMBER: if |
|---|
| 2058 | dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and |
|---|
| 2059 | if dimension is a sequence, it collapses over all specified dimensions. If |
|---|
| 2060 | keepdims is set to 1, the resulting array will have as many dimensions as |
|---|
| 2061 | inarray, with only 1 'level' per dim that was collapsed over. |
|---|
| 2062 | |
|---|
| 2063 | Usage: amean(inarray,dimension=None,keepdims=0) |
|---|
| 2064 | Returns: arithmetic mean calculated over dim(s) in dimension
|---|
| 2065 | """ |
|---|
| 2066 | if inarray.typecode() in ['l','s','b']: |
|---|
| 2067 | inarray = inarray.astype(N.Float) |
|---|
| 2068 | if dimension == None: |
|---|
| 2069 | inarray = N.ravel(inarray) |
|---|
| 2070 | sum = N.add.reduce(inarray) |
|---|
| 2071 | denom = float(len(inarray)) |
|---|
| 2072 | elif type(dimension) in [IntType,FloatType]: |
|---|
| 2073 | sum = asum(inarray,dimension) |
|---|
| 2074 | denom = float(inarray.shape[dimension]) |
|---|
| 2075 | if keepdims == 1: |
|---|
| 2076 | shp = list(inarray.shape) |
|---|
| 2077 | shp[dimension] = 1 |
|---|
| 2078 | sum = N.reshape(sum,shp) |
|---|
| 2079 | else: # must be a TUPLE of dims to average over |
|---|
| 2080 | dims = list(dimension) |
|---|
| 2081 | dims.sort() |
|---|
| 2082 | dims.reverse() |
|---|
| 2083 | sum = inarray *1.0 |
|---|
| 2084 | for dim in dims: |
|---|
| 2085 | sum = N.add.reduce(sum,dim) |
|---|
| 2086 | denom = N.array(N.multiply.reduce(N.take(inarray.shape,dims)),N.Float) |
|---|
| 2087 | if keepdims == 1: |
|---|
| 2088 | shp = list(inarray.shape) |
|---|
| 2089 | for dim in dims: |
|---|
| 2090 | shp[dim] = 1 |
|---|
| 2091 | sum = N.reshape(sum,shp) |
|---|
| 2092 | return sum/denom |
|---|
| 2093 | |
|---|
| 2094 | |
|---|
| 2095 | def amedian (inarray,numbins=1000): |
|---|
| 2096 | """ |
|---|
| 2097 | Calculates the COMPUTED median value of an array of numbers, given the |
|---|
| 2098 | number of bins to use for the histogram (more bins approaches finding the |
|---|
| 2099 | precise median value of the array; default number of bins = 1000). From |
|---|
| 2100 | G.W. Heiman's Basic Stats, or CRC Probability & Statistics. |
|---|
| 2101 | NOTE: THIS ROUTINE ALWAYS uses the entire passed array (flattens it first). |
|---|
| 2102 | |
|---|
| 2103 | Usage: amedian(inarray,numbins=1000) |
|---|
| 2104 | Returns: median calculated over ALL values in inarray |
|---|
| 2105 | """ |
|---|
| 2106 | inarray = N.ravel(inarray) |
|---|
| 2107 | (hist, smallest, binsize, extras) = ahistogram(inarray,numbins) |
|---|
| 2108 | cumhist = N.cumsum(hist) # make cumulative histogram |
|---|
| 2109 | otherbins = N.greater_equal(cumhist,len(inarray)/2.0) |
|---|
| 2110 | otherbins = list(otherbins) # list of 0/1s, 1s start at median bin |
|---|
| 2111 | cfbin = otherbins.index(1) # get 1st(!) index holding 50%ile score |
|---|
| 2112 | LRL = smallest + binsize*cfbin # get lower real limit of that bin
|---|
| 2113 | cfbelow = N.add.reduce(hist[0:cfbin]) # cum. freq. below bin |
|---|
| 2114 | freq = hist[cfbin] # frequency IN the 50%ile bin |
|---|
| 2115 | median = LRL + ((len(inarray)/2.0-cfbelow)/float(freq))*binsize # MEDIAN |
|---|
| 2116 | return median |
|---|
| 2117 | |
|---|
| 2118 | |
|---|
| 2119 | def amedianscore (inarray,dimension=None): |
|---|
| 2120 | """ |
|---|
| 2121 | Returns the 'middle' score of the passed array. If there is an even |
|---|
| 2122 | number of scores, the mean of the 2 middle scores is returned. Can function |
|---|
| 2123 | with 1D arrays, or on the FIRST dimension of 2D arrays (i.e., dimension can |
|---|
| 2124 | be None, to pre-flatten the array, or else dimension must equal 0). |
|---|
| 2125 | |
|---|
| 2126 | Usage: amedianscore(inarray,dimension=None) |
|---|
| 2127 | Returns: 'middle' score of the array, or the mean of the 2 middle scores |
|---|
| 2128 | """ |
|---|
| 2129 | if dimension == None: |
|---|
| 2130 | inarray = N.ravel(inarray) |
|---|
| 2131 | dimension = 0 |
|---|
| 2132 | inarray = N.sort(inarray,dimension) |
|---|
| 2133 | if inarray.shape[dimension] % 2 == 0: # if even number of elements |
|---|
| 2134 | indx = inarray.shape[dimension]/2 # integer division correct |
|---|
| 2135 | median = N.asarray(inarray[indx]+inarray[indx-1]) / 2.0 |
|---|
| 2136 | else: |
|---|
| 2137 | indx = inarray.shape[dimension] / 2 # integer division correct |
|---|
| 2138 | median = N.take(inarray,[indx],dimension) |
|---|
| 2139 | if median.shape == (1,): |
|---|
| 2140 | median = median[0] |
|---|
| 2141 | return median |
|---|
| 2142 | |
|---|
| 2143 | |
|---|
| 2144 | def amode(a, dimension=None): |
|---|
| 2145 | """ |
|---|
| 2146 | Returns an array of the modal (most common) score in the passed array. |
|---|
| 2147 | If there is more than one such score, ONLY THE FIRST is returned. |
|---|
| 2148 | The bin-count for the modal values is also returned. Operates on whole |
|---|
| 2149 | array (dimension=None), or on a given dimension. |
|---|
| 2150 | |
|---|
| 2151 | Usage: amode(a, dimension=None) |
|---|
| 2152 | Returns: array of bin-counts for mode(s), array of corresponding modal values |
|---|
| 2153 | """ |
|---|
| 2154 | |
|---|
| 2155 | if dimension == None: |
|---|
| 2156 | a = N.ravel(a) |
|---|
| 2157 | dimension = 0 |
|---|
| 2158 | scores = pstat.aunique(N.ravel(a)) # get ALL unique values |
|---|
| 2159 | testshape = list(a.shape) |
|---|
| 2160 | testshape[dimension] = 1 |
|---|
| 2161 | oldmostfreq = N.zeros(testshape) |
|---|
| 2162 | oldcounts = N.zeros(testshape) |
|---|
| 2163 | for score in scores: |
|---|
| 2164 | template = N.equal(a,score) |
|---|
| 2165 | counts = asum(template,dimension,1) |
|---|
| 2166 | mostfrequent = N.where(N.greater(counts,oldcounts),score,oldmostfreq) |
|---|
| 2167 | oldcounts = N.where(N.greater(counts,oldcounts),counts,oldcounts) |
|---|
| 2168 | oldmostfreq = mostfrequent |
|---|
| 2169 | return oldcounts, mostfrequent |
|---|
| 2170 | |
|---|
| 2171 | |
|---|
| 2172 | def atmean(a,limits=None,inclusive=(1,1)): |
|---|
| 2173 | """ |
|---|
| 2174 | Returns the arithmetic mean of all values in an array, ignoring values |
|---|
| 2175 | strictly outside the sequence passed to 'limits'. Note: either limit |
|---|
| 2176 | in the sequence, or the value of limits itself, can be set to None. The |
|---|
| 2177 | inclusive list/tuple determines whether the lower and upper limiting bounds |
|---|
| 2178 | (respectively) are open/exclusive (0) or closed/inclusive (1). |
|---|
| 2179 | |
|---|
| 2180 | Usage: atmean(a,limits=None,inclusive=(1,1)) |
|---|
| 2181 | """ |
|---|
| 2182 | if a.typecode() in ['l','s','b']: |
|---|
| 2183 | a = a.astype(N.Float) |
|---|
| 2184 | if limits == None: |
|---|
| 2185 | return mean(a) |
|---|
| 2186 | assert type(limits) in [ListType,TupleType,N.ArrayType], "Wrong type for limits in atmean" |
|---|
| 2187 | if inclusive[0]: lowerfcn = N.greater_equal |
|---|
| 2188 | else: lowerfcn = N.greater |
|---|
| 2189 | if inclusive[1]: upperfcn = N.less_equal |
|---|
| 2190 | else: upperfcn = N.less |
|---|
| 2191 | if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)): |
|---|
| 2192 | raise ValueError, "No array values within given limits (atmean)." |
|---|
| 2193 | elif limits[0]==None and limits[1]<>None: |
|---|
| 2194 | mask = upperfcn(a,limits[1]) |
|---|
| 2195 | elif limits[0]<>None and limits[1]==None: |
|---|
| 2196 | mask = lowerfcn(a,limits[0]) |
|---|
| 2197 | elif limits[0]<>None and limits[1]<>None: |
|---|
| 2198 | mask = lowerfcn(a,limits[0])*upperfcn(a,limits[1]) |
|---|
| 2199 | s = float(N.add.reduce(N.ravel(a*mask))) |
|---|
| 2200 | n = float(N.add.reduce(N.ravel(mask))) |
|---|
| 2201 | return s/n |
|---|
| 2202 | |
|---|
| 2203 | |
|---|
| 2204 | def atvar(a,limits=None,inclusive=(1,1)): |
|---|
| 2205 | """ |
|---|
| 2206 | Returns the sample variance of values in an array, (i.e., using N-1), |
|---|
| 2207 | ignoring values strictly outside the sequence passed to 'limits'. |
|---|
| 2208 | Note: either limit in the sequence, or the value of limits itself, |
|---|
| 2209 | can be set to None. The inclusive list/tuple determines whether the lower |
|---|
| 2210 | and upper limiting bounds (respectively) are open/exclusive (0) or |
|---|
| 2211 | closed/inclusive (1). |
|---|
| 2212 | |
|---|
| 2213 | Usage: atvar(a,limits=None,inclusive=(1,1)) |
|---|
| 2214 | """ |
|---|
| 2215 | a = a.astype(N.Float) |
|---|
| 2216 | if limits == None or limits == [None,None]: |
|---|
| 2217 | term1 = N.add.reduce(N.ravel(a*a)) |
|---|
| 2218 | n = float(len(N.ravel(a))) - 1 |
|---|
| 2219 | term2 = N.add.reduce(N.ravel(a))**2 / n |
|---|
| 2220 | # print term1, term2, n   # debug output (disabled)
|---|
| 2221 | return (term1 - term2) / n |
|---|
| 2222 | assert type(limits) in [ListType,TupleType,N.ArrayType], "Wrong type for limits in atvar" |
|---|
| 2223 | if inclusive[0]: lowerfcn = N.greater_equal |
|---|
| 2224 | else: lowerfcn = N.greater |
|---|
| 2225 | if inclusive[1]: upperfcn = N.less_equal |
|---|
| 2226 | else: upperfcn = N.less |
|---|
| 2227 | if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)): |
|---|
| 2228 | raise ValueError, "No array values within given limits (atvar)." |
|---|
| 2229 | elif limits[0]==None and limits[1]<>None: |
|---|
| 2230 | mask = upperfcn(a,limits[1]) |
|---|
| 2231 | elif limits[0]<>None and limits[1]==None: |
|---|
| 2232 | mask = lowerfcn(a,limits[0]) |
|---|
| 2233 | elif limits[0]<>None and limits[1]<>None: |
|---|
| 2234 | mask = lowerfcn(a,limits[0])*upperfcn(a,limits[1]) |
|---|
| 2235 | term1 = N.add.reduce(N.ravel(a*a*mask)) |
|---|
| 2236 | n = float(N.add.reduce(N.ravel(mask))) - 1 |
|---|
| 2237 | term2 = N.add.reduce(N.ravel(a*mask))**2 / n |
|---|
| 2238 | # print term1, term2, n   # debug output (disabled)
|---|
| 2239 | return (term1 - term2) / n |
|---|
| 2240 | |
|---|
| 2241 | |
|---|
| 2242 | def atmin(a,lowerlimit=None,dimension=None,inclusive=1): |
|---|
| 2243 | """ |
|---|
| 2244 | Returns the minimum value of a, along dimension, including only values greater
|---|
| 2245 | than (or equal to, if inclusive=1) lowerlimit. If the limit is set to None, |
|---|
| 2246 | all values in the array are used. |
|---|
| 2247 | |
|---|
| 2248 | Usage: atmin(a,lowerlimit=None,dimension=None,inclusive=1) |
|---|
| 2249 | """ |
|---|
| 2250 | if inclusive: lowerfcn = N.greater |
|---|
| 2251 | else: lowerfcn = N.greater_equal |
|---|
| 2252 | if dimension == None: |
|---|
| 2253 | a = N.ravel(a) |
|---|
| 2254 | dimension = 0 |
|---|
| 2255 | if lowerlimit == None: |
|---|
| 2256 | lowerlimit = N.minimum.reduce(N.ravel(a))-11 |
|---|
| 2257 | biggest = N.maximum.reduce(N.ravel(a)) |
|---|
| 2258 | ta = N.where(lowerfcn(a,lowerlimit),a,biggest) |
|---|
| 2259 | return N.minimum.reduce(ta,dimension) |
|---|
| 2260 | |
|---|
| 2261 | |
|---|
| 2262 | def atmax(a,upperlimit,dimension=None,inclusive=1): |
|---|
| 2263 | """ |
|---|
| 2264 | Returns the maximum value of a, along dimension, including only values less
|---|
| 2265 | than (or equal to, if inclusive=1) upperlimit. If the limit is set to None, |
|---|
| 2266 | a limit larger than the max value in the array is used. |
|---|
| 2267 | |
|---|
| 2268 | Usage: atmax(a,upperlimit,dimension=None,inclusive=1) |
|---|
| 2269 | """ |
|---|
| 2270 | if inclusive: upperfcn = N.less |
|---|
| 2271 | else: upperfcn = N.less_equal |
|---|
| 2272 | if dimension == None: |
|---|
| 2273 | a = N.ravel(a) |
|---|
| 2274 | dimension = 0 |
|---|
| 2275 | if upperlimit == None: |
|---|
| 2276 | upperlimit = N.maximum.reduce(N.ravel(a))+1 |
|---|
| 2277 | smallest = N.minimum.reduce(N.ravel(a)) |
|---|
| 2278 | ta = N.where(upperfcn(a,upperlimit),a,smallest) |
|---|
| 2279 | return N.maximum.reduce(ta,dimension) |
|---|
| 2280 | |
|---|
| 2281 | |
|---|
| 2282 | def atstdev(a,limits=None,inclusive=(1,1)): |
|---|
| 2283 | """ |
|---|
| 2284 | Returns the standard deviation of all values in an array, ignoring values |
|---|
| 2285 | strictly outside the sequence passed to 'limits'. Note: either limit |
|---|
| 2286 | in the sequence, or the value of limits itself, can be set to None. The |
|---|
| 2287 | inclusive list/tuple determines whether the lower and upper limiting bounds |
|---|
| 2288 | (respectively) are open/exclusive (0) or closed/inclusive (1). |
|---|
| 2289 | |
|---|
| 2290 | Usage: atstdev(a,limits=None,inclusive=(1,1)) |
|---|
| 2291 | """ |
|---|
| 2292 | return N.sqrt(tvar(a,limits,inclusive)) |
|---|
| 2293 | |
|---|
| 2294 | |
|---|
| 2295 | def atsem(a,limits=None,inclusive=(1,1)): |
|---|
| 2296 | """ |
|---|
| 2297 | Returns the standard error of the mean for the values in an array, |
|---|
| 2298 | (i.e., using N for the denominator), ignoring values strictly outside |
|---|
| 2299 | the sequence passed to 'limits'. Note: either limit in the sequence, |
|---|
| 2300 | or the value of limits itself, can be set to None. The inclusive list/tuple |
|---|
| 2301 | determines whether the lower and upper limiting bounds (respectively) are |
|---|
| 2302 | open/exclusive (0) or closed/inclusive (1). |
|---|
| 2303 | |
|---|
| 2304 | Usage: atsem(a,limits=None,inclusive=(1,1)) |
|---|
| 2305 | """ |
|---|
| 2306 | sd = tstdev(a,limits,inclusive) |
|---|
| 2307 | if limits == None or limits == [None,None]: |
|---|
| 2308 | return sd/math.sqrt(float(len(N.ravel(a))))  # no limits given: use all values
|---|
| 2309 | assert type(limits) in [ListType,TupleType,N.ArrayType], "Wrong type for limits in atsem" |
|---|
| 2310 | if inclusive[0]: lowerfcn = N.greater_equal |
|---|
| 2311 | else: lowerfcn = N.greater |
|---|
| 2312 | if inclusive[1]: upperfcn = N.less_equal |
|---|
| 2313 | else: upperfcn = N.less |
|---|
| 2314 | if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)): |
|---|
| 2315 | raise ValueError, "No array values within given limits (atsem)." |
|---|
| 2316 | elif limits[0]==None and limits[1]<>None: |
|---|
| 2317 | mask = upperfcn(a,limits[1]) |
|---|
| 2318 | elif limits[0]<>None and limits[1]==None: |
|---|
| 2319 | mask = lowerfcn(a,limits[0]) |
|---|
| 2320 | elif limits[0]<>None and limits[1]<>None: |
|---|
| 2321 | mask = lowerfcn(a,limits[0])*upperfcn(a,limits[1]) |
|---|
| 2322 | term1 = N.add.reduce(N.ravel(a*a*mask)) |
|---|
| 2323 | n = float(N.add.reduce(N.ravel(mask))) |
|---|
| 2324 | return sd/math.sqrt(n) |
|---|
| 2325 | |
|---|
| 2326 | |
|---|
| 2327 | ##################################### |
|---|
| 2328 | ############ AMOMENTS ############# |
|---|
| 2329 | ##################################### |
|---|
| 2330 | |
|---|
| 2331 | def amoment(a,moment=1,dimension=None): |
|---|
| 2332 | """ |
|---|
| 2333 | Calculates the nth moment about the mean for a sample (defaults to the |
|---|
| 2334 | 1st moment). Generally used to calculate coefficients of skewness and |
|---|
| 2335 | kurtosis. Dimension can equal None (ravel array first), an integer |
|---|
| 2336 | (the dimension over which to operate), or a sequence (operate over |
|---|
| 2337 | multiple dimensions). |
|---|
| 2338 | |
|---|
| 2339 | Usage: amoment(a,moment=1,dimension=None) |
|---|
| 2340 | Returns: appropriate moment along given dimension |
|---|
| 2341 | """ |
|---|
| 2342 | if dimension == None: |
|---|
| 2343 | a = N.ravel(a) |
|---|
| 2344 | dimension = 0 |
|---|
| 2345 | if moment == 1: |
|---|
| 2346 | return 0.0 |
|---|
| 2347 | else: |
|---|
| 2348 | mn = amean(a,dimension,1) # 1=keepdims |
|---|
| 2349 | s = N.power((a-mn),moment) |
|---|
| 2350 | return amean(s,dimension) |
|---|
| 2351 | |
|---|
| 2352 | |
|---|
| 2353 | def avariation(a,dimension=None): |
|---|
| 2354 | """ |
|---|
| 2355 | Returns the coefficient of variation, as defined in CRC Standard |
|---|
| 2356 | Probability and Statistics, p.6. Dimension can equal None (ravel array |
|---|
| 2357 | first), an integer (the dimension over which to operate), or a |
|---|
| 2358 | sequence (operate over multiple dimensions). |
|---|
| 2359 | |
|---|
| 2360 | Usage: avariation(a,dimension=None) |
|---|
| 2361 | """ |
|---|
| 2362 | return 100.0*asamplestdev(a,dimension)/amean(a,dimension) |
|---|
| 2363 | |
|---|
| 2364 | |
|---|
| 2365 | def askew(a,dimension=None): |
|---|
| 2366 | """ |
|---|
| 2367 | Returns the skewness of a distribution (normal ==> 0.0; >0 means extra |
|---|
| 2368 | weight in the right tail). Use askewtest() to see if it's close enough.
|---|
| 2369 | Dimension can equal None (ravel array first), an integer (the |
|---|
| 2370 | dimension over which to operate), or a sequence (operate over multiple |
|---|
| 2371 | dimensions). |
|---|
| 2372 | |
|---|
| 2373 | Usage: askew(a, dimension=None) |
|---|
| 2374 | Returns: skew of vals in a along dimension, returning ZERO where all vals equal |
|---|
| 2375 | """ |
|---|
| 2376 | denom = N.power(amoment(a,2,dimension),1.5) |
|---|
| 2377 | zero = N.equal(denom,0) |
|---|
| 2378 | if type(denom) == N.ArrayType and asum(zero) <> 0: |
|---|
| 2379 | print "Number of zeros in askew: ",asum(zero) |
|---|
| 2380 | denom = denom + zero # prevent divide-by-zero |
|---|
| 2381 | return N.where(zero, 0, amoment(a,3,dimension)/denom) |
|---|
| 2382 | |
|---|
| 2383 | |
|---|
| 2384 | def akurtosis(a,dimension=None): |
|---|
| 2385 | """ |
|---|
| 2386 | Returns the kurtosis of a distribution (normal ==> 3.0; >3 means |
|---|
| 2387 | heavier in the tails, and usually more peaked). Use akurtosistest() |
|---|
| 2388 | to see if it's close enough. Dimension can equal None (ravel array |
|---|
| 2389 | first), an integer (the dimension over which to operate), or a |
|---|
| 2390 | sequence (operate over multiple dimensions). |
|---|
| 2391 | |
|---|
| 2392 | Usage: akurtosis(a,dimension=None) |
|---|
| 2393 | Returns: kurtosis of values in a along dimension, and ZERO where all vals equal |
|---|
| 2394 | """ |
|---|
| 2395 | denom = N.power(amoment(a,2,dimension),2) |
|---|
| 2396 | zero = N.equal(denom,0) |
|---|
| 2397 | if type(denom) == N.ArrayType and asum(zero) <> 0: |
|---|
| 2398 | print "Number of zeros in akurtosis: ",asum(zero) |
|---|
| 2399 | denom = denom + zero # prevent divide-by-zero |
|---|
| 2400 | return N.where(zero,0,amoment(a,4,dimension)/denom) |
|---|
| 2401 | |
|---|
| 2402 | |
|---|
| 2403 | def adescribe(inarray,dimension=None): |
|---|
| 2404 | """ |
|---|
| 2405 | Returns several descriptive statistics of the passed array. Dimension |
|---|
| 2406 | can equal None (ravel array first), an integer (the dimension over |
|---|
| 2407 | which to operate), or a sequence (operate over multiple dimensions). |
|---|
| 2408 | |
|---|
| 2409 | Usage: adescribe(inarray,dimension=None) |
|---|
| 2410 | Returns: n, (min,max), mean, standard deviation, skew, kurtosis |
|---|
| 2411 | """ |
|---|
| 2412 | if dimension == None: |
|---|
| 2413 | inarray = N.ravel(inarray) |
|---|
| 2414 | dimension = 0 |
|---|
| 2415 | n = inarray.shape[dimension] |
|---|
| 2416 | mm = (N.minimum.reduce(inarray),N.maximum.reduce(inarray)) |
|---|
| 2417 | m = amean(inarray,dimension) |
|---|
| 2418 | sd = astdev(inarray,dimension) |
|---|
| 2419 | skew = askew(inarray,dimension) |
|---|
| 2420 | kurt = akurtosis(inarray,dimension) |
|---|
| 2421 | return n, mm, m, sd, skew, kurt |
|---|
| 2422 | |
|---|
| 2423 | |
|---|
| 2424 | ##################################### |
|---|
| 2425 | ######## NORMALITY TESTS ########## |
|---|
| 2426 | ##################################### |
|---|
| 2427 | |
|---|
| 2428 | def askewtest(a,dimension=None): |
|---|
| 2429 | """ |
|---|
| 2430 | Tests whether the skew is significantly different from a normal |
|---|
| 2431 | distribution. Dimension can equal None (ravel array first), an |
|---|
| 2432 | integer (the dimension over which to operate), or a sequence (operate |
|---|
| 2433 | over multiple dimensions). |
|---|
| 2434 | |
|---|
| 2435 | Usage: askewtest(a,dimension=None) |
|---|
| 2436 | Returns: z-score and 2-tail z-probability |
|---|
| 2437 | """ |
|---|
| 2438 | if dimension == None: |
|---|
| 2439 | a = N.ravel(a) |
|---|
| 2440 | dimension = 0 |
|---|
| 2441 | b2 = askew(a,dimension) |
|---|
| 2442 | n = float(a.shape[dimension]) |
|---|
| 2443 | y = b2 * N.sqrt(((n+1)*(n+3)) / (6.0*(n-2)) ) |
|---|
| 2444 | beta2 = ( 3.0*(n*n+27*n-70)*(n+1)*(n+3) ) / ( (n-2.0)*(n+5)*(n+7)*(n+9) ) |
|---|
| 2445 | W2 = -1 + N.sqrt(2*(beta2-1)) |
|---|
| 2446 | delta = 1/N.sqrt(N.log(N.sqrt(W2))) |
|---|
| 2447 | alpha = N.sqrt(2/(W2-1)) |
|---|
| 2448 | y = N.where(N.equal(y,0),1,y) |
|---|
| 2449 | Z = delta*N.log(y/alpha + N.sqrt((y/alpha)**2+1)) |
|---|
| 2450 | return Z, (1.0-zprob(Z))*2 |
|---|
| 2451 | |
|---|
| 2452 | |
|---|
| 2453 | def akurtosistest(a,dimension=None): |
|---|
| 2454 | """ |
|---|
| 2455 | Tests whether a dataset has normal kurtosis (i.e., |
|---|
| 2456 | kurtosis=3(n-1)/(n+1)).  Valid only for n>=20.  Dimension can equal None
|---|
| 2457 | (ravel array first), an integer (the dimension over which to operate), |
|---|
| 2458 | or a sequence (operate over multiple dimensions). |
|---|
| 2459 | |
|---|
| 2460 | Usage: akurtosistest(a,dimension=None) |
|---|
| 2461 | Returns: z-score and 2-tail z-probability, returns 0 for bad pixels |
|---|
| 2462 | """ |
|---|
| 2463 | if dimension == None: |
|---|
| 2464 | a = N.ravel(a) |
|---|
| 2465 | dimension = 0 |
|---|
| 2466 | n = float(a.shape[dimension]) |
|---|
| 2467 | if n<20: |
|---|
| 2468 | print "akurtosistest only valid for n>=20 ... continuing anyway, n=",n |
|---|
| 2469 | b2 = akurtosis(a,dimension) |
|---|
| 2470 | E = 3.0*(n-1) /(n+1) |
|---|
| 2471 | varb2 = 24.0*n*(n-2)*(n-3) / ((n+1)*(n+1)*(n+3)*(n+5)) |
|---|
| 2472 | x = (b2-E)/N.sqrt(varb2) |
|---|
| 2473 | sqrtbeta1 = 6.0*(n*n-5*n+2)/((n+7)*(n+9)) * N.sqrt((6.0*(n+3)*(n+5))/ |
|---|
| 2474 | (n*(n-2)*(n-3))) |
|---|
| 2475 | A = 6.0 + 8.0/sqrtbeta1 *(2.0/sqrtbeta1 + N.sqrt(1+4.0/(sqrtbeta1**2))) |
|---|
| 2476 | term1 = 1 -2/(9.0*A) |
|---|
| 2477 | denom = 1 +x*N.sqrt(2/(A-4.0)) |
|---|
| 2478 | denom = N.where(N.less(denom,0), 99, denom) |
|---|
| 2479 | term2 = N.where(N.equal(denom,0), term1, N.power((1-2.0/A)/denom,1/3.0)) |
|---|
| 2480 | Z = ( term1 - term2 ) / N.sqrt(2/(9.0*A)) |
|---|
| 2481 | Z = N.where(N.equal(denom,99), 0, Z) |
|---|
| 2482 | return Z, (1.0-zprob(Z))*2 |
|---|
| 2483 | |
|---|
| 2484 | |
|---|
| 2485 | def anormaltest(a,dimension=None): |
|---|
| 2486 | """ |
|---|
| 2487 | Tests whether skew and/OR kurtosis of dataset differs from normal |
|---|
| 2488 | curve. Can operate over multiple dimensions. Dimension can equal |
|---|
| 2489 | None (ravel array first), an integer (the dimension over which to |
|---|
| 2490 | operate), or a sequence (operate over multiple dimensions). |
|---|
| 2491 | |
|---|
| 2492 | Usage: anormaltest(a,dimension=None) |
|---|
| 2493 | Returns: z-score and 2-tail probability |
|---|
| 2494 | """ |
|---|
| 2495 | if dimension == None: |
|---|
| 2496 | a = N.ravel(a) |
|---|
| 2497 | dimension = 0 |
|---|
| 2498 | s,p = askewtest(a,dimension) |
|---|
| 2499 | k,p = akurtosistest(a,dimension) |
|---|
| 2500 | k2 = N.power(s,2) + N.power(k,2) |
|---|
| 2501 | return k2, achisqprob(k2,2) |
|---|
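|  | # Illustrative workflow (added; not in the original source).  askewtest and
|---|
|  | # akurtosistest each return (z, two-tailed p); anormaltest combines the two
|---|
|  | # z-scores as k2 = z_skew**2 + z_kurt**2, referred to chi-square with 2 df:
|---|
|  | #   >>> zs, ps = askewtest(data)        # data = 1D Numeric array
|---|
|  | #   >>> zk, pk = akurtosistest(data)    # n >= 20 avoids its warning
|---|
|  | #   >>> k2, p = anormaltest(data)       # small p ==> non-normal
|---|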
| 2502 | |
|---|
| 2503 | |
|---|
| 2504 | ##################################### |
|---|
| 2505 | ###### AFREQUENCY FUNCTIONS ####### |
|---|
| 2506 | ##################################### |
|---|
| 2507 | |
|---|
| 2508 | def aitemfreq(a): |
|---|
| 2509 | """ |
|---|
| 2510 | Returns a 2D array of item frequencies. Column 1 contains item values, |
|---|
| 2511 | column 2 contains their respective counts. Assumes a 1D array is passed. |
|---|
| 2512 | |
|---|
| 2513 | Usage: aitemfreq(a) |
|---|
| 2514 | Returns: a 2D frequency table (col 0 = sorted scores, col 1 = frequencies)
|---|
| 2515 | """ |
|---|
| 2516 | scores = pstat.aunique(a) |
|---|
| 2517 | scores = N.sort(scores) |
|---|
| 2518 | freq = N.zeros(len(scores)) |
|---|
| 2519 | for i in range(len(scores)): |
|---|
| 2520 | freq[i] = N.add.reduce(N.equal(a,scores[i])) |
|---|
| 2521 | return N.array(pstat.aabut(scores, freq)) |
|---|
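|  | # Illustrative usage (added; not in the original source):
|---|
|  | #   >>> aitemfreq(N.array([1,1,2,3,3,3]))
|---|
|  | #   yields rows (1,2), (2,1), (3,3): column 0 holds the sorted unique
|---|
|  | #   scores, column 1 their counts.
|---|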
| 2522 | |
|---|
| 2523 | |
|---|
| 2524 | def ascoreatpercentile (inarray, percent): |
|---|
| 2525 | """ |
|---|
| 2526 | Usage: ascoreatpercentile(inarray,percent) 0<percent<100 |
|---|
| 2527 | Returns: score at given percentile, relative to inarray distribution |
|---|
| 2528 | """ |
|---|
| 2529 | percent = percent / 100.0 |
|---|
| 2530 | targetcf = percent*len(inarray) |
|---|
| 2531 | h, lrl, binsize, extras = histogram(inarray) |
|---|
| 2532 | cumhist = cumsum(h*1) |
|---|
| 2533 | for i in range(len(cumhist)): |
|---|
| 2534 | if cumhist[i] >= targetcf: |
|---|
| 2535 | break |
|---|
| 2536 | score = binsize * ((targetcf - cumhist[i-1]) / float(h[i])) + (lrl+binsize*i) |
|---|
| 2537 | return score |
|---|
| 2538 | |
|---|
| 2539 | |
|---|
| 2540 | def apercentileofscore (inarray,score,histbins=10,defaultlimits=None): |
|---|
| 2541 | """ |
|---|
| 2542 | Note: result of this function depends on the values used to histogram |
|---|
| 2543 | the data(!). |
|---|
| 2544 | |
|---|
| 2545 | Usage: apercentileofscore(inarray,score,histbins=10,defaultlimits=None) |
|---|
| 2546 | Returns: percentile-position of score (0-100) relative to inarray |
|---|
| 2547 | """ |
|---|
| 2548 | h, lrl, binsize, extras = histogram(inarray,histbins,defaultlimits) |
|---|
| 2549 | cumhist = cumsum(h*1) |
|---|
| 2550 | i = int((score - lrl)/float(binsize)) |
|---|
| 2551 | pct = (cumhist[i-1]+((score-(lrl+binsize*i))/float(binsize))*h[i])/float(len(inarray)) * 100 |
|---|
| 2552 | return pct |
|---|
| 2553 | |
|---|
| 2554 | |
|---|
| 2555 | def ahistogram (inarray,numbins=10,defaultlimits=None,printextras=1): |
|---|
| 2556 | """ |
|---|
| 2557 | Returns (i) an array of histogram bin counts, (ii) the smallest value
|---|
| 2558 | (lower real limit) of the histogram binning, (iii) the bin width, and
|---|
| 2559 | (iv) the number of points outside the histogram range (the middle two
|---|
| 2560 | are not necessarily integers).  Default number of bins is 10.
|---|
| 2561 | Defaultlimits can be None (the routine picks bins spanning all the
|---|
| 2562 | numbers in the inarray) or a 2-sequence (lowerlimit, upperlimit).
|---|
| 2563 | |
|---|
| 2564 | Usage: ahistogram(inarray,numbins=10,defaultlimits=None,printextras=1) |
|---|
| 2565 | Returns: (array of bin counts, bin-minimum, bin-width, #-points-outside-range)
|---|
| 2566 | """ |
|---|
| 2567 | inarray = N.ravel(inarray) # flatten any >1D arrays |
|---|
| 2568 | if (defaultlimits <> None): |
|---|
| 2569 | lowerreallimit = defaultlimits[0] |
|---|
| 2570 | upperreallimit = defaultlimits[1] |
|---|
| 2571 | binsize = (upperreallimit-lowerreallimit) / float(numbins) |
|---|
| 2572 | else: |
|---|
| 2573 | Min = N.minimum.reduce(inarray) |
|---|
| 2574 | Max = N.maximum.reduce(inarray) |
|---|
| 2575 | estbinwidth = float(Max - Min)/float(numbins) + 1 |
|---|
| 2576 | binsize = (Max-Min+estbinwidth)/float(numbins) |
|---|
| 2577 | lowerreallimit = Min - binsize/2.0 #lower real limit,1st bin |
|---|
| 2578 | bins = N.zeros(numbins) |
|---|
| 2579 | extrapoints = 0 |
|---|
| 2580 | for num in inarray: |
|---|
| 2581 | try: |
|---|
| 2582 | if (num-lowerreallimit) < 0: |
|---|
| 2583 | extrapoints = extrapoints + 1 |
|---|
| 2584 | else: |
|---|
| 2585 | bintoincrement = int((num-lowerreallimit) / float(binsize)) |
|---|
| 2586 | bins[bintoincrement] = bins[bintoincrement] + 1 |
|---|
| 2587 | except: # point outside lower/upper limits |
|---|
| 2588 | extrapoints = extrapoints + 1 |
|---|
| 2589 | if (extrapoints > 0 and printextras == 1): |
|---|
| 2590 | print '\nPoints outside given histogram range =',extrapoints |
|---|
| 2591 | return (bins, lowerreallimit, binsize, extrapoints) |
|---|
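|  | # Illustrative usage (added; not in the original source).  With
|---|
|  | # defaultlimits=None the routine widens the range slightly so that every
|---|
|  | # point falls inside some bin:
|---|
|  | #   >>> counts, lowerreal, width, extra = ahistogram(data, numbins=10)
|---|
|  | #   bin i covers [lowerreal + i*width, lowerreal + (i+1)*width)
|---|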
| 2592 | |
|---|
| 2593 | |
|---|
| 2594 | def acumfreq(a,numbins=10,defaultreallimits=None): |
|---|
| 2595 | """ |
|---|
| 2596 | Returns a cumulative frequency histogram, using the histogram function. |
|---|
| 2597 | Defaultreallimits can be None (use all data), or a 2-sequence containing |
|---|
| 2598 | lower and upper limits on values to include. |
|---|
| 2599 | |
|---|
| 2600 | Usage: acumfreq(a,numbins=10,defaultreallimits=None) |
|---|
| 2601 | Returns: array of cumfreq bin values, lowerreallimit, binsize, extrapoints |
|---|
| 2602 | """ |
|---|
| 2603 | h,l,b,e = histogram(a,numbins,defaultreallimits) |
|---|
| 2604 | cumhist = cumsum(h*1) |
|---|
| 2605 | return cumhist,l,b,e |
|---|
| 2606 | |
|---|
| 2607 | |
|---|
| 2608 | def arelfreq(a,numbins=10,defaultreallimits=None): |
|---|
| 2609 | """ |
|---|
| 2610 | Returns a relative frequency histogram, using the histogram function. |
|---|
| 2611 | Defaultreallimits can be None (use all data), or a 2-sequence containing |
|---|
| 2612 | lower and upper limits on values to include. |
|---|
| 2613 | |
|---|
| 2614 | Usage: arelfreq(a,numbins=10,defaultreallimits=None) |
|---|
| 2615 | Returns: array of relative freq. bin values, lowerreallimit, binsize, extrapoints
|---|
| 2616 | """ |
|---|
| 2617 | h,l,b,e = histogram(a,numbins,defaultreallimits) |
|---|
| 2618 | h = N.array(h/float(a.shape[0])) |
|---|
| 2619 | return h,l,b,e |
|---|
| 2620 | |
|---|
| 2621 | |
|---|
| 2622 | ##################################### |
|---|
| 2623 | ###### AVARIABILITY FUNCTIONS ##### |
|---|
| 2624 | ##################################### |
|---|
| 2625 | |
|---|
| 2626 | def aobrientransform(*args): |
|---|
| 2627 | """ |
|---|
| 2628 | Computes a transform on input data (any number of columns). Used to |
|---|
| 2629 | test for homogeneity of variance prior to running one-way stats. Each |
|---|
| 2630 | array in *args is one level of a factor.  If F_oneway() run on the
|---|
| 2631 | transformed data is found significant, the variances are unequal.  From
|---|
| 2632 | Maxwell and Delaney, p.112. |
|---|
| 2633 | |
|---|
| 2634 | Usage: aobrientransform(*args) *args = 1D arrays, one per level of factor |
|---|
| 2635 | Returns: transformed data for use in an ANOVA |
|---|
| 2636 | """ |
|---|
| 2637 | TINY = 1e-10 |
|---|
| 2638 | k = len(args) |
|---|
| 2639 | n = N.zeros(k,N.Float) |
|---|
| 2640 | v = N.zeros(k,N.Float) |
|---|
| 2641 | m = N.zeros(k,N.Float) |
|---|
| 2642 | nargs = [] |
|---|
| 2643 | for i in range(k): |
|---|
| 2644 | nargs.append(args[i].astype(N.Float)) |
|---|
| 2645 | n[i] = float(len(nargs[i])) |
|---|
| 2646 | v[i] = var(nargs[i]) |
|---|
| 2647 | m[i] = mean(nargs[i]) |
|---|
| 2648 | for j in range(k): |
|---|
| 2649 | for i in range(n[j]): |
|---|
| 2650 | t1 = (n[j]-1.5)*n[j]*(nargs[j][i]-m[j])**2 |
|---|
| 2651 | t2 = 0.5*v[j]*(n[j]-1.0) |
|---|
| 2652 | t3 = (n[j]-1.0)*(n[j]-2.0) |
|---|
| 2653 | nargs[j][i] = (t1-t2) / float(t3) |
|---|
| 2654 | check = 1 |
|---|
| 2655 | for j in range(k): |
|---|
| 2656 | if v[j] - mean(nargs[j]) > TINY: |
|---|
| 2657 | check = 0 |
|---|
| 2658 | if check <> 1: |
|---|
| 2659 | raise ValueError, 'Lack of convergence in obrientransform.' |
|---|
| 2660 | else: |
|---|
| 2661 | return N.array(nargs) |
|---|
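|  | # Illustrative usage (added; not in the original source).  As in apaired()
|---|
|  | # further below, the transformed columns are passed to the one-way ANOVA
|---|
|  | # defined elsewhere in this module; a significant F on the transformed data
|---|
|  | # means the group variances are unequal:
|---|
|  | #   >>> r = obrientransform(x, y)                     # x,y = 1D arrays
|---|
|  | #   >>> f, p = F_oneway(pstat.colex(r,0), pstat.colex(r,1))
|---|
|  | #   >>> # p < 0.05  ==>  unequal variances
|---|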
| 2662 | |
|---|
| 2663 | |
|---|
| 2664 | def asamplevar (inarray,dimension=None,keepdims=0): |
|---|
| 2665 | """ |
|---|
| 2666 | Returns the sample variance of the values in the passed
|---|
| 2667 | array (i.e., using N). Dimension can equal None (ravel array first), |
|---|
| 2668 | an integer (the dimension over which to operate), or a sequence |
|---|
| 2669 | (operate over multiple dimensions). Set keepdims=1 to return an array |
|---|
| 2670 | with the same number of dimensions as inarray. |
|---|
| 2671 | |
|---|
| 2672 | Usage: asamplevar(inarray,dimension=None,keepdims=0) |
|---|
| 2673 | """ |
|---|
| 2674 | if dimension == None: |
|---|
| 2675 | inarray = N.ravel(inarray) |
|---|
| 2676 | dimension = 0 |
|---|
| 2677 | if dimension == 1: |
|---|
| 2678 | mn = amean(inarray,dimension)[:,N.NewAxis] |
|---|
| 2679 | else: |
|---|
| 2680 | mn = amean(inarray,dimension,keepdims=1) |
|---|
| 2681 | deviations = inarray - mn |
|---|
| 2682 | if type(dimension) == ListType: |
|---|
| 2683 | n = 1 |
|---|
| 2684 | for d in dimension: |
|---|
| 2685 | n = n*inarray.shape[d] |
|---|
| 2686 | else: |
|---|
| 2687 | n = inarray.shape[dimension] |
|---|
| 2688 | svar = ass(deviations,dimension,keepdims) / float(n) |
|---|
| 2689 | return svar |
|---|
| 2690 | |
|---|
| 2691 | |
|---|
| 2692 | def asamplestdev (inarray, dimension=None, keepdims=0): |
|---|
| 2693 | """ |
|---|
| 2694 | Returns the sample standard deviation of the values in the passed |
|---|
| 2695 | array (i.e., using N). Dimension can equal None (ravel array first), |
|---|
| 2696 | an integer (the dimension over which to operate), or a sequence |
|---|
| 2697 | (operate over multiple dimensions). Set keepdims=1 to return an array |
|---|
| 2698 | with the same number of dimensions as inarray. |
|---|
| 2699 | |
|---|
| 2700 | Usage: asamplestdev(inarray,dimension=None,keepdims=0) |
|---|
| 2701 | """ |
|---|
| 2702 | return N.sqrt(asamplevar(inarray,dimension,keepdims)) |
|---|
| 2703 | |
|---|
| 2704 | |
|---|
| 2705 | def asignaltonoise(instack,dimension=0): |
|---|
| 2706 | """ |
|---|
| 2707 | Calculates signal-to-noise. Dimension can equal None (ravel array |
|---|
| 2708 | first), an integer (the dimension over which to operate), or a |
|---|
| 2709 | sequence (operate over multiple dimensions). |
|---|
| 2710 | |
|---|
| 2711 | Usage: asignaltonoise(instack,dimension=0)
|---|
| 2712 | Returns: array containing the value of (mean/stdev) along dimension, |
|---|
| 2713 | or 0 when stdev=0 |
|---|
| 2714 | """ |
|---|
| 2715 | m = mean(instack,dimension) |
|---|
| 2716 | sd = stdev(instack,dimension) |
|---|
| 2717 | return N.where(N.equal(sd,0),0,m/sd) |
|---|
| 2718 | |
|---|
| 2719 | |
|---|
| 2720 | def avar (inarray, dimension=None,keepdims=0): |
|---|
| 2721 | """ |
|---|
| 2722 | Returns the estimated population variance of the values in the passed |
|---|
| 2723 | array (i.e., N-1). Dimension can equal None (ravel array first), an |
|---|
| 2724 | integer (the dimension over which to operate), or a sequence (operate |
|---|
| 2725 | over multiple dimensions). Set keepdims=1 to return an array with the |
|---|
| 2726 | same number of dimensions as inarray. |
|---|
| 2727 | |
|---|
| 2728 | Usage: avar(inarray,dimension=None,keepdims=0) |
|---|
| 2729 | """ |
|---|
| 2730 | if dimension == None: |
|---|
| 2731 | inarray = N.ravel(inarray) |
|---|
| 2732 | dimension = 0 |
|---|
| 2733 | mn = amean(inarray,dimension,1) |
|---|
| 2734 | deviations = inarray - mn |
|---|
| 2735 | if type(dimension) == ListType: |
|---|
| 2736 | n = 1 |
|---|
| 2737 | for d in dimension: |
|---|
| 2738 | n = n*inarray.shape[d] |
|---|
| 2739 | else: |
|---|
| 2740 | n = inarray.shape[dimension] |
|---|
| 2741 | var = ass(deviations,dimension,keepdims)/float(n-1) |
|---|
| 2742 | return var |
|---|
| 2743 | |
|---|
| 2744 | |
|---|
| 2745 | def astdev (inarray, dimension=None, keepdims=0): |
|---|
| 2746 | """ |
|---|
| 2747 | Returns the estimated population standard deviation of the values in |
|---|
| 2748 | the passed array (i.e., N-1). Dimension can equal None (ravel array |
|---|
| 2749 | first), an integer (the dimension over which to operate), or a |
|---|
| 2750 | sequence (operate over multiple dimensions). Set keepdims=1 to return |
|---|
| 2751 | an array with the same number of dimensions as inarray. |
|---|
| 2752 | |
|---|
| 2753 | Usage: astdev(inarray,dimension=None,keepdims=0) |
|---|
| 2754 | """ |
|---|
| 2755 | return N.sqrt(avar(inarray,dimension,keepdims)) |
|---|
| 2756 | |
|---|
| 2757 | |
|---|
| 2758 | def asterr (inarray, dimension=None, keepdims=0): |
|---|
| 2759 | """ |
|---|
| 2760 | Returns the estimated population standard error of the values in the |
|---|
| 2761 | passed array (i.e., N-1). Dimension can equal None (ravel array |
|---|
| 2762 | first), an integer (the dimension over which to operate), or a |
|---|
| 2763 | sequence (operate over multiple dimensions). Set keepdims=1 to return |
|---|
| 2764 | an array with the same number of dimensions as inarray. |
|---|
| 2765 | |
|---|
| 2766 | Usage: asterr(inarray,dimension=None,keepdims=0) |
|---|
| 2767 | """ |
|---|
| 2768 | if dimension == None: |
|---|
| 2769 | inarray = N.ravel(inarray) |
|---|
| 2770 | dimension = 0 |
|---|
| 2771 | return astdev(inarray,dimension,keepdims) / float(N.sqrt(inarray.shape[dimension])) |
|---|
| 2772 | |
|---|
| 2773 | |
|---|
| 2774 | def asem (inarray, dimension=None, keepdims=0): |
|---|
| 2775 | """ |
|---|
| 2776 | Returns the standard error of the mean (i.e., using N) of the values |
|---|
| 2777 | in the passed array. Dimension can equal None (ravel array first), an |
|---|
| 2778 | integer (the dimension over which to operate), or a sequence (operate |
|---|
| 2779 | over multiple dimensions). Set keepdims=1 to return an array with the |
|---|
| 2780 | same number of dimensions as inarray. |
|---|
| 2781 | |
|---|
| 2782 | Usage: asem(inarray,dimension=None, keepdims=0) |
|---|
| 2783 | """ |
|---|
| 2784 | if dimension == None: |
|---|
| 2785 | inarray = N.ravel(inarray) |
|---|
| 2786 | dimension = 0 |
|---|
| 2787 | if type(dimension) == ListType: |
|---|
| 2788 | n = 1 |
|---|
| 2789 | for d in dimension: |
|---|
| 2790 | n = n*inarray.shape[d] |
|---|
| 2791 | else: |
|---|
| 2792 | n = inarray.shape[dimension] |
|---|
| 2793 | s = asamplestdev(inarray,dimension,keepdims) / N.sqrt(n-1) |
|---|
| 2794 | return s |
|---|
| 2795 | |
|---|
| 2796 | |
|---|
| 2797 | def az (a, score): |
|---|
| 2798 | """ |
|---|
| 2799 | Returns the z-score of a given input score, given the array from which
|---|
| 2800 | that score came. Not appropriate for population calculations, nor for |
|---|
| 2801 | arrays > 1D. |
|---|
| 2802 | |
|---|
| 2803 | Usage: az(a, score) |
|---|
| 2804 | """ |
|---|
| 2805 | z = (score-amean(a)) / asamplestdev(a) |
|---|
| 2806 | return z |
|---|
| 2807 | |
|---|
| 2808 | |
|---|
| 2809 | def azs (a): |
|---|
| 2810 | """ |
|---|
| 2811 | Returns a 1D array of z-scores, one for each score in the passed array, |
|---|
| 2812 | computed relative to the passed array. |
|---|
| 2813 | |
|---|
| 2814 | Usage: azs(a) |
|---|
| 2815 | """ |
|---|
| 2816 | zscores = [] |
|---|
| 2817 | for item in a: |
|---|
| 2818 | zscores.append(z(a,item)) |
|---|
| 2819 | return N.array(zscores) |
|---|
| 2820 | |
|---|
| 2821 | |
|---|
| 2822 | def azmap (scores, compare, dimension=0): |
|---|
| 2823 | """ |
|---|
| 2824 | Returns an array of z-scores the shape of scores (e.g., [x,y]), compared to |
|---|
| 2825 | array passed to compare (e.g., [time,x,y]). Assumes collapsing over dim 0 |
|---|
| 2826 | of the compare array. |
|---|
| 2827 | |
|---|
| 2828 | Usage: azmap(scores, compare, dimension=0)
|---|
| 2829 | """ |
|---|
| 2830 | mns = amean(compare,dimension) |
|---|
| 2831 | sstd = asamplestdev(compare,dimension)
|---|
| 2832 | return (scores - mns) / sstd |
|---|
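|  | # Illustrative usage (added; not in the original source).  Typical use is
|---|
|  | # z-scoring an [x,y] map against a [time,x,y] baseline stack, collapsing
|---|
|  | # over dimension 0 of the comparison array:
|---|
|  | #   >>> zmap = azmap(single_map, baseline_stack)   # shapes [x,y], [t,x,y]
|---|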
| 2833 | |
|---|
| 2834 | |
|---|
| 2835 | ##################################### |
|---|
| 2836 | ####### ATRIMMING FUNCTIONS ####### |
|---|
| 2837 | ##################################### |
|---|
| 2838 | |
|---|
| 2839 | def around(a,digits=1): |
|---|
| 2840 | """ |
|---|
| 2841 | Rounds all values in array a to 'digits' decimal places. |
|---|
| 2842 | |
|---|
| 2843 | Usage: around(a,digits) |
|---|
| 2844 | Returns: a, where each value is rounded to 'digits' decimals |
|---|
| 2845 | """ |
|---|
| 2846 | def ar(x,d=digits): |
|---|
| 2847 | return round(x,d) |
|---|
| 2848 | |
|---|
| 2849 | if type(a) <> N.ArrayType: |
|---|
| 2850 | try: |
|---|
| 2851 | a = N.array(a) |
|---|
| 2852 | except: |
|---|
| 2853 | a = N.array(a,'O') |
|---|
| 2854 | shp = a.shape |
|---|
| 2855 | if a.typecode() in ['f','F','d','D']: |
|---|
| 2856 | b = N.ravel(a) |
|---|
| 2857 | b = N.array(map(ar,b)) |
|---|
| 2858 | b.shape = shp |
|---|
| 2859 | elif a.typecode() in ['o','O']: |
|---|
| 2860 | b = N.ravel(a)*1 |
|---|
| 2861 | for i in range(len(b)): |
|---|
| 2862 | if type(b[i]) == FloatType: |
|---|
| 2863 | b[i] = round(b[i],digits) |
|---|
| 2864 | b.shape = shp |
|---|
| 2865 | else: # not a float, double or Object array |
|---|
| 2866 | b = a*1 |
|---|
| 2867 | return b |
|---|
| 2868 | |
|---|
| 2869 | |
|---|
| 2870 | def athreshold(a,threshmin=None,threshmax=None,newval=0): |
|---|
| 2871 | """ |
|---|
| 2872 | Like Numeric.clip() except that values <threshmin or >threshmax are replaced
|---|
| 2873 | by newval instead of by threshmin/threshmax (respectively). |
|---|
| 2874 | |
|---|
| 2875 | Usage: athreshold(a,threshmin=None,threshmax=None,newval=0) |
|---|
| 2876 | Returns: a, with values <threshmin or >threshmax replaced with newval |
|---|
| 2877 | """ |
|---|
| 2878 | mask = N.zeros(a.shape) |
|---|
| 2879 | if threshmin <> None: |
|---|
| 2880 | mask = mask + N.where(N.less(a,threshmin),1,0) |
|---|
| 2881 | if threshmax <> None: |
|---|
| 2882 | mask = mask + N.where(N.greater(a,threshmax),1,0) |
|---|
| 2883 | mask = N.clip(mask,0,1) |
|---|
| 2884 | return N.where(mask,newval,a) |
|---|
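|  | # Illustrative usage (added; not in the original source):
|---|
|  | #   >>> athreshold(N.array([1,5,12]), threshmin=2, threshmax=10, newval=0)
|---|
|  | #   keeps the 5 but replaces 1 and 12 with 0, whereas N.clip() would have
|---|
|  | #   substituted the limits 2 and 10 instead.
|---|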
| 2885 | |
|---|
| 2886 | |
|---|
| 2887 | def atrimboth (a,proportiontocut): |
|---|
| 2888 | """ |
|---|
| 2889 | Slices off the passed proportion of items from BOTH ends of the passed |
|---|
| 2890 | array (i.e., with proportiontocut=0.1, slices 'leftmost' 10% AND |
|---|
| 2891 | 'rightmost' 10% of scores).  You must pre-sort the array if you want
|---|
| 2892 | "proper" trimming. Slices off LESS if proportion results in a |
|---|
| 2893 | non-integer slice index (i.e., conservatively slices off |
|---|
| 2894 | proportiontocut). |
|---|
| 2895 | |
|---|
| 2896 | Usage: atrimboth (a,proportiontocut) |
|---|
| 2897 | Returns: trimmed version of array a |
|---|
| 2898 | """ |
|---|
| 2899 | lowercut = int(proportiontocut*len(a)) |
|---|
| 2900 | uppercut = len(a) - lowercut |
|---|
| 2901 | return a[lowercut:uppercut] |
|---|
| 2902 | |
|---|
| 2903 | |
|---|
| 2904 | def atrim1 (a,proportiontocut,tail='right'): |
|---|
| 2905 | """ |
|---|
| 2906 | Slices off the passed proportion of items from ONE end of the passed |
|---|
| 2907 | array (i.e., if proportiontocut=0.1, slices off 'leftmost' or 'rightmost' |
|---|
| 2908 | 10% of scores). Slices off LESS if proportion results in a non-integer |
|---|
| 2909 | slice index (i.e., conservatively slices off proportiontocut). |
|---|
| 2910 | |
|---|
| 2911 | Usage: atrim1(a,proportiontocut,tail='right') or set tail='left' |
|---|
| 2912 | Returns: trimmed version of array a |
|---|
| 2913 | """ |
|---|
| 2914 | if string.lower(tail) == 'right': |
|---|
| 2915 | lowercut = 0 |
|---|
| 2916 | uppercut = len(a) - int(proportiontocut*len(a)) |
|---|
| 2917 | elif string.lower(tail) == 'left': |
|---|
| 2918 | lowercut = int(proportiontocut*len(a)) |
|---|
| 2919 | uppercut = len(a) |
|---|
| 2920 | return a[lowercut:uppercut] |
|---|
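|  | # Illustrative usage (added; not in the original source).  Sort first if
|---|
|  | # "proper" (order-based) trimming is wanted:
|---|
|  | #   >>> a = N.sort(raw_scores)
|---|
|  | #   >>> middle80 = atrimboth(a, 0.1)         # drops lowest and highest 10%
|---|
|  | #   >>> no_top10 = atrim1(a, 0.1, 'right')   # drops only the highest 10%
|---|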
| 2921 | |
|---|
| 2922 | |
|---|
| 2923 | ##################################### |
|---|
| 2924 | ##### ACORRELATION FUNCTIONS ###### |
|---|
| 2925 | ##################################### |
|---|
| 2926 | |
|---|
| 2927 | def acovariance(X): |
|---|
| 2928 | """ |
|---|
| 2929 | Computes the covariance matrix of a matrix X. Requires a 2D matrix input. |
|---|
| 2930 | |
|---|
| 2931 | Usage: acovariance(X) |
|---|
| 2932 | Returns: covariance matrix of X |
|---|
| 2933 | """ |
|---|
| 2934 | if len(X.shape) <> 2: |
|---|
| 2935 | raise TypeError, "acovariance requires 2D matrices" |
|---|
| 2936 | n = X.shape[0] |
|---|
| 2937 | mX = amean(X,0) |
|---|
| 2938 | return N.dot(N.transpose(X),X) / float(n) - N.multiply.outer(mX,mX) |
|---|
| 2939 | |
|---|
| 2940 | |
|---|
| 2941 | def acorrelation(X): |
|---|
| 2942 | """ |
|---|
| 2943 | Computes the correlation matrix of a matrix X. Requires a 2D matrix input. |
|---|
| 2944 | |
|---|
| 2945 | Usage: acorrelation(X) |
|---|
| 2946 | Returns: correlation matrix of X |
|---|
| 2947 | """ |
|---|
| 2948 | C = acovariance(X) |
|---|
| 2949 | V = N.diagonal(C) |
|---|
| 2950 | return C / N.sqrt(N.multiply.outer(V,V)) |
|---|
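|  | # Illustrative usage (added; not in the original source).  X is observations
|---|
|  | # by variables (rows x columns); both functions return a variables-by-
|---|
|  | # variables matrix, acovariance() divides by n (not n-1), and the diagonal
|---|
|  | # of acorrelation(X) is all 1s:
|---|
|  | #   >>> C = acovariance(X)       # X.shape = (n_observations, n_variables)
|---|
|  | #   >>> R = acorrelation(X)
|---|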
| 2951 | |
|---|
| 2952 | |
|---|
| 2953 | def apaired(x,y): |
|---|
| 2954 | """ |
|---|
| 2955 | Interactively determines the type of data in x and y, and then runs the |
|---|
| 2956 | appropriate statistic for paired group data.
|---|
| 2957 | |
|---|
| 2958 | Usage: apaired(x,y) x,y = the two arrays of values to be compared |
|---|
| 2959 | Returns: appropriate statistic name, value, and probability |
|---|
| 2960 | """ |
|---|
| 2961 | samples = '' |
|---|
| 2962 | while samples not in ['i','r','I','R','c','C']: |
|---|
| 2963 | print '\nIndependent or related samples, or correlation (i,r,c): ', |
|---|
| 2964 | samples = raw_input() |
|---|
| 2965 | |
|---|
| 2966 | if samples in ['i','I','r','R']: |
|---|
| 2967 | print '\nComparing variances ...', |
|---|
| 2968 | # USE O'BRIEN'S TEST FOR HOMOGENEITY OF VARIANCE, Maxwell & delaney, p.112 |
|---|
| 2969 | r = obrientransform(x,y) |
|---|
| 2970 | f,p = F_oneway(pstat.colex(r,0),pstat.colex(r,1)) |
|---|
| 2971 | if p<0.05: |
|---|
| 2972 | vartype='unequal, p='+str(round(p,4)) |
|---|
| 2973 | else: |
|---|
| 2974 | vartype='equal' |
|---|
| 2975 | print vartype |
|---|
| 2976 | if samples in ['i','I']: |
|---|
| 2977 | if vartype[0]=='e': |
|---|
| 2978 | t,p = ttest_ind(x,y,None,0) |
|---|
| 2979 | print '\nIndependent samples t-test: ', round(t,4),round(p,4) |
|---|
| 2980 | else: |
|---|
| 2981 | if len(x)>20 or len(y)>20: |
|---|
| 2982 | z,p = ranksums(x,y) |
|---|
| 2983 | print '\nRank Sums test (NONparametric, n>20): ', round(z,4),round(p,4) |
|---|
| 2984 | else: |
|---|
| 2985 | u,p = mannwhitneyu(x,y) |
|---|
| 2986 | print '\nMann-Whitney U-test (NONparametric, ns<20): ', round(u,4),round(p,4) |
|---|
| 2987 | |
|---|
| 2988 | else: # RELATED SAMPLES |
|---|
| 2989 | if vartype[0]=='e': |
|---|
| 2990 | t,p = ttest_rel(x,y,0) |
|---|
| 2991 | print '\nRelated samples t-test: ', round(t,4),round(p,4) |
|---|
| 2992 | else: |
|---|
| 2993 | t,p = wilcoxont(x,y)
|---|
| 2994 | print '\nWilcoxon T-test (NONparametric): ', round(t,4),round(p,4) |
|---|
| 2995 | else: # CORRELATION ANALYSIS |
|---|
| 2996 | corrtype = '' |
|---|
| 2997 | while corrtype not in ['c','C','r','R','d','D']: |
|---|
| 2998 | print '\nIs the data Continuous, Ranked, or Dichotomous (c,r,d): ', |
|---|
| 2999 | corrtype = raw_input() |
|---|
| 3000 | if corrtype in ['c','C']: |
|---|
| 3001 | m,b,r,p,see = linregress(x,y) |
|---|
| 3002 | print '\nLinear regression for continuous variables ...' |
|---|
| 3003 | lol = [['Slope','Intercept','r','Prob','SEestimate'],[round(m,4),round(b,4),round(r,4),round(p,4),round(see,4)]] |
|---|
| 3004 | pstat.printcc(lol) |
|---|
| 3005 | elif corrtype in ['r','R']: |
|---|
| 3006 | r,p = spearmanr(x,y) |
|---|
| 3007 | print '\nCorrelation for ranked variables ...' |
|---|
| 3008 | print "Spearman's r: ",round(r,4),round(p,4) |
|---|
| 3009 | else: # DICHOTOMOUS |
|---|
| 3010 | r,p = pointbiserialr(x,y) |
|---|
| 3011 | print '\nAssuming x contains a dichotomous variable ...' |
|---|
| 3012 | print 'Point Biserial r: ',round(r,4),round(p,4) |
|---|
| 3013 | print '\n\n' |
|---|
| 3014 | return None |
|---|
| 3015 | |
|---|
| 3016 | |
|---|
| 3017 | def apearsonr(x,y,verbose=1): |
|---|
| 3018 | """ |
|---|
| 3019 | Calculates a Pearson correlation coefficient and returns p. Taken |
|---|
| 3020 | from Heiman's Basic Statistics for the Behav. Sci (2nd), p.195. |
|---|
| 3021 | |
|---|
| 3022 | Usage: apearsonr(x,y,verbose=1) where x,y are equal length arrays |
|---|
| 3023 | Returns: Pearson's r, two-tailed p-value |
|---|
| 3024 | """ |
|---|
| 3025 | TINY = 1.0e-20 |
|---|
| 3026 | n = len(x) |
|---|
| 3027 | xmean = amean(x) |
|---|
| 3028 | ymean = amean(y) |
|---|
| 3029 | r_num = n*(N.add.reduce(x*y)) - N.add.reduce(x)*N.add.reduce(y) |
|---|
| 3030 | r_den = math.sqrt((n*ass(x) - asquare_of_sums(x))*(n*ass(y)-asquare_of_sums(y))) |
|---|
| 3031 | r = (r_num / r_den) |
|---|
| 3032 | df = n-2 |
|---|
| 3033 | t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY))) |
|---|
| 3034 | prob = abetai(0.5*df,0.5,df/(df+t*t),verbose) |
|---|
| 3035 | return r,prob |
|---|
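|  | # Illustrative usage (added; not in the original source):
|---|
|  | #   >>> r, p = apearsonr(x, y)    # x,y = equal-length 1D Numeric arrays
|---|
|  | #   r is the correlation; p is the two-tailed probability from a t with
|---|
|  | #   n-2 degrees of freedom, computed via abetai().
|---|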
| 3036 | |
|---|
| 3037 | |
|---|
| 3038 | def aspearmanr(x,y): |
|---|
| 3039 | """ |
|---|
| 3040 | Calculates a Spearman rank-order correlation coefficient. Taken |
|---|
| 3041 | from Heiman's Basic Statistics for the Behav. Sci (1st), p.192. |
|---|
| 3042 | |
|---|
| 3043 | Usage: aspearmanr(x,y) where x,y are equal-length arrays |
|---|
| 3044 | Returns: Spearman's r, two-tailed p-value |
|---|
| 3045 | """ |
|---|
| 3046 | TINY = 1e-30 |
|---|
| 3047 | n = len(x) |
|---|
| 3048 | rankx = rankdata(x) |
|---|
| 3049 | ranky = rankdata(y) |
|---|
| 3050 | dsq = N.add.reduce((rankx-ranky)**2) |
|---|
| 3051 | rs = 1 - 6*dsq / float(n*(n**2-1)) |
|---|
| 3052 | t = rs * math.sqrt((n-2) / ((rs+1.0)*(1.0-rs))) |
|---|
| 3053 | df = n-2 |
|---|
| 3054 | probrs = abetai(0.5*df,0.5,df/(df+t*t)) |
|---|
| 3055 | # probability values for rs are from part 2 of the spearman function in |
|---|
| 3056 | # Numerical Recipes, p.510.  They are close to tables, but not exact. (?)
|---|
| 3057 | return rs, probrs |
|---|
| 3058 | |
|---|
| 3059 | |
|---|
| 3060 | def apointbiserialr(x,y): |
|---|
| 3061 | """ |
|---|
| 3062 | Calculates a point-biserial correlation coefficient and the associated |
|---|
| 3063 | probability value. Taken from Heiman's Basic Statistics for the Behav. |
|---|
| 3064 | Sci (1st), p.194. |
|---|
| 3065 | |
|---|
| 3066 | Usage: apointbiserialr(x,y) where x,y are equal length arrays |
|---|
| 3067 | Returns: Point-biserial r, two-tailed p-value |
|---|
| 3068 | """ |
|---|
| 3069 | TINY = 1e-30 |
|---|
| 3070 | categories = pstat.aunique(x) |
|---|
| 3071 | data = pstat.aabut(x,y) |
|---|
| 3072 | if len(categories) <> 2: |
|---|
| 3073 | raise ValueError, "Exactly 2 categories required (in x) for pointbiserialr()." |
|---|
| 3074 | else: # there are 2 categories, continue |
|---|
| 3075 | codemap = pstat.aabut(categories,N.arange(2)) |
|---|
| 3076 | recoded = pstat.arecode(data,codemap,0) |
|---|
| 3077 | x = pstat.alinexand(data,0,categories[0]) |
|---|
| 3078 | y = pstat.alinexand(data,0,categories[1]) |
|---|
| 3079 | xmean = amean(pstat.acolex(x,1)) |
|---|
| 3080 | ymean = amean(pstat.acolex(y,1)) |
|---|
| 3081 | n = len(data) |
|---|
| 3082 | adjust = math.sqrt((len(x)/float(n))*(len(y)/float(n))) |
|---|
| 3083 | rpb = (ymean - xmean)/asamplestdev(pstat.acolex(data,1))*adjust |
|---|
| 3084 | df = n-2 |
|---|
| 3085 | t = rpb*math.sqrt(df/((1.0-rpb+TINY)*(1.0+rpb+TINY))) |
|---|
| 3086 | prob = abetai(0.5*df,0.5,df/(df+t*t)) |
|---|
| 3087 | return rpb, prob |
|---|
| 3088 | |
|---|
| 3089 | |
|---|
| 3090 | def akendalltau(x,y): |
|---|
| 3091 | """ |
|---|
| 3092 | Calculates Kendall's tau ... correlation of ordinal data. Adapted |
|---|
| 3093 | from function kendl1 in Numerical Recipes.  Needs good test-cases.@@@
|---|
| 3094 | |
|---|
| 3095 | Usage: akendalltau(x,y) |
|---|
| 3096 | Returns: Kendall's tau, two-tailed p-value |
|---|
| 3097 | """ |
|---|
| 3098 | n1 = 0 |
|---|
| 3099 | n2 = 0 |
|---|
| 3100 | iss = 0 |
|---|
| 3101 | for j in range(len(x)-1): |
|---|
| 3102 | for k in range(j+1,len(y)):
|---|
| 3103 | a1 = x[j] - x[k] |
|---|
| 3104 | a2 = y[j] - y[k] |
|---|
| 3105 | aa = a1 * a2 |
|---|
| 3106 | if (aa): # neither array has a tie |
|---|
| 3107 | n1 = n1 + 1 |
|---|
| 3108 | n2 = n2 + 1 |
|---|
| 3109 | if aa > 0: |
|---|
| 3110 | iss = iss + 1 |
|---|
| 3111 | else: |
|---|
| 3112 | iss = iss -1 |
|---|
| 3113 | else: |
|---|
| 3114 | if (a1): |
|---|
| 3115 | n1 = n1 + 1 |
|---|
| 3116 | else: |
|---|
| 3117 | n2 = n2 + 1 |
|---|
| 3118 | tau = iss / math.sqrt(n1*n2) |
|---|
| 3119 | svar = (4.0*len(x)+10.0) / (9.0*len(x)*(len(x)-1)) |
|---|
| 3120 | z = tau / math.sqrt(svar) |
|---|
| 3121 | prob = erfcc(abs(z)/1.4142136) |
|---|
| 3122 | return tau, prob |
|---|
| 3123 | |
|---|
| 3124 | |
|---|
| 3125 | def alinregress(*args): |
|---|
| 3126 | """ |
|---|
| 3127 | Calculates a regression line on two arrays, x and y, corresponding to x,y |
|---|
| 3128 | pairs. If a single 2D array is passed, alinregress finds dim with 2 levels |
|---|
| 3129 | and splits data into x,y pairs along that dim. |
|---|
| 3130 | |
|---|
| 3131 | Usage: alinregress(*args) args=2 equal-length arrays, or one 2D array |
|---|
| 3132 | Returns: slope, intercept, r, two-tailed prob, sterr-of-the-estimate |
|---|
| 3133 | """ |
|---|
| 3134 | TINY = 1.0e-20 |
|---|
| 3135 | if len(args) == 1: # more than 1D array? |
|---|
| 3136 | args = args[0] |
|---|
| 3137 | if len(args) == 2: |
|---|
| 3138 | x = args[0] |
|---|
| 3139 | y = args[1] |
|---|
| 3140 | else: |
|---|
| 3141 | x = args[:,0] |
|---|
| 3142 | y = args[:,1] |
|---|
| 3143 | else: |
|---|
| 3144 | x = args[0] |
|---|
| 3145 | y = args[1] |
|---|
| 3146 | n = len(x) |
|---|
| 3147 | xmean = amean(x) |
|---|
| 3148 | ymean = amean(y) |
|---|
| 3149 | r_num = n*(N.add.reduce(x*y)) - N.add.reduce(x)*N.add.reduce(y) |
|---|
| 3150 | r_den = math.sqrt((n*ass(x) - asquare_of_sums(x))*(n*ass(y)-asquare_of_sums(y))) |
|---|
| 3151 | r = r_num / r_den |
|---|
| 3152 | z = 0.5*math.log((1.0+r+TINY)/(1.0-r+TINY)) |
|---|
| 3153 | df = n-2 |
|---|
| 3154 | t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY))) |
|---|
| 3155 | prob = abetai(0.5*df,0.5,df/(df+t*t)) |
|---|
| 3156 | slope = r_num / (float(n)*ass(x) - asquare_of_sums(x)) |
|---|
| 3157 | intercept = ymean - slope*xmean |
|---|
| 3158 | sterrest = math.sqrt(1-r*r)*asamplestdev(y) |
|---|
| 3159 | return slope, intercept, r, prob, sterrest |
|---|
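|  | # Illustrative usage (added; not in the original source).  Either call form
|---|
|  | # works -- two 1D arrays, or a single 2D array of x,y pairs ((2,n) or (n,2)):
|---|
|  | #   >>> slope, intercept, r, p, se_est = alinregress(x, y)
|---|
|  | #   >>> predicted = slope*x + intercept
|---|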
| 3160 | |
|---|
| 3161 | |
|---|
| 3162 | ##################################### |
|---|
| 3163 | ##### AINFERENTIAL STATISTICS ##### |
|---|
| 3164 | ##################################### |
|---|
| 3165 | |
|---|
| 3166 | def attest_1samp(a,popmean,printit=0,name='Sample',writemode='a'): |
|---|
| 3167 | """ |
|---|
| 3168 | Calculates the t-obtained for the single-sample T-test on ONE group
|---|
| 3169 | of scores a, given a population mean. If printit=1, results are printed |
|---|
| 3170 | to the screen. If printit='filename', the results are output to 'filename' |
|---|
| 3171 | using the given writemode (default=append). Returns t-value, and prob. |
|---|
| 3172 | |
|---|
| 3173 | Usage: attest_1samp(a,popmean,printit=0,name='Sample',writemode='a')
|---|
| 3174 | Returns: t-value, two-tailed prob |
|---|
| 3175 | """ |
|---|
| 3176 | if type(a) != N.ArrayType: |
|---|
| 3177 | a = N.array(a) |
|---|
| 3178 | x = amean(a) |
|---|
| 3179 | v = avar(a) |
|---|
| 3180 | n = len(a) |
|---|
| 3181 | df = n-1 |
|---|
| 3182 | svar = ((n-1)*v) / float(df) |
|---|
| 3183 | t = (x-popmean)/math.sqrt(svar*(1.0/n)) |
|---|
| 3184 | prob = abetai(0.5*df,0.5,df/(df+t*t)) |
|---|
| 3185 | |
|---|
| 3186 | if printit <> 0: |
|---|
| 3187 | statname = 'Single-sample T-test.' |
|---|
| 3188 | outputpairedstats(printit,writemode, |
|---|
| 3189 | 'Population','--',popmean,0,0,0, |
|---|
| 3190 | name,n,x,v,N.minimum.reduce(N.ravel(a)), |
|---|
| 3191 | N.maximum.reduce(N.ravel(a)), |
|---|
| 3192 | statname,t,prob) |
|---|
| 3193 | return t,prob |
|---|
| 3194 | |
|---|
| 3195 | |
|---|
| 3196 | def attest_ind (a, b, dimension=None, printit=0, name1='Samp1', name2='Samp2',writemode='a'): |
|---|
| 3197 | """ |
|---|
| 3198 | Calculates the t-obtained T-test on TWO INDEPENDENT samples of scores |
|---|
| 3199 | a, and b.  From Numerical Recipes, p.483.  If printit=1, results are
|---|
| 3200 | printed to the screen. If printit='filename', the results are output |
|---|
| 3201 | to 'filename' using the given writemode (default=append). Dimension |
|---|
| 3202 | can equal None (ravel array first), or an integer (the dimension over |
|---|
| 3203 | which to operate on a and b). |
|---|
| 3204 | |
|---|
| 3205 | Usage: attest_ind (a,b,dimension=None,printit=0, |
|---|
| 3206 | name1='Samp1',name2='Samp2',writemode='a')
|---|
| 3207 | Returns: t-value, two-tailed p-value |
|---|
| 3208 | """ |
|---|
| 3209 | if dimension == None: |
|---|
| 3210 | a = N.ravel(a) |
|---|
| 3211 | b = N.ravel(b) |
|---|
| 3212 | dimension = 0 |
|---|
| 3213 | x1 = amean(a,dimension) |
|---|
| 3214 | x2 = amean(b,dimension) |
|---|
| 3215 | v1 = avar(a,dimension) |
|---|
| 3216 | v2 = avar(b,dimension) |
|---|
| 3217 | n1 = a.shape[dimension] |
|---|
| 3218 | n2 = b.shape[dimension] |
|---|
| 3219 | df = n1+n2-2 |
|---|
| 3220 | svar = ((n1-1)*v1+(n2-1)*v2) / float(df) |
|---|
| 3221 | zerodivproblem = N.equal(svar,0) |
|---|
| 3222 | svar = N.where(zerodivproblem,1,svar) # avoid zero-division in 1st place |
|---|
| 3223 | t = (x1-x2)/N.sqrt(svar*(1.0/n1 + 1.0/n2)) # N-D COMPUTATION HERE!!!!!! |
|---|
| 3224 | t = N.where(zerodivproblem,1.0,t) # replace NaN/wrong t-values with 1.0 |
|---|
| 3225 | probs = abetai(0.5*df,0.5,float(df)/(df+t*t)) |
|---|
| 3226 | |
|---|
| 3227 | if type(t) == N.ArrayType: |
|---|
| 3228 | probs = N.reshape(probs,t.shape) |
|---|
| 3229 | if len(probs) == 1: |
|---|
| 3230 | probs = probs[0] |
|---|
| 3231 | |
|---|
| 3232 | if printit <> 0: |
|---|
| 3233 | if type(t) == N.ArrayType: |
|---|
| 3234 | t = t[0] |
|---|
| 3235 | if type(probs) == N.ArrayType: |
|---|
| 3236 | probs = probs[0] |
|---|
| 3237 | statname = 'Independent samples T-test.' |
|---|
| 3238 | outputpairedstats(printit,writemode, |
|---|
| 3239 | name1,n1,x1,v1,N.minimum.reduce(N.ravel(a)), |
|---|
| 3240 | N.maximum.reduce(N.ravel(a)), |
|---|
| 3241 | name2,n2,x2,v2,N.minimum.reduce(N.ravel(b)), |
|---|
| 3242 | N.maximum.reduce(N.ravel(b)), |
|---|
| 3243 | statname,t,probs) |
|---|
| 3244 | return |
|---|
| 3245 | return t, probs |
|---|
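|  | # Illustrative usage (added; not in the original source).  With printit=0 the
|---|
|  | # function just returns the statistic and two-tailed p; dimension=None ravels
|---|
|  | # both inputs first, while an integer dimension gives one t per remaining
|---|
|  | # cell (the N-D computation noted above):
|---|
|  | #   >>> t, p = attest_ind(group1, group2)
|---|
|  | #   >>> tmap, pmap = attest_ind(stack1, stack2, dimension=0)
|---|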
| 3246 | |
|---|
| 3247 | |
|---|
| 3248 | def attest_rel (a,b,dimension=None,printit=0,name1='Samp1',name2='Samp2',writemode='a'): |
|---|
| 3249 | """ |
|---|
| 3250 | Calculates the t-obtained T-test on TWO RELATED samples of scores, a |
|---|
| 3251 | and b.  From Numerical Recipes, p.483.  If printit=1, results are
|---|
| 3252 | printed to the screen. If printit='filename', the results are output |
|---|
| 3253 | to 'filename' using the given writemode (default=append). Dimension |
|---|
| 3254 | can equal None (ravel array first), or an integer (the dimension over |
|---|
| 3255 | which to operate on a and b). |
|---|
| 3256 | |
|---|
| 3257 | Usage: attest_rel(a,b,dimension=None,printit=0, |
|---|
| 3258 | name1='Samp1',name2='Samp2',writemode='a') |
|---|
| 3259 | Returns: t-value, two-tailed p-value |
|---|
| 3260 | """ |
|---|
| 3261 | if dimension == None: |
|---|
| 3262 | a = N.ravel(a) |
|---|
| 3263 | b = N.ravel(b) |
|---|
| 3264 | dimension = 0 |
|---|
| 3265 | if len(a)<>len(b): |
|---|
| 3266 | raise ValueError, 'Unequal length arrays.' |
|---|
| 3267 | x1 = amean(a,dimension) |
|---|
| 3268 | x2 = amean(b,dimension) |
|---|
| 3269 | v1 = avar(a,dimension) |
|---|
| 3270 | v2 = avar(b,dimension) |
|---|
| 3271 | n = a.shape[dimension] |
|---|
| 3272 | df = float(n-1) |
|---|
| 3273 | d = (a-b).astype('d') |
|---|
| 3274 | |
|---|
| 3275 | denom = N.sqrt((n*N.add.reduce(d*d,dimension) - N.add.reduce(d,dimension)**2) /df) |
|---|
| 3276 | zerodivproblem = N.equal(denom,0) |
|---|
| 3277 | denom = N.where(zerodivproblem,1,denom) # avoid zero-division in 1st place |
|---|
| 3278 | t = N.add.reduce(d,dimension) / denom # N-D COMPUTATION HERE!!!!!! |
|---|
| 3279 | t = N.where(zerodivproblem,1.0,t) # replace NaN/wrong t-values with 1.0 |
|---|
| 3280 | probs = abetai(0.5*df,0.5,float(df)/(df+t*t)) |
|---|
| 3281 | if type(t) == N.ArrayType: |
|---|
| 3282 | probs = N.reshape(probs,t.shape) |
|---|
| 3283 | if len(probs) == 1: |
|---|
| 3284 | probs = probs[0] |
|---|
| 3285 | |
|---|
| 3286 | if printit <> 0: |
|---|
| 3287 | statname = 'Related samples T-test.' |
|---|
| 3288 | outputpairedstats(printit,writemode, |
|---|
| 3289 | name1,n,x1,v1,N.minimum.reduce(N.ravel(a)), |
|---|
| 3290 | N.maximum.reduce(N.ravel(a)), |
|---|
| 3291 | name2,n,x2,v2,N.minimum.reduce(N.ravel(b)), |
|---|
| 3292 | N.maximum.reduce(N.ravel(b)), |
|---|
| 3293 | statname,t,probs) |
|---|
| 3294 | return |
|---|
| 3295 | return t, probs |
|---|
| 3296 | |
|---|
| 3297 | |
|---|
| 3298 | def achisquare(f_obs,f_exp=None): |
|---|
| 3299 | """ |
|---|
| 3300 | Calculates a one-way chi square for array of observed frequencies and returns |
|---|
| 3301 | the result. If no expected frequencies are given, the total N is assumed to |
|---|
| 3302 | be equally distributed across all groups. |
|---|
| 3303 | |
|---|
| 3304 | Usage: achisquare(f_obs, f_exp=None) f_obs = array of observed cell freq. |
|---|
| 3305 | Returns: chisquare-statistic, associated p-value |
|---|
| 3306 | """ |
|---|
| 3307 | |
|---|
| 3308 | k = len(f_obs) |
|---|
| 3309 | if f_exp == None: |
|---|
| 3310 | f_exp = N.array([sum(f_obs)/float(k)] * len(f_obs),N.Float) |
|---|
| 3311 | f_exp = f_exp.astype(N.Float) |
|---|
| 3312 | chisq = N.add.reduce((f_obs-f_exp)**2 / f_exp) |
|---|
| 3313 | return chisq, chisqprob(chisq, k-1) |
|---|
| 3314 | |
|---|
| 3315 | |
|---|
| 3316 | def aks_2samp (data1,data2): |
|---|
| 3317 | """ |
|---|
| 3318 | Computes the Kolmogorov-Smirnov statistic on 2 samples.  Modified from
|---|
| 3319 | Numerical Recipes in C, page 493.  Returns KS D-value, prob.  Not ufunc-
|---|
| 3320 | like. |
|---|
| 3321 | |
|---|
| 3322 | Usage: aks_2samp(data1,data2) where data1 and data2 are 1D arrays |
|---|
| 3323 | Returns: KS D-value, p-value |
|---|
| 3324 | """ |
|---|
| 3325 | j1 = 0 # N.zeros(data1.shape[1:]) TRIED TO MAKE THIS UFUNC-LIKE |
|---|
| 3326 | j2 = 0 # N.zeros(data2.shape[1:]) |
|---|
| 3327 | fn1 = 0.0 # N.zeros(data1.shape[1:],N.Float) |
|---|
| 3328 | fn2 = 0.0 # N.zeros(data2.shape[1:],N.Float) |
|---|
| 3329 | n1 = data1.shape[0] |
|---|
| 3330 | n2 = data2.shape[0] |
|---|
| 3331 | en1 = n1*1 |
|---|
| 3332 | en2 = n2*1 |
|---|
| 3333 | d = N.zeros(data1.shape[1:],N.Float) |
|---|
| 3334 | data1 = N.sort(data1,0) |
|---|
| 3335 | data2 = N.sort(data2,0) |
|---|
| 3336 | while j1 < n1 and j2 < n2: |
|---|
| 3337 | d1=data1[j1] |
|---|
| 3338 | d2=data2[j2] |
|---|
| 3339 | if d1 <= d2: |
|---|
| 3340 | fn1 = (j1)/float(en1) |
|---|
| 3341 | j1 = j1 + 1 |
|---|
| 3342 | if d2 <= d1: |
|---|
| 3343 | fn2 = (j2)/float(en2) |
|---|
| 3344 | j2 = j2 + 1 |
|---|
| 3345 | dt = (fn2-fn1) |
|---|
| 3346 | if abs(dt) > abs(d): |
|---|
| 3347 | d = dt |
|---|
| 3348 | try: |
|---|
| 3349 | en = math.sqrt(en1*en2/float(en1+en2)) |
|---|
| 3350 | prob = aksprob((en+0.12+0.11/en)*N.fabs(d)) |
|---|
| 3351 | except: |
|---|
| 3352 | prob = 1.0 |
|---|
| 3353 | return d, prob |
|---|
| 3354 | |
|---|
| 3355 | |
|---|
| 3356 | def amannwhitneyu(x,y): |
|---|
| 3357 | """ |
|---|
| 3358 | Calculates a Mann-Whitney U statistic on the provided scores and |
|---|
| 3359 | returns the result. Use only when the n in each condition is < 20 and |
|---|
| 3360 | you have 2 independent samples of ranks. REMEMBER: Mann-Whitney U is |
|---|
| 3361 | significant if the u-obtained is LESS THAN or equal to the critical |
|---|
| 3362 | value of U. |
|---|
| 3363 | |
|---|
| 3364 | Usage: amannwhitneyu(x,y) where x,y are arrays of values for 2 conditions |
|---|
| 3365 | Returns: u-statistic, one-tailed p-value (i.e., p(z(U))) |
|---|
| 3366 | """ |
|---|
| 3367 | n1 = len(x) |
|---|
| 3368 | n2 = len(y) |
|---|
| 3369 | ranked = rankdata(N.concatenate((x,y))) |
|---|
| 3370 | rankx = ranked[0:n1] # get the x-ranks |
|---|
| 3371 | ranky = ranked[n1:] # the rest are y-ranks |
|---|
| 3372 | u1 = n1*n2 + (n1*(n1+1))/2.0 - sum(rankx) # calc U for x |
|---|
| 3373 | u2 = n1*n2 - u1 # remainder is U for y |
|---|
| 3374 | bigu = max(u1,u2) |
|---|
| 3375 | smallu = min(u1,u2) |
|---|
| 3376 | T = math.sqrt(tiecorrect(ranked)) # correction factor for tied scores |
|---|
| 3377 | if T == 0: |
|---|
| 3378 | raise ValueError, 'All numbers are identical in amannwhitneyu' |
|---|
| 3379 | sd = math.sqrt(T*n1*n2*(n1+n2+1)/12.0) |
|---|
| 3380 | z = abs((bigu-n1*n2/2.0) / sd) # normal approximation for prob calc |
|---|
| 3381 | return smallu, 1.0 - zprob(z) |
|---|
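|  | # Illustrative usage (added; not in the original source).  Intended for two
|---|
|  | # small (n < 20) independent samples of ranks; the returned U is the smaller
|---|
|  | # of U1/U2 and the p-value uses the normal approximation:
|---|
|  | #   >>> u, p = amannwhitneyu(x, y)
|---|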
| 3382 | |
|---|
| 3383 | |
|---|
| 3384 | def atiecorrect(rankvals): |
|---|
| 3385 | """ |
|---|
| 3386 | Tie-corrector for ties in Mann Whitney U and Kruskal Wallis H tests. |
|---|
| 3387 | See Siegel, S. (1956) Nonparametric Statistics for the Behavioral |
|---|
| 3388 | Sciences. New York: McGraw-Hill. Code adapted from |Stat rankind.c |
|---|
| 3389 | code. |
|---|
| 3390 | |
|---|
| 3391 | Usage: atiecorrect(rankvals) |
|---|
| 3392 | Returns: T correction factor for U or H |
|---|
| 3393 | """ |
|---|
| 3394 | sorted,posn = ashellsort(N.array(rankvals)) |
|---|
| 3395 | n = len(sorted) |
|---|
| 3396 | T = 0.0 |
|---|
| 3397 | i = 0 |
|---|
| 3398 | while (i<n-1): |
|---|
| 3399 | if sorted[i] == sorted[i+1]: |
|---|
| 3400 | nties = 1 |
|---|
| 3401 | while (i<n-1) and (sorted[i] == sorted[i+1]): |
|---|
| 3402 | nties = nties +1 |
|---|
| 3403 | i = i +1 |
|---|
| 3404 | T = T + nties**3 - nties |
|---|
| 3405 | i = i+1 |
|---|
| 3406 | T = T / float(n**3-n) |
|---|
| 3407 | return 1.0 - T |
|---|
| 3408 | |
|---|
| 3409 | |
|---|
| 3410 | def aranksums(x,y): |
|---|
| 3411 | """ |
|---|
| 3412 | Calculates the rank sums statistic on the provided scores and returns |
|---|
| 3413 | the result. |
|---|
| 3414 | |
|---|
| 3415 | Usage: aranksums(x,y) where x,y are arrays of values for 2 conditions |
|---|
| 3416 | Returns: z-statistic, two-tailed p-value |
|---|
| 3417 | """ |
|---|
| 3418 | n1 = len(x) |
|---|
| 3419 | n2 = len(y) |
|---|
| 3420 | alldata = N.concatenate((x,y)) |
|---|
| 3421 | ranked = arankdata(alldata) |
|---|
| 3422 | x = ranked[:n1] |
|---|
| 3423 | y = ranked[n1:] |
|---|
| 3424 | s = sum(x) |
|---|
| 3425 | expected = n1*(n1+n2+1) / 2.0 |
|---|
| 3426 | z = (s - expected) / math.sqrt(n1*n2*(n1+n2+1)/12.0) |
|---|
| 3427 | prob = 2*(1.0 -zprob(abs(z))) |
|---|
| 3428 | return z, prob |
|---|
| 3429 | |
|---|
| 3430 | |
|---|
| 3431 | def awilcoxont(x,y): |
|---|
| 3432 | """ |
|---|
| 3433 | Calculates the Wilcoxon T-test for related samples and returns the |
|---|
| 3434 | result. A non-parametric T-test. |
|---|
| 3435 | |
|---|
| 3436 | Usage: awilcoxont(x,y) where x,y are equal-length arrays for 2 conditions |
|---|
| 3437 | Returns: t-statistic, two-tailed p-value |
|---|
| 3438 | """ |
|---|
| 3439 | if len(x) <> len(y): |
|---|
| 3440 | raise ValueError, 'Unequal N in awilcoxont. Aborting.' |
|---|
| 3441 | d = x-y |
|---|
| 3442 | d = N.compress(N.not_equal(d,0),d) # Keep all non-zero differences |
|---|
| 3443 | count = len(d) |
|---|
| 3444 | absd = abs(d) |
|---|
| 3445 | absranked = arankdata(absd) |
|---|
| 3446 | r_plus = 0.0 |
|---|
| 3447 | r_minus = 0.0 |
|---|
| 3448 | for i in range(len(absd)): |
|---|
| 3449 | if d[i] < 0: |
|---|
| 3450 | r_minus = r_minus + absranked[i] |
|---|
| 3451 | else: |
|---|
| 3452 | r_plus = r_plus + absranked[i] |
|---|
| 3453 | wt = min(r_plus, r_minus) |
|---|
| 3454 | mn = count * (count+1) * 0.25 |
|---|
| 3455 | se = math.sqrt(count*(count+1)*(2.0*count+1.0)/24.0) |
|---|
| 3457 | z = math.fabs(wt-mn) / se |
|---|
| 3458 | prob = 2*(1.0 -zprob(abs(z))) |
|---|
| 3459 | return wt, prob |
|---|
| 3460 | |
|---|
| 3461 | |
|---|
| 3462 | def akruskalwallish(*args): |
|---|
| 3463 | """ |
|---|
| 3464 | The Kruskal-Wallis H-test is a non-parametric ANOVA for 3 or more |
|---|
| 3465 | groups, requiring at least 5 subjects in each group. This function |
|---|
| 3466 | calculates the Kruskal-Wallis H and associated p-value for 3 or more |
|---|
| 3467 | independent samples. |
|---|
| 3468 | |
|---|
| 3469 | Usage: akruskalwallish(*args) args are separate arrays for 3+ conditions |
|---|
| 3470 | Returns: H-statistic (corrected for ties), associated p-value |
|---|
| 3471 | """ |
|---|
| 3472 | assert len(args) >= 3, "Need at least 3 groups in stats.akruskalwallish()"
|---|
| 3473 | args = list(args) |
|---|
| 3474 | n = [0]*len(args) |
|---|
| 3475 | n = map(len,args) |
|---|
| 3476 | all = [] |
|---|
| 3477 | for i in range(len(args)): |
|---|
| 3478 | all = all + args[i].tolist() |
|---|
| 3479 | ranked = rankdata(all) |
|---|
| 3480 | T = tiecorrect(ranked) |
|---|
| 3481 | for i in range(len(args)): |
|---|
| 3482 | args[i] = ranked[0:n[i]] |
|---|
| 3483 | del ranked[0:n[i]] |
|---|
| 3484 | rsums = [] |
|---|
| 3485 | for i in range(len(args)): |
|---|
| 3486 | rsums.append(sum(args[i])**2) |
|---|
| 3487 | rsums[i] = rsums[i] / float(n[i]) |
|---|
| 3488 | ssbn = sum(rsums) |
|---|
| 3489 | totaln = sum(n) |
|---|
| 3490 | h = 12.0 / (totaln*(totaln+1)) * ssbn - 3*(totaln+1) |
|---|
| 3491 | df = len(args) - 1 |
|---|
| 3492 | if T == 0: |
|---|
| 3493 | raise ValueError, 'All numbers are identical in akruskalwallish' |
|---|
| 3494 | h = h / float(T) |
|---|
| 3495 | return h, chisqprob(h,df) |
|---|
| 3496 | |
|---|
| 3497 | |
|---|
| 3498 | def afriedmanchisquare(*args): |
|---|
| 3499 | """ |
|---|
| 3500 | Friedman Chi-Square is a non-parametric, one-way within-subjects |
|---|
| 3501 | ANOVA. This function calculates the Friedman Chi-square test for |
|---|
| 3502 | repeated measures and returns the result, along with the associated |
|---|
| 3503 | probability value.  It assumes 3 or more repeated measures.  With only 3
|---|
| 3504 | levels a minimum of 10 subjects in the study is required; four levels
|---|
| 3505 | requires at least 5 subjects per level(??).
|---|
| 3506 | |
|---|
| 3507 | Usage: afriedmanchisquare(*args)   args are separate arrays for 3+ conditions
|---|
| 3508 | Returns: chi-square statistic, associated p-value |
|---|
| 3509 | """ |
|---|
| 3510 | k = len(args) |
|---|
| 3511 | if k < 3: |
|---|
| 3512 | raise ValueError, '\nLess than 3 levels. Friedman test not appropriate.\n' |
|---|
| 3513 | n = len(args[0]) |
|---|
| 3514 | data = apply(pstat.aabut,args) |
|---|
| 3515 | data = data.astype(N.Float) |
|---|
| 3516 | for i in range(len(data)): |
|---|
| 3517 | data[i] = arankdata(data[i]) |
|---|
| 3518 | ssbn = asum(asum(data,0)**2)
|---|
| 3519 | chisq = 12.0 / (k*n*(k+1)) * ssbn - 3*n*(k+1) |
|---|
| 3520 | return chisq, chisqprob(chisq,k-1) |
|---|
| 3521 | |
|---|
| 3522 | |
|---|
| 3523 | ##################################### |
|---|
| 3524 | #### APROBABILITY CALCULATIONS #### |
|---|
| 3525 | ##################################### |
|---|
| 3526 | |
|---|
| 3527 | def achisqprob(chisq,df): |
|---|
| 3528 | """ |
|---|
| 3529 | Returns the (1-tail) probability value associated with the provided chi-square |
|---|
| 3530 | value and df. Heavily modified from chisq.c in Gary Perlman's |Stat. Can |
|---|
| 3531 | handle multiple dimensions. |
|---|
| 3532 | |
|---|
| 3533 | Usage: achisqprob(chisq,df) chisq=chisquare stat., df=degrees of freedom |
|---|
| 3534 | """ |
|---|
| 3535 | BIG = 200.0 |
|---|
| 3536 | def ex(x): |
|---|
| 3537 | BIG = 200.0 |
|---|
| 3538 | exponents = N.where(N.less(x,-BIG),-BIG,x) |
|---|
| 3539 | return N.exp(exponents) |
|---|
| 3540 | |
|---|
| 3541 | if type(chisq) == N.ArrayType: |
|---|
| 3542 | arrayflag = 1 |
|---|
| 3543 | else: |
|---|
| 3544 | arrayflag = 0 |
|---|
| 3545 | chisq = N.array([chisq]) |
|---|
| 3546 | if df < 1: |
|---|
| 3547 | return N.ones(chisq.shape,N.Float)
|---|
| 3548 | probs = N.zeros(chisq.shape,N.Float) |
|---|
| 3549 | probs = N.where(N.less_equal(chisq,0),1.0,probs) # set prob=1 for chisq<=0
|---|
| 3550 | a = 0.5 * chisq |
|---|
| 3551 | if df > 1: |
|---|
| 3552 | y = ex(-a) |
|---|
| 3553 | if df%2 == 0: |
|---|
| 3554 | even = 1 |
|---|
| 3555 | s = y*1 |
|---|
| 3556 | s2 = s*1 |
|---|
| 3557 | else: |
|---|
| 3558 | even = 0 |
|---|
| 3559 | s = 2.0 * azprob(-N.sqrt(chisq)) |
|---|
| 3560 | s2 = s*1 |
|---|
| 3561 | if (df > 2): |
|---|
| 3562 | chisq = 0.5 * (df - 1.0) |
|---|
| 3563 | if even: |
|---|
| 3564 | z = N.ones(probs.shape,N.Float) |
|---|
| 3565 | else: |
|---|
| 3566 | z = 0.5 *N.ones(probs.shape,N.Float) |
|---|
| 3567 | if even: |
|---|
| 3568 | e = N.zeros(probs.shape,N.Float) |
|---|
| 3569 | else: |
|---|
| 3570 | e = N.log(N.sqrt(N.pi)) *N.ones(probs.shape,N.Float) |
|---|
| 3571 | c = N.log(a) |
|---|
| 3572 | mask = N.zeros(probs.shape) |
|---|
| 3573 | a_big = N.greater(a,BIG) |
|---|
| 3574 | a_big_frozen = -1 *N.ones(probs.shape,N.Float) |
|---|
| 3575 | totalelements = N.multiply.reduce(N.array(probs.shape)) |
|---|
| 3576 | while asum(mask)<>totalelements: |
|---|
| 3577 | e = N.log(z) + e |
|---|
| 3578 | s = s + ex(c*z-a-e) |
|---|
| 3579 | z = z + 1.0 |
|---|
| 3580 | # print z, e, s |
|---|
| 3581 | newmask = N.greater(z,chisq) |
|---|
| 3582 | a_big_frozen = N.where(newmask*N.equal(mask,0)*a_big, s, a_big_frozen) |
|---|
| 3583 | mask = N.clip(newmask+mask,0,1) |
|---|
| 3584 | if even: |
|---|
| 3585 | z = N.ones(probs.shape,N.Float) |
|---|
| 3586 | e = N.ones(probs.shape,N.Float) |
|---|
| 3587 | else: |
|---|
| 3588 | z = 0.5 *N.ones(probs.shape,N.Float) |
|---|
| 3589 | e = 1.0 / N.sqrt(N.pi) / N.sqrt(a) * N.ones(probs.shape,N.Float) |
|---|
| 3590 | c = 0.0 |
|---|
| 3591 | mask = N.zeros(probs.shape) |
|---|
| 3592 | a_notbig_frozen = -1 *N.ones(probs.shape,N.Float) |
|---|
| 3593 | while asum(mask)<>totalelements: |
|---|
| 3594 | e = e * (a/z.astype(N.Float)) |
|---|
| 3595 | c = c + e |
|---|
| 3596 | z = z + 1.0 |
|---|
| 3597 | # print '#2', z, e, c, s, c*y+s2 |
|---|
| 3598 | newmask = N.greater(z,chisq) |
|---|
| 3599 | a_notbig_frozen = N.where(newmask*N.equal(mask,0)*(1-a_big), |
|---|
| 3600 | c*y+s2, a_notbig_frozen) |
|---|
| 3601 | mask = N.clip(newmask+mask,0,1) |
|---|
| 3602 | probs = N.where(N.equal(probs,1),1, |
|---|
| 3603 | N.where(N.greater(a,BIG),a_big_frozen,a_notbig_frozen)) |
|---|
| 3604 | return probs |
|---|
| 3605 | else: |
|---|
| 3606 | return s |
|---|
| 3607 | |
|---|
| 3608 | |
|---|
| 3609 | def aerfcc(x): |
|---|
| 3610 | """ |
|---|
| 3611 | Returns the complementary error function erfc(x) with fractional error |
|---|
| 3612 | everywhere less than 1.2e-7.  Adapted from Numerical Recipes.  Can
|---|
| 3613 | handle multiple dimensions. |
|---|
| 3614 | |
|---|
| 3615 | Usage: aerfcc(x) |
|---|
| 3616 | """ |
|---|
| 3617 | z = abs(x) |
|---|
| 3618 | t = 1.0 / (1.0+0.5*z) |
|---|
| 3619 | ans = t * N.exp(-z*z-1.26551223 + t*(1.00002368+t*(0.37409196+t*(0.09678418+t*(-0.18628806+t*(0.27886807+t*(-1.13520398+t*(1.48851587+t*(-0.82215223+t*0.17087277))))))))) |
|---|
| 3620 | return N.where(N.greater_equal(x,0), ans, 2.0-ans) |
|---|
| 3621 | |
|---|
| 3622 | |
|---|
| 3623 | def azprob(z): |
|---|
| 3624 | """ |
|---|
| 3625 | Returns the area under the normal curve 'to the left of' the given z value. |
|---|
| 3626 | Thus, |
|---|
| 3627 | for z<0, zprob(z) = 1-tail probability |
|---|
| 3628 | for z>0, 1.0-zprob(z) = 1-tail probability |
|---|
| 3629 | for any z, 2.0*(1.0-zprob(abs(z))) = 2-tail probability |
|---|
| 3630 | Adapted from z.c in Gary Perlman's |Stat. Can handle multiple dimensions. |
|---|
| 3631 | |
|---|
| 3632 | Usage: azprob(z) where z is a z-value |
|---|
| 3633 | """ |
|---|
| 3634 | def yfunc(y): |
|---|
| 3635 | x = (((((((((((((-0.000045255659 * y |
|---|
| 3636 | +0.000152529290) * y -0.000019538132) * y |
|---|
| 3637 | -0.000676904986) * y +0.001390604284) * y |
|---|
| 3638 | -0.000794620820) * y -0.002034254874) * y |
|---|
| 3639 | +0.006549791214) * y -0.010557625006) * y |
|---|
| 3640 | +0.011630447319) * y -0.009279453341) * y |
|---|
| 3641 | +0.005353579108) * y -0.002141268741) * y |
|---|
| 3642 | +0.000535310849) * y +0.999936657524 |
|---|
| 3643 | return x |
|---|
| 3644 | |
|---|
| 3645 | def wfunc(w): |
|---|
| 3646 | x = ((((((((0.000124818987 * w |
|---|
| 3647 | -0.001075204047) * w +0.005198775019) * w |
|---|
| 3648 | -0.019198292004) * w +0.059054035642) * w |
|---|
| 3649 | -0.151968751364) * w +0.319152932694) * w |
|---|
| 3650 | -0.531923007300) * w +0.797884560593) * N.sqrt(w) * 2.0 |
|---|
| 3651 | return x |
|---|
| 3652 | |
|---|
| 3653 | Z_MAX = 6.0 # maximum meaningful z-value |
|---|
| 3654 | x = N.zeros(z.shape,N.Float) # initialize |
|---|
| 3655 | y = 0.5 * N.fabs(z) |
|---|
| 3656 | x = N.where(N.less(y,1.0),wfunc(y*y),yfunc(y-2.0)) # get x's |
|---|
| 3657 | x = N.where(N.greater(y,Z_MAX*0.5),1.0,x) # kill those with big Z |
|---|
| 3658 | prob = N.where(N.greater(z,0),(x+1)*0.5,(1-x)*0.5) |
|---|
| 3659 | return prob |
|---|
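|  | # Illustrative usage (added; not in the original source), following the
|---|
|  | # conventions in the docstring above (z must be a Numeric array here, since
|---|
|  | # z.shape is used):
|---|
|  | #   >>> p_left  = azprob(z)                       # area to the left of z
|---|
|  | #   >>> p_2tail = 2.0*(1.0 - azprob(abs(z)))      # two-tailed probability
|---|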
| 3660 | |
|---|
| 3661 | |
|---|
| 3662 | def aksprob(alam): |
|---|
| 3663 | """ |
|---|
| 3664 | Returns the probability value for a K-S statistic computed via ks_2samp. |
|---|
| 3665 | Adapted from Numerical Recipes.  Can handle multiple dimensions.
|---|
| 3666 | |
|---|
| 3667 | Usage: aksprob(alam) |
|---|
| 3668 | """ |
|---|
| 3669 | if type(alam) == N.ArrayType: |
|---|
| 3670 | frozen = -1 *N.ones(alam.shape,N.Float64) |
|---|
| 3671 | alam = alam.astype(N.Float64) |
|---|
| 3672 | arrayflag = 1 |
|---|
| 3673 | else: |
|---|
| 3674 | frozen = N.array(-1.) |
|---|
| 3675 | alam = N.array(alam,N.Float64) |
|---|
| 3676 | mask = N.zeros(alam.shape) |
|---|
| 3677 | fac = 2.0 *N.ones(alam.shape,N.Float) |
|---|
| 3678 | sum = N.zeros(alam.shape,N.Float) |
|---|
| 3679 | termbf = N.zeros(alam.shape,N.Float) |
|---|
| 3680 | a2 = N.array(-2.0*alam*alam,N.Float64) |
|---|
| 3681 | totalelements = N.multiply.reduce(N.array(mask.shape)) |
|---|
| 3682 | for j in range(1,201): |
|---|
| 3683 | if asum(mask) == totalelements: |
|---|
| 3684 | break |
|---|
| 3685 | exponents = (a2*j*j) |
|---|
| 3686 | overflowmask = N.less(exponents,-746) |
|---|
| 3687 | frozen = N.where(overflowmask,0,frozen) |
|---|
| 3688 | mask = mask+overflowmask |
|---|
| 3689 | term = fac*N.exp(exponents) |
|---|
| 3690 | sum = sum + term |
|---|
| 3691 | newmask = N.where(N.less_equal(abs(term),(0.001*termbf)) + |
|---|
| 3692 | N.less(abs(term),1.0e-8*sum), 1, 0) |
|---|
| 3693 | frozen = N.where(newmask*N.equal(mask,0), sum, frozen) |
|---|
| 3694 | mask = N.clip(mask+newmask,0,1) |
|---|
| 3695 | fac = -fac |
|---|
| 3696 | termbf = abs(term) |
|---|
| 3697 | if arrayflag: |
|---|
| 3698 | return N.where(N.equal(frozen,-1), 1.0, frozen) # 1.0 if doesn't converge |
|---|
| 3699 | else: |
|---|
| 3700 | return N.where(N.equal(frozen,-1), 1.0, frozen)[0] # 1.0 if doesn't converge |
|---|
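| | # Illustrative example (approximate values): small K-S statistics give p near 1,
|---|
| | # large ones give p near 0.
|---|
| | #     aksprob(N.array([0.5, 2.0]))          # --> roughly [0.96, 0.0007]
|---|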
| 3701 | |
|---|
| 3702 | |
|---|
| 3703 | def afprob (dfnum, dfden, F): |
|---|
| 3704 | """ |
|---|
| 3705 | Returns the 1-tailed significance level (p-value) of an F statistic |
|---|
| 3706 | given the degrees of freedom for the numerator (dfR-dfF) and the degrees |
|---|
| 3707 | of freedom for the denominator (dfF). Can handle multiple dims for F. |
|---|
| 3708 | |
|---|
| 3709 | Usage: afprob(dfnum, dfden, F) where usually dfnum=dfbn, dfden=dfwn |
|---|
| 3710 | """ |
|---|
| 3711 | if type(F) == N.ArrayType: |
|---|
| 3712 | return abetai(0.5*dfden, 0.5*dfnum, dfden/(1.0*dfden+dfnum*F)) |
|---|
| 3713 | else: |
|---|
| 3714 | return abetai(0.5*dfden, 0.5*dfnum, dfden/float(dfden+dfnum*F)) |
|---|
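| | # Illustrative example (approximate): F=3.35 with (2, 27) df sits near the
|---|
| | # conventional 5% cutoff.
|---|
| | #     afprob(2, 27, 3.35)                   # --> about 0.05
|---|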
| 3715 | |
|---|
| 3716 | |
|---|
| 3717 | def abetacf(a,b,x,verbose=1): |
|---|
| 3718 | """ |
|---|
| 3719 | Evaluates the continued fraction form of the incomplete Beta function, |
|---|
| 3720 | betai. (Adapted from: Numerical Recipes in C.) Can handle multiple
|---|
| 3721 | dimensions for x. |
|---|
| 3722 | |
|---|
| 3723 | Usage: abetacf(a,b,x,verbose=1) |
|---|
| 3724 | """ |
|---|
| 3725 | ITMAX = 200 |
|---|
| 3726 | EPS = 3.0e-7 |
|---|
| 3727 | |
|---|
| 3728 | arrayflag = 1 |
|---|
| 3729 | if type(x) == N.ArrayType: |
|---|
| 3730 | frozen = N.ones(x.shape,N.Float) *-1 #start out w/ -1s, should replace all |
|---|
| 3731 | else: |
|---|
| 3732 | arrayflag = 0 |
|---|
| 3733 | frozen = N.array([-1]) |
|---|
| 3734 | x = N.array([x]) |
|---|
| 3735 | mask = N.zeros(x.shape) |
|---|
| 3736 | bm = az = am = 1.0 |
|---|
| 3737 | qab = a+b |
|---|
| 3738 | qap = a+1.0 |
|---|
| 3739 | qam = a-1.0 |
|---|
| 3740 | bz = 1.0-qab*x/qap |
|---|
| 3741 | for i in range(ITMAX+1): |
|---|
| 3742 | if N.sum(N.ravel(N.equal(frozen,-1)))==0: |
|---|
| 3743 | break |
|---|
| 3744 | em = float(i+1) |
|---|
| 3745 | tem = em + em |
|---|
| 3746 | d = em*(b-em)*x/((qam+tem)*(a+tem)) |
|---|
| 3747 | ap = az + d*am |
|---|
| 3748 | bp = bz+d*bm |
|---|
| 3749 | d = -(a+em)*(qab+em)*x/((qap+tem)*(a+tem)) |
|---|
| 3750 | app = ap+d*az |
|---|
| 3751 | bpp = bp+d*bz |
|---|
| 3752 | aold = az*1 |
|---|
| 3753 | am = ap/bpp |
|---|
| 3754 | bm = bp/bpp |
|---|
| 3755 | az = app/bpp |
|---|
| 3756 | bz = 1.0 |
|---|
| 3757 | newmask = N.less(abs(az-aold),EPS*abs(az)) |
|---|
| 3758 | frozen = N.where(newmask*N.equal(mask,0), az, frozen) |
|---|
| 3759 | mask = N.clip(mask+newmask,0,1) |
|---|
| 3760 | noconverge = asum(N.equal(frozen,-1)) |
|---|
| 3761 | if noconverge <> 0 and verbose: |
|---|
| 3762 | print 'a or b too big, or ITMAX too small in Betacf for ',noconverge,' elements' |
|---|
| 3763 | if arrayflag: |
|---|
| 3764 | return frozen |
|---|
| 3765 | else: |
|---|
| 3766 | return frozen[0] |
|---|
| 3767 | |
|---|
| 3768 | |
|---|
| 3769 | def agammln(xx): |
|---|
| 3770 | """ |
|---|
| 3771 | Returns the natural log of the gamma function of xx, i.e. ln(Gamma(xx)), where
|---|
| 3772 | Gamma(z) = Integral(0,infinity) of t^(z-1)exp(-t) dt.
|---|
| 3773 | Adapted from: Numerical Recipes in C. Can handle multiple dims, though it
|---|
| 3774 | rarely needs to.
|---|
| 3775 | |
|---|
| 3776 | Usage: agammln(xx) |
|---|
| 3777 | """ |
|---|
| 3778 | coeff = [76.18009173, -86.50532033, 24.01409822, -1.231739516, |
|---|
| 3779 | 0.120858003e-2, -0.536382e-5] |
|---|
| 3780 | x = xx - 1.0 |
|---|
| 3781 | tmp = x + 5.5 |
|---|
| 3782 | tmp = tmp - (x+0.5)*N.log(tmp) |
|---|
| 3783 | ser = 1.0 |
|---|
| 3784 | for j in range(len(coeff)): |
|---|
| 3785 | x = x + 1 |
|---|
| 3786 | ser = ser + coeff[j]/x |
|---|
| 3787 | return -tmp + N.log(2.50662827465*ser) |
|---|
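| | # Illustrative example: Gamma(4) = 3! = 6, so
|---|
| | #     agammln(4.0)                          # --> about 1.7918 (= log(6))
|---|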
| 3788 | |
|---|
| 3789 | |
|---|
| 3790 | def abetai(a,b,x,verbose=1): |
|---|
| 3791 | """ |
|---|
| 3792 | Returns the incomplete beta function: |
|---|
| 3793 | |
|---|
| 3794 | I-sub-x(a,b) = 1/B(a,b)*(Integral(0,x) of t^(a-1)(1-t)^(b-1) dt) |
|---|
| 3795 | |
|---|
| 3796 | where a,b>0 and B(a,b) = G(a)*G(b)/(G(a+b)) where G(a) is the gamma |
|---|
| 3797 | function of a. The continued fraction formulation is implemented |
|---|
| 3798 | here, using the betacf function. (Adapted from: Numerical Recipes in
|---|
| 3799 | C.) Can handle multiple dimensions. |
|---|
| 3800 | |
|---|
| 3801 | Usage: abetai(a,b,x,verbose=1) |
|---|
| 3802 | """ |
|---|
| 3803 | TINY = 1e-15 |
|---|
| 3804 | if type(a) == N.ArrayType: |
|---|
| 3805 | if asum(N.less(x,0)+N.greater(x,1)) <> 0: |
|---|
| 3806 | raise ValueError, 'Bad x in abetai' |
|---|
| 3807 | x = N.where(N.equal(x,0),TINY,x) |
|---|
| 3808 | x = N.where(N.equal(x,1.0),1-TINY,x) |
|---|
| 3809 | |
|---|
| 3810 | bt = N.where(N.equal(x,0)+N.equal(x,1), 0, -1) |
|---|
| 3811 | exponents = ( gammln(a+b)-gammln(a)-gammln(b)+a*N.log(x)+b* |
|---|
| 3812 | N.log(1.0-x) ) |
|---|
| 3813 | # exp() of anything much below -745 underflows to 0, so clip the exponents at -740
|---|
| 3814 | exponents = N.where(N.less(exponents,-740),-740,exponents) |
|---|
| 3815 | bt = N.exp(exponents) |
|---|
| 3816 | if type(x) == N.ArrayType: |
|---|
| 3817 | ans = N.where(N.less(x,(a+1)/(a+b+2.0)), |
|---|
| 3818 | bt*abetacf(a,b,x,verbose)/float(a), |
|---|
| 3819 | 1.0-bt*abetacf(b,a,1.0-x,verbose)/float(b)) |
|---|
| 3820 | else: |
|---|
| 3821 | if x<(a+1)/(a+b+2.0): |
|---|
| 3822 | ans = bt*abetacf(a,b,x,verbose)/float(a) |
|---|
| 3823 | else: |
|---|
| 3824 | ans = 1.0-bt*abetacf(b,a,1.0-x,verbose)/float(b) |
|---|
| 3825 | return ans |
|---|
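| | # Illustrative example: with a=b=1 the incomplete beta reduces to x itself,
|---|
| | #     abetai(1, 1, 0.3)                     # --> about 0.3
|---|
| | # and abetai(0.5*df, 0.5, df/(df+t*t)) is the form used for t-test p-values
|---|
| | # elsewhere in this module.
|---|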
| 3826 | |
|---|
| 3827 | |
|---|
| 3828 | ##################################### |
|---|
| 3829 | ####### AANOVA CALCULATIONS ####### |
|---|
| 3830 | ##################################### |
|---|
| 3831 | |
|---|
| 3832 | import LinearAlgebra, operator |
|---|
| 3833 | LA = LinearAlgebra |
|---|
| 3834 | |
|---|
| 3835 | def aglm(data,para): |
|---|
| 3836 | """ |
|---|
| 3837 | Calculates a linear model fit ... anova/ancova/lin-regress/t-test/etc. Taken |
|---|
| 3838 | from: |
|---|
| 3839 | Peterson et al. Statistical limitations in functional neuroimaging |
|---|
| 3840 | I. Non-inferential methods and statistical models. Phil Trans Royal Soc |
|---|
| 3841 | Lond B 354: 1239-1260. |
|---|
| 3842 | |
|---|
| 3843 | Usage: aglm(data,para) |
|---|
| 3844 | Returns: t-statistic and p-value (currently computed only for the two-group case)
|---|
| 3845 | """ |
|---|
| 3846 | if len(para) <> len(data): |
|---|
| 3847 | print "data and para must be same length in aglm" |
|---|
| 3848 | return |
|---|
| 3849 | n = len(para) |
|---|
| 3850 | p = pstat.aunique(para) |
|---|
| 3851 | x = N.zeros((n,len(p))) # design matrix |
|---|
| 3852 | for l in range(len(p)): |
|---|
| 3853 | x[:,l] = N.equal(para,p[l]) |
|---|
| 3854 | b = N.dot(N.dot(LA.inverse(N.dot(N.transpose(x),x)), # i.e., b=inv(X'X)X'Y |
|---|
| 3855 | N.transpose(x)), |
|---|
| 3856 | data) |
|---|
| 3857 | diffs = (data - N.dot(x,b)) |
|---|
| 3858 | s_sq = 1./(n-len(p)) * N.dot(N.transpose(diffs), diffs) |
|---|
| 3859 | |
|---|
| 3860 | if len(p) == 2: # ttest_ind |
|---|
| 3861 | c = N.array([1,-1]) |
|---|
| 3862 | df = n-2 |
|---|
| 3863 | fact = asum(1.0/asum(x,0)) # i.e., 1/n1 + 1/n2 + 1/n3 ... |
|---|
| 3864 | t = N.dot(c,b) / N.sqrt(s_sq*fact) |
|---|
| 3865 | probs = abetai(0.5*df,0.5,float(df)/(df+t*t)) |
|---|
| 3866 | return t, probs |
|---|
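| | # Illustrative sketch (made-up data; with 2 groups this reduces to an
|---|
| | # independent-samples t-test):
|---|
| | #     data = N.array([4.,5.,6.,5., 8.,9.,7.,8.])
|---|
| | #     para = N.array([1,1,1,1, 2,2,2,2])    # group membership codes
|---|
| | #     t, p = aglm(data, para)
|---|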
| 3867 | |
|---|
| 3868 | |
|---|
| 3869 | def aF_oneway(*args): |
|---|
| 3870 | """ |
|---|
| 3871 | Performs a 1-way ANOVA, returning an F-value and probability given |
|---|
| 3872 | any number of groups. From Heiman, pp.394-7. |
|---|
| 3873 | |
|---|
| 3874 | Usage: aF_oneway (*args) where *args is 2 or more arrays, one per |
|---|
| 3875 | treatment group |
|---|
| 3876 | Returns: f-value, probability |
|---|
| 3877 | """ |
|---|
| 3878 | na = len(args) # ANOVA on 'na' groups, each in its own array
|---|
| 3879 | means = [0]*na |
|---|
| 3880 | vars = [0]*na |
|---|
| 3881 | ns = [0]*na |
|---|
| 3882 | alldata = [] |
|---|
| 3883 | tmp = map(N.array,args) |
|---|
| 3884 | means = map(amean,tmp) |
|---|
| 3885 | vars = map(avar,tmp) |
|---|
| 3886 | ns = map(len,args) |
|---|
| 3887 | alldata = N.concatenate(args) |
|---|
| 3888 | bign = len(alldata) |
|---|
| 3889 | sstot = ass(alldata)-(asquare_of_sums(alldata)/float(bign)) |
|---|
| 3890 | ssbn = 0 |
|---|
| 3891 | for a in args: |
|---|
| 3892 | ssbn = ssbn + asquare_of_sums(N.array(a))/float(len(a)) |
|---|
| 3893 | ssbn = ssbn - (asquare_of_sums(alldata)/float(bign)) |
|---|
| 3894 | sswn = sstot-ssbn |
|---|
| 3895 | dfbn = na-1 |
|---|
| 3896 | dfwn = bign - na |
|---|
| 3897 | msb = ssbn/float(dfbn) |
|---|
| 3898 | msw = sswn/float(dfwn) |
|---|
| 3899 | f = msb/msw |
|---|
| 3900 | prob = fprob(dfbn,dfwn,f) |
|---|
| 3901 | return f, prob |
|---|
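| | # Illustrative sketch (made-up groups):
|---|
| | #     g1 = N.array([3.,4.,5.])
|---|
| | #     g2 = N.array([5.,6.,7.])
|---|
| | #     g3 = N.array([8.,9.,10.])
|---|
| | #     f, p = aF_oneway(g1, g2, g3)
|---|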
| 3902 | |
|---|
| 3903 | |
|---|
| 3904 | def aF_value (ER,EF,dfR,dfF): |
|---|
| 3905 | """ |
|---|
| 3906 | Returns an F-statistic given the following: |
|---|
| 3907 | ER = error associated with the null hypothesis (the Restricted model) |
|---|
| 3908 | EF = error associated with the alternate hypothesis (the Full model) |
|---|
| 3909 | dfR = degrees of freedom associated with the Restricted model
|---|
| 3910 | dfF = degrees of freedom associated with the Full model
|---|
| |
|---|
| | Usage:   aF_value(ER,EF,dfR,dfF)
|---|
| 3911 | """ |
|---|
| 3912 | return ((ER-EF)/float(dfR-dfF) / (EF/float(dfF))) |
|---|
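| | # Illustrative example: ER=100 with dfR=20 and EF=80 with dfF=18 give
|---|
| | #     aF_value(100., 80., 20, 18)           # --> ((100-80)/2) / (80/18) = 2.25
|---|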
| 3913 | |
|---|
| 3914 | |
|---|
| 3915 | def outputfstats(Enum, Eden, dfnum, dfden, f, prob): |
|---|
| 3916 | Enum = round(Enum,3) |
|---|
| 3917 | Eden = round(Eden,3) |
|---|
| 3918 | dfnum = round(dfnum,3)
|---|
| 3919 | dfden = round(dfden,3) |
|---|
| 3920 | f = round(f,3) |
|---|
| 3921 | prob = round(prob,3) |
|---|
| 3922 | suffix = '' # for *s after the p-value |
|---|
| 3923 | if prob < 0.001: suffix = ' ***' |
|---|
| 3924 | elif prob < 0.01: suffix = ' **' |
|---|
| 3925 | elif prob < 0.05: suffix = ' *' |
|---|
| 3926 | title = [['EF/ER','DF','Mean Square','F-value','prob','']] |
|---|
| 3927 | lofl = title+[[Enum, dfnum, round(Enum/float(dfnum),3), f, prob, suffix], |
|---|
| 3928 | [Eden, dfden, round(Eden/float(dfden),3),'','','']] |
|---|
| 3929 | pstat.printcc(lofl) |
|---|
| 3930 | return |
|---|
| 3931 | |
|---|
| 3932 | |
|---|
| 3933 | def F_value_multivariate(ER, EF, dfnum, dfden): |
|---|
| 3934 | """ |
|---|
| 3935 | Returns an F-statistic given the following: |
|---|
| 3936 | ER = error associated with the null hypothesis (the Restricted model) |
|---|
| 3937 | EF = error associated with the alternate hypothesis (the Full model) |
|---|
| 3938 | dfnum = numerator degrees of freedom (dfR-dfF)
|---|
| 3939 | dfden = denominator degrees of freedom (dfF)
|---|
| 3940 | where ER and EF are matrices from a multivariate F calculation. |
|---|
| 3941 | """ |
|---|
| 3942 | if type(ER) in [IntType, FloatType]: |
|---|
| 3943 | ER = N.array([[ER]]) |
|---|
| 3944 | if type(EF) in [IntType, FloatType]: |
|---|
| 3945 | EF = N.array([[EF]]) |
|---|
| 3946 | n_um = (LA.determinant(ER) - LA.determinant(EF)) / float(dfnum) |
|---|
| 3947 | d_en = LA.determinant(EF) / float(dfden) |
|---|
| 3948 | return n_um / d_en |
|---|
| 3949 | |
|---|
| 3950 | |
|---|
| 3951 | ##################################### |
|---|
| 3952 | ####### ASUPPORT FUNCTIONS ######## |
|---|
| 3953 | ##################################### |
|---|
| 3954 | |
|---|
| 3955 | def asign(a): |
|---|
| 3956 | """ |
|---|
| 3957 | Usage: asign(a) |
|---|
| 3958 | Returns: array shape of a, with -1 where a<0, 0 where a==0, and +1 where a>0
|---|
| 3959 | """ |
|---|
| 3960 | a = N.asarray(a) |
|---|
| 3961 | if ((type(a) == type(1.4)) or (type(a) == type(1))): |
|---|
| 3962 | return a-a-N.less(a,0)+N.greater(a,0) |
|---|
| 3963 | else: |
|---|
| 3964 | return N.zeros(N.shape(a))-N.less(a,0)+N.greater(a,0) |
|---|
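| | # Illustrative example:
|---|
| | #     asign(N.array([-3., 0., 2.]))         # --> [-1, 0, 1]
|---|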
| 3965 | |
|---|
| 3966 | |
|---|
| 3967 | def asum (a, dimension=None,keepdims=0): |
|---|
| 3968 | """ |
|---|
| 3969 | An alternative to the Numeric.add.reduce function, which allows one to |
|---|
| 3970 | (1) collapse over multiple dimensions at once, and/or (2) to retain |
|---|
| 3971 | all dimensions in the original array (squashing the collapsed ones down to size 1).
|---|
| 3972 | Dimension can equal None (ravel array first), an integer (the |
|---|
| 3973 | dimension over which to operate), or a sequence (operate over multiple |
|---|
| 3974 | dimensions). If keepdims=1, the resulting array will have as many |
|---|
| 3975 | dimensions as the input array. |
|---|
| 3976 | |
|---|
| 3977 | Usage: asum(a, dimension=None, keepdims=0) |
|---|
| 3978 | Returns: array summed along 'dimension'(s), same _number_ of dims if keepdims=1 |
|---|
| 3979 | """ |
|---|
| 3980 | if type(a) == N.ArrayType and a.typecode() in ['l','s','b']: |
|---|
| 3981 | a = a.astype(N.Float) |
|---|
| 3982 | if dimension == None: |
|---|
| 3983 | s = N.sum(N.ravel(a)) |
|---|
| 3984 | elif type(dimension) in [IntType,FloatType]: |
|---|
| 3985 | s = N.add.reduce(a, dimension) |
|---|
| 3986 | if keepdims == 1: |
|---|
| 3987 | shp = list(a.shape) |
|---|
| 3988 | shp[dimension] = 1 |
|---|
| 3989 | s = N.reshape(s,shp) |
|---|
| 3990 | else: # must be a SEQUENCE of dims to sum over |
|---|
| 3991 | dims = list(dimension) |
|---|
| 3992 | dims.sort() |
|---|
| 3993 | dims.reverse() |
|---|
| 3994 | s = a *1.0 |
|---|
| 3995 | for dim in dims: |
|---|
| 3996 | s = N.add.reduce(s,dim) |
|---|
| 3997 | if keepdims == 1: |
|---|
| 3998 | shp = list(a.shape) |
|---|
| 3999 | for dim in dims: |
|---|
| 4000 | shp[dim] = 1 |
|---|
| 4001 | s = N.reshape(s,shp) |
|---|
| 4002 | return s |
|---|
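| | # Illustrative example (integer input is up-cast to floats):
|---|
| | #     a = N.array([[1,2],[3,4]])
|---|
| | #     asum(a)                    # --> 10.0        (all elements)
|---|
| | #     asum(a, 0)                 # --> [ 4., 6.]   (collapse dimension 0)
|---|
| | #     asum(a, 0, keepdims=1)     # --> [[ 4., 6.]] (same, but still 2-D)
|---|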
| 4003 | |
|---|
| 4004 | |
|---|
| 4005 | def acumsum (a,dimension=None): |
|---|
| 4006 | """ |
|---|
| 4007 | Returns an array consisting of the cumulative sum of the items in the |
|---|
| 4008 | passed array. Dimension can equal None (ravel array first), an |
|---|
| 4009 | integer (the dimension over which to operate), or a sequence (operate |
|---|
| 4010 | over multiple dimensions, but this last one just barely makes sense). |
|---|
| 4011 | |
|---|
| 4012 | Usage: acumsum(a,dimension=None) |
|---|
| 4013 | """ |
|---|
| 4014 | if dimension == None: |
|---|
| 4015 | a = N.ravel(a) |
|---|
| 4016 | dimension = 0 |
|---|
| 4017 | if type(dimension) in [ListType, TupleType, N.ArrayType]: |
|---|
| 4018 | dimension = list(dimension) |
|---|
| 4019 | dimension.sort() |
|---|
| 4020 | dimension.reverse() |
|---|
| 4021 | for d in dimension: |
|---|
| 4022 | a = N.add.accumulate(a,d) |
|---|
| 4023 | return a |
|---|
| 4024 | else: |
|---|
| 4025 | return N.add.accumulate(a,dimension) |
|---|
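| | # Illustrative example:
|---|
| | #     acumsum(N.array([1., 2., 3.]))        # --> [1., 3., 6.]
|---|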
| 4026 | |
|---|
| 4027 | |
|---|
| 4028 | def ass(inarray, dimension=None, keepdims=0): |
|---|
| 4029 | """ |
|---|
| 4030 | Squares each value in the passed array, adds these squares & returns |
|---|
| 4031 | the result. Unfortunate function name. :-) Defaults to ALL values in |
|---|
| 4032 | the array. Dimension can equal None (ravel array first), an integer |
|---|
| 4033 | (the dimension over which to operate), or a sequence (operate over |
|---|
| 4034 | multiple dimensions). Set keepdims=1 to maintain the original number |
|---|
| 4035 | of dimensions. |
|---|
| 4036 | |
|---|
| 4037 | Usage: ass(inarray, dimension=None, keepdims=0) |
|---|
| 4038 | Returns: sum-along-'dimension' for (inarray*inarray) |
|---|
| 4039 | """ |
|---|
| 4040 | if dimension == None: |
|---|
| 4041 | inarray = N.ravel(inarray) |
|---|
| 4042 | dimension = 0 |
|---|
| 4043 | return asum(inarray*inarray,dimension,keepdims) |
|---|
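| | # Illustrative example:
|---|
| | #     ass(N.array([1., 2., 3.]))            # --> 14.0  (1+4+9)
|---|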
| 4044 | |
|---|
| 4045 | |
|---|
| 4046 | def asummult (array1,array2,dimension=None,keepdims=0): |
|---|
| 4047 | """ |
|---|
| 4048 | Multiplies elements in array1 and array2, element by element, and |
|---|
| 4049 | returns the sum (along 'dimension') of all resulting multiplications. |
|---|
| 4050 | Dimension can equal None (ravel array first), an integer (the |
|---|
| 4051 | dimension over which to operate), or a sequence (operate over multiple |
|---|
| 4052 | dimensions). A trivial function, but included for completeness. |
|---|
| 4053 | |
|---|
| 4054 | Usage: asummult(array1,array2,dimension=None,keepdims=0) |
|---|
| 4055 | """ |
|---|
| 4056 | if dimension == None: |
|---|
| 4057 | array1 = N.ravel(array1) |
|---|
| 4058 | array2 = N.ravel(array2) |
|---|
| 4059 | dimension = 0 |
|---|
| 4060 | return asum(array1*array2,dimension,keepdims) |
|---|
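| | # Illustrative example:
|---|
| | #     asummult(N.array([1.,2.,3.]), N.array([4.,5.,6.]))   # --> 32.0
|---|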
| 4061 | |
|---|
| 4062 | |
|---|
| 4063 | def asquare_of_sums(inarray, dimension=None, keepdims=0): |
|---|
| 4064 | """ |
|---|
| 4065 | Adds the values in the passed array, squares that sum, and returns the |
|---|
| 4066 | result. Dimension can equal None (ravel array first), an integer (the |
|---|
| 4067 | dimension over which to operate), or a sequence (operate over multiple |
|---|
| 4068 | dimensions). If keepdims=1, the returned array will have the same |
|---|
| 4069 | NUMBER of dimensions as the original. |
|---|
| 4070 | |
|---|
| 4071 | Usage: asquare_of_sums(inarray, dimension=None, keepdims=0) |
|---|
| 4072 | Returns: the square of the sum over dim(s) in dimension |
|---|
| 4073 | """ |
|---|
| 4074 | if dimension == None: |
|---|
| 4075 | inarray = N.ravel(inarray) |
|---|
| 4076 | dimension = 0 |
|---|
| 4077 | s = asum(inarray,dimension,keepdims) |
|---|
| 4078 | if type(s) == N.ArrayType: |
|---|
| 4079 | return s.astype(N.Float)*s |
|---|
| 4080 | else: |
|---|
| 4081 | return float(s)*s |
|---|
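| | # Illustrative example:
|---|
| | #     asquare_of_sums(N.array([1., 2., 3.]))   # --> 36.0, i.e. (1+2+3)**2
|---|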
| 4082 | |
|---|
| 4083 | |
|---|
| 4084 | def asumdiffsquared(a,b, dimension=None, keepdims=0): |
|---|
| 4085 | """ |
|---|
| 4086 | Takes pairwise differences of the values in arrays a and b, squares |
|---|
| 4087 | these differences, and returns the sum of these squares. Dimension |
|---|
| 4088 | can equal None (ravel array first), an integer (the dimension over |
|---|
| 4089 | which to operate), or a sequence (operate over multiple dimensions). |
|---|
| 4090 | If keepdims=1, the returned array has the same number of dims as a and b.
|---|
| 4091 |
|---|
| 4092 | Usage: asumdiffsquared(a,b,dimension=None,keepdims=0)
|---|
| 4093 | Returns: sum[ravel(a-b)**2] |
|---|
| 4094 | """ |
|---|
| 4095 | if dimension == None: |
|---|
| 4096 | a = N.ravel(a)
|---|
| | b = N.ravel(b)
|---|
| 4097 | dimension = 0 |
|---|
| 4098 | return asum((a-b)**2,dimension,keepdims) |
|---|
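| | # Illustrative example:
|---|
| | #     asumdiffsquared(N.array([1.,2.,3.]), N.array([1.,1.,1.]))   # --> 5.0 (0+1+4)
|---|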
| 4099 | |
|---|
| 4100 | |
|---|
| 4101 | def ashellsort(inarray): |
|---|
| 4102 | """ |
|---|
| 4103 | Shellsort algorithm. Sorts a 1D-array. |
|---|
| 4104 | |
|---|
| 4105 | Usage: ashellsort(inarray) |
|---|
| 4106 | Returns: sorted-inarray, sorting-index-vector (for original array) |
|---|
| 4107 | """ |
|---|
| 4108 | n = len(inarray) |
|---|
| 4109 | svec = inarray *1.0 |
|---|
| 4110 | ivec = range(n) |
|---|
| 4111 | gap = n/2 # integer division needed |
|---|
| 4112 | while gap >0: |
|---|
| 4113 | for i in range(gap,n): |
|---|
| 4114 | for j in range(i-gap,-1,-gap): |
|---|
| 4115 | while j>=0 and svec[j]>svec[j+gap]: |
|---|
| 4116 | temp = svec[j] |
|---|
| 4117 | svec[j] = svec[j+gap] |
|---|
| 4118 | svec[j+gap] = temp |
|---|
| 4119 | itemp = ivec[j] |
|---|
| 4120 | ivec[j] = ivec[j+gap] |
|---|
| 4121 | ivec[j+gap] = itemp |
|---|
| 4122 | gap = gap / 2 # integer division needed |
|---|
| 4123 | # svec is now sorted input vector, ivec has the order svec[i] = vec[ivec[i]] |
|---|
| 4124 | return svec, ivec |
|---|
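| | # Illustrative example:
|---|
| | #     ashellsort(N.array([3., 1., 2.]))     # --> ([1., 2., 3.], [1, 2, 0])
|---|
| | #     # i.e., sorted values plus the index vector with svec[i] == inarray[ivec[i]]
|---|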
| 4125 | |
|---|
| 4126 | |
|---|
| 4127 | def arankdata(inarray): |
|---|
| 4128 | """ |
|---|
| 4129 | Ranks the data in inarray, dealing with ties appropriately. Assumes
|---|
| 4130 | a 1D inarray. Adapted from Gary Perlman's |Stat ranksort. |
|---|
| 4131 | |
|---|
| 4132 | Usage: arankdata(inarray) |
|---|
| 4133 | Returns: array of length equal to inarray, containing rank scores |
|---|
| 4134 | """ |
|---|
| 4135 | n = len(inarray) |
|---|
| 4136 | svec, ivec = ashellsort(inarray) |
|---|
| 4137 | sumranks = 0 |
|---|
| 4138 | dupcount = 0 |
|---|
| 4139 | newarray = N.zeros(n,N.Float) |
|---|
| 4140 | for i in range(n): |
|---|
| 4141 | sumranks = sumranks + i |
|---|
| 4142 | dupcount = dupcount + 1 |
|---|
| 4143 | if i==n-1 or svec[i] <> svec[i+1]: |
|---|
| 4144 | averank = sumranks / float(dupcount) + 1 |
|---|
| 4145 | for j in range(i-dupcount+1,i+1): |
|---|
| 4146 | newarray[ivec[j]] = averank |
|---|
| 4147 | sumranks = 0 |
|---|
| 4148 | dupcount = 0 |
|---|
| 4149 | return newarray |
|---|
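| | # Illustrative example (tied values share the average of their ranks):
|---|
| | #     arankdata(N.array([40., 10., 30., 10.]))   # --> [ 4., 1.5, 3., 1.5]
|---|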
| 4150 | |
|---|
| 4151 | |
|---|
| 4152 | def afindwithin(data): |
|---|
| 4153 | """ |
|---|
| 4154 | Returns a binary vector, 1=within-subject factor, 0=between. Input |
|---|
| 4155 | equals the entire data array (i.e., column 1=random factor, last |
|---|
| 4156 | column = measured values).
|---|
| 4157 | |
|---|
| 4158 | Usage: afindwithin(data) data in |Stat format |
|---|
| 4159 | """ |
|---|
| 4160 | numfact = len(data[0])-2 |
|---|
| 4161 | withinvec = [0]*numfact |
|---|
| 4162 | for col in range(1,numfact+1): |
|---|
| 4163 | rows = pstat.linexand(data,col,pstat.unique(pstat.colex(data,1))[0]) # get 1 level of this factor |
|---|
| 4164 | if len(pstat.unique(pstat.colex(rows,0))) < len(rows): # if fewer subjects than scores on this factor |
|---|
| 4165 | withinvec[col-1] = 1 |
|---|
| 4166 | return withinvec |
|---|
| 4167 | |
|---|
| 4168 | |
|---|
| 4169 | ######################################################### |
|---|
| 4170 | ######################################################### |
|---|
| 4171 | ###### RE-DEFINE DISPATCHES TO INCLUDE ARRAYS ######### |
|---|
| 4172 | ######################################################### |
|---|
| 4173 | ######################################################### |
|---|
| 4174 | |
|---|
| 4175 | ## CENTRAL TENDENCY: |
|---|
| 4176 | geometricmean = Dispatch ( (lgeometricmean, (ListType, TupleType)), |
|---|
| 4177 | (ageometricmean, (N.ArrayType,)) ) |
|---|
| 4178 | harmonicmean = Dispatch ( (lharmonicmean, (ListType, TupleType)), |
|---|
| 4179 | (aharmonicmean, (N.ArrayType,)) ) |
|---|
| 4180 | mean = Dispatch ( (lmean, (ListType, TupleType)), |
|---|
| 4181 | (amean, (N.ArrayType,)) ) |
|---|
| 4182 | median = Dispatch ( (lmedian, (ListType, TupleType)), |
|---|
| 4183 | (amedian, (N.ArrayType,)) ) |
|---|
| 4184 | medianscore = Dispatch ( (lmedianscore, (ListType, TupleType)), |
|---|
| 4185 | (amedianscore, (N.ArrayType,)) ) |
|---|
| 4186 | mode = Dispatch ( (lmode, (ListType, TupleType)), |
|---|
| 4187 | (amode, (N.ArrayType,)) ) |
|---|
| 4188 | tmean = Dispatch ( (atmean, (N.ArrayType,)) ) |
|---|
| 4189 | tvar = Dispatch ( (atvar, (N.ArrayType,)) ) |
|---|
| 4190 | tstdev = Dispatch ( (atstdev, (N.ArrayType,)) ) |
|---|
| 4191 | tsem = Dispatch ( (atsem, (N.ArrayType,)) ) |
|---|
| 4192 | |
|---|
| 4193 | ## VARIATION: |
|---|
| 4194 | moment = Dispatch ( (lmoment, (ListType, TupleType)), |
|---|
| 4195 | (amoment, (N.ArrayType,)) ) |
|---|
| 4196 | variation = Dispatch ( (lvariation, (ListType, TupleType)), |
|---|
| 4197 | (avariation, (N.ArrayType,)) ) |
|---|
| 4198 | skew = Dispatch ( (lskew, (ListType, TupleType)), |
|---|
| 4199 | (askew, (N.ArrayType,)) ) |
|---|
| 4200 | kurtosis = Dispatch ( (lkurtosis, (ListType, TupleType)), |
|---|
| 4201 | (akurtosis, (N.ArrayType,)) ) |
|---|
| 4202 | describe = Dispatch ( (ldescribe, (ListType, TupleType)), |
|---|
| 4203 | (adescribe, (N.ArrayType,)) ) |
|---|
| 4204 | |
|---|
| 4205 | ## DISTRIBUTION TESTS |
|---|
| 4206 | |
|---|
| 4207 | skewtest = Dispatch ( (askewtest, (ListType, TupleType)), |
|---|
| 4208 | (askewtest, (N.ArrayType,)) ) |
|---|
| 4209 | kurtosistest = Dispatch ( (akurtosistest, (ListType, TupleType)), |
|---|
| 4210 | (akurtosistest, (N.ArrayType,)) ) |
|---|
| 4211 | normaltest = Dispatch ( (anormaltest, (ListType, TupleType)), |
|---|
| 4212 | (anormaltest, (N.ArrayType,)) ) |
|---|
| 4213 | |
|---|
| 4214 | ## FREQUENCY STATS: |
|---|
| 4215 | itemfreq = Dispatch ( (litemfreq, (ListType, TupleType)), |
|---|
| 4216 | (aitemfreq, (N.ArrayType,)) ) |
|---|
| 4217 | scoreatpercentile = Dispatch ( (lscoreatpercentile, (ListType, TupleType)), |
|---|
| 4218 | (ascoreatpercentile, (N.ArrayType,)) ) |
|---|
| 4219 | percentileofscore = Dispatch ( (lpercentileofscore, (ListType, TupleType)), |
|---|
| 4220 | (apercentileofscore, (N.ArrayType,)) ) |
|---|
| 4221 | histogram = Dispatch ( (lhistogram, (ListType, TupleType)), |
|---|
| 4222 | (ahistogram, (N.ArrayType,)) ) |
|---|
| 4223 | cumfreq = Dispatch ( (lcumfreq, (ListType, TupleType)), |
|---|
| 4224 | (acumfreq, (N.ArrayType,)) ) |
|---|
| 4225 | relfreq = Dispatch ( (lrelfreq, (ListType, TupleType)), |
|---|
| 4226 | (arelfreq, (N.ArrayType,)) ) |
|---|
| 4227 | |
|---|
| 4228 | ## VARIABILITY: |
|---|
| 4229 | obrientransform = Dispatch ( (lobrientransform, (ListType, TupleType)), |
|---|
| 4230 | (aobrientransform, (N.ArrayType,)) ) |
|---|
| 4231 | samplevar = Dispatch ( (lsamplevar, (ListType, TupleType)), |
|---|
| 4232 | (asamplevar, (N.ArrayType,)) ) |
|---|
| 4233 | samplestdev = Dispatch ( (lsamplestdev, (ListType, TupleType)), |
|---|
| 4234 | (asamplestdev, (N.ArrayType,)) ) |
|---|
| 4235 | signaltonoise = Dispatch( (asignaltonoise, (N.ArrayType,)),) |
|---|
| 4236 | var = Dispatch ( (lvar, (ListType, TupleType)), |
|---|
| 4237 | (avar, (N.ArrayType,)) ) |
|---|
| 4238 | stdev = Dispatch ( (lstdev, (ListType, TupleType)), |
|---|
| 4239 | (astdev, (N.ArrayType,)) ) |
|---|
| 4240 | sterr = Dispatch ( (lsterr, (ListType, TupleType)), |
|---|
| 4241 | (asterr, (N.ArrayType,)) ) |
|---|
| 4242 | sem = Dispatch ( (lsem, (ListType, TupleType)), |
|---|
| 4243 | (asem, (N.ArrayType,)) ) |
|---|
| 4244 | z = Dispatch ( (lz, (ListType, TupleType)), |
|---|
| 4245 | (az, (N.ArrayType,)) ) |
|---|
| 4246 | zs = Dispatch ( (lzs, (ListType, TupleType)), |
|---|
| 4247 | (azs, (N.ArrayType,)) ) |
|---|
| 4248 | |
|---|
| 4249 | ## TRIMMING FCNS: |
|---|
| 4250 | threshold = Dispatch( (athreshold, (N.ArrayType,)),) |
|---|
| 4251 | trimboth = Dispatch ( (ltrimboth, (ListType, TupleType)), |
|---|
| 4252 | (atrimboth, (N.ArrayType,)) ) |
|---|
| 4253 | trim1 = Dispatch ( (ltrim1, (ListType, TupleType)), |
|---|
| 4254 | (atrim1, (N.ArrayType,)) ) |
|---|
| 4255 | |
|---|
| 4256 | ## CORRELATION FCNS: |
|---|
| 4257 | paired = Dispatch ( (lpaired, (ListType, TupleType)), |
|---|
| 4258 | (apaired, (N.ArrayType,)) ) |
|---|
| 4259 | pearsonr = Dispatch ( (lpearsonr, (ListType, TupleType)), |
|---|
| 4260 | (apearsonr, (N.ArrayType,)) ) |
|---|
| 4261 | spearmanr = Dispatch ( (lspearmanr, (ListType, TupleType)), |
|---|
| 4262 | (aspearmanr, (N.ArrayType,)) ) |
|---|
| 4263 | pointbiserialr = Dispatch ( (lpointbiserialr, (ListType, TupleType)), |
|---|
| 4264 | (apointbiserialr, (N.ArrayType,)) ) |
|---|
| 4265 | kendalltau = Dispatch ( (lkendalltau, (ListType, TupleType)), |
|---|
| 4266 | (akendalltau, (N.ArrayType,)) ) |
|---|
| 4267 | linregress = Dispatch ( (llinregress, (ListType, TupleType)), |
|---|
| 4268 | (alinregress, (N.ArrayType,)) ) |
|---|
| 4269 | |
|---|
| 4270 | ## INFERENTIAL STATS: |
|---|
| 4271 | ttest_1samp = Dispatch ( (lttest_1samp, (ListType, TupleType)), |
|---|
| 4272 | (attest_1samp, (N.ArrayType,)) ) |
|---|
| 4273 | ttest_ind = Dispatch ( (lttest_ind, (ListType, TupleType)), |
|---|
| 4274 | (attest_ind, (N.ArrayType,)) ) |
|---|
| 4275 | ttest_rel = Dispatch ( (lttest_rel, (ListType, TupleType)), |
|---|
| 4276 | (attest_rel, (N.ArrayType,)) ) |
|---|
| 4277 | chisquare = Dispatch ( (lchisquare, (ListType, TupleType)), |
|---|
| 4278 | (achisquare, (N.ArrayType,)) ) |
|---|
| 4279 | ks_2samp = Dispatch ( (lks_2samp, (ListType, TupleType)), |
|---|
| 4280 | (aks_2samp, (N.ArrayType,)) ) |
|---|
| 4281 | mannwhitneyu = Dispatch ( (lmannwhitneyu, (ListType, TupleType)), |
|---|
| 4282 | (amannwhitneyu, (N.ArrayType,)) ) |
|---|
| 4283 | tiecorrect = Dispatch ( (ltiecorrect, (ListType, TupleType)), |
|---|
| 4284 | (atiecorrect, (N.ArrayType,)) ) |
|---|
| 4285 | ranksums = Dispatch ( (lranksums, (ListType, TupleType)), |
|---|
| 4286 | (aranksums, (N.ArrayType,)) ) |
|---|
| 4287 | wilcoxont = Dispatch ( (lwilcoxont, (ListType, TupleType)), |
|---|
| 4288 | (awilcoxont, (N.ArrayType,)) ) |
|---|
| 4289 | kruskalwallish = Dispatch ( (lkruskalwallish, (ListType, TupleType)), |
|---|
| 4290 | (akruskalwallish, (N.ArrayType,)) ) |
|---|
| 4291 | friedmanchisquare = Dispatch ( (lfriedmanchisquare, (ListType, TupleType)), |
|---|
| 4292 | (afriedmanchisquare, (N.ArrayType,)) ) |
|---|
| 4293 | |
|---|
| 4294 | ## PROBABILITY CALCS: |
|---|
| 4295 | chisqprob = Dispatch ( (lchisqprob, (IntType, FloatType)), |
|---|
| 4296 | (achisqprob, (N.ArrayType,)) ) |
|---|
| 4297 | zprob = Dispatch ( (lzprob, (IntType, FloatType)), |
|---|
| 4298 | (azprob, (N.ArrayType,)) ) |
|---|
| 4299 | ksprob = Dispatch ( (lksprob, (IntType, FloatType)), |
|---|
| 4300 | (aksprob, (N.ArrayType,)) ) |
|---|
| 4301 | fprob = Dispatch ( (lfprob, (IntType, FloatType)), |
|---|
| 4302 | (afprob, (N.ArrayType,)) ) |
|---|
| 4303 | betacf = Dispatch ( (lbetacf, (IntType, FloatType)), |
|---|
| 4304 | (abetacf, (N.ArrayType,)) ) |
|---|
| 4305 | betai = Dispatch ( (lbetai, (IntType, FloatType)), |
|---|
| 4306 | (abetai, (N.ArrayType,)) ) |
|---|
| 4307 | erfcc = Dispatch ( (lerfcc, (IntType, FloatType)), |
|---|
| 4308 | (aerfcc, (N.ArrayType,)) ) |
|---|
| 4309 | gammln = Dispatch ( (lgammln, (IntType, FloatType)), |
|---|
| 4310 | (agammln, (N.ArrayType,)) ) |
|---|
| 4311 | |
|---|
| 4312 | ## ANOVA FUNCTIONS: |
|---|
| 4313 | F_oneway = Dispatch ( (lF_oneway, (ListType, TupleType)), |
|---|
| 4314 | (aF_oneway, (N.ArrayType,)) ) |
|---|
| 4315 | F_value = Dispatch ( (lF_value, (ListType, TupleType)), |
|---|
| 4316 | (aF_value, (N.ArrayType,)) ) |
|---|
| 4317 | |
|---|
| 4318 | ## SUPPORT FUNCTIONS: |
|---|
| 4319 | incr = Dispatch ( (lincr, (ListType, TupleType, N.ArrayType)), ) |
|---|
| 4320 | sum = Dispatch ( (lsum, (ListType, TupleType)), |
|---|
| 4321 | (asum, (N.ArrayType,)) ) |
|---|
| 4322 | cumsum = Dispatch ( (lcumsum, (ListType, TupleType)), |
|---|
| 4323 | (acumsum, (N.ArrayType,)) ) |
|---|
| 4324 | ss = Dispatch ( (lss, (ListType, TupleType)), |
|---|
| 4325 | (ass, (N.ArrayType,)) ) |
|---|
| 4326 | summult = Dispatch ( (lsummult, (ListType, TupleType)), |
|---|
| 4327 | (asummult, (N.ArrayType,)) ) |
|---|
| 4328 | square_of_sums = Dispatch ( (lsquare_of_sums, (ListType, TupleType)), |
|---|
| 4329 | (asquare_of_sums, (N.ArrayType,)) ) |
|---|
| 4330 | sumdiffsquared = Dispatch ( (lsumdiffsquared, (ListType, TupleType)), |
|---|
| 4331 | (asumdiffsquared, (N.ArrayType,)) ) |
|---|
| 4332 | shellsort = Dispatch ( (lshellsort, (ListType, TupleType)), |
|---|
| 4333 | (ashellsort, (N.ArrayType,)) ) |
|---|
| 4334 | rankdata = Dispatch ( (lrankdata, (ListType, TupleType)), |
|---|
| 4335 | (arankdata, (N.ArrayType,)) ) |
|---|
| 4336 | findwithin = Dispatch ( (lfindwithin, (ListType, TupleType)), |
|---|
| 4337 | (afindwithin, (N.ArrayType,)) ) |
|---|
| 4338 | |
|---|
| 4339 | ###################### END OF NUMERIC FUNCTION BLOCK ##################### |
|---|
| 4340 | |
|---|
| 4341 | ###################### END OF STATISTICAL FUNCTIONS ###################### |
|---|
| 4342 | |
|---|
| 4343 | except ImportError: |
|---|
| 4344 | pass |
|---|