<tool id="compute_p-values_max_variances_feature_occurrences_in_one_dataset_using_discrete_wavelet_transfom" name="Compute P-values and Max Variances for Feature Occurrences" version="1.0.0">
  <description>in one dataset using Discrete Wavelet Transforms</description>

  <command interpreter="perl">
    execute_dwt_var_perClass.pl $inputFile $outputFile1 $outputFile2 $outputFile3
  </command>

  <inputs>
    <param format="tabular" name="inputFile" type="data" label="Select the input file"/>
  </inputs>

  <outputs>
    <data format="tabular" name="outputFile1"/>
    <data format="tabular" name="outputFile2"/>
    <data format="pdf" name="outputFile3"/>
  </outputs>

  <help>

.. class:: infomark

**What it does**

This program uses multiscale wavelet analysis to study the occurrences of a class of features in one dataset of DNA sequences. It generates plots and computes tables of maximum variances, p-values, and test orientations at multiple scales.

The program assumes that the user has one set of DNA sequences, S, consisting of one or more sequences of equal length. Each sequence in S is divided into the same number of intervals, n, such that n = 2^k, where k is an integer and k >= 1. Thus, n can be any value in the set {2, 4, 8, 16, 32, 64, 128, ...}. Here k is the number of scales.
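
As an illustration of the relation n = 2^k, the following minimal Python sketch recovers the number of scales from the number of intervals and rejects interval counts that are not a power of two. The function name ``number_of_scales`` is hypothetical and is not part of the program.

```python
def number_of_scales(n_intervals: int) -> int:
    """Return k such that n_intervals == 2**k with k >= 1.

    Hypothetical helper for illustration; it is not part of the tool.
    """
    # For a power of two, bit_length() - 1 is exactly the exponent.
    k = n_intervals.bit_length() - 1
    if k >= 1 and 2 ** k == n_intervals:
        return k
    raise ValueError(
        f"{n_intervals} intervals: expected a power of two (2, 4, 8, ...)"
    )
```

For example, ``number_of_scales(16)`` returns 4, while ``number_of_scales(12)`` raises ``ValueError``.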

The program takes one input file, obtained as follows:

For a given set of features, say motifs, the user counts the number of occurrences of each feature in each interval of each sequence in S, and builds a tabular file holding these counts, with one column per feature and one row per interval. This is the input file of the program.
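
One possible way to build such a file is sketched below in Python. This is only an illustration under several assumptions that the program itself does not state: motifs are treated as literal strings, matches are counted without overlap, matches that cross an interval boundary are ignored, and counts are summed over all sequences in S so that each interval yields one row. The names ``count_table`` and ``to_tabular`` are hypothetical.

```python
import csv
import io


def count_table(sequences, motifs, n_intervals):
    """Count motif occurrences per interval, summed over all sequences.

    sequences:   list of equal-length DNA strings (the set S)
    motifs:      dict mapping a feature name to a literal motif string
    n_intervals: number of intervals n (expected to be a power of two)
    Returns one row per interval, one column per motif.
    """
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    if length % n_intervals != 0:
        raise ValueError("sequence length must be divisible by n_intervals")
    width = length // n_intervals

    rows = []
    for i in range(n_intervals):
        row = []
        for motif in motifs.values():
            # str.count counts non-overlapping matches inside the window.
            total = sum(
                seq[i * width:(i + 1) * width].count(motif)
                for seq in sequences
            )
            row.append(total)
        rows.append(row)
    return rows


def to_tabular(names, rows):
    """Render a header line plus count rows as tab-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(names)
    writer.writerows(rows)
    return buf.getvalue()
```

The text returned by ``to_tabular`` can be written to a file and uploaded as the tabular input.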

The program produces three output files:

- The first output file is a TABULAR format file giving the scale at which each feature has its maximum variance.
- The second output file is a TABULAR format file giving the variance, p-value, and test orientation for the occurrences of each feature at each scale, based on a random permutation test and multiscale wavelet analysis.
- The third output file is a PDF file plotting the wavelet variance of each feature at each scale.

-----

.. class:: warningmark

**Note**

- If the number of features is greater than 12, the program divides each output file into subfiles, such that each subfile holds the results for a group of 12 features, except the last subfile, which holds the results for the remaining features. For example, if the number of features is 17, the p-values file will consist of two subfiles, the first for features 1-12 and the second for features 13-17. The PDF file will consist of two pages in this case.
- To obtain empirical p-values, the program runs a random permutation test. As a result, it gives slightly different results each time it is run on the same input file.
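
The run-to-run variation described in the note is a general property of permutation tests. The Python sketch below shows the idea for one feature and is not the program's actual implementation. It assumes a Haar wavelet, takes the variance at a scale to be the mean squared detail coefficient, picks the test orientation (L or R) by comparing the observed variance with the median of the permuted variances, and applies a +1 correction to the empirical p-value. All of these choices, and the function names, are assumptions made for illustration.

```python
import random
import statistics


def haar_variances(counts):
    """Return one variance per scale (finest scale first).

    Assumption: Haar pyramid, variance = mean squared detail coefficient.
    """
    n = len(counts)
    k = n.bit_length() - 1
    if k == 0 or n != 2 ** k:
        raise ValueError("need 2**k values with k >= 1")
    smooth = [float(c) for c in counts]
    variances = []
    while len(smooth) > 1:
        pairs = list(zip(smooth[0::2], smooth[1::2]))
        details = [(a - b) / 2 ** 0.5 for a, b in pairs]
        smooth = [(a + b) / 2 ** 0.5 for a, b in pairs]
        variances.append(sum(d * d for d in details) / len(details))
    return variances


def permutation_test(counts, n_perm=1000, seed=None):
    """Return (scale, variance, p_value, orientation) for each scale."""
    rng = random.Random(seed)  # seed=None gives different results per run
    observed = haar_variances(counts)
    shuffled = list(counts)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute the interval order
        null.append(haar_variances(shuffled))

    results = []
    for scale, obs in enumerate(observed):
        draws = [v[scale] for v in null]
        if obs >= statistics.median(draws):
            side = "R"  # observed variance is in the upper tail
            hits = sum(1 for d in draws if d >= obs)
        else:
            side = "L"  # observed variance is in the lower tail
            hits = sum(1 for d in draws if obs >= d)
        p_value = (hits + 1) / (n_perm + 1)
        results.append((scale + 1, obs, p_value, side))
    return results
```

Calling ``permutation_test`` twice with ``seed=None`` will generally give slightly different p-values, whereas passing the same fixed seed makes the sketch reproducible.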

-----


**Example**

Counting the occurrences of 8 features (motifs) in 16 intervals (one line per interval) of a set of DNA sequences S gives the following tabular file::

    deletionHoptspot  insertionHoptspot  dnaPolPauseFrameshift  indelHotspot  topoisomeraseCleavageSite  translinTarget  vDjRecombinationSignal  x-likeSite
    226               403                416                    221           1165                       832             749                     1056
    236               444                380                    241           1223                       746             782                     1207
    242               496                391                    195           1116                       643             770                     1219
    243               429                364                    191           1118                       694             783                     1223
    244               410                371                    236           1063                       692             805                     1233
    230               386                370                    217           1087                       657             787                     1215
    275               404                402                    214           1044                       697             831                     1188
    265               443                365                    231           1086                       694             782                     1184
    255               390                354                    246           1114                       642             773                     1176
    281               384                406                    232           1102                       719             787                     1191
    263               459                369                    251           1135                       643             810                     1215
    280               433                400                    251           1159                       701             777                     1151
    278               385                382                    231           1147                       697             707                     1161
    248               393                389                    211           1162                       723             759                     1183
    251               403                385                    246           1114                       752             776                     1153
    239               383                347                    227           1172                       759             789                     1141

Note that the number of scales here is 4 because 16 = 2^4. Running the program on the above input file gives the following three output files:

The first output file::

    motifs                     max_var at scale
    deletionHoptspot           NA
    insertionHoptspot          NA
    dnaPolPauseFrameshift      NA
    indelHotspot               NA
    topoisomeraseCleavageSite  3
    translinTarget             NA
    vDjRecombinationSignal     NA
    x.likeSite                 NA

The second output file::

    motif                      1_var    1_pval  1_test  2_var    2_pval  2_test  3_var    3_pval  3_test  4_var    4_pval  4_test

    deletionHoptspot           0.457    0.048   L       1.18     0.334   R       1.61     0.194   R       3.41     0.055   R
    insertionHoptspot          0.556    0.109   L       1.34     0.272   R       1.59     0.223   R       2.02     0.157   R
    dnaPolPauseFrameshift      1.42     0.089   R       0.66     0.331   L       0.421    0.305   L       0.121    0.268   L
    indelHotspot               0.373    0.021   L       1.36     0.254   R       1.24     0.301   R       4.09     0.047   R
    topoisomeraseCleavageSite  0.305    0.002   L       0.936    0.489   R       3.78     0.01    R       1.25     0.272   R
    translinTarget             0.525    0.061   L       1.69     0.11    R       2.02     0.131   R       0.00891  0.069   L
    vDjRecombinationSignal     0.68     0.138   L       0.957    0.46    R       2.35     0.071   R       1.03     0.357   R
    x.likeSite                 0.928    0.402   L       1.33     0.261   R       0.735    0.431   L       0.783    0.422   R

The third output file:

.. image:: ../static/operation_icons/dwt_var_perClass.png

  </help>

</tool>