1 | <tool id="compute_p-values_max_variances_feature_occurrences_in_one_dataset_using_discrete_wavelet_transfom" name="Compute P-values and Max Variances for Feature Occurrences" version="1.0.0"> |
---|
2 | <description>in one dataset using Discrete Wavelet Transforms</description> |
---|
3 | |
---|
4 | <command interpreter="perl"> |
---|
5 | execute_dwt_var_perClass.pl $inputFile $outputFile1 $outputFile2 $outputFile3 |
---|
6 | </command> |
---|
7 | |
---|
8 | <inputs> |
---|
9 | <param format="tabular" name="inputFile" type="data" label="Select the input file"/> |
---|
10 | </inputs> |
---|
11 | |
---|
12 | <outputs> |
---|
13 | <data format="tabular" name="outputFile1"/> |
---|
14 | <data format="tabular" name="outputFile2"/> |
---|
15 | <data format="pdf" name="outputFile3"/> |
---|
16 | </outputs> |
---|
17 | |
---|
18 | <help> |
---|
19 | |
---|
20 | .. class:: infomark |
---|
21 | |
---|
22 | **What it does** |
---|
23 | |
---|
24 | This program generates plots and computes a table of maximum variances, p-values, and test orientations at multiple scales for the occurrences of a class of features in one dataset of DNA sequences, using multiscale wavelet analysis. |
---|
25 | |
---|
26 | The program assumes that the user has one set of DNA sequences, S, which consists of one or more sequences of equal length. Each sequence in S is divided into the same number of intervals, n, such that n = 2^k, where k is an integer and k >= 1. Thus, n can be any value in the set {2, 4, 8, 16, 32, 64, 128, ...}. k is the number of scales. |
---|
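As a quick check of the n = 2^k requirement, the following minimal Python sketch derives the number of scales k from the number of intervals n and rejects values that are not a power of two. The function name ``number_of_scales`` is hypothetical and is not part of the tool.

```python
def number_of_scales(n_intervals):
    """Return k such that n_intervals == 2**k, with k >= 1.

    Hypothetical helper for illustration; not part of the tool itself.
    """
    # For an exact power of two, bit_length() - 1 is the exponent.
    k = n_intervals.bit_length() - 1
    if k == 0 or 2 ** k != n_intervals:
        raise ValueError(f"{n_intervals} intervals is not 2**k with k >= 1")
    return k


for n in (2, 16, 128):
    print(n, "intervals ->", number_of_scales(n), "scales")
```

An input with 12 intervals, for example, would raise ``ValueError`` in this sketch, because 12 is not a power of two.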
27 | |
---|
28 | The program has one input file obtained as follows: |
---|
29 | |
---|
30 | For a given set of features, say motifs, the user counts the number of occurrences of each feature in each interval of each sequence in S, and builds a tabular file with one column per feature and one line per interval. This is the input file of the program. |
---|
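One possible way to build such an input file is sketched below in Python, for a single sequence. It rests on assumptions not stated in this help: features are matched as plain substrings, occurrences are counted without overlap, and occurrences that span an interval boundary are ignored. How counts from several sequences in S are combined is not specified here, so the sketch does not attempt it. The names ``count_table`` and ``to_tabular`` and the toy motifs are hypothetical.

```python
def count_table(sequence, motifs, n_intervals):
    """Count each motif in each of n_intervals equal-length intervals.

    motifs maps a feature name to a plain substring (an assumption; real
    features may need a more elaborate matcher). Trailing bases that do not
    fill a whole interval are ignored.
    """
    size = len(sequence) // n_intervals
    rows = []
    for i in range(n_intervals):
        chunk = sequence[i * size:(i + 1) * size]
        # str.count counts non-overlapping occurrences inside the interval.
        rows.append([chunk.count(pattern) for pattern in motifs.values()])
    return rows


def to_tabular(motifs, rows):
    """Render a header line plus one tab-separated line per interval."""
    lines = ["\t".join(motifs)]
    lines.extend("\t".join(str(c) for c in row) for row in rows)
    return "\n".join(lines) + "\n"


# Toy data, for illustration only.
toy_motifs = {"acg": "ACG", "tt": "TT"}
toy_rows = count_table("ACGTACGTTTTTACGA", toy_motifs, 2)
print(to_tabular(toy_motifs, toy_rows), end="")
```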
31 | |
---|
32 | The program gives three output files: |
---|
33 | |
---|
34 | - The first output file is a TABULAR format file giving the scale at which each feature has its maximum variance. |
---|
35 | - The second output file is a TABULAR format file representing the variances, p-values, and test orientations for the occurrences of features at each scale, based on a random permutation test using multiscale wavelet analysis. |
---|
36 | - The third output file is a PDF file plotting the wavelet variances of each feature at each scale. |
---|
37 | |
---|
38 | ----- |
---|
39 | |
---|
40 | .. class:: warningmark |
---|
41 | |
---|
42 | **Note** |
---|
43 | |
---|
44 | - If the number of features is greater than 12, the program will divide each output file into subfiles, such that each subfile represents the results of a group of 12 features, except the last subfile, which will represent the results of the remaining features. For example, if the number of features is 17, the p-values file will consist of two subfiles, the first for features 1-12 and the second for features 13-17. The PDF file will consist of two pages in this case. |
---|
45 | - To obtain empirical p-values, the program implements a random permutation test. As a result, the program gives slightly different results each time it is run on the same input file. |
---|
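The run-to-run variation mentioned above is a general property of permutation tests. The Python sketch below illustrates it with a stand-in statistic, the mean of squared scale-1 Haar detail coefficients, and a one-sided comparison. Both are assumptions made for illustration: the statistic, the test orientation rule, and the number of permutations used by the tool are not described in this help. The ``seed`` parameter is also hypothetical and is not a tool option.

```python
import random


def scale1_wavelet_variance(counts):
    """Mean of squared scale-1 Haar detail coefficients (stand-in statistic)."""
    details = [(counts[i] - counts[i + 1]) / 2 ** 0.5
               for i in range(0, len(counts) - 1, 2)]
    return sum(d * d for d in details) / len(details)


def empirical_p_value(counts, n_permutations=1000, seed=None):
    """Fraction of random permutations whose statistic is >= the observed one."""
    rng = random.Random(seed)
    observed = scale1_wavelet_variance(counts)
    shuffled = list(counts)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # random reordering of the intervals
        if scale1_wavelet_variance(shuffled) >= observed:
            hits += 1
    return hits / n_permutations


# Counts of the first feature (deletionHoptspot) from the example input below.
counts = [226, 236, 242, 243, 244, 230, 275, 265,
          255, 281, 263, 280, 278, 248, 251, 239]
print(empirical_p_value(counts))           # may differ between runs
print(empirical_p_value(counts, seed=42))  # repeatable for a fixed seed
```

Because the permutations are drawn at random, two unseeded runs usually give slightly different p-values, and increasing the number of permutations reduces that spread.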
46 | |
---|
47 | ----- |
---|
48 | |
---|
49 | |
---|
50 | **Example** |
---|
51 | |
---|
52 | Counting the occurrences of 8 features (motifs) in 16 intervals (one line per interval) of a set of DNA sequences S gives the following tabular file:: |
---|
53 | |
---|
54 | deletionHoptspot insertionHoptspot dnaPolPauseFrameshift indelHotspot topoisomeraseCleavageSite translinTarget vDjRecombinationSignal x-likeSite |
---|
55 | 226 403 416 221 1165 832 749 1056 |
---|
56 | 236 444 380 241 1223 746 782 1207 |
---|
57 | 242 496 391 195 1116 643 770 1219 |
---|
58 | 243 429 364 191 1118 694 783 1223 |
---|
59 | 244 410 371 236 1063 692 805 1233 |
---|
60 | 230 386 370 217 1087 657 787 1215 |
---|
61 | 275 404 402 214 1044 697 831 1188 |
---|
62 | 265 443 365 231 1086 694 782 1184 |
---|
63 | 255 390 354 246 1114 642 773 1176 |
---|
64 | 281 384 406 232 1102 719 787 1191 |
---|
65 | 263 459 369 251 1135 643 810 1215 |
---|
66 | 280 433 400 251 1159 701 777 1151 |
---|
67 | 278 385 382 231 1147 697 707 1161 |
---|
68 | 248 393 389 211 1162 723 759 1183 |
---|
69 | 251 403 385 246 1114 752 776 1153 |
---|
70 | 239 383 347 227 1172 759 789 1141 |
---|
71 | |
---|
72 | We notice that the number of scales here is 4 because 16 = 2^4. Running the program on the above input file gives the following 3 output files: |
---|
73 | |
---|
74 | The first output file:: |
---|
75 | |
---|
76 | motifs max_var at scale |
---|
77 | deletionHoptspot NA |
---|
78 | insertionHoptspot NA |
---|
79 | dnaPolPauseFrameshift NA |
---|
80 | indelHotspot NA |
---|
81 | topoisomeraseCleavageSite 3 |
---|
82 | translinTarget NA |
---|
83 | vDjRecombinationSignal NA |
---|
84 | x.likeSite NA |
---|
85 | |
---|
86 | The second output file:: |
---|
87 | |
---|
88 | motif 1_var 1_pval 1_test 2_var 2_pval 2_test 3_var 3_pval 3_test 4_var 4_pval 4_test |
---|
89 | |
---|
90 | deletionHoptspot 0.457 0.048 L 1.18 0.334 R 1.61 0.194 R 3.41 0.055 R |
---|
91 | insertionHoptspot 0.556 0.109 L 1.34 0.272 R 1.59 0.223 R 2.02 0.157 R |
---|
92 | dnaPolPauseFrameshift 1.42 0.089 R 0.66 0.331 L 0.421 0.305 L 0.121 0.268 L |
---|
93 | indelHotspot 0.373 0.021 L 1.36 0.254 R 1.24 0.301 R 4.09 0.047 R |
---|
94 | topoisomeraseCleavageSite 0.305 0.002 L 0.936 0.489 R 3.78 0.01 R 1.25 0.272 R |
---|
95 | translinTarget 0.525 0.061 L 1.69 0.11 R 2.02 0.131 R 0.00891 0.069 L |
---|
96 | vDjRecombinationSignal 0.68 0.138 L 0.957 0.46 R 2.35 0.071 R 1.03 0.357 R |
---|
97 | x.likeSite 0.928 0.402 L 1.33 0.261 R 0.735 0.431 L 0.783 0.422 R |
---|
98 | |
---|
99 | The third output file: |
---|
100 | |
---|
101 | .. image:: ../static/operation_icons/dwt_var_perClass.png |
---|
102 | |
---|
103 | </help> |
---|
104 | |
---|
105 | </tool> |
---|