<tool id="Grep1" name="Select" version="1.0.1">
  <description>lines that match an expression</description>
  <command interpreter="python">grep.py -i $input -o $out_file1 -pattern '$pattern' -v $invert</command>
  <inputs>
    <param format="txt" name="input" type="data" label="Select lines from"/>
    <param name="invert" type="select" label="that">
      <option value="false">Matching</option>
      <option value="true">NOT Matching</option>
    </param>
    <param name="pattern" size="40" type="text" value="^chr([0-9A-Za-z])+" label="the pattern" help="Enter plain text or a regular expression (see the syntax notes below).">
      <sanitizer>
        <valid initial="string.printable">
          <remove value="'"/>
        </valid>
        <mapping initial="none">
          <add source="'" target="__sq__"/>
        </mapping>
      </sanitizer>
    </param>
  </inputs>
  <outputs>
    <data format="input" name="out_file1" metadata_source="input"/>
  </outputs>
  <tests>
    <test>
      <param name="input" value="1.bed"/>
      <param name="invert" value="false"/>
      <param name="pattern" value="^chr[0-9]*"/>
      <output name="out_file1" file="fs-grep.dat"/>
    </test>
  </tests>
  <help>

.. class:: infomark

**TIP:** If your data is not TAB delimited, use *Text Manipulation->Convert*.

-----

**Syntax**

The Select tool searches the data for lines that contain a match to the given pattern or, when **NOT Matching** is chosen, for lines that do not. The pattern can be plain text or a regular expression: a sequence of ordinary and special characters that describes a set of strings.
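
The Matching / NOT Matching choice can be pictured with a short Python sketch. It is an illustration under stated assumptions, not the tool's implementation: *select_lines* is a hypothetical helper rather than code from grep.py, and it uses Python's re module, whose regular-expression dialect may differ from the engine the wrapper actually calls.

```python
import re
from typing import Iterable, Iterator


def select_lines(lines: Iterable[str], pattern: str, invert: bool = False) -> Iterator[str]:
    """Yield the lines that contain a match to pattern, or, when invert is True,
    the lines that do not. Hypothetical helper for illustration only."""
    regex = re.compile(pattern)
    for line in lines:
        # search() scans the whole line, so the match may start anywhere
        # unless the pattern is anchored with ^ or $.
        found = regex.search(line) is not None
        # Keep the line when "found" and "invert" differ: found and not
        # inverted (Matching), or not found and inverted (NOT Matching).
        if found != invert:
            yield line


bed = ["chr1\t100\t200\n", "track name=demo\n", "chrX\t5\t50\n"]
print(list(select_lines(bed, r"^chr([0-9A-Za-z])+")))
print(list(select_lines(bed, r"^chr([0-9A-Za-z])+", invert=True)))
```

Because re.search looks for a match anywhere in the line, a pattern without **^** or **$** selects every line that merely contains it.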

- **( ) { } [ ] . * ? + \\ ^ $ \|** are all special characters. **\\** can be used to "escape" a special character so that it is matched literally.
- **\\A** matches only at the beginning of a string (not at the start of an internal line).
- **\\d** matches a digit, same as [0-9].
- **\\D** matches a non-digit.
- **\\s** matches a whitespace character.
- **\\S** matches anything BUT a whitespace character.
- **\\t** matches a tab.
- **\\w** matches a word character: a letter, a digit, or an underscore.
- **\\W** matches anything but a word character.
- **(** .. **)** groups a particular pattern.
- **\\Z** matches only at the end of a string (not at the end of an internal line).
- **{** n or n, or n,m **}** specifies an expected number of repetitions of the preceding pattern.

  - **{n}** The preceding item is matched exactly n times.
  - **{n,}** The preceding item is matched n or more times.
  - **{n,m}** The preceding item is matched at least n times but not more than m times.

- **[** ... **]** creates a character class. Within the brackets, single characters can be placed. A dash (-) may be used to indicate a range such as **a-z**.
- **.** matches any single character except a newline.
- **\*** The preceding item will be matched zero or more times.
- **?** The preceding item is optional and matched at most once.
- **+** The preceding item will be matched one or more times.
- **^** has two meanings:

  - matches the beginning of a line or string.
  - indicates negation in a character class. For example, [^...] matches any single character except the ones inside the brackets.

- **$** matches the end of a line or string.
- **\|** separates alternate possibilities.
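
The special characters above can be checked one by one against Python's re module, as in the following sketch. The patterns and sample strings are made up for illustration, and the script behind this tool may pass the pattern to a different regex engine, so support for some of the backslash escapes listed above is an assumption to verify rather than a guarantee.

```python
import re

# (pattern, sample text, should re.search find a match?)
# Patterns follow Python's re module; the tool's own engine may differ.
cases = [
    (r"\d+", "chr12", True),            # \d: a digit
    (r"^\D+$", "chrX", True),           # \D: only non-digits here
    (r"\s", "chr1\t100", True),         # \s: whitespace (a tab counts)
    (r"\t", "chr1 100", False),         # \t: a tab only, not a space
    (r"^\w+$", "gene_1", True),         # \w: letters, digits and underscore
    (r"\W", "gene_1", False),           # \W: no non-word character present
    (r"\Achr", "chr1", True),           # \A: start of the string
    (r"chr1\Z", "chr1", True),          # \Z: end of the string
    (r"(AC){2}", "ACAC", True),         # {n}: exactly n repeats of the group
    (r"^A{2,}$", "A", False),           # {n,}: at least n repeats
    (r"^GA{1,3}T$", "GAAAAT", False),   # {n,m}: four A's exceed m = 3
    (r"^chr[0-9]$", "chrX", False),     # [...]: class with the range 0-9
    (r"^chr[^0-9]$", "chrX", True),     # [^...]: negated class
    (r"^c.r$", "cxr", True),            # . : any single character
    (r"^AT*G$", "AG", True),            # * : zero or more
    (r"^colou?r$", "color", True),      # ? : optional
    (r"^AT+G$", "AG", False),           # + : one or more
    (r"end$", "the end", True),         # $ : end of the line
    (r"cat|dog", "hotdog", True),       # | : alternation
    (r"1\.5", "1x5", False),            # \. : an escaped dot is literal
]

for pattern, text, expected in cases:
    found = re.search(pattern, text) is not None
    # Stop with the offending pattern if re disagrees with the table.
    assert found == expected, (pattern, text)
    print(f"{pattern!r:16} {text!r:12} {'match' if found else 'no match'}")
```

Each row pairs one construct with a sample line; the assertion fails and reports the pattern if Python's re behaves differently from the expected result.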

-----

**Examples**

- **^chr([0-9A-Za-z])+** would match lines that begin with a chromosome name such as chr1 or chrX, for example the data lines of a BED format file.
- **(ACGT){1,5}** would match at least 1 and at most 5 consecutive copies of "ACGT".
- **([0-9]{1,3})(,[0-9]{3})\*** would match a large integer that is properly separated with commas, such as 23,078,651.
- **(abc)|(def)** would match either "abc" or "def".
- **^\\s\*#** would match any line that is a comment, that is, any line whose first non-whitespace character is "#".
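
Some of the examples can be tried on a few invented lines with Python's re.search, which, like this tool, selects a line when the pattern occurs anywhere in it. The sample lines are made up for illustration, and Python's re dialect is assumed to behave like the tool's engine for these simple patterns.

```python
import re

lines = [
    "chr7\t127471196\t127472363\tPos1",   # BED-style data line
    "browser position chr7:127471196",     # header line, does not start with chr
    "ACGTACGT",                            # two consecutive copies
    "ACGT" * 6,                            # six consecutive copies
    "xxdefxx",
]

# ^chr([0-9A-Za-z])+ : lines that start with a chromosome name such as chr7.
chrom = [ln for ln in lines if re.search(r"^chr([0-9A-Za-z])+", ln)]

# (ACGT){1,5} : the line only has to contain one to five consecutive copies,
# so a line with six copies is still selected.
repeats = [ln for ln in lines if re.search(r"(ACGT){1,5}", ln)]

# Anchoring with ^ and $ makes the bound apply to the whole line.
capped = [ln for ln in lines if re.search(r"^(ACGT){1,5}$", ln)]

# (abc)|(def) : either alternative anywhere in the line.
either = [ln for ln in lines if re.search(r"(abc)|(def)", ln)]

print(chrom)
print(repeats)
print(capped)
print(either)
```

The **(ACGT){1,5}** case shows that the upper bound limits only the length of the match, not the line: a line with six copies still contains five. Adding **^** and **$** makes the bound apply to the whole line.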
  </help>
</tool>