| 1 | <tool id="Grep1" name="Select" version="1.0.1"> |
|---|
| 2 | <description>lines that match an expression</description> |
|---|
| 3 | <command interpreter="python">grep.py -i $input -o $out_file1 -pattern '$pattern' -v $invert</command> |
|---|
| 4 | <inputs> |
|---|
| 5 | <param format="txt" name="input" type="data" label="Select lines from"/> |
|---|
| 6 | <param name="invert" type="select" label="that"> |
|---|
| 7 | <option value="false">Matching</option> |
|---|
| 8 | <option value="true">NOT Matching</option> |
|---|
| 9 | </param> |
|---|
| 10 | <param name="pattern" size="40" type="text" value="^chr([0-9A-Za-z])+" label="the pattern" help="Enter plain text or a regular expression (for the syntax, see the lower part of this frame)"> |
|---|
| 11 | <sanitizer> |
|---|
| 12 | <valid initial="string.printable"> |
|---|
| 13 | <remove value="'"/> |
|---|
| 14 | </valid> |
|---|
| 15 | <mapping initial="none"> |
|---|
| 16 | <add source="'" target="__sq__"/> |
|---|
| 17 | </mapping> |
|---|
| 18 | </sanitizer> |
|---|
| 19 | </param> |
|---|
| 20 | </inputs> |
|---|
| 21 | <outputs> |
|---|
| 22 | <data format="input" name="out_file1" metadata_source="input"/> |
|---|
| 23 | </outputs> |
|---|
| 24 | <tests> |
|---|
| 25 | <test> |
|---|
| 26 | <param name="input" value="1.bed"/> |
|---|
| 27 | <param name="invert" value="false"/> |
|---|
| 28 | <param name="pattern" value="^chr[0-9]*"/> |
|---|
| 29 | <output name="out_file1" file="fs-grep.dat"/> |
|---|
| 30 | </test> |
|---|
| 31 | </tests> |
|---|
| 32 | <help> |
|---|
| 33 | |
|---|
| 34 | .. class:: infomark |
|---|
| 35 | |
|---|
| 36 | **TIP:** If your data is not TAB delimited, use *Text Manipulation->Convert* |
|---|
| 37 | |
|---|
| 38 | ----- |
|---|
| 39 | |
|---|
| 40 | **Syntax** |
|---|
| 41 | |
|---|
| 42 | The Select tool searches the data for lines that contain (or do not contain) a match to the given pattern. The pattern is a regular expression: a sequence of characters that describes the set of strings to search for. |
|---|
| 43 | |
|---|
| 44 | - **( ) { } [ ] . * ? + \ ^ $** are all special characters. **\\** can be used to "escape" a special character so that the character itself is searched for literally. |
|---|
| 45 | - **\\A** matches the beginning of a string (but not an internal line). |
|---|
| 46 | - **\\d** matches a digit, same as [0-9]. |
|---|
| 47 | - **\\D** matches a non-digit. |
|---|
| 48 | - **\\s** matches a whitespace character. |
|---|
| 49 | - **\\S** matches anything BUT a whitespace. |
|---|
| 50 | - **\\t** matches a tab. |
|---|
| 51 | - **\\w** matches an alphanumeric character or an underscore. |
|---|
| 52 | - **\\W** matches anything but an alphanumeric character or an underscore. |
|---|
| 53 | - **(** .. **)** groups a particular pattern. |
|---|
| 54 | - **\\Z** matches the end of a string (but not an internal line). |
|---|
| 55 | - **{** n or n, or n,m **}** specifies an expected number of repetitions of the preceding pattern. |
|---|
| 56 | |
|---|
| 57 | - **{n}** The preceding item is matched exactly n times. |
|---|
| 58 | - **{n,}** The preceding item is matched n or more times. |
|---|
| 59 | - **{n,m}** The preceding item is matched at least n times but not more than m times. |
|---|
| 60 | |
|---|
| 61 | - **[** ... **]** creates a character class, which matches any single character listed inside the brackets. A dash (-) may be used to indicate a range such as **a-z**. |
|---|
| 62 | - **.** Matches any single character except a newline. |
|---|
| 63 | - **\*** The preceding item will be matched zero or more times. |
|---|
| 64 | - **?** The preceding item is optional and matched at most once. |
|---|
| 65 | - **+** The preceding item will be matched one or more times. |
|---|
| 66 | - **^** has two meanings: |
|---|
| 67 | - matches the beginning of a line or string. |
|---|
| 68 | - indicates negation in a character class. For example, [^...] matches every character except the ones inside brackets. |
|---|
| 69 | - **$** matches the end of a line or string. |
|---|
| 70 | - **\|** Separates alternate possibilities. |
|---|
| 71 | |
|---|
| 72 | ----- |
|---|
| 73 | |
|---|
| 74 | **Example** |
|---|
| 75 | |
|---|
| 76 | - **^chr([0-9A-Za-z])+** would match lines that begin with a chromosome name (such as chr1 or chrX), as lines in a BED format file do. |
|---|
| 77 | - **(ACGT){1,5}** would match between 1 and 5 consecutive copies of "ACGT". |
|---|
| 78 | - **([^,][0-9]{1,3})(,[0-9]{3})\*** would match a large integer that is properly separated with commas such as 23,078,651. |
|---|
| 79 | - **(abc)\|(def)** would match either "abc" or "def". |
|---|
| 80 | - **^\\s*#** would match any line that is a comment, i.e. a line whose first non-whitespace character is "#". |
|---|
| 81 | </help> |
|---|
| 82 | </tool> |
|---|
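
The patterns from the help text can be tried outside Galaxy with a short script. What follows is a minimal sketch, assuming Python `re` semantics, which may differ in detail from whatever engine `grep.py` actually uses. The helper `select_lines` is hypothetical and written only for illustration (it is not the contents of `grep.py`), the sample lines are made up, and the comment pattern `^\s*#` is this sketch's own choice.

```python
import re


def select_lines(lines, pattern, invert=False):
    """Return the lines that match `pattern`, or the ones that do not if `invert` is True."""
    regex = re.compile(pattern)
    # search() looks for a match anywhere in the line; '^' anchors it to the line start.
    return [line for line in lines if bool(regex.search(line)) != invert]


# Made-up sample data loosely resembling a BED file with one comment line.
sample = [
    "chr1\t100\t200",
    "chrX\t300\t400",
    "# a comment",
    "scaffold_7\t10\t20",
]

# "Matching" and "NOT Matching" correspond to the tool's invert option.
matching = select_lines(sample, r"^chr([0-9A-Za-z])+")
not_matching = select_lines(sample, r"^chr([0-9A-Za-z])+", invert=True)

# Lines whose first non-whitespace character is '#'.
comments = select_lines(sample, r"^\s*#")

# Repetition and grouping patterns from the Example list.
repeat = re.search(r"(ACGT){1,5}", "TTACGTACGTTT")
number = re.search(r"([^,][0-9]{1,3})(,[0-9]{3})*", "23,078,651")

print(matching)
print(not_matching)
print(comments)
print(repeat.group(0))
print(number.group(0))
```

With this sample, the "Matching" call keeps the two `chr` lines and the inverted call keeps the remaining two, mirroring the tool's `invert` option.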