| 1 | <tool id="Grep1" name="Select" version="1.0.1"> | 
|---|
| 2 | <description>lines that match an expression</description> | 
|---|
| 3 | <command interpreter="python">grep.py -i $input -o $out_file1 -pattern '$pattern' -v $invert</command> | 
|---|
| 4 | <inputs> | 
|---|
| 5 | <param format="txt" name="input" type="data" label="Select lines from"/> | 
|---|
| 6 | <param name="invert" type="select" label="that"> | 
|---|
| 7 | <option value="false">Matching</option> | 
|---|
| 8 | <option value="true">NOT Matching</option> | 
|---|
| 9 | </param> | 
|---|
| 10 | <param name="pattern" size="40" type="text" value="^chr([0-9A-Za-z])+" label="the pattern" help="Enter text or a regular expression (see the syntax reference in the lower part of this frame)"> | 
|---|
| 11 | <sanitizer> | 
|---|
| 12 | <valid initial="string.printable"> | 
|---|
| 13 | <remove value="'"/> | 
|---|
| 14 | </valid> | 
|---|
| 15 | <mapping initial="none"> | 
|---|
| 16 | <add source="'" target="__sq__"/> | 
|---|
| 17 | </mapping> | 
|---|
| 18 | </sanitizer> | 
|---|
| 19 | </param> | 
|---|
| 20 | </inputs> | 
|---|
| 21 | <outputs> | 
|---|
| 22 | <data format="input" name="out_file1" metadata_source="input"/> | 
|---|
| 23 | </outputs> | 
|---|
| 24 | <tests> | 
|---|
| 25 | <test> | 
|---|
| 26 | <param name="input" value="1.bed"/> | 
|---|
| 27 | <param name="invert" value="false"/> | 
|---|
| 28 | <param name="pattern" value="^chr[0-9]*"/> | 
|---|
| 29 | <output name="out_file1" file="fs-grep.dat"/> | 
|---|
| 30 | </test> | 
|---|
| 31 | </tests> | 
|---|
| 32 | <help> | 
|---|
| 33 |  | 
|---|
| 34 | .. class:: infomark | 
|---|
| 35 |  | 
|---|
| 36 | **TIP:** If your data is not TAB delimited, use *Text Manipulation->Convert* | 
|---|
| 37 |  | 
|---|
| 38 | ----- | 
|---|
| 39 |  | 
|---|
| 40 | **Syntax** | 
|---|
| 41 |  | 
|---|
| 42 | The Select tool searches the data for lines that contain, or do not contain, a match to the given pattern. The pattern is a regular expression: a sequence of characters that describes the set of strings to match. | 
|---|
| 43 |  | 
|---|
| 44 | - **( ) { } [ ] . * ? + \ ^ $** are all special characters. **\\** can be used to "escape" a special character, allowing that special character to be searched for. | 
|---|
| 45 | - **\\A** matches the beginning of a string (but not the beginning of an internal line). | 
|---|
| 46 | - **\\d** matches a digit, same as [0-9]. | 
|---|
| 47 | - **\\D** matches a non-digit. | 
|---|
| 48 | - **\\s** matches a whitespace character. | 
|---|
| 49 | - **\\S** matches anything BUT a whitespace. | 
|---|
| 50 | - **\\t** matches a tab. | 
|---|
| 51 | - **\\w** matches an alphanumeric character or an underscore. | 
|---|
| 52 | - **\\W** matches any character other than an alphanumeric character or an underscore. | 
|---|
| 53 | - **(** .. **)** groups a particular pattern. | 
|---|
| 54 | - **\\Z** matches the end of a string (but not the end of an internal line). | 
|---|
| 55 | - **{** n or n, or n,m **}** specifies an expected number of repetitions of the preceding pattern. | 
|---|
| 56 |  | 
|---|
| 57 | - **{n}** The preceding item is matched exactly n times. | 
|---|
| 58 | - **{n,}** The preceding item is matched n or more times. | 
|---|
| 59 | - **{n,m}** The preceding item is matched at least n times but not more than m times. | 
|---|
| 60 |  | 
|---|
| 61 | - **[** ... **]** creates a character class, which matches any one of the characters listed inside the brackets. A dash (-) indicates a range, such as **a-z**. | 
|---|
| 62 | - **.** Matches any single character except a newline. | 
|---|
| 63 | - ***** The preceding item will be matched zero or more times. | 
|---|
| 64 | - **?** The preceding item is optional and matched at most once. | 
|---|
| 65 | - **+** The preceding item will be matched one or more times. | 
|---|
| 66 | - **^** has two meanings: | 
|---|
| 67 | - matches the beginning of a line or string. | 
|---|
| 68 | - indicates negation in a character class. For example, [^...] matches every character except the ones inside brackets. | 
|---|
| 69 | - **$** matches the end of a line or string. | 
|---|
| 70 | - **\|** Separates alternate possibilities. | 
|---|
| 71 |  | 
|---|
| 72 | ----- | 
|---|
| 73 |  | 
|---|
| 74 | **Example** | 
|---|
| 75 |  | 
|---|
| 76 | - **^chr([0-9A-Za-z])+** would match lines that begin with a chromosome name ("chr" followed by one or more letters or digits), such as lines in a BED format file. | 
|---|
| 77 | - **(ACGT){1,5}** would match between 1 and 5 consecutive repeats of "ACGT". | 
|---|
| 78 | - **([^,][0-9]{1,3})(,[0-9]{3})\*** would match a large integer that is properly separated with commas such as 23,078,651. | 
|---|
| 79 | - **(abc)|(def)** would match either "abc" or "def". | 
|---|
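| 80 | - **^\\W+#** would match a line in which **#** is preceded by one or more non-alphanumeric characters, such as an indented comment. To also match comment lines that begin directly with **#**, use **^\\W\*#**. | 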
|---|
| 81 | </help> | 
|---|
| 82 | </tool> | 
|---|
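
The matching and inverting behaviour described in the help text can be sketched in a few lines of Python. This is only an illustration, not the contents of `grep.py`: the helper name `select_lines` and the sample lines are hypothetical, and it assumes Python's `re` module, whose handling of some constructs may differ from the engine the tool actually uses.

```python
import re


def select_lines(lines, pattern, invert=False):
    """Return the lines that contain a match for pattern.

    With invert=True, return the lines that do NOT contain a match,
    mirroring the tool's "Matching" / "NOT Matching" option.
    """
    regex = re.compile(pattern)
    return [line for line in lines if bool(regex.search(line)) != invert]


# Hypothetical sample input, loosely modelled on a BED file.
sample = [
    "chr1\t100\t200",
    "chrX\t300\t400",
    "# a comment",
    "  # an indented comment",
    "track name=example",
]

# Lines that begin with a chromosome name.
matching = select_lines(sample, r"^chr([0-9A-Za-z])+")

# The same pattern with the selection inverted.
not_matching = select_lines(sample, r"^chr([0-9A-Za-z])+", invert=True)

# ^\W+# needs at least one non-alphanumeric character before '#',
# so only the indented comment line is selected.
indented_comments = select_lines(sample, r"^\W+#")

# ^\W*# also accepts a '#' in the first column.
all_comments = select_lines(sample, r"^\W*#")
```

Here `matching` holds the two `chr` lines, `not_matching` holds the remaining three, and the last two calls show the difference between `+` (one or more) and `*` (zero or more) on the comment pattern.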