|   | textsearch | 
textsearch searches for words (specified as a regular expression) in the description text of one or more input sequences. It writes an output file with optional contents such as the name, description and accession number of any sequence whose description line from the annotation matches the search term. Optionally, the search can be made case-sensitive and the results written as an HTML table. textsearch is convenient for small input files but will be slow for larger files and databases; for those you should use SRS or Entrez instead.
textsearch searches only the description line, not the full sequence annotation.
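The matching behaviour described above can be illustrated with a minimal Python sketch. It is not the textsearch implementation: the function name `search_descriptions` and the in-memory record list are hypothetical, and Python's `re` module is assumed to behave like the EMBOSS regular expression engine for simple patterns such as alternation with `|`.

```python
import re


def search_descriptions(records, pattern, casesensitive=False):
    """Return the (name, description) pairs whose description matches pattern.

    Mirrors the documented defaults: the pattern is a regular expression
    and the search is case-insensitive unless casesensitive is True.
    """
    flags = 0 if casesensitive else re.IGNORECASE
    regex = re.compile(pattern, flags)
    # Only the description is examined; the name is carried along untouched.
    return [(name, desc) for name, desc in records if regex.search(desc)]


# Hypothetical records; the descriptions are taken from the example
# output shown on this page.
records = [
    ("LACI_ECOLI", "Lactose operon repressor"),
    ("LACY_ECOLI", "Lactose permease (Lactose-proton symport)"),
]

hits = search_descriptions(records, "lactose|permease")
case_hits = search_descriptions(records, "lactose", casesensitive=True)
```

With these records, `hits` contains both entries, while `case_hits` is empty because the descriptions only contain 'Lactose' with a capital L.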
Search for 'lactose':
| 
% textsearch "tsw:*" "lactose"
Search the textual description of sequence(s)
Output file [cru4_arath.textsearch]:
 | 
Example 2
Search for 'lactose' or 'permease' in E.coli proteins:
| 
% textsearch "tsw:*_ecoli" "lactose | permease"
Search the textual description of sequence(s)
Output file [bgal_ecoli.textsearch]:
 | 
Example 3
Output a search for 'lacz' formatted with HTML to a file:
| 
% textsearch "tembl:*" "lacz" -html -outfile embl.lacz.html
Search the textual description of sequence(s)
 | 
| 
Search the textual description of sequence(s)
Version: EMBOSS:6.6.0.0
   Standard (Mandatory) qualifiers:
  [-sequence]          seqall     (Gapped) sequence(s) filename and optional
                                  format, or reference (input USA)
  [-pattern]           string     The search pattern is a regular expression.
                                  Use a | to indicate OR.
                                  For example:
                                  human|mouse
                                  will find text with either 'human' OR
                                  'mouse' in the text (Any string)
  [-outfile]           outfile    [*.textsearch] Output file name
   Additional (Optional) qualifiers:
   -casesensitive      boolean    [N] Do a case-sensitive search
   -html               boolean    [N] Format output as an HTML table
   Advanced (Unprompted) qualifiers:
   -only               boolean    [N] This is a way of shortening the command
                                  line if you only want a few things to be
                                  displayed. Instead of specifying:
                                  '-nohead -noname -nousa -noacc -nodesc'
                                  to get only the name output, you can specify
                                  '-only -name'
   -heading            boolean    [@(!$(only))] Display column headings
   -usa                boolean    [@(!$(only))] Display the USA of the
                                  sequence
   -accession          boolean    [@(!$(only))] Display 'accession' column
   -name               boolean    [@(!$(only))] Display 'name' column
   -description        boolean    [@(!$(only))] Display 'description' column
   Associated qualifiers:
   "-sequence" associated qualifiers
   -sbegin1            integer    Start of each sequence to be used
   -send1              integer    End of each sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -scircular1         boolean    Sequence is circular
   -squick1            boolean    Read id and sequence only
   -sformat1           string     Input sequence format
   -iquery1            string     Input query fields or ID list
   -ioffset1           integer    Input start position offset
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name
   "-outfile" associated qualifiers
   -odirectory3        string     Output directory
   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options and exit. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages
   -version            boolean    Report version number and exit
 | 
| Qualifier | Type | Description | Allowed values | Default | 
|---|---|---|---|---|
| Standard (Mandatory) qualifiers | ||||
| [-sequence] (Parameter 1) | seqall | (Gapped) sequence(s) filename and optional format, or reference (input USA) | Readable sequence(s) | Required | 
| [-pattern] (Parameter 2) | string | The search pattern is a regular expression. Use a \| to indicate OR. For example: human\|mouse will find text with either 'human' OR 'mouse' in the text | Any string | |
| [-outfile] (Parameter 3) | outfile | Output file name | Output file | <*>.textsearch | 
| Additional (Optional) qualifiers | ||||
| -casesensitive | boolean | Do a case-sensitive search | Boolean value Yes/No | No | 
| -html | boolean | Format output as an HTML table | Boolean value Yes/No | No | 
| Advanced (Unprompted) qualifiers | ||||
| -only | boolean | This is a way of shortening the command line if you only want a few things to be displayed. Instead of specifying: '-nohead -noname -nousa -noacc -nodesc' to get only the name output, you can specify '-only -name' | Boolean value Yes/No | No | 
| -heading | boolean | Display column headings | Boolean value Yes/No | @(!$(only)) | 
| -usa | boolean | Display the USA of the sequence | Boolean value Yes/No | @(!$(only)) | 
| -accession | boolean | Display 'accession' column | Boolean value Yes/No | @(!$(only)) | 
| -name | boolean | Display 'name' column | Boolean value Yes/No | @(!$(only)) | 
| -description | boolean | Display 'description' column | Boolean value Yes/No | @(!$(only)) | 
| Associated qualifiers | ||||
| "-sequence" associated seqall qualifiers | ||||
| -sbegin1 -sbegin_sequence | integer | Start of each sequence to be used | Any integer value | 0 | 
| -send1 -send_sequence | integer | End of each sequence to be used | Any integer value | 0 | 
| -sreverse1 -sreverse_sequence | boolean | Reverse (if DNA) | Boolean value Yes/No | N | 
| -sask1 -sask_sequence | boolean | Ask for begin/end/reverse | Boolean value Yes/No | N | 
| -snucleotide1 -snucleotide_sequence | boolean | Sequence is nucleotide | Boolean value Yes/No | N | 
| -sprotein1 -sprotein_sequence | boolean | Sequence is protein | Boolean value Yes/No | N | 
| -slower1 -slower_sequence | boolean | Make lower case | Boolean value Yes/No | N | 
| -supper1 -supper_sequence | boolean | Make upper case | Boolean value Yes/No | N | 
| -scircular1 -scircular_sequence | boolean | Sequence is circular | Boolean value Yes/No | N | 
| -squick1 -squick_sequence | boolean | Read id and sequence only | Boolean value Yes/No | N | 
| -sformat1 -sformat_sequence | string | Input sequence format | Any string | |
| -iquery1 -iquery_sequence | string | Input query fields or ID list | Any string | |
| -ioffset1 -ioffset_sequence | integer | Input start position offset | Any integer value | 0 | 
| -sdbname1 -sdbname_sequence | string | Database name | Any string | |
| -sid1 -sid_sequence | string | Entryname | Any string | |
| -ufo1 -ufo_sequence | string | UFO features | Any string | |
| -fformat1 -fformat_sequence | string | Features format | Any string | |
| -fopenfile1 -fopenfile_sequence | string | Features file name | Any string | |
| "-outfile" associated outfile qualifiers | ||||
| -odirectory3 -odirectory_outfile | string | Output directory | Any string | |
| General qualifiers | ||||
| -auto | boolean | Turn off prompts | Boolean value Yes/No | N | 
| -stdout | boolean | Write first file to standard output | Boolean value Yes/No | N | 
| -filter | boolean | Read first file from standard input, write first file to standard output | Boolean value Yes/No | N | 
| -options | boolean | Prompt for standard and additional values | Boolean value Yes/No | N | 
| -debug | boolean | Write debug output to program.dbg | Boolean value Yes/No | N | 
| -verbose | boolean | Report some/full command line options | Boolean value Yes/No | Y | 
| -help | boolean | Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose | Boolean value Yes/No | N | 
| -warning | boolean | Report warnings | Boolean value Yes/No | Y | 
| -error | boolean | Report errors | Boolean value Yes/No | Y | 
| -fatal | boolean | Report fatal errors | Boolean value Yes/No | Y | 
| -die | boolean | Report dying program messages | Boolean value Yes/No | Y | 
| -version | boolean | Report version number and exit | Boolean value Yes/No | N | 
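The default `@(!$(only))` shown for the display qualifiers means that each one defaults to the logical negation of `-only`. The following Python sketch illustrates that rule; the function `resolve_display_flags` and its keyword arguments are hypothetical names used for illustration and are not part of EMBOSS.

```python
DISPLAY_FLAGS = ("heading", "usa", "accession", "name", "description")


def resolve_display_flags(only=False, **explicit):
    """Resolve the display qualifiers the way @(!$(only)) describes.

    Each flag defaults to `not only`; a value given explicitly
    (for example name=True, or description=False for -nodesc) wins.
    """
    unknown = set(explicit) - set(DISPLAY_FLAGS)
    if unknown:
        raise ValueError(f"unknown display flag(s): {sorted(unknown)}")
    default = not only
    return {flag: explicit.get(flag, default) for flag in DISPLAY_FLAGS}


all_on = resolve_display_flags()                         # no qualifiers given
name_only = resolve_display_flags(only=True, name=True)  # '-only -name'
no_desc = resolve_display_flags(description=False)       # '-nodesc'
```

Here `name_only` enables just the name column, which is the same result as spelling out '-nohead -nousa -noacc -nodesc'.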
The input is a standard EMBOSS sequence query (also known as a 'USA').
Major sequence database sources defined as standard in EMBOSS installations include srs:embl, srs:uniprot and ensembl.
Data can also be read from sequence output in any supported format written by an EMBOSS or third-party application.
The input format can be specified by using the command-line qualifier -sformat xxx, where 'xxx' is replaced by the name of the required format. The available format names are: gff (gff3), gff2, embl (em), genbank (gb, refseq), ddbj, refseqp, pir (nbrf), swissprot (swiss, sw), dasgff and debug.
See: http://emboss.sf.net/docs/themes/SequenceFormats.html for further information on sequence formats.
| 
# Search for: lactose
tsw-id:LACI_ECOLI LACI_ECOLI P03023 Lactose operon repressor
tsw-id:LACY_ECOLI LACY_ECOLI P02920 Lactose permease (Lactose-proton symport)
 | 
| 
# Search for: lactose | permease
tsw-id:LACI_ECOLI LACI_ECOLI P03023 Lactose operon repressor
tsw-id:LACY_ECOLI LACY_ECOLI P02920 Lactose permease (Lactose-proton symport)
 | 
By default, each output line gives the USA, the name (ID) and the accession number of a matching sequence. The remaining text is the description line of the sequence.
When the -html qualifier is specified, the output is wrapped in HTML table tags, ready for inclusion in a Web page. Note that tags such as <HTML>, <BODY>, </BODY> and </HTML> are not output by this program, as the table is expected to form only part of the contents of a web page; the rest of the web page must be supplied by the user.
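Because the -html output is only a fragment, it has to be placed inside a complete page before a browser can treat it as a standalone document. The sketch below shows one way to do that in Python. The helper `wrap_html_fragment` is hypothetical, and the fragment string is a placeholder standing in for the real contents of a file such as embl.lacz.html, not actual textsearch output.

```python
from html import escape


def wrap_html_fragment(fragment, title):
    """Wrap an HTML table fragment in the page tags textsearch omits."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f"<head><title>{escape(title)}</title></head>\n"
        "<body>\n"
        f"{fragment.strip()}\n"
        "</body>\n"
        "</html>\n"
    )


# Placeholder fragment (an assumption for illustration only); in practice
# this would be read from the file written with -html.
fragment = "<table><tr><td>placeholder row</td></tr></table>"
page = wrap_html_fragment(fragment, "textsearch results: lacz")
```

The title is escaped so that characters such as `<` or `&` in a search pattern cannot break the page markup; the fragment itself is inserted unchanged.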
The lines of output information are guaranteed not to have trailing whitespace. So if '-nodesc' is used, there will not be any whitespace after the ID name.
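The plain-text output can be read back into a script with a few lines of Python. This is a sketch under stated assumptions: all default columns (USA, name, accession, description) are present, columns are separated by whitespace, and lines starting with '#' are comments, as in the example output above. The function name `parse_textsearch_output` is hypothetical.

```python
def parse_textsearch_output(text):
    """Parse default textsearch output into a list of dictionaries.

    Assumes the columns USA, name, accession and description, in that
    order, separated by whitespace. Lines beginning with '#' and blank
    lines are skipped.
    """
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        # Split at most three times so the description keeps its spaces.
        fields = line.split(None, 3)
        fields += [""] * (4 - len(fields))  # tolerate a missing description
        usa, name, accession, description = fields
        hits.append(
            {
                "usa": usa,
                "name": name,
                "accession": accession,
                "description": description,
            }
        )
    return hits


sample = """# Search for: lactose
tsw-id:LACI_ECOLI LACI_ECOLI P03023 Lactose operon repressor
tsw-id:LACY_ECOLI LACY_ECOLI P02920 Lactose permease (Lactose-proton symport)
"""
hits = parse_textsearch_output(sample)
```

If columns have been switched off with qualifiers such as -nousa or -only, the field order assumed here no longer applies and the parser would need to be adjusted.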
| Program name | Description | 
|---|---|
| drtext | Get data resource entries complete text | 
| entret | Retrieve sequence entries from flatfile databases and files | 
| ontotext | Get ontology term(s) original full text | 
| textget | Get text data entries | 
| xmltext | Get XML document original full text | 
Please report all bugs to the EMBOSS bug team (emboss-bug © emboss.open-bio.org) not to the original author.