Date: Mon, 9 Dec 2013 06:34:28 -0800
From: jen@bx.psu.edu
To: zhusy88@msn.cn; galaxy-user@lists.bx.psu.edu
Subject: Re: [galaxy-user] How to filter the sequences containing not[ATCG] character?
Hello,
If the data were in .fastqsanger format, you could use the tool
"Manipulate FASTQ", but with .fasta, your approach is a good way.
But watch your regular expression: test it out on a smaller set to
make sure it is doing what you want. I see a "start of the line"
character in the middle of your expression ("^"). I see why it could
be working, with the prior expression being zero or more (*), but
knowing what each character does is generally a good idea. The help
on the tool is good as are many web sites, but this is simple. Also,
you don't need the // slashes, just enter the expression.
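A quick way to test an expression on a small set outside Galaxy is a few lines of Python. This is only a sketch: it assumes Python's `re` syntax, which may differ in details from what the Select tool uses, and the two rows are hypothetical tab-separated lines built from the example sequences. Note that in Python's syntax a "^" placed directly after "[" negates the character class rather than anchoring to the start of the line.

```python
import re

# The pattern from the original question, entered without the // slashes.
pattern = re.compile(r"\t[ATCGatcg]*[^ATCGatcg]")

# Hypothetical tab-separated rows: identifier, a tab, then the sequence.
rows = [
    "cuff102.1\tatcgtaaagggcgat",
    "cuff103.1\tgtcgttgactNNNNNNNNgtc",
]

# Emulate Select with "NOT Matching": keep rows where the pattern is not found.
# Rows must not carry a trailing newline, or [^ATCGatcg] would match it.
kept = [row for row in rows if not pattern.search(row)]
print(kept)
```

Only the cuff102.1 row should survive, since the run of N characters in cuff103.1 satisfies the negated class.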
To get you started: I would use something like this, with the Select
tool and "Matching":
^..*\t[ATCGatcg]+$
(Only one dot is really required; this is just how I always do it.
It adds a bit of a format sanity check to the filter.)
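To see what the expression above keeps, here is a minimal sketch in Python. It assumes Python's `re` syntax behaves like the Select tool for this pattern, and the rows are made-up test lines (the last two are deliberately malformed to show the sanity check).

```python
import re

# The suggested expression, used with Select and "Matching".
select = re.compile(r"^..*\t[ATCGatcg]+$")

# Hypothetical tab-separated rows: identifier, a tab, then the sequence.
rows = [
    "cuff102.1\tatcgtaaagggcgat",        # clean sequence
    "cuff103.1\tgtcgttgactNNNNNNNNgtc",  # contains N
    "\tatcg",                            # missing identifier
    "cuff104.1\t",                       # missing sequence
]

# Emulate "Matching": keep only rows the expression matches.
kept = [row for row in rows if select.search(row)]
print(kept)
```

Only the first row is kept: the N characters break the `[ATCGatcg]+$` part, the leading dot rejects a row with no identifier, and the `+` rejects an empty sequence.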
Hope this helps!
Jen
Galaxy team
On 12/8/13 6:21 PM, 朱师云 wrote:
Hi Jen,
As the title says, I have a [fasta] file that was obtained from a
[gtf] file:
>cuff102.1
atcgtaaagggcgat
>cuff103.1
gtcgttgactNNNNNNNNgtc
and I want to filter out the sequences that contain any
non-[ATCG] character, to get output like this:
>cuff102.1
atcgtaaagggcgat
I have a large number of sequences to filter. I thought of a way:
first convert the file to an [interval] file, and then
SELECT the lines not matching the pattern /\t[ATCGatcg]*[^ATCGatcg]/.
Am I right? Or is there a
one-step way?
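Outside Galaxy, a one-step filter could be sketched in Python as below. This is an illustration under stated assumptions, not a Galaxy tool: the function name `filter_fasta` is hypothetical, and the sketch assumes records start with ">" and that a sequence may span several lines.

```python
import re

# A sequence is accepted only if every character is A, T, C or G (any case).
VALID = re.compile(r"[ATCGatcg]+")


def filter_fasta(lines):
    """Yield (header, sequence) for records containing only A/T/C/G."""
    header, chunks = None, []

    def finished():
        # Return the completed record if it passes the check, else None.
        seq = "".join(chunks)
        if header is not None and VALID.fullmatch(seq):
            return header, seq
        return None

    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            record = finished()
            if record:
                yield record
            header, chunks = line, []
        else:
            chunks.append(line)

    record = finished()
    if record:
        yield record


fasta = """>cuff102.1
atcgtaaagggcgat
>cuff103.1
gtcgttgactNNNNNNNNgtc
"""

result = list(filter_fasta(fasta.splitlines()))
for header, seq in result:
    print(header)
    print(seq)
```

For a large file, passing an open file handle instead of `fasta.splitlines()` keeps memory use low, because records are yielded one at a time.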
--
Jennifer Hillman-Jackson
http://galaxyproject.org