Re: [galaxy-user] Filter fastq by percentage of ambiguous (N) bases

6 Aug 2013

Hello Anto,

There is no specific tool that I know of to do this based on read 
content, but you could use the very low quality score (2) assigned to 
ambiguous bases and the tool 'Filter by quality' to filter by 
percentage. Be aware that other bases may also be assigned this 
low score, but these would very likely not be of practical use anyway.
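
As a rough illustration of the idea outside Galaxy, here is a minimal Python 
sketch that keeps a read only if the fraction of bases scoring 2 or lower 
stays within a limit. It is not the 'Filter by quality' tool itself. The 
function names, the (header, sequence, quality) tuple layout, and the 
Phred+33 offset are all assumptions made for the example; older Illumina 
data encoded with offset 64 would need offset=64.

```python
def low_quality_fraction(quality, threshold=2, offset=33):
    """Return the fraction of bases whose Phred score is <= threshold.

    `quality` is the FASTQ quality string; `offset` is the ASCII offset
    (33 is assumed here, which makes a score of 2 the character '#').
    """
    if not quality:
        return 0.0
    low = sum(1 for ch in quality if ord(ch) - offset <= threshold)
    return low / len(quality)


def filter_records(records, max_fraction=0.10, threshold=2, offset=33):
    """Yield only the (header, sequence, quality) tuples whose fraction
    of low-scoring bases does not exceed max_fraction."""
    for header, seq, qual in records:
        if low_quality_fraction(qual, threshold, offset) <= max_fraction:
            yield header, seq, qual
```

Note that a read sitting exactly at the limit is kept by this sketch. To 
drop 50 bp reads with five or more low-scoring bases, the comparison would 
have to be strict (< instead of <=).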

You could clip these ends first, then do the filter, discarding any reads 
that have very short usable sequence left. If the data is Illumina, this is 
likely a sign of a sequence that failed vendor quality checks, and these are 
no longer removed by default as of Casava 1.8+.

Creating a regular expression with the Select tool is another option, but 
this is probably more effort than it is worth to construct. But, your 
choice. A web search will bring up syntax advice.
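
For reference, one possible pattern for "five or more N bases anywhere in 
the sequence" is sketched below and checked in Python. This only shows the 
expression itself. Whether the Select tool accepts the same syntax, and how 
the FASTQ data would need to be laid out for it, is not covered here. The 
names are made up for the example, and upper-case N is assumed.

```python
import re

# Five repetitions of "an N followed by any run of non-N characters",
# so the pattern matches as soon as the sequence contains five Ns.
FIVE_OR_MORE_N = re.compile(r"(?:N[^N]*){5}")


def has_five_or_more_n(sequence):
    """Return True if the sequence contains at least five N characters."""
    return FIVE_OR_MORE_N.search(sequence) is not None
```

Reads for which has_five_or_more_n returns True would be the ones to 
discard. Changing the 5 in the pattern adjusts the cutoff.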

Ideally the first will do the job,

Jen
Galaxy team

On 7/29/13 3:17 AM, Anto Praveen Rajkumar Rajamani wrote:
...
Hello,
I would like to filter my fastq files (50 bp single end Illumina RNA seq 
reads) by a maximum threshold (10%) of ambiguous (N) bases.
I can see that the "CLIP" tool removes all reads with one or more N 
bases.
Is there a way to remove only the reads with five or more N bases 
using Galaxy?
Thank you.
Best wishes,
Anto
-- 
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org
