Have you looked at cutadapt (http://code.google.com/p/cutadapt/)?

Features:
- Gapped alignment with mismatches and indels, that is, errors in the adapter are tolerated
- Finds adapters both in the 5' and 3' ends of reads
- Accepts FASTQ, FASTA or .csfasta and .qual files (for AB SOLiD data)
- Any input or output file can be gzip-compressed
- Outputs FASTA or FASTQ
- Trims color space reads correctly
- Optionally removes primer base in color space data
- Can produce MAQ- or BWA-compatible output

I've only had the chance to play around with it for a short while, but it looks promising!

On Thu, Feb 3, 2011 at 7:54 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
Hi all,
I'm currently working with some 454 data where the sample was amplified with selective primers, and therefore the reads need a little processing to remove the primer sequences before assembly or mapping (something that sff_extract cleverly spots and warns the user about when doing an SFF to FASTA/FASTQ conversion).
The actual processing I want to do is very similar to spotting and removing barcodes or adapters, except that PCR primers are often degenerate (i.e., they contain an N, meaning the primer is really a pool covering A, C, G and T at that position), and primers may come in forward/reverse pairs.
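A minimal Python sketch of the degenerate-primer matching described above, assuming standard IUPAC ambiguity codes (N = any base, R = A/G, and so on), a forward primer expected at the 5' end of the read, and the reverse complement of the reverse primer at the 3' end. The function names, the example primers and the one-mismatch threshold are hypothetical. Matching is ungapped, so unlike the gapped alignment cutadapt advertises it will not tolerate the homopolymer indels common in 454 reads.

```python
# IUPAC nucleotide codes -> the concrete bases each code stands for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement table that also maps ambiguity codes (e.g. R <-> Y, K <-> M)
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a sequence that may contain IUPAC ambiguity codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_mismatches(primer: str, segment: str) -> int:
    """Count positions where the read base is not one the primer code allows."""
    return sum(1 for p, b in zip(primer.upper(), segment.upper())
               if b not in IUPAC[p])


def trim_primers(read: str, forward: str, reverse: str = "",
                 max_mismatches: int = 1) -> str:
    """Strip the forward primer from the 5' end and, if given, the reverse
    complement of the reverse primer from the 3' end (ungapped match only)."""
    start, end = 0, len(read)
    # 5' end: compare the first len(forward) bases against the primer
    if (len(read) >= len(forward)
            and count_mismatches(forward, read[:len(forward)]) <= max_mismatches):
        start = len(forward)
    # 3' end: the reverse primer appears reverse-complemented on this strand
    if reverse:
        tail = reverse_complement(reverse)
        if (end - start >= len(tail)
                and count_mismatches(tail, read[end - len(tail):]) <= max_mismatches):
            end -= len(tail)
    return read[start:end]


# Hypothetical primers: forward GATNCA, reverse GGAAR (reverse complement YTTCC)
print(trim_primers("GATGCAACGTACGTACCTTCC", "GATNCA", "GGAAR"))  # ACGTACGTAC
```

Checking only fixed positions keeps the sketch simple; reads where the primer is shifted by an indel would need a semi-global alignment instead.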
Looking over the tools provided with Galaxy, I found only the following relevant ones:
emboss_5/emboss_primersearch.xml: its text output doesn't look helpful for trimming my sequences, and nothing else in Galaxy uses this format, does it?