On Tue, Feb 8, 2011 at 8:15 AM, Kevin Lam <aboulia@gmail.com> wrote:
Have you looked at http://code.google.com/p/cutadapt/ ?

Features:
- Gapped alignment with mismatches and indels, that is, errors in the adapter are tolerated
- Finds adapters both in the 5' and 3' ends of reads
- Accepts FASTQ, FASTA or .csfasta and .qual files (for AB SOLiD data)
- Any input or output file can be gzip-compressed
- Outputs FASTA or FASTQ
- Trims color space reads correctly
- Optionally removes primer base in color space data
- Can produce MAQ- or BWA-compatible output

I've only had the chance to play around with this for a while, but it looks promising!

Hi Kevin,

Thanks for the link. I think I skimmed over all the source code (it's in Python + C, which is nice from my personal perspective), and I'm pretty sure it does NOT handle ambiguous IUPAC codes (either in the adapter/barcode/primer or in the read sequences).

For the particular task I'm working on I do have degenerate PCR primers, i.e. they have N's in them, representing the fact that it is a pool of primers covering A, C, G and T at that point. This would be pretty strange for an adapter or barcode!

Peter
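
P.S. To make the IUPAC point concrete, here is a minimal sketch of what ambiguity-aware primer matching could look like. It is not taken from cutadapt; the names IUPAC, bases_compatible and find_primer are hypothetical, and it only does ungapped matching with an optional mismatch budget, so it is an illustration rather than a replacement for a real aligner. Two codes are treated as compatible if the sets of bases they stand for overlap, so an N in the primer matches any base in the read.

```python
# Illustrative sketch only; these names are hypothetical, not cutadapt's API.
# Standard IUPAC nucleotide codes mapped to the concrete bases they cover.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def bases_compatible(a, b):
    """Return True if two IUPAC codes share at least one concrete base."""
    return bool(set(IUPAC[a.upper()]) & set(IUPAC[b.upper()]))


def find_primer(primer, read, max_mismatches=0):
    """Return the first offset where primer matches read, or -1 if none.

    Ungapped comparison only (no indels); ambiguity codes are allowed
    in both the primer and the read.
    """
    for start in range(len(read) - len(primer) + 1):
        mismatches = 0
        for p, r in zip(primer, read[start:start + len(primer)]):
            if not bases_compatible(p, r):
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many mismatches at this offset
        else:
            # Inner loop finished without exceeding the mismatch budget.
            return start
    return -1
```

For example, find_primer("ACNGT", read) would accept ACAGT, ACCGT, ACGGT or ACTGT at the N position, which is the behaviour a pool of degenerate primers needs.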