Hello Peter,

If these are standard-length PCR primers, then UCSC's In-Silico PCR tool would be an option. It is a variant of BLAT, and the source is available from Kent Informatics (send Jim Kent an email for a copy). Here is a UCSC link to the online version:

http://genome.ucsc.edu/cgi-bin/hgPcr?command=start

A wrapper could be made for your own instance, or you could just run it from the command line before loading data.

If this is not what you had in mind, please let us know.

Best,
Jen
Galaxy team

On 2/3/11 4:14 AM, Peter Cock wrote:
On Thu, Feb 3, 2011 at 11:54 AM, Peter Cock<p.j.a.cock@googlemail.com> wrote:
Hi all,
I'm currently working with some 454 data where the sample was amplified with selective primers, and therefore the reads need a little processing to remove the primer sequences before assembly or mapping (something that sff_extract cleverly spots and warns the user about when doing an SFF to FASTA/FASTQ conversion).
The actual processing I want to do is very similar to spotting and removing barcodes or adapters - except that PCR primers are often degenerate, i.e. have an N in them representing the fact that it is a pool of primers covering A, C, G and T at that point, and primers may come in pairs.
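To make the degenerate-primer case concrete, here is a minimal Python sketch of the kind of trimming described above. It is not taken from any existing Galaxy tool; the function names (primer_regex, reverse_complement, trim_primers) are made up for illustration. It assumes exact matching only (no mismatches or indels), that the forward primer sits at the very start of the read, and that the reverse primer, if present, appears reverse-complemented at the very end.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}

# Complement table that also covers the ambiguity codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def primer_regex(primer):
    """Turn a (possibly degenerate) primer into a regex pattern string."""
    return "".join(IUPAC[base] for base in primer.upper())


def reverse_complement(seq):
    """Reverse complement of a sequence, ambiguity codes included."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def trim_primers(read, forward, reverse=None):
    """Strip the forward primer from the 5' end of the read and, if given,
    the reverse complement of the reverse primer from the 3' end.

    Hypothetical helper: exact matches only, and the read is upper-cased
    first, so any lower-case masking in the input is lost.
    """
    seq = read.upper()
    match = re.match(primer_regex(forward), seq)
    if match:
        seq = seq[match.end():]
    if reverse:
        pattern = primer_regex(reverse_complement(reverse)) + "$"
        match = re.search(pattern, seq)
        if match:
            seq = seq[:match.start()]
    return seq


if __name__ == "__main__":
    # Forward primer ACNGT (N = any base), reverse primer GTTCC.
    print(trim_primers("ACTGTTTTTGGGAAC", "ACNGT", "GTTCC"))
```

A real tool would probably also need to tolerate sequencing errors in the primer region, which a plain regular expression like this does not.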
Looking over the provided tools in Galaxy, the only relevant ones I saw are as follows:
emboss_5/emboss_primersearch.xml - the text output does not look helpful for trimming my sequences - nothing else in Galaxy uses this format, does it?
fastx_toolkit/fastx_barcode_splitter.xml - copes with 5' or 3' barcodes, but only handles fastqsolexa (discussed recently on the mailing list - I guess it could handle fastqsanger and fastqillumina as well), not FASTA or SFF. Also, according to the FASTX docs for fastx_barcode_splitter.pl, it requires non-ambiguous barcodes (i.e. ACGT only), so using it with ambiguous primers won't work: http://hannonlab.cshl.edu/fastx_toolkit/commandline.html
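One possible workaround for tools that only accept ACGT sequences is to expand a degenerate primer into every concrete sequence it stands for. The sketch below is only an illustration of that idea (expand_degenerate is a hypothetical helper, not part of FASTX or Galaxy), and it has not been tried against fastx_barcode_splitter.pl. Note that the number of sequences grows quickly with each ambiguous position, and a barcode splitter would presumably treat each expansion as a separate barcode.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes and the bases each one covers.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand_degenerate(primer):
    """Return every unambiguous (ACGT only) sequence matching the primer."""
    choices = [IUPAC_BASES[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    for seq in expand_degenerate("ANT"):
        print(seq)
```

For a primer with a single N this gives four sequences; with three Ns it is already 64, so this is only practical for mildly degenerate primers.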
I did look on the tool shed and noticed Edward Kirton has done some wrappers for the "Suite of Newbler tools", but his sfffile wrapper does not (yet) include support for splitting SFF files using Roche's MID barcodes.
Are there any other relevant tools I have overlooked?
I forgot to mention fastx_toolkit/fastx_clipper.xml aka "Clip" which does handle FASTA and FASTQ files, but apparently only deals with 3' adapters (although perhaps the poorly documented -d switch is relevant for a 5' adapter?), and appears to only handle one adapter sequence at a time. The documentation doesn't mention what happens if you want to use an ambiguous adapter sequence (e.g. with an N in it).
Peter
_______________________________________________
galaxy-dev mailing list
galaxy-dev@lists.bx.psu.edu
http://lists.bx.psu.edu/listinfo/galaxy-dev
--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org