Hi all,

I'm currently working with some 454 data where the sample was amplified with selective primers, and therefore the reads need a little processing to remove the primer sequences before assembly or mapping (something that sff_extract cleverly spots and warns the user about when doing an SFF to FASTA/FASTQ conversion).

The actual processing I want to do is very similar to spotting and removing barcodes or adapters - except that PCR primers are often degenerate, i.e. have an N in them representing the fact that it is a pool of primers covering A, C, G and T at that point, and primers may come in pairs.

Looking over the provided tools in Galaxy, the only relevant ones I saw are as follows:

emboss_5/emboss_primersearch.xml - the text output does not look helpful for trimming my sequences - nothing else in Galaxy uses this format, does it?

fastx_toolkit/fastx_barcode_splitter.xml - copes with 5' or 3' barcodes, but only handles fastqsolexa (discussed recently on the mailing list - I guess it could handle fastqsanger and fastqillumina as well), not FASTA or SFF. Also, according to the FASTX docs for fastx_barcode_splitter.pl, it requires non-ambiguous barcodes (i.e. ACGT only), so using it with ambiguous primers won't work:
http://hannonlab.cshl.edu/fastx_toolkit/commandline.html

I did look on the tool shed and noticed Edward Kirton has done some wrappers for the "Suite of Newbler tools", but his sfffile wrapper does not (yet) include support for splitting SFF files using Roche's MID barcodes.

Are there any other relevant tools I have overlooked?

In the meantime I've started Galaxy wrappers for my own Python code to find and remove PCR primers, adapters, or barcodes (basically any short sequences). These can also be used to filter the reads (choose whether non-matching reads are kept or not). However, this isn't ideal for barcodes, where you'd have to run the tool once for each barcode (or set of barcodes) to get each one in a separate file.
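
To make the degenerate primer problem concrete, here is a minimal sketch of the kind of matching involved. It is not the actual tool code: the function names primer_to_regex and trim_5prime_primer are hypothetical, and it assumes exact matching (no mismatches or indels) of a primer at the very start of the read, using the standard IUPAC ambiguity codes.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to the bases they cover.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def primer_to_regex(primer):
    """Compile a (possibly degenerate) primer into a regular expression.

    Hypothetical helper: each ambiguity code becomes a character class,
    so "ACNGT" becomes the pattern AC[ACGT]GT.
    """
    parts = []
    for base in primer.upper():
        options = IUPAC[base]  # KeyError on a non-IUPAC letter
        parts.append(options if len(options) == 1 else "[%s]" % options)
    return re.compile("".join(parts))


def trim_5prime_primer(read, primer_re):
    """Return the read with a leading primer removed, or None if absent.

    Hypothetical helper: only an exact match anchored at the first base
    of the read is accepted (an assumption of this sketch).
    """
    match = primer_re.match(read.upper())
    if match is None:
        return None
    return read[match.end():]
```

For example, trim_5prime_primer("ACAGTTTGG", primer_to_regex("ACNGT")) gives "TTGG", while a read without the primer gives None, so the caller can choose whether to keep or discard non-matching reads. Note that in this sketch an N in the read itself will not match an unambiguous base in the primer.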

For specifying the barcode/primer/adapter sequences, are FASTA or tabular files more commonly used? Should I look at the Galaxy *.loc system to allow commonly used things like the Roche MID barcodes to be predefined at the system level?

I am currently taking a FASTA input file for the primers. A simple tabular file with ID and sequence in the first two columns would be easy to add - that is what fastx_toolkit/fastx_barcode_splitter.xml expects. EMBOSS primersearch wants a three column tabular file with ID, forward primer, reverse primer sequence. However, so far I am only looking at single primer analysis (our primers were pooled, so I can remove the forward and reverse primers in two steps).

Currently I have three separate scripts, with three separate XML files - one for each supported file type (FASTA, FASTQ, SFF). They all have the same interface, so this could be done as a single XML wrapper. The only potential downside is that, as written, the SFF script requires Biopython, while the FASTA and FASTQ scripts currently use the Galaxy libraries instead. This external dependency may be an issue if Galaxy were interested in including this tool in the main distribution - or if I bundled this as a single tool or tool-suite on the Tool Shed.

Any thoughts?

Regards,

Peter