Hi all,
Something I want to do in several of my workflows is to filter a FASTA file (or potentially sequence files in other formats) using a list of desired identifiers (e.g. a column from a tabular file).
Right now I can achieve this with three steps in Galaxy. Suppose I have:
Dataset #1, FASTA file
Dataset #2, Tabular file with identifiers of interest (e.g. BLAST hits, or filtered output from a sequence analysis tool)
Then:
1. Create tabular Dataset #3 using FASTA-to-tabular on Dataset #1, subject to the enhancement proposed here: http://lists.bx.psu.edu/pipermail/galaxy-dev/2010-November/003717.html
2. Create tabular Dataset #4 using join on Datasets #2 and #3 using the matched identifier columns. This does the filtering.
3. Create FASTA Dataset #5 using tabular-to-FASTA on Dataset #4.
This works (at least for reasonably sized datasets), but requires three steps and the creation of at least two temporary files.
I'd like to introduce another tool under "FASTA manipulation" to do it in one step (rather than three). Am I going against the apparent Galaxy ideal that complex manipulations should be done with tabular files? Would such a FASTA filter tool be of interest to add directly to Galaxy (e.g. under the "FASTA manipulation" section), or would it be better off in the community Tool Shed?
Here is my current implementation for discussion/consideration: http://bitbucket.org/peterjc/galaxy-central/changeset/730b89c4da26
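To make the idea concrete for discussion, here is a rough sketch of what a one-step filter boils down to. It is only an illustration and not the code in the changeset above: the function names read_ids and filter_fasta are made up for this example, the record identifier is assumed to be the first word after the ">" on the header line, and the identifier list is assumed to fit in memory.

```python
def read_ids(tabular_lines, column=0):
    """Collect identifiers from one column of tab-separated lines.

    Hypothetical helper: blank lines and lines starting with '#' are skipped.
    """
    ids = set()
    for line in tabular_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if column < len(fields):
            ids.add(fields[column].strip())
    return ids


def filter_fasta(fasta_lines, wanted_ids):
    """Yield the lines of every FASTA record whose identifier is wanted.

    Assumption: the identifier is the first whitespace-delimited word
    after the '>' on the header line.
    """
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            words = line[1:].split(None, 1)
            keep = bool(words) and words[0] in wanted_ids
        if keep:
            yield line


if __name__ == "__main__":
    # Tiny in-memory stand-ins for Dataset #1 (FASTA) and Dataset #2 (tabular).
    fasta = [">seq1 first\n", "ACGT\n", ">seq2\n", "GGCC\n", "TTAA\n", ">seq3\n", "AAAA\n"]
    table = ["seq2\t95.0\n", "seq3\t80.1\n", "seq9\t70.0\n"]
    print("".join(filter_fasta(fasta, read_ids(table))), end="")
```

Because the FASTA file is streamed and only the identifiers are held in a set, this avoids the intermediate tabular datasets. Unlike the join-based route, the output keeps the record order of the FASTA file rather than the order of the identifier list.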
Thanks,
Peter