Re: [galaxy-dev] Changing the FASTA to tabular converter

9 Nov 2010


Hi Peter, as long as the user can choose, and the previous functionality remains the default (for backward compatibility), I'm all for this.

On Nov 9, 2010, at 11:34 AM, Peter wrote:
...
Hello all,
I ran into a problem in a workflow manipulating FASTA and tabular files.
I traced this to an unexpected behaviour of the FASTA to tabular converter.
In my experience, most command-line tools that take FASTA files as
input treat the first word after the ">" as the identifier for each FASTA
record, and any subsequent text as an optional description. It could
then make sense to turn a FASTA file into a three column tabular file
(identifier, description, sequence). Currently Galaxy does not make this
distinction, so we have just two columns (identifier+description, seq).
Would you all be amenable to my extending this script to allow the user
to choose between 2 column output (current behaviour) and 3 column
output (splitting the FASTA ">" line at the first whitespace)?
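As a rough illustration of the proposed choice, here is a minimal Python sketch that keeps two-column output as the default and only splits the ">" line into identifier and description when asked. It is not Galaxy's actual converter script: the names fasta_to_tabular, _make_row and three_columns are hypothetical, and it assumes plain FASTA input whose sequence may be wrapped over several lines.

```python
from typing import Iterable, Iterator, List


def _make_row(title: str, seq_parts: List[str], three_columns: bool) -> str:
    """Build one tabular row from a FASTA title and its sequence lines."""
    seq = "".join(seq_parts)
    if not three_columns:
        # Existing behaviour: the whole ">" line (minus ">") is column 1.
        return f"{title}\t{seq}"
    # Split at the first run of whitespace; the description may be absent.
    parts = title.split(None, 1)
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return f"{identifier}\t{description}\t{seq}"


def fasta_to_tabular(lines: Iterable[str], three_columns: bool = False) -> Iterator[str]:
    """Yield one tab-separated row per FASTA record (two columns by default)."""
    title = None
    seq_parts: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if title is not None:
                yield _make_row(title, seq_parts, three_columns)
            title = line[1:]
            seq_parts = []
        elif title is not None:
            # Sequence may be wrapped over several lines; join them.
            seq_parts.append(line.strip())
    if title is not None:
        yield _make_row(title, seq_parts, three_columns)


records = [">seq1 first example", "ACGT", "TTGA", ">seq2", "GGCC"]
print(list(fasta_to_tabular(records)))
# ['seq1 first example\tACGTTTGA', 'seq2\tGGCC']
print(list(fasta_to_tabular(records, three_columns=True)))
# ['seq1\tfirst example\tACGTTTGA', 'seq2\t\tGGCC']
```

Splitting with str.split(None, 1) treats any run of spaces or tabs as the separator, and a record with no description just gets an empty middle column, so every row in three-column mode has the same number of fields.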
Alternatively, I have written a less invasive patch to allow an easy
way to extract the identifier (first word) and sequence:
http://bitbucket.org/peterjc/galaxy-central/changeset/f57552b4f9fb
Note that currently the converter does allow the ">" line to be trimmed
to a fixed number of characters, which can achieve the same goal but ONLY
when all the identifiers are the same length (rarely the case in my experience).
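The limitation of trimming shows up as soon as two identifiers differ in length. In this small Python comparison, trim_title is a hypothetical stand-in for keeping a fixed number of leading characters from the title (an assumption about how the trim behaves, not the converter's code), while first_word splits at the first whitespace.

```python
def trim_title(title: str, n_chars: int) -> str:
    """Keep only the first n_chars characters of a FASTA title (hypothetical)."""
    return title[:n_chars]


def first_word(title: str) -> str:
    """Return the identifier: the text before the first whitespace."""
    parts = title.split(None, 1)
    return parts[0] if parts else ""


titles = ["seq1 first example", "seq10 second example"]

# Four characters fits "seq1" but truncates "seq10" to "seq1", a collision.
print([trim_title(t, 4) for t in titles])  # ['seq1', 'seq1']
# Five characters fits "seq10" but leaves a trailing space on "seq1 ".
print([trim_title(t, 5) for t in titles])  # ['seq1 ', 'seq10']
# Splitting at the first whitespace recovers both identifiers.
print([first_word(t) for t in titles])  # ['seq1', 'seq10']
```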
Similarly, I'd like to extend the tabular to FASTA converter to allow
a third column to be selected as the description, giving for example
">c1 c3" as the ">" line, with the sequence coming from c2.
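The proposed tabular to FASTA extension could look roughly like the following Python sketch, assuming tab-separated input and 1-based column numbers in the style of Galaxy's c1, c2, c3. The function tabular_to_fasta and its parameters are hypothetical, and each sequence is written on a single line rather than wrapped.

```python
from typing import Iterable, Iterator, Optional


def tabular_to_fasta(
    rows: Iterable[str],
    id_col: int = 1,
    seq_col: int = 2,
    desc_col: Optional[int] = None,
) -> Iterator[str]:
    """Yield FASTA lines from tab-separated rows (1-based column numbers)."""
    for row in rows:
        row = row.rstrip("\r\n")
        if not row:
            continue  # skip blank lines
        fields = row.split("\t")
        title = fields[id_col - 1]
        # Append the description only when a column was chosen and it is non-empty.
        if desc_col is not None and fields[desc_col - 1]:
            title += " " + fields[desc_col - 1]
        yield ">" + title
        yield fields[seq_col - 1]


rows = ["seq1\tACGTTTGA\tfirst example", "seq2\tGGCC\t"]
for fasta_line in tabular_to_fasta(rows, desc_col=3):
    print(fasta_line)
# >seq1 first example
# ACGTTTGA
# >seq2
# GGCC
```

Leaving desc_col unset reproduces an identifier-only ">" line, and an empty description field does not leave a trailing space on the title.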
I look forward to comments,
Thanks,
Peter
P.S. All these comments apply equally to the FASTQ to/from tabular
converters.
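For the FASTQ converters the same identifier/description split would apply to the "@" line. Below is a minimal Python sketch assuming simple four-line FASTQ records with no wrapped sequence or quality lines; fastq_to_tabular and split_title are hypothetical names, not Galaxy's tool.

```python
from typing import Iterable, Iterator, List


def fastq_to_tabular(lines: Iterable[str], split_title: bool = False) -> Iterator[str]:
    """Convert four-line FASTQ records to tab-separated rows.

    By default emits title, sequence, quality; with split_title=True the
    title is split at the first whitespace into identifier and description.
    """
    record: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line and not record:
            continue  # ignore blank lines between records
        record.append(line)
        if len(record) == 4:
            header, seq, plus, qual = record
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"Malformed FASTQ record starting {header!r}")
            title = header[1:]
            if split_title:
                parts = title.split(None, 1)
                identifier = parts[0] if parts else ""
                description = parts[1] if len(parts) > 1 else ""
                yield "\t".join([identifier, description, seq, qual])
            else:
                yield "\t".join([title, seq, qual])
            record = []
    if record:
        raise ValueError("Truncated FASTQ record at end of input")


reads = ["@read1 lane 3", "ACGT", "+", "IIII", "@read2", "GG", "+", "II"]
for row in fastq_to_tabular(reads, split_title=True):
    print(row.split("\t"))
# ['read1', 'lane 3', 'ACGT', 'IIII']
# ['read2', '', 'GG', 'II']
```

Records are parsed by position rather than by looking for "@", since a quality string may itself begin with "@".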
_______________________________________________
galaxy-dev mailing list
galaxy-dev@lists.bx.psu.edu
http://lists.bx.psu.edu/listinfo/galaxy-dev
-- jt

James Taylor, Assistant Professor, Biology / Computer Science, Emory University