Hi all,
I was just surprised to find what I consider to be a major bug in
fasta_to_tabular_converter.py, which is used to convert FASTA into tabular.
Consider this toy example:
>alpha
ACGTAC
>beta
AGTGTA
>gamma with some description
AGGTACCA
What the converter gives is two columns (title line and sequence),
but the '>' is left in:
>alpha (tab) ACGTAC
>beta (tab) AGTGTA
>gamma with some description (tab) AGGTACCA
Given just two columns, what I was expecting was:
alpha (tab) ACGTAC
beta (tab) AGTGTA
gamma with some description (tab) AGGTACCA
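To make the expected behaviour concrete, here is a minimal Python sketch of a two-column conversion that drops the leading '>'. It is not the code from the converter or from my pull request; the function name fasta_to_two_columns is made up for this example, and it assumes a well-formed FASTA input where every record starts with a '>' line.

```python
def fasta_to_two_columns(lines):
    """Yield (title, sequence) pairs, with the leading '>' removed.

    Hypothetical helper for illustration only, not the Galaxy converter.
    """
    title = None
    seq_parts = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if title is not None:
                yield title, "".join(seq_parts)
            title = line[1:].strip()  # drop the '>' marker
            seq_parts = []
        elif title is not None:
            # Sequence may be wrapped over several lines; join them.
            seq_parts.append(line.strip())
    if title is not None:
        yield title, "".join(seq_parts)


example = """>alpha
ACGTAC
>beta
AGTGTA
>gamma with some description
AGGTACCA
"""

rows = ["\t".join(record) for record in fasta_to_two_columns(example.splitlines())]
print("\n".join(rows))
```

Run on the toy example, this prints the three tab-separated lines shown above, without the '>'.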
I think this is a bug. In support of this view, I note the user-facing
tool (now in the Tool Shed) removes the '>' symbol:
https://toolshed.g2.bx.psu.edu/view/devteam/fasta_to_tabular
https://github.com/galaxyproject/tools-devteam/tree/master/tools/fasta_to...
I have submitted a pull request to address this:
https://github.com/galaxyproject/galaxy/pull/11
Note that what I really wanted was three columns (the ID, comment
and sequence):
alpha (tab) (empty) (tab) ACGTAC
beta (tab) (empty) (tab) AGTGTA
gamma (tab) with some description (tab) AGGTACCA
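As a rough illustration of the three-column layout, the title line (with the '>' already removed) can be split at the first run of whitespace. This is only a sketch under that assumption; split_title and three_column_row are hypothetical names, and the user-facing tool may well do this differently.

```python
def split_title(title):
    """Split a FASTA title (without '>') into (identifier, description).

    Assumes the identifier is everything up to the first whitespace.
    """
    parts = title.split(None, 1)
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return identifier, description


def three_column_row(title, sequence):
    """Build one tab-separated row: ID, description (may be empty), sequence."""
    identifier, description = split_title(title)
    return "\t".join([identifier, description, sequence])


for title, seq in [
    ("alpha", "ACGTAC"),
    ("beta", "AGTGTA"),
    ("gamma with some description", "AGGTACCA"),
]:
    print(three_column_row(title, seq))
```

Records without a description keep an empty middle column, so every row still has exactly three fields.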
The user-facing tool does support this. I appreciate that changing the
built-in implicit converter to give three column output could be a
problem for backward compatibility (if anyone has written a workflow
using the '>' version of the implicit conversion?), so I can make this
conversion explicit in my workflow.
Regards,
Peter