I was surprised to find what I consider to be a major bug in fasta_to_tabular_converter.py, which is used to convert FASTA into tabular format.
Consider this toy example:
>alpha
ACGTAC
>beta
AGTGTA
>gamma with some description
AGGTACCA
What the converter gives is two columns (title line and sequence), but the '>' is left in:
>alpha (tab) ACGTAC
>beta (tab) AGTGTA
>gamma with some description (tab) AGGTACCA
Given just two columns, what I was expecting was:
alpha (tab) ACGTAC
beta (tab) AGTGTA
gamma with some description (tab) AGGTACCA
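To make the expected two-column behaviour concrete, here is a minimal Python sketch. It is only an illustration, not the code in fasta_to_tabular_converter.py nor the pull request; the function name fasta_to_two_columns is hypothetical, and it assumes each record's sequence lines should simply be concatenated.

```python
def fasta_to_two_columns(lines):
    """Yield 'title<TAB>sequence' rows with the leading '>' removed.

    Hypothetical helper for illustration only; not Galaxy's converter.
    """
    title, seq_parts = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if title is not None:
                yield title + "\t" + "".join(seq_parts)
            # Drop the '>' marker so it does not end up in the first column.
            title, seq_parts = line[1:], []
        elif title is not None:
            # Sequence may span several lines; join them into one string.
            seq_parts.append(line.strip())
    # Emit the final record.
    if title is not None:
        yield title + "\t" + "".join(seq_parts)


toy = [
    ">alpha", "ACGTAC",
    ">beta", "AGTGTA",
    ">gamma with some description", "AGGTACCA",
]
for row in fasta_to_two_columns(toy):
    print(row)
```

The only difference from the output I am seeing is the line[1:] slice, which removes the '>' from the title column.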
I think this is a bug. In support of this view, I note that the user-facing tool (now in the Tool Shed) removes the '>' symbol:
I have submitted a pull request to address this:
Note what I really wanted was three columns, the ID, comment and sequence:
alpha (tab) (empty) (tab) ACGTAC
beta (tab) (empty) (tab) AGTGTA
gamma (tab) with some description (tab) AGGTACCA
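As a rough sketch of the three-column layout, the title line can be split at the first run of whitespace into an ID and an optional comment. The names split_title and fasta_to_three_columns are hypothetical, and this is not how the user-facing tool is implemented; it only assumes that the ID is the first whitespace-delimited word of the title.

```python
def split_title(title):
    """Split a FASTA title (without '>') into (identifier, comment)."""
    parts = title.split(None, 1)  # split on the first run of whitespace
    identifier = parts[0] if parts else ""
    comment = parts[1] if len(parts) > 1 else ""
    return identifier, comment


def fasta_to_three_columns(lines):
    """Yield 'id<TAB>comment<TAB>sequence' rows (illustrative only)."""
    title, seq_parts = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # Emit the previous record before starting a new one.
            if title is not None:
                yield "\t".join(split_title(title) + ("".join(seq_parts),))
            title, seq_parts = line[1:], []
        elif title is not None:
            seq_parts.append(line.strip())
    # Emit the final record.
    if title is not None:
        yield "\t".join(split_title(title) + ("".join(seq_parts),))


toy = [
    ">alpha", "ACGTAC",
    ">beta", "AGTGTA",
    ">gamma with some description", "AGGTACCA",
]
for row in fasta_to_three_columns(toy):
    print(row)
```

Records without a description get an empty middle column, so every row has exactly three tab-separated fields.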
The user-facing tool does support this. I appreciate that changing the built-in implicit converter to give three-column output could break backward compatibility (for example, if anyone has written a workflow that relies on the '>' version of the implicit conversion), so I can instead make this conversion explicit in my workflow.