One idea to address both of these issues is to embed the original format in the FASTA name so that it's clear whether the coords are BED or GFF (e.g. >hg17_BED_chr1_147962192_147962580).
Or hg17_gtf_chr1_147962192_147962580 etc.
That certainly seems better than the current situation.
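To make the naming idea concrete, here is a minimal sketch of how a format-tagged FASTA name could be assembled. The function name fasta_name and its parameters are hypothetical and only for illustration; this is not the existing Galaxy code.

```python
def fasta_name(build, fmt, chrom, start, end):
    """Join the pieces into a name that records where the coords came from.

    fmt is the source format label (e.g. "BED" or "gtf"), so a reader can
    tell whether start/end follow BED or GFF coordinate conventions.
    """
    return "_".join([build, fmt, chrom, str(start), str(end)])


# Hypothetical usage: print a FASTA header line for a BED-derived region.
print(">" + fasta_name("hg17", "BED", "chr1", 147962192, 147962580))
```

The coordinates are passed through unchanged; the format label is what tells the reader how to interpret them.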
However, my preferred solution is to take the FASTA ID from the annotation file. In GFF3 this would be the ID tag in column nine (if present), perhaps with an option to use another custom tag like locus_tag or transcript_id if preferred.
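As a rough illustration of that proposal, the sketch below pulls a tag out of GFF3 column nine and returns it as the FASTA ID. The helper names (gff3_attributes, fasta_id_from_gff3) and the example record are made up for this message, and the tag and fallback arguments are just one way the option could look.

```python
from urllib.parse import unquote


def gff3_attributes(column9):
    """Parse GFF3 column nine (tag=value pairs separated by ';') into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            # GFF3 percent-encodes reserved characters in attribute values.
            attrs[key.strip()] = unquote(value.strip())
    return attrs


def fasta_id_from_gff3(line, tag="ID", fallback=None):
    """Return the value of `tag` from a GFF3 feature line, else `fallback`."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < 9:
        return fallback
    return gff3_attributes(columns[8]).get(tag, fallback)


# Made-up example record, not taken from a real annotation file.
example = "chr1\texample\tgene\t147962193\t147962580\t.\t+\t.\tID=gene0001;locus_tag=ABC_0001"
print(fasta_id_from_gff3(example))                   # uses the ID tag
print(fasta_id_from_gff3(example, tag="locus_tag"))  # uses a custom tag
```

When the requested tag is absent the function returns the fallback, so the caller can still drop back to a coordinate-based name.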
Hi Peter, This seems reasonable. Of course, the implementation needs to be done with care to (a) ensure the default choice is somewhat similar to what is done now and (b) support all flavors of GFF. If you choose to implement this, you'll also need to update all the existing test output files.
For BED I had initially thought this would be the optional column 4, name. This made me wonder what Galaxy is doing in converting GFF3 to BED, since column 4 was populated with generic feature types (gene, CDS, etc. from GFF3 column 3). Shouldn't this be using the feature's ID tag (if present)?
Yes, I'd say that's correct. The GFF-to-BED converter was written before we had GFF parsing support, and at the time it wasn't possible to extract the name from the attributes. Finally, note that all changes made to any GFF code must work for GFF, GFF3, and GTF formats. Thanks, J.
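For what it's worth, here is a minimal sketch of how a BED name could be chosen while coping with both attribute styles. It assumes GFF3 uses tag=value pairs and GTF uses tag "value" pairs, both separated by semicolons; parse_attributes and bed_name are hypothetical names, not Galaxy's GFF parsing code. It falls back to the feature type, which matches what the converter currently writes to column 4. It does not handle semicolons inside quoted values or multi-part values in older GFF files.

```python
def parse_attributes(column9):
    """Parse column nine in either GFF3 (tag=value) or GTF (tag "value") style."""
    attrs = {}
    for field in column9.strip().split(";"):
        field = field.strip()
        if not field:
            continue
        if "=" in field and '"' not in field.split("=", 1)[0]:
            # GFF3 style: ID=cds1
            key, value = field.split("=", 1)
        else:
            # GTF style: transcript_id "T1"
            key, _, value = field.partition(" ")
        attrs[key.strip()] = value.strip().strip('"')
    return attrs


def bed_name(line, tags=("ID", "transcript_id")):
    """Pick BED column 4: the first tag present, else the feature type."""
    columns = line.rstrip("\n").split("\t")
    attrs = parse_attributes(columns[8]) if len(columns) > 8 else {}
    for tag in tags:
        if tag in attrs:
            return attrs[tag]
    return columns[2]  # feature type (gene, CDS, ...) as the default


# Made-up example records in GFF3 and GTF style.
gff3_line = "chr1\tsrc\tCDS\t1\t9\t.\t+\t0\tID=cds1;Parent=mrna1"
gtf_line = 'chr1\tsrc\texon\t1\t9\t.\t+\t.\tgene_id "G1"; transcript_id "T1";'
print(bed_name(gff3_line), bed_name(gtf_line))
```

Keeping the feature type as the fallback means files without a usable tag should produce the same names as before.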