Re: [galaxy-dev] Extract Genomic DNA insisting on build for GFF3 file

16 Aug 2011

      ...
...
One idea to address both of these issues is to embed the
original format in the fasta name so that it's clear whether
the coords are BED or GFF (e.g.
>hg17_BED_chr1_147962192_147962580).
Or hg17_gtf_chr1_147962192_147962580 etc.
That certainly seems better than the current situation.
However, my preferred solution is to take the FASTA ID from
the annotation file. In GFF3 this would be the ID tag in column
nine (if present), perhaps with an option to use another
custom tag like locus_tag or transcript_id if preferred.
Hi Peter,

This seems reasonable. Of course, the implementation needs to be done with care to (a) ensure the default choice is somewhat similar to what is done now and (b) support all flavors of GFF. If you choose to implement this, you'll also need to update all the existing test output files.
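
As a rough illustration of the idea discussed above (not Galaxy's actual code), here is a minimal Python sketch that takes the FASTA ID from a chosen attribute in column nine and falls back to a coordinate-based name when the attribute is missing. The function names parse_attributes and fasta_id, the build argument, and the exact fallback format are assumptions made for this example; the fallback is only modeled on the names quoted in this thread. Quoted GTF values containing semicolons are not handled.

```python
import re
from urllib.parse import unquote

# GFF3 attributes look like key=value; GTF attributes look like key "value".
_GFF3_ATTR = re.compile(r'^([^=\s]+)=(.*)$')


def parse_attributes(column9):
    """Parse column nine of a GFF3 or GTF line into a dict (hypothetical helper)."""
    attrs = {}
    for field in column9.strip().split(";"):
        field = field.strip()
        if not field:
            continue
        match = _GFF3_ATTR.match(field)
        if match:
            # GFF3 style: values may be percent-encoded.
            attrs[match.group(1)] = unquote(match.group(2).strip())
        else:
            # GTF style: key, whitespace, optionally quoted value.
            parts = field.split(None, 1)
            if len(parts) == 2:
                attrs[parts[0]] = parts[1].strip().strip('"')
    return attrs


def fasta_id(fields, build, tag="ID"):
    """Return the FASTA ID for one annotation line split on tabs.

    Uses the requested attribute if present; otherwise falls back to an
    assumed build_chrom_start_end name so the default stays coordinate-based.
    """
    attrs = parse_attributes(fields[8]) if len(fields) > 8 else {}
    if tag in attrs:
        return attrs[tag]
    return "%s_%s_%s_%s" % (build, fields[0], fields[3], fields[4])
```

With a GFF3 line carrying ID=gene0001 this returns gene0001; passing tag="transcript_id" picks up the GTF-style attribute instead, and a line with no usable attribute keeps a coordinate-based name.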
...
For BED I had initially thought this would be the optional
column 4, name. This made me wonder what Galaxy
is doing in converting GFF3 to BED, since column 4 was
populated with generic feature types (gene, CDS, etc
from GFF3 column 3). Shouldn't this be using the feature's
ID tag (if present)?
Yes, I'd say that's correct. The GFF-to-BED converter was written before we had GFF parsing support, and at the time it wasn't possible to extract the name from the attributes. 
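
To make the suggestion concrete, here is a small Python sketch of a GFF-to-BED line conversion that puts the feature's ID in BED column 4 and only falls back to the feature type when no ID is found. This is not the existing Galaxy converter; the function names, the tag parameter, and the placeholder score of 0 are assumptions for illustration. The one part that is not an assumption is the coordinate shift: GFF starts are 1-based and inclusive, BED starts are 0-based.

```python
import re


def attribute_value(column9, tag):
    """Return the value of one attribute from column nine, or None.

    Accepts both GFF3 (tag=value) and GTF (tag "value") syntax.
    Hypothetical helper written for this sketch.
    """
    pattern = r'(?:^|;)\s*' + re.escape(tag) + r'(?:=|\s+)"?([^";]+)"?'
    match = re.search(pattern, column9)
    return match.group(1).strip() if match else None


def gff_line_to_bed(line, tag="ID"):
    """Convert one GFF/GFF3/GTF line to a six-column BED line."""
    fields = line.rstrip("\n").split("\t")
    chrom, _source, feature_type, start, end, _score, strand = fields[:7]
    column9 = fields[8] if len(fields) > 8 else ""
    # Prefer the feature's own identifier; fall back to the feature type.
    name = attribute_value(column9, tag) or feature_type
    # GFF is 1-based inclusive, BED is 0-based half-open: shift the start only.
    bed_start = int(start) - 1
    # BED strand must be "+", "-" or "."; the score here is a placeholder.
    bed_strand = strand if strand in ("+", "-") else "."
    return "\t".join([chrom, str(bed_start), end, name, "0", bed_strand])
```

For GTF input one would call it with tag="transcript_id" or tag="gene_id", since GTF has no ID attribute.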

Finally, note that all changes made to any GFF code must work for GFF, GFF3, and GTF formats.

Thanks,
J.

Jeremy Goecks