Hello Miranda,

The problem is most likely that GFF3 is not supported by the tool (GFF and GTF definitely are). If that turns out to be the case, I will open a development ticket to block GFF3 as an accepted input type until (or unless) GFF3 support is added. This datatype does not organize its content the same way as the other input types, so supporting it may have a few wrinkles.

That said, the formats are all similar enough in the key fields used by the tool that it might work on GFF3, or rather mostly work, if you are willing to accept some duplications in the output. I haven't gone through all potential scenarios to see what might come out odd or different. The first format adjustment that comes to mind is to leave out any fasta sequence or comment lines at the end of the GFF3 file.
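As a rough illustration of that adjustment, here is a minimal Python sketch that keeps only the feature lines of a GFF3 file. The function name strip_gff3_extras and the sample records are made up for this example, and it assumes you are fine dropping every line that starts with "#" (including the "##gff-version" header) as well as everything from the "##FASTA" marker onward. It is not what the tool itself does.

```python
def strip_gff3_extras(lines):
    """Yield only tab-delimited feature lines from GFF3 text.

    Assumption for this sketch: all '#' lines are dropped and reading
    stops at the '##FASTA' marker that introduces embedded sequence.
    """
    for line in lines:
        stripped = line.rstrip("\n")
        if stripped.startswith("##FASTA"):
            break  # everything after this marker is sequence, not features
        if not stripped.strip() or stripped.startswith("#"):
            continue  # skip blank lines, comments, and directives
        yield stripped


# Hypothetical input, for illustration only.
example = [
    "##gff-version 3\n",
    "chr1\tsrc\texon\t5\t8\t.\t+\t.\tID=exon1\n",
    "# trailing comment\n",
    "##FASTA\n",
    ">chr1\n",
    "ACGTACGTAC\n",
]
cleaned = list(strip_gff3_extras(example))
print("\n".join(cleaned))
```

On a real file you would pass an open file handle instead of the list and write the yielded lines to a new file before uploading it.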

But, just as a guess for your results right now: regarding the "sequence" that you cannot locate, perhaps these are coordinate regions on the negative strand? The resulting fasta is reported as the reverse-complement of the reference genomic sequence, so it will not match the reference as written. When interpreting coordinates for these negative-strand regions, you do not need to account for the end being 0-based (instead of the start). All of these file types (GFF, GTF, GFF3) have a 1-based start coordinate, not a 0-based start (unlike bed and interval). If you are used to bed/interval format, this may explain why the start seems off by one.
https://wiki.galaxyproject.org/Learn/Datatypes#GFF
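To make the two effects concrete, here is a small Python sketch of how a 1-based, inclusive GFF-style interval maps onto a sequence, and what a negative-strand feature looks like. The function names and the toy reference are invented for this example; it only illustrates the coordinate arithmetic and is not the tool's implementation.

```python
# Translation table for complementing DNA (upper and lower case, plus N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse-complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_gff_region(reference, start, end, strand):
    """Extract a region given GFF-style coordinates.

    GFF/GTF/GFF3 start and end are both 1-based and inclusive, so the
    equivalent Python slice is reference[start - 1:end].
    """
    sub = reference[start - 1:end]
    return reverse_complement(sub) if strand == "-" else sub


reference = "AACCGGTTAC"  # toy reference, positions 1..10

plus = extract_gff_region(reference, 2, 5, "+")   # bases 2..5 as written
minus = extract_gff_region(reference, 2, 5, "-")  # same bases, reverse-complemented

# Treating the GFF start as if it were already 0-based (bed/interval style)
# drops the first base, which matches the "one nucleotide short" symptom.
off_by_one = reference[2:5]

print(plus, minus, off_by_one)
```

Here the plus-strand result is ACCG, the minus-strand result is CGGT (which cannot be found by searching the reference as written), and the mis-sliced version is CCG.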

Please review the data in this context and see if this helps to explain it. Then try using a GFF/GTF or even just an interval version of the coordinates, if possible. Tools in 'Text Manipulation' and 'Filter and Sort' should be able to help transform the file. We'll post an update if there is more to share.
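If you would rather do the conversion outside Galaxy, the same transformation can be sketched in a few lines of Python. The function name gff_line_to_bed is hypothetical, and the choices of using the feature type as the name column and writing a missing score as 0 are assumptions made for this example, not requirements of the tool.

```python
def gff_line_to_bed(line):
    """Convert one GFF/GFF3 feature line to a six-column BED-style line.

    GFF start is 1-based inclusive; BED start is 0-based, so only the
    start is shifted by one. The end value stays the same.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("expected at least 8 tab-separated GFF columns")
    seqid, _source, ftype, start, end, score, strand = fields[:7]
    bed_start = int(start) - 1
    bed_score = "0" if score == "." else score  # assumption: '.' becomes 0
    return "\t".join([seqid, str(bed_start), end, ftype, bed_score, strand])


# Hypothetical record, for illustration only.
bed_line = gff_line_to_bed("chr1\tsrc\texon\t5\t8\t.\t-\t.\tID=exon1")
print(bed_line)
```

The strand column is carried over unchanged, so negative-strand features are still extracted as reverse-complement sequence.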

Hopefully this helps!

Jen
Galaxy team


On 2/19/14 9:23 AM, Lu, Mengmeng wrote:
Hi Galaxy team,

I recently ran into two problems when I used "Fetch sequences > Extract Genomic DNA" in the Galaxy Main instance.

I wanted to extract the exons according to the coordinates in a GFF3 file from my reference sequences, which are in FASTA format (loaded from my history, with the genome build specified).

But after checking the output, I found:

1. The first base of each extracted exon was missing in the output, so each extracted exon sequence is one nucleotide shorter than the real length.

2. Some extracted exons are correct, but some are wrong. The questionable exons could not be found in the corresponding reference. I cannot figure out where they are from.


I tried to read the manual/warnings on the page, but I cannot explain my strange output. Could anyone give me some clues, please?

Thanks.

Best,

Miranda





-- 
Jennifer Hillman-Jackson
http://galaxyproject.org