Re: [galaxy-user] confusion from "Extract Genomic DNA (version 2.2.3) "

20 Feb 2014

Hello Miranda,

The problem is most likely that GFF3 is not supported by the tool (GFF & 
GTF definitely are). If this is in fact true, I will open a development 
ticket to block the datatype as being an accepted input type until (or 
if) GFF3 is included. This datatype does not have content organized the 
same way as the other input types, so supporting it may have a few wrinkles.

That said, the formats are similar enough in the key fields used by the 
tool that it might work on GFF3, or rather mostly work, if you are 
willing to accept some duplications in the output. I haven't gone 
through all potential scenarios to see what odd or different results 
might come up. The first format adjustment that comes to mind is to 
remove any fasta sequence or comment lines from the end of the GFF3 file.
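As a rough illustration of that cleanup step, here is a minimal Python 
sketch. The function name is hypothetical and this is not part of any 
Galaxy tool. It assumes the sequence section starts at a "##FASTA" 
directive (or a bare ">" header line) and that comment and directive 
lines start with "#".

```python
def strip_gff3_extras(lines):
    """Yield only feature lines from GFF3 text.

    Drops comment/directive lines (starting with '#'), blank lines,
    and everything from the embedded sequence section onward.
    """
    for line in lines:
        # '##FASTA' (or a bare '>' header) marks the start of embedded sequence.
        if line.startswith("##FASTA") or line.startswith(">"):
            break
        if line.startswith("#") or not line.strip():
            continue
        yield line.rstrip("\n")


sample = [
    "##gff-version 3\n",
    "chr1\tsrc\texon\t11\t20\t.\t+\t.\tID=e1\n",
    "# a trailing comment\n",
    "##FASTA\n",
    ">chr1\n",
    "ACGT\n",
]
cleaned = list(strip_gff3_extras(sample))
```

Within Galaxy itself, the 'Text Manipulation' and 'Filter and Sort' 
tools should be able to do the same kind of filtering.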

But, just as a guess about your current results: regarding the 
"sequence" that you cannot locate, perhaps these are coordinate regions 
on the negative strand? The resulting fasta will be reported as the 
reverse complement of the reference genomic sequence. When interpreting 
coordinates for these negative-strand regions, you do not need to treat 
the end as 0-based (instead of the start). All of these file types (GFF, 
GTF, GFF3) have a 1-based start coordinate, not a 0-based one (unlike 
bed and interval). If you are used to bed/interval format, this may 
explain why the start seems off by one.
https://wiki.galaxyproject.org/Learn/Datatypes#GFF
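
To make the coordinate point concrete, here is a small Python sketch. 
The function names and the toy reference are made up for illustration; 
this is not the tool's actual code. It treats GFF-style coordinates as 
1-based and inclusive, converts them to a 0-based, end-exclusive 
interval (as in bed), and reverse-complements negative-strand regions.

```python
# Complement table for DNA bases (upper and lower case, plus N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def gff_to_bed_interval(start, end):
    """GFF/GTF/GFF3 (1-based, inclusive) -> bed/interval (0-based, end-exclusive)."""
    return start - 1, end


def extract(reference, start, end, strand="+"):
    """Extract a region given GFF-style 1-based inclusive coordinates."""
    bed_start, bed_end = gff_to_bed_interval(start, end)
    seq = reference[bed_start:bed_end]
    if strand == "-":
        # Negative strand: report the reverse complement of the reference.
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq


reference = "AACCGGTTAC"  # toy reference, positions 1..10

plus = extract(reference, 3, 6, "+")    # bases 3..6 on the forward strand
minus = extract(reference, 7, 10, "-")  # bases 7..10, reverse-complemented

# If the 1-based start is used directly as a 0-based start, the first
# base is lost and the result is one nucleotide short.
misread = reference[3:6]
```

Here `plus` is "CCGG", `minus` is "GTAA" (which will not be found by 
searching the forward strand of the reference), and `misread` is "CGG". 
Whether this is exactly what happens with GFF3 input in the tool is not 
confirmed, but it shows how both symptoms could arise.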

Please review the data in this context and see if this helps to explain 
it. Then try using a GFF/GTF or even just an interval version of the 
coordinates, if possible.  Tools in 'Text Manipulation' plus 'Filter and 
Sort' should be able to help transform the file. And we'll post an 
update if there is more to share.
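
If it is easier to do the transformation outside of Galaxy, a sketch 
like the following could turn one GFF-style feature line into a 
six-column bed line. The function name is hypothetical, and using the 
feature type as the name and "0" as a placeholder score are assumptions, 
not requirements of the tool.

```python
def gff_line_to_bed(line):
    """Convert one tab-delimited GFF/GTF/GFF3 feature line to a bed line."""
    fields = line.rstrip("\n").split("\t")
    seqid, _source, ftype, start, end, _score, strand = fields[:7]
    # Only the start shifts: 1-based inclusive -> 0-based, end-exclusive.
    bed_start = str(int(start) - 1)
    return "\t".join([seqid, bed_start, end, ftype, "0", strand])


bed_line = gff_line_to_bed("chr1\tsrc\texon\t11\t20\t.\t-\t.\tID=e1")
```

The end coordinate is left unchanged and the strand is carried over, so 
negative-strand regions are still reported as the reverse complement.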

Hopefully this helps!

Jen
Galaxy team

On 2/19/14 9:23 AM, Lu, Mengmeng wrote:
...
Hi Galaxy team,
I recently ran into two problems when using "Fetch sequences > Extract Genomic DNA" on the Galaxy Main instance.
I wanted to extract exons, according to the coordinates in a GFF3 file, from my reference sequences (from my history, with the genome build specified), which are in FASTA format.
But after checking the output, I found:
1. The first base of each extracted exon is missing from the output, so each extracted exon sequence is one nucleotide shorter than its real length.
2. Some extracted exons are correct, but some are wrong. The questionable exons cannot be found in the corresponding reference, and I cannot figure out where they come from.
I read the manual and warnings on the tool page, but I cannot explain this strange output. Could anyone give me some clues, please?
Thanks.
Best,
Miranda
-- 
Jennifer Hillman-Jackson
http://galaxyproject.org
