Galaxy unable to set metadata for GFF files
Dear all,
I've been trying to get Galaxy to recognize this GFF from NCBI (ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Lactobacillus_reuteri_JCM_1112_uid58875/NC_010609.gff), but Galaxy did not recognize the format after I uploaded it. Setting the format manually didn't work either: I got an "unable to set metadata" error as soon as I started a Cufflinks run using that GFF. I have tried to reformat the file several times and even used the popular bp_genbank2gff3.pl script to re-parse the records from the original GenBank file.
Would anyone kindly look at the NCBI GFF and guide me to a solution to get this file recognized by Galaxy? I've been stuck for a couple of weeks now and would appreciate some suggestions. Thank you!
Sincerely yours,
Peera Hemarajata, M.D.
Advanced graduate student - Versalovic lab
Department of Molecular Virology and Microbiology - Baylor College of Medicine
Department of Pathology - Texas Children's Hospital
Suite 830, 8th Floor Feigin Center. Tel: 832-824-8245
On Sun, Mar 4, 2012 at 6:34 PM, Hemarajata, Peera <hemaraja@bcm.edu> wrote:
Dear all,
I’ve been trying to get Galaxy to recognize this GFF from NCBI (ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Lactobacillus_reuteri_JCM_1112_uid58875/NC_010609.gff) but it failed to recognize the format after I uploaded it.
That *could* be because NCBI's GFF3 is still horribly broken, but they are working on it and the next release should have valid GFF, which I am looking forward to. http://blastedbio.blogspot.com/2011/08/why-are-ncbi-gff3-files-still-broken.... However, if you get similar problems with a GFF3 file converted from GenBank using BioPerl, then I guess it is a Galaxy issue.
Peter
Hi Peera,
I downloaded the file and stripped off the extra comment lines (two at the top starting with "#!" and one at the bottom, "##"). I loaded this into Galaxy as text, and when I attempted to set the datatype to GFF3, I ran into the metadata issues.
This link at GMOD has the GFF3 format specification: http://gmod.org/wiki/GFF#GFF3_Format
Bringing the data into spec will be the only solution if you want to use it. Simple format errors can be corrected by working with the file in tabular format in Galaxy, but more complex errors will likely need to be fixed before uploading into Galaxy.
The GMOD validation tool can help pinpoint the errors; enter the FTP URL into the form: http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online
When I ran it, the errors seemed to be with the "type" keywords used (they do not meet the spec):
Line Number   Error/Warning
-----------   -------------
4             [WARNING] unknown directive (directive: ##Type DNA NC_010609.1)
5             [ERROR] invalid type (type: source)
10            [ERROR] invalid type (type: misc_feature)
11            [ERROR] invalid type (type: misc_feature)
12            [ERROR] invalid type (type: misc_feature)
13            [ERROR] invalid type (type: misc_feature)
14            [ERROR] invalid type (type: misc_feature)
15            [ERROR] invalid type (type: misc_feature)
16            [ERROR] invalid type (type: misc_feature)
17            [ERROR] invalid type (type: misc_feature)
... 158 pages of errors...
If you have a history with a GFF3 file from the BioPerl program (the one you used and Peter suggested) that you believe produces a file in spec (without the above content/errors), verified by passing the above validation test, and it still gives errors with Cufflinks, there could be another problem. A chromosome naming mismatch between the reference genome and the reference annotation is a common problem that you can examine first: all chromosome identifiers in the BAM/SAM results, the GTF/GFF3 annotation, and the reference genome must be identical.
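Before pasting a file into the online validator, a quick local pass can flag the two kinds of problems discussed here: feature types that are not Sequence Ontology terms, and sequence identifiers that do not match the reference genome. The following is a minimal Python sketch under stated assumptions: `KNOWN_TYPES` is a hypothetical, deliberately short allow-list (the real list is the SO term set the GFF3 spec points to), the function names are illustrative only, and it assumes the file has no embedded `##FASTA` section. It is not the GMOD validator or Galaxy's own datatype code.

```python
"""Rough local sanity checks for a GFF3 file before uploading it to Galaxy.

A sketch only: this is not the GMOD validator or Galaxy's datatype code.
"""

# Hypothetical, deliberately short allow-list of Sequence Ontology type names.
# A real check should use the full SO term list that the GFF3 spec refers to.
KNOWN_TYPES = {
    "region", "gene", "pseudogene", "mRNA", "exon", "CDS",
    "tRNA", "rRNA", "ncRNA", "repeat_region",
}


def check_gff3(lines, known_types=KNOWN_TYPES):
    """Scan GFF3 text lines (assumes no embedded ##FASTA section).

    Returns (kept, problems, seqids):
      kept: the lines minus '#!' comments and bare '##' lines
      problems: list of (original_line_number, message)
      seqids: set of column-1 identifiers used by feature lines
    """
    kept, problems, seqids = [], [], set()
    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        # Drop '#!' comment lines and a stray bare '##' line, as in the manual cleanup.
        if line.startswith("#!") or line.strip() == "##":
            continue
        kept.append(line)
        # Blank lines, comments and '##' directives carry no feature data.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append((line_no, f"expected 9 tab-separated columns, got {len(cols)}"))
            continue
        seqids.add(cols[0])
        # Column 3 is the feature type; flag anything outside the allow-list.
        if cols[2] not in known_types:
            problems.append((line_no, f"feature type not in allow-list: {cols[2]}"))
    return kept, problems, seqids


def fasta_ids(lines):
    """Return the first whitespace-delimited word of each FASTA header line."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                ids.add(words[0])
    return ids


def missing_seqids(gff_seqids, reference_ids):
    """Annotation seqids that have no matching reference sequence identifier."""
    return sorted(gff_seqids - reference_ids)
```

Any iterable of lines works, so an open file object can be passed straight to `check_gff3` and `fasta_ids`. Anything `missing_seqids` returns is a naming mismatch of the kind described above. An empty `problems` list only means these few checks passed, not that the file would pass the full validator.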
If that checks out, please send a bug report from the failed Cufflinks job (green bug icon) and, if your Galaxy account uses a different email address than the one used for this email, note in the comments that the bug report is from you. We can help rule out other types of problems that are common with this tool set.
Hopefully this helps; if not, we can work with your bug report.
Best,
Jen
Galaxy team
--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/wiki/Support
participants (3)
- Hemarajata, Peera
- Jennifer Jackson
- Peter Cock