Trackster and gff file with multiple chromosome annotations
Hello,

Is it possible to load a single GFF file containing the annotations of several chromosomes for my custom build in one step?

With the current version of Galaxy, it seems that I can only load a GFF file that refers to one chromosome. It is pretty tedious to load 43 GFF files separately for my custom build...

If I try, I get this error:

Traceback (most recent call last):
  File "~/galaxy-dist/lib/galaxy/datatypes/converters/interval_to_fli.py", line 91, in <module>
    main()
  File "~/galaxy-dist/lib/galaxy/datatypes/converters/interval_to_fli.py", line 30, in main
    for feature in read_unordered_gtf( open( in_fname, 'r' ) ):
  File "~/galaxy-dist/lib/galaxy/datatypes/util/gff_util.py", line 389, in read_unordered_gtf
    feature = GFFFeature( None, intervals=intervals )
  File "~/galaxy-dist/lib/galaxy/datatypes/util/gff_util.py", line 65, in __init__
    ( interval.chrom, self.chrom ) )
ValueError: interval chrom does not match self chrom: SAGS2 != SAGS1

Thanks

Yec'han

================================================
Yec'han LAIZET
Ingenieur
Plateforme Genome Transcriptome
Tel: 05 57 12 27 75
_________________________________
INRA-UMR BIOGECO 1202
Equipe Genetique
69 route d'Arcachon
33612 CESTAS
================================================
Yes, you should be able to use a single GFF for the complete genome.

This error stems from the same issue as before, namely that Galaxy is treating your GFF file as GTF.

If you think your GFF is well formatted and there is an issue with Galaxy's handling of GFF, please send me your GFF and I'll take a look.

Best,
J.

On Oct 23, 2012, at 9:24 AM, Yec'han Laizet wrote:
Is it possible to load a unique gff file with the annotations of several chromosomes for my custom build in one step (one gff file)?
Here are the links to get the GFF and the related genome files:

http://genomeportal.jgi-psf.org/Crypa2/download/Cparasiticav2.GeneCatalog200...
http://genomeportal.jgi-psf.org/Crypa2/download/Cryphonectria_parasiticav2.n...

Whatever file type I set for the GFF file (gff3, gff or gtf), I get the transcript_id error:

Traceback (most recent call last):
  File "/home/pgtgal/galaxy-dist/lib/galaxy/datatypes/converters/interval_to_fli.py", line 91, in <module>
    main()
  File "/home/pgtgal/galaxy-dist/lib/galaxy/datatypes/converters/interval_to_fli.py", line 30, in main
    for feature in read_unordered_gtf( open( in_fname, 'r' ) ):
  File "/home/pgtgal/galaxy-dist/lib/galaxy/datatypes/util/gff_util.py", line 375, in read_unordered_gtf
    transcript_id = line_attrs[ 'transcript_id' ]
KeyError: 'transcript_id'

If I fix the transcript_id problem, I get the other error:

Traceback (most recent call last):
  File "~/galaxy-dist/lib/galaxy/datatypes/converters/interval_to_fli.py", line 91, in <module>
    main()
  File "~/galaxy-dist/lib/galaxy/datatypes/converters/interval_to_fli.py", line 30, in main
    for feature in read_unordered_gtf( open( in_fname, 'r' ) ):
  File "~/galaxy-dist/lib/galaxy/datatypes/util/gff_util.py", line 389, in read_unordered_gtf
    feature = GFFFeature( None, intervals=intervals )
  File "~/galaxy-dist/lib/galaxy/datatypes/util/gff_util.py", line 65, in __init__
    ( interval.chrom, self.chrom ) )
ValueError: interval chrom does not match self chrom: scaffold_10 != scaffold_10

Is the GFF file not correct?

PS: I use Galaxy changeset 7828:b5bda7a5c345

Yec'han

On 23/10/2012 18:37, Jeremy Goecks wrote:
If you think your GFF is well formatted and there is an issue with Galaxy's handling of GFF, please send me your GFF and I'll take a look.
Whatever the file type I set for the gff file (gff3, gff or gtf), I get the transcript_id error:
Traceback (most recent call last):
  File "/home/pgtgal/galaxy-dist/lib/galaxy/datatypes/converters/interval_to_fli.py", line 91, in <module>
    main()
  File "/home/pgtgal/galaxy-dist/lib/galaxy/datatypes/converters/interval_to_fli.py", line 30, in main
    for feature in read_unordered_gtf( open( in_fname, 'r' ) ):
  File "/home/pgtgal/galaxy-dist/lib/galaxy/datatypes/util/gff_util.py", line 375, in read_unordered_gtf
    transcript_id = line_attrs[ 'transcript_id' ]
KeyError: 'transcript_id'
This was due to an incomplete feature: it turns out that GFF support had not been included in feature search. I've added it in galaxy-central changeset fa045aad74e9:

https://bitbucket.org/galaxy/galaxy-central/changeset/fa045aad74e90f16995e0c...
Is the gff file not correct?
I believe there is an issue with your GFF: it uses non-standard identifiers in the attributes (last) column. To the best of my knowledge, 'name' is not a valid field for connecting features in GFF3 (which is my best guess for the file version), but your GFF uses this field anyway.

To fix this issue, I replaced 'name' with 'ID' (which is compliant GFF3) from the command line:

    % sed s/name/ID/ ~/Downloads/test.gff > ~/Downloads/test_with_ids.gff

and this fixed the issue.

Finally, there is a sed wrapper in the toolshed should you want to do this conversion in Galaxy:

http://toolshed.g2.bx.psu.edu/repository/browse_categories?sort=name&operation=view_or_manage_repository&f-deleted=False&f-free-text-search=sed&id=9652a50c5a932f3e

Best,
J.
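A plain `sed s/name/ID/` replaces the first occurrence of the text "name" on each line, wherever it sits, so it could also alter a scaffold or gene identifier that happens to contain "name". The following is a minimal Python sketch of a stricter variant that renames the key only when it appears as an attribute key in the ninth column. It is not part of Galaxy; the function names `rename_attribute_key` and `rename_gff_file` are hypothetical, and it assumes tab-separated feature lines whose attributes are separated by ';' and written either as `key "value"` or as `key=value`.

```python
import re


def rename_attribute_key(line, old="name", new="ID"):
    """Rename an attribute key in column 9 of one GFF line.

    Assumption: attributes are ';'-separated and written as
    'key "value"' or 'key=value'. Other columns are left untouched.
    """
    if line.startswith("#") or not line.strip():
        return line  # comment, directive, or blank line: pass through
    body = line.rstrip("\n")
    line_end = line[len(body):]  # keep the original line ending, if any
    fields = body.split("\t")
    if len(fields) < 9:
        return line  # not a 9-column feature line: pass through
    # Match the key only at the start of an attribute (start of the column
    # or right after ';'), followed by '=' or whitespace.
    pattern = re.compile(r"(^|;\s*)" + re.escape(old) + r"(?=[=\s])")
    fields[8] = pattern.sub(lambda m: m.group(1) + new, fields[8])
    return "\t".join(fields) + line_end


def rename_gff_file(in_path, out_path, old="name", new="ID"):
    """Apply rename_attribute_key to every line of a file."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(rename_attribute_key(line, old, new))
```

With the file names from the command above, this could be called as `rename_gff_file("test.gff", "test_with_ids.gff")`. Whether the result loads in Trackster still depends on the rest of the file being valid GFF.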
I will modify the GFF file as you mentioned and update Galaxy.

Thanks a lot.

Yec'han

On 29/10/2012 15:59, Jeremy Goecks wrote:
I believe there is an issue with your GFF: it is using non-standard identifiers in the attributes (last) column. To the best of my knowledge, 'name' is not a valid field for connecting features in GFF3 (which is my best guess for the file version), but your GFF uses this field anyways.
participants (2)
- Jeremy Goecks
- Yec'han Laizet