Cuffmerge error: duplicate GFF ID encountered
Hello,

I was doing an RNA-seq analysis and wanted to compare the transcription and expression of two samples using a reference annotation. However, this is the error message I got:

=====Quote=====
Error running cuffmerge.
[Thu Jul 4 07:32:59 2013] Beginning transcriptome assembly merge
-------------------------------------------
[Thu Jul 4 07:32:59 2013] Preparing output location cm_output/
[Thu Jul 4 07:34:07 2013] Converting GTF files to SAM
[07:34:07] Loading reference annotation.
[07:34:07] Loading reference annotation.
[Thu Jul 4 07:34:08 2013] Quantitating transcripts
You are using Cufflinks v2.1.1, which is the most recent release.
Command line:
cufflinks -o cm_output/ -F 0.05 -g /galaxy/main_pool/pool7/files/006/446/dataset_6446730.dat -q --overhang-tolerance 200 --library-type=transfrags -A 0.0 --min-frags-per-transfrag 0 --no-5-extend -p 4 cm_output/tmp/mergeSam_fileIO17rb
[bam_header_read] EOF marker is absent.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
File cm_output/tmp/mergeSam_fileIO17rb doesn't appear to be a valid BAM file, trying SAM...
[07:34:08] Loading reference annotation.
[07:35:53] Inspecting reads and determining fragment length distribution.
Processed 33854 loci.
Map Properties:
  Normalized Map Mass: 8719.00
  Raw Map Mass: 8719.00
  Fragment Length Distribution: Truncated Gaussian (default)
  Default Mean: 200
  Default Std Dev: 80
[07:35:53] Assembling transcripts and estimating abundances.
Processed 33854 loci.
[Thu Jul 4 07:39:29 2013] Comparing against reference file /galaxy/main_pool/pool7/files/006/446/dataset_6446730.dat
You are using Cufflinks v2.1.1, which is the most recent release.
Error: duplicate GFF ID 'ENST00000361547.2' encountered!
[FAILED] Error: could not execute cuffcompare
======End quote======

The job runs fine without the annotation reference. The annotation file I used can be downloaded here: ftp://ftp.sanger.ac.uk/pub/gencode/release_17/gencode.v17.annotation.gtf.gz

Can anyone help me, please?

Thanks,
Delong
Hello Delong,

Duplicated GFF IDs are not permitted in reference annotation inputs for this tool suite. There are a few options:

1 - Edit the file to remove or reduce the duplicates. This could have scientific consequences, so consider it carefully.
2 - Use another source. iGenomes is a recommended option. An added benefit is that these files contain additional attributes in the 9th field that the tools use, enabling full functionality. You can read about this in the "inputs" section for each tool in the manual (linked below).

The human iGenomes GTF file is already in the public Main Galaxy instance under Shared Data -> Data Libraries -> iGenomes. Alternatively, you can download the original data from the Cufflinks web site, extract the GTF, and load it wherever you are using Galaxy (local, cloud, or another public instance).

http://wiki.galaxyproject.org/Support#Tools_on_the_Main_server (Example -> RNA-seq analysis tools)
http://cufflinks.cbcb.umd.edu/manual.html
http://cufflinks.cbcb.umd.edu/igenomes.html

Best,
Jen
Galaxy team
___________________________________________________________
The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at:
http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at:
http://galaxyproject.org/search/mailinglists/
-- Jennifer Hillman-Jackson Galaxy Support and Training http://galaxyproject.org
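Option 1 in Jen's reply (editing the GENCODE file) starts with finding which IDs actually repeat. The Python sketch below assumes that the collision cuffcompare reports is a transcript_id occurring at more than one (seqname, strand) location; the exact rule inside the Cufflinks GFF parser may differ, so treat this as a diagnostic aid rather than a reimplementation of it. It lists such IDs and writes a copy that keeps only the first location seen for each ID. The script name `gtf_dups.py` and the output file name are hypothetical.

```python
import gzip
import re
import sys
from collections import defaultdict

# Pulls the value out of a column-9 attribute such as: transcript_id "ENST00000361547.2";
TRANSCRIPT_ID_RE = re.compile(r'transcript_id "([^"]+)"')


def open_gtf(path):
    """Open a plain or gzip-compressed GTF file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def record_key(line):
    """Return (transcript_id, (seqname, strand)) for a GTF record, or None if there is none."""
    if line.startswith("#") or not line.strip():
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return None
    match = TRANSCRIPT_ID_RE.search(fields[8])
    if match is None:
        return None  # e.g. "gene" lines carry no transcript_id
    return match.group(1), (fields[0], fields[6])


def find_split_transcripts(lines):
    """Map each transcript_id seen at more than one (seqname, strand) to its sorted locations."""
    locations = defaultdict(set)
    for line in lines:
        key = record_key(line)
        if key is not None:
            locations[key[0]].add(key[1])
    return {tid: sorted(locs) for tid, locs in locations.items() if len(locs) > 1}


def keep_first_location(lines):
    """Yield lines, dropping records whose transcript_id was first seen at another location.

    Lines without a transcript_id (comments, gene lines) pass through unchanged.
    """
    first_seen = {}
    for line in lines:
        key = record_key(line)
        if key is None:
            yield line
            continue
        tid, location = key
        if first_seen.setdefault(tid, location) == location:
            yield line


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python gtf_dups.py gencode.v17.annotation.gtf.gz > filtered.gtf
    path = sys.argv[1]
    with open_gtf(path) as handle:  # pass 1: report the colliding IDs on stderr
        for tid, locs in sorted(find_split_transcripts(handle).items()):
            print(tid, locs, file=sys.stderr)
    with open_gtf(path) as handle:  # pass 2: write the filtered GTF to stdout
        sys.stdout.writelines(keep_first_location(handle))
```

Keeping the first location is an arbitrary choice and discards the later copies' annotation, which is the kind of scientific consequence Jen warns about; renaming the second copy instead would retain both. Inspect the reported IDs before deciding which approach suits the analysis.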
Thank you for the reply. I will try this annotation GTF once I get my local Galaxy working.

Have a good day,
Delong

________________________________
From: Jennifer Jackson [jen@bx.psu.edu]
Sent: 9 July 2013 14:30
To: Delong, Zhou
Cc: galaxy-user@bx.psu.edu
Subject: Re: [galaxy-user] Cuffmerge error: duplicate GFF ID encountered
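For option 2 in Jen's reply, which Delong plans to try, a quick check can confirm whether a downloaded GTF actually carries the extra column-9 attributes she mentions. Our understanding of the Cufflinks manual is that these include `tss_id` and `p_id` (used by cuffdiff to group transcripts by shared start site and by protein); treat that attribute list as an assumption and confirm it against the "inputs" sections of the manual linked in Jen's reply. The script name `gtf_attrs.py` and the file name `genes.gtf` are hypothetical.

```python
import re
import sys

# GTF column 9 holds attributes written as: key "value";
ATTR_KEY_RE = re.compile(r'(\w+) "[^"]*"')


def present_attribute_keys(lines):
    """Collect every quoted attribute key that appears in column 9 of a GTF."""
    keys = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9:
            keys.update(ATTR_KEY_RE.findall(fields[8]))
    return keys


def missing_attributes(lines, wanted=("tss_id", "p_id")):
    """Return the wanted attribute keys that never appear in the file, in the order given."""
    keys = present_attribute_keys(lines)
    return [key for key in wanted if key not in keys]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python gtf_attrs.py genes.gtf
    with open(sys.argv[1]) as handle:
        print("missing:", missing_attributes(handle) or "none")
```

The check only reports whether each key appears anywhere in the file; it does not verify that every transcript carries it, since attributes such as `p_id` would only be expected on coding transcripts.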
participants (2)
- Delong, Zhou
- Jennifer Jackson