Cufflinks creates invalid output with duplicate GFF IDs
Hi List,

I have a user running a Tuxedo pipeline on our local Galaxy but it has been fraught with errors.

The first issue was with running cufflinks with an annotation file which had duplicate IDs - fixed by me running the following:

    awk '($3 == "exon" || $3 == "CDS")' dataset_8640.dat >> newref

Just tossing out the transcript and gene lines seemed to help.

Second issue was

    Error: sequence lines in a FASTA record must have the same length!

Converting to and from tabular fixed that.

Now the issue is with cuffmerge. The user has run Tophat on a pair of fastq files and a fasta genome. They then ran cufflinks on the Tophat assembly with an annotation file (not the previously mentioned one). This worked. But using the gtf file produced by cufflinks in a cuffmerge step results in:

    Error running cuffmerge.
    [Sat Mar 22 14:00:10 2014] Beginning transcriptome assembly merge
    -------------------------------------------
    [Sat Mar 22 14:00:10 2014] Preparing output location cm_output/
    [Sat Mar 22 14:02:37 2014] Converting GTF files to SAM
    [14:02:38] Loading reference annotation.
    Error: duplicate GFF ID 'CDS:GBG_brugia_K07A12.4b' encountered!
    [FAILED]
    Error: could not execute gtf_to_sam

I took out the sequence and annotation file in the cuffmerge step with no change in result. I ran gffread on the cufflinks output and sure enough, it explodes. But why would Cufflinks create an invalid file?

The file itself has these entries:

    Bmal_v3_scaffold1 Cufflinks transcript 280476 280630 1 + . gene_id ""; transcript_id "CDS:GBG_brugia_K07A12.4b"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
    Bmal_v3_scaffold1 Cufflinks exon 280476 280630 1 + . gene_id ""; transcript_id "CDS:GBG_brugia_K07A12.4b"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
    Bmal_v3_scaffold1 Cufflinks transcript 281149 281207 1 + . gene_id ""; transcript_id "CDS:GBG_brugia_K07A12.4b"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
    Bmal_v3_scaffold1 Cufflinks exon 281149 281207 1 + . gene_id ""; transcript_id "CDS:GBG_brugia_K07A12.4b"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
    # Shortening for brevity:
    Bmal_v3_scaffold1 Cufflinks transcript 281521 281622 1 ...
    Bmal_v3_scaffold1 Cufflinks exon 281521 281622 1 ...
    Bmal_v3_scaffold1 Cufflinks transcript 281743 281863 1 ...
    Bmal_v3_scaffold1 Cufflinks exon 281743 281863 1 ...
    Bmal_v3_scaffold1 Cufflinks transcript 282355 282537 1 ..
    Bmal_v3_scaffold1 Cufflinks exon 282355 282537 1 ...
    Bmal_v3_scaffold1 Cufflinks transcript 283063 283190 1 ...
    Bmal_v3_scaffold1 Cufflinks exon 283063 283190 1 ...
    Bmal_v3_scaffold1 Cufflinks transcript 283879 284035 1 ...
    Bmal_v3_scaffold1 Cufflinks exon 283879 284035 1 ...
    Bmal_v3_scaffold1 Cufflinks transcript 280652 280683 1 ...
    Bmal_v3_scaffold1 Cufflinks exon 280652 280683 1 ...

Here's my setup:

Galaxy changeset: dc067a95261d was my last pull

    $ cuffmerge --version
    merge_cuff_asms v1.0.0
    $ cufflinks
    cufflinks v2.2.0
    linked against Boost version 104700
    $ tophat --version
    TopHat v1.3.3

Using tools:

    Cuffdiff devteam revision: 604fa75232a2
    Cufflinks devteam revision: 9aab29e159a7
    Cuffmerge devteam revision: 424d49834830
    Tophat devteam revision: 1030acbecce6

Has anyone else seen this? I'm re-running the workflow from scratch but I don't really have any leads.

Sincerely,
Carrie Ganote
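P.S. For anyone who wants to check their own Cufflinks output for the same pattern, here is a minimal sketch that lists every transcript_id appearing on more than one "transcript" line, which is the situation in the entries quoted above. It assumes the GTF is tab-delimited with the attributes in the ninth column; the function name duplicate_transcript_ids and the file name transcripts.gtf are made up for illustration and are not part of Cufflinks or Galaxy.

```python
import re
from collections import Counter

# Pulls the quoted value out of an attribute such as: transcript_id "X";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]*)"')


def duplicate_transcript_ids(gtf_lines):
    """Return {transcript_id: count} for IDs found on more than one
    'transcript' feature line. Assumes tab-delimited GTF input."""
    counts = Counter()
    for line in gtf_lines:
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Only count 'transcript' features; exon lines legitimately
        # repeat the transcript_id of their parent.
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            counts[match.group(1)] += 1
    return {tid: n for tid, n in counts.items() if n > 1}


# Hypothetical usage on a file named transcripts.gtf:
#
#     with open("transcripts.gtf") as handle:
#         for tid, n in duplicate_transcript_ids(handle).items():
#             print(n, tid)
```

This only reports the repeated IDs; it does not explain why they were written or attempt to repair the file.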