Re: [galaxy-user] Running Cufflinks on Bacterial RNAseq data

1 Aug 2012

Peter is correct (I oversimplified)! Cufflinks does allow the same ID
attribute to appear on multiple lines, as long as those lines represent
the same feature.

To be clear, this error was a true format issue.

The best way to understand the finer points is to see the specification 
(also linked from wiki below):

http://www.sequenceontology.org/gff3.shtml
(quote)

Column 9: "attributes"
<...>
ID 	Indicates the ID of the feature. IDs for each feature must be unique 
within the scope of the GFF file. In the case of discontinuous features 
(i.e. a single feature that exists over multiple genomic locations) the 
same ID may appear on multiple lines. All lines that share an ID 
collectively represent a single feature.
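
For anyone who wants to check a GFF3 file locally before uploading, here is a minimal Python sketch that lists every ID appearing on more than one line. This is not what Cufflinks itself does internally. The function names are hypothetical, and the "same seqid, type, and strand" test is only my assumed heuristic for telling a discontinuous feature apart from a true duplicate; the specification does not word it that way.

```python
import io
from collections import defaultdict


def parse_attributes(column9):
    """Split a GFF3 column 9 string into a dict of tag -> value."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            tag, value = pair.split("=", 1)
            attrs[tag.strip()] = value.strip()
    return attrs


def find_repeated_ids(handle):
    """Return {ID: [(line_number, seqid, type, strand), ...]} for IDs on 2+ lines."""
    seen = defaultdict(list)
    for line_number, line in enumerate(handle, start=1):
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more feature lines
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed feature line
        feature_id = parse_attributes(fields[8]).get("ID")
        if feature_id is not None:
            seen[feature_id].append((line_number, fields[0], fields[2], fields[6]))
    return {fid: rows for fid, rows in seen.items() if len(rows) > 1}


def looks_like_one_feature(rows):
    """Assumed heuristic: all lines sharing the ID agree on seqid, type, and strand."""
    return len({(seqid, ftype, strand) for _, seqid, ftype, strand in rows}) == 1


# Made-up example data: "cds1" is a two-part CDS, "dup" is reused by two features.
example = (
    "##gff-version 3\n"
    "chr\tsrc\tCDS\t1\t50\t.\t+\t0\tID=cds1;Parent=gene1\n"
    "chr\tsrc\tCDS\t80\t120\t.\t+\t1\tID=cds1;Parent=gene1\n"
    "chr\tsrc\tgene\t200\t300\t.\t+\t.\tID=dup\n"
    "chr\tsrc\ttRNA\t400\t450\t.\t-\t.\tID=dup\n"
)
repeated = find_repeated_ids(io.StringIO(example))
for fid, rows in sorted(repeated.items()):
    if looks_like_one_feature(rows):
        label = "possibly a discontinuous feature"
    else:
        label = "likely a true duplicate"
    print(fid, [row[0] for row in rows], label)
```

To run it on a real file, pass an open file handle to find_repeated_ids() instead of the io.StringIO example. Anything flagged as a likely true duplicate is a candidate for the kind of format error discussed here.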

Thanks Peter for the clarification!

Jen
Galaxy team

On 7/31/12 11:35 AM, Jennifer Jackson wrote:
...
Hello Rachel,
When datasets are in a grey "waiting to run" state this indicates that
they are in the queue and in line to run. For the majority of cases,
including yours, leaving the job alone and allowing it to run is the
correct option. The missing metadata only means that the result has not
yet posted to your history (expected when still grey).
It looks as if your jobs have now run, but resulted in errors. I can let
you know that the problem is with the input GFF3 dataset. It contains at
least one duplicated "ID" attribute, which is required to be unique
within GFF3 files. Clicking on the green bug icon in any of the red
error datasets will point to the example duplicated ID. To my knowledge,
the content being based on a bacterial genome is unrelated to this
format problem.
For reference, this is the specification help for GFF3:
http://wiki.g2.bx.psu.edu/Learn/Datatypes#GFF3
This can be a difficult problem to resolve on your own since the full
extent of the file's issues is unknown. Locating an alternate source or
contacting the original source of this GFF3 dataset to request a
correction would be potential solutions. The tophat.cufflinks@gmail.com
mailing list or seqanswers.com are suggested places to query for
reference annotation file recommendations.
Best,
Jen
Galaxy team
On 7/27/12 10:30 AM, Rachel Krasich wrote:
...
I am attempting to run Cufflinks on Galaxy main to analyze my E. coli
RNAseq data.  I have mapped my reads using an outside program (Geneious)
and uploaded the resulting BAM file.  I also have uploaded the E. coli
annotations as a gtf file.  However when I attempt to run Cufflinks
using my annotations it just stays on "Job is waiting to run" for
hours.  If I click on "Edit attributes", I see an error message
"Required metadata values are missing".  Does this mean that my files
are somehow incomplete and cufflinks will never run, or do I just need
to wait longer?  When searching around the mailing lists I saw others
have had issues with bacteria due to its circular chromosome, and was
wondering if this might somehow be related.  Thanks.
Rachel
-- 
Jennifer Jackson
http://galaxyproject.org