Hello David,

You are correct about the tools, so the problem is most likely with the original GTF file. If gene_id is not assigned correctly there, the data will not be sorted by gene_id.

Although GTF format is consistent (mostly!) between sources, the actual content can vary. One example is UCSC: a GTF file from the Table Browser has the transcript name assigned to both the gene_id and the transcript_id tags in the attributes field (the ninth field). Extracting the gene name from the track and swapping it into the GTF file's gene_id attribute is a necessary pre-processing step before using downstream tools that rely on that attribute.

The good news is that you should be able to use Galaxy's Text Manipulation tools to do whatever file processing you need, from whatever input source you are using, once the data is loaded into your history. Create, save, and reuse a workflow so that you only have to work out tedious file conversions step by step once.

If you need more help, please let us know and share your history: Options -> Share or Publish -> Share with a user "jen@bx.psu.edu".

Thanks,

Jen
Galaxy team

P.S. It is best to send data questions to the galaxy-user mailing list so that the community can learn from each other. I am going to forward this answer there now, since this question has come up a few times recently after the addition of the new tools.

On 12/1/10 7:00 AM, David Matthews wrote:
Dear Jennifer,
I hope you can help. After running cuffdiff on my data with the combined GTF file from cuffcompare, I get the usual list of files back. However, in the genes tracking file and the genes FPKM files, many genes are listed more than once. My understanding was that cuffdiff was supposed to amalgamate these into one value per gene ID. Am I doing something wrong?
Cheers,
David
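
For anyone doing this outside Galaxy, the gene_id swap described in the reply above can be sketched in Python. This is only an illustration under stated assumptions: the function name fix_gene_ids, the transcript-to-gene lookup table, and the TX1/GENE1 identifiers are all hypothetical, and the lookup table would have to be built separately (for example from a Table Browser export that lists both names).

```python
import re

# Patterns for the two tags inside the ninth (attributes) GTF field.
GENE_ID_RE = re.compile(r'gene_id "([^"]*)"')
TRANSCRIPT_ID_RE = re.compile(r'transcript_id "([^"]*)"')


def fix_gene_ids(gtf_lines, transcript_to_gene):
    """Return GTF lines with gene_id replaced by the mapped gene name.

    transcript_to_gene is a hypothetical dict {transcript name: gene name}.
    Lines without a known transcript_id are passed through unchanged.
    """
    fixed = []
    for line in gtf_lines:
        text = line.rstrip("\n")
        fields = text.split("\t")
        # Leave comments and malformed lines (not 9 tab-separated fields) alone.
        if text.startswith("#") or len(fields) != 9:
            fixed.append(text)
            continue
        match = TRANSCRIPT_ID_RE.search(fields[8])
        gene = transcript_to_gene.get(match.group(1)) if match else None
        if gene is not None:
            # A function replacement avoids backslash issues in gene names.
            fields[8] = GENE_ID_RE.sub(
                lambda m: 'gene_id "%s"' % gene, fields[8], count=1
            )
        fixed.append("\t".join(fields))
    return fixed


if __name__ == "__main__":
    # Made-up record in which gene_id carries the transcript name.
    example = [
        'chr1\tsource\texon\t100\t200\t0\t+\t.\t'
        'gene_id "TX1"; transcript_id "TX1";'
    ]
    for out_line in fix_gene_ids(example, {"TX1": "GENE1"}):
        print(out_line)
```

After this step every transcript of a gene carries the same gene_id, which is what downstream tools that use the attribute need in order to report one row per gene.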