Hello Kristen,

Our RNA-seq tutorial and FAQ can help out with the general workflow:

https://main.g2.bx.psu.edu/u/jeremy/p/galaxy-rna-seq-analysis-exercise
https://main.g2.bx.psu.edu/u/jeremy/p/transcriptome-analysis-faq

An iGenomes reference annotation GTF dataset for mm9 is also in the Shared Data Libraries. Import "genes.gtf" into your history, and please ignore the other content, as it is under revision:

http://usegalaxy.org -> Shared Data -> Data Libraries -> iGenomes -> mm9


To address your questions: one key source of confusion may be the difference between a "reference genome" and a "reference annotation" dataset.

* "reference genome" = the genomic sequence (sourced in FASTA format) that your reads are mapped against with TopHat and that the downstream RNA-seq tools use as a scaffold. Since you are using mm9, selecting the "built-in index" for mm9 is an appropriate choice. A reference genome does not provide any annotation beyond genomic positional coordinates. Mapping tools, including TopHat, have parameters that specify whether to keep only the best hit or all hits for each read; it sounds as if you need to adjust these parameters in your run. The filter you ran (question #2) may have removed most or all hits, so check the output from the SAM filter: was it greatly reduced, or empty? If so, re-run TopHat with parameters that keep the best hit from the start, then go straight to Cufflinks without filtering through SAMtools. Help is available on the tool form itself and in the links to the manual.

* "reference annotation" = known transcripts (sourced in GTF or GFF3 format) that are also mapped against the reference genome. These transcript annotations are most useful when they contain gene, transcript start site, and other key attributes that the Cuff* tools can interpret. The annotation can guide assembly at various levels (loose or strict), depending on how the tool parameters are configured. The annotation MUST be mapped to exactly the same reference genome that your FASTQ datasets are mapped to, with exactly the same chromosome naming (see the RNA-seq FAQ for details). Help is also available on the Cuff* tool forms, including links to the manuals.
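If you want to check these two points outside of Galaxy, here is a minimal Python sketch. It is not a Galaxy tool, and the function names are hypothetical. It assumes uncompressed, tab-separated SAM text that includes the @SQ header lines (for example, TopHat output converted from BAM to SAM) and a plain-text GTF file. It (1) counts alignment records with and without the 0x100 "not primary" FLAG bit set, and (2) lists GTF chromosome names that do not appear in the SAM header. The toy records are made up purely for illustration.

```python
"""Sketch: two sanity checks for a TopHat -> Cufflinks run (hypothetical helpers)."""

SECONDARY = 0x100  # SAM FLAG bit meaning "not primary alignment"


def count_primary_secondary(sam_lines):
    """Return (primary, secondary) counts of alignment records by the 0x100 bit."""
    primary = secondary = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & SECONDARY:
            secondary += 1
        else:
            primary += 1
    return primary, secondary


def sam_reference_names(sam_lines):
    """Collect reference sequence names from @SQ header lines (SN: tag)."""
    names = set()
    for line in sam_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def gtf_chromosomes_missing(gtf_lines, sam_lines):
    """Return GTF seqnames (first column) that are absent from the SAM header."""
    gtf_chroms = {
        line.split("\t")[0]
        for line in gtf_lines
        if line.strip() and not line.startswith("#")
    }
    return gtf_chroms - sam_reference_names(sam_lines)


if __name__ == "__main__":
    # Toy inputs for illustration only; not real TopHat output or mm9 annotation.
    sam = [
        "@SQ\tSN:chr1\tLN:1000",
        "read1\t0\tchr1\t10\t50\t5M\t*\t0\t0\t*\t*",
        "read2\t256\tchr1\t20\t0\t5M\t*\t0\t0\t*\t*",
        "read3\t16\tchr1\t30\t50\t5M\t*\t0\t0\t*\t*",
    ]
    gtf = [
        'chr1\ttoy\texon\t1\t5\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
        '1\ttoy\texon\t1\t5\t.\t+\t.\tgene_id "g2"; transcript_id "t2";',
    ]
    print(count_primary_secondary(sam))  # (2, 1)
    print(gtf_chromosomes_missing(gtf, sam))  # {'1'}
```

Running the count on the TopHat output before and after the SAM filter shows how much the filter removed. A non-empty result from the second check means the annotation uses different chromosome names than the genome, as with "1" versus "chr1" in the toy data.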

More help, including links to tool help, is on our wiki (see "Tools on the Main server: Example: unexpected results with RNA-seq analysis tools"):
http://wiki.g2.bx.psu.edu/Support#Interpreting_scientific_results

Hopefully this helps,

Jen
Galaxy team

On 6/13/12 7:07 AM, Kristen Roop wrote:
Hello,

Galaxy Main

1.) I am having trouble adding annotations to my TopHat and Cufflinks runs.
For the TopHat mapping I used the Mus musculus mm9 reference with the built-in index, but no annotations were available in the output files.
I then tried converting the reference genome from UCSC to a SAM file using SAMtools. TopHat would not recognize this, but Cufflinks did. The Cufflinks output file did not have the annotation either.

Any thoughts on the proper way to add annotations?



2.) I am also trying to separate the single mapped reads from the multiply mapped reads that resulted from TopHat. After converting the TopHat output file, I used the SAMtools filter tool, choosing 0x100 ("map is not primary"). I then tried to run Cufflinks on the filtered output, but it failed.


My ultimate goal is to look at RNA-seq gene expression. I know that I have to: upload my files -> groom using FASTQ Groomer -> download a reference sequence from UCSC -> convert the reference genome file to a usable format -> run TopHat for mapping using the groomed file and the converted reference annotation -> filter the single mapped reads -> run Cufflinks using the filtered single mapped reads from TopHat.

From here I will continue with some other statistical analysis but right now I need to get this basic pipeline to work.


Thanks,
Kristen Roop


___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/

-- 
Jennifer Jackson
http://galaxyproject.org