Re: [galaxy-user] Trouble getting tophat to run on built-in index genome

18 Jan 2012

Hi Noa,

There are two issues: using the correct genome and setting up paired 
data properly.

1 - Custom genomes

Perhaps you have discovered this already, but to make the TopHat tool 
form use a custom reference genome from the history, set this option:

"Will you select a reference genome from your history or use a built-in 
index?:"

to:

"Use one from the history"

The form will then refresh, offering a new pull-down menu under the option:

"Select the reference genome:"

This menu now lists datasets from the history panel on the right 
(instead of the genome names built into Galaxy). Select the 
FASTA-formatted custom reference genome and start the job.

I am not sure how you were able to get 100% mapped if the wrong genome 
was used. Perhaps the genome you ended up mapping to was similar to 
the one you wanted to map to (different releases of the same species)? 
Or maybe you did use the genome from the history after all? Moving on to 
Cufflinks might give you some more information about the mapping quality 
(especially if used with a reference gene GTF file).

2 - Mated pairs

Flagstat is not the best tool for the latest RNA-seq data. Instead, try 
"NGS: Picard (beta) -> SAM/BAM Alignment Summary Metrics" or "BAM Index 
Statistics" from the same tool group.
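
If it helps to read the flagstat output quoted below, here is a rough
Python sketch (not part of Galaxy or samtools) that parses that text and
reports the pairing status. It assumes the older one-count-per-line
flagstat format shown in your message; newer samtools versions print
"N + M" counts and would need a different pattern. The function names
parse_flagstat and describe_pairing are made up for this example. The
point it illustrates: with "0 paired in sequencing", no read carries the
paired flag, so the "properly paired" percentage is 0/0, which is why it
prints as -nan%. That suggests the reads were not aligned as pairs.

```python
import re

# Flagstat text as quoted in the original message (old-style output).
FLAGSTAT_TEXT = """\
6667700 in total
0 QC failure
0 duplicates
6667700 mapped (100.00%)
0 paired in sequencing
0 read1
0 read2
0 properly paired (-nan%)
0 with itself and mate mapped
0 singletons (-nan%)
0 with mate mapped to a different chr
0 with mate mapped to a different chr (mapQ>=5)
"""


def parse_flagstat(text):
    """Map each flagstat label to its leading integer count."""
    counts = {}
    for line in text.splitlines():
        # Leading count, then the label, then an optional "(xx.xx%)" suffix.
        match = re.match(r"^(\d+)\s+(.*?)(?:\s+\(.*%\))?$", line.strip())
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


def describe_pairing(counts):
    """Summarize pairing, guarding against the 0/0 case seen above."""
    paired = counts.get("paired in sequencing", 0)
    if paired == 0:
        return "no reads are flagged as paired, so pairing percentages are undefined"
    proper = counts.get("properly paired", 0)
    return f"{100.0 * proper / paired:.2f}% of paired reads are properly paired"


counts = parse_flagstat(FLAGSTAT_TEXT)
print(describe_pairing(counts))
```

The same check can be pasted over any old-style flagstat text; only the
"paired in sequencing" and "properly paired" lines are used.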

Hopefully this has already been worked out, or this is helpful! Apologies 
for the delayed reply; this question was overlooked during the holiday 
shuffle.

Best,

Jen
Galaxy team

On 12/13/11 1:17 PM, Noa Sher wrote:
...
I am trying to run Tophat on some Illumina data from an organism with no
UCSC-supported genome.
I have taken my Illumina data and run it through the FASTQ groomer.
I input this data into tophat, and choose the built-in index option for
------------this was the problem --------------^^^^^^^^^^^^^^^--------
the reference genome, using an uploaded GenBank FASTA genome.
I run this, but when I check the output by running *NGS: SAM Tools >
flagstat* on the bam output from tophat, I get 0% properly paired (see
below for the exact output).
I have BLASTed the Illumina data just to double-check that I have not
mixed files; it is in fact from the organism I am tophat-ing against.
I would greatly appreciate input on what I may be doing wrong? Is the
"built-in index" supposed to be just the genome sequence in FASTA format
or is it something more complex?
Thanks
Noa
6667700 in total
0 QC failure
0 duplicates
6667700 mapped (100.00%)
0 paired in sequencing
0 read1
0 read2
0 properly paired (-nan%)
0 with itself and mate mapped
0 singletons (-nan%)
0 with mate mapped to a different chr
0 with mate mapped to a different chr (mapQ>=5)
___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists,
please use the interface at:
http://lists.bx.psu.edu/
-- 
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/wiki/Support