SOLID RNA-Seq De Novo Transcriptome Assembly

31 Oct 2013

Hi Oscar,

You most likely want to explore tools designed specifically for de novo 
transcriptome assembly, if the reference genome you are talking about is 
the assembled transcriptome. Trinity is one such tool, and there are 
others in the Tool Shed and on some of the public servers.

Links:
http://wiki.galaxyproject.org/Support#Tools_on_the_Main_server
http://wiki.galaxyproject.org/Support#Custom_reference_genome
http://wiki.galaxyproject.org/BigPicture/Choices
http://wiki.galaxyproject.org/Tool%20Shed

Your question is a bit confusing because the 'annotations' may already 
be what these tools would produce, and I am not sure what you are trying 
to do next. If it is the assignment of putative function, then there are 
many paths to follow, some better suited for viral genomes. You'll want 
to find out what others doing this exact work are using right now and 
consider the same tools. Start by checking out the public Galaxy 
servers. Many have trial tools that you can later install from the Tool 
Shed into a local or cloud Galaxy: 
http://wiki.galaxyproject.org/PublicGalaxyServers

If I misunderstood your question (the reference genome is in fact a 
DNA genome, and you have RNA sequence to align), then the RNA-seq 
pipeline can be used as-is with 'Tophat for SOLiD', Cufflinks, 
CuffMerge, and CuffDiff, all on a local/cloud/SlipStream Galaxy with the 
reference genome as a custom reference genome. None of these tools 
requires reference annotation. Annotation helps to gain full 
functionality, especially with CuffDiff, but it is not required. More 
assistance is at tophat.cufflinks@gmail.com.
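
On the idea of converting SOLiD reads to nucleotide FASTQ before 
mapping: the short Python sketch below is only an illustration of how 
colour-space decoding works, not something Galaxy or TopHat asks you to 
write. The function name decode_colorspace is made up for this example. 
It assumes the usual two-base encoding, where each colour (0-3) 
describes the transition from the previous base to the next one, 
starting from the known primer base.

```python
# Illustrative sketch only: decode a SOLiD colour-space read to bases.
# With A=0, C=1, G=2, T=3, the colour between two adjacent bases equals
# the XOR of their codes, so decoding is a running XOR from the primer.
_BASES = "ACGT"


def decode_colorspace(read):
    """Decode a read such as 'T0123' (primer base + colours) to bases.

    The leading primer base only seeds the decoding and is not part of
    the returned sequence.
    """
    primer, colours = read[0].upper(), read[1:]
    if primer not in _BASES:
        raise ValueError("read must start with a primer base (A/C/G/T)")
    code = _BASES.index(primer)
    decoded = []
    for colour in colours:
        if colour not in "0123":
            raise ValueError("unexpected colour call: %r" % colour)
        code ^= int(colour)          # apply the transition
        decoded.append(_BASES[code])
    return "".join(decoded)


print(decode_colorspace("T0123"))  # correct colours
print(decode_colorspace("T1123"))  # same read, first colour miscalled
```

Because every base depends on all the colours before it, one miscalled 
colour changes every base after it in the decoded read. That is the 
usual reason for aligning SOLiD data in colour space with a 
colour-space-aware tool rather than converting first.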

Hopefully this helps,

Jen
Galaxy team

On 10/24/13 6:06 PM, Oscar Aguilar wrote:
...
Hi Dr. Jackson,
I'm sorry to bother you but I have been searching for answers but I 
can't seem to find any and I'm sure that you would be able to answer 
my question.
So I am trying to find a novel gene using de novo transcriptome 
assembly and I see that TopHat might just be able to help me out with 
my dilemma. The viral genome is not available on the Galaxy website, and 
the other issue is that I am using SOLiD data. So my question is, can 
I use TopHat with SOLiD data by converting to nucleotide base fastq? 
Or do I have to use TopHat2 with a colourspace viral genome? I also 
have to admit that I am completely new to bioinformatics and my 
project has led me here so I am trying to tackle it on my own.
For the custom genome, I have managed to load it (in fasta, and 
annotation in BED) but I am not sure how to assign the annotations to 
the genome. Also, does TopHat require an annotated genome? I read that 
it doesn't but I'm not sure...I fear that my gene is a spliced one and 
I would like to be able to pull it out from output data.
I'm sorry to bother you as I'm sure the answer is out there I just 
really can't seem to find it and am now desperate.
Thank you in advance,
Oscar
-- 
Oscar A. Aguilar, M.Sc
PhD Candidate
Sunnybrook Research Institute
Department of Immunology
University of Toronto
416-480-6100x89492
oscar.aguilar@utoronto.ca <mailto:oscar.aguilar@utoronto.ca>
-- 
Jennifer Hillman-Jackson
http://galaxyproject.org