Hi,
You need to run FASTQ Groomer on your RNA-seq data. Your reference is fine as a FASTA file.
Austin
On Fri, May 6, 2011 at 10:26 AM, <puvan001@umn.edu> wrote:
Hi David,
Thanks! When I tried to run TopHat, it didn't recognise my FASTA file and reported "History does not include a dataset of the required format / build". Do you have any thoughts about this?
The "multihits" setting makes more sense now. Thanks for sharing your workflow.
With regards
Sumathy
On May 6 2011, David Matthews wrote:
Hi,
I have done exactly the same kind of thing for adenovirus, so I can help with it. In answer to question 1, you do not need to index the genome; it will be done for you when TopHat is called. Secondly, you should leave the maximum number of alignments at 40 and filter out the multihits after the analysis. This will let you determine whether you have a multihit problem and, if so, how big it is and where it falls on the genome. I have a workflow on Galaxy that you can use, called "Bristol workflow to get sorted unique proper pair mapped reads". If you plug in your SAM file, it should give you files listing only the unique hits and those that map more than once. This workflow assumes you have paired-end data, but it can be modified to work with single-end reads as well.
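For anyone who wants to see the idea of post-analysis multihit filtering outside Galaxy, here is a minimal Python sketch. It is not the Bristol workflow itself, only an illustration of the same split. It assumes the aligner writes the optional NH:i: tag (number of reported alignments for the read) on each SAM record; the function names nh_value and split_by_hit_count and the toy records in DEMO are made up for this example.

```python
def nh_value(sam_line):
    """Return the integer NH tag of a SAM record, or None if the tag is absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional tags start at the 12th column, after the 11 mandatory SAM fields.
    for tag in fields[11:]:
        if tag.startswith("NH:i:"):
            return int(tag[5:])
    return None


def split_by_hit_count(sam_lines):
    """Split SAM records into (unique, multi, untagged) lists.

    Header lines (starting with '@') and blank lines are skipped.
    Assumption: NH == 1 means the read was reported at exactly one location.
    """
    unique, multi, untagged = [], [], []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        nh = nh_value(line)
        if nh is None:
            untagged.append(line)
        elif nh == 1:
            unique.append(line)
        else:
            multi.append(line)
    return unique, multi, untagged


# Toy records invented for illustration: one unique read, one read reported twice.
DEMO = [
    "@HD\tVN:1.0\tSO:coordinate",
    "r1\t0\tvirus\t10\t255\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\tNH:i:1",
    "r2\t0\tvirus\t50\t3\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:2",
    "r2\t256\tvirus\t90\t3\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:2",
]

if __name__ == "__main__":
    unique, multi, untagged = split_by_hit_count(DEMO)
    print(len(unique), "unique;", len(multi), "multi;", len(untagged), "untagged")
```

Keeping the multi-mapped records in their own list, rather than discarding them, is what lets you check afterwards where on the genome they fall. For paired-end data you would additionally want to check the proper-pair flag, which this sketch does not do.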
Hope this helps.
Best Wishes, David.
__________________________________ Dr David A. Matthews
Senior Lecturer in Virology Room E49 Department of Cellular and Molecular Medicine, School of Medical Sciences University Walk, University of Bristol Bristol. BS8 1TD U.K.
Tel. +44 117 3312058 Fax. +44 117 3312091
D.A.Matthews@bristol.ac.uk
On 6 May 2011, at 17:09, puvan001@umn.edu wrote:
Hi
I have a couple of questions regarding RNA-seq analysis:
1. I need to use a viral genome (very small, ~2 kb) as a reference genome, and it is not available in Galaxy. I guess I can use this data from my history. I have a FASTA file, but I am not sure whether I have to do some kind of indexing.
2. In TopHat, the default for "maximum number of alignments to be allowed" is 40. My understanding is that a single read can be aligned to a maximum of 40 different places. I am wondering why this is 40. Is there a specific reason? If I need unique mapping, I have to use 1 instead of 40. Am I correct?
Thanks
SP
___________________________________________________________ The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at:
-- Sumathy Puvanendiran Graduate student