I have some paired-end datasets to analyze, but I am not sure of their Mean Inner Distance between Mate Pairs.
Can I convert these paired-end datasets into single-end ones and analyze them as single-end datasets, as follows?
1) Use the tool "Manipulate FASTQ" to convert the reverse reads into their reverse complements, so that all of the reverse reads effectively become forward reads.
2) Run Tophat on the manipulated datasets as single-end data.
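To make step 1 concrete, here is a minimal Python sketch of what reverse-complementing a FASTQ record involves. It is not the implementation of the "Manipulate FASTQ" tool; the function names are hypothetical, and it assumes the simple four-lines-per-record FASTQ layout.

```python
# Hypothetical helper names; not part of Galaxy or the "Manipulate FASTQ" tool.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement_record(header, seq, plus, qual):
    """Return a FASTQ record with the sequence reverse-complemented.

    The quality string is reversed as well, so each score stays with its base.
    """
    return header, seq.translate(COMPLEMENT)[::-1], plus, qual[::-1]


def reverse_complement_fastq(lines):
    """Yield reverse-complemented records from an iterable of FASTQ lines.

    Assumes four lines per record (no wrapped sequences); an incomplete
    trailing record is ignored.
    """
    stripped = [line.rstrip("\n") for line in lines]
    for i in range(0, len(stripped) - 3, 4):
        yield reverse_complement_record(*stripped[i:i + 4])
```

Note that the quality string has to be reversed together with the sequence; otherwise the scores no longer line up with their bases.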
I am analyzing downloaded RNA-seq datasets. However, I am not sure what the Mean Inner Distance between Mate Pairs is for these paired-end datasets.
Taking one paired-end RNA-seq dataset as an example, the SRA database at NCBI describes it as follows: "Layout: PAIRED, Orientation: 5'-3'-3'-5', Nominal length: 400, Nominal Std Dev: 20".
At first I thought the Mean Inner Distance between Mate Pairs should be about 325 bp, because the reads at both ends are 36 bp long (400 - 2 × 36 = 328). However, when I aligned the paired reads to the transcripts and the genome using BLASTn, the distance between the paired reads was about 200 bp. How should I decide the Mean Inner Distance between Mate Pairs in my case?
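The arithmetic behind the first estimate can be sketched as below. This assumes that "Nominal length" in the SRA record means the full fragment length including both reads and excluding adapters, which the record itself does not confirm; the function name is hypothetical.

```python
def mean_inner_distance(fragment_length, read_length_1, read_length_2):
    """Inner distance = fragment length minus the lengths of both reads.

    Assumption: fragment_length covers both reads and excludes adapters.
    """
    return fragment_length - read_length_1 - read_length_2


# Values from the SRA description above: nominal length 400, two 36 bp reads.
print(mean_inner_distance(400, 36, 36))  # 328
```

With these numbers the result is 328 bp, close to the 325 bp first guess. If the roughly 200 bp gap seen with BLASTn is typical of the dataset, that observed value may be the more trustworthy one to use.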
I have used the Picard BAM statistics tool for the first time.
I aligned my reads to a custom genome (7904 bp long) and got the following output:
length= 7904 Aligned= 44 Unaligned= 0
Why is Unaligned = 0?
I also got Unaligned = 0 when aligning against hg19...
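One way to cross-check such counts is to look at the FLAG column of the alignments in SAM text form: bit 0x4 marks a read as unmapped. The sketch below is not how the Picard tool computes its numbers; the function name is hypothetical and the input is assumed to be tab-separated SAM lines.

```python
def count_alignment_status(sam_lines):
    """Count mapped and unmapped records in an iterable of SAM lines.

    Header lines (starting with '@') and blank lines are skipped.
    A record counts as unmapped when FLAG bit 0x4 is set.
    """
    aligned = unaligned = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        if flag & 0x4:
            unaligned += 1
        else:
            aligned += 1
    return aligned, unaligned
```

If a count like this also comes out as 0 unmapped, the file may simply contain no unmapped records (for example, if they were dropped before the BAM was written). Whether that applies here is a guess.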
I am working on RNA-seq data. First, I mapped the reads to the reference
transcriptome using Bowtie. I found that some different reads mapped to the
same gene at different positions. Before running Cufflinks, I would like to
combine the reads that mapped to the same gene, even though they map to
different positions. Is there a tool in Galaxy that can do this? Any
suggestions would be much appreciated. Thanks!
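To illustrate what "combining reads per gene" could mean, here is a rough Python sketch that tallies mapped reads by gene from SAM lines. It is not a Galaxy tool. It assumes the reference names in the alignments are transcript IDs, and `transcript_to_gene` is a hypothetical lookup table that would have to be supplied separately.

```python
from collections import Counter


def count_reads_per_gene(sam_lines, transcript_to_gene):
    """Tally mapped reads per gene, regardless of position on the transcript.

    transcript_to_gene is a hypothetical dict from reference (transcript)
    names to gene names; unknown references are counted under their own name.
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.split("\t")
        flag, reference = int(fields[1]), fields[2]
        if flag & 0x4 or reference == "*":
            continue  # skip unmapped reads
        counts[transcript_to_gene.get(reference, reference)] += 1
    return counts
```

A tally like this gives per-gene read counts only; it does not produce the kind of alignment file that Cufflinks takes as input.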
Is there a way to use a library of shRNA sequences as my custom genome when using Bowtie? Currently my library is in multiple-sequence FASTA format.
Thanks for any help!
What is the difference between using "NGS: mapping > Map with
Bowtie for Illumina" and "NGS: RNA analysis > Tophat for
Illumina" when mapping reads against a reference/custom genome?
I have some FASTQ datasets with a Phred 33 offset, and I have already assigned them the fastqsanger format. Do I need to run FASTQ Groomer on these datasets before I check the data quality with "Fastqc: Fastqc QC" and use "FASTQ Trimmer by column" to remove bad nucleotides at the 3' end of the reads?
Should I select "Sanger" as "Input FASTQ quality scores type:" if I need to run Groomer?
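For checking whether a file really uses the Phred 33 offset, a rough heuristic is to look at the lowest ASCII code in the quality lines. The sketch below is only that heuristic, not what FASTQ Groomer does; the function name and return labels are made up, and it assumes four lines per record.

```python
def guess_quality_offset(fastq_lines):
    """Guess the quality offset from the lowest quality character seen.

    Characters below ';' (ASCII 59) can only occur with a 33 offset.
    A minimum of '@' (ASCII 64) or higher suggests a 64 offset, but
    uniformly high-quality offset-33 data would look the same.
    """
    lowest = None
    for index, line in enumerate(fastq_lines):
        if index % 4 != 3:
            continue  # only every fourth line holds quality scores
        qual = line.rstrip("\n")
        if not qual:
            continue
        line_min = min(ord(char) for char in qual)
        lowest = line_min if lowest is None else min(lowest, line_min)
    if lowest is None:
        return "unknown"
    if lowest < 59:
        return "phred33"
    if lowest >= 64:
        return "phred64-likely"
    return "ambiguous"
```

Scanning only the first few thousand reads is usually enough for a guess like this, but a short sample of very high-quality reads can be misleading.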
I am going to search for alternative splicing events between datasets. I am not sure about the settings for the mouse reference genome (mm9) when I upload it from UCSC Main.
Would you please tell me the settings for
4) Output format:
I am going to run Tophat with mouse RNA-seq datasets. When I uploaded the datasets using the URL method, I chose "Mouse July 2007 (NCBI37/mm9) (mm9)" under Genome, so "database: mm9" appears in the brief description of each dataset in the history.
My question is: when I run Tophat, under "Will you select a reference genome from your history or use a built-in index?", should I select "Use a built-in index" or "Use one from the history"?