When mapping paired-end RNA-seq reads with TopHat, we need to enter the "Mean Inner Distance between Mate Pairs". In Galaxy, the help text reads:
This is the expected (mean) inner distance between mate pairs. For example, for paired end runs with fragments
selected at 300bp, where each end is 50bp, you should set -r to be 200. There is no default, and this parameter
is required for paired end runs.
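The documented example works out arithmetically as follows; a minimal sketch (the function name is my own, not part of TopHat or Galaxy):

```python
# Worked example of the -r value from the Galaxy/TopHat help text above:
# the inner distance is the fragment length minus the length of both reads.
# The 300 bp / 50 bp numbers are the ones from the documentation.

def mean_inner_distance(fragment_len, read_len):
    """Inner distance between mates = fragment length - both read lengths."""
    return fragment_len - 2 * read_len

print(mean_inner_distance(300, 50))  # -> 200, matching the docs
```

Whether the quoted fragment size already excludes adaptor sequence is exactly the question below; the sketch only shows the arithmetic the documentation uses.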
I think the fragment size (here 300bp) includes not only the two paired-end reads but also the adaptors. So perhaps the Mean Inner Distance between Mate Pairs should be: fragment length - paired-end read lengths - adaptor length. Am I right, or did I miss something?
Is it necessary to enter an exact value?
Looking forward to your reply
We used the Generate pileup tool with the consensus base calling option,
i.e. Maq with default options. In the output, the base quality scores were
changed. For example, if the input base qualities in the SAM file were
'IIJIH', they were changed to '22321'. Is this a glitch or is this
expected? Thank you.
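For comparison, the two strings can be decoded under the usual Phred+33 (Sanger) encoding; whether the pileup column actually holds consensus qualities rather than the input base qualities is the open question here, this sketch only shows what the characters encode:

```python
# A hedged sketch: decode ASCII quality strings assuming Phred+33
# (Sanger) encoding, to compare the SAM base qualities with the
# characters appearing in the pileup output.

def phred33(qual_string):
    """Convert an ASCII quality string to a list of Phred scores."""
    return [ord(c) - 33 for c in qual_string]

print(phred33("IIJIH"))  # -> [40, 40, 41, 40, 39]
print(phred33("22321"))  # -> [17, 17, 18, 17, 16]
```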
After mapping, I used IGV to have a look at the alignments. There are a lot of mapped reads whose mates did not map. Should I keep these reads, or is this a problem for the Cufflinks analysis?
What I tried is:
1. BAM to SAM
2. Filter SAM: set Read mapped in a proper pair to Yes.
The result is that only about 1/5 of the reads were left.
Can anybody tell me whether this operation is appropriate?
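The "proper pair" condition in step 2 corresponds to bit 0x2 of the SAM FLAG field (column 2); a minimal sketch of that test (the example FLAG values are hypothetical):

```python
# "Read mapped in a proper pair" at the SAM level is bit 0x2 of the
# FLAG field (column 2 of each alignment line).

PAIRED      = 0x1  # read is paired in sequencing
PROPER_PAIR = 0x2  # both mates aligned consistently

def is_proper_pair(flag):
    return bool(flag & PROPER_PAIR)

# 99  = paired + proper pair + mate on reverse strand + first in pair
# 89  = paired, first in pair, mate unmapped -> NOT a proper pair
print(is_proper_pair(99))  # -> True
print(is_proper_pair(89))  # -> False
```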
How do you normally optimize the mapping results from TopHat?
Which considerations should I take into account?
Looking forward to your reply.
I am analyzing my RNA-Seq data. After running Cuffdiff, I got a list of differentially expressed transcripts and genes.
As far as I know, the log2 value represents the fold change. However, some of the values are negative. Is this possible? I did not think a log2 fold change value could be negative. Did I miss something?
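For what it's worth, a quick check of the arithmetic: the log2 of a fold change below 1 (i.e. lower expression in the second condition) is negative:

```python
import math

# log2 fold change for a ratio above 1 is positive, and for a ratio
# below 1 it is negative; the fold-change ratio itself stays positive.

print(math.log2(4.0))   # 4-fold up   -> 2.0
print(math.log2(0.25))  # 4-fold down -> -2.0
```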
Looking forward to your help.
Thanks in advance.
Earlier I tried to upload some large BAM files (3.5GB, 3.4GB and
4GB) to my Galaxy account, but failed. Your
advice was to use FTP upload at http://wiki.g2.bx.psu.edu/FTPUpload.
I followed the screencast on that page and did exactly as it advised. I
used the FileZilla FTP client, uploaded the files to my Galaxy account and
executed them. The problem is now at the execution step. For example, my
3.5GB file is uploaded correctly, but once I execute it, the file I get is
only 2.5GB. The file seems to have been truncated somehow. Please advise!
I have a SAM file produced by BWA-SW and want to extract the unique
alignments (reads that align only once to the genome) from it. I read in
other posts that I may be able to use the SAM Tools > Filter SAM option to
filter bit-wise on the FLAG field. However, I could not work out whether I
have to use the default setting for column 2. When I use the "add flags"
option, the choices all concern paired reads, but my data is single-end.
So how exactly do we extract unique reads from a single-end alignment SAM
file? Am I missing something? I can also share my history to explain my
point if required.
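One common approach for single-end data is sketched below: drop unmapped (0x4) and secondary (0x100) alignments, then keep reads whose name occurs exactly once. Aligner conventions for marking repeated hits vary, so treat this as an illustration rather than the definitive BWA-SW rule; the in-line SAM records are hypothetical:

```python
# Hedged sketch: extract "unique" single-end alignments from SAM text.
# Drop unmapped (flag 0x4) and secondary (flag 0x100) records, then
# keep only read names that appear exactly once among the survivors.

from collections import Counter

def unique_single_end(sam_lines):
    kept = []
    for line in sam_lines:
        if line.startswith("@"):            # skip header lines
            continue
        fields = line.split("\t")
        flag = int(fields[1])               # column 2: FLAG
        if flag & 0x4 or flag & 0x100:      # unmapped / secondary
            continue
        kept.append(fields)
    counts = Counter(f[0] for f in kept)    # count by read name (QNAME)
    return [f for f in kept if counts[f[0]] == 1]

# Hypothetical records: readA aligns twice, readB once.
sam = [
    "readA\t0\tchr1\t100\t0\t50M\t*\t0\t0\t*\t*",
    "readA\t0\tchr2\t900\t0\t50M\t*\t0\t0\t*\t*",
    "readB\t0\tchr1\t500\t37\t50M\t*\t0\t0\t*\t*",
]
print([f[0] for f in unique_single_end(sam)])  # -> ['readB']
```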
Hello, I am using the Subtract (whole dataset) tool. I converted my FASTQ
file to tabular with two columns: 1. identifier and 2. sequence. I then
used "Select lines that match an expression" on this initial tabular
file, and I am trying to get a final dataset that is devoid of the reads
in the few selected lines - thus I subtract the dataset of selected lines
from the initial dataset. This tool works when I run the workflow on a
relatively small file (1/50 the size of a whole sequencing experiment) but
repeatedly fails when I input the full FASTQ file. Any idea why this is so?
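For reference, the subtraction step itself can be expressed as a set difference on the identifier column; this is a sketch of the intended operation on hypothetical data, not of Galaxy's implementation:

```python
# Sketch of the intended operation: remove from the full tabular
# dataset every row whose identifier appears in the selected-lines
# dataset. A set of identifiers keeps the membership test O(1) per row.

def subtract(full_rows, selected_rows):
    remove = {row[0] for row in selected_rows}   # identifiers to drop
    return [row for row in full_rows if row[0] not in remove]

# Hypothetical (identifier, sequence) rows.
full = [("read1", "ACGT"), ("read2", "GGCC"), ("read3", "TTAA")]
sel  = [("read2", "GGCC")]
print(subtract(full, sel))  # -> [('read1', 'ACGT'), ('read3', 'TTAA')]
```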
Hi Galaxy Developers,
This is regarding the wrapper for cuffdiff and the -c parameter for "Min
alignment count". We noted that Galaxy's default value is set to 1000. In
our experience, setting this parameter that high yields drastically
different results from the ones obtained with the author's original default
value of c = 10.
The -c parameter is defined as:
-c/--min-alignment-count <int> The minimum number of alignments in a locus
needed to conduct significance testing on changes in that locus observed
between samples. If no testing is performed, changes in the locus are deemed
not significant, and the locus's observed changes don't contribute to
correction for multiple testing. The default is 10 fragment alignments.
We have found that the value entered will require that at least this number
of fragment alignments be found for *each* sample (or at least each
condition) in order for the locus to be tested for differential expression.
Although one could save oneself a lot of locus comparisons (and hence
take a much smaller hit from multiple hypothesis testing) by raising
this threshold, doing so screens out the most desirable differentially
expressed genes - specifically, those that are barely expressed in one
condition but highly expressed in the other.
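The screening behaviour, as we understand it from the observation above, can be sketched like this (the per-sample counts are hypothetical):

```python
# Sketch of the screening behaviour described above, as we understand
# it: a locus is tested only if *each* sample (or condition) meets the
# -c threshold, so a gene with very few alignments in one condition
# stops being tested as soon as the threshold exceeds that low count.

def is_tested(counts_per_sample, min_alignment_count):
    return all(c >= min_alignment_count for c in counts_per_sample)

locus = [15, 850]  # barely expressed in condition A, high in B
print(is_tested(locus, 10))    # -> True  (tested with cufflinks default)
print(is_tested(locus, 100))   # -> False (screened out)
print(is_tested(locus, 1000))  # -> False (screened out at Galaxy default)
```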
The attached image shows a real example of a gene that was counted as
differentially expressed (as it should have been) with '10' but was not
even tested at '100' or '1000'.
I got the following error message during TopHat mapping:
An error occurred running this job: Job output not returned by PBS: the output datasets were deleted while the job was running, the job was manually dequeued or there was a cluster error.
Please let me know what's wrong.
Help will be appreciated.