The short answer is that if your reads are not pairing, there may be a
data quality problem, or there may be a problem with the TopHat mapping
run. The best advice is to take a sample of your data and experiment
with alternate TopHat parameters (using the protocol suggestions in the
paper below or the manual at http://tophat.cbcb.umd.edu/) and see what
works.
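One way to draw such a sample is sketched below. It is only an illustration: the function names `read_fastq` and `sample_pairs` are made up for this example, and it assumes plain 4-line FASTQ records with the forward and reverse files listing mates in the same order. It reservoir-samples read pairs so that mates stay together.

```python
import random
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as 4-line tuples (assumes unwrapped 4-line records)."""
    while True:
        record = tuple(islice(handle, 4))
        if len(record) < 4:  # end of file, or a truncated final record
            return
        yield record


def sample_pairs(fwd_handle, rev_handle, n, seed=0):
    """Reservoir-sample n read pairs, keeping each read with its mate.

    Assumption: both files list mates in the same order, so the i-th
    record of one file pairs with the i-th record of the other.
    """
    rng = random.Random(seed)
    reservoir = []
    pairs = zip(read_fastq(fwd_handle), read_fastq(rev_handle))
    for i, pair in enumerate(pairs):
        if i < n:
            reservoir.append(pair)
        else:
            # Replace an existing entry with probability n / (i + 1).
            j = rng.randint(0, i)
            if j < n:
                reservoir[j] = pair
    return reservoir
```

The sampled pairs can then be written back out as two smaller FASTQ files and used for trial mapping runs before committing to the full dataset.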
If your overall goal is simply transcript/gene discovery/assembly, then
filtering is probably OK. But if you are going to be doing any
statistical expression analysis, then targeted filtering of the data
(i.e. anything beyond general quality filtering) should be done with
caution, if at all, as you risk skewing the results.
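Before deciding on a filter, it may help to quantify how many alignments it would discard. The sketch below tallies the pairing bits of the SAM FLAG field (0x1 paired, 0x2 mapped in a proper pair, 0x4 read unmapped, 0x8 mate unmapped). The function name `pairing_summary` is made up for this example, and the counts are per alignment record, so a read reported at several locations is counted more than once.

```python
def pairing_summary(sam_lines):
    """Tally pairing-related FLAG bits over mapped SAM alignment records."""
    counts = {"mapped": 0, "paired": 0, "proper_pair": 0, "mate_unmapped": 0}
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        if flag & 0x4:  # the read itself is unmapped
            continue
        counts["mapped"] += 1
        if flag & 0x1:  # read comes from a paired-end template
            counts["paired"] += 1
            if flag & 0x2:  # mapped in a proper pair
                counts["proper_pair"] += 1
            if flag & 0x8:  # mate is unmapped
                counts["mate_unmapped"] += 1
    return counts
```

Passing an open SAM file handle to the function returns the four counts. A low ratio of `proper_pair` to `mapped` suggests looking at the mapping parameters before filtering anything out.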
You may have seen this already, but the Cufflinks tool authors put out a
new paper that covers best-practice RNA-seq protocols:
(also linked from http://cufflinks.cbcb.umd.edu/, 2nd item down)
Apologies for the delayed reply. There were a few questions from you
around this same time, and it wasn't clear whether everything had been
addressed. I also don't think the paper link was sent out in an earlier
reply, and it will likely be the most helpful resource.
On 4/20/12 4:51 AM, 杨继文 wrote:
After mapping, I used IGV to look at the alignments. There are a lot of
mapped reads without a paired mate. Should I keep these reads, or is
this a problem for Cufflinks analysis?
What I tried is:
1. BAM to SAM
2. Filter SAM: set /Read mapped in a proper pair/ to *Yes*.
The result is that only 1/5 of the reads were left.
Can anybody tell me if this operation is appropriate?
How do you normally optimize the mapping results from TopHat?
Which considerations should I take into account?
Looking forward to your reply.