Thank you, Peter!

I think I should give more detailed information here. I am working with piRNA data. The library contains two groups of piRNAs (named sense and antisense). As I said, they are complementary to each other over about 10 nt, while the whole length is about 30 nt. The sense group shares the feature of having an "A" at the 10th position.

In this case, how can I deal with it? One possible way I came up with is inverting all the sequences and aligning them.

Thanks!

Best,
Zhiqiang

Quoting "Peter Cock" <p.j.a.cock@googlemail.com>:
On Mon, Nov 26, 2012 at 6:47 PM, Zhiqiang Shu <zshu@bio.fsu.edu> wrote:
Hi, Galaxy users!
I have a question about how to identify sense and antisense sequences. I have RNA-seq data in FASTQ format. The sequences are partially complementary to each other (the complementary region is 10 nt, while the entire length is about 30 nt). How can I separate these sequences into two groups: sense and antisense
Depending on how your sequences were prepared, you might be able to look for a poly-A tail as a clue to orientation. Another approach is to compare the (assembled) transcripts to known genes and if you only get matches on one strand that is probably the correct orientation.
(one thing I know is that for the sense sequences the 10th nucleotide is always "A")?
Why is that? Is this related to your library preparation?
Peter
--------------------------
Zhiqiang Shu/Deng Lab
Department of Biological Science
Florida State University
319 Stadium Dr.
Tallahassee, FL, USA, 32306-4295
zshu@bio.fsu.edu
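
As a rough illustration of the "A at the 10th position" feature described in this thread, the sketch below splits FASTQ reads into two bins by their 10th nucleotide. It is only a first-pass filter, not a definitive sense/antisense classifier: an antisense read can also carry an "A" at position 10 by chance. The function names (`parse_fastq`, `split_by_tenth_base`) and the example reads are hypothetical and were made up for illustration. The parser assumes plain 4-line FASTQ records.

```python
from typing import Iterable, Iterator, List, Tuple

# One FASTQ record: (header line, sequence, quality string)
Record = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from simple 4-line FASTQ records."""
    buf: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line and not buf:
            continue  # ignore blank lines between records
        buf.append(line)
        if len(buf) == 4:
            header, seq, _plus, qual = buf
            yield header, seq, qual
            buf = []


def split_by_tenth_base(
    records: Iterable[Record], base: str = "A", position: int = 10
) -> Tuple[List[Record], List[Record]]:
    """Split reads by whether `base` is found at the 1-based `position`.

    Reads shorter than `position` go into the second list.
    """
    has_base: List[Record] = []
    other: List[Record] = []
    for rec in records:
        seq = rec[1].upper()
        if len(seq) >= position and seq[position - 1] == base:
            has_base.append(rec)
        else:
            other.append(rec)
    return has_base, other


if __name__ == "__main__":
    # Made-up 30-nt reads, not real piRNA sequences.
    example = [
        "@read1", "TGACGTTCAAGGCTAGCTAGGCTAACGTAC", "+", "I" * 30,
        "@read2", "CCGTTAGCATGGATCCGATTACGGATCCAT", "+", "I" * 30,
    ]
    candidates, rest = split_by_tenth_base(parse_fastq(example))
    print("A at position 10:", [r[0] for r in candidates])
    print("other reads:", [r[0] for r in rest])
```

For a real file, the lines could be read with `open("reads.fastq")` (a placeholder file name) and passed to `parse_fastq`. The reads in the first bin would still need to be checked against the other evidence discussed above, such as the 10-nt complementarity.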