Thanks Jeremy,
   I will do that before trying the de novo assembly.

Luciano

On Fri, May 18, 2012 at 1:44 PM, Jeremy Goecks <jeremy.goecks@emory.edu> wrote:
I find a lot of potential new genes (hundreds or thousands of reads aligning to regions where there is no gene annotation),

This shouldn't be completely unexpected. High-coverage RNA-seq data is constantly revealing new exons/splicing/transcripts, even in well-annotated genomes.

I also find new exons in some genes, or exons with different sizes. I was thinking of doing a de novo assembly to find new transcripts and genes, but I was wondering if there is something else I could do.

My suggestion: do reference-guided assembly with Cufflinks; this will yield both existing and new transcripts.

For example, maybe I could just extract the regions where thousands of reads align (new genes). I know we can extract the sequence data for a specific transcript; is it possible to extract reads for regions without annotation, based only on the number of reads aligned?
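One way to picture "regions with many reads but no annotation" is to merge overlapping read alignments into clusters, keep clusters above a read-count threshold, and drop any cluster that overlaps an annotated gene. The following is a minimal sketch under stated assumptions: reads and genes are plain (chrom, start, end) tuples with 0-based, half-open coordinates (not parsed from a real BAM/GTF), and `read_clusters`, `unannotated_clusters` and `min_reads` are hypothetical names chosen for illustration.

```python
from collections import defaultdict

def read_clusters(reads, min_reads):
    """Merge overlapping read alignments into clusters.

    reads: iterable of (chrom, start, end), 0-based half-open (assumed format).
    Returns (chrom, start, end, n_reads) for clusters with >= min_reads reads.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in reads:
        by_chrom[chrom].append((start, end))

    clusters = []
    for chrom in sorted(by_chrom):
        cur_start = cur_end = None
        count = 0
        for start, end in sorted(by_chrom[chrom]):
            if cur_end is not None and start < cur_end:
                # Read overlaps the current cluster: extend it.
                cur_end = max(cur_end, end)
                count += 1
            else:
                # Close the previous cluster (if dense enough), start a new one.
                if cur_end is not None and count >= min_reads:
                    clusters.append((chrom, cur_start, cur_end, count))
                cur_start, cur_end, count = start, end, 1
        if cur_end is not None and count >= min_reads:
            clusters.append((chrom, cur_start, cur_end, count))
    return clusters

def unannotated_clusters(reads, genes, min_reads):
    """Keep dense read clusters that overlap no annotated gene interval."""
    def overlaps(c, g):
        return c[0] == g[0] and c[1] < g[2] and g[1] < c[2]
    return [c for c in read_clusters(reads, min_reads)
            if not any(overlaps(c, g) for g in genes)]
```

For instance, with `reads = [("chr1", 100, 150), ("chr1", 120, 170), ("chr1", 160, 210), ("chr2", 10, 60), ("chr2", 30, 80)]`, `genes = [("chr2", 0, 40)]` and `min_reads=2`, only the chr1 cluster (100 to 210, three reads) is reported. With an indexed BAM, the reads in such a region could then be pulled out with something like `samtools view aln.bam chr1:101-210` (region strings there are 1-based and inclusive).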

You could subtract known genes from the Cufflinks assembly to get only novel transcripts.
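The subtraction can be pictured as an interval-overlap filter: keep each assembled transcript that overlaps no known gene. This is a minimal sketch assuming transcripts and genes have already been reduced to (chrom, start, end, name) intervals in 0-based, half-open coordinates. `Interval` and `novel_transcripts` are hypothetical names, and strand is ignored. On real files, a tool such as `bedtools intersect -v` performs the same kind of "no overlap" filter.

```python
from typing import Iterable, List, NamedTuple

class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str = ""

def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end

def novel_transcripts(assembled: Iterable[Interval],
                      known_genes: Iterable[Interval]) -> List[Interval]:
    """Return assembled transcripts that overlap no known gene (strand ignored)."""
    genes_by_chrom = {}
    for g in known_genes:
        genes_by_chrom.setdefault(g.chrom, []).append(g)
    return [t for t in assembled
            if not any(overlaps(t, g) for g in genes_by_chrom.get(t.chrom, []))]
```

For example, given assembled transcripts `CUFF.1` (chr1:100-500), `CUFF.2` (chr1:2000-3000) and `CUFF.3` (chr2:50-150) and known genes chr1:400-900 and chr2:1000-2000, only `CUFF.2` and `CUFF.3` are kept. Note that this whole-transcript filter also discards novel exons or isoforms of known genes; those would need an exon-level comparison instead.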

Best,
J.