Counts of mapped reads for each gene?

Hi Jen and other galaxy-users, I am analyzing our RNA-seq data. First, I mapped the RNA-seq data to the reference genome. I am wondering if there is a tool that could count the number of reads that mapped to each gene. That's important information for my subsequent analysis. Any reply is highly appreciated! Thanks, Yan

Hi Yan,
The htseq_bams_to_count_matrix tool in the test toolshed might be worth a try. It creates tabular count matrices from any number of individual sample BAM/SAM files (it is NOT read group aware!). Each row contains the count for that contig for each sample. It uses HTSeq code, and you supply your favourite gene model as a GTF file to define the regions to count and how to amalgamate them, e.g. count reads overlapping exons and sum those into total counts for each gene. Please give it a try: install it from the admin interface and let me know how you get on.
There's a companion tool, differential_count_models, also in the test toolshed, that includes edgeR, DESeq2 and VOOM from Bioconductor. It runs 1- or 2-way GLMs using the count matrices generated by the htseq tool. Be warned that it takes a long time to install everything, so be patient and allow 20 minutes or so for the installation to finish, because it compiles and installs R 3.0.1 and the Bioconductor packages.
Suggestions for improvement or bug reports are always welcome. Good luck.
In reply to: Yan He, Thu, Aug 22, 2013 at 3:35 PM
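The counting scheme described above (count reads overlapping exons, then sum those into a total per gene, one column per sample) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code used by HTSeq or by the toolshed tool. It assumes reads have already been reduced to hypothetical (chromosome, start, end) tuples rather than read from BAM files, it ignores strand and spliced alignments, and it skips reads that overlap exons of more than one gene. The function names and the demo data are made up for the example.

```python
from collections import Counter, defaultdict


def parse_gtf_exons(gtf_lines):
    """Collect exon intervals per chromosome as (start, end, gene_id).

    GTF coordinates are 1-based and inclusive; only 'exon' features are kept.
    """
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        chrom, start, end = fields[0], int(fields[3]), int(fields[4])
        gene_id = None
        for attr in fields[8].split(";"):
            key, _, value = attr.strip().partition(" ")
            if key == "gene_id":
                gene_id = value.strip().strip('"')
        if gene_id is not None:
            exons[chrom].append((start, end, gene_id))
    return exons


def count_reads(exons, reads):
    """Count reads per gene for one sample.

    A read is counted only when every exon it overlaps belongs to the same
    gene; reads hitting no exon or several genes are skipped (assumption).
    """
    counts = Counter()
    for chrom, r_start, r_end in reads:
        genes = {
            gene
            for (e_start, e_end, gene) in exons.get(chrom, [])
            if r_start <= e_end and e_start <= r_end
        }
        if len(genes) == 1:
            counts[genes.pop()] += 1
    return counts


def count_matrix(exons, samples):
    """Build a gene-by-sample table: one row per gene, one column per sample."""
    per_sample = {name: count_reads(exons, reads) for name, reads in samples.items()}
    genes = sorted({gene for ivs in exons.values() for (_, _, gene) in ivs})
    names = list(samples)
    header = ["gene"] + names
    rows = [[gene] + [per_sample[name][gene] for name in names] for gene in genes]
    return header, rows


# Made-up demo data: two genes whose exons overlap at 380-400 on chr1.
GTF_DEMO = [
    'chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "geneA"; transcript_id "tA1";',
    'chr1\tdemo\texon\t300\t400\t.\t+\t.\tgene_id "geneA"; transcript_id "tA1";',
    'chr1\tdemo\texon\t380\t500\t.\t-\t.\tgene_id "geneB"; transcript_id "tB1";',
]
SAMPLES_DEMO = {
    "sample1": [("chr1", 150, 180), ("chr1", 190, 310), ("chr1", 390, 420), ("chr1", 600, 650)],
    "sample2": [("chr1", 450, 480), ("chr1", 120, 140)],
}

if __name__ == "__main__":
    header, rows = count_matrix(parse_gtf_exons(GTF_DEMO), SAMPLES_DEMO)
    print("\t".join(header))
    for row in rows:
        print("\t".join(str(value) for value in row))
```

The resulting table has the same shape as the count matrix discussed in this thread (genes as rows, samples as columns), which is the kind of input that tools such as edgeR or DESeq2 expect.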

Hi Yan, You could use the HTSeq count wrapper on the NBIC Galaxy server at http://galaxy.nbic.nl/. It does a good job, and I was able to run edgeR on the resulting count matrix. Good luck. Best wishes, Anto

Hi Anto, Thank you very much for your reply! I tried Galaxy/NBIC, but I had a problem uploading my files. I used FTP because my file is larger than 2 GB, but I couldn't connect to the NBIC FTP server. Do you have any idea how to solve this? Thanks! Yan
In reply to: Anto Praveen Rajkumar Rajamani, Thursday, August 22, 2013 3:14 PM

Hi Yan, I also had problems with the NBIC FTP. NBIC allows only 10 GB of space per user. I created my BAM files on the main server (using TopHat2) and then uploaded them to NBIC using their download URLs. It was fast: it took me less than an hour to move 16 BAM files (around 9.5 GB). You may want to try this. Good luck. Best wishes, Anto
In reply to: Yan He, 22 August 2013 09:36

Hi Anto, Thanks so much! I will try that. Best wishes, Yan
In reply to: Anto Praveen Rajkumar Rajamani, Thursday, August 22, 2013 3:57 PM

Participants (3):
- Anto Praveen Rajkumar Rajamani
- Ross
- Yan He