Table with gene count reads
Hi, I was wondering if there is any tool on Galaxy where I can obtain a table with how many reads have been mapped to a given gene in a given sample (for example, using a TopHat output and a GFF file to obtain the table). I am currently using HTSeq (htseq-count) to get it. There are also the GenomicRanges and easyRNASeq packages in Bioconductor. Thank you. Luciano
Hello Luciano,

There is no single tool to do this operation (although there has been some discussion about including one in the Tool Shed), but the same information can be obtained by using a combination of existing tools.

First, start by converting both starting datasets to interval format.

mapped reads:
- for TopHat output, "NGS: SAM Tools -> Convert SAM to interval"

features:
- for GFF file (convert to tabular if necessary), subtract "1" from the start position's value using tool "Text Manipulation -> Compute"
- cut columns chrom, new start, stop, strand, name, and score from this result file using "Text Manipulation -> Cut"
- set the data type to "interval" using the "Edit attributes" form (pencil icon)

Next, use a tool in the group "Operate on Genomic Intervals" to compare these intervals for overlap. The tool "Cluster" with the option "Find" is most likely the one you will want to use.

As a final step, summarize the data by feature using the tool "Join, Subtract and Group -> Group".

Hopefully this helps,

Best,
Jen
Galaxy team

On 3/19/12 4:36 PM, Luciano Cosme wrote:
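As a rough illustration of the steps in the reply above (this is not part of the Galaxy workflow itself), the GFF-to-interval conversion and the per-feature counting could be sketched in plain Python as below. The function names `gff_to_interval` and `count_reads_per_feature` are hypothetical, the whole GFF attributes column is used as the feature name, and the overlap test is a naive all-against-all comparison, so this is only suitable for small inputs. It also does not try to reproduce how htseq-count treats reads that overlap more than one feature.

```python
from collections import Counter


def gff_to_interval(gff_line):
    """Convert one tab-separated GFF line to an interval-style tuple.

    GFF coordinates are 1-based and inclusive; the interval format is
    0-based and half-open, so only the start is shifted by 1.
    Assumption: the whole attributes column serves as the feature name.
    """
    fields = gff_line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[3]), int(fields[4])
    score, strand, name = fields[5], fields[6], fields[8]
    return (chrom, start - 1, end, strand, name, score)


def count_reads_per_feature(features, reads):
    """Count reads overlapping each feature (naive O(n*m) comparison).

    features: tuples as returned by gff_to_interval
    reads:    (chrom, start, end) tuples, 0-based half-open
    """
    counts = Counter({feature[4]: 0 for feature in features})
    for r_chrom, r_start, r_end in reads:
        for f_chrom, f_start, f_end, _strand, name, _score in features:
            # Half-open intervals overlap when each starts before the other ends.
            if r_chrom == f_chrom and r_start < f_end and f_start < r_end:
                counts[name] += 1
    return counts


# Made-up example data, for illustration only.
example_features = [
    gff_to_interval("chr1\tsrc\tgene\t101\t200\t.\t+\t.\tID=geneA"),
]
example_reads = [
    ("chr1", 90, 110),   # overlaps geneA
    ("chr1", 200, 250),  # starts exactly where geneA ends: no overlap
    ("chr1", 150, 160),  # overlaps geneA
    ("chr2", 100, 120),  # different chromosome
]
example_counts = count_reads_per_feature(example_features, example_reads)
```

Running this once per sample and placing the resulting counts side by side would give the gene-by-sample table asked about in the original question.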
participants (2): Jennifer Jackson, Luciano Cosme