Table with gene count reads
Hello Mike,

SAM datasets can be used as tabular data with the tools in "Text Manipulation", "Filter and Sort", "Join, Subtract and Group", etc., if the headers are removed and the datatype is changed. Or, you can convert the format to Interval. For the simplest count directly on the SAM/BAM format itself, the tool groups "NGS: SAM Tools" and "NGS: Picard (beta)" have options.

There is no "score" value from a SAM file - perhaps the reply below was misinterpreted? The score was derived from a GTF file: the GTF file and an Interval file were joined, then some summaries were created with the "Group" function.

The Tool Shed has a repository for the DESeq package. The instructions explain how to prepare the inputs. "NGS: Picard (beta)" also has tools for assigning read groups if you need to do that.

http://getgalaxy.org
http://toolshed.g2.bx.psu.edu/
Search for: DESeq

Hopefully this helps!

Jen
Galaxy team

On 8/19/12 6:21 AM, mic@mb.au.dk wrote:
Hi Jennifer,
I couldn't find htseq or a similar tool in the Galaxy Tool Shed (sam2counts does not work), which is a bit problematic if one wants to work with DESeq (fastq -> tophat -> sam -> counts -> DESeq).
Could you please explain in more detail how to convert SAM to counts using the available Galaxy tools? It is not clear to me where to find the "score" in the interval file produced from SAM.
Best, Mike
<quote author='Jennifer Jackson'> Hello Luciano,
There is no single tool to do this operation (although there has been some discussion about including one in the Tool Shed), but the same information can be obtained by using a combination of existing tools.
First, start by converting both starting datasets to interval format.
mapped reads:
- for TopHat output, "NGS: SAM Tools -> Convert SAM to interval"

features:
- for GFF file (convert to tabular if necessary), subtract "1" from the start position's value using tool "Text Manipulation -> Compute"
- cut columns chrom, new start, stop, strand, name, and score from this result file using "Text Manipulation -> Cut"
- set the data type to "interval" using the 'Edit attributes' form (pencil icon)
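For readers working outside Galaxy, the feature conversion above can be sketched in a few lines of Python. This is only an illustration, not what the Galaxy tools do internally: it assumes a standard nine-column, tab-separated GFF line, and the function name `gff_to_interval` and the choice to use the raw attributes column as the "name" are assumptions made for the example.

```python
def gff_to_interval(line):
    """Convert one tab-separated GFF line to an interval-style tuple.

    Returns (chrom, start, end, strand, name, score), where start has had
    1 subtracted from the GFF value, as described in the steps above.
    """
    fields = line.rstrip("\n").split("\t")
    # Standard GFF columns: seqname, source, feature, start, end,
    # score, strand, frame, attributes.
    chrom, _source, _feature, start, end, score, strand, _frame, attributes = fields[:9]
    # Assumption: the raw attributes text is used as the interval "name".
    return (chrom, int(start) - 1, int(end), strand, attributes, score)


# Hypothetical example line, not taken from any real annotation file.
example = 'chr1\tsrc\texon\t101\t200\t.\t+\t.\tgene_id "geneA";'
interval = gff_to_interval(example)
print(interval)
```

The subtraction reflects the difference between GFF coordinates, which start at 1, and interval coordinates, which start at 0.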
Next, use a tool in the group "Operate on Genomic Intervals" to compare these intervals for overlap. The tool "Cluster" with the option "Find" is most likely the one you will want to use.
As a final step, summarize the data by feature using the tool "Join, Subtract and Group -> Group".
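The overlap and grouping steps amount to counting, for each feature, the reads whose intervals overlap it. A minimal Python sketch of that idea follows, assuming both reads and features are already in 0-based, half-open interval form. The function name `count_reads_per_feature` and the sample data are hypothetical, and the nested loop is written for clarity rather than speed.

```python
from collections import Counter


def count_reads_per_feature(reads, features):
    """Count reads overlapping each feature.

    reads:    list of (chrom, start, end)
    features: list of (chrom, start, end, name)
    Coordinates are assumed to be 0-based and half-open.
    """
    # Start every feature at zero so features with no reads are still reported.
    counts = Counter({name: 0 for _, _, _, name in features})
    for r_chrom, r_start, r_end in reads:
        for f_chrom, f_start, f_end, name in features:
            # Two half-open intervals overlap when each starts before the other ends.
            if r_chrom == f_chrom and r_start < f_end and f_start < r_end:
                counts[name] += 1
    return counts


# Hypothetical sample data for illustration only.
reads = [("chr1", 100, 150), ("chr1", 190, 240), ("chr2", 0, 50)]
features = [("chr1", 100, 200, "geneA"), ("chr1", 300, 400, "geneB")]
counts = count_reads_per_feature(reads, features)
for name, n in sorted(counts.items()):
    print(name, n, sep="\t")
```

Note that in this sketch a read overlapping two features is counted once for each, which may differ from how dedicated counting tools treat ambiguous reads.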
Hopefully this helps,
Best,
Jen Galaxy team
On 3/19/12 4:36 PM, Luciano Cosme wrote:
Hi, I was wondering if there is any tool on Galaxy where I can obtain a table with how many reads have been mapped to a given sample and to a given gene (for example, use a TopHat output and a GFF file to obtain the table). I am using HTSeq to get it (htseq-count). There are also the GenomicRanges and easyRNASeq packages in Bioconductor. Thank you.
Luciano
___________________________________________________________ The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at:
</quote> Quoted from: http://gmod.827538.n3.nabble.com/Table-with-gene-count-reads-tp3840714p38515...
-- Jennifer Jackson http://galaxyproject.org