Message: 1
Date: Sun, 9 Feb 2014 16:43:14 -0500
From: 7plusorminus 3 <7plusorminus3@gmail.com>
To: galaxy-user@lists.bx.psu.edu
Subject: [galaxy-user] Finding constitutive exons using expression data
Message-ID: <CALfFDirXk1LYY6t+JHnnRCfOGiqcaENubBZ3J_movCmV6bRUSg@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"
Hi, I'm trying to find, for each gene in the human genome, which exons
are the most constitutively expressed. To do this, I'd like to combine
expression data (RNA-seq or microarray) with exon data (UCSC track).
Then, for each gene, I'd like to pick the 1 or 2 exons with the highest
levels of expression (my proxy for constitutiveness).
An additional nicety would be to somehow work in a preference for 5' exons.
For example, let's say a gene has 3 exons and, according to the expression
data, all 3 are equally expressed. In that case I'd like to get only the
first 2 (most 5') exons.
I've started learning Galaxy and was able to import BED files for UCSC
exons (as in the Galaxy 101 tutorial) and a BED file for Affy microarray
expression data. (I also tried importing the Burge RNA-seq track as BED but
couldn't get it to work.) I did an inner join on genomic sequences to join
the expression data with the exons and sorted them from most expressed to
least. But how do I sort within genes? That is, how do I get the top 2
exons per gene (the most highly expressed exons in each gene) and, if more
than 2 are tied for the highest expression, how do I preferentially get
the 5' exons?
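To make the per-gene selection concrete, the logic could be sketched in
plain Python roughly as follows. This is only a sketch under assumptions:
the `Exon` fields (gene, start, end, strand, expression) are hypothetical
stand-ins for the columns of the joined exon/expression table, and "most
5'" is taken to mean the smallest start on the + strand and the largest end
on the - strand.

```python
from collections import defaultdict
from typing import NamedTuple


class Exon(NamedTuple):
    # Hypothetical record for one row of the joined table.
    gene: str          # gene identifier
    start: int         # BED-style start coordinate
    end: int           # BED-style end coordinate
    strand: str        # "+" or "-"
    expression: float  # expression value assigned to this exon


def five_prime_rank(exon):
    """Smaller value means closer to the 5' end of the transcript."""
    # On the + strand the 5' end has the smallest coordinate;
    # on the - strand it has the largest, so negate the end.
    return exon.start if exon.strand == "+" else -exon.end


def top_exons_per_gene(exons, n=2):
    """Return {gene: [up to n exons]}, highest expression first,
    with ties broken in favour of the more 5' exon."""
    by_gene = defaultdict(list)
    for exon in exons:
        by_gene[exon.gene].append(exon)

    result = {}
    for gene, group in by_gene.items():
        ranked = sorted(group, key=lambda e: (-e.expression, five_prime_rank(e)))
        result[gene] = ranked[:n]
    return result


# Made-up example data, for illustration only.
example = [
    Exon("geneA", 100, 200, "+", 5.0),
    Exon("geneA", 300, 400, "+", 5.0),
    Exon("geneA", 500, 600, "+", 5.0),
    Exon("geneB", 100, 200, "-", 1.0),
    Exon("geneB", 300, 400, "-", 3.0),
    Exon("geneB", 500, 600, "-", 3.0),
]
top = top_exons_per_gene(example)
```

With the made-up data above, geneA (three equally expressed exons on the +
strand) keeps its first two exons, and geneB (on the - strand) keeps the
two exons with the highest expression, listing the one with the largest
end coordinate first.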
I'm also open to ways of doing this without Galaxy. I want to do this for
an entire genome, so I figured it would be good to have a Galaxy workflow,
which I could then apply to other genomes as needed.
Thanks for any help.