extract alignment for a set of genes
To Whom It May Concern,

Sorry to bother you with what is likely a fairly simple problem, but I have been trying to figure this out myself for several days and just can't work out how to do it.

I have a set of 8766 genes that I would like to test for positive selection using various other programs (HyPhy, for example). To do this I need an alignment of these genes across various species, but I can't figure out how to get the alignment in FASTA format.

For example, I have a BED12 file from UCSC with the data for the 8766 genes. I thought the easiest way was to use the "Stitch Gene blocks" option and then select locally cached alignments as the MAF source for the species I care about. However, because these 8766 genes have multiple transcripts, I end up with 23,581 regions. Is there a way to merge the multiple regions for each gene into a single region for the longest transcript? Then I should have 8766 regions and can use "Stitch Gene blocks" (unless there is a more economical way to do this).

Thanks,
Vinny

Vincent J. Lynch, Associate Research Scientist
Department of Ecology and Evolutionary Biology & Yale Systems Biology Institute
Yale University
http://pantheon.yale.edu/~vjl4/profpage/

"There is a grandeur in this view of life, with its several powers, having been originally breathed into a few forms or into one; and that whilst this planet has gone on cycling according to the fixed laws of gravity, from so simple a beginning endless forms most beautiful and most wonderful have been, and are being, evolved." -C. Darwin, 1859
(Walker, Wisconsin, Madison, Maddow, Tea Party, Obama, global warming)
In reply to Vincent Joseph Lynch, 6/9/11 10:17 AM:

Hi Vinny,

One option is to filter your BED file from UCSC down to a single representative transcript per gene as a first step, or to use that sort of list as a filter on your final result (if the data is still labeled by transcript IDs). If you are using the "UCSC Genes" track, the table is called "knownCanonical".

Another option is to look at the tools in "Operate on Genomic Intervals" and see if any meet your criteria:
https://bitbucket.org/galaxy/galaxy-central/wiki/GopsDesc
Merge or Cluster may be what you want. Note: this can result in gene models that are not represented by a single transcript in the primary query species.

If you have more questions, please let us know, and kindly keep the cc to galaxy-user so that the Galaxy team and community can offer input.

Best,
Jen
Galaxy team
___________________________________________________________
The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at:
--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org
participants (2): Jennifer Jackson, Vincent Joseph Lynch