) will put you on the right path. Let us know if you have more
questions after you watch them.
Thanks,
anton
galaxy team
On Dec 11, 2008, at 1:32 PM, Anyuan Guo wrote:
Dear sir,
I need an up-to-date file of the UCSC 17-way multiple alignment for the
upstream 1000 bp of each RefSeq. UCSC has a download
(http://hgdownload.cse.ucsc.edu/goldenPath/hg18/multiz17way/upstream1000.m...).
This file begins with the human RefSeq ID and contains the multiz
alignment for the upstream 1000 bp of each RefSeq. It exactly matches my
requirement, but it is a little old: it was built in 2007. I need a new
version of such a file.
I asked the UCSC group, and they told me that I can't get such a
file using the UCSC Table Browser, but that I can get it from your Galaxy.
I tried several times but failed. Can you tell me how to get a 17-way
(or just human and mouse) alignment file for the upstream 1000 bp of all
RefSeqs?
Thanks very much.
Anyuan Guo
================
Anyuan Guo Ph.D.
Postdoc Fellow
Virginia Institute for Psychiatric and Behavioral Genetics
Virginia Commonwealth University
P.O. Box 980126
Richmond, VA 23298-0126, USA
Email: aguo(a)vcu.edu
Brooke Rhead wrote:
>
> Hi Anyuan,
>
> I don't believe it is possible to retain the RefSeq ID in this case
> when using the Table Browser. However, I think that Galaxy has
> this capacity, either by doing the intersection from scratch using
> their tools, or by joining your MAF with your custom track based on
> the genome coordinates.
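
The coordinate-based join described above can be sketched as follows. This is only an illustration of the idea, not how Galaxy or the Table Browser implement it; the Region type, the function name, and the NM_EXAMPLE accessions and region coordinates are hypothetical. Intervals are assumed to be 0-based and half-open, as in BED and MAF.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    name: str   # label to recover, e.g. a RefSeq accession


def names_for_block(chrom: str, start: int, size: int,
                    regions: List[Region]) -> List[str]:
    """Return the names of all regions overlapping the block [start, start + size)."""
    end = start + size
    return [r.name for r in regions
            if r.chrom == chrom and r.start < end and start < r.end]


# Hypothetical upstream regions (made-up coordinates and accessions).
regions = [
    Region("chr1", 14000, 15000, "NM_EXAMPLE1"),
    Region("chr1", 20000, 21000, "NM_EXAMPLE2"),
]

# Reference component of a MAF block: hg18.chr1, start 14754, size 99.
hits = names_for_block("chr1", 14754, 99, regions)
```

A linear scan like this is fine for a handful of regions; for genome-wide data a sorted list or interval index would be the usual choice.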
>
> Galaxy has screencasts:
>
> http://galaxy.psu.edu/screencasts.html
>
> and a wiki:
>
> http://g2.trac.bx.psu.edu/
>
> This screencast might be particularly helpful:
>
> http://screencast.g2.bx.psu.edu/galaxy/MAF_manipulation/
>
> If you have more questions about how to accomplish your task using
> Galaxy, you can contact them at galaxy-user(a)bx.psu.edu.
>
> Good luck with your research.
>
> --
> Brooke Rhead
> UCSC Genome Bioinformatics Group
>
>
>
> On 12/10/08 15:17, Anyuan Guo wrote:
>> Dear Brooke,
>> Thanks very much. I learned a lot about creating custom tracks
>> from your email. I can download a ~76 Mb compressed file when I
>> follow your instructions to create a custom track for the upstream
>> 1000 bp of RefSeq Genes and intersect it with 17-way Cons. But I
>> found that the file does not begin with the RefSeq ID (NM_xxxx).
>> The following are the first 4 lines of the file.
>> ##maf version=1
>> a score=-55252.000000
>> s hg18.chr1 14754 99 + 247249719
>> CTGTGGGTCGGAGCCGGAGCGTCAGAGC---------CACCCACGACCACCGGCACGCC----
>> CCCACCACA-GGGCAGCGTGG-TGTTGAGACAAC------A
>>
>> In fact, I need a file that begins with the RefSeq ID; the
>> downloaded MAF file
>> (http://hgdownload.cse.ucsc.edu/goldenPath/hg18/multiz17way/upstream1000.m...)
>> exactly matches my requirement. But because some RefSeq sequences
>> were updated, the downloaded file is out of date.
>> The following are the first 4 lines of the downloaded file,
>> which is what I need.
>> ##maf version=1 scoring=zero
>> a score=0.000000
>> s NM_198943 0 1000 + 1000 GCATTTTAAACCCAAGTG----
>> AAATCTCCTAGG----------CCCTTCATGCCACACTCA-----TCCATCCCTACCTAC--
>> TTGTGTTGCAACCAAGGGCCCCAC
>>
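
The two excerpts above differ only in what the "s" line names as its source: hg18.chr1 with genome coordinates in one, a RefSeq accession with coordinates 0..1000 in the other. A minimal sketch of reading such a line, assuming the standard seven whitespace-separated MAF fields (the MafSeq type and parse_s_line name are made up for illustration, and the alignment text is truncated here for brevity):

```python
from typing import NamedTuple


class MafSeq(NamedTuple):
    src: str       # source name, e.g. "hg18.chr1" or "NM_198943"
    start: int     # 0-based start within the source sequence
    size: int      # number of non-gap bases in this row
    strand: str    # "+" or "-"
    src_size: int  # total length of the source sequence
    text: str      # aligned bases, with "-" for gaps


def parse_s_line(line: str) -> MafSeq:
    """Parse one MAF 's' line into its fields."""
    fields = line.split()
    if len(fields) != 7 or fields[0] != "s":
        raise ValueError("not a MAF 's' line: %r" % line)
    _, src, start, size, strand, src_size, text = fields
    return MafSeq(src, int(start), int(size), strand, int(src_size), text)


# Alignment text shortened; a real line carries the full aligned row.
genome_row = parse_s_line("s hg18.chr1 14754 99 + 247249719 CTGTGGGTCGGAGCC")
refseq_row = parse_s_line("s NM_198943 0 1000 + 1000 GCATTTTAAACCCAAGTG----")
```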
>> How can I get the up-to-date version of this download file?
>> Thanks.
>>
>> Anyuan
>>
>>
>> Brooke Rhead wrote:
>>> Hello Anyuan,
>>>
>>> The reason that the sequence is different via the download file
>>> and the Table Browser is that the sequence associated with
>>> NM_014223 at RefSeq has changed since the download file was
>>> made. The items in the RefSeq Genes track are updated daily; the
>>> download files are generally only made once.
>>>
>>> You can see the revision history for any GenBank accession at NCBI:
>>>
>>> http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=NM_014223
>>>
>>> The download file was last updated on 7-7-2007. I tried blatting
>>> the NM_014223 sequence from the "Jun 3 2007 1:10 PM" update to
>>> the hg18 assembly, and the sequence aligned starting at the
>>> genomic coordinate chr1:40,929,952. The upstream sequence from
>>> the file you downloaded corresponds to the 1,000 bases upstream
>>> of that base.
>>>
>>> You can get an up-to-date version of the download file by
>>> creating it yourself with the Table Browser. First, make a custom
>>> track of the upstream regions of RefSeq Genes. If you select the
>>> RefSeq Genes track in the Table Browser and choose "output
>>> format: custom track", you will be presented with an option to
>>> create one BED record per region that is "Upstream by ___
>>> bases". Enter 1,000 or 2,000 in this box and hit "get custom
>>> track in genome browser". You should see a new custom track
>>> containing blocks representing regions upstream of all RefSeq
>>> Genes.
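
For readers who prefer to see the arithmetic behind "Upstream by ___ bases", here is a small sketch. It assumes 0-based, half-open transcript coordinates (as in BED) and is not the Table Browser's own code; the function names and the NM_EXAMPLE accession are hypothetical. On the minus strand, "upstream" lies past the transcript end, and this sketch does not clamp to the chromosome length.

```python
from typing import Tuple

BedRecord = Tuple[str, int, int, str, int, str]


def upstream_region(chrom: str, tx_start: int, tx_end: int, strand: str,
                    name: str, flank: int = 1000) -> BedRecord:
    """Return a BED6 record covering `flank` bases upstream of a transcript."""
    if strand == "+":
        # Upstream is to the left of the transcript start; never go below 0.
        start, end = max(0, tx_start - flank), tx_start
    elif strand == "-":
        # Upstream is to the right of the transcript end.
        start, end = tx_end, tx_end + flank
    else:
        raise ValueError("strand must be '+' or '-'")
    return (chrom, start, end, name, 0, strand)


def to_bed_line(record: BedRecord) -> str:
    """Format a record as one tab-separated BED line."""
    return "\t".join(str(field) for field in record)


plus = upstream_region("chr1", 5000, 9000, "+", "NM_EXAMPLE")
minus = upstream_region("chr1", 5000, 9000, "-", "NM_EXAMPLE")
line = to_bed_line(plus)
```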
>>>
>>> Now you can intersect your new custom track with the multiz
>>> alignment in the Conservation track to get only the upstream
>>> regions. To do this step, select the 17-way (or 28-way)
>>> Conservation track in the Table Browser. Select the table
>>> 'multiz17way' and region: genome. Hit the "intersection: create"
>>> button and select your custom track. Choose the option for
>>> "Base-pair-wise intersection (AND) of 17-Way Cons and upstream
>>> regions from refGene" and hit submit. Back on the main Table Browser
>>> page, select "output format: MAF". The size of the file you will
>>> be creating is quite large (76 Mb compressed for 1,000 base
>>> regions). I suggest entering a name for the file and selecting
>>> the option to get a gzip compressed version of it. Hit "get
>>> output". You should end up with a MAF file that contains only
>>> the regions upstream of RefSeq Genes.
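
A base-pair-wise intersection (AND) keeps only the bases covered by both tracks. The sketch below shows the idea for one chromosome, assuming both inputs are sorted, non-overlapping lists of 0-based half-open (start, end) intervals. The function name is made up, and this is not the Table Browser's implementation.

```python
from typing import List, Tuple

Interval = Tuple[int, int]


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the bases covered by both interval lists, as sorted intervals."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:  # the two current intervals share at least one base
            out.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


alignment_blocks = [(100, 200), (300, 400)]  # e.g. aligned blocks
upstream = [(150, 350)]                      # e.g. one upstream region
shared = intersect(alignment_blocks, upstream)
```

Note that the result carries coordinates only, which is consistent with the output MAF being labelled by genome position rather than by RefSeq ID.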
>>>
>>> You may also be interested in the tools for working with MAF
>>> alignments at Galaxy: http://galaxy.psu.edu/ . Galaxy is run by
>>> our collaborators at Penn State and extends the functionality of
>>> the Table Browser. For instance, there is a tool to filter any
>>> undesired species from a MAF file, leaving only the species of
>>> interest to you.
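
What such a species filter does can be sketched in a few lines. This is not Galaxy's tool, only an illustration: it assumes source names of the form "assembly.chrom", handles only "s" lines (real MAF files may also carry per-species "i", "e" and "q" lines), and does not drop blocks left with a single row. The function name and the example block are made up.

```python
from typing import Iterable, List, Set


def filter_species(maf_lines: Iterable[str], keep: Set[str]) -> List[str]:
    """Drop 's' lines whose assembly prefix is not in `keep`; keep all other lines."""
    out = []
    for line in maf_lines:
        if line.startswith("s "):
            src = line.split()[1]            # e.g. "hg18.chr1"
            assembly = src.split(".", 1)[0]  # e.g. "hg18"
            if assembly not in keep:
                continue
        out.append(line)
    return out


# Illustrative block with made-up coordinates and sequence.
block = [
    "a score=0.0",
    "s hg18.chr1 100 4 + 1000 ACGT",
    "s mm8.chr4 200 4 + 2000 ACGA",
    "s canFam2.chr2 300 4 + 3000 ACGG",
    "",
]
human_mouse = filter_species(block, {"hg18", "mm8"})
```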
>>>
>>> I hope this is helpful. If you have further questions, please
>>> feel free to contact us again at genome(a)soe.ucsc.edu. If you
>>> have questions specific to Galaxy, their helpdesk email address
>>> is galaxy-user(a)bx.psu.edu.
>>>
>>> --
>>> Brooke Rhead
>>> UCSC Genome Bioinformatics Group
>>>
>>>
>>>
>>> Subject: question or bug about UCSC genome browser sequence
>>> From: Anyuan Guo <aguo(a)vcu.edu>
>>> Date: Mon, 17 Nov 2008 10:54:21 -0800
>>> To: genome(a)soe.ucsc.edu
>>>
>>>
>>> Dear author,
>>> Thank you for providing the wonderful UCSC Genome Browser
>>> database and website.
>>> I have a question about the sequences in it.
>>> I downloaded the human upstream 1000 bp multiz alignment file from
>>> ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/multiz17way/upstream1000.ma...
>>> When I checked my sequence ID, NM_014223, I could find the
>>> upstream 1000 bp sequence of this RefSeq gene in the downloaded
>>> multiz alignment file.
>>> I can also search for this ID in the Genome Browser and get the
>>> upstream 1000 bp using the "DNA" or "Tables" menu at the top of
>>> the Genome Browser page.
>>> But I find that these two upstream 1000 bp sequences are totally
>>> different. I think the one from the Genome Browser is right.
>>> However, I don't need just the upstream 1000 bp sequence; I need
>>> the alignment with the mouse sequence.
>>>
>>> Can I get the sequence alignment between human and mouse for all
>>> the RefSeq genes and the upstream 1000 or 2000 bp of these genes?
>>> Where can I find it?
>>> I think such ortholog gene alignments (including upstream
>>> regulatory sequence alignments) between two popular genomes would
>>> be very useful.
>>>
>>> thanks.
>>>
>>> Anyuan
>>>
Anton Nekrutenko
Asst. Professor
Department of Biochemistry and Molecular Biology
Center for Comparative Genomics and Bioinformatics
Penn State University
anton(a)bx.psu.edu