On Tue, May 8, 2012 at 6:48 PM, Jennifer Jackson
<jen@bx.psu.edu> wrote:
Hi Raja,
This tool uses a <database>.2bit file to extract sequence data when the 'Locally cached' option is used. The <database> is a genome that you install locally. The ".2bit" format was developed by UCSC, which already provides many genomes in this format, along with tools (compiled binaries and source) to convert FASTA data to and from .2bit format (faToTwoBit and twoBitToFa):
http://hgdownload.cse.ucsc.edu/downloads.html (genomes + source)
http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/ (compiled utilities)
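If a genome is only available as FASTA, the conversion can be scripted. The following is a minimal sketch, assuming the faToTwoBit binary from the compiled utilities is on your PATH; the function names and the example file names are illustrative only and are not part of Galaxy or the UCSC tools.

```python
import shutil
import subprocess


def fa_to_2bit_command(fasta_path, twobit_path, exe="faToTwoBit"):
    """Build the argument list: faToTwoBit takes the input FASTA, then the output .2bit."""
    return [exe, str(fasta_path), str(twobit_path)]


def convert(fasta_path, twobit_path):
    """Run the conversion, failing early if the binary is not installed."""
    if shutil.which("faToTwoBit") is None:
        raise FileNotFoundError("faToTwoBit not found on PATH")
    # check=True raises CalledProcessError if the tool exits with an error
    subprocess.run(fa_to_2bit_command(fasta_path, twobit_path), check=True)
```

For example, convert("hg19.fa", "hg19.2bit") would write hg19.2bit next to the FASTA file (hypothetical file names).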
For the extract tool, the builds list is required:
http://wiki.g2.bx.psu.edu/Admin/Data%20Integration
You don't actually need any further NGS setup beyond that, but this wiki page can still help:
http://wiki.g2.bx.psu.edu/Admin/NGS%20Local%20Setup
For example, the <database>.2bit file could be placed with your .fa files like:
/galaxy-dist/tool-data/genome/<databaseA>/seq/<databaseA>.2bit <<
/galaxy-dist/tool-data/genome/<databaseA>/seq/<databaseA>.fa
/galaxy-dist/tool-data/genome/<databaseB>/bowtie/
/galaxy-dist/tool-data/genome/<databaseB>/sam/
/galaxy-dist/tool-data/genome/<databaseB>/seq/<databaseB>.2bit <<
/galaxy-dist/tool-data/genome/<databaseB>/seq/<databaseB>.fa
/galaxy-dist/tool-data/genome/<databaseC>/seq/<databaseC>.2bit <<
/galaxy-dist/tool-data/genome/<databaseC>/seq/<databaseC>.fa
/galaxy-dist/tool-data/genome/<databaseD>/seq/<databaseD>.2bit <<
/galaxy-dist/tool-data/genome/<databaseD>/seq/<databaseD>.fa
Then the .loc file is here:
/galaxy-dist/tool-data/twobit.loc.sample
You will probably have this for all genomes as well:
/galaxy-dist/tool-data/all_fasta.loc.sample
Remove the ".sample" before using these. Instructions for how to populate each are in the files themselves.
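As a rough aid for populating twobit.loc, the sketch below scans a genome directory laid out as in the example above and yields one line per build that has a .2bit file. It assumes a two-column, tab-separated layout (build identifier, then path to the .2bit file); please confirm that against the comments in your own twobit.loc.sample. The function name is hypothetical.

```python
from pathlib import Path


def twobit_loc_lines(genome_root):
    """Yield tab-separated 'build, path' lines for each <build>/seq/<build>.2bit found."""
    root = Path(genome_root)
    for build_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        twobit = build_dir / "seq" / f"{build_dir.name}.2bit"
        # Builds that only have a .fa file are skipped
        if twobit.is_file():
            yield f"{build_dir.name}\t{twobit}"


if __name__ == "__main__":
    # Path taken from the example layout; adjust to your install
    for line in twobit_loc_lines("/galaxy-dist/tool-data/genome"):
        print(line)
```

The printed lines can then be reviewed and pasted into twobit.loc by hand.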
The only gtf/gff files associated with this tool are datasets from your history, so there is no gtf/gff data to stage before using the tool. To have the tool use a particular genome, assign the query dataset (interval, bed, or gtf) the same database identifier that you used for the "<database>" part of the "<database>.2bit" file. (This is why the builds list is required.)
If you make changes to data, don't forget to restart your server to see the changes.
Hopefully this helps,
Jen
Galaxy team
On 5/8/12 12:46 PM, Raja Kelkar wrote:
I have two questions that pertain to a local install of galaxy:
1. I have been having trouble getting the “fetch sequences” → “extract
genomic DNA” tool to work. Can someone identify the specific *.loc file
that needs to have the info about the location of the genome sequence files?
I get the following error when I run the extract tool:
/No sequences are available for 'hg19', request them by reporting this
error./
2. What configuration file(s) need to contain locations for the gtf/gff
files?
Thanks.
--
Jennifer Jackson
http://galaxyproject.org