Loc file configuration question
I have two questions that pertain to a local install of Galaxy:

1. I have been having trouble getting the "fetch sequences" → "extract genomic DNA" tool to work. Can someone identify the specific *.loc file that needs to have the info about the location of the genome sequence files? I get the following error when I run the extract tool:

   No sequences are available for 'hg19', request them by reporting this error.

2. What configuration file(s) need to contain locations for the gtf/gff files?

Thanks.
Hi Raja,

This tool uses a <database>.2bit file to extract sequence data when the 'Locally cached' option is used. The <database> is a genome that you install locally. The ".2bit" format was developed by UCSC, and they are the source for many genomes already in this format and for tools (compiled and uncompiled) to transform fasta data into/from .2bit format (faToTwoBit and twoBitToFa):

http://hgdownload.cse.ucsc.edu/downloads.html (genomes + source)
http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/ (compiled utilities)

For the extract tool, the builds list is required:
http://wiki.g2.bx.psu.edu/Admin/Data%20Integration

You don't actually need to have more NGS set up beyond that. Still, this wiki can help:
http://wiki.g2.bx.psu.edu/Admin/NGS%20Local%20Setup

For example, the <database>.2bit file could be placed with your .fa files like:

/galaxy-dist/tool-data/genome/<databaseA>/seq/<databaseA>.2bit <<
/galaxy-dist/tool-data/genome/<databaseA>/seq/<databaseA>.fa
/galaxy-dist/tool-data/genome/<databaseB>/bowtie/
/galaxy-dist/tool-data/genome/<databaseB>/sam/
/galaxy-dist/tool-data/genome/<databaseB>/seq/<databaseB>.2bit <<
/galaxy-dist/tool-data/genome/<databaseB>/seq/<databaseB>.fa
/galaxy-dist/tool-data/genome/<databaseC>/seq/<databaseC>.2bit <<
/galaxy-dist/tool-data/genome/<databaseC>/seq/<databaseC>.fa
/galaxy-dist/tool-data/genome/<databaseD>/seq/<databaseD>.2bit <<
/galaxy-dist/tool-data/genome/<databaseD>/seq/<databaseD>.fa

Then the .loc file is here:
/galaxy-dist/tool-data/twobit.loc.sample

You will probably have this for all genomes as well:
/galaxy-dist/tool-data/all_fasta.loc.sample

Remove the ".sample" before using these. Instructions for how to populate each are in the files themselves.

The only gtf/gff files associated with this tool would be datasets from the history, so there are no gtf/gff data to stage before using the tool. To have the tool use a particular genome, set the query dataset (interval, bed, gtf) to have the same database identifier as you used for the "<database>" part of the "<database>.2bit" file. (This is why the builds list is required.)

If you make changes to data, don't forget to restart your server to see the changes.

Hopefully this helps,

Jen
Galaxy team
___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at:
-- Jennifer Jackson http://galaxyproject.org
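As a rough illustration of the layout Jen describes, the sketch below builds twobit.loc lines for a set of genomes. It assumes each line holds the database identifier and the path to the .2bit file separated by a single tab, and that the files follow the tool-data/genome/<database>/seq/ pattern from the example. The helper names and the example builds are hypothetical, and the comments inside twobit.loc.sample remain the authority on the actual format.

```python
import posixpath


def twobit_path(root, database):
    """Build <root>/genome/<database>/seq/<database>.2bit (assumed layout)."""
    return posixpath.join(root, "genome", database, "seq", database + ".2bit")


def twobit_loc_lines(root, databases):
    """Return one line per genome: identifier, a tab, then the .2bit path."""
    return ["\t".join((db, twobit_path(root, db))) for db in databases]


if __name__ == "__main__":
    # Hypothetical install root and builds; adjust to your own system.
    for line in twobit_loc_lines("/galaxy-dist/tool-data", ["hg19", "mm9"]):
        print(line)
```

The identifiers passed in should be the same ones used in the builds list, since that is how the tool matches a dataset to its genome.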
Hi Jen,

Thank you for your response. I seem to have all the relevant entries in the two "*.loc" files you mentioned. (The paths in the all_fasta and twobit files are different because of the way we have the files stored. I also converted the 2bit files to .fa and have them available in the same twobit directory.)

But the feature extraction is still not working.

Here are the relevant entries in the files (I have redacted specific file paths and replaced them with "path_to"):

twobit.loc

hg18 /path_to/twobit/hg18.2bit
hg19 /path_to/twobit/hg19.2bit
mm9 /path_to/twobit/mm9.2bit
mm8 /path_to/twobit/mm8.2bit

all_fasta.loc

hg19full hg19 Human (Homo sapiens): hg19 Full /path_to/hg19/bwa_path/hg19_all.fa
hg19_chr_only hg19_chr Human (Homo sapiens): hg19_chrom_only /path_to/hg19/bwa_path/hg19.fa
hg18full hg18 Human (Homo sapiens): hg18 Full /path_to/hg18/bwa_path/hg18_all.fa
hg18_chr_only hg18_chr Human (Homo sapiens): hg18_chrom_only /path_to/hg18/bwa_path/hg18_chrom_only.fa

I assume that the second field (<dbkey>) in the all_fasta.loc file has to match the builds.txt file in the "ucsc" directory. Is that correct? It does in this case. I think I am missing something subtle here.

The "*.loc.sample" files are great, but the information contained in them is confusing. I am not sure why there are two examples of the same info (as far as I can tell) in most sample loc files.

Thanks.
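The match between the <dbkey> field and the builds list can be checked mechanically. The following is only a sketch under two assumptions: that builds.txt keeps the build identifier in its first tab-separated column, and that the dbkey is the second tab-separated column of all_fasta.loc, as described above. The function name is hypothetical.

```python
def dbkeys_missing_from_builds(loc_text, builds_text, dbkey_column=1):
    """List dbkeys used in a .loc file that are absent from the builds list."""
    known = set()
    for line in builds_text.splitlines():
        if line.strip() and not line.startswith("#"):
            # Assumed: the identifier is the first tab-separated field.
            known.add(line.split("\t")[0])

    missing = []
    for line in loc_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) > dbkey_column and fields[dbkey_column] not in known:
            missing.append(fields[dbkey_column])
    return missing
```

With a builds list that contains only hg19 and hg18, a dbkey such as hg19_chr would be reported as missing, so variants like that may be worth a second look.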
Hi Raja,

Can you check that your fields are separated by tabs and not spaces? (They appear as spaces in your message, but that could be a copy and paste artifact.)

Thanks for using Galaxy,

Dan
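The tab-versus-space check can be done with a few lines of Python rather than by eye. This is a minimal sketch, assuming that lines starting with "#" are comments and that every data line should split into at least two tab-separated fields; the function name and the threshold are illustrative only.

```python
def lines_without_enough_tabs(loc_text, min_fields=2):
    """Return (line_number, line) pairs that have too few tab-separated fields."""
    problems = []
    for number, line in enumerate(loc_text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue  # ignore blank lines and comments
        if len(line.split("\t")) < min_fields:
            problems.append((number, line))
    return problems


if __name__ == "__main__":
    sample = "hg19\t/path_to/twobit/hg19.2bit\nmm9 /path_to/twobit/mm9.2bit\n"
    # The second line uses a space, so it is the one reported.
    print(lines_without_enough_tabs(sample))
```

For all_fasta.loc, where each entry in this thread has four fields, calling the function with min_fields=4 would be the stricter check.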
Hi,

I was having the same issue just today, and my solution was to add:

seq mm9 /path_to/twobit/mm9.2bit

to the alignseq.loc file, as .nib has been replaced by 2bit, plus all the necessary entries in all_fasta.loc and twobit.loc.

That worked :)

Hope this helps.

Fred
-- Federico De Masi, PhD, Assistant Professor The Technical University of Denmark - DTU Center for Biological Sequence Analysis - CBS Kemitorvet 208/002 DK-2800 KGS. LYNGBY, DENMARK Telephone: (+45) 45 25 24 21 Fax: (+45) 45 93 15 85 http://rg.cbs.dtu.dk
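To keep alignseq.loc in step with twobit.loc, the "seq" lines could be derived from the existing twobit.loc entries. The sketch below assumes both files are tab-separated, that twobit.loc holds the build identifier followed by the path, and that an alignseq.loc entry has the three fields shown in Fred's line (the literal word "seq", the build, the path). The function name is hypothetical.

```python
def alignseq_lines_from_twobit(twobit_loc_text):
    """Turn 'build<TAB>path' lines into 'seq<TAB>build<TAB>path' lines."""
    lines = []
    for line in twobit_loc_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # not tab-separated; leave it for manual inspection
        build, path = fields[0], fields[1]
        lines.append("\t".join(("seq", build, path)))
    return lines
```

Each returned line can then be appended to alignseq.loc, followed by a server restart so the change is picked up.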
Hi Fred,

Thanks for the tip on the alignseq file. It did work (I now have sequence that came back from the tool; I will have to check whether it is correct).

Does anyone have a logical explanation?

Perhaps these myriad loc files can be streamlined down to something simple in the future.

Thanks.

Dan: The entries I had in my local loc files were all tab delimited.
Hi, I agree that *.loc files should be consolidated or, at least, we should have proper documentation about which loc file is required for each tool... Took me quite some time to find that solution. I would suggest can we have a open wiki where these things are annotated. ie: if I find a "trick" or a shortcut to something, shouldn't we have a centralised place to share it with the community, rather than having Nate and co waste their precious time answering over and over the same questions? Mailing lists are cool, but sometimes heavy to search and to find the proper answers.. my 2p or maybe I missed something... Cheers, Fred On 10/05/2012 15:35, Raja Kelkar wrote:
Hi Fred,
Thanks for the tip on the alignseq file. It did work (I now get sequence back from the tool; I will have to check whether it is correct).
Anyone have a logical explanation?
Perhaps these myriad loc files can be streamlined down to something simple in future.
Thanks.
Dan: The entries I had in local loc files were all tab delimited.
On Wed, May 9, 2012 at 4:49 PM, Federico De Masi <fred.demasi@gmail.com> wrote:
Hi,
I was having the same issue just today and my solution was to add:
seq mm9 /path_to/twobit/mm9.2bit
in the alignseq.loc file, as .nib has been replaced by 2bit, plus all necessary entries in all_fasta.loc and twobit.loc.
That worked :)
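A minimal sketch of writing such an entry with real tab characters between the fields, since spaces are easy to introduce by copy and paste. The helper name `loc_line` is hypothetical, and the three-field layout only mirrors the line quoted above; alignseq.loc.sample remains the reference for the expected columns.

```python
def loc_line(*fields):
    """Join .loc fields with single tab characters (hypothetical helper)."""
    for field in fields:
        if "\t" in field or "\n" in field:
            raise ValueError("field contains a tab or newline: %r" % field)
    return "\t".join(fields)

# Mirrors the alignseq.loc entry quoted above; the path is a placeholder.
entry = loc_line("seq", "mm9", "/path_to/twobit/mm9.2bit")
print(repr(entry))
```

Printing with `repr` makes the `\t` separators visible, which a plain `print` would not.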
Hope this helps.
Fred
On 09/05/2012 22:40, Daniel Blankenberg wrote:
Hi Raja,
Can you check that your fields are tab separated and not spaces (they are spaces below, but that could be a copy and paste artifact)?
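One way to run that check is sketched below. It assumes that lines starting with "#" are comments, as in the .loc.sample files, and the function name `lines_without_tabs` and the sample text are illustrative only.

```python
def lines_without_tabs(loc_text):
    """Return (line_number, line) pairs for data lines that contain no tab."""
    problems = []
    for number, line in enumerate(loc_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        if "\t" not in line:
            problems.append((number, line))
    return problems

# Illustrative content: line 2 uses a tab, line 3 uses a space.
sample = (
    "#<dbkey>\t<path>\n"
    "hg19\t/path_to/twobit/hg19.2bit\n"
    "mm9 /path_to/twobit/mm9.2bit\n"
)
bad = lines_without_tabs(sample)
print(bad)
```

To check a real file, read it with `open(path).read()` and pass the text to the function.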
Thanks for using Galaxy,
Dan
On May 9, 2012, at 9:45 AM, Raja Kelkar wrote:
Hi Jen,
Thank you for your response. I seem to have all the relevant entries in the two "*.loc" files you mentioned (paths in all_fasta files and the twobit files are different because of the way we have the files stored. I also converted the 2bit files to .fa and have them available in the same twobit directory).
But the feature extraction is still not working.
Here are the relevant entries in files (I have redacted specific file paths and replaced them with "path_to"):
twobit.loc
hg18 /path_to/twobit/hg18.2bit
hg19 /path_to/twobit/hg19.2bit
mm9 /path_to/twobit/mm9.2bit
mm8 /path_to/twobit/mm8.2bit
all_fasta.loc
hg19full hg19 Human (Homo sapiens): hg19 Full /path_to/hg19/bwa_path/hg19___all.fa
hg19_chr_only hg19_chr Human (Homo sapiens): hg19_chrom_only /path_to/hg19/bwa_path/hg19.fa
hg18full hg18 Human (Homo sapiens): hg18 Full /path_to/hg18/bwa_path/hg18___all.fa
hg18_chr_only hg18_chr Human (Homo sapiens): hg18_chrom_only /path_to/hg18/bwa_path/hg18___chrom_only.fa
I assume that the second field in the (all_fasta.loc) file <dbkey> has to match the builds.txt file in the "ucsc" directory. Is that correct? It does in this case. I think I am missing something subtle here.
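That comparison can be sketched as follows. The sketch assumes both files are tab-separated, that the dbkey is the first column of builds.txt and the second column of all_fasta.loc, and that "#" starts a comment. The function names and the builds.txt content are made up for illustration and do not reflect any particular installation.

```python
def read_column(text, index):
    """Collect one tab-separated column, skipping blanks and # comments."""
    values = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) > index:
            values.append(fields[index])
    return values

def unknown_dbkeys(all_fasta_text, builds_text):
    """Return dbkeys used in all_fasta.loc that builds.txt does not list."""
    known = set(read_column(builds_text, 0))
    return [key for key in read_column(all_fasta_text, 1) if key not in known]

# Illustrative builds list (assumed layout: dbkey, then description).
builds = "hg18\tHuman hg18\nhg19\tHuman hg19\n"
all_fasta = (
    "hg19full\thg19\tHuman (Homo sapiens): hg19 Full\t/path_to/hg19/all.fa\n"
    "hg19_chr_only\thg19_chr\tHuman (Homo sapiens): hg19_chrom_only\t/path_to/hg19/chr.fa\n"
)
missing = unknown_dbkeys(all_fasta, builds)
print(missing)
```

An empty result means every dbkey in the .loc text has a counterpart in the builds text.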
The "*.loc.sample" files are great but the information contained in those is confusing. I am not sure why there are two examples of the same info (as far as I can tell) in most sample loc files.
Thanks.
On Tue, May 8, 2012 at 6:48 PM, Jennifer Jackson <jen@bx.psu.edu> wrote:
Hi Raja,
This tool uses a <database>.2bit file to extract sequence data when the 'Locally cached' option is used. The <database> is a genome that you install locally. The ".2bit" format was developed by UCSC, and they are the source for many genomes already in this format and for tools (compiled and uncompiled) to transform fasta data into/from .2bit format (faToTwoBit and twoBitToFa):
http://hgdownload.cse.ucsc.edu/downloads.html (genomes + source)
http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/ (compiled utilities)
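As a rough sketch, the FASTA to .2bit conversion can be scripted around UCSC's faToTwoBit utility, which takes the input FASTA first and the output .2bit file last. The wrapper functions and the paths below are placeholders, and the command only runs if the utility is found on PATH.

```python
import shutil
import subprocess

def fa_to_twobit_command(fasta_path, twobit_path):
    """Build the argv for UCSC's faToTwoBit: input FASTA first, output last."""
    return ["faToTwoBit", fasta_path, twobit_path]

def convert(fasta_path, twobit_path):
    """Run the conversion if the utility is on PATH; return False otherwise."""
    if shutil.which("faToTwoBit") is None:
        return False  # utility not installed; nothing is run
    # check=True raises CalledProcessError if the utility exits with an error.
    subprocess.run(fa_to_twobit_command(fasta_path, twobit_path), check=True)
    return True

# Placeholder paths; substitute the real locations of your genome files.
cmd = fa_to_twobit_command("/path_to/hg19/hg19.fa", "/path_to/twobit/hg19.2bit")
print(" ".join(cmd))
```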
For the extract tool, the builds list is required: http://wiki.g2.bx.psu.edu/Admin/Data%20Integration
You don't actually need to have more NGS set up beyond that. Still, this wiki can help: http://wiki.g2.bx.psu.edu/Admin/NGS%20Local%20Setup
For example, the <database>.2bit file could be placed with your .fa files like:
/galaxy-dist/tool-data/genome/<databaseA>/seq/<databaseA>.2bit <<
/galaxy-dist/tool-data/genome/<databaseA>/seq/<databaseA>.fa
/galaxy-dist/tool-data/genome/<databaseB>/bowtie/
/galaxy-dist/tool-data/genome/<databaseB>/sam/
/galaxy-dist/tool-data/genome/<databaseB>/seq/<databaseB>.2bit <<
/galaxy-dist/tool-data/genome/<databaseB>/seq/<databaseB>.fa
/galaxy-dist/tool-data/genome/<databaseC>/seq/<databaseC>.2bit <<
/galaxy-dist/tool-data/genome/<databaseC>/seq/<databaseC>.fa
/galaxy-dist/tool-data/genome/<databaseD>/seq/<databaseD>.2bit <<
/galaxy-dist/tool-data/genome/<databaseD>/seq/<databaseD>.fa
Then the .loc file is here:
/galaxy-dist/tool-data/twobit.loc.sample
You will probably have this for all genomes as well:
/galaxy-dist/tool-data/all_fasta.loc.sample
Remove the ".sample" before using these. Instructions for how to populate each are in the files themselves.
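A small sketch of that step is shown here. It copies each .sample file to its active name instead of renaming it, so the instructions inside the sample stay available; that choice, the helper name `activate_loc`, and the temporary directory standing in for tool-data/ are all assumptions made for the illustration.

```python
import os
import shutil
import tempfile

def activate_loc(tool_data_dir, name):
    """Copy <name>.sample to <name> unless <name> already exists."""
    sample = os.path.join(tool_data_dir, name + ".sample")
    target = os.path.join(tool_data_dir, name)
    if not os.path.exists(target):
        shutil.copyfile(sample, target)
    return target

# Demonstration in a throwaway directory standing in for tool-data/.
demo_dir = tempfile.mkdtemp()
for name in ("twobit.loc", "all_fasta.loc"):
    with open(os.path.join(demo_dir, name + ".sample"), "w") as handle:
        handle.write("# instructions would be here\n")
    activate_loc(demo_dir, name)
created = sorted(os.listdir(demo_dir))
print(created)
```

Because an existing target is never overwritten, rerunning the helper leaves any entries you have already added untouched.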
The only gtf/gff files associated with this tool would be datasets from the history, so there are no gtf/gff data to stage before using the tool. To have the tool use a particular genome, set the query dataset (interval, bed, gtf) to have the same database identifier as you used for the "<database>" part of the "<database>.2bit" file. (This is why the builds list is required).
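The naming rule above can be illustrated with a short sketch that derives the "<database>" part from a .2bit file name and compares it with a dataset's database identifier. The function names are hypothetical, and this is not how Galaxy itself performs the lookup.

```python
import os

def dbkey_from_twobit(path):
    """Return the <database> part of a <database>.2bit file name."""
    name, ext = os.path.splitext(os.path.basename(path))
    if ext != ".2bit":
        raise ValueError("not a .2bit file: %r" % path)
    return name

def matches(dataset_dbkey, twobit_path):
    """True when the dataset's database identifier equals the file's <database>."""
    return dataset_dbkey == dbkey_from_twobit(twobit_path)

# Placeholder path; only the file name matters for the comparison.
print(matches("hg19", "/path_to/twobit/hg19.2bit"))
print(matches("hg19_chr", "/path_to/twobit/hg19.2bit"))
```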
If you make changes to data, don't forget to restart your server to see the changes.
Hopefully this helps,
Jen
Galaxy team
On 5/8/12 12:46 PM, Raja Kelkar wrote:
I have two questions that pertain to a local install of galaxy:
1. I have been having trouble getting the “fetch sequences” → “extract genomic DNA” tool to work. Can someone identify the specific *.loc file that needs to have the info about the location of the genome sequence files?
I get the following error when I run the extract tool:
No sequences are available for 'hg19', request them by reporting this error.
2. What configuration file(s) need to contain locations for the gtf/gff files?
Thanks.
participants (5)
- Daniel Blankenberg
- Federico De Masi
- Federico De Masi
- Jennifer Jackson
- Raja Kelkar