Re: [galaxy-dev] Dynamic data library
On 10/01/2013 03:53 PM, Cole, Nathan (NIH/NCI) [C] wrote:
Thank you both for your responses. I will be looking into both of these.
With regard to the from_file option to add the sample selection into the tool: I assume this means that the metadata and everything is loaded into Galaxy at the time the tool is run.
This depends on how you write your tool. Do you just want to read the file itself (e.g. a FASTQ file), or do you also want to read the metadata? Also, how is the metadata accessible? E.g., is it stored in a text file at the same location as the FASTQ file?
Does this create a copy of the loaded file or simply read it in place? Also, are there any efficiency issues created by using this method, outside of the tool run time increase due to the load of the data taking place in-tool?
It should just read it in place.
Hans-Rudolf
Thanks, Nathan
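To illustrate what "read it in place" can look like on the tool side, here is a minimal sketch. It assumes the tool script receives the sample's file system path from the selection list; the function name count_fastq_records is made up for this example and is not part of Galaxy. The file is streamed from where it already lives, so nothing is copied.

```python
import gzip


def count_fastq_records(path):
    """Count records in a FASTQ file by streaming it from its original location.

    Hypothetical helper: 'path' is whatever file system path the tool
    was handed, for example the value chosen in a sample selection list.
    """
    # Transparently handle gzip-compressed files based on the extension.
    opener = gzip.open if path.endswith(".gz") else open
    lines = 0
    with opener(path, "rt") as handle:
        # Iterating line by line keeps memory use flat regardless of file size.
        for _ in handle:
            lines += 1
    # Assumes unwrapped FASTQ, where every record spans exactly four lines.
    return lines // 4
```

The only cost is the read itself; on an NFS mount that read goes over the network each time the tool runs.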
-----Original Message-----
From: Hans-Rudolf Hotz [mailto:hrh@fmi.ch]
Sent: Tuesday, October 01, 2013 4:07 AM
To: Cole, Nathan (NIH/NCI) [C]
Cc: Martin Čech; galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] Dynamic data library
Hi Nathan
Do you have many tools working with those samples or just a few? If you only have a limited, predefined set of tools, you might want to consider adding the sample selection into the tool.
You can use the from_file or from_data_table options to dynamically create a sample selection list. You can even drill down a hierarchical list. Have a look at ~/tools/annotation_profiler/annotation_profiler.xml, which uses the file ~/tool-data/annotation_profiler_options.xml
All you need to do is keep the file in sync with the directory structure of your samples directory.
Regards, Hans-Rudolf
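One way to check that the options file and the samples directory are still in sync is sketched below. It rests on assumptions that are not spelled out in this message: that the file consists of nested <option> elements, and that each leaf <option> carries an absolute file path in its value attribute. The function names are made up for this example.

```python
import os
import xml.etree.ElementTree as ET


def listed_paths(options_xml_path):
    """Collect the file paths named by leaf <option> elements.

    Assumption: a leaf is an <option> with no nested <option>, and its
    'value' attribute is an absolute path. Other leaves (for example an
    empty group) are ignored.
    """
    root = ET.parse(options_xml_path).getroot()
    paths = set()
    for opt in root.iter("option"):
        value = opt.get("value")
        if opt.find("option") is None and value and os.path.isabs(value):
            paths.add(value)
    return paths


def files_on_disk(samples_dir):
    """Return the full path of every file below samples_dir."""
    paths = set()
    for dirpath, _dirnames, filenames in os.walk(samples_dir):
        for name in filenames:
            paths.add(os.path.join(dirpath, name))
    return paths


def sync_report(options_xml_path, samples_dir):
    """Return (files missing from the XML, XML entries missing on disk)."""
    listed = listed_paths(options_xml_path)
    present = files_on_disk(samples_dir)
    return sorted(present - listed), sorted(listed - present)
```

Run from cron, a non-empty result in either list would be the signal to regenerate the options file.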
On 09/30/2013 09:48 PM, Martin Čech wrote:
Hi Nathan,
Dannon answered a similar question a few days ago:
There's an import mechanism in libraries that'll allow you to simply link to the file on disk without copy/upload. I believe the "example_watch_folder.py" sample script (in the distribution) does just this via the API, if you want an example.
This might be what you are looking for.
Martin
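For orientation, the kind of API request the "example_watch_folder.py" script sends can be sketched roughly as follows. Everything here is an assumption recalled from that sample script and should be checked against the copy shipped with your Galaxy version: the libraries/<id>/contents endpoint, the key query parameter, and the payload field names (in particular link_data_only with the value link_to_files). The function name build_link_request is made up for this example, and the request is only built, not sent.

```python
import json
import urllib.request


def build_link_request(api_url, api_key, library_id, folder_id, file_path):
    """Build (but do not send) a request that links a file into a library.

    All field names below are assumptions based on the sample script;
    verify them against your Galaxy release before relying on them.
    """
    payload = {
        "folder_id": folder_id,
        "file_type": "auto",
        "dbkey": "",
        "upload_option": "upload_paths",
        "filesystem_paths": file_path,
        "create_type": "file",
        # Assumed switch that asks Galaxy to reference the file instead of copying it.
        "link_data_only": "link_to_files",
    }
    url = "%s/libraries/%s/contents?key=%s" % (
        api_url.rstrip("/"), library_id, api_key)
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would be a matter of passing the returned object to urllib.request.urlopen; a watcher script would call this once per new file it finds under the system path.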
On Mon, Sep 30, 2013 at 2:43 PM, Cole, Nathan (NIH/NCI) [C] <nathan.cole@nih.gov> wrote:
Hello, we've set up a local Galaxy instance in our genotyping and next-gen sequencing lab with local Apache LDAP (AD) integration, NFS mounts to a large NAS, and cluster integration coming. Due to the high volume of samples and staff that will be using the system, I want to set up data libraries (without copying to Galaxy). This is obviously no problem the first time; however, I was wondering if there was a way to make a library, added from a system path, be dynamic so that it would stay synchronized with the underlying file structure?

If a true dynamic library is not possible, is there a method for adding files to an existing library via that same system path that would not duplicate all of the original files in the data library?

I did some scouring of the list and found some old unanswered questions and some tangentially related topics, but I was unable to find a true answer or solution to my problem. Any information on how to do the tasks above, or other solutions to provide the same functionality, would be greatly appreciated.

Thanks,
Nathan

___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
One final question as I dive into looking at these two methods: can you expose whole hierarchies and directories using the "from_file" method or will this only work on an individual sample basis? If not, is there any method for exposing the whole of a directory on the file system?
Thanks, Nathan
-----Original Message-----
From: Hans-Rudolf Hotz [mailto:hrh@fmi.ch]
Sent: Tuesday, October 01, 2013 10:11 AM
To: Cole, Nathan (NIH/NCI) [C]
Cc: 'galaxy-dev@lists.bx.psu.edu'
Subject: Re: [galaxy-dev] Dynamic data library
On 10/01/2013 03:53 PM, Cole, Nathan (NIH/NCI) [C] wrote:
Thank you both for your responses. I will be looking into both of these.
With regard to the from_file option to add the sample selection into the tool: I assume this means that the metadata and everything is loaded into Galaxy at the time the tool is run.
This depends on how you write your tool. Do you just want to read the file itself (e.g. a FASTQ file), or do you also want to read the metadata? Also, how is the metadata accessible? E.g., is it stored in a text file at the same location as the FASTQ file?
Does this create a copy of the loaded file or simply read it in place? Also, are there any efficiency issues created by using this method, outside of the tool run time increase due to the load of the data taking place in-tool?
It should just read it in place.
Hans-Rudolf
Thanks, Nathan
On 10/01/2013 07:12 PM, Cole, Nathan (NIH/NCI) [C] wrote:
One final question as I dive into looking at these two methods: can you expose whole hierarchies and directories using the "from_file" method or will this only work on an individual sample basis?
Yes, you can. As an example, have a look at the Affymetrix CEL files we offer. See the attachment for a screenshot, and here is the corresponding fragment from the XML file:

<options>
  <option name="Affymetrix" value="Affymetrix">
    <option name="HumanGeneST10_TissueData" value="HumanGeneST10_TissueData">
      <option type="meta_key" name="MouseTP_Brain_01_mGENE.CEL" value="/***/***/external/Affymetrix/HumanGeneST10_TissueData/MouseTP_Brain_01_mGENE.CEL"/>
      <option type="meta_key" name="MouseTP_Brain_02_mGENE.CEL" value="/***/***/external/Affymetrix/HumanGeneST10_TissueData/MouseTP_Brain_02_mGENE.CEL"/>
      <option type="meta_key" name="MouseTP_Brain_03_mGENE.CEL" value="/***/***/external/Affymetrix/HumanGeneST10_TissueData/MouseTP_Brain_03_mGENE.CEL"/>
      <option type="meta_key" name="MouseTP_Embryo_01_mGENE.CEL" value="/***/***/external/Affymetrix/HumanGeneST10_TissueData/MouseTP_Embryo_01_mGENE.CEL"/>
      <option type="meta_key" name="MouseTP_Embryo_02_mGENE.CEL" value="/***/***/external/Affymetrix/HumanGeneST10_TissueData/MouseTP_Embryo_02_mGENE.CEL"/>
      <option type="meta_key" name="MouseTP_Embryo_03_mGENE.CEL" value="/***/***/external/Affymetrix/HumanGeneST10_TissueData/MouseTP_Embryo_03_mGENE.CEL"/>
      <option type="meta_key" name="MouseTP_Heart_01_mGENE.CEL" value="/***/***/external/Affymetrix/HumanGeneST10_TissueData/MouseTP_Heart_01_mGENE.CEL"/>
      <option type="meta_key" name="MouseTP_Heart_02_mGENE.CEL" value="/***/***/external/Affymetrix/HumanGeneST10_TissueData/MouseTP_Heart_02_mGENE.CEL"/>
      <option type="meta_key" name="MouseTP_Heart_03_mGENE.CEL" value="/***/***/external/Affymetrix/HumanGeneST10_TissueData/MouseTP_Heart_03_mGENE.CEL"/>
      <option type="meta_key" name="MouseTP_Kidney_01_mGENE.CEL" value="/***/***/external/Affymetrix/HumanGeneST10_TissueData/MouseTP_Kidney_01_mGENE.CEL"/>
      <option type="meta_key" name="MouseTP_Kidney_02_mGENE.CEL" value="/***/***/external/Affymetrix/HumanGeneST10_TissueData/MouseTP_Kidney_02_mGENE.CEL"/>
      <option type="meta_key" name="MouseTP_Kidney_03_mGENE.CEL" value="/***/***/external/Affymetrix/HumanGeneST10_TissueData/MouseTP_Kidney_03_mGENE.CEL"/>
      //
      <option type="meta_key" name="MouseTP_Thymus_03_mGENE.CEL" value="/***/***/external/Affymetrix/HumanGeneST10_TissueData/MouseTP_Thymus_03_mGENE.CEL"/>
    </option>
    <option name="MouseGeneST10_TissueData" value="MouseGeneST10_TissueData">
      <option type="meta_key" name="MouseTP_Brain_01_mGENE.CEL" value="/***/***/external/Affymetrix/MouseGeneST10_TissueData/MouseTP_Brain_01_mGENE.CEL"/>
      <option type="meta_key" name="MouseTP_Brain_02_mGENE.CEL" value="/***/***/external/Affymetrix/MouseGeneST10_TissueData/MouseTP_Brain_02_mGENE.CEL"/>
      //
      <option type="meta_key" name="MouseTP_Thymus_03_mGENE.CEL" value="/***/***/external/Affymetrix/MouseGeneST10_TissueData/MouseTP_Thymus_03_mGENE.CEL"/>
    </option>
  </option>
  <option name="GEO" value="GEO">
    //
  </option>
</options>
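A file in this shape does not have to be maintained by hand. The sketch below walks a samples directory and emits the same nesting as the fragment above: one <option name=... value=...> per directory and one <option type="meta_key" ...> per file, with the full path as the value. It is only an illustration; the function names and the idea of regenerating the whole file on each run are assumptions, not something described in this thread.

```python
import os
import xml.etree.ElementTree as ET


def add_directory(parent, dirpath):
    """Append one <option> per entry in dirpath, recursing into subdirectories."""
    for name in sorted(os.listdir(dirpath)):
        full = os.path.join(dirpath, name)
        if os.path.isdir(full):
            # Directories become grouping options that can be drilled into.
            node = ET.SubElement(parent, "option", {"name": name, "value": name})
            add_directory(node, full)
        else:
            # Files become selectable leaves whose value is the path on disk.
            ET.SubElement(
                parent, "option",
                {"type": "meta_key", "name": name, "value": full})


def build_options(samples_dir):
    """Return an <options> element mirroring the tree below samples_dir."""
    root = ET.Element("options")
    add_directory(root, samples_dir)
    return root


def write_options(samples_dir, out_path):
    """Regenerate the options file from scratch."""
    tree = ET.ElementTree(build_options(samples_dir))
    tree.write(out_path, encoding="utf-8", xml_declaration=True)
```

Calling write_options on the samples directory after each sequencing run (or from cron) would keep the selection list current; whether Galaxy picks up the change without a restart is something to verify on your instance.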
If not, is there any method for exposing the whole of a directory on the file system?
Thanks, Nathan
participants (2)
- Cole, Nathan (NIH/NCI) [C]
- Hans-Rudolf Hotz